Spent some more time on the problem. Figured out the correct reflection bias and finally implemented light sampling. Learned that with path tracing, a directional light looks really, really bad compared to a sun modeled as a disk (a light with a small but nonzero angular size).
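For anyone curious what "sun as a disk" means in a light sampler, here is a minimal sketch. It assumes a sun of uniform radiance and samples directions uniformly inside the cone it subtends. The names `sample_sun_disk` and `angular_radius` are hypothetical, not taken from my code. A directional light is the limit `angular_radius -> 0`. Every shadow ray then goes in the same direction, so shadows stay perfectly hard, which is one likely reason it looks worse. The real sun's angular radius is roughly 0.00465 rad (about 0.27 degrees).

```python
import math
import random


def normalize(v):
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / n, v[1] / n, v[2] / n)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def orthonormal_basis(n):
    # Pick a helper axis that is not nearly parallel to n, then build t, b perpendicular to n.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, n))
    b = cross(n, t)
    return t, b


def sample_sun_disk(sun_dir, angular_radius, rng=random):
    """Uniformly sample a direction inside the cone subtended by the sun disk.

    Returns (direction, pdf), where pdf is per unit solid angle and equals
    1 / (2*pi*(1 - cos(angular_radius))), i.e. 1 / (cone solid angle).
    """
    cos_max = math.cos(angular_radius)
    u1, u2 = rng.random(), rng.random()
    # cos(theta) uniform in (cos_max, 1] gives directions uniform over the cone's solid angle.
    cos_theta = 1.0 - u1 * (1.0 - cos_max)
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    phi = 2.0 * math.pi * u2
    n = normalize(sun_dir)
    t, b = orthonormal_basis(n)
    d = tuple(t[i] * sin_theta * math.cos(phi)
              + b[i] * sin_theta * math.sin(phi)
              + n[i] * cos_theta for i in range(3))
    pdf = 1.0 / (2.0 * math.pi * (1.0 - cos_max))
    return d, pdf
```

In a light-sampling step, the sun's contribution would then be estimated as `L_sun * brdf * max(0, dot(N, d)) * visibility / pdf`, with a shadow ray cast along `d`.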
Pretty happy with what I got. I only need diffuse lighting for light probes, so this is enough.
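Because only diffuse lighting is needed, each bounce can assume a Lambertian surface. A common choice for picking the next direction in that case is cosine-weighted hemisphere sampling. The sketch below is an illustration under that assumption, not necessarily what my tracer does. `cosine_sample_hemisphere` is a hypothetical name, and the method shown is Malley's: sample a unit disk uniformly, then project up onto the hemisphere.

```python
import math
import random


def normalize(v):
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / n, v[1] / n, v[2] / n)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def cosine_sample_hemisphere(normal, rng=random):
    """Cosine-weighted direction about `normal` (Malley's method).

    Returns (direction, pdf), where pdf = cos(theta) / pi per unit solid angle.
    """
    u1, u2 = rng.random(), rng.random()
    # Uniform point on the unit disk...
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    x, y = r * math.cos(phi), r * math.sin(phi)
    # ...projected up onto the hemisphere, which yields a cos(theta) distribution.
    z = math.sqrt(max(0.0, 1.0 - u1))
    n = normalize(normal)
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, n))
    b = cross(n, t)
    d = tuple(t[i] * x + b[i] * y + n[i] * z for i in range(3))
    return d, z / math.pi
```

With a Lambertian BRDF of `albedo / pi`, the per-bounce throughput factor `brdf * cos(theta) / pdf` reduces to just `albedo`. That keeps diffuse-only paths cheap and low-variance.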
Performance is in a good place too: ~75,000 paths per second, with up to 16 bounces per path. For reference, the target was 10,000.