Here are a few simple ideas (I have used them in a project that checks closeness of tens of thousands of moving objects):
remove everything that can be moved outside the loop (e.g. in the loop you create Matrix4, but it could be created once and then reused)
reduce the computations inside the loop (e.g. matrix inversion and multiplications are CPU-heavy operations – do you always need them)
add some quick tests (e.g. if objects are too far from each other, there is no need to check them)
use the fact that collisions are symmetrical, if A collides with B, then B also collides with A – this can reduce the checks by 50%
you may distribute collision test along the time, e.g. one frame you test the first 1000 objects, the next frame you test the next 1000 objects and so on
And here are some more complex ideas:
tailor the checks to the shapes – e.g. if you have only thin and long cylinders, you can check the distance between the lines along the cylinders and only if it is too small, then continue with the other checks
use physics engine with optimized collision tests
move the collision check to a webworker
move the collision check to the GPU
As I have never needed to use BVH, there might be some functionality in three-mesh-bvh that makes such tests faster, as it is the nature of BVH to split space into subspaces and achieve time complexity lower than O(n^2). So I hope someone will join the discussion and provide more useful advices.