Summing up your requirements:
- a perspective camera
- 2D HTML text in front (i.e. on top) of the 3D scene
- shifting of the world not allowed
- translation/rotation of camera preferred, to see what’s “behind” 2D HTML text
The only solution I can come up with would look something like this, with the light blue representing the camera frustum seen from above (mockup not true to scale):
What you’d actually end up with amounts to this (reddish frustum):
If your scene contains axis-aligned objects, perspective foreshortening on the left side of the scene would be unavoidable. I’m not sure whether you’d be willing to accept that.
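To put a rough number on how far the camera would have to turn, here is a minimal sketch. It assumes (my assumption, not something stated above) that the HTML text covers a fraction of the viewport on its left side and that the camera is rotated about its vertical axis only. The function name and parameters are hypothetical, and the math is plain trigonometry rather than any library API.

```typescript
// Hypothetical helper (not part of any library): yaw angle in radians that
// moves a point lying on the view axis to the horizontal center of the part
// of the viewport that is NOT covered by the HTML text.
// Assumption: the text covers `coveredLeftFraction` (0..1) of the viewport
// width, measured from the left edge.
function yawForCoveredLeftFraction(
  verticalFovDeg: number,
  aspect: number,
  coveredLeftFraction: number
): number {
  // Half of the vertical field of view, converted from degrees to radians.
  const halfVerticalFov = (verticalFovDeg * Math.PI) / 360;
  // Tangent of half the horizontal field of view of a perspective camera.
  const tanHalfHorizontalFov = aspect * Math.tan(halfVerticalFov);
  // The free area spans NDC x in [-1 + 2f, 1], so its center sits at x = f.
  const freeCenterNdcX = coveredLeftFraction;
  // A point at angle a from the view axis projects to NDC x = tan(a) / tanHalfHorizontalFov.
  return Math.atan(freeCenterNdcX * tanHalfHorizontalFov);
}

// Example: 50 degree vertical FOV, 16:9 viewport, text covering the left 30 %.
const yaw = yawForCoveredLeftFraction(50, 16 / 9, 0.3);
console.log(`Turn the camera ${((yaw * 180) / Math.PI).toFixed(1)} degrees toward the covered side`);
```

With a camera looking down its local -Z axis, turning it by this angle toward the covered (left) side pushes the point of interest to the right on screen. That same rotation is what tilts the frustum relative to axis-aligned objects and produces the foreshortening mentioned above.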

