How can this be achieved using an orthographic camera?
Edit fiddle - JSFiddle - Code Playground

When using a perspective camera, OrbitControls zooms by dollying, meaning the camera is translated along the viewing direction.

When using an orthographic camera, OrbitControls changes the camera's zoom value instead, which rescales the projection matrix while the camera keeps its position. I'm not aware of existing code that transforms the current zoom value into something you can apply to your cube so you get the same result as with your first fiddle.
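One thing that might be worth trying, as a rough sketch only: with an orthographic camera the on-screen size of an object grows linearly with the camera's zoom, so scaling the object by the inverse of the zoom should cancel that growth. The helper name `compensateOrthoZoom` and the `baseScale` parameter are made up for illustration. The code only assumes that `camera` has a numeric `zoom` (as `THREE.OrthographicCamera` does) and that `object.scale` has a `setScalar` method (as the `Vector3` scale of a `THREE.Object3D` does).

```javascript
// Hypothetical helper: counter-scale an object against orthographic zoom.
// Assumes camera.zoom is a positive number and object.scale.setScalar exists.
function compensateOrthoZoom(object, camera, baseScale = 1) {
  // On-screen size is proportional to (object scale * camera zoom),
  // so dividing by the zoom keeps that product constant.
  const s = baseScale / camera.zoom;
  object.scale.setScalar(s);
  return s;
}
```

You would call this whenever the zoom changes, for example from an OrbitControls `change` listener or once per frame before rendering. Note that it only keeps the size constant: the object's position on screen still moves with the zoom unless it sits at the center of the view.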

Is there any other way I can achieve this? Maybe by overlapping two scenes: one in front containing the static object (with the rest of that scene transparent), and the main scene behind it?

Have you already tried using a screen-space sprite? It seems the effect would be identical. Check out how the red sprites are implemented in: three.js webgl - sprites
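For reference, the two-scene overlay idea from the question can be sketched as a two-pass render. This is only an outline under assumptions: `renderWithOverlay` is a made-up name, and `renderer` is assumed to behave like `THREE.WebGLRenderer` (an `autoClear` flag plus `clear()`, `clearDepth()` and `render(scene, camera)`). The overlay camera would be a separate camera that is not attached to OrbitControls, so zooming the main camera does not affect it.

```javascript
// Hypothetical helper: draw the main scene, then a fixed overlay on top of it.
// Assumes renderer exposes autoClear, clear(), clearDepth() and render().
function renderWithOverlay(renderer, mainScene, mainCamera, overlayScene, overlayCamera) {
  renderer.autoClear = false;   // stop render() from wiping the first pass
  renderer.clear();             // clear the buffers once per frame
  renderer.render(mainScene, mainCamera);
  renderer.clearDepth();        // overlay must not be hidden by main-scene depth
  renderer.render(overlayScene, overlayCamera);
}
```

Call this from your animation loop in place of a single `renderer.render(...)` call. Because the overlay pass does not clear the color buffer, everything in the overlay scene that is not covered by an object stays "transparent" and shows the main scene behind it.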