Hardware requirements for headless rendering

I'm building a cloud server that joins and renders multiple .FBX files into video scenes. The FBX files are dynamic, and every scene is different.

The rendering will be done by a headless browser that captures a sequence of images, which are joined into a video afterwards.
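
As a rough illustration of the joining step, a minimal sketch might look like the following. It assumes ffmpeg is installed on the server and that the headless browser writes frames as numbered PNG files; the function name, file pattern, and frame rate are hypothetical placeholders, not something fixed in the setup described here.

```python
from pathlib import Path
from typing import List


def build_ffmpeg_command(
    frames_dir: Path,
    output: Path,
    fps: int = 30,                      # assumed frame rate
    pattern: str = "frame_%05d.png",    # assumed naming of captured frames
) -> List[str]:
    """Build the ffmpeg argument list that joins numbered frames into an H.264 video."""
    return [
        "ffmpeg",
        "-y",                           # overwrite the output file if it exists
        "-framerate", str(fps),         # input frame rate of the image sequence
        "-i", str(frames_dir / pattern),
        "-c:v", "libx264",              # encode with the x264 software encoder
        "-pix_fmt", "yuv420p",          # pixel format most players can decode
        str(output),
    ]


# To actually encode, pass the list to subprocess, for example:
#   import subprocess
#   subprocess.run(build_ffmpeg_command(Path("frames"), Path("scene.mp4")), check=True)
```

With libx264 the encoding runs on the CPU, so this step would likely scale with core count rather than with RAM; whether the capture step behaves the same way depends on how the browser renders the scene.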

My question: which server specifications would make this faster? In other words, should I prioritize more RAM or more CPU?

Also, if you know a better way to do this, please feel free to offer advice.