Spatial is a different concept. It is real world, 3D learning. That will require robots and such. LLMs reside and learn from text, video, and images, all 2D media.
I imagine spatial will require more compute in the early days. There is a lot of data that will be flowing through, mostly video.
i agree that the data will be mostly videos. Infact the only reason AI videos have dominated AI images is because AI video generators are not as common and advanced yet but they will be