The idea is that a model trained on vast amounts of data (e.g., videos, images, sensory inputs) can learn to capture the underlying dynamics and physics of the world, much as language models learn to capture linguistic patterns.
While these models may not represent their understanding in the same way as traditional physics equations, their ability to generate realistic simulations suggests a form of implicit understanding of the world's dynamics. As computational resources continue to increase, so does the potential for world models to capture increasingly complex phenomena. Just as some consider language to be largely "solved" by large language models, some researchers believe that with enough data and compute power, we may be able to solve the simulation of physical processes through these statistical world models.
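
The core idea of learning dynamics purely from observations can be illustrated at toy scale. The sketch below is not how large world models are built (those typically use neural networks on high-dimensional data such as video); it is a minimal illustration under stated assumptions. The damped spring, the constants `DT`, `K`, and `C`, and all function names are hypothetical choices made for this example. A simulator generates trajectories, and a linear next-state model is then fitted by least squares from those trajectories alone, without ever being given the equations or constants.

```python
import random

# Assumed toy system: a damped spring. These constants are illustrative only
# and are used solely to generate training data.
DT, K, C = 0.01, 4.0, 0.3  # time step, stiffness, damping

def true_step(state):
    """Ground-truth physics (semi-implicit Euler). Hidden from the learner."""
    x, v = state
    v_next = v + DT * (-K * x - C * v)
    x_next = x + DT * v_next
    return (x_next, v_next)

def simulate(state, steps):
    """Generate a trajectory of (position, velocity) observations."""
    trajectory = [state]
    for _ in range(steps):
        trajectory.append(true_step(trajectory[-1]))
    return trajectory

def fit_linear_model(trajectories):
    """Least-squares fit of M such that next_state ~ M @ state.

    Solves the 2x2 normal equations M = H G^{-1}, where
    G = sum(s s^T) and H = sum(s_next s^T) over all observed transitions.
    """
    g = [[0.0, 0.0], [0.0, 0.0]]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            for i in range(2):
                for j in range(2):
                    g[i][j] += cur[i] * cur[j]
                    h[i][j] += nxt[i] * cur[j]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    return [[sum(h[i][k] * g_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def learned_step(M, state):
    """Predict the next state using only the fitted matrix."""
    x, v = state
    return (M[0][0] * x + M[0][1] * v, M[1][0] * x + M[1][1] * v)

# Training data: 20 short trajectories from random initial conditions.
rng = random.Random(0)
train = [simulate((rng.uniform(-1, 1), rng.uniform(-1, 1)), 200)
         for _ in range(20)]
M = fit_linear_model(train)

# Evaluate on an unseen initial condition by rolling the learned model forward.
truth = simulate((0.5, -0.2), 500)
state = truth[0]
max_error = 0.0
for target in truth[1:]:
    state = learned_step(M, state)
    max_error = max(max_error, abs(state[0] - target[0]), abs(state[1] - target[1]))

print(f"max rollout error: {max_error:.2e}")
```

Because this toy system happens to be linear, a linear model can reproduce it almost exactly, and the fitted matrix never contains `K` or `C` as explicit symbols: the physics is encoded only implicitly in learned numbers. Real-world dynamics are nonlinear, partially observed, and far higher-dimensional, which is why the approach described above depends on much larger models and datasets.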