Spatial Intelligence: The Future of AI

AI-generated images from assistant professor Lei Li's artificial intelligence lab, including a person walking on a bridge, a boxy diagram, and a model of an office.
Lei Li's GenZI (top left), LT3SD (bottom left), and MeshArt (right) showcase the next generation of AI-driven 3D creation.

If you wake up in the middle of the night, chances are you will be able to reach your bedroom light switch pretty quickly, even if the room is pitch black. But what about a hotel room or a friend’s house? Have you ever woken up in an unfamiliar space and fumbled around looking for the light or the doorknob, only to find that you were nowhere near where you thought you were?

At home, your spatial intelligence is probably very good. In a new hotel room, you haven’t learned how to navigate without visual cues, so you might find yourself pawing the walls of the closet before you realize your mistake. In time and with practice, you will probably create a map of the room in your mind that you can access even in total darkness.

This ability to understand the relationship between objects in space develops early in life, and it’s something most of us take for granted. We catch balls, pour drinks, ride escalators, and do countless activities that require this ability to anticipate the way objects exist and move in space.

Are we alone in our ability to understand spatial relationships, or can machines replicate this understanding? Artificial intelligence systems are increasing their capabilities by the second. Large language models (LLMs) like ChatGPT, which focus on processing and creating text, are gaining popularity worldwide. Multimodal large language models (MLLMs) like Claude and Gemini take AI to the next level with their ability to interpret images and audio. With new apps, models, and tools popping up daily, AI systems are delivering increasingly sophisticated performance — except in one area.

AI’s Blind Spot

As it happens, emulating spatial awareness, which draws on information that goes well beyond language, remains well outside machines' current reach.

Fei-Fei Li, often called the “Godmother of AI,” is a pioneering computer scientist and professor at Stanford University who cofounded World Labs, an AI company that focuses on spatial intelligence. “State-of-the-art MLLM models rarely perform better than chance on estimating distance, orientation, and size — or ‘mentally’ rotating objects by regenerating them from new angles,” Fei-Fei Li wrote in her Substack post, “From Words to Worlds: Spatial Intelligence is AI’s Next Frontier.” “They can’t navigate mazes, recognize shortcuts, or predict basic physics. AI-generated videos — nascent and yes, very cool — often lose coherence after a few seconds.”

MSDS students demonstrating spatial awareness by throwing and catching a cube in a conference room.

If you throw a ball to a toddler, they may not get the timing right, but they will likely show you their basic understanding of physics by reaching their arms out to try to catch it. With time, most children learn to play catch reliably, a simple activity that nonetheless requires complex spatial understanding: knowing just when the ball coming toward them is within reach.

A Whole New World (Model)

Why is it so hard for a machine to replicate this spatial intelligence? “Building spatially intelligent AI requires something even more ambitious than LLMs: world models, a new type of generative models whose capabilities of understanding, reasoning, generation and interaction with the semantically, physically, geometrically and dynamically complex worlds — virtual or real — are far beyond the reach of today’s LLMs,” Fei-Fei Li said.

At World Labs, Fei-Fei Li and her colleagues are exploring this next frontier of AI by building these foundational world models that can interact with the 3D world.

Here at the University of Virginia, assistant professor Lei Li created the Spatial AI Lab with a similar mission. “Our goal is really to develop the machines to perceive, understand, and interact with the 3D world,” Lei Li said. He joined the School of Data Science in August 2025 after conducting postdoctoral research at the Technical University of Munich. “We are working at the intersection of computer vision, computer graphics, and AI…so basically combining our expertise in these areas to build our foundation for spatial AI.”

Lei Li is recruiting doctoral students to build general-purpose systems with grounded 3D spatial intelligence. To make this possible, his research focuses on developing innovative machine learning techniques, combined with geometric and physical principles, to solve entirely new spatial perception and generation tasks across scientific disciplines.

Mapping the Future of Discovery

When LLMs can already do so much, why extend their capabilities to mimic a human understanding of spatial relationships? What is it about understanding how objects move and interact in space that could be so pivotal in the field of AI?

“You need to understand the spatiotemporal structures of proteins and cells so you can discover new drugs,” Lei Li said. “If you’re an archeology researcher, you might want to reconstruct artifacts based on only a few fragmented fossils.” There are countless other potential applications. In dermatology, Lei Li said that spatial AI could help predict skin cancer growth patterns by analyzing how limited prior dermoscopic scans have changed over time. In orthodontics, spatial AI could help predict tooth movements. In digital manufacturing, spatial AI could help us understand whether a mechanical structure is stable.

“Without spatial intelligence, today’s AI is sort of detached from the physical world,” Lei Li said. “It just tries to understand the world through languages, through images, but it doesn’t really have a true understanding of spatial structures or 3D geometry.”

When Lei Li was conducting postdoctoral research in Munich, he created multiple 3D environments that show the beginnings of how machines can understand the dynamics of their surroundings. In MeshPad, users can draw a simple object from scratch, and the program converts the line drawing to 3D, where they can rotate the object or modify its design.


Other programs he’s created include MeshArt for articulated 3D object generation, LT3SD for large-scale, high-fidelity 3D scene generation, and GenZI for synthesizing realistic human interactions with 3D environments.

The scene generation models resemble video game landscapes where players can navigate through space. But Lei Li’s models are actually quite different from Minecraft or similar games. “Video games and virtual environments are created by specialized artists, which demands a significant amount of effort,” Lei Li said. With spatially intelligent AI, the average person could create a high-fidelity immersive environment with applications across health care, business, education, manufacturing, and beyond.

Building Responsibly

But like every new technology, spatial AI is not without its risks. To ensure that this technology is used to benefit society, Lei Li says we need to be transparent about how the models are trained and what kind of data we’re using. “The emphasis should be, we are developing AI models that can assist humans with their tasks rather than replace humans,” Lei Li said. “I think that’s really important.”

I interviewed Lei Li at the beginning of Lunar New Year, welcoming in the Year of the Horse. In Chinese culture, the horse symbolizes energy, passion, and rapid forward momentum. Lei Li feels like this could not be more appropriate when he thinks about recent technological advances. “I feel very excited about this year,” he said. “I hope to see a lot of breakthroughs in the field of spatial AI.”
