The Multi-Modal Computer Vision (MMCV) lab, led by Cheng Peng, studies how machines can perceive and understand the world through visual signals. These signals can be visible (e.g., from camera sensors) or invisible (e.g., from X-ray machines in hospitals or hyperspectral sensors on satellites). Relevant tasks for machine perception include 3D reconstruction and generation of the world, often referred to colloquially as “World Models”.
MMCV also works on machine understanding: once AI can observe the world, how can it come to conclusions? Tasks of interest include biometrics, robotic planning, and vision-language understanding. The ultimate goal of MMCV is to create AI agents that can interact with the real world through visual understanding and embodiment.
Research Tags: Computer Vision, Computational Imaging, Embodied AI, Medical Image Analysis
