Yichong Lu

I am a second-year Master's student at Zhejiang University, supervised by Yiyi Liao and Huimin Yu. I have been fortunate to collaborate on research with Prof. Andreas Geiger. Before that, I obtained my Bachelor's degree from Zhejiang University in 2023.

My research interest lies in 3D computer vision, including scene understanding, 3D reconstruction and 3D generative models.

Feel free to contact me via email about collaboration!

news

Jun 30, 2025	We are organizing the RealADSim Workshop & Challenges @ ICCV 2025!
Feb 27, 2025	UrbanCAD is accepted to CVPR 2025.

selected publications

Full publication list can be found on Google Scholar.
^*equal contribution; ^♯corresponding author.

Orientation Matters: Making 3D Generative Models Orientation-Aligned

Lu, Yichong^*, Tian, Yuzhuo^*, Jiang, Zijin^*, Zhao, Yikun, Yang, Yuanbo, Ouyang, Hao, Hu, Haoji, Yu, Huimin, Shen, Yujun, and Liao, Yiyi^♯

arXiv 2025

Abs PDF Website Video Code

Humans intuitively perceive object shape and orientation from a single image, guided by strong priors about canonical poses. However, existing 3D generative models often produce misaligned results due to inconsistent training data, limiting their usability in downstream tasks. To address this gap, we introduce the task of orientation-aligned 3D object generation: producing 3D objects from single images with consistent orientations across categories. To facilitate this, we construct Objaverse-OA, a dataset of 14,832 orientation-aligned 3D models spanning 1,008 categories. Leveraging Objaverse-OA, we fine-tune two representative 3D generative models based on multi-view diffusion and 3D variational autoencoder frameworks to produce aligned objects that generalize well to unseen objects across various categories. Experimental results demonstrate the superiority of our method over post-hoc alignment approaches. Furthermore, we showcase downstream applications enabled by our aligned object generation, including zero-shot model-free object orientation estimation via analysis-by-synthesis and efficient arrow-based object manipulation.

UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation

Lu, Yichong^*, Cai, Yichi^*, Zhang, Shangzhan, Zhou, Hongyu, Hu, Haoji, Yu, Huimin, Geiger, Andreas, and Liao, Yiyi^♯

CVPR 2025

Abs PDF Website

Photorealistic 3D vehicle models with high controllability are essential for autonomous driving simulation and data augmentation. While handcrafted CAD models provide flexible controllability, free CAD libraries often lack the high-quality materials necessary for photorealistic rendering. Conversely, reconstructed 3D models offer high-fidelity rendering but lack controllability. In this work, we introduce UrbanCAD, a framework that pushes the frontier of the photorealism-controllability trade-off by generating highly controllable and photorealistic 3D vehicle digital twins from a single urban image and a large collection of free 3D CAD models and handcrafted materials. These digital twins enable realistic 360-degree rendering, vehicle insertion, material transfer, relighting, and component manipulation such as opening doors and rolling down windows, supporting the construction of long-tail scenarios. To achieve this, we propose a novel pipeline that operates in a retrieval-optimization manner, adapting to observational data while preserving fine-grained handcrafted properties including component-level controllability, complex material physical properties, interior geometry, and detailed material assignment. Furthermore, given multi-view background perspective and fisheye images, we approximate environment lighting using fisheye images and reconstruct the background with 3DGS, enabling the photorealistic insertion of optimized CAD models into rendered novel view backgrounds. Experimental results demonstrate that UrbanCAD outperforms baselines based on reconstruction and retrieval in terms of photorealism. Additionally, we show that various perception models maintain their accuracy when evaluated on UrbanCAD with in-distribution configurations but degrade when applied to realistic out-of-distribution data generated by our method. This suggests that UrbanCAD is a significant advancement in creating photorealistic, safety-critical driving scenarios for downstream applications.

PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes

Fu, Xiao^*, Zhang, Shangzhan^*, Chen, Tianrun, Lu, Yichong, Zhou, Xiaowei, Geiger, Andreas, and Liao, Yiyi^♯

TPAMI 2025

Abs PDF Code Website

Training perception systems for self-driving cars requires substantial 2D annotations that are labor-intensive to label manually. While existing datasets provide rich annotations on pre-recorded sequences, they fall short in labeling rarely encountered viewpoints, potentially hampering the generalization ability of perception models. In this paper, we present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate high-quality panoptic labels and images from any viewpoint. Our key insight lies in exploiting the complementarity of 3D and 2D priors to mutually enhance geometry and semantics. Specifically, we propose to leverage coarse 3D bounding primitives and noisy 2D semantic and instance predictions to guide geometry optimization, by encouraging predicted labels to match panoptic pseudo ground truth. Simultaneously, the improved geometry assists in filtering 3D and 2D annotation noise by fusing semantics in 3D space via a learned semantic field. To further enhance appearance, we combine MLP and hash grids to yield hybrid scene features, striking a balance between high-frequency appearance and contiguous semantics. Our experiments demonstrate PanopticNeRF-360’s state-of-the-art performance over label transfer methods on the challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360 enables omnidirectional rendering of high-fidelity, multi-view and spatiotemporally consistent appearance, semantic and instance labels.

HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving

Zhou, Hongyu, Lin, Longzhong, Wang, Jiabao, Lu, Yichong, Bai, Dongfeng, Liu, Bingbing, Wang, Yue, Geiger, Andreas, and Liao, Yiyi^♯

arXiv 2024

Abs PDF Website Video Code

We present HUGSIM, a real-time, photo-realistic and closed-loop simulator for autonomous driving. HUGSIM enables the full closed simulation loop, dynamically updating the ego and actor positions and observations based on control commands. It incorporates 360-degree high-fidelity actor insertion and efficiently generates both normal and aggressive actor behaviors. Moreover, HUGSIM offers a comprehensive benchmark across more than 70 sequences from KITTI-360, Waymo Open Dataset, nuScenes, and PandaSet, along with over 400 varying scenarios, providing a fair and realistic evaluation platform for existing autonomous driving algorithms.

Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation

Fu, Xiao^*, Zhang, Shangzhan^*, Chen, Tianrun, Lu, Yichong, Zhu, Lanyun, Zhou, Xiaowei, Geiger, Andreas, and Liao, Yiyi^♯

3DV 2022

Abs PDF Code Poster Website Video

Large-scale training data with high-quality annotations is critical for training semantic and instance segmentation models. Unfortunately, pixel-wise annotation is labor-intensive and costly, raising the demand for more efficient labeling strategies. In this work, we present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims to obtain per-pixel 2D semantic and instance labels from easy-to-obtain coarse 3D bounding primitives. Our method utilizes NeRF as a differentiable tool to unify coarse 3D annotations and 2D semantic cues transferred from existing datasets. We demonstrate that this combination allows for improved geometry guided by semantic information, enabling rendering of accurate semantic maps across multiple views. Furthermore, this fusion process resolves label ambiguity of the coarse 3D annotations and filters noise in the 2D predictions. By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design. Experimental results show that Panoptic NeRF outperforms existing label transfer methods in terms of accuracy and multi-view consistency on challenging urban scenes of the KITTI-360 dataset.