Xinhua Cheng (程鑫华)

I am a fourth-year Ph.D. candidate in Computer Science and Technology at the School of Electronic and Computer Engineering, Peking University, advised by Prof. Li Yuan. Before this, I received a B.E. degree in computer science from the College of Computer Science, Sichuan University.

My recent research interests include 3D content and video generation.

Selected Publications

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

International Conference on Learning Representations (ICLR), 2024

We introduce progressive local editing to create precise 3D content consistent with prompts that describe multiple interacting objects bound to different attributes.

Null-Space Diffusion Sampling for Zero-Shot Point Cloud Completion

Xinhua Cheng*, Nan Zhang*, Jiwen Yu, Yinhuai Wang, Ge Li, Jian Zhang

International Joint Conference on Artificial Intelligence (IJCAI), 2023

We propose a zero-shot point cloud completion framework that refines only the null-space content during the reverse process of a pre-trained diffusion model.

Panoptic Compositional Feature Field for Editable Scene Rendering with Network-Inferred Labels via Metric Learning

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

We introduce metric learning to leverage 2D network-inferred labels and obtain discriminative feature fields, which enable 3D segmentation and editing.

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

We explicitly decouple the textual attributes and perform dense alignment between the resulting fine-grained language and point cloud objects for 3D visual grounding.

More is better: Multi-source Dynamic Parsing Attention for Occluded Person Re-identification

Xinhua Cheng*, Mengxi Jia*, Qian Wang, Jian Zhang

ACM International Conference on Multimedia (ACM MM), 2022

We introduce a multi-source knowledge ensemble in occluded re-ID to effectively leverage external semantic cues learned from different domains.

A Simple Visual-Textual Baseline for Pedestrian Attribute Recognition

Xinhua Cheng*, Mengxi Jia*, Qian Wang, Jian Zhang

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2022

We model pedestrian attribute recognition as a multimodal problem and build a simple visual-textual baseline to capture the intra- and cross-modal correlations.

Projects

Open-Sora Plan: Open-Source Large Video Generation Model

Bin Lin*, Yunyang Ge*, Xinhua Cheng*, et al.

We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for producing high-resolution, long-duration videos from various user inputs.