Selected Publications (* indicates equal contribution)
|
|
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
Xinhua Cheng,
Tianyu Yang,
Jianan Wang,
Yu Li,
Lei Zhang,
Jian Zhang,
Li Yuan
arXiv, 2023
[Paper]
[Code]
[Page]
We introduce progressive local editing to create precise 3D content that is consistent with prompts describing multiple interacting objects bound to different attributes.
|
|
Null-Space Diffusion Sampling for Zero-Shot Point Cloud Completion
Xinhua Cheng*,
Nan Zhang*,
Jiwen Yu,
Yinhuai Wang,
Ge Li,
Jian Zhang
International Joint Conference on Artificial Intelligence (IJCAI), 2023
[Paper]
We propose a zero-shot point cloud completion framework that refines only the null-space content during the reverse process of a pre-trained diffusion model.
|
|
Panoptic Compositional Feature Field for Editable Scene Rendering with Network-Inferred Labels via Metric Learning
Xinhua Cheng,
Yanmin Wu,
Mengxi Jia,
Qian Wang,
Jian Zhang
Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Paper]
We introduce metric learning to leverage 2D network-inferred labels and obtain discriminative feature fields, enabling 3D segmentation and editing.
|
|
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu,
Xinhua Cheng,
Renrui Zhang,
Zesen Cheng,
Jian Zhang
Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Paper]
[Code]
We explicitly decouple the textual attributes and conduct dense alignment between this fine-grained language and point cloud objects for 3D visual grounding.
|
|
More is better: Multi-source Dynamic Parsing Attention for Occluded Person Re-identification
Xinhua Cheng*,
Mengxi Jia*,
Qian Wang,
Jian Zhang
ACM International Conference on Multimedia (ACM MM), 2022
[Paper]
We introduce a multi-source knowledge ensemble for occluded re-ID to effectively leverage external semantic cues learned from different domains.
|
|
A Simple Visual-Textual Baseline for Pedestrian Attribute Recognition
Xinhua Cheng*,
Mengxi Jia*,
Qian Wang,
Jian Zhang
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2022
[Paper]
[Code]
We model pedestrian attribute recognition as a multimodal problem and build a simple visual-textual baseline to capture the intra- and cross-modal correlations.
|
Last updated: Oct 2023
|
|