I am currently an AI researcher working on embodied AI with Dr. Tao Kong at ByteDance Research.

I received my Master’s degree in Artificial Intelligence from Fudan University (Sep. 2021 - Jun. 2024), where Prof. Tao Chen was my advisor. I am fortunate to work closely with Dr. Hongyuan Zhu from A*STAR, Singapore, and Dr. Gang Yu, Dr. Xin Chen, and Dr. Chi Zhang from Tencent. Before that, I obtained my Bachelor’s degree in Data Science and Big Data Technology, also from Fudan University (Sep. 2017 - Jun. 2021).

My long-term research goal is to develop robust and generalizable multi-modal systems that can perceive, understand, and interact with the physical world.

📣 If you are interested in my previous projects, feel free to check out my resume here.

🔥 News

  • Sep. 2024.  🎉🎉 Two papers accepted to NeurIPS 2024: one focuses on foundational 3D generative models (MeshXL), and the other explores the Mamba architecture for 3D detection (coming soon).
  • Jul. 2024.  🎉🎉 Our M3DBench, a dataset querying 3D LLMs with multi-modal prompts, is accepted to ECCV 2024.
  • May. 2024.  🎉🎉 We release MeshXL , a family of generative 3D foundation models for 3D meshes.
  • May. 2024.  🎉🎉 I successfully defended my master’s thesis! [defense slides]
  • Apr. 2024.  🎉🎉 Our state-of-the-art 3D dense captioning method, Vote2Cap-DETR++, is accepted to T-PAMI 2024.
  • Feb. 2024.  🎉🎉 Our Large Language 3D Assistant, LL3DA , is accepted to CVPR 2024.
  • Jan. 2024.  🐧🐧 Join Tencent as a research intern, working on 3D generation.
  • Oct. 2023.  🥇🥇 Win the Scan2Cap Challenge at ICCV 2023.
  • Feb. 2023.  🎉🎉 Our Vote2Cap-DETR paper is accepted to CVPR 2023.

📝 Selected Publications

I began my research by exploring how to use language for better 3D scene understanding (Vote2Cap-DETR and Vote2Cap-DETR++). Then, as large language models exhibited tremendous generalist potential, I explored whether LLMs can understand 3D (LL3DA and M3DBench). After that, I spent a wonderful half year exploring whether LLMs can speak 3D (MeshXL). Currently, I am working on both embodied AI and AIGC.

NeurIPS 2024

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
NeurIPS 2024 |
Sijin Chen, Xin Chen$^{\dagger}$, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen$^{\ddagger}$

project | arXiv | github

  • MeshXL turns a 3D mesh into one unique coordinate sequence, facilitating an end-to-end training pipeline for large-scale 3D mesh data.
CVPR 2024

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
CVPR 2024 |
Sijin Chen, Xin Chen$^{\dagger}$, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen$^{\ddagger}$

paper | project | arXiv | github | youtube

  • We propose a Large Language 3D Assistant that responds to both visual interactions and textual instructions in complex 3D environments.
  • 🎉 Please also see our M3DBench, a dataset querying 3D LLMs with multi-modal prompts, which is accepted to ECCV 2024.
T-PAMI 2024

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
T-PAMI 2024 |
Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen$^{\dagger}$

paper | arXiv | github

  • We decouple feature extraction and task decoding for 3D Dense Captioning.
CVPR 2023

End-to-End 3D Dense Captioning with Vote2Cap-DETR
CVPR 2023 |
Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Gang Yu, Tao Chen$^{\dagger}$

paper | arXiv | github | youtube

  • We address 3D Dense Captioning as a set prediction problem with parallel decoding.
  • The first non-“detect-then-describe” framework for 3D Dense Captioning.
  • 🥇 Winner of the Scan2Cap Challenge in the 3rd Language for 3D Scene Workshop at ICCV 2023. [talk]

🥇 Awards and Scholarships

  • Apr. 2024. Award for Outstanding Graduate Student (rank 1/24).
  • Oct. 2023. 1st place of the Scan2Cap Challenge in the 3rd Language for 3D Scene Workshop at ICCV 2023.
  • Sep. 2023. National Scholarship (rank 1/46).
  • Sep. 2022. 2nd prize of the Scholarship for Outstanding Students of Master’s Degrees.
  • Sep. 2021. Award for the Scholarship for Outstanding Students of Master’s Degrees.
  • Jun. 2021. 2nd prize of the Scholarship for Outstanding Students.

📖 Education

  • Sep. 2021 - Jun. 2024. Master’s student at Fudan University.
  • Sep. 2017 - Jun. 2021. Bachelor’s student at Fudan University.

💬 Oral Presentations

  • Jul. 2024. “MeshXL: Neural Coordinate Field for Generative 3D Foundation Models”. MeshXL paves the way for scaling up training on large-scale 3D mesh data. Our mesh representation turns a 3D mesh into one unique coordinate sequence, which enables us to simplify our architecture design into a decoder-only transformer model, facilitating an end-to-end training pipeline for large-scale 3D mesh data. A technical report at miHoYo.

  • Oct. 2023. “Vote2Cap-DETR: A Set-to-Set Perspective Towards 3D Dense Captioning”. By treating 3D Dense Captioning as a translation task from a set of object queries into a set of “box-caption” pairs, we present a set-to-set perspective towards 3D Dense Captioning. A winner presentation for the Scan2Cap challenge at ICCV 2023. [talk | slides]

  • Jun. 2023. “End-to-End 3D Dense Captioning with Vote2Cap-DETR”. We present an end-to-end transformer model for localizing and describing objects in parallel within diverse 3D environments. A paper presentation at VALSE 2023, Wuxi, China.