|
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang,
Xiaotong Zhai,
Zhongkai Zhao,
Yongshuo Zong,
Xin Wen,
Bingchen Zhao
Preprint, 2023
Project Page /
ArXiv /
Code
We build a counterfactual visual question answering benchmark and show that strong vision-language
models, even GPT-4, cannot handle it very well.
|
|
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Chuofan Ma,
Yi Jiang*,
Xin Wen*,
Zehuan Yuan,
Xiaojuan Qi
Conference on Neural Information Processing Systems (NeurIPS), 2023
Project Page /
ArXiv /
Code
We bridge the gap between vision and language spaces by reformulating region-word alignment as
co-occurring object discovery, where images that mention a shared concept in their
captions are grouped together.
|
|
Parametric Classification for Generalized Category Discovery: A Baseline Study
Xin Wen*,
Bingchen Zhao*,
Xiaojuan Qi
IEEE International Conference on Computer Vision (ICCV), 2023
Project Page /
ArXiv /
Code /
Slides /
Poster
We revisit why previous parametric classifiers fail to recognise new classes for
GCD, identify the prediction biases between and within seen and novel classes as the key issue, and
propose a simple yet strong framework that addresses these limitations and achieves state-of-the-art
performance in this field.
|
|
Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery
Bingchen Zhao,
Xin Wen,
Kai Han
IEEE International Conference on Computer Vision (ICCV), 2023
ArXiv /
Code
We tackle GCD without knowing the number of classes a priori, propose a semi-supervised variant of
GMM with stochastic splitting and merging to dynamically determine prototypes,
and leverage prototypical contrastive learning for representation learning on partially labelled
data.
|
|
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
Xiaoyang Wu,
Xin Wen,
Xihui Liu,
Hengshuang Zhao
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
ArXiv /
Code
We propose the Masked Scene Contrast (MSC) framework for unsupervised 3D representation learning,
which efficiently generates contrastive views directly on scene-level point clouds and enables
large-scale 3D pre-training across multiple datasets.
|
|
Self-Supervised Visual Representation Learning with Semantic Grouping
Xin Wen,
Bingchen Zhao,
Anlin Zheng,
Xiangyu Zhang,
Xiaojuan Qi
Conference on Neural Information Processing Systems (NeurIPS), 2022
Project Page /
ArXiv /
Code /
Slides /
Poster
We show that object discovery can be learned jointly with the representations from scratch on
real-world scene-centric data, which leads to strong transfer learning results in various downstream
tasks.
|
|
Temporal Context Aggregation for Video Retrieval with Contrastive Learning
Jie Shao*,
Xin Wen*,
Bingchen Zhao,
Xiangyang Xue
IEEE Winter Conference on Applications of Computer Vision (WACV), 2021
ArXiv /
Code /
Slides
We present a contrastive learning-based video representation learning framework that incorporates
long-range temporal information between frame-level features using self-attention.
|
|
Distilling Visual Priors from Self-Supervised Learning
Bingchen Zhao,
Xin Wen
European Conference on Computer Vision (ECCV) VIPriors Workshop, 2020
ArXiv /
Code /
Slides
We leverage self-supervised learning and knowledge distillation to improve the generalizability of
CNN models for image classification under the data-deficient setting.
|
Research Experiences
Research Intern | OpenRobotLab, Shanghai AI Laboratory | Shanghai, China
Advisors: Dr. Yilun Chen and Dr. Jiangmiao Pang
Topic: generalization and robustness of vision-language models
|
Aug. 2023 - Present |
Research Intern | Foundation Model Group, MEGVII Research | Remote
Advisors: Anlin Zheng and Dr. Xiangyu Zhang
Topic: unsupervised object-centric representation learning and open-world understanding
|
Apr. 2022 - June 2023 |
Research Assistant | CVMI Lab, The University of Hong Kong | Hong Kong SAR
Advisor: Dr. Xiaojuan Qi
Topic: unsupervised object-centric representation learning and open-world understanding
|
Jan. 2021 - Present |
Research Intern | Visual Computing Group, ByteDance AI Lab | Shanghai, China
Advisors: Dr. Jie Shao and Prof. Xiangyang Xue
Topic: video retrieval, action recognition, and video-language pre-training
|
Jan. 2020 - June 2021 |
Research Assistant | Deep Learning Lab, Tongji University | Shanghai, China
Advisor: Prof. Yin Wang
Topic: person re-identification and self-supervised learning
|
Sept. 2019 - Jan. 2021 |
|
Honors and Recognitions
NeurIPS 2022 Scholar Award |
Oct. 2022 |
Shanghai Outstanding Graduates |
June 2021 |
2nd place in the ECCV 2020 Workshop VIPriors Image Classification Challenge |
July 2020 |
Qidi Scholarship of Tongji University (top 1%) |
June 2020 |
Regional Champion (China) of the Covestro International Data Science Hackathon |
Nov. 2019 |
Silver Medal of the 43rd ACM International Collegiate Programming Contest (ICPC) Asia-East Continent Final |
Dec. 2018 |
|
Academic Services
Reviewer for TPAMI, NeurIPS, ICLR, ICML, CVPR, ICCV, ECCV, WACV, CVinW, and OOD-CV.
|
|
|