Zhen Zhu
Envision the whole of you.
CS Ph.D. candidate at the University of Illinois at Urbana-Champaign (UIUC).

Welcome! I’m currently a final-year Ph.D. candidate at the University of Illinois at Urbana-Champaign (UIUC), working with Professor Derek Hoiem. I received my Master’s degree from HUST in June 2020, supervised by Professor Xiang Bai.
My research goal is to close the loop between perception and creation by building on-device multimodal models that learn a concept from a few interactions and quickly reuse that knowledge to recognize, reason about, and reshape the visual world.
Research Focus
- Continual & Dynamic Learning — algorithms that update models in real time without forgetting.
- Multimodal Models — factual and grounded large multimodal models that integrate images, text, video, etc.
- Controllable Synthesis — autoregressive/diffusion-based models for fast, precise, and user-directed editing.
- Visual Recognition — models for fundamental visual understanding, such as object detection and segmentation.
I am currently on the job market for tenure-track faculty, postdoctoral, and research scientist positions beginning around early 2026. Feel free to reach out if our interests align.
More about me: CV · Google Scholar
2018
- Abstract: Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the...
- Abstract: The real world exhibits an abundance of non-stationary textures. Examples include textures with large-scale structures, as well as spatially variant and inhomogeneous textures. While existing example-based texture synthesis methods can cope well with stationary textures, non-stationary textures stil...
- Abstract: Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. Normally, a multi-oriented text detector often involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box r...
2019
- Abstract: The non-local module works as a particularly useful technique for semantic segmentation while criticized for its prohibitive computation and GPU memory occupation. In this paper, we present Asymmetric Non-local Neural Network to semantic segmentation, which has two prominent components: Asymmetric P...
- Abstract: This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose. The generator of the network comprises a sequence of Pose-Attentional Transfer Blocks that each transfers certain regions it attends to, generating the person i...
2020
- Abstract: In this paper, we focus on designing effective method for fast and accurate scene parsing. A common practice to improve the performance is to attain high resolution feature maps with strong semantic representation. Two strategies are widely used – atrous convolutions and feature pyramid fusion, are ...
- Abstract: In this paper, we focus on semantically multi-modal image synthesis (SMIS) task, namely, generating multi-modal images at the semantic level. Previous work seeks to use multiple class-specific generators, constraining its usage in datasets with a small number of classes. We instead propose a novel G...
2021
- Abstract: This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose. We design a progressive generator which comprises a sequence of transfer blocks. Each block performs an intermediate transfer step by modeling the relationship ...
2022
- Abstract: In this paper, we aim to devise a universally versatile style transfer method capable of performing artistic, photo-realistic, and video style transfer jointly, without seeing videos during training. Previous single-frame methods assume a strong constraint on the whole image to maintain temporal con...
2024
- Abstract: We propose an approach for anytime continual learning (AnytimeCL) for open vocabulary image classification. The AnytimeCL problem aims to break away from batch training and rigid models by requiring that a system can predict any set of labels at any time and efficiently update and improve when recei...
- Abstract: We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. Specifically, we propose to combine predictions from a CLIP zero-shot model and the exemplar-based mod...
2025
- Abstract: Large multimodal models (LMMs) are effective for many vision and language problems but may underperform in specialized domains such as object counting and clock reading. Fine-tuning improves target task performance but sacrifices generality, while retraining with an expanded dataset is expensive. We...
- 2025, Under Review. Abstract: Image-text models excel at image-level tasks but struggle with detailed visual understanding. While these models provide strong visual-language alignment, segmentation models like SAM2 offer precise spatial boundaries for objects. To this end, we propose TextRegion, a simple, effective and training-...
Collaborations
I've had the privilege of mentoring several talented students throughout my Ph.D. journey:
- Zhiliang Xu — Image Generation and Face Synthesis
- Yang Liu — Watermark Removal and Image Processing
- Zijie Wu — Style Transfer and Generative Models
- Yiming Gong — Machine Learning and Image Editing
- Joshua Cho — Computational Photography and Image Enhancement
- Xudong Xie — Texture Synthesis
My current close collaborators:
- Yao Xiao — Video Understanding and Multimodal Learning
- Zhipeng Bao — Multimodal Generation
Service
Co-organizer: UIUC External Speaker Series — Interested speakers are welcome to reach out to register for upcoming sessions
Co-organizer: UIUC Vision Mini-Conference
Conference Reviewer: CVPR, ICCV, ECCV, ICLR, NeurIPS, ICML, AAAI, IJCAI, BMVC, WACV, and others
Journal Reviewer: TPAMI, IJCV, TIP, PR, and others