Research Interests
Computer Vision, Knowledge Representation and Machine Learning, Artificial Intelligence
Mr Yichao Cai
Higher Degree by Research Candidate
School of Computer Science and Information Technology
College of Engineering and Information Technology
Mr. Yichao Cai is a third-year Ph.D. student in Computer Science at the Australian Institute for Machine Learning (AIML), University of Adelaide, advised by Prof. Javen Qinfeng Shi. Previously, he received his M.Sc. and B.Eng. in Instrument Science from Wuhan University of Technology. His research studies multimodal learning and identifiable, causal representation learning, with a focus on how language supervision shapes alignment and semantic factorization in vision–language models.
My research studies what representations learn from supervision, particularly language supervision, and when such learning leads to identifiable latent structure.
I work on understanding when modern learning objectives recover latent structure beyond predictive performance, using tools from identifiability theory, latent-variable modeling, and representation geometry. In particular, I study the equivalence classes of representations induced by learning objectives, and how cross-modal supervision shapes the geometry of vision-language models. I am also interested in how these learned representations relate to human-interpretable concepts.
| Language | Competency |
|---|---|
| Chinese (Mandarin) | Can read, write, speak, understand spoken language, and peer review |
| English | Can read, write, speak, understand spoken language, and peer review |
| Date | Institution name | Country | Title |
|---|---|---|---|
| 2016 - 2019 | Wuhan University of Technology | China | M.S. |
| 2012 - 2016 | Wuhan University of Technology | China | B.Eng. |
| Year | Citation |
|---|---|
| 2018 | Cai, Y., Li, D., Zhou, X., & Mou, X. (2018). Robust Drivable Road Region Detection for Fixed-Route Autonomous Vehicles Using Map-Fusion Images. Sensors, 18(12), 15 pages. |
| Year | Citation |
|---|---|
| 2024 | Cai, Y., Liu, Y., Zhang, Z., & Shi, J. Q. (2024). CLAP: Isolating Content from Style Through Contrastive Learning with Augmented Prompts. In Lecture Notes in Computer Science Vol. 15079 (pp. 130-147). Milan, Italy: Springer Nature Switzerland. |
| Year | Citation |
|---|---|
| 2025 | Cai, Y., Liu, Y., Gao, E., Jiang, T., Zhang, Z., Hengel, A. V. D., & Shi, J. Q. (2025). On the Value of Cross-Modal Misalignment in Multimodal Representation Learning. |
- Teaching Assistant, Neural Networks and Deep Learning (ARTI X300) @ Adelaide University
- Head Tutor & Guest Lecturer, Statistical Machine Learning (Semester 2, 2025) @ Adelaide University
- Teaching Assistant, Using Machine Learning Tools (Trimester 2, 2025) @ Adelaide University
- Teaching Assistant, Concepts in AI and ML (Trimester 1, 2025) @ Adelaide University