Mr Yichao Cai

Higher Degree by Research Candidate

School of Computer Science and Information Technology

College of Engineering and Information Technology


Mr. Yichao Cai is a third-year Ph.D. student in Computer Science at the Australian Institute for Machine Learning (AIML), University of Adelaide, advised by Prof. Javen Qinfeng Shi. He previously received his M.Sc. and B.Eng. in Instrument Science from Wuhan University of Technology. His research studies multimodal learning and identifiable, causal representation learning, with a focus on how language supervision shapes alignment and semantic factorization in vision–language models.

My research studies what representations learn from supervision, particularly language supervision, and when such learning leads to identifiable latent structure.

I study when modern learning objectives recover latent structure, rather than only achieving good predictive performance, using tools from identifiability theory, latent-variable modeling, and representation geometry. In particular, I study the equivalence classes of representations induced by learning objectives and how cross-modal supervision shapes the geometry of vision-language models. I am also interested in how these learned representations relate to human-interpretable concepts.

Language Competency
Chinese (Mandarin) Can read, write, speak, understand spoken and peer review
English Can read, write, speak, understand spoken and peer review

Date Institution name Country Title
2016 - 2019 Wuhan University of Technology China M.S.
2012 - 2016 Wuhan University of Technology China B.Eng.

Year Citation
2018 Cai, Y., Li, D., Zhou, X., & Mou, X. (2018). Robust Drivable Road Region Detection for Fixed-Route Autonomous Vehicles Using Map-Fusion Images. Sensors, 18(12), 15 pages.

2024 Cai, Y., Liu, Y., Zhang, Z., & Shi, J. Q. (2024). CLAP: Isolating Content from Style Through Contrastive Learning with Augmented Prompts. In Lecture Notes in Computer Science Vol. 15079 (pp. 130-147). Milan, Italy: Springer Nature Switzerland.

2025 Cai, Y., Liu, Y., Gao, E., Jiang, T., Zhang, Z., van den Hengel, A., & Shi, J. Q. (2025). On the Value of Cross-Modal Misalignment in Multimodal Representation Learning.

Teaching
  • Teaching Assistant, Neural Networks and Deep Learning (ARTI X300) @ Adelaide University
  • Head Tutor & Guest Lecturer, Statistical Machine Learning (Semester 2, 2025) @ Adelaide University
  • Teaching Assistant, Using Machine Learning Tools (Trimester 2, 2025) @ Adelaide University
  • Teaching Assistant, Concepts in AI and ML (Trimester 1, 2025) @ Adelaide University
