Gengze Zhou


Higher Degree by Research Candidate

PhD Candidate

School of Computer and Mathematical Sciences

Faculty of Sciences, Engineering and Technology


My research is dedicated to creating explainable and embodied AI systems that can interact dynamically with both humans and their environments. I aim to build an autonomous agent that can understand, reason, and navigate the physical world, while seamlessly communicating with humans in natural language. By integrating machine learning with visual and linguistic applications, I strive to enhance the transparency and interpretability of AI decision-making, fostering more natural and effective human-AI interactions.

Some topics that I currently focus on:

  • Self-Explainable and Communicative Vision-and-Language Navigation (VLN) with Language Models: NavGPT, NavGPT-2
  • Sim2Real Transfer for VLN with Large Vision-Language Models: NaVid
  • Journals

    Year Citation
    2023 Zhou, G., Hong, Y., & Wu, Q. (2023). NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models.
  • Book Chapters

    Year Citation
    2025 Zhou, G., Hong, Y., Wang, Z., Wang, X. E., & Wu, Q. (2025). NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models. In Lecture Notes in Computer Science (Vol. 15065 LNCS, pp. 260-278). Springer Nature Switzerland.
  • Conference Papers

    Year Citation
    2024 Chen, Q., Pitawela, D., Zhao, C., Zhou, G., Chen, H. T., & Wu, Q. (2024). WebVLN: Vision-and-Language Navigation on Websites. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 38 (pp. 1165-1173). Online: Association for the Advancement of Artificial Intelligence (AAAI).
    2024 Zhou, G., Hong, Y., & Wu, Q. (2024). NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 38 (pp. 7641-7649). Online: Association for the Advancement of Artificial Intelligence (AAAI).
  • Preprint

    Year Citation
    2024 Zhou, G., Hong, Y., Wang, Z., Wang, X. E., & Wu, Q. (2024). NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models.
    2023 Chen, Q., Pitawela, D., Zhao, C., Zhou, G., Chen, H.-T., & Wu, Q. (2023). WebVLN: Vision-and-Language Navigation on Websites.
  • Position: PhD Candidate
  • Email: gengze.zhou@adelaide.edu.au
  • Campus: Lot 14
  • Building: Australian Institute for Machine Learning Building, Second Floor
  • Room: 2.04.04
  • Org Unit: Australian Institute for Machine Learning - Projects
