Gengze Zhou

Higher Degree by Research Candidate

PhD Candidate


My research is dedicated to creating explainable and embodied AI systems that can interact dynamically with both humans and their environments. I aim to build an autonomous agent that can understand, reason, and navigate the physical world, while seamlessly communicating with humans in natural language. By integrating machine learning with visual and linguistic applications, I strive to enhance the transparency and interpretability of AI decision-making, fostering more natural and effective human-AI interactions.

Some topics that I currently focus on:

  • Self-Explainable and Communicative Vision-and-Language Navigation (VLN) with Language Models: NavGPT, NavGPT-2
  • Sim2Real Transfer for VLN with Large Vision-Language Models: NaVid

Year Citation
2023 Zhou, G., Hong, Y., & Wu, Q. (2023). NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models.

Year Citation
2025 Zhou, G., Hong, Y., Wang, Z., Wang, X. E., & Wu, Q. (2025). NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models. In Lecture Notes in Computer Science Vol. 15065 LNCS (pp. 260-278). Milan, Italy: Springer Nature Switzerland.
Scopus: 17, WoS: 3
2024 Chen, Q., Pitawela, D., Zhao, C., Zhou, G., Chen, H. T., & Wu, Q. (2024). WebVLN: Vision-and-Language Navigation on Websites. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 38 (pp. 1165-1173). Online: Association for the Advancement of Artificial Intelligence (AAAI).
Scopus: 8, WoS: 2
2024 Zhou, G., Hong, Y., & Wu, Q. (2024). NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 38 (pp. 7641-7649). Online: Association for the Advancement of Artificial Intelligence (AAAI).
Scopus: 92, WoS: 43

Year Citation
2024 Zhou, G., Hong, Y., Wang, Z., Wang, X. E., & Wu, Q. (2024). NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models.
2024 Zhou, G., Hong, Y., Wang, Z., Zhao, C., Bansal, M., & Wu, Q. (2024). SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts.
2023 Chen, Q., Pitawela, D., Zhao, C., Zhou, G., Chen, H.-T., & Wu, Q. (2023). WebVLN: Vision-and-Language Navigation on Websites.