Yuankai Qi

Grant-Funded Researcher (B)

Australian Institute for Machine Learning - Projects

Faculty of Sciences, Engineering and Technology

Eligible to supervise Masters and PhD - email supervisor to discuss availability.


My research spans machine learning and artificial intelligence at the intersection of computer vision, natural language processing, and speech processing, including vision-and-language navigation for embodied AI (teaching robots/drones to understand and execute human commands), visual voice cloning (also known as movie dubbing), and video captioning (summarizing the events in a video). I also work on pure computer vision and pure natural language processing tasks, such as crowd counting (counting objects/people in an image), video anomaly detection, visual object tracking (tracking a target in a video), and document event prediction. I have published over 40 papers in top-tier venues, including CVPR, ICCV, ECCV, AAAI, IJCAI, ACM MM, IEEE TPAMI, and IEEE TIP. Four of my authored/co-authored papers were accepted as oral presentations (acceptance rate < 5%) at CVPR (twice), ACM MM, and NAACL. My academic service includes serving as an area chair for IJCAI and BMVC and as a regular reviewer for the conferences and journals above. My profile is available on Google Scholar. I am eligible to act as Principal Supervisor for both Masters and PhD students.

Honors and Awards:

Winner of CAAI Outstanding Doctoral Dissertations, China, 2020 (10 winners across China)

Merit PhD Candidate of Heilongjiang Province, China, 2017

Winner of Supreme National Scholarship for PhD Candidates, 2016

VisDrone 2018: 2nd place in the Vision Meets Drones: Single Object Tracking Challenge! [VisDrone2018 results]

DAVIS 2017: Champion in the DAVIS Challenge on Video Object Segmentation 2017! [DAVIS2017 results]

VOT 2016: Our State-and-Scale Aware Tracker (SSAT) achieved the most accurate tracking results among all 70 trackers in VOT 2016! [VOT2016 results paper]

  • Journals

    Year Citation
    2023 Jiang, S., Wang, Q., Cheng, F., Qi, Y., & Liu, Q. (2023). A Unified Object Counting Network with Object Occupation Prior. IEEE Transactions on Circuits and Systems for Video Technology, 1.
    DOI
    2023 Ge, C., Song, Y., Ma, C., Qi, Y., & Luo, P. (2023). Rethinking Attentive Object Detection via Neural Attention Learning. IEEE Transactions on Image Processing, 1.
    DOI
    2023 Qiao, Y., Qi, Y., Hong, Y., Yu, Z., Wang, P., & Wu, Q. (2023). HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8524-8537.
    DOI Scopus1 WoS1
    2022 Xu, K., Li, G. R., Hong, D. X., Zhang, W. G., Qi, Y. K., & Huang, Q. M. (2022). A Fast Video Object Segmentation Method Based on Inductive Learning and Transductive Reasoning. Jisuanji Xuebao/Chinese Journal of Computers, 45(10), 2117-2132.
    DOI
    2021 Wang, Y., Qi, Y., Yao, H., Gong, D., & Wu, Q. (2021). Image editing with varying intensities of processing. Computer Vision and Image Understanding, 211, 1-13.
    DOI Scopus3 WoS4
    2021 Han, T., Qi, Y., & Zhu, S. (2021). A continuous semantic embedding method for video compact representation. Electronics (Switzerland), 10(24), 3106-1-3106-14.
    DOI
    2021 Jiang, S., Qi, Y., Zhang, H., Bai, Z., Lu, X., & Wang, P. (2021). D3D: Dual 3-D Convolutional Network for Real-Time Action Recognition. IEEE Transactions on Industrial Informatics, 17(7), 4584-4593.
    DOI Scopus17
    2021 Jiang, S., Qi, Y., Cai, S., & Lu, X. (2021). Light fixed-time control for cluster synchronization of complex networks. Neurocomputing, 424, 63-70.
    DOI Scopus14
    2020 Zheng, S., Sun, J., Liu, Q., Qi, Y., & Yan, J. (2020). Overwater image dehazing via cycle-consistent generative adversarial network. Electronics (Switzerland), 9(11), 1-19.
    DOI Scopus3
    2020 Qi, Y., Zhang, S., Jiang, F., Zhou, H., Tao, D., & Li, X. (2020). Siamese Local and Global Networks for Robust Face Tracking. IEEE Transactions on Image Processing, 29, 9152-9164.
    DOI Scopus25
    2019 Qi, Y., Qin, L., Zhang, S., Huang, Q., & Yao, H. (2019). Robust visual tracking via scale-and-state-awareness. Neurocomputing, 329, 75-85.
    DOI Scopus25
    2019 Qi, Y., Zhang, S., Qin, L., Huang, Q., Yao, H., Lim, J., & Yang, M. H. (2019). Hedging deep features for visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(5), 1116-1130.
    DOI Scopus88 Europe PMC3
    2018 Qi, Y., Qin, L., Zhang, J., Zhang, S., Huang, Q., & Yang, M. H. (2018). Structure-aware local sparse coding for visual tracking. IEEE Transactions on Image Processing, 27(8), 3857-3869.
    DOI Scopus47
    2018 Zhang, S., Qi, Y., Jiang, F., Lan, X., Yuen, P. C., & Zhou, H. (2018). Point-to-Set Distance Metric Learning on Deep Representations for Visual Tracking. IEEE Transactions on Intelligent Transportation Systems, 19(1), 187-198.
    DOI Scopus50
    2018 Zhang, L., Zhang, S., Jiang, F., Qi, Y., Zhang, J., Guo, Y., & Zhou, H. (2018). BoMW: Bag of Manifold Words for One-Shot Learning Gesture Recognition from Kinect. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2562-2573.
    DOI Scopus21
    2018 Zhu, H., Liu, Q., Qi, Y., Huang, X., Jiang, F., & Zhang, S. (2018). Plant identification based on very deep convolutional neural networks. Multimedia Tools and Applications, 77(22), 29779-29797.
    DOI Scopus35
    2017 Zhang, S., Lan, X., Qi, Y., & Yuen, P. C. (2017). Robust Visual Tracking via Basis Matching. IEEE Transactions on Circuits and Systems for Video Technology, 27(3), 421-430.
    DOI Scopus75
    - Zhao, C., Qi, Y., & Wu, Q. (n.d.). Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes.
  • Conference Papers

    Year Citation
    2022 Chen, Q., Tan, M., Qi, Y., Zhou, J., Li, Y., & Wu, Q. (2022). V2C: Visual Voice Cloning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, 2022) Vol. 2022-June (pp. 21210-21219). Online: IEEE.
    DOI Scopus1
    2022 Ye, H., Li, G., Qi, Y., Wang, S., Huang, Q., & Yang, M. H. (2022). Hierarchical Modular Network for Video Captioning. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol. 2022-June (pp. 17918-17927). Online: IEEE.
    DOI Scopus14
    2022 Qiao, Y., Qi, Y., Hong, Y., Yu, Z., Wang, P., & Wu, Q. (2022). HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol. 2022-June (pp. 15397-15406). New Orleans, LA, USA: IEEE.
    DOI Scopus7
    2022 Chen, W., Hong, D., Qi, Y., Han, Z., Wang, S., Qing, L., . . . Li, G. (2022). Multi-Attention Network for Compressed Video Referring Object Segmentation. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (pp. 4416-4425). Online: Association for Computing Machinery, Inc.
    DOI Scopus2
    2022 Qi, Y., Pan, Z., Hong, Y., Yang, M. H., Van Den Hengel, A., & Wu, Q. (2022). The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021) (pp. 1635-1644). online: IEEE.
    DOI Scopus19
    2022 Zhu, W., Qi, Y., Narayana, P., Sone, K., Basu, S., Wang, E. X., . . . Wang, W. Y. (2022). Diagnosing Vision-and-Language Navigation: What Really Matters. In NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 5981-5993). Online: Association for Computational Linguistics (ACL).
    Scopus2 WoS1
    2021 Hong, Y., Wu, Q., Qi, Y., Rodriguez Opazo, C., & Gould, S. (2021). VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 1643-1653). online: IEEE.
    DOI Scopus66 WoS25
    2021 An, D., Qi, Y., Huang, Y., Wu, Q., Wang, L., & Tan, T. (2021). Neighbor-view Enhanced Model for Vision and Language Navigation. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 5101-5109). virtual online: ACM.
    DOI Scopus14
    2021 Qiao, Y., Chen, Q., Deng, C., Ding, N., Qi, Y., Tan, M., . . . Wu, Q. (2021). R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 2085-2093). New York, NY, United States: Association for Computing Machinery.
    DOI Scopus4
    2021 Zheng, S., Sun, J., Liu, Q., Qi, Y., & Zhang, S. (2021). Overwater Image Dehazing via Cycle-Consistent Generative Adversarial Network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12623 LNCS (pp. 251-267). Switzerland: Springer International Publishing.
    DOI Scopus1
    2020 Yang, Y., Li, G., Qi, Y., & Huang, Q. (2020). Release the power of online-training for robust visual tracking. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence Vol. 34 (pp. 12645-12652). online: AAAI.
    Scopus11
    2020 Qi, Y., Wu, Q., Anderson, P., Wang, X., Wang, W. Y., Shen, C., & Van Den Hengel, A. (2020). Reverie: Remote embodied visual referring expression in real indoor environments. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 9979-9988). online: IEEE.
    DOI Scopus90
    2020 Qi, Y., Pan, Z., Zhang, S., van den Hengel, A., & Wu, Q. (2020). Object-and-Action Aware Model for Visual Language Navigation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12355 LNCS (pp. 303-317). Switzerland: Springer International Publishing.
    DOI Scopus24
    2020 Hong, Y., Rodriguez-Opazo, C., Qi, Y., Wu, Q., & Gould, S. (2020). Language and visual entity relationship graph for agent navigation. In Advances in Neural Information Processing Systems Vol. 2020-December (pp. 1-12). online: NIPS.
    Scopus32
    2019 Wen, L., Zhu, P., Du, D., Bian, X., Ling, H., Hu, Q., . . . He, Z. (2019). VisDrone-SOT2018: The vision meets drone single-object tracking challenge results. In L. Leal-Taixe, & S. Roth (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11133 LNCS (pp. 469-495). Munich, Germany: Springer International Publishing.
    DOI Scopus19 WoS6
    2019 Qi, Y., Zhang, S., Zhang, W., Su, L., Huang, Q., & Yang, M. H. (2019). Learning attribute-specific representations for visual tracking. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 8835-8842). online: AAAI.
    Scopus48
    2019 Yi, Y., Ni, F., Ma, Y., Zhu, X., Qi, Y., Qiu, R., . . . Wang, Y. (2019). High performance gesture recognition via effective and efficient temporal modeling. In 28th IJCAI International Joint Conference on Artificial Intelligence, IJCAI 19 Vol. 2019-August (pp. 1003-1009). online: International Joint Conferences on Artificial Intelligence Organization, IJCAI.
    DOI Scopus2
    2018 Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., . . . Tian, Q. (2018). The unmanned aerial vehicle benchmark: Object detection and tracking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11214 LNCS (pp. 375-391). Switzerland: Springer International Publishing.
    DOI Scopus111
    2016 Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin, L., . . . Yuen, P. C. (2016). The visual object tracking VOT2016 challenge results. In Computer Vision – ECCV 2016 Workshops. ECCV 2016. Vol. 9914 LNCS (pp. 777-823). Switzerland: Springer International Publishing.
    DOI Scopus920
    2016 Qi, Y., Zhang, S., Qin, L., Yao, H., Huang, Q., Lim, J., & Yang, M. H. (2016). Hedged Deep Tracking. In Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol. 2016-December (pp. 4303-4311). online: IEEE.
    DOI Scopus708
    2015 Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Čehovin, L., Nebehay, G., . . . Niu, Z. H. (2015). The visual object tracking VOT2014 challenge results. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 8926 (pp. 191-217). Springer International Publishing.
    DOI Scopus151
    2014 Qi, Y., Yao, H., Sun, X., Sun, X., Zhang, Y., & Huang, Q. (2014). Structure-aware multi-object discovery for weakly supervised tracking. In 2014 IEEE International Conference on Image Processing, ICIP 2014 (pp. 466-470). IEEE.
    DOI Scopus8
    2013 Qi, Y., Dong, K., Yin, L., & Li, M. (2013). 3D segmentation of the lung based on the neighbor information and curvature. In Proceedings - 2013 7th International Conference on Image and Graphics, ICIG 2013 (pp. 139-143). IEEE.
    DOI Scopus1
Grants:

  • Lentil segmentation and classification, Trust Provenance, $20,000
  • Left-behind object detection, Certis Group, $100,000
Teaching:

  • Artificial Intelligence Technologies - Coordinator
  • Computer Vision
  • Algorithm & Data Structure Analysis
  • Current Higher Degree by Research Supervision (University of Adelaide)

    Date | Role | Research Topic | Program | Degree Type | Student Load | Student Name
    2023 | Co-Supervisor | Low-supervision Learning via Knowledge Transfer from Pretrained Models | Doctor of Philosophy | Doctorate | Full Time | Mr Zicheng Duan
    2022 | Co-Supervisor | Towards Robust and Efficient Referring Expression Comprehension | Master of Philosophy | Master | Full Time | Mr Chongyang Zhao
    2021 | Co-Supervisor | Generative Adversarial Networks (GANs) to Synthesize Images or Videos from Text | Doctor of Philosophy | Doctorate | Full Time | Mr Qi Chen
    2020 | Co-Supervisor | General Vision and Language Methods in Real Applications | Doctor of Philosophy | Doctorate | Full Time | Miss Yanyuan Qiao
    2020 | Co-Supervisor | Towards Conversational Vision-Based Artificial Intelligence | Doctor of Philosophy | Doctorate | Full Time | Mr Chaorui Deng
  • Position: Grant-Funded Researcher (B)
  • Email: yuankai.qi@adelaide.edu.au
  • Campus: North Terrace
  • Building: Australian Institute for Machine Learning, floor G
  • Org Unit: Australian Institute for Machine Learning - Projects
