
Yuankai Qi
Grant-Funded Researcher (B)
Australian Institute for Machine Learning - Projects
Faculty of Sciences, Engineering and Technology
Eligible to supervise Masters and PhD - email supervisor to discuss availability.
My research spans machine learning and artificial intelligence at the intersection of computer vision, natural language processing, and speech processing, including vision-and-language navigation for embodied AI (teaching robots/drones to understand and execute human commands), visual voice cloning (also known as movie dubbing), and video captioning (summarizing the events in a video). I also work on pure computer vision and pure natural language processing tasks, such as crowd counting (counting objects/people in an image), video anomaly detection, visual object tracking (tracking a target through a video), and document event prediction. I have published over 40 papers at top-tier venues such as CVPR, ICCV, AAAI, ACM MM, ECCV, IJCAI, IEEE TPAMI, and IEEE TIP. Four of my authored/co-authored papers were accepted for oral presentation (acceptance rate < 5%) at CVPR (twice), ACM MM, and NAACL. My academic service includes serving as an area chair for IJCAI and BMVC and as a regular reviewer for the conferences and journals above. Here is my profile on Google Scholar. I am eligible to act as Principal Supervisor for both Masters and PhD students.
Honors and Awards:
Winner of CAAI Outstanding Doctoral Dissertations, China, 2020 (10 winners across China)
Merit PhD Candidate of Heilongjiang Province, China, 2017
Winner of Supreme National Scholarship for PhD Candidates, 2016
VisDrone 2018: 2nd place in the Vision Meets Drones: Single Object Tracking Challenge! [VisDrone2018 results]
DAVIS 2017: Champion in the DAVIS Challenge on Video Object Segmentation 2017! [DAVIS2017 results]
VOT 2016: Our State-and-Scale Aware Tracker (SSAT) achieved the most accurate tracking results among a total of 70 trackers in VOT 2016! [VOT2016 results paper]
Journals
Year Citation
2023 Jiang, S., Wang, Q., Cheng, F., Qi, Y., & Liu, Q. (2023). A Unified Object Counting Network with Object Occupation Prior. IEEE Transactions on Circuits and Systems for Video Technology, 1.
2023 Ge, C., Song, Y., Ma, C., Qi, Y., & Luo, P. (2023). Rethinking Attentive Object Detection via Neural Attention Learning. IEEE Transactions on Image Processing, 1.
2023 Qiao, Y., Qi, Y., Hong, Y., Yu, Z., Wang, P., & Wu, Q. (2023). HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 8524-8537.
2022 Xu, K., Li, G. R., Hong, D. X., Zhang, W. G., Qi, Y. K., & Huang, Q. M. (2022). A Fast Video Object Segmentation Method Based on Inductive Learning and Transductive Reasoning. Jisuanji Xuebao/Chinese Journal of Computers, 45(10), 2117-2132.
2021 Wang, Y., Qi, Y., Yao, H., Gong, D., & Wu, Q. (2021). Image editing with varying intensities of processing. Computer Vision and Image Understanding, 211, 1-13.
2021 Han, T., Qi, Y., & Zhu, S. (2021). A continuous semantic embedding method for video compact representation. Electronics (Switzerland), 10(24), 3106-1-3106-14.
2021 Jiang, S., Qi, Y., Zhang, H., Bai, Z., Lu, X., & Wang, P. (2021). D3D: Dual 3-D Convolutional Network for Real-Time Action Recognition. IEEE Transactions on Industrial Informatics, 17(7), 4584-4593.
2021 Jiang, S., Qi, Y., Cai, S., & Lu, X. (2021). Light fixed-time control for cluster synchronization of complex networks. Neurocomputing, 424, 63-70.
2020 Zheng, S., Sun, J., Liu, Q., Qi, Y., & Yan, J. (2020). Overwater image dehazing via cycle-consistent generative adversarial network. Electronics (Switzerland), 9(11), 1-19.
2020 Qi, Y., Zhang, S., Jiang, F., Zhou, H., Tao, D., & Li, X. (2020). Siamese Local and Global Networks for Robust Face Tracking. IEEE Transactions on Image Processing, 29, 9152-9164.
2019 Qi, Y., Qin, L., Zhang, S., Huang, Q., & Yao, H. (2019). Robust visual tracking via scale-and-state-awareness. Neurocomputing, 329, 75-85.
2019 Qi, Y., Zhang, S., Qin, L., Huang, Q., Yao, H., Lim, J., & Yang, M. H. (2019). Hedging deep features for visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(5), 1116-1130.
2018 Qi, Y., Qin, L., Zhang, J., Zhang, S., Huang, Q., & Yang, M. H. (2018). Structure-aware local sparse coding for visual tracking. IEEE Transactions on Image Processing, 27(8), 3857-3869.
2018 Zhang, S., Qi, Y., Jiang, F., Lan, X., Yuen, P. C., & Zhou, H. (2018). Point-to-Set Distance Metric Learning on Deep Representations for Visual Tracking. IEEE Transactions on Intelligent Transportation Systems, 19(1), 187-198.
2018 Zhang, L., Zhang, S., Jiang, F., Qi, Y., Zhang, J., Guo, Y., & Zhou, H. (2018). BoMW: Bag of Manifold Words for One-Shot Learning Gesture Recognition from Kinect. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2562-2573.
2018 Zhu, H., Liu, Q., Qi, Y., Huang, X., Jiang, F., & Zhang, S. (2018). Plant identification based on very deep convolutional neural networks. Multimedia Tools and Applications, 77(22), 29779-29797.
2017 Zhang, S., Lan, X., Qi, Y., & Yuen, P. C. (2017). Robust Visual Tracking via Basis Matching. IEEE Transactions on Circuits and Systems for Video Technology, 27(3), 421-430.
n.d. Zhao, C., Qi, Y., & Wu, Q. (n.d.). Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes.
Conference Papers
Year Citation
2022 Chen, Q., Tan, M., Qi, Y., Zhou, J., Li, Y., & Wu, Q. (2022). V2C: Visual Voice Cloning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) Vol. 2022-June (pp. 21210-21219). Online: IEEE.
2022 Ye, H., Li, G., Qi, Y., Wang, S., Huang, Q., & Yang, M. H. (2022). Hierarchical Modular Network for Video Captioning. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol. 2022-June (pp. 17918-17927). Online: IEEE.
2022 Qiao, Y., Qi, Y., Hong, Y., Yu, Z., Wang, P., & Wu, Q. (2022). HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol. 2022-June (pp. 15397-15406). New Orleans, LA, USA: IEEE.
2022 Chen, W., Hong, D., Qi, Y., Han, Z., Wang, S., Qing, L., . . . Li, G. (2022). Multi-Attention Network for Compressed Video Referring Object Segmentation. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (pp. 4416-4425). Online: Association for Computing Machinery, Inc.
2022 Qi, Y., Pan, Z., Hong, Y., Yang, M. H., Van Den Hengel, A., & Wu, Q. (2022). The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021) (pp. 1635-1644). Online: IEEE.
2022 Zhu, W., Qi, Y., Narayana, P., Sone, K., Basu, S., Wang, E. X., . . . Wang, W. Y. (2022). Diagnosing Vision-and-Language Navigation: What Really Matters. In NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 5981-5993). Online: Association for Computational Linguistics (ACL).
2021 Hong, Y., Wu, Q., Qi, Y., Rodriguez Opazo, C., & Gould, S. (2021). VLN↻BERT: A Recurrent Vision-and-Language BERT for Navigation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 1643-1653). Online: IEEE.
2021 An, D., Qi, Y., Huang, Y., Wu, Q., Wang, L., & Tan, T. (2021). Neighbor-view Enhanced Model for Vision and Language Navigation. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 5101-5109). Online: ACM.
2021 Qiao, Y., Chen, Q., Deng, C., Ding, N., Qi, Y., Tan, M., . . . Wu, Q. (2021). R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks. In MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia (pp. 2085-2093). New York, NY, United States: Association for Computing Machinery.
2021 Zheng, S., Sun, J., Liu, Q., Qi, Y., & Zhang, S. (2021). Overwater Image Dehazing via Cycle-Consistent Generative Adversarial Network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12623 LNCS (pp. 251-267). Switzerland: Springer International Publishing.
2020 Yang, Y., Li, G., Qi, Y., & Huang, Q. (2020). Release the power of online-training for robust visual tracking. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence Vol. 34 (pp. 12645-12652). Online: AAAI.
2020 Qi, Y., Wu, Q., Anderson, P., Wang, X., Wang, W. Y., Shen, C., & Van Den Hengel, A. (2020). REVERIE: Remote embodied visual referring expression in real indoor environments. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 9979-9988). Online: IEEE.
2020 Qi, Y., Pan, Z., Zhang, S., van den Hengel, A., & Wu, Q. (2020). Object-and-Action Aware Model for Visual Language Navigation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12355 LNCS (pp. 303-317). Switzerland: Springer International Publishing.
2020 Hong, Y., Rodriguez-Opazo, C., Qi, Y., Wu, Q., & Gould, S. (2020). Language and visual entity relationship graph for agent navigation. In Advances in Neural Information Processing Systems Vol. 2020-December (pp. 1-12). Online: NIPS.
2019 Wen, L., Zhu, P., Du, D., Bian, X., Ling, H., Hu, Q., . . . He, Z. (2019). VisDrone-SOT2018: The vision meets drone single-object tracking challenge results. In L. Leal-Taixe, & S. Roth (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11133 LNCS (pp. 469-495). Munich, Germany: Springer International Publishing AG.
2019 Qi, Y., Zhang, S., Zhang, W., Su, L., Huang, Q., & Yang, M. H. (2019). Learning attribute-specific representations for visual tracking. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 8835-8842). Online: AAAI.
2019 Yi, Y., Ni, F., Ma, Y., Zhu, X., Qi, Y., Qiu, R., . . . Wang, Y. (2019). High performance gesture recognition via effective and efficient temporal modeling. In 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 Vol. 2019-August (pp. 1003-1009). Online: International Joint Conferences on Artificial Intelligence Organization (IJCAI).
2018 Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., . . . Tian, Q. (2018). The unmanned aerial vehicle benchmark: Object detection and tracking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11214 LNCS (pp. 375-391). Switzerland: Springer International Publishing.
2016 Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin, L., . . . Yuen, P. C. (2016). The visual object tracking VOT2016 challenge results. In Computer Vision – ECCV 2016 Workshops, ECCV 2016, Vol. 9914 LNCS (pp. 777-823). Switzerland: Springer International Publishing.
2016 Qi, Y., Zhang, S., Qin, L., Yao, H., Huang, Q., Lim, J., & Yang, M. H. (2016). Hedged Deep Tracking. In Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol. 2016-December (pp. 4303-4311). Online: IEEE.
2015 Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Čehovin, L., Nebehay, G., . . . Niu, Z. H. (2015). The visual object tracking VOT2014 challenge results. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 8926 (pp. 191-217). Springer International Publishing.
2014 Qi, Y., Yao, H., Sun, X., Sun, X., Zhang, Y., & Huang, Q. (2014). Structure-aware multi-object discovery for weakly supervised tracking. In 2014 IEEE International Conference on Image Processing, ICIP 2014 (pp. 466-470). IEEE.
2013 Qi, Y., Dong, K., Yin, L., & Li, M. (2013). 3D segmentation of the lung based on the neighbor information and curvature. In Proceedings - 2013 7th International Conference on Image and Graphics, ICIG 2013 (pp. 139-143). IEEE.
- Lentil segmentation and classification, Trust Provenance, $20,000
- Left-behind object detection, Certis Group, $100,000
- Artificial Intelligence Technologies - Coordinator
- Computer Vision
- Algorithm & Data Structure Analysis
Current Higher Degree by Research Supervision (University of Adelaide)
Date | Role | Research Topic | Program | Degree Type | Student Load | Student Name
2023 | Co-Supervisor | Low-supervision Learning via Knowledge Transfer from Pretrained Models | Doctor of Philosophy | Doctorate | Full Time | Mr Zicheng Duan
2022 | Co-Supervisor | Towards Robust and Efficient Referring Expression Comprehension | Master of Philosophy | Master | Full Time | Mr Chongyang Zhao
2021 | Co-Supervisor | Generative Adversarial Networks (GANs) to Synthesize Images or Videos from Text | Doctor of Philosophy | Doctorate | Full Time | Mr Qi Chen
2020 | Co-Supervisor | General Vision and Language Methods in Real Applications | Doctor of Philosophy | Doctorate | Full Time | Miss Yanyuan Qiao
2020 | Co-Supervisor | Towards Conversational Vision-Based Artificial Intelligence | Doctor of Philosophy | Doctorate | Full Time | Mr Chaorui Deng