Journal of Systems Engineering and Electronics, 2024, Vol. 35, Issue (6): 1337-1356. doi: 10.23919/JSEE.2022.000155

  • Received: 2022-01-10 Online: 2024-12-18 Published: 2025-01-14

A survey of fine-grained visual categorization based on deep learning

Yuxiang XIE1,*, Quanzhi GONG1, Xidao LUAN2, Jie YAN1, Jiahui ZHANG1

  1 College of System Engineering, National University of Defense Technology, Changsha 410000, China
  2 College of Computer Engineering and Applied Mathematics, Changsha University, Changsha 410003, China
  • Contact: Yuxiang XIE E-mail: yxxie@nudt.edu.cn; Charles_g27@qq.com; xidaoluan@ccsu.cn; yjierrr@163.com; 100634004@qq.com
  • About the authors:
    XIE Yuxiang was born in 1976. She received her B.S., M.S., and Ph.D. degrees from the National University of Defense Technology in 1998, 2001, and 2004, respectively. She is a professor in the School of Information System and Management, National University of Defense Technology. Her research interests include computer vision and image and video analysis, classification, and retrieval. E-mail: yxxie@nudt.edu.cn

    GONG Quanzhi was born in 1998. He received his B.S. degree in 2020. He is pursuing his M.S. degree at the National University of Defense Technology. His research interests include fine-grained image classification and action recognition. E-mail: Charles_g27@qq.com

    LUAN Xidao was born in 1976. He received his B.S. degree in applied mathematics in 1998 and his M.S. and Ph.D. degrees in systems engineering in 2005 and 2009, respectively, all from the National University of Defense Technology. He is now a professor in the School of Computer Engineering and Applied Mathematics, Changsha University. His research interests include computer vision and image and video analysis, classification, and retrieval. E-mail: xidaoluan@ccsu.cn

    YAN Jie was born in 1999. She received her B.S. degree in 2020. She is pursuing her M.S. degree at the National University of Defense Technology. Her research interests include computer vision and image captioning. E-mail: yjierrr@163.com

    ZHANG Jiahui was born in 1996. He received his B.S. degree in 2019. He is pursuing his M.S. degree at the National University of Defense Technology. His research interests include computer vision and deep learning. E-mail: 100634004@qq.com
    First author contact:

    GONG Quanzhi

  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61571453; 61806218).

Abstract:

Deep learning has achieved excellent results in many computer vision tasks, notably fine-grained visual categorization, which aims to distinguish the subordinate categories within a basic-level category. Because of high intra-class variance and high inter-class similarity, fine-grained visual categorization is extremely challenging. This paper first briefly introduces and analyzes the related public datasets and then reviews some of the latest methods. Based on the feature types, the feature processing methods, and the overall model structure, we divide these methods into three types: methods based on a general convolutional neural network (CNN) and strong part-level supervision, methods based on single feature processing, and methods based on multiple feature processing. Most methods of the first type have a relatively simple structure, reflecting the early stage of research in this area. The other two types include models with specialized structures and training processes that help extract discriminative features. We analyze in detail several methods that achieve high accuracy on public datasets. In addition, we argue that future research should focus on reducing the demand of existing methods for large amounts of data and computing power. Technically, extracting subtle feature information with the emerging vision transformer (ViT) is also an important research direction.

Key words: deep learning, fine-grained visual categorization, convolutional neural network (CNN), visual attention
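To make the phrase "high intra-class variance and high inter-class similarity" concrete, here is a toy sketch in Python; it is an illustration only and is not taken from the survey. The helper `separability`, the class names, and the two-dimensional feature vectors are all hypothetical, whereas real fine-grained features would come from a CNN or ViT backbone and have far more dimensions. The sketch measures the mean distance of each sample to its own class centroid (intra-class spread) and the smallest distance between any two class centroids (inter-class separation).

```python
import math
from statistics import mean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(components) for components in zip(*vectors)]


def separability(features_by_class):
    """Return (mean intra-class spread, minimum inter-class centroid distance).

    features_by_class: dict mapping a (hypothetical) class label to a list of
    feature vectors. Intra-class spread is the mean Euclidean distance of each
    sample to its own class centroid; inter-class separation is the smallest
    Euclidean distance between two class centroids.
    """
    centroids = {label: centroid(vs) for label, vs in features_by_class.items()}
    intra = mean(
        math.dist(v, centroids[label])
        for label, vs in features_by_class.items()
        for v in vs
    )
    labels = list(centroids)
    inter = min(
        math.dist(centroids[a], centroids[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    )
    return intra, inter


# Hypothetical coarse-grained task: classes are far apart relative to their spread.
coarse = {"bird": [[0.0, 0.0], [1.0, 0.0]], "car": [[10.0, 0.0], [11.0, 0.0]]}
# Hypothetical fine-grained task: two similar species whose samples vary widely.
fine = {"species_a": [[0.0, 0.0], [4.0, 0.0]], "species_b": [[1.0, 1.0], [5.0, 1.0]]}

for name, data in [("coarse", coarse), ("fine", fine)]:
    intra, inter = separability(data)
    print(f"{name}: intra={intra:.3f} inter={inter:.3f} ratio={intra / inter:.3f}")
# coarse: intra=0.500 inter=10.000 ratio=0.050
# fine: intra=2.000 inter=1.414 ratio=1.414
```

In the coarse case the within-class spread is small compared with the gap between classes; in the fine case it exceeds that gap, which is the situation that makes fine-grained categorization hard.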