A NOVEL DEEP MODEL WITH STRUCTURE OPTIMIZATION FOR SCENE UNDERSTANDING

Hengyi Zheng,∗ Chuan Sun,∗∗ and Hongpeng Yin∗∗∗

Keywords

Scene understanding, robots, image caption, deep model, model optimization

Abstract

Scene understanding is a fundamental problem in robotics: it provides a reference for the robot's decision-making layer. Recently, encoder–decoder image caption models built on deep networks have proven quite capable of scene understanding. However, such deep models are generally large, and their high time and space complexity restricts their deployment on robots. In this paper, a hybrid optimization approach based on network pruning and tensor decomposition is presented for the basic components of the “encoder–decoder” model, aiming to reduce its time and space complexity. First, a data-driven global supervised iterative method is used to decompose the convolutional layers; then the convolution kernels and neurons are ranked according to the proposed importance evaluation criteria; finally, the relatively unimportant convolution kernels or neurons are pruned. Experiments on public datasets, with comparisons against state-of-the-art methods, show that the proposed approach can understand images quickly and accurately. Moreover, the proposed optimization method effectively reduces the time and space complexity of the deep model.
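
The two ingredients named in the abstract, decomposing a convolutional layer and pruning its least important kernels, can be illustrated with a minimal NumPy sketch. The abstract does not specify the decomposition procedure or the importance criteria, so the sketch substitutes stand-ins: a plain truncated SVD replaces the data-driven global supervised iterative method, and the L1 norm of each kernel replaces the proposed importance score. All function names are hypothetical, and the two steps are shown independently on the same random weight tensor rather than chained as in the paper.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Approximate a conv weight (out_ch, in_ch, kh, kw) by two low-rank factors.

    Assumption: truncated SVD is a stand-in for the paper's decomposition.
    """
    out_ch = weight.shape[0]
    flat = weight.reshape(out_ch, -1)          # one row per output kernel
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    left = u[:, :rank] * s[:rank]              # shape (out_ch, rank)
    right = vt[:rank]                          # shape (rank, in_ch * kh * kw)
    return left, right


def rank_kernels_by_l1(weight):
    """Return kernel indices ordered from most to least important.

    Assumption: the L1 norm is a stand-in for the paper's importance criteria.
    """
    scores = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)
    return np.argsort(-scores)


def prune_kernels(weight, keep_ratio):
    """Keep the highest-scoring fraction of kernels and drop the rest."""
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    keep = np.sort(rank_kernels_by_l1(weight)[:n_keep])
    return weight[keep], keep


# Hypothetical layer: 8 kernels, 4 input channels, 3x3 spatial size.
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3, 3))

left, right = low_rank_factorize(weight, rank=4)
pruned, kept = prune_kernels(weight, keep_ratio=0.5)

# Parameter counts before and after each step.
print(weight.size, left.size + right.size, pruned.size)
```

Whether either step actually reduces the parameter count depends on the chosen rank and keep ratio relative to the layer shape. In practice both steps are normally followed by fine-tuning to recover accuracy.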
