A NOVEL DEEP MODEL WITH STRUCTURE OPTIMIZATION FOR SCENE UNDERSTANDING, 392–401.

Hengyi Zheng,∗ Chuan Sun,∗∗ and Hongpeng Yin∗∗∗

References

[1] C. Li, H. Gao, Y. Yang, X. Qu, and W. Yuan, Segmentation method of high-resolution remote sensing image for fast target recognition, International Journal of Robotics and Automation, 34(3), 2019, 216–224.
[2] F. Bonin-Font, A. Ortiz, and G. Oliver, Visual navigation for mobile robots: A survey, Journal of Intelligent and Robotic Systems, 53(3), 2008, 263–296.
[3] W. Yuan, Z. Cao, Y. Zhang, M. Tan, et al., A robot pose estimation approach based on object tracking in monitoring scenes, International Journal of Robotics and Automation, 32(3), 2017, 256–265.
[4] J. Chu, X. P. Liu, C. Jiao, J. Miao, and L. Wang, Multi-view reconstruction of annular outdoor scenes from binocular video using global relaxation iteration, International Journal of Robotics and Automation, 26(3), 2011, 272.
[5] J. D. Lin, X. Y. Wu, Y. Chai, and H. P. Yin, Structure optimization of convolutional neural networks: A survey, Acta Automatica Sinica, 46(1), 2020, 24–37.
[6] W. Lu, H. Sun, J. Chu, X. Huang, and J. Yu, A novel approach for video text detection and recognition based on a corner response feature map and transferred deep convolutional neural network, IEEE Access, 6, 2018, 40198–40211.
[7] S. Wang, L. Lan, X. Zhang, G. Dong, and Z. Luo, Cascade semantic fusion for image captioning, IEEE Access, 7, 2019, 66680–66688.
[8] A. Farhadi, M. Hejrati, M. A. Sadeghi, et al., Every picture tells a story: Generating sentences from images, Proc. ECCV, Berlin, Heidelberg, 2010, 15–29.
[9] M. Hodosh, P. Young, and J. Hockenmaier, Framing image description as a ranking task: Data, models and evaluation metrics, Proc. IJCAI, Buenos Aires, Argentina, 2015, 4188–4192.
[10] Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos, Corpus-guided sentence generation of natural images, Proc. EMNLP, 2011.
[11] R. Kiros, R. Salakhutdinov, and R. S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, arXiv preprint arXiv:1411.2539, 2014.
[12] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, Show and tell: A neural image caption generator, Proc. IEEE CVPR, Boston, MA, 2015, 3156–3164.
[13] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, Long-term recurrent convolutional networks for visual recognition and description, Proc. IEEE CVPR, Boston, MA, 2015, 2625–2634.
[14] Q. Wu, C. Shen, L. Liu, A. Dick, and A. van den Hengel, What value do explicit high level concepts have in vision to language problems? Proc. IEEE CVPR, Las Vegas, NV, 2016, 203–212.
[15] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin, Variational autoencoder for deep learning of images, labels and captions, Proc. NIPS, Barcelona, Spain, 2016, 2352–2360.
[16] J. Cheng, P.-S. Wang, G. Li, Q.-H. Hu, and H.-Q. Lu, Recent advances in efficient computation of deep convolutional neural networks, Frontiers of Information Technology & Electronic Engineering, 19(1), 2018, 64–77.
[17] Y. He, X. Zhang, and J. Sun, Channel pruning for accelerating very deep neural networks, Proc. IEEE ICCV, Venice, Italy, 2017, 1389–1397.
[18] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, Pruning filters for efficient convnets, arXiv preprint arXiv:1608.08710, 2016.
[19] S. Anwar and W. Sung, Coarse pruning of convolutional neural networks with random masks, 2016.
[20] R. Rigamonti, A. Sironi, V. Lepetit, and P. Fua, Learning separable filters, Proc. IEEE CVPR, Portland, OR, 2013, 2754–2761.
[21] M. Jaderberg, A. Vedaldi, and A. Zisserman, Speeding up convolutional neural networks with low rank expansions, arXiv preprint arXiv:1405.3866, 2014.
[22] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, Proc. AAAI, San Francisco, CA, 2017, 4278–4284.
[23] F. Chollet, Xception: Deep learning with depthwise separable convolutions, Proc. IEEE CVPR, Honolulu, HI, 2017, 1251–1258.
[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Proc. NIPS, Lake Tahoe, NV, 2012, 1106–1114.
[25] T. N. Sainath, B. Kingsbury, G. Saon, H. Soltau, A. R. Mohamed, G. Dahl, and B. Ramabhadran, Deep convolutional neural networks for large-scale speech tasks, Neural Networks, 64, 2015, 39–48.
[26] C. Szegedy, W. Liu, Y. Jia, et al., Going deeper with convolutions, Proc. IEEE CVPR, Boston, MA, 2015, 1–9.
[27] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proc. IEEE CVPR, Las Vegas, NV, 2016, 770–778.
[28] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the Inception architecture for computer vision, Proc. IEEE CVPR, Las Vegas, NV, 2016, 2818–2826.
[29] T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781, 2013.
[30] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, Show and tell: A neural image caption generator, Proc. IEEE CVPR, Boston, MA, 2015, 3156–3164.
[31] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, Show, attend and tell: Neural image caption generation with visual attention, Proc. ICML, Lille, France, 2015, 2048–2057.
