Table of Contents

    28 February 2023, Volume 29 Issue 1
    Multimedia Information Processing
    Robust metric transfer using joint adversarial training
    YANG Qiancheng, LUO Yong, HU Han, ZHOU Xin, DU Bo, TAO Dacheng
    2023, 29(1):  1-9.  doi:10.12066/j.issn.1007-2861.2460

    Transfer metric learning (TML) aims to improve the metric learning in target domains by transferring knowledge from those related tasks where the distance metrics are strong and reliable. Existing TML approaches focus on only transferring the source metric knowledge, which is often prone to overfitting to the source domain. In this study, we train a source metric that is appropriate for transfer and then design a general deep TML method for effective metric transfer. In particular, we propose learning the source metric parameterized by a deep neural network through joint adversarial training and then transferring the metric to the target domain by embedding imitation, which allows the inputs of source and target domains to be heterogeneous. Besides, we restrict the size of the target metric network to be small so that the inference is efficient in the target domain. Finally, the results of applying the proposed method to a popular face verification application demonstrate its effectiveness.

    Multi-granularity fusion-based image inpainting network resistant to deep forensics
    DOU Liyun, FENG Guorui, QIAN Zhenxing, ZHANG Xinpeng
    2023, 29(1):  10-23.  doi:10.12066/j.issn.1007-2861.2456

    Considering that it is relatively easy to use technological tools to attack and tamper with digital media, image forensics technology has been studied extensively in the field of image security. In addition to developing more realistic image forgery operations, research on image forgery anti-forensics technology also promotes the development of forensics technology from the opposite direction. Image inpainting has long been a research hotspot. This paper proposes a multi-granularity fusion-based image inpainting network resistant to deep forensics (MGFR). The MGFR network comprises three parts: a codec, a multi-granularity generation module, and a multi-granularity attention module. First, the encoder encodes the damaged input image into depth features, from which the multi-granularity generation module generates three intermediate features. Subsequently, the multi-granularity attention module fuses the intermediate features of different granularities. As a final step, the fused features are passed through the decoder to produce the output. Additionally, the proposed MGFR is jointly supervised by reconstruction loss, pattern noise loss, deep forensic loss, and adversarial loss. Experimental results reveal that the proposed MGFR evades detection by deep forensics networks while maintaining decent inpainting performance.

    Research Articles
    Convolutional speech emotion recognition network based on incremental method
    ZHU Yonghua, FENG Tianyu, ZHANG Meixian, ZHANG Wenjun
    2023, 29(1):  24-40.  doi:10.12066/j.issn.1007-2861.2332

    A new speech emotion recognition structure was proposed, which extracted Mel-scale frequency cepstral coefficients (MFCCs), linear prediction cepstral coefficients (LPCCs), chromagrams, Mel-scale spectrograms, Tonnetz representations and spectral contrast features from sound files. These features were used as the input of a one-dimensional convolutional neural network (CNN). The network consisted of a one-dimensional convolutional layer, a Dropout layer, a batch normalization layer, a weighted pooling layer, a fully connected layer and an activation layer. Samples from the RAVDESS (Ryerson audio-visual database of emotional speech and song), EMO-DB (Berlin emotional database) and IEMOCAP (interactive emotional dyadic motion capture) data sets were used to identify emotions. To improve the classification accuracy, an incremental method was used to modify the initial model. To enable the network to deal automatically with the uneven distribution of emotional information within an utterance, a weighted pooling method based on the attention mechanism was used to generate more effective utterance-level representations. Experimental results showed that this model performed better than existing methods on the RAVDESS and IEMOCAP data sets. On EMO-DB, it had advantages in versatility, simplicity and applicability.

    Foreground object perception and location algorithm based on semantic feature propagation model in MR
    FANG Zhe, ZHANG Jinyi, JIANG Yuxi
    2023, 29(1):  41-55.  doi:10.12066/j.issn.1007-2861.2413

    Accurate location information obtained by mobile agents is the key to building a stable mixed reality (MR) system. However, foreground objects in an MR scene significantly degrade the accuracy of traditional location algorithms. Location algorithms based on deep learning currently achieve better accuracy by identifying foreground objects, but the deep learning models are too time-consuming, which degrades the real-time performance of the algorithms. To solve this problem, this paper proposes a foreground object-aware location algorithm based on an MR semantic feature propagation model. The algorithm builds a semantic feature propagation model based on a semantic segmentation network and the oriented FAST and rotated BRIEF (ORB) feature extraction algorithm to realize high-speed semantic feature extraction. The model is fused with a geometric feature detection method to form the foreground object perception layer of the algorithm, which eliminates the feature points on foreground objects in MR and constructs a background feature point set for high-precision, real-time location. Experimental results show that the proposed algorithm reduces the relative pose error by 60.5% and improves the real-time location performance by 39.5% compared to the dynamic scenes simultaneous localization and mapping location algorithm in the high-dynamic foreground object scene of the Technical University of Munich public dataset. Therefore, this algorithm has high application value in MR scenes.

    HDR video reconstruction based on attention and feedback mechanism
    YANG Yingjie, WANG Yongfang, ZHANG Han
    2023, 29(1):  56-67.  doi:10.12066/j.issn.1007-2861.2307

    In this study, we developed a high dynamic range (HDR) video reconstruction method based on attention and feedback mechanisms. First, three consecutive frames with cyclic exposure were captured as the input of the network. An attention module generated an attention map with which the extracted features were weighted adaptively, optimising the feature extraction of the network and reducing ghosting artifacts. Subsequently, a feedback mechanism was introduced into the network to make further use of the feature information and to optimise the network performance in feature fusion and reconstruction. Finally, in addition to the L1 loss function, the proposed network used colour similarity and VGG loss functions to improve the colour fidelity and the detail of the reconstructed HDR images. The experimental results show that the proposed method achieves better subjective and objective reconstruction quality than existing mainstream algorithms.

    Design of real-time basketball referee gesture recognition system based on loss weighting
    LI Zhongyu, SUN Haodong, LI Jiao
    2023, 29(1):  68-81.  doi:10.12066/j.issn.1007-2861.2422

    To help the audience better understand the meaning of the referee's gestures when watching a live broadcast or a video of a basketball game, and to help video analysts analyze game video, a real-time basketball referee gesture detection and recognition system, Yolov5-Basketball Referee (Yolov5-BR), was designed. The Yolov5 target detection algorithm was used as the basic model, and the intersection over union (IoU) and complete IoU (CIoU) loss functions of its bounding box were weighted to enhance the robustness of the prediction box. After the C3 module, an attention mechanism was added to generate more discriminative feature representations and improve the network recognition performance. In addition, an adaptive feature fusion mechanism was incorporated into the head of the detection layer to make full use of the high-level semantic information of the image. Finally, the target confidence loss function was weighted unequally to improve the robustness of small-target detection. On a self-made referee gesture dataset, Yolov5-BR achieved a 95.4% mAP, with a detection rate of 55.5 frame/s on local video and 25 frame/s on an external camera at a resolution of $1280 \times 960$. Experimental results show that, compared with the original model, Yolov5-BR can effectively improve referee gesture recognition while maintaining high accuracy, stability, and real-time response.

    A collaborative neighbor discovery protocol for unmanned aerial vehicle network based on improved three-way handshake mechanism and SVM
    WANG Tao, CHEN Yinhao, LI Ping, WU Yating, SUN Yanzan, WANG Rui
    2023, 29(1):  82-94.  doi:10.12066/j.issn.1007-2861.2347

    Unmanned aerial vehicles have been widely used in military and civil fields, and unmanned aerial vehicle networking has become a hot research topic. Neighbor discovery is a prerequisite for forming a network. In this study, the 3-handshake non-cooperation neighbor discovery (3-NCND) protocol is first improved. A support vector machine (SVM) algorithm is then integrated, and a 3-handshake and SVM cooperative neighbor discovery (3-SVMCND) protocol is proposed. SVMs are trained as classifiers and then added to the neighbor discovery protocol. With the protocol, a node can intelligently recommend its own neighbor nodes to target nodes, which reduces the number of neighbor node recommendations while improving the speed of neighbor discovery. Three performance indices of neighbor discovery, namely, slot number, neighbor discovery rate, and energy consumption, are compared in a simulation experiment. Results show that 3-SVMCND achieves faster discovery and a higher neighbor discovery rate than 3-NCND.

    A novel high-efficiency triple-band rectenna for energy collection
    LIU Jiuchun, YANG Xuexia
    2023, 29(1):  95-104.  doi:10.12066/j.issn.1007-2861.2356

    A novel triple-band microstrip patch rectenna was proposed for energy collection, and the receiving antenna was a new triple-band patch antenna with high gain. First, a U-shaped slot was etched on both sides of the patch to extend the path of the surface current; this ensured that the resonant frequency shifted to a low frequency and that the patch antenna was miniaturized. Second, by etching an H-slot and a U-slot on the rectangular patch, the current distribution was altered, and two new resonance frequencies were generated. The resonant frequencies of the antenna could be modified by adjusting the lengths of the H-slot and the inverted U-slot. The rectifier consisted of an impedance matching network, a rectifying diode, a pass-through filter, and a load. The two-stage matching network consisted of $\Pi$-type and T-type networks for matching the input impedance of the receiving antenna and the rectifying diode. A pass-through filter comprising a quarter-wavelength microstrip line and a filter capacitor was used to suppress high-order harmonics generated by diode nonlinearity in order to avoid energy loss. The receiving antenna and the rectifier were integrated as the rectenna. The experimental results show that, when the receiving power is approximately 3 dBm, the maximum efficiency of the rectenna is 54.1%, 43.9%, and 39.9% at 2.06, 3.43, and 5.25 GHz, respectively. Therefore, the rectenna can be used for the power supply of low-power electronic devices in the Internet of Things.

    Application of priority deep deterministic strategy algorithm in autonomous driving
    JIN Yanliang, LIU Qianhong, JI Zeyu
    2023, 29(1):  105-117.  doi:10.12066/j.issn.1007-2861.2365

    The deep deterministic policy gradient (DDPG) algorithm is widely used in autonomous driving; however, some problems, such as the high proportion of inefficient policies, low training efficiency, and slow convergence due to uniform sampling, still need to be addressed. In this paper, a priority-based deep deterministic policy gradient (P-DDPG) algorithm is proposed to enhance sampling utilization, improve exploration strategies, and increase the neural network training efficiency by using priority sampling instead of uniform sampling and employing a new reward function as an evaluation criterion. Finally, the performance of P-DDPG is evaluated on The Open Racing Car Simulator (TORCS) platform. The results show that the cumulative reward of P-DDPG improves significantly after 25 rounds compared with that of the DDPG algorithm. Furthermore, DDPG reaches a comparable training effect only after about 100 rounds, approximately four times as many as P-DDPG. The training efficiency and convergence speed are, therefore, enhanced by using P-DDPG instead of DDPG.

    Emotional analysis model of financial text based on the BERT
    ZHU He, LU Xiaofeng, XUE Lei
    2023, 29(1):  118-128.  doi:10.12066/j.issn.1007-2861.2308

    In the financial sector, more and more investors choose to express their opinions on internet platforms. These comment texts reflect investor sentiment and influence investment decisions and market trends. Emotion analysis, an important branch of natural language processing (NLP), provides an effective means of analyzing the emotional types of large volumes of text in the financial sector. However, owing to the professional nature of domain-specific texts and the lack of large labeled data sets, text emotion analysis in the financial field poses great challenges to traditional emotion analysis models. When a general emotion analysis model is applied to a specific field such as finance, its accuracy and recall are poor. To overcome these challenges, a BERT (bidirectional encoder representations from Transformers) pre-training model for the financial field, based on whole word masking and feature enhancement, was proposed for the emotion analysis of financial text from the perspective of the word representation model.

    Personalized learning path recommendation based on improved ant colony algorithm
    XIA Ruiling, LI Guoping, WANG Guozhong, TENG Guowei
    2023, 29(1):  129-139.  doi:10.12066/j.issn.1007-2861.2342

    At present, most work on learning path recommendation focuses on recommending learning resources, and course knowledge graphs are rarely applied. Therefore, a method combining knowledge graph technology, a deep knowledge tracing model, and an ant colony algorithm is proposed, which improves the traditional ant colony algorithm by classifying the ants. First, on the basis of a course knowledge graph, deep knowledge tracing is used to classify learners into different levels, and knowledge difficulty weights are incorporated. In the corresponding ant colony path planning, ants are classified according to the learner categories. The shortest path through the target knowledge points is then sought for each learning level to produce personalized and efficient learning path recommendations. Finally, the validity of the proposed method is verified on the open ASSISTments dataset.

    Bayesian spatial interpolation method for compression modulus fusion of CPT data
    DONG Jihan, WANG Changhong
    2023, 29(1):  140-154.  doi:10.12066/j.issn.1007-2861.2272

    Large-scale modern exhibition venues are sensitive to uneven foundation settlement, so the spatial distribution of the compression modulus of the bearing layer is essential in controlling foundation deformation. Conventional engineering survey boreholes provide only a small number of precise geotechnical test values of the compression modulus, whereas in-situ testing can provide numerous random cone penetration test (CPT) values. To integrate the data of laboratory and in-situ tests, a Bayesian spatial interpolation method for the compression modulus is proposed in this study. First, based on the data accuracy of the geotechnical engineering investigation, test data were divided into hard and soft data. A spatial random function was then used to describe the spatial variability of the compression modulus. Next, maximum entropy theory was applied to analyze the uncertainty of the soft data. Based on Bayesian theory, a random field interpolation method was then established to estimate the posterior distribution of the compression modulus at unknown points. Finally, to verify its effectiveness, the proposed method was applied to the spatial variability analysis of the compression modulus of silty clay in the shallow bearing layer ②$_1$ of the Shanghai National Convention and Exhibition Center. Compared with the ordinary Kriging interpolation method, the proposed Bayesian method can integrate multi-source survey data for spatial interpolation with greater accuracy.

    Natural frequency drift induced by internal flow and cross flow in a straight tube heat exchanger
    ZHANG Xin, LI Xiaowei, MAO Fangsai, LI Chunxin
    2023, 29(1):  155-165.  doi:10.12066/j.issn.1007-2861.2322

    Using a two-dimensional flow model around a circular cylinder, the lift and drag functions of a straight tube heat exchanger were developed to estimate the external flow action with a wide range of Reynolds numbers. Subsequently, an improved model coupling the internal flow with the external flow was proposed, in which the lift and drag functions were treated as excitations and the added mass resulting from the unsteady internal and external flows was also considered. The results showed that the added mass due to the cross flow led to a drift in the natural frequency of the tube and that the Reynolds number also affected the frequency of the external excitation. Given that the internal flow Reynolds number also affects the natural frequency of the tube, resonance can occur under the combined action of specific internal flow and cross flow. The current model can predict the range of internal and external Reynolds numbers and therefore provide a theoretical basis for reasonably setting the safe operation conditions of internal and external flows.

    Microbial diversity in lichen and moss habitats on Leshan Giant Buddha in Sichuan, China
    CHEN Xueping, BAI Fayan, YU Juan, LU Yongsheng, SONG Shaolei, DONG Haiyan, PENG Xueyi, HUANG Jizhong
    2023, 29(1):  166-174.  doi:10.12066/j.issn.1007-2861.2295

    Microbial diversity in lichen (LI) and moss habitats on Leshan Giant Buddha was analysed using high-throughput sequencing to determine the major microorganisms responsible for the biodeterioration of this monument. Bacterial and fungal communities were clustered into the LI, live moss (LM), and dead moss residue (DM) groups, and the communities of the three groups could be distinguished from one another. The bacterial communities in the LI and LM groups were similar, with Cyanobacteria and the family Acetobacteraceae predominating. Capnodiales was the dominant fungal order in the LI group, but it was less abundant in the LM group. Ascomycota was the dominant fungal phylum in the LM group, and its relative abundance ranged from 7.47% to 52.6% in the LI and DM groups. The archaeal community differed markedly among the LI, LM, and DM groups. The soil Crenarchaeotic group (SCG), which is related to nitrogen transformation, was the most abundant. In addition, large numbers of unclassified-k-norank species were observed across all groups.

    Faddeev model in Minkowski space $R^{1+(1+n)}$
    LIU Sijie, LIU Jianli, SHENG Wancheng
    2023, 29(1):  175-184.  doi:10.12066/j.issn.1007-2861.2298

    The Faddeev model is used to model heavy elementary particles by topological knotted solitons in classical field theory. It is a generalization of the well-known classical nonlinear sigma model of Gell-Mann and Lévy and is closely related to the celebrated Skyrme model. In this paper, we derive the equation of the Faddeev model in the Minkowski space $R^{1+(1+n)}$, show that the system enjoys many interesting properties, and provide some exact solutions in special cases.