Table of contents

Volume 1229

May 2019

2019 3rd International Conference on Machine Vision and Information Technology (CMVIT 2019) 22–24 February 2019, Guangzhou, China

Accepted papers received: 11 April 2019
Published online: 29 May 2019

Preface

011001
The following article is Open access

PREFACE

CMVIT 2019 was held in Guangzhou, China, on 22-24 February 2019, organized by the Asia Pacific Institute of Science and Engineering. The conference provided a broad and useful platform both for presenting the latest research and for exchanging research results and ideas across machine vision and information technology. Participants came from almost every part of the world, with backgrounds in academia, industry, and well-known enterprises. The success and prosperity of the conference are reflected in the high quality of the papers received.

The proceedings are a compilation of the accepted papers and represent an interesting outcome of the conference. The book is organized into two chapters: 1. Machine Vision; 2. Information Technology.

We would like to acknowledge all of those who supported CMVIT 2019; the help of each individual and institution was very important to the success of this conference. In particular, we thank the organizing committee for their valuable advice on the organization and their helpful peer review of the papers.

We sincerely hope that CMVIT 2019 served as a forum for excellent discussions that put forward new ideas and promote collaborative research. We are confident that the proceedings will serve as an important source of references and knowledge, leading not only to scientific and engineering progress but also to new products and processes.

011002
The following article is Open access

The lists of conference committee co-chairs, the program committee and the international technical committees are available in this PDF.

011003
The following article is Open access

All papers published in this volume of Journal of Physics: Conference Series have been peer reviewed through processes administered by the proceedings Editors. Reviews were conducted by expert referees to the professional and scientific standards expected of a proceedings journal published by IOP Publishing.

Papers

Machine Vision

012001
The following article is Open access

There are many approaches to super-resolution reconstruction of a single image. These methods rely on the association between low-resolution and high-resolution images and have achieved good results in different applications. This article also pursues the correlation between low-resolution and high-resolution images, but we combine images with linear networks to explore the possibilities of bridging the two domains. To restore image detail, we use a method that combines the image with the network: unlike traditional single-image super-resolution, we apply network delay and packet-loss compensation to fully retain the data packets before and after transmission, and we exploit the internal self-similarity of the low-resolution images. Experiments show that images fed into a networked system are strongly affected by packet loss. They also show that combining images with network packet loss in the networked control systems model is feasible and effective; specific image results are shown in the paper. Our algorithm achieves the expected effect, makes image super-resolution reconstruction more broadly applicable, and provides a convenient basis for future research on image super-resolution.

012002
The following article is Open access

Facial expression is a key to nonverbal communication, as many different research projects have confirmed. A change in the intensity or magnitude of even one specific facial expression can cause different interpretations. With the continuous and fast development of computer vision and pattern recognition, facial expression recognition has received significant attention recently, owing to the wide range of commercial and law-enforcement applications and the feasible technology developed during 30 years of research. In this paper, facial expression recognition is studied by applying several commonly used methods across the whole pipeline. Numerical experiments show the effectiveness of our Gabor-based approach to expression feature extraction, which combines the advantages of several algorithms: Gabor wavelet transform and non-negative matrix factorization are used to extract facial expression features, and a CNN is used to classify them for facial expression recognition.

012003
The following article is Open access

Traditional statistical methods have become insufficient when applied to image analysis. The increasing size and complexity of data volumes demand new statistical approaches and algorithms. Current methods imply losing intrinsic data structures, for example when data come from multiway arrays. In this work we concentrate on two applications: i) a pre-diagnostic smartphone application for the detection of cardiovascular abnormalities through the analysis of heartbeat sounds, using augmented reality to display valuable information to the end user in an immersive experience. Using the latest augmented-reality smartphone applications, a digital stethoscope, heartbeat audio recordings and classification by neural networks, we measure a user's heartbeat and output, in real time, a pre-diagnostic of their current cardiovascular health. ii) A study of comatose patients, based on a Diffusion Tensor Imaging Magnetic Resonance Imaging (MRI) dataset, that predicts long-term outcome for patients who have suffered a traumatic brain injury. MRI images were obtained from 104 comatose patients, 65 with positive outcome and 39 with negative outcome; 39 controls were also used. The fact that each volumetric image leads to a 143x255726x4 tensor input is used to briefly explain how new multiway methods could be useful in image analysis.

012004
The following article is Open access

Visual search over millions of samples in a high-dimensional feature space is computationally expensive and challenging. A natural solution is to reduce the dimensionality of the image representation by mapping each sample to a compact binary code. In this paper, we propose a Ranking Based Semantic Hashing (RBSH) method to tackle this problem. Observing that semantic structures carry complementary information, we take advantage of semantic supervision, with the help of pre-trained word2vec, to train a high-quality hashing function: the semantic mapping between the high-dimensional feature space of the samples and the reduced representation space of binary codes. Specifically, the proposed method learns the mapping based on two criteria: a contrastive ranking loss and an orthogonality constraint. The former preserves the ordering of relative similarity in image pairs, while the latter makes the different bits of the hash code as orthogonal as possible. An extensive experimental study on the VOC2012 and ILSVRC2014 image sets demonstrates that the proposed approach generally outperforms state-of-the-art hashing methods in image search.
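The orthogonality constraint on hash bits is commonly written as a decorrelation penalty on relaxed (real-valued) codes; the exact form used in the paper may differ, so treat this numpy sketch as illustrative only:

```python
import numpy as np

def orthogonality_penalty(H):
    """Penalty pushing different bits of the hash codes towards
    orthogonality: ||H^T H / n - I||_F^2, where H is an (n, bits)
    matrix of relaxed codes in [-1, 1]. A standard decorrelation
    term; the paper's exact formulation is not given in the abstract."""
    n, b = H.shape
    gram = H.T @ H / n
    return float(((gram - np.eye(b)) ** 2).sum())
```

Perfectly decorrelated bits (e.g. the four sign patterns of two bits) give zero penalty, while identical bits are maximally penalized.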

012005
The following article is Open access

Due to the limitations of GPS satellite positioning in small indoor environments and the complexity of the indoor environment, indoor object-positioning technology has an opportunity to show its strengths. Since a map constructed by ROS SLAM describes only the two-dimensional information of the environment, while a three-dimensional point cloud describes only the independent three-dimensional information of an object, this paper combines the indoor 2D map constructed by the Gmapping algorithm with the 3D point-cloud information of the object and proposes a composite coordinate positioning system. The positioning experiments show that the average measurement error of the object's position in the room is only 4.2 cm, and that the positioning accuracy is 6.7% higher than that of common ultrasonic and infrared positioning systems [1], 20% higher than that of a positioning system based on Bluetooth angle measurement [2], and 72% higher than that of an ultra-wideband positioning system [3]. The object-positioning error is small and the positioning is accurate.

012006
The following article is Open access

Large-margin softmax (L-Softmax) loss has been proposed for deep neural networks in face recognition tasks. Unlike the softmax cross-entropy loss, L-Softmax explicitly encourages intra-class compactness and inter-class separability of the learned features. RFB Net proposed the RFB module to simulate receptive fields (RFs) in the human visual system and gain higher accuracy. In this paper, we integrate a Receptive Field Block into the L-Softmax loss for face recognition: an enhanced L-Softmax loss with an RFB module that improves both the discriminability and the robustness of the features. Extensive experiments on recognition benchmarks such as MNIST and LFW show that our proposed approach makes deep-learning features more discriminative and thus significantly improves performance on various visual classification and verification tasks.

012007
The following article is Open access

Medical images are inevitably affected by noise during acquisition. This paper describes an image denoising method based on the bilateral filter and the K-SVD algorithm. First, the method uses a bilateral filter to divide the image into an edge layer and a residual layer. Then the K-SVD algorithm is used to process the residual layer, avoiding damage to the edges of the image. Finally, the denoised image is obtained by adding the residual layer and the edge layer. Experimental results for images with different noise intensities show that the proposed method achieves a higher peak signal-to-noise ratio (PSNR) than the K-SVD algorithm alone. In denoising experiments on computed tomography (CT) and magnetic resonance (MR) images, the proposed method yields clearer soft tissue and bone structure than the K-SVD algorithm.
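The layer-splitting pipeline described in this abstract can be sketched in plain numpy; note that simple soft-thresholding stands in for the paper's K-SVD sparse coding, so this only illustrates the edge-layer/residual-layer structure, not the authors' implementation:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter on a 2-D float image in [0, 1]."""
    h, w = img.shape
    pad = np.pad(img, radius, mode="reflect")
    out = np.zeros_like(img)
    # Spatial Gaussian weights for the (2r+1) x (2r+1) window.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    spatial = np.exp(-(xx**2 + yy**2) / (2 * sigma_s**2))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2*radius + 1, j:j + 2*radius + 1]
            rng = np.exp(-(win - img[i, j])**2 / (2 * sigma_r**2))
            wgt = spatial * rng
            out[i, j] = (wgt * win).sum() / wgt.sum()
    return out

def denoise(img, thresh=0.05):
    """Split into edge + residual layers; denoise only the residual.
    Soft-thresholding is a stand-in for the paper's K-SVD step."""
    edge = bilateral_filter(img)
    residual = img - edge
    residual = np.sign(residual) * np.maximum(np.abs(residual) - thresh, 0.0)
    return edge + residual
```

Because only the residual layer is shrunk, edges preserved by the bilateral filter pass through unchanged.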

012008
The following article is Open access

In this paper, we present a novel feature-processing method for age-invariant face recognition that is robust to the aging process. The main purpose of the feature processing is to remove aging effects while simultaneously keeping personalized properties stable. To achieve this, we learn a space mapping and then encode the mapped feature into an age-invariant representation. In the encoding step, we introduce two kinds of constraints: a temporal constraint (local constraint) and a boundary constraint (global constraint). We apply our feature-processing method to the Cross-Age Celebrity Dataset (CACD). To verify the versatility of the method, we apply it to both the high-dimensional LBP feature and a deep feature. Results show that our feature-processing method works well on CACD and on the face verification subset of CACD (CACD-VS).

012009
The following article is Open access

Video-based person re-identification (re-id) is a very challenging problem because of occlusion and changes in viewpoint, pedestrian posture and illumination. Most existing methods for video-based person re-id simply concatenate the extracted appearance and space-time features, which does not fully account for the discrepancies between different features. To deal with this problem, we propose a simple but effective method based on feature learning of valid regions and distance fusion, which combines three distances. The first is a local distance over valid regions, calculated using the Gaussian of Gaussian (GOG) feature: pedestrian images are divided into horizontal stripes, and regions with smaller distances are kept as valid while regions with larger distances are removed as invalid, since they are likely affected by occlusion or posture change. The other two distances are obtained by independent metric learning using the histogram of oriented gradients 3D (HOG3D) feature and the Local Maximal Occurrence (LOMO) feature. The three distances are added to give the final distance between a pedestrian pair, from which the matching rank of the gallery is obtained. Extensive experiments on the iLIDS-VID and PRID-2011 datasets prove the effectiveness of our method.
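The valid-region selection and distance fusion steps reduce to a few lines; the matrix shapes and the `keep` parameter below are our assumptions for illustration, not values from the paper:

```python
import numpy as np

def valid_region_distance(stripe_d, keep=4):
    """Local GOG distance over horizontal stripes: keep only the
    `keep` stripes with the smallest distances (the 'valid regions',
    discarding stripes hit by occlusion or posture change) and sum."""
    return float(np.sort(stripe_d)[:keep].sum())

def fuse_distances(d_gog, d_hog3d, d_lomo):
    """Add the three (probe x gallery) distance matrices, as the
    abstract describes, and return the fused matrix plus the
    per-probe gallery ranking (column 0 = best match)."""
    fused = d_gog + d_hog3d + d_lomo
    return fused, np.argsort(fused, axis=1)
```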

012010
The following article is Open access

The great progress in recommendation systems helps users discover more interesting items that satisfy their appetites. Video recommendation is an increasingly popular sub-field of recommendation, but traditional techniques such as collaborative filtering and content-based models exploit only one information source, which limits their performance. In this paper, we propose a multi-information fusion recommendation system that integrates several different information sources to comprehensively model the similarity between videos. The information sources include the common user-item rating data and the videos' textual content, which consists of each video's genres and textual description. Experimental results on a public dataset show that the proposed system is of high quality and achieves significant improvements over traditional collaborative filtering techniques.

012011
The following article is Open access

Image segmentation is a key technology on the path from image processing to image analysis; without proper segmentation, correct recognition is impossible. In this paper, we propose a method for image co-segmentation based on biased normalized cuts, using a semi-supervised approach to handle foreground regions. To exploit biased normalized cuts, we first use a 2D adaptive Wiener filter to smooth the seeded parts of the images, then divide the images into a set of super-pixels, and finally take the super-pixels as vertices to form a weighted undirected graph. The co-segmentation can thus be seen as a graph-partition problem solved by biased normalized cuts. Experiments on image datasets show the superior performance of our method.

012012
The following article is Open access

Nowadays, in traditional door or object recognition systems, GluonCV and TensorFlow are used to train machines on millions of pictures, and door recognition is then based on the information collected from those pictures in a database. In the new proposal, a door recognition system is built on a machine carrier called GITA. The recognition elements comprise motionless elements, including the door boundary, the handle/bar, and the ratio of body width to door width, and loco-motor elements, including people's gestures, changes in door depth and changes in illumination. To validate the system, several tests were performed on seven doors of different types.

012013
The following article is Open access

In some family photos or pictures of special scenes, we can find rotated faces. Most existing methods augment the training data by increasing the features of rotated faces or changing the orientation of the pictures. However, these methods have their own limitations and cannot detect rotated faces accurately. We propose a method based on a three-window convolutional neural network of our own design. We extract face features and transform the face feature matrices by clockwise and anticlockwise rotation through the three-window convolutional layer in order to enrich the face features. We retain the parameters of the model after training and replace the fully connected layer with a convolutional layer. From the heatmap of a sample, our method can predict the face region. We carry out experiments on the FDDB and LFW datasets: on LFW our model achieves an AUC of 0.9240 and a recall of 0.9367, and on FDDB it achieves a recall of 0.9541.

012014
The following article is Open access

Most visual odometry is based on the matching of feature points, or on the pixel matching of the direct method. However, images have another obvious feature: the line feature. If we use point-based visual odometry on low-texture images, the small number of feature points may lead to poor performance. Even in some texture-less environments, it is still possible to reliably estimate lines based on geometric elements; for example, structured edges are prominent in indoor scenes. In this paper, we propose a monocular visual odometry method based on a combination of the direct method and line features. We test our algorithm on the TUM-RGB and ICL-NUIM datasets. Experimental results show that our method improves the robustness and accuracy of the estimation of the camera's position and attitude.

012015
The following article is Open access

In deep-learning face recognition, the dimensionality of the facial features is often too large. This paper proposes a face recognition algorithm based on an SVM combined with facial features extracted by a VGG network model, which not only extracts face features accurately but also reduces the feature dimensionality and keeps irrelevant features out of the computation. First, the VGG-16 model obtained by training on the training set is used for feature extraction; on top of this, principal component analysis (PCA) is used for dimensionality reduction, and finally face recognition is performed by an SVM classifier with a linear kernel. In a comparative experiment on the CelebA dataset, we find that accuracy peaks when the feature dimensionality is reduced to 400. An experiment on the LFW dataset using the 400-dimensional features, compared with other algorithms, shows that our algorithm reaches the state-of-the-art level.
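The PCA reduction step (deep features down to a few hundred dimensions) can be sketched with plain numpy; the SVM stage is omitted here, and the toy dimensions are our assumptions rather than the paper's 4096-to-400 setting:

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA via SVD on an (n, d) feature matrix.
    Returns the feature mean and the top-k principal axes (k, d)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_transform(X, mu, W):
    """Project centred features onto the principal axes: (n, k)."""
    return (X - mu) @ W.T
```

In the paper's pipeline, `X` would hold VGG-16 features, `k` would be 400, and the projected vectors would feed a linear-kernel SVM.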

012016
The following article is Open access

The compressed sensing (CS) theory has been applied successfully to image compression, as most image signals are sparse in a certain domain. In this paper, we focus on improving the sampling efficiency of network-based image compressed sensing with our proposed adaptive sampling algorithm. We conduct content-adaptive sampling to achieve a significant improvement. Experimental results indicate that our proposed framework outperforms the state of the art in both subjective and objective quality, with an average improvement of 1-6 dB in peak signal-to-noise ratio (PSNR). Moreover, the proposed method reconstructs images with more detail and fewer blocking artifacts, leading to an apparent visual improvement.

012017
The following article is Open access

To improve cell sensitivity or to satisfy the need for a large field of view, a large CCD pixel is usually chosen as the detection unit in optical imaging systems. However, this choice cannot meet the Nyquist sampling theorem and thus generates ill-sampled images. In other words, the geometric resolution of images in diffraction-limited optical systems is directly restricted by the CCD pixel size. In this paper, a carefully designed optical mask is introduced to ensure lossless images before CCD sampling. By applying spatial spectral filtering, we can acquire images with the appropriate resolution. The method presented here significantly abates the resolution decline due to ill-sampling. Mathematical deduction and simulation show that geometric super-resolution images can be achieved.

012018
The following article is Open access

Emotional information in movie comments is critical to sentiment analysis. Sentiment analysis, which classifies comments into positive and negative classes according to a sentiment lexicon, is one such line of study. Most existing research centers on sentiment words and user ratings, while users' attitudes towards the comments themselves are ignored; moreover, Chinese is the second largest language in the world. To take these points into account, we propose a method for Chinese movie-comment sentiment analysis based on HowNet and user likes, which we call HAL. Our research consists of four parts. First, we use the HowNet sentiment lexicon to derive a new lexicon for the movie domain. Second, we use the new lexicon and the word segmentation tool Jieba to segment the movie comments. Third, we use user likes and sentiment words to obtain the positive and negative features. Finally, we train on the movie-comment data using three models (SVC, LinearSVC, LogisticRegression). The experimental results show that our method performs better than the HowNet-based method on Chinese movie-comment sentiment analysis.

012019
The following article is Open access

ViBe is one of the most commonly used background subtraction methods; it performs foreground detection on each frame pixel by pixel. When the algorithm is applied to video sequences with large depth differences, such as forest fire surveillance, the limitations of traditional ViBe become apparent. If the algorithm parameters are the same at every depth, two problems inevitably arise: nearby swaying trees easily cause false detections because they cover a large pixel area, while distant smoke is easily missed because it occupies only a few pixels of the image. To solve this problem, an improved visual background extraction algorithm that incorporates depth information is proposed. The method computes the dark channel as depth information, and a conversion function is designed to adjust the sensitivity to accommodate moving targets with large depth differences. Experimental results demonstrate that the improved ViBe algorithm performs better than the traditional ViBe algorithm in forest fire surveillance. In addition, the improvement does not excessively increase the computational cost.
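The dark-channel depth cue has a standard definition (per-pixel channel minimum followed by a patch-minimum filter); a numpy sketch under that definition, with the patch size assumed rather than taken from the paper:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of an RGB image (H, W, 3) in [0, 1]: the minimum
    over colour channels, then a minimum filter over a patch x patch
    window. Hazier (typically more distant) regions have larger dark
    channels, which is why the paper uses it as a cheap depth cue."""
    m = img.min(axis=2)                      # channel-wise minimum
    r = patch // 2
    pad = np.pad(m, r, mode="edge")
    h, w = m.shape
    out = np.empty_like(m)
    for i in range(h):
        for j in range(w):
            out[i, j] = pad[i:i + patch, j:j + patch].min()
    return out
```

In the paper, this map is passed through a conversion function to set a per-region ViBe sensitivity; that function's form is not given in the abstract.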

012020
The following article is Open access

Microstructural parameters are important for analyzing the chemistry and performance of solid oxide fuel cells (SOFCs). For YSZ/Ni anode optical microscopy (OM) images of SOFCs, this paper uses a particle swarm optimization algorithm to improve the fuzzy C-means clustering algorithm for image segmentation. Particle swarm optimization adaptively searches for the initial clustering centers, helping to avoid local optima and preserve more image detail. The experimental results show that the proposed method improves segmentation accuracy. At the same time, it accurately segments the three SOFC phases and provides effective segmentation results for computing the microstructural parameters.

012021
The following article is Open access

Based on a real ship's structure and equipment, a game engine is used to construct a three-dimensional visual scene model. Multigen Creator is used for modeling and, after format conversion, the model is driven by Unity 3D to complete the construction of the virtual ship platform. By introducing control logic such as roaming logic and navigation logic, the user can roam and navigate within the virtual scene. The system is highly realistic, has a friendly interface and is interactive, meeting the needs of information-based teaching. Using the cross-platform features of Unity 3D, PC and mobile versions are developed to realize the networked VR training platform.

012022
The following article is Open access

We propose a shape matching algorithm based on an improved functional map to compute the correspondence between two given 3D non-rigid shapes, in which the shape correspondence is represented as a mapping function of a mixed transformation matrix B and a calibration matrix P of basis functions. First, the Laplace matrix is calculated and, after eigen-decomposition, its eigenvectors are used to construct a new matrix serving as the basis matrix of the function space. Then, a calibration algorithm based on statistical covariance is proposed to compute the matrix P, which calibrates the basis matrices of the function spaces of the two shapes. Finally, the matrix B is optimized by an improved ICP (Iterative Closest Point) algorithm so that the shape correspondence can be computed with the matrix P. Experimental results show that the proposed algorithm avoids excessive initial conditions, obtains accurate shape correspondences and largely resolves the symmetry ambiguities of 3D shapes during the matching process.

012023
The following article is Open access

Skull feature points play an important role in computer-aided craniofacial restoration. An improved relative-angle histogram algorithm is proposed to match the feature points of the skull, addressing the low positioning accuracy of existing skull feature-point matching algorithms and the differences in the number and distribution of model points. First, the Iterative Closest Point (ICP) algorithm is used to register the original skull models. Then, a new principal axis is established for the registered model, and the relative angles of the model points and their histogram distribution are calculated. Finally, the model points whose histogram distributions are most similar to those of the model feature points are selected as the matching points. Experiments show that the algorithm achieves good results in matching the feature points of skull models.

012024
The following article is Open access

Magnetic resonance (MR) images are rich in information, with uneven gray scale, fuzzy boundaries, and fine structures that are difficult to distinguish and segment. Addressing these characteristics, this paper proposes an algorithm for segmenting MR images with an improved density clustering algorithm. First, the simple linear iterative clustering (SLIC) superpixel algorithm is used to divide the image into a number of superpixel regions, and the neighborhood of each superpixel is then searched according to a pre-specified threshold. Next, the KNN-DPC algorithm, in which the value of K is determined adaptively, is used to obtain the density from the distances of the K nearest neighbors and the adjacent superpixels, and image segmentation is completed by clustering the superpixels. Two sets of experiments on natural images show that the algorithm has high segmentation accuracy, and experiments on clinical breast MR images show that it achieves good results for clinical MR image segmentation.

012025
The following article is Open access

Breast-conserving surgery followed by radiotherapy of the whole breast and boost irradiation of the lumpectomy cavity (LC) is the standard strategy for early-stage breast cancer patients. Accurate segmentation of the target volume is a prerequisite for accurate radiotherapy and directly affects the success or failure of tumor treatment. The target is currently delineated mainly by manual drawing, which is time-consuming, laborious and easily affected by subjective factors. To solve this problem, we first enhance the MR breast images using the discrete wavelet transform (DWT) to bring out more image feature detail. Second, we use the K-means algorithm to classify the feature vectors and establish the image segmentation model. Finally, comparison with the traditional threshold segmentation method shows that the model is better suited to automatic delineation of the radiotherapy target area, and the optimal parameter settings are obtained. The method basically realizes accurate automatic delineation of the target area and addresses the lack of accuracy and standardization in current tumor bed delineation.
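The two computational steps above (DWT enhancement, then K-means clustering) can be sketched with a one-level Haar transform and a toy scalar K-means; this is an illustration under our own assumptions, not the paper's implementation, which clusters richer feature vectors:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar DWT (image sides must be even).
    Returns the approximation (LL) and the three detail subbands."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2   # row averages
    d = (img[:, 0::2] - img[:, 1::2]) / 2   # row details
    ll = (a[0::2] + a[1::2]) / 2
    lh = (a[0::2] - a[1::2]) / 2
    hl = (d[0::2] + d[1::2]) / 2
    hh = (d[0::2] - d[1::2]) / 2
    return ll, lh, hl, hh

def kmeans_1d(x, k=2, iters=20):
    """Tiny K-means on scalar features, enough to split target vs.
    background intensities after enhancement."""
    c = np.linspace(x.min(), x.max(), k)
    for _ in range(iters):
        lab = np.argmin(np.abs(x[:, None] - c[None, :]), axis=1)
        for j in range(k):
            if np.any(lab == j):
                c[j] = x[lab == j].mean()
    return lab, c
```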

012026
The following article is Open access

A very few people have the ability to "see" their surroundings from echoes, an ability called echolocation. Studying the brain mechanism of echolocation can not only help improve blind-assistance devices but also provide a window into research on the brain's plasticity. In this paper, inspired by echolocation, we developed a wearable system that transforms the spatial information captured by a camera into a voice description fed back to blind users. After our online virtual-scene training, users can easily discriminate the location of objects in the camera's view, the motion of objects, and even the shape of objects. Compared with natural echolocation, it is easier to learn and apply in daily life; in addition, the device achieves high spatial resolution. In this study, two trained blind subjects and two untrained sighted subjects were examined using functional magnetic resonance imaging (fMRI). We obtained fMRI images of the subjects' brain activity while they were listening to the sound of the wearable prototype. Intriguingly, we find that after training with the blind-assistance system, the visual areas of the blind subjects' brains are activated when they process the acoustic feedback from the device.

012027
The following article is Open access

With the development of virtual reality, appropriate interaction technology has become a focus for practitioners. Today's virtual-reality interaction mostly relies on traditional electronic devices such as handles; these solve the interaction problem for the moment, but they also pull the user out of the virtual world and greatly reduce immersion. This paper uses hand gestures to interact with virtual reality. Skin detection in the YCbCr color space indicates the location of the hand and determines its approximate extent. Then the histogram of oriented gradients (HOG), widely used in recent target-detection research, is used to extract gesture features, and a support vector machine (SVM) is applied to achieve real-time hand gesture recognition. Finally, experiments prove that the proposed method is accurate and stable for virtual-reality interaction.
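The YCbCr skin-detection step can be sketched as follows; the BT.601 full-range conversion is standard, but the threshold ranges are the commonly cited ones rather than values from this paper, which does not state them:

```python
import numpy as np

def skin_mask(rgb):
    """Skin detection in YCbCr. `rgb` is (H, W, 3) uint8. Converts
    with the BT.601 full-range equations and thresholds the chroma
    channels with the widely used ranges Cb in [77, 127] and
    Cr in [133, 173] (an assumption; the paper's exact thresholds
    are not given)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)
```

The resulting binary mask bounds the hand region, which is then described with HOG features and classified by the SVM.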

012028
The following article is Open access

With the development of autonomous driving technology, it has become possible for a self-driving cleaning truck to drive and work automatically. However, the many field trials needed to investigate a self-driving truck are time-consuming and wasteful. To reduce the number of on-site, real-vehicle tests, we construct a 3D virtual driverless-vehicle testing system for the cleaning truck based on virtual-reality technology. 3DMax software is used to build the static models of the virtual 3D scene and the cleaning truck model. The static models are imported into Unity3D to simulate different 3D scenes for the self-driving cleaning truck. We designed a UI to build a virtual testing system for the self-driving cleaning truck that readily supports human-computer interaction. We have successfully achieved obstacle avoidance and round-the-island driving, and we have completed the identification of traffic lights and the driverless-driving test in a virtual crossroad scene. In the virtual testing system, we can quickly run virtual test scenarios for the self-driving cleaning truck and obtain kinematic information. The virtual testing system can partly replace field tests and provide methods and theoretical guidance for them, shortening the development cycle and saving money, while being unrestricted and repeatable. At the same time, the research results provide new ideas and methods for the investigation of vehicle testing and driverless driving.

012029
The following article is Open access

, , and

The previously proposed dithering defocusing technique performs well for three-dimensional (3D) measurement when stripes are relatively wide, yet suffers when stripes are narrow. This paper identifies two asymmetries in dithered patterns generated by the Sierra Lite dithering algorithm and verifies that longitudinal fringes are more advantageous for the phase-shifting technique than transverse fringes. Furthermore, this paper proposes an algorithm with a meandering scan: in each pattern, pixels in odd lines are scanned from left to right while even lines are scanned from right to left. The proposed method prevents quantization errors from propagating in a single direction and greatly improves the symmetry of longitudinal fringes. Both simulation and experimental results show that this method can effectively improve the accuracy of 3D measurement, especially for narrow stripes.
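The meandering (serpentine) scan can be sketched as follows. The Sierra Lite error-diffusion weights (2/4 ahead in the scan direction, 1/4 below-behind, 1/4 directly below) are the standard ones; the 0.5 binarization threshold is an illustrative choice:

```python
import numpy as np

def sierra_lite_serpentine(img):
    """Sierra Lite error diffusion with a meandering scan: even rows are
    scanned left-to-right, odd rows right-to-left with a mirrored kernel.
    `img` holds intensities in [0, 1]; returns a binary dither pattern."""
    h, w = img.shape
    buf = img.astype(np.float64).copy()
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        step = 1 if y % 2 == 0 else -1        # scan direction for this row
        xs = range(w) if step == 1 else range(w - 1, -1, -1)
        for x in xs:
            out[y, x] = 1 if buf[y, x] >= 0.5 else 0
            err = buf[y, x] - out[y, x]
            if 0 <= x + step < w:
                buf[y, x + step] += err * 2 / 4   # ahead in scan direction
            if y + 1 < h:
                if 0 <= x - step < w:
                    buf[y + 1, x - step] += err / 4  # behind, next row
                buf[y + 1, x] += err / 4             # directly below
    return out
```

Alternating the scan direction per row is what keeps the quantization error from drifting in a single direction across the pattern.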

012030
The following article is Open access

, , , and

Recently, many approaches based on deep learning have demonstrated impressive capability on a variety of challenging image tasks, such as image classification, object detection, and semantic and instance segmentation. These methods extract deeper features than traditional methods, which is critical for many kinds of image tasks, and they are gradually being applied to the completion of natural images. Considering that most current methods lead to blurred or fake-looking results, we propose an image completion method based on a generative adversarial network. We use an encoder-decoder network structure to obtain the high-level feature information of the image and generate plausible pixel values to fill the missing regions. Besides, we construct a new joint loss function based on the SSIM evaluation index, which preserves the similarity between two images as much as possible. Our method keeps the completed regions consistent with the surrounding pixels, making the images look more realistic. We evaluate the proposed method on our datasets and compare it with the other methods discussed in this paper; our results are sharper and more realistic than the previous ones.

012031
The following article is Open access

, , and

Labanotation uses a variety of graphic symbols to analyse and record human movements accurately and flexibly, and is an important means of preserving traditional dance. In this paper, we introduce an efficient method for the automatic generation of Labanotation from motion capture data by identifying human movements with a bidirectional LSTM network (Bi-LSTM). To our knowledge, this is the first time a Bi-LSTM network has been introduced to the field of Labanotation generation. Compared with previous methods, the Bi-LSTM used in our human movement recognition system learns context information for sequential data from both past and future directions. Combined with a newly designed discriminative skeleton-topologic feature, our approach can generate more accurate Labanotation than previous work. Experimental results on two public motion capture datasets show that our method outperforms state-of-the-art methods, demonstrating its effectiveness.

012032
The following article is Open access

, , , , and

For content-based image retrieval, a good representation is crucial. Deep learning models can generate excellent representations and have therefore been extensively investigated and widely used in research systems and commercial production systems. However, the deep representation (deep feature) is still too large. Compared with using the deep representation directly, binary codes significantly reduce storage overhead, while bit-wise operations on binary codes dramatically speed up the computation. Some schemes convert deep features to binary codes, but all of them directly use the last fully-connected layer, which captures global, discriminative features. To obtain a deep generative feature while preserving image locality, we construct the binary hash code with convolutional auto-encoders; that is, we use a generative model to transform local features into binary codes. The training process of our proposed model is decomposed into three stages. First, the convolutional layers are trained as convolutional auto-encoders; second, the fully-connected layers are trained as a Restricted Boltzmann Machine; third, a supervised similarity learning algorithm learns close codes for similar images.
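The speed argument for binary codes can be illustrated with a toy sketch; thresholding at zero stands in for the learned coding scheme, which the abstract only outlines:

```python
import numpy as np

def binarize(features, threshold=0.0):
    """Threshold real-valued deep features into a binary code. A fixed
    zero threshold is a common heuristic stand-in for a learned code."""
    return (features > threshold).astype(np.uint8)

def hamming_packed(code_a, code_b):
    """Hamming distance via bit-wise XOR on byte-packed codes; this
    bit-level arithmetic is what makes binary-code retrieval fast."""
    pa, pb = np.packbits(code_a), np.packbits(code_b)
    return int(np.unpackbits(pa ^ pb).sum())
```

Retrieval then reduces to ranking database codes by Hamming distance to the query code.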

012033
The following article is Open access

, , , and

With the rapid increase in the number of cars, license plate recognition in intelligent traffic systems has attracted much attention. In this paper, addressing the shortcomings of existing recognition algorithms, we adopt a license plate recognition method based on a BP neural network with median filtering. The method consists of a median filtering algorithm for denoising and a BP neural network for training and testing. First, the character library is processed by median filtering to remove the influence of noise. Then, exploiting the strong learning ability of the BP network, the character library is passed into the network for classification. Compared with traditional algorithms, our method not only speeds up recognition but also obtains more robust results. The proposed approach can therefore provide technical support for the practical application of license plate recognition.
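The median-filtering denoising step can be sketched directly; this minimal version leaves image borders untouched, which a production implementation would handle by padding:

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter for removing salt-and-pepper noise from
    character images (border pixels are copied as-is for brevity)."""
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.median(img[y - 1:y + 2, x - 1:x + 2])
    return out
```

Isolated noise pixels are replaced by the median of their neighbourhood, while large uniform character strokes pass through unchanged.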

012034
The following article is Open access

, , and

Object detection has made great progress in recent years: two-stage approaches achieve high accuracy and one-stage approaches achieve high efficiency. To inherit the advantages of both while improving detection performance, this manuscript presents a method named Densely Connected Refinement Network (DCRN), which adds dense connections to RefineDet. Compared with RefineDet, our approach takes full advantage of low-level feature information. DCRN is formed by three interconnected modules: the dense anchor refinement module (DARM), the dense object detection module (DODM) and the dense transfer connection block (DTCB). The first module makes better use of features from different layers, linked by dense connections, to make an initial adjustment of the anchors. The detection module then takes the refined anchors to further improve regression and predict multi-class labels. Because of the dense connections in DCRN, the network parameters are reduced and the computing cost of the approach is also saved. Extensive experimental results on PASCAL VOC 2007 and PASCAL VOC 2012 demonstrate that DCRN achieves higher accuracy than one-stage methods and higher efficiency than two-stage methods.

012035
The following article is Open access

, , , and

Temporal action detection is an important research topic in computer vision, in which Temporal Action Proposal (TAP) generation is a key step for finding candidate action segments. This paper presents an effective and efficient deep architecture, an action keyframe connection network, for generating action proposals in temporally untrimmed videos. First, a two-stream network is adopted to extract frame-level features, including appearance features and optical flow features; this temporal information helps the subsequent network determine whether a frame is the beginning or the end of an action. Second, a position discrimination network is designed to infer the probability of each frame being a starting or an ending frame; the network outputs a starting probability sequence and an ending probability sequence, which indicate the start and end of the action respectively. Finally, our network generates proposals by a specific threshold rule that combines points in the starting and ending probability sequences. We carry out experiments on the ActivityNet dataset to compare the proposed method with state-of-the-art methods, and the results show that our method achieves superior performance.

012036
The following article is Open access

and

Distinct feature extraction methods are simultaneously used for single-channel electroencephalography (EEG) based biometrics. This study proposes a new strategy to extract features from EEG signals: based on time and frequency information, statistical features are obtained from the EEG. For classification, a support vector machine with 10-fold cross-validation is used. The main contribution of this paper is a simple but effective single-channel EEG feature extraction method, together with feature selection to optimize classification efficiency. In the experiments, the EEG data are obtained in a human-computer interaction environment while the subjects are in non-stationary states with different emotions. The results show that the proposed method achieves better classification performance on a single-channel EEG system than previous work.
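A minimal sketch of time- and frequency-domain statistics for one EEG channel. The specific features and the alpha band chosen here are illustrative assumptions, not the paper's exact feature set:

```python
import numpy as np

def eeg_features(signal, fs=128):
    """Simple statistical features for a single EEG channel: a few
    time-domain statistics plus one band-power feature from the
    power spectrum. Feature choices are illustrative only."""
    feats = {
        "mean": float(np.mean(signal)),
        "std": float(np.std(signal)),
        "ptp": float(np.ptp(signal)),          # peak-to-peak amplitude
    }
    spectrum = np.abs(np.fft.rfft(signal)) ** 2  # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    alpha = (freqs >= 8) & (freqs <= 13)         # alpha band, 8-13 Hz
    feats["alpha_power"] = float(spectrum[alpha].sum())
    return feats
```

Such feature vectors, after selection, would be fed to the SVM classifier under 10-fold cross-validation.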

012037
The following article is Open access

, and

Rolling bearing faults are among the primary causes of breakdown in mechanical equipment. Aiming at the vibration signals of rolling bearings, which are non-stationary and easily disturbed by noise, a novel fault diagnosis method based on the curvelet transform and metric learning is proposed. The method consists of three parts. The first is feature engineering: the original time-series features of rolling bearings are reshaped, the curvelet transform is applied to the reshaped features, and its coefficients become the new features; the curvelet transform can analyse the original signal from many angles. The second is metric learning, which maps these new features into a special embedding space. The last is a KNN classifier that detects the rolling bearing faults. Metric learning can effectively improve the performance of KNN by learning a mapping matrix that modifies the distribution of samples. The proposed method overcomes problems such as the subjectivity and blindness of manual feature extraction, poor coupling between stages, and sensitivity to noise. Extensive simulations on several datasets show that our method performs better on bearing fault diagnosis than traditional methods.

012038
The following article is Open access

, , , and

A cerebral microbleed (CMB) is a small hemorrhage of the small vessels in the brain and one of the major factors used to facilitate early-stage diagnosis of Alzheimer's disease. Traditionally, CMB detection is done manually by neurologists, doctors or specialists; however, the process is time-consuming and the accuracy of the results depends on the doctor's experience. Efficient and reliable automatic detection of CMBs is therefore needed. This paper proposes a new framework for CMB detection that segments the regions of interest (ROIs), detects the CMBs and identifies their locations in SWI scan images. A convolutional neural network (CNN) is applied to generate the models used for later prediction, and a shape matching mechanism is applied to identify the locations of CMBs in the brain. The experimental results show that CMBs can be classified with a recorded accuracy of 95.45%. The CMBs were located in three different regions, (i) the cortical region, (ii) the cerebellum and (iii) the brainstem, with an accuracy of 100%.

012039
The following article is Open access

, , , and

Video object detection is of great significance for video analysis. In contrast to object detection in still images, video object detection is more challenging: it suffers from motion blur, varying viewpoints/poses and occlusion. Existing methods utilize temporal information during detection in videos and show improvement over static-image detectors. In this paper, we propose a novel method for video object detection that adaptively aggregates features across adjacent frames and captures more global cues, making it more robust to drastic appearance changes. First, the current frame's features and warped features from adjacent frames are obtained via a feature extraction network and an optical flow network. Next, a coherence contribution module is designed to adaptively aggregate the two kinds of features obtained in the first step. Finally, a still-image detector, extended with an instance-level module that aggregates features from adjacent frames to capture more global features, is adopted to obtain the final result. Our method shows leading performance on the ImageNet VID dataset.

012040
The following article is Open access

and

Object detection algorithms have made great progress in the past few years thanks to the development of deep learning. From region selection to anchor box regression, the algorithms have become more and more accurate, but they are still far from the vision mechanism of humans. Corner-based detection is a new approach, but the methods proposed so far waste a great deal of time and resources, making them unsuitable for the vast bulk of jobs. In this paper, we propose a method that lets a corner-based object detector run inference in real time. We analyze the disadvantages of existing methods, identify the most time-consuming parts and solutions for improving them, and thereby make the algorithm efficient, obtaining results competitive with other existing real-time methods.

012041
The following article is Open access

and

We present a novel model for multiple object recognition. Our model combines current deep learning recognition systems with object category relation information. It is mainly inspired by the spatial memory network [1], which treats multiple object recognition as an iterative process, reusing region features as context information. We extend this work with a statistics-based category relation model that measures the semantic relevance of object categories. With the spatial memory and the category relation model as relation reasoning modules, our model takes in context information. It achieves 4.4% per-instance and 4.9% per-class absolute improvement in average classification accuracy over plain convolutional neural networks on the ADE dataset.

012042
The following article is Open access

In this paper, we use TensorFlow Mobile Lite for object detection with datasets of basic geometric figures on iOS mobile devices. We trained four datasets in 2D and 3D and compare detection accuracy between colour and grayscale image data. We also evaluate the 2D and 3D detection rate for some kinds of ordinary objects in terms of precision and label output. We used convolutional neural networks (CNN) to build the datasets and OpenCV to convert images into grayscale. We assess the relation between the flat and volumetric datasets in terms of label and numeric detection, as well as how TensorFlow Mobile is affected by these kinds of datasets. We make comparisons based on the results of the different detection experiments; in this work, the TensorFlow Mobile Lite implementation does not have pseudo boxes, and the reason is explained with regard to detection purposes and accuracy for this kind of experiment.

012043
The following article is Open access

, and

The robotic follower has been receiving wide attention in recent years. Aiming at the problems of low sample collection efficiency, high training cost and the difficulty of designing a reward function in the real world, we propose a control method based on deep reinforcement learning. Different depth layers are adopted to attain end-to-end control of the robotic follower through pre-training. We then design a reward function mechanism to judge whether the robotic follower follows incorrectly. The appropriate pre-trained network is transferred to reinforcement learning, and a deep reinforcement learning system for monocular-vision robot following tasks is established. According to the experimental results, the proposed deep reinforcement learning method can efficiently collect large datasets, shorten the training period and reduce the number of times the robotic follower loses its target.

012044
The following article is Open access

, and

User authentication with an accurate biometric system is the demand of the hour in today's world. A face presentation attack usually happens when somebody attempts to take on the appearance of another person by presenting a fake photo or video to the face detection camera and gains illegitimate access. To effectively protect a person's privacy, it is critical to build a face authentication and anti-spoofing system. This paper introduces a novel and appealing face spoof detection technique, based primarily on the study of the contrast and dynamic texture features of captured and spoofed photos. Valid identification of photo spoofing is anticipated here. A modified version of the DoG filtering method and a technique based on local binary pattern variance (LBPV), which is invariant to rotation, are used in this paper. A support vector machine (SVM) is applied to the extracted feature vectors for further analysis. The publicly available NUAA photo-imposter database, which includes facial images with different illumination and areas, is used to test the system. The accuracy of the method is assessed using the false acceptance rate (FAR) and false rejection rate (FRR). The results show that our method performs better on key indices than other state-of-the-art techniques following the provided evaluation protocols on a similar dataset.
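The local binary pattern underlying LBPV can be sketched for the basic 3x3 case; the rotation-invariant variance weighting used in the paper builds on this plain operator:

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 local binary pattern: each pixel's 8 neighbours are
    compared with the centre and the results packed into an 8-bit code.
    The rotation-invariant LBPV extends this plain version."""
    h, w = gray.shape
    c = gray[1:-1, 1:-1]                      # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        nb = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= c).astype(int) << bit  # set bit if neighbour >= centre
    return code
```

A histogram of these codes (variance-weighted, in LBPV) would form the texture feature vector passed to the SVM.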

Information Technology

012045
The following article is Open access

, , and

With the wide application of 3D sensors such as LiDAR and RGB-D cameras in robots and driverless technology, 3D point cloud classification has also developed. PointNet [1] is a competitive method for point cloud classification due to its fast speed and good performance in real applications. However, when easily-classified samples overwhelm the data in 3D object classification, PointNet shows a weakness: its gradients are mainly determined by these easy samples. To optimize the gradients in PointNet, we designed a Sigmoid Focal Loss (SFL) function for 3D object classification to replace the standard softmax cross-entropy loss function. The proposed SFL function automatically re-weights hard-to-classify samples during training, so that the neural network can focus on these hard samples and avoid an unbalanced classifier. Our results show that the new loss function achieves the best result among classification methods that directly process point clouds.
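The sigmoid focal loss idea can be written in a few lines; the gamma and alpha defaults below follow the original focal-loss formulation and may differ from the SFL parameters used in this paper:

```python
import numpy as np

def sigmoid_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Sigmoid focal loss: the (1 - p_t)^gamma factor down-weights
    well-classified (easy) samples so the gradient is dominated by
    hard ones. Defaults follow the original focal-loss paper."""
    p = 1.0 / (1.0 + np.exp(-logits))           # sigmoid probability
    p_t = np.where(targets == 1, p, 1.0 - p)    # prob. of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))
```

With gamma = 0 this reduces to an alpha-weighted binary cross-entropy; increasing gamma shrinks the loss of confident predictions.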

012046
The following article is Open access

Numerical optimization is a classical field in operations research and computer science, and has been widely used in areas such as physics and economics. Although optimization algorithms have achieved great success in plenty of applications, handling big data in the best fashion possible is an inspiring and demanding challenge in the artificial intelligence era. Stochastic gradient descent (SGD) is simple yet surprisingly effective in machine learning models such as the support vector machine (SVM) and deep neural network (DNN). Theoretically, the performance of SGD for convex optimization is well understood; but for the non-convex setting, which is very common in machine learning problems, obtaining theoretical guarantees for SGD and its variants is still an open problem. In this paper, we survey SGD and its variants such as Momentum, ADAM and SVRG, differentiate their algorithms and applications, and present some recent breakthroughs and open problems.
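The update rules of the main methods surveyed can be written compactly; the learning rate and hyperparameter defaults below are conventional choices, not prescriptions from the survey:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Vanilla SGD: w <- w - lr * g."""
    return w - lr * grad(w)

def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """Momentum (heavy ball): accumulate a velocity over past gradients."""
    v = beta * v + grad(w)
    return w - lr * v, v

def adam_step(w, m, v, t, grad, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """ADAM: bias-corrected first/second moment estimates rescale the step."""
    g = grad(w)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    w = w - lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return w, m, v
```

On a simple quadratic all three drive the iterate toward the minimizer; their differences show up in curvature adaptation and noise robustness.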

012047
The following article is Open access

, , and

Specific emitter identification (SEI) provides the capability to distinguish radio emitters by the external features carried in the received waveforms. As existing features are not specific to time-division multiple access (TDMA) devices, it remains intractable to discern the emitter at low signal-to-noise ratios or over short durations. In this paper, we propose a novel characteristic based on the continuity of the carrier phase and explore its application to TDMA device identification. The characteristic reflects whether adjacent time slots are assigned to the same user terminal, revealing a potential link between slots even when the protocol is unknown. To apply it to TDMA device identification, we augment the typical SEI scheme with two subsystems, Decision and Correction, in which the recognition results can be corrected to improve accuracy. Simulation results demonstrate that the characteristic is resilient to ambient noise and evidently improves the recognition accuracy.
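The carrier-phase continuity idea can be illustrated on synthetic data: extrapolate the phase at the end of one burst across the inter-slot gap and compare it with the phase at the start of the next burst. All signal parameters and the test itself are synthetic assumptions, not the paper's detector:

```python
import numpy as np

def phase_discontinuity(slot_a, slot_b, gap, freq, fs):
    """Wrapped phase error (radians) between the carrier phase predicted
    from the end of `slot_a` and the phase observed at the start of
    `slot_b`, given `gap` samples between the bursts. A small error
    suggests both slots come from the same continuous carrier."""
    end_phase = np.angle(slot_a[-1])
    predicted = (end_phase + 2 * np.pi * freq * (gap + 1) / fs) % (2 * np.pi)
    start_phase = np.angle(slot_b[0]) % (2 * np.pi)
    d = abs(predicted - start_phase)
    return min(d, 2 * np.pi - d)   # wrap into [0, pi]
```

Thresholding this error is one way a Decision subsystem could flag adjacent slots as belonging to the same terminal.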

012048
The following article is Open access

and

An adaptive compression algorithm based on a sparse dictionary is proposed to relieve the pressure that long-duration, high-sampling-rate acquisition of transient signals places on the transmission system. Because the transient and non-transient parts of a transient signal have obviously different features, and because a sparse dictionary achieves better sparsity the more its atoms match the signal features, the transient feature information is extracted and compressed separately. After data compression, the principle of compressed sensing is applied to restore the transient signal, reducing the amount of data transmitted in the transmission system. In summary, the adaptive sparse dictionary compression algorithm effectively compresses transient signal data, and the compressed data can accurately recover the transient signal.

012049
The following article is Open access

, and

A measure function is used to judge whether points match in boost matching. In this paper, we propose a new measure function called GST, which simply uses the relationship between the gray differences and a threshold at points in a neighborhood to test the matching degree. We use homography and GST to improve the number and precision of SIFT correspondences. First, the homography is estimated from the result of SIFT matching by VRANSAC. Then, wrong correspondences are eliminated by the homography. Finally, we use GST and the homography to boost-match the eliminated correspondences. Experiments on the Oxford datasets show that our method achieves better performance than existing methods.

012050
The following article is Open access

and

In the wideband receiving system of a radar, it is often necessary to process signals with a high sampling rate and wide bandwidth, and the sampling rate the system can process is limited by the design structure of the whole system. Usually, the signal is processed by a polyphase filter bank composed of several low-order filters, realizing interpolation and decimation of the signal at the same time and further reducing the sampling rate. However, the baseband data rate required by the signal processing system may not be obtainable by integer decimation of the sampled signal. Therefore, a structure combining parallel polyphase interpolation filtering and parallel polyphase decimation filtering is designed to realize rate conversion by a factor of L/M for large-bandwidth signals without raising the processing clock of the FPGA.
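The L/M rate conversion can be illustrated in direct (non-polyphase) form, which computes the same result as a polyphase realization but without its computational savings. The windowed-sinc low-pass design here is a generic choice, not the paper's filter:

```python
import numpy as np

def resample_rational(x, L, M, taps=63):
    """Naive L/M rate conversion: zero-stuff by L, low-pass filter,
    keep every M-th sample. A polyphase structure would compute only
    the retained outputs; this direct form just shows the arithmetic."""
    up = np.zeros(len(x) * L)
    up[::L] = x                              # insert L-1 zeros per sample
    fc = 0.5 / max(L, M)                     # normalized cutoff frequency
    n = np.arange(taps) - (taps - 1) / 2
    h = L * 2 * fc * np.sinc(2 * fc * n) * np.hamming(taps)  # windowed sinc
    y = np.convolve(up, h, mode="same")      # anti-imaging / anti-aliasing
    return y[::M]                            # decimate by M
```

A polyphase implementation splits `h` into L*M sub-filters and evaluates only the samples that survive the final decimation, which is what keeps the FPGA clock requirement unchanged.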

012051
The following article is Open access

and

To maintain the strong representation learning of deep neural network models for the interaction function between any users and items, we combine the trust relationships between users, as a local relationship, to supplement the interaction data and achieve a better recommendation effect. This paper proposes a Trust-based Neural Collaborative Filtering model (TNCF). First, trust information and rating information are merged through a Generalized Matrix Factorization model (GMF) to obtain recommendations based on trusted friends' preferences. Then, a Multi-Layer Perceptron model (MLP) with a nonlinear kernel is utilized to learn the interaction function from the data and obtain recommendations based on the user's personal taste. Finally, all the interaction results are aggregated for implicit prediction. Compared with three different baselines on the FilmTrust and Epinions datasets, the experimental results reveal that the proposed model improves recommendation and also performs well on sparse data.

012052
The following article is Open access

, and

In existing grouped dimensionality reduction models, a simple distance hypothesis is the only assumption made about the relationship between grouped data. This paper proposes sharing information between grouped data using correlated random measures as a prior. We explicitly calculate the Lévy measure of the mixed random measure and give detailed inference steps for the parameter posterior. Compared with the traditional method, the grouped dimensionality reduction model converges faster and preserves the original information of the data well. Experimental results on public datasets show that the grouped dimensionality reduction model is an effective dimensionality reduction algorithm and can be employed to extract features from big data.

012053
The following article is Open access

and

Collaborative filtering, the most mature and widely used recommendation algorithm, faces the problem of data sparsity, which hinders the acquisition of user preferences and thus affects the recommendation effect. Introducing item genres into the recommendation algorithm can reduce the impact of data sparsity: users' personalized preferences can be extracted more effectively from their preferences over item genres, further improving recommendation accuracy. On this basis, this paper proposes a recommendation algorithm based on item genre preferences and GBRT, which groups similar users with the K-means clustering algorithm, extracts users' item-genre preferences and auxiliary features, and builds a rating prediction model with the GBRT algorithm. Experiments on the common MovieLens 100K and MovieLens 1M datasets show that the proposed algorithm achieves a 0.8%-7% improvement in the MAE evaluation index, indicating that the impact of data sparsity is reduced to a certain extent and the recommendation performance is better than existing algorithms.

012054
The following article is Open access

and

To satisfy consumers' increasing demand for personalized service, intelligent services have arisen. User service intention recognition is an important challenge for an intelligent service system that aims to provide precise service. Because of the noise in user requirement descriptions, it is difficult for the intelligent system to understand the semantics of user demands, which leads to a poor recognition effect. Therefore, a hybrid neural network classification model based on BiLSTM and CNN is proposed to recognize users' service intentions. The model fuses the temporal semantics and spatial semantics of the user descriptions. The experimental results show that our model achieves a better effect than other models, reaching 0.94 on the F1 score.

012055
The following article is Open access

and

A widespread approach in machine learning to evaluating the quality of a classifier is to cross-classify predicted and actual decision classes in a confusion matrix, also called an error matrix. A classification tool that assumes no distributional parameters, only the information contained in the data, is the rough set data model, which assumes that knowledge is given only up to a certain granularity. Using this assumption and the technique of confusion matrices, we define various indices and classifiers based on rough confusion matrices.
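The classical indices derived from a 2x2 confusion matrix, which the rough-set variants generalize to granular decision classes, can be computed as follows:

```python
import numpy as np

def confusion_indices(cm):
    """Standard quality indices from a 2x2 confusion matrix laid out as
    [[TN, FP], [FN, TP]]. The rough confusion matrices in the paper
    generalize these crisp counts."""
    tn, fp, fn, tp = cm.ravel()
    total = tn + fp + fn + tp
    return {
        "accuracy":  (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall":    tp / (tp + fn) if tp + fn else 0.0,
    }
```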

012056
The following article is Open access

, , and

DBSCAN is a classic and commonly applied density-based clustering algorithm, but its clustering accuracy depends on the choice of two input parameters. This paper presents a new algorithm for adaptive parameter determination in DBSCAN. Its underlying assumption is that regions with a larger sample density gradient usually correspond to the edge areas of clusters. The main idea is to generate some pre-clusters and determine the parameter values from their statistical information. We first randomly pick pairs of points in the sample space to form circles, which we call "disks". Then we estimate the density information of each disk by random sampling, and define disk quality criteria to select disks with a larger sample density gradient. Finally, we obtain suitable DBSCAN parameters from the distributions of the radii and point counts of the extracted disks by Gaussian kernel density estimation. Experimental results show that the new algorithm improves the accuracy of DBSCAN and performs better than classic algorithms such as k-means and BIRCH in some cases.
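For reference, DBSCAN itself (not the paper's parameter-selection procedure) can be sketched minimally, with the two parameters `eps` and `min_pts` supplied by hand:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN for illustration: returns labels[i] = cluster id,
    with -1 for noise. `eps` and `min_pts` are the two inputs that the
    adaptive method described above estimates automatically."""
    n = len(points)
    labels = np.full(n, -1)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neighbors[i]) < min_pts:
            continue                       # skip non-core seed points
        stack = [i]                        # grow a cluster from core point i
        visited[i] = True
        while stack:
            j = stack.pop()
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:   # expand only from core points
                for k in neighbors[j]:
                    if not visited[k]:
                        visited[k] = True
                        stack.append(k)
        cluster += 1
    return labels
```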

012057
The following article is Open access

, and

To extract semantic feature information between texts more efficiently and reduce the effect of text representation on classification results, we propose a feature fusion model, C_BiGRU_ATT, based on deep learning. The core task of our model is to extract the contextual and local information of the text using a Convolutional Neural Network (CNN) and an Attention-based Bidirectional Gated Recurrent Unit (BiGRU) at character level and word level. Our experimental results show that the classification accuracies of C_BiGRU_ATT reach 95.55% and 95.60% on the two Chinese datasets THUCNews and WangYi respectively. Meanwhile, compared with single character-level and word-level CNN models, the classification accuracy of C_BiGRU_ATT increases by 1.6% and 2.7% on THUCNews, and by 0.6% and 5.2% on WangYi. The results show that the proposed model extracts text features more effectively.

012058
The following article is Open access

, and

Automatic text classification is a classic topic in natural language processing. Text classification research mainly focuses on the feature representation of text documents or on designing an efficient machine learning model. Although various approaches have been proposed to address these problems, they are still far from solved. In this paper, we propose a novel method called LAC_DNN that performs text classification with diverse feature representation approaches and classifiers. More specifically, LAC_DNN first introduces a novel feature representation approach called LATW to extract feature information from the documents, integrating the features extracted by the LSI model, the TF-IDF weighted vector space model (TF-IDF_VSM), TF-IDF weighted word2vec (TF-IDF_word2vec) and average word2vec (Avg_word2vec). Second, it trains different classifiers, including support vector machines, k-nearest neighbors, logistic regression and convolutional neural networks, on the features encoded by LATW. Finally, LAC_DNN integrates these classifiers into an ensemble predictor to leverage the complementary information of the feature representation methods and classifiers, and predicts the topic of text documents. LAC_DNN achieves superior performance, with accuracies of 97.44% and 97.43% on the Fudan and Netease news text datasets respectively. Extensive experiments show that LAC_DNN is prominent and useful for text classification.

012059
The following article is Open access

, , and

Mobile-edge computing (MEC) is a new network architecture concept that provides cloud-computing capabilities and an IT service environment for applications and services at the edge of the network; it is characterized by low latency, high bandwidth and real-time access to wireless network information. In this paper, we mainly consider the task scheduling and offloading problem in mobile devices, in which the computation data of the tasks offloaded to the MEC server have been determined. In order to minimize the average slowdown and average timeout period of tasks in the buffer queue, we propose a deep reinforcement learning (DRL) based algorithm, which transforms the optimization problem into a learning problem. We also design a new reward function to guide the algorithm to learn the offloading policy directly from the environment. Simulation results show that the proposed algorithm outperforms traditional heuristic algorithms after a period of training.
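A per-task reward that jointly penalizes slowdown and timeout, in the spirit of the objective above, could be sketched as follows. The weighting and exact functional form are assumptions for illustration, not the paper's reward function:

```python
def offload_reward(completion_time, ideal_time, deadline,
                   w_slowdown=1.0, w_timeout=1.0):
    """Negative weighted sum of a task's slowdown and timeout period,
    so an agent that maximises reward minimises both quantities.

    slowdown = completion_time / ideal_time  (>= 1 when the task is delayed)
    timeout  = time past the deadline        (0 if finished in time)
    """
    slowdown = completion_time / ideal_time
    timeout = max(0.0, completion_time - deadline)
    return -(w_slowdown * slowdown + w_timeout * timeout)

# A task that took twice its ideal time but met its deadline:
print(offload_reward(completion_time=2.0, ideal_time=1.0, deadline=3.0))
```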

012060
The following article is Open access

, , and

The normal operation of network devices is an important cornerstone of network security, and the security evaluation of network devices is essential for preventing network security problems. In order to dynamically evaluate and quantify the security level of a network device, a dynamic network device security assessment method is proposed. Firstly, this method makes full use of the alert log, which includes the state information of the network device, and combines it with the idea of the TF-IDF algorithm to analyze the frequency and distribution of alerts. It then puts forward a new algorithm, ETA, to calculate the threat value of each event. Finally, these values are used to calculate the security index of the network device. The experiments show that the proposed method can find network devices with a low security level and provide effective decision support for network security administrators.
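The TF-IDF idea applied to alert logs, as described above, can be sketched in a few lines: an alert type that is frequent on one device but rare across the fleet gets a high weight. This is an illustrative analogy only; the paper's ETA algorithm and security index are not reproduced here:

```python
import math

def alert_threat_scores(device_alerts):
    """TF-IDF-style weighting of alert types per device.

    device_alerts: {device_name: {alert_type: count}}.
    An alert type occurring on few devices (low "document frequency")
    but often on one device gets a high score there.
    """
    n_devices = len(device_alerts)
    # document frequency: on how many devices each alert type appears
    df = {}
    for alerts in device_alerts.values():
        for alert_type in alerts:
            df[alert_type] = df.get(alert_type, 0) + 1
    scores = {}
    for device, alerts in device_alerts.items():
        total = sum(alerts.values())
        scores[device] = {
            a: (count / total) * math.log(n_devices / df[a])
            for a, count in alerts.items()
        }
    return scores

logs = {"router1": {"link_down": 2},
        "router2": {"link_down": 1, "auth_fail": 1}}
print(alert_threat_scores(logs))
```

An alert type seen on every device (like `link_down` here) scores zero, while the device-specific `auth_fail` stands out, which matches the intuition of singling out anomalous devices.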

012061
The following article is Open access

, and

In this paper, based on survey data from users of all kinds of mobile phones in the current Chinese market, we select price, wear resistance, drop resistance, charging interval, battery life, communication stability, photo quality, appearance design, memory size and willingness to buy again as input variables, and the sales prospect grade of each mobile phone as the output variable. We use the support vector machine regression algorithm (SVMR), BP neural networks and the k-nearest neighbor algorithm to build models and predict the sales prospects of various kinds of mobile phones in China. The prediction results show that the values predicted by the SVMR-based sales prediction model are basically consistent with the actual sales of the phones on the market, which can provide some guidance for the manufacture and sale of mobile phones.
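Of the three models above, the k-nearest-neighbor predictor is the simplest to sketch: grade a new phone by the majority grade of the most similar surveyed phones. The feature vectors and grades below are toy values, not the survey data:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a sales-prospect grade for `query` by majority vote
    among its k nearest neighbours in feature space.

    train: list of (feature_vector, grade) pairs, where the features
    would be attributes like price, battery life, memory size, etc.
    """
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))
    votes = Counter(grade for _, grade in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy data: two features per phone (e.g. scaled price, battery score).
phones = [([0, 0], "low"), ([0, 1], "low"),
          ([5, 5], "high"), ([5, 6], "high"), ([6, 5], "high")]
print(knn_predict(phones, [5, 5], k=3))
```

In practice the features would be normalised first, since `math.dist` weights all dimensions equally.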

012062
The following article is Open access

and

Existing research treats all sentences in a text equally during training and does not consider that key sentences tend to have a stronger influence. We propose a Convolutional Neural Network text sentiment classification model based on key-sentence enhancement. The proposed model can identify the key sentences in a text and generate the text representation from them, reducing noise and improving the accuracy of sentiment classification. The experimental results show that the proposed model improves the accuracy of text sentiment classification compared with other classic text sentiment classification models.

012063
The following article is Open access

, , and

RNNs have been widely investigated for automatic abstractive summarization and have achieved good results in previous studies. However, when processing and storing information, the RNN structure may lose long-term information, making it unable to generate high-quality summaries that contain comprehensive information about the corresponding documents. In this paper, in order to overcome this problem and enhance the global information, we propose a memory-enhanced abstractive summarization (MEAS) model consisting of a memory enhancement module and a Seq2Seq module. Our model can capture and store global information about the entire document, such as the relationships between sentences, providing a richer information representation to the Seq2Seq module to generate higher-quality summaries. Our experimental results on the CNN/DailyMail corpus indicate that the MEAS model achieves improvements of up to 1.17, 0.27 and 0.85 on the R-1, R-2 and R-L scores, respectively, compared with the related state-of-the-art baseline.

012064
The following article is Open access

and

Text sentiment analysis is an essential part of natural language processing, and the task of sentiment classification is essentially a process of feature extraction through models. Commodity review text differs greatly from ordinary text: it has no fixed grammar or writing format, and the sentiment feature information is scattered throughout the text. These factors make model learning for sentiment classification increasingly complex. This paper establishes a fine-grained feature extraction model based on BiGRU and attention. Firstly, the vocabulary is vectorized by means of the skip-gram model. Then, based on the pre-trained word vectors, a sentiment word list is obtained and noise filtering is conducted with the Naive Bayes algorithm. Finally, the model extracts features using BiGRU and fine-grained attention. Based on the hypothesis that a long review may lead to feature differentiation, a fine-grained attention model is proposed, in which the attention layers are designed to focus on features at different levels, such as the word, sentence and paragraph levels. This paper validates the proposed model on two sentiment corpora, JD reviews and IMDB. Empirical results show that the FGAtten-BiGRU model achieves state-of-the-art results on sentiment analysis tasks.

012065
The following article is Open access

and

Text classification enables developers to track consumers' reactions to e-commerce products. Such information, often expressed in the form of raw emotions, can be used to measure consumers' emotions and their emotional preferences for commodities, so as to help future consumers make choices. However, capturing and interpreting the human emotions expressed in product reviews is not a trivial task; the challenges stem from the integration approach and from finding the optimal feature combination for different classifiers. In this paper, we present an ensemble framework for text classification that can find the most adaptive feature sets for each classifier. An effective method for the CRF model to process medium and long text is proposed; this method significantly improves the CRF model's ability to handle text whose length exceeds ten. As base learning algorithms, three classifiers are integrated to improve classification performance: Support Vector Machine (SVM), Conditional Random Field (CRF) and Naive Bayes Multinomial (NBM). The experiments prove the effectiveness of our proposed method.

012066
The following article is Open access

, , , and

Key management in wireless sensor networks (WSNs) is the basic service for deploying security policies. In this paper, we combine the q-composite scheme with the polynomial scheme to propose a hybrid key pre-distribution scheme for WSNs. The characteristic of the scheme is that a partial polynomial is preloaded on each sensor node, and the polynomial share stored by each node is then used to generate a corresponding key pool. A hash operation is performed on the generated keys to hide part of each node's key information, and a corresponding number of the processed keys is then selected and distributed to each node. The scheme introduces a random key method, which avoids the "t-secure" problem faced by the conventional polynomial scheme and significantly improves network security. Because the pairwise keys are generated from the polynomial share preloaded on the sensor node, the scheme achieves higher connectivity than the polynomial scheme. Theoretical and simulation results show that the proposed scheme not only enhances secure network connectivity but also improves the nodes' resistance to capture attacks compared with other schemes.
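The polynomial-share idea underlying such schemes can be sketched with a symmetric bivariate polynomial: each node stores f(id, y), and two nodes derive the same pairwise key because f(a, b) = f(b, a). The field size, coefficients, and hash choice below are illustrative; the q-composite combination and key-pool selection of the proposed scheme are not reproduced:

```python
import hashlib

P = 2**61 - 1  # a Mersenne prime; polynomial arithmetic is done mod P

# Symmetric bivariate polynomial f(x, y) = sum a[i][j] x^i y^j
# with a[i][j] == a[j][i]; degree t = 1 here purely for illustration.
COEFFS = [[3, 7],
          [7, 5]]

def poly_share(node_id):
    """Share g(y) = f(node_id, y), preloaded on a node, as coefficients of y."""
    return [sum(COEFFS[i][j] * pow(node_id, i, P)
                for i in range(len(COEFFS))) % P
            for j in range(len(COEFFS[0]))]

def pairwise_key(my_id, peer_id):
    """Evaluate own share at the peer's id, then hash the result so the
    raw polynomial value is never used directly as key material."""
    share = poly_share(my_id)
    raw = sum(c * pow(peer_id, j, P) for j, c in enumerate(share)) % P
    return hashlib.sha256(raw.to_bytes(8, "big")).hexdigest()

# Symmetry: both endpoints of a link derive the same key.
print(pairwise_key(11, 42) == pairwise_key(42, 11))
```

The "t-secure" weakness mentioned in the abstract is visible here: capturing t+1 = 2 shares of this degree-1 polynomial would reconstruct all of `COEFFS`, which is why the scheme hybridises it with random key pre-distribution.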

012067
The following article is Open access

, and

For a network, two key evaluation indicators are the network lifetime and the network throughput, and different clustering algorithms improve different aspects. A reasonable clustering algorithm can effectively improve the load balancing of the network and make the distribution of cluster heads (CHs) more uniform. Aiming at the problem of biased cluster-head selection, this article presents an improved algorithm based on the classical LEACH (Low Energy Adaptive Clustering Hierarchy) algorithm, which takes the residual energy of the nodes and the uniformity of the node distribution as the primary considerations for selecting the CH.
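The classical LEACH election that such improvements start from can be sketched as a per-round threshold test; scaling the threshold by residual energy is one common way to bias election toward energy-rich nodes. The uniformity-of-distribution factor mentioned in the abstract is omitted here, and the exact scaling is an assumption, not this paper's formula:

```python
import random

def leach_threshold(p, round_no, residual_energy, initial_energy):
    """Classical LEACH threshold T(n), scaled by the node's remaining
    energy ratio so that nodes with more energy are likelier CHs.

    p: desired fraction of cluster heads per round.
    """
    base = p / (1 - p * (round_no % int(1 / p)))
    return base * (residual_energy / initial_energy)

def elect_cluster_head(p, round_no, residual_energy, initial_energy,
                       rng=random.random):
    """A node becomes cluster head when its random draw falls below T(n)."""
    return rng() < leach_threshold(p, round_no, residual_energy, initial_energy)
```

With p = 0.1, a fully charged node in round 0 has a 10% chance of election, while a depleted node has none, which is the load-balancing effect the abstract targets.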

012068
The following article is Open access

, and

A multi-component infrared gas sensor detection system is designed based on the principle that the gas under test absorbs infrared radiation of specific wavelengths. Periodic infrared radiation signals pass through the chamber and reach multiple sets of filters, and the infrared-sensitive component extracts an electrical signal, with the same period as the light source, which is then filtered and amplified. The signal is acquired repeatedly over several periods, which improves the reliability of signal acquisition, enhances the anti-jamming capability and reduces the response time of the sensor.

012069
The following article is Open access

, and

With the development of the virtual reality industry, the performance demands on virtual simulation are increasing. In the field of virtual simulation, fluid simulation has become one of the most challenging research directions in computer graphics, and high-performance fluid simulation methods remain a problem worth studying. This paper presents a method based on the interaction of a grid and a mass-spring model for simulating liquid in a container. Firstly, the mass-spring model is introduced to simulate the water surface: a flat grid is created, physical properties are added to the points of the triangular grid, and the grid is used to connect the water particles. Secondly, the current height of each particle is changed by calculating the particle density on the liquid surface to simulate the fluctuations of the water. Finally, the calculation method is implemented in Unity, showing that the method offers both performance and realism and outperforms previous methods.
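The grid-based surface update described above is in the spirit of a height-field simulation, which can be sketched in one dimension as follows. The constants are illustrative and this is not the paper's Unity implementation:

```python
def step_heightfield(heights, velocities, c=0.1, damping=0.99):
    """One explicit update of a 1-D height-field water surface.

    Each column is pulled toward the average height of its neighbours
    (a discrete spring force), with damping so waves die out over time.
    Returns the new (heights, velocities) lists.
    """
    n = len(heights)
    new_v = []
    for i in range(n):
        left = heights[i - 1] if i > 0 else heights[i]
        right = heights[i + 1] if i < n - 1 else heights[i]
        force = left + right - 2 * heights[i]
        new_v.append((velocities[i] + c * force) * damping)
    new_h = [heights[i] + new_v[i] for i in range(n)]
    return new_h, new_v

# A single bump spreads outward on the next step:
h, v = step_heightfield([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
print(h)
```

The same update generalises to a 2-D grid (four neighbours instead of two), which is how a water surface mesh would be animated in an engine like Unity.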

012070
The following article is Open access

, , and

With the development of computer vision, wearable computing technologies have not only changed our lifestyle but also provided much convenience for vulnerable road users, especially Visually Impaired (VI) pedestrians. VI people have difficulty locating themselves and socializing due to the limitations of traditional assistive tools, e.g. the inability to recognize text. Text plays a significant role in many settings and can convey abundant semantic information about a scene. In recent years, text detection and recognition have made huge progress, making it possible for VI people to understand their surroundings using scene text information. In this paper, a text recognition system is proposed to help VI people perceive store-sign text. Firstly, we locate the text on the sign with the aim of leading VI pedestrians to the destination store. To this end, an object detection network is integrated into the system to extract Regions of Interest (ROI) in complex real-world scenarios. In order to provide real-time assistance, an efficient detection network, the Single Shot MultiBox Detector (SSD), has been made light-weight and embedded in the wearable system. Secondly, we leverage an open-source optical character recognition (OCR) instrument to recognize the detected text. Afterwards, we introduce the collected dataset and critical training tips for the task. Finally, a comprehensive set of experiments on our dataset demonstrates that our approach significantly improves precision and makes recognition robust even in real-world settings. Based on our approach, the wearable system can feed back the recognized text in real time and assist VI people during independent navigation.

012071
The following article is Open access

, and

Connection topography mapping is crucial for understanding how information is processed in the brain and is an essential precursor for revealing principles of brain organization. However, existing connectopic mapping methods depend on prior knowledge or are not completely data-driven; accordingly, the connection topographies they construct are biased towards hypotheses or deviate from the data. To address these challenges, we propose a novel co-clustering based method for connection topography mapping in a fully data-driven manner. The proposed method constructs the connection topography between two ROIs of a given neural circuit by leveraging the power of co-clustering: it parcellates one ROI into subregions and simultaneously identifies their respective connected subregions in the other ROI. The effectiveness of our method was validated on the mapping of the human thalamocortical system for 57 subjects based on their resting-state fMRI data. The validation experiments demonstrate that our method constructs neurobiologically meaningful thalamocortical connection topography and, compared with existing methods, yields more meaningful and interpretable results.

012072
The following article is Open access

, and

This paper describes the design and implementation of a novel augmented reality application, referred to as FunPianoAR, which aims to stimulate users' interest and improve the experience of piano learning. FunPianoAR has a user-friendly interface and supports paired play to further reduce the difficulty of playing the piano for adult novices: one user plays the melody and the other plays the harmony part. The app is developed using Android Studio 3.0 and artoolkitX 1.0, an open-source augmented reality SDK, and installed on AR smart glasses, the Epson Moverio BT300. Because a real piano is largely textureless and registration precision matters, the application uses fiducial marker tracking instead of markerless recognition. In addition, we divide the piano keyboard into four zones, each tracked with a separate marker, to mitigate the limited field of view (FOV) to some extent; with this multi-marker tracking, the virtual keys can be accurately superimposed on the piano keys. An evaluation was conducted to compare the effects of two types of superimposed augmented reality information: the instant mode, without hints for the next note to be played, and FunPianoAR, with hints for the coming notes. The correctness rate was calculated, and the time differences between the notes that the two players should play simultaneously were collected. Based on the evaluation results, FunPianoAR shows clear advantages over the instant mode.

012073
The following article is Open access

and

When the most likely state sequence (MLSS) criterion is used in a Gaussian mixture model-hidden Markov model (GMM-HMM) to acquire the best state sequence of the observations, only the maximum-likelihood state of each speech frame is considered. The influence of the other states is therefore neglected, which leads to the loss of some important information and further reduces the recognition rate of the system. In this paper, we propose two new features, the state likelihood cluster feature (SLCF) and the supervised state feature (SSF), which both reflect acoustic features and fuse state information. Combining SLCF and SSF with the Mel frequency cepstrum coefficients (MFCC) yields the Mel frequency cepstrum & state likelihood cluster feature (MSLCF) and the Mel frequency cepstrum & supervised state feature (MSSF), respectively. With the proposed MSLCF and MSSF, in Chinese speech recognition experiments the relative error rate of the isolated word recognition system declines by 6.10% and 9.66%, respectively, and the relative error rate of the continuous speech recognition system declines by 2.53% and 11.05%, respectively.

012074
The following article is Open access

and

The electrocardiogram (ECG) is non-invasive, inexpensive and widely used in many applications to detect the physical condition and diseases of the human body. Atrial fibrillation (AF) is the most common of the many different forms of sustained arrhythmia. Early diagnosis of AF may therefore help improve doctors' diagnostic efficiency and is essential to prevent the further progression of AF to other heart diseases and stroke complications. With the popularity of machine learning and deep learning, more and more researchers apply them to image recognition, speech recognition and so on. Naturally, there are also many studies that use machine learning or deep learning to diagnose diseases, for example arrhythmia detection and biometric identification based on ECG signals. In this study, a novel approach to detecting AF from ECG signals was developed: we used the EEMD (Ensemble Empirical Mode Decomposition) filter and the XGBoost (eXtreme Gradient Boosting) classifier to distinguish normal rhythm, AF and other rhythms. Good performance was achieved, with an average F1 score of 0.84 and an accuracy of 0.86.
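Since EEMD and XGBoost require external libraries, only a simpler upstream step is sketched here: rhythm-irregularity features computed from RR intervals, the kind of input such a classifier could consume. The feature names are hypothetical, not the paper's feature set (AF typically shows a highly irregular RR sequence, hence a high RMSSD):

```python
import math

def rr_features(rr_intervals):
    """Basic rhythm-irregularity features from RR intervals (in seconds).

    mean_rr: average beat-to-beat interval.
    rmssd:   root mean square of successive differences; irregular
             rhythms such as AF tend to produce high values.
    """
    n = len(rr_intervals)
    mean_rr = sum(rr_intervals) / n
    diffs = [rr_intervals[i + 1] - rr_intervals[i] for i in range(n - 1)]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "rmssd": rmssd}

print(rr_features([0.6, 1.2, 0.7, 1.1]))  # irregular rhythm, high rmssd
```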

012075
The following article is Open access

, , , and

In this paper, the driver seatbelt detection algorithm is ported to the XILINX PYNQ embedded platform to meet the technical requirements of automatically identifying whether a driver is wearing a seatbelt. Aiming at the characteristics of the FPGA hardware, the seatbelt detection algorithm is realized with an open-source XILINX IP core. Vehicle detection is realized with a YOLO object detection algorithm using 3-bit weights and 1-bit activations, and driver seatbelt classification is realized with a binary model using 1-bit weights and 1-bit activations. On the XILINX PYNQ embedded hardware, the algorithm is accelerated by calling the FPGA hardware from the ARM side.

012076
The following article is Open access

, , , and

Time delay neural networks (TDNNs) have been shown to be an efficient network architecture for modelling long temporal contexts in speech recognition. Meanwhile, the training time of TDNNs is much shorter than that of other long-temporal-context models based on recurrent neural networks. In this paper, we propose deeper architectures to improve the modelling power of TDNNs. At each TDNN layer that takes spliced input, we increase the number of transforms so that the lower layers can provide more salient features to the upper layers. Dropout is found to be an effective way to prevent the model from overfitting once the depth of the model is substantially increased. The proposed architectures significantly improve the recognition accuracy on Switchboard and AMI.

012077
The following article is Open access

, , , , and

In deep neural networks, the gate mechanism is a very effective tool for controlling the information flow. For example, the gates of Long Short-Term Memory (LSTM) networks help alleviate the gradient vanishing problem; in addition, these gates preserve useful information. We believe that a system benefits if it learns to explicitly focus on the relevant dimensions of the input. In this paper, we propose Gated Time Delay Neural Networks (Gated TDNN) for speech recognition. Time-delay layers are utilized to model the long temporal context correlations of the speech signal, while the gate mechanism enables the model to discover the relevant dimensions of the input. Our experimental results on the Switchboard and Librispeech data sets demonstrate the effectiveness of the proposed method.

012078
The following article is Open access

, , and

In Automatic Speech Recognition (ASR), the Time Delay Neural Network (TDNN) has been proven to be an efficient network structure for its strong context modeling ability. In addition, as a feed-forward architecture, a TDNN is faster to train than recurrent neural networks such as Long Short-Term Memory (LSTM). However, unlike in recurrent neural networks, the context in a TDNN is carefully designed and limited. Although stacking LSTM layers together with a TDNN to extend the context information has been proven useful, the resulting model is complex and hard to train. In this paper, we focus on directly extending the context modeling capability of TDNNs by adding recurrent connections, and we investigate several new network architectures. The results on Switchboard show that the best model significantly outperforms the baseline TDNN system and is comparable with the TDNN-LSTM architecture, while the training process is much simpler than that of TDNN-LSTM.

012079
The following article is Open access

, , and

Automatic speaker verification (ASV) systems are still vulnerable to different kinds of spoofing attacks, especially replay attacks mounted with high-quality playback devices. Many countermeasures have been developed recently, with most of the effort focused on the search for more salient features, and many new features have been proposed. In this paper, five kinds of features, namely Mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), inverted Mel-frequency cepstral coefficients (IMFCCs), constant Q cepstral coefficients (CQCCs) and bottleneck features, are compared on the public ASVspoof 2017 and BTAS 2016 datasets. Our experimental results show that MFCCs and bottleneck features yield comparable results, and both significantly outperform the others (including the recently proposed CQCCs). However, the numbers of filters and cepstral bins are essential to the success of MFCCs.

012080
The following article is Open access

Some deafblind persons use Finger Braille, a communication medium based on the tactile sense, to communicate with each other. Deafblind persons who are trained in Finger Braille can communicate words and also convey varied emotions through it. In this work, emojis and emoticons were applied to emotion teaching interfaces in order to develop a Finger Braille emotion teaching system. Two novel emojis were selected for each of joy, anger and sadness, and the best-depicted emoticon was also selected for each of these emotions. We then designed six emotion teaching interfaces with the emojis and three with the emoticons, and conducted an evaluation experiment assessing which teaching interfaces are most associable with joy, anger and sadness. As a result, the emojis Anger D and Sadness D were associated best with anger and sadness, respectively, while the emoticon Joy 1 was associated best with joy.

012081
The following article is Open access

and

Reasoning with inconsistent ontologies plays an important role in Semantic Web applications. An important feature of argumentation theory is that it can naturally handle inconsistencies in ontologies and allows a user to represent information in the form of arguments. In argumentation, given an inconsistent knowledge base, arguments compete to decide which consequences are accepted. In this paper, we are interested in using argumentation to compute the inconsistency degree of uncertain knowledge bases expressed in possibilistic DL-Lite (the key notion in reasoning with a possibilistic DL-Lite knowledge base) without going through the negative closure. In the present work, the terminological base is assumed to be fully certain and uncertainty is only considered on the assertional base. We prove that it is coherent and feasible to use Dung's abstract argumentation theory to compute the inconsistency degree, and we show how argumentation semantics relate to the state of the art in handling inconsistency.