Journal of Physics: Conference Series, Volume 2294, 2022

011001

The following article is Open access

The 2022 5th International Symposium on Big Data and Applied Statistics (ISBDAS 2022), was planned to be held on April 22-24, 2022 in Xining, China. Due to the impact of COVID-19, many communities from all over the world were under strict health measures and strict travel restrictions, and participants of the 2022 5th International Symposium on Big Data and Applied Statistics (ISBDAS 2022) also meet with the difficulty of travel restrictions. Considering the situation that most of the authors would like to publish their articles and make academic communications as scheduled, ISBDAS 2022 was held on-line instead of postponing the conference. It was a challenge, not only to the organizers, but also to the participants who were delivering their speeches using videoconference tools, under different time zones.

ISBDAS 2022 is to bring together innovative academics and industrial experts in the field of Big Data and Applied Statistics to a common forum. The primary goal of the conference is to promote research and developmental activities in Big Data and Applied Statistics and another goal is to promote scientific information interchange between researchers, developers, engineers, students, and practitioners working all around the world. The conference will be held every year to make it an ideal platform for people to share views and experiences in Big Data and Applied Statistics and related areas.

The conference brings together about 170 leading researchers, engineers and scientists in the domain of interest from China, Singapore, India, Slovakia and UK. The conference model was divided into two parts, including keynote presentations and online discussion. In the first part, each keynote speakers were allocated 30 minutes to present their talks via Zoom. After the keynote talks, all participants joined in a WeChat communication group to discuss more about the talks and presentations.

List of Committee member, Organizing Committees are available in this pdf.

https://doi.org/10.1088/1742-6596/2294/1/011001

011002

The following article is Open access

Peer Review Statement

View article, Peer Review Statement PDF, Peer Review Statement

All papers published in this volume have been reviewed through processes administered by the Editors. Reviews were conducted by expert referees to the professional and scientific standards expected of a proceedings journal published by IOP Publishing.

• Type of peer review: Single Anonymous

• Conference submission management system: Morressier

• Number of submissions received: 79

• Number of submissions sent for review: 75

• Number of submissions accepted: 40

• Acceptance Rate (Submissions Accepted / Submissions Received × 100): 50.6

• Average number of reviews per paper: 2

• Total number of reviewers involved: 12

• Contact person for queries:

Name: Xuexia Ye

Email: xx.ye@keoaeic.org

Affiliation: AEIC Academic Exchange Information Centre

https://doi.org/10.1088/1742-6596/2294/1/011002

012001

The following article is Open access

A Group Learning based Optimization Algorithm Applied to UWB Positioning

XiuWei Zhang, WenYi Zheng and Yong chen

View article, A Group Learning based Optimization Algorithm Applied to UWB Positioning PDF, A Group Learning based Optimization Algorithm Applied to UWB Positioning

In dealing with optimization problems, metaheuristic algorithms have attracted much attention due to their simple structure and flexible characteristics. Inspired by the principle of teaching students in accordance with their aptitude, this paper proposed a novel metaheuristic algorithm——Group Learning based Optimization (GLBO) Algorithm, this algorithm is suitable for continuous optimization problems. The main idea of this method is to divide a class into three study groups according to their scores, and formulate different study strategies for them according to the characteristics of each group, so as to improve the scores of the whole class. To verify the performance of the algorithm, it is tested on the CEC21 Benchmark suit and applied to UWB positioning. The results show that the proposed method has excellent performance when dealing with continuous optimization problems.

https://doi.org/10.1088/1742-6596/2294/1/012001

012002

The following article is Open access

Research on graph classification based on improved metric learning

Miaomiao Dai, Xiaohui Zhang and Wenxia Chen

View article, Research on graph classification based on improved metric learning PDF, Research on graph classification based on improved metric learning

Few-shot graph classification is to learn valuable information from limited labelled graph data for graph classification. However, although the traditional graph neural networks (GNNs) have been widely used in graph classification tasks, the problem of insufficient graph sample of many domains is ignored. At present, using meta learning method can greatly solve the problem of low data in many fields. However, when facing the new few-shot graph classification task, there are still fundamental difficulties in using the existing methods. In this paper, a novel loss function compared with prototype networks is proposed which takes both inter-class and intra-class distance into account, is applied to solve the problem of graph classification under the condition of few samples. In addition, combined with the existing idea of considering the structural features of graphs, using this idea can extract a more accurate representation of graph sample features, so as to improve the classification accuracy of the model. Extensive experiments on two data sets Chembl and TRIANGLES verify the effectiveness of our proposed method.

https://doi.org/10.1088/1742-6596/2294/1/012002

012003

The following article is Open access

Technology Strategy for Meticulous Governance of Smart Cities: Memory Bus Width Aware Pruning for Efficient Image Super-Resolution Implementation in FPGA

Chuo Zhang

View article, Technology Strategy for Meticulous Governance of Smart Cities: Memory Bus Width Aware Pruning for Efficient Image Super-Resolution Implementation in FPGA PDF, Technology Strategy for Meticulous Governance of Smart Cities: Memory Bus Width Aware Pruning for Efficient Image Super-Resolution Implementation in FPGA

The mega-city governance is based on the aggregation, collation, development and application of multi-scene data, and efficient access to scene data is a key link to promote the meticulous governance of smart cities. Super-resolution technique is the process of upscaling and improving the details within an image. In this paper, we implement a 16-layer residual neural network (ResNet) for the efficient image super-resolution in FPGA. We discover that the memory access is the major performance bottleneck of this implementation. To reduce the memory access overhead, we design a pruning algorithm with the consideration of the memory bus width. Since the memory used in our design generates 256 bits for each access, the proposed pruning algorithm drops the kernel by aligning this bit width. That is, all kernels in one layer are ranked by its L1-norm and we drop the kernels out of the 256 bits. The experimental results show that the proposed method reduces the number of weights by 50% compared with the baseline. As a result, the inference speed can be enhanced by 3 times.

https://doi.org/10.1088/1742-6596/2294/1/012003

012004

The following article is Open access

Research on univariate anomaly diagnosis of gas pipeline measurement data based on Random Forest algorithm

Qing Quan, Dan Li and Shouxi Wang

View article, Research on univariate anomaly diagnosis of gas pipeline measurement data based on Random Forest algorithm PDF, Research on univariate anomaly diagnosis of gas pipeline measurement data based on Random Forest algorithm

The basis of univariate anomaly diagnosis of pipeline system is that univariate time series can reflect fault information of the system. Random Forest belongs to random sampling with put back, which can efficiently process high-dimensional data. This paper firstly combines measurement data with control graph theory, obtains sample data of different modes by Monte Carlo method. Then, Gini Impurity values of random forest algorithm are introduced to sort and optimize the feature of sample data. Finally, the Random Forest is used to realize the high precision intelligent recognition of different modes of the measurement data control chart, and the univariate anomaly diagnosis of the gas pipeline measurement data is realized combining with the fault knowledge base.

https://doi.org/10.1088/1742-6596/2294/1/012004

012005

The following article is Open access

A New Black Widow Algorithm for Discontinuous Optimization in Cloud Task Environment

Die You, WenYi Zheng and Yong Chen

View article, A New Black Widow Algorithm for Discontinuous Optimization in Cloud Task Environment PDF, A New Black Widow Algorithm for Discontinuous Optimization in Cloud Task Environment

Task scheduling in cloud environment has received extensive attention due to its complex characteristics, and most of the previous methods ignore the discontinuity problem of workflow scheduling. Therefore, a new black widow optimization algorithm (NBWO) is proposed. NBWO proposed a new reproductive strategy to select excellent male individuals and female individuals to obtain better offspring. In NBWO, the number of female individuals is fixed, and males obtain mating rights through competition; NBWO improves the mutation strategy, which makes the exploration ability and development ability of the algorithm more balanced. Applying NBWO to cloud computing task scheduling, the experimental results show that NBWO has excellent performance in dealing with discontinuous optimization problems.

https://doi.org/10.1088/1742-6596/2294/1/012005

012006

The following article is Open access

Development and application of data mining method for synthetic aperture radar image ship inspection based on big data application technology

Ruili Chang

View article, Development and application of data mining method for synthetic aperture radar image ship inspection based on big data application technology PDF, Development and application of data mining method for synthetic aperture radar image ship inspection based on big data application technology

Synthetic aperture radar belongs to the radar signal working in microwave band, which has the characteristics of strong penetrating performance, large area imaging, all-weather and so on, and is widely used in the civil field and military field. Especially in the background of the information era, synthetic aperture radar imaging technology has developed to different degrees, and the resolution of synthetic aperture radar images has been significantly improved, which is highly valued by the target detection field, and the detection of naval targets through the reasonable use of synthetic aperture radar images has become an important application direction in the field of marine remote sensing. Based on this, this paper analyzes the characteristics of ship targets in SAR images, analyzes the differences between them and optical images from different aspects, and then concentrates on the most common statistical models and prediction methods of clutter analysis in SAR images, proposes the most reasonable way of clutter distribution simulation, and then uses the experimental way to accurately evaluate the complex sea clutter in SAR images, so as to propose the similar characteristics of the ship detection method‥

https://doi.org/10.1088/1742-6596/2294/1/012006

012007

The following article is Open access

Key Technologies of Media Big Data in-Depth Analysis System Based on 5G Platform

Qiang Lin and Xilin Zhang

View article, Key Technologies of Media Big Data in-Depth Analysis System Based on 5G Platform PDF, Key Technologies of Media Big Data in-Depth Analysis System Based on 5G Platform

To meet the needs of large-scale users for personalized streaming media services with high speed, low delay, and high quality in a 5G mobile network environment, this paper studies the resource allocation mechanism of streaming media based on a 5G network from the perspective of user demand prediction, which can alleviate the pressure of mobile network, improve the utilization rate of streaming media resources and the quality of user service experience. The augmented reality visualization of large-scale social media data must rely on the computing power of distributed clusters. This paper constructs a distributed parallel processing framework in a high-performance cluster environment, which adopts a loosely coupled organizational structure. Each module can be combined, called, and expanded arbitrarily under the condition of following a unified interface. In this paper, the algebraic method of parallel computing algorithm is innovatively proposed to describe parallel processing tasks and organize and call large-scale data-parallel processing operators, which effectively supports the business requirements of large-scale parallel processing of large-scale spatial social media data and solves the bottleneck of large-scale spatial social media data-parallel processing.

https://doi.org/10.1088/1742-6596/2294/1/012007

012008

The following article is Open access

Design of low-power CIC decimation filter based on nibble serial arithmetic

Min Gao, Zongguang Yu, Shuqin Wan and Jie Shao

View article, Design of low-power CIC decimation filter based on nibble serial arithmetic PDF, Design of low-power CIC decimation filter based on nibble serial arithmetic

Aiming at the problem of large area and high power consumption of traditional CIC (Cascade Integral Comb Filter) decimation filter, a folded multistage cascade CIC decimation filter is designed. The nibble-serial arithmetic and multiplexing technology are used to reduce the operation and storage logic unit, lower the cost and improve the utilization efficiency of system resources. The folded multistage cascade CIC decimation filter has a programmable structure of down sampling factor and can dynamically realize 1∼16 times extraction. It is suitable for communication systems with multiple channel bandwidths. In addition, the designed truncation module can optimize the effective accuracy of the final stage output.

https://doi.org/10.1088/1742-6596/2294/1/012008

012009

The following article is Open access

Research on Monitoring Data Security Sharing Method for Hydropower Station Operation and Maintenance

Yuanjiang Ma, Liang Hong, Taisong Qin, Xia Hao and Taiquan Tan

View article, Research on Monitoring Data Security Sharing Method for Hydropower Station Operation and Maintenance PDF, Research on Monitoring Data Security Sharing Method for Hydropower Station Operation and Maintenance

The computer monitoring system centrally and uniformly manages the hydropower stations in the basin, which has high operation efficiency. The current data security sharing methods can not solve the multi-party trust problems such as data ownership and data security. In this regard, a monitoring data security sharing method for hydropower station operation and maintenance is proposed. The monitoring data of hydropower station comes from multiple independent operation systems developed by different suppliers according to different application requirements. The multi-source data are integrated and processed to meet the interactive needs of users. User permission has the characteristics of dynamic change, which can identify the permission attributes. Create a trusted container based on the user ID, locate the monitoring data, and trace the source in data sharing and storage. The access policy is formulated by the data user and bound with the private key, while the data is given an attribute set during encryption. Each data user has a corresponding access policy and a private key corresponding to the access policy, so as to realize the security sharing of monitoring data. The test results show that the design method in this paper can effectively reduce the communication overhead and storage overhead, and has good computing performance.

https://doi.org/10.1088/1742-6596/2294/1/012009

012010

The following article is Open access

Power Grid Multi-protocol Network Topology Based on Data Exchange Strategy

Kejun Qian, Yafei Li, Yi Liu and Zhong Zheng

View article, Power Grid Multi-protocol Network Topology Based on Data Exchange Strategy PDF, Power Grid Multi-protocol Network Topology Based on Data Exchange Strategy

With the combination of Internet of Things technology and smart grid, there are more and more application scenarios of Internet of Things in power system, which makes the whole power system more informationized and intelligent. In this paper, an intelligent data switching technology for power communication network based on multi-protocol label is proposed. The link structure model of mobile core network is constructed, and the TDMA protocol of the multi-protocol intelligent data switching node is designed by using route detection method. The dynamic routing decision and node rotation scheduling of the mobile core network are realized, and the intelligent data exchange performance is improved. Finally, the simulation experiment of the intelligent data exchange protocol is carried out, and the comparative experiment between the proposed method and other methods is carried out, which shows the superior performance of the proposed method in improving the intelligent data exchange capability of the mobile core network.

https://doi.org/10.1088/1742-6596/2294/1/012010

012011

The following article is Open access

SOLOv2-based multi-view contactless bovine body size measurement

Biao Ai and Qi Li

View article, SOLOv2-based multi-view contactless bovine body size measurement PDF, SOLOv2-based multi-view contactless bovine body size measurement

To address the problems that traditional bovine body measurement methods require a lot of manual assistance and lead to stress reactions in cattle, this paper achieves contactless measurement of bovine body length, withers height, chest breath, belly breath and chest depth by using a deep learning approach. This paper use SOLOv2 instance segmentation to identify cattle and extract cattle contours from the top and side views, combines cattle image dataset and OpenCV image processing function to extract cattle feature parts, and use discrete curvature calculation method to extract cattle body size to calculate feature points, and calculate cattle body size parameters by Euclidean distance calculation method. Experiments were conducted using custom model cattle to which bovine body size measurements were taken, after comparing with the manual measurement results, the average relative errors of body length, body height, chest depth, chest breath and belly breath of the model cattle were 1.36%, 0.44%, 2.05%, 2.80% and 1.47%, respectively. The experiment proved that this measurement method performed well in the non-contact measurement of bovine body size and had good accuracy, which provided a new way and method for the measurement of the non-stress response of cattle.

https://doi.org/10.1088/1742-6596/2294/1/012011

012012

The following article is Open access

Lightweight hybrid signature scheme for Internet of Thing based on bilinear mapping

Gangpeng Duan

View article, Lightweight hybrid signature scheme for Internet of Thing based on bilinear mapping PDF, Lightweight hybrid signature scheme for Internet of Thing based on bilinear mapping

Most of the existing IoT communication encryption schemes have the following two problems: the sensor side needs to perform complex bilinear mapping calculations; most schemes separate key agreement and data encryption, which increases the user's computational burden and management difficulty. This paper proposes a lightweight IoT hybrid signature scheme based on bilinear mapping. In terms of transmission efficiency, the calculation process of the bilinear map is transferred to the initialization phase of the system. The sensor side only needs low-cost operations such as hash mapping and exponential operation, which reduces the overall computing cost of the solution. In terms of security, the mathematical difficulty caused by the bilinear mapping calculation in the initialization phase is used to ensure the security of data transmission. In solving the problem of key management, the scheme uses the semi-trusted key generation center (KGC) and sensor ID to generate user session keys and data keys, which solves the public key authentication and key escrow problems of massive sensors in the Internet of Things.

https://doi.org/10.1088/1742-6596/2294/1/012012

012013

The following article is Open access

Research on 3D face measurement for facial virtual plastic surgery

Liwei Su, Guiwen Li, Zhenyao Liu and Xiaoqiang Ji

View article, Research on 3D face measurement for facial virtual plastic surgery PDF, Research on 3D face measurement for facial virtual plastic surgery

Calculating the key data of human face through three-dimensional face model can help doctors predict the surgical results before plastic surgery, guide the formulation of surgical plan and reduce the surgical risk. In this paper, 3D face measurement technology is studied, and the area of plastic area is calculated by slice method. A large number of experimental tests show that the average errors of facial geometric feature measurement, area measurement are 0.458mm and 0.65mm² respectively. The rating of simulation results provided by plastic surgeons shows that facial measurement technology can effectively improve the success rate of plastic surgery. Compared with the original face, the edited 3D face is more attractive in respecting the following beauty standards: "three courts and five eyes", golden section, etc.

https://doi.org/10.1088/1742-6596/2294/1/012013

012014

The following article is Open access

Python-based film review data acquisition and visualization design

Yong Yang, Ying Xin Liu, Yu Xi Zhang and Na Zhang

View article, Python-based film review data acquisition and visualization design PDF, Python-based film review data acquisition and visualization design

With the rapid development of the Internet and artificial intelligence era, how to obtain effective information is particularly important in the complex network world. In order to obtain a real movie viewing experience and provide some reference for other users, this article is based on the Python language, taking the popular movie "Shuimen bridge of Changjin Lake" as an example on Douban website, and using web crawler technology to comment and rate users, etc. Relevant data is crawled, the crawled data is the object of analysis, and data visualization processing is carried out to more intuitively show the true feelings of moviegoers. The results show that the film meets the expectations of the public and is worthy of recommendation.

https://doi.org/10.1088/1742-6596/2294/1/012014

012015

The following article is Open access

Research on user granularity-level personalized social text generation technology

Y B Gao, J T Gao, R Ma and L D Yang

View article, Research on user granularity-level personalized social text generation technology PDF, Research on user granularity-level personalized social text generation technology

With the introduction of large-scale pre-trained language models, breakthroughs have been made in text generation technology research. On this basis, in order to assist users to complete personalized creation, this paper proposes a user-level fine-grained control generation model. First, we design the Encoder-Decoder framework based on the GPT2 structure, and model and encode the user's static personalized information on the Encoder side. Then add a bidirectional independent attention module to receive the personalized feature vector on the Decoder side. The attention module in the original GPT2 structure captures the dynamic personalized features in the user text, namely writing style, expression way, etc. Next, the scores of each attention module are weighted and fused to participate in the subsequent decoding to automatically generate social texts constrained by the user's personalized feature attributes. However, the semantic sparsity of the user's basic information will cause occasional conflicts between the generated text and some personalized features. Therefore, we use the Alignment module to perform the secondary enhancement and generation of consistent understanding between the output data of the Decoder and the user's personalized features, and finally realize the personalized social text generation. Experiments show that compared with the GPT2 baseline model, the fluency of the model is improved by 0.3%-0.6%, and on the basis of no loss of language fluency, the social text generated by the model can have significant user personalization characteristics, among which personalization and consistency the two evaluation indicators of sexuality both increased significantly by 8.4% and 9%.

https://doi.org/10.1088/1742-6596/2294/1/012015

012016

The following article is Open access

NL2SQL Generation with Noise Labels based on Multi-task Learning

Lingli Long, Yongjin Zhu, Jun Shao, Zheng Kong, Jian Li, Yanzheng Xiang and Xu Zhang

View article, NL2SQL Generation with Noise Labels based on Multi-task Learning PDF, NL2SQL Generation with Noise Labels based on Multi-task Learning

With the rapid development of artificial intelligence technology, semantic recognition technology is becoming more and more mature, providing the preconditions for the development of natural language to SQL (NL2SQL) technology. In the latest research on NL2SQL, the use of pre-trained models as feature extractors for natural language and table schema has led to a very significant improvement in the effectiveness of the models. However, the current models do not take into account the degradation of the noisy labels on the overall SQL statement generation. It is crucial to reduce the impact of noisy labels on the overall SQL generation task and to maximize the return of accurate answers. To address this issue, we propose a restrictive constraint-based approach to mitigate the impact of noise-labeled labels on other tasks. In addition, parameter sharing approach is used in noiseless-labeled labels to capture each part's correlations and improve the robustness of the model. In addition, we propose to use Kullback-Leibler divergence to constrain the discrepancy between hard and soft constrained coding of noisy labels. Our model is compared with some recent state-of-the-art methods, and experimental results show a significant improvement over the approach in this paper.

https://doi.org/10.1088/1742-6596/2294/1/012016

012017

The following article is Open access

DADDE:deep anomaly detection with density estimation

Shunran Duan, Longlong Jiao, Meijuan Yin and Lanlan Yü

View article, DADDE:deep anomaly detection with density estimation PDF, DADDE:deep anomaly detection with density estimation

The problem of abnormal nodes detection in attributed network is widely used in daily life, such as social networks, cyberspace security, financial fields and so on. Most existing detection methods ignore the relationship between structure information and attribute information in attributed network. Although some methods consider the relationship between them, but it can't distinguish the types of abnormal nodes well, that is, attribute exceptions or structural exceptions. Aiming at the shortcomings of existing methods, this paper proposes a deep anomaly detection model combining density estimation. The idea of detecting abnormal nodes by reconstruction error is used, the structure information and attribute information of attributed network are reconstructed respectively, and abnormal nodes are detected by density estimation based on reconstruction error and embedding vector of nodes. In this paper, the validity of DADDE model is proved by experiments on three commonly used data sets in this field.

https://doi.org/10.1088/1742-6596/2294/1/012017

012018

The following article is Open access

An approach to error label discrimination based on joint clustering

Z M Tan, J Y Liu, Q Li, D Y Wang and C Y Wang

View article, An approach to error label discrimination based on joint clustering PDF, An approach to error label discrimination based on joint clustering

Inaccurate multi-label learning aims at dealing with multi-label data with wrong labels. Error labels in data sets usually result in cognitive bias for objects. To discriminate and correct wrong labels is a significant issue in multi-label learning. In this paper, a joint discrimination model based on fuzzy C-means (FCM) and possible C-means (PCM) is proposed to find wrong labels in data sets. In this model, the connection between samples and their labels is analyzed based on the assumption of consistence between samples and their labels. Samples and labels are clustered by considering this connection in the joint FCM-PCM clustering model. An inconsistence measure between a sample and its label is established to recognize wrong labels. A series of simulated experiments are comparatively implemented on several real multi-label data sets and experimental results show superior performance of the proposed model in comparison with two state of the art methods of mislabeling correction.

https://doi.org/10.1088/1742-6596/2294/1/012018

012019

The following article is Open access

A Compressed Sensing Reconstruction Algorithm Based on Hyperbolic Function

Hui Wang

View article, A Compressed Sensing Reconstruction Algorithm Based on Hyperbolic Function PDF, A Compressed Sensing Reconstruction Algorithm Based on Hyperbolic Function

The accuracy of the compressed sensing theory reconstruction algorithm is important for signal recovery. Sparsity Adaptive Matching Pursuit (SAMP) has a fixed step size in each iterative process of reconstruction, which has a significant impact on accuracy in actual use, often leading to overestimation and underestimation. In order to solve this problem, combined with the advantages of regular backtracking and variable step size, the article introduces the idea of retrospective in the atom selection stage. Then, the atoms are inspected using backtracking and re-screened. The algorithm uses regularization at the beginning to select atoms with higher energy and reduce the number of atoms in the candidate set at the threshold stage. During reconstruction, because the energy difference of the reconstructed signal decreases rapidly at the initial stage, and then the energy difference decreases slowly. This algorithm takes advantage of the large change rate of the hyperbolic function at the beginning and the slower change rate at the later stage. At first, a large step size is used. When the energy difference reaches a certain threshold, a small step size is used instead. After the actual simulation, the improved algorithm improves the reconstruction accuracy and the effect is better.

https://doi.org/10.1088/1742-6596/2294/1/012019

012020

The following article is Open access

Testing homogeneity of high-dimensional covariance matrices under non-normality

Jieqiong Shen

View article, Testing homogeneity of high-dimensional covariance matrices under non-normality PDF, Testing homogeneity of high-dimensional covariance matrices under non-normality

In this paper, we test the homogeneity of multiple covariance matrices when the dimension may exceed the sample sizes. A test statistic is proposed which does not depend on the normality assumption. Furthermore, the asymptotic distribution of the test statistic is derived. Numerical simulations indicate that the proposed test has accurate significance level, and has a greater improvement in power than some existing tests.

https://doi.org/10.1088/1742-6596/2294/1/012020

012021

The following article is Open access

The research of begin the Apache Giraph

Haowei Shen

View article, The research of begin the Apache Giraph PDF, The research of begin the Apache Giraph

Apache Giraph is an open-source image processing framework built from Apache Hadoop. Apache Giraph is based on bulk synchronous parallel and Google's Pregel. It is suitable for running large-scale logical calculations, such as page ranking, shared links, personalized ranking, etc. Giraph focuses on social graph computing and is at the core of Facebook's open graph tool, handling trillions of connections between users and their behaviors in a few minutes. But because the time is limited, the work just has been done with Hadoop. This paper will be mainly talked about Hadoop and how the difficulty to be solved.

https://doi.org/10.1088/1742-6596/2294/1/012021

012022

The following article is Open access

Solve the Optimal Strategy of "Desert Crossing" with multi-goal programming

Yifan Dai, Xinyi Zhou and Jingyi Mao

View article, Solve the Optimal Strategy of "Desert Crossing" with multi-goal programming PDF, Solve the Optimal Strategy of "Desert Crossing" with multi-goal programming

In this paper, we take a game called "Desert Crossing" into research. With the game strategy of avoiding failure and maximizing the revenue, we proposed a topological model for removing redundant paths in maps with Dijkstra. For the simple case of known daily weather in the game, we established a goal programming model with the constraint of game rules. A method based on Tree DP is proposed to solve the complex mathematical model. For the complex case of random weather, a probability model of weather is established by Markov Chain, and a multi-goal programming model is proposed.

https://doi.org/10.1088/1742-6596/2294/1/012022

012023

The following article is Open access

Self-Knowledge Distillation For the Object Segmentation Based on Atrous Spatial Pyramid

Xiaobing Wang, Yunfei Zheng, Xiongwei Zhang, Tieyong Cao, Chenming Li, Xiulai Wang, Yong Wang, Zheng Fang and Yang Wang

View article, Self-Knowledge Distillation For the Object Segmentation Based on Atrous Spatial Pyramid PDF, Self-Knowledge Distillation For the Object Segmentation Based on Atrous Spatial Pyramid

This paper presents an effective Self-Knowledge Distillation (SKD) framework via Atrous Spatial Pyramid Structure (ASPS), which is able to enhance the performance of the object segmentation network without increasing network parameters. In the framework, a lightweight object segmentation network is constructed to achieve the pixel-level object segmentation efficiently. A SKD learning model, including the SKD representation structure based on ASPS and loss function, is proposed to transfer the knowledge in the ASPS into our object segmentation network and improve its generalization ability. The experimental results confirm that compared with recent typical object segmentation networks, our object segmentation network contains the fewest parameters but achieves better performance. Moreover, the proposed SKD method achieves the best performance-boosting compared with recent SKD methods

https://doi.org/10.1088/1742-6596/2294/1/012023

012024

The following article is Open access

A nonfunctional data transformation approach via kurtosis adjustment and its application to SVM classification

Yu Liu

View article, A nonfunctional data transformation approach via kurtosis adjustment and its application to SVM classification PDF, A nonfunctional data transformation approach via kurtosis adjustment and its application to SVM classification

Many statistical methods are very sensitive to data containing outliers and heavy tails, and simply eliminating these data often does not achieve the desired results. We usually need to do some data transformation to make it approximately follow a normal distribution. But not all data can be transformed into a normal distribution, and then we can only adjust the shape of its data distribution to make its shape close to a normal distribution. The kurtosis of the distribution can better reflect the peakedness or flatness of the distribution. So in this paper, I propose a nonfunctional data transformation approach to improve the efficiency of statistical methods by continuously adjusting the kurtosis of the data while maintaining the distribution of the data. I apply the transformed data to SVM classification, and the numerical results show that the transformed data by my method performs significantly better than the untransformed data, as well as better than other comparable methods.

https://doi.org/10.1088/1742-6596/2294/1/012024

012025

The following article is Open access

Two inequalities for the minimal eigenvalue of M-matrices

Qin Zhong

View article, Two inequalities for the minimal eigenvalue of M-matrices PDF, Two inequalities for the minimal eigenvalue of M-matrices

M-matrices are closely related to nonnegative matrices and they have extensive application background in computational mathematics and related fields. Some properties on the minimal eigenvalue of M-matrices are presented in terms of the relationship between the nonnegative matrices and the M-matrices.

https://doi.org/10.1088/1742-6596/2294/1/012025

012026

The following article is Open access

Stability analysis of a predator-prey model with stage structure

Qiyue Pang and Yang Gao

View article, Stability analysis of a predator-prey model with stage structure PDF, Stability analysis of a predator-prey model with stage structure

Many scholars have explored the population dynamics system, functional response, time-delay response, and so on. Based on the existing research, this paper studies a class of predator-prey models with stage structure, time delay, and Holling type-II functional response function. The stability of that model at the positive equilibrium point, the sufficient condition for stability, and the existence of Hopf bifurcation are discus. The model was numerically simulated by taking appropriate parameters and different time delay values, variation diagram of each component, and solution curves were given near the critical value. The results show that the stability of the system will change with the variation of the bifurcation parameter value, and Hopf bifurcation will occur.

https://doi.org/10.1088/1742-6596/2294/1/012026

012027

The following article is Open access

Modeling and Research on Route Planning

Ying Liu

View article, Modeling and Research on Route Planning PDF, Modeling and Research on Route Planning

In view of the problem of fixed and mobile threat areas in the navigation of reconnaissance aircraft, this paper proposes to connect the position information with the time information, and build a distance matrix based on the ant colony algorithm to model and solve the optimal navigation route.This article is based on the problem of the threat area in the area where the reconnaissance aircraft is sailing, since the distance between two target points is no longer a straight line distance but a curve distance, so consider building a distance matrix.The simulation experiment results show that the algorithm proposed in this paper can select the optimal path faster and has good accuracy when traversing each target point.

https://doi.org/10.1088/1742-6596/2294/1/012027

012028

The following article is Open access

Empirical Analysis of Factors Affecting Air Quality in China-Based on the Skew-Normal Spatial Dynamic Panel Data Models

Yan Yang, Yuanyuan Ju, Liucang Wu and Lin Dai

View article, Empirical Analysis of Factors Affecting Air Quality in China-Based on the Skew-Normal Spatial Dynamic Panel Data Models PDF, Empirical Analysis of Factors Affecting Air Quality in China-Based on the Skew-Normal Spatial Dynamic Panel Data Models

To explore the influence factors of air quality, the air quality data of 31 provincial capitals in China from 2014 to 2019 is studied by introducing skew-normal spatial dynamic panel data models in this paper. A Markov Chain Monte Carlo algorithm is developed to estimate the unknown parameters in the model. The main conclusions of the research are followed. (1) The variation of air quality in China could be well reflected by the skew-normal space dynamic panel data model. (2) PM₁₀ had the most significant influence on air quality among the five air quality indexes, such as SO₂, CO, NO₂, PM₁₀ and O₃. (3) There are significant temporal and spatial correlations on air quality among provincial capitals. (4) In China, the air quality in southern and southwestern is excellent, while the air quality in northern is slightly polluted. These conclusions provide theoretical guidance for improving air quality in China and a scientific basis for making decisions of relevant departments.

https://doi.org/10.1088/1742-6596/2294/1/012028

012029

The following article is Open access

Rainfall study based on ARIMA-RBF combined model

Jialu Zhao, Ruyun Chen and Haiyuan Xin

View article, Rainfall study based on ARIMA-RBF combined model PDF, Rainfall study based on ARIMA-RBF combined model

The traditional differential integrated moving average autoregressive model (ARIMA) has some deviations in the prediction accuracy of monthly rainfall. In this paper, we propose to combine the ARIMA model with the radial basis function neural network (RBF) neural network model to predict the monthly rainfall in Nanchang, Jiangxi Province, using the ARIMA-RBF model. Firstly, the ARIMA model is used to predict the monthly rainfall and calculate its residuals, and then the RBF neural network model is used to approximate and compensate the prediction results of the ARIMA model to correct the final prediction results. The results show that the prediction results of the combined model are better than those of the single ARIMA model and the single RBF neural network model with good accuracy.

https://doi.org/10.1088/1742-6596/2294/1/012029

012030

The following article is Open access

A Novel Condenser Vacuum Degree Prediction Model Based on LSTM and MemN2N

Ziwen Sun, Tao Wang, Yanning Lu, Bo Chen, Yawei Jin, Shipian Guan and Jiasheng Si

View article, A Novel Condenser Vacuum Degree Prediction Model Based on LSTM and MemN2N PDF, A Novel Condenser Vacuum Degree Prediction Model Based on LSTM and MemN2N

Condenser vacuum degree prediction of power plants is a challenge task in power system security field. Most existing studies are based on shallow machine learning algorithms, which fail to leverage historical data comprehensively, resulting inaccuracy and unreliable predictions. Therefore, using a serialization model like Recurrent Neural Network to capture time-series information from historical data is necessary. However, these serialization model alone has inherent defects in dealing with long-distance dependence, which may cause historical information forgetting problem. This paper proposes a new prediction model combining LSTM and End-To-End Memory Network (MemN2N). We use LSTM to mine the long-distance dependency information in historical data, and introduce the encoding historical information into the memory pool of MemN2N. MemN2N allows better preservation of historical information for serialization model LSTM, and can make accurate and reliable predictions through soft attention mechanism. Through the experiments on real data from the power plant show that, compared with other prediction models, the model proposed in this paper achieves higher prediction accuracy and has great engineering value.

https://doi.org/10.1088/1742-6596/2294/1/012030

012031

The following article is Open access

Extraction and Analysis of Crowd Activity Vergence Model in Space-Time Vector Field

Haiyan Liu, Jing Li, Qiang Guo, Youwei Zhang, Chuanwei Lu, Fang Hu and Hongjian Wu

View article, Extraction and Analysis of Crowd Activity Vergence Model in Space-Time Vector Field PDF, Extraction and Analysis of Crowd Activity Vergence Model in Space-Time Vector Field

The vergence model of crowd activity is one of the core contents of human mobility research. Traditional methods do not consider the mobility of crowd activities in terms of extracting vergence models. In this paper, the model extraction problem is transformed into a time series clustering problem, and the mobility of crowd activities is dynamically modeled by introducing vector field theory. Then, the vergence of the crowd is calculated by the divergence. Finally, a time series composed of the crowd vergence is constructed to obtain the main vergence model of crowd activity through clustering. The method proposed in this paper is experimentally verified on the Didi Chuxing data in Haikou City, and four main vergence models of the crowd activity are extracted, which proves that the method proposed in this paper is effective and provides research ideas and method support for exploring human mobility.

https://doi.org/10.1088/1742-6596/2294/1/012031

012032

The following article is Open access

Condenser Vacuum Degree Prediction Model with Multi-View Information Fusion

Pengfei He, Yingjie Zhu, Bo Chen, Yanning Lu, Yongling Yao, Jie Xiao and Jiasheng Si

View article, Condenser Vacuum Degree Prediction Model with Multi-View Information Fusion PDF, Condenser Vacuum Degree Prediction Model with Multi-View Information Fusion

Vacuum degree is a crucial factor for the operation of thermoelectric generating set. Existing approaches typically use machine learning algorithms to link the relationship between unit operating data and condenser vacuum degree by focusing on the temporal information within the data, while ignoring the frequency information implied in the historical condenser vacuum degree. To fully use of frequency information and further improve prediction accuracy, we propose a novel condenser vacuum degree prediction model with multi-view information fusion. In specific, the implicit frequency information in the historical vacuum degree sequence is explored via a combination of Variational Mode Decomposition (VMD) and Convolutional Neural Network (CNN). Furthermore, Transformer encoder is used to extract the temporal information from the unit operating data. Finally, the two views information are fused for condenser vacuum degree prediction. Extensive experiments conducted on the real data collected from a power plant demonstrate the superiority of the proposed method over several state-of-the-art methods.

https://doi.org/10.1088/1742-6596/2294/1/012032

012033

The following article is Open access

A Hierarchical Intrusion Detection System Based on Machine Learning

Dehua Kong, Sicheng Peng, Yihong Zhai, Zhangyuan Liu, Luming Zhang and Zixuan Wan

View article, A Hierarchical Intrusion Detection System Based on Machine Learning PDF, A Hierarchical Intrusion Detection System Based on Machine Learning

The Intrusion detection system (IDS) is one of the most important tools for defending against abnormal flow and attack messages. Most of the existing IDSs use detection technology based on security policies, and there is a risk that it cannot be accurately analyzed and evaluated. Therefore, machine learning techniques provide a new direction for solving this problem. This paper uses and analyzes the CIC-IDS series datasets, but there is a data imbalance in this dataset. In order to solve the problem of data imbalance and reduce the accuracy of the model, this paper proposes a hierarchical detection model. Experiments have shown that the stratified detection module has good classification accuracy for attack types with a small sample size.

https://doi.org/10.1088/1742-6596/2294/1/012033

012034

The following article is Open access

A simple implement of Q-learning in robot path planning

Haoran Gao, Yujie Liu, Shiqi Su and Wetao Yao

View article, A simple implement of Q-learning in robot path planning PDF, A simple implement of Q-learning in robot path planning

This paper firstly gives an introduction of robot path planning problem, which includes the brief definition of path planning, some representative methods and previous applications of Q-learning. Secondly, the paper compares some typical methods, like Breadth First Search and Depth-First-Search, A* and deep learning, with corresponding pseudo codes in detail. Their advantages and disadvantages are also listed in this part. Thirdly, we carry out a simple simulation experiment by applying Q-learning method. The experiment is clearly presented in several parts which includes environment establishment, realizing the Q-learning, simulation experiment and interpretation of the Q-table. In the end, a short conclusion summarizes the achievement of our results.

https://doi.org/10.1088/1742-6596/2294/1/012034

012035

The following article is Open access

Application of LSTM-LightGBM Nonlinear Combined Model to Power Load Forecasting

You Zhou, Qing Lin and Di Xiao

View article, Application of LSTM-LightGBM Nonlinear Combined Model to Power Load Forecasting PDF, Application of LSTM-LightGBM Nonlinear Combined Model to Power Load Forecasting

The accurate prediction of power system load is extremely important for the operation of the power market and the safe operation of the power grid. In order to improve the accuracy of short-term load forecasting of power systems, a combination model based on long short term memory network (LSTM) and light gradient boosting machine (LightGBM) is proposed. The experiment first decomposes historical load data by EMD, uses historical weather data and load data decomposed by EMD to establish LSTM prediction model and LightGBM prediction model respectively, and then these two predicted values are linearly combined to obtain the final predicted value. The electrical load data of the 2016 Electrician Mathematical Contest in Modeling is used as an example to verify. The experimental results show that the LSTM-LightGBM combined model has higher forecasting accuracy and application prospects for power load forecasting than traditional load forecasting methods and standard LSTM and LightGBM load forecasting methods.

https://doi.org/10.1088/1742-6596/2294/1/012035

012036

The following article is Open access

Research on mathematical modeling of order and transshipment optimal distribution

Lin Cui, Yin Liu and Zhi Liu

View article, Research on mathematical modeling of order and transshipment optimal distribution PDF, Research on mathematical modeling of order and transshipment optimal distribution

Aiming at the problems of production arrangement and ordering, this paper conducts quantitative analysis, overall analysis and individual analysis on the supply characteristics of 402 suppliers, and constructs a ratio ranking model of 402 suppliers in the past five years, which reflects the important mathematical model to ensure the production of enterprises. And multi-index evaluation model, comprehensive use of Matlab, Excel and other software programming to solve, to determine the 50 most important suppliers. On the basis of mathematical modeling, a control scheme is given, and further analysis is completed, so as to optimize the ordering and transportation of raw materials in the production enterprise.

https://doi.org/10.1088/1742-6596/2294/1/012036

012037

The following article is Open access

Research on the Variable Weight Combined Prediction Based on the Optimization Weighting Method

Xiaoli Wang

View article, Research on the Variable Weight Combined Prediction Based on the Optimization Weighting Method PDF, Research on the Variable Weight Combined Prediction Based on the Optimization Weighting Method

This paper improves fixed weight combined prediction model, which is established on the predicted value of the ARIMA model and the predicted value of the BP neural network model. Based on the minimum sum of error squares, the time varying optimal weight of the combined prediction model is determined, variable weight combined prediction model is constructed. Based on the analysis of the change rule of Shaanxi province's GDP over the years, the combined model is used to fit Shaanxi province GDP from 2018 to 2021.The result shows that the fitting errors are 0.11%, 0.01, 0.01% and 0.11%, respectively. The variable weight combined model is used to predict Shaanxi province GDP from 2022 to 2023, which is 29502 billion and 31301 billion respectively. Compared with the ARIMA and BP models, variable weight combined model improves the prediction effect.

https://doi.org/10.1088/1742-6596/2294/1/012037

Table of contents

Volume 2294

5th International Symposium on Big Data and Applied Statistics (ISBDAS 2022) 22/04/2022 - 24/04/2022 Xining, China

Preface

Big Data Analysis and Algorithm Image Processing

Applied Statistics and Numerical Visualization Analysis

Computer Model Prediction and Decision-Making Applications

Journal links