Paper (Open access)

Research on Android Multi-classification Based on Text


Published under licence by IOP Publishing Ltd
Citation: Hua Zhang et al 2021 J. Phys.: Conf. Ser. 1828 012049. DOI: 10.1088/1742-6596/1828/1/012049


Abstract

In recent years, more and more malicious applications have appeared on mobile application platforms, often disguised as social, communication, and game applications. Classifying applications by category before detecting malware can improve detection accuracy. Category classification requires a large number of high-quality samples, but category labels vary widely across app stores, so samples of the same functional type cannot be obtained quickly and efficiently. This paper proposes a method that builds a multi-classification model from the text of application descriptions and uses the predicted category of each description to guide the classification of the application. The method collects the descriptions of an application from different app stores, predicts the category of each description with the classification model, and obtains the application category by voting. The model, which we name CRNN, is based on CNN and RNN. Its F1-score is about 3% higher than that of text classification models such as textCNN and LSTM, while its training and prediction time and memory consumption are only 6% higher than those of the textCNN and LSTM models. This paper also constructs a data set that can be used for application classification; the data set is labeled using application descriptions, so that each application description is paired with its category.
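The voting step described in the abstract can be illustrated with a minimal Python sketch. It assumes plain majority voting over per-description predictions, with ties resolved in favor of the category seen first; the abstract does not specify the voting rule, so both choices are assumptions. The names `vote_category` and `toy_predict` are hypothetical, and `toy_predict` is a keyword-based stand-in for the trained CRNN classifier, not the authors' model.

```python
from collections import Counter
from typing import Callable, Iterable


def vote_category(descriptions: Iterable[str],
                  predict: Callable[[str], str]) -> str:
    """Return the category predicted most often across the descriptions.

    Assumption: simple majority voting; on a tie, the category that
    was predicted first (in input order) wins.
    """
    votes = Counter(predict(text) for text in descriptions)
    if not votes:
        raise ValueError("at least one description is required")
    # Counter keeps insertion order, and max() returns the first
    # maximal key, so ties fall to the earliest-seen category.
    return max(votes, key=votes.get)


def toy_predict(description: str) -> str:
    """Hypothetical stand-in for the trained text classifier."""
    text = description.lower()
    if "play" in text or "level" in text:
        return "game"
    if "chat" in text or "message" in text:
        return "communication"
    return "other"


# Descriptions of one application, as collected from three app stores.
store_descriptions = [
    "Play 100 levels of puzzle fun",
    "Chat with friends while you play",
    "Send a message to your team",
]
print(vote_category(store_descriptions, toy_predict))
```

In this sketch, two of the three descriptions are predicted as `game`, so the application is assigned that category even though one store's description reads like a communication app.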


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
