Robust Detection of Obfuscated Strings in Android Apps
2019 (English)
In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, New York, NY, USA: Association for Computing Machinery (ACM), 2019, p. 25-35, article id 42
Conference paper, Published paper (Refereed)
Abstract [en]
While string obfuscation is a common technique used by mobile developers to prevent reverse engineering of their apps, malware authors also often employ it to, for example, avoid detection by signature-based antivirus products. For this reason, robust techniques for detecting obfuscated strings in apps are an important step towards more effective means of combating obfuscated malware. In this paper, we discuss and empirically characterize four significant limitations of existing machine-learning approaches to string obfuscation detection, and propose a novel method to address these limitations. The key insight of our method is that discriminative classification methods, which try to fit a decision boundary based on a set of positive and negative samples, are inherently bound to generalize poorly when used for string obfuscation detection. Since many different string obfuscation techniques exist, both in the form of commercial tools and as custom implementations, it is close to impossible to construct a training set that is representative of all possible obfuscations. We instead propose a generative approach based on the Naive Bayes method. We first model the distribution of natural-language strings, using a large corpus of strings from 235 languages, and then base our classification on a measure of the confidence with which a language can be assigned to a string. Crucially, this allows us to completely eliminate the need for obfuscated training samples. In our experiments, this new method significantly outperformed both an n-gram based random forest classifier and an entropy-based classifier, in terms of accuracy and generalizability.
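The generative idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes character bigrams, add-one smoothing, a uniform prior over languages, two tiny hypothetical corpora in place of the 235-language corpus, and a hypothetical confidence measure (the best average per-bigram log-likelihood over all languages). What it does share with the described method is that only natural-language strings are used for training.

```python
import math
from collections import Counter

NGRAM = 2              # assumed n-gram order (not taken from the paper)
VOCAB = 256 ** NGRAM   # assumed count of possible n-grams, used for smoothing


def ngrams(text, n=NGRAM):
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(corpora):
    """Build one n-gram count model per language.

    corpora maps a language name to a list of natural-language strings.
    No obfuscated samples are needed.
    """
    models = {}
    for lang, strings in corpora.items():
        counts = Counter(g for s in strings for g in ngrams(s))
        models[lang] = (counts, sum(counts.values()))
    return models


def log_likelihood(text, model):
    """Average per-n-gram log-probability with add-one smoothing."""
    counts, total = model
    grams = ngrams(text)
    if not grams:
        return float("-inf")
    return sum(math.log((counts[g] + 1) / (total + VOCAB)) for g in grams) / len(grams)


def confidence(text, models):
    """Hypothetical confidence: score under the best-fitting language."""
    return max(log_likelihood(text, m) for m in models.values())


def is_obfuscated(text, models, threshold):
    """Flag a string when no language explains it well enough."""
    return confidence(text, models) < threshold


# Toy corpora, purely illustrative.
corpora = {
    "english": ["the quick brown fox jumps over the lazy dog and the cat sat on the mat"],
    "swedish": ["den snabba bruna hunden och katten satt under bordet"],
}
models = train(corpora)
natural = confidence("the cat sat", models)
scrambled = confidence("Xq9$Zk#7Pw", models)
```

In this sketch `natural` scores higher than `scrambled`, because every bigram of the first string occurs in the English corpus while none of the bigrams of the second occur in either corpus. A real threshold would have to be calibrated on held-out natural-language strings.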
Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2019. p. 25-35, article id 42
Series
Proceedings of the ACM Conference on Computer and Communications Security, ISSN 1543-7221
Keywords [en]
Android, string obfuscation detection, string encryption, machine learning, generative models, malware
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-166612
DOI: 10.1145/3338501.3357373
Scopus ID: 2-s2.0-85075860536
ISBN: 978-1-4503-6833-9 (electronic)
OAI: oai:DiVA.org:liu-166612
DiVA, id: diva2:1442453
Conference
AISec'19: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security co-located with CCS'19: 2019 ACM SIGSAC Conference on Computer and Communications Security, London, United Kingdom, November 2019
Projects
Automated android malware analysis using machine learning
2020-06-17, 2020-06-17, 2021-07-15