The AndroOBFS Dataset

Description:

With the large-scale adoption of the Android OS and ever-increasing contributions to the Android application space, Android has become the number one target of malware authors. In recent years, a large number of automatic malware detection and classification systems have emerged to tackle the dynamic nature of malware growth using static or dynamic analysis techniques. The performance of static malware detection methods degrades under obfuscation attacks. Although many benchmark datasets are available to measure the performance of malware detection and classification systems, only a single obfuscated malware dataset (PRAGuard) is available to showcase the efficacy of existing malware detection systems against obfuscation attacks. PRAGuard contains outdated samples, only up to 2013, and does not represent the latest application categories. Moreover, PRAGuard does not provide family information for its malware, so it cannot be used to evaluate the efficacy of malware family classification systems. Hence, we create and release AndroOBFS, a time-tagged obfuscated malware dataset with familial information spanning three years, from 2018 to 2020.

The AndroOBFS dataset contains 16279 unique real-world obfuscated malware samples in six categories, viz. (i) Trivial, (ii) Renaming, (iii) Encryption, (iv) Reflection, (v) Code, and (vi) Mix (a mix of two or more methods from (i) to (v)). Out of the 16279 unique obfuscated malware samples, 14579 samples are distributed across 158 families, with at least two unique malware samples in each family. We store all the information about the obfuscated malware and their families in two CSV files: one corresponds to all 16279 samples (16279.csv) and the other to the 14579 familial malware samples (14579.csv). We release this dataset to aid Android malware research in designing robust and obfuscation-resilient malware detection and classification systems.
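As a quick sanity check on the familial CSV file, the per-family counts can be tallied and compared against the figures stated above. The following is only a minimal sketch: the column name `family` is an assumption (the actual header of the released CSV files is not described here), and the helper names `load_rows`, `count_families`, and `undersized_families` are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Mapping

# Hypothetical column name: check the header row of the released CSV files.
FAMILY_COLUMN = "family"


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV file with a header row into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_families(rows: Iterable[Mapping[str, str]],
                   column: str = FAMILY_COLUMN) -> Counter:
    """Count samples per family, skipping rows with an empty family field."""
    counts: Counter = Counter()
    for row in rows:
        family = (row.get(column) or "").strip()
        if family:
            counts[family] += 1
    return counts


def undersized_families(counts: Counter, minimum: int = 2) -> List[str]:
    """Return the families with fewer than `minimum` samples, sorted by name."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

If the column name matches, `count_families(load_rows("14579.csv"))` would be expected to contain 158 families, and `undersized_families(...)` would be expected to return an empty list, in line with the description above.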

Note: To obtain the APK files of the obfuscated malware dataset, please see the download policy below.

Download Policy:

We are happy to share our malware dataset. However, to prevent misuse, we kindly ask you to send an email to skmtr@cse.iitk.ac.in stating your identity and research scope. We will then send you the login credentials and the dataset download link.

If you are in academia

  1. If you are a student, please ask your advisor to send us an email requesting access. If you are a faculty member, please send us the email from your university email account.
  2. In your email, please include your name, affiliation, and homepage. This information is needed for verification purposes. Note that your request may be ignored if we are unable to determine your identity or affiliation.

If you are currently in industry

  1. Please send us an email from your company email account. Please briefly introduce yourself (e.g., name) and your company in that email. Please also attach a justification letter (PDF) on official letterhead.
  2. The justification letter needs to state clearly why the dataset is being requested. It should also acknowledge that the dataset will not be shared with others without our permission.