How to reproduce the evaluation results in the paper

If you just want to reproduce the empirical results on unified malware detection as presented in the paper, just shoot the simple line as follows

python ML/confusionMatrix_multipleModels_loo_forMajorFamilyOnly.py

This will run the leave-one-out cross validation using the benchmark samples for training and testing, and will produce the confusion matrix shown in the paper, including results with alternative learning algorithms and feature sets.

If you want to reproduce the results on using DroidCat simply for the conventional, binary detection, shoot the following line

python ML/multipleModels_tab.py true true|false The last argument switches output between accuracy and precision/recall/F1.

This will produce binary-detection results with all learning algorithms and feature sets.

How to use droidcat on a set of unknown apps with the current classification model

The following instructions are used when running DroidCat on unknown apps using the model already trained with the benchmark samples

1. run static analysis (including the instrumentation) of each app

scripts/cgInstr.sh apk-file

where the apk-file is the APK of the app under detection.

2. run the instrumented app and generate call traces

scripts/runAllApps_monkey.sh

3. analyze traces and compute features

do so for general features, single-app ICC features, inter-app ICC features, and security features using "scripts/allGeneralReport_all.sh", "scripts/allICCReport_all.sh", "scripts/allInterAppICCReport_all.sh", and "scripts/allSecurityReport_all.sh", respectively.

The step will result in feature vector files including gfeatures.txt, iccfeatures.txt, and securityfeatures.txt.

4. classify the apps

Gather the above three feature vector files in a directory. Let's say, appfv. Then, run shoot the following line:

python ML/DroidCat_detect.py true|false

With the last argument being true for binary detection and false (default) for the unified detection.

How to use droidcat with an updated classification model

To accommodate new malware families and new types of benign apps, the classification model of DroidCat will need to be retrained using a new set of training samples.

1. obtain the new training data set

Compute the features of each training sample by following the first three steps in Section How to use droidcat on a set of unknown apps with the current classification model. Then merge all their features into the three feature files and move these three files in ML/features.

2. use DoridCat on the new training data set

Just use DroidCat as shown in the fourth step of Section How to use droidcat on a set of unknown apps with the current classification model, the new classification model will be trained automatically on the new training data set.