DistEA Usage

How to use DistEA

Requirement

All the libraries listed below should be on your Java class path:

Libraries required:

These libraries are required to compile the DistEA source package. Depending on which part of DistEA is used, further libraries may need to be added to either the JVM class path or the Soot class path; these are identified in the sections where they are needed.

Input

Simply put, the input to DistEA is the Java bytecode of the subject plus a set of methods that are expected to change, which we call queries. The bytecode, in the form of class files, is given to DistEA's static analysis module and is thus the source input of the whole DistEA dynamic analysis pipeline, which has three phases:
(1) static analysis, which builds the static MDG and instruments probes for method event monitoring and timestamping;
(2) running the instrumented code, during which DistEA traces the method entry, method exit, and returned-into events;
(3) post-processing, in which DistEA computes the impact set of the input queries from the traces generated in phase (2).

Given these phases, the input to phase (2) is the instrumented class files of the subject, and the input to phase (3) is the output of running the instrumented subject, namely one method execution trace per test input in each process. Phase (2) also takes, of course, whatever the original subject is fed with: the test inputs.

How to prepare subjects for DistEA analysis

For DistEA to instrument the bytecode of the target subject, a few statements must first be added to the subject's source code, specifically to the main (entry) class of the subject software.

Precisely, the following static method should be added to the entry class. Because it is only an extra method, it does not interfere with the original source code:

public class TheEntryClass {
	......
	// added for DistEA Instrumentation
	static void __link() { 
		DistEA.distMonitor.__link(); 
	}
	......

As long as the required libraries described above are on the class path, the compiler can resolve the packages and names they provide. For the package "DistEA.distMonitor" to be resolved, however, the directory containing the DistEA class files must also be added to the Java compiler's class path.

The source code of the target subject, with the additions above, can now be compiled. An example class path setting is given below as part of a complete compilation script (for the Linux Bourne shell):

	ROOT=/home/hcai/
	MAINCP=".:$ROOT/tools/DUAForensics-bins-code/DUAForensics:$ROOT/tools/DUAForensics-bins-code/InstrReporters:$ROOT/tools/DistEA/bin/:$ROOT/subject/src/"
	javac -cp ${MAINCP} -d $ROOT/subject/bin subjectFile1.java

Any other paths needed for the subject's class dependences and libraries must also be included in "MAINCP". Once compilation succeeds, the DistEA static analysis can be started, as demonstrated below.
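
The example scripts later in this document extend a class path with every jar in the subject's lib directory by looping over the files. A minimal sketch of that step as a reusable shell function follows; the function name append_jars and its argument layout are illustrative and not part of DistEA.

```shell
# Hypothetical helper (not part of DistEA): append every .jar found directly
# under a directory to a colon-separated class path and print the result.
# Usage: append_jars "<classpath>" "<directory>"
append_jars() {
	_cp=$1
	for _jar in "$2"/*.jar; do
		# If the directory holds no jars the pattern stays unexpanded; skip it.
		[ -e "$_jar" ] || continue
		_cp="$_cp:$_jar"
	done
	printf '%s\n' "$_cp"
}

# Example (the lib directory is a made-up location):
#   MAINCP=$(append_jars "$MAINCP" "$ROOT/subject/lib")
```

Quoting the directory argument keeps the helper usable with paths that contain spaces, which the plain loop form does not handle.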

How to perform DistEA static analysis

Once the inputs for DistEA are prepared, 'DistEAInst' is ready to run. In the following example, the class paths for the JVM and Soot are given by "MAINCP" and "SOOTCP", respectively. To run 'DistEAInst' from the command line on Windows, convert the shell script below into an equivalent batch script. To run it in an IDE such as Eclipse, create a Run/Debug configuration with the class paths and parameters shown in the script.
The primary parameters for 'DistEAInst' are explained as follows.


ver=$1
seed=$2
ROOT=/home/hcai/
subjectloc=/home/hcai/SVNRepos/star-lab/trunk/Subjects/ZooKeeper/

MAINCP=".:$ROOT/tools/j2sdk1.4.2_18/jre/lib/rt.jar:$ROOT/tools/polyglot-1.3.5/lib/polyglot.jar:$ROOT/tools/soot-2.3.0/lib/sootclasses-2.3.0.jar:$ROOT/tools/jasmin-2.3.0/lib/jasminclasses-2.3.0.jar:$ROOT/workspace/DUAForensics/bin:$ROOT/workspace/mcia/bin:$ROOT/tools/java_cup.jar"

SOOTCP=".:/etc/alternatives/java_sdk/jre/lib/rt.jar:$ROOT/workspace/LocalsBox/bin:$ROOT/workspace/InstrReporters/bin:$ROOT/workspace/DUAForensics/bin:$ROOT/workspace/mcia/bin:$subjectloc/build/classes:$subjectloc/build/test/classes:$subjectloc/build/contrib/fatjar/classes:$subjectloc/lib/netty-3.7.0.Final.jar"

for i in $subjectloc/lib/*.jar;
do
	SOOTCP=$SOOTCP:$i
done

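suffix="zk"

# MAINCLASS is used below but is not set in this script; it must hold the
# fully qualified name of the subject's entry class.
: "${MAINCLASS:?set MAINCLASS to the entry class of the subject}"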

LOGDIR=out-distEAInstr
mkdir -p $LOGDIR
logout=$LOGDIR/instr-$suffix.out
logerr=$LOGDIR/instr-$suffix.err

OUTDIR=$subjectloc/distEAInstrumented
mkdir -p $OUTDIR

	#-main-class org.apache.zookeeper.util.FatJarMain \
	#-entry:org.apache.zookeeper.util.FatJarMain \
java -Xmx40600m -ea -cp ${MAINCP} distEA.distEAInst \
	-w -cp $SOOTCP -p cg verbose:false,implicit-entry:false \
	-p cg.spark verbose:false,on-fly-cg:true,rta:true -f c \
	-d $OUTDIR \
	-duaverbose \
	-brinstr:off -duainstr:off \
	-allowphantom \
	-dumpJimple \
	-nio \
	-socket \
	-wrapTryCatch \
	-dumpFunctionList \
	-slicectxinsens \
	-main-class $MAINCLASS \
	-entry:$MAINCLASS \
	-process-dir $subjectloc/build/classes \
	-process-dir $subjectloc/build/test/classes \
	1> $logout 2> $logerr

Another useful option is "-dumpFunctionList", which asks DistEA to dump the full list of methods in the target program in the form of function signatures. Since the post-processing phase expects its input queries in this format, you may want to always set this flag if you are not familiar with the method signatures of the software you are analyzing.
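
Before moving on, it can help to confirm that the instrumentation run actually produced output. The sketch below counts the class files under the output directory and checks for the dumped function list. It assumes the function list is written to a file named functionList.out inside the output directory, which is inferred from the default query path used in the post-processing script later in this document; the function name check_instr_output is illustrative.

```shell
# Hypothetical helper (not part of DistEA): sanity-check an instrumentation
# output directory. Prints the number of class files found and returns
# non-zero if there are none or if the function list is missing or empty.
# Assumption: -dumpFunctionList writes "functionList.out" into the directory.
check_instr_output() {
	_outdir=$1
	_n=$(find "$_outdir" -type f -name '*.class' | wc -l | tr -d ' ')
	echo "class files: $_n"
	[ "$_n" -gt 0 ] || return 1
	[ -s "$_outdir/functionList.out" ] || return 1
}

# Example:
#   check_instr_output "$OUTDIR" || echo "instrumentation output looks incomplete"
```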

Runtime - gathering method-execution sequences

After successful instrumentation, the instrumented subject can be found in the "$OUTDIR" directory. The next step is to run the instrumented code exactly as you would run the original system, for instance by launching its distributed processes on multiple machines.
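
In practice, running the instrumented code like the original usually means reusing the original launch command with the instrumented classes placed ahead of the original ones on the class path, since the JVM searches class path entries from left to right. The sketch below only builds and prints such a class path. It assumes the DistEA class files must also be present at run time because the inserted probes call into them; the function name and argument order are illustrative.

```shell
# Hypothetical helper (not part of DistEA): build a run-time class path that
# puts the instrumented classes first so they shadow the original ones.
# Usage: runtime_classpath <instrumented-dir> <distea-bin-dir> <original-classpath>
runtime_classpath() {
	# Assumption: the monitor classes referenced by the probes live in the
	# DistEA bin directory and therefore must be reachable at run time.
	printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Example (ORIGCP stands for whatever class path the original system uses):
#   java -cp "$(runtime_classpath "$OUTDIR" "$ROOT/tools/DistEA/bin" "$ORIGCP")" <main class> <args>
```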

How to obtain the results

Once execution traces have been gathered, the impact set of any query of interest can be computed. An additional input at this point is, of course, the queries themselves. As noted before, input queries must be given in full signature form to be matched exactly. If you give only part of a method signature, DistEA automatically matches all signatures containing it and computes the impact sets of all matched methods.
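
To preview which methods a partial query would select, the containment match described above can be approximated with a fixed-string search over the dumped function list. This only approximates DistEA's own matching and assumes the list holds one signature per line; the function name and the sample signature in the comment are invented for illustration.

```shell
# Hypothetical helper (not part of DistEA): print every signature in a
# function list that contains the given text as a literal substring.
# Usage: preview_matches "<partial signature>" <function-list-file>
preview_matches() {
	# -F treats the query as a fixed string, so characters such as "(", "$"
	# and "." in signatures are not interpreted as a regular expression.
	grep -F -e "$1" "$2"
}

# Example (made-up signature fragment):
#   preview_matches 'void close()' "$subjectloc/distEAInstrumented/functionList.out"
```

The function returns a non-zero status when nothing matches, so it can also be used as a guard before starting a long post-processing run.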


An actual example of using the post-processing phase of DistEA is shown below.


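# Same subject location as in the instrumentation script above.
subjectloc=/home/hcai/SVNRepos/star-lab/trunk/Subjects/ZooKeeper/
INDIR=$subjectloc/traces/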
query=${1:-"$subjectloc/distEAInstrumented/functionList.out"}
NT=${2:-"1"}

ROOT=/home/hcai/
MAINCP=".:/etc/alternatives/java_sdk/jre/lib/rt.jar:$ROOT/tools/j2sdk1.4.2_18/lib/tools.jar:$ROOT/tools/polyglot-1.3.5/lib/polyglot.jar:$ROOT/tools/soot-2.3.0/lib/sootclasses-2.5.0.jar:$ROOT/tools/jasmin-2.3.0/lib/jasminclasses-2.3.0.jar:$ROOT/tools/java_cup.jar:$ROOT/workspace/DUAForensics/bin:$ROOT/workspace/LocalsBox/bin:$ROOT/workspace/InstrReporters/bin:$ROOT/workspace/mcia/bin"

for i in $subjectloc/lib/*.jar;
do
	MAINCP=$MAINCP:$i
done

starttime=`date +%s%N | cut -b1-13`

	#-debug
java -Xmx40800m -ea -cp ${MAINCP} distEA.distEAAnalysis \
	"$query" \
	"$INDIR" \
	"$NT" \
	"-separate" \
	"-common"

The main parameters used by 'distEAAnalysis' are explained as follows.


Generated on 4 Apr 2015 by  doxygen 1.6.1