Detecting malicious documents with combined static and dynamic analysis
Carsten Willems University of Mannheim
Thorsten Holz University of Mannheim
Markus Engelberth University of Mannheim
Malicious documents, i.e. documents that contain a malicious payload (e.g. a keylogger), have recently become a serious threat.
This attack vector is often used in targeted attacks against government, military and other high-profile targets.
For example, the attacks against European governments and US defence organisations in 2007 and 2008 were based on
malicious Microsoft Office documents. Recently, targeted attacks involving malicious Acrobat Reader documents
that contained a zero-day exploit were reported. In this talk, we present our approach for detecting potential security
risks posed by arbitrary (i.e. non-executable) data in documents. We achieve this through a novel combination of static
scanning and dynamic behaviour-based analysis techniques.
The scanning technique allows us to find malicious artifacts in files, such as embedded PE32 files, known exploit code,
or similar anomalies in the document. The dynamic approach opens the document under analysis in its associated application
(for example, .doc files are opened in Microsoft Word and .pdf files in Acrobat Reader).
Since some exploits only trigger in particular application versions, we use many different instances of these client
applications in parallel. The monitoring is carried out in a 'sandbox' environment that allows us to observe suspicious
actions that may occur when the document is opened. These can include, for example, the creation of files, spawning of
processes, outgoing network connections, interference with other processes, or, in the extreme case, a crash of the client
application (e.g. due to a wrong offset in the malicious document's exploit code). During the analysis phase, we also
emulate the typical behaviour of a human to potentially trigger the exploit code.
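As a rough illustration of the static scanning for embedded PE32 files, the following Python sketch searches the raw bytes of a document for a DOS `MZ` header whose `e_lfanew` field (at offset 0x3C) points to a `PE\0\0` signature followed by a PE32 or PE32+ optional-header magic. The function name `find_embedded_pe` is hypothetical and this is not the authors' scanner: it only finds executables stored in plain form, whereas malicious documents may obfuscate embedded payloads (e.g. with simple XOR encoding), which would require additional decoding heuristics.

```python
import struct

PE32_MAGIC = 0x10B       # optional-header magic for 32-bit (PE32) images
PE32_PLUS_MAGIC = 0x20B  # optional-header magic for 64-bit (PE32+) images


def find_embedded_pe(data: bytes) -> list[int]:
    """Return offsets of plausible PE images embedded in raw document bytes."""
    hits = []
    start = 0
    while (off := data.find(b"MZ", start)) != -1:
        start = off + 1
        # The DOS header is 64 bytes; e_lfanew (uint32, little-endian) sits at 0x3C.
        if off + 0x40 > len(data):
            continue
        (e_lfanew,) = struct.unpack_from("<I", data, off + 0x3C)
        pe_off = off + e_lfanew
        # Need the 4-byte signature, the 20-byte COFF header and the 2-byte magic.
        if pe_off + 26 > len(data):
            continue
        if data[pe_off:pe_off + 4] != b"PE\x00\x00":
            continue
        (magic,) = struct.unpack_from("<H", data, pe_off + 24)
        if magic in (PE32_MAGIC, PE32_PLUS_MAGIC):
            hits.append(off)
    return hits
```

A non-empty result is only a static indicator; in the approach described here it would still be weighed together with the results of the dynamic analysis.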
Finally, we combine all of the results from the static scanning and the dynamic analysis phase. This enables us to decide
if the specific document is potentially harmful, actually harmful, or probably harmless. Our experiments with thousands
of documents show that our system detected all of the malicious documents we tested (a detection rate of 100%) and
produced no false positives (a false positive rate of 0%).
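The final combination step could, for instance, be expressed as a small rule set. The Python sketch below is a hypothetical illustration, not the decision logic used by the authors. It assumes that static hits are reported as strings (e.g. `"embedded_pe32"`), that the sandbox reports a set of event names per application version, and that payload behaviour observed in at least one version means "actually harmful", while static indicators alone, or a crash without further behaviour, mean "potentially harmful". All names (`Findings`, `combine`, the event labels) are invented for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    HARMFUL = "actually harmful"
    SUSPICIOUS = "potentially harmful"
    BENIGN = "probably harmless"


# Hypothetical labels for the sandbox events named in the abstract.
PAYLOAD_EVENTS = {"file_created", "process_spawned", "network_connection", "process_interference"}
CRASH_EVENT = "app_crash"


@dataclass
class Findings:
    # Static scanner hits, e.g. "embedded_pe32" or "known_exploit".
    static_hits: list[str] = field(default_factory=list)
    # Events observed per application version, e.g. {"Reader 9.0": {"process_spawned"}}.
    dynamic_events: dict[str, set[str]] = field(default_factory=dict)


def combine(findings: Findings) -> Verdict:
    """Merge static and dynamic results into one of three verdicts (illustrative rules)."""
    # Union of the events seen across all application versions that opened the document.
    observed = set().union(*findings.dynamic_events.values())
    if observed & PAYLOAD_EVENTS:
        # Payload behaviour in at least one version: treat as actually harmful.
        return Verdict.HARMFUL
    if findings.static_hits or CRASH_EVENT in observed:
        # Suspicious artifacts or a crash (possibly a failed exploit), but no payload seen.
        return Verdict.SUSPICIOUS
    return Verdict.BENIGN
```

Keying the events by application version mirrors the idea of opening the same document in several client versions, since an exploit may trigger in only one of them. A real deployment would also need to filter out benign activity, such as temporary files that an application creates on its own, before counting an event as payload behaviour.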
In future work, we plan to tightly integrate our analysis suite with common mail servers in order to automatically verify
all attachments of incoming e-mail before they are relayed to their final destination. Such a tool can then protect
high-profile targets from targeted attacks that use malicious documents as an attack vector.