File-fraction reputation based on digest of high granularity
Ethan YX Chen, Trend Micro
Deciding whether a file is benign or malicious has long been a critical problem for the anti-malware industry. One
approach that has recently become popular is to build a large database that stores the characteristics of file instances
(e.g. download source, prevalence or age) and/or the results of analysing them, for later use. Approaches of this kind
are categorized as reputation-based technology.
Reputation-based technology applies statistical methods to these characteristics to determine the reputation of a
file. The characteristics it uses (e.g. prevalence, URL) usually differ from those used by content-based technology
(e.g. a malware definition composed of byte sequences).
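To make the reputation-based approach concrete, the sketch below keeps one record per file, keyed by the SHA-256 digest
of the whole file, and applies a simple threshold rule to prevalence and age. The schema (`FileRecord`, `ReputationDB`)
and the thresholds are hypothetical illustrations, not a description of any real product's database or scoring method.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    """Characteristics collected for one file instance (hypothetical schema)."""
    prevalence: int = 0                              # number of reports of this exact file
    first_seen_day: int = 0                          # day index when first observed
    sources: set[str] = field(default_factory=set)   # download URLs or domains seen

class ReputationDB:
    """Toy reputation store keyed by the SHA-256 digest of the whole file."""

    def __init__(self) -> None:
        self.records: dict[str, FileRecord] = {}

    def report(self, content: bytes, day: int, source: str) -> str:
        """Record one sighting of a file and return its whole-file digest."""
        key = hashlib.sha256(content).hexdigest()
        rec = self.records.get(key)
        if rec is None:
            rec = FileRecord(first_seen_day=day)
            self.records[key] = rec
        rec.prevalence += 1
        rec.sources.add(source)
        return key

    def verdict(self, key: str, today: int,
                min_prevalence: int = 100, min_age_days: int = 30) -> str:
        """Hypothetical statistical rule: widespread, long-lived files are trusted."""
        rec = self.records.get(key)
        if rec is None:
            return "unknown"
        age = today - rec.first_seen_day
        if rec.prevalence >= min_prevalence and age >= min_age_days:
            return "likely benign"
        return "suspicious"
```

Because the key is a whole-file digest, changing a single byte produces a new key with a prevalence of 1. This is the
weakness that highly polymorphic, micro-distributed malware exploits against purely file-level reputation.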
In this paper we propose a solution that combines the reputation-based and content-based approaches, offering a different
perspective on the fight against today's highly polymorphic, micro-distributed malware. The basic idea is to factorize
content into 'fractions' using a rolling hash, and then build reputation information for those fractions.
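The factorization step can be sketched as content-defined chunking: a Rabin-Karp style polynomial rolling hash slides
over the bytes, and a fraction boundary is placed wherever the hash matches a bit pattern, so boundaries depend on local
content rather than on absolute offsets. This is a minimal sketch under assumed parameters (`WINDOW`, `BASE`, `MASK`,
`MIN_SIZE`, `MAX_SIZE` and the helper names are hypothetical); the paper does not specify which rolling hash or digest
it uses, and SHA-256 is chosen here only for illustration.

```python
import hashlib

# All parameters are illustrative assumptions, not values from the paper.
WINDOW = 16                    # bytes covered by the rolling hash
BASE = 257                     # polynomial base
MOD = (1 << 61) - 1            # Mersenne prime modulus
MASK = (1 << 8) - 1            # cut when the low 8 bits are zero (about 1 in 256 positions)
MIN_SIZE, MAX_SIZE = 64, 4096  # bounds on fraction length

def split_fractions(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries found by a rolling hash."""
    fractions = []
    start = 0
    h = 0
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # remove the oldest byte
        h = (h * BASE + byte) % MOD                  # append the newest byte
        size = i + 1 - start
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            fractions.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        fractions.append(data[start:])  # trailing fraction
    return fractions

def fraction_digests(data: bytes) -> list[str]:
    """Digest each fraction; reputation is then attached to these digests."""
    return [hashlib.sha256(f).hexdigest() for f in split_fractions(data)]
```

Once the window has filled, the hash depends only on the last `WINDOW` bytes, so a local edit usually changes just the
fraction or two around it; most fraction digests survive even though the whole-file digest changes completely.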
The content to be factorized can come from either raw files or memory dumps. Factorizing memory dumps helps to detect
packed malware, whose unpacked code appears only in memory, and also benefits memory forensics. Malware files of the same
family often share at least a few identical fractions, especially fractions taken from memory dumps. Some fractions can
also be identified as part of a particular tool, whether benign/neutral (e.g. AutoIt), aggressive (e.g. remote control
tools) or malicious (malware toolkits).
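Once files are reduced to sets of fraction digests, the family and tool relationships described above can be expressed
as set operations. The sketch below assumes a hypothetical table that counts, for each fraction digest, the tags of the
labelled samples it appeared in (tag strings such as `family:X` or `tool:AutoIt` are illustrative), and uses Jaccard
similarity between fraction sets as one possible measure of family resemblance; the paper does not specify this metric.

```python
from collections import Counter, defaultdict

def jaccard(a: set[str], b: set[str]) -> float:
    """Share of fractions two files have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_fraction_table(samples: list[tuple[set[str], str]]) -> dict[str, Counter]:
    """Count how often each fraction digest occurs in samples carrying each tag."""
    table: dict[str, Counter] = defaultdict(Counter)
    for digests, tag in samples:
        for d in digests:
            table[d][tag] += 1
    return table

def explain(digests: set[str], table: dict[str, Counter]) -> Counter:
    """Tally the dominant tag of every known fraction found in an unknown file."""
    return Counter(table[d].most_common(1)[0][0] for d in digests if d in table)
```

For example, if two `family:X` samples share fractions `f1` and `f2`, an unknown file containing `f1`, `f2`, a fraction
`t1` seen only in an AutoIt sample, and an unseen `z9` is explained as two `family:X` fractions plus one `tool:AutoIt`
fraction. Taking the majority tag is a simplification; fractions common to both benign and malicious files would need
more careful weighting in practice.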
Several possible applications of file-fraction reputation will also be discussed.