Data tainting for malware analysis – part one

2009-09-01

Florent Marceau

CERT-LEXSI, France
Editor: Helen Martin

Abstract

The increased use of virtualization technologies presents new angles for new types of attack. Florent Marceau presents a different view of this technical evolution in order to take advantage of full virtualization with the aim of helping malware researchers.


Malware technologies are becoming increasingly advanced and the use of compression and cryptographic ciphering is common. Flexible design allows for capabilities such as dynamic downloading of configuration files over the network. These practices have increased considerably over the last few years. The aim, among other things, is to make analysis of the malicious file more complex and time-consuming, as well as to hide its presence.

At the same time, the use of virtualization technologies is becoming increasingly common. Desktop virtualization (VMware) and virtual shared hosting (XEN) make great use of hardware-assisted virtualization such as hypervisors in order to improve performance. These kinds of technologies will soon become standard in PCs, and could be applied directly in the BIOS (cf. Phoenix HyperCore).

From a security point of view, these practices generate new angles for new types of attack. Many previous studies have been published on this subject. We have seen, among other things, the use of hypervisors in rootkits/anti-rootkit techniques [1]. In this three-part series we present a different view of this contextual technical evolution in order to take advantage of full virtualization (without a hypervisor) from a security point of view, and more specifically with the aim of helping malware researchers. We will first study the use and advantages of full virtualization, and then we will describe a concrete implementation: a way to dump character strings loaded from the network and manipulated by malware in RAM. The objective is to obtain the malware configuration file in clear text, in order to understand its impact and the risks involved with it.

Introduction and concepts

Full virtualization

Generally speaking, we use a hypervisor to accelerate emulation. When the host system and the emulated guest use the same instruction set (which is usually the case with VMware or XEN, from x86 to x86) a major acceleration can be achieved by executing some part of the guest machine code directly on the host processor. Full emulation, by contrast, is slower because it simulates the guest processor's behaviour for every opcode.

All reverse engineers know how important it is to have good tools. A good user-mode debugger for Windows like OllyDbg is very useful, but is not as powerful as a kernel-mode debugger like SoftICE, which is more complete and allows both kernel- and user-mode debugging. Unfortunately, when working in certain debugging contexts – like working on an MBR (Master Boot Record) rootkit such as BootRoot or Mebroot/MaOS [2] – a kernel-mode debugger will not be sufficient. Most will load too late. Moreover, most kernel debuggers are OS-dependent. In the case of WinDbg, to debug an MBR, we need two machines with a null modem link (which can be achieved with a virtual machine). It became apparent to us that the most generic and efficient debugging platform is not the kernel debugger but the virtual machine itself. The biggest constraint here is the lack of debugging symbols.

The ultimate debugging platform is an in-circuit emulator (ICE). This piece of hardware is clearly the most efficient, but it has disadvantages: it is expensive and it is fully hardware-dependent. Thus, it seems that debugging directly via a virtual machine is the cheapest solution. Cost-free from the hardware point of view, it can be applied to any architecture, provided that you can emulate it. This kind of technique provides debugging capabilities that are sometimes even better than those of an ICE. Indeed, hardware in-circuit emulators rely directly on the architecture’s debugging capabilities. The x86 architecture, for example, has four debug address registers, each of which covers at most a DWORD, so hardware memory watchpoints can monitor a maximum of 4 x 4 = 16 octets of memory; a virtual machine has no such limit. Finally, in a special debugging context such as reversing a BIOS, using a virtual machine is so difficult that the cost of an ICE becomes irrelevant: indeed, BIOS code is the most hardware-dependent code that exists and is difficult to virtualize correctly.

Using full virtualization we can easily modify the internal state of the CPU. It then becomes easy to modify an opcode interpretation, to break the code execution on a chosen mnemonic or to obtain a complete execution flow. Moreover, by modifying the CPU’s Memory Management Unit (MMU), we act directly between the RAM and the CPU; we can then monitor any RAM access arbitrarily. This allows us to create a memory watchpoint on the first two gigabytes of RAM if needed. We refer here to the use of Qemu in full virtualization mode on the Anubis sandbox [5]. Anubis applies instrumentation to the emulated CPU in order to monitor the call and int opcodes to watch all API and system calls emitted from the monitored code. This monitoring technique is hidden since it takes place directly inside the emulated hardware and cannot be detected as it would be with hook-based solutions like Capture HPC or Microsoft Detours.
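To illustrate why an emulated MMU removes the size limit on watchpoints, here is a minimal Python sketch. It is not taken from Qemu or Anubis: the WatchedMemory class and its methods are hypothetical names, and the toy RAM is only 64 KB. Every read and write passes through a single check, so the watched range can be as large as we like.

```python
class WatchedMemory:
    """Toy emulated RAM whose access path checks arbitrary watch ranges."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.watch = []   # list of (start, end) ranges, end exclusive
        self.hits = []    # recorded accesses: (kind, address, length)

    def add_watchpoint(self, start, length):
        self.watch.append((start, start + length))

    def _check(self, kind, addr, length):
        end = addr + length
        for start, stop in self.watch:
            if addr < stop and start < end:   # the two ranges overlap
                self.hits.append((kind, addr, length))
                break

    def write(self, addr, data):
        self._check("write", addr, len(data))
        self.ram[addr:addr + len(data)] = data

    def read(self, addr, length):
        self._check("read", addr, length)
        return bytes(self.ram[addr:addr + length])


mem = WatchedMemory(1 << 16)
mem.add_watchpoint(0, 1 << 15)   # watch the whole lower half of RAM
mem.write(0x0100, b"x")          # inside the watched range
mem.write(0x9000, b"y")          # outside the watched range
mem.read(0x7FFF, 2)              # straddles the end of the watched range
```

A real emulator would perform this check in its memory access path (and would want something faster than a linear scan), but the principle is the same: the watchpoint lives in the emulated hardware, not in the guest.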

A lot of research concerning automatic and generic unpacking methods has been carried out and also uses the full virtualization concept for code instrumentation. It is based on the fact that the intersection between the deobfuscation code and the host code can be singularized by execution of a piece of code that was previously a data zone used by the deobfuscation code. The Pandora’s Bochs [6] and Renovo [7] engines equip the emulator to follow data propagation in order to detect this intersection. Unfortunately, these implementations have detectable parts so they aren’t fully hidden (Renovo in particular uses a kernel module), and generally they use an abstraction level that is too high – the use of virtual addresses, for example, allows evasion (cf. Skape [8]). Note that the need to track and differentiate data pages from code pages is quite similar to what is implemented to emulate the NX (non-executable) bit that is missing on old processors (cf. the PAGEEXEC implementation in PaX [9]). Indeed, we can use the desynchronization between the data TLB and the code TLB (Translation Lookaside Buffers) in order to differentiate data pages from code pages in the page fault handler (Interrupt 14) and eventually detect execution from a data page. This mechanism is used by SAFFRON [10] rather than using a virtual machine. As we’ll see later, our own implementation will use full virtualization to apply instrumentation in order to monitor data flow.

Obviously, this technique isn’t perfect – we’ll see later that it has some constraints due to the nature of full virtualization.

This concludes the theoretical part. Many open source emulators are available; to emulate an x86 platform we can use the Bochs [11] emulator, which provides many instrumentation capabilities but is slower than Qemu. While Qemu is faster, its optimization mechanisms make it quite difficult to instrument.

Context

Nowadays, many pieces of malware have banking credential-stealing capabilities. To this end, they use regular expression keywords for each targeted bank. For flexibility, such malware downloads its configuration files over the network. In this way the configuration can easily be upgraded.

These configuration files are compressed and/or cryptographically ciphered in order to remain hidden from the network flow. Moreover, for flexibility, some malware uses different executable modules for each of its functionalities that can then easily be upgraded through the network.

Our objective here is to automatically process these pieces of software in order to obtain the clear text configuration file, and the process must be independent of the cryptographic cipher or compression algorithm used.

Observation shows that malware will download its ciphered configuration file from the Internet and will then uncompress/decipher it for use. There is necessarily a period of time in which the malware applies a transformation to the ciphered data and then stores it in memory as clear text (depending on the deciphering algorithm). We simply need to dump this data during this brief period of time.

To achieve our goal we need two things:

  • The ability to track the full propagation of the monitored binary code (malware);

  • The ability to dump all data originating from the network and that is manipulated by our tracked malicious binary.

By fulfilling these two conditions, we can force the dump of the clear text configuration file (among other data), and this is the case even if the analysed malware doesn’t keep any instance of its clear text configuration file (for example if it re-encodes or destroys the configuration file after using it). To achieve this, we use data tainting.
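One way to picture how the two conditions combine is the following Python sketch. It is an assumption on our part for illustration, not the implementation described in this series: the tag values, the on_data_read hook and the flat 256-octet RAM are all hypothetical. One tag bit marks octets that belong to the tracked binary, another marks octets that came from the network, and a dump is taken whenever tracked code reads network-tagged data.

```python
TAG_NETWORK = 0x01   # hypothetical tag bit: octet originated from the network
TAG_MALWARE = 0x02   # hypothetical tag bit: octet belongs to the tracked binary

RAM_SIZE = 256
ram = bytearray(RAM_SIZE)
taintmap = bytearray(RAM_SIZE)   # one tag octet per RAM octet
dumps = []                       # data captured for the analyst


def tag(addr, length, bits):
    """Set the given tag bits on a range of RAM octets."""
    for i in range(addr, addr + length):
        taintmap[i] |= bits


def on_data_read(code_addr, data_addr, length):
    """Hypothetical emulator hook, called for each data read.

    code_addr is the address of the instruction performing the read.
    """
    executing_tracked_code = bool(taintmap[code_addr] & TAG_MALWARE)
    touches_network_data = any(
        t & TAG_NETWORK for t in taintmap[data_addr:data_addr + length]
    )
    if executing_tracked_code and touches_network_data:
        dumps.append(bytes(ram[data_addr:data_addr + length]))


ram[0x80:0x88] = b"bank.cfg"
tag(0x80, 8, TAG_NETWORK)      # data received from the network
tag(0x10, 16, TAG_MALWARE)     # code pages of the tracked binary
on_data_read(0x10, 0x80, 8)    # tracked code reads network data: dumped
on_data_read(0x40, 0x80, 8)    # untracked code reads the same data: ignored
```

The point of the sketch is only that the dump decision needs both kinds of tag at once; how each tag is set and propagated is the subject of the following sections.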

Presentation of data tainting

Briefly, data tainting is a mechanism that allows us to track the full propagation of a given set of data on an information system.

Let’s take a simple example of data tainting in RAM. For a memory zone named A of x tainted octets (to be tracked), a simple memcpy of x octets from zone A to zone B means that zone B will be marked as tainted too. A simple implementation is to use a RAM mirror called a taintmap, which contains for each RAM octet a ‘tag’ octet that keeps the tainting information. In the previous example, during the memcpy from zone A to zone B in RAM, there will be a similar memcpy on the taintmap from the tainted information corresponding to zone A (of x octets) to the corresponding zone B.
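The memcpy example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an emulator: the TaintedRAM class and its method names are ours, and a tag value of 0 is assumed to mean 'clean'.

```python
class TaintedRAM:
    """Toy RAM with a parallel taintmap: one tag octet per RAM octet."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.taintmap = bytearray(size)   # 0 = clean, non-zero = tainted

    def taint(self, addr, length, tag=1):
        self.taintmap[addr:addr + length] = bytes([tag]) * length

    def memcpy(self, dst, src, length):
        # Copy the data, then mirror exactly the same copy on the taintmap.
        self.ram[dst:dst + length] = self.ram[src:src + length]
        self.taintmap[dst:dst + length] = self.taintmap[src:src + length]

    def is_tainted(self, addr, length):
        return any(self.taintmap[addr:addr + length])


mem = TaintedRAM(64)
mem.ram[0:4] = b"ABCD"    # zone A, x = 4 octets
mem.taint(0, 4)           # zone A is to be tracked
mem.memcpy(16, 0, 4)      # copy zone A to zone B
```

After the copy, zone B (octets 16 to 19) carries the same tags as zone A, while the surrounding octets stay clean.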

Let’s examine a more concrete scenario. We want to track the propagation of data originating from the network (the classic case of a downloader that loads its payload onto the hard drive before executing it). This data arrives from the network and is stored in the network card cache. The kernel will load this data via IO or via the DMA, and copy it into the user-mode buffer of the application that requested this network resource. Finally, our application will call the kernel again in order to create a file to store this data.

In such a scenario, since we need to track all the incoming network data without filtering, we simply have to hook the emulator part that handles the network card cache in order to mark all incoming data as tainted as it is loaded into RAM (via the IO or the DMA). Our tainted data in RAM will then be propagated through the taintmap during all the processing that the data goes through. When the data is copied to the user-mode buffer it will retain its taint marks.
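As a rough Python sketch of this ingress hook: the function names nic_dma_write and copy_to_user are hypothetical stand-ins for the emulator's network card model and for the guest kernel's copy, and the RAM is a flat 4 KB array. Every octet that the emulated card writes into RAM is tagged, and the later copy carries the tag along.

```python
RAM_SIZE = 4096
TAG_NETWORK = 0x01                 # hypothetical tag value

ram = bytearray(RAM_SIZE)
taintmap = bytearray(RAM_SIZE)     # one tag octet per RAM octet


def nic_dma_write(guest_addr, payload):
    """Hypothetical hook: the emulated network card writes a frame to RAM."""
    end = guest_addr + len(payload)
    if end > RAM_SIZE:
        raise ValueError("write outside emulated RAM")
    ram[guest_addr:end] = payload
    # No filtering: every incoming octet is marked as tainted.
    taintmap[guest_addr:end] = bytes([TAG_NETWORK]) * len(payload)


def copy_to_user(dst, src, length):
    """Stand-in for the kernel copying the data to a user-mode buffer."""
    ram[dst:dst + length] = ram[src:src + length]
    taintmap[dst:dst + length] = taintmap[src:src + length]


nic_dma_write(0x100, b"GET /cfg")   # data lands in a kernel buffer
copy_to_user(0x800, 0x100, 8)       # and is handed to the application
```

Because the tagging happens in the emulated hardware, nothing inside the guest needs to be modified or hooked.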

Note that we work here at the hardware level, and are therefore OS-independent. This means that when the OS frees a heap buffer that contains some tainted data, the data will continue to be resident in RAM and consequently the tag will persist. It is only when the buffer is reallocated and the data is overwritten that the taint tag is overwritten too, with the tag values of the new data.

A problem may appear when malware stores information on the hard drive. This requires that we extend the data tainting mechanism and propagate tainting information through the hard drive. The mechanism is exactly the same as for the network card cache: we only have to propagate tainting information (tags) for each exchange between the RAM and the hard drive via the IO or the DMA. Obviously, unless a very low capacity drive is used, we can’t mirror the hard drive as we did for the RAM. In normal conditions, there is a low volume of tainted data compared to the drive size. Moreover, if this data is stored on the hard drive before any arithmetic processing, there is a low risk of loss of these tainted marks (more details on this later). We can then consider this data as mostly contiguous. From this observation we decided to store the hard drive tainting mark as a table of offset and size, using an offset similar to the LBA (Logical Block Addressing).
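A table of offsets and sizes can be sketched as follows in Python. The DiskTaintTable class is a hypothetical illustration, not our actual data structure; offsets and sizes are plain integers in whatever unit the emulated disk layer uses. Adjacent or overlapping writes are merged so that mostly contiguous tainted data stays a short list.

```python
class DiskTaintTable:
    """Tainted regions of a drive stored as sorted (offset, size) extents."""

    def __init__(self):
        self.extents = []   # non-overlapping, sorted by offset

    def mark(self, offset, size):
        """Record a tainted write, merging with touching extents."""
        start, end = offset, offset + size
        kept = []
        for o, s in self.extents:
            if o + s < start or o > end:
                kept.append((o, s))        # disjoint: keep unchanged
            else:
                start = min(start, o)      # overlapping or adjacent: absorb
                end = max(end, o + s)
        kept.append((start, end - start))
        kept.sort()
        self.extents = kept

    def tainted_ranges(self, offset, size):
        """Return the tainted sub-ranges inside [offset, offset + size)."""
        lo, hi = offset, offset + size
        found = []
        for o, s in self.extents:
            a, b = max(lo, o), min(hi, o + s)
            if a < b:
                found.append((a, b - a))
        return found


table = DiskTaintTable()
table.mark(1000, 512)
table.mark(1512, 512)     # adjacent to the first write: merged with it
table.mark(5000, 10)      # separate extent
```

When data is read back from the drive into RAM, tainted_ranges tells us which octets of the destination buffer should receive a tag in the taintmap. A fuller version would also need to remove extents when untainted data overwrites them, which this sketch leaves out.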

Let’s examine the data tainting internals in more detail.

For previous examples we just used simple hooks on different data channels (IO/DMA) in order to propagate tainted data, but the RAM propagation mechanism is more complex. The simple memcpy of tainted data can itself take several different forms.

A memcpy implemented with a simple repz movsd will be different from a memory loading and storing via the register repeated on a loop. Indeed, the second case also implies a register-level propagation (anyway, registers can easily be mirrored). But the problem is really more complex. Indeed, in many cases we do not simply move data from one place in RAM to another; the data will be loaded, go through a lot of arithmetical processing and comparison before being stored in RAM.
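The register-mediated copy can be modelled as below. This is a deliberately tiny Python sketch with hypothetical helper names (load8, store8) and a single register; it only shows that mirroring the registers lets the tag follow the data through a load/store loop.

```python
ram = bytearray(32)
ram_taint = bytearray(32)   # taintmap for RAM
regs = {"eax": 0}
reg_taint = {"eax": 0}      # mirror of the register file


def load8(reg, addr):
    """Load one octet from RAM: the register inherits the octet's tag."""
    regs[reg] = ram[addr]
    reg_taint[reg] = ram_taint[addr]


def store8(addr, reg):
    """Store one octet to RAM: the octet inherits the register's tag."""
    ram[addr] = regs[reg] & 0xFF
    ram_taint[addr] = reg_taint[reg]


def copy_via_register(dst, src, length):
    """A memcpy written as a load/store loop instead of a string move."""
    for i in range(length):
        load8("eax", src + i)
        store8(dst + i, "eax")


ram[0:4] = b"conf"
ram_taint[0:4] = b"\x01" * 4
copy_via_register(8, 0, 4)
```

Without the reg_taint mirror, the tag would be lost the moment the data entered the register, and the destination would look clean.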

Let’s consider an example where we use data tainting to track the propagation of packed binary code that injects itself into other processes. The code must keep its tag even after the unpacking operation so that we can continue to monitor its propagation.

Therefore, we have a binary mapped image in RAM that is tagged; this image will read itself as a data sequence and decipher it to generate the unpacked code. Since many packers use several cryptographic layers, the tainting mark can get lost in the heavy arithmetic process involved. Indeed, while it is logical to say that during the execution of a mnemonic ‘add REG, IMM’, REG will keep its tainted tag, what would happen during a bit permutation? It is in those kinds of cases that the propagation becomes increasingly complex. As you can see, the tainted tag propagation between the RAM and CPU requires instrumentation of each virtual CPU mnemonic to identify the potential propagation of the tag for a given mnemonic. The two most common open source data tainting implementations are Taint Bochs [12] for Bochs and Argos [13] for Qemu.
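A per-mnemonic propagation policy might look like the following Python sketch. The three rules shown are one plausible choice for illustration, not the policy of Taint Bochs or Argos: an immediate load clears the tag, 'add REG, IMM' leaves it untouched, and a two-register operation merges the tags of its operands. Harder cases, such as bit permutations, are deliberately left out.

```python
MASK32 = 0xFFFFFFFF
regs = {"eax": 0, "ebx": 0}
reg_taint = {"eax": 0, "ebx": 0}   # one tag (bit mask) per register


def mov_imm(dst, imm):
    """mov REG, IMM: a constant replaces the value, so the tag is cleared."""
    regs[dst] = imm & MASK32
    reg_taint[dst] = 0


def add_imm(dst, imm):
    """add REG, IMM: the result still derives from REG, so REG keeps its tag."""
    regs[dst] = (regs[dst] + imm) & MASK32


def xor_reg(dst, src):
    """xor REG, REG: merge the operands' tags; xor with itself yields 0."""
    regs[dst] ^= regs[src]
    if dst == src:
        reg_taint[dst] = 0          # the result is a constant, not derived data
    else:
        reg_taint[dst] |= reg_taint[src]


mov_imm("eax", 5)
reg_taint["eax"] = 0x01             # pretend eax was loaded from tainted RAM
add_imm("eax", 3)                   # eax = 8, still tainted
mov_imm("ebx", 0x10)
xor_reg("ebx", "eax")               # ebx = 0x18, now tainted as well
xor_reg("eax", "eax")               # eax = 0, tag cleared
```

Every mnemonic of the virtual CPU needs a rule of this kind, which is why the instrumentation effort grows with the instruction set.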

In the next part of this series (next month) we will study the inherent limitations of an efficient propagation and the overall limitations of this type of solution.

Bibliography

[1] Rutkowska, J. Subverting Vista Kernel for Fun and Profit. http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Rutkowska.pdf.

[2] Florio, E.; Kasslin, K. Your Computer is Now Stoned (...Again!) The Rise of MBR Rootkits. http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/your_computer_is_now_stoned.pdf.

[4] Arium ECM-700 JTAG Emulator. http://www.arium.com/product/?prod_id=56.

[5] Bayer, U.; Kruegel, C.; Kirda, E. TTAnalyze: A tool for analyzing malware.

[6] Bohne, L. Pandora’s Bochs: automatic unpacking of malware. http://www.damogran.de/PandorasBochs.pdf.

[7] Kang, M.G.; Poosankam, P.; Yin, H. Renovo: A hidden code extractor for packed executables. Proceedings of the 5th ACM Workshop on Recurring Malcode (WORM 07), October 2007.

[8] Skape. Using dual-mappings to evade automated unpackers. http://uninformed.org/?v=10&a=1.

[9] PaX Team. Design of PAGEEXEC. http://pax.grsecurity.net/docs/pageexec.txt.

[10] Quist, D. Covert debugging circumventing software armoring techniques. http://www.offensivecomputing.net/bhusa2007/dquist-valsmith-covert-debugging-paper.pdf.

[11] Bochs IA-32 Emulator. http://bochs.sourceforge.net/.

[12] Taint Bochs Understanding Data Lifetime via Whole System Simulation. http://www.stanford.edu/~blp/papers/taint.pdf.

[13] Argos: An emulator for capturing zero-day attacks. http://www.few.vu.nl/argos/.
