The clean theory

2013-08-01

Mircea Ciubotariu

Symantec Security Response, USA
Editor: Helen Martin

Abstract

Can the principles of logic be applied to the daily task of file analysis? Mircea Ciubotariu gives it a go.


Drawing analogies with the principles of logic, this article looks at the processes that Security Response Engineers (SREs) employ on a daily basis to make decisions about incoming files, and anticipates the future shape of the industry based on them.

Old rules

An old proverb goes something like this: when one adds a pint of clean water to a barrel of sewer water one gets a barrel of sewer water, and when one adds a pint of sewer water to a barrel of clean water one gets… a new barrel of sewer water.

Considering clean water as a logically true statement and sewer water as a false one, the proverb expresses a long-known principle of logic: adding a true statement (the pint of clean water) to several ‘&’ (AND operator) chained false ones (the barrel of sewer water) results in an overall false statement, just as adding a false statement to several ‘&’ chained true ones also results in an overall false statement.

One of the main duties of SREs is to determine whether a given file may pose a threat to the environment in which it would be deployed and to take necessary steps to prevent such a threat from materializing. To do this, the SRE needs to look within the file for specific sequences of code or commands that may perform unwanted or malicious actions in the deployment environment.

Such a file subjected to analysis may be expressed as a (long) logical sequence similar to this one:

S = P1 & P2 & P3 & … & Pi & … & Pn

where each Pi represents a fundamental block in the file performing one atomic action (a statement, in the logic analogy). (Note that in reality the above expression may also contain other logical operators, such as ‘IF/THEN’; nonetheless, in order to evaluate the whole file one must evaluate each individual Pi.)

We refer here to file blocks in a general manner. In fact, the blocks have different representations for different file types – for example, a block in a script file is an atomic command executed by the script interpreter in one step, while for a native executable the block may be regarded as a basic code block, that is, a block of instructions with a single entry point and a single exit point.

Since it only takes one false statement in the list to reach an overall false statement, as soon as one of the blocks is deemed to pose a threat, for the purpose of protection the analysis can stop and the file can be considered a threat – with a detection signature required.
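The AND-chain model and the early stop described above can be sketched as follows (a minimal illustration; the block verdicts are assumed inputs, not a real extraction method):

```python
# Hypothetical sketch: True means a block is clean, False means it is
# malicious. Python's all() short-circuits at the first False, mirroring
# the SRE's early stop at the first bad block.

def is_clean(block_verdicts):
    """S = P1 & P2 & ... & Pn: clean only if every block is clean."""
    return all(block_verdicts)

# One malicious block among many clean ones makes the whole file a threat:
print(is_clean([True] * 1499 + [False]))  # False -> signature required
print(is_clean([True] * 1500))            # True  -> clean file
```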

To give a sense of the magnitude of the task, let’s consider a simple application such as Notepad, which has roughly 1,500 blocks. If an attacker were to insert a few additional malicious blocks at a random location, it would be very difficult to spot them among the 1,500 clean blocks – it’s very much like looking for a needle in a haystack.

In many cases where the whole file has been created with malicious intent (such as most trojans), the threat can be spotted more easily, since elements of the threat can be found all over the place – for example, obfuscation and polymorphism are good indicators that there is something undesirable about the file. Currently, roughly three in four files that Symantec receives for analysis end up being assigned a signature for detection.

When detailed information is needed for documenting the actions of a threat, deep analysis must be performed on the whole file, which means having to go through almost all of its blocks, regardless of whether they are good or malicious, in order to fill in all the pieces of the puzzle. For example, in the case of Stuxnet – one of the most complex threats seen to date – it took a team of three senior engineers more than four months to go through its roughly 12,000 blocks of code.

Add to that obfuscation or polymorphism, which make analysis more difficult by making the blocks look different every time, and you get a picture of the amount of work needed for an SRE to make a determination.

There are several automation tools that can be used to accelerate the processing of information, such as those that identify library code (which is reused in many binaries and already considered clean), or those that find the original clean file and compare the new file against it. But in the end, a large number of the blocks need to be inspected manually. The rule of thumb is that making a determination takes an amount of time and effort directly proportional to the amount of information contained in the file, which in turn is usually related to its size.
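One such automation can be sketched as a hash lookup against a set of known-clean blocks (the block contents and the whitelist here are purely illustrative placeholders):

```python
import hashlib

# Known-clean block hashes (e.g. library code seen in many binaries);
# the entries here are illustrative placeholders, not real signatures.
KNOWN_CLEAN = {
    hashlib.sha256(b"library block A").hexdigest(),
    hashlib.sha256(b"library block B").hexdigest(),
}

def blocks_needing_review(blocks):
    """Filter out blocks already known clean, leaving the rest for manual inspection."""
    return [b for b in blocks
            if hashlib.sha256(b).hexdigest() not in KNOWN_CLEAN]

sample = [b"library block A", b"unknown block", b"library block B"]
print(blocks_needing_review(sample))  # [b'unknown block']
```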

The SRE can also make use of other specialized tools before diving into deep analysis – such as behaviour examiners, where a determination is made based on the actions performed by a file when deployed in a given environment – but that’s another story.

Intent

Some legitimate tools, such as the system tool cmd.exe (Command Prompt), have at least one block of code that deletes multiple files, and can do so even without user interaction. This behaviour on its own is regarded as malicious and if found by itself in a standalone application, the application would be classed as malicious. But cmd.exe and the like are in fact clean files, so how does that work?

The analogy of true/false logic statements with clean/bad files works here too: the destructive code only triggers when a specific parameter is given to cmd.exe and the interaction is suppressed by another external parameter. Basically, cmd.exe performs something like this:

If the ‘delete’ command is present in the command line, then delete specified files.

If the ‘silent’ parameter is present, then suppress prompting.

Each of the two statements can be expressed as:

S = if P then Q

The following applies equally to both statements:

When P is false, S is true, or clean: when the ‘delete’ command is not present in the command line (P = false), cmd.exe deletes no files, which means a clean run; likewise, if the ‘silent’ parameter is omitted, each command (if there are any) will prompt for confirmation.

When P is true, Q will be evaluated; since Q is true, S is true as well. In the same manner, when told to delete files, cmd.exe acts in a legitimate fashion as part of a larger scheme, but on its own it is just a clean tool. It’s similar to a knife, which can be used either to help in the kitchen or for criminal activities.
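The evaluation of S = if P then Q follows material implication, which a short truth-table sketch makes concrete (P and Q here are the article’s abstractions, not actual cmd.exe switches):

```python
def implies(p, q):
    """Material implication: 'if P then Q' is false only when P is true and Q is false."""
    return (not p) or q

# P = 'delete' requested, Q = deletion performed as commanded.
print(implies(False, True))   # True: nothing requested, clean run
print(implies(False, False))  # True: nothing requested, nothing done
print(implies(True, True))    # True: deletion requested and legitimately performed
print(implies(True, False))   # False: the only combination that falsifies S
```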

All this brings us to another important factor in classifying files whose behaviour is conditioned by external interaction: what is the purpose of commanding such files to perform their various actions (legitimate or malicious); in other words, what is the intent behind using them?

Many modern threats and/or attacks involve several modules that interact with each other. While most of the modules are specifically created with malicious intent, making them easier to classify, some of the modules may in fact be legitimate tools – as in the case of NetCat, a command-line tool commonly used by network administrators for advanced network connections.

Despite the fact that NetCat is a clean tool, due to its common occurrence in hacking attacks, where it is mostly used for initiating backdoor connections, it has been deemed a security risk and is therefore reported as a ‘security assessment tool’ (giving the user the option to ignore its detection).

Trust

Other factors that play an increasingly important role in the process are the original source of the files, and their popularity.

In general, legitimate companies produce high-quality content that fits certain patterns of quality control, where integrity information such as digital signatures and version information is always present. Such information can be used to track the file to its creator and is often an indication that the file is trustworthy and may therefore be deemed ‘clean’ (because, unlike threat authors, legitimate companies have a reputation to maintain).

As history shows, there are cases where big companies have crossed the line, as in the Sony BMG copy protection rootkit scandal in 2005, or where legitimate signing certificates have been used in the creation of various threats. When a certificate is found to have been used in signing any threat it is revoked, and as a result, any other files signed with it, including any legitimate ones, are deemed untrustworthy.

The bottom line is that files produced by large, well-known companies may, within certain limits of certitude, be assumed clean without going through the whole analysis process, unless there is a good reason to do so (such as an observed side effect or a suspicious action performed by the file).

Trust can also be applied in flagging a file, since most clean files tend to be easy to analyse, while 83% of the threat files observed today use at least one packer. If a file claims to come from a trustworthy source but has signs of obfuscation, a mismatching digital signature, or appears to be packed with a custom packer, there is a more than 95% chance that it will pose a security threat.
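As an illustration only (the signals and the decision rule below are assumptions made for the sketch, not Symantec’s actual criteria), such a trust check might be expressed as:

```python
def looks_suspicious(claims_trusted_source, is_obfuscated,
                     signature_valid, uses_custom_packer):
    """Flag files whose trust claims contradict their observable traits.

    Illustrative heuristic: a file claiming a trustworthy origin while
    showing obfuscation, a mismatching signature or custom packing is a
    strong candidate for deeper analysis.
    """
    red_flags = is_obfuscated or (not signature_valid) or uses_custom_packer
    return claims_trusted_source and red_flags

print(looks_suspicious(True, True, True, False))   # True: trusted claim + obfuscation
print(looks_suspicious(True, False, True, False))  # False: claim consistent with traits
```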

Summary

Logic states that a truth can only imply a truth; in a similar manner, a clean file must be ‘clean’ on all levels: it must come from a known, reputable entity, it must serve a well defined, ‘good’ purpose and it must be made up only of clean blocks. The analysis techniques currently used in file determination are relatively slow, and the number of files needing to be processed daily is increasing rapidly – vendors need to look into new ways of dealing with threats where less of the classical per-file in-depth analysis is performed, with more emphasis placed on the trust/intent determination. The game must move to the next level.
