Used in Computer Viruses
Part 2Copyright (C) 05/1995 by Howard Fuhs
In the first article the existing generators were described. The subject of this article will be a number of technical aspects concerning polymorhic viruses and polymorphic generators.
The invention of polymorphic viruses served exclusively the purpose of preventing or at least making it considerably harder to detect the viruses with the means at hand, i.e. string scanners. It became necessary to develop new methods and to implove existing methods in order to reliably be able to detect the polymorphic viruses.
Initially the heuristic methods assumed that a virus must always contain a certain program structure, no matter how it got changed and encrypted. Above all, this was the case with the decryption routine that all polymorphic viruses must contain in order to be able to turn themselves back into executable code. The heuristic method implies searching executable files for such program structures, typical of polymorphic viruses. This way it is possible to estimate the possibility of the file actually being infected by a polymorphic virus. When a certain level of probability is reached the anti-virus program reports the file in question to be suspicious or infected. Because the heuristic methods to a certain extent build on the asumption of probabilities, the risk of a false alarm, or a false positive, (the technical term normally used), is also comparatively high. This probability increases with the complexity of the polymorphic generator that the heuristic method must detect. A very high number of mutations requires a highly general en/decryption routine to be used, and this in turn increases the probability of encountering the same program code in an uninfected file.
For this reason purely heuristic scanners should only be used by trained persons with considerable knowledge about viruses and their behaviour.
Viruses programmed in High-Level Languages (HLL viruses), which cannot be found by means of heuristics, present even more problems. Because these are programmed in a high-level language they do not necessarily display any virus-characteristics detectable by a heuristic method. The HLL viruses are quite rare, because they are very large and contain a low degree of functionality in relation to their program size.
It is not possible to recognise all mutating viruses or all polymorphic encryption engines with one single heuristic detection routine. In general it is necessary to write a specific detection algorithm for each individual polymorphic virus or each individual polymorphic generator, which take the characteristics of the particular engine into consideration. It is necessary to analyse the virus or encryption engine carefully and completely to be able to accomplish this. In case something is overlooked in the analysis, the consequence can easily be that the polymorphic engine is allowed to propagate a virus under certain circumstances, without this being detected by the heuristic detector. When evaluating the ability of an anti-virus program to detect a particular polymorphic virus it is thus insufficient to see if it is able to detect just a few instances of the virus. It is normally necessary to generate several thousand instances of a particular polymorphic virus to evaluate the realiability of an anti-virus product when it comes to finding the mutating viruses. Just one instance gone undetected will abow the virus to reinfect the system all over again after all the detected instances have been removed.
In this method the anti-virus program first attempts to ascertain whether the file to be examined shows any signs of being encrypted. If this seems to be the case an attempt is made using generic decryption methods, to remove the encryption and search the decrypted program for viruses using scan-strings (or virus signatures). Because all polymorphic engines do not allow decryption by means of generic methods, it is most often necessary for the anti-virus programmer to analyse the virus in question. This method also fails to detect future viruses made with the same polymorphic generator because a signature is characteristic of the particular virus, not necessarfly of the polymorphic generator, itself.
Checksum programs calculate a so-called checksum across a file and stores this information in a database. A checksum is a number with the property that it changes in case anything in the file changes, i.e. changing just a single bit in the file will make the value of the checksum change. Checksums are calculated using algorithms having a very small probability of producing the same value of the checksum for two different files. For information security purposes checksumming entails calculating check-sums for all executable files in a system (assuming that these are not supposed to change) and storing this information in a database. This is performed on a regular basis and the results compared with the values contained in the database. This way even minor changes are detected, and thus a virus will be detected soon after infection, polymorphic or not.
The downside to checksumming is that although program changes are readily detectable by means of this method, it is unable to tell whether the changes are caused by a virus, or by a program changing itself quite legitimately (normally considered to be bad programming practice). Furthermore, the checksummer will not be able to identify a virus on its own. To cope with these difficulties a combination of methods must be used in a good anti-virus system.
Checksumming programs are also vulnerable to attacks by stealth viruses, retro-viruses and fast infector viruses. If a stealth virus is present in the computer memory while the checksummer is run, it will normally prove unable to detect changes caused by the virus, because the virus removes the infection before passing the file under examination to the checksummer, and reinfects afterwards. Thus the checksummer is lured into using false data for its calculation. If the virus is furthermore of the fast infector type, all files will have been infected in one swoop after running the checksummer just once. In case the database containing the checksumming information is stored on an accessible hard-disk, a retro-virus may attack it, either by simply deleting it so that the checksummer needs to calculate a new database (containing values based on the infected files) - or by changing its contents so that the new values correspond to files infected by the virus.
Irrespective of which method is used to detect polymorphic viruses, each has its advantages and disadvantages.
No matter how intelligently and securely a heuristic subroutine is designed and programmed, the risk of it producing a false positive cannot be ruled out entirely. Nevertheless heuristic search methods offer advantages. If a polymorphic generator is sufficiently well researched and a matching heuristic method carefully implemented it is safe to assume that all viruses containing this particular engine will be detected reliably.
Using decryption it is very unlikely that false positives are encountered, and for this reason only these polymorphic viruses can be detected, which have been thoroughly analysed and integrated into the decrypting routine. Using this method it is possible to identify each variant of the virus exactly, which is important in case disinfection is attempted. Using a heuristic method this is not possible.
Polymorphic viruses are easily detected by means of checksummers, as long as they do not use stealth techniques to protect themselves against checksummers and similar programs. Checksummers are able to detect the presence of new viruses, which have not yet been analysed.
Using the MutaGen Engine as an example it will be shown how a polymorphic engine is included in a virus. MutaGen is quite simple to use, because it only requires a few parameters to set it up. To implement it the following line must be present in the start of the virus code:
- extm -MUTAGEN:near
MutaGen requires the following parameters:
- DS:DI = Address of the code to be encrypted
- ES:DI = The area taken up by the encrypted code. Sufficient space must be allocated for the decryption module,which has a length of 40 - 180 bytes.
- CX Total size of the code.
- DX Memory offset to the encryption module.
After MutaGen has finished its work, CX contains the length of the encrypted code with decryption key, and DS:DX points to the encoded program.
In order to protect a virus efficiently against being discovered, a polymorphic engine must contain a number of functions:
- 1. The proper virus Code must be encrypted.
- 2. The encryption must be performed using different keys or even algorithms, so that no encrypted instance looks like any other instance.
- 3. The decryption routine must be changed with each new encryption, so that while maintaining its basic functionality, it is considerably different from instance to instance.
If the conditions 2 and 3 above were not included it would be possible to detect the virus by means of a search string.
A virus containing a polymorphic routine is illustrated in Figure 1.
The virus contains a decryption routine enabling it to turn itself back into executable code, and the polymorphic engine is embedded inside the encrypted part of the virus.
Figure 2 illustrates how an infected file functions.
When the file is executed the following sequence of events take place:
The JMP command in the start of the file immediately jumps to the decryption routine, which decrypts the virus, including the polymorphic engine. Once the virus code is decrypted it can be executed in the computer like any other program. When the virus has finished doing whatever it is programmed to do it jumps to the start of the original code in the infected file, and the program expected by the user starts.
This also illustrates the method used by a virus to infect a file. It starts by writing a jump command in the beginning of the infected file. In simple cases this points to a spot right behind the end of the file to be infected. Next step is to ask the polymorphic engine to generate a decryption routine and a corresponding key. An unencrypted version of the virus code, including the polymorphic engine, is then set up in memory. It is encrypted with the newly generated key, and attached to the decryption routine. These components are appended to the file under infection. Finally, a jump command pointing to the start of the original file is appended, and the infection is complete.
The explanations presented here are simplified and vary from one polymorphic engine to the next.
The Polymorphic Encryption Routines encrypt the virus code using one or several encryption algorithms. The goal is not only to prevent detection of the virus, but also to be able to produce as many differently encrypted virus versions (instances) as possible using as little code as possible. A number of different variables are required to generate different encryptions. As variables to generate different mutations you can use e.g. the date, the time, the size of the file to be infected, or any other value that varies sufficiently unpredictably to ensure few or no identical encrypted instances. One consequence of the choice of variables can be that a virus stops to multiply within a given environment after a while, because no more values of the chosen variables are suitable for use in the encryption routine.
Encryption of the virus code is not sufficient to make the virus invisible for a string scanner. The weakest link in a polymorphic engine is its decryption routine. Whereas the virus itself is only converted to executable Code when it is decrypted, the decryption routine must be able to run on the computer, and thus cannot be encrypted. In case the same decryption routine is used in all instances of the virus, a string scanner is able to look for that, thus detecting the virus.
In order to avoid this the instructions in the decryption routine are often intersperced with meaningless pieces of code or instructions, e.g. NOP or MOV BX:BX, which are legal but accomplish nothing. By inserting these fill sequences various places in the code, it is changed sufficiently not to be detectable with a simple string scanner any more. In order to be able to generate code including the meaningless instructions, many polymorphic generators include a Junk Code Generator, which has the purpose of changing the decryption routine code in a manner that is legal in terms of being able to be executed by the host computer, and without changing the functionality of the routine, but so that as many different instances of the routine are generated as possible. It is basically a pseudo-random generator that has at its disposal a number of different instructions or instruction sequences that it can insert at random in the decryptor routine code without changing its functionality. Apart from the junk Code Generator a number of other possibilities are exploited in polymorphic engines to attempt to protect decryption routines from detection.
Depending on how a polymorphic engine is programmed it can contain more or less complex encryption routines.
The simplest case is the use of a fixed number of always identical encryption routines. This method is not very efficient, because it is only necessary to include a number of string sequences in an ordinary string scanner in order to detect a virus of this type reliably.
The subsequent step in the development was using the same set of instructions, but always changing which registers they used, thus increasing the number of virus signatures necessary to obtain reliable detection with a string scanner, considerably.
The next step was to use a number of exchangeable pieces of code in the encryption routine, so that a number of code sequences are usable in different positions within the decryptor code, with the same effect.
The advanced polymorphic engines use a combination of the methods described.