What is Unicode?
Unicode character sets are used throughout Windows systems, largely to make it easier to present the same information (warning messages, alerts, notices, etc.) in different languages. Windows applications, including the Windows Explorer shell, understand Unicode character sets and control characters, and know how to present them to the user. This functionality can also be subverted for malicious purposes in order to hide the presence of malware, often in plain sight.
According to Microsoft, Unicode is a “world-wide character encoding standard” that “simplifies software localization and improves multilingual text processing”. The various available encodings allow for the use of a number of scripts and languages, as well as the presentation of scientific and technical symbols. The use of Unicode encoding provides a great deal of functionality on Windows systems. As with many other instances within the information security arena, valid and needed functionality within products can be, and has been, subverted and abused by those with malicious intent.
The key to subverting a system and remaining hidden through the use of Unicode character encoding is that computers “see” things in terms of 1s and 0s and do not make decisions regarding the inherent value of information they display to the user. This trait allows a cyber attacker to manipulate what the user (a corporate employee, home user, member of the IT staff, etc.) sees, thereby directing their decision-making process. If casual observation reveals that nothing is amiss, most users (and some analysts) will simply continue to use the system and look no further.
Malware in Unicode through Character Replacement
Within the Unicode character space there are a number of characters that visually look the same when displayed to the user via Windows Explorer, although on a binary level their encoding is different.
Microsoft Windows systems utilize a file named “hosts” as one of the initial resources for name resolution, that is, translating the name of a destination system on the network to an IP address. The use of this file can be beneficial from a networking perspective, but it can also be used to subvert the system. For example, if you’re a parent and do not want your middle-schooler to visit certain websites, one way to prevent that from happening is to add entries in the hosts file that instruct the operating system that the IP address of the website is 127.0.0.1, or “localhost”. The web browser attempts to connect back to the student’s system when requesting pages from the site, thereby disabling access to the site.
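The effect of such entries can be illustrated with a minimal Python sketch that reads hosts-style text and reports names mapped to a loopback address. The function name, the sample content, and the example.com host names are hypothetical and chosen only for illustration.

```python
import ipaddress

def loopback_entries(hosts_text):
    """Return (address, hostname) pairs that map a name to a loopback address."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no host name
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if address.is_loopback:
            for name in fields[1:]:
                if name.lower() != "localhost":
                    found.append((fields[0], name))
    return found

# Hypothetical hosts content used only for this example
SAMPLE = """\
# sample hosts content
127.0.0.1   localhost
127.0.0.1   update.example.com   # redirected to the local system
192.0.2.10  intranet.example.com
"""

print(loopback_entries(SAMPLE))
```

The same check applies whether the redirect was added deliberately by a parent or silently by malware; only the intent differs.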
CTU analysts have observed malware copying the hosts file to another file in the same directory, changing only the “o” in the name to another Unicode character that, when displayed by Windows Explorer, looks exactly like an “o”. The malware then modifies the original hosts file, often redirecting operating system and anti-virus application updates to either the local host (effectively disabling the protections offered by the updates) or to a malicious site. It then sets attributes on the modified file that make it difficult to view when the directory is displayed in Windows Explorer. From a visual perspective, on a live, running system, it looks as if there is only one file named “hosts” in the directory. However, the operating system sees two different files, because the names are different at a level where most users do not have visibility. The letter chosen for replacement is irrelevant; as long as a visually identical character exists elsewhere in the Unicode character space, any character (or combination of characters) can be replaced in this manner.
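A short Python sketch shows how two names that look alike differ at the code point level. The replacement character used by the observed malware is not identified here, so the Cyrillic small letter “о” (U+043E) below is an assumption made for illustration, and the helper name is hypothetical.

```python
import unicodedata

def describe_non_ascii(name):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(name)
        if ord(ch) > 0x7F
    ]

genuine = "hosts"
# Assumption: Cyrillic small letter o stands in for the Latin "o"
lookalike = "h" + "\u043E" + "sts"

print(genuine == lookalike)           # the two names are not equal
print(describe_non_ascii(genuine))    # no non-ASCII characters
print(describe_non_ascii(lookalike))  # reports the substituted character
```

Listing a directory and flagging any name in which this check returns a result is one simple way to surface look-alike files that Windows Explorer renders identically.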
Malware in Unicode through RLO Control Character
The “right-to-left override” (RLO) Unicode control character can be used to great effect from a malicious perspective. This control character instructs the display mechanism (Windows Explorer, the Registry Editor, etc.) to display the characters after the control character in reverse order.
Using the previous example of the hosts file, adding an RLO control character adds a character to the name; even though the user would see five characters, a sixth one is present, so the operating system would deem this to be a different file. Adding the RLO control character to the beginning of the name causes the file name to be displayed as “stsoh”, which would be easily recognized as incorrect. However, adding the RLO control character after the “o” would result in the name being displayed as “hosts”, which would not appear to be unusual to a casual user. But again, the operating system would see this as a completely different file name.
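The two placements described above can be approximated in Python. This sketch is a deliberate simplification: it reverses everything after the first RLO character and treats the control character as invisible, which is not the full Unicode Bidirectional Algorithm that a real display component applies. The function name is hypothetical.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE

def approximate_display(name):
    """Approximate how a viewer that honors RLO would draw the name.

    Simplification: text after the first RLO is reversed and the control
    character itself is not drawn.
    """
    head, sep, tail = name.partition(RLO)
    if not sep:
        return name  # no RLO present; shown as stored
    return head + tail.replace(RLO, "")[::-1]

for raw in (RLO + "hosts", "ho" + RLO + "sts"):
    # ascii() exposes the hidden control character as an escape sequence
    print(len(raw), ascii(raw), "->", approximate_display(raw))
```

The second name displays as “hosts” because “sts” reads the same in both directions, yet the stored name is six characters long and does not compare equal to the five-character original.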
The same use of the RLO control character works for other strings as well, in particular Windows Registry key and value names. These strings are stored in their appropriate structures as ASCII strings or, when they contain characters outside that range, as Unicode strings, and the tools used to view them, such as the Registry Editor, are capable of handling Unicode strings. For example, an intruder with the appropriate privileges can create a Windows service using the name of a legitimate service, but reversing it. Creating a service name that starts with the RLO control character and continues with “hctefrepuS” will result in “Superfetch” (a legitimate service on Windows 7 systems) being displayed by the Registry Editor (information about the configuration of Windows services is maintained in the Windows Registry). As with file names, all of the characters following the RLO control character will be displayed in a reversed direction.
An exploration of new Windows artifacts with respect to the use of Unicode characters, particularly as they apply to the digital forensic analysis of Windows systems, uncovered an article on the Microsoft Malware Protection Center website describing how malware authors make use of the Unicode RLO control character. Using the RLO control character (U+202E) causes the service name to display as a legitimate Windows service, which helps the malware remain persistent. Behind the scenes, though, the computer “sees” a different name.
Researchers interested in host-based artifacts and analysis are often curious about the information not included in malware write-ups provided by antivirus (AV) vendors. AV vendors tend to focus on the aspects of detection and analysis that they’re most familiar with, which often leaves room for exploration by information security researchers interested in areas such as host-based digital forensic analysis and incident response (DFIR). For example, how could the RLO technique be detected during large-scale incident response in an enterprise environment? What about malware detection within a forensic image acquired from a Windows system?
How Malware can be Hidden in RLO Control Character
Some malware variants will create a Windows Registry key for persistence, which can include the RLO control character in the registry key name. One approach to replicate and illustrate this is to open the Registry Editor, create a new registry key named “etadpupg” (that is, “gpupdate” backwards) under the Software key in a separate (NTUSER.DAT) registry hive, and then rename it to use the RLO control character. To do this, highlight the registry key in the Registry Editor (RegEdit), right-click the key name, and choose “Insert Unicode control character”, as illustrated in Figure 1.
Figure 1. Adding the RLO control character to a registry key name.
Next, select the RLO control character, and the key name becomes “gpupdate”. Follow the same process when adding a string value of the same name beneath the key. This procedure does not add any data to the value. At this point, the new key appears as “HKCU\Software\gpupdate”, with the “gpupdate” value beneath the key.
A forensic investigation can reveal how various freeware forensics tools display the newly created registry key and value. The first step is to add the system’s C: volume to the AccessData FTK (Forensic Toolkit) Imager application as an evidence item, and then export the NTUSER.DAT hive from the researcher’s profile.
Figure 2 shows the registry key opened in MiTeC Windows Registry Recovery (WRR) version 1.5.2, which displays the key name differently than RegEdit. Using WRR may not be convenient because an analyst needs to load and examine each hive individually, but it does a good job of indicating that something is amiss.
Figure 2. Displayed registry key in MiTeC WRR v1.5.2.
Figure 3 shows what happens when the same hive file is opened in TZWorks yaru (Yet Another Registry Utility, version 1.17). The results from yaru are unusual, and the discrepancy in the displayed name clearly indicates that something is amiss. Similar to WRR, yaru requires that an analyst individually load and examine each registry hive, which is a time-intensive process. Analysts would not typically use this approach for enterprise-wide response and analysis, but the comparison illustrates how the different host-based analysis tools display the obfuscated information.
Figure 3. Displayed registry key in TZWorks yaru v.1.17.
The next step is to display the list of HKCU\Software subkey names using a RegRipper plugin. RegRipper relies on James MacFarlane’s Parse::Win32Registry Perl module, which uses Unicode code pages to translate the key (and value) names for output and display. Running the RegRipper “rlo.pl” plugin against the test hive file produces the output illustrated in Figure 4.
Figure 4. Key name displayed by rlo.pl Perl script.
The code within the plugin looks for key and value names that contain the hexadecimal character “2E”, which is the result of the RLO control character being translated via the Parse::Win32Registry Perl module. As this character is also the hexadecimal representation for a period, the code checks to see if the character at that position within the original name string is, in fact, a period. If not, the character at that position is assumed to be the remnants of the RLO control character. As illustrated in Figure 4, the code replaces the control character with a period in the name, displays the name, and also displays the name as it would appear via the Registry Editor (with the characters following the control character in reverse order).
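The overlap between the RLO control character and a period can be seen at the byte level. The following Python sketch is not the plugin's code and does not reproduce how Parse::Win32Registry performs its translation. It assumes the name is available as UTF-16LE bytes, and the function name is hypothetical.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE

def find_rlo_in_utf16_name(raw_name):
    """Decode a UTF-16LE name and return the indexes of any RLO characters."""
    text = raw_name.decode("utf-16-le")
    return [i for i, ch in enumerate(text) if ch == RLO]

# Assumption: the test key name is stored as UTF-16LE bytes
raw = ("gpu" + RLO + "etadp").encode("utf-16-le")

print(raw.hex(" "))                      # every character occupies two bytes
print(RLO.encode("utf-16-le").hex(" "))  # 2e 20
print(".".encode("utf-16-le").hex(" "))  # 2e 00
print(find_rlo_in_utf16_name(raw))       # [3]
```

The low-order byte of U+202E is 0x2E, the same value as an ASCII period, which is why a check against the original name string is needed to tell the two apart once the high-order byte has been lost.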
This detection method can be extremely useful in malware detection and intrusion discovery, particularly during analysis of host-based artifacts. CTU analysts have observed that some malware authors will prepend the RLO control character to the key name. Future scenarios might bypass conventional detection techniques by using variations of the RLO control character. For example, rather than creating a Windows service for persistence, malware authors could use the ubiquitous Run key with a value name that is prepended by the control character. Or, rather than prepending the key or value name, the control character could be inserted at any point in the name; using the example in this article, “gpu” + U+202E + “etadp” (this example is illustrated in Figure 4). Analysts and incident responders should avoid the trap of only examining one use case. Instead, look at multiple uses of this technique and incorporate the appropriate measures into the analysis process.
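One way to avoid examining only a single use case is to check every position in a name for any of the Unicode directional formatting characters, not just a leading RLO. The Python sketch below assumes the key, value, or service names have already been extracted as strings by some other tool; the function name and the sample names are hypothetical.

```python
import unicodedata

# Directional formatting characters defined by the Unicode Standard
BIDI_CONTROLS = {
    "\u200E", "\u200F",                                # LRM, RLM
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def flag_names(names):
    """Map each suspicious name to its (index, code point, Unicode name) hits."""
    report = {}
    for name in names:
        hits = [
            (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
            for i, ch in enumerate(name)
            if ch in BIDI_CONTROLS
        ]
        if hits:
            report[name] = hits
    return report

# Hypothetical names: one clean, one with a leading RLO, one with an embedded RLO
sample = ["gpupdate", "\u202E" + "etadpupg", "gpu" + "\u202E" + "etadp"]
for name, hits in flag_names(sample).items():
    print(ascii(name), hits)
```

Reporting the position of each hit makes it clear whether the control character was prepended or embedded, which covers both variations described above.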
Dell SecureWorks CTU analysts are aware of this issue and how it relates to the information security of our customers, and have incorporated detection measures into analytic processes and methodologies.