Dual Purpose Volatile Data Collection Script

Monday, January 2, 2012 Posted by Corey Harrell 18 comments
When responding to a potential security incident a capability is needed to quickly triage the system to see what's going on. Is a rogue process running on the system, whose currently logged onto the system, what other systems are trying to connect over the network, or how do I document the actions I took on the system. These are valid questions during incident response whether the response is for an actual event or a simulation. One area to examine to get answers is the systems' volatile data. Automating the collection of volatile data can save valuable time which in turn helps analysts examine the data faster in order to get answers. This post briefly describes (and releases) the Tr3Secure volatile data collection script I wrote.

Tr3Secure needed a toolset for responding to systems during attack simulations and one of the tools had to quickly collect volatile data on a system (I previously discussed what Tr3Secure is here). However, the volatile data collection tool had to provide dual functions. First and foremost it had to properly preserve and acquire data from live systems. The toolset is initially being used in a training environment but the tools and processes we are learning need to be able to translate over to actual security incidents. What good is mastering a collection tool that can’t be used during live incident response activities? The second required function was the tool had to help with training people on examining volatile data. Tr3Secure members come from different information security backgrounds so not every member will be knowledgeable about volatile data. Collecting data is one thing but people will eventually need to know how to understand what the data means. The DFIR community has a few volatile data collection scripts but none of the scripts I found provided the dual functionality for practical and training usage. So I went ahead and wrote a script to meet our needs.

Practical Usage

These were some considerations taken into account to ensure the script is scalable to meet the needs for volatile data collection during actual incident response activities.

        Flexibility

Different responses will have different requirements on where to store the volatile data that’s collected. At times the data may be stored on the same drive where the DFIR toolset is located while at other times the data may be stored to a different drive. I took this into consideration and the volatile data collection script allows for the output data to be stored on a drive of choice. If someone prefers to run their tools from a CD-ROM while someone else works with a large USB removable drive then the script can be used by the both of them.

        Organize Output
 
Troy Larson posted a few lines of code from his collection script to the Win4n6 sometime ago. One thing I noticed about his script was that he organized the output data based on a case number. I incorporated his idea into my script; a case number needs to be entered when the script is run on a system. A case folder enables data collected from numerous systems to be stored in the same folder (folder is named Data-Case#). In addition to organizing data into a case folder, the actual volatile data is stored in a sub-folder named after the system the data came from (system's computer name is used to name the folder). To prevent overwriting data by running the script multiple times on the same system I incorporated a timestamp into the folder name (two digit month, day, year, hour, and minute). Appending a timestamp to the folder name means the script can execute against the same system numerous times and all of the volatile data is stored in separate folders. Lastly, the data collected from the system is stored in separate sub-folders for easier access. The screenshot below shows the data collected for Case Number 100 from the system OWNING-U on 01/01/2012 at 15:46.


        Documentation

Automating data collection means that documentation can be automated as well. The script documents everything in a collection log. Each case has one collection log so regardless if data is collected from one or ten systems an analyst will only have to worry about reviewing one log.


The following information is documented both to the screen for an analyst to see and a collection log file: case number, examiner name, target system, user account used to collect data, drives for tools and data storage, time skew, and program execution. The script prompts the analyst for the case number, their name, and the drive to store data on. This information is automatically stored in the collection log so the analyst doesn’t have to worry about maintaining documentation elsewhere. In addition, the script prompts the analyst for the current date and time which is used to record the time difference between the system and the actual time. Every program executed by the script is recorded in the collection log along with a timestamp of when the program executed. This will make it easier to account for artifacts left on a system if the system is examined after the script is executed. The screenshot below shows the part of the collection log for the data collected from the system OWNING-U.


        Preservation

RFC 3227’s Order of Volatility outlines that evidence should be collected starting with the most volatile then proceeding to the less volatile. The script takes into account the order of volatility during data collection. When all data is selected for collection, the memory is first imaged then volatile data is collected followed by collecting non-volatile data. The volatile data collected is: process information, network information, logged on users, open files, clipboard, and then system information. The non-volatile data collected is installed software, security settings, configured users/groups, system's devices, auto-runs locations, and applied group policies. Another item the script incorporated from Troy Larson’s comment in the Win4n6 group is preserving the prefetch files before volatile data is collected. I never thought about this before I read his comment but it makes sense. Volatile data gets collected by executing numerous programs on a system and these actions can overwrite the existing prefetch files with new information or files. Preserving the prefetch files upfront ensures analysts will have access to most of the prefetch files that were on the system before the collection occurred (four prefetch files may be overwritten before the script preserves them). The script uses robocopy to copy the prefetch files so the file system metadata (timestamps, NTFS permissions, and file ownership) is collected along with the files themselves. The screenshot below shows the preserved files for system OWNING-U.


        Tools Executed

The readme file accompanying the script outlines the various programs used to collect data. The programs include built-in Windows commands and third party utilities. The screenshot below shows the tools folder where the third party utilities are stored.




I’m not going to discuss every program but I at least wanted to highlight a few. Windows diskpart command allows for disks, partitions, and volumes to be managed through the command line. The script leverages diskpart to make it easy for an analyst to see what drives and volumes are attached to a system. Hopefully, the analyst won’t need to open up Windows explorer to see what the removable media drive mappings are since the script displays the information automatically as shown below. Note, to make diskpart work a text file needs to be created in the tools folder named diskpart_commands.txt and the file needs to contain these two commands on separate lines: list disk and list volume.


Mandiant’s Memoryze is used to obtain a forensic image of the system’s memory. Memoryze supports a wide range of Windows operating systems which makes the script more versatile for dumping RAM. The key reason the script uses Memoryze is because it’s the only free memory imaging program I found that allows an image to be stored in a folder of your choice. Most programs will place the memory image in the same folder where the command line is opened. This wouldn’t work because the image would be dropped in the folder where the script is located instead of the drive the analyst wants. Memoryze uses an xml configuration file to image RAM so I borrowed a few lines of code from the MemoryDD.bat batch file to create the xml file for the script. Note, the script only needs the memoryze.exe; to obtain the exe install Memoryze on a computer then just copy memoryze.exe to the Tools folder.

PXServer’s Winaudit program obtains the configuration information from a system and I first became acquainted with the program during my time performing vulnerability assessments. The script uses Winaudit to collect some non-volatile data including the installed software, configured users/groups, and computer devices. Winaudit is capable of collecting a lot more information so it wouldn’t be that hard to incorporate the additional information by modifying the script.

Training Usage

These were the two items put into the script to assist with training members on performing incident response system triage.

        Ordered Output Reports

The script collects a wealth of information about a system and this may be overwhelming to analysts new to examining volatile data. For example, the script produces six different reports about the processes running on a system. A common question when faced with so many reports is how should they be reviewed. The script’s output reports have numbers which is the suggested order for them to be reviewed. This provides a little assistance to analysts until they develop their own process for examining the data. The screenshots below shows the process reports in the output folder and those reports opened in Notepad ++.




        Understanding Tool Functionality and Volatile Data

The script needs to help people better understand what the collected data means about the system where it came from. Two great references for collecting, examining, and understanding volatile data are Windows Forensic Analysis, 2nd edition and Malware Forensics: Investigating and Analyzing Malicious Code. I used both books when researching and selecting the script’s tools to collect volatile data. What better ways to help someone better understand the tools or data then by directing them to references that explain it? I placed comments in the script containing the page number where a specific tool is discussed and the data explained in both books. The screenshot below shows the portion of the script that collects process information and the references are highlighted in red.



Releasing the Tr3Secure Volatile Data Collection Script

There are very few things I do forensically that I think are cool; this script happens to be one of them. There are not many tools or scripts that work as intended while at the same time provide training. People who have more knowledge about volatile data can hit the ground running with the script investigating systems. The script automates imaging memory image, collecting volatile/non-volatile data, and documenting every action taken on the system. People with less knowledge can leverage the tool to learn how to investigate systems. The script collects data then the ordered output and references in the comments can be used to interpret the data. Talk about killing two birds with one stone.

The following is the location to the zip file containing the script and the readme file (download link is here). Please be advised, a few programs the script uses require administrative rights to run properly.

Enjoy and Happy Hunting ...

***** Update *****

The Tr3Secure script has been updated and you can read about the changes in the post Tr3Secure Data Collection Script Reloaded

***** Update *****

Ripping Volume Shadow Copies Sneak Peek

Monday, December 19, 2011 Posted by Corey Harrell 4 comments
I was hesitant to do a sneak peak about a different approach to examine Volume Shadow Copies (VSCs). I personally don’t like sneak peeks and would rather wait to see the finished product. I think it’s along the lines of starting a movie then stopping it after 15 minutes and being forced to finish watching months later. If I don’t like sneak peeks then why am I putting others through it? I previously mentioned how I wanted to spend my furlough days by putting together some posts about another approach to examining VSCs. Well last week was my furlough week and my family wrote a new version to the carol The Twelve Days of Christmas. Four out of town trips, three sick kids, two family emergencies, and one blogger quarantined to his room. Needless to say I had to spend my time focused on my family. I won’t have time to write the VSCs blog posts until next month so I at least wanted to show one example on how I use this method.

There are times when I get a system that has been altered and one change is removing financial software from the system. This is pretty important because if I’m trying to locate financial data then I need to know what software is on the system so I know what kind of files to look for. There is a chance some file types might initially be missed if I’m not aware a certain program was installed at some point in the past. Different registry keys can help determine what programs were installed or executed but you can get a more complete picture about a system by looking at those same registry keys at different points in time. Performing registry analysis in this manner has allowed me to quickly identify uninstalled financial applications which reduced the time needed to find the data. Anyone who has used Harlan’s RipXP understands the value in seeing registry keys at different points in time. I used the same concept with one exception: numerous registry keys can be queried at the same time when dealing with VSCs.

The system I used for this demonstration was a live Windows 7 Ultimate 32 bit system. In the past I also used it against Windows 7 and Vista. forensic images

Obtaining General Operating System Information

I discussed previously one initial examination step is to get a better understanding about the system I’m facing. I use a batch script with Regripper to obtain a wealth of information about how the system was configured when it was last powered on. The configuration information is from only one point in time but if the system has VSCs then that means the same information can be obtained from different points in time. Seeing the same configuration information enables you to see how the system changed slightly over time including what software was installed or uninstalled. To do this I made some modifications to the general operating system batch script which lets me run it against VSCs I have access to.

I’m not going to discuss accessing VSCs in this post. For information on how to access VSCs I’d check out Harlan’s Even More Stuff post since he provides a link to his slide deck he gave to the online DFIR meet-up on the topic. My Windows 7 system had 19 VSCs and for the demonstration I only used the following:

        - ShadowCopy19 12/13/2011 6:13:35 PM

        - ShadowCopy16 12/01/2011 8:08:50 AM

        - ShadowCopy3 11/28/2011 11:19:40 AM

        - ShadowCopy1 8/26/2011 12:15:34 PM

The screen shot below shows the main menu to the vsc-parser (most selections have sub menus). To review the system to identify software of interest I’m interested in selection 2: “Obtain General Operating System Information from Volume Shadow Copies”.


The selection will immediately execute my Regripper batch file against every VSC I have access to. The picture below shows the script running against my four VSCs. I highlighted the samparse and uninstall plug-ins that executed.


The output from the script is nicely organized into different folders based on what the information is.


I’m interested in the software on the system which means I need the reports in the software-information folder. A report was created for each VSC I had access to (notice how the file name contains the VSC number it came from).


Now at this point I can review the reports and notice the slight differences between each VSCs. I tend to look at the most recent VSC then work my way to the oldest VSC. It makes it easier to see how the system slightly changed over time from the forensic image I examined first.


On a case I used this technique and it helped me to identify a financial application that was removed from the system. In the end it saved some a lot of time because this was one of my initial steps and I knew right off the bat I was looking for specific file types. Some may be wondering why I decided to highlight the samparse plug-in as well. At another time the same technique helped me verify a user account existed on the system and narrow down the timeframe when it was removed from the system.

I showed an example running Regripper against registry hives stored in VSCs on a live Windows 7 system. However, the approach is not only limited to registry hives or Regripper since you can pretty much parse any data stored in a VSC.

A Time of Reflection

Tuesday, December 13, 2011 Posted by Corey Harrell 7 comments
Certain events in life cause you to reflect on humility and put back into perspective the meaningful things in life. You remember that in time almost everything is replaceable. Another forensicator will fill your shoes at work and your organization will continue to go on. Another researcher will continue your research and the little that you did accomplish will eventually just be a footnote. Another person will step up to provide assistance to others in forensic forums and listservs. Your possessions and equipment will become someone else’s to enjoy and use. When looking at the big picture, the work we do and value will eventually fade away and life will go on as if we were never there. One of the only things remaining will be the impact we make on others in the little time we have available to us. One doesn’t need a lot of time or resources to make an impact; all that’s needed is having a certain perspective.

Everyone should look out not [only] for his own interests, but also for the interests of others. Philippians 2:4

Having an outlook that looks beyond one’s own self interests can positively impact others and I think the statement holds true regardless of religious beliefs. A perspective that takes into consideration others’ interests is displayed everyday in the Digital Forensic and Incident Response (DFIR) community. DFIR forums have thousands of members but there are only a few who regularly take the time to research and provide answers to others’ questions. DFIR listservs are very similar that despite their membership the minority are the ones who regularly try to help others. Look at the quality information (books, articles, blogs, white papers, etc) available throughout the community and their authorship is only a small fraction of the people in the community. These are just a few examples out of many how individuals within the DFIR community use their time and resources in an effort to not only better themselves but to educate others as well.

When I look at the overall DFIR community I think there’s only a minority who are looking beyond their own interests in an effort to help others. A few people have helped me over my career which contributed to where I am today. They never asked for anything in return and were genuinely interested in trying to help others (myself included). If the DFIR community is what it is because of a few people giving up their time and resources to make a positive impact on others, then I can only wonder what our community would look like if the majority of people looked beyond their own interests to look after the interests of others. In the meantime all I can do is to continue to try to remember to look beyond myself in every aspect of my life. To try to consider those around me so I can help whoever crosses my path needing assistance. When the day is over one of the only things remaining will be the impact I have on others.
Labels:

Don’t Overlook Simulations

Monday, December 5, 2011 Posted by Corey Harrell 0 comments
A few weeks ago my family and I were eating dinner at our dining room table. A car alarm started going off outside so I went to the window to see what was going on. I first checked to make sure our cars weren't the ones making the noise and then I saw it was my neighbor’s car across the street. I went back to the dining table when my three year old said "the car is saying there is a fire drill". Laughing aside his statement made a lot of sense. Before that moment the only time he has heard loud sirens have been during fire drills. Naturally, his first thought when he heard something similar was a fire drill was happening. Fire drills are one simulation people have practiced (most of the time forced) over and over again to help them know how to proceed when the real thing occurs. Simulations in DFIR work the same way in helping educate ourselves how to proceed in certain types of scenarios.

Most trainings I attended reinforced learning by having the attendees practice on test images or data. The attendees just don't stumble around in the data since their objective is dictated by working through a simulation based on some case scenario. The simulation training approach even carries over to when people want to improve their skills on their own. Similar to the fire drill, different DFIR scenarios can expose people to different types of cases so they are more aware about their options and what to do when a real case comes up. The choices one has available for scenarios are to either use a test case put together by someone else or create your own. I found the latter option to be extremely effective at better preparing me since I can focus on areas I want to improve on. Simulations are how I developed my skills to investigative malware infected systems.

Forensicator Readiness is the thought process I use to develop and implement different scenarios. The process focuses my efforts on the exact skills or knowledge I want to learn more about. One simulation I’ve been working on for some time is answering these two questions about infected systems: is the system infected and how did the system get infected. All the different scenarios I developed overtime and some research I conducted was a direct result of trying to answer those two questions.

My scenarios started out by manually infecting systems with different malware to develop my skills in finding malware both in memory and on disk. Once I was effective at quickly locating the malware - without scanning - then the next step was to purposely attack systems. Some attacks I conducted such as running Metasploit against systems with malware as the payload while other attacks involved finding malicious SPAM emails or active drive-by attacks. In all the scenarios I simulated infections with different initial infection vectors on systems to provide myself with test cases. I improved my skills by examining the infected systems so I could answer were the systems infected and how did it occurred.

My neighbor’s car alarm put my three year old in fire drill mode. He didn’t get up and start walking towards the door because it was my neighbor’s car. The drill was for my neighbor and not us; otherwise our cars would have told us. :) Putting ourselves through our own simulations in advance increases our ability to be in the right mode when we need it. I was better prepared when I took on my first infected system. Not only did I locate the malware (without av scanning) but I was successful in tracing the infection back to a drive-by against an Adobe Reader vulnerability. It wasn’t luck I was able to do this right out of the gates. Nor was it luck I have continued to do this on system after system. This ability is a direct result of honing my skills in advanced by putting together my own simulations focused on areas I want to improve on.

People and children are not able to just able magically figure out how to exit a building in chaos. It takes practice and when chaos occurs the training kicks him to help people know how to proceed. DFIR is the same way; we won’t magically know how to process certain cases or answer certain questions. It takes practice and overtime we develop the knowledge on how to proceed with certain cases. Practicing can take the form of trainings or self simulations. Trainings are a one size fits all where the content is the same across the board. An advantage that self simulations have over trainings is one’s ability to focus on whatever area one wants. Time can be better spent focusing on the areas one doesn’t have knowledge about while trainings can be used to supplement other areas (this approach is a better way to use training dollars as well). The next time one wants to develop their DFIR skills then self simulations shouldn’t be overlooked as a viable option.
Labels:

jIIr Updates

Saturday, December 3, 2011 Posted by Corey Harrell 1 comments
A few quick updates about some things related to the blog …

Digital Forensic Search (DFS) Updates

I updated the Digital Forensic Search’s index today. Eight new blogs were added and I updated the URL for an existing blog. In no particular order the new editions are: Sketchymoose's Blog, Forensics For the Newbs, WriteBlocked, Hexacorn Blog, Zena Forensics, Taksati, Chris Sanders, and SANs Penetration Testing Blog. As usual, the Introducing the DFS blog post has been updated to reflect the changes.

I’m going to continue documenting the sites in the index on the Intro to DFS post. However, I’m probably going to stop posting updates on the blog since I’m leaning towards mentioning the changes through my twitter account.

I’m Now on Twitter

Earlier in the week I finally finished setting up my Twitter account and actually started to use it. As my profile indicates Twitter is my platform to share random thoughts which will mostly be focused on information security. I said mostly because the account won’t solely be used to discuss security. Please feel free to hit me up at corey_harrell.

A Different Approach to Analyzing Volume Shadow Copies

In a few weeks I’m going to have some time off from work since I’m taking some “furlough” days. My plan is to spend the time putting together some material (blog posts and videos) to further demonstrate a different approach to analyzing the data stored volume shadow copies.

Before discussing my approach I’m pointing out two current approaches. One is to image each VSCs then examining the data in the images. Another approach is to copy the data - including metadata - from all or select VSCs so it can be examined outside the VSCs. The approach I’ve been using is to examine the data while it’s still stored in the volume shadow copies. There are numerous benefits doing it this way such as reducing the amount of time needed or being able to work on both live systems and forensic images. I think the technique’s true power is the ability to see the same data at different points in time since shows how the data changed over time. This has been critical for me on a few different cases.

To help me examine VSCs in this manner I wrote a few different scripts. The material I’m putting together will not only explain my logic behind the scripts’ functionality but will show how it can be easily extended by anyone to meet their own needs. Yes, I'll also release the scripts as well. Plus, if I can pull off a video or two it should be cool for people to see it in action.

What’s TR3Secure?

At some point over the next few months you may see me start referencing and sharing some work I completed for something called TR3Secure. I’ll be the sole author of any work I share (mostly scripts) but I wanted to briefly discuss what TR3Secure is since I’ll be tagging my work with it. A few co-workers and a colleague of mine are working on setting up a training group for us to collaborate and develop our information security skills together. We are trying to create an environment to bring together security testers, incident responders, and digital forensic practitioners. We envision doing different activities including conducting live simulations and this is where bringing together the three different skillsets will shine. The live simulations will be conducted with select people attacking a test network while a second group responds, triages the situation, and if necessary contains the attack. Afterwards, the examiners will collect and examine any evidence to document the attack artifacts. When it’s all said and done then everyone will share their experiences and knowledge about the atack and if necessary train other members on any actions they completed during the simulation.

We are still in the early stages setting the group up and once established it initially has to be a closed group. I’m only mentioning TR3Secure here because I’m going to write various scripts (Perl and Batch) to help with certain aspects of the live simulations. If my scripts work well especially for training then I’ll share it for others to use for self training purposes. The scripts will solely be my own work but I’m still tagging everything with TR3Secure since I’m working with some great individuals. The first item coming down the pipeline is a cool dual purpose volatile data collection script that doubles as a training and incident response tool.

Linkz 4 Exploits to Malware

Wednesday, November 30, 2011 Posted by Corey Harrell 0 comments
In this edition of linkz the theme goes from exploitation to infection to detection. Some linkz discussed include: providing clarity about my exploit artifacts, a spear-phishing write-up, a malware analysis checklist, and thoughts about automated vs in-depth malware analysis.

Picking Vulnerabilities

Over the past year I’ve been conducting research to document attack vector artifacts. Vulnerabilities and the exploits that target them are one component to an attack vector. Some may have noticed I initially focused most of my efforts on vulnerabilities present in Adobe Reader and Java. I didn’t pick those applications by flipping a coin or doing “eeny, meeny, miny, moe”. It is not a coincidence I’m seeing exploit artifacts left on systems that target those applications. This has occurred because I pick vulnerabilities based on the exploits contained in exploit packs.

Exploit packs are toolkits that automate the exploitation of client-side vulnerabilities such as browsers, Adobe Reader, and Java. Mila Parkour over at Contagio maintains an excellent spreadsheet outlining the exploits available in different exploit packs on the market. The reference by itself is really informative. The screenshot below shows part of the vulnerabilities section in the spreadsheet.


Notice how many Java and Adobe vulnerabilities are on the list. Maybe now it’s a little more clearer why I wrote about Adobe/Java exploits and why I wasn’t surprised when system after system I keep finding artifacts associated with those exploits. The spreadsheet shows what applications exploit packs are targeting. I’ve been using the document as a reference to help me decide what exploit artifacts to document. Down the road when I start looking into Word, Excel, and flash exploits then at least there will be a little more clarity as to what I’m choosing to document.

Another Adobe Flash Exploit

Since I mentioned both Adobe, flash, and exploit in the same paragraph then I might as well mention them in the same sentence. Zscaler ThreatLab blog recently posted Adobe Flash “SWF” Exploit still in the Wild. The short write-up discusses how a vulnerability in Adobe flash (CVE-2011-0611) is being exploited by embedding a .swf file into Microsoft Office documents or html pages. I wanted to highlight one specific sentence "this exploit code embeds a “nb.swf” flash file into a webpage, which is then executed by the Adobe Flash player". That one sentence identified numerous potential artifacts one could find on a system indicating this attack vector was used. First there will be Internet browser activity followed by a flash file being accessed. The system may then show a swf file being created in a temporary folder and there may also be indications that Adobe flash executed shortly thereafter. The write-up doesn't go into what artifacts are left on a system since its focus is on how the attack worked. At this point those potential artifacts are just that potential. However, flash exploits are third on my list for what I'm going to start documenting. It shouldn’t come as a surprise by now that Mila’s spreadsheet also shows a few exploit packs targeting the flash vulnerability.

Walking through a Spearphishing Attack

The Kahu Security blog post APEC Spearfish does an excellent job walking the reader through an actual spearphishing attack. The spearphish was an email targeted at a single individual and contained a malicious PDF attachment. There was a flash object in the PDF file that exploited a vulnerability in Adobe Flash Player (yup … CVE-2011-0611). The end result was a malware infection providing backdoor access for the attackers. The post isn't written from the DF perspective; it doesn't outline the artifacts on a system indicating a spearphish occurred or a flash exploit caused the malware infection. However, it does a great job breaking down the attack from the person receiving the email to the PDF file launching to malware getting dropped. One image I liked was the one showing "what’s happening behind the scenes" since it helps DF readers see the potential artifacts associated with the method use to infect the system.

Finding Malware Checklist

Last month Harlan posted about his experience at PFIC 2011. At the end of his post he shared his Malware Detection Checklist which outlines the examination steps to locate malware on the system. I think it’s a great list and I like how the checklist is focused on a specific task; finding malware. I added some of the activities in Harlan’s checklist to my own because I either wasn’t doing it or I wasn’t deliberate about doing it. One step I wasn’t doing was scanning for packed files and thinking about it I can see how it can help reduce the amount of binaries to initially examine. On the other end, one step I wasn’t deliberate about was examining the user’s temporary directories. I was examining these directories through timeline analysis but I wasn’t deliberate about searching the entire folder for malware or exploit artifacts.

The cool thing about Harlan’s checklist is that he already did the heavy lifting. He put together a process that works for him (including the tools he uses) and is sharing it with the community. It wouldn’t be hard for anyone to take what he already did and incorporate it into their own examination process. Plus, his blog post mentioned the checklist came from Chapter 6 in WFA 3 so the book (once released) can be a reference to better understand how to go about finding malware on a system.

Another Angle at Finding Malware

Along the same lines about trying to identify malware on a system Mark Morgan over at My Stupid Forensic Blog discussed the topic in his post How to Identify Malware Behavior. Mark first proceeded to explain the four main characteristics of malware which are: an initial infection vector, malware artifacts, propagation mechanism, and persistence mechanism. (Harlan also described these characteristics on his malware webpage). The characteristics are important since there artifacts associated with them and those artifacts can help identify the malware. Mark provided a great example about how the persistence mechanism played a role in one of his cases. He even went on to explain a few different ways to track down malware and its persistence mechanism.

Analyzing Malware

At some point during the examination malware will be identified on the system. Some can just start analyzing the malware since they are fortunate enough to know how to reverse engineer its functionality or know someone who can. The rest of us may not have that luxury so we should just upload the sample to online scanners such as VirusTotal or Sandboxes right? Well, I wouldn’t be too fast in pulling the trigger without understanding the risks involved. The Hexacorn blog put together the excellent post Automation vs. In-depth Malware Analysis. In the author’s own words the "post is my attempt to summarize my thoughts on the topic of both automated malware analysis in general and consensual submission of files to a web site owned by a third party". There are times when submitting samples to a third party service are not the best choice to make. I first learned about the risk when a discussion occurred in the Win4n6 group sometime ago but the post goes into more depth. For anyone dealing with malware and considering using third party services -such as VirusTotal or ThreatExert - than I highly recommend reading this post to help you make an informed decision.

On a side note, the Hexacorn blog started to post forensic riddles every Friday followed by posting the answer every Monday. The riddles are entertaining and educational. My stat count so far is zero for two (I wasn’t even close).
Labels: ,

Finding the Initial Infection Vector

Sunday, November 20, 2011 Posted by Corey Harrell 9 comments
There are different ways to spread malware. Email, instant messaging, removable media, or websites are just a few options leveraged to infect systems. One challenge when performing an examination is determining how the malware ended up on the system which is also referred to as identifying the malware’s initial infection vector (IIV). A few obstacles in determining the IIV is that a system changes over time: files are deleted, programs are installed, temporary folders are emptied, browser history is cleared, or an antivirus program cleaned the system. Every one of those obstacles may hinder the examination. However, they don’t necessary result in not being able to narrow down the IIV since some artifacts may still be present on the system pointing to the how.

There are various reasons provided why an examination isn’t performed on a malware infected system to locate the IIV. I first wanted to point out why taking the time to find the IIV is beneficial instead of focusing on the reasons why people don’t. The purpose of the root cause analysis is to identify the factors lead up to the infection and what actions need to be changed to prevent the reoccurrence of a similar incident. If the infected system is just cleaned and put back into production then how can security controls be adjusted or implemented to reduce malware infecting systems in a similar manner? Let’s see how this works by skipping the root cause analysis and placing blame on a user opening a SPAM email. A new security awareness initiative educates employees on not opening SPAM email which does very little if the malware was a result of a break down in the patch management process. Skipping figuring out the IIV is not only a lost opportunity for security improvements but it prevents knowing when the infection first occurred and what data may have been exposed. This applies to both organizations and individuals.

Determining how the malware infected a system is a challenge but that's not a good enough reason to not try. It may be easier to say it can’t be done, takes too much resources or it's not worth it since someone (aka users) never listen and did something they weren’t suppose to. As a learning opportunity I’m sharing how I identified the initial infection vector in a recent examination by showing my thought process and tool usage.

First things first… I maintain the utmost confidentiality in any work I perform whether if it’s DFIR or vulnerability assessments. At times on my blog I write detailed posts about actual examinations I performed and every time I’ve requested permission to do so. This post is no different. I was told I can share the information for the greater good since it may help educate others in the DFIR community who are facing malware infected systems.

Background Information

People don’t treat me as their resident “IT guy” to fix their computer issues anymore. They now usually contact me for another reason because they are aware that I’ve been cleaning infected computers for the past year free of charge. So it’s not a strange occurrence when someone contacts me saying their friend/colleague/family member/etc appears to be infected with a virus and needs a little help. That’s pretty much how this examination came about and I wasn’t provided with any other information except for two requests:

        * Tell them how the infection occurred so they can avoid this from happening again

        * Remove the viruses from the computer

     Investigation Plan

The methodology used throughout the examination is documented on the jIIr Methodology Page. I separated the various system examination steps into the first three areas listed below.

        1. Verify the system is infected
        2. Locate all malware present on the system
        3. Identify the IIV
        4. Eradicate the malware and reset any system changes

I organized the areas so each one will build on the previous one. My initial activities were to verify that the system was actually infected as opposed to the requester interpreting a computer issue as an infection. To accomplish this I needed to locate a piece of malware on the system either through antivirus scanning or reviewing the system auto-run locations. If malware was present then the next thing I had to do was locate and document every piece of malware on the computer by: obtaining general information about the system, identifying files created around the time frame malware appeared, and reviewing the programs that executed on the system. The examination would require since the technique excels at highlighting malware on a system. The third area and the focus of this post was to identify the initial infection vector. The IIV is detected by looking at the system activity in the timeline around the timeframe when each piece of malware was dropped onto the system. The activity can reveal if all of the malware is from the same attack or if there were numerous attacks resulting in different malware getting dropped onto the system. The final area is to eradicate every malware identified.

Note: Some activities were conducted in parallel to save time. To make it easier for people to follow my examination I identified each activity with the symbol <Step #>, the commands I ran are in bold, and registry and file paths are italicized.

Verifying the Infection

The computer’s hard drive was connected to my workstation and a software write blocker prevented the drive from being modified. I first reviewed the master boot record (MBR) to see the drive configuration I was dealing with and to check for signs of MBR malware <Step 1>. I ran the Sleuthkit command: mmls.exe -B \\.\PHYSICALDRIVE1 (the -B switch shows the size in bytes). There was nothing odd about the hard drive configuration and I found out that additional time was needed to complete the examination since I was dealing with a 500 GB hard drive. To assist with identifying known malware on the system I fired off a Kaspersky antivirus scan against the drive <Step 2>.

Knowing the antivirus scan was going to take forever to complete I moved on to checking out the system’s auto-runs locations for any signs of infection. The Sysinternals AutoRuns for Windows utility was executed against the Windows folder and the only user profile on the system <Step 3>. In the auto-runs I was looking for unusual paths launching executables, misspelled file names, and unusual folders/files. It wasn’t long before I came across an executable with a random name in the HKCU\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\Shell registry key.

The HKCU\Software\Microsoft\Windows\CurrentVersion\Run registry key also listed under Auto-runs Logon tab showed that the C:\Users\John_Doe\AppData\Roaming folder had more than just one randomly named executable. The key also showed an additional location which was the C:\Users\John_Doe\AppData\Roaming\Microsoft folder.

I added the hard drive as an evidence item in FTK Imager v3 to review the folders and executables identified by the Auto-runs utility <Step 4>. I noticed there were two additional executables located directly underneath the Roaming folder with the names iexplore.exe and java.exe. Both files had the same MD5 hash e4c2a000e715d16ec25e2b0a0fb3532f so to confirm the infection I searched for the hash in the Malware Analysis Search custom Google search. There was one search hit for VirScan.org and a few scanners flagged the file as malware (Kaspersky identified it as Trojan.Win32.FakeAV.emha). I followed the similar process to confirm that the other executables were malware as well. At this point I no longer needed the antivirus scan to finish since the infection was verified through other means. Before I moved onto manually locating all malware on the system I needed to document what my timeframe of interest was. I looked at the last modification times and creation times for all the folders/files I found. The rough timeframe spanned over a few days: from 10\13\2011 1:29:34 AM to 10/08/11 11:38:48PM. The picture below shows the last modification times for a few folders in the C:\Users\John_Doe\AppData\Roaming folder.


Locating All Malware on the System

After I verified the system was in fact infected I then proceeded to locate and document every piece of malware. First I had to shed light on the system’s configuration since it would impact how I performed my analysis <Step 5>. I used my regripper-general-os-info.bat batch file to run RegRipper against the system’s registry hives including the one profile’s NTUSER.dat hive. Below I highlighted some information and to the right of the arrow are quick notes about its significance.

        * Operating system was Windows 7 Home Premium <= affected what artifacts are available and where they are located

        * OS Install Date was Sun Feb 20 23:26:29 2011 (UTC) <= may assist with identify activity occurring before this date

        * Timezone was Eastern Standard <= needed to understand time information

        * The registry setting NtfsDisableLastAccessUpdate was enabled <= can’t use files’ last access times since it’s not tracked (default setting in Windows 7)

        * Profilelist registry key only showed one user account besides the default ones <= focused the examination around the activity for one specific user account

       * Installer\UserData registry key showed the following programs: Microsoft Office 2010 including Outlook, iTunes v.10, QuickTime v.7.69, Adobe Reader v9.3.4, and Java(TM) 6 Update 17 <= identified applications that could have been responsible for the malware infection

       * Default browser plugin showed the default browser was Internet Explorer <= system had two web browsers (Chrome was the other) so my initial focus is on the artifacts from the default one

       * Listsoft registry key showed McAfee <= McAfee antivirus software was on the system and its logs may show additional information about the infection.

I opted for the timeline analysis technique to locate all malware on the system and the general information obtained about the system helped to narrow down my artifact list to incorporate into my timeline. Building a timeline on a 500 GB hard drive was going to take some time so I looked at the McAfee logs before tying up my workstation <Step 6>. I exported the McAfee logs with FTK Imager and reviewed them using Notepad ++. The last entry in the log occurred at 10/16/2011 6:50:09 PM and it logged that the file "C:\windows\system32\consrv.DLL" was detected as Generic.dx!bbd4. The next entry didn’t occur until 10/12/11 but there were numerous log entries leading right up until 10/08/11. A few detections included Generic Dropper!1cj, DNSChanger!fa, and Artemis!E4C2A000E715 and they were for files located the folders C:\Users\John_Doe\AppData\Local\Temp\, C:\Windows\assembly\tmp\, and C:\windows\syswow64\. The flurry of McAfee detections for files other than cookies stopped at 10/8/2011 11:37:38 PM as shown in the picture below.

The McAfee log identified potential additional malware on the system and expanded my timeframe to 10/16/2011 6:50:09 PM to 10/08/11 11:37:38 PM. A significant piece of information the log highlighted was Internet activity occurred just before the first detection. I leverage the timeline analysis technique for the rest of the examination. I created a timeline by incorporating the following artifacts: event logs (evtx), registry hives (system, software, and ntuser), link files (win_link), prefetch files (prefetch), Internet Explorer history (iehistory), and the Master File Table (mft) <Step 7>. I ran the following command but replaced the plugin and file path for each desired artifact: log2timeline.pl -f evtx -w timeline.csv E:\Windows\System32\winevt\Logs\Application.evtx. Once my timeline was built I then I started my search for all malware on the system.

Identify the IIV

Locating all malware present on the system and identifying the IIV are not separate activities when I perform timeline analysis. The only reason I separated them was to make it easier to explain my thought process. In actuality the two go hand in hand. Each time a piece of malware is located the system activity around the malware is examined to determine what contributed to the malware being created. Approaching timeline analysis in this manner will help determine if the malware is from the one attack or multiple attacks at different points in time. I review timelines working backwards in time since I find that it’s easier to spot the IIV. Each time I come across a file that could be malicious I first review the file’s header (in this examination I used FTK Imager), perform searches for the file’s MD5 hash (search order is Malware Analysis Search, VirusTotal, and then Google), and at times if the hash search results in no hits and the file type is of interest then I may upload the file to VirusTotal to see if it’s detected. I continue this process in the timeline until I reach the point where the malware activity stops and that’s usually where the IIV is located.

To assist with confirming malicious files I used FTK Imager to export a file hash list for the entire hard drive <Step 8>. It’s a lot easier to already have files’ hashes on hand then it is to calculate the hash each time I come across a new file. I started working my timeline keeping in mind everything I found including the timeframe 10/16/2011 6:50:09 PM to 10/08/11 11:37:38 PM. Besides the timestamps that were not accurate (reflects activity in future) the timeline ended on 10/16/2011 so that is where I started my analysis. I first saw the consrv.dll file detected by McAfee but there were no artifacts around the malware indicating it was the result of a different attack.

After 10/16/11 the next activity started appearing in the timeline on 10/12/11. I found the same thing; more malware and artifacts associated with malware but no artifacts indicating an attack occurred.


I kept working the timeline going backwards in time. I kept finding more malware and malware artifacts but nothing pointing to an IIV explaining how the malware got onto the system. I finally reached the earliest time I noted which was 10/08/11 11:37:38 PM. There was a lot of activity involving files with similar names to the ones reflected in the McAfee log file.

I continued working backwards until I saw no more activity involving the C:\Windows\assessmbly\tmp\U\ folder which is shown in the screenshot below. The U folder was created on the system at the same time as a file resembling a configuration file. One line in the file was srv=hxxps://212.36.9.52/ and my research showed the address appeared in a blacklist and the spsyeyetracker IP blocklist. The activity just before the U folder and configuration file were created was an executable named dbywqomgec (MD5 hash a70e5c48612159b3e936d7e478f4d451) appearing in the John_Doe’s temp folder. VirusTotal showed a few antivirus programs identified the file as a dropper (Microsoft detection was TrojanDropper:Win32/Sirefef.B). Afterwards I analyzed the file with ThreatExpert to see what changes the malware caused.

The activity on the system before the dropper (MD5 hash a70e5c48612159b3e936d7e478f4d451) appeared on the system was a file showing up in the Java cache folder as shown below.

I previously discussed the forensic significance Java index files provide in the post (Almost) Cooked Up Some Java. I exported the Java index file 46e770f3-38b55d85.idx with FTK Imager and looked at the file with Notepad ++. The file’s contents are shown below.

The index file 46e770f3-38b55d85.idx showed a few interesting tidbits. First the file 46e770f3-38b55d85 was downloaded from the URL hxxp://www.seyminck.com/FFFO009/560[dot]gif which had the IP address 212.95.55.40. Secondly, the URL indicated the file was a gif image but the index recorded the file as an application. I checked the file 46e770f3-38b55d85 (MD5 hash 2e833ac26483aaad13a8051bc857ef15) header and it was indeed an executable since the file started with MZ. I analyzed the file with ThreatReport and it was identified as a dropper (Microsoft detection was TrojanDropper:Win32/Sirefef.B). The IIV still wasn’t located so I looked at the activity just before the dropper appeared in the Java cache. The activity showed at the same time another duplicate of the dropper (MD5 hash 2e833ac26483aaad13a8051bc857ef15) appeared in the John_Doe’s temp folder with the file name 0.945837921339929.exe. Four seconds beforehand a file appeared in the Java cache folder which can be seen below highlighted in red.

The Java index file 25e8c780-5c17647b.idx was exported with FTK Imager and read with Notepad ++. The information contained in the index showed that a Java archive file was downloaded from the URL hxxp://www.seyminck(dot)com/FFFO009/RRo/realestate (IP address 212.95.55.40). The Java archive came from the same domain and IP address as the executable located in the Java cache folder. I exported the Java archive 25e8c780-5c17647b (MD5 hash 6b478de65071d94c670a0bfa369a7890) and confirmed the file was a Jar file by examining it with JD-GUI. The MD5 hash search didn’t result in any hits so I uploaded the file to VirusTotal and only 2 out of 42 antivirus products detected it as an exploit. I wanted to know if Java actually executed around the time the exploit appeared in the cache. I exported and reviewed the Java log file C:\Users\John_Doe\AppData\Local\Temp\java_install_reg.log and the log showed Java did in fact execute.

The last piece I needed to identify the IIV was to determine what delivered the exploit to the system. The activity on the system before the exploit answered that question as shown below.

There was a PrivacIE entry for seyminck(dot)com/FFFO009/RRo/*87354602 which means the exploit came from third party content being displayed on a website. The PrivacIE entry was mixed in with activity resembling advertisements from the user searching for someone on peoplefinder and whitepages websites. I continued working backwards in the timeline but there was no more malware activity. The IIV was identified. A user was surfing the Internet when a website visited was hosting third party content which resulted in a successful drive-by download targeting a Java vulnerability.

More Information about the IIV

The Java archive 25e8c780-5c17647b (MD5 hash 6b478de65071d94c670a0bfa369a7890) didn’t have to be examined closer in order to identify the IIV. However, I wanted to better understand how to examine Jar files since they may provide more information about the IIV and help explain some files found on the system. I debated if I should put this section in another blog post because I didn’t want people to think this activity had to be done in order to figure out the IIV. I opted to include the information since it sheds light on what occurred when the exploit was downloaded.

The code in the Jar file was obfuscated to conceal its purpose. I reached out to the Win4n6 group about any methods to automate analyzing Jar files with obfuscated code. A few members pointed me to Java de-obfuscation tools and I’m still in the process of trying to learn how to use them. Another member mentioned that Java obfuscation appears to be not making analysts’ life difficult, but to evade detection by antivirus. The person went on to say the obfuscation is usually weak so it’s relatively simple to de-obfuscate. My first reaction was it may be simple for Java programmers but it seemed impossible to me; I know nothing about Java besides the artifacts left by Java exploits. I took a shot at manually trying to see what the Jar file did by focusing on trying to follow the logic associated the variables, class methods, and functions in the code (I don’t know the Java syntax so if I butcher the names of things such as functions then you know why).

I opened the Java archive 25e8c780-5c17647b in JD-GUI and looked at the manifest file to see the wall Java class gets executed first.

I extracted the Java source code by using the “Save All Sources” option in JD-GUI. I started reviewing the obfuscated source code in the Wall Java class when I saw two lines of code making a call to the Java method Muuum.kjdhfdkjg or Muuum.idufhidufh. For those who don’t know what a Java method is: it’s basically going to the Muuum class and executing the code listed under the method kjdhfdkjg or idufhidufh.

I followed the code to the Muuum class file and found out its purpose was to set a variable to contain an URL. Two variables are set to contain part of the URL and they are then used to build the entire URL. One URL that is built is hxxp://www.seyminck.com/ FFFO009 /560[dot]gif and this was the URL I found in the Java index 46e770f3-38b55d85.idx showing it was where the executable file 46e770f3-38b55d85 (MD5 hash 2e833ac26483aaad13a8051bc857ef15) came from. The screenshot below shows the URL being put together.

I went back to the Wall class and kept reading the code until I came across the first Java function as shown below. The Inputstream function reads data and the data being read was coming from the Java method Kkdjfhgdkfjhgkdfjhgkkkkkkkkkkkk.sodarifhsdoiufhdoiufg86fetgfyusgfyudif. I highlighted the Inputstream function in green while the Java method is highlighted in red.

The followed the code to the sodarifhsdoiufhdoiufg86fetgfyusgfyudif method. The method set the variable URL to contain the value contained in variable s3 which the Wall Java class passed to the method. The method ended with by returning a call to another method in the Kkdjfhgdkfjhgkdfjhgkkkkkkkkkkkk class as highlighted in red below.

Next I went to the mmmm3 method which is pictured below. The first function InputStream sets the URL to read from while the second function Openstream reads the URL stored in the URL variable. I couldn’t find the code that resulted in the URL variable containing the domain hxxp://www.seyminck[dot]com. However, this was the URL the method was reading from becaue the Jar file didn’t reference any other websites. The method returns to the Wall class the data read from the URL.

I went back to the Wall class and continued to follow the code. The next portion I picked up on is the data read from the URL was saved to a file with an exe extension. The picture below shows the code that accomplished this and I highlighted a few areas to make it easier to see. The variable ufy highlighted in the first red box was set to contain a string with a random number ending in .exe. The next variable iioi655er5w5 (highlighted in blue) was set to contain another variable concatenated with the ufy variable at the end. This means the string contained in iioi655er5w5 ends in .exe. The function FileOutputStream writes data to a file and names the file with the string in the iioi655er5w5 variable.

The previous code explains the activity on the system immediately after the exploit was downloaded. Reading the URL hxxp://www.seyminck.com/FFFO009/560[dot]gif resulted in Java caching the file while Java wrote the data to a file with an .exe file extension. The Java index file 25e8c780-5c17647b.idx showed that the file 46e770f3-38b55d85 (MD5 hash 2e833ac26483aaad13a8051bc857ef15) in the Java cache was read from the URL in the Java exploit. Another file with the same MD5 hash was created on the system at the same time and was named a random number with exe as the file extension.

At the bottom of the previous screenshot shows the Java method Kkdjfhgdkfjhgkdfjhgkkkkkkkkkkkk.kjsf8888 being called and the variable iioi655er5w5 (contains the filename ending in .exe) is passed for the method to use. The picture below is a close up of the method call.

My journey following the code ended when I went to the kjsf8888 method in the Kkdjfhgdkfjhgkdfjhgkkkkkkkkkkkk class file. The code highlighted in green in the picture below highlights the function Runtime exec executing the file contained in the iioi655er5w5 variable which is a file whose name is random number with an .exe extension (seems like this file 0.945837921339929.exe found on the system). The activity on the system after 0.945837921339929.exe was created in the temp folder was another dropper (MD5 hash a70e5c48612159b3e936d7e478f4d451) showing up on the system. To me this further confirms the Jar file was successful in exploiting a vulnerability in Java and this was how the system became infected in the first place.

Summary

I went into the examination planning on to perform a surgical malware removal and ended up doing a complete system rebuild due to how bad the infection was. The initial infection vector was a user surfing the Internet and coming across a website hosting third party content which resulted in a successful drive-by download targeting some Java vulnerability. Going back to the person and telling them how the infection happened makes it easier for them to change what lead up to the issue. I would have done a disservice if I skipped trying to find the IIV and went back to the person with a laundry list of recommendations. Enable the firewall, use strong passwords, update anti-virus software, use caution with opening attachments, use caution clicking on links, update computer software, etc … Throwing out a laundry list of recommendations is a lost opportunity to improve security since it doesn’t address the root cause. Trying to implement five or ten recommendations is a lot harder than focusing on the one recommendation that actual caused the infection.

Identifying the IIV is a challenge worth confronting. For success one not only needs to understand the forensic artifacts located on a system and their significance but needs to know about attack vector artifacts and how to recognize them. Being able to understand both artifacts types can help in answering the question how did malware end up on the system.