OSXCollector: Forensic Collection and Automated Analysis for OS X
Ivan L., Engineering Manager - Security
- Jan 12, 2015
We use Macs a lot at Yelp, which means that we see our fair share of Mac-specific security alerts. Host-based detectors will tell us about known malware infestations or weird new startup items. Network-based detectors see potential C2 callouts or DNS requests to resolve suspicious domains. Sometimes our awesome employees just let us know, “I think I have like Stuxnet or Conficker or something on my laptop.”
When alerts fire, our incident response team’s first goal is to “stop the bleeding” - to contain and then eradicate the threat. Next, we move to “root cause the alert” - figuring out exactly what happened and how we’ll prevent it in the future. One of our primary tools for root causing OS X alerts is OSXCollector.
OSXCollector is an open source forensic evidence collection and analysis toolkit for OS X. It was developed in-house at Yelp to automate the digital forensics and incident response (DFIR) our crack team of responders had been doing manually.
Performing Forensics Collection
The first step in DFIR is gathering information about what’s going on - forensic artifact collection if you like fancy terms. OSXCollector gathers information from plists, SQLite databases, and the local filesystem, then packages it in an easy-to-read and easier-to-parse JSON file.
osxcollector.py is a single Python file that runs without any dependencies on a standard OS X machine. This makes it really easy to run collection on any machine - no fussing with brew, pip, config files, or environment variables. Just copy the single file onto the machine and run it.
sudo osxcollector.py is all it takes.
Details of Collection
The collector outputs a .tar.gz containing all the collected artifacts. The archive contains a JSON file with the majority of the information. Additionally, a set of useful logs from the target system is included.
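As a rough illustration of working with that archive, here is a minimal sketch that opens the .tar.gz and parses any JSON file inside it. It assumes the JSON file holds one JSON object per line; the function name `load_entries` is made up for this example and is not part of OSXCollector.

```python
import io
import json
import tarfile


def load_entries(archive_path):
    """Yield one dict per non-empty line of every .json member in the archive.

    Assumption: the collected JSON file is line-delimited (one object per line).
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and the bundled log files.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            handle = archive.extractfile(member)
            for raw_line in io.TextIOWrapper(handle, encoding="utf-8"):
                raw_line = raw_line.strip()
                if raw_line:
                    yield json.loads(raw_line)
```

Reading straight from the archive avoids unpacking possibly hostile artifacts onto the analyst’s own disk.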
The collector gathers many different types of data including:
- install history and file hashes for kernel extensions and installed applications
- details on startup items including LaunchAgents, LaunchDaemons, ScriptingAdditions, and other login items
- OS quarantine, the information OS X uses to show ‘Are you sure you wanna run this?’ when a user is trying to open a file downloaded from the internet
- file hashes and source URL for downloaded files
- a snapshot of browser history, cookies, extensions, and cached data for Chrome, Firefox, and Safari
- user account details
- email attachment hashes
The docs page on GitHub contains a more in-depth description of the collected data.
Performing Basic Forensic Analysis
Forensic analysis is a bit of an art and a bit of a science. Every analyst will see a bit of a different story when reading the output from OSXCollector - that’s part of what makes analysis fun.
Generally, collection is performed on a target machine because something is hinky: anti-virus found a file it doesn’t like, deep packet inspection observed a callout, endpoint monitoring noticed a new startup item, etc. The details of this initial alert - a file path, a timestamp, a hash, a domain, an IP, etc. - are enough to get going.
OSXCollector output is very easy to sort, filter, and search for manual forensic analysis. By mixing a bit of command-line-fu with some powerful tools like grep and jq, a lot of questions can be answered. Here are just a few examples:
Get everything that happened around 11:35
Just the URLs from that time period
Just details on a single user
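The three questions above can also be sketched in a few lines of Python. This is only an illustration of the idea, not the commands from the original post: it assumes one JSON object per line, timestamps stored as readable strings, and a per-entry username key called `osxcollector_username`. All three function names are hypothetical.

```python
import json


def entries_matching(lines, needle):
    """Grep-style filter: parse only the raw JSON lines that contain `needle`."""
    return [json.loads(line) for line in lines if needle in line]


def find_urls(value):
    """Collect string values that look like URLs, however deeply nested."""
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        children = []
    urls = []
    for child in children:
        if isinstance(child, str) and child.startswith(("http://", "https://")):
            urls.append(child)
        else:
            urls.extend(find_urls(child))
    return urls


def entries_for_user(entries, username, key="osxcollector_username"):
    """Keep entries attributed to one user (the key name is an assumption)."""
    return [entry for entry in entries if entry.get(key) == username]
```

For instance, `entries_matching(lines, "2015-01-05 11:35")` keeps everything stamped within that minute, and passing the result through `find_urls` or `entries_for_user` narrows it further.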
Performing Automated Analysis with OutputFilters
Output filters process and transform the output of OSXCollector. The goal of filters is to make it easy to analyze OSXCollector output. Each filter has a single purpose. They do one thing and they do it right.
For example, the FindDomainsFilter does just what it sounds like: it finds domain names within a JSON entry. The domains are added as a new key to the JSON entry. For example, given the input:
the FindDomainsFilter would add an osxcollector_domains key to the output:
This enhanced JSON entry can now be fed into additional OutputFilters that perform actions like matching domains against a blacklist or querying a passive DNS service for domain reputation information.
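To make the idea concrete, here is a minimal sketch of a domain-finding step. It is not the FindDomainsFilter implementation: the regular expression is deliberately naive (a file name such as report.pdf would be picked up as a domain), and the helper names are invented for this example. Only the osxcollector_domains key comes from the text above.

```python
import re

# Naive pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def iter_strings(value):
    """Yield every string value found anywhere inside a JSON-like structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_strings(child)


def add_domains(entry):
    """Return a copy of `entry` with an osxcollector_domains key when domains are found."""
    domains = set()
    for text in iter_strings(entry):
        domains.update(match.lower() for match in DOMAIN_RE.findall(text))
    if domains:
        entry = dict(entry, osxcollector_domains=sorted(domains))
    return entry
```

Returning a copy rather than mutating the input keeps each step easy to test in isolation.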
Finds domain names in OSXCollector output and adds an osxcollector_domains key to JSON entries.
Compares data against user-defined blacklists and adds an osxcollector_blacklist key to matching JSON entries.
Analysts should create blacklists for domains, file hashes, file names, and any known hinky stuff.
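A blacklist comparison of this kind can be sketched as follows. The shape of the `blacklists` argument (a name mapped to a set of terms), the substring matching, and the layout of the value stored under osxcollector_blacklist are all assumptions made for illustration; the real filter may match and report differently.

```python
def iter_strings(value):
    """Yield every string value found anywhere inside a JSON-like structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_strings(child)


def add_blacklist_matches(entry, blacklists):
    """Return a copy of `entry` tagged with the blacklist terms it contains.

    `blacklists` maps a blacklist name (for example "domains") to a set of
    terms. Matching is a case-insensitive substring test.
    """
    texts = [text.lower() for text in iter_strings(entry)]
    matches = {}
    for name, terms in blacklists.items():
        hits = sorted(
            term for term in terms
            if any(term.lower() in text for text in texts)
        )
        if hits:
            matches[name] = hits
    if matches:
        entry = dict(entry, osxcollector_blacklist=matches)
    return entry
```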
Breaks an initial set of file paths into individual file and directory names and then greps for these terms. The RelatedFilesFilter is smart and ignores usernames and common terms like
This filter is great for figuring out how evil_invoice.pdf landed up on a machine. It’ll find browser history, quarantines, email messages, etc. related to a file.
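The path-splitting idea can be sketched like this. The stop list `COMMON_TERMS` is a placeholder invented for the example, not the list the RelatedFilesFilter actually uses, and both function names are hypothetical.

```python
import re

# Hypothetical stop list; the real filter's list of common terms is not shown here.
COMMON_TERMS = frozenset({"users", "library", "applications", "downloads", "tmp"})


def terms_from_paths(paths, usernames=()):
    """Split file paths into names, dropping usernames and common terms."""
    skip = COMMON_TERMS | {name.lower() for name in usernames}
    terms = set()
    for path in paths:
        for part in re.split(r"[/\\]+", path):
            if part and part.lower() not in skip:
                terms.add(part)
    return terms


def related_lines(lines, terms):
    """Grep-style search: keep raw output lines that mention any term."""
    return [line for line in lines if any(term in line for term in terms)]
```

Dropping usernames matters because nearly every path on the machine contains one, so keeping them would match almost every entry.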
ChromeHistoryFilter and FirefoxHistoryFilter
Builds a really nice browser history sorted in descending time order. The output is comparable to looking at the history tab in the browser but contains more info such as whether the URL was visited because of a direct user click or visited in a hidden iframe.
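The ordering step alone is simple to sketch. The field name `visit_time` and the timestamp format are assumptions for this example; the real filters also join in extra details about each visit, which this sketch does not attempt.

```python
from datetime import datetime


def sorted_history(visits, time_key="visit_time", fmt="%Y-%m-%d %H:%M:%S"):
    """Return visits ordered newest first, parsing timestamps before comparing."""
    return sorted(
        visits,
        key=lambda visit: datetime.strptime(visit[time_key], fmt),
        reverse=True,
    )
```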
Threat API Filters
OSXCollector output typically has thousands of potential indicators of compromise like domains, URLs, and file hashes. Most are benign; some indicate a serious threat. Sorting the wheat from the chaff is quite a challenge. Threat APIs like OpenDNS, VirusTotal, and ShadowServer use a mix of confirmed intelligence information and heuristics to augment and classify indicators and help find the needle in the haystack.
Looks up an initial set of domains and IPs with the OpenDNS Umbrella API and finds related domains. Threats often involve relatively unknown domains or IPs. However, the second-generation related domains often relate back to known malicious sources.
OpenDNS & VirusTotal LookupDomainsFilter
Looks up domain reputation and threat information in VirusTotal and OpenDNS.
The filters use heuristics to determine what is suspicious. These can create false positives, but a download from a domain marked as suspicious is usually a good lead.
ShadowServer & VirusTotal LookupHashesFilter
Looks up hashes with the VirusTotal and ShadowServer APIs. VirusTotal acts as a blacklist of known malicious hashes while ShadowServer acts as a whitelist of known good file hashes.
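The blacklist-plus-whitelist triage can be sketched with two in-memory sets standing in for the API results. No real API call is made here, and the function name and verdict strings are invented for the example.

```python
def classify_hash(file_hash, known_bad, known_good):
    """Triage one hash against a blacklist and a whitelist.

    `known_bad` and `known_good` are sets of lowercase hex digests that stand
    in for the answers a threat API would return.
    """
    digest = file_hash.lower()
    if digest in known_bad:
        return "malicious"   # blacklist hit wins, even if also whitelisted
    if digest in known_good:
        return "known_good"  # safe to drop from the analyst's queue
    return "unknown"         # still needs a human look
```

The whitelist is what shrinks the haystack: every hash confirmed as known good is one fewer for an analyst to review.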
AnalyzeFilter - The One Filter to Rule Them All
AnalyzeFilter is Yelp’s one filter to rule them all. It chains all the previous filters into one monster analysis. The results, enhanced with blacklist info, threat APIs, related files and domains, and even pretty browser history, are written to a new output file.
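Chaining single-purpose steps can be sketched as plain function composition. This is not how AnalyzeFilter is implemented; the two toy filters and the name `chain_filters` are hypothetical, and only the osxcollector_domains and osxcollector_blacklist keys come from the descriptions above.

```python
from urllib.parse import urlsplit


def chain_filters(*filters):
    """Build a pipeline that passes every entry through each filter in order."""
    def run(entries):
        for entry in entries:
            for apply_filter in filters:
                entry = apply_filter(entry)
            yield entry
    return run


def add_url_host(entry):
    """Toy filter: record the host of the entry's URL, if it has one."""
    host = urlsplit(entry.get("url", "")).hostname
    if host:
        entry = dict(entry, osxcollector_domains=[host])
    return entry


def flag_blacklisted(entry, blacklist=frozenset({"bad.example"})):
    """Toy filter: flag entries whose recorded domains are on the blacklist."""
    hits = sorted(set(entry.get("osxcollector_domains", [])) & blacklist)
    if hits:
        entry = dict(entry, osxcollector_blacklist={"domains": hits})
    return entry
```

Calling `chain_filters(add_url_host, flag_blacklisted)` returns a function that streams entries through both steps, so later filters can build on keys that earlier ones added.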
Then Very Readable Output Bot takes over and prints out an easy-to-digest, human-readable, nearly-English summary of what it found. It’s basically equivalent to running each of the filters above in turn and then letting a wise-cracking analyst explain the results to you. The Very Readable Output Bot even suggests new values to add to your blacklists.
This thing is the real deal and our analysts don’t even look at OSXCollector output until after they’ve run the AnalyzeFilter.