OSXCollector: Forensic Collection and Automated Analysis for OS X

Introducing OSXCollector

We use Macs a lot at Yelp, which means that we see our fair share of Mac-specific security alerts. Host-based detectors will tell us about known malware infestations or weird new startup items. Network-based detectors see potential C2 callouts or DNS requests to resolve suspicious domains. Sometimes our awesome employees just let us know, “I think I have like Stuxnet or Conficker or something on my laptop.”

When alerts fire, our incident response team’s first goal is to “stop the bleeding” - to contain and then eradicate the threat. Next, we move to “root cause the alert” - figuring out exactly what happened and how we’ll prevent it in the future. One of our primary tools for root causing OS X alerts is OSXCollector.

OSXCollector is an open source forensic evidence collection and analysis toolkit for OS X. It was developed in-house at Yelp to automate the digital forensics and incident response (DFIR) our crack team of responders had been doing manually.

Performing Forensics Collection

The first step in DFIR is gathering information about what’s going on - forensic artifact collection if you like fancy terms. OSXCollector gathers information from plists, SQLite databases, and the local filesystem, then packages it in an easy-to-read and easier-to-parse JSON file.

osxcollector.py is a single Python file that runs without any dependencies on a standard OS X machine. This makes it really easy to run collection on any machine - no fussing with brew, pip, config files, or environment variables. Just copy the single file onto the machine and run it: sudo osxcollector.py is all it takes.
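
Plists and SQLite databases can both be read with Python's standard library (`plistlib` and `sqlite3`), so a dependency-free collector is feasible. The sketch below only illustrates that point and is not OSXCollector's actual code. The helper names and the `section`/`source` keys are hypothetical. It also assumes the output is written as one JSON object per line.

```python
import json
import plistlib
import sqlite3


def entries_from_plist(path, section):
    """Yield one dict per record in a plist file (XML or binary)."""
    with open(path, "rb") as f:
        data = plistlib.load(f)
    # A plist's root may be a single dict or an array of records.
    records = data if isinstance(data, list) else [data]
    for record in records:
        entry = dict(record) if isinstance(record, dict) else {"value": record}
        entry.update({"section": section, "source": path})  # hypothetical keys
        yield entry


def entries_from_sqlite(path, query, section):
    """Yield one dict per row returned by `query` against a SQLite file."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    try:
        for row in conn.execute(query):
            entry = dict(row)
            entry.update({"section": section, "source": path})
            yield entry
    finally:
        conn.close()


def write_json_lines(entries, out):
    """Write each entry as one JSON object per line."""
    for entry in entries:
        # default=str covers plist dates and bytes, which JSON can't encode.
        out.write(json.dumps(entry, default=str) + "\n")
```

Emitting one self-contained JSON object per line keeps the output easy to grep and lets downstream tools process it a line at a time.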

Details of Collection

The collector outputs a .tar.gz containing all the collected artifacts. The archive contains a JSON file with the majority of the information, plus a set of useful logs from the target system.
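
Packaging the results is also a standard-library job. Below is a minimal sketch using `tarfile`. It assumes a hypothetical layout: an `<incident_id>/` directory holding the JSON file, with logs under a `logs/` subdirectory. The real archive layout may differ.

```python
import os
import tarfile


def package(incident_id, json_path, log_paths, out_dir="."):
    """Bundle the JSON output and copied logs into <incident_id>.tar.gz."""
    archive = os.path.join(out_dir, f"{incident_id}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # Hypothetical layout: JSON at the top, logs in a subdirectory.
        tar.add(json_path, arcname=f"{incident_id}/{os.path.basename(json_path)}")
        for log in log_paths:
            tar.add(log, arcname=f"{incident_id}/logs/{os.path.basename(log)}")
    return archive
```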

The collector gathers many different types of data including:

install history and file hashes for kernel extensions and installed applications
details on startup items including LaunchAgents, LaunchDaemons, ScriptingAdditions, and other login items
OS quarantine, the information OS X uses to show ‘Are you sure you wanna run this?’ when a user is trying to open a file downloaded from the internet
file hashes and source URLs for downloaded files
a snapshot of browser history, cookies, extensions, and cached data for Chrome, Firefox, and Safari
user account details
email attachment hashes

The docs page on GitHub contains a more in-depth description of collected data.

Performing Basic Forensic Analysis

Forensic analysis is a bit of an art and a bit of a science. Every analyst will see a slightly different story when reading the output from OSXCollector - that’s part of what makes analysis fun.

Generally, collection is performed on a target machine because something is hinky: anti-virus found a file it doesn’t like, deep packet inspection observed a callout, endpoint monitoring noticed a new startup item, etc. The details of this initial alert (a file path, a timestamp, a hash, a domain, an IP, etc.) are enough to get going.

OSXCollector output is very easy to sort, filter, and search for manual forensic analysis. By mixing a bit of command-line-fu with some powerful tools like grep and jq, a lot of questions can be answered. Here are just a few examples:

Get everything that happened around 11:35

Just the URLs from that time period

Just details on a single user
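
The same three questions can also be answered from Python. This minimal sketch makes three assumptions:

- each line holds one JSON entry;
- timestamps look like `YYYY-MM-DD HH:MM:SS`, as in the browser-history example further down;
- entries record their user under an `osxcollector_username` key. This key name is a guess, not confirmed.

```python
import json
import re


def load_entries(lines):
    """Parse JSON-lines text; accepts a file object or any iterable of str."""
    return [json.loads(line) for line in lines if line.strip()]


def matching_time(entries, pattern):
    """Entries where any top-level string value matches `pattern` (grep-style)."""
    rx = re.compile(pattern)
    return [e for e in entries
            if any(isinstance(v, str) and rx.search(v) for v in e.values())]


def urls_only(entries):
    """Just the 'url' values, for entries that have one."""
    return [e["url"] for e in entries if "url" in e]


def for_user(entries, username, key="osxcollector_username"):
    """Entries belonging to one user; the default key name is an assumption."""
    return [e for e in entries if e.get(key) == username]
```

For instance, `urls_only(matching_time(entries, r"2014-10-16 11:3[2-8]"))` lists the URLs from entries stamped between 11:32 and 11:38 on that day.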

Performing Automated Analysis with OutputFilters

Output filters process and transform the output of OSXCollector. The goal of filters is to make it easy to analyze OSXCollector output. Each filter has a single purpose. They do one thing and they do it right.
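
Because the filters are chained with shell pipes, each one plausibly follows a simple contract: read JSON entries on stdin and write possibly enriched entries on stdout. One possible shape for that contract is sketched below. `run_filter` is a hypothetical helper, not the base class in `osxcollector.output_filters`.

```python
import json
import sys


def run_filter(transform, infile=None, outfile=None):
    """Apply `transform` to every JSON line of input and write JSON lines out.

    `transform` receives one entry (a dict) and returns the entry, possibly
    with new keys added, or None to drop it from the output.
    """
    infile = sys.stdin if infile is None else infile
    outfile = sys.stdout if outfile is None else outfile
    for line in infile:
        if not line.strip():
            continue  # tolerate blank lines
        entry = transform(json.loads(line))
        if entry is not None:
            outfile.write(json.dumps(entry) + "\n")
```

Keeping input and output in the same format is what lets single-purpose filters be composed in any order.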

For example, the FindDomainsFilter does just what it sounds like: it finds domain names within a JSON entry. The domains are added as a new key to the JSON entry. For example, given the input:

{ "visit_time": "2014-10-16 09:44:57", "title": "Pizza New York, NY", "url": "http://www.yelp.com/search?find_desc=pizza&find_loc=NYC" }

the FindDomainsFilter would add an osxcollector_domains key to the output:

{ "visit_time": "2014-10-16 09:44:57", "title": "Pizza New York, NY", "url": "http://www.yelp.com/search?find_desc=pizza&find_loc=NYC", "osxcollector_domains": ["yelp.com","www.yelp.com"] }

This enhanced JSON entry can now be fed into additional OutputFilters that perform actions like matching domains against a blacklist or querying a passive DNS service for domain reputation information.
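
Here is a rough sketch of the domain-finding step. It simplifies the real FindDomainsFilter in two ways:

- it only scans top-level string values;
- it derives the parent domain by keeping the last two labels, which mishandles suffixes like `co.uk` (a public-suffix list would be needed there).

The output is sorted, so the example above yields `["www.yelp.com", "yelp.com"]`.

```python
import ipaddress
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def _is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domains_in(text):
    """Hostnames of URLs found in `text`, plus a naive parent domain."""
    found = set()
    for url in URL_RE.findall(text):
        try:
            host = urlsplit(url).hostname  # lowercased, port stripped
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        if not host:
            continue
        host = host.rstrip(".")
        found.add(host)
        labels = host.split(".")
        if not _is_ip(host) and len(labels) > 2:
            # Naive: keep the last two labels ("www.yelp.com" -> "yelp.com").
            found.add(".".join(labels[-2:]))
    return found


def find_domains(entry):
    """Add osxcollector_domains for domains in the entry's top-level strings."""
    domains = set()
    for value in entry.values():
        if isinstance(value, str):
            domains |= domains_in(value)
    if domains:
        entry["osxcollector_domains"] = sorted(domains)
    return entry
```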

Basic Filters

FindDomainsFilter

Finds domain names in OSXCollector output and adds an osxcollector_domains key to JSON entries.

FindBlacklistedFilter

Compares data against user-defined blacklists and adds an osxcollector_blacklist key to matching JSON entries.

Analysts should create blacklists for domains, file hashes, file names, and any known hinky stuff.
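
Below is a minimal sketch of blacklist matching. It assumes each blacklist is configured as two things: the entry keys to check (for example `osxcollector_domains` from the domain finder) and a set of bad values. The structure stored under `osxcollector_blacklist` is an assumed shape, not the documented format.

```python
def find_blacklisted(entry, blacklists):
    """Tag an entry whose values appear in any blacklist.

    `blacklists` maps a blacklist name to (keys to check, bad values).
    Matching is case-insensitive; list-valued keys are checked item by item.
    """
    hits = {}
    for name, (keys, bad_values) in blacklists.items():
        bad = {v.lower() for v in bad_values}
        for key in keys:
            value = entry.get(key)
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, str) and candidate.lower() in bad:
                    hits.setdefault(name, []).append(candidate)
    if hits:
        entry["osxcollector_blacklist"] = hits  # assumed shape
    return entry
```

For example, `{"domains": (["osxcollector_domains"], {"evil.example"})}` flags any entry whose found domains include `evil.example`.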

RelatedFilesFilter

Breaks an initial set of file paths into individual file and directory names and then greps for these terms. The RelatedFilesFilter is smart and ignores usernames and common terms like bin or Library.

This filter is great for figuring out how evil_invoice.pdf ended up on a machine. It’ll find browser history, quarantines, email messages, etc. related to a file.
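
The term-splitting idea can be sketched as follows. A few parts are illustrative choices rather than the real filter's behavior:

- the stop list (the text only names `bin` and `Library`);
- matching as a plain case-insensitive substring search over each entry's JSON text;
- the value stored under `osxcollector_related`, which is an assumed shape.

```python
import json
from pathlib import PurePosixPath

# Illustrative stop list: path components too common to be useful leads.
STOP_TERMS = {"/", "users", "library", "bin", "usr", "applications",
              "private", "tmp", "var", "downloads", "desktop", "documents"}


def related_terms(paths, usernames=()):
    """Split paths into lowercase file/directory names, minus common terms."""
    skip = STOP_TERMS | {u.lower() for u in usernames}
    terms = set()
    for path in paths:
        for part in PurePosixPath(path).parts:
            if part.lower() not in skip:
                terms.add(part.lower())
    return terms


def find_related(entry, terms):
    """Tag the entry if its JSON text mentions any term (grep-style)."""
    text = json.dumps(entry, ensure_ascii=False).lower()
    matched = sorted(t for t in terms if t in text)
    if matched:
        entry["osxcollector_related"] = {"files": matched}  # assumed shape
    return entry
```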

ChromeHistoryFilter and FirefoxHistoryFilter

Builds a really nice browser history sorted in descending time order. The output is comparable to looking at the history tab in the browser but contains more info such as whether the URL was visited because of a direct user click or visited in a hidden iframe.
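
For Chrome, the hidden-iframe distinction comes from the visit's page-transition value. Its low byte holds a core type such as `typed`, or `auto_subframe` for a subframe navigation the user did not initiate. The sketch below sorts visits newest-first and labels that core type. It assumes each visit entry carries `visit_time` and a raw Chrome `transition` integer. Firefox numbers its visit types differently, so Firefox is not handled here.

```python
# Chrome stores the core transition type in the low byte; the upper bits are
# qualifier flags (redirects, chain start/end, and so on).
CORE_MASK = 0xFF
CORE_TYPES = {
    0: "link", 1: "typed", 2: "auto_bookmark", 3: "auto_subframe",
    4: "manual_subframe", 5: "generated", 6: "auto_toplevel",
    7: "form_submit", 8: "reload", 9: "keyword", 10: "keyword_generated",
}


def pretty_history(visits):
    """Return visits newest-first, each labelled with its core transition type."""
    rows = []
    for visit in visits:
        transition = visit.get("transition")  # assumed raw Chrome value
        if transition is None:
            label = "unknown"
        else:
            label = CORE_TYPES.get(transition & CORE_MASK, "unknown")
        rows.append(dict(visit, transition_type=label))
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    rows.sort(key=lambda row: row["visit_time"], reverse=True)
    return rows
```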

Threat API Filters

OSXCollector output typically has thousands of potential indicators of compromise like domains, URLs, and file hashes. Most are benign; some indicate a serious threat. Separating the wheat from the chaff is quite a challenge. Threat APIs like OpenDNS, VirusTotal, and ShadowServer use a mix of confirmed intelligence information and heuristics to augment and classify indicators and help find the needle in the haystack.

OpenDNS RelatedDomainsFilter

Looks up an initial set of domains and IPs with the OpenDNS Umbrella API and finds related domains. Threats often involve relatively unknown domains or IPs. However, the second-generation related domains often relate back to known malicious sources.
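
The second-generation idea amounts to a breadth-first expansion that stops two hops from the seeds. In this sketch `lookup` stands in for whatever returns related domains, such as an OpenDNS Umbrella query. It is a hypothetical callable, not a real API client.

```python
from collections import deque


def related_domains(seeds, lookup, max_generation=2):
    """Breadth-first expansion of related domains up to `max_generation` hops.

    `lookup(domain)` returns an iterable of related domains (hypothetical).
    Returns {domain: generation}, with the seeds at generation 0.
    """
    generation = {d: 0 for d in seeds}
    queue = deque(generation)  # deduplicated seeds
    while queue:
        domain = queue.popleft()
        if generation[domain] >= max_generation:
            continue  # don't expand past the last generation
        for related in lookup(domain):
            if related not in generation:
                generation[related] = generation[domain] + 1
                queue.append(related)
    return generation
```

Recording the generation alongside each domain keeps it visible whether a suspicious hit was a direct relation or a second-hop one.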

OpenDNS & VirusTotal LookupDomainsFilter

Looks up domain reputation and threat information in VirusTotal and OpenDNS.

The filters use a heuristic to determine what is suspicious. This can create false positives, but a download from a domain marked as suspicious is usually a good lead.

ShadowServer & VirusTotal LookupHashesFilter

Looks up hashes with the VirusTotal and ShadowServer APIs. VirusTotal acts as a blacklist of known malicious hashes while ShadowServer acts as a whitelist of known good file hashes.
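
The whitelist/blacklist split can be sketched with local sets standing in for the ShadowServer and VirusTotal lookups. The sets are hypothetical and no API is called. Letting the whitelist win mirrors the final `jq` step of the AnalyzeFilter pipeline, which drops any entry carrying `osxcollector_shadowserver`.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def classify_hash(digest, known_good, known_bad):
    """Whitelist (known_good) wins, then blacklist (known_bad); else unknown."""
    if digest in known_good:
        return "known_good"
    if digest in known_bad:
        return "known_bad"
    return "unknown"
```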

AnalyzeFilter - The One Filter to Rule Them All

AnalyzeFilter is Yelp’s one filter to rule them all. It chains all the previous filters into one monster analysis. The results, enhanced with blacklist info, threat APIs, related files and domains, and even pretty browser history, are written to a new output file.

Then the Very Readable Output Bot takes over and prints out an easy-to-digest, human-readable, nearly English summary of what it found. It’s basically equivalent to running:

$ cat SlickApocalypse.json | \
    python -m osxcollector.output_filters.find_domains | \
    python -m osxcollector.output_filters.shadowserver.lookup_hashes | \
    python -m osxcollector.output_filters.virustotal.lookup_hashes | \
    python -m osxcollector.output_filters.find_blacklisted | \
    python -m osxcollector.output_filters.related_files | \
    python -m osxcollector.output_filters.opendns.related_domains | \
    python -m osxcollector.output_filters.opendns.lookup_domains | \
    python -m osxcollector.output_filters.virustotal.lookup_domains | \
    python -m osxcollector.output_filters.chrome_history | \
    python -m osxcollector.output_filters.firefox_history | \
    tee analyze_SlickApocalypse.json | \
    jq 'select(false == has("osxcollector_shadowserver")) | select(has("osxcollector_vthash") or has("osxcollector_vtdomain") or has("osxcollector_opendns") or has("osxcollector_blacklist") or has("osxcollector_related"))'

and then letting a wise-cracking analyst explain the results to you. The Very Readable Output Bot even suggests new values to add to your blacklists.
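
The final `jq` selection in that pipeline can be restated in Python, which makes its logic easier to read. It keeps entries that ShadowServer did not whitelist and that at least one enrichment step flagged. `analyze` is a hypothetical helper that applies filter functions in order. It is a simplification of the chaining, not the AnalyzeFilter itself.

```python
INTERESTING_KEYS = ("osxcollector_vthash", "osxcollector_vtdomain",
                    "osxcollector_opendns", "osxcollector_blacklist",
                    "osxcollector_related")


def worth_a_look(entry):
    """Mirror of the jq selection: not whitelisted, and flagged by something."""
    if "osxcollector_shadowserver" in entry:
        return False
    return any(key in entry for key in INTERESTING_KEYS)


def analyze(entries, filters):
    """Run each filter over every entry in order, then keep flagged entries."""
    for f in filters:
        entries = [f(e) for e in entries]
    return [e for e in entries if worth_a_look(e)]
```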

This thing is the real deal and our analysts don’t even look at OSXCollector output until after they’ve run the AnalyzeFilter.

Give It a Try

The code for OSXCollector is available on GitHub - https://github.com/Yelp/osxcollector. If you’d like to talk more about OS X disk forensics feel free to reach out to me on Twitter at @c0wl.

Yelp

Engineering