Keeping Yelp two steps ahead: How we built GSET to protect employee email
Jose Martin de Vidales Biurrun, Application Engineer
Dec 20, 2017
Earlier this year, Gmail users across the globe were affected by one of the largest phishing attacks of its kind. Yelp’s email system was among the many corporate email systems hit by this Google Docs phishing attack. Fortunately, our security engineers had already prepared for this level of threat and were able to delete the suspicious emails before they impacted employees.
As phishing attacks become more prevalent, new tools and countermeasures to protect users are more important than ever. According to the latest IBM X-Force Threat Intelligence Index report, the volume of spam email increased 450% over the past two years (as shown by the graph below). More worrisome is the potential annual cost to a company affected by a phishing attack, estimated at $3.7 million.
Image source: IBM
Most companies offering SaaS solutions are delivering products that are capable of combating these types of attacks, but more often than not, they don’t cover basic cases that are critical to keeping employee accounts safe. The Google Docs phishing attack mentioned above is a good example of how out-of-the-box SaaS solutions often fall short. While Gmail has multiple anti-virus and phishing tools to prevent attacks from spreading, there are no options to mask or mass delete unwanted emails that have already landed in an employee’s mailbox.
We built GSET (G Suite Email Terminator), an internal tool developed by our Corporate Application Engineering team, to give us the ability to mass delete thousands of emails from our corporate Gmail accounts with just two clicks. Because we were ready, we could quickly and easily protect our employees from threats like the Google Docs phishing attack earlier this year.
Here is a screenshot of the GSET interface:
GSET is built on multiple technologies including Python, Flask, Celery and PaaSTA. With these technologies, the tool can perform three critical actions:
- Execute a search query using Gmail search operators and generate a report based on that query
- Delete, hide, or quarantine any emails matching a specific query
- Log actions taken to create an audit trail
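To make the first action a bit more concrete: a Gmail search query is a string of operators such as `from:`, `subject:`, and `after:`. The helper below is hypothetical (GSET's own code is not shown in this post) and the addresses are placeholders; it only sketches how a reported phishing pattern might be turned into a query string.

```python
from datetime import date
from typing import Optional


def build_phishing_query(sender: Optional[str] = None,
                         subject: Optional[str] = None,
                         received_after: Optional[date] = None) -> str:
    """Combine Gmail search operators into one query string (hypothetical helper)."""
    parts = []
    if sender:
        parts.append(f"from:{sender}")
    if subject:
        # Drop embedded quotes, then quote the subject so it matches as a phrase.
        cleaned = subject.replace('"', "")
        parts.append(f'subject:"{cleaned}"')
    if received_after:
        # Gmail's after: operator accepts dates written as YYYY/MM/DD.
        parts.append(f"after:{received_after.strftime('%Y/%m/%d')}")
    if not parts:
        # An empty query would match every email in a mailbox.
        raise ValueError("refusing to build an empty query")
    return " ".join(parts)


# Placeholder pattern, not the real attack's details.
query = build_phishing_query(
    sender="attacker@example.com",
    subject="has shared a document with you",
    received_after=date(2017, 5, 1),
)
print(query)
```

Refusing an empty query is a deliberate choice in this sketch: a tool that can mass delete should never be handed a pattern that matches everything.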
Technologies
GSET authenticates with a G Suite service account that has domain-wide access. This allows GSET to generate OAuth 2.0 tokens and act on behalf of users when calling the Gmail API. Once GSET has a valid token, it can interact with the API and access a user’s Gmail data. Service accounts with domain-wide access are particularly useful in enterprise environments, where admins need to work with most Google APIs without any interaction from the user.
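For readers unfamiliar with domain-wide delegation: in Google's OAuth 2.0 service account flow, the service account signs a JWT whose `sub` claim names the user to act as, then exchanges it for an access token. The sketch below builds only that claim set with the standard library. The helper name and both email addresses are placeholders, and the signing step and token request (normally handled by a client library such as google-auth) are left out.

```python
import time
from typing import Dict, Optional, Union

GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token"
# Full-mailbox scope; the Gmail API requires it to permanently delete messages.
GMAIL_SCOPE = "https://mail.google.com/"


def delegated_claims(service_account_email: str,
                     user_email: str,
                     now: Optional[int] = None,
                     lifetime_seconds: int = 3600) -> Dict[str, Union[str, int]]:
    """Build the JWT claim set for acting on behalf of one user (sketch only)."""
    issued_at = int(time.time()) if now is None else now
    return {
        "iss": service_account_email,  # who is asking: the service account
        "sub": user_email,             # whose mailbox the token will act on
        "scope": GMAIL_SCOPE,
        "aud": GOOGLE_TOKEN_URL,
        "iat": issued_at,
        "exp": issued_at + lifetime_seconds,  # Google caps this at one hour
    }


claims = delegated_claims(
    "gset-placeholder@example-project.iam.gserviceaccount.com",
    "employee@example.com",
)
print(claims["sub"])
```

Because only the `sub` claim changes from one user to the next, a tool like this can walk through every mailbox in the domain with a single service account key.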
Another technology used by GSET is Celery, an asynchronous task queue based on distributed message passing. Celery matters most to us because of Gmail’s API quota limits: every time one of our admins runs a new report or action in GSET, it generates a large volume of API transactions that can’t be processed instantly. These transactions need to be queued in order to comply with the quota limits. Using RabbitMQ as the message broker, we can run tasks asynchronously and queue them in order to control the rate of requests the tool generates.
Now let’s look more closely at how the broker, queues, and workers come into play. Here is a diagram that shows how GSET manages multiprocessing and API rate control using queues:
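The idea of pacing requests against a quota can be sketched without Celery or RabbitMQ. The toy throttle below tracks a per-second budget of quota units and tells a worker how long to wait before its next call. The budget and the per-method costs are illustrative assumptions, not values taken from GSET; check Google's current Gmail API usage limits before relying on them.

```python
import time


class QuotaThrottle:
    """Token bucket measured in API quota units (illustrative sketch)."""

    def __init__(self, units_per_second: float, clock=time.monotonic):
        self.rate = units_per_second
        self.capacity = units_per_second  # allow at most one second of burst
        self.available = units_per_second
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Credit the units earned since the last check, up to the capacity.
        now = self.clock()
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, cost: float) -> float:
        """Return 0.0 and reserve `cost` units, or the seconds still to wait."""
        if cost > self.capacity:
            raise ValueError("a single call costs more than the whole budget")
        self._refill()
        if cost <= self.available:
            self.available -= cost
            return 0.0
        return (cost - self.available) / self.rate


# Assumed costs in quota units per call; verify against Google's documentation.
ASSUMED_COSTS = {"messages.list": 5, "messages.delete": 10}
throttle = QuotaThrottle(units_per_second=250)


def call_with_throttle(method: str, api_call):
    """Run `api_call` once the budget allows a call to `method`."""
    cost = ASSUMED_COSTS[method]
    while True:
        delay = throttle.wait_time(cost)
        if delay == 0.0:
            return api_call()
        time.sleep(delay)
```

A real deployment would let the task queue do this waiting so that work survives a worker restart; the sketch only shows why a method that costs more units has to be called less often.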
Bringing it all together
To illustrate how these different technologies work together in GSET, let’s walk through a user flow:
- Our security team is notified when there’s a spike in phishing emails.
- A security admin logs in to GSET using single sign-on, which passes their identity from our identity provider (Okta) to the service provider (GSET).
- The security admin can take one of two actions:
  - Create a query for our entire G Suite domain to find a specific phishing email.
  - Mass delete emails found in a previous query.
- In this case, they run a new query based on the pattern detected in the reported phishing emails.
- The query is logged to S3 and Scribe and then ingested into Splunk for audit purposes.
- This query task is then pushed to the broker.
- The task is placed in one of three queues (search/deletion/global_search); a query task like this one goes on the search queue. We need different queues (search vs. delete) because the number of quota units consumed by each worker varies depending on the method called (e.g. messages.list vs. messages.delete). You can find more information about Gmail API quota limits in Google’s Gmail API documentation.
- Each worker then consumes from one of the three queues. This helps us maintain a good ratio of API calls per second and stay within the Gmail API quota limits.
- Once the query task is complete, GSET generates a report and emails it to the security admin.
- The security admin can then act on GSET’s findings.
- In this case, the security admin mass deletes these emails based on the ID of the query task that just ran.
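The queue split described in this flow maps naturally onto Celery's `task_routes` and `task_annotations` settings. The sketch below builds those settings as plain dictionaries so it runs without Celery installed. Only the three queue names come from this post; the task names, quota costs, and per-queue budgets are hypothetical, and it assumes each task execution makes exactly one Gmail API call.

```python
# Assumed cost of each Gmail API method in quota units; verify with Google's docs.
ASSUMED_UNIT_COST = {"messages.list": 5, "messages.delete": 10}

# Hypothetical task name -> (queue, Gmail method called, quota units per second
# that we are willing to spend on that queue).
TASKS = {
    "gset.tasks.search_mailbox": ("search", "messages.list", 100),
    "gset.tasks.search_domain": ("global_search", "messages.list", 50),
    "gset.tasks.delete_message": ("deletion", "messages.delete", 100),
}


def build_celery_settings(tasks=TASKS):
    """Derive per-task routing and rate limits from quota budgets (sketch)."""
    routes = {}
    annotations = {}
    for name, (queue, method, budget) in tasks.items():
        # A costlier method gets fewer calls per second from the same budget.
        calls_per_second = budget // ASSUMED_UNIT_COST[method]
        routes[name] = {"queue": queue}
        annotations[name] = {"rate_limit": f"{calls_per_second}/s"}
    return {"task_routes": routes, "task_annotations": annotations}


settings = build_celery_settings()
for task_name, route in settings["task_routes"].items():
    limit = settings["task_annotations"][task_name]["rate_limit"]
    print(task_name, route["queue"], limit)
```

With Celery installed, a dictionary like this could be applied with `app.conf.update(settings)`, and each worker started with `-Q` and the name of the queue it should consume. Note that Celery applies `rate_limit` per worker instance rather than globally, so the budget for a queue would need to be divided among the workers serving it.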
By leveraging the APIs of our existing tools, as we did with GSET, we can keep Yelp two steps ahead of attackers. I hope you enjoyed this blog post and are as excited as we are about protecting people from phishing attacks!