

Output Filtering Failure

About a month ago, we launched m.yelp.com specifically targeting iPhone, Android, and WebKit-based smartphones. Our engineering team pushes code live on average three times a day. Moving fast means we need sound engineering practices internally, such as peer code reviews and automated tools like unit tests and static analysis, to catch mistakes before they reach production. In this blog post we detail a misstep that we made and the response that followed.

On the morning of October 27, 2011, we were alerted by a team of researchers at Harvard, Yale, and Boston University that they had found a servlet on m.yelp.com that could expose private user information. Working jointly with this team, our engineers gained a full understanding of the exposure. The leak allowed clients to see a JSON dictionary with user-specific fields, including email address, birth date, gender, and full names. No financially sensitive information was exposed.

Once we understood the risk to our users, we immediately took the mobile site down. We resolved the issue within an hour, but kept the site down for 12 more hours while we double- and triple-checked for other issues (none were found). We analyzed the servlet’s access logs to see if anyone had exploited the hole, and we did not find any evidence that user information had actually been collected. We also created an automated system to detect sensitively named fields (last_name, birthdate, etc.) being sent to clients. Following this work, we felt comfortable that the risk of a future exposure of this type had been mitigated, so we turned the mobile site back on.
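To make the idea of detecting sensitively named fields concrete, here is a minimal sketch of one way such a check could work. It is not Yelp's actual system: the function name find_sensitive_fields, the SENSITIVE_FIELDS set, and the sample payload are all assumptions for illustration. The sketch walks a response payload recursively and reports the path of every key whose name is on a restricted list.

```python
# Hypothetical list of restricted names; the post only names
# last_name and birthdate as examples.
SENSITIVE_FIELDS = {"last_name", "birthdate"}


def find_sensitive_fields(payload, path=""):
    """Return the paths of all dict keys in payload with a restricted name."""
    found = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            key_path = f"{path}.{key}" if path else str(key)
            if key in SENSITIVE_FIELDS:
                found.append(key_path)
            # Keep descending: sensitive keys may be nested deeper.
            found.extend(find_sensitive_fields(value, key_path))
    elif isinstance(payload, (list, tuple)):
        for index, item in enumerate(payload):
            found.extend(find_sensitive_fields(item, f"{path}[{index}]"))
    return found


# Sample payload shaped like the response discussed in this post.
sample = {
    "reviews": [
        {
            "user": {
                "birthdate": "1971,01,01",
                "display_name": "Art G.",
                "last_name": "Goldfin",
            }
        }
    ]
}
print(find_sensitive_fields(sample))
# ['reviews[0].user.birthdate', 'reviews[0].user.last_name']
```

A check like this could run in tests or at the point where responses are serialized. Matching on names alone only catches fields that happen to be named in the list, so it complements output filtering rather than replacing it.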

The servlet at issue was using an ORM to retrieve information from the database. An ORM (object-relational mapper) automatically creates objects containing database content. That makes the data easy to access, but it also introduces risk if those objects are not sanitized before being passed across a trust boundary. In this case, we missed a sanitization step in a servlet when grabbing data from the ORM.

Our Python logic for the biz details servlet looked like the following:
def reviews(self, business_id):
    """unsafe version of reviews()"""
    reviews = self.get_business_reviews(business_id)
    return json.write(
        # UNSAFE CODE: unsanitized data going to the
        # client DANGER!
        {'reviews': reviews}
    )

In the code above, get_business_reviews() returns data about reviews, including details of the users who wrote them. This is where the users’ private information is retrieved, and it is then written to the response unfiltered.

To illustrate, here is an example JSON response from the offending servlet:
{'reviews': [...
    'user': {'birthdate': '1971,01,01',
             'display_name': 'Art G.',
             'first_name': 'Art',
             'gender': 'm',
             'last_name': 'Goldfin',
            }, …]}
Note: birthdate, gender, and last_name are all private fields that shouldn’t have been returned in the JSON response.

The servlet function reviews() above was rewritten approximately as follows:
def reviews(self, business_id):
    """output filtered version of reviews()"""
    reviews = self.get_business_reviews(business_id)
    # Filter reviews to contain only data necessary
    # for frontend
    safe_reviews = filter_for_frontend(reviews)

    return json.write({'reviews': safe_reviews})

In the second implementation of reviews(), the reviews object is transformed by filter_for_frontend() before being written into the JSON response. We’ve also modified json.write() to check a list of sensitive field names and throw an exception if an engineer tries to pass a field with a restricted name such as last_name or birthdate. With these measures in place, we’re well protected from this type of exposure in the future.
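The post does not show the body of filter_for_frontend(), so the following is only a sketch of what an allowlist-style output filter might look like. It assumes reviews arrive as plain dictionaries. The two allowlists (PUBLIC_REVIEW_FIELDS and PUBLIC_USER_FIELDS) and the review field names id, rating, and text are hypothetical. The standard-library json.dumps() stands in for json.write().

```python
import json

# Assumed allowlists: the real set of fields the frontend needs
# is not given in the post.
PUBLIC_REVIEW_FIELDS = ("id", "rating", "text")
PUBLIC_USER_FIELDS = ("display_name",)


def filter_for_frontend(reviews):
    """Copy only allowlisted fields out of each review dict."""
    safe_reviews = []
    for review in reviews:
        # Build a new dict instead of deleting keys, so a field added
        # to the data model later is excluded by default.
        safe = {k: review[k] for k in PUBLIC_REVIEW_FIELDS if k in review}
        user = review.get("user")
        if user is not None:
            safe["user"] = {k: user[k] for k in PUBLIC_USER_FIELDS if k in user}
        safe_reviews.append(safe)
    return safe_reviews


raw_reviews = [
    {
        "rating": 5,
        "text": "Great tacos",
        "user": {
            "birthdate": "1971,01,01",
            "display_name": "Art G.",
            "gender": "m",
            "last_name": "Goldfin",
        },
    }
]
print(json.dumps({"reviews": filter_for_frontend(raw_reviews)}))
```

The design choice worth noting is the allowlist: copying known-safe fields fails closed when new private columns appear, whereas a blocklist that strips known-bad fields would let them through.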

We’d like to thank the researchers at Harvard, Yale, and Boston University who reported this issue.

We appreciate the team’s diligence in finding and notifying us about this important problem; their thoughtful handling of a sensitive and tricky security situation is commendable. If you do find any security-related issues on Yelp, please send an email to security-abuse@yelp.com.

Yelp’s Engineering team is committed to excellence in engineering and data security, and we responded to this incident with full force. Keeping user data safe is a top priority for Yelp, and we’ve taken concrete steps in response to this incident to make sure that it will not happen again.