Engineering Blog

July 2nd, 2015

Yelp API Now Returns Action Links

Savvy developers like you already know that the Yelp API is the best place to get information on local businesses. Today, as part of our ongoing integration with our friends at Yelp SeatMe and Yelp Eat24, we are excited to announce an additional feature of the API: “Action Links.”

Action Links will allow your users to directly make a reservation or even start a food order for delivery or pickup from wherever you’re displaying Yelp data.

To see this in action (pardon the pun), next time you use the Search endpoint or the Business endpoint, simply pass in the optional “actionlinks=true” parameter. When requested, action links are returned in the response, in fields called reservation_url and eat24_url, for businesses that support them. Check out the handy API documentation for more details.

You can use these links within your application to launch the Yelp Eat24 ordering experience or the Yelp SeatMe reservation experience.
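
As a rough sketch of how a client might request and read these links in Python: the `actionlinks` parameter and the `reservation_url` and `eat24_url` field names come from the announcement above, while the `businesses` list, the sample payload, and the helper names are assumptions made for illustration. Request signing and the HTTP call itself are left out.

```python
from urllib.parse import urlencode

# Field names taken from the announcement; they only appear for
# businesses that support the corresponding action.
ACTION_LINK_FIELDS = ("reservation_url", "eat24_url")


def build_search_query(term, location):
    """Return a query string for a Search request with action links enabled."""
    return urlencode({"term": term, "location": location, "actionlinks": "true"})


def extract_action_links(business):
    """Return only the action links that are present on one business record."""
    return {f: business[f] for f in ACTION_LINK_FIELDS if business.get(f)}


# Hypothetical response fragment, made up for illustration.
sample_response = {
    "businesses": [
        {"name": "Example Bistro", "reservation_url": "https://example.com/reserve"},
        {"name": "Example Tacos", "eat24_url": "https://example.com/order"},
        {"name": "Example Hardware"},
    ]
}

links_by_name = {
    b["name"]: extract_action_links(b) for b in sample_response["businesses"]
}
```

Because the fields are only returned for businesses that support them, the sketch treats a missing field as "no link" rather than as an error.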

Ordering Food on Yelp Eat24

Making A Yelp SeatMe Reservation

Go ahead, give it a whirl and let us know what you think. We love hearing from fellow developers, so share your cool creations with us on Twitter via @YelpEngineering or write us at api@yelp.com and stay tuned for even more API goodies coming your way.

June 30th, 2015

New Yelp API Console: Life’s Easier in v2

Yelp’s API allows any developer to build rich user experiences by integrating Yelp’s local business information, reviews, and pictures into their web and mobile applications. We announced in January that we have ambitious plans to make it easier than ever for developers to integrate a local layer into their apps.

Today, we’re excited to present one of these efforts: a brand new API console that allows developers to explore the detail and depth of responses returned by the Yelp API, without having to write a single line of code.

Sign up for a free account, create your API v2 credentials, and check it out on the Yelp Developer Site.

API V1 End of Life

As we continue to invest in Version 2 of the API, we will be discontinuing the previously deprecated Version 1 of the API. All v1 API endpoints will be shut down on July 15, 2015. If you are currently using any v1 API endpoints, you can find out how to migrate to v2 endpoints on the Yelp developer site.

Build Your Yelp API Powered App Today

Finally, if this has got you hungry to build something, you can head over to our GitHub repo to find examples and libraries to get you underway with your Yelp API powered app.

We love hearing from fellow developers, so share your cool creations with us on Twitter via @YelpEngineering or write us at api@yelp.com and stay tuned for even more API goodies coming your way.

June 12th, 2015

Advanced UITableViews Made Simple: YLTableView

Let’s say you had a great new idea for a Todo app and set out to build an awesome iPhone application. After firing up your copy of Xcode, you’d probably start by creating a UITableView – a critical part of the iOS SDK that allows you to build scrolling views.

At Yelp, we’ve been playing around with different table view architectures for years. After a lot of trial and error, we’ve come up with a framework that easily scales from the simplest list all the way up to our business page. We’re open sourcing it today – check out YLTableView on GitHub!

The Activity Feed and the Business View both make use of table views

A table view contains a number of sections, each of which can be separated by a small margin. Each section can contain an arbitrary number of cells. We use table views all over the Yelp app, in places like the business page and Activity Feed above. The feed contains a list of ‘items’ representing actions other users have taken – like posting photos or checking in at a business. In the Activity Feed, each feed item is a single section made up of multiple cells.

A section (green) has multiple cells (blue)

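To implement something like the Activity Feed, you need to tell UITableView about the sections and cells in your table view by implementing a few methods from the UITableViewDataSource protocol, such as numberOfSectionsInTableView:, tableView:numberOfRowsInSection:, and tableView:cellForRowAtIndexPath:.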

However, you’re not done yet – if you want to have section headers or make your cells do something when tapped, you’ll need to implement some of the UITableViewDelegate protocol. The two protocols have a lot of overlap and it can get confusing really quickly. When trying to figure all of this out, you might come across UITableViewController, and end up with an architecture in which one view controller serves as both the data source and the delegate for its table view.

This might work for a small table view, but the view controller will get complex very quickly because it violates the single responsibility principle: it has to deal with loading content, micromanaging the table view cells, and pushing new view controllers for cells. Enter: YLTableView.

At its core, YLTableView works with a series of models. Each cell expects a given type of model and, once it’s given that model, is able to configure itself. Separating out the model helps encourage modularity throughout your application, and makes it easy to build interfaces out of common components.

To support the models, we’ve created YLTableViewDataSource, which implements both UITableViewDelegate and UITableViewDataSource. The two protocols typically need the same data, and we’ve found that combining them helps simplify your code. To use YLTableViewDataSource, make a subclass and implement a few easy methods.

YLTableViewDataSource will then take care of creating the cell, configuring it with your model, and even telling the table view how tall it should be. When using YLTableView, your architecture will look a bit like this:

image03

Unlike the UITableViewController design we looked at earlier, this architecture does a great job of separating out responsibilities. Your view controller loads content – like reviews or businesses – and passes them off to the data source. The data source then turns that content into table view models and displays them as cells. Before telling the view controller to take an action, the data source will translate the cell models back into content to pass to the view controller. The view controller doesn’t need to know anything about the models or cells: you could change your entire UI and its underlying data implementation without changing the logic in the view controller.
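
The division of labor described above can be sketched in a few lines. This is a language-agnostic illustration written in Python rather than Objective-C, and it is not YLTableView's API: every class and method name here (`Review`, `ReviewCellModel`, `ReviewDataSource`, and so on) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Content, owned by the view controller."""
    author: str
    text: str


@dataclass
class ReviewCellModel:
    """Everything a cell needs in order to configure itself."""
    title: str
    body: str


class ReviewCell:
    def configure(self, model):
        # The cell only ever sees its model, never the underlying content.
        self.rendered = f"{model.title}: {model.body}"


class ReviewDataSource:
    """Turns content into cell models and maps selections back to content."""

    def __init__(self, on_select):
        self._on_select = on_select  # callback supplied by the view controller
        self._reviews = []

    def set_reviews(self, reviews):
        self._reviews = list(reviews)

    def number_of_rows(self):
        return len(self._reviews)

    def cell_for_row(self, row):
        review = self._reviews[row]
        cell = ReviewCell()
        cell.configure(ReviewCellModel(title=review.author, body=review.text))
        return cell

    def did_select_row(self, row):
        # Translate the row back into content before notifying the controller.
        self._on_select(self._reviews[row])


# The "view controller" side: it hands over content and receives content back.
selected = []
data_source = ReviewDataSource(on_select=selected.append)
data_source.set_reviews([Review("Ana", "Great tacos"), Review("Ben", "Slow service")])
data_source.did_select_row(1)
```

Note that the callback receives a `Review`, not a `ReviewCellModel`, so the controller side never depends on how the content is displayed.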

As we became better and better at building table views, we started to tackle some more complex cells. Take a look at this cell from the Activity Feed:

This cell contains a swipeable row of photos. Swiping back and forth reveals and loads more photos, while tapping a photo makes it full-screen. As you can imagine, this is pretty tricky to do!

Our first approach to enabling complex cells like this was to have the cell delegate back to the data source, and the data source would delegate up to the view controller. This prevented the photos cell from being modular – every time you wanted to use it, you had to duplicate the entire chain of delegates. We knew there had to be a better way.

After thinking about it for a while, we decided to try creating child view controllers for our table view cells. Child view controllers allow you to add a new view controller to manage a subsection of your view. They behave like real view controllers and can push new view controllers onto the stack.

Normal table views don’t support child view controllers, but we figured out how to do it with YLTableView. Simply have your cell implement YLTableViewCellChildViewController and set the parent view controller property on your data source. Then, instead of having to deal with the mess of delegates, you have a much simpler view controller.

In addition to having a simpler architecture, this cell is now significantly more reusable. We can use it directly on the business page without having to duplicate any code. Taking it a step further, we can even use the PhotosViewController by itself. A little bit of configuration later, and we can now use the same view controller all over the app.

The cell’s child view controller can also be reused by itself.

YLTableView has helped us simplify our architecture throughout the app. Using models has allowed us to build up a system of reusable components from which interfaces are easy to assemble. Using child view controllers has made this even more powerful, letting us reuse entire view controllers in table view cells. And, as more and more of our app is written with YLTableView, we’ve found that having a consistent architecture gives developers a base level of familiarity with features they haven’t worked on before.

Give YLTableView a shot, and let us know what you think!

June 1st, 2015

Things To Do Outside of WWDC

WWDC is coming up soon and it’s going to be a blast! We’ve got our Apple Watch (with our lovely Yelp App installed, of course!) and are ready to party. To celebrate, we’re hosting our third annual WWDC party on June 8 and raffling off an Apple Watch (so you can use our app too)!

While you’re in town for WWDC, make sure to catch some of the other great meetups we’ve got lined up. We’ll also be speaking at a Women in Tech event at Alpine Labs and at a DBA Happy Hour co-hosted with Box!

We’re also really looking forward to hearing presentations by the talented girls of Technovation. We’ll help judge their World Pitch competition, where girls aged 10 to 18 partner up to develop ways of incorporating technology into their everyday lives. The top 10 finalist teams from around the world will be visiting us, competing for the chance to win $10,000 towards their project. Come meet the girls and support their hard work and dedication!

  • Thursday, June 4, 2015 – 6:00PM – Introduction to Functional Reactive Programming on Android (SF Android User Group)
  • Monday, June 8, 2015 – 6:30PM – Yelp WWDC Afterparty (Yelp Engineering)
  • Tuesday, June 9, 2015 – 6:00PM – How to Design Habit-Forming Products Workshop (Nir Eyal)
  • Wednesday, June 10, 2015 – 6:15PM – Learn about the inner workings of the Internet and Twisted (SF Python)
  • Thursday, June 11, 2015 – 6:00PM – Interactive Data Science and Sharing with Jupyter and IPython (SF Big Analytics)
  • Thursday, June 18, 2015 – 6:30PM – Failure and Success (Designers + Geeks)
  • Tuesday, June 23, 2015 – 6:45PM – Bigger Data, Bigger Impact (Products That Count)
  • Wednesday, June 24, 2015 – 5:30PM – World Pitch Competition (Technovation)

May 27th, 2015

Seeing Double On Yelp

Being able to easily find what you want on Yelp is a critical part of ensuring the best user experience. One thing that can negatively affect that experience is displaying duplicate business listings in search results, and if you use Yelp often enough, you might have run into duplicate listings yourself.

We constantly receive new business information from a variety of sources including external partners, business owners, and Yelp users. It isn’t always easy to tie different updates from different sources to the same business listing, so we sometimes mistakenly generate duplicates. Duplicates are especially bad when both listings have user-generated content as they lead to user confusion over which page is the “right” one to add a review or check-in to.

The problem of detecting and merging duplicates isn’t trivial. Merging two businesses involves moving and destroying information from multiple tables, which is difficult for us to undo without significant manual effort. A pair of businesses can have slightly different names, categories, and addresses while still being duplicates, so trying to be safe by only merging exact matches isn’t good enough. On the other hand, using simple text similarity measures generates a lot of false positives by misclassifying near-identical listings that are in fact distinct, such as a professional and the practice they work at, or two professionals working at the same practice.

Business Match

The first step in our deduplication system is our Business Match service. Every time a new business is created or a business’s attribute is changed, a batch built on a wrapper over Apache Kafka consumes the messages published to the new_business and business_changed topics and calls Business Match to find any potential duplicates of the affected business above a relatively low confidence threshold. Business Match takes partial business information (such as name, location, and phone) as input, queries Elasticsearch, reranks the results with a learned model, and returns any businesses that the model scores above a scoring threshold. If Business Match returns any results, the business pairs are added to a table of potential duplicates.
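
The rerank-and-threshold step can be illustrated with a small Python sketch. It assumes the candidates have already been retrieved (the Elasticsearch query is not shown), and `toy_score` is a deliberately crude stand-in for the learned model; the function names, record fields, and threshold value are all hypothetical.

```python
def find_potential_duplicates(business, candidates, score_pair, threshold):
    """Score each candidate against `business` and keep the confident ones.

    Returns (business_id, candidate_id, score) tuples, best score first.
    """
    scored = [
        (score_pair(business, c), c)
        for c in candidates
        if c["id"] != business["id"]  # a business is never its own duplicate
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(business["id"], c["id"], s) for s, c in scored if s >= threshold]


def toy_score(a, b):
    """Stand-in for the learned model: fraction of fields that match exactly."""
    fields = ("name", "phone", "city")
    return sum(a.get(f) == b.get(f) for f in fields) / len(fields)


changed = {"id": 1, "name": "Cafe Uno", "phone": "555-0100", "city": "SF"}
retrieved = [
    {"id": 2, "name": "Cafe Uno", "phone": "555-0100", "city": "SF"},
    {"id": 3, "name": "Cafe Uno", "phone": "555-0199", "city": "Oakland"},
    {"id": 1, "name": "Cafe Uno", "phone": "555-0100", "city": "SF"},
]

# Pairs that would be written to the table of potential duplicates.
potential_duplicates = find_potential_duplicates(
    changed, retrieved, toy_score, threshold=0.5
)
```

Keeping the threshold low at this stage favors recall: the pairs are only candidates, and the later classifier or a human reviewer makes the final call.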

Our User Operations team is responsible for going through pairs of businesses in the table and either merging them or marking them as not duplicates. However, duplicates are added to the queue far faster than humans can manually verify them, which motivated us to develop a high-precision duplicate business classifier that would allow us to automatically merge duplicate pairs of businesses.

Getting Labelled Data

In order for our classifier to work, we needed to get thousands of instances of correctly labelled training data. For this, we sampled rows from our business duplicate table and created CrowdFlower tasks to get crowdsourced labels. We’ve launched public tasks as well as internal-only tasks for our User Operations team, which let us easily create a gold dataset of thousands of accurately labelled business pairs. In the future, we are planning on trying an active learning approach where only data that our classifier scores with low confidence is sent to CrowdFlower, which would minimize the amount of necessary human effort and allow our classifier to reach a high accuracy with a minimal number of training instances.
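
One common way to realize that active learning idea is uncertainty sampling: send for labelling only the pairs whose predicted probability sits near the decision boundary. The sketch below is an assumption about how this could look, not a description of the planned system; the band limits and pair identifiers are arbitrary.

```python
def select_for_labelling(scored_pairs, low=0.3, high=0.7):
    """Return the pairs whose duplicate probability lies in [low, high].

    `scored_pairs` is an iterable of (pair, probability) tuples. Pairs the
    classifier is already confident about, in either direction, are skipped.
    """
    return [pair for pair, prob in scored_pairs if low <= prob <= high]


scored_pairs = [
    (("a", "b"), 0.98),  # confidently a duplicate
    (("c", "d"), 0.55),  # uncertain
    (("e", "f"), 0.04),  # confidently not a duplicate
    (("g", "h"), 0.35),  # uncertain
]

to_label = select_for_labelling(scored_pairs)
```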

Features

Our classifier takes as input a pair of businesses and generates features based on analyzing and comparing the business fields. It uses the same model (scikit-learn’s random forests) and many of the same basic features as Business Match, like geographical distance, synonym-aware field matching, and edit distance / Jaccard similarity on text fields. In order to capture the kinds of false positives described earlier, we also added two intermediate classifiers whose output was used as features for the final classifier.
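
To make the basic features concrete, here is a minimal Python sketch of three of them: Jaccard similarity, edit distance, and geographical distance. The whitespace tokenization, the haversine formula, and the `name`/`lat`/`lon` record fields are assumptions for illustration; synonym-aware matching is not shown.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pair_features(a, b):
    """Feature vector for one business pair, in a fixed order."""
    return [
        jaccard(a["name"], b["name"]),
        edit_distance(a["name"].lower(), b["name"].lower()),
        haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]),
    ]
```

A vector like the one returned by `pair_features` is the kind of input a random forest classifier would be trained on, one row per labelled pair.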

We created a named entity recognizer to detect and label business names that indicate a person (e.g. lawyers, doctors, real estate agents) in order to detect the differences between a professional and their practice or two professionals working at the same practice.

Another feature we added is a logistic regression classifier that works by running a word aligner on both business names, finding which terms occur in one or both business names, and determining how discriminative the similarities and differences between the two names are. It outputs a confidence score, the number of uncommon words that appeared in one name but not the other, and the number of uncommon words that appeared in both names, which are used as features in the duplicate classifier.
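
The two count features can be approximated with plain set operations. This sketch substitutes exact token matching for the word aligner and a hand-written list for the notion of "common" words, so it is only an illustration of what the counts mean; the function name and word list are hypothetical.

```python
def uncommon_word_counts(name_a, name_b, common_words):
    """Return (words in exactly one name, words in both names).

    Only words that are not in `common_words` are counted.
    """
    words_a = {w for w in name_a.lower().split() if w not in common_words}
    words_b = {w for w in name_b.lower().split() if w not in common_words}
    only_one = len(words_a ^ words_b)  # symmetric difference
    both = len(words_a & words_b)
    return only_one, both


# Hypothetical list of words too common to discriminate between businesses.
COMMON = {"the", "and", "of", "restaurant", "cafe"}

same_place = uncommon_word_counts("The Blue Door Cafe", "Blue Door Restaurant", COMMON)
different_place = uncommon_word_counts("Blue Door Cafe", "Red Door Cafe", COMMON)
```

In the first pair the names differ only in common words, so nothing discriminative separates them; in the second, "blue" and "red" each appear in only one name, which is evidence against a merge.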

Evaluation

Since merges are hard to undo, false positives are costly, so the focus of our classifier was on precision rather than recall. Our main evaluation metric was the F0.1 score, which treats precision as 10 times more important than recall. With all of our classifier’s features, we achieved an F0.1 score of 0.966 (99.1% precision, 27.7% recall) on a held-out data set, compared to a baseline of F0.1 = 0.915 (97.1% precision, 13.4% recall) for the strategy of only merging exact (name/address) matches, and F0.1 = 0.9415 (96.6% precision, 26.4% recall) using only the basic Business Match feature set.
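
For reference, the F-beta score is the weighted harmonic mean of precision and recall, (1 + β²)·P·R / (β²·P + R). A small Python check using the precision and recall figures quoted above (the function name is ours, and the published inputs are rounded, so the results land within about 0.001 of the reported scores rather than matching exactly):

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall.

    With beta < 1, precision counts for more than recall.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


full_model = f_beta(0.991, 0.277, beta=0.1)
exact_match_baseline = f_beta(0.971, 0.134, beta=0.1)
```

With β = 0.1 the β² term is 0.01, so the score stays close to the precision even when recall is low, which is why a classifier that merges only about a quarter of true duplicates can still score above 0.96.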

Future Work

With the work done on our duplicate classifier and automatic business merging, we’ve been able to merge over 500,000 duplicate listings. However, there’s still room for improvement in deduplication. Some things slated for future work are:

  • language and geographical area-specific features
  • focusing deduplication efforts on high-impact duplicates (based on number of search result impressions)
  • extracting our named entity and discriminative word classifiers into libraries for use in other projects

With the improvements to our classifier, we hope to be able to detect and merge all high-confidence duplicate business listings and minimize the amount of human intervention needed.