August 1st, 2014

The Yelp Dataset Challenge Goes International! New Data, New Cities, Open to Students Worldwide!

The Challenge

The Yelp Dataset Challenge provides the academic community with a real-world dataset over which to apply their research. We encourage students to take advantage of this wealth of data to develop and extend their own research in data science and machine learning. Students who submit their research are eligible for cash awards and incentives for publishing and presenting their findings.

The most recent Yelp Dataset Challenge (our third round) opened in February 2014, giving students access to our Phoenix Academic Dataset, with reviews and businesses from the greater Phoenix metro area. In the fourth round, open now, we are expanding the dataset to include data from four new cities from around the world. We are also opening up the challenge to international students, see the terms and conditions for more information.

New Data

We are proud to announce that we are extending the popular Phoenix Academic Dataset to include four new cities! By adding a diverse set of cities we hope to encourage students to compare and contrast the different aspects of each city and find new insights about what makes each city unique. The dataset is comprised of reviews, businesses and user information from:

  • Phoenix, AZ
  • Las Vegas, NV (new!)
  • Madison, WI (new!)
  • Waterloo, CAN (new!)
  • Edinburgh, UK (new!)

This new dataset increases the data included in the previous Phoenix Academic Dataset with the following new data and is available for immediate download:

  • Businesses – 42,153 (+26,568 new businesses!)
  • Business Attributes – 320,002 (+208,441 new attributes!)
  • Check-in Sets – 31,617 (+20,183 new check-in sets!)
  • Tips – 403,210 (+289,217 new tips!)
  • Users – 252,898 (+182,081 new users!)
  • User Connections – 955,999 (+804,482 new edges!)
  • Reviews – 1,125,458 (+790,436 new reviews!)

Round 4 is Now Live

Along with the updated dataset, we’re also happy to announce the next iteration of the Yelp Dataset Challenge. The challenge will be open to students around the world and will run from August 1st, 2014 to December 31, 2014. See the website for the full terms and conditions. This data can be used to train a myriad of models and extend research in many fields. So download the dataset now and start using this real-world dataset right away!

July 24th, 2014

Introducing MOE: Metric Optimization Engine; a new open source, machine learning service for optimal experiment design

At Yelp we run a lot of A/B tests. By constantly trying new features and testing their impact, we are able to continue evolving our products and make them as useful as possible. However, running online A/B tests can be expensive (in opportunity cost, user experience, or revenue) and time consuming (to achieve statistical significance).

Furthermore, many A/B tests boil down to parameter selection (more of an A/A’ test, where a feature stays the same, and only the parameters change). Given a feature, we want to find the optimal configuration values for the constants and hyperparameters of the feature as quickly as possible. This can be analytically impossible for many systems. We need to treat these systems like black boxes where we can observe only the input and output. We want some combination of metrics (the objective function) to go up or down, but we need to run expensive, time consuming experiments to sample this function for each set of parameters.

MOE, the Metric Optimization Engine, is an open source, machine learning tool for solving these global, black box optimization problems in an optimal way. MOE implements several algorithms from the field of Bayesian Global Optimization. It solves the problem of finding optimal parameters by building and fitting a model of the objective function given historical information using Gaussian Processes. MOE then finds and returns the point(s) of highest expected improvement. These are the points that will have the highest expected gain over the best historical samples seen so far. For more information see the documentation and examples.

Here are some examples of when you could use MOE:

  • Optimizing a system's click-through rate (CTR). MOE is useful when evaluating CTR requires running an A/B test on real user traffic, and getting statistically significant results requires running this test for a substantial amount of time (hours, days, or even weeks). Examples include setting distance thresholds, ad unit properties, or internal configuration values.

  • Optimizing tunable parameters of a machine-learning prediction method. MOE can be used when calculating the prediction error for one choice of the parameters takes a long time, which might happen because the prediction method is complex and takes a long time to train, or because the data used to evaluate the error is huge. Examples include deep learning methods or hyperparameters of features in logistic regression.

  • Optimizing the design of an engineering system. MOE helps when evaluating a design requires running a complex physics-based numerical simulation on a supercomputer. Examples include designing and modeling airplanes, the traffic network of a city, a combustion engine, or a hospital.

  • Optimizing the parameters of a real-world experiment. MOE can help guide design when every experiment needs to be physically created in a lab or very few experiments can be run in parallel. Examples include chemistry, biology, or physics experiments or a drug trial.

We want to collect information about the system as efficiently as possible, while finding the optimal set of parameters in as few attempts as possible. We want to find the best trade-off between gaining new information about the problem (exploration) and using the information we already have (exploitation). This is an application of optimal learning. MOE uses techniques from this field to solve this problem in an optimal way.

MOE provides REST, Python and C++ interfaces. A MOE server can be spun up within a Docker container in minutes. The black box nature of MOE allows it to optimize any number of systems, requiring no internal knowledge or access. By using MOE to inform parameter exploration of a time consuming process like running A/B tests, performing expensive batch simulations, or tuning costly models, you can optimally find the next best set of parameters to sample, given any objective function. MOE can also help find optimal parameters for heuristic thresholds and configuration values in any system. See the examples for more information.

MOE is available on GitHub. Try it out and let us know what you think at opensource+moe@yelp.com. If you have any issues please tell us about them along with any cool applications you find for it!

May 12th, 2014

May the Yelps be with You

May brings us talks from the Python meetup, another mind-blowing talk from Designers + Geeks, and a talk from Products That Count on brand naming. I’m also excited to give you a sneak peak into June: we’re hosting our second annual WWDC after party. We promise this after party will be so good, you won’t want to leave for the hotel lobby (even if R. Kelly himself invites you).

For our Pythonista readers, let’s take a deeper look into the the upcoming Python meetup. Packaging turns out to be an important part of any language. Great packaging encourages language adoption, focuses the community on 1-2 of the best solutions to a problem, and encourages modular design. But it’s also a surprisingly tricky problem: support for a variety of OS, interactions with compiled libraries, organization of namespaces, programmatic specification of dependencies, and discovery and documentation are just some of the problems that need to be tackled. We haven’t even covered the difference between installing packages “locally” vs system wide, and the implications for deploying a set of packages!

Luckily, next week Noah Kantrowitz is going to help us sort through these issues with two presentations covering Python packaging and deployments. In between the main talks, we’ll see lightning talks and get a chance to mingle and ask questions. Hope you can join us!

Mark your calendars now for next month’s WWDC: Yelp is opening our doors for an after party to top them all! Meet some of the cool cats who work on our award-winning iPhone app and get an inside look at Yelp life. There will be plenty of 5-star hors d’oeuvres and wine served from our own customized barrels! Please RSVP here and don’t forget to bring your conference badge.

April 29th, 2014

More Yelp in your Ruby

What’s that you say? Your Ruby applications are feeling a little empty and meaningless without Yelp data to make them shine? Well, consider your prayers answered because today we’re launching our official Ruby Gem for interfacing with the Yelp API!

Find the gem online now, and start using Yelp data in your Ruby applications with ease. Install the gem with bundle by adding ‘yelp’ to your Gemfile, or without a Gemfile by running gem install yelp.

Your first step on the road to Yelp data awesomeness is to register and get API keys from our developer site. Next, create a new client with your API keys and use that to make requests against the API.

client = Yelp::Client.new({ consumer_key: YOUR_CONSUMER_KEY,
                            consumer_secret: YOUR_CONSUMER_SECRET,
                            token: YOUR_TOKEN,
                            token_secret: YOUR_TOKEN_SECRET
                         })

Integrated into the gem is our Search API, which allows you to to find businesses based on the location provided and any search terms given:

results = client.search(‘San Francisco’, { term: ‘restaurants’ })
results.businesses # => [<..., name = 'Gary Danko', …>, <..., name = 'Little Delhi', …>, …]

Additionally, the Business API allows for retrieval of more information on specific businesses using the business id returned from the search:

business = client.business(‘gary-danko-san-francisco’)
business.name # => “Gary Danko”
business.rating # => 4.5
business.review_count # => 3701

Yelp + Rails
If you’re still fairly new to Ruby, want to work on a Rails app, and still want to use the gem then we’ve got a solution for you! We’ve created a small sample application to demo the gem and we’ve open sourced that as well.

All you need to do is create a file inside of config/initializers with a configuration block that sets your API keys:

inside config/initializers/yelp.rb
Yelp.client.configure do |config|
  config.consumer_key = YOUR_CONSUMER_KEY
  config.consumer_secret = YOUR_CONSUMER_SECRET
  config.token = YOUR_TOKEN
  config.token_secret = YOUR_TOKEN_SECRET
end

Here, we’re using a configuration block to tell Yelp what your API keys is. This will automatically create a client instance for you that lives inside of Yelp.client, allowing you to use your client instance over and over again in different places throughout your application.

Inside of your controller you can call Yelp.client with one of the API methods

inside app/controllers/home_controller.rb
class HomeController < ApplicationController
  # …

  def search
    parameters = { term: params[:term], limit: 16 }
    render json: Yelp.client.search(‘San Francisco’, parameters)
  end
end

And that’s it! You’re now able to use the Yelp gem throughout your Rails application. Using the gem should work similarly in other Ruby applications.

Head over to GitHub to check out the source code for the gem and the example app, and find more information on how to use the gem with the readme and documentation.

We can’t wait to see what you create. Learn more about the Yelp API at yelp.com/developers and make sure to share your apps with us!

Please remember to read and follow the Terms of Use and display requirements before creating your applications.

If you want to work with our API team (and why wouldn’t you?!), check out yelp.com/careers for available positions.

April 7th, 2014

April Showers Bring More Talks to Yelp

Yelp is hosting many exciting talks this month! Let’s take a closer look at two, but be sure to check out all of the links below to see what’s in the pipeline.

What does it take to monitor an architecture that is continually in flux and being changed by hundreds of developers? A seriously flexible monitoring framework: Sensu! A resident Yelp Site Reliability Engineer, Kyle Anderson, will be giving a tour of how Sensu works at Yelp. He’ll be explaining how Sensu hooks into the Yelp SOA architecture to empower individual development teams to monitor their own services in real time. He’ll also be discussing the difficulty of monitoring servers in AWS, and how Sensu can be used to track them.

Yelp hosts this talk, put on by San Francisco Dev Ops, tomorrow April 8th!

Next up is a talk about RxJava. Functional reactive programming is an interesting, relatively new paradigm that models application state as a set of dependencies updated via a stream of data. As a concrete example, one option of building a UI that depends on mouse clicks is to write a callback function. When the mouse is clicked, the function is called to do some processing and optionally update the UI. However, callbacks can quickly get complicated, especially when they involve multiple threads that may interact with each other.

The reactive alternative treats all mouse clicks as a iterable stream of events (Observable, in Rx jargon). An Observer can apply functional constructs — such as filter, map, or zip — to the stream of clicks and can in turn update other Observable objects. For examples involving real code, check out the blog post by Netflix.

Sound cool? Then come check out Greg Benson, a software engineer on the Android @ Netflix team, as he presents on incorporating reactive programming into your Android apps with RxJava. Sign up for this April 10th presentation at San Francisco Android Livecode!

  • Tuesday, April 8, 2014 – 6:00PM – Sensu @ Yelp: A Guided Tour (San Francisco Dev Ops)
  • Wednesday, April 9, 2014 – 6:00PM – Java 8 Lambda Expressions & Streams (Java User Group)
  • Thursday, April 10, 2014 – 7:00PM – Reactive Programming on Android Using RxJava with Greg Benson! (San Francisco Android Livecode)
  • Thursday, April 17, 2014 – 6:30PM – Working Hard, or Artily Twerking Kongsciousness (Designers + Geeks)
  • Tuesday, April 22, 2014 – 6:30PM – Women Who Code: Lightning Talks (Women Who Code)
  • Wednesday, April 23, 2014 – 6:45PM – Best Practices For Lean User Acquisition and Engagement (Products That Count)
  • Thursday, April 24, 2014 – Large-Scale Machine Learning with Apache Spark (SF Machine Learning)
  • Wednesday, April 30, 2014 – 6:15PM- Django 1.7 and You (Django Meetup Group)