Yelp is celebrating its 10th anniversary this year. That’s right, a decade of marvelous reviews from Yelpers all around the world. What better way to celebrate the big 1-0 than to build a tool that would take a sneak peek into over 61 million reviews from our community and let you discover real-world trends in cities all over the globe?

Meet Yelp Trends. You might have seen it hit the press recently.

Yelp_Trends

As announced on our official blog, Yelp Trends is a fun way to visualize how often specific words are used in reviews and how that frequency has developed over the past 10 years. From popular trends in the culinary world to popular slang terms to what’s hot in fitness, users are encouraged to explore the world from the local communities’ perspective.

Yelp Trends started as a hackathon project when a group of engineers indexed words used in reviews into Elasticsearch. We leveraged its powerful facet queries in the backend and built the UI to graph the normalized review frequencies.

At Yelp, Elasticsearch is an important part of our search infrastructure, as it provides a robust, distributed platform that is easy to integrate with other services. On top of Elasticsearch we also have our own custom frameworks for creating clients as well as indexers called Apollo and ElasticIndexer, respectively.

Apollo (of course named after Apollo Creed) allows us to quickly build Elasticsearch clients with a fixed interface and also provides us with many default features such as monitoring and a managed infrastructure. Our Apollo client queries the reviews index and returns a JSON-formatted time series of query frequency.

What does a request to Elasticsearch look like in Apollo? Not much different from the regular JSON you would send, but expressed as Python data structures instead.

This sample shows how we built the request to search through all of the indexed data for relevant results. We have three different filters here, all combined with ‘and’ to make sure that each data point from Elasticsearch matches all three requirements: the city (San Francisco here), the business category (restaurants), and the review language. These all get applied to the phrase we’re matching on, pizza.
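A request along these lines might look like the following minimal sketch. It assumes the Elasticsearch 1.x query DSL (a `filtered` query wrapping an `and` filter); the function name and the field names `text`, `city`, `category` and `language` are hypothetical and are not taken from Apollo or from our actual index.

```python
import json


def build_trend_query(phrase, city, category, language):
    """Build an Elasticsearch 1.x-style request body as plain Python dicts.

    All field names are illustrative assumptions. The term filters assume
    the filtered fields are indexed as exact (not analyzed) values.
    """
    return {
        "query": {
            "filtered": {
                # The phrase we are matching on in the review text.
                "query": {"match_phrase": {"text": phrase}},
                # Every document must satisfy all three filters.
                "filter": {
                    "and": [
                        {"term": {"city": city}},
                        {"term": {"category": category}},
                        {"term": {"language": language}},
                    ]
                },
            }
        }
    }


request = build_trend_query("pizza", "San Francisco", "restaurants", "en")
# The dict serializes directly to the JSON body Elasticsearch expects.
body = json.dumps(request)
```

Keeping the request as nested dicts means the same structure can be built, inspected and tested in Python before it is ever serialized.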

After we have all of this data, we need to structure it in a manner that makes it easy for us to visualize. Using facets, we’re able to have Elasticsearch return the data in an easily consumable format.
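As a rough illustration, a date histogram facet can bucket matching reviews by month, and the bucket counts can then be divided by the total number of reviews in the same month. This is only a sketch: the facet name, the `time_created` field, and the normalization (matching reviews divided by all reviews per bucket) are assumptions, and the sample entries are made-up numbers that mimic the `time`/`count` shape of facet entries.

```python
def monthly_facet(field="time_created"):
    """Facet section of a request: one bucket per month (hypothetical field)."""
    return {
        "reviews_over_time": {
            "date_histogram": {"field": field, "interval": "month"}
        }
    }


def normalize(matching_entries, total_entries):
    """Divide per-bucket matching counts by per-bucket total counts.

    Both arguments are lists of {"time": ..., "count": ...} dicts.
    Buckets with no total reviews are skipped to avoid dividing by zero.
    """
    totals = {entry["time"]: entry["count"] for entry in total_entries}
    series = []
    for entry in matching_entries:
        total = totals.get(entry["time"], 0)
        if total:
            series.append(
                {"time": entry["time"], "frequency": entry["count"] / total}
            )
    return series


# Made-up example data: two monthly buckets keyed by epoch milliseconds.
matching = [{"time": 1388534400000, "count": 5}, {"time": 1391212800000, "count": 1}]
totals = [{"time": 1388534400000, "count": 50}, {"time": 1391212800000, "count": 4}]
series = normalize(matching, totals)
```

Normalizing this way keeps a term from looking more popular simply because more reviews were written in a given month.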

ElasticIndexer, the other framework that complements Apollo, is an indexing pipeline for loading Yelp data into Elasticsearch. Building a new index can take several days for some of our larger indices, so to help us avoid doing this, ElasticIndexer constantly monitors database tables for changes and re-indexes documents as they are modified or added. It also has the ability to determine field dependencies, which enables us to re-index only the fields that actually change when the database changes.
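The field-dependency idea can be pictured with a small sketch. This is not ElasticIndexer code: the dependency table, the column names and the function are all hypothetical, and only show how a set of changed database columns could be narrowed down to the index fields that need to be rebuilt.

```python
# Hypothetical mapping: index field -> database columns it is derived from.
FIELD_DEPENDENCIES = {
    "text": {"review.text"},
    "rating": {"review.rating"},
    "city": {"business.city"},
    "category": {"business.category"},
}


def fields_to_reindex(changed_columns, dependencies=FIELD_DEPENDENCIES):
    """Return the index fields that depend on any of the changed columns."""
    changed = set(changed_columns)
    return sorted(
        field for field, columns in dependencies.items() if columns & changed
    )


# A change to the business's city only touches the "city" field.
affected = fields_to_reindex(["business.city"])
```

With a table like this, a change to one column results in a partial update rather than a rebuild of the whole document.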

Leveraging both Apollo and ElasticIndexer, the hackathon project was at first branded Wordtime. We used AngularJS, D3.js, Rickshaw and adopted design templates from Bootstrap. Rickshaw provided the framework to display interactive graphs drawn with SVG, which are highly customizable and easily styled using standard CSS techniques.

Yelp_Wordtime

It quickly became obvious that the tool was addictive and that people enjoyed trying out new examples. That’s when some of the other teams learned about the project and, with the anniversary in mind, we decided to productionize the tool.

Hackathon_To_Yelp_Trends

Wordtime originally supported English reviews only, while for Yelp Trends we wanted to add support for other review languages as well (through specialized search analyzers). While we could have indexed the text of reviews in other languages in the same field we use for English, this would not have worked well. Elasticsearch only allows a fixed analyzer per field, and an English analyzer would not work well for a completely different language such as Japanese. We ended up adding a new field for each language and then reindexed the entire review corpus over a two-day period. Our final reviews index grew to over 150GB, but queries still took under a second.
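A per-language layout of this kind could be sketched as below. The field naming scheme (`text_<language>`), the set of languages, and the helper functions are assumptions for illustration. The `english`, `german` and `french` analyzers ship with Elasticsearch, while `kuromoji` for Japanese assumes the separate Kuromoji analysis plugin is installed; `"type": "string"` reflects the 1.x-era mapping syntax.

```python
# Hypothetical language code -> analyzer name table.
LANGUAGE_ANALYZERS = {
    "en": "english",
    "de": "german",
    "fr": "french",
    "ja": "kuromoji",  # assumes the Kuromoji analysis plugin
}


def build_review_mapping(analyzers=LANGUAGE_ANALYZERS):
    """Build a mapping with one analyzed text field per review language."""
    return {
        "properties": {
            "text_" + lang: {"type": "string", "analyzer": analyzer}
            for lang, analyzer in sorted(analyzers.items())
        }
    }


def text_field_for(language, analyzers=LANGUAGE_ANALYZERS):
    """Pick the field to query for a given review language."""
    if language not in analyzers:
        raise ValueError("unsupported review language: %r" % (language,))
    return "text_" + language


mapping = build_review_mapping()
japanese_field = text_field_for("ja")
```

At query time, the review language filter and the queried text field then go hand in hand: a Japanese query is matched against the Japanese field only.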

Yelp Trends is an inspiring tool! Give it a try yourself. We can’t wait to see what trends you will uncover in your city. And don’t forget to share your findings with us. Have fun!
