At Yelp, we have a reasonably large Android community for a company of our size. These talented engineers work on Yelp’s consumer and business applications. We’d like to share some of the unique challenges we’ve experienced, along with our various efforts to overcome them.

Analytics Infra is a team at Yelp that works on experimentation and logging platforms and supports them across the entire Yelp ecosystem. Within the Analytics Infra team, we have an Android working group. You can think of us as an infrastructure team: like any product team, we ship end-user functionality, except that our users are actually developers. While other teams are improving Yelp’s user experience, our goal is to make Yelp’s developers’ lives better. Our technologies matter to those teams because they help them measure the impact of their projects and features. You might be surprised to learn that there are only two engineers on the Analytics Infra team who support Android development, which is why we wanted to share our story with you.

As you might expect, the mobile developers in Analytics Infra support Yelp’s mobile analytics technologies. We own two major SDKs: experimentation and logging. We also maintain a few smaller projects, but they are out of scope for this article.

Our clients are mainly the feature teams at Yelp that develop on Android and want to launch new features to the users of our applications. They leverage the tools we build to understand the cause and effect of the changes they implement, which lets them analyze their features’ data strategically and ultimately provide the best experience to our users.

Fig 1. Overview of how the client library uses our Logging & Experimentation SDK to publish and analyze data

Hopefully, this introduction gives you a basic understanding of what we do at Yelp. Let’s move forward and discuss some interesting challenges that we face every day.

We don’t have a UI.

Yeah, none at all! Our projects are just a set of APIs exposed to our clients as interfaces and standalone methods. This is super interesting work, but it can pose a challenge because you can’t “touch” the results of your work. While a good UI is always helpful to developers, building one just so we can develop our SDKs is well outside our OKR goals. As infra engineers, we have adapted to developing without a UI while still producing quality work; backend development faces a similar issue. As it turns out, this problem leads us to another challenge.

We can’t validate the results of our work.

Well, we’re exaggerating a bit…we can validate results, but the process is quite involved. We can write unit tests and, technically, create an experiment and analyze the results (or take any other approach that requires conducting experiments), but that process can take a long time.

Our features often don’t have effects that can be measured through an experiment; instead, they affect the mechanisms of experimentation itself. To put it simply, you can’t exactly use a tool to test the tool itself! Additionally, running experiments internally to test our platform, while other teams are using it for their own experiments, must be done with care to avoid unexpected problems such as crash loops or invalidating the results of experiments created by engineers outside our team. While we could probably put our system “under maintenance” to test it, that would cause a disruption and delay the results of many ongoing external projects.

Regardless of these challenges, there are still a few reliable ways for us to test new features. For example, we can implement them inside a sample app (a small sandbox where we simulate the processes from the real app). However, the main and safest option is to roll out changes slowly and prove implicitly that our code works, through methods such as analyzing data from existing experiments to detect outliers, or keeping track of crashes and runtime issues. It usually takes much longer to fully roll out our features than it does for an average feature.
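As a rough illustration of the slow-rollout idea (a simplified sketch, not our actual implementation — the function and names here are made up), gradual rollouts typically rely on deterministic bucketing: a given user hashes to the same bucket on every launch, so raising the rollout percentage only ever adds users and never flips anyone back out.

```kotlin
// Simplified sketch of deterministic percentage bucketing (illustrative only,
// not Yelp's real rollout code). A user always lands in the same bucket for a
// given feature, so increasing rolloutPercent only ever adds users.
fun isInRollout(userId: String, featureName: String, rolloutPercent: Int): Boolean {
    // mod(100) keeps the bucket in 0..99 even for negative hash codes.
    val bucket = "$userId:$featureName".hashCode().mod(100)
    return bucket < rolloutPercent
}
```

Because assignment is stable across launches, data observed at 5% remains comparable to data observed later at 50%, which is what makes outlier analysis during a slow rollout meaningful.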

Fig 2. Experimentation and logging platforms inside clients’ lifecycle

Another interesting aspect of developing such a system is that our SDKs are core dependencies.

For us, this means that we can’t depend on code inside client repositories. For example, we can’t reuse basic internal Yelp networking modules, because those modules also depend on ours, which would create circular dependencies. Moreover, the platforms should be as lightweight and independent as possible to keep the method count relatively small. So while we can reuse some code, we would have to be 100% sure that a given module will never rely on us — and unfortunately, nobody can guarantee that.
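One common way to keep a core SDK free of such cycles is dependency inversion: the SDK defines a small interface, and the host app (which already depends on the SDK) injects the implementation. The sketch below illustrates the idea under that assumption — the names are invented for this example and are not our real SDK API.

```kotlin
// Hypothetical sketch, not Yelp's actual SDK API. The SDK ships only this
// small interface instead of depending on an internal networking module,
// so no circular dependency can form.
interface EventTransport {
    fun send(payload: String): Boolean
}

// SDK-side code depends only on the abstraction above.
class EventLogger(private val transport: EventTransport) {
    fun log(eventName: String): Boolean =
        transport.send("""{"event":"$eventName"}""")
}

// The host app plugs in an implementation backed by its own networking
// stack; here, a trivial in-memory stand-in for demonstration.
class InMemoryTransport : EventTransport {
    val sent = mutableListOf<String>()
    override fun send(payload: String): Boolean {
        sent.add(payload)
        return true
    }
}
```

The dependency arrow only ever points from the app toward the SDK, which is exactly the invariant a core dependency has to preserve.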

It may sound frustrating to have no access to our clients’ basic infrastructure code, but the flip side is that we are free to choose our technologies and build our stack from scratch. Don’t think we’re stuck with obsolete or overly simple Android tools, though! We keep up with modern Android development: for example, we use Kotlin, Koin, and Retrofit, but are not limited to these technologies.

The first thing that happens in the application is the launch of our SDKs.

This is the trickiest part of the whole development process. If our SDK crashes during initialization, it causes a start-up crash on 100% of the devices running the broken version. The application never gets a chance to recover from the error or prevent it from happening, which is why we have to be extra careful when rolling out any features. The only fix for this kind of problem is for the user to update the application to the latest version.

Fig 3. Our SDK launches as soon as the user opens the app

Onpoint weeks can get long and tiring.

Plenty of different teams interact with our SDKs, which raises a lot of questions and confusion. While we try our best to keep all our documentation up to date, it is a complex system that takes some understanding. Our onpoint (on-call) engineer fields queries ranging from “How do I start..?” to “My data is not showing up how I expected it to.” These questions usually require a lot of expertise across the whole experimentation and logging infrastructure, even when they are about mobile issues. To overcome this challenge, we are currently revamping our onpoint responsibilities to ease the burden on our team, as well as on Yelpers who are curious to learn more about our tools. We also try to participate more in deep dives and product presentations with other teams to better share the knowledge we have about the tools we own.

We have learned that, however unique our team and its challenges are, we play a very important role in the development process at Yelp. While we may not be developing classic Android components or UI for user-facing features, we are indeed Android developers with crucial responsibilities. The logging we provide is an essential step in troubleshooting and analyzing results; these logs contain information that helps developers better understand the application as well as the impact of their changes. Similarly, experimentation lets developers compare variants across cohorts to identify which version performs better and appeals more to users. This ensures that we are always optimizing the quality of our app.

Acknowledgements

Thanks to Brad Anderson, Joe Lagomarsino, and Soorya Krishna Pillai for the technical review and for editing this blog post!
