In our previous blog post, we established a performance improvement lifecycle and explored the first step by defining what metrics we wanted to measure and establishing a baseline on which we can improve. In this blog post, we’ll cover how we improved our two metrics of initial render timings and scroll performance when rendering search results in the Yelp Android app.

Step 2: Making Performance Improvements

Not all of the changes we made were specific to only one of our goals. Often, a change that was expected to contribute to one goal also affected the other. We used a few different performance techniques to achieve these goals:

Offloading Work Onto Background Threads

The main thread (or UI thread) on Android is the most important thread with respect to performance. This is the thread that is in charge of executing drawing functions that render views to the screen. If this thread is busy doing other work, like setting up a network request to relay analytics to the backend, then it won’t be able to send new instructions to the GPU to update the display when the 16ms frame deadline (the per-frame budget on a 60Hz display) rolls around.

It’s a good idea to offload any work that isn’t on the “critical path” to rendering pixels on the screen to another thread. The easiest way to identify the work that needs to be done before your view can be drawn is to look at the work that’s already being done on the main thread before your view is rendered. The Android Studio CPU profiler is a great tool to use for this and helped us identify many potential improvements. Once you’ve identified work that’s being done on the main thread in the critical path to rendering that doesn’t actually contribute to the final result, you can move it to another thread.

RxJava makes it incredibly easy to offload work to a different thread pool. Here’s how easy it is:

Observable.fromCallable(
    new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            // Do work on io thread here.
            // A Callable<Void> must still return a value; null is the only valid one.
            return null;
        }
    })
    // Specify which thread the callable is to be executed on.
    .subscribeOn(Schedulers.io())
    // Specify which thread doOnNext is to be executed on.
    .observeOn(AndroidSchedulers.mainThread())
    .doOnNext(
        new Action1<Void>() {
            @Override
            public void call(Void aVoid) {
                // Update UI on main thread.
            }
        })
    .subscribe();

Even though it’s relatively simple to offload work to another thread, there is a tradeoff being made. It increases the complexity of your codebase since you now need to make sure that the work that’s being done off of the main thread is thread safe.
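The same hand-off can be sketched with nothing but the JDK, which also shows one simple way to keep it thread safe. This is a minimal illustration and not code from the Yelp app: `BackgroundOffloadSketch`, `buildLabel`, and the single-threaded executor standing in for Android's main thread are all hypothetical names introduced for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names come from the Yelp codebase.
final class BackgroundOffloadSketch {

    // Stand-in for work that is not on the critical path to rendering.
    static String buildLabel(int reviewCount) {
        return reviewCount == 1 ? "1 review" : reviewCount + " reviews";
    }

    // Runs buildLabel on a worker pool, then hands the result to a
    // single-threaded executor that plays the role of the main thread.
    // Returns {label, name of the thread that received the label}.
    static String[] offload(int reviewCount) {
        ExecutorService ioPool =
                Executors.newFixedThreadPool(2, r -> new Thread(r, "io-worker"));
        ExecutorService mainThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "fake-main"));
        try {
            return CompletableFuture
                    .supplyAsync(() -> buildLabel(reviewCount), ioPool)
                    .thenApplyAsync(
                            label -> new String[] {label, Thread.currentThread().getName()},
                            mainThread)
                    .join();
        } finally {
            ioPool.shutdown();
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] result = offload(3);
        System.out.println(result[0] + " delivered on " + result[1]);
    }
}
```

The worker only produces an immutable `String` and never touches state owned by the receiving thread, so no locking is needed. Once background work starts mutating shared objects, that guarantee is gone and the extra complexity mentioned above begins to show.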

Asynchronous View Inflation

As client developers, one of the things we can take advantage of for performance gains is the time it takes for a request to resolve on the backend. All those milliseconds (or in some cases, just regular seconds) that we spend waiting for a response from the server are free time for us. We can use that time to prepare as much as possible for the arrival of the response and use pre-loading techniques to make the rendering process as quick as possible.

One of these techniques that we employed involved inflating the views used in the search list ahead of time as opposed to letting the layout manager inflate them on-the-fly. Normally, while waiting for data from the backend, no view inflation takes place. Once some data is returned and we add it to the list’s backing data adapter, then all of the views that would be initially visible in the viewport are inflated and bound to the data. Waiting on this inflation increases the amount of time perceived by the user for search results to load and be shown on screen. This also has an impact on scroll performance. As a user scrolls down the list, new views enter the viewport. If the list doesn’t have enough views already inflated and offscreen to be recycled for these new views, then more view inflation needs to occur which impacts scroll performance. This is an issue that affects both of our key performance metrics, but we can kill both of these proverbial birds with one stone by inflating our views ahead of time while we wait for the request!

We started by using Android’s LayoutInflater to inflate views without actually showing them to the user by setting “attachToRoot” to false in the inflate method. We then stored these views in manually managed view pools (lists of inflated views) that we could draw from at render time rather than rely on the list itself to inflate them. Unfortunately, the LayoutInflater inflates views on the main UI thread. This meant that even if we were inflating views ahead of time and solving our initial performance issues, we were still introducing dropped frames that were visible to the user as a noticeable stutter in the loading animation.

To fix this, we went one step further and offloaded the inflation of each view to another thread using the AsyncLayoutInflater (introduced in version 24 of the Android Support Library). Each call to AsyncLayoutInflater’s inflate method inflates a view on a background thread, freeing up the main UI thread. Each time this is done, a small overhead is incurred in scheduling that inflation work on a different thread. While this is practical for a one-off view inflation, creating view pools of multiple views with AsyncLayoutInflater.inflate() means that the small overhead of handing work to another thread for each view eventually adds up and still causes performance issues on the main thread.

Instead of making multiple calls to the AsyncLayoutInflater on the main thread (one for each view in the view pool), we offloaded all of these calls to another thread with RxJava. Now we have one RxJava call to offload work to a new thread, keeping our main thread free while the AsyncLayoutInflater inflates all the views we need. The view inflations now have little to no impact on our main UI thread during loading, rendering and recycling of search result views!

Now when the response is returned from the server, we can quickly bind the data to the views from the view pools and render them on screen instead of having to wait for the views to inflate before binding and rendering. This improvement alone saved us 50-80ms on initial rendering time of our search results.
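The idea of a manually managed pool that is filled with a single hand-off to a background thread can be sketched in plain Java. This is an assumption-laden illustration rather than our actual implementation: `PrefilledPool` is a hypothetical class, the generic type `V` stands in for an inflated view, and the `Supplier` stands in for the inflater. It deliberately ignores Android-specific constraints on where views may be created and used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of a manually managed pool; V stands in for a view.
final class PrefilledPool<V> {
    private final ConcurrentLinkedQueue<V> pool = new ConcurrentLinkedQueue<>();
    private final Supplier<V> inflater;
    private final AtomicInteger fallbackInflations = new AtomicInteger();

    PrefilledPool(Supplier<V> inflater) {
        this.inflater = inflater;
    }

    // Schedules all `count` inflations with one hand-off to the executor,
    // instead of paying the scheduling overhead once per item.
    CompletableFuture<Void> prefill(int count, Executor background) {
        return CompletableFuture.runAsync(() -> {
            for (int i = 0; i < count; i++) {
                pool.add(inflater.get());
            }
        }, background);
    }

    // Returns a pre-built item, or builds one on the calling thread
    // if the pool has run dry.
    V acquire() {
        V pooled = pool.poll();
        if (pooled != null) {
            return pooled;
        }
        fallbackInflations.incrementAndGet();
        return inflater.get();
    }

    int size() {
        return pool.size();
    }

    int fallbackInflations() {
        return fallbackInflations.get();
    }
}
```

Counting fallback inflations is one way to tell whether the pool was sized generously enough: every fallback is an inflation that happened at render time after all.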

Reducing Overdraw and Layout Hierarchy Depth

When it comes to performance, it’s important that you don’t do any work that you don’t have to do. Sounds pretty obvious right? The tricky part is figuring out what work you don’t actually have to do.

One example of duplicate work is painting white pixels on top of white pixels. If the background of the application is white by default, and each of our list items also has a white background, then we’re making the GPU do work that it doesn’t have to do. The more layers there are to paint on screen, the more work the GPU has to do, as layers are painted from back to front. Each time the GPU has to draw over an area that it has already drawn in the same frame, we incur a performance overhead. This overhead is known as overdraw.

On the left we can see the Yelp search page before any improvements were made to reduce overdraw. Most of the screen is tinted with some color, which means the GPU is doing extra work to render each frame. To render anything on screen, the GPU has to make at least one pass, and pixels drawn only once keep their true color. In the overdraw visualization, blue marks pixels that were overdrawn once, green twice, pink three times and red four or more times.

To reduce overdraw, we removed the background color and drawables on many of our views. Just doing this got us 80% of the way to what you see on the right. The other 20% came from different tricks to avoid overlapping views. Instead of using a background drawable to add a border to each list cell and causing the GPU to pass over the entire cell, we removed the background entirely, allowing the white background of the app to show through, and added a small 1dp view to act as the divider. When using a background drawable to add a border, even if only part of the drawable is visible and the rest is transparent, the GPU still needs to do a full pass over the area where the drawable is displayed, contributing to overdraw. By removing the background drawable, the GPU only needs to worry about rendering the view that represents the border, and not the entire background of the list item.

To make the measure and layout phase of the Android render pipeline more efficient, we also switched to using a ConstraintLayout for each list item. This change allowed us to flatten the view hierarchy and reduce the number of nested views as much as possible.

Compared to the other improvements we made as part of the effort to improve search performance on Android, reducing overdraw and flattening the layout hierarchy seemed to have very little effect on our performance metrics. We tried to measure the effect at a device level but found too much variance between test runs to claim a successful improvement. We expected to see more of an impact, as reducing overdraw and layout hierarchy depth are commonly prescribed methods to improve performance in Android apps, but as we tested we found no significant improvement in the number of dropped frames on scroll or in the time to render search results. Because of this, we decided not to invest more time in these improvements and to focus our efforts on larger opportunities.

Caching Results in a View Model

On the search page, each business in the list is bound to a business model object. This business model is received from the backend as part of the search response. The model does not map 1:1 with what we display to the end user. We do some calculations on the client to build labels with information that’s useful to the user and localized to their device’s locale. For example, we convert kilometers to miles, format strings for proper pluralization and determine how many minutes there are until the business’ closing time. Some of these calculations and resource loading take a significant amount of time (10-20ms) and were done on the main UI thread. We were doing these costly calculations each time we needed to bind the business information to a business list item view, and the performance impact was evident, especially on older devices.

Instead of doing these calculations each time a view needed data, we decided to move the results of these calculations into a view model so that they only needed to be done once. We created a view model builder class to handle all of the logic for populating the fields of the view model. This builder takes in a business network model object and returns a business view model object with all of the properties calculated and string resources loaded. We then stored these view models in a cache where each business id would map to a view model object that we could use to bind data to the view. The cache only persists across one page of search results since it’s possible that new properties need to be computed depending on the search vertical.

/** This method is called for each item in the list at bind time. */
public BusinessListItemViewModel getViewModel(
        YelpBusiness business,
        Collection<DisplayFeature> features,
        int position,
        int offset) {

    // The result position that the user will see.
    int resultPosition = offset + position + 1;

    String cacheKey = business.getId();

    BusinessListItemViewModel viewModel = mBusinessListItemViewModelMap.get(cacheKey);

    if (viewModel == null) {
        viewModel = mBusinessListItemViewModelBuilder
                    .setBusiness(business)
                    .setListPosition(position)
                    .setResultPosition(resultPosition)
                    .setDisplayFeatures(features)
                    .build();
        mBusinessListItemViewModelMap.put(cacheKey, viewModel);
    }

    return viewModel;
}
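
The one-page lifetime of the cache can be sketched separately from the Android code. The class below is a hypothetical, dependency-free illustration: `PageScopedCache`, `onNewPage`, and the `Function` that plays the role of the view model builder are names invented for this example, and the build counter exists only to make the caching visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a cache keyed by business id that lives for one page
// of results. Intended for use from a single thread, as with view binding.
final class PageScopedCache<M> {
    private final Map<String, M> byBusinessId = new HashMap<>();
    private final Function<String, M> builder;
    private int buildCount = 0;

    PageScopedCache(Function<String, M> builder) {
        this.builder = builder;
    }

    // Builds the view model on the first request for an id;
    // later binds for the same id reuse the stored instance.
    M get(String businessId) {
        return byBusinessId.computeIfAbsent(businessId, id -> {
            buildCount++;
            return builder.apply(id);
        });
    }

    // Drops everything when a new page of results arrives, since the
    // next page may need different properties computed.
    void onNewPage() {
        byBusinessId.clear();
    }

    int buildCount() {
        return buildCount;
    }
}
```

Binding the same business twice on one page triggers a single build, while clearing the cache for a new page forces a rebuild the next time that business is bound.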

Not only did this drastically improve scroll performance, but it also made our code a lot easier to test. The view model builder was created to not have any dependencies on the standard Android libraries, meaning we are able to run unit tests instead of integration tests that require the entire Android app to start and run. Where we previously relied on a handful of Espresso tests to verify view binding logic, we now have excellent unit test coverage that covers all aspects of creating a view model from a network business model and runs in a fraction of the time.

Here is a comparison of the scroll performance before and after the view model introduction:

As you can see, scrolling on the right-hand side not only feels faster, it is faster! The GPU profiler showing frame render time bars on the screen illustrates this. We are admittedly still dropping some frames under heavy load, but we are dropping far fewer than before, and when we do, they span a much shorter time frame. Through our measurements using the FrameMetrics API, we also found that the percentage of dropped frames while scrolling fell from 33% of all frames to 17%.
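
For readers who want to reproduce a dropped-frame percentage like the one above, the arithmetic is straightforward once per-frame durations are available. The sketch below assumes that a frame counts as dropped when its total duration exceeds the 60 frames-per-second budget of roughly 16.67ms, and that durations are supplied in nanoseconds. `DroppedFrameStats` is a hypothetical helper, and the exact definition used in our own measurements may differ.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: computes the share of frames that missed the budget.
final class DroppedFrameStats {

    // Per-frame budget at 60 frames per second, in nanoseconds (~16.67ms).
    static final long FRAME_BUDGET_NANOS = TimeUnit.SECONDS.toNanos(1) / 60;

    // Returns the percentage (0-100) of frames whose duration exceeded the
    // budget. An empty input yields 0 rather than dividing by zero.
    static double droppedPercentage(long[] frameDurationsNanos) {
        if (frameDurationsNanos.length == 0) {
            return 0.0;
        }
        int dropped = 0;
        for (long duration : frameDurationsNanos) {
            if (duration > FRAME_BUDGET_NANOS) {
                dropped++;
            }
        }
        return 100.0 * dropped / frameDurationsNanos.length;
    }
}
```

With four frames lasting 10ms, 20ms, 15ms and 40ms, two exceed the budget, so the helper reports 50% dropped frames.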

When the user scrolls down for the first time, the view model for each business is being constructed for the very first time. This is something that is still pretty computationally intensive. We could probably take this one step further by taking advantage of the time before the user starts scrolling to build the view models ahead of time, but as it is now, the view model is constructed as each new view enters the viewport. While it is quicker than previously on the initial scroll, the performance improvement is most evident when the user scrolls back up the list. At that point the view model for each business is already constructed so no new computations are needed. That’s why you see a much smaller impact (shorter bars) on the right when compared to the left when the user scrolls up.

Next Steps for Performance Improvements

There are a few more ways we can improve the performance of search on the Android app. In case you didn’t notice, we didn’t provide any timing details for the timespan between the user pressing the button and the search request being executed. Unfortunately, we didn’t take that time into consideration when improving our timing and assumed it to be negligible. Upon further investigation, we have found that it’s a non-trivial amount of time, since it involves waiting for the activity containing the search list to be created. As a next step, we want to add this time to the search_results_loaded timing metric and accurately gauge just how much time is spent on the client before the request is executed (represented in green below).

To offset this time, we can fire the search request as soon as the user interaction to initiate the search happens. If we do that, then we can do the work to create the activity containing the search list while the request is in flight which would make our diagram look more like this:

This makes the search_results_loaded timing metric smaller, which means less waiting for our users! The sooner we’re able to execute the request after the interaction, the more time we end up saving.
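
The saving from overlapping the two phases can be illustrated with a small, self-contained timing sketch. Everything here is hypothetical: `EarlyRequestSketch` is an invented name, and the sleeps are stand-ins for activity creation and the network round trip, so the numbers it prints say nothing about real devices.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: sleeps stand in for activity creation and the request.
final class EarlyRequestSketch {

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Create the activity first, then execute the request.
    static long sequentialMillis(long setupMillis, long requestMillis) {
        long start = System.nanoTime();
        pause(setupMillis);
        pause(requestMillis);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Fire the request immediately, then create the activity while the
    // request is in flight, and wait only for whatever time remains.
    static long overlappedMillis(long setupMillis, long requestMillis) {
        long start = System.nanoTime();
        CompletableFuture<Void> request =
                CompletableFuture.runAsync(() -> pause(requestMillis));
        pause(setupMillis);
        request.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequentialMillis(200, 300) + "ms");
        System.out.println("overlapped: " + overlappedMillis(200, 300) + "ms");
    }
}
```

In the sequential case the total is roughly the sum of both phases, while in the overlapped case it approaches the longer of the two, which is why firing the request earlier shrinks the metric.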

Another step we could take to improve performance would be to persist the business view models for a longer period of time by taking advantage of our app’s data layer. The data layer follows the repository pattern and we use it to cache the response for many network requests, including search. If we also cached the view models associated with each business in the data layer, then a repeat search could be rendered extremely quickly! Also, if a business shows up in more than one search (say after toggling a price filter) then we wouldn’t need to reconstruct the view model for this business.

Conclusion

Through our performance work on the Android client, we were able to reduce the perceived search load time from 350ms at the 50th percentile and 656ms at the 90th percentile to 190ms and 394ms respectively. The percentage of dropped frames while scrolling to the bottom of the search page also improved from 33% to 17%.

We’ve completed step 2 of the improvement lifecycle by implementing some changes that have improved our performance metrics. The next step is to ensure that our improvements won’t be unwittingly undone by future engineers. In the next blog post, we’ll cover our client-side performance monitoring and how we can be confident we’re always moving towards a more performant app, even when performance is not at the forefront of our next project!
