Engineering Career Series: Hiring a diverse team by reducing bias
Elias Alberts, Director of Technical Talent
Apr 22, 2021
Compared to where we started, Yelp’s technical organization has made a lot of headway over the years when it comes to diverse hiring. While our approach to this work continues to evolve, we’ve made significant progress in improving the diversity of our organizations by, among other things, reducing gender and ethnicity bias in our interview process. We’re here to share some of what we’ve learned to help others in their own efforts.
If you’ve come looking for the secret formula to emulate our success, I can’t help you there, unfortunately. Anyone offering otherwise is probably selling you something. And, to be sure, you’re going to need to buy some things along the way. But, if that newest iteration of Bias Blaster 9000 sounds too good to be true, that’s because it is. There are no easy fixes here.
In the 9 years I’ve been with Yelp, we’ve taken several major steps to evolve the hiring processes and strategy that got us where we are today. What I’m covering in this post is arguably the most critical of those changes: tracking every bit of data possible and running regular analyses to better diagnose the bias in our engineering interviews.
A Data-Oriented Approach
Today, we’re monitoring every stage of every candidate’s interview process, as well as several data points about the candidates themselves. We track how many candidates apply organically versus how many are sourced, and how each group performs in the first-round interview. We monitor offer rates by gender identity and how each group converts to offer acceptance. We can even determine, down to the level of individual interview questions, whether each question is being passed at equal rates by all candidates.
To know these things, we’ve become deliberate about tracking data and analyzing it. We automatically publish daily updates to a host of dashboards that monitor the health of our pipelines. We report weekly on the state of our hiring pipelines so that we can make adjustments as needed. We don’t make changes to our interview process without first knowing that we can measure the effects. With these procedures in place, we are truly able to systematically identify and address problems.
We’ve come a long way from where we started. It sounds somewhat absurd in hindsight, but early in my tenure at Yelp, we didn’t even know how many people we needed to hire. We just knew that we needed more engineers, and that we needed them last month. We monitored how many people we were hiring per month alongside our offer-to-accept conversion rate. Sort of. We tracked those numbers when we remembered to, and it wasn’t a big deal if we forgot. There was a lot of room for improvement.
An Opportunity To Start Fresh
Just before the onset of the pandemic, our recruiting team was presented with a new opportunity when Yelp Engineering decided to expand its footprint to Toronto. As the pandemic unfolded, our plans pivoted from focusing on Toronto to remote hiring across Canada. This was our first chance to enter a new talent market properly, with the benefit of everything we’d learned over the previous years.
And it seems to be working: In Q1 2021, 19% of our engineering hires in Canada identified as Black or Latinx (together, underrepresented minorities or URM), and we saw even more impressive gains in leadership positions, too.
I’ve often regretted our inability to make quicker decisions for lack of data. It takes time to build a data set large enough to understand your processes and detect the bias in them. In Yelp’s case, depending on our current rate of hiring, we can typically draw statistically significant conclusions after a month or two of data collection. Of course, other variables affect this. For example, top-of-funnel stages, such as first-round interviews, produce more data and so reach significance sooner.
Nine years ago, it would have taken us significantly longer to produce useful data. This is especially true at the later stages of the pipeline, where fewer candidates remain, and for demographics that are typically underrepresented in tech, where there weren’t enough candidates in the pipeline to draw statistically significant conclusions. If you’re just getting started, the sooner you start tracking recruiting data, the sooner you’ll be able to make meaningful changes to your processes.
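To make "how long until it's significant" a little more concrete, here is a minimal sketch of the textbook sample-size estimate for comparing two pass rates with a two-sided two-proportion z-test. This is not Yelp's tooling; the function name, the default 5% significance level and 80% power, and the example pass rates are all illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def candidates_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of candidates needed in EACH group to detect a gap
    between pass rates p1 and p2 (two-sided two-proportion z-test).

    Normal-approximation formula (illustrative, hypothetical helper):
        n = (z_{1 - alpha/2} + z_{power})^2 * (p1(1 - p1) + p2(1 - p2)) / (p1 - p2)^2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to have a gap to detect")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical: 50% vs. 40% pass rate at a first-round interview.
    print(candidates_per_group(0.50, 0.40))
```

With those hypothetical rates, the estimate comes out to roughly 385 candidates per group. Smaller gaps need many more candidates, which is why later pipeline stages and smaller demographic groups take so much longer to say anything reliable.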
Essential Data Points
If we were starting fresh today, there are three data points I’d want to make sure we started collecting immediately.
- Proceed/Did not proceed rates at every stage of the interview process - This one might go without saying, but it’s the foundation everything else is built on. Track them from first contact all the way through offer acceptance. Everything else is useless without an understanding of the proceed/did not proceed rates at each interview stage.
- Candidate source - Knowing where your candidates are coming from generates a number of insights. Do applicants from career fairs or job boards get more offers? Most people jump to finding their most successful sources, but it’s equally valuable to know your least successful ones. Candidates from certain sources falling out of your pipeline at a disproportionate rate can be very telling. We’ve seen this manifest as candidates with non-traditional CS educations, such as bootcamp graduates, being rejected at disproportionate rates. This told us we needed to state explicitly in our interview evaluation criteria that an applicant’s educational background is not a factor, and changing those criteria has been highly effective at making sure candidates from a wide range of educational backgrounds proceed equally through the pipeline.
- Candidate demographics - Being able to analyze your pipeline by candidate demographics is extremely helpful. For instance, it’s well known that there is a gender disparity in the tech space. Tying gender or ethnicity back to the previous two data points allows for powerful insights into which interview stages are problematic. As an example, we identified early on in our data that women were less likely than men to attempt the code test, which is the first step in our interview process. A surprisingly effective intervention was simply asking all candidates a second time to take it, a good reminder that you don’t always need to reinvent the wheel to make change.
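Here is a minimal sketch of how these data points combine: per-stage proceed rates, broken down by any grouping field such as source. The stage names, the `Candidate` record, and the `self_id` field are hypothetical, and the sketch assumes a candidate who reached a stage also passed every earlier one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

# Hypothetical pipeline stages, in order; substitute your own.
STAGES = ["code_test", "phone_screen", "onsite", "offer", "accepted"]


@dataclass
class Candidate:
    source: str                    # e.g. "sourced", "job_board", "career_fair"
    furthest_stage: str            # last stage reached (one of STAGES)
    self_id: Optional[str] = None  # voluntary self-identification; restricted access


def proceed_rates(
    candidates: Iterable[Candidate], group_by: Callable[[Candidate], str]
) -> Dict[str, Dict[str, float]]:
    """Return {group: {stage: proceed rate}}, where a stage's proceed rate is
    the share of candidates who reached it and went on to the next stage.
    The rate recorded for "offer" is therefore the offer acceptance rate."""
    reached: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for c in candidates:
        last = STAGES.index(c.furthest_stage)
        for stage in STAGES[: last + 1]:  # count every stage up to the furthest one
            reached[group_by(c)][stage] += 1

    rates: Dict[str, Dict[str, float]] = {}
    for group, counts in reached.items():
        rates[group] = {
            stage: counts[nxt] / counts[stage]
            for stage, nxt in zip(STAGES, STAGES[1:])
            if counts[stage]  # skip stages no one in this group reached
        }
    return rates


if __name__ == "__main__":
    pipeline = [
        Candidate("job_board", "onsite"),
        Candidate("job_board", "code_test"),
        Candidate("sourced", "accepted"),
        Candidate("sourced", "phone_screen"),
    ]
    print(proceed_rates(pipeline, lambda c: c.source))
```

Passing `group_by` as a function keeps the same code usable for source, demographic, or any other field the tracker records.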
The third data point, candidate demographics, comes with two very important caveats.
- Collecting this data is subject to different legal requirements depending on your location. Consult legal experts before moving forward.
- No one responsible for making hiring decisions can have access to this data. The trackers that hold this data are managed by our operations people and access is granted only to the sourcers and recruiters tracking the data.
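The second caveat is a policy, but it helps to enforce it in whatever tooling sits on top of your trackers. Below is a minimal sketch, assuming hypothetical field and role names, of a view that strips demographic fields for anyone outside the roles that maintain the data. It illustrates the principle only; it is not how Yelp's managed trackers are implemented.

```python
from typing import Dict, Iterable, List

# Hypothetical demographic columns held in the restricted tracker.
DEMOGRAPHIC_FIELDS = {"gender", "ethnicity"}

# Hypothetical roles allowed to see those columns (the people tracking the data).
TRACKER_ROLES = {"recruiter", "sourcer", "recruiting_ops"}


def view_for(role: str, rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return tracker rows as `role` may see them. Anyone outside TRACKER_ROLES,
    such as interviewers or hiring managers, gets rows without demographic fields."""
    if role in TRACKER_ROLES:
        return [dict(row) for row in rows]  # copies, so callers can't mutate the source
    return [
        {key: value for key, value in row.items() if key not in DEMOGRAPHIC_FIELDS}
        for row in rows
    ]
```

Defaulting to the redacted view means a new or unexpected role sees less, not more.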
Assess Your Systems
Don’t let perfect be the enemy of good when tracking your data. Teams can be overwhelmed by the possibilities of what to track and how to go about it. A good place to start is getting a grasp on what your existing systems can provide. You likely have some sort of applicant tracking system (ATS) that can provide some pipeline metrics. Learn what your system can do for you and how it does it.
It’s likely you’ll have to supplement your ATS with custom-made solutions, as there’s going to be data that your ATS is unable to provide. Don’t be too good for spreadsheets. I know, I know, there has to be a better way. There’s always a better way. Getting the data matters more than how you’re getting it. If spreadsheets allow you to track your data while you find a more permanent off-the-shelf solution or your teammates in engineering build you something, do it. We’ve relied on spreadsheets for years. Even though we’ve incorporated tools such as Tableau, spreadsheets have remained an important part of our system.
Good, reliable data depends on maintaining consistent data collection practices. Depending on your systems, some of this will be automatic. At Yelp, we track a sizable amount of data manually and use our ATS for everything it can accurately, automatically track. For everything else, we rely on our recruiters and sourcers to manually track data in spreadsheets.
Each recruiter and sourcer has a centrally managed tracker that they use to track their candidates from start to finish. There is no room for interpretation about what data to collect and how to collect it. Every tracker is exactly the same and every team member tracks the same data.
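Identical trackers make it easy to check every export automatically before analysis. Here is a minimal sketch that validates a CSV export against a fixed column set and a fixed list of outcome values; the column names and outcome labels are hypothetical stand-ins for whatever your trackers define, and the line numbers it reports assume one CSV line per record.

```python
import csv
import io
from typing import List

# Hypothetical columns every tracker must have, in this order.
REQUIRED_COLUMNS = ["candidate_id", "source", "stage", "outcome", "date"]
# Hypothetical allowed values for the outcome column.
ALLOWED_OUTCOMES = {"proceed", "did_not_proceed", "pending"}


def validate_tracker(csv_text: str) -> List[str]:
    """Return a list of problems found in one tracker export (empty if clean)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"columns {reader.fieldnames} != expected {REQUIRED_COLUMNS}"]

    problems: List[str] = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():  # short rows yield None, treat as empty
                problems.append(f"line {line_no}: missing {col}")
        outcome = row["outcome"]
        if outcome and outcome not in ALLOWED_OUTCOMES:
            problems.append(f"line {line_no}: unknown outcome {outcome!r}")
    return problems
```

Rejecting a tracker whose columns drift, rather than trying to reconcile it, keeps the "no room for interpretation" rule enforceable.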
Maintaining and analyzing your data should also be the explicit responsibility of someone on your team. For us, things really took off when we created a full-time operations role within the recruiting organization. Taking this work seriously requires constant maintenance that can’t be done in “spare” time.
This approach is not foolproof. There are mistakes to be made, and we’ve made our fair share of them. Chief among them is drawing conclusions from data that is not statistically significant. We’re often dealing with fairly small data sets, and it can be very tempting to make changes based on perceived patterns. In these cases, patience is key.
If something looks off, definitely pay attention, but try not to jump to conclusions. Rolling a change back hurts and also messes up your painstakingly collected data. It ends up being more damaging in the long run to make changes based on data that hasn’t reached significance.
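To see why patience matters with small data sets, here is a minimal sketch of a standard two-sided two-proportion z-test for comparing pass rates between two groups. The function name and all counts are hypothetical, and the normal approximation is itself shaky when a group has only a handful of passes or fails.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(pass_a: int, total_a: int, pass_b: int, total_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two pass rates.

    Returns (z, p_value) using the pooled pass rate for the standard error.
    """
    p_a, p_b = pass_a / total_a, pass_b / total_b
    pooled = (pass_a + pass_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0  # everyone passed or everyone failed: no evidence of a gap
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical: the same 40% vs. 60% pass rates at two sample sizes.
    print(two_proportion_z_test(12, 30, 18, 30))
    print(two_proportion_z_test(120, 300, 180, 300))
```

With 30 candidates per group, a 40% versus 60% pass rate gives p of roughly 0.12, which is not significant at the usual 0.05 level; the same rates across 300 candidates per group give p well under 0.001. Checking many stages and questions at once also raises the odds that some gap looks significant by chance, which is one more reason not to act on the first odd-looking number.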
Part 2 of this post will go into detail on how we’ve built a structured interview process and acted on the data that we’ve collected. Layering structured interviews on top of our data collection and analysis practices has allowed us to make fine-grained tweaks to the interview process that would otherwise be impossible. Our insights have led us to a points-based system in our latest iteration of structured interviews that will further our goal of more equitably scoring interview performance.
Lastly, if you’re finding these posts interesting and Yelp sounds like the kind of company culture that you’d like to be a part of… we’re hiring!