For years, Yelp continued to use an interview process that was created when we were a 50-200 person Engineering organization, with only a handful of interviewers:

  • Each interviewer wrote their own interview questions
  • A few senior leaders gave overall hire/no hire decisions for every panel
  • Interviewers received ad hoc feedback from senior leaders when it seemed like they were too tough or too easy in their interviews

A few things went well:

  • There was a strong sense of personal responsibility for both leaders and interviewers
  • Turnaround time for offer approvals was quick
  • Yelp values could be preserved by senior leaders

As the Engineering organization grew to more than 500 employees, the interviewer pool also grew, from tens to hundreds. This created a few challenges. It became harder to enforce consistent standards across interviewers, and increasingly difficult to tell whether a strong interview result reflected a strong candidate or a miscalibrated question. This lack of structure made it hard to confidently and consistently identify strong candidates, and to detect patterns of bias in our interview process. Faced with these challenges, we asked, “How do we continue to hire diverse, amazing talent as we scale our Engineering organization?”

Creating Structured Interviews

A group of folks across Technical Talent and Engineering banded together to answer this question. Since others had gone down this path before us, we began with a review of prior work by Medium, Quora, and Sensu. Those references, along with our own internal review, led to the creation of a structured interview process that reflected what we felt it took to succeed at Yelp. As a first step, we focused on standardizing questions across all of our open roles to four key question types:

  • Problem Solving
  • System Design
  • Ownership, Tenacity, and Curiosity
  • Playing Well with Others

The first two interview types focus on the candidate’s technical skills, and the latter two focus on non-technical skills and how aligned the candidate is with Yelp’s values. For the technical portions, we wanted to evaluate the candidate’s skill with technical tasks that would be common in the role they’re applying for, rather than their ability to memorize algorithms or easy-to-search-for trivia. To create these questions, we asked engineers across the organization to take a problem their team recently solved and create an example on a smaller scale. We strongly believe that using real-life problems to evaluate skills captures what it actually takes to succeed at Yelp, and gives candidates from different backgrounds a better opportunity to succeed in our hiring pipeline.

To evaluate questions, we standardized criteria tied to the dimensions (Technical Skill, Ownership, Business Insight, Continuous Improvement, and Leadership) that we use internally for leveling engineers. This further aligned internal and external expectations of candidates and employees.

Making Data-Informed Decisions

Moving to structured interviews allowed us to take the first step to both collect and analyze interview data in a meaningful way. We went from having no comparable feedback to thousands of technical and behavioral data points in a consistent format. This not only gives us the opportunity to monitor the health and size of our pipelines, but it also enables us to identify potential problems or biases at every stage of the interview process. When we observe a gap in drop-off rates, we can narrow our focus to specific question sets or interviewers and determine what changes will directly mitigate bias.
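As a rough sketch of what this kind of pipeline monitoring involves (this is an illustration, not Yelp’s actual tooling; the record fields, function names, and the 10-point gap threshold are all assumptions), computing per-group pass rates for each stage and flagging stages with large gaps might look like:

```python
from collections import defaultdict

def pass_rates_by_group(records):
    """Compute per-group pass rates for each interview stage.

    `records` is a list of dicts with keys: stage, group, passed.
    Returns {stage: {group: pass_rate}}.
    """
    # stage -> group -> [passed_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in records:
        c = counts[r["stage"]][r["group"]]
        c[0] += 1 if r["passed"] else 0
        c[1] += 1
    return {
        stage: {g: passed / total for g, (passed, total) in groups.items()}
        for stage, groups in counts.items()
    }

def flag_gaps(rates, threshold=0.10):
    """Flag stages where pass rates across groups differ by more than `threshold`."""
    return [
        stage for stage, groups in rates.items()
        if max(groups.values()) - min(groups.values()) > threshold
    ]
```

A flagged stage is only a starting point for investigation — the next step is drilling into the question sets and interviewers behind it.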

First try: what we learned

After introducing structured interviews, we soon identified a difference in pass rates across genders in the initial round of technical interviews. Upon closer inspection, we found instances where candidate performance was identical when measured by how many components of a coding question were completed, yet men were progressing to the next stage of the interview process at a higher rate than women. We quickly reduced this gap by replacing individual interviewers’ judgment of a candidate’s performance with standardized pass/fail criteria, which ensured that all qualified candidates moved forward. This was the first of several successful modifications, which have collectively reduced the pass rate gap between genders. Making corrections to the early steps of the interview process has had a huge impact on gender diversity at every subsequent stage, ultimately increasing the likelihood that more women make it to the final offer stage. With better pipeline observability, we’ve been able to more effectively hire diverse talent by mitigating these biases and reducing false negatives.
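The fix above amounts to replacing a subjective call with a fixed rule on observable evidence. A minimal sketch (the function name and the 75% threshold are hypothetical, chosen only for illustration):

```python
def standardized_outcome(components_completed, total_components, pass_threshold=0.75):
    """Apply a fixed pass/fail rule instead of individual interviewer judgment.

    A candidate advances when they complete at least `pass_threshold`
    of the coding question's components. Two candidates with the same
    completed components always get the same outcome.
    """
    return (components_completed / total_components) >= pass_threshold
```

The point of the rule is not the specific threshold but its uniformity: identical performance can no longer produce different outcomes.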

Second try: defining evaluation criteria

While we were now able to both pinpoint and remedy where drop-offs were occurring in our interview process, our approach to reducing bias was still reactive. Interpretation of candidate performance varied among interviewers, even with the measures we had in place. We recognized that having structured interview questions wasn’t enough; we also needed explicit evaluation criteria for all of our interviews.

To address this, we introduced points-based evaluation criteria to our structured interviews. In this initiative, we further clarified what signals we wanted interviewers to look for and capture. Points are awarded for expected candidate behaviors based on a rubric, and interviewers are required to explain when and why points are deducted. These scores can then be aggregated and converted into hiring and initial leveling decisions to maintain consistency across the larger organization. A key benefit of this framework is that interviewers systematically measure candidate performance during the interview, while the onus of deciding the final interview outcome — and with it the opportunity for unconscious (or even conscious) interviewer bias — is reduced.
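To make the mechanics concrete, here is a minimal sketch of a points-based rubric (the behaviors, point values, and function name are invented for illustration; the post does not describe Yelp’s actual rubric). Note how a deduction without an explanation is rejected, mirroring the requirement that interviewers justify every deduction:

```python
# Hypothetical rubric: expected behavior -> points awarded when observed.
RUBRIC = {
    "clarifies_requirements": 2,
    "working_solution": 4,
    "handles_edge_cases": 2,
    "clear_communication": 2,
}

def score_interview(observed, deductions):
    """Score one interview against the rubric.

    `observed` is the set of rubric behaviors the interviewer saw;
    `deductions` maps each missed behavior to a written explanation.
    Raises ValueError if any missed behavior lacks an explanation.
    """
    missed = set(RUBRIC) - set(observed)
    unexplained = missed - set(deductions)
    if unexplained:
        raise ValueError(f"Missing explanation for deductions: {sorted(unexplained)}")
    return sum(points for behavior, points in RUBRIC.items() if behavior in observed)
```

Because each interview reduces to a number plus documented deductions, scores from a full panel can be aggregated and compared consistently across interviewers.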

How we’re evolving

If there’s anything we’ve learned from this journey, it’s that improving interviews is an ongoing process of review and adaptation. At Yelp, we’ve made this a shared priority between Technical Talent and Engineering. Our teams work closely with one another and have a dedicated task force, with several subgroups composed of folks from both groups, that meets on a weekly cadence to put this commitment into action. While we still have a lot on our roadmap, here are some key lessons that we have learned so far:

  • Making interview improvements requires a real partnership. It may seem obvious to say this, but if you’re going to improve engineering interviews, you need subject matter experts from both engineering and recruiting to capture all the nuances that are often overlooked.
  • Interviewer bias still exists in your hiring process even with a standardized process and structure. A good practice to combat this is to make sure that the group working on interview processes reflects the demographics of your organization, or what you’d like your organization to be. Make sure women and underrepresented minorities are involved.
  • A distributed workforce means different geographies with different cultural considerations and different employment norms, so include engineers representative of all your geographies when standardizing. Our initial task force failed to include folks from our European teams and, thus, some of our interview questions were geared towards Bay Area tech culture.
  • Collecting feedback is imperative for making progress, so make sure you create feedback loops with all stakeholders: recruiters, recruiting coordinators, interviewers, and hiring managers. Candidates are stakeholders, too, so make sure to have a process to get feedback on their interview experience.
  • Standardization allows for easier review and change, whether that is the pipeline, the interview questions, interview evaluation, training — the list goes on. We’re still in the midst of rolling out our points-based evaluation criteria for structured interviews, and standardization lets us move much faster without reinventing the wheel!

Up next: How we onboard engineers across the world at Yelp

Equally important to bringing in diverse talent is everything that happens onwards from the moment a candidate becomes an official Yelper. In the next post in this series, we’ll take a closer look at the thought and logistics that we go through to set folks up for success, how we’ve streamlined our onboarding process for distributed teams in the virtual world, and the opportunities for continuous learning we provide to our employees through training and mentorship programs.

If you’re finding these posts interesting and Yelp sounds like the kind of company culture that you’d like to be a part of… we’re hiring!

This post is part of a series covering how we're building a happy, diverse, and inclusive engineering team at Yelp, including details on how we approached the various challenges along the way, what we've tried, and what worked and didn't.
