At Yelp, we store nearly all of our data in MySQL. At any given time we’re issuing tens of thousands of SQL queries to our database cluster per second, with some individual servers going above the 10k qps mark. Our database cluster

consists of billions of rows. In response to a lot of different problems we’ve had to optimize the snot out of our MySQL installation, and we’ve learned some interesting things along the way.

A colleague and I recently gave a presentation to some of our coworkers, titled  MySQL Minutiae & InnoDB Internals. The talk covered some good background knowledge that every developer should have about MySQL (transaction isolation levels, replication, etc.) and some more advanced topics such as InnoDB locking semantics.

There’s some good stuff in the talk for people of almost all skill levels, but to whet your appetite I’m going to dive into an interesting deadlock example.

A Mysterious Deadlock

The following example is based on a real deadlock I was trying to debug recently. Some of our photo code was issuing queries like this:

DELETE FROM biz_photo WHERE business_id = ...;

INSERT INTO biz_photo (…) VALUES (…);

When adding a new photo, the code wasn’t sure if a row would already exist in the table, so it would issue a DELETE just in case followed by an INSERT with the correct data. This code was deadlocking a lot. The deadlocks were

really mysterious too—you’d see that totally different business ids were causing deadlocks, so when photos were simultaneously added to two different businesses with different ids, the queries would still cause a deadlock. There’s a bunch of different ways that you could rework the queries to work around the problem—e.g. by using REPLACE INTO, ON DUPLICATE KEY UPDATE, a SELECT followed by an INSERT or UPDATE—but some of these

techniques may not be replication  safe, and without understanding _why _this is deadlocking it’s not clear which of these techniques will avoid the problem.

Diving a bit deeper, I was able to reproduce this deadlock with a very simple contrived example (the notation T1 and T2 depicts two different transactions):

CREATE TABLE dreadlock (i INT PRIMARY KEY) ENGINE=InnoDB;

– N.B. dreadlock is empty at this point

T1: BEGIN; DELETE FROM dreadlock WHERE i = 1;

T2: BEGIN; DELETE FROM dreadlock WHERE i = 2;

T1: INSERT INTO dreadlock (i) VALUES (1); – blocks

T2: INSERT INTO dreadlock (i) VALUES (2); – deadlock here

What’s going on? As it turns out the DELETE statement in T1 grabs an IX (intent exclusive) table lock on dreadlock, and an X (exclusive) gap lock on the dreadlock primary key index. Since the table is empty, the gap lock will cover the entire index (i.e. negative infinity to positive infinity). T2 does the same thing.

Wait, what? X-locks are supposed to be exclusive, right? So how can T1 and T2 both hold an X-lock on the same part of the index? Shouldn’t T2’s DELETE statement block after T1 acquires its gap X-lock, since X-locks are  incompatible?

As it turns out there are multiple types of gap locks, and the gap locks acquired by DELETE statements are of the purely “inhibitive” variety. The DELETE gap lock blocks INSERT statements (which acquire “insert intention”

locks), but do not block other DELETE X-locks. T1’s INSERT statement tries to upgrade its X-lock to an insert intention lock which blocks on T2’s gap X-lock. When T2 tries to insert, InnoDB detects the deadlock and kills T2

(because it holds the fewest number of locks). Note that in this example, using MySQL’s SELECT FOR UPDATE syntax would not have avoided the problem, since that would have acquired exactly the same type of lock as the DELETE. In this case the best solution is to retry the transaction after a deadlock.

If you found this interesting, or you want some coverage of more of the basics (including a full introduction to different InnoDB lock types) I encourage you to check out the  slides.

Back to blog