MySQL Minutiae & InnoDB Internals
At Yelp, we store nearly all of our data in MySQL. At any given time we're issuing tens of thousands of SQL queries to our database cluster per second, with some individual servers going above the 10k qps mark. Our database cluster consists of billions of rows. In response to a lot of different problems we've had to optimize the snot out of our MySQL installation, and we've learned some interesting things along the way.
A colleague and I recently gave a presentation to some of our coworkers, titled MySQL Minutiae & InnoDB Internals. The talk covered some good background knowledge that every developer should have about MySQL (transaction isolation levels, replication, etc.) and some more advanced topics such as InnoDB locking semantics.
There's some good stuff in the talk for people of almost all skill levels, but to whet your appetite I'm going to dive into an interesting deadlock example.
A Mysterious Deadlock
The following example is based on a real deadlock I was trying to debug recently. Some of our photo code was issuing queries like this:
DELETE FROM biz_photo WHERE business_id = ...; INSERT INTO biz_photo (...) VALUES (...);
When adding a new photo, the code wasn't sure if a row would already exist in
the table, so it would issue a DELETE just in case followed by an INSERT
with the correct data. This code was deadlocking a lot. The deadlocks were
really mysterious too—you'd see that totally different business ids were
causing deadlocks, so when photos were simultaneously added to two different
businesses with different ids, the queries would still cause a
deadlock. There's a bunch of different ways that you could rework the queries to
work around the problem—e.g. by using REPLACE INTO, ON DUPLICATE KEY
UPDATE, a SELECT followed by an INSERT or UPDATE—but some of these
techniques may not be replication
safe, and without understanding why
this is deadlocking it's not clear which of these techniques will avoid the
problem.
Diving a bit deeper, I was able to reproduce this deadlock with a very simple contrived example (the notation T1 and T2 depicts two different transactions):
CREATE TABLE dreadlock (i INT PRIMARY KEY) ENGINE=InnoDB; -- N.B. `dreadlock` is empty at this point T1: BEGIN; DELETE FROM dreadlock WHERE i = 1; T2: BEGIN; DELETE FROM dreadlock WHERE i = 2; T1: INSERT INTO dreadlock (i) VALUES (1); -- blocks T2: INSERT INTO dreadlock (i) VALUES (2); -- deadlock here
What's going on? As it turns out the DELETE statement in T1 grabs an IX
(intent exclusive) table lock on dreadlock, and an X (exclusive) gap lock on
the dreadlock primary key index. Since the table is empty, the gap lock will
cover the entire index (i.e. negative infinity to positive infinity). T2 does the same thing.
Wait, what? X-locks are supposed to be exclusive, right? So how can T1 and T2
both hold an X-lock on the same part of the index? Shouldn't T2's DELETE
statement block after T1 acquires its gap X-lock, since X-locks are
incompatible?
As it turns out there are multiple types of gap locks, and the gap locks
acquired by DELETE statements are of the purely "inhibitive" variety. The
DELETE gap lock blocks INSERT statements (which acquire "insert intention"
locks), but do not block other DELETE X-locks. T1's INSERT statement tries
to upgrade its X-lock to an insert intention lock which blocks on T2's gap
X-lock. When T2 tries to insert, InnoDB detects the deadlock and kills T2
(because it holds the fewest number of locks). Note that in this example, using
MySQL's SELECT FOR UPDATE syntax would not have avoided the problem, since
that would have acquired exactly the same type of lock as the DELETE. In this
case the best solution is to retry the transaction after a deadlock.
If you found this interesting, or you want some coverage of more of the basics (including a full introduction to different InnoDB lock types) I encourage you to check out the slides.