Author: Madhax | Published: 24th June 2009 | RSS | LINK

Recently I discovered that the recommended way of changing a column type of a table in MySQL is intolerably slow. By recommended I mean that the first result of searching for “how to change column type mysql” (excluding quotes) is a link to the MySQL developer documentation for an ALTER TABLE query.

The circumstances started with me needing a table to store MD5 hashes. The table definition I had come up with at the time was

mysql> DESCRIBE mytable;
+-------+------------+------+-----+------------------+-------+
| FIELD | Type       | NULL | KEY | DEFAULT          | Extra |
+-------+------------+------+-----+------------------+-------+
| HASH  | CHAR  (16) | NO   | PRI |                  |       |
+-------+------------+------+-----+------------------+-------+
1 row IN SET (0.03 sec)

The design error is that I had defined the `HASH` as char(16). A properly escaped MD5 digest would be stored correctly in the table and I was able to fetch any digest – via MySQL command line applications – as it was stored in the table (Data after \0’s would get returned.) There wasn’t any problem until I chose to make optimizations to the project and make use of functionality provided by a MySQL API.

When selecting any char(x) column from a table the API would read up to a \0 (NULL, int(0).) So I would get varying column lengths depending on the digest in the current row. It’s reasonable for the APIs to assume that because I have defined a column as a sequence of chars then a null terminated string would be stored in it … especially since there is another data type that is specifically used for binary data.

This realization came AFTER I had already populated the table with a lot of data. The table had grown to a fairly large size.

mysql> SELECT COUNT(*) FROM `mytable`;
 
+----------+
| COUNT(*) |
+----------+
| 38744395 |
+----------+
1 row IN SET (0.33 sec)

I was looking to fix my mistake. The ideal solution would be to change the column type. Research done via Google and looking through a book (High Performance MySQL) lead me to believe that

ALTER TABLE `mytable` MODIFY HASH BINARY(16) NOT NULL DEFAULT '’;

would be the best way to go about changing a column type. I was wrong. The query ran for days. This is the result of a “SHOW PROCESSLIST” a few days into the query.

mysql> SHOW processlist;
+------+------+----------------+----------+---------+--------+-------------------+----------------------------------------------------------------------+
| Id   | User | Host           | db       | Command | Time   | State             | Info                                                                 |
+------+------+----------------+----------+---------+--------+-------------------+----------------------------------------------------------------------+
|   82 | root | localhost:4442 | NULL     | Sleep   |     52 |                   | NULL                                                                 |
|  912 | root | localhost:3677 | mydb     | Query   | 275084 | copy TO tmp TABLE | ALTER TABLE `mytable` MODIFY HASH BINARY(16) NOT NULL DEFAULT ''     |
| 2672 | root | localhost:4527 | ******** | Query   |      0 | NULL              | SHOW processlist                                                     |
+------+------+----------------+----------+---------+--------+-------------------+----------------------------------------------------------------------+

A day later the query stopped executing because `mytable` was corrupted. It didn’t occur to me that my experimenting may have corrupted the table – so I didn’t bother running a consistency check. The MySQL client should’ve probably verified the tables before executing any query that is expected to run for more than a day. Regardless, I would have thought that the column modification would be near instantaneous. The column data wouldn’t need to be changed or converted (I used a single byte character set). All that would be required is for the particular column to be recognized as a different type. An ALTER TABLE query shouldn’t be this slow.

In contrast, a full table scan – which involves reading all the data from the table off of the hard drive – is completed within 8 minutes.

mysql> SELECT * FROM `mytable` WHERE `HASH` LIKE "%dsdfsdfsdfsd%" LIMIT 1;
Empty SET (7 min 1.45 sec)

So even if it needed to modify all the rows, and assuming that reading from disk takes as much time as writing to disk, then it should finish within 20 minutes. (The columns would have the same width so no data re-arrangement would need to take place.)

Without digging through the source code I can only provide speculation based on behavior and on information provided by third party sources. An excerpt from High Performance MySQL reads:

MySQL performs most alterations by making an empty table with the desired new
structure, inserting all the data from the old table into the new one, and deleting the
old table.
....
mysql> ALTER TABLE sakila.film
-> MODIFY COLUMN rental_duration TINYINT(3) NOT NULL DEFAULT 5;
 
Profiling that statement with SHOW STATUS shows that it does 1,000 handler reads and
1,000 inserts.

My understanding is that it creates a temporary table, fetches one row at a time from the original table and inserts one row at a time into the temporary table. The problem with this method is that it incurs a lot of overhead performing queries one row at a time. This overhead could be avoided by handling a lot of rows in a lot fewer queries. All this occurred to me while the ALTER TABLE had been executing, but I didn’t kill it because I wasn’t completely confident that this was the problem or that I had a better solution.

When the ALTER TABLE failed I tried an alternate solution that involves bulk SELECTs and INSERTs. I did the following steps:

1.	Create an SQL dump of mytable using mysqldump command
2.	Opened the SQL dump in an editor that supports large files (I used UEStudio.)
3.	I did a string-replace of char(16) to binary(16) (The reason I did a string replace is because UEStudio – when opening large files – would write whatever you typed directly to disk.. so for very large files a single keystroke would lag for about a minute.)
4.	Import the edited SQL file via  “mysql –uuser –p mydb < edited.sql”

This entire process was completed within an hour. Hopefully this post will help someone save a few days of their life.

RE: http://oreilly.com/catalog/9780596003067/
RE: http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

Leave a Reply

Some basic HTML is allowed. Please keep all comments constructive, polite and on-topic. Any spam or offensive comments will be deleted.