Importing times in MySQL

One of the ways to import data into MySQL is using the LOAD DATA INFILE. It is a faster method than recovering from a dump, as it’s raw data instead of SQL sentences.

The import time depends on the table engine, for example, MyISAM can be 40 times faster than Innodb. Let’s benchmark this:

Preparation

I’m gonna make some benchmarking using MySQL 5.1.36 (64 bits MacOS X). I’ll need a big table, so I’ll take City from the World Database and create a huge table called “city_huge”:

CREATE TABLE city_huge LIKE CITY;

INSERT INTO city_huge 
    SELECT NULL, name, CountryCode, District, Population FROM city;
# Run this sentence 100 times,
# so city_huge table will be 100 times bigger than city.
# Tip: use a script, temporary table, stored procedure...
# or tell your monkey to do so.

SELECT COUNT(*) FROM city_huge;
#   +----------+
#   | COUNT(*) |
#   +----------+
#   |   407900 | 
#   +----------+

# Make a table data backup:
SELECT * FROM city_huge INTO OUTFILE 'city_huge.bak';

# Truncate table, so we'll start with an empty table.
TRUNCATE TABLE city_huge;

Direct Import

Let’s import the backup into the city_huge table, using MyISAM, InnoDB and MEMORY:

LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (5.85 sec)
# So, that was using MyISAM.

# Let's empty the table and change the engine to InnoDB:
TRUNCATE TABLE city_huge;
ALTER TABLE city_huge ENGINE = InnoDB;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (3 min 59.53 sec)

# With Memory:
TRUNCATE TABLE city_huge;
SET @@max_heap_size= 128 * 1024 * 1024;
ALTER TABLE city_huge ENGINE = MEMORY;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (2.18 sec)
MyISAM 0:5.85
InnoDB 3:59.53
MEMORY 0:2.18

Wow, MyISAM is almost 40 times faster. And MEMORY is even faster.

Alter Table

Ok, InnoDB is a bit slow, but sometimes you can’t use another storage engine. In those cases, you could import in the other engine and then change the table engine to InnoDB.

That would look like this:

TRUNCATE TABLE city_huge;
ALTER TABLE city_huge ENGINE = MyISAM;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (5.85 sec)
ALTER TABLE city_huge ENGINE = InnoDB;
#   Query OK, ... (4 min 11.24 sec)

Ooops, 4 min 17 sec is more than 3 min 59 sec.

Let’s try Memory:

TRUNCATE TABLE city_huge;
SET @@max_heap_size= 128 * 1024 * 1024;
ALTER TABLE city_huge ENGINE = MEMORY;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (2.18 sec)
ALTER TABLE city_huge ENGINE = InnoDB;
# Query OK, ... (3 min 28.39 sec)

Yes, 3 min 31 sec is faster than 4 min. 30 seconds are around 10% faster.

Disclaimer

This benchmark is done using the default configuration, I’m sure that tuning InnoDB will improve the results. Also, I’m not the most accurate benchmarker, so I encourage you to do your own benchmarks.

This solution is not a silver bullet, MEMORY engine needs lots of memory. And this is NOT always the best approach, InnoDB should be fast by it’s own.

Anyway, its funny to discover those not so obvious behaviours :D.

Read Spanish Comments

2 Respuestas a “Importing times in MySQL”

  • #1 Ryan Thiessen

    Reporting on the default behaviour is all well and good, but is not a good representation of the differences between the engines.

    With adequate sizes for innodb_buffer_pool and innodb_log_file_size, my results on an underpowered workstation are 3.17s for MyISAM vs 6.25s for InnoDB. When I throw in innodb_flush_log_at_trx_commit := 0 then InnoDB load data infile time drops to 5.08 seconds.

    So yes, InnoDB is a bit slower for single threaded performance, but not *nearly* as much as your test indicates. Once you start adding in more concurrency you’ll find that very quickly reverses.

  • #2 golo

    Those are better results than I expected.

    As someone told me, default configuration is good to run mysql in any machine, but the worst to run in production :D.

    Now I’ve got another example to reafirm that sentence.

    Thanks for the tests!

Leave a Reply