Import times in MySQL

One of the ways to import data into MySQL is LOAD DATA INFILE. It is faster than restoring from a dump, since the file contains raw data instead of SQL statements.
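
To make the difference concrete, here is a minimal sketch of both approaches (the file names are made up for illustration; the terminators shown are actually the defaults):

# Restoring a dump: the server parses and executes INSERT statements one by one.
#   $ mysql world < world_dump.sql

# Raw import: the server reads tab-separated rows straight into the table.
LOAD DATA INFILE '/tmp/city.txt' INTO TABLE city
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n';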

The import time depends on the table engine: MyISAM, for example, can be 40 times faster than InnoDB. Let’s benchmark this:

Preparation

I’m going to do some benchmarking using MySQL 5.1.36 (64-bit Mac OS X). I’ll need a big table, so I’ll take City from the World sample database and create a huge table called «city_huge»:

CREATE TABLE city_huge LIKE city;

INSERT INTO city_huge 
    SELECT NULL, name, CountryCode, District, Population FROM city;
# Run this statement 100 times,
# so the city_huge table ends up 100 times bigger than city.
# Tip: use a script, a temporary table, a stored procedure
# (see the sketch below)... or tell your monkey to do so.
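
# A minimal sketch of the stored procedure option (the procedure
# name is made up; any of the other options works just as well):
DELIMITER //
CREATE PROCEDURE grow_city_huge(IN n INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        INSERT INTO city_huge
            SELECT NULL, name, CountryCode, District, Population FROM city;
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

CALL grow_city_huge(100);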

SELECT COUNT(*) FROM city_huge;
#   +----------+
#   | COUNT(*) |
#   +----------+
#   |   407900 | 
#   +----------+

# Make a table data backup:
SELECT * FROM city_huge INTO OUTFILE 'city_huge.bak';

# Truncate table, so we'll start with an empty table.
TRUNCATE TABLE city_huge;

Direct Import

Let’s import the backup into the city_huge table, using MyISAM, InnoDB and MEMORY:

LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (5.85 sec)
# So, that was using MyISAM.

# Let's empty the table and change the engine to InnoDB:
TRUNCATE TABLE city_huge;
ALTER TABLE city_huge ENGINE = InnoDB;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (3 min 59.53 sec)

# With MEMORY:
TRUNCATE TABLE city_huge;
SET @@max_heap_table_size = 128 * 1024 * 1024;
ALTER TABLE city_huge ENGINE = MEMORY;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (2.18 sec)

Engine   Time
MyISAM   0:05.85
InnoDB   3:59.53
MEMORY   0:02.18

Wow, MyISAM is almost 40 times faster than InnoDB. And MEMORY is even faster.

Alter Table

OK, InnoDB is a bit slow, but sometimes InnoDB is the engine you have to use. In those cases, you could import into a faster engine first and then convert the table to InnoDB.

That would look like this:

TRUNCATE TABLE city_huge;
ALTER TABLE city_huge ENGINE = MyISAM;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (5.85 sec)
ALTER TABLE city_huge ENGINE = InnoDB;
#   Query OK, ... (4 min 11.24 sec)

Oops: 5.85 sec + 4 min 11.24 sec ≈ 4 min 17 sec in total, which is more than the 3 min 59.53 sec of the direct InnoDB import.

Let’s try MEMORY:

TRUNCATE TABLE city_huge;
SET @@max_heap_table_size = 128 * 1024 * 1024;
ALTER TABLE city_huge ENGINE = MEMORY;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
#   Query OK, ... (2.18 sec)
ALTER TABLE city_huge ENGINE = InnoDB;
# Query OK, ... (3 min 28.39 sec)

Yes: 2.18 sec + 3 min 28.39 sec ≈ 3 min 31 sec, which beats the 3 min 59.53 sec of the direct import. That saves about 29 seconds, around a 12% improvement.

Disclaimer

This benchmark was done with the default configuration; I’m sure that tuning InnoDB would improve the results. Also, I’m not the most rigorous benchmarker, so I encourage you to run your own benchmarks.
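
For example, the MySQL manual’s bulk-loading tips suggest disabling some checks around the import. A minimal sketch, untested in this benchmark (and note that the InnoDB buffer pool size can only be changed in the server configuration):

SET unique_checks = 0;
SET foreign_key_checks = 0;
SET autocommit = 0;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;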

This solution is not a silver bullet: the MEMORY engine needs lots of memory, and this is NOT always the best approach, since InnoDB should be fast on its own.

Anyway, it’s fun to discover these not-so-obvious behaviours :D.


I’m tweeting again

Yeah!! Twitter has reactivated my account :D. Good news!!

I’ve written a patch for my WordPress theme and now it reads the caption from Twitter :D. I still have to fix some performance problems, but… here it goes:

require_once 'XML/Feed/Parser.php';

// Download my Twitter Atom feed, reading it in 1 KB chunks.
$feeds = fopen('http://twitter.com/statuses/user_timeline/26378338.atom', 'r');
$source = "";
if ($feeds) {
	while ($s = fread($feeds, 1024)) {
		$source .= $s;
	}
}

// Parse the feed and grab the latest entry.
$feed = new XML_Feed_Parser($source, false, true, true);
$first_entry = $feed->getEntryByOffset(0);
$title = $first_entry->title;

// Strip the leading "capitangolo: " prefix (13 characters) and keep at
// most 30 characters, appending an ellipsis if the title was longer.
$little_title = substr($title, 13, 30);
$little_title .= (strlen($title) > 43) ? '...' : '';
print "<a href=\"http://www.twitter.com/capitangolo\">$little_title</a>";

Updated:

Thanks to http://remysharp.com/2007/05/18/add-twitter-to-your-blog-step-by-step/, my tweets are now loaded with JavaScript, so my blog loads much faster :D.

WarpTalks February 2009


At Warp Networks we are promoting a series of talks, codenamed «warptalks».

They will mainly be technical talks related to free software, given by Warp’s employees.

Although it is an internal initiative and we are still experimenting, we want to share the experience with the rest of the world over the internet.

This Friday, February 20th 2009, we will have the honour of hosting Adeodato Simó, who will give the talk «Perder el miedo a Git en 90 minutos» (“Losing your fear of Git in 90 minutes”) at 18:00 (Europe/Madrid time).

Adeodato Simó is part of the Debian Release Team, has done internships at Google, and develops the Minirok project:

You will be able to follow the talk live on justin.tv.

If the schedule doesn’t suit you, or you are interested in the previous talks, both the videos and the slides will be available on slideshare and vimeo:

Our goal is to hold a day of talks on (approximately) the last Monday of each month; you might want to get breaking news on twitter.

or on our corporate blog:

First Warp Talks


2009 is off to quite an interesting start.

This Monday the first Warp Talks took place: an internal training project between employees at Warp Networks. A new Warp Talk will take place on the last Monday of each month.

koke and I were the first speakers.

I gave an introduction to Subversion, and he gave a talk about 10 things you might not know about MySQL.

Koke brought a camera and recorded our talks while they were being broadcast on justin.tv. The videos are available on vimeo (in Spanish):

http://www.vimeo.com/tag:warptalks