Optimizing MySQL 1: Using Indexes

Using Indexing To Speed Up MySQL

So recently I’ve come up against some issues with our website, webCV. The problem? Mysql tables were not correctly optimized, making the site run extremely slow since it was a heavily database intensive site. So after doing some reading in a great MySQL book, I decided to test some concepts, such as indexing, and that will be covered here: How to optimize your MySQL tables using indexes.

The exact workings of indexes will not be discussed here (for a more complete discussion of how they work, see http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html )

What we’ll also be looking at co-incidentally is how MySQL stored procedures can be used to populate tables quickly and easily with data (in this case random data).

Demo of Index Performance

In order to see how much indexes can improve query performance across multiple tables, we shall conduct a simple test involving three tables, t1, t2, and t3 with each table containing a column i1,i2 and i3 respectively and each having 1000 rows that are populated as follows: column i1 has rows with values 1 – 1000, i2 random values in range:  \{ 0, 20,000 \} , and column i3 the same as i2 (but different values).

The following MySQL procedure achieves the above by creating and populating the tables as needed. Simply import it from your Linux command line as you would any sql file:

The pop_random.sql file can be found here, exposing the init_tbl(’tablename’, no_random_entries) procedure call to us.

DELIMITER $
DROP PROCEDURE IF EXISTS init_tbl $
CREATE PROCEDURE init_tbl(IN tbl CHAR(64), IN upper INT)
BEGIN
	DECLARE count INT;
	SET count = 1;
 
	-- rem all first
	SET @s := CONCAT('DELETE FROM ', tbl, ' WHERE 1');
	PREPARE stmt FROM @s;
	EXECUTE stmt;
 
	-- Loop from 1 to upper--
	WHILE count < (upper + 1) DO
		SET @s := CONCAT('INSERT INTO ', tbl, '(i0, i1,i2,i3) VALUES (',count, ', 0, 0, 0)');
		PREPARE stmt FROM @s;
		EXECUTE stmt;
		SET count = count+1;
	END WHILE;
 
	CALL pop_random(tbl);
END $
 
DROP PROCEDURE IF EXISTS pop_random $
CREATE PROCEDURE pop_random(IN tbl CHAR(64))
BEGIN
	SET @r := 0;
	SET @s := CONCAT('UPDATE ', tbl, ' SET ', tbl,'.i1= (@r := @r + 1) order by rand()');
	PREPARE stmt FROM @s;
	EXECUTE stmt;
 
	SET @r := 0;
	SET @s := CONCAT('UPDATE ', tbl, ' SET ', tbl,'.i2= (@r := @r + 1) order by rand()');
	PREPARE stmt FROM @s;
	EXECUTE stmt;
 
	SET @r := 0;
	SET @s := CONCAT('UPDATE ', tbl, ' SET ', tbl,'.i3= (@r := @r + 1) order by rand()');
	PREPARE stmt FROM @s;
	EXECUTE stmt;
 
	SET @msg := CONCAT(tbl, ' successfully created and populated.');
	SELECT @msg;
 
END $
 
DELIMITER ;
 
DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
  `i0` int(11) NOT NULL,
  `i1` int(11) NOT NULL,
  `i2` int(11) NOT NULL,
  `i3` int(11) NOT NULL,
  PRIMARY KEY  (`i0`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
 
DROP TABLE IF EXISTS `t2`;
CREATE TABLE `t2` (
  `i0` int(11) NOT NULL,
  `i1` int(11) NOT NULL,
  `i2` int(11) NOT NULL,
  `i3` int(11) NOT NULL,
  PRIMARY KEY  (`i0`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
 
DROP TABLE IF EXISTS `t3`;
CREATE TABLE `t3` (
  `i0` int(11) NOT NULL,
  `i1` int(11) NOT NULL,
  `i2` int(11) NOT NULL,
  `i3` int(11) NOT NULL,
  PRIMARY KEY  (`i0`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
 
-- _initialize the tables
call init_tbl('t1', 1000);
call init_tbl('t2', 1000);
call init_tbl('t3', 1000);

Running the above script will give us the three tables each with 3 columns i1-i3, each with 1000 lines of random, unique numbers. Stored procedures can make it easy to keep common procedures in SQL grouped together and perform complex operations. For example, if you’d like to populate the table ‘t1′ with more rows, simply run the query:

call init_tbl('t1', 20000);

Getting back to indexing, we’ll focus on the following three columns: t1.i1, t2.i2 and t3.i3. We write the following query to find all combinations of table rows in which the values are equal:

SELECT t1.i1, t2.i2, t3.i3 FROM t1 INNER JOIN t2 INNER JOIN t3 WHERE t1.i1 = t2.i2 AND t2.i2 = t3.i3;

That query took 2 min 5.36 sec to finish – quite a while! By running this query without using indexes, we have to scan ALL rows to determine which rows contain which values. So all combinations must be tried to find the matches on the WHERE clause. The largest possible number of combinations is: 20,000 x 20,000 x 20,000 – a very large number! That’s alot of wasted effort. As the tables grow, the processing time increases even more in the absence of indexes, leading to poor performance. By using indexes, performance can be greatly improved due to the query being processed as follows:

  1. Determine the value of the first row from table t1;
  2. Now, using the index on t2, immediately find the value that matches the value from the 1). (Similarly, an index on t3 will allow the value from t2 to be found immediately).
  3. Go to the next row of t1 and repeat steps 1) – 2), until all rows in t1 have been processed.

So the above procedure still does a full scan of table 1, but indexed lookups on t2 and t3 will cause the query to run many many times faster. Lets create indexes on the tables and run our above query again to evaluate performance improvement:

#Do same for t3.i3
 ALTER TABLE `t2` ADD INDEX ( `i2` ) ;

Now running our select & join query, produces the same output as before, but in 0.22 secs – a speed increase of roughly 568 times. This increase will be even more the larger the tables are.

How To Decide Which Columns To Index

Columns that are used for searching, sorting or grouping are ideal candidates for indexing. Columns that are used for output should be left out. Thus, the columns that appear in the WHERE clause, columns named in join clauses or columns that appear in ORDER BY or GROUP BY clauses are the type of columns you want to index. Be careful not to over-index, since indexes can slow down insert/update operations.
Leave a Comment