High insert rate on MySQL

Question

I have a project at the moment that is close to being finished. I have been asked (at the very end) to log a specific event. The event will occur very often (easily a few times every 5 seconds). Each event creates a record in the log table. The database must be MySQL.

My database experience consists of writing queries, not designing databases. And I'm new to MySQL.

The log table is:

create table LogEvent
(
    id bigint auto_increment primary key,
    area smallint unsigned not null,
    area_zone tinyint unsigned not null,
    value1 float not null,
    value2 float not null,
    reason char(2) not null
);

The id is just to have a primary key. area is from 1 to 100 at the moment, but it can grow. We can consider it as a category. area_zone is like a subcategory of area. Other values are insignificant for my problem.

There will be a lot of inserts but no delete/update. The select queries will always be like select * from LogEvent where area=1 and area_zone=1. All select statements will include area and area_zone. Select queries won't happen often, maybe a few times a week.

It will grow over 1 million records easily. And eventually, millions per area/area_zone.

So, my questions are:

1) I'll need partitioning for sure. But can I do it per area/area_zone, i.e. partitioning on two values? Or should I use a mediumint and combine the two values in my application (bitwise operation)? That way, it would be single-value partitioning.

2) MyISAM or InnoDB?

3) Should I add an index somewhere? I'm not good at this.

4) Hard disk space won't be a problem, but what about RAM? How much should I have? MS SQL Server uses 28 bytes to store a record, so it should be similar in MySQL.

It's the only table in this database.

I hope I have been clear enough to help you help me. Thanks.

score 2 · Answer 1 · edited Apr 13 '17 at 12:42

Partition Question

You could partition by area and area_zone. As a result your CREATE TABLE statement would look something like this.

create table LogEvent
(
    id bigint auto_increment,
    area smallint unsigned not null,
    area_zone tinyint unsigned not null,
    value1 float not null,
    value2 float not null,
    reason char(2) not null,
    primary key(id,area,area_zone)
)
PARTITION BY RANGE COLUMNS (area,area_zone)
(
  PARTITION p01 VALUES LESS THAN (10,20),
  PARTITION p02 VALUES LESS THAN (20,30),
  PARTITION p03 VALUES LESS THAN (30,40),
  -- catch-all, so that rows with area above 40 still have a partition
  PARTITION p04 VALUES LESS THAN (MAXVALUE,MAXVALUE)
);

You will notice that I modified your PRIMARY KEY definition, because a PRIMARY KEY must include all columns used in the table's partitioning function. You will also need to learn how MySQL decides which partition a row is stored in. Here is a link that explains how and where rows are inserted based on the way you define your partitioning scheme.

http://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html
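One way to see where rows actually end up is to ask MySQL directly. The following is a minimal sketch, assuming the partitioned LogEvent table defined above on MySQL 5.5 or 5.6. The sample rows are made up for illustration, and TABLE_ROWS is only an estimate for InnoDB tables.

```sql
-- Hypothetical sample rows. RANGE COLUMNS compares (area, area_zone)
-- as a tuple, left to right, against each partition's upper bound.
INSERT INTO LogEvent (area, area_zone, value1, value2, reason)
VALUES (5, 50, 0, 0, 'aa'),   -- (5,50)  <  (10,20), so p01
       (10, 25, 0, 0, 'aa'),  -- (10,25) >= (10,20), so p02
       (35, 1, 0, 0, 'aa');   -- (35,1)  >= (30,40), so p04

-- Row counts per partition.
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'LogEvent';

-- Which partitions a query has to touch (see the "partitions" column).
-- EXPLAIN PARTITIONS is the 5.5/5.6 syntax; from 5.7 on, plain EXPLAIN
-- shows the same column.
EXPLAIN PARTITIONS
SELECT * FROM LogEvent WHERE area = 1 AND area_zone = 1;
```

Note that the second sample row lands in p02 even though its area is 10, because the comparison only moves on to area_zone when the area values are equal.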

The data types you choose for area and area_zone depend on how many distinct values each will have. You said you are expecting millions of rows; the question is whether there will also be millions of different areas and area_zones. The values cannot go past the limits of the chosen types: at most 65535 for smallint unsigned and 255 for tinyint unsigned.

Also keep in mind that some partitions will be much larger than others. If you partitioned by ID, the partitions would eventually be about the same size. Using area and area_zone, some may be much larger than others.

InnoDB versus MyISAM

Rolando answers your question about MyISAM versus InnoDB quite nicely in this post:

Choosing MyISAM over InnoDB for these project requirements; and long term options

What columns to Index

For question 3, knowing where to add an index depends on the cardinality/selectivity of the column you wish to index. If a column is very dense, in other words it has a large number of duplicate values, then it makes less sense to create an index on that column alone. In your situation, you have two columns, area and area_zone. Area alone may not be selective enough, as there may be many area_zones per area. I think you could create a composite index on both columns to obtain a better cardinality and thus a more useful index. Keep in mind that adding an index has a cost in performance and storage: MySQL must store the indexed values in the index as well as in the table, and must update the index on every insert. Here is a MySQL page on numeric data types to give you an idea of the amount of space each type takes.

http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html

CREATE INDEX idx_area_area_zone ON LogEvent(area, area_zone);

Here is a great article on composite indexes that you may find useful:

http://www.mysqlperformanceblog.com/2009/09/19/multi-column-indexes-vs-index-merge/
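To check whether the composite index is selective and actually used, something along these lines may help. It is a sketch that assumes the idx_area_area_zone index above has been created and the table already holds some data.

```sql
-- Lists each index column with its estimated cardinality
-- (number of distinct values for that column prefix).
SHOW INDEX FROM LogEvent;

-- The "key" column of the output should name idx_area_area_zone
-- if the optimizer decides to use the index for this query.
EXPLAIN
SELECT * FROM LogEvent WHERE area = 1 AND area_zone = 1;
```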

Memory Attribution Question

To answer your RAM question, you could start with a relatively small amount, such as 2 GB, and a small InnoDB buffer pool (for example, 512 MB), since you say that SELECTs won't happen often. Once your server is in production, you can calculate how much of the buffer pool MySQL is actually using.

To help you determine that value, I am quoting Rolando from one of his previous posts:

"What you need to calculate is how much of the InnoDB Buffer Pool is loaded at any given moment on the current DB Server." He provides formulas to help identify what percentage of your InnoDB buffer pool is actually in use.

What to set innodb_buffer_pool and why..?
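As a rough starting point, the buffer pool counters can be read straight from the server. This is only a sketch of one common way to estimate usage, not necessarily the formulas from the linked post.

```sql
-- Configured size of the buffer pool, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Page counters for the buffer pool. A rough "fraction in use" is
--   Innodb_buffer_pool_pages_data / Innodb_buffer_pool_pages_total
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%';
```

If the fraction stays well below 1 after the server has been running under normal load for a while, the buffer pool is probably larger than this workload needs.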

If you choose MyISAM for your LogEvent table, you would give that memory to other variables instead, such as key_buffer_size and join_buffer_size. As far as memory usage is concerned, the ratio of interest is your key cache hit ratio:

1 - (Key_reads/Key_read_requests)

which should be as close as possible to 100%.
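The two counters in that ratio are server status variables, so they can be read and plugged into the formula. The numbers in the second statement are hypothetical, chosen only to show the arithmetic.

```sql
-- Returns Key_read_requests (reads served via the key cache)
-- and Key_reads (reads that had to go to disk).
SHOW GLOBAL STATUS LIKE 'Key_read%';

-- Hypothetical values: 1,200 disk reads out of 1,000,000 requests
-- gives a hit ratio of about 0.9988, i.e. 99.88%.
SELECT 1 - (1200 / 1000000) AS key_cache_hit_ratio;
```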

onlineapplab.com · Answer 2 · 2012-12-04T20:13:53.087

A few times every five seconds works out to roughly 1 write per second, which doesn't sound like you need anything special to handle it.

I'm not sure you really need partitioning. Make a test database, load it with the maximum amount of data you expect, and see what advantage partitioning gives you.
If you are using the table mainly for storing data, then definitely MyISAM, unless you need transactions.
Index (area, area_zone), as those columns will be used in your selects.
You can cache indexes using key_buffer_size; in your case the indexed columns take 3 bytes per record.

You will need to run some tests, as performance depends on your server hardware configuration, and there is no single "right" answer that suits everyone.
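One way to fill a test table quickly is to seed a single row and then let the table double itself. This is a sketch that assumes the LogEvent table from the question; the value ranges (area 1 to 100, area_zone 1 to 10) and the placeholder values are assumptions for illustration.

```sql
-- Seed one row (placeholder values).
INSERT INTO LogEvent (area, area_zone, value1, value2, reason)
VALUES (1, 1, 0, 0, 'aa');

-- Each run of this statement doubles the row count;
-- about 20 runs give roughly one million rows (2^20).
INSERT INTO LogEvent (area, area_zone, value1, value2, reason)
SELECT FLOOR(1 + RAND() * 100),  -- area in 1..100
       FLOOR(1 + RAND() * 10),   -- area_zone in 1..10 (assumed range)
       RAND(), RAND(), 'aa'
FROM LogEvent;

-- Then time the query you care about, with and without partitioning.
SELECT * FROM LogEvent WHERE area = 1 AND area_zone = 1;
```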

score 0 · Answer 3 · answered Dec 07 '12 at 23:51

INDEX(area, area_zone) is all you need for your SELECT. PARTITIONing would be of no benefit (at least for your one INSERT and one SELECT).

RAM size is mostly irrelevant; this is a low traffic database (about 1 query per second). 100 qps would get into questions about disk activity. 10K/sec gets really interesting.

The SELECT will be slow, regardless. After all, it needs to fetch and deliver a million rows.

primary key(id,area,area_zone) would get in your way, even with PARTITIONing. Don't do it.

PARTITIONing by id is useless, since you don't filter by id.
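Put together, the points above amount to a plain, unpartitioned table with one secondary index. A sketch, assuming InnoDB; the engine choice and the index name are assumptions for illustration:

```sql
CREATE TABLE LogEvent
(
    id bigint auto_increment primary key,
    area smallint unsigned not null,
    area_zone tinyint unsigned not null,
    value1 float not null,
    value2 float not null,
    reason char(2) not null,
    -- the only secondary index the SELECT needs
    INDEX idx_area_area_zone (area, area_zone)
) ENGINE = InnoDB;
```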

MyISAM:

The data will take 18 or 22 bytes per row, depending on the CHARACTER SET.

MyISAM does not care whether you have a PRIMARY KEY; you could drop the id and the PK.

INDEX(area, area_zone) will weigh in at about 15 bytes per row -- 3 bytes for the fields, 6 bytes for the pointer to the data, and some BTree overhead. Set key_buffer_size to about 20% of available RAM. (But there is no need for it to be bigger than the disk footprint of the .MYI file.)

InnoDB:

Plan on 100 bytes/row for InnoDB, including index(es).

InnoDB really likes to have an explicit PRIMARY KEY, as you have.

If you use InnoDB, set innodb_buffer_pool_size to about 70% of RAM. (But there is no need for it to be bigger than the disk footprint of the table.)
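To compare these estimates with what a table actually occupies, the sizes can be read from INFORMATION_SCHEMA once some data is loaded. This is a sketch; TABLE_ROWS is only an estimate for InnoDB, so the per-row figure is approximate.

```sql
SELECT ENGINE,
       TABLE_ROWS,
       DATA_LENGTH,    -- bytes used by the rows
       INDEX_LENGTH,   -- bytes used by the indexes
       (DATA_LENGTH + INDEX_LENGTH) / TABLE_ROWS AS bytes_per_row
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'LogEvent';
```

The sum of DATA_LENGTH and INDEX_LENGTH is also a reasonable upper bound when sizing innodb_buffer_pool_size or key_buffer_size for this single table.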
