For huge tables, the main part of performance is I/O. The smaller the datatypes are, the less I/O is needed.
When laying out a schema, think about
- Use the smallest
...INT that can safely hold the possible values. Most people blindly use the 4-byte INT, even for true/false flags. Some products blindly use the 8-byte BIGINT. BIGINT is not justified 98% of the time.
- Use
UNSIGNED when appropriate (which is most of the time).
- Do not blindly say
VARCHAR(255) for strings.
- There is a 3-way battle between
ENUM, TINYINT UNSIGNED, and VARCHAR; each has advantages and disadvantages. The first two are only 1 byte.
- Indexes take space, too, and have copies of the columns. But that is another large discussion. Indexes are worth having, but don't blindly "index every column".
- Use the appropriate
CHARACTER SET.
- Use
CHAR only for truly fixed-length strings. Such are almost always CHARACTER SET ascii -- think country_code, postal_code, etc.
- Never use
TINYTEXT. (The other sizes of TEXT are useful.)
It is good to get into the habit of using smaller datatypes, even for non-huge tables.
For tables that I write, I "know" what size the ints are / will be. Without PROCEDURE ANALYSE, I simply say SELECT MIN(a), MAX(a) ... for numbers; SELECT MAX(LENGTH(s)) ... for VARCHAR and TEXT, but I add a fudge factor for future strings.