Questions tagged [optimization]

In the context of a database, optimisation refers to the process of the query optimiser selecting an efficient physical execution plan.

SQL is a declarative language, so the actual physical operations selected to fulfil a query are not under the direct control of the person writing the query. The database management system will have the option of a number of semantically equivalent strategies to execute the query. Most database management systems have a query optimiser that generates the set (or a subset) of possible execution plans and selects the best one.

A database query plan consists of a tree structure of operations that process data to produce the final result. For any given query, there may be multiple query plans that are semantically equivalent (i.e. will return the same final result) but vary in their structure and performance. A query plan can also be transformed algorithmically into other semantically equivalent query plans.

For example, a predicate (filter) could be applied at various points in a query. It may be quicker to apply it early, reducing the amount of data to be processed by the rest of a complex query. The query optimiser can apply transformations to push the query predicate down toward the leaves of the plan so it is evaluated earlier, which avoids subsequent operations having to process unnecessary data.

Other optimisations possible include generating intermediate results and post-processing them (sometimes known as spool operations), altering the order that tables are joined, and selecting the algorithm used for the join. For example, a hash join is efficient for matching a larger table against a smaller one but has a startup overhead in generating the hash table. A nested loos join is efficient for a query that processes a small amount of data as it has little startup overhead. A merge join processes data sequentially, so it is good for processing multiple large data sets if they can be sorted on the key used for the join.

Most DBMS platforms use a technique called 'cost based query optimisation' that works by joining a large number of query plans and using statistics about data volumes and distributions of key values to estimate a cost metric for the candidate plans. The cheapest plan is then selected and executed. Cost based optimisation is heuristic, and for various reasons a cost based optimiser can produce suboptimal plans. This can necessitate tuning work on the database if the suboptimal plans cause performance issues.

Other optimisation strategies include 'rule based', where a set of transformation rules are applied to query plans where specific patterns are found in the query plan. PostgreSQL has an unusual optimiser based on a genetic algorithm that evolves optimal plans by mutating query plans over time and retaining successful mutations.

2001 questions
148
votes
4 answers

Optimizing queries on a range of timestamps (two columns)

I use PostgreSQL 9.1 on Ubuntu 12.04. I need to select records inside a range of time: my table time_limits has two timestamp fields and one integer property. There are additional columns in my actual table that are not involved with this…
Stephane Rolland
  • 8,911
  • 11
  • 33
  • 40
131
votes
7 answers

How to determine if an Index is required or necessary

I've been running an auto-index tool on our MS SQL database (I modified a script originating from Microsoft that looks at the index statistics tables - Automated Auto Indexing). From the stats, I now have a list of recommendations for indexes that…
misterjaytee
  • 1,413
  • 3
  • 12
  • 8
100
votes
4 answers

Are views harmful for performance in PostgreSQL?

The following is an excerpt from a book about db design (Beginning Database Design ISBN: 0-7645-7490-6): The danger with using views is filtering a query against a view, expecting to read a very small portion of a very large table. Any filtering…
ARX
  • 1,509
  • 3
  • 14
  • 15
74
votes
3 answers

Optimizing a Postgres query with a large IN

This query gets a list of posts created by people you follow. You can follow an unlimited number of people, but most people follow < 1000 others. With this style of query, the obvious optimization would be to cache the "Post" ids, but unfortunately…
Garrett
  • 1,083
  • 1
  • 11
  • 16
65
votes
3 answers

When to use views in MySQL?

When creating tables from multiple joins for use in analysis, when is it preferred to use views versus creating a new table? One reason that I would prefer to use views is that the database schema has been developed by our administrator from within…
David LeBauer
  • 3,162
  • 8
  • 32
  • 34
48
votes
2 answers

How to partition existing table in postgres?

I would like to partition a table with 1M+ rows by date range. How is this commonly done without requiring much downtime or risking losing data? Here are the strategies I am considering, but open to suggestions: The existing table is the master and…
Evan Appleby
  • 1,203
  • 1
  • 10
  • 16
40
votes
1 answer

USING construct in JOIN clause can introduce optimization barriers in certain cases?

It was brought to my attention that the USING construct (instead of ON) in the FROM clause of SELECT queries might introduce optimization barriers in certain cases. I mean this key word: SELECT * FROM a JOIN b USING (a_id) Just in more complex…
Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
39
votes
5 answers

Logical operators OR AND in condition and order of conditions in WHERE

Let's examine these two statements: IF (CONDITION 1) OR (CONDITION 2) ... IF (CONDITION 3) AND (CONDITION 4) ... If CONDITION 1 is TRUE, will CONDITION 2 be checked? If CONDITION 3 is FALSE, will CONDITION 4 be checked? What about conditions on…
garik
  • 6,782
  • 10
  • 44
  • 56
39
votes
3 answers

Unexpected scans during delete operation using WHERE IN

I've got a query like the following: DELETE FROM tblFEStatsBrowsers WHERE BrowserID NOT IN ( SELECT DISTINCT BrowserID FROM tblFEStatsPaperHits WITH (NOLOCK) WHERE BrowserID IS NOT NULL ) tblFEStatsBrowsers has got 553 rows. tblFEStatsPaperHits…
Mark S. Rasmussen
  • 1,455
  • 1
  • 14
  • 20
36
votes
3 answers

How to speed up select distinct?

I have a simple select distinct on some time series data: SELECT DISTINCT user_id FROM events WHERE project_id = 6 AND time > '2015-01-11 8:00:00' AND time < '2015-02-10 8:00:00'; And it takes 112 seconds. Here's the query…
36
votes
1 answer

EXPLAIN ANALYZE shows no details for queries inside a plpgsql function

I am using a PL/pgSQL function in PostgreSQL 9.3 with several complex queries inside: create function f1() returns integer as $$ declare event tablename%ROWTYPE; .... .... begin FOR event IN SELECT * FROM tablename WHERE condition LOOP …
skumar
  • 371
  • 1
  • 3
  • 7
34
votes
1 answer

Warning about memory "Excessive Grant" in the query plan - how to find out what is causing it?

I am running a query that is giving the warning about a memory Excessive Grant. There are too many tables and indexes used, including a complex view, and therefore it is difficult to add all of the definitions here. I am trying to find what might me…
32
votes
5 answers

How to optimize very slow SELECT with LEFT JOINs over big tables

I was googling, self-educating & looking for solution for hours but with no luck. I found a few similar questions here but not this case. My tables: persons (~10M rows) attributes (location, age,...) links (M:M) between persons and attributes (~40M…
Martin
  • 431
  • 1
  • 5
  • 11
29
votes
2 answers

Slow index scans in large table

Update 2020-08-04: Since this answer is apparently still being viewed regularly I wanted to provide an update on the situation. We're currently using PG 11 with table partitioning on timestamp and are easily handling a few billion rows in the…
29
votes
5 answers

Why does my SELECT DISTINCT TOP N query scan the entire table?

I've run into a few SELECT DISTINCT TOP N queries which appear to be poorly optimized by the SQL Server query optimizer. Let's start by considering a trivial example: a million row table with two alternating values. I'll use the GetNums function to…
Joe Obbish
  • 32,976
  • 4
  • 74
  • 153
1
2 3
99 100