3

I'm working on a college assignment about how the query planner uses statistics and makes the most optimal query plan.

I have read the 57.1. Row Estimation Examples article and I know how PostgreSQL calculates the amount of entities taken from the database.

But how does the RDBMS determine which rows must be taken. For example, database calculated that the current query need 1000 entities. But which exactly? How is it determined?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Jazi
  • 131
  • 2

1 Answers1

2

You may have missed that spot in the chapter of the manual you quote:

Note also that since ANALYZE uses random sampling while producing statistics, the results will change slightly after any new ANALYZE.

Emphasis mine.

If you actually mean "how many" rows in your question: the query planner has selectivity estimators for certain operators in the conditions. In combination with the statistics gathered the planner estimates the number of rows for every step.

Details of row estimation can be found the chapter of the manual you already link to in your question.

Estimations can be way off if the data distribution is very uneven within a table. That's why raising the setting for default_statistics_target (and running ANALYZE afterwards) may help in such cases.

Some special operators have special selectivity estimators. Like text search. Details in the source code in src/backend/tsearch/ts_selfuncs.c

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633