I have an "insert only" table that store user information
- id
bigint-> Primary key, not null - user_id
bigint-> foreign key, not null - first_name
character varyingnull - middle_name
character varyingnull - last_name
character varyingnot null - email
character varyingnot null - birth_date
datenot null - phone_number
character varyingnot null - mobile_number
character varyingnot null - street_name
character varyingnull
and a user table (insert + update) (most of the times, I don't query this table)
idprimary key, not nullcreated_atdate, not nullmodified_atdate, null
I have an index on birth_date column in user_info table
and my query looks like this
SELECT * FROM user_info WHERE birth_date = @p1 and (first_name = @p2 or email = @p3) and (last_name = @p4 or email =@p3);
Most of the times, the query latency on production environment is 7-26ms but sometimes, and it's happening frequently actually, the latency jumps up to 900ms so I see in logs 3-4 queries with latency 180, 200, 700, 900ms and then goes back to 7-26ms.
- The table
user_infois insert-only, so no updates nor deletes - I am using PostgreSQL 11
- The table
user_infocontains 5 million records, data distribution looks good on production, most ofbirth_date(s)are between 200-1000 records, but only one particular birth date has 110K record (Would this be the issue?) - server configurations are the default as specified in (Azure Postgres single server)
- Server specs are (4 vCPU, 20 GB memory, Gen 5)
- Maximum concurrent connection to production db is 20 per second
- Best case scenario is only one SQL query executed (the one attached in this question)
- Worst case scenario is First: SQL Query attached in this question, Second: Insert query to
usertable, Third Insert query touser_infotable - Another scenario is First: SQL Query attached in this question, Second: Insert query to
user_infotable, Third: update modified time in theusertable - querying the
pg_stat_user_tablestrying to understand how many sequential scan vs sequential scan yielded the following result:
- executing two queries with
explain analyzefrom my local machine usingpgAdminagainst azure postgres server, one with the birth date that contains the most count and one with another birth date, the rest of the values are completely random yieleded the following results
My problem is:
- I need the latency to not exceed certain threshold
My questions are:
- What the reason behind the latency jump on
user_infotable? - Would breaking the relationship between the two tables solve the issue? Perhaps when doing updates on
usertableuser_infotable needs to update the foreign key column, because update is actually insert and delete which causes deadrows in the table? - Is it the data-distribution? And what can I do to improve the latency?
UPDATED:
I enabled auto_explain on production, I used a condition to log only queries with latency > 40ms. And ran automated script that queries existing data, 4 of them is with birth date that has 100K row. And as expected I saw only logs for that particular birth date value:
2022-09-19 {TIME REDACTED}-LOG: duration: 42.421 ms plan:
Query Text:
select * from unf.user_info unf
where (
unf.birth_date = $1
and ( unf.first_name = $2 or unf.email = $3 )
and ( unf.last_name = $4 or unf.email = $3 ))
Index Scan using idx_b_date on unf.user_info unf (cost=0.43..76483.96 rows=1 width=112) (actual time=0.044..42.411 rows=2 loops=1)
Output: id, user_id, birth_date, first_name, last_name, email, phone_number, mobile_number, street_name, created_date
Index Cond: (unf.birth_date = '{REDACTED} 00:00:00'::timestamp without time zone)
Filter: ((((unf.last_name)::text = '{REDACTED}'::text) OR ((unf.email)::text = '{REDACTED}'::text)) AND (((unf.first_name)::text = '{REDACTED}'::text) OR ((unf.email)::text = '{REDACTED}'::text)))
Rows Removed by Filter: 119228
Buffers: shared hit=11025
2022-09-19 {TIME REDACTED}--LOG: duration: 41.370 ms plan:
Query Text:
select * from unf.user_info unf
where (
unf.birth_date = $1
and ( unf.first_name = $2 or unf.email = $3 )
and ( unf.last_name = $4 or unf.email = $3 ))
Index Scan using idx_b_date on unf.user_info unf (cost=0.43..76483.96 rows=1 width=112) (actual time=0.087..41.359 rows=2 loops=1)
Output: id, user_id, birth_date, first_name, last_name, email, phone_number, mobile_number, street_name, created_date
Index Cond: (unf.birth_date = '{REDACTED} 00:00:00'::timestamp without time zone)
Filter: ((((unf.last_name)::text = '{REDACTED}'::text) OR ((unf.email)::text = '{REDACTED}'::text)) AND (((unf.first_name)::text = '{REDACTED}'::text) OR ((unf.email)::text = '{REDACTED}'::text)))
Rows Removed by Filter: 119228
Buffers: shared hit=11025
2022-09-19 {TIME REDACTED}--LOG: duration: 41.709 ms plan:
Query Text:
select * from unf.user_info unf
where (
unf.birth_date = $1
and ( unf.first_name = $2 or unf.email = $3 )
and ( unf.last_name = $4 or unf.email = $3 ))
Index Scan using idx_b_date on unf.user_info unf (cost=0.43..76483.96 rows=1 width=112) (actual time=0.079..41.682 rows=2 loops=1)
Output: id, user_id, birth_date, first_name, last_name, email, phone_number, mobile_number, street_name, created_date
Index Cond: (unf.birth_date = '{REDACTED} 00:00:00'::timestamp without time zone)
Filter: ((((unf.last_name)::text = '{REDACTED}'::text) OR ((unf.email)::text = '{REDACTED}'::text)) AND (((unf.first_name)::text = '{REDACTED}'::text) OR ((unf.email)::text = '{REDACTED}'::text)))
Rows Removed by Filter: 119228
Buffers: shared hit=11025
2022-09-19 {TIME REDACTED}--LOG: duration: 40.581 ms plan:
Query Text:
select * from unf.user_info unf
where (
unf.birth_date = $1
and ( unf.first_name = $2 or unf.email = $3 )
and ( unf.last_name = $4 or unf.email = $3 ))
Index Scan using idx_b_date on unf.user_info unf (cost=0.43..76483.96 rows=1 width=112) (actual time=0.057..40.568 rows=2 loops=1)
Output: id, user_id, birth_date, first_name, last_name, email, phone_number, mobile_number, street_name, created_date
Index Cond: (unf.birth_date = '{REDACTED} 00:00:00'::timestamp without time zone)
Filter: ((((unf.last_name)::text = '{REDACTED}'::text) OR ((unf.email)::text = '{REDACTED}'::text)) AND (((unf.first_name)::text = 'n'::text) OR ((unf.email)::text = '{REDACTED}'::text)))
Rows Removed by Filter: 119228
Buffers: shared hit=11025
