0

I have 2 tables:

Order
id: Serial
date: timestamp
item: varchar(255)
user_id: UNSIGNED INT References to users 
Users:
id: UNSIGNED INT
email: varchar(255)
type: ENUM(CUSTOMER,ADMIN)

And I want to fetch for a whole year of 2022 the customer orders:


select 
  orders.id,
  orders.item,
  orders.date,
  customer.name,
from
  orders join users on orders.user_id = users.user_id
where
  users.type = 'CUSTOMER' and date_part('year',orders.date) = 2022

But an alternate approach is to use subqueries:

select 
  orders1.id,
  orders1.item,
  orders1.date,
  customer.name,
from
  (select * from orders where date_part('year',order.date) = 2022) as orders1
  join (select * from users where users.type = 'CUSTOMER') as customers 
  on orders1.user_id = customers.user_id

But when is recomended the first approach and when is recomended the second one? What I want to achieve is faster data retrieval from the database upon the internet. Does table size is a parameter that I need to consider as well when I write select queries?

Dimitrios Desyllas
  • 873
  • 2
  • 14
  • 30

1 Answers1

0

Subqueries can help performance when they lead to a different query plan. (Like, if you aggregate in a subquery before you join ...)
That's not the case for the query you show. The second form is just a noisy, convoluted equivalent of the first - which can be optimized a bit:

SELECT o.id, o.item, o.date, c.name
FROM   orders o
JOIN   users  u USING (user_id)
WHERE  u.type = 'CUSTOMER'
AND    o.date >= '2022-01-01'
AND    o.date <  '2023-01-01';

Like a_horse commented, the rewritten filter expression is potentially much faster if it can be use an index on (date) (unlike your original, which is not "sargable"). It will be even faster without involving an index.

The literal '2022-01-01' happens to be a valid date or timestamp alike. The time part is assumed to be 00:00 if missing. It even works for timestamptz (but introduces a corner-case dependency on the time zone setting of the current session.) I did not add an explicit type cast so it is coerced to the matching type with an assignment cast.

Aside

Your table definition of "Order" gives me the twitches.

Mixed case spelling is ambiguous. Did you double-quote the name to preserve that unfortunate name? ORDER is a reserved word, don't use it as identifier. (Seems to be a typo, and the table name is orders after all.)

id: Serial - consider an IDENTITY column. See:

date: timestamp - don't name a timestamp column "date". That's just misleading. Best don't use basic type names as identifier at all.

item: varchar(255) - this particular size limit type is typically a misunderstanding in Postgres. See:

user_id: UNSIGNED INT - Postgres integer type is signed (as is smallint and bigint), you would have to install the additional module pguint, which I would not advise. See:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633