Conflicting solutions
You would need a multicolumn index like @jjanes provided. While being at it, you could make (item_id, event_time) the primary key to provide the index automatically.
But that's conflicting with write performance like @Michael explained: You double the cost for 50K of items ... updated every 1 to 5 minutes to make occasional SELECT queries cheaper. That's about 1 mio. rows per hour.
Partitioning
If you don't have more conflicting requirements, the compromise could be partitioning where the current partition does not have an index, yet. This way you get top write performance and (almost) top read performance.
The parent table could be movement_history, the current partition movement_history_current. No indexes, only one constraint to allow constraint exclusion. Could be daily partitions per default. But the time intervals can be anything, does not have to be regular, even. We can work with that and start a new partition whenever we need to.
When you need to include current data in said query, do the following:
To start a new partition, in one transaction:
- Rename the current partition by appending sth. to the name, like
movement_history_20150110_20150115 (or more specific) and adjust the constraint on event_time.
- Create a new partition with the ever same name
movement_history_current and a constraint on event_time that does not overlap with the last one and with open end.
- Depending on your access patterns you may have to deal with concurrent write access ...
Add a PK on (item_id, event_time) to the hew historic partition. Not in the same transaction. Creating the index in one piece is much cheaper than incrementally adding to it.
2a. To integrate advice for your second query below:
REFRESH MATERIALIZED VIEW mv_last_movement
Run query. Actually, you can run the query any time. If it includes the current partition or any partition that doesn't have the index yet, it's slower for that partition.
Archive the oldest partitions from time to time. Just backup and delete the table. Does not interfere with ongoing operation much, that's the beauty of partitioning.
Read the manual first. There are caveats for inheritance and partitioning.
Your second query
The second query you added in an edit is the far bigger issue for performance. I am talking orders of magnitude:
select distinct on (item_id) * from movement_history
order by item_id, event_time desc;
Once you start inserting 1 mio. rows per hour, performance for this query will quickly deteriorate. You are dealing with many, many rows per item, DISTINCT ON is only good for few rows per item. Detailed explanation for DISTINCT ON and faster alternatives:
I still suggest partitioning like in my first answer. But enforce a new partition in reasonable intervals, so the current partition does not get too big.
In addition, create a "materilaized view" tracking the latest state for each item. It's not a standard MATERIALIZED VIEW because the defining query has a self-reference. I name it mv_last_movement and it has the same row type as movement_history.
Refresh whenever a new partition starts (see above).
Assuming the existence of an item table:
CREATE TABLE item (
item_id varchar(8) PRIMARY KEY -- should really be a serial
-- more columns?
);
If you don't have one, create it. Or use the alternative recursive CTE technique outlined in the answer linked above.
Init mv_last_movement once:
CREATE TABLE mv_last_movement AS
SELECT m.*
FROM item i
, LATERAL (
SELECT *
FROM movement_history_current -- current partition
WHERE item_id = i.item_id -- lateral reference
ORDER BY event_time DESC
LIMIT 1
) m;
ALTER TABLE mv_last_movement ADD PRIMARY KEY (item_id);
Then, to refresh (in a single transaction!):
BEGIN;
CREATE TABLE mv_last_movement2 AS
SELECT m.*
FROM item i
, LATERAL (
( -- parentheses required
SELECT *
FROM movement_history_current -- current partition
WHERE item_id = i.item_id -- lateral reference
ORDER BY event_time DESC
LIMIT 1 -- applied to this SELECT, not strictly needed but cheaper
)
UNION ALL -- if not found, fall back to latest previous state
SELECT *
FROM mv_last_movement -- your materialized view
WHERE item_id = i.item_id -- lateral reference
LIMIT 1 -- applied to whole UNION query
) m;
DROP TABLE mv_last_movement;
ALTER TABLE mv_last_movement2 RENAME mv_last_movement;
ALTER TABLE mv_last_movement ADD PRIMARY KEY (item_id);
COMMIT;
Or similar. More details here:
The very same query from above (bold emphasis) also replaces your original query cited at the top.
This way you don't have to inspect the whole history for items without current rows, which would be extremely expensive.
Why UNION ALL ... LIMIT 1?
More advice
varchar for PK / FK columns is inefficient, especially for big tables with 1 mio rows per hour. Use integer keys instead.
Always use ISO format for date and timestamp literals or your queries depend on locale settings: '2015-15-01' instead of '1-15-2015'.
Add NOT NULL constraints where the column can't be NULL.
Optimize your table layout to avoid space lost to padding