I am using PostgreSQL and I am trying to create a view of data:
forecast_datetime, file_date, thing_id, thing_name, facts...
The forecast_datetime column is a datetime for each 30 minute interval. The issue is that the forecast is for 1 month into the future which causes a lot of duplicate data. (so today, there will be a forecast for 2016-12-25 01:30:00 and then tomorrow there will be a forecast for the same interval, over and over again potentially until the period actually happens.).
The view I want to create is to just take a look at the last forecast which was appended to the table. This is indicated by the file_date column (as the name suggests, I receive a daily forecast file which is appended to the database. This column indicates the date from the filename).
I basically want to filter all rows for each thing_id and select the forecast_datetime columns where there file_date was the maximum date for that forecast. So each thing_id should only have one row per unique forecast_datetime interval which is based on the latest file_date.
CREATE OR REPLACE VIEW schema.view AS
SELECT DISTINCT ON (thing_id, forecast_datetime, file_date)
forecast_datetime,
interval,
time_zone,
file_date,
thing_id,
thing_name,
other_cols...
FROM schema.orig_table
ORDER BY file_date DESC
Would that query do the trick? I can do this in pandas/python easily, but I am newer to SQL and wanted to try a view instead of writing a new physical table.
Here is some sample data after applying the query I wrote above this. Because I included the the file_date in the SELECT DISTINCT query, there is still a repeat row which duplicates each forecast_datetime. This causes issues when I join this data based on the thing_id and forecast_datetime back to the actual results table.
forecast_datetime file_date thing_id
2016-12-12 20:30 2016-12-11 99
2016-12-12 20:30 2016-12-10 99
2016-12-12 20:30 2016-12-09 99
2016-12-12 20:30 2016-12-08 99
I would like to only have the row with the max file_date. In the example above, that would be the first row (2016-12-11). The other rows would be dropped from the view.