-1

I'm using Databricks and I have a table with a list of event from various years. I want to return the event most recent to today's date from each year. For example,

Today's date is 6th May and my table is thus:

Year (int) Date (date) Event (str)
2021 2021-08-04 Ate apple
2021 2021-04-16 Flew plane
2020 2020-10-11 Swam 100 miles
2020 2020-03-07 Did backflip
2020 2020-01-01 Tidied room
2019 2019-09-30 Found 10 pence
2018 2018-02-22 Lost 10 pence

So I would want to return:

On this day in history your most recent achievements were:

Year Date Event
2021 2021-04-16 Flew plane
2020 2020-03-07 Did backflip
2018 2018-02-22 Lost 10 pence

Is there a neat way of doing this?...and by neat I mean, without creating extra columns or tables i.e. by comparing CURRENT_DATE to my Date field.

ben_al
  • 1
  • 1

1 Answers1

0

One way of achieving what you want might be to use the window function row_number() to order matching event dates within each year (window), then pick only the most recent date each year.

-- Use a common table expression to pre-select and order dates we want
with past_events as ( 
   select 
      Date, Event, 
      -- within each year, order by the day since the beginning of that year
      -- in the descending order, and assign row numbers accordingly
      row_number() over (partition by year(Date) order by dayofyear(Date) desc) rn
   from some_table
   -- only pick days from each year in the past that's no later than today
   where dayofyear(Date) <= dayofyear(current_date())
   and   year(Date) < year(current_date())
) 
select year(Date), Date, Event
from past_events
where rn = 1 -- now pick only first (most recent) date of each year
order by Date desc

This is "paper code" as I have no way of testing it with Databricks, but it should give you the basic idea.


You may need to quote column names if Date or Event happen to be reserved words in Databricks.

mustaccio
  • 28,207
  • 24
  • 60
  • 76