0

Table WSHOSHO

SELECT * FROM WSHOSHO;

enter image description here

Table RRP

SELECT * FROM RRP;

enter image description here

Join to illustrate issue

The results of this join can be used to illustrate the issue:

SELECT * FROM WSHOSHO RIGHT JOIN RRP ON (WSHOSHO.DATE = RRP.DATE);

Result:

enter image description here

I.e.:

For any given row of RRP
    If there's a `WSHOSHO` row with the same date, use that
Otherwise, use the most recent WSHOSHO before RRP.DATE

Question

What's a good query to achieve this?

Example data

Here's some T-SQL code to setup the example data used here.

DROP TABLE WSHOSHO;

DROP TABLE RRP;

IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='WSHOSHO') CREATE TABLE WSHOSHO ( DATE varchar(255), WSHOSHO decimal )

IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='RRP') CREATE TABLE RRP ( DATE varchar(255), RRP decimal )

INSERT INTO WSHOSHO VALUES ('2023-01-11', 8100175.0), ('2023-01-18', 8079010.0), ('2023-01-25', 8062665.0), ('2023-02-01', 8024470.0), ('2023-02-08', 8024104.0), ('2023-02-15', 7990467.0), ('2023-02-22', 7986887.0), ('2023-03-01', 7948534.0), ('2023-03-08', 7948335.0), ('2023-03-15', 7940014.0), ('2023-03-22', 7936558.0), ('2023-03-29', 7926131.0), ('2023-04-05', 7877114.0);

INSERT INTO RRP VALUES ('2023-01-03', 2188272000000), ('2023-01-04', 2229542000000), ('2023-01-05', 2242486000000), ('2023-01-06', 2208265000000), ('2023-01-09', 2199121000000), ('2023-01-10', 2192942000000), ('2023-01-11', 2199170000000), ('2023-01-12', 2202989000000), ('2023-01-13', 2179781000000), ('2023-01-17', 2093328000000), ('2023-01-18', 2131678000000), ('2023-01-19', 2110145000000), ('2023-01-20', 2090523000000), ('2023-01-23', 2135499000000), ('2023-01-24', 2048386000000), ('2023-01-25', 2031561000000), ('2023-01-26', 2024069000000), ('2023-01-27', 2003634000000), ('2023-01-30', 2048714000000), ('2023-01-31', 2061572000000), ('2023-02-01', 2038262000000), ('2023-02-02', 2050063000000), ('2023-02-03', 2041217000000), ('2023-02-06', 2072261000000), ('2023-02-07', 2057958000000), ('2023-02-08', 2059604000000), ('2023-02-09', 2058942000000), ('2023-02-10', 2042893000000), ('2023-02-13', 2107775000000), ('2023-02-14', 2076548000000), ('2023-02-15', 2011998000000), ('2023-02-16', 2032457000000), ('2023-02-17', 2059662000000), ('2023-02-21', 2046064000000), ('2023-02-22', 2113849000000), ('2023-02-23', 2147417000000), ('2023-02-24', 2142141000000), ('2023-02-27', 2162435000000), ('2023-02-28', 2188035000000), ('2023-03-01', 2133950000000), ('2023-03-02', 2192355000000), ('2023-03-03', 2186150000000), ('2023-03-06', 2190793000000), ('2023-03-07', 2170195000000), ('2023-03-08', 2193237000000), ('2023-03-09', 2229623000000), ('2023-03-10', 2188375000000), ('2023-03-13', 2126677000000), ('2023-03-14', 2042579000000), ('2023-03-15', 2055823000000), ('2023-03-16', 2066319000000), ('2023-03-17', 2106166000000), ('2023-03-20', 2098393000000), ('2023-03-21', 2194631000000), ('2023-03-22', 2279608000000), ('2023-03-23', 2233956000000), ('2023-03-24', 2218458000000), ('2023-03-27', 2220131000000), ('2023-03-28', 2231749000000), ('2023-03-29', 2264862000000), ('2023-03-30', 2271531000000), ('2023-03-31', 2375171000000), ('2023-04-03', 2221010000000), ('2023-04-04', 2219375000000), ('2023-04-05', 2243011000000), ('2023-04-06', 2173663000000);

The above code is in T-SQL for SQL Server. However, I'm open to any SQL language solutions that would be considered 'idiomatic SQL'.

PowerShell approach

I have an approach that works for data in PowerShell shown below.

However, I'm wondering what the idiomatic SQL might look like.

$wshosho_data = Invoke-RestMethod 'https://fred.stlouisfed.org/graph/fredgraph.csv?id=WSHOSHO' | ConvertFrom-Csv
$rrp_data     = Invoke-RestMethod ('https://markets.newyorkfed.org/api/rp/reverserepo/propositions/search.json?startDate={0}' -f '2022-04-08')
# ----------------------------------------------------------------------
$wshosho_sorted = $wshosho_data             | Sort-Object DATE
$rrp_sorted     = $rrp_data.repo.operations | Sort-Object operationDate
# ----------------------------------------------------------------------
$wshosho_sorted | Select-Object -Last 10 | ft *
$rrp_sorted     | Select-Object -Last 10 | ft operationDate, totalAmtAccepted
# ----------------------------------------------------------------------
$wshosho_dates = $wshosho_sorted | ForEach-Object DATE
$rrp_dates     = $rrp_sorted     | ForEach-Object operationDate
# ----------------------------------------------------------------------
$rrp_earliest     = $rrp_dates     | Sort-Object | Select-Object -First 1
$wshosho_earliest = $wshosho_dates | Sort-Object | Select-Object -First 1

$earliest = $rrp_earliest, $wshosho_earliest | Sort-Object | Select-Object -Last 1

----------------------------------------------------------------------

$dates = $rrp_dates + $wshosho_dates | Sort-Object | Select-Object -Unique | Where-Object { $_ -GE $earliest }

----------------------------------------------------------------------

$table = foreach ($date in $dates) { $rrp_record = $rrp_sorted.Where( { $.operationDate -le $date }, 'Last' )[0] $wshosho_record = $wshosho_sorted.Where( { $.DATE -le $date }, 'Last' )[0]

$rrp_item     = [decimal] $rrp_record.totalAmtAccepted
$wshosho_item = [decimal] $wshosho_record.WSHOSHO

[PSCustomObject]@{
    date    = $date
    wshosho = $wshosho_item
    rrp     = $rrp_item
}

}

enter image description here

White Owl's approach

Here's an approach based on White Owl's answer below.

SELECT
RRP.DATE,
(
    CASE WHEN WSHOSHO.DATE IS NULL
        THEN
        (
            SELECT TOP(1) TBL.WSHOSHO FROM WSHOSHO as TBL WHERE TBL.DATE < RRP.DATE ORDER BY TBL.DATE DESC
        )
        ELSE
            WSHOSHO.WSHOSHO
        END
) AS WSHOSHO,
RRP.RRP
FROM RRP LEFT JOIN WSHOSHO ON (WSHOSHO.DATE = RRP.DATE);

enter image description here

Paul White
  • 94,921
  • 30
  • 437
  • 687
dharmatech
  • 165
  • 7

2 Answers2

3

In SQL Server 2022, an idiomatic solution would use a window function:

SELECT
    R.[DATE],
    WSHOSHO =
        LAST_VALUE(W.WSHOSHO) IGNORE NULLS OVER (
            ORDER BY R.[DATE] ASC
            ROWS UNBOUNDED PRECEDING),
    R.RRP
FROM WSHOSHO AS W
RIGHT JOIN RRP AS R
    ON (R.[DATE] = W.[DATE])
ORDER BY
    R.[DATE] ASC;

db<>fiddle demo

Closely related Q & A: How to get the last not-null value in an ordered column of a huge table?

Paul White
  • 94,921
  • 30
  • 437
  • 687
0

Tasks like that is easier to do in imperative languages (perl, python, etc). Almost all reporting tools have such functionality from a box.

In SQL it is indeed difficult to do, but there are few tricks

if you need to populate just one field, you can do a subselect in a column

select
   rrp.date,
   rrp.rrp,
   (case when wshosho.date is null 
        then (select b.wshosho
              from wshosho b
              where b.date<rrp.date 
              order by b.date desc
              limit 1)
     else wshosho.wshosho end)
from rrp
left join wshosho on (wshosho.date = rrp.date);

SQL Server uses TOP or OFFSET/FETCH.

If your DBMS support temp tables, then

select date, 0 as wshosho into #t from rrp
update #t set wshosho = wshosho from wshosho where #t.date=wshosho.date
update #t set wshosho = (select wshosho from #t b where b.date<#t.date order by desc limit 1)

select * from rrp join #t on rrp.date = #t.date

It can also be done with cursor in a loop (if DBMS has such ability).

But usually, this is done on a client side with sequential fetch.

Paul White
  • 94,921
  • 30
  • 437
  • 687
White Owl
  • 1,029
  • 3
  • 9