I have a Table with a HASH on a Employee_Key, it also Hold a Date_From and Date_To.
To build the facts I need to change it on a by_date column.
But what is the best practice in Azure DataWarehousing with the less DataMoving as possible.
I've tried with a INNER JOIN to a Dates table that was in ROUND_ROBIN, or tried to put all the dates in a View (with a trick that only sys.objects is called), .. All working but ... not sure what is the better way on performance due to the Data-Movements between the 60 nodes ...
Any suggestions ?
Here some parts of the tables ...
CREATE TABLE [edw_hr_temp].[DIM_EMPLOYEE]
(
[EMP_UID] [BIGINT] NULL,
[EMP_BK] [CHAR](8) NULL,
[EMP_DATE_FROM] [DATE] NULL,
[EMP_DATE_TO] [DATE] NULL,
)
WITH
(
DISTRIBUTION = HASH ( [EMP_BK] ),
CLUSTERED COLUMNSTORE INDEX
)
1st Option a Real Table for Dates
CREATE TABLE [bi_msft].[ALL_DATES]
(
[ADATE] [DATE] NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
2nd option is in the Stored Procedure user this as a subquery of store it in a #tempTable
SELECT d
FROM (SELECT d = DATEADD(DAY, rn - 1, @FROM_DATE)
FROM (SELECT TOP (DATEDIFF(DAY, @FROM_DATE, @TO_DATE))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y;
So basically my question is, when using the 2nd variant, would this query run on each node and so no data movement needed of will it run on one node and will all be moved....