7

I'm trying to join a few rows from a remote view to a local table. The view has about 300 million rows, so I want to use the REMOTE query hint so that all 3 million rows don't have to be transferred to my computer.

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
    SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
    ) R
ON L.ID = R.L_ID

This returns 100 rows, as I expected, and takes basically no time, as I expected.

However,

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER REMOTE JOIN (
    SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
    ) R
ON L.ID = R.L_ID

starts to return thousands of rows. I quit it after a few seconds, but it was in the tens - hundreds of thousands.

How could a query hint change my result set?

paparazzo
  • 5,048
  • 1
  • 19
  • 32
xyzzy
  • 71
  • 2

2 Answers2

18

TOP 100 with no ORDER BY means it is undeterministic which 100 rows from the remote table end up participating in the join. This is execution plan dependant and can vary.

If it is a one to many relationship it may be the case that one batch of 100 rows has more matches on the other side of the join than another different batch of 100 rows.

You should specify an ORDER BY (inside the derived table) on some unique column or combination of columns to ensure deterministic results.

Martin Smith
  • 87,941
  • 15
  • 255
  • 354
1

You can try forcing the remote query to run remotely:

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
        SELECT TOP 100 Something, L_ID 
        FROM OPENQUERY([RemoteServer], 'SELECT Something, L_ID
                                        FROM [RemoteDB].[dbo].[RemoteTable]'
                      )
               ) R
ON L.ID = R.L_ID

Or (if you want the 100 limiter to be in the remote query):

SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
        SELECT Something, L_ID 
        FROM OPENQUERY([RemoteServer], 'SELECT TOP 100 Something, L_ID
                                        FROM [RemoteDB].[dbo].[RemoteTable]'
                      )
               ) R
ON L.ID = R.L_ID
SQLDevDBA
  • 2,244
  • 10
  • 12