I have two database servers, connected via Linked Servers. Both are SQL Server 2008R2 databases, and the linked server connection is made via a regular "SQL Server" link, using the current login's security context. The linked servers are both in the same datacentre, so the connection shouldn't be an issue.
I use the following query to check which values of the column identifier are available remotely, but not locally.
SELECT
identifier
FROM LinkedServer.RemoteDb.schema.[TableName]
EXCEPT
SELECT DISTINCT
identifier
FROM LocalDb.schema.[TableName]
On both tables are non-clustered indexes on the column identifier. Locally are around 2.6M rows, remotely only 54. Yet, when looking at the query plan, 70% of the execution time is devoted to "executing remote query". Also, when studying the complete query plan, the number of estimated local rows is 1 instead of 2695380 (which is the number of estimated rows when selecting only the query coming after EXCEPT).
When executing this query, it takes a long time indeed.
It makes me wonder: Why is this? Is the estimation "just" way off, or are remote queries on linked servers really that expensive?