1

I have data stored in different monthly databases that have the exact same name schema for the tables and columns. Each monthly database has ~10 million rows. I'm looking for a more efficient way to select the rows I want from each database based on the same variable. There are only 5 to 10 rows from each month for that variable.

I also need to JOIN this different monthly data to the same separate database. I am currently using UNION to combine the data. So:

    SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
    FROM   db_JAN.tab1 a
    INNER JOIN db_OTHER.tabX b
    ON (a.DATE = b.DATE AND a.someID = b.someID)
    WHERE a.col1='variable1'

    UNION

    SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
    FROM   db_FEB.tab1 a
    INNER JOIN db_OTHER.tabX b
    ON (a.DATE = b.DATE AND a.someID = b.someID)
    WHERE a.col1='variable1'

    UNION

    SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
    FROM   db_MARCH.tab1 a
    INNER JOIN db_OTHER.tabX b
    ON (a.DATE = b.DATE AND a.someID = b.someID)
    WHERE a.col1='variable1'

    .
    .
    .

If the SELECT, JOIN and WHERE clauses are the same across different databases with the same naming schema, is there a more efficient way of writing this query? Something like:

    SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
    FROM   (db_JAN,db_FEB,db_MARCH...).tab1 a
    INNER JOIN db_OTHER.tabX b
    ON (a.DATE = b.DATE AND a.someID = b.someID)
    WHERE a.col1='variable1'

This question might be applicable but I'm not sure as I'm relatively new to SQL and am not sure about sequential table scans or indexing:

"How to make a union view execute more efficiently?"

I don't believe it is a hierarchical query and this question doesn't help me either.

NOTE: I am using SAS to run the query connecting to Teradata SQL database and have used a MACRO VARIABLE for 'variable1' in my PROC SQL statement. I am not sure if I can use a MACRO VARIABLE for the databases but that could be an option if I can't do it in SQL itself.

xxCDxx
  • 13
  • 1
  • 3

2 Answers2

2

There are various ways to write the query in shorter form. I'm not sure if they could improve performance (or make it worse!)

The best lines of attack if the query isn't efficient is to change it into UNION ALL (provided that the results are equivalent) and examine/optimize the parts separately, make sure indexes are used, etc.

One way to rewrite is to move the UNION or UNION ALL inside a derived table and then join:

SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
FROM  
    ( SELECT col1, col2, col3, DATE, someID
      FROM db_JAN.tab1    WHERE col1 = 'variable1' 
      UNION ALL
      SELECT col1, col2, col3, DATE, someID
      FROM db_FEB.tab1    WHERE col1 = 'variable1' 
      UNION ALL
      ---
    ) a
INNER JOIN db_OTHER.tabX b
ON (a.DATE = b.DATE AND a.someID = b.someID) ;
ypercubeᵀᴹ
  • 99,450
  • 13
  • 217
  • 306
1

Teradata supports predicates pushdown

select  a.col1, a.col2, a.col3, b.colx, b.coly, b.colz 

from                (           select col1, col2, col3, date, someid from db_jan.tab1  
                    union all   select col1, col2, col3, date, someid from db_feb.tab1 
                    union all   ...
                    ) a 

        join        db_other.tabx b 

        on          a.date   = b.date 
                and a.someid = b.someid

where   a.col1 = 'variable1'