1

Hopefully the title can make sense to somebody, so I can get somebody working with a database (BigQuery in my case) to look into into.

Given two quarters, I need to find out the # of orders placed by each consumer for each of the two. I need also to count how many consumer I have in Q3 vs Q4 and do some calculation on the consumer retention.

I wish I could obtain the majority of this numbers with just one query, thus avoiding a number of queries.

I'll paste below a query in the attempt to explain what I need

  SELECT T1.email , T2.email
  FROM
  ( 
          SELECT email, 1 AS KEY
          FROM [reporting.order]
          WHERE 1 = 1
          AND timestampOrdered > TIMESTAMP_TO_USEC(TIMESTAMP("2014-07-01 05:00:00")) 
          AND timestampOrdered < TIMESTAMP_TO_USEC(TIMESTAMP("2014-10-01 05:00:00")) 
          AND orderStatus IN ('Delivered', 'Collected', 'PaymentCaptured')
          AND name NOT LIKE '%test%'
          AND restaurantId NOT LIKE '%test%'
          GROUP BY email
  ) T1
  JOIN (
          SELECT email, 1 AS KEY
          FROM [reporting.order]
          WHERE 1 = 1
          AND timestampOrdered > TIMESTAMP_TO_USEC(TIMESTAMP("2014-10-01 05:00:00")) 
          AND timestampOrdered < TIMESTAMP_TO_USEC(TIMESTAMP("2015-01-01 05:00:00")) 
          AND orderStatus IN ('Delivered', 'Collected', 'PaymentCaptured')
          AND name NOT LIKE '%test%'
          AND restaurantId NOT LIKE '%test%'
          GROUP BY email
  ) T2
  ON T1.KEY = T2.KEY

By adding an outer SELECT, with the above results I was able to count how many emails are in Q4 that were in Q3 (SELECT COUNT(CASE WHEN (T1.Email = T2.Email) THEN 1 ELSE NULL END) AS InPrevQ FROM) but anything more than this.

I thought to use the WINDOW function, but I'm struggling to understand its usage.

Paul White
  • 94,921
  • 30
  • 437
  • 687
Andrea Moro
  • 255
  • 2
  • 4
  • 14

1 Answers1

2

You are probably over-thinking it. What if you were to consider your "base" query as something along the lines of the following:

SELECT  CASE 
            WHEN TimestampOrdered >= TIMESTAMP_TO_USEC( TIMESTAMP( "2014-07-01 05:00:00" ) ) 
             AND TimestampOrdered < TIMESTAMP_TO_USEC( TIMESTAMP( "2014-10-01 05:00:00" ) )
              THEN Email
        END AS orderedbyQ3,
        CASE
            WHEN TimestampOrdered >= TIMESTAMP_TO_USEC( TIMESTAMP( "2014-10-01 05:00:00" ) ) 
             AND TimestampOrdered < TIMESTAMP_TO_USEC( TIMESTAMP( "2015-01-01 05:00:00" ) ) 
              THEN Email
        END AS orderedbyQ4
FROM    [reporting.order]
WHERE   timestampOrdered >= TIMESTAMP_TO_USEC( TIMESTAMP( "2014-07-01 05:00:00" ) ) 
    AND timestampOrdered < TIMESTAMP_TO_USEC( TIMESTAMP( "2015-01-01 05:00:00" ) ) 
    AND orderStatus IN ( 'Delivered', 'Collected', 'PaymentCaptured' )
    AND name NOT LIKE '%test%'
    AND restaurantId NOT LIKE '%test%'
GROUP BY email;

You could then rather easily get distinct email address counts and order counts, or whatever else you needed to derive:

SELECT  COUNT( DISTINCT OrderedByQ3 ) AS RestaurantsQ3,
        COUNT( OrderedByQ3 ) AS OrdersQ3,
        COUNT( DISTINCT OrderedByQ4 ) AS RestaurantsQ4,
        COUNT( OrderedByQ4 ) AS OrdersQ4
FROM (  SELECT  CASE 
                    WHEN TimestampOrdered >= TIMESTAMP_TO_USEC( TIMESTAMP( "2014-07-01 05:00:00" ) ) 
                     AND TimestampOrdered < TIMESTAMP_TO_USEC( TIMESTAMP( "2014-10-01 05:00:00" ) )
                      THEN Email
                END AS OrderedByQ3,
                CASE
                    WHEN TimestampOrdered >= TIMESTAMP_TO_USEC( TIMESTAMP( "2014-10-01 05:00:00" ) ) 
                     AND TimestampOrdered < TIMESTAMP_TO_USEC( TIMESTAMP( "2015-01-01 05:00:00" ) ) 
                      THEN Email
                END AS OrderedByQ4
        FROM    [reporting.order]
        WHERE   timestampOrdered >= TIMESTAMP_TO_USEC( TIMESTAMP( "2014-07-01 05:00:00" ) ) 
            AND timestampOrdered < TIMESTAMP_TO_USEC( TIMESTAMP( "2015-01-01 05:00:00" ) ) 
            AND orderStatus IN ( 'Delivered', 'Collected', 'PaymentCaptured' )
            AND name NOT LIKE '%test%'
            AND restaurantId NOT LIKE '%test%'
        GROUP BY email ) t;

Sorry if the syntax isn't perfect, I don't have access to a BigQuery DB.

Avarkx
  • 2,423
  • 13
  • 23