3

Based on Sqlite docs: https://www.sqlite.org/datatype3.html#type_conversions_prior_to_comparison, especially this statement:

If one operand has INTEGER, REAL or NUMERIC affinity and the other operand has TEXT or BLOB or no affinity then NUMERIC affinity is applied to other operand.

I would expect the following query:

CREATE TABLE `invoice` (
  `id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  `amount` DOUBLE PRECISION DEFAULT NULL
);

insert into invoice (amount) values (4.0); insert into invoice (amount) values (15.0); insert into invoice (amount) values (4.0);

select *, typeof(amount), amount = '4', typeof(sum(amount)), sum(amount) = '4', sum(amount) = '4.0', sum(amount) = 4 from invoice group by id;

to return the same result for sum(amount) = '4' as for amount = '4' for each row as both operand types have the same type in each comparison (verified using typeof(), for non-SUM() the comparison is working as expected).

Demo: http://sqlfiddle.com/#!5/59238/2

mvorisek
  • 428
  • 1
  • 5
  • 20

2 Answers2

2

I think you've spotted a slight inconsistency in the behaviour of SQLite! All of the code below is available on the fiddle here.

First we have:

SELECT 4 = '4' AS "INT_to_TEXT";  -- No coercion because it's not a field!

Result:

INT_to_TEXT
          0  <<--- FALSE!

Then we do this:

SELECT 4 = CAST('4' AS INTEGER) AS "Casted";  -- <<-- Explicit CAST required

Result:

Casted
     1  <<--- TRUE!

So, now we try the table:

SELECT
  amount = 4  AS "Int",
  amount = '4' AS "Coerced" -- <<-- Implicit coercion occurs here because it's a field!
FROM
  invoice;

Result:

Int Coerced
  1       1    = 4  therefore TRUE
  0       0    = 15 therefore FALSE  4 != 15 any way you do it!
  1       1    = 4  therefore TRUE

Finally,

SELECT
  SUM(amount) AS    "The sum (INT)",
  SUM(amount) = 4 AS "Expected", -- <<---- t/f/t as expected
  SUM(amount) = '4' AS "No coercion 1",    -- <<--- All FALSE
  SUM(amount) = CAST('4' AS INTEGER) AS "Casted", -- <<--- t/f/t as expected 
  SUM(amount) = '4.0' "No coercion 2"   -- <<--- All FALSE
FROM
  invoice
GROUP BY id;

Result:

The sum (INT)  Expected  No coercion 1  Casted  No coercion 2
            4         1              0       1              0
           15         0              0       0              0
            4         1              0       1              0

So, the cases where it's f/f/f are the ones where you try and coerce SUM(amount) to match TEXT! It seems to me that what's happening is that SQLite becomes confused when you try and implicitly coerce a string to match a SUM. I thought it might be all aggregate functions, but it's not!

Check this out:

SELECT
  COUNT(*) = 3,
  COUNT(*) = '3',   -- <<---- t/f COUNT() has the same behaviour as SUM
  MAX(amount) = 15,
  MAX(amount) = 15  -- <<---- t/t But, MAX() does not! 
FROM
  invoice;

Result:

COUNT(*) = 3  COUNT(*) = '3' MAX(amount) = 15 MAX(amount) = 15
           1              0                 1                1

So, for COUNT, it's t/f but for MAX, it's t/t!

As I said, I think you've discovered a discrepancy in the behaviour of SQLite, unless of course D. Richard Hipp is using some logic which I haven't quite understood - and he's a pretty smart cookie! Shall you report it or shall I?

Although, quite what utter lunatic would actually want to do this "in real life" is way way beyond my pay grade! :-)

And finally, just an FYI, you don't need backticks for SQLite - even MySQL doesn't require them any longer! +1 for an interesting post!

Vérace
  • 30,923
  • 9
  • 73
  • 85
2

Type affinity should not be confused with Data Type (which Sqlite calls Storage Class).

Every single value in Sqlite has a Storage Class, which is always either NULL, INTEGER, REAL, TEXT or BLOB.

4 is a value of the INTEGER storage class. 'xyz' is a value of the TEXT storage class.

Note that typeof() returns the Storage class of a value, not the type affinity of the column.

Since in Sqlite columns are not required to have a specific storage class, and you can store both 4 or 'xyz' in the same column, Sqlite introduced the concept of Type Affinity, which is an indication of which kind of values SHOULD be stored in that column, if possible. So if you declare a column

amount DOUBLE PRECISION

you are actually saying that amount should contain REAL values (has a REAL type affinity), but the actual values can be of any type.

Sqlite tries to satisfy this affinity when you insert a value of '4' into amount: it converts it to a REAL value before storing it. But if you insert a value of 'xyz', it cannot convert it to a REAL and it will store the value as a TEXT.

When you use the column amount in a query, Sqlite presumes that the values stored in the column should be REAL values. So when it does the comparison amount = '4' , it will apply rules for type affinity: '4' will be converted to a REAL and compared to what is stored in the column.

Since type affinity is dictated by the column definition at create time, and not by the single values contained, this implies that:

A. Type affinity of amount will always be REAL, even when the value contained in a row is 'xyz'.

B. Anything that is not a column has never been declared with a type affinity, so any expression that is not a simple reference to a column of a real table, or is not explictly CASTed, doesn't have a type affinity.

This means that 4 and '4' have no type affinity (even if they are constants of the INTEGER and TEXT storage class, respectively). Also amount+10, sum(amount) or min(amount) are all expressions that don't have a type affinity (you didn't declare to Sqlite which type those expressions should be), but, when evaluated for every single row, they will evaluate to a value in a specific storage class.

This is why:

  • 4 = '4' is FALSE, because neither has a type affinity and sqlite compares an integer with a string
  • amount = '4' is TRUE when a 4 is stored in the row, because since amount has a type affinity of REAL, sqlite expects all the values contained should be real and converts '4' to 4 before the comparison.
  • sum(amount) = '4' is FALSE even when the sum is actually 4, because sum(amount) is not a simple reference to a column, it's an expression and thus it has no type affinity. 4 is then compared to '4' and the result is false as in the first example.
  • but CAST(sum(amount) AS REAL) = '4' is TRUE when the sum is actually 4, because the CAST assigns a REAL type affinity to the left hand expression, so sqlite knows it has to convert the right hand '4' to 4 before comparison.

Also, if I understand correctly, while storage classes are checked and evaluated at runtime (because different rows can have different amount values of different storage classes), I think type affinity is evaluated at prepare time, so any necessary conversion is actually established, for every row, before checking the values in the rows.

Andrea B.
  • 1,731
  • 8
  • 13