In the condition filtering feature that MySQL introduced from version 5.7, has filter estimates for various condition formats in the join order, to estimate the number of rows that enter the join order pipe line after joins.
Assume a query joining three tables A, B and C, and the criteria is A.col_a > B.col_b and C.col_c > 10.
If the join order for this query is A->B->C then, the query plan, attaches the heuristic guesstimate of 33.33 percent after both A and B are in the plan at the stage of A->B.
But if a query joining the same three tables A, B and C, had criteria such that (A.col_a = 3 or B.col_b = 17) and C.col_c > 10, the combined criteria (A.col_a = 3 or B.col_b = 17) is added to the attached conditions at the stage in join order when A->B but the filter condition percent is just 100. I suspected it to be following the probability theory and a filtering percentage of 55.55.
P(A.col_a = 3 or B.col_b = 17) = P(A.col_a = 3) + P(B.col_b = 17) - P(A.col_a = 3) * P(B.col_b = 17)
P(A.col_a = 3 or B.col_b = 17) = 0.3333 + 0.3333 - 0.1111
P(A.col_a = 3 or B.col_b = 17) = 0.5555
Why is this not applied in the planning? I see the documentation says:
A condition contributes to the filtering estimate only if:
It refers to the current table.
It depends on a constant value or values from earlier tables in the join sequence.
It was not already taken into account by the access method.
Why have they not included criteria that also refers to tables that are already in the join order also? Why is the case A.col_a op B.col_b case included but not A.col_a op val or B.col_b op val not included in the rows estimation method?