1

We're having a problem with a query that keeps executing very very slowly

SELECT `e`.*,
       IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) AS `news_from_date`,
       IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value)       AS `news_to_date`
FROM `catalog_product_entity` AS `e`
         INNER JOIN `catalog_product_website` AS `product_website`
                    ON product_website.product_id = e.entity_id AND product_website.website_id = 3
         LEFT JOIN `catalog_product_entity_datetime` AS `at_news_from_date_default`
                   ON (`at_news_from_date_default`.`entity_id` = `e`.`entity_id`) AND
                      (`at_news_from_date_default`.`attribute_id` = '81') AND `at_news_from_date_default`.`store_id` = 0
         LEFT JOIN `catalog_product_entity_datetime` AS `at_news_from_date`
                   ON (`at_news_from_date`.`entity_id` = `e`.`entity_id`) AND
                      (`at_news_from_date`.`attribute_id` = '81') AND (`at_news_from_date`.`store_id` = 9)
         LEFT JOIN `catalog_product_entity_datetime` AS `at_news_to_date_default`
                   ON (`at_news_to_date_default`.`entity_id` = `e`.`entity_id`) AND
                      (`at_news_to_date_default`.`attribute_id` = '82') AND `at_news_to_date_default`.`store_id` = 0
         LEFT JOIN `catalog_product_entity_datetime` AS `at_news_to_date`
                   ON (`at_news_to_date`.`entity_id` = `e`.`entity_id`) AND
                      (`at_news_to_date`.`attribute_id` = '82') AND (`at_news_to_date`.`store_id` = 9)
WHERE (((((IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) <=
           '2021-10-06 23:59:59') OR
          (IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS null)))))
  AND (((((IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) >=
           '2021-10-06 00:00:00') OR
          (IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) IS null)))))
  AND ((IF(at_news_from_date.value_id > 0, at_news_from_date.value, at_news_from_date_default.value) IS not null) OR
       (IF(at_news_to_date.value_id > 0, at_news_to_date.value, at_news_to_date_default.value) IS not null))

The query provides a list of new products but takes minutes to execute. I'm not sure how to optimize this as all the conditions are needed to actually determine the new products.

It's actually taking minutes to execute.

Adding some profile results

enter image description here

And explain

enter image description here

Is it possible to somehow tweak the database to improve the execution time of this query?

Here's the structure on the used tables

-- auto-generated definition
create table catalog_product_entity
(
    entity_id              int unsigned auto_increment comment 'Entity ID'
        primary key,
    attribute_set_id       smallint unsigned default 0                   not null comment 'Attribute Set ID',
    type_id                varchar(32)       default 'simple'            not null comment 'Type ID',
    sku                    varchar(64)                                   null comment 'SKU',
    has_options            smallint          default 0                   not null comment 'Has Options',
    required_options       smallint unsigned default 0                   not null comment 'Required Options',
    created_at             timestamp         default current_timestamp() not null comment 'Creation Time',
    updated_at             timestamp         default current_timestamp() not null on update current_timestamp() comment 'Update Time',
    mv_product_id          int unsigned                                  null comment 'Megaventory Id',
    mageworx_is_require    smallint          default 0                   not null comment 'MageWorx Is Required',
    is_replace_product_sku tinyint(1)        default 0                   not null comment 'Is Replace Product SKU',
    created_by             int                                           null comment 'Created by Mass Product Import profiles id',
    updated_by             int                                           null comment 'Created by Mass Product Import profiles id'
)
    comment 'Catalog Product Table';

create index CATALOG_PRODUCT_ENTITY_ATTRIBUTE_SET_ID on catalog_product_entity (attribute_set_id);

create index CATALOG_PRODUCT_ENTITY_SKU on catalog_product_entity (sku);

-- auto-generated definition
create table catalog_product_website
(
    product_id int unsigned      not null comment 'Product ID',
    website_id smallint unsigned not null comment 'Website ID',
    primary key (product_id, website_id),
    constraint CATALOG_PRODUCT_WEBSITE_WEBSITE_ID_STORE_WEBSITE_WEBSITE_ID
        foreign key (website_id) references store_website (website_id)
            on delete cascade,
    constraint CAT_PRD_WS_PRD_ID_CAT_PRD_ENTT_ENTT_ID
        foreign key (product_id) references catalog_product_entity (entity_id)
            on delete cascade
)
    comment 'Catalog Product To Website Linkage Table';

create index CATALOG_PRODUCT_WEBSITE_WEBSITE_ID
    on catalog_product_website (website_id);


-- auto-generated definition
create table catalog_product_entity_datetime
(
    value_id     int auto_increment comment 'Value ID'
        primary key,
    attribute_id smallint unsigned default 0 not null comment 'Attribute ID',
    store_id     smallint unsigned default 0 not null comment 'Store ID',
    entity_id    int unsigned      default 0 not null comment 'Entity ID',
    value        datetime                    null comment 'Value',
    constraint CATALOG_PRODUCT_ENTITY_DATETIME_ENTITY_ID_ATTRIBUTE_ID_STORE_ID
        unique (entity_id, attribute_id, store_id),
    constraint CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID_STORE_STORE_ID
        foreign key (store_id) references store (store_id)
            on delete cascade,
    constraint CAT_PRD_ENTT_DTIME_ATTR_ID_EAV_ATTR_ATTR_ID
        foreign key (attribute_id) references eav_attribute (attribute_id)
            on delete cascade,
    constraint CAT_PRD_ENTT_DTIME_ENTT_ID_CAT_PRD_ENTT_ENTT_ID
        foreign key (entity_id) references catalog_product_entity (entity_id)
            on delete cascade
)
    comment 'Catalog Product Datetime Attribute Backend Table';

create index CATALOG_PRODUCT_ENTITY_DATETIME_ATTRIBUTE_ID
    on catalog_product_entity_datetime (attribute_id);

create index CATALOG_PRODUCT_ENTITY_DATETIME_STORE_ID
    on catalog_product_entity_datetime (store_id);

Any ideas on how to speed it up is much appreciated

gabtzi
  • 181
  • 7

1 Answers1

1

There's a lot going on in your query, and potentially a lot of room for improvement. As a general answer and set of advice, yes creating proper indexes on your tables involved should help improve the performance of your query. Proper indexes are one's that cover all of the fields used in the query. Sometimes this is feasible, other times not.

Generally speaking, to properly create a covering index, you should first specify the predicates (JOIN, WHERE, or HAVING clauses) fields in the index definition, then followed by what fields you're utilizing in the SELECT list. The order you list your fields matters because it is the order in which the underlying B-Tree data structure sorts your data in that index. If you listed a field first in the index definition that you're not actually using in your query, but the rest of the fields in that index definition are being used by your query, that index probably won't be applicable or used by your query.

So an example covering index you can try adding in your scenario would be for the table catalog_product_entity_datetime on the fields (entity_id, attribute_id, store_id, value_id). This index definition includes all 3 fields of your JOIN clause plus the field you're selecting from it ultimately.

Another word of advise is to avoid using SELECT * generally, as that's bad practice and will potentially result in some covering indexes no longer being applicable to your queries. It also results in unnecessary amounts of data being returned oftentimes when only a subset of the fields are really needed. In your example, you're doing SELECT e.*, do you really need all columns returned for catalog_product_entity in this query or can you explicitly list only the subset of columns needed? (In addition to the performance drawbacks, it also could result in other issues with code maintenance down the road, should your schema change without the consumer of your query being updated accordingly.)

Additionally, your query has a bunch of ORs in your predicates, specifically the WHERE clause. Sometimes this makes it difficult for database engines to come up with the most efficient plan and can cause what would've been an appropriate index, to not be used. Oftentimes one can rewrite their query replacing the OR clauses with a UNION clause instead, and selecting from their query a second time, applying the second case of their OR clause instead.

Finally, I'll just mention you're using the IF keyword in your WHERE clause as well. I'm not familiar enough in MariaDB to say that keyword specifically could cause performance issues, but just a heads up that in general functions in predicates can sometimes cause performance issues because they're unable to be evaluated effectively by the database engine to applicably apply what would've been an appropriate index otherwise. So sometimes re-writing the query to not use a function in the predicate, is also another way to improve performance.

Long story short, with the advise above, you can try a multitude of different indexes and query re-writes to find the most efficient outcome. It will require a lot of testing, which is why I gave general guidance on things you can do and what to look out for. Best of luck!

J.D.
  • 40,776
  • 12
  • 62
  • 141