2

If I understand correctly, secondary index leaf nodes in MySQL InnoDB engine hold the table primary key value (at least for a unique index).

Thus, looking up a value in the secondary index results in two BTREE lookups: one for the secondary index and another one for the clustered index (clustered on the primary key). This also means that the primary key size affects the size of all secondary indexes.

Is this how MongoDB WiredTiger secondary indexes work too? Or do MongoDB secondary indexes store a reference to the physical block where the document resides? (I believe this is how Postgres handles indexes)

nimrodm
  • 165
  • 1
  • 7

1 Answers1

3

I understand correctly, secondary index leaf nodes in MySQL InnoDB engine hold the table primary key value (at least for a unique index).

InnoDB tables have a clustered index which determines where data for rows is stored. The clustered index for an InnoDB table is either the PRIMARY KEY (if set), the first unique index with all key columns NOT NULL, or hidden index containing synthetic row ID values. Secondary index entries will include a reference to the clustered index for the record location.

Is this how MongoDB WiredTiger secondary indexes work too?* Or do MongoDB secondary indexes store a reference to the physical block where the document resides?

MongoDB (as at 4.0) does not support clustered indexes. The WiredTiger storage engine uses an approach similar to the third option described for InnoDB: an internal unique RecordID which is a 64-bit integer.

The WiredTiger storage engine currently uses a separate file for each collection and index. Collection data (collection-*.wt) is indexed using the internal RecordID; indexes (index-*.wt) map keys to RecordIDs in the associated collection data. These implementation details aren't relevant for the MongoDB API, which provides a consistent end user interface for indexing, querying, and manipulating data. Storage engines have ownership over how data is represented on disk and in memory. The approach is analogous to what happens under the hood in other database engines like InnoDB (where end users are interacting via the MySQL API).

The size of your primary key (which would be the _id field in MongoDB) currently has no impact on the size of secondary indexes in WiredTiger.

Incidentally, if you happen to be using the older (and deprecated) MMAPv1 MongoDB storage engine it does store references to the disk location (file and offset) where data resides. A significant downside of this approach is that document moves in storage (due to document growth) also require updating all secondary indexes. WiredTiger only needs to update secondary indexes when associated field values are changed.

Stennie
  • 10,345
  • 2
  • 31
  • 46