
When does it make sense to put data in Elasticsearch vs. creating secondary indexes on the primary datastore?

Elasticsearch with another primary store

Pros:

  1. The primary datastore can be optimised for read/write use cases.
  2. Elasticsearch supports more than just key-value matching, e.g. fuzzy matching.

Cons:

  1. Can get out of sync with the primary datastore.
  2. Two more components to manage (ES as well as a pipeline to insert into ES).
  3. Would need some sort of change data capture (CDC) capability from the primary datastore.
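To make cons 1 and 3 concrete, here is a toy sketch of a CDC-style pipeline in which the search index applies changes from the primary's change log and can lag behind it. `Primary`, `SearchIndex`, and `consume` are hypothetical names, and the in-memory list stands in for a real change feed; this is not the API of Elasticsearch or of any CDC tool.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary store that appends every write to a change log."""
    rows: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)

    def upsert(self, pk, doc):
        self.rows[pk] = doc
        self.change_log.append((pk, doc))  # the change event a CDC feed would emit


@dataclass
class SearchIndex:
    """Hypothetical stand-in for the search store, fed only from the change log."""
    docs: dict = field(default_factory=dict)
    offset: int = 0  # how far into the change log this consumer has read

    def consume(self, log, max_events):
        # Apply at most max_events changes, simulating a pipeline that lags.
        batch = log[self.offset:self.offset + max_events]
        for pk, doc in batch:
            self.docs[pk] = doc
        self.offset += len(batch)


primary = Primary()
index = SearchIndex()
primary.upsert(1, {"name": "apple"})
primary.upsert(2, {"name": "orange"})

index.consume(primary.change_log, max_events=1)
print(primary.rows == index.docs)  # False: the index is behind the primary
index.consume(primary.change_log, max_events=10)
print(primary.rows == index.docs)  # True: caught up, until the next write
```

The window between the two `consume` calls is exactly the "out of sync" period; in a real deployment its length depends on the pipeline, not on either datastore.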

Secondary indexes on the primary datastore

Pros:

  1. Fewer moving parts.
  2. Fewer consistency issues (because secondary indexes can be eventually consistent).

Cons:

  1. Not all datastores support secondary indexing.
  2. Secondary index queries are more often scatter-gather; running them at higher QPS will limit the read/write QPS left for the primary access patterns, like reads and writes by PK.
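A toy sketch of the scatter-gather concern in con 2, assuming rows are partitioned by primary key and each shard keeps only a local secondary index over its own rows. This layout is a hypothetical illustration; real systems differ, and some offer global secondary indexes instead. A PK lookup contacts one shard, while a lookup on the indexed attribute has to ask every shard.

```python
from collections import defaultdict

NUM_SHARDS = 4


class Shard:
    def __init__(self):
        self.rows = {}                    # pk -> row
        self.by_color = defaultdict(set)  # local secondary index: color -> pks on this shard

    def put(self, row):
        self.rows[row["pk"]] = row
        self.by_color[row["color"]].add(row["pk"])


shards = [Shard() for _ in range(NUM_SHARDS)]


def shard_for(pk):
    # Rows are partitioned by primary key.
    return shards[pk % NUM_SHARDS]


for pk in range(20):
    shard_for(pk).put({"pk": pk, "color": "red" if pk % 3 == 0 else "blue"})


def get_by_pk(pk):
    """Returns (row, shards_contacted): routed to exactly one shard."""
    return shard_for(pk).rows.get(pk), 1


def find_by_color(color):
    """Returns (pks, shards_contacted): every shard's local index is queried."""
    pks = set()
    for shard in shards:                          # scatter to all shards
        pks |= shard.by_color.get(color, set())   # gather and merge
    return sorted(pks), len(shards)


print(get_by_pk(7))           # one shard contacted
print(find_by_color("red"))   # all four shards contacted
```

At higher QPS, every `find_by_color` call costs work on all shards, which is the load the question worries will compete with PK reads and writes.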

Are there other considerations while deciding this?

best wishes

1 Answer


When does it make sense to put data in Elasticsearch vs. creating secondary indexes on the primary datastore?

These things are like comparing apples to oranges.

Use Elasticsearch when your primary use case is complex word-based searching and/or large bodies of text that need to be tokenized into words so they are easily searchable.
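Word-based search of this kind is built on an inverted index (term to documents), which is what Elasticsearch, via Lucene, maintains. The toy sketch below shows the idea only: the tokenizer just lowercases and splits on non-alphanumerics, whereas real analyzers can also stem, remove stop words, and so on. `tokenize`, `build_inverted_index`, and `search` are hypothetical helpers, not Elasticsearch APIs.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Very simplified analyzer: lowercase, then split on anything non-alphanumeric.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    index = defaultdict(set)  # term -> ids of documents containing that term
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # AND semantics: return documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Apples and oranges are both fruit.",
    2: "Comparing apples to oranges.",
    3: "Orange juice, freshly squeezed.",
}
idx = build_inverted_index(docs)
print(sorted(search(idx, "apples oranges")))  # [1, 2]
print(sorted(search(idx, "orange")))          # [3]: no stemming, so "oranges" does not match
```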

Otherwise, for most other standard use cases, you'll probably want to use a relational database management system (RDBMS). In an RDBMS, sometimes the primary index of a table (usually referred to as the clustered index) is sufficient. Other times a secondary index (or a few secondary indexes) is needed to support the query predicates. It's very common to create secondary indexes when needed.
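A minimal sketch of adding a secondary index when a predicate needs one, using SQLite from Python's standard library as a stand-in RDBMS. The `customers` table and `idx_customers_email` index are hypothetical. `EXPLAIN QUERY PLAN` shows the lookup change from a full scan to an index search; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In SQLite, an INTEGER PRIMARY KEY column is the rowid the table's B-tree is keyed on.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", "Oslo" if i % 2 else "Paris") for i in range(1000)],
)


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT id FROM customers WHERE email = 'user42@example.com'"
before = plan(query)  # a full table SCAN: nothing indexes the email column yet

conn.execute("CREATE INDEX idx_customers_email ON customers (email)")  # secondary index
after = plan(query)   # now a SEARCH using idx_customers_email

print(before)
print(after)
print(conn.execute(query).fetchall())  # [(43,)]: ids were assigned from 1 in insert order
```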

Regarding some other things you mentioned:

Elasticsearch with another primary store

Pros:

  1. The primary datastore can be optimised for read/write use cases.

Not sure what you mean by this. Elasticsearch won't automatically be any faster at read and write operations itself. And if you mean the reads and writes of the source datastore would improve by offloading some of the work to Elasticsearch, that's not necessarily true either, unless you use an eventually consistent approach to copy the data over. Also, in that case the benefit has nothing to do with Elasticsearch specifically; it comes from persisting a copy of the data, which can be done in any other system, including another instance of the same source system.

  2. Elasticsearch supports more than just key-value matching, e.g. fuzzy matching.

Elasticsearch is designed to be optimized for complex word-based searching, as I mentioned earlier. It is not as optimized for standard key-value matching and classic index seeks for equality and range predicates as an RDBMS is. Again, these are different use cases that each system is optimized for and proficient at.
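To contrast the two kinds of lookup, here is a toy sketch. Elasticsearch's fuzzy queries are based on edit distance; this sketch uses plain Levenshtein distance with a hypothetical `max_edits` threshold. The equality/range side uses a sorted list searched with `bisect`, standing in for the B-tree seeks an RDBMS index performs. All helper names are hypothetical.

```python
import bisect


def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insertions, deletions, substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if chars match)
        prev = cur
    return prev[-1]


terms = ["elastic", "elasticsearch", "index", "postgres", "search"]


def fuzzy_match(query, max_edits=2):
    # Typo-tolerant, word-based matching: the search-engine style of lookup.
    return [t for t in terms if levenshtein(query, t) <= max_edits]


# Equality and range predicates: seek into sorted keys, as a B-tree index would.
prices = sorted([5, 12, 12, 20, 35, 50, 75])


def range_lookup(lo, hi):
    # All keys k with lo <= k <= hi.
    return prices[bisect.bisect_left(prices, lo):bisect.bisect_right(prices, hi)]


print(fuzzy_match("serach"))  # ['search']
print(range_lookup(12, 35))   # [12, 12, 20, 35]
```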

Secondary indexes on the primary datastore

Pros:

  2. Fewer consistency issues (because secondary indexes can be eventually consistent).

Not sure what you mean by this. All indexes in an RDBMS are typically immediately consistent, not eventually consistent, because they are updated in the same transaction as the table data. The only case I can envision is a database system with a sharded topology, and even then it's only because the shards themselves can be eventually consistent; the indexes within each shard are still immediately consistent once that shard is synchronized.
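A small sketch of that immediate consistency, again using SQLite as a stand-in RDBMS (the `orders` table and `idx_orders_customer` index are hypothetical). The insert and its index entry are written together, so a query forced through the secondary index with SQLite's `INDEXED BY` clause sees the new row right away, inside the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

with conn:  # one transaction; committed when the block exits without an error
    conn.execute("INSERT INTO orders (customer) VALUES ('alice')")
    # INDEXED BY makes SQLite answer this query via the secondary index (or fail).
    count = conn.execute(
        "SELECT COUNT(*) FROM orders INDEXED BY idx_orders_customer "
        "WHERE customer = 'alice'"
    ).fetchone()[0]

print(count)  # 1: the index already reflects the row written moments earlier
```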

Cons:

  2. Secondary index queries are more often scatter-gather; running them at higher QPS will limit the read/write QPS left for the primary access patterns, like reads and writes by PK.

This is not exactly true. Yes, the more indexes a table has, the more write work needs to occur on that table to keep all of its indexes consistent. But that is the classic Big O tradeoff between search time complexity and space complexity. The same is true if you copy the data and persist it in another datastore like Elasticsearch: there's additional write overhead just the same.
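A toy sketch of that tradeoff: each secondary index is a sorted list of `(value, pk)` pairs, so every insert does one extra index write per index and stores an extra copy of the key, and in return an equality lookup becomes a binary search instead of a full scan. The `Table` class and its counters are hypothetical, for illustration only; real engines use B-trees and far more bookkeeping.

```python
import bisect


class Table:
    """Toy table: rows by primary key plus one sorted list per secondary index."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexes = {col: [] for col in indexed_columns}  # col -> sorted (value, pk)
        self.index_writes = 0

    def insert(self, pk, row):
        self.rows[pk] = row
        for col, entries in self.indexes.items():
            bisect.insort(entries, (row[col], pk))  # extra write (and space) per index
            self.index_writes += 1

    def find(self, col, value):
        # Equality seek: (value,) sorts before every (value, pk), so start there.
        entries = self.indexes[col]
        i = bisect.bisect_left(entries, (value,))
        pks = []
        while i < len(entries) and entries[i][0] == value:
            pks.append(entries[i][1])
            i += 1
        return pks


rows = [{"city": "Oslo" if i % 2 else "Paris", "age": 20 + i % 5} for i in range(100)]
lean = Table(indexed_columns=[])
heavy = Table(indexed_columns=["city", "age"])
for pk, row in enumerate(rows):
    lean.insert(pk, row)
    heavy.insert(pk, row)

print(lean.index_writes, heavy.index_writes)  # 0 200
print(heavy.find("age", 20)[:3])              # [0, 5, 10]
```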

Also, modern RDBMSs offer different isolation levels to improve concurrency. Optimistic isolation levels improve performance by allowing reads and writes to occur concurrently: writers don't block readers and readers don't block writers. One way this is implemented is with a version store of the data, which the database maintains automatically.
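A toy, single-threaded sketch of the version-store idea: writers append new versions instead of overwriting in place, and each reader sees the newest version committed at or before its snapshot. `MVCCStore` and its methods are hypothetical; it illustrates visibility rules only and omits locking, write-write conflicts, and cleanup of old versions that a real database handles.

```python
class MVCCStore:
    """Toy multi-version store: illustrates snapshot visibility, not locking."""

    def __init__(self):
        self.last_ts = 0
        self.versions = {}  # key -> list of (commit_ts, value), oldest first

    def write(self, key, value):
        # Append a new version rather than overwriting, so existing readers are unaffected.
        self.last_ts += 1
        self.versions.setdefault(key, []).append((self.last_ts, value))
        return self.last_ts

    def snapshot(self):
        # A reader's snapshot is the latest commit timestamp at the moment it starts.
        return self.last_ts

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the reader's snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
reader_ts = store.snapshot()                    # a reader starts here
store.write("balance", 50)                      # a writer commits; the reader is not blocked
print(store.read("balance", reader_ts))         # 100: the reader keeps its consistent view
print(store.read("balance", store.snapshot()))  # 50: a new reader sees the latest value
```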


Resources:

  1. What is Elasticsearch?
J.D.