We have a large quanitity of text files we want to free-text/full text search, combined with relational structured metadata about the text file. So, a search could be "Give me all files that belong to group X(or sub groups of X), have author (Ari and Bari and Mari), belongs to organization Y, and contains the text "synthetic". The latter part being a full-text search, and the other being already stored as relational data in our existing db.
In our database(which is rather complex), there are stored a way to ID the files, and a ton of various metadata about the file, spread among tens of tables, ranging from simple 1-1 relationships, to 1-many sets pr file, and even tree-structure relationship(things like "this file is type X, type X is a subgroup of type Y, etc). And this metadata may change over time, all over the application(which is huge).
Now, I as a database admin, thought this could be solved by using SQL Server to do the search for structured metadata already in the DB, constraining the search to candidate-files, and then passing the candidate file id's to elastic search for full-text searching. (Re-indexing the file on elastic when a file is added or commited is trivial in our code)
However, the elastic-guys in our project naturally had a different idea: To extract all the meta-data as well as the full-text content from the files, to elastic-search, and run the search exclusively in elastic.
This allows them to run full powered lucene queries easily, and load is taken off the database, which is nice. However, this also to me, introduces a nightmare to keep the structured metadata in sync, and blindly re-indexing/syncing everything periodically is not possible due to the scale of data.
I can see merits/concerns to both options. Is there a best practice for this kind of thing?
