Questions tagged [full-text-search]

Searching the text of a collection of documents or free text fields in a database to find those containing a word or combination of words.

Full text search systems use a data structure called an inverted index. Logically an inverted index consists of a key containing the word, and a list of documents that word appears in. The document entry may also have a weighting based on the frequency with which that word appears in the document. A weighting may also be applied to the search terms.

Full text search engines locate documents matching the search terms and calculate the closeness of the match using a heuristic called a cosine ranking. This is calculated by forming an n-dimensional vector from the search terms and then constructing similar vectors from the search results. The dot product of these two vectors is the cosine of the angle between these vectors in n-dimensional space. A cosine value of 1 indicates parallel vectors and the closest possible match.

Typically the search results are fed into a priority queue and then popped out in order from highest cosine ranking to lowest. Some systems also apply weightings to the cosine rankings based on other factors; the most famous examples of this is Google's Pagerank algorithm.

Text retrieval systems normally use proprietary engines, although many general-purpose database systems also offer a full text search function.

622 questions
59
votes
3 answers

Possible INDEX on a VARCHAR field in MySql

I am working in a MySql database, with a table like this: +--------------+ | table_name | +--------------+ | myField | +--------------+ ...and I need to make a lot of queries like this (with 5-10 strings in the list): SELECT myField FROM…
Mark Tower
  • 693
  • 1
  • 6
  • 6
34
votes
1 answer

Guidelines for full-text index maintenance

What guidelines should be considered for maintaining full-text indexes? Should I REBUILD or REORGANIZE the full-text catalog (see BOL)? What is a reasonable maintenance cadence? What heuristics (similar to the 10% and 30% fragmentation thresholds)…
Geoff Patterson
  • 8,447
  • 2
  • 28
  • 53
32
votes
5 answers

What database technologies do big search engines use?

Does anybody knows how Google or Yahoo perform searches for keywords against very very huge amounts of data? What sort of database or technologies do they employ for this? It takes few milliseconds, but they have more than a billion pages indexed.
rkosegi
  • 518
  • 1
  • 7
  • 13
25
votes
3 answers

How is LIKE implemented?

Can anyone explain how the LIKE operator is implemented in current database systems (e.g. MySQL or Postgres)? or point me to some references that explain it? The naive approach would be to inspect each record, executing a regular expression or…
Nick
  • 353
  • 1
  • 3
  • 5
22
votes
3 answers

How can I use a full-text search on a jsonb column with Postgres?

So i have a jsonb column that has entries like this: https://pastebin.com/LxJ8rKk4 Is there any way to implement a full-text search on the entire jsonb column?
choco
  • 221
  • 1
  • 2
  • 4
20
votes
1 answer

Get partial match from GIN indexed TSVECTOR column

I would like to get results by query this: SELECT * FROM ( SELECT id, subject FROM mailboxes WHERE tsv @@ plainto_tsquery('avail') ) AS t1 ORDER BY id DESC; This works and return rows with tsv containing 'Available'. But if I use…
xangr
  • 457
  • 1
  • 5
  • 10
19
votes
0 answers

PostgreSQL trigram GIST vs. GIN indexes

I have a PostgreSQL 9.1 database that contains 10M+ rows and some text fields that need similarity and %word%-like searches, so I decided to use trigram indexes. Initially, I started to use GIN indexes, but now I am wondering whether I should rather…
P.Péter
  • 911
  • 1
  • 9
  • 20
18
votes
2 answers

Trigram search gets much slower as search string gets longer

In a Postgres 9.1 database, I have a table table1 with ~1.5M rows and a column label (simplified names for the sake of this question). There is a functional trigram-index on lower(unaccent(label)) (unaccent() has been made immutable to allow its use…
18
votes
1 answer

Best index for similarity function

So I have this table with 6.2 millions records and I have to perform search queries with similarity for one for the column. The queries can be: SELECT "lca_test".* FROM "lca_test" WHERE (similarity(job_title, 'sales executive') > 0.6) AND…
16
votes
2 answers

Postgres full text search with multiple columns, why concat in index and not at runtime?

I've come across full text search in postgres in the last few days, and I am a little confused about indexing when searching across multiple columns. The postgres docs talk about creating a ts_vector index on concatenated columns, like so: CREATE…
latentflip
  • 261
  • 1
  • 2
  • 4
15
votes
1 answer

There is insufficient system memory in resource pool 'internal' to run this query

One of our production servers is reporting an error in the log Error: 701, Severity: 17, State: 123. There is insufficient system memory in resource pool 'internal' to run this query. I searched for this error and found it’s a bug and there…
AA.SC
  • 4,073
  • 4
  • 28
  • 45
15
votes
2 answers

Will a Full-Text Catalog with Track changes: AUTO automatically update if Change Tracking for that table is not enabled?

I have this database table that is suppose to keep the full-text index up-to-date. However I am not seeing it happening at all (no error in the log because the last log I see was when I triggered it manually). Here is what I am seeing... but on the…
MetaGuru
  • 215
  • 1
  • 3
  • 8
14
votes
3 answers

Full text search over multiple related tables: indices and performance

We have the following database structure CREATE TABLE objects ( id int PRIMARY KEY, name text, address text ); CREATE TABLE tasks ( id int PRIMARY KEY, object_id int NOT NULL, actor_id int NOT…
13
votes
4 answers

Full text search results in a large amount of time spent in 'FULLTEXT initialization'

I'm currently trying to run some queries against a data dump of Stack Overflow's comments. Here's what the schema looks like: CREATE TABLE `socomments` ( `Id` int(11) NOT NULL, `PostId` int(11) NOT NULL, `Score` int(11) DEFAULT NULL, `Text`…
hichris123
  • 231
  • 1
  • 2
  • 7
13
votes
5 answers

SQL Server 2008 Full Text Index Never Seems to Complete

Our website has a SQL Server 2008 R2 Express Edition database with full-text indexing for our website search. Each time a new record is added or updated in one of the indexed tables, the indexing process never seems to complete. I have been…
1
2 3
41 42