
I have to design and implement an algorithm for my university project that searches a given set of documents based on the keywords/query given. Assume that each document contains a few sentences and that these documents can be stored in a suitable data structure. When a query is made, I have to display the documents that contain the keywords. A query can contain simple logical operators such as “AND” and “OR”.

For example, assume that there are 3 documents named Doc1, Doc2 and Doc3 with this content:

  • Doc1: This is my University.
  • Doc2: My University is situated at Delhi.
  • Doc3: I like My University.

Here are the answers to some queries:

  1. "University": Doc1, Doc2, Doc3
  2. "my AND University": Doc1, Doc2, Doc3
  3. "like OR Delhi": Doc2, Doc3

Currently, what I have developed reads each file and puts its contents into a separate binary tree per document, and I've developed a function that searches the binary trees for a single word. How can I extend my search algorithm to handle queries with logical operators?
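One way to extend the per-document trees without changing the data structure is to run the single-word search once per query word for each document and combine the true/false results with the operators. A minimal sketch, assuming `java.util.TreeSet` (a red-black tree) stands in for your own binary tree and its `contains()` for your search function; the names `buildTree`, `matches` and `search` are hypothetical, words are lower-cased, and operators are applied strictly left to right with no precedence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Sketch: one sorted tree of words per document, queried document by document.
// TreeSet is an assumed stand-in for the asker's own binary tree.
class TreePerDocumentSearch {

    // Split a document into lower-cased words, dropping punctuation.
    static TreeSet<String> buildTree(String text) {
        TreeSet<String> tree = new TreeSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                tree.add(w);
            }
        }
        return tree;
    }

    // Evaluate "word (AND|OR word)*" against one tree, left to right.
    static boolean matches(TreeSet<String> tree, String query) {
        String[] tokens = query.trim().split("\\s+");
        if (tokens.length % 2 == 0) {
            throw new IllegalArgumentException("Expected word (OP word)*: " + query);
        }
        boolean result = tree.contains(tokens[0].toLowerCase(Locale.ROOT));
        // Tokens alternate: word, operator, word, operator, word ...
        for (int i = 1; i < tokens.length; i += 2) {
            boolean next = tree.contains(tokens[i + 1].toLowerCase(Locale.ROOT));
            if (tokens[i].equals("AND")) {
                result = result && next;
            } else if (tokens[i].equals("OR")) {
                result = result || next;
            } else {
                throw new IllegalArgumentException("Unknown operator: " + tokens[i]);
            }
        }
        return result;
    }

    // Return the names of the documents whose tree satisfies the query.
    static List<String> search(List<String> names, List<TreeSet<String>> trees, String query) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < trees.size(); i++) {
            if (matches(trees.get(i), query)) {
                hits.add(names.get(i));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Doc1", "Doc2", "Doc3");
        List<TreeSet<String>> trees = List.of(
                buildTree("This is my University."),
                buildTree("My University is situated at Delhi."),
                buildTree("I like My University."));
        System.out.println(search(names, trees, "like OR Delhi")); // [Doc2, Doc3]
    }
}
```

This keeps your existing trees, but every query visits every document; with many documents, a single index from words to documents avoids that scan.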

2 Answers


Hash the words into key/value pairs of (word, set of documents), so each word maps to the set of documents that contain it. When you search, look up the set for each query word in the hash table. Then take the union of the sets for OR and their intersection for AND.
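A minimal Java sketch of that idea, assuming words are lower-cased and documents are identified by name; the class `InvertedIndex` and its methods are hypothetical names. `HashMap` holds the word-to-documents mapping, `retainAll` performs the intersection for AND, and `addAll` performs the union for OR.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: each word maps to the set of documents containing it.
// OR is a set union, AND is a set intersection.
class InvertedIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    // Add every lower-cased word of a document under that document's name.
    void addDocument(String name, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(name);
            }
        }
    }

    // Documents containing the word, returned as a copy so callers can modify it.
    Set<String> lookup(String word) {
        return new TreeSet<>(index.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of()));
    }

    // AND: keep only the documents present in both sets.
    Set<String> and(String a, String b) {
        Set<String> result = lookup(a);
        result.retainAll(lookup(b));
        return result;
    }

    // OR: documents present in either set.
    Set<String> or(String a, String b) {
        Set<String> result = lookup(a);
        result.addAll(lookup(b));
        return result;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument("Doc1", "This is my University.");
        idx.addDocument("Doc2", "My University is situated at Delhi.");
        idx.addDocument("Doc3", "I like My University.");
        System.out.println(idx.lookup("University"));    // [Doc1, Doc2, Doc3]
        System.out.println(idx.and("my", "University")); // [Doc1, Doc2, Doc3]
        System.out.println(idx.or("like", "Delhi"));     // [Doc2, Doc3]
    }
}
```

Because the sketch lower-cases every word, "My" and "my" end up under the same key. `lookup` copies the stored set so that `retainAll` and `addAll` never modify the index itself.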

Skorpius

You could go with a dictionary that maps each word to the set of IDs of the documents containing it:

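// key: word, value: IDs of the documents containing that word (needs System.Collections.Generic and System.Linq)
Dictionary<string, HashSet<int>> se = new Dictionary<string, HashSet<int>>();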

But I thought you had to build it from scratch.

I don't know Java; there the equivalent would be a HashMap<String, HashSet<Integer>> (Java also has the older Hashtable class).

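var andDocs = se["my"].Intersect(se["university"]);  // AND: documents containing both words (keys assumed lower-case)

var orDocs = se["my"].Union(se["delhi"]);             // OR: documents containing either word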
paparazzo
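A Java counterpart of the dictionary-of-sets above, extended to evaluate a whole query string rather than one hard-coded pair, might look like the sketch below. It assumes document IDs are list positions (0 for Doc1, 1 for Doc2, 2 for Doc3), that AND binds tighter than OR, and that queries have no parentheses; `QueryEvaluator`, `buildIndex` and `evaluate` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: word -> set of document IDs (HashMap/HashSet in place of C#'s
// Dictionary/HashSet), plus an evaluator for queries such as "like OR Delhi".
// Assumption: AND binds tighter than OR, and there are no parentheses.
class QueryEvaluator {

    // Build the index; a document's ID is its position in the list.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String w : docs.get(id).toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!w.isEmpty()) {
                    index.computeIfAbsent(w, k -> new HashSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Split on OR first, then on AND, so "a AND b OR c" means (a AND b) OR c.
    static Set<Integer> evaluate(Map<String, Set<Integer>> index, String query) {
        Set<Integer> result = new TreeSet<>();
        for (String orTerm : query.trim().split("\\s+OR\\s+")) {
            Set<Integer> andResult = null;
            for (String word : orTerm.trim().split("\\s+AND\\s+")) {
                Set<Integer> docs = index.getOrDefault(word.trim().toLowerCase(Locale.ROOT), Set.of());
                if (andResult == null) {
                    andResult = new TreeSet<>(docs); // copy so the index is not modified
                } else {
                    andResult.retainAll(docs);       // AND = intersection
                }
            }
            result.addAll(andResult);                // OR = union
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex(List.of(
                "This is my University.",
                "My University is situated at Delhi.",
                "I like My University."));
        System.out.println(evaluate(index, "like OR Delhi"));        // [1, 2]
        System.out.println(evaluate(index, "this AND my OR Delhi")); // [0, 1]
    }
}
```

Splitting on OR before AND is a simple way to give AND the higher precedence; supporting parentheses would need a small recursive-descent parser instead.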