
I have been working on a completely offline dictionary built from the Wiktionary XML dumps. The dumps themselves are about 10 MB, but when I convert them into an index with a search engine indexer (I use the Whoosh search engine in Python), the complete index comes to about 250 MB. I think that would be difficult to distribute: it could be zipped, but it still won't come anywhere near 10 MB. Indexing also takes about 1 hour on my system, so building the index while installing the software on a user's PC would be tedious.

So I am looking for an alternative way of storing the words and meanings for the dictionary. What would be a better searchable solution? Maybe some sort of database that produces lightweight files?

Or are search engine indexes better than databases?

1 Answer


Have a look at the Directed Acyclic Word Graph (DAWG) data structure, which is designed to be a highly space-economical way to store dictionaries. DAWGs are commonly used on mobile phones, where economizing storage space is important.
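To make the idea concrete, here is a minimal Python sketch of how such a graph saves space. It is only an illustration under my own assumptions, not a production implementation: the names `Node`, `build_trie`, `minimize`, `count_nodes`, and `contains` are hypothetical helpers I made up for this example. It builds an ordinary trie and then merges identical subtrees, so shared prefixes and shared suffixes are each stored once. Note that this stores only the set of words; the definitions would still need to be kept separately, for example keyed by word.

```python
class Node:
    """One state in the graph: outgoing letter edges plus an end-of-word flag."""
    __slots__ = ("children", "final")

    def __init__(self):
        self.children = {}
        self.final = False


def build_trie(words):
    """Build a plain trie, which shares prefixes only."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True
    return root


def minimize(node, registry):
    """Merge equivalent subtrees bottom-up and return the canonical node.

    Two nodes are equivalent when they have the same end-of-word flag and
    the same letter edges leading to the same canonical children.
    """
    for ch, child in list(node.children.items()):
        node.children[ch] = minimize(child, registry)
    key = (node.final,
           tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
    return registry.setdefault(key, node)


def count_nodes(root):
    """Count distinct nodes reachable from root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.children.values())
    return len(seen)


def contains(root, word):
    """Exact-match lookup: follow one edge per letter."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.final


words = ["tap", "taps", "top", "tops"]
trie = build_trie(words)
nodes_before = count_nodes(trie)
dawg = minimize(trie, {})
nodes_after = count_nodes(dawg)
```

For a real word list you would more likely use an existing library than this sketch, and you would serialize the minimized graph once at build time so that nothing has to be indexed on the user's machine.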

Robert Harvey