The purpose of GNU mifluz is to provide a C library to build and query a full text inverted index. The index is dynamically updatable, scalable (up to 1 TB indexes), uses a controlled amount of memory, shares index files and memory cache among processes or threads, and compresses index files to 50% of the raw data. The structure of the index is configurable at runtime and allows inclusion of relevance ranking information. The query functions do not need to load all the occurrences of a searched term. They consume very few resources, and many searches can be run in parallel.
GNU mifluz has been designed with the following upper limits in mind: 500 million documents, 50 billion words, 20 million document updates per day.
GNU mifluz has two main characteristics: it is very simple, and its index takes 50% of the size of the indexed text. It is simple because it provides only a few basic functions. It does not contain document parsers (HTML, PDF, etc.). It does not contain a full text query parser. It does not provide result display functions or other user-friendly features. It only provides functions to store word occurrences and retrieve them.
The advantage GNU mifluz has over most full text indexing systems is that it is fully dynamic (update, delete, insert), uses only a controlled amount of memory while resolving a query, has higher upper limits, and has a simple storage scheme. These advantages come at the cost of consuming more disk space.
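To make the idea of storing and retrieving word occurrences concrete, here is a minimal in-memory sketch of an inverted index in C. It is not the GNU mifluz API: the names ToyIndex, Occurrence, toy_insert, toy_lookup and toy_free are hypothetical, and the sorted-array layout is an assumption chosen for brevity, with none of the on-disk storage, compression or shared caching described above. It only illustrates the general technique: occurrences are kept sorted by (word, document, position), so all the occurrences of a word are contiguous and a lookup can locate them without scanning the whole index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One word occurrence: a word, the document it appears in, and its position.
   Hypothetical layout for illustration; words are truncated to 31 bytes. */
typedef struct {
    char word[32];
    unsigned doc;
    unsigned pos;
} Occurrence;

/* Toy index: a growable array kept sorted by (word, doc, pos). */
typedef struct {
    Occurrence *items;
    size_t len, cap;
} ToyIndex;

static void make_key(Occurrence *o, const char *word, unsigned doc, unsigned pos) {
    memset(o, 0, sizeof *o);
    strncpy(o->word, word, sizeof o->word - 1);
    o->doc = doc;
    o->pos = pos;
}

static int occ_cmp(const Occurrence *a, const Occurrence *b) {
    int c = strcmp(a->word, b->word);
    if (c != 0) return c;
    if (a->doc != b->doc) return a->doc < b->doc ? -1 : 1;
    if (a->pos != b->pos) return a->pos < b->pos ? -1 : 1;
    return 0;
}

/* Index of the first entry that is not less than key (binary search). */
static size_t lower_bound(const ToyIndex *ix, const Occurrence *key) {
    size_t lo = 0, hi = ix->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (occ_cmp(&ix->items[mid], key) < 0) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Insert one occurrence in sorted order. Returns 0, or -1 if out of memory. */
int toy_insert(ToyIndex *ix, const char *word, unsigned doc, unsigned pos) {
    assert(ix != NULL && word != NULL);
    if (ix->len == ix->cap) {
        size_t ncap = ix->cap ? ix->cap * 2 : 16;
        Occurrence *p = realloc(ix->items, ncap * sizeof *p);
        if (p == NULL) return -1;
        ix->items = p;
        ix->cap = ncap;
    }
    Occurrence key;
    make_key(&key, word, doc, pos);
    size_t at = lower_bound(ix, &key);
    /* Shift the tail one slot to the right to make room. */
    memmove(&ix->items[at + 1], &ix->items[at], (ix->len - at) * sizeof key);
    ix->items[at] = key;
    ix->len++;
    return 0;
}

/* Return a pointer to the first occurrence of word and store the number of
   occurrences in *n. Returns NULL and sets *n to 0 if the word is absent. */
const Occurrence *toy_lookup(const ToyIndex *ix, const char *word, size_t *n) {
    assert(ix != NULL && word != NULL && n != NULL);
    Occurrence key;
    make_key(&key, word, 0, 0);           /* smallest possible key for word */
    size_t first = lower_bound(ix, &key);
    size_t end = first;
    while (end < ix->len && strcmp(ix->items[end].word, key.word) == 0) end++;
    *n = end - first;
    return *n ? &ix->items[first] : NULL;
}

/* Release the memory held by the index. */
void toy_free(ToyIndex *ix) {
    free(ix->items);
    ix->items = NULL;
    ix->len = ix->cap = 0;
}
```

A caller would start from a zero-initialized ToyIndex, call toy_insert once per word occurrence found in a document, and call toy_lookup to get the contiguous run of occurrences for a term. Everything a real application needs on top of this (parsing documents, parsing queries, ranking and displaying results) is left to the caller, which matches the division of labor described above.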