Information Retrieval
Boolean retrieval
The simplest form of information retrieval. Information retrieval needs to be fast and accurate, and often must rank the results by their relevance to the query.
- Naive linear search or grepping
- Incidence matrix (indexing)
  - document vs. term matrix
  - is sparse, i.e. most elements are 0
- Inverted index
  - data structures: variable-length array vs. linked list
    - a variable-length array is better when documents are not frequently updated
    - linked lists are more efficient when frequent updates are required
    - a hybrid representation can also be used
  - compact representation of the incidence matrix
- Intersect operation
  - process the postings lists in ascending order of size (shortest first)
  - an intersection is never longer than its shortest input, so after every merge the intermediate list is as short as possible
- Boolean operations
  - AND, OR, NOT
  - proximity
- Want to do
  - use natural language instead of specific queries
  - change the postings of the inverted index to accommodate “proximity” operations
  - rank the retrieved results
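
The inverted index and intersect notes above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the notes: documents are split on whitespace and lowercased, postings are plain sorted Python lists (the variable-length array option), and the names build_inverted_index, intersect and boolean_and, as well as the sample documents, are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive tokenization by lowercasing and splitting on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping only IDs present in both."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, terms):
    """AND query: intersect the postings lists, shortest list first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = list(postings[0])
    for plist in postings[1:]:
        if not result:
            break  # nothing left to match, so stop early
        result = intersect(result, plist)
    return result


# Hypothetical sample collection: document ID -> text.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_inverted_index(docs)
print(boolean_and(index, ["home", "sales", "july"]))  # [2, 3, 4]
```

Starting from the shortest postings list means every intermediate result is bounded by that list's length, which is the reason for the ascending-size ordering noted under the intersect operation.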