Abstract:
The compressed inverted index is one of the most widely-used techniques for full-text retrieval, which can not only save disk space but also reduce the number of memory accesses. Combining the ideas of compressing offset and chunking integer sequences, we propose the GR-OP compression algorithm, Simple-OP compression algorithm and PFOR-OP compression algorithm based on the offset within a block. These algorithms are based on dynamic chunking and compressed using Golomb-Rice, Simple9 and PFOR algorithm respectively. Experiments show that the new algorithms based on the intra-block offset have better compression performance than the original docid sequence and d-gap sequence. The new algorithms improve the BPI and random query time, while the compression time is longer due to the increased complexity of the algorithms. The new algorithms are meaningful for practical applications.