Conference paper
Space-Efficient Re-Pair Compression
Re-Pair [5] is an effective grammar-based compression scheme achieving strong compression rates in practice. Let n, σ, and d be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and 5n + 4σ2 + 4d + √n words of working space on top of the text.
In this work, we propose two algorithms improving on the space of their original solution. Our model assumes a memory word of [log2 n] bits and a re-writable input text composed by n such words. Our first algorithm runs in expected O(n/ε) time and uses (1+ε)n+√n words of space on top of the text for any parameter 0 <∊ ≤ 1 chosen in advance.
Our second algorithm runs in expected O(n log n) time and improves the space to n + √n words.
Language: | English |
---|---|
Publisher: | IEEE |
Year: | 2017 |
Pages: | 171-80 |
Proceedings: | 2017 Data Compression ConferenceData Compression Conference |
Series: | Data Compression Conference. Proceedings |
ISBN: | 1509067213 , 1509067221 , 9781509067213 and 9781509067220 |
ISSN: | 23750383 , 10680314 and 23750359 |
Types: | Conference paper |
DOI: | 10.1109/DCC.2017.24 |
ORCIDs: | Bille, Philip , Gørtz, Inge Li and Prezza, Nicola |
(1+ε)n+√n words Context Data compression Data structures Dictionaries Grammar Random access memory Re-Pair Re-Pair grammar Time-frequency analysis alphabet size compression compression rates computational complexity data compression dictionary size expected O(n log n) time expected O(n/ε) time expected linear time grammar grammar-based compression grammars memory word re-writable input text recursive pairing space-efficient Re-Pair compression text length