ZipLM: Gzip-Backed Language Model

The ZipLM language model is a small, mildly interesting tool that uses the compressors built into Python's standard library as language models. You can train the model on a text corpus, or run it with no training data at all to sample the patterns that gzip itself prefers. The model can also assign a probability to a given sequence. Beyond gzip, bz2 and lzma can be tried as language models by passing them as the compressor argument. The model works by converting code lengths into probabilities, following the general equivalence between probability distributions and codes: a string that compresses to k bits is assigned probability proportional to 2^-k. While the output is far from perfect, it is still recognizably language-like.
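The core idea can be sketched in a few lines, independent of the repo's actual API. This is a minimal illustration, not ZipLM's implementation: the function names (`code_length`, `next_char_probs`) and the candidate vocabulary are made up for the example. It scores each candidate next character by the extra bits gzip needs to compress it after the training data and prefix, then converts those code lengths to probabilities via p ∝ 2^(-extra bits).

```python
import gzip


def code_length(data: bytes) -> float:
    # Compressed size in bits serves as an approximate code length.
    return 8 * len(gzip.compress(data))


def next_char_probs(training: str, prefix: str, vocab: str) -> dict:
    # Extra bits needed to encode each candidate character after the
    # training data and prefix; shorter codes mean higher probability.
    base = code_length((training + prefix).encode())
    logits = {
        c: -(code_length((training + prefix + c).encode()) - base)
        for c in vocab
    }
    # Convert code lengths to probabilities: p is proportional to
    # 2 ** (-extra bits), normalized over the candidate vocabulary.
    total = sum(2.0 ** v for v in logits.values())
    return {c: (2.0 ** v) / total for c, v in logits.items()}


probs = next_char_probs("the cat sat on the mat. ", "the c", "abcdefgh ")
```

Swapping in `bz2.compress` or `lzma.compress` for `gzip.compress` gives the bz2 and lzma variants mentioned above.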

https://github.com/Futrell/ziplm
