Natural language processing in Bash (2023)

A Bash toolchain can generate random prose using a classic NLP technique, the n-gram language model. With the novel Moby-Dick as the training corpus, the text is preprocessed, its words are extracted, and bigrams (pairs of adjacent words) are built for text generation. Surprisingly, just a few lines of Bash are enough to mimic Herman Melville’s style: shuffling the bigrams and selecting common ones yields text that closely resembles his writing. The same toolchain also handles trigrams for more complex text generation. Generation is iterative and produces unique, interesting sentences from the initial words provided. The linked post walks through the toolchain step by step.
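The bigram pipeline described above can be sketched roughly as follows. This is not the linked post's code, only a minimal illustration under stated assumptions: the function names (words, bigrams, next_word, generate) are hypothetical, and shuf is assumed to be available from GNU coreutils. Picking a random line among all recorded successors of a word means that more common bigrams are chosen more often.

```bash
#!/usr/bin/env bash
# Minimal bigram text generator (illustrative sketch; all names are hypothetical).

# Read text on stdin and print its words in lowercase, one per line.
words() {
  tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# Read one word per line and print each adjacent pair as "w1 w2".
bigrams() {
  awk 'NR > 1 { print prev, $0 } { prev = $0 }'
}

# Print one randomly chosen successor of word $1 from bigram file $2.
# Prints nothing if the word never appears as a first element.
next_word() {
  awk -v w="$1" '$1 == w { print $2 }' "$2" | shuf -n 1
}

# Print up to $3 words, starting from word $1, using bigram file $2.
# The body runs in a subshell so its variables stay local.
generate() (
  word=$1
  out=$1
  i=1
  while [ "$i" -lt "$3" ]; do
    word=$(next_word "$word" "$2")
    [ -n "$word" ] || break   # stop when the current word has no successor
    out="$out $word"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)
```

Assuming the novel has been saved locally as moby-dick.txt (a placeholder file name), running `words < moby-dick.txt | bigrams > bigrams.txt` followed by `generate whale bigrams.txt 12` would print up to twelve words starting from "whale". The output differs on every run because each successor is drawn at random.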

https://massimo-nazaria.github.io/nlp.html
