First personal search engine prototype

The author describes a prototype of a personal search engine (PSE) built from Bash scripts and off-the-shelf tools. URLs were pulled from the SQLite databases kept by Firefox and newsboat, the pages were downloaded with wget, and the result was indexed with PageFind. The prototype gave promising results with a corpus of just 16,000 pages. The post includes the Bash scripts used to collect the URLs, harvest the content, and launch the search engine. It also covers the prototype's limitations: harvesting is slow because of stale links, and the corpus needs to grow. Possible ways to expand it include mirroring small websites, extracting text from PDFs, and incorporating browser history and bookmarks.
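
The URL-gathering step can be illustrated with a short sketch. This is not the author's code (the post uses Bash scripts); it is a Python approximation that assumes Firefox stores page URLs in the `url` column of a `moz_places` table and that newsboat stores feed item links in the `url` column of an `rss_item` table. The function name `collect_urls` and the two queries are hypothetical choices made for illustration.

```python
import sqlite3

# Assumed schemas (not taken from the post):
#   Firefox  places.sqlite -> table moz_places, column url
#   newsboat cache.db      -> table rss_item,   column url
FIREFOX_QUERY = "SELECT url FROM moz_places"
NEWSBOAT_QUERY = "SELECT url FROM rss_item"


def collect_urls(firefox_db, newsboat_db):
    """Return a sorted, de-duplicated list of http(s) URLs from both databases."""
    urls = set()
    for path, query in ((firefox_db, FIREFOX_QUERY), (newsboat_db, NEWSBOAT_QUERY)):
        conn = sqlite3.connect(path)
        try:
            for (url,) in conn.execute(query):
                # Skip NULLs and internal entries such as Firefox "place:" URIs.
                if url and url.startswith(("http://", "https://")):
                    urls.add(url)
        finally:
            conn.close()
    return sorted(urls)
```

Writing the returned list to a file, one URL per line, would give a downloader such as wget something to read as its input. Running the queries against copies of the databases is a reasonable precaution, since the live files may be locked while the browser or feed reader is open.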

https://rsdoiel.github.io/blog/2023/03/10/first-prototype-pse.html
