Exploratory Data Analysis Using Awk

In the spring of 2023, Brian Kernighan co-taught a course called Literature as Data, which aimed to combine literary study with computing. The goal was to teach students enough computing skills to explore datasets that intrigued them. They used Unix command-line tools and Awk, which proved useful in the initial stages of exploring a new dataset. The course focused on analyzing a collection of 18th-century sonnets, and even simple checks turned up surprises, such as "sonnets" that did not have the expected 14 lines. This data validation revealed anomalies and raised questions about the authors and about how the dataset had been created. While Awk was useful for this preliminary analysis, Python and its libraries were eventually used for more advanced work.

https://awk.dev/eda.html
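As a rough illustration of the line-count check described above, here is a minimal Awk sketch wrapped in a shell function. It assumes a hypothetical plain-text layout: one poem line per input line, with sonnets separated by blank lines. The function name `check_sonnets` and that layout are illustrative assumptions, not the course's actual data format or code.

```shell
# Hypothetical helper: report every "sonnet" whose line count is not 14.
# Assumed input layout: one poem line per input line, with sonnets
# separated by one or more blank lines. Reads files given as arguments,
# or standard input if none are given.
check_sonnets() {
  awk '
    NF > 0 { n++; next }            # non-blank line: part of the current sonnet
    n > 0  { report(); n = 0 }      # blank line after a sonnet: close it out
    END    { if (n > 0) report() }  # the last sonnet may not end with a blank line
    function report() {
      count++                       # sonnets are numbered in input order
      if (n != 14)
        printf("sonnet %d: %d lines\n", count, n)
    }
  ' "$@"
}
```

For example, `{ seq 14; echo; seq 13; } | check_sonnets` treats the input as two "sonnets" and prints `sonnet 2: 13 lines`. Counting non-blank lines one at a time, rather than using Awk's paragraph mode (`RS = ""`), avoids depending on how a particular Awk implementation handles the final newline of the last record.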
