Data diffs: Algorithms for explaining what changed in a dataset (2022)

The article introduces explanation algorithms for data analysis. Most reporting in the data world starts with the question "how much?", but eventually leads to the question "why?". Explanation algorithms address that second question by surfacing high-likelihood explanations for a change in data over time. The article covers two approaches, Scorpion and DIFF. Scorpion operates on aggregates and uses visualizations to highlight outliers for further explanation. DIFF, by contrast, is a database operator expressed in SQL that compares different groups to find explanations. The author has implemented an open source version of DIFF in their datools library.
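
To make the idea of comparing groups concrete, here is a minimal Python sketch of a DIFF-style comparison. It is not the datools API and not the author's implementation. The function name diff_explanations, the min_support parameter, and the choice of risk ratio as the ranking metric are all assumptions made for illustration. The sketch only considers single attribute=value pairs, and it scores each pair by how much more likely a row is to fall in the "test" group when it has that value than when it does not.

```python
from collections import Counter
from math import inf


def diff_explanations(test_rows, control_rows, attributes, min_support=0.1):
    """Rank attribute=value pairs by over-representation in test_rows.

    Hypothetical helper for illustration only; not the datools API.
    Returns (pair, support, risk_ratio) tuples, highest risk ratio first.
    """
    # Count how often each (attribute, value) pair appears in each group.
    test_counts = Counter((a, row[a]) for row in test_rows for a in attributes)
    control_counts = Counter((a, row[a]) for row in control_rows for a in attributes)
    n_test, n_control = len(test_rows), len(control_rows)

    results = []
    for pair, in_test in test_counts.items():
        # Support: share of test rows that carry this attribute value.
        support = in_test / n_test
        if support < min_support:
            continue
        in_control = control_counts.get(pair, 0)
        out_test = n_test - in_test
        out_control = n_control - in_control
        # Risk ratio (assumed metric): P(test | has value) / P(test | lacks value).
        risk_with = in_test / (in_test + in_control)
        if out_test == 0:
            risk_ratio = inf  # every test row has this value
        else:
            risk_ratio = risk_with / (out_test / (out_test + out_control))
        results.append((pair, support, risk_ratio))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Made-up sample data: rows from a period under investigation vs. a baseline.
test = [
    {"browser": "chrome", "country": "US"},
    {"browser": "chrome", "country": "DE"},
    {"browser": "chrome", "country": "US"},
    {"browser": "firefox", "country": "US"},
]
control = [
    {"browser": "firefox", "country": "US"},
    {"browser": "firefox", "country": "DE"},
    {"browser": "chrome", "country": "US"},
    {"browser": "firefox", "country": "US"},
]

for pair, support, risk_ratio in diff_explanations(test, control, ["browser", "country"]):
    print(pair, round(support, 2), round(risk_ratio, 2))
```

A pair with a risk ratio well above 1 and reasonable support is a candidate explanation for what distinguishes the test group. A fuller version would also consider combinations of attributes, which this sketch leaves out.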
