The author shares their experience of solving a complex aggregation problem with a cyber intrusion dataset in R. They explain the challenge of aggregating statistics for different subsets of data and provide examples of how certain statistics can be aggregated. They then describe their initial inefficient algorithm and the optimization considerations they had to take into account. The author admits to not understanding how memory management works in R, which led to inefficient code. They explain the concept of copy-on-write semantics in R and demonstrate its effect through the use of the ‘inspect’ function. Finally, the author shares their improved algorithm for solving the aggregation problem.
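As a rough illustration of what it means for a statistic to be aggregatable across subsets, here is a minimal sketch in R. The data, the `summarise_subset` and `combine` helpers, and the choice of the mean as the example statistic are all assumptions made for illustration and are not taken from the post.

```r
# Hypothetical toy data standing in for two subsets of a larger dataset.
subset_a <- c(2, 4, 6)
subset_b <- c(10, 20)

# Keep a (sum, count) pair per subset; pairs combine by simple addition.
summarise_subset <- function(x) list(sum = sum(x), n = length(x))
combine <- function(s, t) list(sum = s$sum + t$sum, n = s$n + t$n)

agg <- combine(summarise_subset(subset_a), summarise_subset(subset_b))
combined_mean <- agg$sum / agg$n

# The combined mean matches the mean computed over the pooled data.
stopifnot(isTRUE(all.equal(combined_mean, mean(c(subset_a, subset_b)))))
```

A mean can be rebuilt this way because sums and counts add up across subsets. A median, by contrast, generally cannot be recovered from per-subset medians alone.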
https://franklin.dyer.me/post/213
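The copy-on-write behaviour mentioned in the summary can be sketched with a few lines of R. This sketch uses base R's `tracemem()` rather than the ‘inspect’ function the post relies on, and it assumes an R build with memory profiling enabled (checked via `capabilities("profmem")`); the variable names are illustrative only.

```r
x <- c(1, 2, 3)
y <- x  # no copy yet: both names refer to the same underlying vector

# Assumption: tracemem() is only available when R was built with
# memory profiling, so guard the call.
can_trace <- isTRUE(capabilities("profmem"))
if (can_trace) tracemem(x)

# Modifying y forces R to duplicate the shared vector first;
# with tracing on, R reports the duplication at this point.
y[1] <- 100

if (can_trace) untracemem(x)

print(x)  # x keeps its original values
print(y)  # only y reflects the change
```

The assignment `y <- x` is cheap because nothing is duplicated until one of the two names is modified. Code that modifies a large shared object inside a loop may therefore trigger far more copying than it appears to.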