In mid-December, after weeks of intense performance analysis, the author was surprised to learn that another company had already found the fix for the slow cluster. The story began in 2023, when a company approached Clyso about moving its HDD-backed Ceph cluster to a 10 petabyte NVMe deployment. The cluster had to be spread across 17 racks, subject to specific power, cooling, and density requirements and vendor preferences. Clyso helped design a Dell architecture that was 13% cheaper than the original configuration. The cluster was benchmarked with librbd-backed FIO tests. Testing initially exposed performance problems, but after several fixes the cluster performed well. The author also argues for rethinking conventional wisdom about the number of PGs per OSD in order to achieve higher performance.
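To make the PGs-per-OSD point concrete, here is a minimal sketch of how that ratio is commonly estimated for replicated pools: the sum of each pool's PG count times its replica count, divided by the number of OSDs. The function name `pgs_per_osd` and all numbers are hypothetical illustrations, not values from the article.

```python
def pgs_per_osd(pools, num_osds):
    """Estimate the average number of PG replicas hosted by each OSD.

    pools: iterable of (pg_num, replica_size) tuples, one per pool.
    num_osds: number of OSDs the pools are spread across.
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    # Each PG is stored replica_size times, so it occupies that many OSD slots.
    total_pg_replicas = sum(pg_num * size for pg_num, size in pools)
    return total_pg_replicas / num_osds


# Hypothetical cluster: a single 3x replicated pool on 100 OSDs.
low = pgs_per_osd([(4096, 3)], num_osds=100)
high = pgs_per_osd([(16384, 3)], num_osds=100)
print(f"pg_num=4096  -> {low:.2f} PGs per OSD")
print(f"pg_num=16384 -> {high:.2f} PGs per OSD")
```

Raising `pg_num` on the pool is what moves this ratio. Whether a higher ratio actually helps depends on the hardware and workload, so treat the sketch as arithmetic only, not as a tuning recommendation.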
https://ceph.io/en/news/blog/2024/ceph-a-journey-to-1tibps/