FDT - Deduplication Reimagined In OpenZFS
Storage Gaga, Thursday, April 18th, 2024
Deduplication in OpenZFS has been a bugbear for some years now. As data sets get larger, they have become even more difficult in using the present DeDuplication Table (DDT) method. Deduplication in OpenZFS is often derided as overwhelming and sluggish in performance.
Moreover, there is a common folklore passed on and on about allocating 5GB of RAM for every 1TB to dedupe in OpenZFS. I don't know where this 'sizing' came about. Probably derived from something Jeff Bonwick wrote back in the early days of ZFS. But there is some truth to this 'rule of thumb', commonly passed around in the TrueNAS circles.
Nevertheless, given the exponential growth of data, and the advancement of processing power in modern day computer systems, the OpenZFS development community has decided to revamp the DDT method. Several prominent luminaries from iXsystems, Klara Systems and the OpenZFS community have got together in mid-2023 to develop FDT or Fast Dedupe Table. And we got to see FDT announced to the world in the most recent OpenZFS Developer Summit in November 2023.