iDedup: Latency-aware, inline data deduplication for primary storage

Date

February 14, 2012

Author

Kiran Srinivasan, Tim Bisson, Garth Goodson, and Kaladhar Voruganti.

Deduplication technologies are increasingly being deployed to reduce cost and increase space efficiency in corporate data centers. However, prior research has not applied deduplication techniques inline to the request path for latency-sensitive, primary workloads. This is primarily due to the extra latency these techniques introduce. Inherently, deduplicating data on disk causes fragmentation that increases seeks for subsequent sequential reads of the same data, thus increasing latency. In addition, deduplicating data requires extra disk IOs to access on-disk deduplication metadata. In this paper, we propose iDedup, an inline deduplication solution for primary workloads that minimizes these extra IOs and seeks.
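The fragmentation effect described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a "seek" is counted whenever two consecutively read blocks are not physically adjacent, and the block numbers and the `count_seeks` helper are hypothetical, not taken from the paper.

```python
def count_seeks(physical_blocks):
    """Count discontinuities when the blocks are read in the given order."""
    seeks = 0
    for prev, cur in zip(physical_blocks, physical_blocks[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks


# A file of 8 logical blocks stored contiguously on disk (made-up numbers).
contiguous = [100, 101, 102, 103, 104, 105, 106, 107]

# The same file after two isolated blocks were deduplicated against
# copies that live elsewhere on disk (again, made-up numbers).
deduped = [100, 101, 500, 103, 104, 720, 106, 107]

print(count_seeks(contiguous))  # 0
print(count_seeks(deduped))     # 4
```

In this model, each isolated deduplicated block costs two extra seeks on a sequential read: one to reach the shared copy and one to return.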

Our algorithm is based on two key insights from real-world workloads: i) spatial locality exists in duplicated primary data; and ii) temporal locality exists in the access patterns of duplicated data. Using the first insight, we selectively deduplicate only sequences of disk blocks. This reduces fragmentation and amortizes the seeks caused by deduplication. The second insight allows us to replace the expensive, on-disk deduplication metadata with a smaller, in-memory cache. These techniques enable us to trade off capacity savings for performance, as demonstrated in our evaluation with real-world workloads. Our evaluation shows that iDedup achieves 60-70% of the maximum deduplication with less than 5% CPU overhead and a 2-4% latency impact.
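The two ideas above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `FingerprintCache`, `write_blocks`, and `threshold`, the use of SHA-256 as the fingerprint, and the LRU eviction policy are all assumptions made for this example. Duplicates are shared only when they form a run of at least `threshold` blocks that are also consecutive on disk; shorter runs are written as fresh copies. The fingerprint metadata is kept only in a bounded in-memory cache.

```python
import hashlib
from collections import OrderedDict


class FingerprintCache:
    """Bounded in-memory LRU map: block fingerprint -> disk block number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def lookup(self, fp):
        dbn = self._entries.get(fp)
        if dbn is not None:
            self._entries.move_to_end(fp)  # mark as recently used
        return dbn

    def insert(self, fp, dbn):
        self._entries[fp] = dbn
        self._entries.move_to_end(fp)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


def fingerprint(block):
    # Assumption: SHA-256 stands in for whatever fingerprint a real system uses.
    return hashlib.sha256(block).digest()


def write_blocks(blocks, cache, next_free, threshold):
    """Place incoming blocks; return (placement, next_free)."""
    placement = []  # disk block number chosen for each incoming block
    run = []        # cached disk block numbers of the current duplicate run

    def flush_run():
        nonlocal next_free
        if len(run) >= threshold:
            placement.extend(run)        # long enough: share existing blocks
        else:
            for _ in run:                # too short: write fresh copies
                placement.append(next_free)
                next_free += 1
        run.clear()

    for block in blocks:
        fp = fingerprint(block)
        dbn = cache.lookup(fp)
        if dbn is not None and (not run or dbn == run[-1] + 1):
            run.append(dbn)              # duplicate that extends the run
            continue
        flush_run()
        if dbn is not None:
            run.append(dbn)              # duplicate that starts a new run
        else:
            placement.append(next_free)  # unique block: allocate and cache it
            cache.insert(fp, next_free)
            next_free += 1
    flush_run()
    return placement, next_free


cache = FingerprintCache(capacity=16)
first, free = write_blocks([b"A", b"B", b"C", b"D"], cache, 0, threshold=2)
second, free = write_blocks([b"X", b"A", b"B", b"C", b"Y", b"D"], cache, free, threshold=2)
print(first)   # [0, 1, 2, 3]
print(second)  # [4, 0, 1, 2, 5, 6]
```

In the second write, the run `A, B, C` is shared because it meets the threshold and maps to consecutive disk blocks, while the lone duplicate `D` is written again. Raising `threshold` gives up some capacity savings in exchange for less fragmentation, which is the trade-off the abstract describes.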

In Proceedings of the USENIX Conference on File and Storage Technologies 2012 (FAST '12)

Resources

A copy of the paper is attached to this posting. The FAST '12 conference site has an MP4 video of presentations made at FAST '12.

idedup-FAST12.pdf