Data Deduplication Overview
Applies to: Windows Server (Semi-Annual Channel), Windows Server 2016
What is Data Deduplication?
Data Deduplication, often called Dedup for short, is a feature of Windows Server 2016 that can help reduce the impact of redundant data on storage costs. When enabled, Data Deduplication optimizes free space on a volume by examining the volume's data for duplicated portions. Duplicated portions of the volume's dataset are stored once and are (optionally) compressed for additional savings. Data Deduplication optimizes redundancies without compromising data fidelity or integrity. More information about how Data Deduplication works can be found in the 'How does Data Deduplication work?' section of the Understanding Data Deduplication page.
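The core idea described above (find duplicated portions, store each one once, optionally compress it) can be illustrated with a toy chunk store in Python. This is a minimal sketch under stated assumptions, not how Windows Server implements the feature: it splits data into fixed-size 4 KiB chunks, identifies them by SHA-256 digest, and uses `zlib` for the optional compression, all chosen here only for simplicity. The `ChunkStore` class, its methods, and `CHUNK_SIZE` are hypothetical names.

```python
import hashlib
import zlib

# Hypothetical fixed chunk size, chosen for illustration only.
CHUNK_SIZE = 4096


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, optionally compressed."""

    def __init__(self, compress: bool = True) -> None:
        self.compress = compress
        # SHA-256 hex digest -> stored chunk bytes (compressed if enabled)
        self.chunks: dict[str, bytes] = {}
        # Total size of all data written, before deduplication
        self.logical_bytes = 0

    def add_file(self, data: bytes) -> list[str]:
        """Split data into chunks, keep only unseen chunks, and return the file's chunk map."""
        chunk_map = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # First time this content is seen: store it once.
                self.chunks[digest] = zlib.compress(chunk) if self.compress else chunk
            # A duplicate chunk only adds a reference, not another copy.
            chunk_map.append(digest)
        self.logical_bytes += len(data)
        return chunk_map

    def read_file(self, chunk_map: list[str]) -> bytes:
        """Reassemble the original bytes from a chunk map."""
        parts = []
        for digest in chunk_map:
            stored = self.chunks[digest]
            parts.append(zlib.decompress(stored) if self.compress else stored)
        return b"".join(parts)

    def stored_bytes(self) -> int:
        """Bytes actually kept in the chunk store."""
        return sum(len(chunk) for chunk in self.chunks.values())


# Usage: two "daily snapshots" that differ only in their last few bytes.
store = ChunkStore(compress=False)
day1_data = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # four distinct 4 KiB chunks
day2_data = day1_data[:-10] + b"0123456789"                      # only the last chunk changes

day1_map = store.add_file(day1_data)
day2_map = store.add_file(day2_data)

savings = 1 - store.stored_bytes() / store.logical_bytes
print(f"{len(store.chunks)} unique chunks, savings {savings:.1%}")
```

Running it prints `5 unique chunks, savings 37.5%`: 32 KiB of logical data is held in five 4 KiB chunks, and both snapshots can still be read back byte for byte. Compression is disabled in the usage example only to keep the arithmetic easy to follow.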
KB4025334 contains a rollup of fixes for Data Deduplication, including important reliability fixes. We strongly recommend installing it when using Data Deduplication with Windows Server 2016.
Why is Data Deduplication useful?
Data Deduplication helps storage administrators reduce costs that are associated with duplicated data. Large datasets often have a lot of duplication, which increases the costs of storing the data. For example:
- User file shares may have many copies of the same or similar files.
- Virtualization guests might be almost identical from VM to VM.
- Backup snapshots might have minor differences from day to day.
The space savings that you can gain from Data Deduplication depend on the dataset or workload on the volume. Datasets that have high duplication could see optimization rates of up to 95%, or a 20x reduction in storage utilization. The following table highlights typical deduplication savings for various content types:
| Scenario | Content | Typical space savings |
|---|---|---|
| User documents | Office documents, photos, music, videos, etc. | 30-50% |
| Deployment shares | Software binaries, cab files, symbols, etc. | 70-80% |
| Virtualization libraries | ISOs, virtual hard disk files, etc. | 80-95% |
| General file share | All the above | 50-60% |
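Savings percentages and reduction ratios describe the same result: if a fraction s of the space is saved, the data occupies (1 - s) of its original size, so the reduction ratio is 1 / (1 - s). That is why 95% savings corresponds to a 20x reduction. The short Python sketch below applies this conversion to the ranges in the table above; the `reduction_ratio` function and `TYPICAL_SAVINGS` mapping are hypothetical names, and the inputs are simply the table's own values, not new measurements.

```python
def reduction_ratio(savings: float) -> float:
    """Convert fractional space savings (for example 0.95 for 95%) into a reduction ratio."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be at least 0 and less than 1")
    # Deduplicated data occupies (1 - savings) of its original size.
    return 1.0 / (1.0 - savings)


# Low and high ends of the typical savings ranges from the table above.
TYPICAL_SAVINGS = {
    "User documents": (0.30, 0.50),
    "Deployment shares": (0.70, 0.80),
    "Virtualization libraries": (0.80, 0.95),
    "General file share": (0.50, 0.60),
}

for scenario, (low, high) in TYPICAL_SAVINGS.items():
    print(f"{scenario}: {reduction_ratio(low):.1f}x to {reduction_ratio(high):.1f}x")
```

For example, 80% savings leaves data occupying one fifth of its original space, a 5x reduction. Note that the ratio grows quickly near the top of the range: moving from 80% to 95% savings quadruples the reduction ratio.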
When can Data Deduplication be used?
General purpose file servers
General purpose file servers are general-use file servers that might host many different types of shares.
Virtualized Desktop Infrastructure (VDI) deployments
VDI servers, such as Remote Desktop Services, provide a lightweight option for organizations to provision desktops to users. Organizations may rely on such technology for many reasons.
Backup targets, such as virtualized backup applications
Backup applications, such as Microsoft Data Protection Manager (DPM), are excellent candidates for Data Deduplication because of the significant duplication between backup snapshots.
Other workloads may also be excellent candidates for Data Deduplication.