Question

FelixWagner-2549 asked · DraganPenkov-4193 commented

Deduplication suddenly fails on a General Use file server with "0x80565323, 0x80565309, A required filter driver is either not installed, not loaded, or not ready for service" - Windows Server 2004

Hello,

I am writing here to find an answer to an issue that caused our newly migrated file server to lose a whole day's worth of data.

The setup:
* A two-node Storage Spaces Direct (S2D) cluster with all-SSD storage is the base for the VMs of a General Use File Server cluster

The issue:

  • At first, Dedup seemed to run fine. Data was deduplicated and reported.

  • Then files suddenly became inaccessible. Running Start-DedupJob -Type Unoptimization did not help and was terminated, or the job reported that 56k files were inaccessible.

  • The event log is full of different errors.

  • Event IDs:

    • 6144 - File inaccessible

    • 4137 - Volume not enabled for data deduplication (although Get-DedupVolume and Get-DedupStatus reported the volume differently at that time)

    • 0x80565309, A required filter driver is either not installed, not loaded, or not ready for service

    The last one made me curious. To me it means that in the beginning everything is fine: Dedup is running and the filter driver is attached to the NTFS volume. But then something happens, the filter is removed, or something happens to the System Volume Information folder, and Dedup cannot read anything anymore. Nothing was touched in that area, and all folders on the file share are accessible for SYSTEM.
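    For context, the two views can be compared side by side like this (a minimal PowerShell sketch; F: stands in for the affected volume):

    # Sketch: compare the two views of dedup state for one volume (F: is a placeholder).
    Get-DedupVolume -Volume F: | Format-List Enabled, DataAccessEnabled, SavedSpace
    Get-DedupStatus -Volume F: | Format-List LastOptimizationResult, InPolicyFilesCount, OptimizedFilesCount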

    I tried to back out of dedup via unoptimization. It only worked if I:

  • had a volume without any deduplicated files

  • removed it from the file server role

  • enabled it as a CSV

  • ran the unoptimization task

All other volumes are not recoverable; every dedup job simply fails with an error.
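For reference, the working sequence roughly corresponds to these PowerShell steps (a sketch only; the resource and volume names are placeholders):

# Sketch of the only recovery path that worked (names are placeholders).
Move-ClusterResource -Name "Cluster Disk 2" -Group "Available Storage"   # take the disk out of the file server role
Add-ClusterSharedVolume -Name "Cluster Disk 2"                           # enable it as a CSV
Start-DedupJob -Volume "C:\ClusterStorage\Volume2" -Type Unoptimization  # back out the deduplicated data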

We have other file servers on different hardware platforms where we use DFS-R. Dedup is enabled on those servers too and works fine.
The difference is that those servers (two of them are served via an SOFS role) use VHDX files for the data. Only this cluster, with the main data on top of the S2D cluster, uses the VHD Set disk type.
I have searched a lot for information and found nothing. So I am asking here whether this is a known issue between dedup and a volume that sits on a VHD Set, or whether I am overlooking some other problem.

For now the cluster is offline, so that I can take a deeper look at this issue together with Microsoft. To me this looks critical: even if this turns out to be a known unsupported configuration, the Microsoft documentation does not mention it, neither for VHD Sets nor for Data Deduplication, unless I have a blind spot for that sentence.

Hopefully I have just hit a rare bug and can help others avoid the same issue.

Kind regards

Felix






windows-server · windows-server-clustering

Hi,

We have noticed this post and are doing research on it; if we get any updates, we will report back as soon as possible.

Best Regards,
Anne


Hello Anne,

thank you, it is good to hear you are researching this.

Do you need anything from my side? I would like to set up the VHD Sets again and start rebuilding this cluster to test it more carefully.
We certainly have enough space to keep the old VHD Sets for some time, but if something like the event log or a closer look at this issue is needed, it would be good to do that sooner rather than later.

Kind regards

Felix

FelixWagner-2549 answered

As a follow-up, to give a bit more information:

Meanwhile, I came across this error, but cannot make much of it:

Data Deduplication error: Unexpected error.

Operation:
Processing deleted chunk store streams.
Indexing active chunk references.
Starting chunk store garbage collection.
Running the deduplication garbage collection job.

Context:
File name: \\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67}\System Volume Information\Dedup\ChunkStore{1B8EC874-DD08-42FE-A0FF-FAC35A2FEC9A}.ddp\Stream\00290000.00000002.ccc
Chunk store: \\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67}\System Volume Information\Dedup\ChunkStore{1B8EC874-DD08-42FE-A0FF-FAC35A2FEC9A}.ddp\Stream
Volume name: C:\ClusterStorage\Volume1 (\\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67})

Error-specific details:
Error: CDedupFilter::DeviceIoControl(\\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67}\System Volume Information\Dedup\ChunkStore{1B8EC874-DD08-42FE-A0FF-FAC35A2FEC9A}.ddp\Stream\00290000.00000002.ccc, FSCTL_DEDUP_FILE, ...), 0x80070001, Incorrect function.


The other error that I continuously receive is this one:

Failure reason: FSCTL_DEDUP_FILE.DEDUP_SET_CHUNK_STORE_CACHING_FOR_NON_CACHED_IO failed with ERROR_INVALID_FUNCTION for volume

I assume that this one is caused by the issues above.

To make sure we are on the same page at this point:
We are talking about storage that is a VHD Set. The VM with the VHD Set runs directly on an S2D HCI cluster on Windows Server 2019, which has been benchmarked at 750k IOPS and 1500 MB/s read throughput (meanwhile extended by another 10 SSDs, hence probably over one million now). The S2D is configured as nested mirror. The S2D itself looks very healthy: latency stays in the microsecond range for 99.99% of the day, the rest are short peaks of at most 3 ms, and no such peak happened at the time dedup failed. So I assume the storage subsystem is fine.
The issue lies somewhere between the Hyper-V VM, the controller driver (there are some errors inside the VM event log too) and the dedup driver.

The question remains why the chunk store gets corrupted, or at least appears corrupted to the dedup filter driver.
Can anyone assist here?





XiaoweiHe-MSFT answered DraganPenkov-4193 commented

Hi,

After some research, I found a similar reported issue, quoted below:

Symptoms
Bugcheck stop 9E or 133 occurs on a Hyper-V server with Deduplication enabled. Issues of VHD corruption have also been reported by customers using a DataCore SANsymphony storage pool when VHDs are stored on volumes with Dedup enabled.

Cause
The Dedup filter holds the MCB spinlock for too long during the periodic volume flush under high churn, which triggers this. The issue is more prominent if the VHDs of the VM are not on a CSV volume and hence not opened in write-through mode.

Resolution
Increase the frequency of the dedup flush by creating the following registry entry:

HKLM\System\CurrentControlSet\Services\ddpsvc\Settings\
FlushMaxDelay DWORD 60

NOTE: By default, this entry may not be present; in that case the default value is 300 seconds.
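
In PowerShell terms, creating that value looks roughly like this (a minimal sketch; run elevated, with the path and value name exactly as quoted above):

# Sketch: create the FlushMaxDelay value described above (run in an elevated session).
$path = 'HKLM:\System\CurrentControlSet\Services\ddpsvc\Settings'
if (-not (Test-Path $path)) { New-Item -Path $path -Force | Out-Null }  # key may not exist by default
New-ItemProperty -Path $path -Name 'FlushMaxDelay' -PropertyType DWord -Value 60 -Force | Out-Null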

Please test whether increasing the frequency of the dedup flush resolves the issue in your case.

Thanks for your time!
Best Regards,
Anne




Hi,
I have encountered an issue with my S2D cluster with deduplication enabled. After regular updates and a restart of the server (done at the time the dedup scrubbing job was scheduled), one of the VHD files in one VM became corrupted and I was unable to repair it (I had to restore from backup).
Can this registry entry help with this issue?

FelixWagner-2549 answered

Hello Anne,

thank you for your advice. I am not sure it applies to my situation, but I am starting to understand what this parameter does and why it could be helpful, so I will implement it and try it out. Thank you for helping here.

Still, I am not sure the article applies here, because:
- Deduplication is not running on the S2D hosts
- Deduplication errored inside a VM on top of that S2D, so there is no Hyper-V inside this "VM"
- That "VM" is actually a cluster of two VMs


Meanwhile, I have rebuilt the storage from scratch. I updated both VMs to Windows Server 20H2 and applied all updates (apart from previews).
While configuring the storage again, I recollected how I did it the first time:

  • The first time, I simply mounted the VHD Sets to both VMs

  • I initialized them via Server Manager

  • I created the volumes directly afterwards. At this step, I believe Deduplication was not available in Server Manager

  • After creating the volumes, I moved to Failover Cluster Manager, added the disks as Available Storage, and created the role

  • After a disk was attached to the File Server General Use role, I enabled Deduplication via Server Manager (now it was possible)

A deeper look into Event Viewer showed that the errors mentioned above started right during this process, so I believe something went wrong there. I am sure that neither of the disks was writable on both nodes at the same time (the second node always showed the read-only flag inside Server Manager).

Now I have recreated the VHD Sets. This time I adjusted the process (a rough PowerShell sketch follows the list):

  • Created the disks and mounted them to both VMs

  • Initialized the disks on a single VM via the PowerShell cmdlet Initialize-Disk -Number

  • Ran cluster validation

  • Added the disks as Available Storage to the cluster

  • Added the disks to the File Server role

  • Then, when I created the volumes in Server Manager, Deduplication could be enabled directly without any issues
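
A minimal sketch of that sequence, assuming disk number 1 and drive letter F: as placeholders:

# Sketch of the adjusted provisioning sequence; run on ONE node only.
Initialize-Disk -Number 1 -PartitionStyle GPT             # initialize on a single VM
Test-Cluster                                              # run cluster validation
Get-ClusterAvailableDisk | Add-ClusterDisk                # add the disks as Available Storage
# (attach the disk to the File Server role, e.g. via Failover Cluster Manager)
New-Partition -DiskNumber 1 -UseMaximumSize -DriveLetter F |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "Data"
Enable-DedupVolume -Volume F: -UsageType Default          # enable dedup on the new volume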

After half a day, it now looks like everything is running fine. The event log shows not a single error or warning, only normal informational events telling me everything is fine. Shadow copies are not yet enabled on these volumes, as DFS-R from the old file share server is still running (my approach is to carefully test the new deployment now).

So could it be that the wrong process damaged the volumes? I am not sure how I set up the other servers, but since this cluster is the only one using shared storage, maybe that is the case?

It seems my knowledge of volume management and how clusters interact with it is missing some important piece here.

Kind regards

Felix

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

FelixWagner-2549 avatar image
0 Votes"
FelixWagner-2549 answered

Hello Anne,

an update on this case:
Deduplication is broken again. To apply your suggested settings, I restarted one node and moved the role manually. Right after moving the role, the error with the filter driver appeared again:

Data Deduplication failed to start job type "Optimization" on volume "\\?\Volume{ba127eb2-8560-4a59-83d9-54d7e55c752f}\" with error "0x80565309, A required filter driver is either not installed, not loaded, or not ready for service.".

So it seems the issue appears right at the moment the role moves. Deduplication is definitely installed on both nodes.
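
The reproduction is simply moving the role and then checking the latest dedup events (a sketch; the role and node names are placeholders, and I assume the Deduplication operational log channel here):

# Sketch: move the file server role, then look at the latest dedup events.
Move-ClusterGroup -Name "FS-General" -Node "Node2"
Get-WinEvent -LogName "Microsoft-Windows-Deduplication/Operational" -MaxEvents 10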

Kind regards


Felix

FelixWagner-2549 answered FelixWagner-2007 commented

Hello Anne,

as your suggestion didn't help :( , I started a further evaluation of this issue.
After this investigation, it seems that Windows Server 2004 and 20H2 have issues with Deduplication on a non-CSV volume inside a cluster.
Windows Server 2019 is fine: I created a new (guest) cluster with Windows Server 2019 and everything is working.

Meanwhile, I came across fltmc.exe, which helped me see what is going on with the Dedup filter.

When I create a new volume on a single node, fltmc.exe instances shows the correct output:

Dedup 180450 Dedup 0 00000003
Dedup F: 180450 Dedup 0 00000003

After I switch the volume (the file server role) to the other server and back (the same already applies on the other node):

Dedup 180450 Dedup 0 00000003

This result can be reproduced every time.
So it seems that, for some reason, the Dedup filter cannot be attached as the volume moves.

This behavior is not seen if the volume is a CSV; CSVs can be moved without any issue.
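
For anyone who wants to run the same check, these are the fltmc.exe calls involved (a sketch; F: stands for the clustered data volume):

# Sketch: verify whether the Dedup minifilter is attached to a volume (run elevated).
fltmc instances -f Dedup     # list all instances of the Dedup filter
fltmc instances -v F:        # list all filters attached to volume F:
fltmc attach Dedup F:        # manually attach the Dedup filter to F: (for testing only)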

So I created a new cluster based on Windows Server 2019, and Dedup works there. After creating and moving the volume, fltmc.exe instances shows this result:

Dedup 180450 Dedup 0 00000003
Dedup F: 180450 Dedup 0 00000003

For me this means the initial issue lies behind this error:

0x80565309, A required filter driver is either not installed, not loaded, or not ready for service

The error shows in more detail after a try with Enable-DedupVolume -Volume F: -DataAccess:

Failure reason: FSCTL_DEDUP_FILE.DEDUP_SET_CHUNK_STORE_CACHING_FOR_NON_CACHED_IO failed with ERROR_INVALID_FUNCTION for volume
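
For completeness, the failing call can be retried like this (a sketch; F: is the affected volume):

# Sketch: toggle dedup data access on the volume to re-trigger filter attachment.
Disable-DedupVolume -Volume F: -DataAccess
Enable-DedupVolume -Volume F: -DataAccess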

So something in Windows Server 2004 and 20H2 goes wrong in connection with the creation of the Dedup chunk store.

At this point, I am sorry: I do not know how to copy data out of 'F:\System Volume Information\Dedup\' to provide further assistance. Those files are simply inaccessible to me; neither Copy-Item nor Get-Content can access them from a remote PowerShell session (CredSSP is disabled in our environment due to missing knowledge and known security issues). Well, I have solved my issue for now. Even more, with this error in the background, I should deploy the file share for my company on Windows Server 2019. I would not have thought before that this would be an issue for a file share, but deduplication isn't something easy. ;) :) (Our AD CS servers are working quite well on upgraded Windows Server releases.)

Moreover, I would have liked to try out how a Windows Server 2019 node behaves inside a cluster with those 20H2 servers, but there already seems to be an incompatibility, so no chance.

So for me the issue is closed for now. But I feel that Microsoft QA should take a deeper look here before the next LTSC release of Windows Server ships with this bug. ;) ;) :)
The good thing for me is that I learned a lot about volumes, drivers, filter drivers, how all those great things work together with CSVs, and why a file share on a SOFS is a very bad idea. For sure it is. :D

Kind regards

Felix



Hi,

We appreciate your feedback and you sharing the information with us. If I get any news about this topic, I will report back as soon as possible, and if you have any new findings, you are welcome to share them as well.

Best Regards,
Anne


For now this topic is solved by using Windows Server 2019 in the production environment.
I only hope this bug gets fixed soon, so the next Windows Server LTSC release is not affected.

ChristopheGirardy-2649 answered ChristopheGirardy-2649 edited

Hi, is there any news on this problem, please?
Thanks


I did not test it any further. I learned a lot while doing so, and now I am pretty sure I have a stable environment.
One single server in a branch office is still Windows Server 20H2 with dedup enabled, but there, even after multiple restarts, nothing goes wrong.

It seems something goes wrong when the cluster moves the volume. Are you stuck with an issue accessing your data?


Hi,
I'm not stuck with unrecoverable data.
It just doesn't want to dedup an NVMe disk I have, failing with the same error.
