JamesEdmonds-7766 asked JamesEdmonds-7766 commented

Failover Cluster DNS error, event 1257 keeps coming back

Hi,

I have two failover clusters created, for which I did NOT pre-create the DNS records for the cluster or role names.
The DNS records are created when the cluster/role is brought online.

On one of the clusters, I keep getting event ID 1257, where it fails to register or update the DNS entry for the role running on the cluster.
If I delete the existing record, and restart the role, it creates successfully.

I am trying to understand why, if the cluster creates the record, this error keeps coming back?
What can I do to stop it constantly complaining, when both cluster nodes have access to that DNS record?

Thanks
James

Tags: windows-dhcp-dns, windows-server-clustering

My occurrence of this issue started again this morning.
I am able to resolve it by taking the clustered role name resource offline and bringing it online again.

What I have noticed, is that for a clustered role on another cluster, the permissions on the DNS entry for the role name resource lists the cluster name resource as having permissions.
On the one that was having issues this morning, it shows the role name resource as having permissions on itself, rather than the cluster name resource.

When the role name resource is brought online, should the cluster name object, rather than the role name object, be what is given permissions on the DNS entry?

Banging my head against a brick wall here and starting to get a bit frustrated.

Cheers
James


Hey James, I checked my cluster objects and, at least in our version (Server 2016), the cluster itself 'owns' and updates the DNS objects for itself and for any roles that it is hosting. If you check the 'Advanced' security area of the DNS record, it should show you who owns it as well. We haven't had any issues with the individual roles themselves (so far); it's just the main CNO that isn't updating its DNS record.

I did notice that the pwdLastSet timestamp of my CNO is different than the individual roles (although they are within a day or so of each other), meaning they don't all get updated at exactly the same time. Did the password on that role object recently get updated? Just curious if your issue is similar to mine.

Lastly, if you check the Diagnostic Failover Cluster event logs, you might be able to see something that occurs at the time the DNS renewal is attempted. Those logs are pretty bloated, however, so it can be a bit of a pain to find things (try filtering out 'Information' events), and they usually roll over pretty quickly (if the log size isn't increased), so you almost need to catch it while the event is occurring.
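
As a rough illustration of the filtering described above, here is a minimal Python sketch. It assumes the diagnostic events have already been exported into a list of dictionaries; the field names (`time`, `level`, `message`) and the sample rows are hypothetical and do not reflect a real export format.

```python
from datetime import datetime, timedelta


def events_near(events, anchor, window_minutes=10):
    """Return non-informational events within +/- window_minutes of anchor."""
    window = timedelta(minutes=window_minutes)
    return [
        e for e in events
        if e["level"].lower() != "information"
        and abs(e["time"] - anchor) <= window
    ]


# Hypothetical sample rows; the messages and times are made up.
sample = [
    {"time": datetime(2022, 6, 16, 9, 0), "level": "Information", "message": "heartbeat"},
    {"time": datetime(2022, 6, 16, 9, 2), "level": "Error", "message": "DNS update refused"},
    {"time": datetime(2022, 6, 16, 14, 0), "level": "Warning", "message": "unrelated"},
]

# Time at which event 1257 was logged (made up for this example).
anchor = datetime(2022, 6, 16, 9, 1)

for e in events_near(sample, anchor):
    print(e["time"], e["level"], e["message"])
```

Narrowing to a short window around the 1257 timestamp keeps the output small enough to read, which matters given how quickly these logs roll over.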

Hope this helps!


Hi Jack,

I'm interested in what roles you are hosting?
I think my issue is that, for both my CNOs and my SQL role DNS name resource, the owners of the DNS objects are the cluster name objects themselves.
In the case of my problematic file server role, the owner is the role itself.

See below. Top two entries are the CNOs. Bottom left is the problematic file server role, and bottom right is SQL role:
212091-screenshot-2022-06-16-115554.png

It seems as though when the file server role DNS name resource is brought online, it is not setting ownership of the DNS entry correctly?
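
The comparison above can be sketched in a few lines of Python. This is only an illustration: the record and account names are hypothetical placeholders, the owner values would have to be read manually from each record's Advanced security settings, and whether CNO ownership is actually required is not confirmed anywhere in this thread.

```python
def records_not_owned_by_cno(owners, cno_for_record):
    """Return {record: owner} for records whose owner is not their cluster's CNO."""
    return {
        record: owner
        for record, owner in owners.items()
        if owner.lower() != cno_for_record[record].lower()
    }


# Hypothetical owners as read from the DNS console.
owners = {
    "CLUSTER01": "CLUSTER01$",
    "CLUSTER02": "CLUSTER02$",
    "FILESRV": "FILESRV$",      # role record owned by the role itself
    "SQLROLE": "CLUSTER02$",    # role record owned by its cluster's CNO
}

# Which CNO each record belongs to (also hypothetical).
cno_for_record = {
    "CLUSTER01": "CLUSTER01$",
    "CLUSTER02": "CLUSTER02$",
    "FILESRV": "CLUSTER01$",
    "SQLROLE": "CLUSTER02$",
}

print(records_not_owned_by_cno(owners, cno_for_record))
```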

I am not sure if my password reset times suggest my issue is the same as yours?

Cheers
James


LimitlessTechnology-0326 answered JamesEdmonds-7766 commented

Hi @JamesEdmonds-7766

Thank you for your question and reaching out.

I understand you are seeing event ID 1257 (DNS registration failure).

Please follow the steps below to resolve the issue.

  1. Delete the CNO ‘A’ record from the DNS console.

  2. Add the same record again and verify that the “Allow any authenticated user to update DNS record with the same owner name” option is selected.

Hope this answers your question :)


--If the reply is helpful, please Upvote and Accept as answer--


Hi Limitless,

I can do this, but I was under the impression that when the cluster/role creates the DNS entry, it would apply the appropriate permissions on the DNS entry for it to be able to manage it itself?
I can see the records have the cluster computer names and role computer names added with read/write privileges, so I was hoping that would be all that is needed?

Is it not expected behaviour for the cluster to create the records in a way they can manage them themselves?

Many thanks.


After the last time I deleted the entry and had the cluster recreate it, the error went away for a while. It's kicked in again today.
Checking the cluster name DNS record, the cluster has read and write permissions on the record:
182748-image.png

Is this not enough for the cluster nodes to be able to update the record?
If not, why is it not setting the appropriate permissions when it creates the entry in the first place?

Cheers
James


ALTechAdmin-8084 answered JamesEdmonds-7766 commented

James,

I have the same problem, which I have not been able to resolve yet.

This is what I noticed. I don't think we can simply delete the CNO object, as Microsoft recommends, or re-create it manually and grant it the appropriate permissions (i.e. “Allow any authenticated user to update DNS record with the same owner name”).

On day 1, after manually creating it, the error didn't repeat, but it hadn't actually gone away.
On day 7, event ID 1257 came back.

The action above only creates the DNS record as STATIC (as opposed to Dynamic) with date/time stamp of last update by the virtual cluster server.

On another set of cluster servers (different DNS server, different domain), I noticed this CNO is a dynamically created name. Not static.


I wonder, how do we create this CNO object DNS name dynamically in the first place?




Sounds like what I am seeing.

Interestingly, I don't see the issue at the moment, so will check back in a week to see if it reappears.
If it does, I will give details of the node, cluster and role names, along with ACLs on the DNS entries.

For ref, I think that a dynamic DNS entry is created when the cluster name resource is brought online (if a record doesn't already exist).
I think Microsoft say you can create a record manually, but dynamic should also work as far as I know. When the record is created, I would expect it to apply appropriate permissions on the record so the nodes/cluster name object can successfully update it.

Cheers
James


I've been off for a week, so don't know when it started, but my Cluster 02 (SQL cluster), has now started generating these event IDs again.
We did do a failover just before I went on holiday, so perhaps that is what triggered it.

I can see the cluster name shows as "DNS Operation Refused":
200227-image.png

The owner node is currently server 04:
200217-image.png

If I check the DNS record, it shows that the clustered SQL role name has basically full control over the record, but not the individual nodes:
200254-image.png

This doesn't seem right to me, as I assume the nodes themselves need permissions on the entry, but given this record was automatically created by the cluster itself, I assume it must be what Microsoft intend?
I will try failing the role back to server 3, but otherwise my workaround is to delete the record, then offline/online the cluster name resource to have it recreate it.

It's a minor but frustrating issue, and I'd love to get it permanently resolved.
I want the cluster to be able to automatically manage this in the event of a name or IP change.



Since I got the cluster to recreate the DNS entry dynamically last week, it seems ok.
I will give it another few weeks and report back, but see no reason why this time it will work but not before.

Cheers
James


Not sure what I did differently this time when manually deleting the record and having the cluster recreate it (by taking the cluster name offline and online), but here we are about a month later and it is seemingly not happening anymore!

Will monitor for a while, but maybe some updates or something have fixed the issue.

JackDobiash-1696 answered JamesEdmonds-7766 commented

Hey all, we are also experiencing this on two of our clusters. I'm pretty sure it's a bug that was introduced by a patch sometime near the start of the year. We have a third cluster which has not had the issue, but it hasn't been updated in a while. We are running Server 2016 clusters.

I know pretty much exactly WHAT the problem is, but not how to fix it. The issue occurs every time the CNO password gets updated, which happens around every 21 days (at least on our system). Once the password has been updated by the 'core' node, it then somehow 'forgets' what that new password is, at least when trying to update the DNS registration. The underlying cluster event logs even indicate that it's basically failing to log in to the DNS server when attempting to update the DNS record.

If we just move the core resources from one node to another (and even back to the first node), things start working again until the next time it updates the password on the CNO. The other option is to take the 'Name' resource offline and bring it back online; that also fixes it until the next password change.

If you want to see when the last time your CNO password was updated, check the Attributes of the actual Cluster Object in AD and look for 'pwdLastSet'. In our case, it's been like clockwork each time the password is updated on both clusters. Within 24 hours of the update it starts to complain again (since it updates DNS once a day).
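
If you read `pwdLastSet` as a raw integer rather than as a formatted date, it is a Windows FILETIME value: the number of 100-nanosecond intervals since 1601-01-01 UTC. The following Python sketch converts it and measures how long after the password change the first error appeared. The helper names and all dates are made up for illustration; nothing here queries AD or the event log.

```python
from datetime import datetime, timedelta, timezone

# FILETIME values count 100 ns ticks from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a Windows FILETIME integer to a timezone-aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def datetime_to_filetime(dt):
    """Inverse of filetime_to_datetime; used here only to build sample data."""
    return (dt - FILETIME_EPOCH) // timedelta(microseconds=1) * 10


def first_error_delay(pwd_last_set, error_times):
    """Return the time from the password change to the first later error, or None."""
    changed = filetime_to_datetime(pwd_last_set)
    later = [t for t in error_times if t >= changed]
    return min(later) - changed if later else None


# Made-up example: password changed at 03:00 UTC, first 1257 event at 21:30 UTC.
pwd_last_set = datetime_to_filetime(datetime(2022, 6, 7, 3, 0, tzinfo=timezone.utc))
errors = [datetime(2022, 6, 7, 21, 30, tzinfo=timezone.utc)]

delay = first_error_delay(pwd_last_set, errors)
print(delay, delay <= timedelta(hours=24))
```

A delay of under 24 hours would be consistent with the pattern described above, though it does not by itself prove the password change is the cause.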

I'm hoping maybe someone else can confirm they are in the same situation? Thanks!


Interesting!

I've just come back from a week's holiday, and don't see any errors logged at the moment.
The last time I manually removed the record, and allowed the cluster to recreate, was about a month ago.
I see our pwdLastSet for both clusters is the 7th and 11th (this is the same for both the CNO and the role Name Objects).

I don't think I've completed any CAU updates since I last had the issue, and I am running server 2022, but maybe some update had broken it and a newer one has fixed it.

I'll continue monitoring and feed back if my issue reappears.


Thanks for the reply James! I get the feeling that your issue is probably different from ours, so hopefully yours is fixed. Ours definitely isn't, as we just had to 'offline/online' our 'Name' role again to get it working. I don't think deleting the DNS record will fix it in our case, BUT as a last-ditch effort we'll give it a try the next time it occurs. Of course it takes 21 days each time to find out if it worked :)

Take care!

HuttonGregory-4701 answered HuttonGregory-4701 published

Following as well

JoeBruns-4457 answered JamesEdmonds-7766 commented

As of 7/22

MS has identified that it is indeed an issue on their end. Due to the nature of the 2016 maintenance, they have to examine code from November 2021 to April 2022 to zero in on the culprit and kill the bug.


That's good....in a way :)

If you want to let them know I, as a Server 2022 user, see the same issue, I will gladly speak to them if they want someone with an impacted 2022 environment for diagnosis/testing etc.

Thanks
James

JoeBruns-4457 replied to JamesEdmonds-7766:

I have let them know it affects 2022 as well.


Hi Joe,

Have MS provided you with any update on this issue as of yet?

Cheers
James

Dottn answered

We have also experienced this issue.
In our case we have noticed that the cluster attempts to update DNS records using the credentials of a random VCO in the cluster.
A workaround is to put all VCOs in a security group and give the group permission to update DNS entries for all roles in the cluster.
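
To check that the workaround has been applied consistently, something like the following Python sketch could help. It assumes you have already collected the group membership and the write ACL of each role's DNS record by hand; the group, VCO, and record names are hypothetical, and the code does not talk to AD or DNS.

```python
def vcos_missing_from_group(vcos, group_members):
    """Return the VCOs that are not members of the security group."""
    members = {m.lower() for m in group_members}
    return sorted(v for v in vcos if v.lower() not in members)


def records_missing_group_write(record_writers, group):
    """Return records whose write ACL does not include the security group.

    record_writers maps a DNS record name to the principals with write access.
    """
    return sorted(
        record
        for record, writers in record_writers.items()
        if group.lower() not in {w.lower() for w in writers}
    )


# Hypothetical inventory, gathered manually.
vcos = ["FILESRV$", "SQLROLE$", "PRINTSRV$"]
group_members = ["FILESRV$", "SQLROLE$"]
record_writers = {
    "FILESRV": {"FILESRV$", "Cluster-VCOs"},
    "SQLROLE": {"SQLROLE$"},
    "PRINTSRV": {"Cluster-VCOs"},
}

print(vcos_missing_from_group(vcos, group_members))
print(records_missing_group_write(record_writers, "Cluster-VCOs"))
```

Because the update may be attempted with any VCO's credentials, a single VCO missing from the group or a single record missing the group ACL would be enough for the error to recur.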
