When Good Network Location Servers Go Bad – Preparing Against NLS Failure
In a recent article on The Edge Man blog, I talked about the Network Location Server (NLS) and how it’s used to help the DirectAccess (DA) client determine if it’s on or off the corporate network. If you missed that article, or need a refresher, check it out at http://blogs.technet.com/tomshinder/archive/2010/04/02/directaccess-client-location-awareness-nrpt-name-resolution.aspx
Now that you have a basic understanding of what the NLS server does and its critical role in a DirectAccess solution, the next step is to figure out what happens when the NLS server is unreachable by the DA client when the DA client is on the corpnet.
Discuss UAG DirectAccess issues on the TechNet Forums over at http://social.technet.microsoft.com/Forums/en-US/forefrontedgeiag
What might make the NLS server unreachable? Consider the following:
- The web servers that host the NLS URL are offline
- Network connectivity between the DA client and the NLS server is disrupted, perhaps due to a router failure or problems with the routing tables or other components of the routing infrastructure, such as failed cables or switches
- Misconfiguration of the DNS host record entry for the NLS server name or IP address
- Network connectivity issues between the DA client and the DNS server hosting the A record for the NLS server
- Issues with the web site certificate installed on the NLS servers, such as an expired certificate or a revoked certificate
- Issues with contacting the server(s) hosting the Certificate Revocation List – the DA client must be able to access the CRL in order to establish an HTTPS connection to the NLS
- For branch office users, perhaps a WAN link failure, or failure of a site to site VPN could prevent name resolution for the NLS server, or prevent access to the NLS server, or prevent access to the server(s) hosting the CRL
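To make the list above a bit more concrete, here is a minimal Python sketch of a staged probe that reports which layer failed first: name resolution, TCP connectivity, or the HTTPS handshake. This is an illustration only and is not how the DA client itself performs NLS detection. The function name probe_nls and the example URL are hypothetical, and note that Python's default TLS validation does not check the CRL, so a revoked certificate or an unreachable CRL distribution point would not be caught by this sketch.

```python
import socket
import ssl
from urllib.parse import urlsplit


def probe_nls(url, timeout=5.0, resolve=socket.getaddrinfo,
              connect=socket.create_connection):
    """Return (stage, detail) for the first stage that fails, or ("ok", "").

    `resolve` and `connect` are injectable so the logic can be exercised
    without a live network; by default they are the real socket functions.
    """
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 443

    # Stage 1: name resolution (bad host record, unreachable DNS server).
    try:
        resolve(host, port)
    except OSError as exc:
        return "dns", str(exc)

    # Stage 2: TCP reachability (server offline, routing or WAN failure).
    try:
        sock = connect((host, port), timeout)
    except OSError as exc:
        return "tcp", str(exc)

    # Stage 3: TLS handshake with certificate validation (expired or
    # untrusted certificate). Revocation is NOT checked here.
    context = ssl.create_default_context()
    try:
        with sock, context.wrap_socket(sock, server_hostname=host):
            return "ok", ""
    except (ssl.SSLError, OSError) as exc:
        return "tls", str(exc)


if __name__ == "__main__":
    # Hypothetical NLS URL; substitute the one used in your deployment.
    print(probe_nls("https://nls.corp.example.com/"))
```

Running a probe like this from a corpnet machine (and from a branch office) on a schedule gives you an early hint about which of the failure causes above you are dealing with.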
There are probably many other things that could cause problems when the DA client tries to connect to the NLS server. If this happens, what is the result?
- The DA client will assume that it is still outside the corpnet, so it will use the NRPT, rather than the DNS server address configured on its NIC, to determine which DNS server to send name queries to. In a UAG DA environment, that means the DA client will send name queries to the UAG DA server’s DNS proxy component, using the IPv6 address of the UAG DA server. This applies to the names configured in the NRPT, which include the names on the corpnet. The NLS name is exempted, so the client uses the DNS settings configured on its NIC to resolve the name of the NLS server (which won’t help in this circumstance, since some other issue is preventing the DA client from connecting to the NLS server).
- The Domain profile will not be activated, and either a public or private profile will be enabled, depending on which network type the user selected when connecting to the corpnet (private profile for home/work and public profile if the user selected public). This means that the DA client computer will try to connect to corpnet resources through the UAG DA server, by using the IPsec tunnels that are defined in the DA client’s Connection Security Rules.
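The DNS server selection described above can be modeled in a few lines of Python. This is a simplified sketch, not the Windows implementation: the NrptRule type, the pick_dns_server function, the example namespaces and the addresses are all hypothetical, and the "longest matching namespace wins" rule is an assumption that stands in for the real NRPT precedence logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NrptRule:
    namespace: str             # ".corp.example.com" (suffix) or an exact FQDN
    dns_server: Optional[str]  # None marks an exemption: use the NIC's DNS server


def pick_dns_server(name, rules, nic_dns, inside_corpnet):
    """Return the DNS server that a query for `name` would be sent to."""
    if inside_corpnet:
        # The NLS was reached, so the DirectAccess NRPT rules are not applied.
        return nic_dns

    name = name.lower().rstrip(".")
    best = None
    for rule in rules:
        ns = rule.namespace.lower()
        matches = name.endswith(ns) if ns.startswith(".") else name == ns
        # Assumption: the most specific (longest) matching namespace wins.
        if matches and (best is None or len(ns) > len(best.namespace)):
            best = rule

    if best is None or best.dns_server is None:
        # No rule matched, or the name is exempted (as the NLS name is).
        return nic_dns
    return best.dns_server


rules = [
    NrptRule(".corp.example.com", "2001:db8::1"),  # UAG DNS proxy (example address)
    NrptRule("nls.corp.example.com", None),        # NLS exemption
]
print(pick_dns_server("app1.corp.example.com", rules, "10.0.0.10", inside_corpnet=False))
print(pick_dns_server("nls.corp.example.com", rules, "10.0.0.10", inside_corpnet=False))
```

The point of the model is the first branch: once the NLS check fails, a client that is physically on the corpnet takes the same path as a client on the Internet, and corpnet names go to the DNS proxy instead of the local DNS server.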
What happens at this point is determined by whether or not the DA client on the corpnet has connectivity to the UAG DA server on the Internet. That is to say, the results differ depending on whether or not the DA client on the corpnet can connect to the external IP address on the UAG DA server.
What happens when the DA client is not able to connect to the UAG DA server’s external IP address when a connection to the NLS fails?
- The client tries to resolve the FQDN of an internal resource. The DNS server on the corpnet cannot be contacted, since the UAG DA client isn’t able to connect to the UAG DA server’s DNS proxy, so the connection times out. There is no fall back for FQDNs, so the chosen fall back mechanism doesn’t activate. End result: failed connection attempt.
- The client tries to resolve a single-label (local) name on the internal network. In this case the client will first fully qualify the single-label name with its own domain name (that is to say, the name of the domain that the DA client belongs to, since all DA clients must be domain members) and any DNS suffixes that the client might have been configured to use. Since the DA client won’t be able to contact the UAG DA server’s DNS proxy, the DNS queries will time out and fail. However, since this was a single-label name query, fall back will take place based on the method you chose when you set up DA on the UAG server. The fall back mechanism also depends on the type of network the user chose when connecting to the corpnet. Check out http://blogs.technet.com/tomshinder/archive/2010/04/02/directaccess-client-location-awareness-nrpt-name-resolution.aspx where I talk about the fall back mechanisms in more detail. There are a number of factors involved with fall back, such as whether the resource is on the local link, and whether or not a WINS server is available and has a name mapping for the requested resource.
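The two cases above can be summarized as a small Python model. This is a sketch under stated assumptions, not the actual Windows resolver: the function names are hypothetical, the fallback_allowed flag stands in for the combination of the fall back setting chosen on the UAG server and the network type the user selected, and the local_names dictionary stands in for whatever local name resolution (local link or WINS) could answer.

```python
def qualify(name, suffixes):
    """Expand a single-label name with each DNS suffix; FQDNs pass through."""
    if "." in name:
        return [name]
    return [f"{name}.{suffix}" for suffix in suffixes]


def resolve_when_proxy_unreachable(name, suffixes, fallback_allowed, local_names):
    """Model name resolution when the UAG DNS proxy cannot be reached.

    Returns (dns_queries_attempted, address_or_None).
    """
    attempted = qualify(name, suffixes)
    # Every DNS query in `attempted` times out, because the proxy is unreachable.
    if "." in name:
        # FQDN: there is no fall back, so the connection attempt fails.
        return attempted, None
    if fallback_allowed:
        # Single-label name: local name resolution gets a chance to answer.
        return attempted, local_names.get(name.lower())
    return attempted, None


suffixes = ["corp.example.com"]       # example DNS suffix
local = {"fileserver": "10.0.0.25"}   # names answerable on the local network
print(resolve_when_proxy_unreachable("app1.corp.example.com", suffixes, True, local))
print(resolve_when_proxy_unreachable("fileserver", suffixes, True, local))
```

In this model, only the single-label query has any chance of succeeding, and only when the fall back setting and network type allow it.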
Next, what happens when the DA client is able to connect to the external interface of the UAG DA server? Remember, the UAG DA server cannot enable outbound connections from hosts on the internal network, not even outbound connections from its internal interface to its external interface. That means that some other gateway on the network must be available to allow the connection to the external interface of the UAG DA server.
In this case the DA client tries to connect to the resource first by using a FQDN. Depending on the connectivity the DA client has with the external interface of the UAG DA server, the client might use Teredo or IP-HTTPS to bounce back to the internal network through the UAG DA server. Since the DA client can connect to the UAG DA server’s DNS proxy, it will be able to resolve the name of the internal network destination host.
However, the request/response path is not going to be efficient:
- The DA client establishes the IPsec tunnels to the external interface of the UAG DA server
- The client connects to corporate resources through these tunnels, both the intranet and the infrastructure tunnels
- The requests are forwarded through the UAG DA server to the destination server on the corpnet
- The destination server responds to the request and the response is routed back through the DA server
- The response makes it back to the DA client through whatever gateway the DA client used to send the outbound request
The request/response path will look something like this (from a very high level view):
As you can imagine, performance is likely to suffer, depending on the number of interposed devices and the traffic profiles on the networks that the packets have to traverse. There are also a number of potential points of failure along this path. However, this scenario does allow the DA clients to connect to corporate resources in the event of an NLS failure, which could buy you time while you’re trying to fix the primary problem.
Preparing and Preventing Failure
The fact is that bad things happen to good computers. Server failure is not a matter of “if” but of “when”. Since you know that your NLS servers and all the components they depend on are going to fail at some point, what can you do to mitigate these failures so that your users experience the least amount of pain during the event?
- Make sure your NLS servers are highly available – use NLB or an external load balancer
- Have a certificate lifecycle management process in place and pay strict attention to that management process so that certificates do not expire
- Make sure that your CRLs are highly available – a highly available deployment of NLS servers is of little use if the CRL is not available.
- Locate your NLS servers at multiple locations so that a single point of network failure does not impact the DA clients; consider installing NLS servers at branch offices where the risk of WAN failure is relatively high
- Review DNS record management processes and point out that the NLS server is a high priority name, on par with the names of the domain controllers and other critical network assets
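As one small piece of a certificate lifecycle process, a scheduled check can warn you before the NLS web site certificate expires. The following Python sketch shows only the date arithmetic. The function names and the 30 day warning threshold are hypothetical choices, and the sketch does not check revocation or CRL availability.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter timestamp.

    `not_after` uses the text format that Python's ssl module reports in
    getpeercert()["notAfter"], for example "Jan 31 00:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def needs_renewal(not_after, warn_days=30, now=None):
    """True when the certificate expires within `warn_days` (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    # Example value only; in practice read notAfter from the NLS certificate.
    print(needs_renewal("Jan 31 00:00:00 2030 GMT"))
```

Feeding this with the notAfter value of each NLS certificate, and alerting when needs_renewal returns True, is one simple way to keep an expired certificate from becoming the cause of an NLS outage.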
There are some additional measures you can take to make sure that NLS failure causes as little disruption as possible:
- Think about what fall back setting is best for your organization. While the default setting is a balance between security and accessibility, you might want to favor the latter over the former, depending on your assessment of the actual security risks posed by allowing DA clients to broadcast names over untrusted networks when single-label name resolution falls back to local name resolution.
- Deploy the DirectAccess Group Policy Objects and settings only to computers that will act as DirectAccess clients. This is a critical point of distinction and worth repeating early and often. This means that you need to create custom groups to apply these Group Policy settings, or at least create custom OUs for the settings. Don’t apply the DA GPO settings to any of the default groups, and don’t apply them to machines that will never act as DA clients, such as servers and domain controllers.
- Train your users on what to do in the event that a DA failure occurs. The DirectAccess Connectivity Assistant (DCA), which you can download at http://www.microsoft.com/downloads/details.aspx?FamilyID=9A87EFE8-E254-4473-8A26-678ADEA6D9E9&displaylang=en, will inform users when there is a problem with the DA connection. Users can then right-click the DCA icon in the system tray (notification area) and click Prefer Local DNS Names. When users select this option, the DA-related entries in the NRPT are removed, and the DA client will then be able to resolve names using the DNS server address configured on the DA client’s NIC, and will connect to resources using their IPv4 addresses. Note that when there’s a connectivity change (network status change), the normal DA client behavior will start again, and if the DCA continues to show connectivity issues, the user will need to enable the Prefer Local DNS Names option again.
The Network Location Server allows the DA client to know when it’s on or off the corpnet. However, when there is an issue preventing the DA client from connecting to the NLS server when the DA client is on the corpnet, the client will act as if it is off the corpnet. This can cause a number of problems, depending on the current network configuration and whether or not the DA client can reach the external interface of the UAG DA server from the corpnet. This article summarized a number of things you can do to mitigate NLS connectivity failures and help ensure that these failures have less negative impact on network operations and your users when they occur.
In the near future we will publish a white paper on this issue and it will expand a bit on what I’ve covered in these two blog posts on Network Location Servers. I’ll make sure to post the URL to that paper on this blog when it becomes available.
UAG Direct Access/Anywhere Access Team
The “Edge Man” blog (DA all the time): http://blogs.technet.com/tomshinder/default.aspx
Follow me on Twitter: https://twitter.com/tshinder