Windows Server 2012, File Servers and SMB 3.0 – Simpler and Easier by Design
I have been presenting our solution in Windows Server 2012 for File Storage for Virtualization (Hyper-V over SMB) and Databases (SQL Server over SMB) for a while now. I always start the conversation talking about how simple and easy it is for an IT Administrator or an Application Developer to use SMB 3.0.
However, I am frequently challenged to detail exactly what that means. After all, both “simple” and “easy” are fairly subjective concepts. In this blog post, I will enumerate some design decisions regarding the SMB 3.0 protocol and its implementation in Windows to make this more concrete.
Please note that, while the topic here is simplicity, I will essentially be going into details behind the implementation, so this could get a little intricate. Not to scare anyone, but if you’re just getting started with SMB, you should probably start with a lighter topic. Or proceed at your own risk :-).
Here’s a summary of the items on this post:
- 1. Introduction
- 2. SMB Transparent Failover
- 2.1. SMB Transparent Failover – No application failures, no application changes
- 2.2. SMB Transparent Failover – Continuous Availability turned on by default
- 2.3. SMB Transparent Failover – Witness Service
- 3. SMB Scale-Out
- 3.1. SMB Scale-Out – Volumes, not drive letters
- 3.2. SMB Scale-Out – Single name, dynamic
- 3.3. SMB Scale-Out – Using node IP addresses
- 4. SMB Multichannel
- 4.1. SMB Multichannel – Auto-discovery
- 4.2. SMB Multichannel – Transparent failover
- 4.3. SMB Multichannel – Interface arrival
- 4.4. SMB Multichannel – Link-local
- 5. SMB Direct
- 5.1. SMB Direct – Discovery over TCP/IP
- 5.2. SMB Direct – Fail back to TCP/IP
- 6. VSS for SMB File Share
- 6.1. VSS for SMB File Share – Same model as block VSS
- 6.2. VSS for SMB File Shares – Stream snapshots from file server
- 7. SMB Encryption
- 7.1. SMB Encryption – No PKI or certificates required
- 7.2. SMB Encryption – Hardware acceleration
- 8. Server Manager
- 8.1. Server Manager – Simple Wizards
- 8.2. Server Manager – Fewer knobs
- 9. SMB PowerShell
- 9.1. SMB PowerShell – Permissions for a New Share
- 9.2. SMB PowerShell – Permissions on the Folder
- 9.3. SMB PowerShell – Cluster type based on disk type
- 9.4. SMB PowerShell – Shares, Sessions and Open Files across Scale-Out nodes
- 9.5. SMB PowerShell – Connection
- 10. Conclusion
2. SMB Transparent Failover
SMB Transparent Failover is a feature that allows an SMB client to continue to work uninterrupted when there’s a failure in the SMB file server cluster node that it’s using. This includes preserving information on the server side plus allowing the client to automatically reconnect to the same share and files on a surviving file server cluster node.
2.1. SMB Transparent Failover – No application failures, no application changes
Over the years, we have improved SMB to make the protocol more fault tolerant. We’ve been taking some concrete steps that direction, like introducing SMB Durability in SMB 2.0 (if you lose a connection, SMB will attempt to reconnect, but without guarantees) and SMB Resiliency in SMB 2.1 (guaranteed re-connection, but application need to be changed to take advantage of it).
With SMB 3.0, we can now finally guarantee a transparent failover without any application changes through the use of SMB Persistence (also known as Continuously Available file shares). This final step gives us the ability to simply create a share that supports persistency (Continuously Available share) with the application accessing that share automatically enjoying the benefit, without any changes to the way files are opened and no perceived errors that the application needs to handle.
2.2. SMB Transparent Failover – Continuous Availability turned on by default
Once we had the option to create Continuously Available file shares, we considered the option to make it a default setting or leaving it up to the administrator to set it. In the end, we made Continuous Availability the default for any share in a file server cluster. In order to feel comfortable with that decision, we worked through all the different scenarios to make sure we would not break any existing functionality.
Expert Mode: There is a PowerShell setting to turn off Continuous Availability. You can also use PowerShell to tweak the default time that SMB 3.0 waits for a reconnection, which defaults to 60 seconds. These parameters are available for both New-SmbShare and Set-SmbShare. See http://blogs.technet.com/b/josebda/archive/2012/06/27/the-basics-of-smb-powershell-a-feature-of-windows-server-2012-and-smb-3-0.aspx
2.3. SMB Transparent Failover – Witness Service
Another problem we had with the Clustered File Server failover was TCP/IP timeouts. If the file server cluster node you were talking to had a hardware issue (say, someone pulled the plug on it), the SMB client would need to wait for the TCP/IP timeout. This was really bad if we had a requirement to make the failover process automatic and fast. To make that work better, we created a new (optional) protocol and service to simply detect these situations and act faster.
In Windows Server 2012, the client connect to one cluster node for file access and to another cluster node for witness service. The witness service is automatically loaded on the server when necessary and the client is always ready for it, with no additional configuration required on either client or server. Another key aspect was making the client find a witness when connecting to a cluster, if available, without any manual steps. Also, there’s logic in place to find a new witness if the one chosen by the client for some reason is no longer there. Witness proved out to be a key component, which is also used for coordinating moves to rebalance a Scale-Out cluster.
Regarding simplicity, no additional IT Administrator action is required on the file server cluster to enable the Witness service. No client configuration is required either.
3. SMB Scale-Out
SMB Scale-out is the new ability in SMB 3.0 in cluster configuration to show a share in all nodes of a cluster. This active/active configuration makes it possible to scale file server clusters further, without a complex configuration with multiple volumes, shares and cluster resources.
Expert mode: There are specific capabilities that cannot be combined with the Scale-Out file servers, like DFS Replication, File Server quotas and a few others, as outlined in the screenshot above. If you have workloads that require those abilities, you will need to deploy Classic File Servers (called File Server for general use in the screenshot above) to handle them. The Hyper-V over SMB and SQL Server over SMB scenarios are a good fit for Scale-Out File Servers.
More information: http://technet.microsoft.com/en-us/library/hh831349.aspx
3.1. SMB Scale-Out – Volumes, not drive letters
This new Scale-Out capability is enabled by using a couple of feature from Failover Clustering: Cluster Shared Volumes (CSV) and the Dynamic Network Name (DNN).
CSV is a special volume that shows on every cluster node simultaneously, so you don’t have to be worried about which cluster node can access the data (they all do). These CSV volumes show under the C:\ClusterStorage path, which means you don’t need a drive letter for every cluster disk. Definitely simpler.
3.2. SMB Scale-Out – Single name, dynamic
The DNN is also much easier to use, since you can have a single name that allows SMB clients to connect to the cluster nodes. A client will be automatically directed to one of the nodes in the cluster in a round-robin fashion. No extra configuration required. Also, there is really no good reason anymore to have multiple names in a single cluster for File Servers. One name to rule them all.
3.3. SMB Scale-Out – Using node IP addresses
In the past, File Server Cluster groups required specific IP addresses in addition to the regular IP addresses for each node (or a set of IP addresses per cluster group, if you have multiple networks). Combined with the fact that we used to need multiple cluster names to leverage multiple nodes, that made for a whole lot of IP addresses to assign. Even if you were using DHCP, that would still be complex.
With SMB Scale-Out, as we explained, you only need one name. In addition to that, we no longer need additional IP addresses besides the regular cluster node IP addresses. The DNN simply points to the existing node IP addresses of every node in the cluster. This means fewer IP addresses to configure and fewer Cluster resources to manage.
4. SMB Multichannel
We introduced the ability to use multiple network interfaces for aggregated throughput and fault tolerance in SMB 3.0, which we call SMB Multichannel.
4.1. SMB Multichannel – Auto-discovery
We struggled with the best way to configure SMB Multichannel, since there are many possible configurations. In other solutions in the same space, you are sometimes required to specify each client the multiple IP addresses on each server, requiring updates whenever the cluster or networking layout changes. We thought that was too complex.
In probably the best example of the commitment to keeps things simple and easy, we decided to change the SMB protocol to include the ability for the client to query the server network configuration. This allows the client to know the type, speed and IP address of every NIC on the server. With that information at hand, the client can then automatically select the best combination of paths to use. It was not an easy thing to implement, but we thought it was worth doing. The end result was better than what we expected. SMB 3.0 does all the hard work and we got extensive feedback that this is one of the most impressive new abilities in this version.
Expert mode: There are PowerShell cmdlets to define SMB Multichannel Constraints, in case the configuration automatically calculated by SMB Multichannel is less than ideal. We strongly recommend against using this, since it will make SMB unable to adapt to changing configurations. It’s an option, though. See link above for details.
4.2. SMB Multichannel – Transparent failover
With multiple network interfaces used by SMB Multichannel, we now have the ability to survive the failure of a network path, as long as there is at least one surviving path. To keep things simple and easy, this happens automatically and the user and applications do not see any interruption of their work.
For instance, if you have two 10GbE NICs in use on both server and client, SMB Multichannel will use them both to achieve up to 20Gbps throughput. If someone pulls off one of the cables in the client, SMB will automatically detect that situation instantly and move all the items queued on the failed NIC to the surviving one. Since SMB is already fully asynchronous, there is no problem with packet ordering. You obviously will work at only 10Gbps after this failure, but this will otherwise be completely transparent.
If an interface fails on the server side, the behavior is a bit different. It will take a TCP/IP timeout to figure out that server interface was lost and this means a few packets might be delayed a few seconds. Once the situation is detected, the behavior described for the client-side interface loss will take place. Again, this is all transparent to users and applications, but it will take a few more seconds to happen.
4.3. SMB Multichannel – Interface arrival
Beyond surviving the failure of an interface, SMB Multichannel can also re-establish the connections when a NIC “comes back”. Again, this happens automatically and typically within seconds, due to the fact that the SMB client will listen to the networking activity of the machine, being notified whenever a new network interface arrives on the system.
Using the example on the item above, if the cable for the second 10GbE interface is plugged back in, the SMB client will get a notification and re-evaluate its policy. This will lead to the second set of channel being brought back and the throughput going up to 20Gbps again, automatically.
If a new interface arrives on the server side, the behavior is slightly different. If you lost one of the server NICs and it comes back, the server will immediately accept connections on the new interface. However, clients with existing connections might take up to 10 minutes the readjust their policies. This is because they will poll the server for configuration adjustments in 10-minute intervals. After 10 minutes, all clients will be back to full throughput automatically.
4.4. SMB Multichannel – Link-local
There is a class of IP addresses referred to as link-local addresses. These are the IPv4 addresses starting with 169.254 and IPv6 addresses starting with FE80. These special IP addresses are automatically assigned when no manual addresses are configured and no DHCP server is found on the network. They are not routable (in these configurations, there is no default gateway) and we were tempted to drop support for them in SMB Multichannel, since it would complicate our automatic policy logic.
However, we heard clear feedback that these type of addresses can be useful and they are the simplest and easiest form of addressing. For certain types of back-end network like a cluster internal network, they could be an interesting choice. Because of that, we put in the work to transparently support both IPv4 and IPv6 link-local addresses.
Expert mode: You should never mix static and link-local addresses on a single interface. If that happens, SMB will ignore link-local addresses in favor of the other address.
5. SMB Direct
SMB Direct introduced the ability to use RDMA network interfaces for high throughput with low latency and low CPU utilization.
More information: http://technet.microsoft.com/en-us/library/jj134210.aspx
5.1. SMB Direct – Discovery over TCP/IP
When using these RDMA network interfaces, you need specific connections to leverage their enhanced data transfer mode. However, since not every server hardware out there will have this RDMA capability, you have to somehow discover when it’s OK to use RDMA connections.
In order to keep things simple for the administrator, all RDMA network interfaces that support SMB Direct, regardless of technology, are required to behave as a regular NIC, with an IP address and a TCP/IP stack. For the IT administrator, these RDMA-capable interfaces look exactly like regular network interfaces. Their configuration is also done in the same familiar fashion.
The initial SMB negotiation will always happens over traditional TCP/IP connections and SMB Multichannel will be used to find if a network interface has this RDMA capability. The shift from TCP/IP to RDMA connections happens automatically, shortly after SMB discovers that the client and server are capable of using the 3.0 dialect, they both support the SMB Multichannel capability and that there are RDMA NICs on both ends.
Expert mode: This process is so simple and transparent that IT Administrators sometimes have difficulty telling that RDMA is actually being used, besides the fact that you enjoy better throughput and lower CPU usage. You can specifically find out by using PowerShell cmdlets (like Get-SmbMultichannelConnection) and performance counters (there are counters for the RDMA traffic and the SMB Direct connections).
5.2. SMB Direct – Fail back to TCP/IP
Whenever SMB detects an RDMA-capable network, it will automatically try to use their RDMA capability. However, if for any reason the SMB client fails to connect using the RDMA path, it will simply continue to use TCP/IP connections instead. As we mentioned before, all RDMA interfaces compatible with SMB Direct are required to also implement a TCP/IP stack, and SMB Multichannel is aware of that.
Expert mode: If SMB is using the fallback TCP/IP connection instead of RDMA connections, you are essentially using a less efficient mode of your RDMA-capable network interface. As explained before, you can detect this condition by using specific PowerShell cmdlets and performance counters. There are also specific events logged when that occurs.
6. VSS for SMB File Share
Backup and Restore are important capabilities in Storage. In Windows Server, Volume Shadowcopy Services (VSS) is a key piece of infrastructure for creating application-consistent snapshots of data volumes.
6.1. VSS for SMB File Share – Same model as block VSS
VSS has a well-established ecosystem for block solutions and we decided that the best path for file shares was to follow the same model. By offering VSS for SMB File Shares, the backup applications and the protected applications can leverage their investments in VSS. For an IT Administrators already using VSS, the solution looks familiar and easy to understand.
6.2. VSS for SMB File Shares – Stream snapshots from file server
In most backup scenarios, after a snapshot is created, you need to stream the data from the application server to a backup server. That’s, for instance, the model used by Data Protection Manager (DPM). While there are technologies to move snapshots around in some cases, some types of snapshot can only be surfaced on the same host where the volume was originally snapped.
By using VSS for SMB File Shares the resulting snapshot is represented as a UNC path pointing to the file server. That means it’s simple to tell that you can rely on the file server to stream the backup data and not have to bother the application server for anything else after the snapshot is taken.
7. SMB Encryption
Encryption solutions are an important capability, but they are typically associated with Public Key Infrastructure (PKI) and the management of certificates. This leads to everyone concluding this is complex and hard to deploy. While SMB can be used in conjunction with IPSec encryption, we thought we needed something much simpler to deploy.
7.1. SMB Encryption – No PKI or certificates required
With SMB Encryption, the encryption key is derived from the existing session key, so you don’t need PKI or certificates. These keys are not shared on the wire and you can simply enjoy yet another benefit of having already deployed your Active Directory infrastructure. From an IT administrator perspective, enabling encryption simply means checking a box (or adding a single parameter to the PowerShell cmdlet that creates the share). Clients don’t need to do anything, other than support SMB 3.0.
Expert mode: The default configuration is to configure encryption per share, but there is an option to enable encryption for the entire server, configured via PowerShell. You simply need to use Set-SmbServerConfiguration –EncryptData $true.
7.2. SMB Encryption – Accelerated Encryption
Probably the second highest concern with encryption solution is that it would make everything run slower. For SMB encryption, we decided to use a modern algorithm (128-bit AES-CCM) which can be accelerated significantly on CPUs supporting the AES-NI instruction set. That means that, for most modern CPUs like the Intel Core i5 and Intel Core i7, the cost of encryption becomes extremely low. That removes another potential blocker for adoption of encryption, which is the complexity associated in planning additional processing capacity to handle it. It’s also nice to note that AES-CCM provides integrity, in addition to privacy.
There’s no need to configure any settings on the client or the server to benefit from this accelerated encryption.
8. Server Manager
For IT Administrator handling individual shares using a GUI, we are providing in Windows Server 2012 a new Server Manager tool. This tool allows you to create your SMB file shares using a simple set of wizards.
More information: http://technet.microsoft.com/en-us/library/hh831456.aspx
8.1. Server Manager – Simple Wizards
In Server Manager, you get an “SMB Share – Quick” that creates a simple share, an “SMB Share – Advanced” wizard to create shares with additional options (like quotas and screening) and an “SMB Share – Applications” wizard specifically for creating shares intended for server-to-server workloads (like Hyper-V over SMB or SQL Server over SMB).
These wizards are exactly what you would expect from Windows: a set of simple questions that are easy to follow, with plenty of explanatory text around them. It’s also important to mention that you can run Server Manager against remote servers. In fact, you can even run it on the IT Administrator’s Windows 8 PC or tablet.
8.2. Server Manager – Fewer knobs
When creating the new wizards, we tried as much as possible to simplify the workflow. The simple share wizard asks only a few questions: what disk the share will sit on, what’s the name of the share, what are the options you want to enable and who has access. For each page, we provided a simple selection to make the process as straightforward as possible.
For instance, in the selection of the location of the share, we ask you to select a server, than a volume (based what’s available to the server you targeted). You are shown information on the type of server and on capacity/free disk on each volume to help with your selections. You don’t even have to create a folder structure. The wizard will automatically create a folder with the specified share name under the “Shares” folder on the selected volume. You can still specify a full path if you want.
When we started thinking about the options page, we had a huge list of knobs and switches. In the end, the options page was trimmed down to show only the four most commonly used options, graying them out if not relevant for that type of wizard. We also spent lots of time thinking of the right default selections for these options. For instance, in the “SMB Share – Applications” options page, only encryption is offered. In that case, Continuous Availability is always selected, while Access-based enumeration and BranchCache are always deselected.
9. SMB PowerShell
While most people looking for the most simplicity and ease of use will likely be using the Server Manager GUI, most IT Administrators managing a significant number of shares will be using PowerShell. PowerShell is also a tool that will let you go deeper into understanding the inner workings of SMB 3.0. However, we wanted to make SMB PowerShell simple enough that any IT Administrator could understand it.
9.1. SMB PowerShell – Permissions for a New Share
The New-SmbShare cmdlet, used to create new SMB file shares, is very straightforward. You provide a share name, a path and optionally some share settings like access-based enumeration or encryption. Then there’s permissions. At first, we considered requiring the administrator to use a second cmdlet (Grant-SmbShareAccess) to add the required permissions to the share. However, since some set of permissions are always required for a share, we thought of a way to add permissions in the same cmdlet you use to create the share. We did this by using the –FullAccess, -ChangeAccess, –ReadAccess and –NoAccess parameters. This makes the entire share creation a single line of PowerShell that is easy to write and understand.
For instance, to create a share for Hyper-V over SMB using a couple of Hyper-V hosts, you can use:
New-SmbShare –Name VMS –Path C:\ClusterStorage\Volume1\VMS –FullAccess Domain\HVAdmin, Domain\HV1$, Domain\HV2$
9.2. SMB PowerShell – Permissions on the Folder
While we believe we found an simple and easy way to declare the share permissions, anyone experienced with SMB knows that we also need to grant permissions at the file system level. Essentially, we need to grant the NTFS permissions on the folder behind the SMB share. In most cases, we want the NTFS folder permissions to match the SMB share permissions. However, we initially overlooked the fact that we were requiring the IT Administrator to declare those permissions twice. That was additional, repetitive work that we could potentially avoid. And granting permissions at the NTFS level is a little more complex, because there are options for inheritance and a more granular permission model than the SMB permissions.
To facilitate that, we introduced an option to address this. The basic idea is to provide a preconfigured set of NTFS permissions based on the SMB share permissions. The goal was to create the Access Control List (ACL) automatically (we called it the “Preset Path ACL”), providing the IT Administrator with a simple way to apply the SMB share ACL to the NTFS folder. It is an optional step, but we believe it is useful.
Here’s what that command line looks like, assuming your share is called “VMS”:
(Get-SmbShare –Name VMS).PresetPathAcl | Set-Acl
9.3. SMB PowerShell – Cluster type based on disk type
Another item that was high on our list was automatically detecting if the share was sitting on a Nonclustered File Server, on a Classic File Server cluster or on a Scale-Out File Server. This is not a trivial thing, since the same Windows Server is actually capable of hosting all three types of file shares at the same time. At first, we were considering requiring the IT Administrator to explicitly state what type of file share should be used and exactly what cluster name (scope name) should be used.
In the end, we decided that we could infer that information on almost all cases. A Scale-Out file share always uses a folder in a Cluster Shared Volumes (CSV), while a Classic clustered file share will always use a traditional cluster disk and a nonclustered file share will use a disk that is not part of the cluster. With that in mind, the “New-SmbShare” cmdlet (or more precisely the infrastructure behind it) is able to find the scope name for a given share without requiring the IT Administrator to specify the scope name.
Type of Disk Type of share Scope Name Local disk (non-clustered) Nonclustered file share N/A Traditional cluster disk Classic clustered file share Name resource on Cluster Group Cluster Shared Volume Scale-Out file share Distributed Network Name (DNN)
Expert mode: The only situation where you must specify the Scope Name while creating a new SMB file share is when there are two DNNs in a clusters. We recommend that you use only one DNN per cluster. In fact, we don’t see any good reason to use more than one DNN per cluster.
9.4. SMB PowerShell – Shares, Sessions and Open Files across Scale-Out nodes
SMB Scale-Out introduces the concept of multiple file server cluster nodes hosting the same shares and spreading client connections across nodes. However, for an IT Administrator managing this kind of cluster, it might seem more complicated. To avoid that, we made sure that any node of a Scale-Out File Server answers consistently for the entire cluster. For the list of shares that was easy, since every node (by definition) has the full list of shares. For sessions and open files that was a bit more complex and we actually had make the node you’re talking to gather the information from the other nodes.
In the end, the IT Administrator can rest assured that, when using Get-SmbSession or a Get-SmbOpenFile in a Scale-Out File Server, what is provided is a combined view of the cluster and there’s no need to query every node in the cluster and collate that information manually.
9.5. SMB PowerShell – Connection
With all these new capabilities in SMB 3.0 and the potential in the protocol to negotiate down to previous versions of SMB, it’s important to understand what dialect was negotiated. There is a simple “Get-SmbConnection” cmdlet, which you should run on an SMB client, that shows information on which SMB connections are currently established.
This can tell you which shares you are connected to (for a given server in the example above, but you can also omit the -ServerName parameter to see all connections) and which version of SMB (dialect) you are using (SMB 3.0 in the example above). You can also see how many files you have opened and what credentials you are offering to the server. If you look at all the data returned (by piping the output to “Select *” as shown below), you get the details on whether the share is continuously available or if the connection is encrypted.
You might argue that is not necessarily the simplest command output you have seen. However, this is much simpler than what we used before. To find out this type of information in previous releases, you would need to break out a network tracing tool like Network Monitor or WireShark and starting capturing network packets, analyzing the traffic to find the negotiation messages.
I hope this post clarifies our claims for simplicity and ease of use in SMB 3.0 and the Windows Server 2012 File Server. It is always harder to code a solution that “just works” with minimal administration intervention, but we do believe this is something worth doing it.
I would love to hear back from you on the progress we made in Windows Server 2012. Were these the right decisions? Did we miss anything important? Are there areas where things are still complex or hard to use? Let me know through the blog post comments…