Understanding (the Lack of) Distributed File Locking in DFSR

This article discusses the absence of a multi-host distributed file locking mechanism within Windows, and specifically within folders replicated by DFSR.

Some Background

  • Distributed File Locking - this refers to the concept of having multiple copies of a file on several computers, where opening one copy for writing locks all the other copies. This prevents a file from being modified on multiple servers at the same time by several users.
  • Distributed File System Replication - DFSR operates in a multi-master, state-based design. In state-based replication, each server in the multi-master system applies updates to its replica as they arrive, without exchanging log files (it instead uses version vectors to maintain "up-to-dateness" information). No one server is ever arbitrarily authoritative after initial sync, so it is highly available and very flexible on various network topologies.
  • Server Message Block - SMB is the common protocol used in Windows for accessing files over the network. In simplified terms, it's a client-server protocol that makes use of a redirector to have remote file systems appear to be local file systems. It is not specific to Windows and is quite common; a well-known non-Microsoft example is Samba, which allows Linux, Mac, and other operating systems to act as SMB clients/servers and participate in Windows networks.
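To make the "state-based, version vector" idea a little more concrete, here is a minimal Python sketch of the general technique. It is not DFSR's actual implementation or data format; the `compare` and `merge` functions and the server names are hypothetical, chosen only for illustration.

```python
from typing import Dict

# Hypothetical representation: server name -> number of updates seen from it.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two version vectors as 'equal', 'a_newer', 'b_newer' or 'concurrent'."""
    servers = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in servers)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in servers)
    if a_ahead and b_ahead:
        return "concurrent"  # each side has seen updates the other has not
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """After syncing, each side knows the element-wise maximum of both vectors."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}


print(compare({"SRV1": 2, "SRV2": 1}, {"SRV1": 1, "SRV2": 1}))
print(compare({"SRV1": 2}, {"SRV2": 1}))
```

The "concurrent" result is the interesting one: neither replica is strictly ahead, which is exactly the situation where some conflict rule has to pick a winner.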

It's important to make a clear delineation of where DFSR and SMB live in your replicated data environment. SMB allows users to access their files, and it has no awareness of DFSR. Likewise, DFSR (using the RPC protocol) keeps files in sync between servers and has no awareness of SMB. Don't confuse distributed locking as defined in this post with Opportunistic Locking.

So here's where things can go pear-shaped, as the Brits say.

Since users can modify data on multiple servers, and since each Windows server only knows about file locks on itself, and since DFSR doesn't know anything about locks on other servers, it becomes possible for users to overwrite each other's changes. DFSR uses a "last writer wins" conflict algorithm, so someone has to lose, and the person who saves last gets to keep their changes. The losing file copy is chucked into the ConflictAndDeleted folder.
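A toy Python sketch of "last writer wins" may help here. It only models the rule as described above; the `FileVersion` record, the `resolve_conflict` function, the server and user names, and the tie-breaking choice are all assumptions made for illustration, not DFSR internals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileVersion:
    server: str      # server where the save happened (hypothetical name)
    author: str
    saved_at: float  # time of the save, in seconds; hypothetical field
    content: str


def resolve_conflict(a: FileVersion, b: FileVersion) -> Tuple[FileVersion, FileVersion]:
    """Return (winner, loser); the most recent save wins.

    Ties go to `a` here, which is an arbitrary choice made for this sketch.
    """
    if a.saved_at >= b.saved_at:
        return a, b
    return b, a


# Two users edit the same file on different servers. Neither server knew
# about the other's lock, so both saves succeeded locally.
alice = FileVersion("SRV-NYC", "alice", 100.0, "Q3 budget: 40k")
bob = FileVersion("SRV-LON", "bob", 160.0, "Q3 budget: 55k")

winner, loser = resolve_conflict(alice, bob)
conflict_and_deleted: List[FileVersion] = [loser]  # stand-in for the ConflictAndDeleted folder

print(f"kept: {winner.author}, moved aside: {loser.author}")
```

Nothing in the rule looks at whose change mattered more; Alice's save simply came earlier, so hers is the copy that gets moved aside.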

Now, this is far less common than people like to believe. Typically, true shared files are modified in a local environment: in the branch office or in the same row of cubicles. They are usually worked on by people on the same team, so people are generally aware of colleagues modifying data. And since they are usually in the same site, the odds are much higher that all the users working on a shared doc will be using the same server. Windows SMB handles the situation here. When a user has a file locked for modification and his coworker tries to edit it, the other user will get an error like:

[Screenshot: "File in use" error dialog]

And if the application opening the file is really clever, like Word 2007, it might give you:

[Screenshot: Word 2007 "File in use" dialog]

DFSR does have a mechanism for locked files, but it is only within the server's own context. DFSR will not replicate a file in or out if its local copy has an exclusive lock. But this doesn't prevent anyone on another server from modifying the file.
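The gap is easy to see in a small Python model. This is only a sketch under stated assumptions: the `Server` class and its methods are hypothetical stand-ins for "a file server that tracks its own exclusive locks", not a real SMB or DFSR API.

```python
class Server:
    """Toy file server that only knows about locks taken on itself."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.exclusive_locks = set()  # paths currently opened for exclusive write here

    def open_for_write(self, path: str) -> bool:
        """Return False (think: sharing violation) if the path is already locked locally."""
        if path in self.exclusive_locks:
            return False
        self.exclusive_locks.add(path)
        return True

    def close(self, path: str) -> None:
        self.exclusive_locks.discard(path)

    def can_replicate(self, path: str) -> bool:
        """Replication in or out is held back while the local copy is exclusively locked."""
        return path not in self.exclusive_locks


srv1, srv2 = Server("SRV1"), Server("SRV2")

first_open = srv1.open_for_write("plan.docx")    # user A on SRV1 gets the file
second_open = srv1.open_for_write("plan.docx")   # user B on SRV1 is refused
remote_open = srv2.open_for_write("plan.docx")   # user C on SRV2 is NOT refused

print(first_open, second_open, remote_open, srv1.can_replicate("plan.docx"))
```

The lock on SRV1 stops a second local editor and holds back replication of that file, but SRV2 never hears about it, so the third open succeeds.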

Back on topic, the issue of shared data being modified geographically does exist, and for some folks it's pretty gnarly. We're occasionally asked why DFSR doesn't handle this locking and take care of everything with a wave of the magic wand. It turns out this is an interesting and difficult scenario to solve for a multi-master replication system. Let's explore.

Third-Party Solutions

There are some vendor solutions that take on this problem, which they typically tackle through one or more of the following methods*:

  • Use of a broker mechanism

Having a central "traffic cop" allows one server to be aware of all the other servers and which files their users have locked. Unfortunately, this also means that there is often a single point of failure in the distributed locking system.

[Figure: topology diagram]
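As a rough illustration of the broker idea and its weak spot, here is a minimal Python sketch. The `LockBroker` class, its methods, and the `BrokerUnavailable` exception are hypothetical; no vendor's product is being described.

```python
from typing import Dict, Tuple


class BrokerUnavailable(Exception):
    """Raised when the central broker cannot be reached."""


class LockBroker:
    """Toy central 'traffic cop' that records which server/user holds each file."""

    def __init__(self) -> None:
        self.online = True
        self._owners: Dict[str, Tuple[str, str]] = {}  # path -> (server, user)

    def acquire(self, path: str, server: str, user: str) -> bool:
        if not self.online:
            # Single point of failure: with the broker down, no server can
            # tell whether the file is safe to open for writing.
            raise BrokerUnavailable("lock broker is offline")
        holder = self._owners.get(path)
        if holder is not None and holder != (server, user):
            return False  # someone else, possibly on another server, has it
        self._owners[path] = (server, user)
        return True

    def release(self, path: str, server: str, user: str) -> None:
        if self._owners.get(path) == (server, user):
            del self._owners[path]


broker = LockBroker()
print(broker.acquire("plan.docx", "SRV1", "alice"))
print(broker.acquire("plan.docx", "SRV2", "bob"))
```

Unlike per-server locks, the second request is refused even though it comes from a different server. The price is that every open for write now depends on the broker being reachable.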

  • Requirement for a fully routed network

Since a central broker must be able to talk to all servers participating in file replication, this removes the ability to handle complex network topologies. Ring topologies and multi hub-and-spoke topologies are not usually possible. In a non-fully routed network, some servers may not be able to directly contact each other or a broker, and can only talk to a partner who himself can talk to another server, and so on. This is fine in a multi-master environment, but not with a brokering mechanism.

[Figure: topology diagram]
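The difference between "can reach the broker directly" and "can be reached hop by hop" can be sketched with a small graph check in Python. The four-server ring, the server names, and the placement of the broker on server A are assumptions made purely for illustration.

```python
from collections import deque
from typing import Dict, Set


def reachable(links: Dict[str, Set[str]], start: str) -> Set[str]:
    """All servers reachable from `start` by hopping partner to partner (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for partner in links.get(node, ()):
            if partner not in seen:
                seen.add(partner)
                queue.append(partner)
    return seen


# Ring topology: each server can talk directly to two partners only.
ring = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "A"},
}
broker = "A"  # hypothetical: the lock broker runs on server A

# A broker needs a direct path from every server.
cannot_reach_broker = sorted(s for s in ring if s != broker and broker not in ring[s])

# Multi-master replication only needs changes to arrive eventually, hop by hop.
all_connected = reachable(ring, broker) == set(ring)

print(cannot_reach_broker, all_connected)
```

In this ring, server C has no direct link to the broker, yet every server is still reachable through its partners, which is all that replication needs.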

  • Are limited to a pair of servers

Some solutions limit the topology to a pair of servers in order to simplify their distributed locking mechanism. For larger environments this may not be feasible.

  • Make use of agents on clients and servers
  • Do not use multi-master replication
  • Do not make use of MS clustering
  • Make use of specialty appliances

* Note that I say typically! Please do not post death threats because you have a solution that does/does not implement one or more of those methods!

Deeper Thoughts

As you think further about this issue, some fundamental problems start to crop up. For example, if we have four servers with data that can be modified by users in four sites, and the WAN connection to one of them goes offline, what do we do? The users can still access their individual servers, but should we let them? We don't want them to make changes that conflict, but we definitely want them to keep working and making our company money. If we arbitrarily block changes at that point, no users can work even though there may not actually be any conflicts happening! There's no way to tell the other servers that the file is in use, and you're back at square one.

[Figure: topology diagram]

Then there's SMB itself and the error handling of reporting locks. We can't really change how SMB reports sharing violations, as we'd break a ton of applications, and clients wouldn't understand new extended error messages anyway. Applications like Word 2007 do some undercover trickery to figure out who is locking files, but the vast majority of applications don't know who has a file in use (or even that SMB exists. Really.). So when a user gets the message "This file is in use" it's not particularly actionable. Should they all call the help desk? Does the help desk have access to all the file servers to see which users are accessing files? Messy.

Since we want multi-master for high availability, a broker system is less desirable; we might need to have something running on all servers that allows them all to communicate even through non-fully routed networks. This will require very complex synchronization techniques. It will add some overhead on the network (although probably not much) and it will need to be lightning fast to make sure that we are not holding up the user in their work; it needs to outrun file replication itself. In fact, it might need to actually be tied to replication somehow. It will also have to account, somehow, for server outages that are network related rather than server crashes.

[Figure: topology diagram]

And then we're back to special client software for this scenario that better understands the locks and can give the user some useful info ("Go call Susie in accounting and tell her to release that doc", "Sorry, the file locking topology is broken and your administrator is preventing you from opening this file until it's fixed", etc.). Getting this to play nicely with the millions of applications running in Windows will definitely be interesting. There are plenty of OSes that would not be supported or get the software: Windows 2000 is out of mainstream support and XP soon will be. Linux and Mac clients wouldn't have this software until they felt it was important, so the customer would have to hope their vendors made something analogous.

More Information

Right now the easiest way to control this situation in DFSR is to use DFS Namespaces to guide users to predictable locations, with a consistent namespace. By correctly configuring your DFSN site topology and server links, you force users to all share the same local server and only allow them to access remote computers when their "main" server is down. For most environments, this works quite well. As an alternative to DFSR, SharePoint is an option because of its check-out/check-in system. BranchCache (coming in Windows Server 2008 R2 and Windows 7) may be an option for you as it is designed for easing the reading of files in a branch scenario, but in the end the authoritative data will still live on one server only (more on this here). And again, those vendors have their solutions.
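The "everyone in a site uses the same server, remote servers are only a fallback" behavior can be sketched in Python as follows. This is a simplified model of the idea, not the DFSN referral algorithm; the `Target` record, the `site_cost` field, the `pick_target` function, and the server names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    server: str
    site_cost: int       # hypothetical: 0 means same site as the client, higher is farther away
    online: bool = True


def pick_target(targets: List[Target]) -> Optional[Target]:
    """Prefer the lowest-cost target; fall back to a farther one only if it is down."""
    for target in sorted(targets, key=lambda t: t.site_cost):
        if target.online:
            return target
    return None  # nothing reachable


targets = [
    Target("SRV-LON", site_cost=50),
    Target("SRV-NYC", site_cost=0),
]

normal_choice = pick_target(targets)  # the local server while it is up

targets[1].online = False             # the "main" server goes down
fallback_choice = pick_target(targets)

print(normal_choice.server, fallback_choice.server)
```

Because every client in the same site resolves to the same server while it is up, ordinary single-server SMB locking covers those users, and the multi-server exposure is limited to outage windows.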