Loose notes on FILESTREAM…
Following a previous post on the subject, here are a few things you should be attentive when planning a FILESTREAM implementation.
There are two security requirements for using the FILESTREAM feature:
- SQL Server must be configured for integrated security.
- If remote access will be used, then the SMB port (445) must be enabled through any firewall systems. This is the same as required for regular remote share access.
FILESTREAM data is secured just like other data is secured within the database engine: by granting permissions at the table or column levels. If a user has permission to the FILESTREAM column in a table, the user can open the associated files. Only the account under which the SQL Server service account runs is granted the proper NTFS permissions to the FILESTREAM container. It is recommended that no other account be granted permissions on that data container.
When a database is opened, SQL Server restricts access to the FILESTREAM data container(s), except when that access is made by using T-SQL transactions and OpenSqlFilestream APIs. Be advised however, when the database is closed, the physical data container is fully available and subject to Windows security checks.
It is recommend that the directories that contain FILESTREAM data are secured so that the files cannot be accidentally altered or deleted thru NTFS access. Windows admins will always be able to get access whatever the ACLs and OS. This is the behavior for any resource on a server, not FILESTREAM specific.
…Transactional consistency and recoverability
FILESTREAM maintains transactional consistency between the structured and unstructured data at all times, even allowing point-in-time recovery of FILESTREAM data using log backups. Consistency is maintained automatically by SQL Server and does not require any custom logic, greatly improving the design time for applications. The FILESTREAM mechanism does this by keeping a sort of transaction log, named FILESTREAM garbage collection, which is a background task that is triggered by the database checkpoint process.
The combination of the database’s transaction log along with the FILESTREAM transaction log allows the FILESTREAM and structured data to be transactionally recoverable. For this reason, FILESTREAM data should never be manipulated outside SQL Server, at the risk of making the database inconsistent.
All backup types and recovery models work with FILESTREAM data, and the FILESTREAM data is fully backed up with the structured data in the database, even though a backup that combines SQL Server database files and a large number of FILESTREAM files will be slower than a backup of just SQL Server database files of an equivalent total size.
This is sort of expected, if you consider the extra overhead of backing up each NTFS file (one per FILESTREAM data value). This overhead becomes more noticeable when the FILESTREAM files are smaller, as the additional time becomes a larger percentage of the total time to backup per MB of data. In essence, we all know that copying small files is intrinsically random I/O, when compared to a greater chance of running into sequential I/O when dealing with large files, and the volume was properly defragmented and able to allocate space sequentially for a given file.
In this case, depending on the amount of data, it can be preferable to use a partial backup to exclude FILESTREAM filegroups, and backup the filegroups that contain FILESTREAM data separately.
As stated earlier, FILESTREAM works with all recovery models and all forms of backup and restore (full, differential, and log). So in a disaster situation, if the CONTINUE_AFTER_ERROR option is specified on either a BACKUP or RESTORE option, the FILESTREAM data may not recover with zero data loss (similar to recovery of regular data when CONTINUE_AFTER_ERROR is specified).
As part of a recovery plan, and because all FILESTREAM data is stored in GUID-like named files, it may be wise to periodically store the mappings for these names in a database table (on a different database) or text file. This is not the most desirable or even elegant solution, but if a FILESTREAM enabled database becomes completely unrecoverable by any series of misfortunate events (I tremble just thinking about it), all is not lost if the files can still be identified by name. The example in this previous post can be used under the DAC to retrieve this information, if the Schema of the database was designed with that purpose.
Until next time!
Disclaimer: I hope that the information on these pages is valuable to you. Your use of the information contained in these pages, however, is at your sole risk. All information on these pages is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Ezequiel. Further, Ezequiel shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.