FileStream Data Type : SQL Server 2008

I am loving the enthusiasm that people have for SQL Server 2008. Everyday I come across many questions on SQL Server 2008, this shows that people are curious and I am pretty sure that people are regularly reading and learning SQL Server 2008 to know more about it and how it can help to improve their present working environment. SQL Server 2008 CTP5 is awaited and it would be out for us very soon. There are lots of new features introduced in CTP5 and I think it will keep me busy writing and keep you busy reading. So lets get started and keep ourselves busy for something good. J

Till the date people have been creatively inventing their own mechanisms to store unstructured  data, we are not trying to put the end to the creativity but rather we are proving a new Robust data type that can help people in storing the unstructured data like bitmap images, text files, videos, audio files etc. in a single data type which is secure, more manageable and the among the best of features, the best is that it is very friendly to use. So the expected results are Happy DBAs, Happy Developers and Happy Users.

People have been inventing their own mechanisms to store the unstructured data, I will share some of the creative things I did some years back when I was a developer and I had the access only to old databases and legacy systems like NT 4.0. The application project needed to store such unstructured data securely like Price Quotations, Vendor Data etc. I had to store complete UNC Path in the database and store the file in that UNC Path. So the issues were UNC Path should be accessible to every user, so there was no security for these files and anyone could take a peek at any files stored at the UNC Path. So when I came across this issue I created multiple UNC Paths and only a single user will have the access to this path. This time a new problem jumped in front of me i.e. if the same file had to be shared to a group of 5 users then either I should create a new folder where these 5 users will have the access or or create 5 copies of the same file in separate folders so that the group of 5 people can take a look. The manageability of this system was not at all AGILE but rather FRAGILE when you get some 20-30 such cases per day.

FILESTREAM is going to solve all such problems. Applications can leverage the rich streaming APIs and performance of the FileSystem at the same maintain a transactional consistency between this unstructured data and structured data with security. Today much of the data is created in the unstructured way such as text documents, videos, images, spreadsheets etc. This unstructured data is often stored outside the database, separate from its structured data most of the time in the creative way like described in the above paragraph. This separation can cause data management complexities. Or, if the data is associated with structured storage, the file streaming capabilities and performance can be limited. FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data. FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

 

Inside FileStream

When to use

Most often the questions arise what should be stored in the FileStream, there are some of the scenarios that are described below. Before discussing the scenarios I would like to emphasize that SQL Server 2008 still supports varbinary(max), varchar(max), XML data types so we need to be judicious while using FileStream.  SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determines whether you should use database storage or file system storage. If the following conditions are true, you should consider using FILESTREAM:

  • Objects that are being stored are, on average, larger than 1 MB
  • Fast read access is important.
  • You are developing applications that use a middle tier for application logic.
  • For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
Storage

FILESTREAM storage is implemented as a varbinary(max) column in which the data is stored as BLOBs in the file system. The sizes of the BLOBs are limited only by the volume size of the file system. The standard varbinary(max) limitation of 2-GB file sizes does not apply to BLOBs that are stored in the file system. To specify that a column should store data on the file system, specify the FILESTREAM attribute on a varbinary(max) column. This causes the Database Engine to store all data for that column on the file system, but not in the database file. FILESTREAM data must be stored in FILESTREAM filegroups. A FILESTREAM filegroup is a special filegroup that contains file system directories instead of the files themselves. These file system directories are called data containers. Data containers are the interface between Database Engine storage and file system storage.

When you use FILESTREAM storage, consider the following:

  • When a table contains a FILESTREAM column, each row must have a unique row ID.
  • FILESTREAM data containers cannot be nested.
  • When you are using failover clustering, the FILESTREAM filegroups must be on shared disk resources.
  • FILESTREAM filegroups can be on compressed volumes.

 

Security

In SQL Server, FILESTREAM data is secured just like other data is secured: by granting permissions at the table or column levels. If a user has permission to the FILESTREAM column in a table, the user can open the associated files.Encryption is not supported on FILESTREAM data. Only the account under which the SQL Server service account runs is granted NTFS permissions to the FILESTREAM container. It is recommended that no other account be granted permissions on the data container. When a database is opened, SQL Server restricts access to the FILESTREAM data container(s), except when access is made by using the Transact-SQL transactions and OpenSqlFilestream APIs. However, when the database is closed, the physical data container is fully available and subject to Windows security check. We recommend that you secure directories that contain FILESTREAM data so that the files cannot be accidentally altered or deleted.

 

Management

FILESTREAM is implemented as a varbinary(max) column and integrated directly into the Database Engine, most SQL Server management tools and functions work without modification for FILESTREAM data. For example, you can use all backup and recovery models with FILESTREAM data, and the FILESTREAM data is backed up with the structured data in the database. If you do not want to back up FILESTREAM data with relational data, you can use a partial backup to exclude FILESTREAM filegroups.

 

Working with FILESTREAM

Enabling FileStream

Before accessing the FileStream and using it we need to Enable the FileStream. For Enabling the FileStream support sp_filestream_configure system stored procedure must be executed.

The syntax of the sp_filestream_configure Stored Procedure is as follows:

sp_filestream_configure [ [ @enable_level = ] level ]
    [ [ , @share_name = ] 'share_name' ] ;

The parameter @enable_level could be set anywhere between 0 and 3.

@share_name is the parameter that takes a string value which indicates by which name we would like to refer the FileStream for remote access.

It requires membership in sysadmin Fixed Server role and Windows Administrators on local computers.

Code Sample

EXEC sp_filestream_configure

       @enable_level = 3,

       @share_name = "MyFileStreamShare";

After this code sample is executed we can switch to command window and see this share by typing NET SHARE on the command prompt.

clip_image002

 

Let's get some hands on ...

The scenario is we will create a new database named FileStreamDB and create a Table that can hold the Vendor Product Catalogs, we would insert records, delete and update them. So let's fold our sleeves and play with FileStream Data Type.

 

Creating Database

The first step in using FILESTREAM is to have a database that supports it. As the FILESTREAM uses a special type of filegroup, you must specify the CONTAINS FILEGROUP clause for least one filegroup.The following Transact-SQL code example creates a database named FileStreamDB. This database contains three filegroups: PRIMARY, RowGroup1, and FileStreamGroup1. PRIMARY and RowGroup1 are regular filegroups that cannot contain FILESTREAM data. FileStreamGroup1 is the FILESTREAM filegroup.For a FILESTREAM filegroup, FILENAME refers to a path. The path up to the last folder must exist, and the last folder must not exist. In this example, C:\FSDemo must exist, but the VendorCatalog sub-folder cannot exist when you execute the CREATE DATABASE statement.

After you run the sample code, you should see the C:\FSDemo\VendorCatalog folder on your drive. This folder contains a filestream.hdr file and an $FSLOG folder.

The filestream.hdr file is an important system file. Do not open or modify this file.

 

CREATE DATABASE FileStreamDB ON PRIMARY

  ( NAME = FileStreamDB_data,

    FILENAME = N'C:\FSDemo\FileStreamDB_data.mdf',

    SIZE = 10MB,

    MAXSIZE = 50MB,

    FILEGROWTH = 10%),

FILEGROUP RowGroup1

  ( NAME = FileStreamDB_group1,

    FILENAME = N'C:\FSDemo\FileStreamDB_group1.ndf',

    SIZE = 10MB,

    MAXSIZE = 50MB,

    FILEGROWTH = 5MB),

FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM

  ( NAME = FileStreamDBResumes,

    FILENAME = N'C:\FSDemo\VendorCatalog')

LOG ON

  ( NAME = 'FileStreamDB_log',

    FILENAME = N'C:\FSDemo\FileStreamDB_log.ldf',

    SIZE = 5MB,

    MAXSIZE = 25MB,

    FILEGROWTH = 5MB);

 

FileStream in Action

We will create a couple of tables for managing Vendor Details and the details of the Catalog sent by the Vendor. Thereafter we would insert records in the Table. In this example I am just casting a text and putting it to the FileStream. I will include some advanced example in a couple of days time in the same post.

USE FileStreamDB;

 

Go

 

--Creating Vendor Master Table

Create Table dbo.Vendor

(

       VendorID int Primary Key,

       VendorName varchar(200),

       VendorAddress Varchar(250),

       VendorCity varchar(50),

       VendorCountry varchar(50),

)

 

GO

 

--Creating the Table that will contain the catalogs

--in the FileStream

CREATE TABLE dbo.VendorCatalog

(     

       VendorID int References dbo.Vendor(VendorID),

       CatalogName varchar(100),

       Catalog varbinary(max) FILESTREAM,

       CatalogID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE

);

GO

 

--Inserting Record in the Vendor Table

Insert into dbo.Vendor

Values(

1,

'Adventure Works',

'4th Cross',

'Seattle',

'USA'

)

 

GO

 

--Inserting first record in the Catalog

Insert into dbo.VendorCatalog

Values(

1,

'Baseball Stuff',

Cast ('Baseball Catalog' As varbinary(max)),

newid()

)

 

Go

 

--Inserting the second record in the catalog

Insert into dbo.VendorCatalog

Values(

1,

'Diving Gear',

Cast ('Deep Sea Diving Gear Catalog' As varbinary(max)),

newid()

)

 

 

 

 

I will update this article very soon with more information and some code samples so keep a watch......