Chapter 5 - Developing: Databases - Migrating the Database Architecture
On This Page
Introduction and Goals
Build the SQL Server Instance
Configure the Server
Migrate the Storage Architecture
Introduction and Goals
The first task in the migration of the database is to create an instance of SQL Server that will provide the container (the RDBMS) for creating schemas and schema objects to cater to the needs of applications. Your main focus when creating the database should be on its architecture: the instance architecture involving server resources, such as processor power and memory; and storage architecture, including file systems. Your objective for both of these areas is to derive performance that is equivalent to or better than the original solution.
The two most common implementations of databases are Online Transaction Processing (OLTP) systems and Decision Support Systems (DSS). OLTP systems are characterized by large numbers of short transactions that frequently add or modify data. DSS systems, on the other hand, contain large volumes of mostly static data imported into them from OLTP databases, file systems, and other such systems for the purposes of analysis and reporting. There are fewer transactions in DSS systems, but they run much longer. OLTP system performance is influenced by locking. DSS performance is typified by large reads, sorts, computation, and so on.
The guidance provided in this solution is focused on OLTP systems. The types of operations performed in OLTP and DSS systems impose different demands on the database resources. While the architecture for OLTP and DSS systems differs, resource considerations are similar. Though there are some objects that are specialized for DSS systems, most objects are common to both systems. As a result, the guidance provided applies to DSS systems, as well.
Build the SQL Server Instance
This section discusses the planning and preparation for installing the SQL Server RDBMS software. A more detailed discussion can be found in SQL Server Books Online under the “Installing SQL Server” topic.
You should follow these design guidelines for the layout of a SQL Server software installation and database files:
Independent subdirectories. Files should be separated by categories and instance to minimize effects and ease navigation. When named instances of SQL Server are created, the associated directories are created with instance names for identification purposes.
Consistent naming convention for files. Use the following file name extensions: .mdf for primary datafiles, .ndf for secondary datafiles, and .ldf for log files. The default file naming convention is dbname_data.mdf for the datafile and dbname_log.ldf for the log file.
Integrity of home directories. Keep the SQL Server software separate from the datafiles. This allows the software to be moved or deleted without affecting the application.
Separation of administrative information for each database. Store system data in the master database separate from other data.
Separation of tablespace content. Every instance of SQL Server comes with the system databases master, model, msdb, and tempdb.
Tune I/O load across all disks. SQL Server has filegroups, each with multiple files that give the same advantages as tablespaces. In addition, storage allocated to an object is distributed evenly across all datafiles belonging to the filegroup.
The following important installation options have to be determined prior to installing SQL Server:
Instance name. SQL Server has a provision for a default instance where the instance takes the network name of the server. Multiple instances with assigned names can also be created. SQL Server naming conventions do not preclude reuse of Oracle instance names.
Network libraries. SQL Server can communicate using several network protocols. TCP/IP and named pipes are the most common. To minimize resource utilization, only configure those protocols that are required.
Service accounts. Like any other service, both SQL Server and SQL Server Agent (the SQL Server scheduling service) require a Microsoft Windows® account. You should run SQL Server under a domain account. It does not have to be a local administrator or a domain account that has local administrative privileges.
Authentication mode. Options available in SQL Server are Windows Authentication or Mixed mode. Mixed mode allows login access to SQL Server either by a valid Windows account or a valid SQL Server login. The authentication mode can be changed any time in the future. Changing the authentication mode requires that the database be restarted to take effect.
Licensing mode. Indicate whether this SQL Server is licensed by processor or per seat (Client Access License) and the number for the given mode.
Location of files. With Oracle, even though the installer provides the option of creating a starter database, installing the software and creation of databases are two independent events. Installing the SQL Server software also creates a "database system" with databases such as master, model, msdb, northwind, pubs, and tempdb created by default. The destination location of SQL Server application files and the datafiles for the default databases can be specified at installation.
Configure the Server
A SQL Server instance can be configured similar to an Oracle instance. The areas that are covered here include memory, CPU, and listener.
Architecturally, both Oracle and SQL Server can be divided into instance and database. The instance is made up of memory areas and database processes. In SQL Server, the division of memory pool into sub-caches (buffer cache, procedure cache, log cache, connection context, and system data structures) based on function closely resembles the Oracle SGA and its components. SQL Server has threads that perform similar work to the Oracle foreground and background processes.
In Oracle, the characteristics and behavior of the database and the instance are determined by a large set of parameters stored in the initialization file (init.ora) or server parameter file (spfile). These parameters cover a diverse set of resources, such as memory, processes, network, disk, I/O, connections, files, character set, and so on.
The non-default values of the Oracle initialization parameters can be obtained from the parameter file if one is in use. If a server parameter file is in use, the parameter values can be obtained using one of the following options:
Convert the server parameter file (spfile) to an initialization parameter file by executing the following statement:
CREATE pfile FROM spfile
Query the database by executing the following statement:
SELECT name, value
FROM sys.v$spparameter WHERE isspecified = 'TRUE'
SQL Server does not have an equivalent for every initialization parameter. The configuration options in SQL Server can be specified using SQL Server Enterprise Manager or using the sp_configure system stored procedure. Syntax for use is shown in the following statement.
sp_configure [option, [value]]
A discussion of every Oracle initialization parameter is beyond the scope of this guide. The configuration options available in SQL Server can be categorized into user-configurable and advanced. The advanced options, which are similar to Oracle’s “hidden” parameters, are either self-configuring or should be manipulated only with the advice of a certified SQL Server technician.
Examples of Oracle parameters that have an equivalent configurable parameter in SQL Server are provided in Table 5.1:
Table 5.1: Examples of Oracle Initialization Parameters with Equivalent in SQL Server
max worker thread options
Sometimes a configuration option can be set at one of several different places. For example, the SQL Server max degree of parallelism parameter can be set using the Oracle DEGREE clause at the object or query level.
Certain options which are configurable in one database may have a fixed implementation in the other. For example, the Oracle LOG_CHECKPOINT_INTERVAL and LOG_CHECKPOINT_TIMEOUT parameters used to tune checkpoints do not have equivalents in SQL Server. In SQL Server, checkpoints occur automatically based on the number of redo records in the log. Similarly, UNDO_MANAGEMENT, UNDO_TABLESPACE, and ROLLBACK_SEGMENTS that define the storage for transaction rollback or undo data are implemented in SQL Server as part of the transaction logs of each database.
SQL Server has far fewer configuration parameters than Oracle. In Oracle, there are several parameters that are used to control memory allocation to the instance and its sub-components. Some of these parameters are
SQL Server offers the parameters min server memory and max server memory, which can be used to limit the amount of server memory that can be utilized by the database system. SQL Server cooperates with the operating system to dynamically adjust the amount of memory used based on the demands on the server from other applications. However, this behavior puts the burden on the dimensioning of the system so it is capable of running the database with known loads and peaks and avoiding competing workloads which might jeopardize the system resources available to the database. All memory areas within the memory pool are dynamically adjusted by the SQL Server code to optimize performance and do not need any administrator input. Hence you will find very few memory-related configuration parameters in SQL Server.
The amount of memory that can be utilized by SQL Server can be configured either using Enterprise Manager or through T-SQL. The two options are provided in the following procedures:
To configure the amount of memory available for SQL Server in Enterprise Manager, follow these steps:
Expand the server group.
Right-click Server (Server Name\Instance Name), and then click Properties.
Select the Memory tab.
Select Dynamically configure SQL Server memory to set the memory range under which SQL Server should operate. Select Use a fixed memory size (MB) to allocate a fixed amount of memory. The actual values can be selected using the memory slider graphic.
The code for setting a range of memory and a fixed amount of memory for SQL Server is provided in the following procedure.
To configure a range of memory for SQL Server using T-SQL, enter the following commands:
USE master EXEC sp_configure 'show advanced option', 1 RECONFIGURE GO sp_configure 'min server memory', 100 -- 100MB RECONFIGURE GO sp_configure 'max server memory', 4000 -- 4000 MB RECONFIGURE GO
To configure a fixed amount of memory for SQL Server using T-SQL, enter the following commands:
USE master sp_configure 'show advanced option', 1 RECONFIGURE GO sp_configure 'min server memory', 400 RECONFIGURE GO sp_configure 'max server memory', 400 RECONFIGURE GO
Note A database restart is required to activate the new settings of all advanced options.
If the amount of physical memory exceeds 3 GB, then the Address Windowing Extensions (AWE) Windows feature has to be enabled to handle allocation to SQL Server using the configuration parameter awe enabled. When this feature is enabled, SQL Server stops dynamic memory management at the server level and the max server memory parameter has to be set. If the max server memory parameter is set, cache management is similar to SGA sizing in Oracle but without the complications of tuning the individual cache components.
To configure SQL Server to use more than 3 GB of memory using T-SQL, enter the following commands:
USE master sp_configure 'show advanced option', 1 RECONFIGURE GO sp_configure 'awe enabled', 1 RECONFIGURE GO sp_configure 'max server memory', 6000 RECONFIGURE GO
Because the memory requirements of a SQL Server instance are not the same as an equivalent Oracle instance, the SQL Server memory size has to be tuned during a migration. The following Performance Monitor counters can be used to check the amount of memory SQL Server is using:
SQL Server: Buffer Manager: Total Pages
SQL Server: Memory Manager: Total Server Memory (KB)
Process: Working Set
The dbcc memorystatus command can be used to view information about memory allocation. For more information about memory usage and this command, refer to http://support.microsoft.com/?id=271624.
The LOCK_SGA initialization parameter can be implemented in SQL Server using set working set size. The allocated memory stays in physical memory and improves performance because swapping is avoided. However, care should be taken that allocating a fixed amount of memory to SQL Server does not impact the needs of other applications running on the same server.
Set the CPU Affinity
SQL Server has the capability of defining CPU affinity where CPUs can be dedicated to an instance. This can be controlled using the affinity mask configuration setting or Enterprise Manager. The priority boost configuration setting can be used to prioritize threads. SQL Server SP1 has added an IO_affinity_mask switch, which can used at instance startup to reserve CPUs to handle disk I/O associated with the instance. Use of the IO_affinity_mask configuration setting is only recommended for machines with more than 16 CPUs.
For more information on setting up the I/O affinity mask, refer to http://support.microsoft.com/?kbid=298402.
Configure the Listener
The Oracle listener can operate on any available port. The well-known default selection is 1521. A default installation of SQL Server uses port 1433. If not configured manually, a named instance picks up an unused TCP port during startup. Unlike Oracle, where a single listener is shared by all instances, each SQL Server instance listens on its own port. The SQL Server server name, database name, and port information are required by client applications to configure the data source or the connection string (in non-DNS connections).
Migrate the Storage Architecture
This section discusses the physical and logical storage structures found in SQL Server. This discussion is useful in understanding how to configure storage in SQL Server in a manner similar to the original Oracle database.
Database data is physically stored in files. However, allocation of this space to the database objects is done in logical units such as blocks, extents, and segments. Figure 5.1 provides a comparison of the storage architecture in Oracle and SQL Server.
Figure 5.1 Mapping Oracle storage structures to SQL Server
The organization of storage and its allocation has great significance on performance and maintenance of a database. Due to the similarities in the storage architecture, the principles on which the source Oracle database was organized can be carried over into the SQL Server database with relative ease. The following sections discuss each level of the storage hierarchy shown in Figure 5.1.
In Oracle, the smallest unit of storage is the data block. All data in the database is retrieved and manipulated in terms of blocks. The SQL Server equivalent of the data block is called the page. In Oracle, the DBA chooses the default page size at database creation to be any of 2K, 4K, 8K, 16K or 32K. In SQL Server, the page size is fixed at 8K. With Oracle 9i, in addition to the default page size, up to five non-standard (non-default) page sizes can also be used for the tablespaces (and, consequently, the logical storage structures).
The SQL Server page has similar composition as the Oracle data block. The data page consists of page header, data rows, and row offset array (row directory). The header leaves 8060 bytes of usable space for data rows and row offset. In Oracle, a data row that is larger than the usable space is broken into several row pieces (row chaining) and stored in multiple blocks. SQL Server does not allow rows larger than 8060 bytes. This restriction in SQL Server, however, does not apply to rows containing large data types, such as text, image, and so on, which are stored differently.
Within a SQL Server block, space is managed in a manner similar to that by Oracle’s Automatic Segment Space Management (ASSM) option, where used and free space is tracked using bitmaps. This has implications with respect to migrating to SQL Server. In Oracle, the choice of block size is driven by the type of database — OLTP or DSS/Data Warehouse. The advantage of large block size is mainly in terms of I/O performance because there is less overall cost for setting up fewer I/Os. Better I/O performance can be achieved with smaller block sizes by using other techniques, such as multi-block read (DB_FILE_MULTIBLOCK_READ_COUNT), parallel reads and read ahead, or prefetch. The maximum I/O size, however, is dependent on the operating system. SQL Server also uses a technique called scatter I/O that enables multiple unrelated blocks to be read in a single I/O request.
An advantage of larger block size is reduced amount of row chaining. This is a factor when tables contain rows with lengths larger than can be accommodated in a block. Reading of multiple blocks may be necessitated in a SQL Server database as compared to Oracle databases with larger block size, but the performance can be compensated for by better hardware and I/O techniques.
Based on the environment, large block sizes can impact cache performance. With large block sizes, for a fixed amount of RAM, the number of blocks in physical memory reduces. For example, if 1 MB of memory can hold 500 blocks of size 2 KB, it can hold only 125 blocks of size 8 KB. So the cache performance is dependent on the density of the block (number of rows per block) and the access pattern (random or sequential). Smaller blocks reduce contention while the overhead becomes a larger portion of the block.
In the migration, the move from differing block sizes in Oracle to 8 KB blocks in SQL Server does not pose any challenge. When migrating from Oracle (where multiple block sizes are in use), the rows will be rearranged into 8 KB blocks. However, solutions have to be created to accommodate rows with length greater than 8060 bytes. Such solutions are examined later in this chapter.
Extents and Segments
Data allocation to schema objects (such as tables and indexes) and system data structures (temporary segment, rollback segment) in Oracle and SQL Server is in terms of logical extents. Oracle provides parameters for appropriate sizing of the extents. SQL Server, on the other hand, uses fixed size extents of 8 pages (64 KB). Oracle extents — irrespective of whether they belong to dictionary or locally-managed tablespaces — are migrated to fixed 64 KB extents.
Extents allocated to Oracle objects are dedicated to the object and do not contain blocks from other objects. This is not the case in SQL Server. SQL Server has two types of extents: uniform and mixed. In a uniform extent, all pages are allocated to a single object, while mixed extents can have pages belonging to multiple objects. When a table or index is created in SQL Server, initially it is allocated two pages out of mixed extent. When the table or index grows to eight pages, all future allocations use uniform extents. As a result of fixed size extents, fragmentation within the tablespace (filegroup in SQL Server) is eliminated.
Because an extent is a contiguous allocation of blocks, large extents aid in better I/O rate while reading huge amounts of data. After moving to SQL Server, physical I/O will have to be configured and tuned, as was the case with the change in block sizes.
In Oracle, all extents of an object are collectively called a segment. While the concept of segment existed in version 6.5, SQL Server does not have an equivalent for the segment from version 7.0 and later. The segment has no bearing on migration or performance.
Tablespaces and Datafiles
Datafiles are used to store persistent data in the database. For ease of management, one or more datafiles can be grouped into logical tablespaces. The SQL Server equivalent of the tablespace is the filegroup. While filegroups are similar in function, their organization differs from that of tablespaces. Oracle tablespaces are at the instance level. SQL Server filegroups come under and are associated with the individual databases. Figure 5.2 illustrates the tablespace to datafiles hierarchy:
Figure 5.2 Schematic mapping of Oracle files and tablespaces to SQL Server
Table 5.2 captures some of the relations between tablespaces in Oracle and databases in SQL Server that are not obvious in Figure 5.2.
Table 5.2: Oracle Tablespaces and SQL Server Functional Equivalents
Undo (Rollback) tablespace
Online Redo log
"App Data" tablespace
"App" database — "Data" filegroup
"App Index" tablespace
"App" database — "Index" filegroup
The following list describes how Oracle functionality is implemented in SQL Server:
Each of the SQL Server databases has its own security structures, such as users, privileges (permissions), and roles.
Each of the SQL Server databases has its own administrative roles that bestow privileges on the database.
In SQL Server, the system catalog — which is analogous to the Oracle data dictionary — is broken up between the individual databases and the master database.
In SQL Server, each of the databases has its own transaction log files, which combine the functions of the Oracle online redo logs and rollback segments.
The tempdb database provides temporary storage for the entire SQL Server instance, and the temporary tablespace in Oracle is common to the entire Oracle instance.
Tablespaces and filegroups provide the ability to better distribute data across multiple files for the purposes of performance. The grouping also helps ease administration of backup and recovery, maintenance, availability, and so on.
Each SQL Server database is created with a primary file belonging to the default primary filegroup. Optionally, secondary datafiles can be added to the primary filegroup, or additional filegroups can be created. The location of the files belonging to a database is recorded in the master database and the primary file for the database. Data added to objects of a filegroup are proportionally filled across all files belonging to the filegroup.
A common concern among users when creating SQL Server databases is the lack of the multiplexing feature available in Oracle for control files and redo logs. This is not a concern because SQL Server recommends the use of striped and mirrored devices (RAID 0+1) for transaction logs to preserve the level of protection against hardware failure obtained by log multiplexing. SQL Server recommends using RAID 0+1 for all database datafiles. If RAID 0+1 cannot be implemented, at a minimum RAID 5 should be used to provide tolerance for hard disk failure. Apart from file systems, SQL Server also supports the use of raw partitions.
Storage Definition for Tables and Indexes
In Oracle, the storage definition is used to specify the characteristics of tables and indexes. The following list provides information on how these characteristics migrate to SQL Server.
Physical attributes. These include PCTUSED, PCTFREE, INITRANS, and MAXTRANS. Of these parameters, SQL Server only offers an equivalent for PCTFREE through the FILLFACTOR clause. FILLFACTOR is available for indexes only. A PCTFREE of 10 would correspond to a FILLFACTOR of 90.
Tablespace. This is used in a manner similar to tables. The SQL Server equivalent is the ON filegroup_name option. A default filegroup can be specified for each database, and the default filegroup functions similar to the default tablespace of an Oracle user. If a filegroup is not specified during index creation, the index is created in the default filegroup for the database.
Storage attributes. These override the defaults specified at the tablespace level, such as INITIAL, NEXT, MINEXTENTS, MAXEXTENTS, PCTINCREASE, and so on. Due to the fixed (64 KB) extents, these attributes have no relevance in SQL Server.
Migrate System Storage Structures
As has already been presented in this chapter, an Oracle database has the following system structures that have a similar role in a SQL Server database. The configuration of these structures in an Oracle database can be used to provide a starting point for configuration in SQL Server.
System tablespace. In Oracle, the system tablespace houses the data dictionary and the system rollback segments. The master database in SQL Server has a similar role. The transaction log — which also holds the rollback information — is in a separate file. Also, the data dictionary in the master database does not grow because user object information is stored in the primary file of the contained database under the decentralized data dictionary model of SQL Server. The master database also has some of the information that is stored in the control file of an Oracle database. Based on these facts, it is appropriate to use the default master database settings.
Rollback or undo. In Oracle, a separate tablespace exists for holding rollback segments (manual and auto modes) whose size and number is dependent upon the number of user connections and the amount of changes being made to the database. These factors should be considered in sizing the transaction log in SQL Server.
Redo. There are several factors to consider while choosing the size of a transaction log. The source Oracle database has more than one schema that will be migrated into separate SQL Server databases. For the same transaction, the amount of redo and rollback information generated in SQL Server and Oracle is different. The frequency of backup or truncation of the transaction log also influences the size. Hence prototyping is the best way to size the SQL Server transaction log.
Temporary. The size of tempdb can be initially set to a conservative size as compared to its Oracle counterpart, the temporary tablespace. The auto-increment feature should be set to ensure that transactions do not fail due to lack of space. Setting very small increments will also affect performance due to constant space operations. The temporary space needs are based on the application needs, which may change a little as applications are migrated to a SQL Server environment.