Chapter 15: Designing Data Components

For more details of the topics covered in this guide, see Contents of the Guide.

Contents

  • Overview
  • Step 1 – Choose a Data Access Technology
  • Step 2 – Choose How to Retrieve and Persist Business Objects from the Data Store
  • Step 3 – Determine How to Connect to the Data Source
  • Step 4 – Determine Strategies for Handling Data Source Errors
  • Step 5 – Design Service Agent Objects (Optional)
  • Additional Resources

Overview

The data layer provides access to data hosted within the boundaries of the system, and to data exposed by other networked systems. It contains components such as data access components, which provide functionality for accessing data hosted within the system boundaries; and service agent components, which provide functionality for accessing data exposed by other back-end systems through Web services. It may also contain components that provide helper functions and utilities.

This chapter will help you to understand the basic steps for designing your data components. The first step is to identify the constraints associated with the data to be accessed, which will help you to choose an appropriate data access technology. The next step is to choose a mapping strategy and then determine your data access approach, which includes identifying the business entities to be used and the format of entities. Then you can determine how the data access components will connect to the data source. Finally, you determine the error handling strategy to manage the data source exceptions.

Step 1 – Choose a Data Access Technology

The choice of an appropriate data access technology must take into account the type of data you are dealing with and how you want to manipulate that data within the application. Certain technologies are better suited to specific scenarios. Use the following guidelines to map your application scenarios to the available data access technology solutions:

  • ADO.NET Entity Framework. Consider using the ADO.NET Entity Framework (EF) if you want to create a data model and map it to a relational database; map a single class to multiple tables using inheritance; or query relational stores other than the Microsoft SQL Server family of products. EF is appropriate when you have an object model that you must map to a relational model using a flexible schema, and you need the flexibility of separating the mapping schema from the object model. If you use EF, also consider using:
    • LINQ to Entities. Consider using LINQ to Entities if you must execute queries over strongly typed entities, or must execute queries against relational data using LINQ syntax.
  • ADO.NET Data Services Framework. ADO.NET Data Services is built on top of EF and allows you to expose parts of your Entity Model through a REST interface. Consider using the ADO.NET Data Services Framework if you are developing a rich Internet application (RIA) or an n-tier rich client application, and you want to access data through a resource-centric service interface.
  • ADO.NET Core. Consider using ADO.NET Core if you need a low-level API for full control over data access in your application, you want to leverage the existing investment made in ADO.NET providers, or you are using traditional data access logic against the database. ADO.NET Core is appropriate if you do not need the additional functionality offered by the other data access technologies, or if you are building an application that must support a disconnected data access experience.
  • ADO.NET Sync Services. Consider using ADO.NET Sync Services if you are designing an application that must support occasionally connected scenarios, or requires collaboration between databases.
  • LINQ to XML. Consider using LINQ to XML if you are using XML data in your application, and you want to execute queries using the LINQ syntax.

For more information on data access technologies on the Microsoft platform, see Appendix C "Data Access Technology Matrix."

Step 2 – Choose How to Retrieve and Persist Business Objects from the Data Store

After you have identified your data source requirements, the next step is to choose a strategy for populating your business objects or business entities from the data store and for persisting them back to the data store. An impedance mismatch typically exists between an object-oriented data model and the relational data store that sometimes makes it difficult to translate between them. There are a number of approaches to handling this mismatch, but these approaches differ in terms of the data types, structure, transactional techniques, and in how the data is manipulated. The most common approaches use Object/Relational Mapping (O/RM) tools and frameworks. The type of entity you use in your application is the main factor in deciding how to map those entities to data source structures. Use the following guidelines to help you choose how to retrieve and persist business objects from the data store:

  • Consider using an O/RM framework that translates between domain entities and the database. If you are working in a greenfield environment, where you have full control over the database schema, you can use an O/RM tool to generate a schema to support the object model and provide a mapping between the database and domain entities. If you are working in a brownfield environment, where you must work with an existing database schema, you can use an O/RM tool to help you to map between the domain model and relational model.
  • A common pattern associated with object-oriented design is the domain model, which is based on modeling entities as objects within a domain. See Chapter 13 "Designing Business Entities" for information on domain driven design techniques.
  • Ensure you group your entities correctly to achieve a high level of cohesion. This may mean that you require additional objects within your domain model, and that related entities are grouped into aggregate roots.
  • When working with Web applications or services, group entities and provide options for partially loading domain entities with only the required data. This minimizes the use of resources by avoiding holding initialized domain models for each user in memory, and allows applications to handle higher user load.
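The mapping and partial-loading guidance above can be sketched in a language-neutral way. The following Python example uses the standard sqlite3 module as a stand-in for a relational store; the Customer entity, customers table, and CustomerMapper class are hypothetical names chosen for illustration, not part of any framework discussed in this chapter:

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class Customer:
    # Hypothetical domain entity; email is omitted in partially loaded instances.
    id: int
    name: str
    email: Optional[str] = None

class CustomerMapper:
    """Translates between the relational row and the domain entity."""
    def __init__(self, conn):
        self.conn = conn

    def find(self, customer_id, partial=False):
        # Partial loading fetches only the columns the caller needs, so a Web
        # request does not hold a fully initialized entity graph in memory.
        # The column list is fixed in code (never user input), so building the
        # SQL text this way does not open an injection hole.
        cols = "id, name" if partial else "id, name, email"
        row = self.conn.execute(
            f"SELECT {cols} FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else Customer(*row)

    def save(self, customer):
        # Persist the entity back to the data store.
        self.conn.execute(
            "UPDATE customers SET name = ?, email = ? WHERE id = ?",
            (customer.name, customer.email, customer.id),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")

mapper = CustomerMapper(conn)
full = mapper.find(1)
light = mapper.find(1, partial=True)
```

A full-fledged O/RM adds change tracking, identity maps, and lazy loading on top of this basic mapper shape; the point here is only the translation boundary between rows and entities.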

Step 3 – Determine How to Connect to the Data Source

Now that you know how the data access components map to the data source, you should identify how to connect to the data source, protect user credentials, and perform transactions. Use the guidelines in the following sections to help you choose an appropriate approach:

  • Connections
  • Connection Pooling
  • Transactions and Concurrency

Connections

Connections to data sources are a fundamental part of the data layer. The data layer should coordinate all data source connections, making use of the data access infrastructure. Creating and managing connections uses valuable resources in both the data layer and the data source. Use the following guidelines to ensure that you design an appropriate technique for connecting to data sources:

  • Ensure that you open connections to the data source as late as possible and close them as early as possible. This ensures that resources are locked for the shortest possible duration, and are more freely available to other processes. If you have nonvolatile data, use optimistic concurrency to mitigate the cost of locking data in the database. This avoids the overhead of locking database rows and of holding a connection open for the duration of a lock.
  • Perform transactions through a single connection where possible. This allows you to use the transaction features of ADO.NET without requiring the services of a distributed transaction coordinator.
  • Use connection pooling and tune performance based on results obtained by running simulated load scenarios. Consider tuning connection isolation levels for data queries. If you are building an application with high-throughput requirements, special data operations may be performed at lower isolation levels than the rest of the transaction. Combining isolation levels can have a negative impact on data consistency, so you must carefully analyze this option on a case by case basis.
  • For security reasons, avoid using a System or User Data Source Name (DSN) to store connection information.
  • Design retry logic to manage the situation where the connection to the data source is lost or times out.
  • Batch commands and execute them against the database where possible to reduce round trips to the database server.

Another important aspect that you must consider is the security requirements associated with accessing your data source. In other words, how will data access components authenticate with a data source, and what are the authorization requirements? Use the following guidelines to ensure that you design a secure approach for connecting to data sources:

  • Prefer Windows authentication to SQL Server authentication. If you are using Microsoft SQL Server, consider using Windows authentication with a trusted subsystem.
  • If you do use SQL authentication, ensure that you use custom accounts with strong passwords, limit the permissions of each account within SQL Server using database roles, add ACLs to any files used to store connection strings, and encrypt connection strings in configuration files.
  • Use accounts with least privilege in the database, and require callers to send identity information to the data layer for auditing purposes.
  • Do not store passwords for user validation in a database, whether plaintext or encrypted. Instead, store password hashes that use a salt value (random bits used as one of the inputs to the hashing function).
  • If you are using SQL statements to access the data source, understand your trust boundaries and use the parameterized approach to create queries instead of string concatenation to protect against SQL injection attacks.
  • Protect sensitive data sent over the network to and from SQL Server. Be aware that Windows authentication protects credentials, but not application data. Use IPSec or SSL to protect the data in the channel.
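Two of the guidelines above (parameterized queries and salted password hashes) can be demonstrated concretely. This sketch uses Python's sqlite3 and hashlib as stand-ins; the users table and the helper names are hypothetical:

```python
import hashlib
import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pwd_hash BLOB, salt BLOB)")

def store_password(name, password):
    # Store a salted hash, never the password itself (plaintext or encrypted).
    salt = os.urandom(16)  # random bits mixed into the hash input
    pwd_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", (name, pwd_hash, salt))

def check_password(name, password):
    # Parameterized query: the user-supplied name is passed as a parameter,
    # never concatenated into the SQL text, which defeats SQL injection.
    row = conn.execute(
        "SELECT pwd_hash, salt FROM users WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return False
    stored_hash, salt = row
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return candidate == stored_hash

store_password("alice", "s3cret!")
```

With string concatenation, a name such as `alice' OR '1'='1` could alter the query's logic; passed as a parameter it is simply a value that matches no row.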

Connection Pooling

Connection pooling allows applications to reuse a connection from the pool, or create a new connection and add it to the pool if no suitable connection is available. When applications close a connection, it is released back into the pool and the underlying connection remains open. This means that ADO.NET is not required to create a new connection and open it against the data source every time. Although pooling open connections does consume resources, it reduces data access delays and makes applications run more efficiently when suitable pooled connections are available. Other issues that affect connection pooling are the following:

  • To maximize the effectiveness of connection pooling, consider using a trusted subsystem security model and avoid impersonation if possible. By using the minimum number of credential sets, you increase the likelihood that an existing pooled connection can be reused and reduce the chance of a connection pool overflow. If every call uses different credentials, ADO.NET must create a new connection every time.
  • Connections that remain open for long periods can hold on to resources on the server. A typical cause is opening connections early and closing them late (for example, by not explicitly closing and disposing a connection until it goes out of scope).
  • Connections can be held open for long periods when using DataReader objects, which are only valid while the connection is open.
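The pooling mechanics described above can be reduced to a minimal sketch. The ConnectionPool class below is a hypothetical illustration (real pools, including ADO.NET's, add sizing limits, liveness checks, and thread safety); it shows why pools are keyed by connection string, which is also why minimizing credential sets improves reuse:

```python
import sqlite3
from collections import defaultdict

class ConnectionPool:
    """Minimal sketch: 'closing' a connection returns it to a pool keyed by
    its connection string, and the physical connection stays open."""
    def __init__(self):
        self._idle = defaultdict(list)
        self.physical_opens = 0  # counts real connections created

    def acquire(self, conn_string):
        # Reuse an idle connection opened with the same connection string
        # (and therefore the same credentials), if one exists.
        if self._idle[conn_string]:
            return self._idle[conn_string].pop()
        self.physical_opens += 1
        return sqlite3.connect(conn_string)

    def release(self, conn_string, conn):
        # The caller "closes" the connection; the pool keeps it open for reuse.
        self._idle[conn_string].append(conn)

pool = ConnectionPool()
c1 = pool.acquire(":memory:")
pool.release(":memory:", c1)
c2 = pool.acquire(":memory:")  # reuses c1; no new physical connection
c3 = pool.acquire(":memory:")  # pool is empty again, so a new one is opened
```

If each caller supplied different credentials, every acquire would miss the pool and open a fresh physical connection, which is exactly the overflow risk the guideline warns about.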

Transactions and Concurrency

If you have business critical operations in your application, consider wrapping them in transactions. Transactions allow you to execute associated actions on a database as an atomic unit, and ensure database integrity. A transaction is only considered complete if all information and actions are complete, and the associated database changes are made permanent. Transactions support undo (rollback) database actions following an error, which helps to preserve the integrity of data in the database. Use the following guidance to help you design transactions:

  • If you are accessing a single data source, use connection-based transactions whenever possible. If you are using manual or explicit transactions, consider implementing the transaction within a stored procedure. If you cannot use transactions, implement compensating methods to revert the data store to its previous state.
  • If you are using long-running atomic transactions, avoid holding locks for long periods. In such scenarios, use compensating locks instead. If transactions take a long time to complete, consider using asynchronous transactions that call back to the client when complete. Also, consider the use of multiple active result sets in transaction-heavy concurrent applications to avoid potential deadlock issues.
  • If the chance of a data conflict from concurrent users is low (for example, when users are generally adding data or editing different rows), consider using optimistic locking during data access so that the last update applied is valid. If the chance of a data conflict from concurrent users is high (for example, when users are likely to be editing the same rows), consider using pessimistic locking during data access so that updates can only be applied to the latest version. Also consider concurrency issues when accessing static data within the application or when using threads to perform asynchronous operations. Static data is not inherently thread safe, which means that changes made in one thread will affect other threads using the same data.
  • Keep transactions as short as possible to minimize lock durations and improve concurrency. However, consider that short and simple transactions may result in a chatty interface if it requires multiple calls to achieve one operation.
  • Use the appropriate isolation level. The tradeoff is data consistency versus contention. A high isolation level will offer higher data consistency at the price of overall concurrency. A lower isolation level improves performance by lowering contention at the cost of consistency.

In general, you can choose from three types of transaction support, as described in the following list:

  • System.Transactions namespace classes, provided as part of the .NET Framework, support both implicit and explicit transactions. Consider using System.Transactions if you are developing a new application that requires transaction support, or if you have transactions that span multiple nondurable resource managers. For most transactions, the recommended approach is to use the implicit model provided by the TransactionScope object in the System.Transactions namespace. Although implicit transactions are not as fast as manual, or explicit, transactions, they are easier to develop and lead to middle tier solutions that are flexible and easy to maintain. If you do not want to use the implicit model for transactions, you can implement manual transactions using the Transaction class in System.Transactions.
  • ADO.NET Transactions based on a single database connection. This is the most efficient approach for client controlled transactions on a single data store. Consider using ADO.NET Transactions if you are extending an application that already uses ADO.NET Transactions, you are using ADO.NET providers for database access and the transactions are limited to a single database, or you are deploying your application into an environment that does not support version 2.0 of the .NET Framework. You use ADO.NET commands to begin, commit, and roll back the operations performed within the transaction.
  • T-SQL (Database) Transactions controlled by commands executed in the database. These are most efficient for server controlled transactions on a single data store, where the database manages all aspects of the transaction. Consider using database transactions if you are developing stored procedures that encapsulate all of the changes that must be managed by a transaction, or you have multiple applications that use the same stored procedure and transaction requirements can be encapsulated within that stored procedure.
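The atomic commit-or-rollback behavior that all three approaches provide can be sketched in a language-neutral way. The example below uses Python's sqlite3 with explicit commit and rollback calls, analogous to a manual single-connection transaction; the accounts table and transfer function are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(src, dst, amount):
    # Both updates commit as an atomic unit, or not at all.
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        balance = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                               (src,)).fetchone()[0]
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()  # undo the partial update, preserving data integrity
        raise

transfer("a", "b", 30)  # succeeds: both balances change together
```

A failed transfer (for example, one that would overdraw the source account) rolls back the debit that was already applied, leaving the data store in its previous consistent state.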

Step 4 – Determine Strategies for Handling Data Source Errors

In this step, you should design an overall strategy to handle data source errors. All exceptions associated with data sources should be caught by the data access layer. Exceptions concerning the data itself, and data source access and timeout errors, should be handled in this layer and passed to other layers only if the failures affect application responsiveness or functionality. Use the guidelines in the following sections to help you choose an appropriate approach:

  • Exceptions
  • Retry Logic
  • Timeouts

Exceptions

A centralized exception management strategy will enable consistent handling of exceptions. Exception handling is a crosscutting concern, so consider implementing the logic in separate components that you can share across layers. Pay particular attention to exceptions that propagate through trust boundaries and to other layers or tiers, and design for unhandled exceptions so they do not result in application reliability issues or exposure of sensitive application information. The following approach will help you in designing the exception management strategy:

  • Determine exceptions that should be caught and handled in the data access layer. Deadlocks, connection issues, and optimistic concurrency checks can often be resolved within the data layer.
  • Consider implementing a retry process for operations where data source errors or timeouts occur, but only where it is safe to do so.
  • Design an appropriate exception propagation strategy. For example, allow exceptions to bubble up to boundary layers where they can be logged and transformed as necessary before passing them to the next layer.
  • Design an appropriate logging and notification strategy for critical errors and exceptions that does not reveal sensitive information.
  • Consider using existing frameworks such as the patterns & practices Enterprise Library to implement a consistent exception handling and management strategy.
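The propagation and sanitization guidelines above can be sketched as a shared, crosscutting component. In this hypothetical Python example, a decorator logs full failure details inside the data layer and re-raises a boundary exception that reveals no sensitive connection or schema information to upper layers:

```python
import logging
import sqlite3

class DataAccessError(Exception):
    """Boundary exception: carries a safe message, not data source details."""

logger = logging.getLogger("data_layer")

def translate_exceptions(func):
    # Crosscutting handler shared by all data access components: log the full
    # details internally, then raise a sanitized exception to the next layer.
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except sqlite3.Error as exc:
            logger.error("data source failure in %s: %r", func.__name__, exc)
            raise DataAccessError("a data source error occurred") from exc
    return wrapper

@translate_exceptions
def load_widget(conn, widget_id):
    # Hypothetical data access operation.
    return conn.execute(
        "SELECT * FROM widgets WHERE id = ?", (widget_id,)
    ).fetchone()
```

Calling load_widget against a database that is missing the widgets table surfaces a DataAccessError at the layer boundary rather than a raw provider exception.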

Retry Logic

Design retry logic to handle errors, such as those that may occur during server or database failover. Retry logic should catch any errors that occur while connecting to the database or executing commands (queries or transactions) against the database. There may be multiple causes for the error. When an error occurs, the data component should reestablish connectivity by closing any existing connections and attempting to make a new connection, and then re-execute failed commands if necessary. It should retry the process only a certain number of times, and then finally give up and return a failure exception. Ensure that queries and requests, and any subsequent retries, are executed asynchronously so that they do not render the application unresponsive.
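The retry process described above can be sketched as follows. This is a simplified synchronous illustration (a production version would run asynchronously and retry only idempotent commands); TransientError, FakeConn, and flaky_command are hypothetical stand-ins for a provider error, a connection, and a database command:

```python
import time

class TransientError(Exception):
    """Stand-in for a connection loss or timeout from the data source."""

def execute_with_retry(command, connect, max_attempts=3, delay=0.01):
    # On failure, close the existing connection, reconnect, and re-execute
    # the command; after max_attempts, surface the failure to the caller.
    attempt, conn = 0, connect()
    while True:
        try:
            return command(conn)
        except TransientError:
            attempt += 1
            conn.close()
            if attempt >= max_attempts:
                raise
            time.sleep(delay)  # back off briefly before reconnecting
            conn = connect()

# Usage with a fake connection whose command fails twice, then succeeds,
# simulating a database failover completing mid-retry.
class FakeConn:
    failures_left = 2
    def close(self):
        pass

def flaky_command(conn):
    if FakeConn.failures_left > 0:
        FakeConn.failures_left -= 1
        raise TransientError()
    return "rows"

result = execute_with_retry(flaky_command, FakeConn)
```

Bounding the attempt count is what prevents retry logic from turning a persistent outage into an unresponsive application.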

Timeouts

Identifying the appropriate value for connection and command timeouts is very important. Setting a connection or command timeout value that is higher than the client timeout (for example, in the case of a Web application, the browser or Web server request timeout) can result in the client request timing out before the database connection is opened. Setting a low value will cause the error handler to invoke the retry logic. If a timeout occurs while executing a transaction, database resources may remain locked after the connection is closed when connection pooling is enabled. In such cases, when the connection is closed, it should be discarded so that it is not returned to the pool. This results in the transaction being rolled back, freeing the database resources.
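The discard-on-timeout behavior described above can be sketched on top of a minimal pool. The Pool class here is hypothetical; the point is only the release path, which closes a connection that timed out mid-transaction instead of returning it to the pool:

```python
import sqlite3

class Pool:
    """Minimal hypothetical pool, just enough to show discard-on-timeout."""
    def __init__(self):
        self.idle = []

    def acquire(self):
        return self.idle.pop() if self.idle else sqlite3.connect(":memory:")

    def release(self, conn, discard=False):
        # If a timeout interrupted a transaction, discard the connection so it
        # is not returned to the pool; closing it rolls the transaction back
        # and frees any database resources it still holds.
        if discard:
            conn.close()
        else:
            self.idle.append(conn)

pool = Pool()
good = pool.acquire()
pool.release(good)                     # healthy connection returns to the pool
timed_out = pool.acquire()             # reuses the pooled connection
pool.release(timed_out, discard=True)  # a timeout occurred: close, do not pool
```

Without the discard, the next caller could receive a connection still holding locks from the interrupted transaction.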

Step 5 – Design Service Agent Objects (Optional)

Service agents are objects that manage the semantics of communicating with external services, isolate your application from the idiosyncrasies of calling diverse services, and provide additional services such as basic mapping between the format of the data exposed by the service and the format your application requires. They may also implement caching, and offline or intermittent connection support. Follow the steps below to design the service agent objects:

  1. Use the appropriate tool to add a service reference. This will generate a proxy and the data classes that represent the data contract from the service.
  2. Determine how the service will be used in your application. For most applications, the service agent acts as an abstraction layer between your business layer and the remote service, and can provide a consistent interface regardless of the data format. In smaller applications, the presentation layer may access the service agent directly.
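The mapping and caching responsibilities of a service agent can be sketched as follows. In this hypothetical Python example, RemoteCatalogProxy stands in for the generated service proxy, and the wire format and field names are invented for illustration:

```python
class RemoteCatalogProxy:
    """Stand-in for a generated service proxy; returns service-format data."""
    def get_product(self, sku):
        # Hypothetical wire format exposed by the external service.
        return {"Sku": sku, "UnitPriceCents": 1999}

class CatalogServiceAgent:
    """Isolates the application from the service's data format and adds
    caching so repeated calls avoid a remote round trip."""
    def __init__(self, proxy):
        self.proxy = proxy
        self._cache = {}

    def get_price(self, sku):
        if sku not in self._cache:
            wire = self.proxy.get_product(sku)
            # Map the service's format (integer cents) to the format the
            # application requires (decimal currency units).
            self._cache[sku] = wire["UnitPriceCents"] / 100.0
        return self._cache[sku]

agent = CatalogServiceAgent(RemoteCatalogProxy())
price = agent.get_price("A1")
```

Because the business layer sees only get_price, the service's wire format (or the service itself) can change without rippling through the rest of the application.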

Additional Resources

For more information on general data access guidelines, see the following resources: