How To: Design Your Data Access Layer

J.D. Meier, Alex Homer, David Hill, Jason Taylor, Prashant Bansode, Lonnie Wall, Rob Boucher Jr, Akshay Bogawat.

Applies To

  • Data Access components in the data access layer of a design.

Summary

The goal of a data access layer is to provide an abstraction between application entities and the data used by those entities regardless of where that data comes from. For instance, you may have a relational database, a web service, or a legacy environment with data that you need to display or manipulate. Data layer components are used to provide access to data that is hosted within the boundaries of the system, and data exposed by other back-end systems.

Contents

  • Objectives
  • Overview
  • Summary of Steps
  • Step 1 – Identify Data Source Constraints
  • Step 2 – Determine your Data Access Approach
  • Step 3 – Choose How to Map Data Access Components to The Data Source
  • Step 4 – Determine How to Connect to The Data Source
  • Step 5 – Determine Strategies for Handling Data Source Errors
  • Step 6 – Design Data Helper/Utility Objects (Optional)
  • Step 7 – Design Service Agent Objects (Optional)
  • Additional Resources

Objectives

  • Understand your data access constraints.
  • Determine a data access approach.
  • Determine how you will map between your data source and your entities.
  • Determine how you will handle data source connections.
  • Determine how you will handle data access errors.

Overview

This article will help you to understand the basic steps when designing your data access layer. The first step is to understand the constraints associated with the data to be accessed. Next step is to choose a mapping strategy. Next, determine your data access approach which includes identifying the business entities to be used and the format of entities. The next task is to identify how the data access components will connect to the data source. Finally, determine the error handling strategy to manage the data source exceptions.

Summary of Steps

  • Step 1 – Identify Data Source Constraints
  • Step 2 – Determine your Data Access Approach
  • Step 3 – Choose How to Map Data Access Components to The Data Source
  • Step 4 – Determine How to Connect to The Data Source
  • Step 5 – Determine Strategies for Handling Data Source Errors
  • Step 6 – Design Data Helper/Utility Objects (Optional)
  • Step 7 – Design Service Agent Objects (Optional)

Step 1 – Identify Data Source Constraints

The main goal in this step is to determine if you are working with a Greenfield or Brownfield environment and identify associated restrictions.
You should consider if any new development is required and in such cases how does it coexist with the data source in its current state.

Greenfield

As there is no prior work done related to the data source, you have full control over the schema used by your data source. Restrictions are based on the data source itself.

Brownfield

You do not have control over data source schemas, and the data source could be anything from a database to gateway components used to interact with existing components. You need to understand the complexity and constraints of the existing business. Consider the following guidelines for a Brownfield environment:
  • Determine if there a pre-defined operational data store or other restriction on the database schemas. In such cases, you would not be able to change the existing schema. But you can add new tables or views.
  • Determine if you are interacting with the data layer using Web services or with legacy applications using gateway components. In case accessing through Web service or gateway components then you are restricted to the interface defined by the Web service.

Check Point

Before you finish this step, you should be able to answer the following questions:
  • Have you identified whether you have Greenfield or Brownfield environment?
  • Have you identified the constraints imposed by the environment?

Step 2 – Determine your Data Access Approach

In this step you will decide your approach to design the data access layer. It starts with choosing the business entities and then provides the guidelines to choose the appropriate technology for your application.

Choose the entity types that you need

Data access logic components deal with entities. Entities are used to contain and manage the data used by your application. You should not add business logic in the entities, but you can consider adding data validation within the entities.
Choosing the right type and format for your business entities is important as it would decide whether the interoperability and serialization requirements are met. Determine if you need to facilitate serialized communications across different processes and physical machines and also whether storage and retrieval of application state is required.
Following are the several different formats to represent data:
  • Custom Objects – Best suited for applications that have moderate to complex business rules associated with the data, but they can be used with any application type. There are two main design patterns associated with custom objects, Domain Model and Table Module. However, custom objects can also be used to represent XML structures, list items in a SharePoint list, or any other structure used to hold data.
  • DataSet – Good for small amounts of data that have limited rules associated with the data. There are two types of DataSets, standard and strongly typed. A standard DataSet uses string based names to access the data while a strongly typed DataSet provides a property based interface to the data.
  • DataTable – This is a component of DataSets that can be used independently from a DataSet. A DataTable can be initialized and loaded with data independent of DataSets. Because they do not maintain schema data and changed data they have much less overhead than using a DataSet. DataTables can also be strongly typed similar to DataSets.
  • XML – A standards based format that contains structured data. This format would be used when interoperability with other platforms is desired. In addition, XML is often used for file based data used by the business layer. However, it is normally loaded into a document object model or deserialized into objects for use within the business layer.

Consider the following while choosing and implementing the appropriate data format:
  • Custom Objects. If better application maintainability is important to you, then use custom business entities. This will require additional code to map the entities to database operations. However, using the object/relational mapping (O/RM) solutions will reduce the amount of custom code required. Choose ADO.NET Entity Framework, if you need more flexibility, for example, when you want to use the same entity structure (conceptual view) with different data sources. Choose LINQ if you only need mapping to a single data source.
  • DataSets. If you need to support disconnected scenarios in simple CRUD-based applications, then use DataSets. The most common approach is to use the ADO.NET provider. This is suitable when you are working with an existing application that already uses the ADO.NET providers. Use LINQ to datasets to populate datasets based on LINQ queries. This is suitable if you are developing a new application.
  • XML. If your data access layer will be accessed by other applications and you require interoperability with other systems and platforms, then use XML formats.
  • Implement business entities by deriving them from a base class that provides basic functionality and encapsulates common tasks. However, be careful not to overload the base class with unrelated operations, which would reduce the cohesiveness of entities derived from the base class, and cause maintainability and performance issues.
  • Design business entities to rely on data access logic components for database interaction. Centralize implementation of all data access policies and related business logic. For example, if your business entities access SQL Server databases directly, all applications deployed to clients that use the business entities will require SQL connectivity and logon permissions.

Choose the appropriate technology

The choice of an appropriate data access technology will depend on the type of data you are dealing with, and how you want to manipulate the data within the application. Certain technologies are better suited for specific scenarios. Use the following common scenarios and solutions to map your application scenarios to common data access technology solutions.:
ADO.NET Core
Consider using ADO.NET Core if you:
  • Need to use low level API for full control over data access your application.
  • Want to leverage the existing investment made into ADO.NET providers.
  • Are using traditional data access logic against the database.
  • Do not need the additional functionality offered by the other data access technologies.
  • Are building an application that needs to support disconnected data access experience.
ADO.NET Data Services Framework
Consider using ADO.NET Data Services Framework if you:
  • Are developing a Silverlight application and want to access data through a data centric service interface.
  • Are developing a rich client application and want to access data through a data centric service interface.
  • Are developing N-tier application and want to access data through data centric service interface.
ADO.NET Entity Framework
Consider using ADO.NET Entity Framework (EF) if you:
  • Need to share a conceptual model across applications and services.
  • Need to map a single class to multiple tables via Inheritance.
  • Need to query relational stores other than the Microsoft SQL Server family of products.
  • Have an object model that you must map to a relational model using a flexible schema.
  • Need the flexibility of separating the mapping schema from the object model.
ADO.NET Sync Services
Consider using ADO.NET Sync Services if you:
  • Need to build an application that supports occasionally connected scenarios.
  • Need collaboration between databases.
LINQ to Data Services
Consider using LINQ to Data Services if you:
  • Are using data returned from ADO.NET Data Services in a client.
  • Want to execute queries against client-side data using LINQ syntax.
  • Want to execute queries against REST data using LINQ syntax.
LINQ to DataSets
Consider using LINQ to DataSets if you:
  • Want to execute queries against a Dataset, including queries that join tables.
  • Want to use a common query language instead of writing iterative code.
LINQ to Entities
Consider using LINQ to Entities if you:
  • Are using the ADO.NET Entity Framework
  • Need to execute queries over strongly-typed entities.
  • Want to execute queries against relational data using LINQ syntax.
LINQ to Objects
Consider using LINQ to Objects if you:
  • Need to execute queries against a collection.
  • Want to execute queries against file directories.
  • Want to execute queries against in-memory objects using LINQ syntax.
LINQ to XML
Consider using LINQ to XML if you:
  • Are using XML data in your application.
  • Want to execute queries against XML data using LINQ syntax.

Check Point

Before you finish this step, you should be able to answer the following questions:
  • Have you identified the format of entities you need?
  • Have you identified the appropriate technology?

Step 3 – Choose how to map data access components to the data source

Once you have identified data source requirements and determined the level of abstraction you plan to use the next step is to choose a mapping strategy to map your entities to the data store. The type of entity you use in your application is the main factor in deciding how to map those entities to data source structures.

You should consider the impedance mismatch between the object-oriented model and the relational model that makes it difficult to translate between them. There are differences in the data types, structural differences, transactional differences, and differences in how data is manipulated.

The two common approaches to handling the mismatch are data access design patterns such as Repository, and Object/Relational Mapping (O/RM) tools. A common model associated with OO design is the Domain Model, which is based on modeling entities after objects within a domain. Design the object relational mapping using the following approach:
  • Consider using or developing a framework that provides a layer between domain entities and the database.
  • If you are working in a Greenfield environment, where you have full control over the database schema, choose an O/RM tool that will generate a schema to support the object model and provide a mapping between the database and domain entities.
  • If you are working in a Brownfield environment, where you must work with an existing database schema, consider tools that will help you to map between the domain model and relational model.
  • If you are working with a smaller application or do not have access to O/RM tools, implement a common data access pattern such as Repository. With the Repository pattern, the repository objects allow you to treat domain entities as if they were located in memory.
  • When working with Web applications or services, group entities and support options that will partially load domain entities with only the required data. This allows applications to handle the higher user load required to support stateless operations, and limit the use of resources by avoiding holding initialized domain models for each user in memory.

Check Point

Before you finish this step, you should be able to answer the following questions:
  • Have you identified how the data access components map to the data source?

Step 4 –Determine How to Connect to The Data Source

Now that you know how the data access components map to the data source, you should identify how to connect to the data source, protect the user’s credentials and perform transactions.

Connections

Connections to data sources are a fundamental part of the data layer. All data source connections should be managed by the data layer. Creating and managing connections uses valuable resources in both the data layer and the data source.
  • Ensure that you open connections to the data source as late as possible and close them as early as possible. This will ensure that the resources are locked for duration as short as possible and are available to others.
  • Perform transactions through a single connection where possible.
  • For security reasons, avoid using a System or User Data Source Name (DSN) to store connection information.
  • Design retry logic to manage the situation where the connection to the data source is lost or times out.
  • Use connection pooling and tune performance based on results obtained by running simulated load scenarios.
  • Consider tuning isolation levels for data queries. If you are building an application with high-throughput requirements, special data operations may be performed at lower isolation levels than the rest of the transaction. Combining isolation levels can have a negative impact on data consistency, so you must carefully analyze this option on a case-by-case basis.
  • Execute commands on the database by batching, it will reduce round trips to the database server.
  • If you have non-volatile data use optimistic concurrency to mitigate the cost of locking data in the database. This avoids the overhead of locking database rows, including the connection that must be kept open during a lock.
  • If using a DataReader, use ordinal lookups for faster performance.

Other aspects that need to be considered are the security requirements associated with accessing your data source. In other words, how will data access components authenticate with a data source and what are the authorization requirements. Connections to data sources are a fundamental part of the data layer. All data source connections should be managed by the data layer. Creating and managing connections uses valuable resources in both the data layer and the data source.
  • Prefer Windows authentication over SQL Server authentication.
  • If you are using Microsoft SQL Server, consider using Windows authentication with a trusted subsystem.
  • If you are using SQL authentication, take the following steps to improve security:
    • Use custom accounts with strong passwords.
    • Limit the permissions of each account within SQL Server using database roles.
    • Add ACLs to any files used to store connection strings.
    • Encrypt connection strings in configuration files instead of using a system or user data source name (DSN).
  • Use accounts with least privilege in the database.
  • Don't store user passwords in databases for user validation. Instead, store password hashes with a salt instead of clear text or encrypted passwords.
  • Require that callers send identity information to the data layer for auditing purposes.
  • If you are using SQL statements, consider the parameterized approach instead of string concatenation to protect against SQL injection attacks.
  • Protect sensitive data sent over the network to and from SQL Server.
    • Windows authentication protects credentials, but not application data.
    • Use IPSec or SSL.

Connection Pooling

When you are using connection pooling, new connections are established by picking from the pool or creating new connection and adding it to the pool. Also, whenever you close a connection, it is just released back to the pool while the actual connection remains open. This holds up the resources on the server. As new connections are not required to be opened every time when connections are available in the pool, it saves time of this operation. Connection pooling comes with an overhead. Consider the following steps when opening a pooled connection:
  • Identify if the connection string matches an existing pool.
  • Create a new pool or open new connection and add it to the pool.
  • Reset the server-side connection.
  • To maximize the effectiveness of connection pooling, consider using a trusted subsystem security model and avoid impersonation if possible.

Common issues with connection pooling:
  • Connections can remain open for too long which can hold resources on the server. Opening connections early and closing late.
  • Connection pool overflow.
  • Connections remaining open because of using DataReader objects.

Transactions and Concurrency

Transactions enable you to treat associated actions on the database as an atomic unit and ensure database integrity. A transaction is only considered complete if all information and actions are complete, and the associated database changes are made permanent. Transactions support undo (rollback) database actions following an error, which helps to preserve the integrity of data in the database.
Use the following conditions to design transactions:
  • If you are accessing a single data source use connection-based transactions, whenever possible.
  • If you cannot use transactions, implement compensating methods to revert the data store to its previous state.
  • If you are using long-running atomic transactions, avoid holding locks for long periods. In such scenarios, use compensating locks for long-running transactions.
  • If the chance of a data conflict from concurrent users is low (for example, when users are generally adding data or editing different rows), consider using optimistic locking during data access.
  • If the chance of a data conflict from concurrent users is high (for example, when users are likely to be editing the same rows), consider using pessimistic locking during data access.
  • If transactions take a long time to complete, consider using asynchronous transactions that call back to the client when complete.
  • Keep transactions as short as possible to minimize lock durations and improve concurrency. But keeping the transactions too short will make the database access too chatty.
  • Use the appropriate isolation level. The tradeoff is data consistency versus contention. A high isolation level will offer higher data consistency at the price of overall concurrency. A lower isolation level improves performance by lowering contention at the cost of consistency.
  • If using manual or explicit transactions, consider implementing the transaction within a stored procedure.
  • Consider the use of multiple active result sets (MARS) in transaction-heavy concurrent applications to avoid potential deadlock issues.

Consider using T-SQL commands, ADO.NET, or System.Transaction to perform transactions.
  • T-SQL transactions are most efficient for server-controlled transactions on a single data store.
  • For most transactions in .NET the recommended approach is to use the implicit model provided by TransactionScope in the System.Transactions namespace. Although implicit transactions are not as quick as manual, or explicit, transactions, they are easier to develop and lead to middle tier solutions that are flexible and easy to maintain.
  • If you do not want to use the implicit model for transactions you can implement manual transactions using ADO.NET or the Transaction class in System.Transactions.

Check Point

Before you finish this step, you should be able to answer the following questions:
  • Have you identified how your data access layer will connect to the database?
  • Does your application need to connect to multiple data sources?
  • Have you identified how to secure your connections and user’s credentials?
  • Do you need to pool connections to the database?
  • How will you support transactions?

Step 5 – Determine Strategies for Handling Data Source Errors

In the next step you should design an overall strategy to handle data source errors. All exceptions associated with data sources should be caught by the data access layer. Exceptions concerning the data itself, and data source access and timeout errors, should be handled in this layer and passed to other layers only if the failures affect application responsiveness or functionality.

Exceptions

A centralized exception management strategy in your data layer will enable handling exceptions consistently. Consider adding the exception-handling logic in your database helper components which will help in centralizing it. Pay particular attention to exceptions that propagate through trust boundaries and to other layers or tiers. Design for unhandled exceptions so they do not result in application reliability issues or exposure of sensitive application information.
Following approach will help you in designing the exception management strategy:
  • Determine exceptions that should be caught and handled in the data access layer. Deadlocks, connection issues, and optimistic concurrency checks can often be resolved at the data layer.
  • Consider implementing a retry process for operations where data source errors or timeouts occur, where it is safe to do so.
  • Design an appropriate exception propagation strategy. For example, allow exceptions to bubble up to boundary layers where they can be logged and transformed as necessary before passing them to the next layer.
  • Design an appropriate logging and notification strategy for critical errors and exceptions that does not reveal sensitive information.

Retry Logic

Design the retry logic to catch errors during failover. Design the retry logic to catch any errors that occur while connecting to the database or executing commands (queries or transactions) on the database. When an error occurs, it should re-establish the connectivity, and then re-execute failed commands if necessary.
Retry Logic Flow
The retry logic re-executes a command against the database upon a failure. There could be multiple causes for the failure. There should be a certain number of times the application should re-execute the command and then give-up finally.
The following pseudo-code outlines one approach to automatic failover logic:
  1. Initiate the retry logic when an error occurs.
  2. Try to open the database connection.
  3. Re-execute the command (query or transaction) on the database.
  4. Return success if there are no errors.
  5. Catch errors which have occurred while opening connection or executing the command.
  6. Close the connections.
  7. Check the number of times the command has failed. If exceeded the maximum number of retries, then return failure.
  8. Go back to step 1.

Make sure that the retry is initiated after closing the old connections and new database connection is attempted.

Timeouts

Identifying the appropriate value for connection/command timeout is very important. Setting a high value would cause the client (for example, browser in case of Web applications) to timeout before the connection is opened. Setting a low value would cause the error handler to invoke the retry logic.

If a timeout occurs in the middle of executing a transaction, database resources can stay locked even after the connection is closed when connection pooling is enabled. In such cases when the connection is closed it should be discarded, that is not returned to the pool. This results in the transaction being rolled back, freeing the database resources.

Check Point

Before you finish this step, you should be able to answer the following questions:
  • Have you identified the various types of exceptions in data access layer?
  • Have you determined if you need retry logic?
  • Have you determined how to choose timeout values?
  • Have you designed exception handling mechanism?

Step 6 – Design Data Helper/Utility Objects (Optional)

This is an optional step to create data helper objects for the data access layer. These are specialized libraries or custom routines designed to maximize data access performance, reduce the development requirements of the logic components and the service agent and assist in data manipulation, data transformation, and data access within the layer.
Follow the steps below to design the data helper objects:
  • Identify functionality that could be moved out of the data access components and centralized for reuse.
  • Research available helper component libraries.
  • Consider custom helper components for common problems such as connection strings, data source authentication, monitoring, and exception processing.
  • Consider implementing routines for data access monitoring and testing in your helper components.
  • Consider the setup and implementation of logging for your helper components.

Check Point

Before you finish this step, you should be able to answer the following questions:
  • Does your application need helper or utility components?

Step 7 – Design Service Agent Objects (Optional)

This is an optional step to design service agents. These are objects which manage the semantics of communicating with external service, isolate your application from the idiosyncrasies of calling diverse services and provide additional services such as basic mapping between the format of the data exposed by the service and the format your application requires.
Follow the steps below to design the service agent objects:
  • Use the appropriate tool to add a service reference. This will generate a proxy and the data classes that represent the data contract from the service.
  • Determine how the service will be used in your application. For most applications, you should use an abstraction layer between the business layer and the data access layer, which will provide a consistent interface regardless of the data source. For smaller applications, the business layer, or even the presentation layer, may access the service agent directly.

Check Point

Before you finish this step, you should be able to answer the following questions:
  • Does your application need service agent components?

Additional Resources

For more information on general data access guidelines, see the following resources:

Last edited Dec 17, 2008 at 8:49 PM by prashantbansode, version 1

Comments

No comments yet.