Reflections on Code First, Persistence and Domain Modeling

By Dino Esposito | July 2016

Code First is a piece of functionality you find in Entity Framework (EF) that lets you model database tables using plain .NET classes. Frankly, I think the name Code First is a bit misleading, but the work it does under the hood is crystal clear. Code First lays out the structure of the database being used and provides an all-round object-oriented API to work with stored data.

Introduced with EF4.1, Code First is included up through EF6—just one of the approaches you can take to model your database through C# or Visual Basic classes. Up until EF6, you can also use a Visual Studio designer to infer the schema of the database, save it to an XML file with the EDMX extension and create ad hoc classes for use in code. The Visual Studio designer also lets you create an abstract model that’s later used to create a physical database.

To make a long story short, up until EF6 there have been two ways of doing the same stuff, but the EDMX approach—although functional—is more problematic than the other. For this reason, the upcoming EF7 discontinues the EDMX support.

Over the years, Code First has been associated with Domain-Driven Design (DDD) and this might have contributed to the general idea that Code First and EDMX are not exactly two ways of doing the same thing. In this column, I’ll offer a more architectural perspective of Code First and draw a line between the realm of the domain model and the realm of the persistence model. Code First, along with LINQ, realizes an old dream of most developers: It hides the intricacies of data access (tables, indexes, constraints) behind an object-oriented facade and qualifies as that object-oriented data definition language you never had.

Historical Background

While working with relational databases, you play by the rules of the SQL language. While coding applications, you play by the rules of the programming language of choice instead. Hence, an abstraction layer is required to bridge the object-oriented (or procedural) nature of high-level programming languages with the SQL language. In the Microsoft .NET Framework, this abstraction layer is the ADO.NET framework.

ADO.NET is a relatively thin abstraction layer in the sense that it only provides objects your .NET code can use to issue SQL commands. ADO.NET doesn’t map any data sent to or retrieved from the database to ad hoc object-oriented data structures. In ADO.NET, the tools to get to the data are fully integrated with the surrounding .NET Framework, but the data is flat.

About a decade ago, Object/Relational Mapper (O/RM) frameworks appeared on the horizon. An O/RM framework maps the properties of a class to the columns of a table. In doing so, it implements a bunch of design patterns such as Data Mapper, Unit of Work and Query Object. An O/RM framework also maintains internally a set of mapping rules and information about the schema of the target database. This is concrete and tangible information that must be saved somewhere and somehow. NHibernate—the first ever O/RM in the .NET space—stores that information as an XML file. EF initially took the same approach with EDMX files and added a nice designer to manage it from within Visual Studio. Code First maps class properties to columns and tables via either attributes or a fluent (and richer) API.
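To make the attribute-based flavor of mapping concrete, here is a minimal sketch. The class name MatchRecord and the table and column names are hypothetical, chosen only for illustration; the attributes themselves come from the System.ComponentModel.DataAnnotations namespaces that Code First reads.

```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Hypothetical persistence class: each property maps to one column.
// Table and column names below are illustrative assumptions.
[Table("Matches")]
public class MatchRecord
{
    [Key]
    public int Id { get; set; }

    [Required, MaxLength(50)]
    [Column("home_team")]
    public string HomeTeam { get; set; }

    [Required, MaxLength(50)]
    [Column("visitor_team")]
    public string VisitorTeam { get; set; }

    // Without attributes, the column name defaults to the property name.
    public int HomeScore { get; set; }
    public int VisitorScore { get; set; }
}
```

In EF6 the same rules could instead be stated through the fluent API, in an override of DbContext.OnModelCreating, for example with modelBuilder.Entity<MatchRecord>().ToTable("Matches"). The fluent route keeps the class free of mapping attributes.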

In a blog post that appeared several months ago, the EF team explained in a clear manner the motivation behind making Code First the only supported way to store data models in EF7. (You can read the full story at bit.ly/1sLM3Ur.) In the post, the expression “code-based modeling” is used as a more explanatory name for what Code First really does. I couldn’t agree more.

DDD in a Nutshell

DDD is an approach to software development that was initially devised as a set of rules applied systematically to control a monumental level of complexity (that is, a huge number of business rules and entities). While DDD shines in very large systems with at least hundreds of rules and entities, it also has a lot of value for developers and architects in simpler scenarios. In a nutshell, there’s no reason not to apply certain parts of DDD in just about every software project. The part of DDD that’s valuable in any project is Strategic Design, which is centered on the application of a few well-known methods: Ubiquitous Language, Bounded Context and Context Map. These analytical patterns have little to do with the actual classes and database tables you end up using in the final application, even though the ultimate goal of using them is to write code more effectively. The DDD strategic design patterns aim at analyzing the business domain and envisioning the top-level architecture of the resulting system. Figure 1 provides a possible top-level architecture for an e-commerce solution. Each block represents a bounded context identified during analysis and introduced to speed up development.

Figure 1 Sample Top-Level Architecture with Bounded Contexts

Each bounded context that comes up from your analysis has its own business language, its own software architecture (including technologies) and its own set of relationships to other bounded contexts. Each bounded context may then be implemented using the software architecture that best fits the number and skills of the teams involved, budget and time constraints, plus any other stakeholders’ concerns such as those related to existing software licenses, costs, expertise and policies. DDD also provides a clear suggestion for what could be a really effective way to build stuff for a bounded context: the layered architecture.

The Domain Model in a Layered Architecture

Figure 2 provides the gist of a layered architecture. It has four layers, ranging from presentation to infrastructure, with an application layer and a domain layer in the middle. In short, it’s a generalized form of the well-known three-tier architecture (presentation, business, data) with a neat separation between use-case logic, which changes with the use cases you consider in the presentation, and domain logic, which is inherent to the specific way of doing business and is common to all use cases and presentation layers.

Figure 2 Schema of a Layered Architecture

The infrastructure layer includes whatever’s required to implement and support use cases and persist the state of the domain entities. The infrastructure layer, therefore, includes components that know the connection string to connect to the database.

Central to the DDD approach is the notion of a “domain model.” Quite simply, a domain model is a software model you create to fully represent the business domain. Put another way, it’s anything you can do with software that lets you deal with the domain you’re facing. Typically, a domain model is populated with entities, events and value objects, and some of the entities and value objects work together to form an indissoluble unit. DDD calls this an “aggregate” and the root of the aggregate is the aggregate root. Persistence occurs at the level of aggregate roots and the aggregate root typically is responsible for persisting all the other entities and value objects in the aggregate.
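As a rough illustration of how an aggregate hangs together, the sketch below borrows the e-commerce scenario of Figure 1. All names (Order, OrderLine, Money) are hypothetical, and the total assumes every line uses the same currency. The point is only that callers reach the inner entities and value objects through the aggregate root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Value object: immutable and compared by value rather than identity.
public sealed class Money : IEquatable<Money>
{
    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public decimal Amount { get; }
    public string Currency { get; }

    public bool Equals(Money other) =>
        other != null && Amount == other.Amount && Currency == other.Currency;
    public override bool Equals(object obj) => Equals(obj as Money);
    public override int GetHashCode() => Amount.GetHashCode() ^ Currency.GetHashCode();
}

// Entity that lives only inside the aggregate; created through the root.
public class OrderLine
{
    internal OrderLine(string productId, int quantity, Money unitPrice)
    {
        ProductId = productId;
        Quantity = quantity;
        UnitPrice = unitPrice;
    }

    public string ProductId { get; }
    public int Quantity { get; }
    public Money UnitPrice { get; }
}

// Aggregate root: the single entry point for changes and for persistence.
public class Order
{
    private readonly List<OrderLine> _lines = new List<OrderLine>();

    public Order(string id) { Id = id; }

    public string Id { get; }
    public IReadOnlyCollection<OrderLine> Lines => _lines.AsReadOnly();

    public void AddLine(string productId, int quantity, Money unitPrice)
    {
        if (quantity <= 0) throw new ArgumentOutOfRangeException(nameof(quantity));
        _lines.Add(new OrderLine(productId, quantity, unitPrice));
    }

    // Simplification: assumes all lines share one currency.
    public decimal Total => _lines.Sum(l => l.Quantity * l.UnitPrice.Amount);
}
```

A repository for this aggregate would load and save Order instances only; the lines travel with the root.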

How would you code an aggregate of entities and value objects? It depends on the programming paradigm you’re using. Most of the time, a domain model is an object-oriented model where entities are classes with properties and methods, and value objects are immutable data structures. Using a functional language and immutable data structures is also an option, at least in certain types of business domains.

Code First is a concrete technology strictly related to the performance of data access tasks. The most characterizing aspect of Code First is the use of classes to represent the underlying schema of tables and the data used by the application. Is the data used by the application the same as the data persisted by the application through relational tables? Or, asked another way, is the set of classes Code First uses to map tables in the relational database the same as the application’s domain model? Well, I’d mostly say no, but when it comes to software architecture, as always, it depends.

Domain Model vs. Persistence Model

Code First is sometimes associated with DDD because of its ability to model an application’s data through classes. While sometimes it’s more than acceptable to have a single set of classes that deal with both the business logic of the domain and persistence concerns, in general terms the domain model and the persistence model are distinct. The domain model is the software model you use to express the domain logic of the system and implement its business rules. It might be an object-oriented model, a functional model or even a plain collection of static methods exposed out of helper classes.

The point of DDD is that you keep persistence concerns out of the domain model and, in the design of the domain model, you focus more on what a business entity does (and how it’s used) than on the data it contains and manages. A behavior-centric approach breaks a monumental level of complexity down to a level that can be effectively tackled with code. Let’s consider a simple example, a sports match, as shown in Figure 3.

Figure 3 Behavior vs. Data in the Entity of a Domain Model

To express the behavior of a match entity in the context of a scoring system, you’d model actions like Start, Finish, Goal, Timeout and whatever else makes sense in the specific scenario. These methods implement all business rules and ensure that only actions consistent with the current state of the entity are carried out programmatically. For example, the method Goal would throw if invoked on a Match instance currently suspended because of a timeout. The internal state of the Match entity contains all those properties you’d typically associate with such an entity in a pure relational model except that these properties are read-only and updated only internally via methods.
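A behavior-centric Match entity along these lines might look like the following sketch. The MatchState enum, the Resume method and the exact property names are assumptions made for illustration; only Start, Finish, Goal and Timeout come from the description above.

```csharp
using System;

public enum MatchState { Scheduled, InProgress, Suspended, Finished }

public class Match
{
    public Match(string id, string homeTeam, string visitorTeam)
    {
        Id = id;
        HomeTeam = homeTeam;
        VisitorTeam = visitorTeam;
        State = MatchState.Scheduled;
    }

    // State is readable from outside but changes only through the methods below.
    public string Id { get; private set; }
    public string HomeTeam { get; private set; }
    public string VisitorTeam { get; private set; }
    public int HomeScore { get; private set; }
    public int VisitorScore { get; private set; }
    public MatchState State { get; private set; }

    public void Start()
    {
        Require(MatchState.Scheduled, nameof(Start));
        State = MatchState.InProgress;
    }

    public void Timeout()
    {
        Require(MatchState.InProgress, nameof(Timeout));
        State = MatchState.Suspended;
    }

    // Hypothetical counterpart of Timeout, added so play can continue.
    public void Resume()
    {
        Require(MatchState.Suspended, nameof(Resume));
        State = MatchState.InProgress;
    }

    public void Goal(bool forHomeTeam)
    {
        Require(MatchState.InProgress, nameof(Goal));
        if (forHomeTeam) HomeScore++; else VisitorScore++;
    }

    public void Finish()
    {
        Require(MatchState.InProgress, nameof(Finish));
        State = MatchState.Finished;
    }

    // Business rule guard: an action is valid only in one specific state.
    private void Require(MatchState expected, string action)
    {
        if (State != expected)
            throw new InvalidOperationException(
                $"{action} is not allowed while the match is {State}.");
    }
}
```

With this shape, calling Goal after Timeout throws an InvalidOperationException, and there is no way for outside code to assign a score directly.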

Not all the classes you may have in a domain model must be persisted, and persistence might include all properties or just a few. So Code First isn’t about domain modeling in general, but its API that maps properties to table columns can be used to persist the classes in your domain model that need to be persisted. In this way, you have a single model for the domain that covers both business and persistence needs.

The Issue of Private Setters

From the domain modeling perspective, you only work with entities following business workflows as outlined by domain experts. Looking back at the match score example, setting the state of the match or the score programmatically might not be consistent with business rules. State and score, in fact, change as the workflow makes progress. Likewise, you’re not going to have a default parameterless constructor because it would return a Match entity devoid of some critical information such as the names of the playing teams and an ID that would reasonably tie the match to a competition. Yet, if you’re using a single model for business and persistence, a parameterless constructor is required; otherwise, EF wouldn’t be able to return an instance of the type after a query.

But there’s more to consider. When EF performs a query and returns an instance of the Match class, it needs to access the setters of all properties in order to save in the returned instance a state coherent with the information in the database. This legitimate requirement of EF conflicts with the design rules of a domain model. More generally, a way to force a state into an entity of a domain model must exist, and most of the time it must be internal and not publicly available to code outside the layer. This is one of the purposes of domain services that, along with the domain model, form the Domain Layer of Figure 2. If you use Code First, you can achieve the same by simply marking setters as non-public (internal, protected or even private) and adding a default constructor with the same non-public visibility. EF will still find a way (via reflection) to access private members and force a state, but public clients of the domain API won’t. Well, not unless they use reflection themselves.
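The mechanics can be sketched without EF at all. In the example below, the Materializer class is a hypothetical stand-in that imitates, in a very simplified way, what an O/RM does after a query: it creates the entity through the non-public constructor and pushes values through the private setters via reflection. It is not how EF is implemented internally, and the trimmed-down Match class is an assumption for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Match
{
    // Non-public parameterless constructor: reachable through reflection,
    // invisible to ordinary callers of the domain API.
    private Match() { }

    public Match(string id, string homeTeam, string visitorTeam)
    {
        Id = id;
        HomeTeam = homeTeam;
        VisitorTeam = visitorTeam;
    }

    public string Id { get; private set; }
    public string HomeTeam { get; private set; }
    public string VisitorTeam { get; private set; }
    public int HomeScore { get; private set; }
}

// Hypothetical helper that plays the role of the O/RM.
public static class Materializer
{
    public static T Create<T>(IDictionary<string, object> row)
    {
        // Invoke the private parameterless constructor.
        var entity = (T)Activator.CreateInstance(typeof(T), nonPublic: true);

        foreach (var column in row)
        {
            var property = typeof(T).GetProperty(
                column.Key,
                BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

            // Call the setter even though it is private.
            property.GetSetMethod(nonPublic: true)
                    .Invoke(entity, new[] { column.Value });
        }
        return entity;
    }
}
```

Application code still cannot write match.HomeScore = 2 because the setter is private; only code that deliberately goes through reflection, as the helper does, can force a state into the entity.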

Wrapping Up

It’s not unusual to surf the Web and find articles that relate Code First to DDD. Code First is all about persistence of an object-oriented model that’s explicitly mapped onto a set of tables. Conceptually speaking, the domain model is another thing entirely and even lives in a different layer. However, because of some specific capabilities of the Code First API, namely its handling of private setters and constructors, in some cases it’s possible to use a single object-oriented model that includes behavior and business rules and can be easily persisted to a relational database.

Dino Esposito is the author of “Microsoft .NET: Architecting Applications for the Enterprise” (Microsoft Press, 2014) and “Modern Web Applications with ASP.NET” (Microsoft Press, 2016). A technical evangelist for the .NET and Android platforms at JetBrains, and frequent speaker at industry events worldwide, Esposito shares his vision of software at software2cents@wordpress.com and on Twitter: @despos.

Thanks to the following Microsoft technical expert for reviewing this article: Jon Arne Saeteras

[Cutting Edge]
