Codementor Events

TPT Polymorphism - When a good ERD leaves you with 1:1 relationships in your database

Published Sep 28, 2021
TPT Polymorphism - When a good ERD leaves you with 1:1 relationships in your database

I recently stumbled across this Post on StackOverflow: can you make one to one relationship with a single attribute entity to another entity? and could instantly relate to the author's plight. There are many real world and application scenarios that look simple enough when you reduce them down to a normalised ERD, you feel like you've sucessfully modelled the business domain elements into meaningful relationships and are ready to build a schema from the model. Yet for all that confidence many of us hesitate at the sight of 1:1 relationships and seriously question the existence a 1:1 relationship where a dependant table is optional and has a single PK/FK column and ultimately go back and change the ERD to remove what we interpret as an aberration.

What we often lack in this domain is the language to express our intial understanding of the model when the ERD was designed and some good reference material to justify our assumptions. Just because the model is now expressed in the form of SQL should not invalidate original concepts.

I wanted to post a reply on Stack Overflow, but a question like this is hard to be accepted as "On Topic" for that site, what I really want to do is sit down with the author and guide them through an evaluation of the merits of a database schema that stays true a normalised ERD. By writing this I hope to reach others who have also been in the situation of having to explain what you feel to be a great schema idea but find it hard to sell to the rest of the team when they see some SQL that makes them uncomfortable.

This article therfore serves as a rebuttle to all the published material out there that

The Problem

So for my database project, I designed a meat supply chain optimization project for a large organization, for example such as walmart, where every part of the chain is owned by the organization, (transport, farm, retail store etc). I created an entity called Workplace with primary attribute Workplace_ID, and other attributes Type (farm, retail store, transport etc) and location. Workplace_ID is also the primary key for the other workplaces such as Slaugherhouse, Farm, Transport etc. What I intended was to do was for the other entities' primary key to be foreign key Workplace_ID, where I could easily create new tables specifying type and workplace_ID, relating Workplace entity to for example Farm entity. I have linked the ERD diagram here so that you can see how it was designed.

The problem was, my professor didn't even listen to my final presentation, he just saw the auto generated ERD from XAMPP, and said that we were kidding him and wasting his time with a single attribute entity (Farm) having a foreign key as primary key, with the relation being 1 to 1. He then asked us to provide one example from anywhere of such a thing existing, an ERD would serve as a valid example, implying that such a thing was impossible to find, with the implication being if we could not provide any example we would receive 0.

An important thing to note that he did not let me clarify my design choices, and that I intended it to be a IS A 1-1 relationship, sure, as in one instance of farm entity can only be one instance of workplace entity, but also that not all instances of workplace entities were instances of farm entities, it would only be instances of workplace entities which had farm as the type attribute. I designed it as such so that all workplaces did not have to include their locations and types again and again in their attributes, you would create a relationship table between two entities specifying type and Workplace_ID. I designed it in such a way since for some, Workplace_ID is primary key, whereas for others, it is simply a foreign key relating to other entities (for example, Batch has Workplace_ID signifying where the particular batch came from (which slaughterhouse), but its primary key is Batch_ID).

OAAmm.jpg

A Review of the proposed Schema

This ERD is a representation of Domain Key Normal Form (DKNF) and is a common result of application database schema design that follows Table per Type (TPT) pattern to persist the application data classes into storage.

We're talking about having a common or abstract database table, in this case WorkPlace, and that Workplace can be a Farm or a Slaughterhouse or a Retail Storefront.

Where Workplace has a 1:1 relationship with Farm,Slaughterhouse and Retail Storefront where the dependent entity is optional, this allows for 2 programming concepts to be applied: Inheritance and Polymorphism.

This can result in some tables existing with only a single PK column that effectively becomes a discriminator column that is used to differentiate the type of each the principal table. It can also result in a single conceptual data "entry" being represented as single rows across multiple tables, sometimes referred to as Sharding, even though this term is more commonly referred to as distributing data between multiple databases many of those arguments can be used within the database as well, we're just on a smaller scale.

The alternative to this model would be to implement TPH (Table per Hierarchy) but this would specifically preclude polymorphism and (A Workplace cannot be both a Farm and a Retail Storefront) which might be acceptable in the current environment and would now involve multiple Controllers, Repositories but it would also result in a duplicated data structures as the fields for Workplace would be duplicated in each of the dependent tables or worse, an additional linking table would be required between the dependent tables and the Workplace.

This type of schema pairs very well with ORMs like Entity Framework and the Unit Of Work pattern without compromising compatibility with legacy Repository Pattern tooling. The validity of TPT or 1:1 relationships in your database design is subjective to each scenario, a quick search returns discussion forums like these:

As well as many others that like Entity Framework Table Per Type Performance that unfortuantely show poor understanding on how to write efficent Linq to SQL queries in general. By version 6, EF really matured but TPT is still one of the least understood or respected design patterns which only further perpetuates the association with poor application performace.

First Response to the "Professor"

From an ERD point of a Professor should not be concerned about the number of fields in the dependent tables, it is the relationships that matter. Over the lifetime of the application/solution each table presents an extensibility point were we can easily extend the functionality or metadata for the represented conceptual entity. Manipulating relationships later is an expensive process, but adding fields is comparatively cheap, in terms of development time, maintenance and risk. So we try to model the concepts as best we can during the design phase, the proposed model captures the specific business domains in a logical format, it presents as well thought out and lends itself to a simple and minimal user interface that could be largely in line with the data structure.

I say minimal in terms of only ever needing to code 1 user interface and the associated data plumbing code for all types of workplaces, no need for complex switching logic in the View layer. There are different coding solutions to achieve the same thing, but this schema keeps the code and the data in sync without needing to understand deeper interfaces, this is a good compromise that keeps DBAs and Developers on a similar page without 1 having to create further abstractions to please or work around the other.

Supporting Argument 1: OO Inheritance

From an application design point of view Farm, Slaughterhouse and Retail Storefront all inherit from Workplace. Describe this model to any OO developer and this make perfect sense, we encapsulate the Properties, Behaviors and Relationships within the Workplace entity.

In general, by encapsulating common relationships and fields, like in this case Employees in a single table then you only have to manage these field definitions, relationships, constraints and views all in one place. This will reduce the overall maintenance effort rather than having to maintain multiple similar structures, remember that to maintain multiple anything should also indicate duplication of testing routines, but the concept of duplication is itself an anti-pattern that should be avoided if nothing else than it leads to ambiguity in the SDLC process, when the duplicated code blocks get out of sync is it an error? which one should be the master? Should improvements and changes in one of the code clones be automatically propagated to the others? Inheritance is a way of documenting the original intent, whilst still allowing future implementations to override when or if they need to.

Supporting Argument 2: Poly-Morphism

Compared to TPH and TPC patterns, only TPT allows for multiple inheritance structures. But multiple inheritance as a concept is traditionally hard to directly implement in some programming languages that only support single object inheritance so it is usually frowned upon. In this suggested ERD structure the inheritance chain is simplified and expressed in the codebase as a series of relationships. The code can, but does not have to directly implement the inheritance chain as traditional OO Inheritance, as long as the conceptual entity can still be treated as a single conceptual item the chosen application language can usually use composition or interface concepts to model the same relationship rules in a strongly typed way.

If you want a single conceptual Entity that IS A Farm, to also be a 'Slaughterhouse' then we can represent this in the database as 1:1 relationships back to the master or base/common table.

Before you say that is not realistic, I grew up on a farm that had both an Abattoir and a StoreFront on the same site, we even had a commercial manufacturing facility as well... I strongly doubt our farm was the only such establishment like that.

Without the 1:1 relationships, my farm would need a separate entry in each of the tables, meaning my farm does not have a single Id, it is represented by multiple Ids, then I need to maintain the employees across each of these Workplaces, or arbitrarily designate one of them to hold the relevant information and not bother maintaining the others. You want to find a way to resolve your business domain concepts down to a single definitive identifier that can be used, your DKNF model provides for this in a very simple, yet elegant way.

Without the common table to specify the Id, ambiguity creeps in again, if you need to report on the employees of all slaughter houses, but in my farm I chose to only store my employees as employees to the Farm identity, then my employees will not as easily be represented in the report.

Supporting Argument 3: Default Query Performance

The usual argument against this design pattern is:

the singleton or minimal tables that participate in a 1:1 relationship could just as easily be represented with a discriminator column or a flag in the principal table and ignoring the fields that will only contain data when the flag is set.

That statement is true, but in production you will need to manage indexes on the discriminator column, then you my also need to implement filtered indexes that cover the additional minimal set of fields that would normally have been represented in the minimal 1:1 dependent table.

On top of that, you will need to ensure that the queries executed by the application are written correctly so as to consistently use the indexes that are created.

The end result of the optimised TPC or TPH should be almost the same number of indexes and should also include management of filtered indexes. Except the management of TPT indexes is far simpler, with minimal if any filtering required, as the data only exists in each table that the conceptual entity requires.

The often understated benefit to this design is that DBAs, Developers, ORMs and other modern developer tools will create queries that are more efficient by default due to the complex joins they may be forced to write. The join syntax cannot be avoided to get to the data that we need, you also end up querying less fields by default, because even when you accidentally use [table].* in your select logic, we are still only bringing back the columns for that specific table or sub-set of the entity, not all of the possible columns across the domain object.

DO NOT BE AFFRAID OF JOINS
Yes writing queries against this schema will involve a lot of joins and some of those queries might appear to be complicated. But do not confuse this with poor performance, there are slight overheads involved with looking up the references to entities that have a 1:1 relationship compared to a non-normalised or TPH data schema, but these should be marginal especially given the other signficant runtime and SDLC performance benefits offered by TPT.

Some DBAs may even lament at the effort it now requires them to compose and execute ad-hoc queries or data maintenance tasks. There should not be many data maintenance tasks that they need to attend to in a normalised structure like this, when it is needed, it should be performed instead through the application layer, which ultimately means there would be minimal SQL being hand coded at all.

Any good DBA wouldn't complain, they'd just write some views to represent the de-normalised conceptual data objects that they wish to deal with.

Supporting Argument 4: Database Query Plan Management

Modern database engines have a lot of optimisations under the hood that we often overlook or take for granted as Software Engineers. If you think about the usage patterns that evolve within applications there are some sets of data that are considered "Hot" in terms of Inserts or Updates or perhaps some sets of data that might have a much higher percentage of "Reads" compared to the "Writes", there are some tables that either due to caching at the application level or due to the relevance of the data become "Cold" and might be queried only sporadically compared to the rest.

An enterprise level database might detect these usage patterns and optmize the underlying data storage in response to the usage. If your chosen RDBMS does not do this automatically there will be features that allow you to actively manage this yourself.

The proposed schema closely models the business domains, and as a by-product it will group data together that has a common usage pattern, it is the relationships that define the boundaries between the sets of data that commonly go together. What we find in a TPT structure is that we can simplify optimisation strategies by focusing on the entire table, we might have some hardware disks that have lower latency or faster response times, most RDBMS allow us to define for each table that the data is stored in a separate physically media from the rest of the database.

When we need to squeeze the most out of our hardware it might even be advantageous to split a table into 2, linked 1:1, if we identify that a sub-set of the fields is retrieved or updated more frequently than the rest.

Supporting Argument 5: Extensibility

By distributing the data across multiple Entities where each entity represents a business discrete domain concept it becomes easier to extend the model in the future, both in terms of nomenclature of the corresponding tables, fields and classes, but also in terms of the effort involved in implementing the change. If today Farm has no additional fields to the common Workplace then in future versions when I need to add more metadata to the Farm this can be safely added without perhaps even needing to re-test the implementation of the Employees for a Farm or without concern or knowledge that a Farm might have employees at all.

From a code maintenance POV this significantly reduces the risk to implementing future extensions and will also reduce the time it takes to get the new features to the clients.

It also simplifies extending the model with new Entities, like feed Suppliers or Warehouses that could both conceivably have Employees and so are also candidates to be Workplaces.

By extending Workplace with these new conceptual entities through the inheritance implementation all the previous user interface will still be relevant, each entity might only require a single tab page or extra component template added to the UI for the entity itself, everything else should be ready for re-use with minial effort.

Supporting Argument 6: Data Security

A common application requirement is to implement data access security in terms of being able to prevent certain types of users from either having access to certain types of information or at least being able to modify certain type of information. Users are usually grouped by the Role they fulfil in the workplace and the type of data can usually be correlated to specific business domains.

Many RDBMS provide solutions for row level access rules, and it is possible to setup field level security if needed. However field level security is almost always expensive to setup and maintain as well as introducing a runtime performance cost to consider.

By spreading the data for a single conceptual entry across multiple tables, where each table represents a common security requirement, then we can simplify requests for field level security by scoping security to tables instead. This becomes much simpler to implement at the application level if the user interface maintains the grouping of data that matches the ERD, with a good design you can offer most of the benefits from field level security with the minimal cost of row level security.

Conclusion

What is a good theoretical design is subjective, my main argument is that business domains are rife with polymorphic concepts, to attempt to model these concepts in a database structure that does not support it results in compromises that if implemented poorly may not be acceptable to the end user at all.

What is clear is that a good application and its database should take well known theories and concepts into account but it should attempt to strike a balance between paradigms and patterns that optimise the Software Development Lifecycle and minimise risk of failure to deliver with traditional or perhaps legacy approaches to design that specifically restrict the potential of the application to achieve the end user's expectations.

TPT and a few carefully placed 1:1 relationships could be the key to unlocking your business concepts and allowing them to evolve into truely polymorphic entities.

Discover and read more posts from Chris Schaller
get started