Designing and Documenting Your Data Model
The importance of your data model’s design cannot be overstated. It is often overlooked in the Salesforce world because Salesforce comes with a pre-built data model, along with great flexibility for creating and changing custom fields and objects.
However, proper data model design can be the difference between a smart, flexible, and scalable solution (one that delivers valuable real-time insight, justifying the effort required to enter the data) and a poorly designed database full of dumped data (one that makes data entry feel like pure overhead).
Your data model is the foundation of your solution. Several solution aspects depend on it, such as the data sharing and visibility strategy and your ability to identify large data volume (LDV) objects.
You can read a solution by understanding its data model, and your solution will only be as strong as its foundation. You need to get your data model design right, and to do so, you need to understand key concepts in data modeling. This starts with normalization versus denormalization, goes through three of the standard normal forms for database design, and ends with the common relationship types between database tables and how they are reflected in Salesforce.
Normalization Versus Denormalization
Normalization is the process of arranging data in a database efficiently, based on its relationships. This approach aims to remove data redundancy, which wastes disk space, slows down queries, and adds processing time to create, read, update, and delete (CRUD) operations. Redundant data can also increase data inconsistency: when the same data persists in multiple places and is updated in one of them, you need to ensure that the change is reflected in all other occurrences; otherwise, you risk data inconsistencies. Normalization should aim to remove data redundancy, but not at the cost of losing data integrity.
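To make the update anomaly concrete, here is a minimal Python sketch; the customer and order data are invented purely for illustration:

```python
# Denormalized: every order row repeats the customer's address.
orders = [
    {"order_id": 1, "customer": "Acme", "address": "1 Main St"},
    {"order_id": 2, "customer": "Acme", "address": "1 Main St"},
]

# Updating the address on only one row silently creates two
# conflicting "truths" for the same fact:
orders[0]["address"] = "9 New Rd"
assert orders[0]["address"] != orders[1]["address"]  # inconsistent!

# Normalized: the address is stored exactly once and referenced.
customers = {"Acme": {"address": "1 Main St"}}
normalized_orders = [
    {"order_id": 1, "customer": "Acme"},
    {"order_id": 2, "customer": "Acme"},
]
customers["Acme"]["address"] = "9 New Rd"  # one update, always consistent
```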
Normalization is based on the concept of normal forms. A dataset must meet specific criteria to be considered in one of these normal forms. There are three main normal forms (1NF, 2NF, and 3NF), all of which you will cover shortly, and others, such as BCNF, 4NF, and 5NF, which this book does not cover, for brevity.
Denormalization can be considered the opposite of normalization. In a denormalized dataset, you intentionally store redundant information, mostly to improve performance when executing queries and performing analytics. There is, of course, an associated overhead in keeping all the redundant data consistent and aligned.
The denormalization process reduces the number of tables (though it consumes more storage) and simplifies complicated table joins, which effectively enhances performance when querying data that would otherwise reside in multiple tables. The idea behind denormalization is that placing all the data in one place simplifies the search process, as queries only need to be executed against a single table.
While designing your data model, you will come across use cases that are better solved using a normalized set of tables; in other cases, denormalization is the answer. The following user stories illustrate the difference.
The focus of the first user story is on analytics/reports. Given the standard reporting capabilities in Salesforce, it makes sense to store the account and address details in two separate tables. This also lets us link shipment records to the right address straight away and build the desired report with minimal effort, even though Salesforce performs additional processing behind the scenes to query data from multiple tables.
The following figure represents the proposed data model and an example of the data that is stored:
Figure 2.3 – Data model in a normalized form example
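As a rough, standalone illustration of such a normalized model, the following Python/SQLite sketch stores accounts, addresses, and shipments in separate tables. The table and column names here are assumptions for illustration, not the book’s exact schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE account  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address  (id INTEGER PRIMARY KEY,
                       account_id INTEGER NOT NULL REFERENCES account(id),
                       street TEXT, city TEXT);
CREATE TABLE shipment (id INTEGER PRIMARY KEY,
                       address_id INTEGER NOT NULL REFERENCES address(id),
                       shipped_on TEXT);
INSERT INTO account  VALUES (1, 'Acme Ltd');
INSERT INTO address  VALUES (1, 1, '1 Main St', 'London');
INSERT INTO shipment VALUES (1, 1, '2023-05-01');
""")

# Reporting requires joins across the tables, but every fact is
# stored exactly once, so there is nothing to keep in sync.
rows = con.execute("""
    SELECT a.name, ad.city, s.shipped_on
    FROM shipment s
    JOIN address ad ON ad.id = s.address_id
    JOIN account a  ON a.id  = ad.account_id
""").fetchall()
print(rows)  # [('Acme Ltd', 'London', '2023-05-01')]
```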
Now, explore the second user story.
The focus here is on the user experience while entering or viewing the data. In this case, it makes sense to use a denormalized dataset. The denormalized fields can easily be added to list views and page layouts, and they can be edited with fewer clicks.
Throughout this book, you will come across more complicated scenarios where a denormalized dataset can reduce data storage. Although a denormalized dataset theoretically consumes more storage, in Salesforce the data storage for a record of most objects is roughly 2 KB (with a few exceptions, such as person accounts and articles). This is true regardless of the number of fields on the record and whether those fields are filled in (some field types, such as rich text fields, are exceptions). As mentioned earlier in this book, some data modeling concepts can look different in Salesforce.
The following figure represents the proposed data model and an example of the data that is stored:
Figure 2.4 – Example of a data model in a denormalized form
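A comparable sketch of the denormalized shape, again with assumed table and column names, flattens the address fields onto the account record itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY,
                      name TEXT NOT NULL,
                      shipping_street TEXT,
                      shipping_city   TEXT);
INSERT INTO account VALUES (1, 'Acme Ltd', '1 Main St', 'London');
""")

# No joins needed: a single record carries everything a list view
# or page layout has to display, and one update edits it in place.
print(con.execute("SELECT name, shipping_city FROM account").fetchall())
# [('Acme Ltd', 'London')]
```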
The differences between normalized and denormalized datasets can be summarized as follows:
- The normalization process relies on splitting data into multiple tables, aiming to reduce data redundancy and increase consistency and data integrity. Denormalization, on the other hand, relies on combining data to speed up retrieval. In Salesforce, denormalization can also be used to reduce data storage and shrink large data volume (LDV) objects, a benefit that is uncommon in other databases.
- Normalization is usually used in online transaction processing (OLTP) systems, where the speed of insert, delete, and update operations is key. Denormalization, on the other hand, is used in online analytical processing (OLAP) systems, where query speed and analytics are key.
- Data integrity is hard to maintain in denormalized datasets, unlike normalized datasets.
- Denormalization increases data redundancy.
- The denormalization process reduces the number of tables and potential join statements, whereas normalization increases both.
- Typically, denormalized datasets take up more disk storage. As mentioned earlier, this is not necessarily true in Salesforce.
The standard Salesforce data model is in normalized form by default. To further understand the normalization process, you need to understand the three main normal forms.
Normal Forms
As you explored earlier, normalization is all about arranging data in a database efficiently based on its relationships. There are three common forms of data normalization. Explore each of these next.
First Normal Form
A table is considered to be in the first normal form if it meets the following conditions:
- Contains atomic values only: Atomic values are values that cannot be divided. For example, in the following figure, the value of the Phone Number column can be divided into three different phone numbers. Therefore, the table is not in 1NF:
Figure 2.5 – Table 1, which does not meet the first normal form (1NF)
- No repeating groups: The table does not contain two or more fields/columns that represent multiple values for the same data entity. For example, in the following figure, the Phone Number 1, Phone Number 2, and Phone Number 3 fields all represent multiple values for the same data entity, the phone number. Therefore, this table is not in 1NF:
Figure 2.6 – Table 2, which does not meet the first normal form (1NF)
To bring the table shown in Figure 2.5 into 1NF, you must split it into the following two tables:
Figure 2.7 – Table 1 from Figure 2.5 modified to meet the first normal form (1NF)
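The same 1NF split can be sketched in a few lines of Python; the contact name and phone values here are hypothetical stand-ins for the figure’s data:

```python
# Not 1NF: one column holds several phone numbers (non-atomic values),
# which is equivalent to repeating Phone Number 1..3 columns.
contact = {"id": 1, "name": "Jane", "phones": "555-0100, 555-0101, 555-0102"}

# 1NF: one atomic value per row, in a separate table keyed back
# to the contact it belongs to.
contacts = [{"id": 1, "name": "Jane"}]
phone_numbers = [
    {"contact_id": contact["id"], "phone": p.strip()}
    for p in contact["phones"].split(",")
]
print(phone_numbers)
# [{'contact_id': 1, 'phone': '555-0100'},
#  {'contact_id': 1, 'phone': '555-0101'},
#  {'contact_id': 1, 'phone': '555-0102'}]
```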
Second Normal Form
A table is considered to be in the second normal form if it meets the following conditions:
- It is in 1NF.
- Non-key attributes are functionally dependent on the entire primary key. This is particularly applicable when you have a composite key. (In Salesforce, the ID field is always the primary key and composite keys are not used, which means this condition is always met.) For example, in the following table, Address City depends on only a subset of the composite key (Site ID), which can be clearly seen in the second and fourth rows. Therefore, this table is not in 2NF:
Figure 2.8 – A table that does not meet the second normal form (2NF)
To bring the table into 2NF, you must split the table into the following two tables:
Figure 2.9 – The table from Figure 2.8 modified to meet the second normal form (2NF)
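Here is a minimal Python sketch of the same 2NF decomposition; the composite key and column names are assumptions based on the description above:

```python
# Not 2NF: the composite key is (customer_id, site_id), yet
# address_city depends on site_id alone (a partial dependency),
# so it is duplicated on every row that shares a site.
site_visits = [
    {"customer_id": 1, "site_id": "A", "address_city": "London"},
    {"customer_id": 2, "site_id": "A", "address_city": "London"},
]

# 2NF: move the partially dependent attribute into its own table,
# keyed by the part of the key it actually depends on.
sites = {"A": {"address_city": "London"}}
site_visits_2nf = [
    {"customer_id": 1, "site_id": "A"},
    {"customer_id": 2, "site_id": "A"},
]
```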
Third Normal Form
A table is considered to be in the third normal form if it meets the following conditions:
- It is in 2NF.
- Non-key attributes are not transitively dependent on the primary key. Take the following table as an example. The ID field is the primary key, and the table is in 1NF and 2NF. The Name, Partner Number, and Bank Code fields are functionally dependent on the ID field. However, the Bank Name field depends on the Bank Code field. Therefore, this table is not in 3NF:
Figure 2.10 – A table that does not meet the third normal form (3NF)
To bring this table into 3NF, you must split the table into the following two tables:
Figure 2.11 – The table from Figure 2.10 modified to meet the third normal form (3NF)
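A matching Python sketch of the 3NF decomposition, using the field names from the description above (the row values are invented):

```python
# Not 3NF: bank_name depends on bank_code, which in turn depends
# on the primary key -> a transitive dependency on id.
partners = [
    {"id": 1, "name": "Acme", "partner_number": "P-01",
     "bank_code": "B7", "bank_name": "First Bank"},
]

# 3NF: move the transitively dependent attribute into a bank table
# keyed by bank_code; partners keep only the foreign key.
banks = {"B7": {"bank_name": "First Bank"}}
partners_3nf = [
    {"id": 1, "name": "Acme", "partner_number": "P-01", "bank_code": "B7"},
]
```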
Now that you have covered the three main normal forms, you will explore the types of relationships that can be created between different database tables.