Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Learning PostgreSQL

You're reading from   Learning PostgreSQL Create, develop and manage relational databases in real world applications using PostgreSQL

Arrow left icon
Product type Paperback
Published in Nov 2015
Publisher Packt
ISBN-13 9781783989188
Length 464 pages
Edition 1st Edition
Arrow right icon
Toc

Table of Contents (16) Chapters Close

Preface 1. Relational Databases FREE CHAPTER 2. PostgreSQL in Action 3. PostgreSQL Basic Building Blocks 4. PostgreSQL Advanced Building Blocks 5. SQL Language 6. Advanced Query Writing 7. Server-Side Programming with PL/pgSQL 8. PostgreSQL Security 9. The PostgreSQL System Catalog and System Administration Functions 10. Optimizing Database Performance 11. Beyond Conventional Data types 12. Testing 13. PostgreSQL JDBC 14. PostgreSQL and Hibernate Index

Data modeling

Data models describe real-world entities such as customer, service, products, and the relation between these entities. Data models provide an abstraction for the relations in the database. Data models aid the developers in modeling business requirements, and translating business requirements to relations in the relational database. They are also used for the exchange of information between the developers and business owners.

In the enterprise, data models play a very important role in achieving data consistency across interacting systems. For example, if an entity is not defined, or is poorly defined, then this will lead to inconsistent and misinterpreted data across the enterprise. For instance, if the semantics of the customer entity is not defined clearly, and different business departments use different names for the same entity such as customer and client, this may lead to confusion in the operational departments.

Another common bad practice is to define business rules that describe how things should be done at the database level. This contradicts the "abstraction of concerns" and leads to fixed complex data structures. The business departments should define what needs to be done but not how.

Data model perspectives

Data model perspectives are defined by ANSI as follows:

  • Conceptual data model: Describes the domain semantics, and is used to communicate the main business rules, actors, and concepts. It describes the business requirements at a high level and is often called a high-level data model. The conceptual model is the chain between developers and business departments in the application development life cycle.
  • Logical data model: Describes the semantics for a certain technology, for example, the UML class diagram for object-oriented languages.
  • Physical data model: Describes how data is actually stored and manipulated at the hardware level such as storage area network, table space, CPUs, and so on.

According to ANSI, this abstraction allows changing one part of the three perspectives without amending the other parts. One could change both the logical and the physical data models without changing the conceptual model. To explain, sorting data using bubble or quick sort is not of interest for the conceptual data model. Also, changing the structure of the relations could be transparent to the conceptual model. One could split one relation into many relations after applying normalization rules, or by using enum data types in order to model the lookup tables.

The entity-relation model

The entity-relation (ER) model falls in the conceptual data model category. It captures and represents the data model for both business users and developers. The ER model can be transformed into the relational model by following certain techniques.

Conceptual modeling is a part of the Software development life cycle (SDLC). It is normally done after the functional and data requirements-gathering stage. At this point, the developer is able to make the first draft of the ER diagram as well as describe functional requirements using data flow diagrams, sequence diagrams, user case scenarios, user stories, and many other techniques.

During the design phase, the database developer should give great attention to the design, run a benchmark stack to ensure performance, and validate user requirements. Developers modeling simple systems could start coding directly. However, care should be taken when making the design, since data modeling involves not only algorithms in modeling the application but also data. The change in design might lead to a lot of complexities in the future such as data migration from one data structure to another.

While designing a database schema, avoiding design pitfalls is not enough. There are alternative designs, where one could be chosen. The following pitfalls should be avoided:

  • Data redundancy: Bad database designs elicit redundant data. Redundant data can cause several other problems including data inconsistency and performance degradation. When updating a tuple which contains redundant data, the changes on the redundant data should be reflected in all the tuples that contain this data.
  • Null saturation: By nature, some applications have sparse data, such as medical applications. Imagine a relation called diagnostics which has hundreds of attributes for symptoms like fever, headache, sneezing, and so on. Most of them are not valid for certain diagnostics, but they are valid in general. This could be modeled by utilizing complex data types like JSON, or by using vertical modeling like entity-attribute-value (EAV).
  • Tight coupling: In some cases, tight coupling leads to complex and difficult-to-change data structures. Since business requirements change with time, some requirements might become obsolete. Modeling generalization and specialization (for example a part-time student is a student) in a tightly coupled way may cause problems.

Sample application

In order to explain the basics of the ER model, an online web portal to buy and sell cars will be modeled. The requirements of this sample application are listed as follows, and an ER model will be developed step-by-step:

  • The portal provides the facility to register the users online, and provide different services for the users based on their categories.
  • The users might be sellers or normal users. The sellers can create new car advertisements; other users can explore and search for cars.
  • All users should provide there full name and a valid e-mail address during registration. The e-mail address will be used for logging in.
  • The seller should also provide an address.
  • The user can rate the advertisement and the seller's service quality.
  • All users' search history should be maintained for later use.
  • The sellers have ranks and this affects the advertisement search; the rank is determined by the number of posted advertisements and the user's rank.
  • The car advertisement has a date and the car can have many attributes such as color, number of doors, number of previous owners, registration number, pictures and so on.

Entities, attributes, and keys

The ER diagram represents entities, attributes, and relationships. An entity is a representation of a real-world object such as car or a user. An attribute is a property of an object and describes it. A relationship represents an association between two or more entities.

The attributes might be composite or simple (atomic). Composite attributes can be divided into smaller subparts. A subpart of a composite attribute provides incomplete information that is semantically not useful by itself. For example, the address is composed of street name, building number, and postal code. Any one of them isn't useful alone without its counterparts.

Attributes could also be single-valued or multi-valued. The color of a bird is an example of a multi-valued attribute. It can be red and black, or a combination of any other colors. A multi-valued attribute can have a lower and upper bound to constrain the number of values allowed. In addition, some attributes can be derived from other attributes. Age can be derived from the birth date. In our example, the final rank of a seller is derived from the number of advertisements and the user ratings.

Finally, key attributes can identify an entity in the real world. A key attribute should be marked as a unique attribute, but not necessarily as a primary key, when physically modeling the relation. Finally, several attribute types could be grouped together to form a complex attribute.

Attribute symbol

Meaning

Entities, attributes, and keys

Key attribute

Example: E-mail address

Entities, attributes, and keys

Attribute

Example: Date of birth

Entities, attributes, and keys

Derived attribute

Example: Age

Entities, attributes, and keys

Multi-valued attribute

Example: Car color

Entities, attributes, and keys

Composite attribute

Example: Address

Summary of the attribute notation for ER diagrams.

Entities should have a name and a set of attributes. They are classified into the following:

  • Weak entity: Does not have key attributes of its own
  • Strong entity or regular entity: Has a key attribute.

A weak entity is usually related to another strong entity. This strong entity is called the identifying entity. Weak entities have a partial key, aka "discriminator", which is an attribute that can uniquely identify the weak entity, and it is related to the identifying entity. In our example, if we assume that the search key is distinct each time the user searches for cars, then the search key is the partial key. The weak entity symbol is distinguished by surrounding the entity box with a double line.

Entity symbol

Meaning

Entities, attributes, and keys

Weak entity

Entities, attributes, and keys

Strong entity

ER entities symbols

The next image shows the preliminary design of the online. The user entity has several attributes. The name attribute is a composite attribute, and e-mail is a key attribute. The seller entity is a specialization of the user entity. The total rank is a derived attribute calculated by aggregating the user ratings and the number of advertisements. The color attribute of the car is multi-valued. The seller can be rated by the users for certain advertisements; this relation is a ternary relation, because the rating involves three entities which are car, seller, and user. The car picture is a subpart attribute of the advertisement. The following diagram shows that the car can be advertised more than once by different sellers. In the real world, this makes sense, because one could ask more than one seller to sell his car.

Entities, attributes, and keys

The entity design of the car portal database.

When an attribute of one entity refers to another entity, some relationships exist. In the ER model, these references should not be modeled as attributes but as relationships or weak entities. Similar to entities, there are two classes of relationships: weak and strong. Weak relationships associate the weak entities with other entities. Relationships can have attributes as entities. In our example, the car is advertised by the seller; the advertisement date is a property of the relationship.

Relationships have cardinality constraints to limit the possible combinations of entities that participate in a relationship. The cardinality constraint of car and seller is 1:N; the car is advertised by one seller, and the seller can advertise many cars. The participation between seller and user is called total participation, and is denoted by a double line. This means that a seller cannot coexist alone, and he must be a user.

Tip

The many-to-many relationship cardinality constraint is denoted by N:M to emphasize different participation from the entities.

Entities, attributes, and keys

Car web portal ER diagram.

Up until now, only the basic concepts of ER diagrams have been covered. Some concepts such as (min, max) cardinality notation, ternary/n-ary relationships, generalization, specialization, and Enhanced Entity relation diagrams (EER) have not been discussed.

Mapping ER to Relations

The rules for mapping an ER diagram to a set of relations (that is, the database schema) are almost straightforward but not rigid. One could model an entity as an attribute, and then refine it to a relationship. An attribute which belongs to several entities can be promoted to be an independent entity. The most common rules are listed as follows (note that only basic rules have been covered, and the list is not exhaustive):

  • Map regular entities to relations: If entities have composite attributes, then include all the subparts of the attributes. Pick one of the key attributes as a primary key.
  • Map weak entities to relations, include simple attributes and the subparts of the composite attributes. Add a foreign key to reference the identifying entity. The primary key is normally the combination of the partial key and the foreign key.
  • If a relationship has an attribute, and the relation cardinality is 1:1, then the relation attribute can be assigned to one of the participating entities.
  • If a relationship has an attribute, and the relation cardinality is 1:N, then the relation attribute can be assigned to the participating entity on the N side. For example, the advertisement_date can be assigned to the car relation.
  • Map many-to-many relationships, also known as N:M, to a new relation. Add foreign keys to reference the participating entities. The primary key is the composition of foreign keys. The customer_service relation is an example of many-to-many relationship.
  • Map a multi-valued attribute to a relation. Add a foreign key to reference the entity that owns the multi-valued attribute. The primary key is the composition of the foreign key and the multi-valued attribute.

UML class diagrams

Unified modeling language (UML) is a standard developed by Object Management Group (OMG). UML diagrams are widely used in modeling software solutions, and there are several types of UML diagrams for different modeling purposes including class, use case, activity, and implementation diagrams.

A class diagram can represent several types of associations, that is, the relationship between classes. They can depict attributes as well as methods. An ER diagram can be easily translated into a UML class diagram. UML class diagrams also have the following advantages:

  • Code reverse engineering: The database schema can be easily reversed to generate a UML class diagram.
  • Modeling extended relational database objects: Modern relational databases have several object types such as sequences, views, indexes, functions, and stored procedures. UML class diagrams have the capability to represent these objects types.
You have been reading a chapter from
Learning PostgreSQL
Published in: Nov 2015
Publisher: Packt
ISBN-13: 9781783989188
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image