PostgreSQL for Data Architects

PostgreSQL for Data Architects: Discover how to design, develop, and maintain your database application effectively with PostgreSQL

eBook
€22.99 €32.99
Paperback
€41.99
Subscription
Free Trial
Renews at €18.99 p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

PostgreSQL for Data Architects

Chapter 2. Server Architecture

In the previous chapter, we saw that starting a PostgreSQL cluster kicks off a few processes. These processes handle file I/O, manage other processes, allocate memory, and do a great deal of other work. In this chapter, we will focus on the processes and their roles, memory management, and how they work together.

We will start by listing the processes once again. No client is connected to any of the databases yet:

[root@MyCentOS ~]# ps f -U postgres
  PID TTY      STAT   TIME COMMAND
 1918 tty1     S      0:00 /usr/local/pgsql/bin/postgres
 1920 ?        Ss     0:00  \_ postgres: checkpointer process   
 1921 ?        Ss     0:00  \_ postgres: writer process     
 1922 ?        Ss     0:00  \_ postgres: wal writer process   
 1923 ?        Ss     0:00  \_ postgres: autovacuum launcher process 
 1924 ?        Ss     0:00  \_ postgres: stats collector process   

Starting with the daemon process

The first process that is started when we start PostgreSQL is /usr/local/pgsql/bin/postgres. This process has quite a few responsibilities, such as performing recovery, initializing shared data structures and memory space, and kicking off the mandatory and optional processes. These are also referred to as utility processes and include bgwriter, checkpointer, autovacuum launcher, log writer, stats collector, and so on. The daemon process also listens for connection requests from clients and spawns a server process for each client. Obviously, the daemon itself is a mandatory process that must be running for a user to connect to the database.

Let's focus on the scenario of a user connecting and issuing commands; the other pieces should then fall into place. The following diagram walks you through how the daemon process receives a connection request and starts (forks) a backend process. The backend...
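We can see this forking in action. If a client connects (for example, via psql) and we list the processes again, a new backend appears under the daemon. The PIDs and the exact label will vary on your system; the following output is illustrative:

```
[root@MyCentOS ~]# ps f -U postgres
  PID TTY      STAT   TIME COMMAND
 1918 tty1     S      0:00 /usr/local/pgsql/bin/postgres
 1920 ?        Ss     0:00  \_ postgres: checkpointer process
 ...
 1935 ?        Ss     0:00  \_ postgres: postgres postgres [local] idle
```

The last line is the backend serving our session: the daemon forked it when the connection request arrived, and its label shows the user, database, client, and current state.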

Understanding the shared buffer

When thousands of users are trying to read and write data to many different tables, reading directly from the directories and files (which we saw being created when we installed PostgreSQL and created a database with a couple of tables) results in a miserably non-scalable system. Every read and write would mean searching for many files, opening them, using fseek() to locate specific data records, locking, editing, and unlocking. To make this far more scalable and faster, PostgreSQL introduces the concept of shared buffers (a shared memory area). The backend processes then no longer read from and write to the files directly, but deal with buffers in RAM, with a significant improvement in performance. The amount of memory to be allocated is decided by the shared_buffers parameter in postgresql.conf. This fixed-size block of shared memory is allocated when the server is started.
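We can check the current setting from psql; on a stock PostgreSQL 9.3 installation the default is 128MB, but the value on your system may differ:

```
postgres=# SHOW shared_buffers;
 shared_buffers 
----------------
 128MB
(1 row)
```

Note that changing shared_buffers requires a server restart, because the shared memory block is allocated once at startup.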

It's not this memory chunk alone that is responsible for improving the response...

Checkpoint

The checkpointer is a mandatory process. To understand what it does, let's first discuss blocks. PostgreSQL always reads and writes data in blocks. Consider the emp table. It has just one record. The data in this record should add up to only a few bytes: the value 1 in the id column and the value Newname in the first_name column. However, this table will consume 8K on disk, because PostgreSQL works with 8K blocks. A block is also referred to as a page. It is easy to verify that PostgreSQL uses blocks. First, ensure that our table has just one record:

SELECT * FROM emp;
 id | first_name 
----+------------
  1 | Newname
(1 row)

Then, we find the filename:

SELECT pg_relation_filepath('emp');
 pg_relation_filepath 
----------------------
 base/24741/24742
(1 row)

Now, we check the size of the file:

\! ls -l /pgdata/9.3/base/24741/24742
-rw-------. 1 postgres postgres 8192 Nov 15 11:33 /pgdata/9.3/base/24741/24742

8192 bytes = 8K. So, a table with just one record takes up 8K.

Let's try inserting some...
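One way such an experiment might look is the following sketch; the OIDs, paths, and resulting sizes will differ on your system:

```
postgres=# INSERT INTO emp(id, first_name)
           SELECT n, 'Name' || n FROM generate_series(2, 500) n;
INSERT 0 499
postgres=# \! ls -l /pgdata/9.3/base/24741/24742
```

However many records we add, the file size will always be a multiple of 8192 bytes, because PostgreSQL allocates space one block at a time.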

WAL and the WAL writer process

When we make changes to the data, the changes are not written to the data files immediately, as mentioned before (probably many times). Changes are made to the blocks in the buffer and records of these changes are written to the WAL buffer (as soon as changes to data are made). The changes are flushed to the WAL segments when the changes are committed.
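The WAL buffer and the writer's timing are themselves configurable in postgresql.conf; the 9.3 defaults are shown commented out below:

```
#wal_buffers = -1            # min 32kB, -1 sets based on shared_buffers
#wal_writer_delay = 200ms    # 1-10000 milliseconds
```

With wal_buffers set to -1, the server sizes the WAL buffer automatically as a fraction of shared_buffers, capped at the size of one WAL segment.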

In the pg_xlog directory, the WAL segments are each 16 MB in size:

[postgres@MyCentOS pg_xlog]$ pwd
/pgdata/9.3/pg_xlog
[postgres@MyCentOS pg_xlog]$ ls -alrt
total 16396
drwx------.  2 postgres postgres     4096 Oct 13 13:23 archive_status
drwx------.  3 postgres postgres     4096 Oct 13 13:23 .
drwx------. 15 postgres postgres     4096 Nov 15 20:17 ..
-rw-------.  1 postgres postgres 16777216 Nov 15 20:17 000000010000000000000001

We can find out the segment PostgreSQL is writing to now using the pg_current_xlog_location function:

[postgres@MyCentOS pg_xlog]$ psql
psql (9.3.0)
Type "help" for help.

postgres=# SELECT...

The background writer

The background writer writes specific dirty buffers to disk based on an algorithm, whereas the checkpointer writes all dirty buffers. The process takes into account shared memory usage data as well as information about which blocks have been used or accessed recently (a least-recently-used scheme). Its primary objective is to ensure that free buffers are available for use. The relevant parameters are as follows:

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multiplier on buffers scanned/round

As we can see, the default value for delay is 200 milliseconds. This parameter specifies how long the process should wait between successive executions. The bgwriter_lru_maxpages parameter specifies the maximum number of buffers that will be written by the process in each iteration. The third parameter is also...
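The interplay of these settings can be illustrated with a little arithmetic. The following is a sketch of the idea, not the actual server code: the writer estimates how many buffers backends will need in the next round, scales that by bgwriter_lru_multiplier, and writes at most bgwriter_lru_maxpages of them.

```shell
# Sketch of the bgwriter sizing rule (illustrative, not PostgreSQL source code)
recent_demand=120        # hypothetical: buffers recently needed per round
lru_multiplier=2         # bgwriter_lru_multiplier (default 2.0; integer here for shell)
lru_maxpages=100         # bgwriter_lru_maxpages (default 100)

target=$((recent_demand * lru_multiplier))
if [ "$target" -gt "$lru_maxpages" ]; then
    target=$lru_maxpages
fi
echo "$target"           # capped at bgwriter_lru_maxpages, so 100
```

In other words, the multiplier provides headroom over recent demand, while the maxpages setting puts a hard ceiling on the I/O done per round.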


The autovacuum launcher process


This is an optional process. The autovacuum parameter in postgresql.conf is on by default. The process automates the execution of the VACUUM and ANALYZE commands, based on a few parameters. To understand autovacuum, we first have to understand vacuum.

Assume that we delete a few records from a table. PostgreSQL does not immediately remove the deleted tuples from the data files; it marks them as deleted. Similarly, when a record is updated, it is roughly equivalent to one delete and one insert: the previous version of the record continues to exist in the data file, and each update of a row generates a new version of the row. The reason is simple: there can be active transactions that want to see the data as it was before. As a result of this activity, a lot of unusable space accumulates in the data files. After some time, these dead records become irrelevant, as no transactions remain that could see the old data. However,...
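A quick way to watch dead tuples accumulate is the pg_stat_user_tables view. The following session is illustrative; the counter is maintained by the stats collector, so it can lag by a moment:

```
postgres=# DELETE FROM emp WHERE id = 1;
DELETE 1
postgres=# SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'emp';
 n_dead_tup 
------------
          1
(1 row)
postgres=# VACUUM emp;
VACUUM
```

VACUUM marks the space occupied by dead tuples as reusable for future inserts and updates; it does not normally shrink the file itself.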

The logging process


This is an optional process, and the default setting is off. We have to set the logging_collector parameter to on to start it:

cd $PGDATA

Edit the postgresql.conf file and make a few changes:

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_min_duration_statement = 0
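Two related settings control how the log files are named and rotated; the 9.3 defaults, commented out, are:

```
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern
#log_rotation_age = 1d        # Automatic rotation of logfiles will happen after that time
```

With log_min_duration_statement = 0, every statement is logged along with its duration, which is handy for spotting slow queries but can produce large log files on a busy server.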

[postgres@MyCentOS 9.3]$ pg_ctl restart
waiting for server to shut down.... done
server stopped
server starting
[postgres@MyCentOS 9.3]$ ps f -U postgres
  PID TTY      STAT   TIME COMMAND
 2581 pts/2    S      0:00 -bash
 3201 pts/2    R+     0:00  \_ ps f -U postgres
 2218 pts/1    S+     0:00 -bash
 3186 pts/2    S      0:00 /usr/local/pgsql/bin/postgres
 3187 ?        Ss     0:00  \_ postgres: logger process     
 3189 ?        Ss     0:00  \_ postgres: checkpointer process   
 3190 ?        Ss     0:00  \_ postgres: writer process     
 3191 ?        Ss     0:00  \_ postgres: wal writer process   
 3192 ?        Ss     0:00  \_ postgres: autovacuum launcher process...
Left arrow icon Right arrow icon

Description

This book is for developers and data architects who have some exposure to databases. It is assumed that you understand the basic concepts of tables and common database objects, including privileges and security.


What you will learn

  • Compile PostgreSQL from source and understand the PostgreSQL architecture
  • Configure parameters and change default settings for a PostgreSQL server
  • Leverage the logging mechanism to identify errors and suboptimal queries
  • Use replication to scale horizontally
  • Set up backup and recovery processes
  • Tweak parameters to optimize queries and processes
  • Troubleshoot connection errors
  • Use indexes and rewrite queries to improve performance
  • Install and use PostgreSQL extensions
  • Learn how to leverage the command-line client (psql)

Product Details

Publication date : Mar 30, 2015
Length: 272 pages
Edition : 1st
Language : English
ISBN-13 : 9781783288601




Frequently bought together

  • Learning PostgreSQL: €45.99
  • PostgreSQL Replication, Second Edition: €36.99
  • PostgreSQL for Data Architects: €41.99

Total: €124.97

Table of Contents

13 Chapters
1. Installing PostgreSQL
2. Server Architecture
3. PostgreSQL – Object Hierarchy and Roles
4. Working with Transactions
5. Data Modeling with SQL Power Architect
6. Client Tools
7. SQL Tuning
8. Server Tuning
9. Tools to Move Data in and out of PostgreSQL
10. Scaling, Replication, and Backup and Recovery
11. PostgreSQL – Troubleshooting
12. PostgreSQL – Extras
Index

Customer reviews

Rating distribution: 5 out of 5 (2 ratings)
5 star: 100%
4 star: 0%
3 star: 0%
2 star: 0%
1 star: 0%

Geordee Naliyath, Jul 07, 2015 (5 stars)

This book gives a good overview of PostgreSQL database. It is an easy read, especially so if you have some background in any other databases. The book is logically organized from the installation steps to advanced topics. Even the process of installation is explained pretty nicely, explaining the internals of the database systems - which is pretty rare in similar books.

In the past I have built quite a number of products with PostgreSQL as the backend. I could work with PostgreSQL easily with the general overview and DBMS concepts from other databases. The chapters on server architecture and object hierarchies really helped me to understand PostgreSQL in depth, especially in relation to other databases. As I mentioned earlier, and as the title implies, if you have been a database programmer, administrator or an architect, the details are presented concisely, in a logical order. I should also mention the handy queries and commands in every section, that prompts me to leave the eBook on the desktop.

Next two chapters focus on the transactions, data modeling, and client tools which are more beginner-friendly. However, it helps to understand how PostgreSQL implements or supports those features. The book then takes a steep climb to more serious topics such as tuning, administration and scaling. SQL Tuning is quite enjoyable, starting with a few thought-provoking facts and strategies. Each of the tuning tips is well-demonstrated. A better (and consistent) layout would have made it a little easier to read. The chapter on server tuning is also quite easy to read, especially if you have some similar background from any of the databases.

Scaling, Replication, and Backup and Recovery is one of the longest, and toughest chapters. Overlapping concepts from all four topics are distilled here, and it would take some effort from the reader to keep everything in mind, together. A lot of details present here would require additional reading to get a fair understanding and grip on the subject. The book serves the purpose of keeping the concepts together as an overview.

PostgreSQL is one database that competes well against the NoSQL data stores. The last chapter gives a glimpse of PostgreSQL's not-so-conventional capabilities. The book covers PostgreSQL 9.3. Hope that it would be revised with 9.4 or even 9.5 in future, with more details on the newer features available to data architects. Overall a good read, and a book worth keeping.

Amazon Verified review

Denver Water - Dawson, Mar 04, 2016 (5 stars)

Wonderful published book. Great vendor!!

Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use towards owning content.

How can I cancel my subscription?

To cancel your subscription, simply go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there you will see the 'cancel subscription' button in the grey box containing your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle, a month starting from the day of subscription payment. You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy DRM-free books, the same way that you would pay for a book. Your credits can be found on the subscription homepage, subscription.packtpub.com, by clicking on the 'My Library' dropdown and selecting 'credits'.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head start on our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.