Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
PostgreSQL for Data Architects
PostgreSQL for Data Architects

PostgreSQL for Data Architects: Discover how to design, develop, and maintain your database application effectively with PostgreSQL

eBook
$29.99 $43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Table of content icon View table of contents Preview book icon Preview Book

PostgreSQL for Data Architects

Chapter 2. Server Architecture

In the previous chapter, we saw that starting a PostgreSQL cluster kicks off a few processes. The processes manage quite a bit of file I/O, manage other processes, and deal with memory allocation and a lot of other work. In this chapter, we will focus on the processes and their roles, memory management, and how they work together.

We will start by listing the processes once again. No client is connected to any of the databases yet:

[root@MyCentOS ~]# ps f -U postgres
  PID TTY      STAT   TIME COMMAND
 1918 tty1     S      0:00 /usr/local/pgsql/bin/postgres
 1920 ?        Ss     0:00  \_ postgres: checkpointer process   
 1921 ?        Ss     0:00  \_ postgres: writer process     
 1922 ?        Ss     0:00  \_ postgres: wal writer process   
 1923 ?        Ss     0:00  \_ postgres: autovacuum launcher process 
 1924 ?        Ss     0:00  \_ postgres: stats collector process   

Starting with the daemon process

The first process that is started when we start PostgreSQL is /usr/local/pgsql/bin/postgres. This process has quite a few responsibilities such as performing recovery, initializing shared data structures/memory space, and kicking off the mandatory and optional processes. These processes are also referred to as utility processes and include bgwriter, checkpointer, autovacuum launcher, log writer, stats collector process, and so on. The daemon process also listens for connection requests, receives requests for connections from clients, and spawns server processes for the client. It's obvious that the daemon itself is a mandatory process that should be running for a user to connect to the database.

Let's focus on the user connecting-issuing-commands scenario and other pieces should fall in place. The following diagram walks you through the process of how the daemon process receives a connection request and starts (forks) a backend process. The backend...

Understanding the shared buffer

When there are thousands of users trying to read/write data to many different tables, reading from the directories/files (which we saw getting created when we installed PostgreSQL and created a database with a couple of tables) will result in a miserably non-scalable system. The reads and writes will result in searching for many files, opening these files, using fseek() for specific data records, locking, editing, and unlocking. To make this a lot more scalable and faster, the concept of shared buffers (memory area) is introduced. Now, the backend processes are no longer reading from the files and writing to the files, but dealing with buffers or RAM, with significant improvement in performance. The amount of memory to be allocated is decided by the shared_buffers parameter in postgresql.conf. This fixed-size block of shared memory is allocated when the server is started.

It's not this memory chunk alone that is responsible for improving the response...

Checkpoint

Checkpoint is a mandatory process. To understand this, let's discuss blocks. PostgreSQL always reads and writes data in blocks. Consider the emp table. It has just one record. The data in this record should add up to a few bytes; we have the value 1 in the column id, and the value Newname in the column first_name. However, this table will consume 8K in the disk because PostgreSQL works with 8K blocks. A block is also referred to as a page. It is easy to verify that PostgreSQL uses blocks. Ensure that our table has just one record as follows:

SELECT * FROM emp;
 id | first_name 
----+------------
  1 | Newname
(1 row)

Then, we find the filename:

SELECT pg_relation_filepath('emp');
 pg_relation_filepath 
----------------------
 base/24741/24742
(1 row)

Now, we check the size of the file:

\! ls -l /pgdata/9.3/base/24741/24742
-rw-------. 1 postgres postgres 8192 Nov 15 11:33 /pgdata/9.3/base/24741/24742

8192 bytes = 8K. So, a table with just one record takes up 8K.

Let&apos...

WAL and the WAL writer process

When we make changes to the data, the changes are not written to the data files immediately, as mentioned before (probably many times). Changes are made to the blocks in the buffer and records of these changes are written to the WAL buffer (as soon as changes to data are made). The changes are flushed to the WAL segments when the changes are committed.

In the pg_xlog directory, the WAL segments are each 16 MB in size:

[postgres@MyCentOS pg_xlog]$ pwd
/pgdata/9.3/pg_xlog
[postgres@MyCentOS pg_xlog]$ ls -alrt
total 16396
drwx------.  2 postgres postgres     4096 Oct 13 13:23 archive_status
drwx------.  3 postgres postgres     4096 Oct 13 13:23 .
drwx------. 15 postgres postgres     4096 Nov 15 20:17 ..
-rw-------.  1 postgres postgres 16777216 Nov 15 20:17 000000010000000000000001

We can find out the segment PostgreSQL is writing to now using the pg_current_xlog_location function:

[postgres@MyCentOS pg_xlog]$ psql
psql (9.3.0)
Type "help" for help.

postgres...

The background writer

The background writer is responsible for writing to disk specific dirty buffers based on an algorithm, whereas checkpointer writes all dirty buffers. The process takes into consideration shared memory usage data as well as information about which blocks have been used/accessed recently (least recently used). The primary objective of this process is to ensure that free buffers are available for use. The relevant parameters are as follows:

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multiplier on buffers scanned/round

As we can see, the default value for delay is 200 milliseconds. This parameter specifies how long the process should wait between successive executions. The bgwriter_lru_maxpages parameter specifies the maximum number of buffers that will be written by the process in each iteration. The third parameter is also...

Starting with the daemon process


The first process that is started when we start PostgreSQL is /usr/local/pgsql/bin/postgres. This process has quite a few responsibilities such as performing recovery, initializing shared data structures/memory space, and kicking off the mandatory and optional processes. These processes are also referred to as utility processes and include bgwriter, checkpointer, autovacuum launcher, log writer, stats collector process, and so on. The daemon process also listens for connection requests, receives requests for connections from clients, and spawns server processes for the client. It's obvious that the daemon itself is a mandatory process that should be running for a user to connect to the database.

Let's focus on the user connecting-issuing-commands scenario and other pieces should fall in place. The following diagram walks you through the process of how the daemon process receives a connection request and starts (forks) a backend process. The backend process...

Understanding the shared buffer


When there are thousands of users trying to read/write data to many different tables, reading from the directories/files (which we saw getting created when we installed PostgreSQL and created a database with a couple of tables) will result in a miserably non-scalable system. The reads and writes will result in searching for many files, opening these files, using fseek() for specific data records, locking, editing, and unlocking. To make this a lot more scalable and faster, the concept of shared buffers (memory area) is introduced. Now, the backend processes are no longer reading from the files and writing to the files, but dealing with buffers or RAM, with significant improvement in performance. The amount of memory to be allocated is decided by the shared_buffers parameter in postgresql.conf. This fixed-size block of shared memory is allocated when the server is started.

It's not this memory chunk alone that is responsible for improving the response times...

Checkpoint


Checkpoint is a mandatory process. To understand this, let's discuss blocks. PostgreSQL always reads and writes data in blocks. Consider the emp table. It has just one record. The data in this record should add up to a few bytes; we have the value 1 in the column id, and the value Newname in the column first_name. However, this table will consume 8K in the disk because PostgreSQL works with 8K blocks. A block is also referred to as a page. It is easy to verify that PostgreSQL uses blocks. Ensure that our table has just one record as follows:

SELECT * FROM emp;
 id | first_name 
----+------------
  1 | Newname
(1 row)

Then, we find the filename:

SELECT pg_relation_filepath('emp');
 pg_relation_filepath 
----------------------
 base/24741/24742
(1 row)

Now, we check the size of the file:

\! ls -l /pgdata/9.3/base/24741/24742
-rw-------. 1 postgres postgres 8192 Nov 15 11:33 /pgdata/9.3/base/24741/24742

8192 bytes = 8K. So, a table with just one record takes up 8K.

Let's try inserting some...

WAL and the WAL writer process


When we make changes to the data, the changes are not written to the data files immediately, as mentioned before (probably many times). Changes are made to the blocks in the buffer and records of these changes are written to the WAL buffer (as soon as changes to data are made). The changes are flushed to the WAL segments when the changes are committed.

In the pg_xlog directory, the WAL segments are each 16 MB in size:

[postgres@MyCentOS pg_xlog]$ pwd
/pgdata/9.3/pg_xlog
[postgres@MyCentOS pg_xlog]$ ls -alrt
total 16396
drwx------.  2 postgres postgres     4096 Oct 13 13:23 archive_status
drwx------.  3 postgres postgres     4096 Oct 13 13:23 .
drwx------. 15 postgres postgres     4096 Nov 15 20:17 ..
-rw-------.  1 postgres postgres 16777216 Nov 15 20:17 000000010000000000000001

We can find out the segment PostgreSQL is writing to now using the pg_current_xlog_location function:

[postgres@MyCentOS pg_xlog]$ psql
psql (9.3.0)
Type "help" for help.

postgres=# SELECT...

The background writer


The background writer is responsible for writing to disk specific dirty buffers based on an algorithm, whereas checkpointer writes all dirty buffers. The process takes into consideration shared memory usage data as well as information about which blocks have been used/accessed recently (least recently used). The primary objective of this process is to ensure that free buffers are available for use. The relevant parameters are as follows:

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multiplier on buffers scanned/round

As we can see, the default value for delay is 200 milliseconds. This parameter specifies how long the process should wait between successive executions. The bgwriter_lru_maxpages parameter specifies the maximum number of buffers that will be written by the process in each iteration. The third parameter is also...

The autovacuum launcher process


This is an optional process. There is a parameter called autovacuum in postgresql.conf, with ON as default value. This process automates the execution of vacuum and analyzes commands based on a few parameters. To understand autovacuum, first, we have to understand vacuum.

Assume that we delete a few records from a table. PostgreSQL does not immediately remove the deleted tuples from the data files. These are marked as deleted. Similarly, when a record is updated, it's roughly equivalent to one delete and one insert. The previous version of the record continues to be in the data file. Each update of a database row generates a new version of the row. The reason is simple: there can be active transactions, which want to see the data as it was before. As a result of this activity, there will be a lot of unusable space in the data files. After some time, these dead records become irrelevant as there are no transactions still around to see the old data. However,...

The logging process


This is an optional process and the default setting is off. We have to set the logging_collector parameter to on to start this process:

cd $PGDATA

Edit the postgresql.conf file and make a few changes:

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_min_duration_statement = 0

[postgres@MyCentOS 9.3]$ pg_ctl restart
waiting for server to shut down.... done
server stopped
server starting
[postgres@MyCentOS 9.3]$ ps f -U postgres
  PID TTY      STAT   TIME COMMAND
 2581 pts/2    S      0:00 -bash
 3201 pts/2    R+     0:00  \_ ps f -U postgres
 2218 pts/1    S+     0:00 -bash
 3186 pts/2    S      0:00 /usr/local/pgsql/bin/postgres
 3187 ?        Ss     0:00  \_ postgres: logger process     
 3189 ?        Ss     0:00  \_ postgres: checkpointer process   
 3190 ?        Ss     0:00  \_ postgres: writer process     
 3191 ?        Ss     0:00  \_ postgres: wal writer process   
 3192 ?        Ss     0:00  \_ postgres: autovacuum launcher process...
Left arrow icon Right arrow icon

Description

This book is for developers and data architects who have some exposure to databases. It is assumed that you understand the basic concepts of tables and common database objects, including privileges and security.

Who is this book for?

This book is for developers and data architects who have some exposure to databases. It is assumed that you understand the basic concepts of tables and common database objects, including privileges and security.

What you will learn

  • Compile PostgreSQL from source and understand the PostgreSQL architecture
  • Configure parameters and change default settings for a PostgreSQL server
  • Leverage the logging mechanism to identify errors and suboptimal queries
  • Use replication to scale horizontally
  • Set up backup and recovery processes
  • Tweak parameters to optimize queries and processes
  • Troubleshoot connection errors
  • Use indexes and rewrite queries to improve performance
  • Install and use PostgreSQL extensions
  • Learn how to leverage the commandline client (psql)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Mar 30, 2015
Length: 272 pages
Edition : 1st
Language : English
ISBN-13 : 9781783288618
Category :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Product Details

Publication date : Mar 30, 2015
Length: 272 pages
Edition : 1st
Language : English
ISBN-13 : 9781783288618
Category :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 164.97
Learning PostgreSQL
$60.99
PostgreSQL Replication, Second Edition
$48.99
PostgreSQL for Data Architects
$54.99
Total $ 164.97 Stars icon
Banner background image

Table of Contents

13 Chapters
1. Installing PostgreSQL Chevron down icon Chevron up icon
2. Server Architecture Chevron down icon Chevron up icon
3. PostgreSQL – Object Hierarchy and Roles Chevron down icon Chevron up icon
4. Working with Transactions Chevron down icon Chevron up icon
5. Data Modeling with SQL Power Architect Chevron down icon Chevron up icon
6. Client Tools Chevron down icon Chevron up icon
7. SQL Tuning Chevron down icon Chevron up icon
8. Server Tuning Chevron down icon Chevron up icon
9. Tools to Move Data in and out of PostgreSQL Chevron down icon Chevron up icon
10. Scaling, Replication, and Backup and Recovery Chevron down icon Chevron up icon
11. PostgreSQL – Troubleshooting Chevron down icon Chevron up icon
12. PostgreSQL – Extras Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Rating distribution
Full star icon Full star icon Full star icon Full star icon Full star icon 5
(2 Ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Geordee Naliyath Jul 07, 2015
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book gives a good overview of PostgreSQL database. It is an easy read, especially so if you have some background in any other databases. The book is logically organized from the installation steps to advanced topics. Even the process of installation is explained pretty nicely, explaining the internals of the database systems - which is pretty rare in similar books.In the past I have built quite a number of products with PostgreSQL as the backend. I could work with PostgreSQL easily with the general overview and DBMS concepts from other databases. The chapters on server architecture and object hierarchies really helped me to understand PostgreSQL in depth, especially in relation to other databases. As I mentioned earlier, and as the title implies, if you have been a database programmer, administrator or an architect, the details are presented concisely, in a logical order. I should also mention the handy queries and commands in every section, that prompts me to leave the eBook on the desktop.Next two chapters focus on the transactions, data modeling, and client tools which are more beginner-friendly. However, it helps to understand how PostgreSQL implements or supports those features. The book then takes a steep climb to more serious topics such as tuning, administration and scaling.SQL Tuning is quite enjoyable, starting with a few thought-provoking facts and strategies. Each of the tuning tips is well-demonstrated. A better (and consistent) layout would have made it a little more easier to read. The chapter on server tuning is also quite easy to read, especially if you have some similar background from any of the databases.Scaling, Replication, and Backup and Recovery is one of the longest, and toughest chapters. Overlapping concepts from all four topics are distilled here, and it would take some effort from the reader to keep everything in mind, together. A lot of details present here would require additional reading to get a fair understanding and grip on the subject. The book serves the purpose of keeping the concepts together as an overview.PostgreSQL is one database that competes well against the NoSQL data stores. The last chapter gives a glimpse of PostgreSQL’s not-so-conventional capabilities. The books covers PostgreSQL 9.3. Hope that it would be revised with 9.4 or even 9.5 in future, with more details on the newer features available to data architects.Overall a good read, and a book worth keeping.
Amazon Verified review Amazon
Denver Water - Dawson Mar 04, 2016
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Wonderful published book. Great vendor!!
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.