Reading Tables: The SELECT Query
The most common operation in a database is reading data from a database. This is almost exclusively done through the use of the SELECT
keyword.
Basic Anatomy and Working of a SELECT Query
Generally speaking, a query can be broken down into five parts:
- Operation: The first part of a query describes what is going to be done. In this case, this is the word
SELECT
followed by the names of columns combined with functions. - Data: The next part of the query is the data, which is the
FROM
keyword followed by one or more tables connected together with reserved keywords indicating what data should be scanned for filtering, selection, and calculation. - Conditional: This is a part of the query that filters the data to only rows that meet a condition usually indicated with
WHERE
. - Grouping: This is a special clause that takes the rows of a data source and assembles them together using a key created by a
GROUP BY
clause, and then calculates a value using the values from all rows with the same value. We will discuss this step more in Chapter 3, Aggregate and Window Functions. - Postprocessing: This is a part of the query that takes the results of the data and formats them by sorting and limiting the data, often using keywords such as
ORDER BY
andLIMIT
.
The steps of a SELECT
query are as follows:
- Create a data source by taking one or more tables and combining them together into one large table.
- Filter the table based on the large data source created in Step 1 by seeing which rows meet the
WHERE
clause. - Calculate values based on columns in the data source in Step 1. If there is a
GROUP BY
clause, divide the rows into groups and then calculate an aggregate statistic for each group. Otherwise, return a column or value that has been calculated by performing functions on one or more columns together. - Take the rows returned and reorganize them based on the query.
To break down those steps, let's look at a typical query and follow the logic we've described:
SELECT first_name FROM customers WHERE state='AZ' ORDER BY first_name;
The operation of this query follows a sequence:
- We start with the
customers
table. - The
customers
table is filtered to where thestate
column equals'AZ'
. - We capture the
first_name
column from the filtered table. - The
first_name
column is ordered alphabetically.
Here, we've demonstrated how a query can be broken down into a series of steps for the database to process.
We will now look at the query keywords and patterns found in a SELECT
query.
Basic Keywords in a SELECT Query
There are many keywords that we use while writing a SELECT query. Let's start by understanding the SELECT and FROM keywords.
The SELECT and FROM Statements
The most basic SELECT
query follows this pattern: SELECT…FROM <table_name>;
. This query is a way to pull data from a single table. For example, if you want to pull all the data from the products
table in our sample database, simply use this query:
SELECT * FROM products;
This query will pull all the data from a database. The *
symbol seen here is shorthand to return all the columns from a database. The semicolon operator (;
) is used to tell the computer it has reached the end of the query, much as a period is used for a normal sentence. It's important to note that the rows will be returned in no specific order. If we want to return only specific columns from a query, we can simply replace the asterisk (*) with the names of the columns we want to be separated in the order we want them to be returned in. For example, if we wanted to return the product_id
column followed by the model
column of the products
table, we would write the following query:
SELECT product_id, model FROM products;
If we wanted to return the model
column first and the product_id
column second, we would write this:
SELECT model, product_id FROM products;
In the next section, we will learn about the WHERE
clause.
The WHERE Clause
The WHERE
clause is a piece of conditional logic that limits the amount of data returned. All of the rows that are returned in a SELECT
statement with a WHERE
clause in it meet the conditions of the WHERE
clause. The WHERE
clause can usually be found after the FROM
clause of a single SELECT
statement.
The condition in the WHERE
clause is generally a Boolean statement that can either be True
or False
for every row. In the case of numeric columns, these Boolean statements can use equals, greater than, or less than operators to compare the columns against a value.
We will use an example to illustrate this. Let's say we wanted to see the model names of our products with the model year of 2014 from our sample dataset. We would write the following query:
SELECT model FROM products WHERE year=2014;
In the next section, we will learn how we can use the AND
/OR
clause in our queries.
The AND/OR Clause
The previous query had only one condition. We are often interested in multiple conditions being met at once. For this, we put multiple statements together using the AND
or OR
clause.
Now we will illustrate this with an example. Let's say we wanted to return models that were not only built-in 2014 but also have a manufacturer's suggested retail price (MSRP) of less than $1,000. We can write the following:
SELECT model FROM products WHERE year=2014 AND msrp<=1000;
Now, let's say we wanted to return any models that were released in the year 2014 or had a product type of automobile
. We would write the following query:
SELECT model FROM products WHERE year=2014 OR product_type='automobile';
When using more than one AND
/OR
condition, use parentheses to separate and position pieces of logic together. This will ensure that your query works as expected and that it is as readable as possible. For example, if we wanted to get all products with models between the years of 2014 and 2016, as well as any products that are scooters, we could write the following:
SELECT * FROM products WHERE year>2014 AND year<2016 OR product_type='scooter';
However, to clarify the WHERE
clause, it would be preferable to write the following:
SELECT * FROM products WHERE (year>2014 AND year<2016) OR product_type='scooter';
In the next section, we will learn about the IN
and NOT ON
clauses.
The IN/NOT IN Clause
As mentioned earlier, Boolean statements can use equals signs to indicate that a column must equal a certain value. However, what if you are interested in returning rows where a row has a column that can be equal to any group of values? For instance, let's say you were interested in returning all models from the years 2014, 2016, or 2019. You could write a query such as this:
SELECT model FROM products WHERE year = 2014 OR year = 2016 OR year = 2019;
However, this is long and tedious to write. Using IN
, you can instead write the following:
SELECT model FROM products WHERE year IN (2014, 2016, 2019);
This is much cleaner to write and makes it easier to understand what is going on.
Conversely, you can also use the NOT IN
clause to return all the values that are not in a list of values. For instance, if you wanted all of the products that were not produced in the years 2014, 2016, and 2019, you could write the following:
SELECT model FROM products WHERE year NOT IN (2014, 2016, 2019);
In the next section, we will learn how to use the ORDER BY
clause in our queries.
The ORDER BY Clause
As previously mentioned, SQL queries will order rows as the database finds them if they are not given more specific instructions to do otherwise. For many use cases, this is acceptable. However, you will often want to see rows in a specific order. Let's say you want to see all of the products listed by the date when they were first produced, from earliest to latest. The method for doing this in SQL would be using the ORDER BY
clause as follows:
SELECT model FROM products ORDER BY production_start_date;
If an order sequence is not explicitly mentioned, the rows will be returned in ascending order. Ascending order simply means the rows will be ordered from the smallest value to the highest value of the chosen column or columns. In the case of things such as text, this means alphabetical order. You can make the ascending order explicit by using the ASC
keyword. For our last query, this could be achieved by writing the following:
SELECT model FROM products ORDER BY production_start_date ASC;
If you want to extract data in descending order, you can use the DESC
keyword. If we wanted to fetch manufactured models ordered from newest to oldest, we would write the following:
SELECT model FROM products ORDER BY production_start_date DESC;
Also, instead of writing the name of the column you want to order by, you can refer to what number column it is in the natural order of the table. For instance, say you wanted to return all the models in the products
table ordered by product ID. You could write the following:
SELECT model FROM products ORDER BY product_id;
However, because product_id
is the first column in the table, you could instead write the following:
SELECT model FROM products ORDER BY 1;
Finally, you can order by multiple columns by adding additional columns after ORDER BY
separated with commas. For instance, let's say we wanted to order all of the rows in the table first by the year of the model from newest to oldest, and then by the MSRP from least to greatest. We would then write the following:
SELECT * FROM products ORDER BY year DESC, base_msrp ASC;
The following is the output of the preceding code:
In the next section, we will learn about the LIMIT
keyword in SQL.
The LIMIT Clause
Most tables in SQL databases tend to be quite large and, therefore, returning every single row is unnecessary. Sometimes, you may want only the first few rows. For this scenario, the LIMIT
keyword comes in handy. Let's imagine that you wanted to only get the first five products that were produced by the company. You could get this by using the following query:
SELECT model FROM products ORDER BY production_start_date LIMIT 5;
The following is the output of the code:
As a general rule, you probably want to use the LIMIT
keyword for a table or query you have not worked with.
The IS NULL/IS NOT NULL Clause
Often, some entries in a given column may be missing. This could be for a variety of reasons. Perhaps the data was not collected or not available at the time that the data was collected. Perhaps the ETL job failed to collect and load data into a column. It may also be possible that the absence of a value is representative of a certain state in the row and actually provides valuable information. Whatever the reason, we are often interested in finding rows where the data is not filled in for a certain value. In SQL, blank values are often represented by the NULL
value. For instance, in the products
table, the production_end_date
column having a NULL
value indicates that the product is still being made. In this case, if we want to list all products that are still being made, we can use the following query:
SELECT * FROM products WHERE production_end_date IS NULL;
The following is the output of the code:
If we are only interested in products that are not being produced, we can use the IS NOT NULL
clause, as shown in the following query:
SELECT * FROM products WHERE production_end_date IS NOT NULL;
The following is the output of the code:
We now will look at how to use these new keywords in the following exercise.
Exercise 1.06: Querying the salespeople Table Using Basic Keywords in a SELECT Query
In this exercise, we will create various queries using basic keywords in a SELECT
query. Let's say that after a few days at your new job, you finally get access to the company database. Today, your boss has asked you to help a sales manager who does not know SQL particularly well. The sales manager would like a couple of different lists of salespeople. First, create a list of the online usernames of the first 10 female salespeople hired, ordered from the first hired to the latest hired.
Note
For all future exercises in this book, we will be using pgAdmin 4.
Perform the following steps to complete the exercise:
- Open your favorite SQL client and connect to the
sqlda
database. - Examine the schema for the
salespeople
table from the schema drop-down list. Notice the names of the columns in the following figure: - Execute the following query to get the usernames of female salespeople sorted by their
hire_date
values, and then setLIMIT
to10
:SELECT username FROM salespeople WHERE gender= 'Female' ORDER BY hire_date LIMIT 10;
The following is the output of the preceding code:
We now have a list of usernames for female salespeople ordered from the earliest hire to the most recent hire.
Note
To access the source code for this specific section, please refer to https://packt.live/2B4qMUk.
In this exercise, we used different basic keywords in a SELECT
query to help the sales manager get the list of salespeople as per their requirements.
Activity 1.03: Querying the customers Table Using Basic Keywords in a SELECT Query
The marketing department has decided that they want to do a series of marketing campaigns to help promote a sale. To do this, they need details of all customers in New York city. The following are the steps to complete the activity:
- Open your favorite SQL client and connect to the
sqlda
database. Examine the schema for thecustomers
table from the schema drop-down list. - Write a query that pulls all emails for ZoomZoom customers in the state of Florida in alphabetical order.
- Write a query that pulls all the first names, last names and email details for ZoomZoom customers in New York City in the state of New York. They should be ordered alphabetically, with the last name followed by the first name.
- Write a query that returns all the customers with a phone number ordered by the date the customer was added to the database.
Note
The solution for this activity can be found via this link.
In this activity, we used various basic keywords in a SELECT
query and helped the marketing manager to get the data they needed for the marketing campaign.