Microsoft Certified Azure Data Fundamentals (Exam DP-900) Certification Guide: The comprehensive guide to passing the DP-900 exam on your first attempt

By Marcelo Leite

4.7 (12 Ratings) | Paperback | Nov 2022 | 300 pages | 1st Edition


Microsoft Certified Azure Data Fundamentals (Exam DP-900) Certification Guide

Understanding the Core Data Terminologies

Welcome, dear reader!

This book covers the knowledge you need to pass the Azure Data Fundamentals (DP-900) exam. You will find detailed use cases and hands-on exercises, as well as sample questions and answers to help you prepare for the exam.

This book will not only prepare you for certification but also complement the knowledge needed for planning and working in a data organization. You can look forward to learning about transactional and analytical database concepts, SQL and NoSQL, when to use each option, and the most modern tools and techniques for implementation on Azure.

Data generation and data processing have been growing exponentially in recent years. Data is being generated and processed everywhere: in information systems, cell phones, smart watches, smart TVs, city buses, subways, and cars, among others. Knowing how to capture and process this data to generate intelligence provides today’s main competitive advantage in the market.

To start understanding how these technologies and solutions work, it is necessary to know the concepts of data storage and processing, which we will cover in this introductory chapter.

By the end of this chapter, you will be able to understand the following:

  • The types of data and how to store it
  • Relational and non-relational data
  • Data Analytics
  • How to differentiate the data workloads

Understanding the core data concepts

To start, let's review the terminology used in the data world so that the concepts that follow are easy to interpret and apply to the technologies.

What is data?

Data is a record, also called a fact, that can be a number, a piece of text, or a description used to make decisions. Data only generates intelligence when it is processed; once processed, it is called information or insights.

Data is classified into three basic formats: structured, semi-structured, and unstructured data. We will learn about them all in the following sections.

Structured data

Structured data is formatted and typically stored in a table represented by columns and rows. This data is found in relational databases, which organize their table structures in a way that creates relationships between these tables.

The following figure shows an example of a simple table with structured data:

Figure 1.1 – Example of structured data in a database

In this example, the table called CUSTOMER has seven columns and six records (rows) with different values.

The CUSTOMER table could be part of a customer relationship management (CRM) database, a financial or enterprise resource planning (ERP) system, or another type of business application.
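To make the idea of rows and columns concrete, the following is a minimal sketch using Python's built-in sqlite3 module; the table layout and values are illustrative and simplified, not taken verbatim from the figure.

import sqlite3

# A relational table has a predefined schema: fixed columns with data types
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMER (
        CUSTOMER_ID INTEGER PRIMARY KEY,
        FIRST_NAME  TEXT NOT NULL,
        LAST_NAME   TEXT NOT NULL,
        CITY        TEXT,
        ZIP_CODE    TEXT
    )
""")

# Every record (row) fills the same set of columns
conn.execute(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?, ?, ?)",
    (10302, "Leo", "Boucher", "Nantes", "44000"),
)
conn.commit()

for row in conn.execute("SELECT * FROM CUSTOMER"):
    print(row)  # (10302, 'Leo', 'Boucher', 'Nantes', '44000')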

Semi-structured data

Semi-structured data consists of records that have attributes, similar to columns, but that are not organized in a tabular way like structured data. One of the most widely used formats for semi-structured data is the JavaScript Object Notation (JSON) file. The following example demonstrates the structure of a JSON file containing the record of one customer:

## JSON FILE - Document 1 ##
{
  "CUSTOMER_ID": "10302",
  "NAME": 
  { 
    "FIRST_NAME": "Leo", 
    "LAST_NAME": "Boucher" 
  },
  "ADDRESS": 
  {
    "STREET": "54, rue Royale",
    "CITY": "Nantes",
    "ZIP_CODE": "44000",
    "COUNTRY": "France" 
   }
}

In this example, each JSON file contains one record, analogous to a row in a structured table, although there are other JSON and similar formats that hold multiple records in the same file.

In addition to the JSON format, data stored in key-value and graph databases is also considered semi-structured.

A key-value database stores data as an associative array, in which each record has a unique identification key. The values written to a record can be in a variety of formats, including numbers, text, and even entire JSON documents.

The following is an example of a key-value database:

Figure 1.2 – Example of a key-value database

As you can see in the preceding figure, each record can contain different attributes. Records are stored in a single collection, with no predefined schema, tables, or columns, and no relationships between entities; this is what differentiates a key-value database from a relational database.
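As a rough illustration of the concept only (not the API of any particular key-value product), a key-value store behaves like an associative array in which each record is reached through its unique key and each value can have its own shape:

# Minimal key-value sketch: one unique key per record, no shared schema
store = {}

store["customer:10302"] = {"FIRST_NAME": "Leo", "LAST_NAME": "Boucher"}
store["customer:10303"] = {"NAME": "Ana", "LOYALTY_POINTS": 1250}
store["session:af91"] = "2022-11-25T10:15:00Z"  # a plain text value

# Reads are direct lookups by key; there are no tables, columns, or joins
print(store["customer:10302"]["LAST_NAME"])  # Boucher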

A graph database is used to store data that requires complex relationships. It contains nodes (object information) and edges (relationship information). This means the graph database predefines which objects exist and the relationships they can have with each other, while the records themselves can take different formats. The following is a representation of nodes and edges in a graph database of sales and deliveries:

Figure 1.3 – Example of a graph database

The diagram shows how the relationships around the ORDER entity are created in a graph database, involving the CUSTOMER, LOCATION, SUPPLIER, and PRODUCT nodes. This structure accelerates query processing because the graph already stores the relationships and can deliver them quickly.
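A minimal code sketch of the same idea, assuming plain node and edge collections rather than any specific graph database product (the entity names echo the figure; the values are invented for illustration):

# Nodes hold object information; edges hold relationship information
nodes = {
    "ORDER:1001":     {"type": "ORDER", "total": 259.90},
    "CUSTOMER:10302": {"type": "CUSTOMER", "name": "Leo Boucher"},
    "PRODUCT:77":     {"type": "PRODUCT", "name": "Running shoes"},
}
edges = [
    ("CUSTOMER:10302", "PLACED", "ORDER:1001"),
    ("ORDER:1001", "CONTAINS", "PRODUCT:77"),
]

# Traversal follows stored edges directly, so related entities are
# returned without join operations
def related(node_id):
    return [(relation, target) for source, relation, target in edges
            if source == node_id]

print(related("ORDER:1001"))  # [('CONTAINS', 'PRODUCT:77')]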

Unstructured data

In addition to structured and semi-structured data, there is also unstructured data, such as audio, videos, images, or binary records without a defined organization.

This data can also be processed to generate information, but the type of storage and processing for this is different from that of structured and semi-structured data. It is common, for example, for unstructured data such as audio to be transcribed using artificial intelligence, generating a mass of semi-structured data for processing.

Now that you understand the basics of data types, let’s look at how that data is stored in a cloud environment.

How is data stored in a modern cloud environment?

Cloud platforms offer different solutions depending on whether the data is structured, semi-structured, or unstructured. In Azure, structured data can be stored in Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, or in database servers installed on virtual machines, such as SQL Server running on an Azure virtual machine. These are called relational databases.

Semi-structured data can be stored in Azure Cosmos DB, and unstructured data (such as videos and images) can be stored in Azure Blob storage or in Azure Data Lake Storage, a platform built on top of Blob storage that is optimized for queries and processing.

These services are delivered by Azure in the following formats:

  • Infrastructure as a service (IaaS) – Databases deployed on virtual machines
  • Platform as a service (PaaS) – Managed database services, where the responsibility for managing the virtual machine and the operating system lies with Azure

For these database services to be used, they must be provisioned and configured to receive the data properly.

One of the most important aspects after provisioning a service is the access control configuration. Azure allows you to create custom access roles, but in general, we maintain at least three profiles:

  • Read-only – Users can read existing data on that service, but they cannot add new records or edit or delete them
  • Read/Write – Users can read, create, delete, and edit records
  • Owner – The highest access privilege, including the ability to manage permissions for other users of this data

With these configured profiles, you will be able to add users to the profiles to access the data storage/databases.

Let's look at an example. You are an administrator of a CUSTOMER database, and you have the Owner profile. You therefore grant the leader of the commercial area Read/Write access to this database and give salespeople Read-only access.

In addition to the permissions configuration, it is important to review all network configurations, data retention, and backup patterns, among other administrative activities. These management tasks will be covered in Chapter 7, Provisioning and Configuring Relational Database Services in Azure.

In all database scenarios, we will have different access requirements, and it is important (as in the example) to accurately delimit the access level needs of each profile.

Describing a data solution

There are two types of database solutions: transactional solutions and analytical solutions. In the following sections, we will understand in detail what these solutions are and the requirements for choosing between them.

Transactional databases

Transactional databases are used by systems for basic operations: creating, reading, updating, and deleting. Transactional systems are considered the core of the digitalization of business processes. With these basic operations, we can create entities such as customers, products, stores, and sales transactions, among others, to store important data.

A transactional database is commonly described as an online transaction processing (OLTP) database because it serves the online transactional operations between the application and the database.

For an organization, transactional databases usually have their data segmented into entities, which may or may not be tables, with or without relationships between those entities to facilitate correlating the data.

For example, an e-commerce database can be structured with a table called Shopping_Cart, which represents the products that are being selected in the store during user navigation, and another called Purchases with the completed transaction records.

The process of segmenting entities in a database is called normalization, which will be covered in Chapter 3, Working with Relational Data.

The format of a normalized transactional database is optimized for transactional operations, but it is not the best format for data exploration and analysis.

The following is an example of a relational transactional database:

Figure 1.4 – Example of a relational transactional database

The preceding figure demonstrates a relational database for transactional workloads in a sales and delivery system. We can see the main entity, Orders, joined to Employees, Shippers, Customers, and Order Details. Order Details then itemizes the products of each order through its relationship with the Products entity, which in turn references the Categories and Suppliers entities.

Analytical databases

When a data solution requires a good interface for queries, exploration, and data analysis, the data is organized differently than in transactional databases. To meet this requirement, we prioritize aggregations and relationships designed for data consumption and exploration; this specialized data storage is called an analytical database.

Analytical databases use a process called online analytical processing (OLAP) and have undergone a great evolution in recent years with the emergence of data warehouses and big data platforms.

Analytical databases are populated through a process of data ingestion; they are responsible for processing and transforming the data into insights and information and then making that processed information available for consumption. The following steps describe this process:

  1. Data ingestion – The process responsible for connecting to transactional databases or other data sources to collect raw transaction information and include it in the analytical database
  2. Data processing – The process performed by the OLAP platform to create a data model, organize entities, perform indicator calculations, and define metrics for data consumption
  3. Data query – After the data model is loaded with the proper organization for querying, data manipulation and reporting tools can connect to the OLAP platform to perform their queries

The following diagram is an example of a structured data model in an OLAP database:

Figure 1.5 – Example of an analytical relationship

The following diagram is a simple comparison of OLTP and OLAP databases:

Figure 1.6 – Data flow between OLTP and OLAP

The preceding figure demonstrates the traditional flow of data, which is sourced and stored in transactional OLTP databases and then moved to OLAP analytical databases for data intelligence generation.

Important note

There are modern data storage platforms that aim to unite OLTP and OLAP on the same platform, but these databases, often called NewSQL, still need to mature their structures to deliver the best of transactional and analytical worlds in the same database. The industry standard is to keep transactional and analytical data structures separate.

In this section, we defined what transactional and analytical data solutions are and the characteristics of each solution. In the next section, we will detail the recommended data types and storage for each of these types.

Defining the data type and proper storage

Categorizing data to identify its type and the best storage solution for it is an important step in designing a data solution, and it goes beyond evaluating whether the data is structured, semi-structured, or unstructured. In this section, you will learn about the characteristics of different types of data.

Characteristics of relational and non-relational databases

Relational databases are the most traditional and widely used database format, as they have an easy-to-understand design and a simple tabular data model, similar to familiar tools such as Excel spreadsheets. Relational databases have predefined schemas, which are the structures of their tables, comprising the columns, the data type of each column, and other parameters such as the primary and foreign keys used in relationships.

However, relational databases with these rigid schemas can pose challenges, as presented in the following example.

Your CRM system has a database structure with a CUSTOMER table, where you intend to store customer data: CUSTOMER_ID, CUSTOMER_NAME, ADDRESS, MOBILE_PHONE, and ZIP_CODE. To do this, you start by creating a CUSTOMER table with five fields:

Figure 1.7 – Example of a CUSTOMER table in a relational database

However, after setting up this table, you realize that you have clients that have more than one address and zip code, and even more than one mobile phone number. How can you solve this issue?

To handle problems like this one, we can once again use normalization. Normalization is applied when a table (CUSTOMER, in this example) needs to be split into child tables that are correlated with the initial table.

Therefore, we can change the CUSTOMER table as follows:

Figure 1.8 – A relationship model in a transactional database

Non-relational databases allow you to store data in its original format without having a predefined schema as in relational databases. The most common non-relational storage format is document storage, where each record in the database is an independent file. The benefit is that each file can have different and unique attributes.

On the other hand, the files being independent can present a challenge: data duplication.

Going back to our CUSTOMER entity example, in a relational database, when two or more customers live at one address, the normalized database records that relationship and keeps only one address record. In a non-relational database, however, if two customers live at the same address, that address is repeated independently in the records of both customers.

Let’s now analyze how this storage could be structured in a relational database, using the concept of normalization:

Figure 1.9 – Example of data structured into tables

The preceding figure exemplifies the data stored within the relational model tables with the CUSTOMER, CUSTOMER_ADDRESS, and ADDRESS entities to understand the structure of a normalized table.

Now let’s analyze the same data in a CUSTOMER table, but in the format of a non-relational database:

## JSON FILE - CUSTOMER ##
{
  "CUSTOMER_ID": "0001",
  "CUSTOMER_NAME": 
  { 
    "FIRST_NAME": "MARK", 
    "LAST_NAME": "HUGGS" 
  },
  "ADDRESS": 
  {
    "STREET": "1200, Harper Str" 
  }
}
## JSON FILE - CUSTOMER2 ##
{
  "CUSTOMER_ID": "0002",
  "CUSTOMER_NAME": 
  { 
    "FIRST_NAME": "KRISTI", 
    "LAST_NAME": "LAMP" 
  },
  "ADDRESS": 
  {
    "STREET": "1200, Harper Str" 
  }
}

In the preceding example, we can see two records in a CUSTOMER table, with each record being a JSON document structured with the attributes of each customer.

Thus, we can observe that the same data can be stored in relational and non-relational structures.

Therefore, to decide between a relational or non-relational data storage solution, you must evaluate the behavior of the application or the user that will use that database, the relationships between the entities, and possible normalization processes.

Both relational and non-relational databases should be used primarily for transactional workloads. In the upcoming sections, we will understand the differences between these transactional workloads and analytical workloads.

A transactional workload

Relational and non-relational databases can both be used for transactional workloads, that is, databases used to perform the basic data storage operations: create, read, update, and delete (CRUD). Transactional operations must be executed in sequence, with a transaction control that only confirms the transaction (a process called a commit) when the entire operation completes successfully. If it does not, the transaction is canceled and none of its operations are applied, a process called rollback.

An important idea to help understand the difference between relational and non-relational databases is ACID, present in most database technologies. These properties are as follows:

  • Atomicity: This property controls the transaction and determines whether it completed successfully and can be committed, or whether it must be canceled with a rollback. The database technology is responsible for ensuring atomicity.
  • Consistency: For a running transaction, it is important that the database moves from one valid state before receiving the data to another valid state after receiving it. For example, in a bank transfer, when funds are added to an account, those funds must have a source. Therefore, it is important to know the source and to confirm that the funds have already left it before confirming their addition to the destination account.
  • Isolation: This property ensures that concurrent transactions do not interfere with one another, leaving the database in the same state as if the transactions had been executed sequentially. In the bank transfer example, if multiple transfers are submitted simultaneously, each one must see a consistent source balance, as if the transfers were processed one by one.
  • Durability: This property guarantees that a committed transaction remains recorded in the database even if a failure occurs during the process, such as a power outage or latency at the time of writing the record.

ACID properties are not unique to transactional databases; they are also found in analytical databases. At this point, the most important thing is to understand that these settings exist and that you can adjust them according to the requirements of your data solution's use case.
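The commit and rollback behavior described above can be sketched with Python's sqlite3 module; this is a simplified, hypothetical bank-transfer example, not tied to any Azure service:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (ID TEXT PRIMARY KEY, BALANCE REAL)")
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)",
                 [("source", 100.0), ("target", 0.0)])
conn.commit()

def transfer(amount):
    """Both updates succeed together (commit) or neither applies (rollback)."""
    try:
        conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ID = 'source'", (amount,))
        conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = 'target'", (amount,))
        if amount > 100:                 # simulate a business-rule failure
            raise ValueError("insufficient funds")
        conn.commit()                    # atomic: the whole transfer becomes visible
    except Exception:
        conn.rollback()                  # atomic: no partial transfer remains

transfer(150)   # fails, so it is rolled back
transfer(40)    # succeeds, so it is committed
print(list(conn.execute("SELECT * FROM ACCOUNT")))
# [('source', 60.0), ('target', 40.0)]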

Since we are talking about databases, let’s understand an acronym that is widely used to represent database software: DBMS.

Database management systems

Database management systems (DBMSs), which are database software, implement ACID properties within their architecture, and in addition to performing these controls, they need to manage several complex situations. For example, if multiple users or systems try to access or modify database records at the same time, the DBMS needs to isolate transactions, perform all necessary validations quickly, and maintain the consistency of the stored data after each transaction is committed. For this, some DBMS technologies use temporary transaction locks so that actions are performed sequentially. The lock is held while an action executes on a record; for example, when a field in a table is edited, the lock is released as soon as the commit confirming that transaction is executed.

Some DBMSs are called distributed databases. These databases have their architecture distributed in different storage and processing locations, which can be on-premises in the company’s data center or a different data center in the cloud. Distributed database solutions are widely used to maintain consistency in databases that will serve applications in different geographic locations, but this consistency doesn’t need to be synchronous. For example, a mobile game can be played in the United States and Brazil, and the database of this game has some entities (categories, game modes, and so on) that must be shared among all players. But the transactions from the United States player do not necessarily need to appear to the player in Brazil in a real-time way; this transactional data will be synchronized from the United States to Brazil, but in an asynchronous process. Let’s understand this process next.

Eventual consistency

All transactions in distributed databases take longer to process than in non-distributed databases because the data must be replicated across all nodes of the distributed system. So, to maintain an adequate replication speed, distributed databases only synchronize the data that is needed. This is the concept of eventual consistency, which configures the database to perform replication between the distributed nodes asynchronously, after the transaction has been confirmed on the main node. This technique can lead to temporary inconsistencies between database nodes. Ideally, the application connected to a distributed database does not require a guarantee of data ordering, which means the eventually consistent data may also appear to users with some delay. Distributed databases are widely used by social media platforms for news feeds, likes, and shares, among other features.

Let’s use the following figure to understand the behavior of a database with eventual consistency:

Figure 1.10 – Diagram of an eventual consistency database

The preceding diagram shows the behavior we can observe when querying information in a database with eventual consistency. Instead of fetching the ball sequentially, the hero retrieved it by querying a future frame, generating a momentary duplication of the ball. In the end, only one ball was retrieved, after the synchronization was done.

This is an analogy for an eventually consistent database, where queries do not have to wait for entities to be synchronized across all replicas of the database, so this momentary duplication sometimes occurs until the asynchronous data update is complete.
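The following toy sketch mimics that behavior with plain Python dictionaries standing in for a primary node and a replica; it is only an analogy for asynchronous replication, not a real distributed database:

# Writes go to the primary node immediately; replicas catch up later
primary = {}
replica = {}
replication_queue = []   # changes waiting to be shipped asynchronously

def write(key, value):
    primary[key] = value              # confirmed on the main node first
    replication_queue.append((key, value))

def replicate():
    """Runs later/asynchronously; until then the replica may be stale."""
    while replication_queue:
        key, value = replication_queue.pop(0)
        replica[key] = value

write("likes:post42", 10)
print(replica.get("likes:post42"))   # None, a temporary inconsistency
replicate()
print(replica.get("likes:post42"))   # 10, eventually consistent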

In addition to transactional, relational, or non-relational databases, we also have another data workload, the analytical workload, which we will address in the next section.

An analytical workload

The second category of data solutions is analytical workloads. These analytical solutions are based on high-volume data processing platforms that are optimized for querying and exploration rather than for CRUD transactions with ACID properties. In analytical databases, we aggregate various data sources, such as more than one transactional database, as well as logs, files, images, videos, and anything else that can generate information for a business analyst.

This raw data is processed and aggregated, thus generating summaries, trends, and predictions that can support decision-making.

An analytical workload can be based on a specific time or a sequence of dated events. In these workloads, it’s common to evaluate only the data that is relevant to the analysis. For example, if you have a sales system with a transactional database (source) with several tables recording all sales, products, categories, and customers, among others, it is important to evaluate which of these tables can be used for the analytical database (destination) and then perform the data connections.

To create an analytical database, it is necessary to perform data ingestion, a process of copying data from sources to the analytical base. For this, a technique called extract, transform, and load (ETL), or the more recent extract, load, and transform (ELT), is used. The following figure demonstrates this process with an example of a transactional database as the data source and the analytical database as the destination:

Figure 1.11 – Data flow between a transactional database and an analytical database

In the preceding diagram, we can see that transactional databases are the storage layer of the information systems that automate business processes, while analytical databases support simple and advanced data analysis, for example, using statistical models and machine learning, a branch of artificial intelligence. Data ingestion is an important process for assembling an analytical database that meets the needs of the data solution. In the next section, we will look at what data ingestion is and its different types.

Understanding data ingestion

Data ingestion is the process of copying operational data from data sources to organize it in an analytical database. There are two different techniques for performing this copy: batching data and online data streaming.

It is important to identify the latency requirement between the moment the data is generated in the source database and the moment it must be available in the analytical database.

Understanding batch load

When batching the data, the operation is offline. You must define the periodicity for creating the data batch load, collecting data in the data source, and then inserting it into the analytical database.

The periodicity can be hourly, daily, or even monthly, as long as the analysis requirements for the data are met. A batch load can be triggered by events such as a new record in a table entity of the database, an action performed by a user in an application, or a manual trigger, among others.

An example of batch processing is the way votes are counted in elections. The votes are not counted one by one the moment after each voter has voted; instead, they are inserted in lots that are processed throughout election day until all the loads are complete and the results are defined.
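The following is a minimal sketch of one batch extract-transform-load run, with in-memory Python structures standing in for the transactional source and the analytical destination (the entity names and values are invented for illustration):

# Extract: collect the raw rows accumulated in the transactional source
source_sales = [
    {"order_id": 1, "product": "shoes", "amount": 60},
    {"order_id": 2, "product": "shoes", "amount": 80},
    {"order_id": 3, "product": "sandals", "amount": 40},
]

# Transform: aggregate the raw transactions into an analysis-friendly shape
sales_by_product = {}
for sale in source_sales:
    sales_by_product[sale["product"]] = (
        sales_by_product.get(sale["product"], 0) + sale["amount"]
    )

# Load: write the aggregated result into the analytical store
analytical_store = {"sales_by_product": sales_by_product}
print(analytical_store)
# {'sales_by_product': {'shoes': 140, 'sandals': 40}}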

Advantages of batch load

Batch loads are heavily used in data solutions, but they do not meet every requirement. The following are two advantages of this ingestion technique:

  • It is the method most used by companies that have multiple transactional systems with large volumes of data. Because loads are scheduled, they can run at the most convenient time, such as outside business hours when the transactional servers are under lower demand.
  • Each load can be monitored independently to identify where a script or method needs optimization, so if one specific load needs better performance, you can allocate your computing resources to prioritize it.

Constraints of batch load

To continue the evaluation of the technique, it is important to understand the constraints of adopting batch loads as well:

  • There is a delay between the time the data is generated in the transactional database and the time it becomes available in the analytical database, which sometimes makes it impossible to follow up and immediately make a decision based on the numbers
  • The full batch of data must be assembled before copying can begin, and if there is any data unavailability, inconsistent data, or network latency between the transactional and analytical databases, among other situations, the batch load will fail

Batch loads can be the default way of consuming data from legacy databases, file repositories, and other types of data sources. However, some business requirements call for consuming data in near real time, for monitoring and quick decision-making. To meet these needs, we have another technique, called data streaming, which loads data online.

Understanding data streaming

In streaming-based data ingestion, there is an online connection between the data source and the analytical database, and pieces of data are processed one by one, as events, right after they are generated at the source. For example, in a sales tracking solution, sales managers need to follow sales data in near real time on a dashboard for immediate decision-making. The sales transaction database is linked through a streaming load to the analytical database, which receives this data, processes it, and displays it on a monitoring dashboard.

Another example could be a stock exchange and its real-time stock tracking panels. These dashboards receive information processed from share purchase and sale transactions through a data stream. See the following figure for the data flow in this scenario:

Figure 1.12 – Stock market example diagram

A streaming load is not always strictly continuous; it can also run at short intervals, loading a portion of data each time. The difference is that data streaming keeps a continuous ingestion window open between the source and the destination, whereas in a batch load, each batch opens and closes the connection for its own process.
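A minimal sketch of the streaming pattern follows, with a Python generator standing in for the continuous event source; no specific streaming service or API is implied:

import time

def sales_events():
    """Stands in for a continuous event source emitting one event at a time."""
    for amount in (59.90, 79.90, 39.90):
        yield {"event": "sale", "amount": amount}
        time.sleep(0.1)  # events arrive over time, not as one closed batch

# The consumer stays connected and processes each event right after it
# is produced, keeping a near real-time running total for a dashboard
running_total = 0.0
for event in sales_events():
    running_total += event["amount"]
    print(f"dashboard update: total sales so far = {running_total:.2f}")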

Let us now evaluate the advantages and disadvantages of the data streaming technique.

Advantages of data streaming

The advantages are listed as follows:

  • The delay between data creation and analytical processing can be minimal
  • The latency between the source and the target is in the order of seconds or milliseconds
  • Analytical solutions can demonstrate both past data and performance trends, which assists in immediate decision-making while events are happening

Constraints of streaming data load

The disadvantages are listed as follows:

  • Most transactional database technologies do not have a native streaming data export capability, so you need to implement this technique yourself, manually tracking what has already been ingested and what has not. This adds significant complexity.
  • The size of each event is usually kept small to avoid requiring a very robust infrastructure to maintain the event queue during streaming. This makes it impractical to ingest large files such as videos, audio, and photos; these loads are usually better implemented as batch loads.

In summary, we typically use batch loads for most of the structuring operations of the analytical database, for ingesting the largest volumes of data, and for unstructured data.

To understand in practice how these concepts are applied, let’s now evaluate a case study of a complete data solution.

Case study

Webshoes is a fictitious shoe and accessories retailer that is being created. The company's business areas have defined that Webshoes will have an online store and that the store will need to offer personalized experiences. The requirements that the business areas have passed to the project development team are as follows:

  • Online store – The online store should have a simple catalog with the 12 different products of the brand
  • Smart banner – If the customer clicks on a product, similar products should appear in a Recommended banner, with products that have the same characteristics as the one selected, but only products that the customer has not purchased yet
  • Sales conversion messages – If the customer does not complete the sale and has logged into the portal, the online store should later contact the customer via email and a message on their cell phone, triggering a few messages designed to convert the sale

By analyzing these business requirements, we can do the following technical decomposition to select the appropriate data storage:

  • Online store – A repository to store the product catalog, a repository to register the sales through the shopping cart, and a repository to store customer login
  • Smart banner – Depending on the customer and product selected, a near real-time interaction of banner customization
  • Sales conversion messages – Will be processed after the customer leaves the online store (closing their login session) and depends on their actions while browsing the website and purchase history

Now, with the knowledge gained in this chapter, can you help me to select suitable storage types for each requirement?

Come on, let’s go! Here are the solutions:

  • Online store – Transactional workload. A SQL relational or NoSQL database can serve this scenario very well, as it will hold product, customer, login, and shopping cart entities, among others, already related in the database.
  • Smart banner – Analytical workload. For near real-time processing, data streaming is required, capturing the customer's behavior and crossing it with other historical data. In this case, an analytical database can process the information and return the appropriate message to the application/banner for customization.
  • Sales conversion messages – Analytical workload. In this case, the customer will have left the store, so we do not need data streaming but rather a batch load. It is important to agree with the business area on the optimal time to wait before sending messages to target customers; the analytical database will then process the information and generate the list of messages to be sent.

Therefore, each use case can call for a different data workload type, which influences our database decision. In the next chapters, we will detail the Azure solutions for SQL transactional databases, NoSQL databases, and analytical databases, and understanding the different use cases will then be much simpler.

Summary

In this chapter, we reviewed the core concepts of data storage and processing, the different data types, and data solutions. We covered relational, non-relational, transactional, and analytical data, their particularities, and their application cases.

Now you know how to differentiate the transactional database of an application from an analytical database. In the following chapters, we will go into the details of each of these workloads and the Azure services implemented for them. But before we detail these structures, in the next chapter, we will look at the different roles and responsibilities in a data domain.

Sample questions and answers

Let’s evaluate some sample questions related to the content of this chapter:

  1. What type of workload is an OLAP model?
    A. Analytical workload
    B. Transactional workload
    C. Relational database
  2. How is data in a relational table organized?
    A. Rows and columns
    B. Header and footer
    C. Pages and paragraphs
    D. Connections and arrows
  3. Which of the following is an example of unstructured data?
    A. Audio and video files
    B. An Employee table with EmployeeID, EmployeeName, and EmployeeDesignation columns
    C. A table within a relational database
    D. A stored procedure in a database
  4. What type of cloud service is a database deployed in a virtual machine?
    A. PaaS
    B. IaaS
    C. SaaS
    D. DaaS

Answer key

1-A 2-A 3-A 4-B


Key benefits

  • Get the knowledge you need to pass the DP-900 exam on your first attempt
  • Gain fundamental knowledge of the core concepts of working with data in Azure cloud data services
  • Learn through a practical approach and test yourself with mock exams at the end of the book

Description

Passing the DP-900 Microsoft Azure Data Fundamentals exam opens the door to a myriad of opportunities for working with data services in the cloud. But it is not an easy exam and you'll need a guide to set you up for success and prepare you for a career in Microsoft Azure. Absolutely everything you need to pass the DP-900 exam is covered in this concise handbook. After an introductory chapter covering the core terms and concepts, you'll go through the various roles related to working with data in the cloud and learn the similarities and differences between relational and non-relational databases. This foundational knowledge is crucial, as you'll learn how to provision and deploy Azure's relational and non-relational services in detail later in the book. You'll also gain an understanding of how to glean insights with data analytics at both small and large scales, and how to visualize your insights with Power BI. Once you reach the end of the book, you'll be able to test your knowledge with practice tests with detailed explanations of the correct answers. By the end of this book, you will be armed with the knowledge and confidence to not only pass the DP-900 exam but also have a solid foundation from which to embark on a career in Azure data services.

Who is this book for?

This book is for data engineers, database administrators, and aspiring data professionals getting ready to take the DP-900 exam. It will also be helpful for those looking for a bit of guidance on how to be better equipped for Azure-related job roles such as Azure database administrator or Azure data engineer. A basic understanding of core data concepts and relational and non-relational data will help you make the most of this book, but it is not a prerequisite.

What you will learn

  • Explore the concepts of IaaS and PaaS database services on Azure
  • Query, insert, update, and delete relational data using SQL
  • Explore the concepts of data warehouses in Azure
  • Perform data analytics with an Azure Synapse Analytics workspace
  • Upload and retrieve data in Azure Cosmos DB and Azure HDInsight
  • Provision and deploy non-relational data services in Azure
  • Contextualize the knowledge with real-life use cases
  • Test your progress with a mock exam

Product Details

Publication date: Nov 25, 2022
Length: 300 pages
Edition: 1st
Language: English
ISBN-13: 9781803240633
Vendor: Microsoft


Table of Contents

Part 1: Core Data Concepts
Chapter 1: Understanding the Core Data Terminologies
Chapter 2: Exploring the Roles and Responsibilities in Data Domain
Chapter 3: Working with Relational Data
Chapter 4: Working with Non-Relational Data
Chapter 5: Exploring Data Analytics Concepts
Part 2: Relational Data in Azure
Chapter 6: Integrating Relational Data on Azure
Chapter 7: Provisioning and Configuring Relational Database Services in Azure
Chapter 8: Querying Relational Data in Azure
Part 3: Non-Relational Data in Azure
Chapter 9: Exploring Non-Relational Data Offerings in Azure
Chapter 10: Provisioning and Configuring Non-Relational Data Services in Azure
Part 4: Analytics Workload on Azure
Chapter 11: Components of a Modern Data Warehouse
Chapter 12: Provisioning and Configuring Large-Scale Data Analytics in Azure
Chapter 13: Working with Power BI
Chapter 14: DP-900 Mock Exam
Index
Other Books You May Enjoy

