Data Engineering with Alteryx: Helping data engineers apply DataOps practices with Alteryx

Chapter 1: Getting Started with Alteryx

In the current century, one of the core functions of all companies is to retrieve data from its source and get it into the hands of the company's analysts, decision makers, and data scientists. This data flow allows businesses to make decisions, supported by empirical evidence, quickly and with confidence. Having a robust process for delivering these data flows also gives businesses a significant advantage over their competition.

Creating robust data flows requires that end users find the datasets and trust the raw data source. End users need to know what transformations were applied to the dataset to build trust. They also need to know who to talk to if their needs change. Alteryx gives data engineers and end users a single unified place to create data pipelines and discover data resources. It also provides the context that gives end users confidence when making decisions based on any of those datasets.

This book will describe how to build and deploy data engineering pipelines with Alteryx. We will learn how to apply DataOps methods to build high-quality datasets. We will also learn the techniques required for monitoring the pipelines when they are running in an automated production environment.

This chapter will introduce the Alteryx platform as a whole and the major software components within the platform. Then, we will see how those components fit together to create a data pipeline, and how Alteryx can improve your development speed and build confidence throughout your data team.

Once we understand the Alteryx platform, we will look into Alteryx Designer and familiarize ourselves with the interface. Next, we will set a baseline for building an Alteryx workflow and use Alteryx to create standalone data pipelines.

Next, we will investigate the server-based components of the Alteryx platform: Alteryx Server and Alteryx Connect. We will learn how Alteryx Server can automate the pipeline execution, scale the efforts and work of your data engineering team, and serve as a central location where workflows are stored and shared. We will also learn how Alteryx Connect is used to find data sources throughout an enterprise, build user confidence with data cataloging, and build trust in the data sources by maintaining the lineage.

Finally, we will see how this book can help your data engineering work and link each part of the data engineering pipeline with the Alteryx platform applications.

In this chapter, we will cover the following topics:

  • Understanding the Alteryx platform
  • Using Alteryx Designer
  • Leveraging Alteryx Server and Alteryx Connect
  • Using this book in your data engineering work

Understanding the Alteryx platform

The Alteryx platform is the Alteryx software suite that combines data processing, dataset management, and analysis. While a lot of focus in the Alteryx community tends to be on the business user analyst, the benefits for a data engineer are extensive. Alteryx as a whole allows for both code-free and code-friendly workflow development, giving it the flexibility to quickly transform a dataset while having the depth to make complex transformations using whatever tool or process makes the most sense.

In this section, we will learn about the following:

  • What software is offered in the Alteryx platform
  • How Alteryx can be used with an example business case

The software that makes the Alteryx platform

The Alteryx platform is a collection of four software products:

  • Alteryx Designer: Designer is the desktop workflow creation tool. It is a Graphical User Interface (GUI) for building workflows that interact with the Alteryx Engine, which executes the workflow when run. Designer also enables automated and guided Machine Learning (ML) with the Intelligence Suite add-on. This is in addition to building your own ML data pipelines, and we will discuss both methods in Chapter 8, Beginning Advanced Analytics.
  • Alteryx Server: Once a workflow is created, we publish it to Server so that it can run on demand or on a time-based schedule. Server also holds a simple version history for referencing which version of a workflow ran a particular transformation. Finally, Server provides for the sharing of workflows between different users throughout a company.
  • Alteryx Connect: The Connect catalog allows users to find and trace datasets and lineage. The catalog is populated by running the Connect Apps, a series of Alteryx workflows that take user-supplied parameters identifying the different locations where the datasets reside. These apps extract all the data catalog information and upload it to the Connect database for exploration in the web browser. When the source data doesn't contain context information such as field descriptions, you can add it manually to enrich the catalog.
  • Alteryx Promote: Promote is a data science model management tool. It provides a way to manage a model's life cycle, monitor performance and model drift, orchestrate model iterations' movements between environments, and provide an API endpoint to deploy the models to other applications.

    Important Note

    Alteryx software products have Alteryx as part of the name. Generally, the name Alteryx is dropped from the name in discussions and that will often happen throughout this book.

    Because the data science deployment falls into Machine Learning Operations (MLOps), it isn't a core component of the Data Operations (DataOps) process. Thus, while you might have some interactions with the model deployment as a data engineer, we will be focusing on extracting and processing the raw datasets rather than the model management and implementation that Promote supports. As such, the Promote software will be beyond the scope of this book.

Now that we know what the Alteryx platform is and what software is available, we can look at how Alteryx will fit into a business case.

Using the Alteryx platform in a business scenario

The Alteryx platform is all about creating a process where iteration is easy. All too often, when integrating a new data source, you won't always know the answer to the following questions until late in the process:

  • What is the final form of that data?
  • What transformations need to take place?
  • Are there additional resources that are required to enrich the data source?

When you try to answer these questions with a pipeline focused on writing code, common areas of frustration appear as you iterate through ideas and tests. These frustrations include the following:

  • Knowing when to refactor a part of the pipeline
  • Identifying exactly when a particular transformation happens in the pipeline
  • Debugging the process for logical errors where the error is in the data output but not caused by a coding error

The visual nature of Alteryx lets you quickly think through the pipeline, and see what transformation is happening where. When errors appear in the process, the tool will highlight the error in context.

It is also easy to trace specific records back through the process visually. This tracing makes it straightforward to identify the transformation at which a logical error is introduced.

How Alteryx benefits data engineers

The Alteryx platform's key benefits to a data engineer arise in three major cases:

  • Speed of development
  • Iterative workflow development
  • Self-documentation (which you can supplement with additional information)

These benefits fall under an overarching theme of making it easier to get new datasets to the end user. If development time, debugging, and documentation can all be made simpler, responding to requests from analysts and data scientists becomes something to take pride in rather than something to dread.

Speed of development

The Alteryx platform supports the speed of development with two fundamental features:

  • The visual development process
  • The performance of the Alteryx Engine

The visual development process helps a data engineer by allowing them to lay out the pipeline on the Alteryx canvas. You can create the pipeline from scratch, which is often the case if little information about the end destination is available. Alternatively, you can build the pipeline from a data flow chart in which the principal steps are already planned.

This translation process uses the transformation tools that provide the building blocks for a workflow. By aligning those tools with a logical grid across (or down) the Designer canvas, you can see each step in the pipeline. Such an arrangement allows you to focus on each step to identify when the data might diverge for a particular process and add any intermediate checks.

The other benefit is speed: the Alteryx Engine performs the operations quickly. One of the reasons for this performance is that transformations take place in memory and with the minimum memory footprint required for any particular change.

For example, when a column with millions of records has a formula applied, only the cells (the row and column combination) that are processed are needed in memory. The result is that the transformations that Alteryx does are fast.

The location of the dataset is often the only limit to Alteryx's in-memory performance. For example, opening a large Snowflake or Microsoft SQL Server table in Alteryx can become bottlenecked by network transfers. In these cases, the InDB tools can perform calculations on the remote database to minimize the problem and reduce the volume of data transferred locally.
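The idea of pushing a calculation to the database, rather than pulling the rows across the network first, can be illustrated outside Alteryx. The following is a minimal Python sketch, not how the InDB tools are implemented: an in-memory SQLite database stands in for a remote database, and the `sales` table, its columns, and both function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a remote database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Denver", 10.0), ("Denver", 30.0), ("Boulder", 5.0), ("Boulder", 15.0)],
)

def totals_local(conn):
    """Pull every row to the client, then aggregate locally."""
    rows = conn.execute("SELECT city, amount FROM sales").fetchall()
    totals = {}
    for city, amount in rows:
        totals[city] = totals.get(city, 0.0) + amount
    return totals, len(rows)

def totals_in_db(conn):
    """Let the database aggregate, so only one row per city is transferred."""
    rows = conn.execute(
        "SELECT city, SUM(amount) FROM sales GROUP BY city"
    ).fetchall()
    return dict(rows), len(rows)

local_totals, local_rows = totals_local(conn)
db_totals, db_rows = totals_in_db(conn)
```

Both approaches produce the same totals, but the second transfers one row per city instead of every record, which is the effect that matters when the table is large and the network is the bottleneck.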

Iterative development workflow

The next significant benefit is the inherent iterative workflow that Alteryx development uses. When building a data pipeline, the sequencing of the transformations is vital to the dataset result.

This iterative process allows you to do the following:

  • Check what the data looks like using browse tools and browse anywhere samples.
  • Make modifications and establish the impact that those modifications create.
  • Backtrack along the pipeline and insert new changes.

The iterative process allows you to test changes quickly without worrying about how long the pipeline will take to compile or whether there is an unnoticed typo in the SQL script.
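For readers who think in code, the check-modify-backtrack loop can be approximated by keeping a copy of the data after every step, loosely similar to placing a Browse tool after each transformation. This is only an analogy sketched in Python; the `run_with_snapshots` helper, the step names, and the sample records are all hypothetical.

```python
def run_with_snapshots(rows, steps):
    """Run each named step in order, keeping a copy of the data after each one."""
    snapshots = [("input", list(rows))]
    for name, step in steps:
        rows = step(rows)
        snapshots.append((name, list(rows)))
    return snapshots

# Two illustrative transformations: drop incomplete records, then add a ratio.
steps = [
    ("drop_missing", lambda rows: [r for r in rows if r["visits"] is not None]),
    ("add_ratio", lambda rows: [{**r, "ratio": r["spend"] / r["visits"]} for r in rows]),
]

data = [{"spend": 10.0, "visits": 2}, {"spend": 5.0, "visits": None}]
snapshots = run_with_snapshots(data, steps)

# Inspect the data after any step to see where a record changed or disappeared.
for name, rows in snapshots:
    print(name, len(rows))
```

Inserting a new step in the middle of `steps` and rerunning gives the same kind of quick feedback on the impact of a change.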

Self-documenting with additional supplementing of specific notes

Each tool in Alteryx will automatically document itself with annotations. For example, a formula tool will list the calculations taking place.

This self-documenting provides a good starting point for the documentation of the overall workflow. You can supplement these annotations with additional context. This further context can include renaming specific tools to reflect what they are doing (the names also appear in the workflow logs), adding comment sections to the canvas, or grouping processes with tool containers.

We now understand why the Alteryx platform is a powerful tool for data engineering and some of its key benefits. Next, we need to gain a deeper insight into the benefits that using Alteryx Designer can bring to your data engineering development.

Using Alteryx Designer

We have covered at a high level what the benefits of the Alteryx platform are. This section will look a bit closer at Alteryx Designer and why it is suitable for data engineering.

As mentioned previously, Designer is the desktop workflow creation tool in the Alteryx platform. You create the data pipelines and perform advanced analytics in Designer. Designer can also create preformatted reports, upload datasets to API endpoints, or load data into your database of choice.

Here, we will answer some of the questions that revolve around Designer:

  • Why is Alteryx Designer suitable for data engineering?
  • How do you start building a workflow in Designer?
  • How can you leverage the InDB tools for large databases?
  • What are some workflow best practices?

Answering the preceding questions will give you a basic understanding of why Designer is a good tool for building your data pipelines and the basis for the DataOps principles we will talk about later.

Why is Alteryx Designer suitable for data engineering?

Alteryx Designer utilizes a drag-and-drop interface for building a workflow. Each tool represents a specific transformation or process. This action and visibility of the process allow for a high development speed and emphasize an iterative workflow to get the best results. Throughout the workflow, you can check the impact of the tool's changes on the records and compare them to the tool's input records.

Building a workflow in Designer

If you open a new Designer workflow, you will see the following main interface components:

  1. Tool Palette
  2. Configuration Page
  3. Workflow Canvas
  4. Results Window

These components are shown in the following screenshot:

Figure 1.1 – Alteryx Designer interface

Each of these sections provides a different set of information to you while building a workflow.

The Canvas gives a visual representation of the progress of a workflow, the configuration page allows for quick reference and the changing of any settings, and the results window provides a preview of the changes made to the dataset.

This easy viewing of the entire pipeline in the canvas, the data changes at each transformation, and the speedy confirmation of settings in the workflow allow for rapid iteration and testing. As a data engineer, getting a dataset to the stakeholder accurately and quickly is the central goal of your efforts. These Designer features are focused on making that possible.

The default orientation for a workflow is left to right, but you can also customize this to work from top to bottom. Throughout this book, I will describe everything in the left-to-right orientation, but be aware that you can change it.

Accessing Online Help

When working in the Designer interface, you can access the online help by pushing the F1 button on your keyboard. Additionally, if you have a particular tool selected when you push the F1 button, you will navigate to the help menu for that specific tool.

Let's build a simple workflow using the tools in the Favorites tool bin. We will complete the following steps and create the completed workflow shown in Figure 1.2:

  1. Connect to a dataset.
  2. Perform a calculation.
  3. Summarize the results.
  4. Write the results to an Alteryx yxdb file.

Figure 1.2 – Introduction workflow

You can look at the example workflow in the book's GitHub repository here: https://github.com/PacktPublishing/Data-Engineering-with-Alteryx/tree/main/Chapter%2001.

Using an Input Data tool, we can connect to the Cust_wTransactions.xls dataset. This dataset is one of the Alteryx sample datasets, and you can find it in the Alteryx program folder, located at C:\Program Files\Alteryx\Samples\data\SampleData\Cust_wTransactions.xls.

In step 2 of the process, we create a new field with a Formula tool. When creating a formula, you always go through the following steps:

  1. Create a new Output Column (or select an existing column).
  2. Set the data type: Set the data type for a new column (you cannot change an existing column's data type).
  3. Write the formula: Alteryx has field and formula autocompletion, which helps speed up your development.

The workflow of the preceding steps can be seen in the following screenshot:

Figure 1.3 – Steps for creating a formula

The third step in the process is to summarize the results to find the average spend per visit in each city as follows:

  • Choose any grouping fields: Select any fields that we are grouping by, such as City, and then add the action of Group By for that field.
  • Choose any aggregation fields: Select the field that we want to aggregate, Spend Per Visit, and apply the aggregation we want (Numeric action menu | Average option).

The configuration for the summary described is shown in the following screenshot:

Figure 1.4 – Summarize configuration

The final step in our workflow is to view the results of the processing. We can use the Browse tool to view all the records in a dataset and see the full results.
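To make the logic of the Formula and Summarize steps concrete, here is a rough Python equivalent. It is a sketch under stated assumptions rather than a description of what Alteryx does internally: the inline records stand in for the sample dataset, the `Spend` and `Visits` column names are assumed, and the formula `Spend / Visits` is a guess at how a Spend Per Visit field could be calculated, since the text does not give the expression.

```python
from collections import defaultdict

# Hypothetical rows standing in for the sample dataset; the column names
# "Spend" and "Visits" are assumptions for illustration.
records = [
    {"City": "Denver", "Spend": 100.0, "Visits": 4},
    {"City": "Denver", "Spend": 90.0, "Visits": 2},
    {"City": "Boulder", "Spend": 60.0, "Visits": 3},
]

def add_spend_per_visit(rows):
    """Formula step: add a new numeric column to every record."""
    return [{**row, "Spend Per Visit": row["Spend"] / row["Visits"]} for row in rows]

def average_by_city(rows):
    """Summarize step: group by City and average Spend Per Visit."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["City"]].append(row["Spend Per Visit"])
    return {city: sum(values) / len(values) for city, values in groups.items()}

summary = average_by_city(add_spend_per_visit(records))
print(summary)
```

The two functions map onto the two tools: one adds a column record by record, and the other collapses the records to one row per grouping value.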

The process we have looked at works well on smaller datasets or data in local files. It is less effective when working with large data sources or when the data is already in a database. In those situations, the InDB tools are a better choice. We will get an understanding of how to use those tools in the next section.

What can the InDB tools do?

The InDB tools are a great way to process datasets without copying the data across the network to your local machine. In the following screenshot, we have an example workflow that uses a sample Snowflake database to process 4.1 GB of data in less than 2 minutes:

Figure 1.5 – Example workflow using InDB tools

You can look at the example workflow in the book's GitHub repository here: https://github.com/PacktPublishing/Data-Engineering-with-Alteryx/tree/main/Chapter%2001.

This workflow entails three steps:

  1. Generate an initial query for the target data.
  2. Produce a subquery off that data to generate the filtering logic.
  3. Apply the filtering logic to the primary query.

When looking at the visual layout, we see the generation of the query, where the logic branches off, and how we merge the logic back onto the dataset. The automated annotations all provide information about what is happening at each step. At the same time, the tool containers group the individual logic steps together.

We will look at how to use the InDB tools in more detail in later chapters, but this workflow shows how complicated queries are run on large datasets while still providing good performance in your workflow.
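The three-step pattern described above (initial query, subquery for the filtering logic, filter applied back to the primary query) can be sketched in plain SQL composed from Python. This is an illustrative assumption of what such a composition might look like, not the SQL that the example workflow generates: SQLite stands in for Snowflake, and the `orders` table, its columns, and the spend threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (1, 70.0), (2, 10.0), (3, 200.0), (3, 5.0)],
)

# Step 1: the initial query for the target data.
primary = "SELECT customer_id, amount FROM orders"

# Step 2: a subquery off that data that produces the filtering logic
# (here: customers whose total spend exceeds 100).
filter_logic = (
    f"SELECT customer_id FROM ({primary}) "
    "GROUP BY customer_id HAVING SUM(amount) > 100"
)

# Step 3: apply the filtering logic back onto the primary query.
final_query = (
    f"SELECT p.customer_id, p.amount FROM ({primary}) AS p "
    f"JOIN ({filter_logic}) AS f ON p.customer_id = f.customer_id "
    "ORDER BY p.customer_id, p.amount"
)

# Only the final, filtered result leaves the database.
result = conn.execute(final_query).fetchall()
```

Nothing is fetched until the final query runs, so the branching and merging all happen inside the database.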

Building better documentation into your workflow improves the usability of the workflow. Adding this documentation is therefore a best practice when developing a workflow. We will explore how we can apply the documentation in the next section.

Best practices for Designer workflows

Applying Designer best practices makes your data engineering more usable for you and other team members. Having the documentation and best practices implemented throughout a workflow embeds the knowledge of what the workflow components are doing in context. It means that additional team members, or you in the future, will be able to open a workflow and understand what each small section is trying to achieve.

The best practices fall into three areas:

  1. Supplementing the automatic annotations: The automatic annotations that Alteryx creates for individual tools provide basic information about what has happened in a tool. The annotations do not offer an explanation or justification of the logic. Additionally, the default naming of each tool doesn't provide any context for the log outputs. We can add more information in both of these areas. We can update the tool name to describe what is happening in that tool and expand the annotation to include more detail.
  2. Using tool containers to group logic: Adding tool containers to a workflow is a simple way of visually grouping processes on the canvas. You can also use specific colors for the containers to highlight different functions. For example, you can color input functions green and logic calculations in orange. These particular color examples don't matter as long as the colors are consistent across workflows and your organization.
  3. Adding comment and explorer box tools for external context: Often, you will need to add more context to a workflow, and this context won't fit in an annotation or color grouping. You can supplement the automatic documentation with Comment tools for text-based notes or an explorer box to reference external sources. Those external sources could be web pages, local HTML files, or folder directories. For example, you can include web documentation or a README file in the workflow, thereby providing deeper context.

These three areas all focus on making a workflow decipherable at a glance and quickly understandable. They give new data engineers the information they need to understand the workflow when adopting or reviewing a project.

With a completed workflow, the next step will be making the workflow run automatically. We also need to make the datasets that the workflow creates searchable and the lineage traceable. We will use Alteryx Server and Alteryx Connect to achieve this, which we will look at next.

Leveraging Alteryx Server and Alteryx Connect

Once you have successfully created a data pipeline, the next step is to automate its use. In this section, we will use Alteryx to automate a pipeline and create discoverability and trust in the data.

The two products we will focus on are Alteryx Server and Alteryx Connect. Server is the workflow automation, scaling, and sharing platform, while Connect is for data cataloging, trust, and discoverability.

Server has three main capabilities that are of benefit to a data engineer:

  • Time-based automation of workflows: Relying on a single person to run a workflow that is key to any system is a recipe for failure. So, having a schedule-based system for running those workflows makes it more robust and reliable.
  • Scaling of capacity for running workflows: Running multiple workflows on Designer Desktop is not a good experience for most people. Having Server run more workflows will also free up local resources for other jobs.
  • Sharing workflows via a central location: The Server is the central location where workflows are published to and discovered by users around the organization.

Connect is a service for data cataloging and discovery. Data assets can be labeled by what the data represents, the field contents, or the source. This catalog enables the discovery of new resources. Additionally, the Data Nexus allows a data field's lineage to be traced and builds trust with users to know where a field originated from and what transformations have taken place.
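Conceptually, tracing a field's lineage is a walk upstream through a graph of field dependencies. The sketch below illustrates that idea in Python only; it says nothing about how Connect or the Data Nexus stores or queries lineage, and the `lineage` map, the field names, and `trace_origins` are hypothetical.

```python
# Hypothetical lineage map: each field lists the upstream fields it derives from.
lineage = {
    "report.avg_spend": ["warehouse.spend_per_visit"],
    "warehouse.spend_per_visit": ["raw.spend", "raw.visits"],
    "raw.spend": [],
    "raw.visits": [],
}

def trace_origins(field, lineage):
    """Walk upstream and return the set of source fields with no parents."""
    origins = set()
    seen = set()
    stack = [field]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting shared or cyclic dependencies
        seen.add(current)
        parents = lineage.get(current, [])
        if not parents:
            origins.add(current)
        stack.extend(parents)
    return origins

origins = trace_origins("report.avg_spend", lineage)
```

Here the report field traces back to the two raw fields it was ultimately calculated from, which is the kind of answer a user needs in order to trust a dataset.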

How can you use Alteryx Server to orchestrate a data pipeline?

Once we have created a pipeline, we may want to have the dataset extracted on a regular schedule. Automating this process allows for a more robust implementation and makes the dataset simpler to use.

Orchestrating a data pipeline with Alteryx Server is a three-step process:

  1. Create a pipeline in Alteryx Designer and publish it to Alteryx Server.
  2. Set a time frame to run the workflow.
  3. Monitor the running of the workflow.

This three-step process is deceptively simple and, for this introduction, only covers the most straightforward use cases. Later, in Chapter 10, Monitoring DataOps and Managing Changes, we will walk through some techniques to orchestrate more complex, multistep data pipelines. Still, those examples fundamentally come back to these three steps mentioned above.

In the following screenshot, we can see how we can define the time frame for our schedule on the Server Schedule page:

Figure 1.6 – The Alteryx Server scheduling page

On this page, we can define the frequency of a schedule, the time the schedule will occur, and provide a reference name for the schedule.
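The three pieces of information on the scheduling page (a frequency, a time, and a reference name) are enough to work out when a workflow will next run. The following Python sketch shows that calculation under simple assumptions; it is not the Alteryx Server scheduler or its API, and the `Schedule` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Schedule:
    name: str            # reference name for the schedule
    start: datetime      # date and time of the first run
    interval: timedelta  # frequency, for example one day

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled run strictly after `now`."""
        if now < self.start:
            return self.start
        # Dividing two timedeltas with // gives the number of whole intervals elapsed.
        completed = (now - self.start) // self.interval
        return self.start + (completed + 1) * self.interval

daily = Schedule(
    name="Daily reference load",
    start=datetime(2022, 6, 30, 6, 0),
    interval=timedelta(days=1),
)
upcoming = daily.next_run(datetime(2022, 7, 2, 9, 30))
```

Anchoring the schedule to a start time and an interval keeps the calculation independent of when the scheduler itself last ran.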

How does Connect help with discoverability?

The final piece of your data engineering puzzle is how users will find and trust the dataset you have created. While you will often generate datasets on request, you will also find that users come to you looking for datasets you have already made because they don't know those datasets exist.

Connect is a data cataloging and discoverability tool for you to surface the datasets in your organization and allow users to find them, request access, and understand what the fields are. It is a central place for data definitions and allows searching in terms of how content is defined.

Using this book in your data engineering work

Now that you know the basics of using Alteryx, we can investigate how Alteryx applies to data engineering. Data engineering is a broad topic and has many different definitions, depending on who is using it. So, for the context of this book, here is how I define data engineering:

Data engineering is the process of taking data from any number of disparate sources and transforming them into a usable format for an end user.

It sounds simple enough, but this definition encapsulates many variables and complexity:

  • Where is the data, and how many sources are there?
  • What transformations are needed?
  • What is a usable state?
  • How should the data be accessed?
  • Who is the end user?

Chapter 2, Data Engineering with Alteryx, will expand on what this definition means. It will also explain how Alteryx products cover all the steps needed to deliver that definition.

How does the Alteryx platform come together for data engineering?

So far in this introduction, we have talked about how the parts of Alteryx can help the data engineering process independently. However, each Alteryx element also works together to build a complete, end-to-end data engineering process.

There is a common set of processes that are required when completing a data engineering project. These processes are shown in the next diagram along with what Alteryx software is usually associated with that process:

Figure 1.7 – The aspects of the data engineering process

The preceding diagram shows Designer overlapping the data sources and transformation aspects of the process, Server overlaying the automation (which performs some of the transformations), and Connect covering the discovery section of the process.

Chapter 2, Data Engineering with Alteryx, will introduce a complete data engineering example and the DataOps principles that support data engineering in Alteryx. Finally, Chapter 3, DataOps and Its Benefits, will take the principles introduced and expand on why those principles will benefit data engineering and your organization.

Examples where Alteryx is used for data engineering

I want to share two example use cases from my consulting work where Alteryx provides an excellent platform for data engineering.

In the first example, my client uses Alteryx Designer to create a series of workflows to collect reference information from a third party. They automate this process on Server to extract the information from the source text files and load them into their data warehouse daily. These resources are then shared with people throughout the company and made discoverable.

The other use case is where a medium-sized business uses Alteryx to collect the core company information from scattered business APIs: finance and billing, social media and web analytics, CRM, and customer engagement. Next, the company automatically consolidates the business resources into the core reporting database. The company then discovers the centralized data sources in Connect while Alteryx populates an additional data catalog for the Business Intelligence tool.

Summary

In this chapter, we have learned the parts that make up the Alteryx platform. We have also learned how they can benefit you as a data engineer with faster development, an iterative workflow, and extendable self-documentation.

We examined an example of how to build a workflow with Designer and learned what the InDB tools can do. Finally, we introduced Server and Connect. We learned how Server can automate and scale your data engineering developments. Then we learned that Connect provides a place for user discovery of the datasets you have created.

In the next chapter, we will expand on what a data engineer is for Alteryx and how you can use Alteryx products for data engineering. Then we will introduce DataOps and why this is a guiding principle for data engineering in Alteryx.


Key benefits

  • Learn DataOps principles to build data pipelines with Alteryx
  • Build robust data pipelines with Alteryx Designer
  • Use Alteryx Server and Alteryx Connect to share and deploy your data pipelines

Description

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects which increase development speed while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this Alteryx book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.

Who is this book for?

If you’re a data engineer, data scientist, or data analyst who wants to set up a reliable process for developing data pipelines using Alteryx, this book is for you. You’ll also find this book useful if you are trying to make the development and deployment of datasets more robust by following the DataOps principles. Familiarity with Alteryx products will be helpful but is not necessary.

What you will learn

  • Build a working pipeline to integrate an external data source
  • Develop monitoring processes for the pipeline example
  • Understand and apply DataOps principles to an Alteryx data pipeline
  • Gain skills for data engineering with the Alteryx software stack
  • Work with spatial analytics and machine learning techniques in an Alteryx workflow
  • Explore Alteryx workflow deployment strategies using metadata validation and continuous integration
  • Organize content on Alteryx Server and secure user access

Product Details

Publication date: Jun 30, 2022
Length: 366 pages
Edition: 1st
Language: English
ISBN-13: 9781803236483
Vendor: Alteryx

Packt Subscriptions

See our plans and pricing
€18.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early Access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early Access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early Access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together

  • Data Engineering with Alteryx: €35.99
  • Mastering Microsoft Power BI – Second Edition: €37.99
  • Azure Data Engineering Cookbook: €39.99

Total: €113.97

Table of Contents

17 Chapters
Part 1: Introduction
Chapter 1: Getting Started with Alteryx
Chapter 2: Data Engineering with Alteryx
Chapter 3: DataOps and Its Benefits
Part 2: Functional Steps in DataOps
Chapter 4: Sourcing the Data
Chapter 5: Data Processing and Transformations
Chapter 6: Destination Management
Chapter 7: Extracting Value
Chapter 8: Beginning Advanced Analytics
Part 3: Governance of DataOps
Chapter 9: Testing Workflows and Outputs
Chapter 10: Monitoring DataOps and Managing Changes
Chapter 11: Securing and Managing Access
Chapter 12: Making Data Easy to Use and Discoverable with Alteryx
Chapter 13: Conclusion
Other Books You May Enjoy

Customer reviews

Rating distribution
4.8 (11 Ratings)
5 star 81.8%
4 star 18.2%
3 star 0%
2 star 0%
1 star 0%

Top Reviews

Alisha Dhillon, Jul 02, 2022
5 out of 5 stars
This book has been beautifully written and explains some very complex aspects of Alteryx. It does exactly what it says in the title, giving a good breakdown of different areas supported by examples throughout. As somebody currently using Alteryx in a data engineering capacity, all of this has been incredibly useful. I have had to previously google many of my questions whilst Paul's write-up is clear and to the point. There are screenshots with annotations throughout which makes it really easy to follow along. I have learned a lot of new information reading this and can recognise how much value this can bring users to then go and create or strengthen existing processes. You will be surprised by how much you can gain from this! This WILL transform the way that you operate, into a more efficient and effective manner. If you are navigating your way through a strict IT environment, you will find your answers here with information on areas such as using Git to carry out version control or even how to use the Alteryx server monitoring workflow to then creating an insight dashboard for workflow monitoring. There are too many best practices to list. I highly recommend investing in this book, both for expanding your own knowledge and to address your business needs. You will not regret it - I certainly don't.
Amazon Verified review
mark frisch, Jun 30, 2022
5 out of 5 stars
I've known Paul for years and was eager to read this book. Paul conveys the context of what's coming and guides the reader through application of the concept in a clear and concise manner. Better yet, if you're a small business, mid-size or enterprise user of Alteryx you can adapt the concepts to your needs. Working in an Agile framework, he's quick to make the content useable and valuable.
Amazon Verified review
Patrick H, Aug 19, 2022
5 out of 5 stars
This book provides a little bit of everything for all skill levels of Alteryx users. It will guide you through creating your first workflow if you are brand new to the tool to advanced concepts like administrating Alteryx Server, which I found particularly useful, and DataOps concepts. I highly recommend this book for anyone who wants to level up their Alteryx skills!
Amazon Verified review
sandee matecko, Jul 30, 2022
5 out of 5 stars
Best practices, tips and tricks, and more - this book is great for any level, from beginner to expert!
Amazon Verified review
JF, Jul 03, 2022
5 out of 5 stars
I have been reviewing the new book "Data Engineering with Alteryx" by Paul Houghton and have thoroughly enjoyed it! This book will help data professionals early in their careers to understand the DataOps principles and how they can use them in Alteryx to build data pipelines. It will also guide Alteryx workflow deployment strategies. Experienced data engineers can leverage this book to get insights into advanced analytical techniques like spatial analytics and machine learning! There is also valuable content about managing Alteryx servers and user access. This book is a vital tool in the data engineer's toolbox!
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there you will see the ‘cancel subscription’ button in the grey box that contains your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid-for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.