Apache Superset Quick Start Guide
Apache Superset Quick Start Guide: Develop interactive visualizations by creating user-friendly dashboards

Configuring Superset and Using SQL Lab

Superset has a flexible software architecture. This means that a Superset setup can be made for many different production environment needs. The production environment at Airbnb runs Superset inside Kubernetes and serves 600+ daily users, rendering over 100,000 charts every day.

At the same time, Superset works with its default settings for most users. When we launched our first dashboard on a Google Compute Engine (GCE) instance, we did not have to change any of the default parameters.

In this chapter, we will learn about the following:

  • Setting the web server
  • Metadata database
  • Web server
  • Setting up an NGINX reverse proxy
  • Setting up HTTPS or SSL certification
  • Flask-AppBuilder permissions
  • Securing session data
  • Caching queries
  • Mapbox access token
  • Long-running queries
  • Upgrading Superset
  • Main configuration file
  • SQL Lab
...

Setting the web server

Start the Superset web server with this command:

superset runserver

Superset loads the configuration from a superset_config.py Python file. This file must be present in the path stored in the SUPERSET_CONFIG_PATH environment variable. The configuration variables present in this config file will override their default values. Superset uses the default values for variables not defined in the file.

So to configure the application, we need to create a Python file. After creating the Python file, we need to update SUPERSET_CONFIG_PATH to include the file path.

On your GCE instance, run the following commands:

shashank@superset:~$ touch $HOME/.superset/superset_config.py
shashank@superset:~$ echo 'export SUPERSET_CONFIG_PATH=$HOME/.superset/superset_config.py' >> ~/.bash_profile
shashank@superset:~$ source ~/.bash_profile

Those are the last commands...
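
The override behaviour described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Superset's actual loader: the names DEFAULTS and load_config, and the default values themselves, are hypothetical and chosen for the example.

```python
import importlib.util
import os

# Hypothetical defaults, for illustration only (not Superset's real defaults).
DEFAULTS = {"ROW_LIMIT": 50000, "SECRET_KEY": "change-me"}


def load_config(defaults, env_var="SUPERSET_CONFIG_PATH"):
    """Return defaults overridden by UPPERCASE names from the file in env_var."""
    config = dict(defaults)
    path = os.environ.get(env_var)
    if not path or not os.path.isfile(path):
        return config  # no override file: keep every default

    # Import the file at `path` as a throwaway module.
    spec = importlib.util.spec_from_file_location("user_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Only UPPERCASE names are treated as configuration variables.
    for name in dir(module):
        if name.isupper():
            config[name] = getattr(module, name)
    return config
```

With SUPERSET_CONFIG_PATH pointing at a file that defines only SECRET_KEY, load_config(DEFAULTS) would return the new SECRET_KEY alongside the untouched default ROW_LIMIT.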

Creating the metadata database

The SQLALCHEMY_DATABASE_URI variable value is picked up by the Flask-AppBuilder manager to create the metadata database for the web app. The metadata database is persisted in ~/.superset/superset.db by default. This can be verified by running sqlite3 in the directory and listing the tables in the database:

shashank@superset:~/.superset$ sqlite3 
SQLite version 3.16.2 2017-01-06 16:32:41
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open superset.db
sqlite> .tables
ab_permission            annotation_layer   logs
ab_permission_view       clusters           metrics
ab_permission_view_role  columns            query
ab_register_user         css_templates      saved_query
ab_role                  dashboard_slices   slice_user
ab_user                  dashboard_user     slices
ab_user_role             dashboards         sql_metrics
ab_view_menu             datasources...
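
The same check can be scripted with Python's built-in sqlite3 module. The sketch below is a minimal example, assuming the metadata database is a regular SQLite file; list_tables is a hypothetical helper name introduced here.

```python
import sqlite3


def list_tables(db_path):
    """Return the table names in a SQLite database file, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables with the expanded path of ~/.superset/superset.db should return the table names shown above. Note that sqlite3.connect creates an empty file if the path does not exist, so an empty list may simply mean the path is wrong.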

Migrating data from SQLite to PostgreSQL

Before we move forward, we need to move every table from the SQLite metadata database into the newly set up PostgreSQL database.

We will use sequel, an open-source database toolkit distributed as a Ruby gem, because it handles copies from sqlite3 to PostgreSQL well.

We will install OS dependencies and gem dependencies along with the sequel Ruby gem:

sudo apt-get install ruby-dev libpq-dev libsqlite3-dev
sudo gem install pg sqlite3
sudo gem install sequel

After installing sequel, the migration is as simple as running the following command. Make sure the path to the sqlite3 database is set correctly:

sequel -C sqlite:///home/shashank/.superset/superset.db postgresql://superset:superset@localhost/superset...
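
One way to confirm that every table arrived is to compare row counts on both sides. The sketch below is an illustration, not part of sequel: row_counts and find_mismatches are hypothetical helpers, and they assume you pass in DB-API 2.0 connections (for example, one from sqlite3.connect for the source and one from psycopg2.connect for the PostgreSQL target, if psycopg2 is installed).

```python
def row_counts(conn, tables):
    """Return {table: row count} using any DB-API 2.0 connection."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from a trusted, hard-coded list, so quoting is enough here.
        cur.execute('SELECT COUNT(*) FROM "{}"'.format(table))
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


def find_mismatches(source_counts, target_counts):
    """Return {table: (source count, target count)} for every table that differs."""
    return {
        table: (count, target_counts.get(table))
        for table, count in source_counts.items()
        if target_counts.get(table) != count
    }
```

An empty dictionary from find_mismatches suggests the copy is complete; equal row counts do not prove the rows are identical, so treat this as a quick sanity check only.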

Web server

We can integrate Superset with many web server options, such as Gunicorn, NGINX, and Apache HTTP, depending on our runtime requirements.

Web servers handle HTTP or HTTPS requests. A Superset web server typically processes a large number of such requests to render charts. Each request generates an I/O-bound database query in Superset. This query is not CPU-bound because the query execution happens at the database level and the result is returned to Superset by the database query execution engine. Requests to a Superset web server almost always require a dynamic output and not a static resource as a response.

Gunicorn is a Python WSGI HTTP server. WSGI is a Python application interface based on the Python Enhancement Proposal (PEP) 333 standard. It specifies how Python applications interface with a web server. Gunicorn is the recommended web server for deploying a Superset...
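
To make the WSGI interface concrete, here is a minimal sketch of a WSGI application. It is unrelated to Superset's own app object; the function name application and the response text are assumptions made for the example.

```python
def application(environ, start_response):
    """Smallest useful WSGI app: the server calls this once per request."""
    body = "Hello from {}".format(environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # Tell the server the status line and headers, then return the body chunks.
    start_response("200 OK", headers)
    return [body]
```

If this were saved as a hypothetical hello.py, a command such as gunicorn -w 2 hello:application would serve it with two worker processes. Superset exposes its Flask app to Gunicorn through the same module:callable convention.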

Setting up an NGINX reverse proxy

We are going to set up NGINX as a reverse proxy that retrieves resources from the Gunicorn web server on behalf of a client. NGINX has many features and is one of the most widely used proxy servers. We will use it primarily to forward connections to our Superset web server when someone enters our registered domain name, or the external IP address, in their web browser.

We will set up SSL certification for the NGINX proxy server. This way, web connections to our web app will always be encrypted and secure. Popular browsers, such as Chrome and Firefox, show a warning if a web page does not have an SSL certificate. No worries, we will get the certificate!

We will first install NGINX on our GCE instance, which runs Ubuntu:

# Install
sudo apt-get update
sudo apt-get install nginx 

The NGINX service is now installed...

Setting up HTTPS or SSL certification

We will be using Let's Encrypt (https://letsencrypt.org/), a free, automated, and open certificate authority managed by the non-profit Internet Security Research Group (ISRG).

Secure Sockets Layer (SSL), now superseded by Transport Layer Security (TLS), is a secure transport layer that can carry any application protocol. HTTPS is its most common use, and it is what we will set up for our Superset web server.

Like most other things, configuring SSL has OS-level dependencies. First, we will install certbot, the free client that automates requesting certificates from Let's Encrypt. It needs to verify that we control the site first. It does this by running some checks (which it calls challenges) under http://<url>/.well-known:

# Install certbot
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get install certbot
# Create .well-known directory
cd /var/www/html
mkdir .well-known

We also need to update the superset.conf file in the...

Flask-AppBuilder permissions

Superset uses the Flask-AppBuilder framework to store metadata required for permissions in Superset. Every time a Flask-AppBuilder app is initialized, permissions and views are automatically created for the Admin role. When multiple concurrent workers are started by Gunicorn, they might lead to contention and race conditions between the workers trying to write to one metadata database table.

The automatic updating of permissions in the metadata database can be disabled by setting the SUPERSET_UPDATE_PERMS environment variable to 0. It is set to 1 (enabled) by default:

export SUPERSET_UPDATE_PERMS=1
superset init
# Make sure superset init is called before Superset starts with a new metadata database
export SUPERSET_UPDATE_PERMS=0
gunicorn -w 10 … superset:app

Securing session data

Session data exchanged between the Superset web server and a browser client or internet bot is protected by the SECRET_KEY parameter in the superset_config.py file. Flask uses the key to sign the session cookie with a keyed cryptographic hash, so the cookie can still be read but cannot be altered without detection. Since the secret is never included with the data the web server sends, neither a browser client nor a bot can tamper with session data and produce a valid signature.

Set its value to a long, hard-to-guess random string; the short value below is only a placeholder:

SECRET_KEY = 'AdLcixY34P' # random string
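
A stronger key can be generated with Python's standard secrets module. The sketch below also shows, in simplified form, how a keyed hash lets a server detect tampering. It is an illustration of the idea only and not Flask's actual session implementation; generate_secret_key, sign, and verify are hypothetical helper names.

```python
import hashlib
import hmac
import secrets


def generate_secret_key(num_bytes=32):
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


def sign(value, secret_key):
    """Return a hex HMAC-SHA256 signature of value under secret_key."""
    return hmac.new(secret_key.encode(), value.encode(), hashlib.sha256).hexdigest()


def verify(value, signature, secret_key):
    """Check the signature in constant time; False means the value was altered."""
    return hmac.compare_digest(sign(value, secret_key), signature)
```

The output of generate_secret_key() can be pasted into superset_config.py as the SECRET_KEY value. Changing the key later invalidates existing sessions, because old signatures no longer verify.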

Caching queries

Superset uses Flask-Cache for cache management and Flask-Cache provides support for many backend implementations that fit different use cases.

Redis is the recommended cache backend for Superset. But if you do not expect many users to use your Superset installation, then FileSystemCache is a good alternative to a Redis server.

The following are some of the cache implementations that are available, with a description and their configuration variables:

CACHE_TYPE | Description and configuration
simple | Uses a local Python dictionary to store results. This is not really safe when using multiple workers on the web server.
filesystem | Uses the filesystem to store cached values. The CACHE_DIR variable is the directory path used by FileSystemCache.
memcached | Uses a memcached server to store values. Requires the pylibmc Python package installed in the...
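
For a small installation, a filesystem cache could be configured along the following lines in superset_config.py. This is a sketch under stated assumptions: the directory path and the timeout are placeholder values, not recommendations from the book.

```python
# Filesystem cache for a low-traffic Superset installation.
CACHE_CONFIG = {
    "CACHE_TYPE": "filesystem",
    # Assumed path: any directory the web server user can write to.
    "CACHE_DIR": "/tmp/superset_cache",
    # Assumed value: seconds before a cached result expires.
    "CACHE_DEFAULT_TIMEOUT": 300,
}
```

Unlike the simple backend, cached files are shared by all Gunicorn workers on the same machine, but they are not shared across machines.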

Mapbox access token

The MAPBOX_API_KEY variable needs to be defined because we will use Mapbox visualizations in Superset charts. We need to get a Mapbox access token using the guidelines available here: https://www.mapbox.com/help/how-access-tokens-work/.

After you have obtained it, set the MAPBOX_API_KEY variable to the valid access token value.

Long-running queries

Database queries that are initiated by Superset to render charts must complete within the lifetime of HTTP/HTTPS requests. Some long-running database queries can cause a request timeout if they exceed the maximum duration of a request. But it is possible to configure Superset to handle long-running queries properly using a Celery distributed queue, and transfer the responsibility of query handling to Celery workers.

In large databases, it is common for queries to run for minutes or even hours, while web request timeouts are typically 30-60 seconds. Therefore, we need to configure this asynchronous query execution backend for Superset.

We need to ensure that the worker and the Superset server both have the same values for common configuration variables.

Redis is the recommended message queue for submitting new queries to Celery workers...
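
A Celery setup in superset_config.py typically takes the shape sketched below. This follows the pattern used in Superset's documentation from the period of this book and should be checked against the version you run. The Redis URLs are assumptions for a local Redis server.

```python
class CeleryConfig(object):
    # Redis as the message broker that carries queries to the workers (assumed local URL).
    BROKER_URL = "redis://localhost:6379/0"
    # Module containing the SQL Lab tasks that the workers execute.
    CELERY_IMPORTS = ("superset.sql_lab",)
    # Where workers store task state and results (assumed local URL).
    CELERY_RESULT_BACKEND = "redis://localhost:6379/0"


# Superset reads the Celery settings from this variable.
CELERY_CONFIG = CeleryConfig
```

Because the web server and the workers must agree on these values, both should load the same superset_config.py file.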

Main configuration file

So, we have completed configuring Superset. Let's take a look at the complete Superset configuration file:

# Superset Configuration file
# add file superset_config.py to PYTHONPATH for usage
import os

# Metadata database
SQLALCHEMY_DATABASE_URI = "postgresql+psycopg2://superset:superset@localhost/superset"

# Securing Session data
SECRET_KEY = 'AdLcixY34P' # random string

# Caching Queries
CACHE_CONFIG = {
    # Specify the cache type
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    # The key prefix for the cache values stored on the server
    'CACHE_KEY_PREFIX': 'superset_results'
}

# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = os.environ.get('MAPBOX_API_KEY', 'mapbox-api-key')
# Long running query handling using Celery workers
class
...

SQL Lab

SQL Lab is a powerful SQL IDE inside Superset. It works with any database that has a SQLAlchemy Python connector, and it can query any data source registered in Superset, including the metadata database, which makes it well suited to data exploration.

It is a solid playground in which we can slice and dice a dataset until it is in the form we need to visualize in order to answer the analytical question behind the chart.

First, we need to enable SQL Lab use on the superset-bigquery data source. We will explore and visualize the data in the table using SQL queries.

After clicking on the Sources | Databases option on the navigation bar, select the Edit record option for the superset-bigquery data source:

The overview chart of the list of databases

Then, make sure the following three options are enabled. Allow Run Sync should be enabled by default. We are doing this...

Summary

We saw that the superset_config.py file lets us configure the Superset web server for the needs of our runtime environment. We looked at the configuration parameters that make Superset secure and scalable.

SQL Lab provides an opportunity to experiment with result sets before plotting. It can be used as an excellent tool for exploring datasets and developing charts.

In this chapter, we replaced the SQLite metadata database with PostgreSQL and configured the web app to use it. So that the web app can handle many concurrent users, we deployed it on a Gunicorn server. In all, we set up the following:

  • PostgreSQL metadata database
  • Gunicorn
  • NGINX
  • HTTPS certification
  • Securing session data
  • Redis caching system
  • Celery for long-running queries
  • Mapbox access token

Nicely done! We have been able to make dashboards, use SQL Lab, and understand the...

Key benefits

  • Work with Apache Superset's rich set of data visualizations
  • Create interactive dashboards and tell stories with data
  • Easily explore data

Description

Apache Superset is a modern, open-source, enterprise-ready business intelligence (BI) web application. With the help of this book, you will see how Superset integrates with popular databases like Postgres, Google BigQuery, Snowflake, and MySQL. You will learn to create real-time data visualizations and dashboards on modern web browsers for your organization using Superset. First, we look at the fundamentals of Superset, and then get it up and running. You'll go through the requisite installation, configuration, and deployment. Then, we will discuss different columnar data types, analytics, and the visualizations available. You'll also see the security tools available to the administrator to keep your data safe. You will learn how to visualize relationships as graphs instead of coordinates on plain orthogonal axes. This will help you when you upload your own entity relationship dataset and analyze it in new ways. You will also see how to analyze geographical regions by working with location data. Finally, we cover a set of tutorials on dashboard designs frequently used by analysts, business intelligence professionals, and developers.

Who is this book for?

This book is for data analysts, BI professionals, and developers who want to learn Apache Superset. If you want to create interactive dashboards from SQL databases, this book is what you need. Working knowledge of Python is an advantage but is not necessary to understand this book.

What you will learn

  • Get to grips with the fundamentals of data exploration using Superset
  • Set up a working instance of Superset on cloud services like Google Compute Engine
  • Integrate Superset with SQL databases
  • Build dashboards with Superset
  • Calculate statistics in Superset for numerical, categorical, or text data
  • Understand visualization techniques, filtering, and grouping by aggregation
  • Manage user roles and permissions in Superset
  • Work with SQL Lab

Product Details

Publication date: Dec 19, 2018
Length: 188 pages
Edition: 1st
Language: English
ISBN-13: 9781788992244
Vendor: Apache


Table of Contents

9 Chapters
Getting Started with Data Exploration
Configuring Superset and Using SQL Lab
User Authentication and Permissions
Visualizing Data in a Column
Comparing Feature Values
Drawing Connections between Entity Columns
Mapping Data That Has Location Information
Building Dashboards
Other Books You May Enjoy

Customer reviews

Rating: 3.5 out of 5 (2 ratings)
5 star 50%
4 star 0%
3 star 0%
2 star 50%
1 star 0%

Cliente Kindle, Mar 20, 2021 (5 out of 5, Amazon verified review)
Excellent book. Complete coverage of Apache Superset (install, config, etc.). As a bonus, an excellent overview of data analysis and visualization/insights.

Beth, Jan 11, 2022 (2 out of 5, Amazon verified review)
The basics of using Superset to make visualizations and do day-to-day maintenance within the application are good. But in all honesty, those things are easier to figure out than how to take this open source code and deploy it in a sustainable, manageable manner. This book, like ALL online tutorials, tells you how to manually use the command line to simply install it and get it running. All management of the servers and load balancing would basically be manual from the command line. That serves no purpose in this day and age, when every server service offers managed servers which scale up and down and offer redundancy and safeguards for data. There is literally NOWHERE online where someone shows you how to build a docker compose file to automate CloudFormation with appropriate configuration. Docker compose can even do it for you, but you still need to know how to create that docker compose document and use the CLI. I was hoping this was that book. There are just too many resources for what is in here (albeit strewn in little tutorials and videos) all over the net. Well written and useful as a software USER but totally useless if you have been asked to build a server for your team...
