Mastering Python Data Visualization

Mastering Python Data Visualization: Generate effective results in a variety of visually appealing charts using the plotting packages in Python

Kirthi Raman
Oct 2015, 372 pages, 1st Edition


Chapter 1. A Conceptual Framework for Data Visualization

The existence of the Internet and social media in modern times has led to an abundance of data, and data sizes are growing beyond imagination. How and when did this begin?

A decade ago, a new way of doing business evolved: corporations began collecting, combining, and crunching large amounts of data from sources throughout the enterprise. Their goal was to use a high volume of data to improve the decision-making process. Around that same time, corporations like Amazon, Yahoo, and Google, which handled large amounts of data, made significant headway. Those milestones led to the creation of several technologies supporting big data. We will not get into the details of big data, but we will explore why many organizations have changed their ways to use similar ideas for better decision-making.

How exactly are these large amounts of data used for making better decisions? We will get to that eventually, but first let us try to understand the difference between data, information, and knowledge, and how they are all related to data visualization. One may wonder why we are talking about data, information, and knowledge at all. There is a storyline that connects how we start, what we start with, how all these things benefit the business, and the role of visualization. We will determine the required conceptual framework for data visualization by briefly reviewing the steps involved.

In this chapter, we will cover the following topics:

  • The difference between data, information, knowledge, and insight
  • The transformation of information into knowledge, and further, to insight
  • Collecting, processing, and organizing data
  • The history of data visualization
  • How does visualizing data help decision-making?
  • Visualization plots

Data, information, knowledge, and insight

The terms data, information, and knowledge are used extensively in the context of computer science. There are many definitions of these terms, often conflicting and inconsistent. Before we dive into these definitions, let us understand how these terms are related to visualization. The primary objective of data visualization is to gain insight (hidden truth) into the data or information. The whole discussion about data, knowledge, and insight in this book is within the context of computer science, not psychology or cognitive science. For the cognitive context, one may refer to https://www.ucsf.edu/news/2014/05/114321/converting-data-knowledge-insight-and-action.

Data

The term data implies a premise from which one may draw conclusions. Though data and information appear to be interrelated in a certain context, data actually refers to discrete, objective facts in a digital form. Data are the basic building blocks that, when organized and arranged in different ways, lead to information that is useful in answering some questions about the business.

Data can be something very simple, yet voluminous and unorganized. Such discrete data cannot be used to make decisions on its own because it has no meaning and, more importantly, because there is no structure or relationship between the individual pieces of data. The process by which data is collected, transmitted, and stored varies widely with the types of data and storage methods. Data comes in many forms; some notable forms are listed as follows:

  • CSV files
  • Database tables
  • Document formats (Excel, PDF, Word, and so on)
  • HTML files
  • JSON files
  • Text files
  • XML files
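As a minimal sketch of how two of these forms differ in practice, the following Python snippet (standard library only) loads the same hypothetical player records from CSV text and from JSON text. The sample content and the column names `name`, `position`, and `points` are illustrative assumptions, not files that accompany this chapter.

```python
import csv
import io
import json

# Hypothetical sample content; in practice this would be read from files
# such as players.csv and players.json.
CSV_TEXT = """name,position,points
Player A,PG,1500
Player B,C,1320
"""

JSON_TEXT = (
    '[{"name": "Player A", "position": "PG", "points": 1500},'
    ' {"name": "Player B", "position": "C", "points": 1320}]'
)


def records_from_csv(text):
    """Parse CSV text into a list of dicts; every field arrives as a string."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(text))]
    # CSV carries no type information, so numeric columns are converted explicitly.
    for row in rows:
        row["points"] = int(row["points"])
    return rows


def records_from_json(text):
    """Parse JSON text; numbers are already typed by the JSON parser."""
    return json.loads(text)


csv_records = records_from_csv(CSV_TEXT)
json_records = records_from_json(JSON_TEXT)
print(csv_records == json_records)  # True: the same data in two storage formats
```

The explicit `int()` conversion is the main practical difference: CSV stores everything as text, while JSON preserves basic types.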

Information

Information is processed data presented as an answer to a business question. Data becomes information when we add a relationship or an association. The association is accomplished by providing a context or background to the data. The background is helpful because it allows us to answer questions about the data.

For example, let us assume that the data given for a basketball player includes height, weight, position, college, date of birth, draft pick, draft round, NBA debut, and recruiting rank. The answer to the question "Who is the first draft pick with a height of more than six feet who plays in the point guard position?" is information.

Similarly, each player's score is one piece of data. The answer to the question "Who has the highest points per game this year, and what is his score?" is "LeBron James, 27.47", which is also information.
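The following sketch shows how answering such questions turns raw fields into information. The player names and numbers are made up for illustration, and height is assumed to be stored in inches.

```python
# Hypothetical player records (made-up names and numbers, for illustration only).
players = [
    {"name": "Player A", "height_in": 75, "position": "PG", "draft_pick": 1, "ppg": 21.3},
    {"name": "Player B", "height_in": 83, "position": "C", "draft_pick": 1, "ppg": 18.9},
    {"name": "Player C", "height_in": 74, "position": "PG", "draft_pick": 7, "ppg": 24.1},
    {"name": "Player D", "height_in": 70, "position": "PG", "draft_pick": 1, "ppg": 12.5},
]

SIX_FEET_IN = 72  # six feet expressed in inches

# Question 1: first draft picks taller than six feet who play point guard.
first_pick_pgs = [
    p["name"]
    for p in players
    if p["draft_pick"] == 1 and p["height_in"] > SIX_FEET_IN and p["position"] == "PG"
]

# Question 2: who has the highest points per game, and what is it?
top_scorer = max(players, key=lambda p: p["ppg"])

print(first_pick_pgs)                          # ['Player A']
print(top_scorer["name"], top_scorer["ppg"])   # Player C 24.1
```

Each individual field is just data; the filtered list and the maximum are answers to business questions, which is what makes them information.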

Knowledge

Knowledge emerges when humans interpret and organize information and use that to drive decision-making. Knowledge is the data, information, and the skills acquired through experience. Knowledge comprises the ability to make the appropriate decision as well as the skills to execute it.

The essential ingredient—connecting the data—allows us to understand the relative importance of each piece of information. By comparing results from the past and by recognizing patterns, we don't have to build a solution to a problem from scratch. The following diagram summarizes the concepts of data, information, and knowledge:

[Figure: The relationship between data, information, and knowledge]

Knowledge changes incrementally, particularly when information is rearranged or reorganized or when some computing algorithm changes. Knowledge is like an arrow pointing to the results of an algorithm that depends on past information derived from data. In many instances, knowledge is also gained by visually interacting with the results. Insight, on the other hand, opens the way to the future.

Data analysis and insight

Before we dive into the definition of insight and how it relates to business, let us see how the idea of capturing insight began. For over a decade, organizations have been struggling to make sense of all the data and information they have, particularly with the exploding data size. They have all realized the importance of data analysis (also known as data analytics or analytics) in arriving at an optimal or realistic business decision based on existing data and information.

Analytics hinges upon mathematical algorithms to determine the relationships between the data that can yield insight. One simple way to understand insight is by considering an analogy: when data lacks structure and proper alignment with the business, converting it to a more structured form and aligning it more closely with the business goals gives a clearer and deeper understanding. Insight is that "eureka" moment when a breakthrough result emerges. One should not confuse the terms Analytics and Business Intelligence: Analytics has predictive capabilities, while Business Intelligence provides results based on the analysis of historical data.

Analytics is usually applicable to a broader spectrum of data, and for this reason, data collaboration very commonly happens internally, externally, or both. In some business paradigms, collaboration happens only internally, across an extensive collection of datasets, but in most other cases, an external connection helps in connecting the dots or completing the puzzle. Two of the most common sources of external data are social media and the consumer base.

Later in this chapter, we will refer to real-life business stories that achieved remarkable results by applying analytics to gain insight, drive business value, improve decision-making, and understand their customers better.

The transformation of data

By now we know what data is, but the question is: what is the purpose of collecting data? Data is useful for describing a physical or social phenomenon and for answering further questions about that phenomenon. For this reason, it is important to ensure that the data is not faulty, inaccurate, or incomplete; otherwise, the answers based on that data will not be accurate or complete either.

There are different categories of data, some of which are past performance data, experimental data, and benchmark data. Past performance data and experimental data are fairly self-explanatory. Benchmark data, on the other hand, compares the characteristics of two different items or products against a standard measure. Data gets transformed into information, is processed further, and is then used for answering questions. Our next step, therefore, is to achieve that transformation.

Transforming data into information

Data is collected and stored in several different forms depending on its content and significance. For instance, data about playoff basketball games may come in text and video formats. Another example is temperature recordings from all the cities of a country, collected and made accessible in different formats. The transformation from data to information involves the collection, processing, and organization of data, as shown in the following diagram:

[Figure: Transforming data into information]

The collected data needs some processing and organizing, and the result may or may not have a structure, model, or pattern. However, this process at least gives us an organized way of finding answers to questions about the data. The process could be a simple sort based on the total points scored by basketball players, or a sort based on the names of the city and state.
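Both kinds of sorting mentioned above can be sketched with Python's built-in `sorted`; the player totals and the city list are hypothetical.

```python
# Hypothetical season totals and city records, for illustration only.
player_totals = [("Player A", 1500), ("Player B", 1320), ("Player C", 1710)]
cities = [("Austin", "TX"), ("Albany", "NY"), ("Dallas", "TX"), ("Buffalo", "NY")]

# Sort players by total points, highest first.
by_points = sorted(player_totals, key=lambda pair: pair[1], reverse=True)

# Sort cities by state name, then by city name within each state.
by_state_then_city = sorted(cities, key=lambda pair: (pair[1], pair[0]))

print(by_points[0])          # ('Player C', 1710)
print(by_state_then_city)    # [('Albany', 'NY'), ('Buffalo', 'NY'), ('Austin', 'TX'), ('Dallas', 'TX')]
```

Sorting on a tuple key such as `(state, city)` orders by state first and breaks ties by city name.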

The transformation from data to information could also go beyond sorting, for example, through statistical modeling or a computational algorithm. It is this transformation from data to information that really matters, because it enables the data to be queried, accessed, and manipulated. In some cases, when there is a vast and divergent amount of data, the transformation may involve processing methods such as filtering, aggregating, applying correlation, scaling and normalizing, and classifying.
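Several of the processing methods listed above can be illustrated on a tiny dataset. This is a minimal, standard-library sketch that assumes hypothetical city temperature readings paired with electricity use; the 25-degree threshold used for filtering and classifying is arbitrary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (city, temperature in degrees C, electricity use in kWh).
readings = [
    ("Austin", 30.0, 120), ("Austin", 32.0, 135), ("Austin", 28.0, 110),
    ("Boston", 18.0, 60), ("Boston", 20.0, 70), ("Boston", 16.0, 50),
]

# Filtering: keep only readings above 25 degrees.
hot_days = [r for r in readings if r[1] > 25]

# Aggregating: average temperature per city.
temps_by_city = defaultdict(list)
for city, temp, _ in readings:
    temps_by_city[city].append(temp)
avg_temp = {city: mean(temps) for city, temps in temps_by_city.items()}


def pearson(xs, ys):
    """Correlation: Pearson coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


r = pearson([t for _, t, _ in readings], [k for _, _, k in readings])

# Classifying: label each city with a simple threshold rule on its average.
labels = {city: "warm" if t >= 25 else "cool" for city, t in avg_temp.items()}

print(len(hot_days), avg_temp, round(r, 3), labels)
```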

Data collection

Data collection is a time-consuming process, so businesses are looking for better ways to automate data capture. However, manual data collection is still prevalent for many processes. Automatic data collection in modern times uses input devices such as sensors. For instance, underwater coral reefs are monitored via sensors; agriculture is another area where sensors are used for monitoring soil properties and controlling irrigation and fertilization.

Another way to collect data automatically is by scanning documents and log files, which is a form of server-side data collection. Manual processes include data collection via web-based methods, with the results stored in a database from which they can then be transformed into information. Nowadays, web-based collaborative environments benefit from improved communication and sharing of data.
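Scanning a log file amounts to turning unstructured text lines into structured records. The sketch below assumes a simple, hypothetical log layout (`date time method path status`); real server logs use their own formats, so the regular expression would need to be adapted.

```python
import re
from collections import Counter

# Hypothetical log lines in a simple "date time method path status" layout.
LOG_LINES = [
    "2015-10-01 12:00:01 GET /index.html 200",
    "2015-10-01 12:00:05 GET /missing.html 404",
    "2015-10-01 12:01:10 POST /login 200",
    "garbled line that does not match",
]

LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)


def parse_log(lines):
    """Turn raw log lines into structured records, skipping unparseable ones."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            record = match.groupdict()
            record["status"] = int(record["status"])
            records.append(record)
    return records


records = parse_log(LOG_LINES)
status_counts = Counter(r["status"] for r in records)
print(len(records), status_counts[200], status_counts[404])  # 3 2 1
```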

Traditional visualization and visual analytics tools are typically designed for a single user interacting with a visualization application on a single machine. Extending these tools to support collaboration has gone a long way toward increasing the scope and applicability of visualizations in the real world.

Data preprocessing

Today, data is highly susceptible to noise and inconsistency because of its size and its likely origin in multiple, heterogeneous sources and types. There are several data preprocessing techniques, such as data cleaning, data integration, data reduction, and data transformation. Data cleaning can be applied to remove noise and correct inconsistencies in the data. Data integration merges and combines data from multiple sources into a coherent store, commonly known as a data warehouse. Data reduction can reduce data size by, for instance, merging, aggregating, and eliminating redundant features. Data transformations may be applied to scale data so that it falls within a smaller range, thus improving the accuracy and efficiency of processing and visualizing it. The transformation cycle of data is shown in the following diagram:
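A compact sketch of three of these preprocessing steps on hypothetical data: cleaning (dropping missing values and duplicates, which also reduces the data), integration (joining two sources on a shared key), and transformation (min-max scaling into the range 0 to 1). The record layout and the `id` key are assumptions made for illustration.

```python
# Two hypothetical sources describing the same players (made-up values).
stats_source = [
    {"id": 1, "points": 1500},
    {"id": 2, "points": None},   # missing value: removed by cleaning
    {"id": 3, "points": 1700},
    {"id": 3, "points": 1700},   # exact duplicate: removed by cleaning
    {"id": 4, "points": 900},
]
roster_source = {1: "Player A", 3: "Player C", 4: "Player D"}


def clean(rows):
    """Data cleaning: drop rows with missing points and exact duplicates."""
    seen, result = set(), []
    for row in rows:
        key = (row["id"], row["points"])
        if row["points"] is None or key in seen:
            continue
        seen.add(key)
        result.append(row)
    return result


def integrate(rows, roster):
    """Data integration: join the two sources on the shared 'id' key."""
    return [dict(row, name=roster[row["id"]]) for row in rows if row["id"] in roster]


def min_max_scale(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid dividing by zero for constant data
    return [(v - lo) / (hi - lo) for v in values]


merged = integrate(clean(stats_source), roster_source)
scaled = min_max_scale([row["points"] for row in merged])
print([row["name"] for row in merged])  # ['Player A', 'Player C', 'Player D']
print(scaled)                           # [0.75, 1.0, 0.0]
```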

[Figure: Data preprocessing]

Anomaly detection is the identification of unusual data that does not fit an expected behavior or pattern in the collected data. Anomalies are also known as outliers or noise; for example, in signal data, a particular signal that is unusual is considered noise, and in transaction data, an outlier may be a fraudulent transaction. Accurate data collection is essential for maintaining the integrity of data. Although anomalies have a downside, outliers can also be very important, specifically in cases where one wants to find fraudulent insurance claims, for instance.
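One common way to flag such outliers, sketched here as an illustration rather than as the method used in this book, is the modified z-score based on the median absolute deviation (MAD). The amounts are hypothetical, and the 3.5 cut-off is a widely used rule of thumb rather than a fixed rule.

```python
from statistics import median

# Hypothetical transaction amounts; the last one is suspiciously large.
amounts = [100, 102, 98, 101, 99, 103, 97, 100, 500]


def modified_z_outliers(values, threshold=3.5):
    """Flag outliers whose modified z-score 0.6745 * (x - median) / MAD
    exceeds `threshold` in absolute value."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; anything different stands out.
        return [v for v in values if v != med]
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


print(modified_z_outliers(amounts))  # [500]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation, which can hide the very outlier one is looking for in a small sample.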

Data processing

Data processing is a significant step in the transformation process, and the focus must be on data quality. Two processing steps that help in preparing data for analysis and understanding it better are dependency modeling and clustering. There are other processing techniques, but we will limit our discussion here to the two most popular processing methods.

Dependency modeling is the fundamental principle of modeling data to determine the nature and structure of the representation. This process searches for relationships between data elements; for example, a department store might gather data on the purchasing habits of its customers. This process helps the department store deduce information about frequent purchases.
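For the department-store example, one simple form of dependency modeling is market-basket analysis: counting which items are bought together. The sketch below, on hypothetical baskets, computes the support of the most frequent item pair and the confidence of a rule; it is a simplified stand-in for association-rule algorithms such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets from a department store.
baskets = [
    {"shirt", "tie", "socks"},
    {"shirt", "tie"},
    {"socks", "shoes"},
    {"shirt", "tie", "shoes"},
]

# How many baskets contain each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


most_common_pair, count = pair_counts.most_common(1)[0]
support = count / len(baskets)  # fraction of all baskets containing the pair
print(most_common_pair, support)   # ('shirt', 'tie') 0.75
print(confidence("shirt", "tie"))  # 1.0
```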

Clustering is the task of discovering groups in the data that have, in some way or another, a "similar pattern", without using known structures in the data.
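k-means is one widely used clustering algorithm (not the only one). The following minimal sketch runs it on hypothetical 2D points; seeding the centroids with the first k points is a simplifying assumption, and real implementations use smarter seeding such as k-means++.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Minimal k-means: alternately assign each point to its nearest centroid
    and move each centroid to the mean of its assigned points."""
    centroids = [tuple(p) for p in points[:k]]  # naive seeding (assumption)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 1.5)], [(8, 8), (8.5, 9), (9, 8)]]
```

No labels are given to the algorithm; the two groups emerge only from the distances between points, which is what distinguishes clustering from classification.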

Organizing data

Database management systems allow users to store data in a structured format. However, databases are often too large to fit into memory. There are two ways of structuring data:

  • Storing large data in disks in a structured format like tables, trees, or graphs
  • Storing data in memory using data structure formats for faster access

A data structure is a format for organizing data so that it can be stored and accessed. The general data structure types are arrays, files, tables, trees, lists, maps, and so on. Any data structure is designed to organize data to suit a specific purpose so that it can be stored, accessed, and manipulated at runtime. A data structure may be selected or designed so that various algorithms can work on the stored data with faster access.
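The two approaches listed above can be sketched with Python's built-in `sqlite3` module for on-disk tables and a `dict` (a hash map) for fast in-memory access. The table name and records are hypothetical, and a temporary directory stands in for a real database location.

```python
import os
import sqlite3
import tempfile

# Approach 1: store hypothetical records on disk in a structured table.
db_path = os.path.join(tempfile.mkdtemp(), "players.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE players (name TEXT PRIMARY KEY, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Player A", 1500), ("Player B", 1320), ("Player C", 1700)],
)
conn.commit()

# The database answers structured queries without loading everything into memory.
top = conn.execute("SELECT name FROM players ORDER BY points DESC LIMIT 1").fetchone()[0]

# Approach 2: load the rows into an in-memory dict for fast lookups by name.
points_by_name = dict(conn.execute("SELECT name, points FROM players"))
conn.close()

print(top)                         # Player C
print(points_by_name["Player B"])  # 1320
```

The table suits data that is too large for memory, while the dictionary trades memory for constant-time lookups at runtime.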

Data that is collected, processed, and organized for efficient storage is much easier to understand, and it leads to information that is easier to interpret.

Getting datasets

For readers who do not have access to organizational data, there are plenty of resources on the Internet with rich datasets from several different sources, such as:

Transforming information into knowledge

Information is quantifiable and measurable; it has a shape and can be accessed, generated, stored, distributed, searched for, compressed, and duplicated. It is quantifiable by its volume or amount.

Information transforms into knowledge by the application of discrete algorithms, and knowledge is expected to be more qualitative than information. In some problem domains, knowledge continues to go through an evolving cycle. This evolution happens particularly when the data changes in real time.

Knowledge is like the recipe that lets you make bread out of information, which in this case is the ingredients: flour and yeast. Another way to look at knowledge is as the combination of data and information, to which experience and expert opinion are added to aid decision-making. Knowledge is not merely the result of filtering or algorithms.

What are the steps involved in this transformation, and how does the change happen? Naturally, it cannot happen by itself. Though the word information is subject to different interpretations based on the definition, we will explore it further within the context of computing.

A simple analogy illustrates the difference between information and knowledge: the course materials for a particular course provide the necessary information about the concepts, and the teacher later helps the students understand the concepts through discussions. This helps the students gain knowledge about the course. By a similar process, something needs to be done to transform information into knowledge. The following diagram shows the transformation from information to knowledge:

[Figure: Transforming information into knowledge]

As illustrated in the figure, information, when aggregated and run through some discrete algorithms, gets transformed into knowledge. The information needs to be aggregated to obtain broader knowledge. The knowledge obtained by this transformation helps in answering questions about the data or information, such as: In which quarter did the company have its maximum sales revenue? How much has advertising driven sales? How many new products have been released this year?
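The first of these questions can be answered by aggregating information and applying a simple algorithm (a maximum). The sketch below uses made-up sales records for a single year.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales transactions: (sale date, revenue in dollars).
sales = [
    (date(2015, 1, 15), 1200.0),
    (date(2015, 2, 3), 800.0),
    (date(2015, 5, 20), 2500.0),
    (date(2015, 8, 9), 1100.0),
    (date(2015, 11, 30), 1900.0),
    (date(2015, 12, 12), 700.0),
]


def quarter_of(d):
    """Map a date to its calendar quarter, 1 through 4."""
    return (d.month - 1) // 3 + 1


# Aggregate revenue per quarter, then pick the quarter with the maximum total.
revenue_by_quarter = defaultdict(float)
for sale_date, amount in sales:
    revenue_by_quarter[quarter_of(sale_date)] += amount

best_quarter = max(revenue_by_quarter, key=revenue_by_quarter.get)
print(dict(revenue_by_quarter))  # {1: 2000.0, 2: 2500.0, 3: 1100.0, 4: 2600.0}
print(best_quarter)              # 4
```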

Transforming knowledge into insight

In the traditional system, information is processed and then analyzed to generate reports. Ever since the Internet came into existence, processed information has been readily available, and social media has emerged as a new way of conducting business.

Organizations have been using external data to gain insights via data analysis. For example, measuring the sentiment of consumers' tweets on Twitter is used to follow opinions about product brands. In some cases, a higher percentage of users post positive messages on social media about a new product, say an iPhone or a tablet computer. An analytical tool can provide numerical evidence of that sentiment, and this is where data visualization plays a significant role.
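To show how a tool can turn messages into numerical evidence, here is a deliberately tiny lexicon-based sentiment sketch. The word lists and messages are hypothetical; real sentiment analysis relies on much larger lexicons or trained models.

```python
import re

# A tiny hypothetical sentiment lexicon.
POSITIVE = {"love", "great", "amazing", "fast"}
NEGATIVE = {"hate", "slow", "broken", "bad"}

# Hypothetical messages about a new product.
messages = [
    "I love the new phone, the camera is amazing",
    "Battery is bad and the screen arrived broken",
    "Great design and really fast",
    "It is okay I guess",
]


def score(text):
    """Count positive words minus negative words in a message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


scores = [score(m) for m in messages]
positive_share = sum(s > 0 for s in scores) / len(messages)
print(scores)          # [2, -2, 2, 0]
print(positive_share)  # 0.5
```

The resulting share of positive messages is the kind of number that a chart can then make visible at a glance.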

Another example illustrates this transformation: in 2006, Netflix announced a competition for the best collaborative filtering algorithm to predict user ratings for films based on previous ratings. In 2009, the prize went to the team BellKor's Pragmatic Chaos (which included the Pragmatic Theory team), whose algorithm achieved an improvement of just over 10 percent in predicting user ratings, increasing the business value for Netflix.
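The idea behind collaborative filtering can be sketched in a few lines: predict a user's rating for a film from the ratings of similar users. This is a minimal user-based example with cosine similarity on hypothetical ratings; it is not the prize-winning approach, which blended many far more sophisticated models.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {film: rating}.
ratings = {
    "ann": {"film1": 5, "film2": 3, "film3": 4},
    "bob": {"film1": 4, "film2": 2, "film3": 5, "film4": 4},
    "cat": {"film1": 1, "film2": 5, "film4": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the films both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[f] * b[f] for f in common)
    norm_a = sqrt(sum(a[f] ** 2 for f in common))
    norm_b = sqrt(sum(b[f] ** 2 for f in common))
    return dot / (norm_a * norm_b)


def predict(user, film):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    neighbours = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][film])
        for other in ratings
        if other != user and film in ratings[other]
    ]
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        return None  # nobody comparable has rated this film
    return sum(sim * r for sim, r in neighbours) / total_weight


print(round(predict("ann", "film4"), 2))
```

Because raw cosine similarity on 1-5 ratings is high even for users who disagree, practical systems usually center each user's ratings on that user's mean before comparing them.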

[Figure: Transforming knowledge into insight]

Transforming knowledge into insight is achieved using collaboration and analytics, as shown in the preceding diagram. Insight implies seeing the solution and realizing what needs to be done. Obtaining data and information is easy, and organizations have known methods for doing so, but gaining insight is very hard. Achieving insight requires new and creative thinking and the ability to connect the dots. In addition to creative thinking, data analysis and data visualization play a big role in achieving insight. Data visualization is considered both an art and a science.

Data visualization history

Visualization has its roots in a long historical tradition of representing information using primitive paintings and maps on walls, tables of numbers, and paintings on clay. However, these were not known as visualization or data visualization. Data visualization is a new term; it expresses the idea that visualization involves more than just representing data in graphical form. The information behind the data should be revealed in an intuitive representation using a good display; the graphic should inherently aid viewers in seeing the structure of the data.

Visualization before computers

In early Babylonian times, pictures were drawn on clay, and in later periods they were rendered on papyrus. The goal of those paintings and maps was to provide the viewer with a qualitative understanding of the information. We also know that understanding pictures comes naturally to us, because a visual presentation of information is perceived with greater ease. This section includes only partial details about the history of visualization. For elaborate details and examples, we recommend two interesting resources:

Minard's Russian campaign (1812)

Charles Minard was a civil engineer working in Paris. He summarized Napoleon's 1812 march on Moscow in a figurative map. This map is a simple picture that is both a visual timeline and a geographic map, depicting the size and direction of the army, the temperature, and landmarks and locations. Prof. Edward Tufte famously described this picture as possibly the best statistical graphic ever drawn.

[Figure: Minard's Russian campaign (1812)]

The wedge starts out thick on the left-hand side, where the army begins the campaign at the Polish border with 422,000 men. The wedge becomes narrower as the army advances deeper into Russia and as the temperature drops. This visualization manages to condense a number of different numeric and geographic facts into one image: when the army shrank, the reason for the reduction, and, subsequently, its retreat.

The Cholera epidemics in London (1831-1855)

In October 1831, the first case of Asiatic cholera occurred in Great Britain, and over 52,000 people died in the epidemic. Subsequently, in 1848-1849 and 1853-1854, more cholera epidemics produced large death tolls.

In 1855, Dr. John Snow produced a map showing the deaths due to cholera clustered around the Broad Street pump in London. This map was a landmark graphic discovery, but unfortunately, it was devised at the end of that period. His map showed the location of each of the deceased, which gave him the insight to conclude that the source of the outbreak could be localized to contaminated water from a pump on Broad Street. Around that time, the use of graphs became important in economic and state planning.

Statistical graphics (1850-1915)

By the mid-19th century, visualization had grown rapidly throughout Europe. In 1863, one page of Galton's multivariate weather chart of Europe showed barometric pressure, wind direction, rain, and temperature for the month of December 1861 (source: The life, letters and labors of Francis Galton, Cambridge University Press).

During this period, statistical graphics became mainstream, and many textbooks were written on the subject. These textbooks contained detailed descriptions of the graphic method, discussing frequencies and the effects of the choice of scales and baselines on the visual estimation of differences and ratios. They also contained historical diagrams in which two or more time series could be shown on a single chart for comparative views of their histories.

Later developments in data visualization

In 1962, John W. Tukey issued a call for the recognition of data analysis as a legitimate branch of statistics; shortly afterwards, he began inventing a wide variety of new, simple, and effective graphic displays under the rubric Exploratory Data Analysis (EDA), which was followed by Exploratory Spatial Data Analysis (ESDA). Tukey later wrote a book titled Exploratory Data Analysis, published in 1977. A number of graphical techniques are useful for EDA; they are listed as follows:

  • Box-and-whisker plot (box plot)
  • Histogram
  • Multivari chart (from candlestick charts)
  • Run-sequence plot
  • Pareto chart (named after Vilfredo Pareto)
  • Scatter plot
  • Multidimensional scaling
  • Targeted projection pursuit
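Two of the techniques in this list, the box plot and the histogram, rest on simple summaries that can be computed before anything is drawn. The following sketch computes a five-number summary, Tukey's fences (1.5 times the interquartile range beyond the quartiles) for flagging outliers, and histogram bin counts for a hypothetical sample.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the values a box plot draws.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data, excluding the middle value when the count is odd. This is one of
    several quartile conventions; plotting libraries may differ. Assumes at
    least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


sample = [2, 4, 4, 5, 6, 7, 8, 9, 12, 30]  # hypothetical measurements
low, q1, med, q3, high = five_number_summary(sample)
iqr = q3 - q1
# Tukey's fences: points beyond 1.5 * IQR from the quartiles are drawn individually.
outliers = [v for v in sample if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print((low, q1, med, q3, high))     # (2, 4, 6.5, 9, 30)
print(outliers)                     # [30]
print(histogram_counts(sample, 5))  # {0: 3, 5: 5, 10: 1, 30: 1}
```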

Visualization in scientific computing is emerging as an important computer-based field, with the goal of improving the understanding of data and enabling quick, real-time decisions. Today, the ability of medical doctors to diagnose ailments depends heavily on vision. For example, in hip-replacement surgeries, custom hips can now be fabricated before the surgical procedure. Accurate measurements can be made prior to surgery using non-invasive 3D imaging, thereby reducing the rate of post-operative body rejections from 30 percent to a mere 5 percent (source: http://bonesmart.org/hip/hip-implants-specialized-and-custom-fitted-options/).

Visualization of the human brain's structure and function in 3D is a research frontier of far-reaching importance. Few advances have transformed the fields of neuroscience and brain-imaging technology as much as the ability to see inside and read the brain of a living human. For continued progress in brain research, it will be necessary to integrate structural and functional information at many levels of abstraction.

The steady rise in hardware performance means that we are already able to analyze DNA sequences and represent them visually. Future advances in computing promise much greater progress in medicine and other scientific fields.

How does visualization help decision-making?

There are a variety of ways to represent data visually. However, only a few of them portray the data in a manner that lets one see something visually and observe new patterns. Data visualization is not as easy as it seems; it is an art and requires a great deal of practice and experience. (Just as in painting, one cannot be a master from day one; it takes a lot of practice.)

Human perception plays an important role in the field of data visualization. A pair of healthy human eyes has a total field of view of approximately 200 degrees horizontally (about 120 degrees of which is shared by both eyes). About one quarter of the human brain is involved in visual processing, more than for any other sense. Among the three senses of hearing, seeing, and smelling, vision is the dominant sense, measured to be sixty percent (http://contemplatingmadness.tumblr.com/post/27478393311/10-limits-to-human-perception-and-how-they-shape).

Effective visualization helps us in analyzing and understanding data. Author Stephen Few described the following eight types of quantitative messages (via visualization) that may help us with understanding or communicating from a set of data (source: https://www.perceptualedge.com/articles/ie/the_right_graph.pdf):

  • Time-series
  • Ranking
  • Part-to-whole
  • Deviation
  • Frequency distribution
  • Correlation
  • Nominal comparison
  • Geographic or geospatial

Scientists have mapped the human genome, and this is one reason why we face the challenge of transforming knowledge into a visual representation for better understanding. In other words, we may have to find new ways to present the human genome visually so that a layperson can understand it.

Where does visualization fit in?

It is important to note that data visualization is not scientific visualization. Scientific visualization deals with data that has an inherent physical structure, such as air molecules flowing over an aircraft wing. Information visualization, on the other hand, deals with abstract data and helps in solving problems involving large datasets. One of the challenges is to ensure that the data is clean and, subsequently, to reduce its dimensions so that unnecessary information is discarded.
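One of the simplest ways to discard dimensions that carry little information is a variance threshold; techniques such as principal component analysis go much further. The following sketch uses hypothetical records and an arbitrary threshold of 1.0.

```python
from statistics import pvariance

# Hypothetical records: "league" is constant and "height_cm" barely varies,
# so these two features add little information.
rows = [
    {"points": 1500, "assists": 410, "league": 1, "height_cm": 190.0},
    {"points": 1320, "assists": 150, "league": 1, "height_cm": 190.1},
    {"points": 1700, "assists": 600, "league": 1, "height_cm": 190.0},
]


def drop_low_variance(records, threshold):
    """Keep only the features whose population variance exceeds `threshold`."""
    features = list(records[0].keys())
    kept = [f for f in features if pvariance([r[f] for r in records]) > threshold]
    return [{f: r[f] for f in kept} for r in records], kept


reduced, kept = drop_low_variance(rows, threshold=1.0)
print(kept)  # ['points', 'assists']
```

Variance depends on the units of each feature, so in practice features are usually scaled before such a threshold is applied.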

Visualization can be used wherever it increases our knowledge of the data or the value we get from it. That can be determined by doing more data analysis and running the data through algorithms. The data analysis might range from the simplest form to a more complicated one.

Sometimes, there is value in looking at data beyond the mean, median, or total, because these measurements capture only things that may seem obvious. Sometimes, aggregates or values around a region hide interesting details that need special focus. One classic example is Anscombe's quartet, which comprises four datasets that have nearly identical simple statistical properties yet appear very different when graphed. For more on this, one can refer to https://en.wikipedia.org/wiki/Anscombe%27s_quartet.
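The effect is easy to verify numerically. The following sketch uses the standard published values of Anscombe's quartet and Python's `statistics` module; all four datasets come out with an x mean of 9, a y mean of about 7.50, a y sample variance of about 4.12 to 4.13, and a correlation of about 0.816, even though their scatter plots look completely different.

```python
from statistics import mean, variance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


for name, (xs, ys) in quartet.items():
    # Nearly identical summaries for four very differently shaped datasets.
    print(f"{name:>3}: mean x={mean(xs):.2f} mean y={mean(ys):.2f} "
          f"var y={variance(ys):.2f} r={pearson(xs, ys):.3f}")
```

Only plotting the four datasets reveals the differences: a linear trend, a curve, a line with one outlier, and a vertical cluster with one extreme point.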

[Figure: Where does visualization fit in?]

Datasets that lend themselves well to visualization can take different forms, but some paint a clearer picture than others. In some cases, they need to be analyzed several times to reach a much better understanding of the visualization, as shown in the preceding diagram.

A good visualization is not just a static picture that one can look at, like an exhibit in a museum. It is something that allows us to drill down and find out more about changes in the data. For example, view first, zoom and filter, change the values of some scale of display, and view the results incrementally, as described by Ben Shneiderman in http://www.mat.ucsb.edu/~g.legrady/academic/courses/11w259/schneiderman.pdf. Sometimes, it is much harder to display everything on a single display and on a single scale, and only through experience can one better understand these visualization methods. To summarize, visualization is useful both for organizing data and for making sense of it, particularly when data is abundant.

Interactive visualization is emerging as a new form of communication, which allows users to analyze the information in order to construct their own, new understanding of the data.

Data visualization today

While many areas of computing aim to replace human judgment with automation, visualization systems are unique and are explicitly designed not to replace humans. In fact, they are designed to keep the humans actively involved in the whole process; why is that?

Data visualization is an art, driven by data and yet created by humans with the help of various computing tools. An artist paints a picture using tools and materials like brushes and colors. Similarly, a data visualization artist creates visualizations with the help of computing tools. Visualization can be aesthetically pleasing and help in making things clear; sometimes, it may lack one or both of those qualities, depending on the users who create it.

Today, there are over thirty different visual representations of data, each with its own reason for representing data in that specific way. As visualization methods progress, we have much more than just bar graphs and pie charts. Despite its many benefits, data visualization is often undermined by a lack of understanding and, in some cases, by dashboards so cluttered that they become too cumbersome to use.

There are many ways to present data, but only a handful of those make sense in most cases; this will be explained in detail in later sections of this chapter. Before that discussion, let us take a look at a list of some important things that make a good visualization.

What is a good visualization?

Good visualization helps users explore and understand data, providing value and deep insights. It is effective, visually appealing, scalable, and easy to understand (a good visualization does not have to be complicated). Visualization is a central tool for finding patterns and trends in data during research and analysis, and for answering questions about the data.

The main principle behind an effective visualization is to identify the main point that you want to make, recognize the level and background of your audience, accurately represent the data, and then create a clear presentation that conveys the message to that audience.

Example: The following representations were created from a small sample data source. It shows the percentage of degrees conferred on women and men in ten different disciplines for the years 1970-2012 (womens-undergrad-degrees.csv and mens-undergrad-degrees.csv from http://www.knapdata.com/python/):

[Figure: percentage of degrees conferred on women and men in ten disciplines, 1970-2012]

The complete data set is available at http://nces.ed.gov/programs/digest/d11/tables/dt11_290.asp.

One simple approach is to plot all the disciplines on one scale, even though the numbers for different disciplines are unrelated to each other. Let us analyze whether this representation makes sense, and if it doesn't, what else we need. Are there other representations?

On the plus side, the data for all the disciplines is displayed on one screen, which makes comparison easy. However, there is no straightforward way to read off the information for the year 2000. Unless the display is interactive, like a financial stock chart, there is no easy way to find the degrees conferred in several disciplines for that year. Another confusing aspect of these plots is that the percentages across disciplines do not add up to 100 percent. On the other hand, the percentages of degrees conferred on men and women within one discipline do add up to 100 percent. For instance, in the Health Professions discipline, men and women received 15.2 percent and 84.8 percent of the degrees conferred, respectively.

Can we represent this data through other visualization methods? One option is a bubble chart for each year, with an interactive year selector and a play button that animates the bubbles from one year to the next.

This visualization suits the data we are looking at better. We can also add the same slider to the original plot and make it interactive by highlighting the data for the selected year. It is a good habit to visualize data in several different ways to see whether one display makes more sense than the others. If the numerical values span a very large range (for example, from 20 to 200,000), we may have to plot them on a logarithmic scale.
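Here is a minimal sketch of the logarithmic scale. The values are made up to span roughly 20 to 200,000, and the helper name plot_linear_and_log is hypothetical. It draws the same bars twice, once with matplotlib's default linear axis and once after `set_yscale("log")`.

```python
import matplotlib.pyplot as plt

# Hypothetical values spanning roughly 20 to 200,000
categories = ["A", "B", "C", "D", "E"]
values = [20, 450, 3_000, 48_000, 200_000]

def plot_linear_and_log(categories, values):
    """Draw the same bars on a linear and on a logarithmic y axis."""
    fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
    ax_lin.bar(categories, values, color="steelblue")
    ax_lin.set_title("Linear scale: small values are barely visible")
    ax_log.bar(categories, values, color="steelblue")
    ax_log.set_yscale("log")   # equal distances now represent equal ratios
    ax_log.set_title("Logarithmic scale")
    fig.tight_layout()
    return fig, ax_lin, ax_log

fig, ax_lin, ax_log = plot_linear_and_log(categories, values)
# call plt.show() to display the figure
```

On the linear axis the bar for 20 is practically invisible next to 200,000. On the log axis each bar's height reflects its order of magnitude.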

One can write a program in Python to create this bubble chart. Alternatives include JavaScript with D3.js and R with RStudio. Exploring these other visualization options is left to the reader.
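Here is a minimal sketch of the year-selection idea, using matplotlib's `Slider` widget (`matplotlib.widgets.Slider`) to switch a bubble chart between years. It uses randomly generated placeholder data rather than the degrees data. The helper name bubble_chart_with_slider is hypothetical, and so is the data layout (one array of [x, y, size] rows per year). This is not the book's code.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

# Placeholder data (random, for illustration only): for each year, one row
# per discipline with columns [x, y, bubble size].
rng = np.random.default_rng(0)
YEARS = list(range(1970, 2013))
N_DISCIPLINES = 5
data = {year: rng.uniform(10, 90, size=(N_DISCIPLINES, 3)) for year in YEARS}

def bubble_chart_with_slider(data, years):
    """Draw the bubbles for the first year and let a slider switch years."""
    fig, ax = plt.subplots(figsize=(8, 6))
    fig.subplots_adjust(bottom=0.2)   # leave room below the plot for the slider
    first = data[years[0]]
    scat = ax.scatter(first[:, 0], first[:, 1], s=first[:, 2] * 20, alpha=0.6)
    ax.set_xlim(0, 100)
    ax.set_ylim(0, 100)
    ax.set_title(f"Year {years[0]}")

    slider_ax = fig.add_axes([0.15, 0.05, 0.7, 0.04])
    slider = Slider(slider_ax, "Year", years[0], years[-1],
                    valinit=years[0], valstep=1)

    def update(_value):
        year = int(slider.val)
        values = data[year]
        scat.set_offsets(values[:, :2])    # move the bubbles
        scat.set_sizes(values[:, 2] * 20)  # resize the bubbles
        ax.set_title(f"Year {year}")
        fig.canvas.draw_idle()

    slider.on_changed(update)
    return fig, ax, scat, slider

fig, ax, scat, slider = bubble_chart_with_slider(data, YEARS)
# call plt.show() to interact with the chart
```

Keep a reference to the returned slider, because matplotlib widgets stop responding if they are garbage-collected. A play button could be approximated by calling `slider.set_val()` from `matplotlib.animation.FuncAnimation`. That part is left out of this sketch.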

Google's Motion Chart can also be used to build this kind of interactive chart. A working example similar to this bubble chart is available at developers.google.com/chart/interactive/docs/gallery/motionchart?csw=1#Example. The bubble chart shown here covers only three years, but you can create one for all the years.

[Figure: bubble chart of degrees conferred, shown for three years]

Data visualization is a process that is typically applied after data analysis. We also noted earlier that data transformation, data analysis, and data visualization are repeated several times. Why is that so? We all know the famous quote, "Knowledge is having the right answer; intelligence is asking the right question." Data analysis helps us understand the data better, so we are in a position to answer questions about it. However, representing the data visually in several different ways raises new questions. This is one of the reasons why analysis and visualization are repeated.

Visualization is one of the primary tools for data exploration, and it almost always precedes or inspires data analysis. There are many tools for displaying data visually, but fewer tools for analyzing it. Programming languages such as Julia, R, and Python rank higher for data analysis. For interactive visualization, the JavaScript-based D3.js has much greater potential.

Between R and Python, R is generally considered the more difficult language to learn, and Python the easier one. The question is debated on Quora (https://www.quora.com/Which-is-better-for-data-analysis-R-or-Python). Today there are numerous Python tools for statistical modeling and data analysis, which makes Python an attractive choice for data science.

Visualization plots

One of the reasons why we perform visualization is to confirm our knowledge of data. However, if the data is not well understood, you may not frame the right questions about the data.

When creating visualizations, the first step is to be clear about the question to be answered. In other words, how is the visualization going to help? A second challenge follows: choosing the right plotting method. Some visualization methods are as follows:

  • Bar graph and pie chart
  • Box plot
  • Bubble chart
  • Histogram
  • Kernel Density Estimation (KDE) plot
  • Line and surface plot
  • Network graph plot
  • Scatter plot
  • Tree map
  • Violin plot

In the course of identifying the message that the visualization should convey, it makes sense to look at the following questions:

  • How many variables are we dealing with, and what are we trying to plot?
  • What do the x axis and y axis refer to? (For 3D, z axis as well.)
  • Are the data sizes normalized and does the size of data points mean anything?
  • Are we using the right choices of colors?
  • For time series data, are we trying to identify a trend or a correlation?

If there are too many variables, it makes sense to draw multiple instances of the same plot on different subsets of data. This technique is called lattice or trellis plotting. It allows a viewer to quickly extract a large amount of information about complex data.
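Here is a minimal sketch of trellis plotting with plain matplotlib, using a small made-up DataFrame rather than the book's data. The same scatter plot is drawn once per subset, and the axes are shared so the panels can be compared directly. The function name trellis, the column names, and the group labels are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: hours of sleep and gpa for three made-up groups
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),
    "sleep": rng.normal(7, 1, 90),
    "gpa": rng.uniform(2.0, 4.0, 90),
})

def trellis(df, by, x, y):
    """Draw one scatter panel per value of `by`, with shared axes."""
    groups = list(df.groupby(by))
    fig, axes = plt.subplots(1, len(groups), figsize=(4 * len(groups), 4),
                             sharex=True, sharey=True, squeeze=False)
    for ax, (name, subset) in zip(axes[0], groups):
        ax.scatter(subset[x], subset[y], alpha=0.7)
        ax.set_title(f"{by} = {name}")
        ax.set_xlabel(x)
    axes[0][0].set_ylabel(y)   # shared y axis, so label only the first panel
    fig.tight_layout()
    return fig, axes

fig, axes = trellis(df, "group", "sleep", "gpa")
```

Shared axes are the key design choice. Without them, each panel would autoscale independently, and visual comparison between subsets would be misleading.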

Consider a subset of student data that has an unusual mixture of variables: (gender, sleep, tv, exercise, computer, gpa) and (height, momheight, dadheight). Computer, tv, sleep, and exercise are measured in hours, height is in inches, and gpa is on a 4.0 scale.

[Figure: a sample of the student data]

The preceding data is an example that has more variables than usual, and therefore, it makes sense to do a trellis plot to visualize and see the relationship between these variables.


Since gender takes only two values in this data, it can serve as the color of the points. That leaves 10 possible pairs from the first set of variables: (sleep, tv), (sleep, exercise), (sleep, computer), (sleep, gpa), (tv, exercise), (tv, computer), (tv, gpa), (exercise, computer), (exercise, gpa), and (computer, gpa). The second set adds two more, (height, momheight) and (height, dadheight). The following figure shows all of these combinations except (sleep, tv) and (tv, exercise).

[Figure: scatter plots of the variable pairs, colored by gender]

Our goal is to find which combinations of variables help make sense of this data, or whether any of these variables has a meaningful impact. Since the data is about students, gpa may be the key variable that determines the relevance of the others. The preceding scatter plots show that more female students than male students have a high gpa. They also show that the male students who spend more time on the computer get a similar range of gpa values. Although all the scatter plots are shown here, the intent is to find out which variables play a more significant role and what sense we can make of this data.
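The count of 10 pairs listed earlier comes from choosing 2 of the 5 variables in the first set, which gives C(5, 2) = 10. Here is a short sketch with Python's `itertools.combinations` that produces exactly those pairs in the same order. The second set is listed explicitly, because the text pairs only height with each parent's height.

```python
from itertools import combinations

first_set = ["sleep", "tv", "exercise", "computer", "gpa"]
pairs = list(combinations(first_set, 2))          # all 2-variable pairs

# The second set pairs the student's height with each parent's height
second_pairs = [("height", parent) for parent in ("momheight", "dadheight")]

# The pairs shown in the figure: all but (sleep, tv) and (tv, exercise)
omitted = {("sleep", "tv"), ("tv", "exercise")}
shown = [pair for pair in pairs if pair not in omitted]

print(len(pairs), len(second_pairs), len(shown))
```

seaborn's `pairplot` offers a related one-call route to a full scatter-plot matrix of several variables, with an optional `hue` column.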

[Figure: scatter plot of gpa and time spent on the computer, colored by gender]

The larger number of blue dots high up (with gpa on the y axis) shows that more female students have a higher gpa (this data was collected at UC Davis).

The data can be downloaded from http://www.knapdata.com/python/ucdavis.csv.

With the seaborn package, one can display a scatter plot in very few lines of code. The following example plots gpa along the x axis against the time students spend on the computer:

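import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# ucdavis.csv can be downloaded from http://www.knapdata.com/python/ucdavis.csv
students = pd.read_csv("ucdavis.csv")

# one color per gender; height is the height of each facet in inches
# (older seaborn releases called this parameter size)
g = sns.FacetGrid(students, hue="gender", palette="Set1", height=6)
g.map(plt.scatter, "gpa", "computer", s=250, linewidth=0.65,
      edgecolor="white")

g.add_legend()
plt.show()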


These plots were generated using the matplotlib, pandas, and seaborn library packages. Seaborn is a statistical data visualization library based on matplotlib, created by Michael Waskom from Stanford University. Further details about these libraries will be discussed in the following chapters.

The seaborn library has many useful classes. In particular, the FacetGrid class comes in handy when we need to visualize the distribution of a variable, or the relationship between several variables, separately within subsets of the data. A FacetGrid can be drawn with up to three dimensions: row, column, and hue. These library packages and their functions will be described in later chapters.


Bar graphs and pie charts

When do we choose bar graphs and pie charts? They are among the oldest visualization methods. A pie chart is best used to compare the parts of a whole, whereas bar graphs compare things across different groups to show patterns.

Bar graphs, histograms, and pie charts help us compare different data samples, categorize them, and determine the distribution of data values across a sample. Bar graphs come in several styles: single, multiple, and stacked.

Bar graphs

Bar graphs are especially effective when you have numerical data that splits nicely into different categories, so you can quickly see trends within your data.

Bar graphs are useful when comparing data across categories. Some notable examples include the following:

  • Volume of jeans in different sizes
  • World population change in the past two decades
  • Percent of spending by department

In addition to this, consider the following:

  • Add color to bars for more impact: Showing revenue performance with bars is informative, but adding color to reveal the profits adds visual insight. However, if there are too many bars, colors might make the graph look clumsy.
  • Include multiple bar charts on a dashboard: This helps the viewer to quickly compare related information instead of flipping through a bunch of spreadsheets or slides to answer a question.
  • Put bars on both sides of an axis: Plotting both positive and negative data points along a continuous axis is an effective way to spot trends.
  • Use stacked bars or side-by-side bars: Displaying related data on top of or next to each other gives depth to your analysis and addresses multiple questions at once.

These plots can be achieved with fewer than 12 lines of Python code, and more examples will be discussed in the later chapters.
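Here is a minimal sketch of two of the styles listed above, using made-up quarterly figures. Stacked bars come from matplotlib's `bottom=` argument, and bars on both sides of an axis are colored by their sign. The series names and values are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical quarterly figures, made up for illustration
quarters = ["Q1", "Q2", "Q3", "Q4"]
product_a = np.array([30, 45, 28, 50])
product_b = np.array([20, 15, 35, 25])
change = np.array([12, -8, 5, -15])   # e.g. change versus the previous year

fig, (ax_stack, ax_div) = plt.subplots(1, 2, figsize=(10, 4))

# Stacked bars: the second series starts where the first one ends
ax_stack.bar(quarters, product_a, label="Product A")
ax_stack.bar(quarters, product_b, bottom=product_a, label="Product B")
ax_stack.set_title("Stacked bars")
ax_stack.legend()

# Bars on both sides of the axis, colored by sign
colors = ["seagreen" if v >= 0 else "firebrick" for v in change]
ax_div.bar(quarters, change, color=colors)
ax_div.axhline(0, color="black", linewidth=0.8)   # the zero baseline
ax_div.set_title("Positive and negative values")

fig.tight_layout()
```

For side-by-side bars instead of stacked ones, shift the x positions of the second series by the bar width, as the Oscar example in the next section does.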

With bar graphs, each column represents a group defined by a specific category; with histograms, each column represents a group defined by a quantitative variable. With bar graphs, the x axis does not have a low-end or a high-end value, because the labels on the x axis are categorical and not quantitative. On the other hand, in a histogram, there is going to be a range of values. The following bar graph shows the statistics of Oscar winners and nominees in the US from 2000-2009:

Bar graphs

The following Python code uses matplotlib to display bar graphs for a small sample of movie data (the values may not be real, but they give an idea of how to plot and compare):

import numpy as np
import matplotlib.pyplot as plt

N = 7
winnersplot = (142.6, 125.3, 62.0, 81.0, 145.6, 319.4, 178.1)
nomineesplot = (109.4, 94.8, 60.7, 44.6, 116.9, 262.5, 102.0)

ind = np.arange(N)  # the x locations for the groups
width = 0.35        # the width of the bars

fig, ax = plt.subplots()
# bars are centered on their x positions (matplotlib's default align='center')
winners = ax.bar(ind, winnersplot, width, color='#ffad00')
nominees = ax.bar(ind + width, nomineesplot, width, color='#9b3c38')

# place each tick midway between the two bars of its group
ax.set_xticks(ind + width / 2)
ax.set_xticklabels(('Best Picture', 'Director', 'Best Actor',
                    'Best Actress', 'Editing', 'Visual Effects',
                    'Cinematography'), rotation=45, ha='right')

ax.legend((winners[0], nominees[0]),
          ('Academy Award Winners', 'Academy Award Nominees'))

def autolabel(rects):
    # attach a "$<height>M" text label above each bar
    for rect in rects:
        height = rect.get_height()
        hcap = "$" + str(height) + "M"
        ax.text(rect.get_x() + rect.get_width() / 2., height, hcap,
                ha='center', va='bottom', rotation='vertical')

autolabel(winners)
autolabel(nominees)

fig.tight_layout()
plt.show()
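The difference between bar graphs and histograms described before this example can also be shown in code. In this sketch, the scores and category counts are made up. `np.histogram` splits a quantitative variable into adjacent numeric ranges (bins), whereas the bar graph draws one bar per categorical label.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical quantitative data: 200 made-up exam scores
rng = np.random.default_rng(7)
scores = rng.normal(70, 10, 200)

# A histogram groups a quantitative variable into ranges (bins)...
counts, edges = np.histogram(scores, bins=8)

# ...whereas a bar graph has one bar per category label
categories = ["Drama", "Comedy", "Action"]
category_counts = [12, 7, 9]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
ax_hist.hist(scores, bins=edges, edgecolor="white")
ax_hist.set_title("Histogram: adjacent numeric ranges")
ax_bar.bar(categories, category_counts)
ax_bar.set_title("Bar graph: separate categories")
fig.tight_layout()

# Each histogram bar covers a range with a low end and a high end
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:6.1f} to {hi:6.1f}: {n}")
```

The printed ranges show why a histogram's x axis has low-end and high-end values, while the bar graph's x axis is just a list of labels.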

Pie charts

When it comes to pie charts, one should first answer two questions: "Do the parts make up a meaningful whole?" and "Is there enough space to represent them in a circular view?" Pie charts have many critics. One of the main reasons is that when there are numerous categories, it becomes very hard to judge the proportions and compare the categories to gain any insight (Source: https://www.quora.com/How-and-why-are-pie-charts-considered-evil-by-data-visualization-experts).

Pie charts are useful for showing proportions on a single space or across a map. Some notable examples include the following:

  • Response categories from a survey
  • Top five company market shares in a specific technology (in this case, one can quickly know which companies have a major share in the market)

In addition to this, consider the following:

  • Limit pie wedges to eight: If there are more than eight proportions to represent, consider a bar graph. Because space is limited, it is difficult to represent and interpret the pieces meaningfully.
  • Overlay pie charts on maps: Pie charts can be spread across a map to highlight geographical trends. (The wedges should be limited here too.)

Consider the following code for a simple pie chart that compares how admissions intake is distributed among several disciplines:

import matplotlib.pyplot as plt

# parentheses let the tuple of labels continue across several lines
labels = ('Computer Science', 'Foreign Languages',
          'Analytical Chemistry', 'Education', 'Humanities',
          'Physics', 'Biology', 'Math and Statistics', 'Engineering')

sizes = [21, 4, 7, 7, 8, 9, 10, 15, 19]   # relative wedge sizes (they sum to 100)
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral',
          'red', 'purple', '#f280de', 'orange', 'green']
explode = (0, 0, 0, 0, 0, 0, 0, 0, 0.1)   # pull the Engineering wedge out slightly

plt.pie(sizes, explode=explode, labels=labels,
        autopct='%1.1f%%', colors=colors)
plt.axis('equal')   # draw the pie as a circle
plt.show()

The following pie chart shows the university admission intake in some selected top study areas:

[Figure: pie chart of university admission intake by discipline]
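The admissions example above has nine wedges, one more than the eight-wedge guideline suggested earlier. Here is a minimal sketch of one common remedy: keep the largest slices and merge the rest into an "Other" wedge. It reuses the admissions figures from the example. The helper name limit_wedges is hypothetical, and merging is only one option (switching to a bar graph is another).

```python
import matplotlib.pyplot as plt

def limit_wedges(labels, sizes, max_wedges=8):
    """Keep the largest (max_wedges - 1) slices and merge the rest into 'Other'."""
    if len(sizes) <= max_wedges:
        return list(labels), list(sizes)
    # stable sort by size, largest first (ties keep their original order)
    ranked = sorted(zip(labels, sizes), key=lambda item: item[1], reverse=True)
    keep, rest = ranked[:max_wedges - 1], ranked[max_wedges - 1:]
    new_labels = [label for label, _ in keep] + ["Other"]
    new_sizes = [size for _, size in keep] + [sum(size for _, size in rest)]
    return new_labels, new_sizes

labels = ('Computer Science', 'Foreign Languages', 'Analytical Chemistry',
          'Education', 'Humanities', 'Physics', 'Biology',
          'Math and Statistics', 'Engineering')
sizes = [21, 4, 7, 7, 8, 9, 10, 15, 19]

new_labels, new_sizes = limit_wedges(labels, sizes)

fig, ax = plt.subplots()
ax.pie(new_sizes, labels=new_labels, autopct='%1.1f%%')
ax.axis('equal')   # draw the pie as a circle
```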

Box plots

Box plots are also known as box-and-whisker plots. They are a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. The following diagram shows how to read a box plot:

[Figure: how to read a box plot]

A box plot is a quick way of examining one or more sets of data graphically, and it shows all five summary values at once in very little space. For example, if the same exam is given to two or more classes, box plots can show whether most students in one class did better than most students in another. As another example, consider data on how many burgers people eat. If more people eat burgers, the median will be higher, or the top whisker may be longer than the bottom one. In such cases, a box plot gives a good overview of the data distribution.

Before discussing when to use box plots, we need one definition. An outlier in a collection of data values is an observation that lies at an abnormal distance from the other values.
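Here is a small sketch that computes the five-number summary and flags outliers. It assumes Tukey's rule, a common convention: points more than 1.5 times the interquartile range (IQR) beyond the quartiles are treated as outliers. matplotlib's `boxplot` uses the same 1.5 factor by default (`whis=1.5`), so its whiskers stop at the last non-outlier point rather than at the minimum and maximum. The data and function names are hypothetical. Quartile conventions also differ slightly between tools, and this sketch uses NumPy's default interpolation.

```python
import numpy as np

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using NumPy's default quartiles."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return values.min(), q1, median, q3, values.max()

def iqr_outliers(values, k=1.5):
    """Return the points lying more than k * IQR below Q1 or above Q3."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return values[(values < low) | (values > high)]

# Hypothetical data with one value far from the rest
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(five_number_summary(data))
print(iqr_outliers(data))   # 30 lies beyond Q3 + 1.5 * IQR
```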

Box plots are most useful in showing the distribution of a set of data. Some notable examples are as follows:

  • Identifying outliers in the data
  • Determining how the data is skewed towards either end

In addition to this, consider the following:

  • Hide the points within the box: Focus on the outliers
  • Compare across distributions: Box plots make it quick to compare distributions across data sets
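Here is a sketch of the two-class exam comparison mentioned above, with made-up scores. Drawing both box plots on one axis makes the medians and spreads easy to compare. Only the outliers are drawn as individual points, and the points inside the box are summarized rather than plotted. Tick labels are set with `set_xticks`/`set_xticklabels` so the code does not depend on the name of `boxplot`'s label argument, which has changed across matplotlib versions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical exam scores for two classes that took the same exam
rng = np.random.default_rng(3)
class_a = rng.normal(72, 8, 40)
class_b = rng.normal(65, 12, 40)

fig, ax = plt.subplots()
# showfliers=True draws outliers as individual points beyond the whiskers
parts = ax.boxplot([class_a, class_b], showfliers=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["Class A", "Class B"])
ax.set_ylabel("Exam score")
```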

Scatter plots and bubble charts

A scatter plot displays the relationship between two variables measured on the same set of individuals. The pattern of the plotted points can reveal how the variables are related. A bubble chart, on the other hand, displays three dimensions of data. Each entity with its triplet (a, b, c) of associated data is plotted as a disk. Two of the variables determine the disk's xy location, and its size shows the third, the quantity measured for significance.

Scatter plots

The data is usually displayed as a collection of points, and scatter plots are often used to examine various kinds of correlation. For instance, a positive correlation appears when an increase in one variable goes together with an increase in the other. The student data shown earlier includes various scatter plots that show the correlations among its variables.
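A scatter plot shows a correlation visually, and the Pearson correlation coefficient r puts a number on the linear part of it. r lies between -1 and +1, and values near +1 indicate a strong positive linear relationship. Here is a minimal sketch with NumPy's `np.corrcoef`, using a handful of made-up paired heights in inches (not the student data).

```python
import numpy as np

# Hypothetical paired measurements in inches, made up for illustration
mom_height = np.array([60, 62, 63, 65, 66, 68])
height = np.array([62, 63, 66, 66, 69, 70])

# corrcoef returns the 2 x 2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(mom_height, height)[0, 1]
print(f"Pearson r = {r:.2f}")
```

Keep in mind that r only captures linear association. A strong curved pattern can still produce a small r, which is one more reason to look at the scatter plot itself.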

In the following example, we compare the heights of students with the heights of their mothers to determine whether there is any positive correlation. The data can be downloaded from http://www.knapdata.com/python/ucdavis.csv.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# ucdavis.csv can be downloaded from http://www.knapdata.com/python/ucdavis.csv
students = pd.read_csv("ucdavis.csv")

# a single facet; height is the facet height in inches
# (older seaborn releases called this parameter size)
g = sns.FacetGrid(students, height=7)
g.map(plt.scatter, "momheight", "height", s=140, linewidth=.7,
      edgecolor="#ffad40", color="#ff8000")
g.set_axis_labels("Mother's Height", "Student's Height")
plt.show()

We demonstrate this example with the seaborn package, but one can also do it with matplotlib alone, as shown in the following section. The scatter plot produced by the preceding code is as follows:

[Figure: scatter plot of students' heights against their mothers' heights]

Scatter plots are most useful for investigating the relationship between two different variables. Some notable examples are as follows:

  • The likelihood of having skin cancer at different ages in males versus females
  • The correlation between the IQ test score and GPA

In addition to this, consider the following:

  • Add a trend line or line of best-fit (if the relation is linear): Adding a trend line can show the correlation among the data values
  • Use informative mark types: Informative mark types should be used if the story to be revealed is about data that can be visually enhanced with relevant shapes and colors
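Here is a minimal sketch of adding a line of best fit, using `np.polyfit` with `deg=1` for an ordinary least-squares straight line. The data is generated around a made-up line (slope 2, intercept 1) plus noise. For `deg=1`, `polyfit` returns the coefficients highest power first, that is, the slope and then the intercept.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data with a roughly linear relationship
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, x.size)

# Least-squares line of best fit: returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7, label="data")
xs = np.array([x.min(), x.max()])
ax.plot(xs, slope * xs + intercept, color="red",
        label=f"y = {slope:.2f}x + {intercept:.2f}")
ax.legend()
```

As the tip says, a straight trend line only makes sense when the relationship is roughly linear. Otherwise a higher `deg` or a different model is needed.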

Bubble charts

The following example shows how a color map can serve as a third dimension, which may indicate the volume of sales or any other appropriate indicator that drives profit:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="whitegrid")
# the CSV must contain ProductionCost and WorldGross columns
mov = pd.read_csv("2014_gross.csv")

x = mov.ProductionCost
y = mov.WorldGross
z = mov.WorldGross          # drives both bubble size and color

# plt.get_cmap replaces plt.cm.get_cmap, which was deprecated and later
# removed from matplotlib
cm = plt.get_cmap('RdYlBu')
fig, ax = plt.subplots(figsize=(12, 10))

sc = ax.scatter(x, y, s=z*3, c=z, cmap=cm, linewidth=0.2, alpha=0.5)
ax.grid()
fig.colorbar(sc)

ax.set_xlabel('Production Cost', fontsize=14)
ax.set_ylabel('Gross Profits', fontsize=14)

plt.show()

The following scatter plot is the result of the example that uses a color map:

[Figure: production cost versus world gross, with bubble size and color showing world gross]

Bubble charts are extremely useful for comparing relationships among data in three numeric dimensions: the x axis data, the y axis data, and the data represented by the bubble size. Bubble charts are like XY scatter plots, except that each point has an additional data value associated with it. That value is represented by the size of the circle, or "bubble", centered on the XY point. Another example of a bubble chart is shown here (without the Python code, to demonstrate a different style):

[Figure: bubble chart of life expectancy versus GDP per capita by continent]

In the preceding display, the bubble chart shows life expectancy versus Gross Domestic Product (GDP) per capita across different continents.

Bubble charts are most useful for showing the concentration of data along two axes with a third data element being the significance value measured. Some notable examples are as follows:

  • The production cost of movies and the gross profit made, with the significance measured on a color scale, as shown in the example

In addition to this, consider the following:

  • Adding color and shape significance: By varying the size and color, the data points can be transformed into a visualization that clearly answers some questions
  • Make it interactive: If there are too many data points, bubble charts could get cluttered, so group them on the time axis or categories, and visualize them interactively
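One detail matters when bubble size carries data: matplotlib's `scatter` interprets `s` as the marker area in points squared. Passing values proportional to the data therefore makes each bubble's area, not its radius, proportional to its value. Here is a minimal sketch of a helper that also caps the largest bubble, which helps against the clutter mentioned above. The name bubble_areas, the max_area default, and the data are hypothetical choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def bubble_areas(values, max_area=2000.0):
    """Map non-negative values to marker areas (points^2) so that area is
    proportional to value and the largest bubble has area max_area."""
    values = np.asarray(values, dtype=float)
    return max_area * values / values.max()

# Hypothetical data: x, y, and a third quantity shown by size and color
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 1.0, 4.0, 2.0])
volume = np.array([10.0, 40.0, 20.0, 80.0])

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=bubble_areas(volume), c=volume, cmap="viridis",
                alpha=0.6)
fig.colorbar(sc, label="volume")
```

Scaling the radius by the value instead would exaggerate large values, because the perceived area would then grow with the square of the value.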

KDE plots

Kernel Density Estimation (KDE) is a non-parametric way to estimate the probability density function of a variable. A smooth kernel is placed on each observed data point, and these kernels are averaged to create a smooth approximation. KDE is closely related to histograms, but it adds smoothness and continuity through a function called the kernel.
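For reference, the standard textbook form of this estimator is shown below (the notation is ours, not the book's). It assumes observations x_1, ..., x_n, a kernel K, and a bandwidth h > 0. The Gaussian kernel is shown as one common choice of K.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}
```

The bandwidth h controls the smoothness. A small h follows individual points closely, while a large h smooths away detail.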

The kernel of a Probability Density Function (PDF) is the form of the PDF in which any factors that are not functions of any of the variables in the domain are omitted. We will focus only on the visualization aspect of it; for more theory, one may refer to books on statistics.
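Here is a minimal NumPy sketch that makes the averaging of kernels concrete for a one-dimensional Gaussian KDE. The function name gaussian_kde_manual is hypothetical. When no bandwidth is given, the sketch uses Scott's rule (sample standard deviation times n^(-1/5)). We believe this mirrors the default of `scipy.stats.gaussian_kde` in one dimension, but treat that as an assumption. The two-component sample imitates the SciPy example later in this section.

```python
import numpy as np

def gaussian_kde_manual(samples, grid, bandwidth=None):
    """Evaluate a one-dimensional Gaussian kernel density estimate on `grid`.

    If no bandwidth is given, Scott's rule is used:
    h = sample standard deviation * n ** (-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # u[j, i] = (grid[j] - samples[i]) / h for every grid point and sample
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel K(u)
    # sum the kernels over the samples and divide by n * h
    return kernels.sum(axis=1) / (n * bandwidth)

# Two normal components, similar to the SciPy example in this section
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-1.0, 1.0, 320),
                          rng.normal(2.0, 0.6, 320)])
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_manual(samples, grid)

# A density should integrate to about 1; approximate it with a Riemann sum
area = density.sum() * (grid[1] - grid[0])
print(f"approximate area under the curve: {area:.3f}")
```

The (grid points x samples) matrix keeps the code short, but its memory use grows with both sizes. Library implementations such as SciPy's are preferable for large data sets.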

Several Python libraries can produce KDE plots at various depths and levels, including matplotlib, SciPy, scikit-learn, and seaborn. Following are two examples of KDE plots. Later chapters include more examples, wherever necessary, that demonstrate other ways of displaying KDE plots.

In the following example, we use a random dataset of size 250 and the seaborn package to show the distribution plot in a few simple lines:

[Figure: distribution plot of 250 random samples, drawn with seaborn]

One can display the distribution of data with seaborn, as demonstrated here with a random sample generated by numpy.random:

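from numpy.random import randn
import matplotlib as mpl
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_palette("hls")
mpl.rc("figure", figsize=(10, 6))
data = randn(250)   # 250 samples from the standard normal distribution
plt.title("KDE Demonstration using Seaborn and Matplotlib", fontsize=20)
# a density-scaled histogram plus a KDE curve; sns.distplot, used in older
# versions of this example, is deprecated in recent seaborn releases
sns.histplot(data, kde=True, stat="density", color='#ff8000')
plt.show()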

In the second example, we demonstrate the probability density function using SciPy and NumPy. First, we use norm.rvs() from SciPy to draw samples from two normal distributions. Then we use hstack() from NumPy to stack them into one sample, and finally we apply gaussian_kde() from SciPy.

[Figure: KDE curve of a two-component sample over its histogram, computed with SciPy and NumPy]

The code that produces the preceding KDE plot with SciPy and NumPy is as follows:

from scipy.stats import gaussian_kde, norm
from numpy import linspace, hstack
import matplotlib.pyplot as plt

# two normal samples with different centers and spreads
sample1 = norm.rvs(loc=-1.0, scale=1, size=320)
sample2 = norm.rvs(loc=2.0, scale=0.6, size=320)
sample = hstack([sample1, sample2])

# fit a Gaussian KDE to the combined sample
probDensityFun = gaussian_kde(sample)

plt.title("KDE Demonstration using Scipy and Numpy", fontsize=20)
x = linspace(-5, 5, 200)
plt.plot(x, probDensityFun(x), 'r')
# density=True scales the histogram to the same units as the KDE
# (the older normed argument has been removed from matplotlib)
plt.hist(sample, density=True, alpha=0.45, color='purple')
plt.show()

The other visualization methods such as the line and surface plot, network graph plot, tree maps, heat maps, radar or spider chart, and the violin plot will be discussed in the next few chapters.

Summary

The examples shown so far are meant to give you an idea of how one should think and plan before making a presentation. The most important stage is becoming familiar with the data and preparing it for visualization. Whether to get the data first or to shape the desired story first depends mainly on the outcome one is trying to achieve. It is a "chicken and egg" situation: does the data come first, or the focus? Initially, it may not be clear what data one needs. In most cases, though, things become clear after a few iterations, as long as there are no errors in the data.

Improve the quality of the data by cleaning it up, reducing its dimensions (if required), and filling any gaps. Unless the data is good, the effort one puts into presenting it visually will be wasted. Once the data is reasonably well understood, it makes sense to decide what kind of visualization is appropriate. In some cases, it is better to display the data in several different ways to see the story clearly.
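Here is a minimal pandas sketch of the cleanup step. It drops an exact duplicate row, sorts by year, and fills a missing value by linear interpolation. The data and the function name clean are hypothetical, and interpolation is only one way to fill a gap. Whether it is appropriate depends on what the values mean.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a duplicated row and a missing value
raw = pd.DataFrame({
    "year": [2000, 2001, 2001, 2002, 2003, 2004],
    "value": [10.0, 12.0, 12.0, np.nan, 18.0, 20.0],
})

def clean(df):
    """Drop exact duplicates, sort by year, and interpolate interior gaps."""
    out = df.drop_duplicates().sort_values("year").reset_index(drop=True)
    out["value"] = out["value"].interpolate(method="linear")
    return out

cleaned = clean(raw)
print(cleaned)
```

Here the missing 2002 value is filled halfway between its neighbors (12.0 and 18.0), which assumes the quantity changes smoothly between years.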


Description

Python has a handful of open source libraries for numerical computations involving optimization, linear algebra, integration, interpolation, and other special functions using array objects, as well as for machine learning, data mining, and plotting. pandas provides a productive environment for data analysis. These libraries have specific purposes and play an important role in research in diverse domains, including economics, finance, biological sciences, social science, health care, and many more. The variety of tools and approaches available within the Python community is stunning, and it can bolster and enhance visual storytelling. This book offers practical guidance to help you on the journey to effective data visualization. It begins with a chapter on the data framework, which explains how data is transformed into information and eventually knowledge. It then covers the complete visualization process using the most popular Python libraries, with working examples. You will learn how to use NumPy, SciPy, IPython, matplotlib, pandas, Patsy, and scikit-learn, with a focus on generating results that can be visualized in many different ways. Further chapters show advanced techniques such as interactive plotting; numerical, graphical linear, and non-linear regression; and clustering and classification. They also help you understand the aesthetics and best practices of data visualization. The book concludes with interesting examples such as social networks, real-life directed graph examples, data structures appropriate for these problems, and network analysis. By the end of this book, you will be able to effectively solve a broad set of data analysis problems.

What you will learn

  • Gather, cleanse, access, and map data to a visual framework
  • Recognize which visualization method is applicable and learn best practices for data visualization
  • Get acquainted with reader-driven narratives and author-driven narratives and the principles of perception
  • Understand why Python is an effective tool to be used for numerical computation much like MATLAB, and explore some interesting data structures that come with it
  • Explore with various visualization choices how Python can be very useful in computation in the field of finance and statistics
  • Get to know why Python is the second choice after Java, and is used frequently in the field of machine learning
  • Compare Python with other visualization approaches using Julia and a JavaScript-based framework such as D3.js
  • Discover how Python can be used in conjunction with NoSQL such as Hive to produce results efficiently in a distributed environment

Product Details

Publication date: Oct 27, 2015
Length: 372 pages
Edition: 1st
Language: English
ISBN-13: 9781783988327
Vendor: Anaconda


Table of Contents

10 Chapters
1. A Conceptual Framework for Data Visualization
2. Data Analysis and Visualization
3. Getting Started with the Python IDE
4. Numerical Computing and Interactive Plotting
5. Financial and Statistical Models
6. Statistical and Machine Learning
7. Bioinformatics, Genetics, and Network Models
8. Advanced Visualization
A. Go Forth and Explore Visualization
Index

Customer reviews

Rating distribution
4.5 out of 5 (4 ratings)
5 star 50%
4 star 50%
3 star 0%
2 star 0%
1 star 0%
Math Review Nov 17, 2015
5 out of 5
I have not completely read the book yet, but I liked the examples on sports and the Monte Carlo simulations.
Amazon Verified review
Oleg Okun Dec 06, 2015
5 out of 5
As its title says, this book is about exploring data visualization in Python. The author approaches this task not only by featuring the Python functionality available for visualizing data, but also by putting it into the context of "visualization of information for knowledge inference". In his words, this means not visualization per se, but visualization aimed at knowledge discovery, which is an integral part of data science projects. With this goal in mind, the author guides readers through a number of real-world stories (taken from finance, sports, bioinformatics, and natural language processing) accompanied by plots of various kinds.
Anaconda is chosen as the Python distribution because it includes many pre-installed packages. Since any knowledge discovery involves data processing and analysis, the discussion also covers numpy, scipy, matplotlib, scikit-learn, NetworkX, bokeh, IPython, plotly, and a few other, less commonly known packages, as well as their application to the book's main topic.
The book presumes some background knowledge of Python and is therefore best suited to readers who have some exposure to Python programming but want to acquire data visualization skills, typically data analysts or data engineers.
Amazon Verified review
yoalieh Dec 09, 2015
4 out of 5
I liked this book, though it is not an easy book to love.
I liked the introduction a lot, as the author talks about data visualization as a discipline and gives some tips and ideas for different kinds of visualizations (there is a lot more than bar graphs and scatterplots, it seems ;) ). It tries to be discipline-agnostic by using many real-life examples from many disciplines. I think this can provide inspiration when you need a way to present information that is hard to explain.
After that, turning to Python, it gives an overview of Python versions and of the libraries that can simplify the process of creating good visualizations. Finally, almost all examples are based on Conda, but other tools are used as well. This can cause a bit of confusion, but I see it as one of this book's strengths: it can be used as a reference for creating good visualizations in different workflows, and for finding which library to use for a particular kind of visualization when another one does not cover it.
The examples in later chapters are very good, and I loved the parts about NumPy, simulation, and advanced data structures, all of which can be used to create better visualizations, as well as the part about drawing graphs.
Don't expect this book to be a cookbook; it's more like the big notebook of a professional in charge of creating a LOT of visualizations for different fields. I think it lacks a bit more explanation of some specific examples or libraries, but that would require many more books to fit. Also, a very good understanding of Python, and the documentation for each library in use, are not only recommended but a must.
Amazon Verified review
Amazon Customer Dec 12, 2015
4 out of 5
[Disclaimer: Packt Publishing asked me to review the book in light of my GitHub public profile. I was given complete editorial freedom, however, and was NOT compensated in any way for the review.]
Overall, I enjoyed this book, although I suspect its real value will become apparent when I return to it over the next few years when faced with visualising tricky datasets. Broadly, Kirthi Raman covers three areas: visualisation as an activity in itself (he considers it a form of storytelling), several Python tools for visualisation, and analytic techniques that can drive the visualisation/modelling process. I particularly like that a plethora of approaches are encouraged, so that if you find one isn’t suited to what you’re doing, there are always plenty of others to consider. As someone who uses Python daily to both model and visualise a variety of data sources, I consider Raman's book an important addition to my professional library.
Where I find the book lacking is in providing a clear path to applying the array of techniques and packages suggested. To be clear, there are good code examples for almost every visualisation/analytic technique (the financial models are particularly well explained), but I would have liked more explanation and worked examples of going from a raw dataset to a professional visualisation.
Another minor criticism is that it is quite ambitious in its scope (there are whole journals devoted to some of the modelling techniques covered in a few pages), but by making readers aware of these approaches, it lets them read further on their own.
To end on a practical note, I like that the publisher makes the book available in multiple formats, including Kindle and DRM-free PDF. This is very practical for reading (and using) the book across multiple devices. I would recommend a colour display, though, to enjoy the full effect of the many visualisation examples.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use toward owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. There you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy DRM-free books, the same way you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking the ‘My Library’ dropdown and selecting ‘Credits’.

What happens if an Early Access course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can make it at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and the more chapters are delivered, the more accurate the estimate becomes.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber. Do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way for us to get our content to you sooner, but the method of buying an Early Access course is still the same. Just find the course you want to buy, go through the checkout steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access, you'll receive each chapter as it's written and get regular updates throughout the product's development, as well as the final course as soon as it's ready.
We created Early Access to give you the information you need as soon as it's available. As we develop a course, 99% of it can be ready, but we can't publish until that last 1% falls into place. Early Access helps unlock the potential of our content early, so you can start learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a Packt member, you'll also be eligible for our exclusive offers, including a free course every day and discounts on new and popular titles.