Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Mastering Numerical Computing with NumPy
Mastering Numerical Computing with NumPy

Mastering Numerical Computing with NumPy: Master scientific computing and perform complex operations with ease

Arrow left icon
Profile Icon Mert Cakmak Profile Icon Tiago Antao Profile Icon Cuhadaroglu
Arrow right icon
Free Trial
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (1 Ratings)
Paperback Jun 2018 248 pages 1st Edition
eBook
Mex$447.98 Mex$639.99
Paperback
Mex$799.99
Subscription
Free Trial
Arrow left icon
Profile Icon Mert Cakmak Profile Icon Tiago Antao Profile Icon Cuhadaroglu
Arrow right icon
Free Trial
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (1 Ratings)
Paperback Jun 2018 248 pages 1st Edition
eBook
Mex$447.98 Mex$639.99
Paperback
Mex$799.99
Subscription
Free Trial
eBook
Mex$447.98 Mex$639.99
Paperback
Mex$799.99
Subscription
Free Trial

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Mastering Numerical Computing with NumPy

Working with NumPy Arrays

Scientific computing is a multidisciplinary field, with its applications spanning across disciplines such as numerical analysis, computational finance, and bioinformatics.

Let's consider a case for financial markets. When you think about financial markets, there is a huge interconnected web of interactions. Governments, banks, investment funds, insurance companies, pensions, individual investors, and others are involved in this exchange of financial instruments. You can't simply model all the interactions between market participants because everyone who is involved in financial transactions has different motives and different risk/return objectives. There are also other factors which affect the prices of financial assets. Even modeling one asset price requires you to do a tremendous amount of work, and your success is not guaranteed. In mathematical terms, this doesn't have a closed-form solution and this makes a great case for utilizing scientific computing where you can use advanced computing techniques to attack such problems.

By writing computer programs, you will have the power to better understand the system you are working on. Usually, the computer program you will be writing will be some sort of simulation, such as the Monte Carlo simulation. By using a simulation such as Monte Carlo, you can model the price of option contracts. Pricing financial assets is a good material for simulations, simply because of the complexity of financial markets. All of these mathematical computations need a powerful, scalable and convenient structure for your data (which is mostly in matrix form) when you do your computation. In other words, you need a more compact structure than a list in order to simplify your task. NumPy is a perfect candidate for performant vector/matrix operations and its extensive library of mathematical operations makes numeric computing easy and efficient.

In this chapter, we will cover the following topics:

  • The importance of NumPy
  • Theoretical and practical information about vectors and matrices
  • NumPy array operations and their usage in multidimensional arrays

The question is, where should we start practicing coding skills? In this book, you will be using Python because of its huge adoption in the scientific community, and you will mainly work with a specific library called NumPy, which stands for numerical Python.

Technical requirements

In this book, we will use Jupyter Notebooks. We will edit and run Python code via a web browser. It's an open source platform which you can install by following the instructions in this link: http://jupyter.org/install.

This book will be using Python 3.x, so when you open a new notebook, you should pick Python 3 kernel. Alternatively, you can install Jupyter Notebook using Anaconda (Python version 3.6), which is highly recommended. You can install it by following the instructions in this link: https://www.anaconda.com/download/.

Why do we need NumPy?

Python has become a rockstar programming language recently, not only because it has friendly syntax and readability, but because it can be used for a variety of purposes. Python's ecosystem of various libraries makes various computations relatively easy for programmers. Stack Overflow is one the most popular websites for programmers. Users can ask questions by tagging which programming language they relate to. The following figure shows the growth of major programming languages by calculating these tags and plot the popularity of major programming languages over the years. The research conducted by Stack Overflow can be further analyzed via this link to their official blog: https://stackoverflow.blog/2017/09/06/incredible-growth-python/:

Growth of major programming languages

NumPy is the most fundamental package for scientific computing in Python and is the base for many other packages. Since Python was not initially designed for numerical computing, this need has arised in the late 90's when Python started to become popular among engineers and programmers who needed faster vector operations. As you can see from the following figure, many popular machine learning and computational packages use some of NumPy's features, and the most important thing is that they use NumPy arrays heavily in their methods, which makes NumPy an essential library for scientific projects.

The figure shows some well-known libraries which use NumPy features:

NumPy stack

For numerical computing, you mainly work with vectors and matrices. You can manipulate them in different ways by using a range of mathematical functions. NumPy is a perfect fit for these kinds of situations since it allows users to have their computations completed efficiently. Even though Python lists are very easy to create and manipulate, they don't support vectorized operations. Python doesn't have fixed type elements in lists and for example, for loop is not very efficient because, at every iteration, data type needs to be checked. In NumPy arrays, however, the data type is fixed and also supports vectorized operations. NumPy is not just more efficient in multidimensional array operations comparing to Python lists; it also provides many mathematical methods that you can apply as soon as it's imported. NumPy is a core library for the scientific Python data science stack.

SciPy has strong relationship with NumPy as it's using NumPy multidimensional arrays as a base data structure for its scientific functions for linear algebra, optimization. interpolation, integration, FFT, signal and image processing and others. SciPy was built on top of the NumPy array framework and uplifted scientific programming with its advanced mathematical functions. Therefore some parts of the NumPy API have been moved to SciPy. This relationship with NumPy makes SciPy more convenient for advanced scientific computing in many cases.

To sum this up, we can summarize NumPy's advantages as follows:

  • It's open source and zero-cost
  • It's a high-level programming language with user-friendly syntax
  • It's more efficient than Python lists
  • It has more advanced built-in functions and is well-integrated with other libraries

Who uses NumPy?

In both academic and business circles, you will hear people talking about the tools and technologies they use in their work. Depending on the environment and conditions, you might need to work with specific technologies. For example, if your company has already invested in SAS, you will need to carry out your project in the SAS development environment suited to your problem.

However, one of the advantages of NumPy is that it's open source, and it costs nothing for you to utilize it in your project. If you have already coded in Python, it's super easy to learn. If performance is your concern, you can easily embed C or Fortran code. Moreover, it will introduce you to a whole other set of libraries such as SciPy and Scikit-learn, which you can use to solve almost any problem.

Since data mining and predictive analytics became really important recently, roles like Data Scientist and Data Analyst are mentioned as the hottest jobs of the 21st century in many business journals such as Forbes, Bloomberg, and so on. People who need to work with data and do analysis, modeling, or forecasting should become familiar with NumPy's usage and its capabilities, as it will help you quickly prototype and test your ideas. If you are a working professional, your firm most probably wants to use data analysis methods in order to move one step ahead of its competitors. If they can better understand the data they have, they can understand the business better, and this will lead them to make better decisions. NumPy plays a critical role here as it is capable of performing wide range of operations and making your projects timewise efficient.

Introduction to vectors and matrices

A matrix is a group of numbers or elements which are arranged as a rectangular array. The matrix's rows and columns are usually indexed by letter. For a n x m matrix, n represents the number of rows and m represents the number of columns. If we have a hypothetical n x m matrix, it will be structured as follows:

If n = m, then it is called a square matrix:

A vector is actually a matrix with one row or one column with more than one element. It can also be defined as the 1-by-m or n-by-1 matrix. You can interpret a vector as an arrow or direction in an m dimensional space. Generally, the capital letter denotes a matrix, like X in this example, and lowercase letters with subscripts like X11 denote the element of the matrix X.

In addition, there are some important special matrices: the zero matrix (null matrix) and the identity matrix. 0 denotes the zero matrix, which is a matrix of all 0s (MacDufee 1943 p.27). In a 0 matrix, it's optional to add subscripts:

The identity matrix denoted by I, and its diagonal elements are 1 while the others are 0:

When you multiply a matrix X with the identity matrix, the result will be equal to X:

An identity matrix is very useful for calculating the inverse of a matrix. When you multiply any given matrix with its inverse, the result will be an identity matrix:

Let's briefly see the matrix algebra on NumPy arrays. Addition and subtraction operations for matrices are similar to math equations with ordinary single numbers. As an example:

Scalar multiplication is also pretty straightforward. As an example, if you multiply your matrix X by 4, the only thing that you should do is multiply each element with the value 4 as follows:

The seemingly complicated part of matrix manipulation at the beginning is matrix multiplication.

Imagine you have two matrices as X and Y, where X is an matrix and Y is an matrix:

The product of these two matrices will be as follows:

So each element of the product matrix is calculated as follows:

Don't worry if you didn't understand the notation. The following example will make things clearer. You have matrices X and Y and the goal is to get the matrix product of these matrices:

The basic idea is that the product of the ith row of X and the jth of column Y will become the ith, jth element of the matrix in the result. Multiplication will start with the first row of X and the first column of Y, so their product will be Z[1,1]:

You can cross-check the results easily with the following four lines of code:

In [1]: import numpy as np
x = np.array([[1,0,4],[3,3,1]])
y = np.array([[2,5],[1,1],[3,2]])
x.dot(y)
Out[1]: array([[14, 13],[12, 20]])

The previous code block is just a demonstration of how easy to calculate the dot product of two matrices by use of NumPy. In later chapters, we will go more in deep into matrix operations and linear algebra.

Basics of NumPy array objects

As mentioned in the preceding section, what makes NumPy special is the usage of multidimensional arrays called ndarrays. All ndarray items are homogeneous and use the same size in memory. Let's start by importing NumPy and analyzing the structure of a NumPy array object by creating the array. You can easily import this library by typing the following statement into your console. You can use any naming convention instead of np, but in this book, np will be used as it's the standard convention. Let's create a simple array and explain what the attributes hold by Python behind the scenes as metadata of the created array, so-called attributes:

In [2]: import numpy as np
x = np.array([[1,2,3],[4,5,6]])
x
Out[2]: array([[1, 2, 3],[4, 5, 6]])
In [3]: print("We just create a ", type(x))
Out[3]: We just create a <class 'numpy.ndarray'>
In [4]: print("Our template has shape as" ,x.shape)
Out[4]: Our template has shape as (2, 3)
In [5]: print("Total size is",x.size)
Out[5]: Total size is 6
In [6]: print("The dimension of our array is " ,x.ndim)
Out[6]: The dimension of our array is 2
In [7]: print("Data type of elements are",x.dtype)
Out[7]: Data type of elements are int32
In [8]: print("It consumes",x.nbytes,"bytes")
Out[8]: It consumes 24 bytes

As you can see, the type of our object is a NumPy array. x.shape returns a tuple which gives you the dimension of the array as an output such as (n,m). You can get the total number of elements in an array by using x.size. In our example, we have six elements in total. Knowing attributes such as shape and dimension is very important. The more you know, the more you will be comfortable with computations. If you don't know your array's size and dimensions, it wouldn't be wise to start doing computations with it. In NumPy, you can use x.ndim to check what the dimension of your array is. There are other attributes such as dtype and nbytes, which are very useful while you are checking memory consumption and verifying what kind of data type you should use in the array. In our example, each element has a data type of int32 that consumes 24 bytes in total. You can also force some of these attributes while creating your array such as dtype. Previously, the data type was an integer. Let's switch it to float, complex, or uint (unsigned integer). In order to see what the data type change does, let's analyze what byte consumption is, which is shown as follows:

In [9]: x = np.array([[1,2,3],[4,5,6]], dtype = np.float)
print(x)
Out[9]: print(x.nbytes)
[[ 1. 2. 3.]
[ 4. 5. 6.]]
48
In [10]: x = np.array([[1,2,3],[4,5,6]], dtype = np.complex)
print(x)
print(x.nbytes)
Out[10]: [[ 1.+0.j 2.+0.j 3.+0.j]
[ 4.+0.j 5.+0.j 6.+0.j]]
96
In [11]: x = np.array([[1,2,3],[4,-5,6]], dtype = np.uint32)
print(x)
print(x.nbytes)
Out[11]: [[ 1 2 3]
[ 4 4294967291 6]]
24

As you can see, each type consumes a different number of bytes. Imagine you have a matrix as follows and that you are using int64 or int32 as the data type:

In [12]: x = np.array([[1,2,3],[4,5,6]], dtype = np.int64)
print("int64 consumes",x.nbytes, "bytes")
x = np.array([[1,2,3],[4,5,6]], dtype = np.int32)
print("int32 consumes",x.nbytes, "bytes")
Out[12]: int64 consumes 48 bytes
int32 consumes 24 bytes

The memory need is doubled if you use int64. Ask this question to yourself; which data type would suffice? Until your numbers are higher than 2,147,483,648 and lower than -2,147,483,647, using int32 is enough. Imagine you have a huge array with a size over 100 MB. In such cases, this conversion plays a crucial role in performance.

As you may have noticed in the previous example, when you were changing the data types, you were creating an array each time. Technically, you cannot change the dtype after you create the array. However, what you can do is either create it again or copy the existing one with a new dtype and with the astype attribute. Let's create a copy of the array with the new dtype. Here is an example of how you can also change your dtype with the astype attribute:

In [13]: x_copy = np.array(x, dtype = np.float)
x_copy
Out[13]: array([[ 1., 2., 3.],
[ 4., 5., 6.]])
In [14]: x_copy_int = x_copy.astype(np.int)
x_copy_int
Out[14]: array([[1, 2, 3],
[4, 5, 6]])

Please keep in mind that when you use the astype attribute, it doesn't change the dtype of the x_copy, even though you applied it to x_copy. It keeps the x_copy, but creates a x_copy_int:

In [15]: x_copy
Out[15]: array([[ 1., 2., 3.],
[ 4., 5., 6.]])

Let's imagine a case where you are working in a research group that tries to identify and calculate the risks of an individual patient who has cancer. You have 100,000 records (rows), where each row represents a single patient, and each patient has 100 features (results of some of the tests). As a result, you have (100000, 100) arrays:

In [16]: Data_Cancer= np.random.rand(100000,100)
print(type(Data_Cancer))
print(Data_Cancer.dtype)
print(Data_Cancer.nbytes)
Data_Cancer_New = np.array(Data_Cancer, dtype = np.float32)
print(Data_Cancer_New.nbytes)
Out[16]: <class 'numpy.ndarray'>
float64
80000000
40000000

As you can see from the preceding code, their size decreases from 80 MB to 40 MB just by changing the dtype. What we get in return is less precision after decimal points. Instead of being precise to 16 decimals points, you will have only 7 decimals. In some machine learning algorithms, precision can be negligible. In such cases, you should feel free to adjust your dtype so that it minimizes your memory usage.

NumPy array operations

This section will guide you through the creation and manipulation of numerical data with NumPy. Let's start by creating a NumPy array from the list:

In [17]: my_list = [2, 14, 6, 8]
my_array = np.asarray(my_list)
type(my_array)
Out[17]: numpy.ndarray

Let's do some addition, subtraction, multiplication, and division with scalar values:

In [18]: my_array + 2
Out[18]: array([ 4, 16, 8, 10])
In [19]: my_array - 1
Out[19]: array([ 1, 13, 5, 7])
In [20]: my_array * 2
Out[20]: array([ 4, 28, 12, 16, 8])
In [21]: my_array / 2
Out[21]: array([ 1. , 7. , 3. , 4. ])

It's much harder to do the same operations in a list because the list does not support vectorized operations and you need to iterate its elements. There are many ways to create NumPy arrays, and now you will use one of these methods to create an array which is full of zeros. Later, you will perform some arithmetic operations to see how NumPy behaves in element-wise operations between two arrays:

In [22]: second_array = np.zeros(4) + 3
second_array
Out[22]: array([ 3., 3., 3., 3.])
In [23]: my_array - second_array
Out[23]: array([ -1., 11., 3., 5.])
In [24]: second_array / my_array
Out[24]: array([ 1.5 , 0.21428571, 0.5 , 0.375 ])

As we did in the previous code, you can create an array which is full of ones with np.ones or an identity array with np.identity and do the same algebraic operations that you did previously:

In [25]: second_array = np.ones(4) + 3
second_array
Out[25]: array([ 4., 4., 4., 4.])
In [26]: my_array - second_array
Out[26]: array([ -2., 10., 2., 4.])
In [27]: second_array / my_array
Out[27]: array([ 2. , 0.28571429, 0.66666667, 0.5 ])

It works as expected with the np.ones method, but when you use the identity matrix, the calculation returns a (4,4) array as follows:

In [28]: second_array = np.identity(4)
second_array
Out[28]: array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
In [29]: second_array = np.identity(4) + 3
second_array
Out[29]: array([[ 4., 3., 3., 3.],
[ 3., 4., 3., 3.],
[ 3., 3., 4., 3.],
[ 3., 3., 3., 4.]])
In [30]: my_array - second_array
Out[30]: array([[ -2., 11., 3., 5.],
[ -1., 10., 3., 5.],
[ -1., 11., 2., 5.],
[ -1., 11., 3., 4.]])

What this does is subtract the first element of my_array from all of the elements of the first column of second_array and the second_element of the second column, and so on. The same rule is applied to division as well. Please keep in mind that you can successfully do array operations even if they are not exactly the same shape. Later in this chapter, you will learn about broadcasting errors when computation cannot be done between two arrays due to differences in their shapes:

In [31]: second_array / my_array
Out[31]: array([[ 2. , 0.21428571, 0.5 , 0.375 ],
[ 1.5 , 0.28571429, 0.5 , 0.375 ],
[ 1.5 , 0.21428571, 0.66666667, 0.375 ],
[ 1.5 , 0.21428571, 0.5 , 0.5 ]])

One of the most useful methods in creating NumPy arrays is arange. This returns an array for a given interval between your start and end values. The first argument is the start value of your array, the second is the end value (where it stops creating values), and the third one is the interval. Optionally, you can define your dtype as the fourth argument. The default interval values are 1:

In [32]: x = np.arange(3,7,0.5)
x
Out[32]: array([ 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5])

There is another way to create an array with fixed intervals between the start and stop point when you cannot decide what the interval should be, but you should know how many splits your array should have:

In [33]: x = np.linspace(1.2, 40.5, num=20)
x
Out[33]: array([ 1.2 , 3.26842105, 5.33684211, 7.40526316, 9.47368421,
11.54210526, 13.61052632, 15.67894737, 17.74736842, 19.81578947,
21.88421053, 23.95263158, 26.02105263, 28.08947368, 30.15789474,
32.22631579, 34.29473684, 36.36315789, 38.43157895, 40.5 ])

There are two different methods which are similar in usage but return different sequences of numbers because their base scale is different. This means that the distribution of the numbers will be different as well. The first one is geomspace, which returns numbers on a logarithmic scale with a geometric progression:

In [34]: np.geomspace(1, 625, num=5)
Out[34]: array([ 1., 5., 25., 125., 625.])

The other important method is logspace, where you can return the values for your start and stop points, which are evenly scaled in:

In [35]: np.logspace(3, 4, num=5)
Out[35]: array([ 1000. , 1778.27941004, 3162.27766017, 5623.4132519 ,
10000. ])

What are these arguments? If the starting point is 3 and the ending point is 4, then these functions return the numbers which are much higher than the initial range. Your starting point is actually set as default to 10**Start Argument and the ending is set as 10**End Argument. So technically, in this example, the starting point is 10**3 and the ending point is 10**4. You can avoid such situations and keep your start and end points the same as when you put them as arguments in the method. The trick is to use base 10 logarithms of the given arguments:

In [36]: np.logspace(np.log10(3) , np.log10(4) , num=5)
Out[36]: array([ 3. , 3.2237098 , 3.46410162, 3.72241944, 4. ])

By now, you should be familiar with different ways of creating arrays with different distributions. You have also learned how to do some basic operations with them. Let's continue with other useful functions that you will definitely use in your day to day work. Most of the time, you will have to work with multiple arrays and you will need to compare them very quickly. NumPy has a great solution for this problem; you can compare the arrays as you would compare two integers:

In [37]: x = np.array([1,2,3,4])
y = np.array([1,3,4,4])
x == y
Out[37]: array([ True, False, False, True], dtype=bool)

The comparison is done element-wise and it returns a Boolean vector, whether elements are matching in two different arrays or not. This method works well in small size arrays and also gives you more details. You can see from the output array where the values are represented as False, that these indexed values are not matching in these two arrays. If you have a large array, you may also choose to get a single answer to your question, whether the elements are matching in two different arrays or not:

In [38]: x = np.array([1,2,3,4])
y = np.array([1,3,4,4])
np.array_equal(x,y)
Out[38]: False

Here, you have a single Boolean output. You only know that arrays are not equal, but you don't know which elements exactly are not equal. The comparison is not only limited to checking whether two arrays are equal or not. You can also do element-wise higher- lower comparison between two arrays:

In [39]: x = np.array([1,2,3,4])
y = np.array([1,3,4,4])
x < y
Out[39]: array([False, True, True, False], dtype=bool)

When you need to do logical comparison (AND, OR, XOR), you can use them in your array as follows:

In [40]: x = np.array([0, 1, 0, 0], dtype=bool)
y = np.array([1, 1, 0, 1], dtype=bool)
np.logical_or(x,y)
Out[40]: array([ True, True, False, True], dtype=bool)
In [41]: np.logical_and(x,y)
Out[41]: array([False, True, False, False], dtype=bool)
In [42]: x = np.array([12,16,57,11])
np.logical_or(x < 13, x > 50)
Out[42]: array([ True, False, True, True], dtype=bool)

So far, algebraic operations such as addition and multiplication have been covered. How can we use these operations with transcendental functions such as the exponential function, logarithms, or trigonometric functions?

In [43]: x = np.array([1, 2, 3,4 ])
np.exp(x)
Out[43]: array([ 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
In [44]: np.log(x)
Out[44]: array([ 0. , 0.69314718, 1.09861229, 1.38629436])
In [45]: np.sin(x)
Out[45]: array([ 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])

What about the transpose of a matrix? First, you will use the reshape function with arange to set the desired shape of the matrix:

In [46]: x = np.arange(9)
x
Out[46]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [47]: x = np.arange(9).reshape((3, 3))
x
Out[47]: array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [48]: x.T
Out[48]: array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])

You transpose the 3*3 array, so the shape doesn't change because both dimensions are 3. Let's see what happens when you don't have a square array:

In [49]: x = np.arange(6).reshape(2,3)
x
Out[49]: array([[0, 1, 2],
[3, 4, 5]])
In [50]: x.T
Out[50]: array([[0, 3],
[1, 4],
[2, 5]])

The transpose works as expected and the dimensions are switched as well. You can also get summary statistics from arrays such as mean, median, and standard deviation. Let's start with methods that NumPy offers for calculating basic statistics:

Method
Description
np.sum
Returns the sum of all array values or along the specified axis
np.amin
Returns the minimum value of all arrays or along the specified axis
np.amax
Returns the maximum value of all arrays or along the specified axis
np.percentile
Returns the given qth percentile of all arrays or along the specified axis
np.nanmin
The same as np.amin, but ignores NaN values in an array
np.nanmax
The same as np.amax, but ignores NaN values in an array
np.nanpercentile
The same as np.percentile, but ignores NaN values in an array

The following code block gives an example of the preceding statistical methods of NumPy. These methods are very useful as you can operate the methods in a whole array or axis-wise according to your needs. You should note that you can find more fully-featured and better implementations of these methods in SciPy as it uses NumPy multidimensional arrays as a data structure:

In [51]: x = np.arange(9).reshape((3,3))
x
Out[51]: array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [52]: np.sum(x)
Out[52]: 36
In [53]: np.amin(x)
Out[53]: 0
In [54]: np.amax(x)
Out[54]: 8
In [55]: np.amin(x, axis=0)
Out[55]: array([0, 1, 2])
In [56]: np.amin(x, axis=1)
Out[56]: array([0, 3, 6])
In [57]: np.percentile(x, 80)
Out[57]: 6.4000000000000004

The axis argument determines the dimension that this function will operate on. In this example, axis=0 represents the first axis which is the x axis, and axis = 1 represents the second axis which is y. When we use a regular amin(x), we return a single value because it calculates the minimum value in all arrays, but when we specify the axis, it starts evaluating the function axis-wise and returns an array which shows the results for each row or column. Imagine you have a large array; you find the max value by using amax, but what will happen if you need to pass the index of this value to another function? In such cases, argmin and argmax come to the rescue, as shown in the following snippet:

In [58]: x = np.array([1,-21,3,-3])
np.argmax(x)
Out[58]: 2
In [59]: np.argmin(x)
Out[59]: 1

Let's continue with more statistical functions:

Method

Description

np.mean

Returns the mean of all array values or along the specific axis

np.median

Returns the median of all array values or along the specific axis

np.std

Returns the standard deviation of all array values or along the specific axis

np.nanmean

The same as np.mean, but ignores NaN values in an array

np.nanmedian

The same as np.nanmedian, but ignores NaN values in an array

np.nonstd

The same as np.nanstd, but ignores NaN values in an array

The following code gives more examples of the preceding statistical methods of NumPy. These methods are heavily used in data discovery phases, where you analyze your data features and distribution:

In [60]: x = np.array([[2, 3, 5], [20, 12, 4]])
x
Out[60]: array([[ 2, 3, 5],
[20, 12, 4]])
In [61]: np.mean(x)
Out[61]: 7.666666666666667
In [62]: np.mean(x, axis=0)
Out[62]: array([ 11. , 7.5, 4.5])
In [63]: np.mean(x, axis=1)
Out[63]: array([ 3.33333333, 12. ])
In [64]: np.median(x)
Out[64]: 4.5
In [65]: np.std(x)
Out[65]: 6.3944420310836261

Working with multidimensional arrays

This section will give you a brief understanding of multidimensional arrays by going through different matrix operations.

In order to do matrix multiplication in NumPy, you have to use dot() instead of *. Let's see some examples:

In [66]: c = np.ones((4, 4))
c*c
Out[66]: array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
In [67]: c.dot(c)
Out[67]: array([[ 4., 4., 4., 4.],
[ 4., 4., 4., 4.],
[ 4., 4., 4., 4.],
[ 4., 4., 4., 4.]])

The most important topic in working with multidimensional arrays is stacking, in other words how to merge two arrays. hstack is used for stacking arrays horizontally (column-wise) and vstack is used for stacking arrays vertically (row-wise). You can also split the columns with the hsplit and vsplit methods in the same way that you stacked them:

In [68]: y = np.arange(15).reshape(3,5)
x = np.arange(10).reshape(2,5)
new_array = np.vstack((y,x))
new_array
Out[68]: array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9]])
In [69]: y = np.arange(15).reshape(5,3)
x = np.arange(10).reshape(5,2)
new_array = np.hstack((y,x))
new_array
Out[69]: array([[ 0, 1, 2, 0, 1],
[ 3, 4, 5, 2, 3],
[ 6, 7, 8, 4, 5],
[ 9, 10, 11, 6, 7],
[12, 13, 14, 8, 9]])

These methods are very useful in machine learning applications, especially when creating datasets. After you stack your arrays, you can check their descriptive statistics by using Scipy.stats. Imagine a case where you have 100 records, and each record has 10 features, which means you have a 2D matrix which has 100 rows and 10 columns. The following example shows how you can easily get some descriptive statistics for each feature:

In [70]: from scipy import stats
x= np.random.rand(100,10)
n, min_max, mean, var, skew, kurt = stats.describe(x)
new_array = np.vstack((mean,var,skew,kurt,min_max[0],min_max[1]))
new_array.T
Out[70]: array([[ 5.46011575e-01, 8.30007104e-02, -9.72899085e-02,
-1.17492785e+00, 4.07031246e-04, 9.85652100e-01],
[ 4.79292653e-01, 8.13883169e-02, 1.00411352e-01,
-1.15988275e+00, 1.27241020e-02, 9.85985488e-01],
[ 4.81319367e-01, 8.34107619e-02, 5.55926602e-02,
-1.20006450e+00, 7.49534810e-03, 9.86671083e-01],
[ 5.26977277e-01, 9.33829059e-02, -1.12640661e-01,
-1.19955646e+00, 5.74237697e-03, 9.94980830e-01],
[ 5.42622228e-01, 8.92615897e-02, -1.79102183e-01,
-1.13744108e+00, 2.27821933e-03, 9.93861532e-01],
[ 4.84397369e-01, 9.18274523e-02, 2.33663872e-01,
-1.36827574e+00, 1.18986562e-02, 9.96563489e-01],
[ 4.41436165e-01, 9.54357485e-02, 3.48194314e-01,
-1.15588500e+00, 1.77608372e-03, 9.93865324e-01],
[ 5.34834409e-01, 7.61735119e-02, -2.10467450e-01,
-1.01442389e+00, 2.44706226e-02, 9.97784091e-01],
[ 4.90262346e-01, 9.28757119e-02, 1.02682367e-01,
-1.28987137e+00, 2.97705706e-03, 9.98205307e-01],
[ 4.42767478e-01, 7.32159267e-02, 1.74375646e-01,
-9.58660574e-01, 5.52410464e-04, 9.95383732e-01]])

NumPy has a great module named numpy.ma, which is used for masking array elements. It's very useful when you want to mask (ignore) some elements while doing your calculations. When NumPy masks, it will be treated as an invalid and does not take into account computation:

In [71]: import numpy.ma as ma
x = np.arange(6)
print(x.mean())
masked_array = ma.masked_array(x, mask=[1,0,0,0,0,0])
masked_array.mean()
2.5
Out[71]: 3.0

In the preceding code, you have an array x = [0,1,2,3,4,5]. What you do is mask the first element of the array and then calculate the mean. When an element is masked as 1(True), the associated index value in the array will be masked. This method is also very useful while replacing the NAN values:

In [72]: x = np.arange(25, dtype = float).reshape(5,5)
x[x<5] = np.nan
x
Out[72]: array([[ nan, nan, nan, nan, nan],
[ 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.],
[ 20., 21., 22., 23., 24.]])
In [73]: np.where(np.isnan(x), ma.array(x, mask=np.isnan(x)).mean(axis=0), x)
Out[73]: array([[ 12.5, 13.5, 14.5, 15.5, 16.5],
[ 5. , 6. , 7. , 8. , 9. ],
[ 10. , 11. , 12. , 13. , 14. ],
[ 15. , 16. , 17. , 18. , 19. ],
[ 20. , 21. , 22. , 23. , 24. ]])

In preceding code, we changed the value of the first five elements to nan by putting a condition with index. x[x<5] refers to the elements which indexed for 0, 1, 2, 3, and 4. Then we overwrite these values with the mean of each column(excluding nan values). There are many other useful methods in array operations in order help your code be more concise:

Method
Description
np.concatenate
Join to the matrix in a sequence with a given matrix
np.repeat
Repeat the element of an array along a specific axis
np.delete
Return a new array with the deleted subarrays
np.insert
Insert values before the specified axis
np.unique
Find unique values in an array
np.tile
Create an array by repeating a given input for a given number of repetitions

Indexing, slicing, reshaping, resizing, and broadcasting

When you are working with huge arrays in machine learning projects, you often need to index, slice, reshape, and resize.

Indexing is a fundamental term used in mathematics and computer science. As a general term, indexing helps you to specify how to return desired elements of various data structures. The following example shows indexing for a list and a tuple:

In [74]: x = ["USA","France", "Germany","England"]
x[2]
Out[74]: 'Germany'
In [75]: x = ('USA',3,"France",4)
x[2]
Out[75]: 'France'

In NumPy, the main usage of indexing is controlling and manipulating the elements of arrays. It's a way of creating generic lookup values. Indexing contains three child operations, which are field access, basic slicing, and advanced indexing. In field access, you just specify the index of an element in an array to return the value for a given index.

NumPy is very powerful when it comes to indexing and slicing. In many cases, you need to refer your desired element in an array and do the operations on this sliced area. You can index your array similarly to what you do with tuples or lists with square bracket notations. Let's start with field access and simple slicing with one-dimensional arrays and move on to more advanced techniques:

In [76]: x = np.arange(10)
x
Out[76]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [77]: x[5]
Out[77]: 5
In [78]: x[-2]
Out[78]: 8
In [79]: x[2:8]
Out[79]: array([2, 3, 4, 5, 6, 7])
In [80]: x[:]
Out[80]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [81]: x[2:8:2]
Out[81]: array([2, 4, 6])

Indexing starts from 0, so when you create an array with an element, your first element is indexed as x[0], the same way as your last element, x[n-1]. As you can see in the preceding example, x[5] refers to the sixth element. You can also use negative values in indexing. NumPy understands these values as the nth orders backwards. Like in the example, x[-2] refers to the second to last element. You can also select multiple elements in your array by stating the starting and ending indexes and also creating sequential indexing by stating the increment level as a third argument, as in the last line of the code.

So far, we have seen indexing and slicing in 1D arrays. The logic does not change, but for the sake of demonstration, let's do some practice for multidimensional arrays as well. The only thing that changes when you have multidimensional arrays is just having more axis. You can slice the n-dimensional array as [slicing in x-axis, slicing in y-axis] in the following code:

In [82]: x = np.reshape(np.arange(16),(4,4))
x
Out[82]: array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [83]: x[1:3]
Out[83]: array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [84]: x[:,1:3]
Out[84]: array([[ 1, 2],
[ 5, 6],
[ 9, 10],
[13, 14]])
In [85]: x[1:3,1:3]
Out[85]: array([[ 5, 6],
[ 9, 10]])

You sliced the arrays row and column-wise, but you haven't sliced the elements in a more irregular or more dynamic fashion, which means you always slice them in a rectangular or square way. Imagine a 4*4 array that we want to slice as follows:

To obtain the preceding slicing, we execute the following code:

In [86]: x = np.reshape(np.arange(16),(4,4))
x
Out[86]: array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [87]: x[[0,1,2],[0,1,3]]
Out[87]: array([ 0, 5, 11])

In advanced indexing, the first part indicates the index of rows to be sliced and the second part indicates the corresponding columns. In the preceding example, you first sliced the 1st, 2nd, and 3rd rows ([0,1,2]) and then sliced the 1st, 2nd and 4th columns ([0,1,3]) into sliced rows.

The reshape and resize methods may seem similar, but there are differences in the outputs of these operations. When you reshape the array, it's just the output that changes the shape of the array temporarily, but it does not change the array itself. When you resize the array, it changes the size of the array permanently, and if the new array's size is bigger than the old one, the new array elements will be filled with repeated copies of the old ones. On the contrary, if the new array is smaller, a new array will take the elements from the old array with the order of index which is required to fill the new one. Please note that same data can be shared by different ndarrays which means that an ndarray can be a view to another ndarray. In such cases changes made in one array will have consequences on other views.

The following code gives an example of how the new array elements are filled when the size is bigger or smaller than the original array:

In [88]: x = np.arange(16).reshape(4,4)
x
Out[88]: array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [89]: np.resize(x,(2,2))
Out[89]: array([[0, 1],
[2, 3]])
In [90]: np.resize(x,(6,6))
Out[90]: array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 0, 1],
[ 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13],
[14, 15, 0, 1, 2, 3]])

The last important term of this subsection is broadcasting, which explains how NumPy behaves in arithmetic operations of the array when they have different shapes. NumPy has two rules for broadcasting: either the dimensions of the arrays are equal, or one of them is 1. If one of these conditions is not met, then you will get one of the two errors: frames are not aligned or operands could not be broadcast together:

In [91]: x = np.arange(16).reshape(4,4)
y = np.arange(6).reshape(2,3)
x+y
--------------------------------------------------------------- ------------
ValueError Traceback (most recent call last)
<ipython-input-102-083fc792f8d9> in <module>()
1 x = np.arange(16).reshape(4,4)
2 y = np.arange(6).reshape(2,3)
----> 3 x+y
12
ValueError: operands could not be broadcast together with shapes (4,4) (2,3)

You might have seen that you can multiply two matrices with shapes (4, 4) and (4,) or with (2, 2) and (2, 1). The first case meets the condition of having one dimension so that the multiplication becomes a vector * array, which does not cause any broadcasting problems:

In [92]: x = np.ones(16).reshape(4,4)
y = np.arange(4)
x*y
Out[92]: array([[ 0., 1., 2., 3.],
[ 0., 1., 2., 3.],
[ 0., 1., 2., 3.],
[ 0., 1., 2., 3.]])
In [93]: x = np.arange(4).reshape(2,2)
x
Out[93]: array([[0, 1],
[2, 3]])
In [94]: y = np.arange(2).reshape(1,2)
y
Out[94]: array([[0, 1]])
In [95]: x*y
Out[95]: array([[0, 1],
[0, 3]])

The preceding code block gives an example for the second case, where during computation small arrays iterate through the large array and the output is stretched across the whole array. That's the reason why there are (4, 4) and (2, 2) outputs: during the multiplication, both arrays are broadcast to larger dimensions.

Summary

In this chapter, you got familiar with NumPy basics for array operations and refreshed your knowledge about basic matrix operations. NumPy is an extremely important library for Python scientific stacks, with its extensive methods for array operations. You have learned how to work with multidimensional arrays and covered important topics such as indexing, slicing, reshaping, resizing, and broadcasting. The main goal of this chapter was to give you a brief idea of how NumPy works when it comes to numerical datasets, which will be helpful in your daily data analysis work.

In the next chapter, you will learn the basics of linear algebra and complete practical examples with NumPy.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • • Grasp all aspects of numerical computing and understand NumPy
  • • Explore examples to learn exploratory data analysis (EDA), regression, and clustering
  • • Access NumPy libraries and use performance benchmarking to select the right tool

Description

NumPy is one of the most important scientific computing libraries available for Python. Mastering Numerical Computing with NumPy teaches you how to achieve expert level competency to perform complex operations, with in-depth coverage of advanced concepts. Beginning with NumPy's arrays and functions, you will familiarize yourself with linear algebra concepts to perform vector and matrix math operations. You will thoroughly understand and practice data processing, exploratory data analysis (EDA), and predictive modeling. You will then move on to working on practical examples which will teach you how to use NumPy statistics in order to explore US housing data and develop a predictive model using simple and multiple linear regression techniques. Once you have got to grips with the basics, you will explore unsupervised learning and clustering algorithms, followed by understanding how to write better NumPy code while keeping advanced considerations in mind. The book also demonstrates the use of different high-performance numerical computing libraries and their relationship with NumPy. You will study how to benchmark the performance of different configurations and choose the best for your system. By the end of this book, you will have become an expert in handling and performing complex data manipulations.

Who is this book for?

Mastering Numerical Computing with NumPy is for you if you are a Python programmer, data analyst, data engineer, or a data science enthusiast, who wants to master the intricacies of NumPy and build solutions for your numeric and scientific computational problems. You are expected to have familiarity with mathematics to get the most out of this book.

What you will learn

  • • Perform vector and matrix operations using NumPy
  • • Perform exploratory data analysis (EDA) on US housing data
  • • Develop a predictive model using simple and multiple linear regression
  • • Understand unsupervised learning and clustering algorithms with practical use cases
  • • Write better NumPy code and implement the algorithms from scratch
  • • Perform benchmark tests to choose the best configuration for your system

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jun 28, 2018
Length: 248 pages
Edition : 1st
Language : English
ISBN-13 : 9781788993357
Category :
Languages :
Tools :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Jun 28, 2018
Length: 248 pages
Edition : 1st
Language : English
ISBN-13 : 9781788993357
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just Mex$85 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just Mex$85 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total Mex$ 2,276.97
SciPy Recipes
Mex$799.99
Hands-On Data Analysis with NumPy and pandas
Mex$676.99
Mastering Numerical Computing with NumPy
Mex$799.99
Total Mex$ 2,276.97 Stars icon
Banner background image

Table of Contents

10 Chapters
Working with NumPy Arrays Chevron down icon Chevron up icon
Linear Algebra with NumPy Chevron down icon Chevron up icon
Exploratory Data Analysis of Boston Housing Data with NumPy Statistics Chevron down icon Chevron up icon
Predicting Housing Prices Using Linear Regression Chevron down icon Chevron up icon
Clustering Clients of a Wholesale Distributor Using NumPy Chevron down icon Chevron up icon
NumPy, SciPy, Pandas, and Scikit-Learn Chevron down icon Chevron up icon
Advanced Numpy Chevron down icon Chevron up icon
Overview of High-Performance Numerical Computing Libraries Chevron down icon Chevron up icon
Performance Benchmarks Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Rating distribution
Full star icon Full star icon Full star icon Full star icon Full star icon 5
(1 Ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Andrey Jun 27, 2024
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Good
Subscriber review Packt
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.