Learn Python Series (#11) - NumPy Part 1


scipio
9 months ago · Utopian · 9 min read

What Will I Learn?

  • You will learn how to import NumPy,
  • what an ndarray object is, and why it is so useful, powerful and fast,
  • how to generate number sequences,
  • about a few other useful NumPy methods,
  • about NumPy's "basic" Array Attributes.

Requirements

  • A working modern computer running macOS, Windows or Ubuntu
  • An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution
  • The ambition to learn Python programming
  • An installed version of NumPy in your Python (virtual) environment. In case you are using Anaconda, NumPy is installed by default. If it's not installed, install it via pip install numpy from the command line.

Difficulty

Intermediate

Curriculum (of the Learn Python Series):

Learn Python Series (#11) - NumPy Part 1

In my view, NumPy must be included in this Learn Python Series. NumPy is a package for numerical computation that supports multi-dimensional arrays and matrices, along with high-level mathematical functions to operate on those arrays. NumPy allows for fast numerical computing in Python, whereas the "standard" Python bytecode interpreter wasn't originally designed for numerical computing. Using NumPy, well-written Python code running mathematical algorithms on lots of data isn't slow at all!

Because NumPy serves as a fundamental package for scientific computing in Python, on top of which multiple other scientific packages are built, NumPy is mostly used by data scientists with in-depth scientific backgrounds. Presumably for that reason, not many easy-to-get-into NumPy tutorials exist. However, I argue that NumPy can also be used as a default toolkit by non-scientific Python programmers, even beginners. This NumPy tutorial sub-series hopes to onboard Python programmers from any background or level.

NumPy's Core: The ndarray object

The ndarray object is the core of NumPy: an n-dimensional array holding elements of the same ("homogeneous") data type, on which various math operations can be performed efficiently. This differs from standard Python lists because NumPy arrays have a fixed size, hold elements of a single data type, and operate element-wise by default, so no for loop per element is needed.

For example, let's assume two lists a and b of equal length, both holding integers, from which we want to create a new list c in which each element is the product of the corresponding elements of a and b:

# Standard Python way
a = [1,2,3,4]
b = [5,6,7,8]
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])
print(type(c), c)
# <class 'list'> [5, 12, 21, 32]

# NumPy way
import numpy as np
a = np.array(a)
b = np.array(b)
c = a*b
print(type(c), c)
# <class 'numpy.ndarray'> [ 5 12 21 32]

Explanation:
In the 'Standard Python way' example, first the list [1,2,3,4] was assigned to variable a and [5,6,7,8] to variable b, after which an empty list was assigned to variable c, to initialize it. Next, a for loop was needed in which the elements of a and b were fetched by index number i and multiplied, and each multiplication result was appended to (the initially empty) list c.

In the 'NumPy way' example, first the NumPy package was imported as np in order to use it, and then a and b were both "converted" from a list to a 1-dimensional NumPy array. As a result, because element-by-element operations are the default in NumPy, no for loop was needed to let c hold the multiplication results. This is called vectorization: explicit looping is absent, and the mathematical operations (in this case a simple multiplication) are performed "under the NumPy hood".
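
The same element-wise behaviour applies to other arithmetic operators as well. The following is only a minimal sketch to illustrate that; the variable names (total, squares, scaled, mixed) are my own picks for illustration and carry no special meaning:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Arithmetic operators work element by element, without an explicit loop
total = a + b        # array([ 6,  8, 10, 12])
squares = a ** 2     # array([ 1,  4,  9, 16])
scaled = a * 10      # array([10, 20, 30, 40]), the scalar is applied to every element

# The loop-based and the vectorized multiplication give the same values
loop_result = [x * y for x, y in zip(a.tolist(), b.tolist())]
print(loop_result == (a * b).tolist())   # True

# All elements share one data type: mixing ints and a float yields a float array
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```

Note how the integer 1 and 3 in mixed are stored as floats: an ndarray holds a single data type for all of its elements.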

The creation of number sequences

NumPy has multiple built-in functions to create sequences of values, which we can then further manipulate.

  • There's for example the NumPy function arange() which, like range() in standard Python, generates evenly spaced values, but returns them as a NumPy array (not as a range object or a list).

Usage: numpy.arange([start, ]stop, [step, ]dtype=None)

Examples:

# Using 1 argument: as stop
arr_1 = np.arange(10)
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Using 2 arguments: start & stop
arr_2 = np.arange(5, 10)
# array([5, 6, 7, 8, 9])

# Using 3 arguments: start, stop, 
# and a step-incrementor, which can be a float!
arr_3 = np.arange(0, 5, 0.8)
# array([0. , 0.8, 1.6, 2.4, 3.2, 4. , 4.8])
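
A few more arange() variations, as a small sketch: a negative step, the dtype keyword argument from the usage line above, and how the number of elements follows from start, stop and step. The variable names (countdown, as_floats, arr) are just illustrative:

```python
import numpy as np

# A negative step counts down; the stop value (0) is still excluded
countdown = np.arange(10, 0, -2)
# array([10,  8,  6,  4,  2])

# The dtype keyword argument sets the element type of the result
as_floats = np.arange(3, dtype=float)
# array([0., 1., 2.])

# The number of elements follows from ceil((stop - start) / step),
# here ceil(5 / 0.8) == ceil(6.25) == 7
arr = np.arange(0, 5, 0.8)
print(len(arr))  # 7
```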
  • Another NumPy sequence creator is the linspace() function. Instead of specifying the step size (as arange() expects), linspace() expects, via its num keyword argument (kwarg), the number of elements you want to create. By default it includes the endpoint (the stop argument) in the created array.

Usage: numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

Examples:

# Create 5 evenly-spaced array elements (with step == 2),
# by not including the stop value (= endpoint) 10
arr_4 = np.linspace(0, 10, num=5, endpoint=False, dtype=int)
# array([0, 2, 4, 6, 8])
# ... or create 6 evenly-spaced array elements (with step == 2),
# by including the stop value (= 10) as last array element
arr_5 = np.linspace(0, 10, num=6, dtype=int)
# array([ 0,  2,  4,  6,  8, 10])
# If the `num` argument is unset, by default 50 array elements
# will be created
arr_6 = np.linspace(0, 1, endpoint=False)
# array([0. , 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.14, 0.16, 0.18, 
#        0.2, 0.22, 0.24, 0.26, 0.28, 0.3, 0.32, 0.34, 0.36, 0.38, 
#        0.4, 0.42, 0.44, 0.46, 0.48, 0.5, 0.52, 0.54, 0.56, 0.58,
#        0.6, 0.62, 0.64, 0.66, 0.68, 0.7, 0.72, 0.74, 0.76, 0.78,
#        0.8, 0.82, 0.84, 0.86, 0.88, 0.9, 0.92, 0.94, 0.96, 0.98])
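
The retstep keyword argument from the usage line is not shown in the examples above. As a small sketch of what it does (the variable names are my own): with retstep=True, linspace() returns both the array and the spacing it used, which also makes the effect of endpoint visible.

```python
import numpy as np

# retstep=True returns a tuple: (the array, the spacing between elements)
values, step = np.linspace(0, 1, num=5, retstep=True)
# values: array([0.  , 0.25, 0.5 , 0.75, 1.  ])
# step:   0.25

# With endpoint=False the same interval is divided into num steps
# instead of num - 1, so the spacing becomes smaller
values_open, step_open = np.linspace(0, 1, num=5, endpoint=False, retstep=True)
# values_open: array([0. , 0.2, 0.4, 0.6, 0.8])
# step_open:   0.2
```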
  • A third function (out of many more) to create a sequence of numbers is random.random(), provided an "array size" is set. If omitted, only one value of type float is returned; otherwise a NumPy array of floats is returned, all in the half-open interval from 0.0 (inclusive) to 1.0 (exclusive).

Usage: numpy.random.random(size=None)

Examples:

# Create a single random float
x = np.random.random()
# 0.6022344994122718
# Create a 1-dimensional array with 5 elements
# PS: note the comma after the 5
arr_7 = np.random.random((5,))
# array([0.45631267, 0.08919399, 0.76948001, 0.14375291, 0.02052383])
# Create a 2-dimensional array with 6 elements (size == 3*2 == 6) 
arr_8 = np.random.random((3,2))
# array([[0.0379596 , 0.89298785],
#        [0.03927935, 0.96021587],
#        [0.38208804, 0.21292953]])
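
Because the values are random, your output will differ from the numbers shown above. If you want repeatable results, for example while following along with a tutorial, you can seed the random number generator first. A minimal sketch, where the seed value 42 is an arbitrary choice of mine:

```python
import numpy as np

# Seeding the generator makes the "random" numbers repeatable
np.random.seed(42)
first = np.random.random((2, 3))

np.random.seed(42)
second = np.random.random((2, 3))

print(np.array_equal(first, second))  # True
print(first.shape)                    # (2, 3)

# Every value lies in the half-open interval [0.0, 1.0)
print(((first >= 0.0) & (first < 1.0)).all())  # True
```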

NumPy's "Basic" Array Attributes

If you had to think twice to wrap your head around the creation of the last 2-dimensional (arr_8) example, then don't worry. Because NumPy handles N-dimensional arrays, I wanted to briefly touch upon the concept of 2-dimensional arrays already in this Part 1 of the NumPy sub-series. But understanding multi-dimensional arrays, let alone being able to write eloquent code using them, can be tough. It's tough for me as well to explain what the attributes (properties, characteristics) of multi-dimensional arrays are. But let me try nonetheless...

PS: in the first few upcoming parts of the NumPy sub-series I'll try to use only 1-dimensional arrays. So even if you don't really understand the following attribute explanation, you can probably still follow along with the other NumPy topics I will be covering.

  • In NumPy terminology, dimensions are also called axes,
  • the number of axes (= dimensions) a NumPy array has is called its rank; for example, a 3-dimensional array has a rank of 3 (NumPy exposes this number as the ndim attribute, and it is not to be confused with the rank of a matrix in linear algebra),
  • the shape of an array is a tuple holding the length of each axis (the axis length), so by defining the shape you define the array's dimensions,
  • and the length of the shape tuple is again the array rank.

Examples:

# Let's create a 2-dimensional array, <= so rank 2
# which holds 3 (three) 1-dimensional arrays <= axis 0 has length 3
# each holding 4 (four) integer elements <= axis 1 has length 4
arr_9 = np.arange(1, 13).reshape((3, 4)) # <= (3, 4) is a tuple, the shape tuple
print(arr_9)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
arr_9.ndim
# 2 <= indeed, this array has 2 dimensions
arr_9.shape
# (3, 4) <= that's 2 numbers, so rank 2, a 2-dimensional array
arr_9.size
# 12 <= the array size is 3*4 == 12 elements in total
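
The same attributes work for arrays with more than two dimensions. As a small sketch under my own assumptions (the name arr_3d and the shape (2, 3, 4) are arbitrary illustrative choices), here is a 3-dimensional array and the relations between ndim, shape and size:

```python
import numpy as np

# 24 elements arranged as 2 blocks, each with 3 rows of 4 elements
arr_3d = np.arange(24).reshape((2, 3, 4))

print(arr_3d.ndim)    # 3
print(arr_3d.shape)   # (2, 3, 4)
print(arr_3d.size)    # 24

# The length of the shape tuple equals the number of dimensions,
# and the product of the axis lengths equals the total size
print(len(arr_3d.shape) == arr_3d.ndim)  # True
print(2 * 3 * 4 == arr_3d.size)          # True
```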

Test questions

Let's define a NumPy array holding a 3D-coordinate, like so:

coord = np.array([1,2,3])

Do you now know the attribute values of this 3D-coordinate?

  • what's the rank, the number of dimensions, of this array?
  • what's the shape of the array? Or in other words, what are the lengths of the axes?
  • what's the array size?

Let's find out, together!

coord.ndim
# 1 <= even though the array is holding a 3D-coordinate, the array itself is of rank 1,
# it just has one dimension!
coord.shape
# (3,) <= there's just 1 dimension, 1 axis, with length 3
coord.size
# 3 <= in total there are 3 elements stored in the array
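
To contrast this with a 2-dimensional case: if you store several 3D-coordinates together, one per row, the array does become rank 2. A minimal sketch, where the name coords and the coordinate values are made up for illustration:

```python
import numpy as np

# Four 3D-coordinates stored as rows of one 2-dimensional array
coords = np.array([[0, 0, 0],
                   [1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(coords.ndim)    # 2
print(coords.shape)   # (4, 3)  <= 4 coordinates, 3 components each
print(coords.size)    # 12

# A single row is again a rank-1 array, just like `coord` above
print(coords[1].shape)  # (3,)
```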

What's covered in the next tutorials?

Now that we know that the NumPy library exists, that it's used for numerical computing in Python, that it uses ndarray objects, which allow for vectorization, how we generate value sequences, and what the attributes / properties of N-dimensional arrays are, the next NumPy tutorial part can cover some of the NumPy operations and explore some "universal functions" (well-known mathematical functions you are already familiar with, or maybe were in school / university)!

However, in the next episodes of the Learn Python Series we'll first be focusing on some more built-in modules, to handle files (in general), and CSV and JSON more specifically, as well as on using the popular external "Requests: HTTP for Humans" library to fetch data from the web. Also, we'll go over using BeautifulSoup to parse HTML files.

If we combine our Python knowledge regarding strings, lists, dictionaries, tuples, Matplotlib, NumPy, CSV, JSON, fetching web data via Requests, parsing HTML via BeautifulSoup, and reading from and saving to (our own) files, we can do lots of very useful things already! Stay tuned for the following episodes of the Learn Python Series!

Thank you for your time!



Posted on Utopian.io - Rewarding Open Source Contributors
