Learn Python Series (#11) - NumPy Part 1
What Will I Learn?
- You will learn how to import NumPy,
- what an ndarray object is, and why it is so useful, powerful and fast,
- how to generate number sequences,
- about a few other useful NumPy methods,
- about NumPy's "basic" array attributes.
Requirements
- A working modern computer running macOS, Windows or Ubuntu
- An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution
- The ambition to learn Python programming
- An installed version of NumPy in your Python (virtual) environment. In case you are using Anaconda, NumPy is installed by default. If it's not installed, just do so via pip install numpy from the command line.
Curriculum (of the Learn Python Series):
- Learn Python Series - Intro
- Learn Python Series (#2) - Handling Strings Part 1
- Learn Python Series (#3) - Handling Strings Part 2
- Learn Python Series (#4) - Round-Up #1
- Learn Python Series (#5) - Handling Lists Part 1
- Learn Python Series (#6) - Handling Lists Part 2
- Learn Python Series (#7) - Handling Dictionaries
- Learn Python Series (#8) - Handling Tuples
- Learn Python Series (#9) - Using Import
- Learn Python Series (#10) - Matplotlib Part 1
Learn Python Series (#11) - NumPy Part 1
In my view, no Learn Python Series would be complete without NumPy. NumPy is a package for numerical computation that includes support for multi-dimensional arrays and matrices, along with high-level mathematical functions to operate on those arrays. NumPy allows for fast numerical computing in Python, whereas the "standard" Python bytecode interpreter wasn't originally designed for numerical computing. Using NumPy, well-written Python code running mathematical algorithms on lots of data isn't slow at all!
Because NumPy serves as a fundamental package for scientific computing in Python, on top of which multiple other scientific packages are built, NumPy is mostly used by data scientists with in-depth scientific backgrounds. Presumably for that reason, not many easy-to-get-into NumPy tutorials exist. However, I argue that NumPy can also be used as a default toolkit by non-scientific Python programmers, even beginners. This NumPy tutorial sub-series hopes to onboard Python programmers from any background or level.
NumPy's Core: The ndarray object
The ndarray object is the core of NumPy: an n-dimensional array holding elements of the same ("homogeneous") type, on which various math operations can be performed efficiently. This differs from standard Python lists because NumPy arrays have a fixed size, hold elements of the same data type, and operate element-wise by default, hence not needing for loops per element.
For example, let's assume two lists a and b of equal length, both holding integers, from which we want to create a new list c in which every element of a is multiplied by the corresponding element of b:
# Standard Python way
a = [1,2,3,4]
b = [5,6,7,8]
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])
print(type(c), c)
# <class 'list'> [5, 12, 21, 32]
<class 'list'> [5, 12, 21, 32]
# NumPy way
import numpy as np
a = np.array(a)
b = np.array(b)
c = a*b
print(type(c), c)
# <class 'numpy.ndarray'> [ 5 12 21 32]
<class 'numpy.ndarray'> [ 5 12 21 32]
In the 'Standard Python way' example, first the list [1,2,3,4] was assigned to variable a and [5,6,7,8] to variable b, after which an empty list was assigned to variable c, to initialize c. Next a for loop was needed in which every element of both a and b was fetched by index number i, multiplied, and the multiplication result was appended to (the initially empty) list c.
In the 'NumPy way' example, first the NumPy package was imported as np in order to use it, and then a and b were both "converted" from a list to a 1-dimensional NumPy array. And as a result, because in NumPy element-by-element operations are the default, no for loop was needed to let c hold the multiplication results. This is called vectorization, where explicit looping is absent and mathematical operations (in this case a simple multiplication) are performed "under the NumPy hood".
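To make the difference between lists and arrays a bit more tangible, here is a small sketch (the variable names arr_a, arr_b and so on are just made up for illustration). It shows that the same operators behave element-wise on arrays, while the + operator on plain lists does something else entirely:

```python
import numpy as np

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

# Pure-Python element-wise product, using an explicit loop (list comprehension)
c_list = [x * y for x, y in zip(a, b)]

# Vectorized NumPy versions: the operator is applied to every element pair
arr_a = np.array(a)
arr_b = np.array(b)
c_mul = arr_a * arr_b   # element-wise multiplication
c_add = arr_a + arr_b   # element-wise addition

print(c_list)           # [5, 12, 21, 32]
print(c_mul.tolist())   # [5, 12, 21, 32]
print(c_add.tolist())   # [6, 8, 10, 12]

# On plain lists, + concatenates instead of adding element by element
print(a + b)            # [1, 2, 3, 4, 5, 6, 7, 8]
```

So converting to an array is not only about speed: it also changes what the arithmetic operators mean.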
The creation of number sequences
NumPy has multiple built-in methods to create sequences of values, which we can then further manipulate.
- There's for example the NumPy function arange() which, like range() does in standard Python, returns evenly-spaced values, but as an array (not a list).
numpy.arange([start, ]stop, [step, ]dtype=None)
# Using 1 argument: as stop
arr_1 = np.arange(10)
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Using 2 arguments: start & stop
arr_2 = np.arange(5, 10)
# array([5, 6, 7, 8, 9])
# Using 3 arguments: start, stop,
# and a step-incrementor, which can be a float!
arr_3 = np.arange(0, 5, 0.8)
# array([0. , 0.8, 1.6, 2.4, 3.2, 4. , 4.8])
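As a quick side-by-side with the built-in range(), the following sketch (the names evens and floats are only illustrative) checks that both produce the same integer values, and that only arange() accepts a float step:

```python
import numpy as np

# With integer arguments, arange() yields the same values as range()
same = np.arange(5, 10).tolist() == list(range(5, 10))
print(same)            # True

# Just like with range(), the stop value itself is never included
evens = np.arange(0, 10, 2)
print(evens.tolist())  # [0, 2, 4, 6, 8]

# Unlike range(), arange() accepts a float step
floats = np.arange(0, 5, 0.8)
print(len(floats))     # 7

# range() refuses a float step and raises a TypeError
try:
    range(0, 5, 0.8)
    range_accepts_float = True
except TypeError:
    range_accepts_float = False
print(range_accepts_float)  # False
```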
- Another NumPy sequence creator is the linspace() function. Instead of specifying the step size (like arange() does), the num keyword argument (kwarg) expects the number of elements you want to create. By default it includes the endpoint (the stop argument) in the created array.
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
# Create 5 evenly-spaced array elements (with step == 2),
# by not including the stop value (= endpoint) 10
arr_4 = np.linspace(0, 10, num=5, endpoint=False, dtype=int)
# array([0, 2, 4, 6, 8])
# ... or create 6 evenly-spaced array elements (with step == 2),
# by including the stop value (= 10) as last array element
arr_5 = np.linspace(0, 10, num=6, dtype=int)
# array([ 0, 2, 4, 6, 8, 10])
# If the `num` argument is unset, by default 50 array elements
# will be created
arr_6 = np.linspace(0, 1, endpoint=False)
# array([0. , 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.14, 0.16, 0.18,
# 0.2, 0.22, 0.24, 0.26, 0.28, 0.3, 0.32, 0.34, 0.36, 0.38,
# 0.4, 0.42, 0.44, 0.46, 0.48, 0.5, 0.52, 0.54, 0.56, 0.58,
# 0.6, 0.62, 0.64, 0.66, 0.68, 0.7, 0.72, 0.74, 0.76, 0.78,
# 0.8, 0.82, 0.84, 0.86, 0.88, 0.9, 0.92, 0.94, 0.96, 0.98])
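The linspace() signature also lists a retstep kwarg that the examples above don't use. As a small sketch (the variable names are just for illustration), setting retstep=True makes linspace() return both the array and the step size it worked out, which is a handy way to verify the "step == 2" claims:

```python
import numpy as np

# retstep=True returns a tuple: (array, step between consecutive values)
values, step = np.linspace(0, 10, num=6, retstep=True)
print(values.tolist())       # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(float(step))           # 2.0

# Excluding the endpoint, 5 elements are enough to get the same step
values_open, step_open = np.linspace(0, 10, num=5, endpoint=False, retstep=True)
print(values_open.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]
print(float(step_open))      # 2.0
```

Note that without dtype=int the returned elements are floats.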
- A third function (out of many more) to create a sequence of numbers is random.random(), provided an "array size" is set. If omitted, only one value of type float is returned; otherwise a NumPy array of floats is returned, all between 0.0 (inclusive) and 1.0 (exclusive).
# Create a single random float
x = np.random.random()
# 0.6022344994122718
# Create a 1-dimensional array with 5 elements
# PS: note the comma after the 5
arr_7 = np.random.random((5,))
# array([0.45631267, 0.08919399, 0.76948001, 0.14375291, 0.02052383])
# Create a 2-dimensional array with 6 elements (size == 3*2 == 6)
arr_8 = np.random.random((3,2))
# array([[0.0379596 , 0.89298785],
#        [0.03927935, 0.96021587],
#        [0.38208804, 0.21292953]])
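Since the random values differ on every run, it can help to check properties of the result instead of the values themselves. A minimal sketch, with made-up variable names, that verifies the value range and the resulting layout (using the shape attribute, which is explained further below):

```python
import numpy as np

arr = np.random.random((3, 2))

# The size tuple that was passed in determines the layout of the result
print(arr.shape)                      # (3, 2)

# Every value is at least 0.0 and strictly below 1.0
in_range = bool(((arr >= 0.0) & (arr < 1.0)).all())
print(in_range)                       # True

# Without a size argument a single float comes back, not an array
x = np.random.random()
print(isinstance(x, float))           # True

# A plain integer size works as well and gives a 1-dimensional array
print(np.random.random(5).shape)      # (5,)
```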
NumPy's "Basic" Array Attributes
If you had to think twice to wrap your head around the creation of the last 2-dimensional (arr_8) example, then don't worry. Because NumPy handles N-dimensional arrays, I wanted to briefly touch upon the concept of 2-dimensional arrays, already in this Part 1 of the NumPy sub-series. But understanding multi-dimensional arrays, let alone being able to write eloquent code using them, can be tough. It's tough for me as well to explain what the attributes (properties, characteristics) of multi-dimensional arrays are. But let me try nonetheless...
PS: for the next few parts of the NumPy sub-series I'll try to use only 1-dimensional arrays. So even if you don't really understand the following attribute explanation, you can probably still follow along with the other NumPy topics I will be covering.
- In NumPy terminology, dimensions are also called axes,
- the number of axes (= dimensions) a NumPy array has, is called its rank, for example a 3-dimensional array has a rank of 3,
- by defining the shape of an array, you define the array's dimensions, where the size of each dimension is also called the axis length,
- and the length of the shape tuple is again the array rank.
# Let's create a 2-dimensional array,          <= so rank 2
# which holds 3 (three) 1-dimensional arrays   <= axis_1 has length 3
# each holding 4 (four) integer elements       <= axis_2 has length 4
arr_9 = np.arange(1, 13).reshape((3,4))  # <= (3,4) is a tuple, the shape tuple
print(arr_9)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
arr_9.ndim # 2 <= indeed, this array has 2 dimensions
arr_9.shape # (3, 4) <= that's 2 numbers, so rank 2, a 2-dimensional array
arr_9.size # 12 <= the array size is 3*4 == 12 elements in total
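The same three attributes can be checked on an array with one more axis. The sketch below uses an illustrative 3-dimensional array (arr_3d is just a name I picked) to confirm that the rank equals the length of the shape tuple, and that the size is the product of the axis lengths:

```python
import numpy as np

# 24 integers arranged as 2 blocks, each holding 3 rows of 4 elements
arr_3d = np.arange(24).reshape((2, 3, 4))

print(arr_3d.ndim)    # 3
print(arr_3d.shape)   # (2, 3, 4)
print(arr_3d.size)    # 24

# The rank equals the length of the shape tuple
print(len(arr_3d.shape) == arr_3d.ndim)   # True

# The size equals the product of the axis lengths
print(2 * 3 * 4 == arr_3d.size)           # True
```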
Let's define a NumPy array holding a 3D-coordinate, like so:
coord = np.array([1,2,3])
Do you now know the attribute values of this 3D-coordinate?
- what's the rank, the number of dimensions, of this array?
- what's the shape of the array? Or in other words, what are the lengths of the axes?
- what's the array size?
Let's find out, together!
coord.ndim # 1 <= even though the array is holding a 3D-coordinate, the array itself is of rank 1,
# it just has one dimension!
coord.shape # (3,) <= there's just 1 dimension, 1 axis, with length 3
coord.size # 3 <= in total there are 3 elements stored in the array
What's covered in the next tutorials?
Now that we know the NumPy library exists, that it's used for numerical computing in Python, that it uses ndarray objects which allow for vectorization, how to generate value sequences, and what the attributes / properties of N-dimensional arrays are, in the next NumPy tutorial part we can cover some of the NumPy operations and explore some "universal functions" (well-known mathematical functions you are, or maybe were in school / university, already familiar with)!
However, in the next Learn Python Series episodes we'll first be focusing on some more built-in modules, to handle files (in general), and CSV and JSON more specifically, as well as using the popular external Requests: HTTP for Humans library, to fetch data from the web. Also, we'll go over using BeautifulSoup to parse HTML files.
If we combine our Python knowledge regarding strings, lists, dictionaries, tuples, Matplotlib, NumPy, CSV, JSON, fetching web data via Requests, parsing HTML via BeautifulSoup, and reading from and saving to (our own) files, we can do lots of very useful things already! Stay tuned for the following episodes of the Learn Python Series!
Thank you for your time!
Posted on Utopian.io - Rewarding Open Source Contributors