NumPy Ninjutsu: Mastering Array Operations for High-Performance Machine Learning.
Machine learning workflows typically involve a great deal of numerical computation: mathematical and algebraic operations applied to data stored as large vectors, matrices, or even tensors, the higher-dimensional counterparts of matrices with three or more dimensions. In terms of computing cost, this can translate into significant time and memory consumption when processing data, training models, and running inference (e.g., making predictions) on such large data structures. Therefore, optimizing the efficiency of these low-level operations is key.
This is where the NumPy library can help: NumPy arrays are data structures designed for fast, memory-efficient numerical computation. They leverage techniques such as vectorization and broadcasting, enabling high-performance machine learning workflows that are as fast and silent as a ninja.
In this article, we uncover some representative examples of NumPy array operations that come in particularly handy to optimize the performance of machine learning workflows.
Vectorized Operations
NumPy allows applying arithmetic operations or mathematical functions to an entire array at once, such that the operation (or function) is applied element-wise, without the need for explicit loops. For instance, given an array arr with 1000 elements, arr*2 multiplies every element in the array by two.
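As a quick sketch grounded in that example (the array contents here, the values 0 through 999, are an arbitrary choice for illustration), both an arithmetic operation and a mathematical function apply element-wise to the whole array:

import numpy as np

arr = np.arange(1000)   # an example array with 1000 elements: 0, 1, ..., 999

doubled = arr * 2       # arithmetic operation applied to every element at once
roots = np.sqrt(arr)    # mathematical function applied to every element at once

print(doubled[:3])             # [0 2 4]
print(np.round(roots[:3], 2))  # approximately [0. 1. 1.41]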
Regarding the use of functions, vectorization becomes very handy, for instance, for applying activation functions in manually defined neural networks. The ReLU (Rectified Linear Unit) function is a very common activation function that has proven effective in training neural network models. It introduces non-linearity by mapping negative values to 0 and keeping positive values unaltered. This is how ReLU activation can be implemented using NumPy array vectorization:
import numpy as np

input_array = np.array([-2.0, 0.0, 1.5])
output = np.maximum(0, input_array)
print(output)
Output: [0. 0. 1.5]
We just defined an array of three elements (the information flowing through three neurons) to make the example easy to understand. The true magic occurs when, instead of three elements, we have thousands or millions of them. That's when the efficiency optimization truly makes the difference.
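To make that claim concrete, here is a minimal timing sketch, assuming an input of one million random values (an arbitrary size chosen purely for illustration); exact timings will vary by machine, but the vectorized version is typically orders of magnitude faster:

import time
import numpy as np

big_input = np.random.randn(1_000_000)   # one million random activations

# Pure Python loop version of ReLU
start = time.perf_counter()
relu_loop = np.array([x if x > 0 else 0.0 for x in big_input])
loop_time = time.perf_counter() - start

# Vectorized ReLU using np.maximum
start = time.perf_counter()
relu_vec = np.maximum(0, big_input)
vec_time = time.perf_counter() - start

print(f"Loop: {loop_time:.4f}s | Vectorized: {vec_time:.4f}s")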
Broadcasting for Batch Computation
Another attractive feature of NumPy array operations is broadcasting: when arrays of different shapes are combined in an arithmetic operation, NumPy automatically stretches the smaller array along the missing dimensions so the operation can be applied element-wise. Standardizing the values in a data matrix is a good example of this, and a very common step in machine learning, since many modeling techniques require scaled data to produce effective results.
batch = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
normalized = (batch - batch.mean(axis=0)) / batch.std(axis=0)
Output:
array([[-1.22474487, -1.22474487],
       [ 0.        ,  0.        ],
       [ 1.22474487,  1.22474487]])
In the above example, both batch.mean(axis=0) and batch.std(axis=0) are 1D arrays containing two elements each: the column-wise means and standard deviations, respectively. Thus, for every single element in the 2D matrix, standardization consists of subtracting the mean and dividing by the standard deviation associated with the column that the element belongs to.
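Broadcasting shows up in many other machine learning steps as well. As a further sketch (the bias values below are made up for illustration), adding a per-column bias vector to every row of the same batch works the same way:

import numpy as np

batch = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # same batch as above
bias = np.array([0.5, -0.5])   # one value per column, shape (2,)
shifted = batch + bias         # the 1D bias is broadcast across all 3 rows
print(shifted)
# [[1.5 1.5]
#  [3.5 3.5]
#  [5.5 5.5]]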
Matrix Multiplication
Matrix multiplication is at the core of the linear transformations applied by plenty of machine learning models, both classical and neural network-based: for example, multiplying the information flowing between two consecutive fully connected layers of neurons by the connection weights. This operation takes place even in very large models like transformers, which underlie today's language models.
Here's how it works in NumPy, simulating the connection between two fully connected layers containing two neurons each:
weights = np.array([[0.2, 0.8], [0.5, 0.1]])
inputs = np.array([1.0, 2.0])
bias = np.array([0.1, -0.1])
output = np.dot(weights, inputs) + bias
print(np.round(output, 2))
Output: [1.9 0.6]
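To connect this with the earlier ReLU example, here is a minimal sketch of stacking two such layers, applying the activation in between; the second weight matrix and bias are hypothetical values chosen just for illustration:

import numpy as np

# First layer (same weights, inputs, and bias as above)
weights_1 = np.array([[0.2, 0.8], [0.5, 0.1]])
bias_1 = np.array([0.1, -0.1])
inputs = np.array([1.0, 2.0])

# Second layer with hypothetical weights and bias
weights_2 = np.array([[0.3, -0.4], [0.6, 0.2]])
bias_2 = np.array([0.05, 0.0])

hidden = np.maximum(0, np.dot(weights_1, inputs) + bias_1)  # ReLU on the first layer's output
output = np.dot(weights_2, hidden) + bias_2
print(np.round(output, 2))  # [0.38 1.26]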
Advanced Row Selection by Masking
Masking is useful when you need to select certain instances in your dataset based on some external criterion, modeled as a boolean mask: a 1D array of true/false values used to decide which rows of the 2D dataset to keep. The following example uses a mask that selects the second and third instances in the dataset matrix:
data = np.array([[1, 2], [3, 4], [5, 6]])
labels = np.array([0, 1, 1])
filtered = data[labels == 1]
print(filtered)
Output:
[[3 4]
 [5 6]]
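Masks can also encode richer criteria. As a small sketch (the feature threshold below is an arbitrary choice), boolean conditions can be combined with & or | before indexing:

import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])
labels = np.array([0, 1, 1])

# Keep rows whose label is 1 AND whose first feature is greater than 3
mask = (labels == 1) & (data[:, 0] > 3)
print(data[mask])
# [[5 6]]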
ArgMax for Probabilistic Class Prediction
Several classification models use a function called softmax to calculate normalized probabilities of an instance belonging to each of several mutually exclusive classes or categories. In language models that generate text word by word, by solving a next-word prediction problem sequentially, this softmax principle becomes astronomically complex, requiring the computation of the probability of every word in a vocabulary (typically that of a human language) being the next one to generate. Thanks to np.argmax, finding the word (or, generally speaking, the class) with the highest probability becomes much easier.
This example applies the function to two instances whose scores for three possible classes are stored in the logits matrix:
logits = np.array([[0.2, 0.8, 0.0], [0.5, 0.3, 0.2]])
predictions = np.argmax(logits, axis=1)
print(predictions)
Output: [1 0]
The output is the predicted class for each instance (classes are indexed from 0 to 2 by default).
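Since the section mentions softmax, here is a minimal sketch (reusing the same logits matrix) of turning the raw scores into normalized probabilities before taking the argmax; because softmax is monotonic, the argmax over the probabilities matches the argmax over the logits:

import numpy as np

logits = np.array([[0.2, 0.8, 0.0], [0.5, 0.3, 0.2]])

# Softmax: exponentiate (shifted for numerical stability) and normalize row-wise
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(np.round(probs, 2))        # approximately [[0.27 0.5 0.22] [0.39 0.32 0.29]]
print(np.argmax(probs, axis=1))  # [1 0], same result as before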
Custom Tensor Operations with Einsum
Einsum, short for Einstein summation, is an interesting function in NumPy. It may seem cryptic at first glance, but it uses a compact notation to express algebraic operations on arrays, such as dot products, outer products, and even transformer attention mechanisms, in an interpretable fashion. It can also be handy for constructing custom layers in deep learning architectures.
To get a small glimpse of this function, let's look at this example, which uses the einsum expression for matrix multiplication: 'ij,jk->ik'.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
result = np.einsum('ij,jk->ik', A, B)
print(result)
Output:
[[19 22]
 [43 50]]
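As a further sketch (the vectors u and v below are made up for illustration), the same notation covers the other operations mentioned above, such as dot products and outer products:

import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

dot = np.einsum('i,i->', u, v)      # dot product: sum over the shared index i
outer = np.einsum('i,j->ij', u, v)  # outer product: all pairwise products

print(dot)    # 32.0
print(outer)
# [[ 4.  5.  6.]
#  [ 8. 10. 12.]
#  [12. 15. 18.]]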
If the mechanisms behind this function don’t look like real ninjutsu, what else would?
For more information on how this function works, check the NumPy documentation page.
Conclusion
This article unveiled six intriguing ninjutsu-like strategies provided by Python's NumPy library to perform array operations efficiently. They are useful for scaling up custom machine learning workflows that conduct intensive computations on data, model weights, and more.