2.1. معالجة البيانات
Open the notebook in Colab

لإنجاز أي شيء متعلّق بالبيانات ، نحتاج إلى طريقة لتخزينها ومعالجتها.

عمومًا ، هناك شيئان مهمان نحتاج إلى القيام بهما مع البيانات: (1) الحصول عليها ؛ و (2) معالجتها بمجرد أن تكون داخل الكمبيوتر. لا فائدة من الحصول على البيانات دون أي وسيلة لتخزينها، لذلك دعونا نتدرّب أوّلاً على بعض البيانات الإصطناعية. للبدء ، سوف نقوم بإعتبار الصفيف (array) كهيكل أساسي لتخزين البيانات التّي سنتعامل معها، و سنشير إليه في بقية الكتاب ب ndarray حيث تمثل \(n\) عدد أبعاده و أمّا d فيشير إلى كلمة أبعاد المأخوذة من اللفظ الإنجليزي dimension. إعلم أنّ الndarray هو الأداة الرئيسية لتخزين البيانات وتحويلها في MXNet.

إذا إستخدمت NumPy من قبل ، و هي حزمة الحوسبة العلمية الأكثر استخدامًا في Python ، فستَجِدُ هذا الِقسمَ مألوفًا. إعلم أنّ المحافظة على هذا الشبه قد كان واحدًا من مبادئ التصميم الأساسية لMXNet. لقد قمنا بتصميم ndarray في MXNet ليكون امتدادًا لـ ndarray الموجود في NumPy مع إضافة بعض الميزات المتقدّمة و هي: أولاً ، تدعم ndarray من MXNet الحساب غير المتزامن على وحدات المعالجة المركزية ، GPU ، والبنية السحابية الموزعة ، في حين يدعم NumPy الحساب على وحدة المعالجة المركزية فقط. ثانياً ، تدعم ndarray من MXNet التمايز التلقائي (automatic differentiation). هذه الخصائص تجعل ndarray من MXNet مناسبة للتعلّم العميق. في بقية الكتاب ، عندما نقول ndarray ، فإننا نُشير إلى ndarray من MXNet ما لم يُنَصَّ على خلاف ذلك.

2.1.1. لنشرع بالعمل

في هذا القسم ، نَهدف إلى تزويدك بالأدوات الأساسية للحساب الرياضي والحساب الرقمي التي ستستند إليها أثناء تقدمك في الكتاب. نوفر أيضا العديد من الأمثلة التي ستمكنك من التمرّن على إستخدام مكتبة MXNet. لا تقلق إذا واجهت صعوبة في فهم بعض المفاهيم الرياضية أو وظائف المكتبة. سنقوم في الأقسام التالية بإعادة النظر في مواد هذا الباب في سياق أمثلة عمليّة وسوف تفهمها أكثر حينها. من ناحية أخرى ، إذا كنت متمرسا على المعلومات الموجودة في هذا الباب وترغب في التعمق في المحتوى الرياضي ، فما عليك سوى تخطي هذا القسم.

للبدء ، لنستورد الوحدات النمطية np (numpy) وnpx (numpy_extension) من MXNet. هنا ، تشتمل الوحدة النمطية np على وظائف يدعمها NumPy ، بينما تحتوي الوحدة النمطية npx على مجموعة من الملحقات التي تم تطويرها لتمكين التعلّم العميق في بيئة تشبه NumPy. عند استخدام ndarray ، فإننا دائمًا ما نستدعي وظيفة set_np : نقوم بهذا لكي نظمن توافق معالجة ndarray مع بقية مكونات مكتبة MXNet.

from mxnet import np, npx
npx.set_np()

يمثل ndarray مجموعة (قد تكون متعددة الأبعاد) من القيم العددية. إذا كان هناك محور واحد ، فإنّ ndarray يقابل (في الرياضيات) متجه (vector). مع محورين ،يتطابق “ndarray” مع مصفوفة (matrix). لا توجد أسماء رياضية مخصّصة للمصفوفات التي تحتوي على أكثر من محورين — لذلك نحن ببساطة نسميها tensors.

للبدء ، يُمكننا استخدام arange لإنشاء متجه صفي (row vector) سنسميه x يحتوي على الأعداد صحيحة من \(0\) إلى \(12\). عند إنشاء هذه القيم سيعتبرها الحاسوب مبدئيا على أنّها أعداد عشرية. إلاّ أنّنا نسمى كُلاًّ من القيم الموجودة في ndarray ب* العنصر* (element) منndarray. على سبيل المثال ، نقول بأنّ هناك \(12\) عنصرًا في الndarray المسمى x.

ما لم يُنَصَّ على خلاف ذلك ، سيتم تخزين ndarray جديد في الذاكرة الرئيسية وتخصيصه للحساب المستند على وحدة المعالجة المركزية (CPU).

x = np.arange(12)
x
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

We can access an ndarray’s shape (the length along each axis) by inspecting its shape property.

x.shape
(12,)

If we just want to know the total number of elements in an ndarray, i.e., the product of all of the shape elements, we can inspect its size property. Because we are dealing with a vector here, the single element of its shape is identical to its size.

x.size
12

To change the shape of an ndarray without altering either the number of elements or their values, we can invoke the reshape function. For example, we can transform our ndarray, x, from a row vector with shape (\(12\),) to a matrix with shape (\(3\), \(4\)). This new ndarray contains the exact same values, but views them as a matrix organized as \(3\) rows and \(4\) columns. To reiterate, although the shape has changed, the elements in x have not. Note that the size is unaltered by reshaping.

x = x.reshape(3, 4)
x
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

Reshaping by manually specifying every dimension is unnecessary. If our target shape is a matrix with shape (height, width), then after we know the width, the height is given implicitly. Why should we have to perform the division ourselves? In the example above, to get a matrix with \(3\) rows, we specified both that it should have \(3\) rows and \(4\) columns. Fortunately, ndarray can automatically work out one dimension given the rest. We invoke this capability by placing -1 for the dimension that we would like ndarray to automatically infer. In our case, instead of calling x.reshape(3, 4), we could have equivalently called x.reshape(-1, 4) or x.reshape(3, -1).

The empty method grabs a chunk of memory and hands us back a matrix without bothering to change the value of any of its entries. This is remarkably efficient but we must be careful because the entries might take arbitrary values, including very big ones!

np.empty((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

Typically, we will want our matrices initialized either with zeros, ones, some other constants, or numbers randomly sampled from a specific distribution. We can create an ndarray representing a tensor with all elements set to \(0\) and a shape of (\(2\), \(3\), \(4\)) as follows:

np.zeros((2, 3, 4))
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

Similarly, we can create tensors with each element set to 1 as follows:

np.ones((2, 3, 4))
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

Often, we want to randomly sample the values for each element in an ndarray from some probability distribution. For example, when we construct arrays to serve as parameters in a neural network, we will typically inititialize their values randomly. The following snippet creates an ndarray with shape (\(3\), \(4\)). Each of its elements is randomly sampled from a standard Gaussian (normal) distribution with a mean of \(0\) and a standard deviation of \(1\).

np.random.normal(0, 1, size=(3, 4))
array([[ 1.1630787 ,  2.2122064 ,  0.4838046 ,  0.7740038 ],
       [ 0.29956347,  1.0434403 ,  0.15302546,  1.1839255 ],
       [-1.1688148 ,  1.8917114 ,  1.5580711 , -1.2347414 ]])

We can also specify the exact values for each element in the desired ndarray by supplying a Python list (or list of lists) containing the numerical values. Here, the outermost list corresponds to axis \(0\), and the inner list to axis \(1\).

np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
array([[2., 1., 4., 3.],
       [1., 2., 3., 4.],
       [4., 3., 2., 1.]])

2.1.2. Operations

This book is not about software engineering. Our interests are not limited to simply reading and writing data from/to arrays. We want to perform mathematical operations on those arrays. Some of the simplest and most useful operations are the elementwise operations. These apply a standard scalar operation to each element of an array. For functions that take two arrays as inputs, elementwise operations apply some standard binary operator on each pair of corresponding elements from the two arrays. We can create an elementwise function from any function that maps from a scalar to a scalar.

In mathematical notation, we would denote such a unary scalar operator (taking one input) by the signature \(f: \mathbb{R} \rightarrow \mathbb{R}\). This just mean that the function is mapping from any real number (\(\mathbb{R}\)) onto another. Likewise, we denote a binary scalar operator (taking two real inputs, and yielding one output) by the signature \(f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}\). Given any two vectors \(\mathbf{u}\) and \(\mathbf{v}\) of the same shape, and a binary operator \(f\), we can produce a vector \(\mathbf{c} = F(\mathbf{u},\mathbf{v})\) by setting \(c_i \gets f(u_i, v_i)\) for all \(i\), where \(c_i, u_i\), and \(v_i\) are the \(i^\mathrm{th}\) elements of vectors \(\mathbf{c}, \mathbf{u}\), and \(\mathbf{v}\). Here, we produced the vector-valued \(F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d\) by lifting the scalar function to an elementwise vector operation.

In MXNet, the common standard arithmetic operators (+, -, *, /, and **) have all been lifted to elementwise operations for any identically-shaped tensors of arbitrary shape. We can call elementwise operations on any two tensors of the same shape. In the following example, we use commas to formulate a \(5\)-element tuple, where each element is the result of an elementwise operation.

x = np.array([1, 2, 4, 8])
y = np.array([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y  # The ** operator is exponentiation
(array([ 3.,  4.,  6., 10.]),
 array([-1.,  0.,  2.,  6.]),
 array([ 2.,  4.,  8., 16.]),
 array([0.5, 1. , 2. , 4. ]),
 array([ 1.,  4., 16., 64.]))

Many more operations can be applied elementwise, including unary operators like exponentiation.

np.exp(x)
array([2.7182817e+00, 7.3890562e+00, 5.4598148e+01, 2.9809580e+03])

In addition to elementwise computations, we can also perform linear algebra operations, including vector dot products and matrix multiplication. We will explain the crucial bits of linear algebra (with no assumed prior knowledge) in Section 2.3.

We can also concatenate multiple ndarrays together, stacking them end-to-end to form a larger ndarray. We just need to provide a list of ndarrays and tell the system along which axis to concatenate. The example below shows what happens when we concatenate two matrices along rows (axis \(0\), the first element of the shape) vs. columns (axis \(1\), the second element of the shape). We can see that, the first output ndarray‘s axis-\(0\) length (\(6\)) is the sum of the two input ndarrays’ axis-\(0\) lengths (\(3 + 3\)); while the second output ndarray‘s axis-\(1\) length (\(8\)) is the sum of the two input ndarrays’ axis-\(1\) lengths (\(4 + 4\)).

x = np.arange(12).reshape(3, 4)
y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
np.concatenate([x, y], axis=0), np.concatenate([x, y], axis=1)
(array([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]]),
 array([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]]))

Sometimes, we want to construct a binary ndarray via logical statements. Take x == y as an example. For each position, if x and y are equal at that position, the corresponding entry in the new ndarray takes a value of \(1\), meaning that the logical statement x == y is true at that position; otherwise that position takes \(0\).

x == y
array([[False,  True, False,  True],
       [False, False, False, False],
       [False, False, False, False]])

Summing all the elements in the ndarray yields an ndarray with only one element.

x.sum()
array(66.)

For stylistic convenience, we can write x.sum()as np.sum(x).

2.1.3. Broadcasting Mechanism

In the above section, we saw how to perform elementwise operations on two ndarrays of the same shape. Under certain conditions, even when shapes differ, we can still perform elementwise operations by invoking the broadcasting mechanism. These mechanisms work in the following way: First, expand one or both arrays by copying elements appropriately so that after this transformation, the two ndarrays have the same shape. Second, carry out the elementwise operations on the resulting arrays.

In most cases, we broadcast along an axis where an array initially only has length \(1\), such as in the following example:

a = np.arange(3).reshape(3, 1)
b = np.arange(2).reshape(1, 2)
a, b
(array([[0.],
        [1.],
        [2.]]), array([[0., 1.]]))

Since a and b are \(3\times1\) and \(1\times2\) matrices respectively, their shapes do not match up if we want to add them. We broadcast the entries of both matrices into a larger \(3\times2\) matrix as follows: for matrix a it replicates the columns and for matrix b it replicates the rows before adding up both elementwise.

a + b
array([[0., 1.],
       [1., 2.],
       [2., 3.]])

2.1.4. Indexing and Slicing

Just as in any other Python array, elements in an ndarray can be accessed by index. As in any Python array, the first element has index \(0\) and ranges are specified to include the first but before the last element. As in standard Python lists, we can access elements according to their relative position to the end of the list by using negative indices.

Thus, [-1] selects the last element and [1:3] selects the second and the third elements as follows:

x[-1], x[1:3]
(array([ 8.,  9., 10., 11.]), array([[ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]]))

Beyond reading, we can also write elements of a matrix by specifying indices.

x[1, 2] = 9
x
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  9.,  7.],
       [ 8.,  9., 10., 11.]])

If we want to assign multiple elements the same value, we simply index all of them and then assign them the value. For instance, [0:2, :] accesses the first and second rows, where : takes all the elements along axis \(1\) (column). While we discussed indexing for matrices, this obviously also works for vectors and for tensors of more than \(2\) dimensions.

x[0:2, :] = 12
x
array([[12., 12., 12., 12.],
       [12., 12., 12., 12.],
       [ 8.,  9., 10., 11.]])

2.1.5. حفظ الذاكرة

In the previous example, every time we ran an operation, we allocated new memory to host its results. For example, if we write y = x + y, we will dereference the ndarray that y used to point to and instead point y at the newly allocated memory. In the following example, we demonstrate this with Python’s id() function, which gives us the exact address of the referenced object in memory. After running y = y + x, we will find that id(y) points to a different location. That is because Python first evaluates y + x, allocating new memory for the result and then makes y point to this new location in memory.

before = id(y)
y = y + x
id(y) == before
False

This might be undesirable for two reasons. First, we do not want to run around allocating memory unnecessarily all the time. In machine learning, we might have hundreds of megabytes of parameters and update all of them multiple times per second. Typically, we will want to perform these updates in place. Second, we might point at the same parameters from multiple variables. If we do not update in place, this could cause that discarded memory is not released, and make it possible for parts of our code to inadvertently reference stale parameters.

Fortunately, performing in-place operations in MXNet is easy. We can assign the result of an operation to a previously allocated array with slice notation, e.g., y[:] = <expression>. To illustrate this concept, we first create a new matrix z with the same shape as another y, using zeros_like to allocate a block of \(0\) entries.

z = np.zeros_like(y)
print('id(z):', id(z))
z[:] = x + y
print('id(z):', id(z))
id(z): 140525583902592
id(z): 140525583902592

If the value of x is not reused in subsequent computations, we can also use x[:] = x + y or x += y to reduce the memory overhead of the operation.

before = id(x)
x += y
id(x) == before
True

2.1.6. Conversion to Other Python Objects

Converting an MXNet ndarray to a NumPy ndarray, or vice versa, is easy. The converted result does not share memory. This minor inconvenience is actually quite important: when you perform operations on the CPU or on GPUs, you do not want MXNet to halt computation, waiting to see whether the NumPy package of Python might want to be doing something else with the same chunk of memory. The array and asnumpy functions do the trick.

a = x.asnumpy()
b = np.array(a)
type(a), type(b)
(numpy.ndarray, mxnet.numpy.ndarray)

To convert a size-\(1\) ndarray to a Python scalar, we can invoke the item function or Python’s built-in functions.

a = np.array([3.5])
a, a.item(), float(a), int(a)
(array([3.5]), 3.5, 3.5, 3)

2.1.7. الخلاصة

  • MXNet’s ndarray is an extension to NumPy’s ndarray with a few killer advantages that make it suitable for deep learning.

  • MXNet’s ndarray provides a variety of functionalities including basic mathematics operations, broadcasting, indexing, slicing, memory saving, and conversion to other Python objects.

2.1.8. التّمارين

  1. Run the code in this section. Change the conditional statement x == y in this section to x < y or x > y, and then see what kind of ndarray you can get.

  2. Replace the two ndarrays that operate by element in the broadcasting mechanism with other shapes, e.g., three dimensional tensors. Is the result the same as expected?

2.1.9. Discussions

image0