NumPy Exercises¶

Practice Problems — Warm-Up, Main, Stretch¶

How to Get the Most From These Exercises

Open a Jupyter notebook or Python file. Read the problem. Close these notes and work through it yourself. Only check the solution when you are genuinely stuck or want to compare approaches. A solution you struggled toward is worth ten you read cold. If your solution works but looks different from the provided one, understand why — there is usually a reason one approach is preferred.

Warm-Up — Array Basics¶

These confirm that you have the fundamentals and can navigate the API without hesitation.

Exercise W1 — Create and Inspect¶

Create a 1-D array of even numbers from 2 to 30 inclusive. Print its shape, size, dtype, sum, and mean.

Expected output:

Array:  [ 2  4  6  8 10 12 14 16 18 20 22 24 26 28 30]
Shape:  (15,)
Size:   15
dtype:  int64
Sum:    240
Mean:   16.0

Show answer

import numpy as np

arr = np.arange(2, 32, 2)
print("Array: ", arr)
print("Shape: ", arr.shape)
print("Size:  ", arr.size)
print("dtype: ", arr.dtype)
print("Sum:   ", arr.sum())
print("Mean:  ", arr.mean())

Exercise W2 — Border Matrix¶

Create a 6×6 integer matrix where border elements are 1 and interior elements are 0. Do not use a loop.

Expected output:

[[1 1 1 1 1 1]
 [1 0 0 0 0 1]
 [1 0 0 0 0 1]
 [1 0 0 0 0 1]
 [1 0 0 0 0 1]
 [1 1 1 1 1 1]]

Show answer

import numpy as np

mat = np.ones((6, 6), dtype=int)
mat[1:-1, 1:-1] = 0
print(mat)

# Alternative starting from zeros:
mat2 = np.zeros((6, 6), dtype=int)
mat2[0, :]  = 1
mat2[-1, :] = 1
mat2[:, 0]  = 1
mat2[:, -1] = 1
print(mat2)

Exercise W3 — Checkerboard¶

Create an 8×8 integer matrix with a checkerboard pattern of 0s and 1s. (0, 0) should be 0. Do not use a loop.

Expected output (partial):

[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 ...]

Show answer

import numpy as np

# Method 1: index parity
board = np.zeros((8, 8), dtype=int)
board[1::2, ::2]  = 1   # odd rows, even cols
board[::2,  1::2] = 1   # even rows, odd cols
print(board)

# Method 2: sum of indices modulo 2
rows, cols = np.indices((8, 8))
board2 = (rows + cols) % 2
print(board2)

Exercise W4 — Reshape and Navigate 3-D¶

Create an array containing integers 1 through 60. Reshape it to shape (3, 4, 5). Then: 1. Print the element at position [2, 3, 4] 2. Print the entire second "layer" (index 1 along axis 0) 3. Print all values greater than 45

Show answer

import numpy as np

arr = np.arange(1, 61).reshape(3, 4, 5)

# 1. Single element
print(arr[2, 3, 4])   # Output: 60  ← last element

# 2. Second layer (axis 0, index 1)
print(arr[1])
# Output:
# [[21 22 23 24 25]
#  [26 27 28 29 30]
#  [31 32 33 34 35]
#  [36 37 38 39 40]]

# 3. Values greater than 45
print(arr[arr > 45])
# Output: [46 47 48 49 50 51 52 53 54 55 56 57 58 59 60]

Main — Indexing, Vectorization, Broadcasting¶

Realistic problems requiring you to combine multiple concepts.

Exercise M1 — Slicing Patterns¶

Given arr = np.arange(20):

Extract every third element starting from index 1
Extract the last 6 elements in reverse order
Replace all odd-indexed elements with their negative (in-place)

Expected outputs:

1: [ 1  4  7 10 13 16 19]
2: [19 18 17 16 15 14]
3: [ 0 -1  2 -3  4 -5  6 -7  8 -9 10 -11 12 -13 14 -15 16 -17 18 -19]

Show answer

import numpy as np

arr = np.arange(20)

# 1. Every third, starting at index 1
print(arr[1::3])   # Output: [ 1  4  7 10 13 16 19]

# 2. Last 6 in reverse
print(arr[-1:-7:-1])  # Output: [19 18 17 16 15 14]

# 3. Negate odd-indexed elements in-place
arr = np.arange(20)  # reset
arr[1::2] = -arr[1::2]
print(arr)
# Output: [ 0 -1  2 -3  4 -5  6 -7  8 -9 10 -11 12 -13 14 -15 16 -17 18 -19]

Exercise M2 — Boolean Filtering on a Matrix¶

You have salary data for employees across three departments.

import numpy as np
rng = np.random.default_rng(42)
# Shape: (20, 3) — 20 employees, columns: [dept_id, years_exp, salary]
data = np.column_stack([
    rng.integers(0, 3, 20),          # dept: 0, 1, or 2
    rng.integers(1, 15, 20),         # years experience
    rng.integers(40000, 120000, 20)  # salary
])

Without using any loop: 1. Find all employees in department 1 2. Find all employees with salary > 80,000 AND experience > 5 years 3. Count how many employees earn above the overall mean salary 4. Print the salary of the highest-paid employee in department 2

Show answer

import numpy as np
rng = np.random.default_rng(42)
data = np.column_stack([
    rng.integers(0, 3, 20),
    rng.integers(1, 15, 20),
    rng.integers(40000, 120000, 20)
])

# 1. Department 1 employees
dept1 = data[data[:, 0] == 1]
print(f"Dept 1 employees: {len(dept1)}")

# 2. High salary AND experienced
mask = (data[:, 2] > 80000) & (data[:, 1] > 5)
high_earners = data[mask]
print(f"High salary + experienced: {len(high_earners)}")

# 3. Count above mean salary
mean_salary = data[:, 2].mean()
count_above = np.sum(data[:, 2] > mean_salary)
print(f"Mean salary: {mean_salary:.0f}, Above mean: {count_above}")

# 4. Top salary in department 2
dept2_salaries = data[data[:, 0] == 2, 2]
print(f"Highest salary in dept 2: {dept2_salaries.max()}")

Exercise M3 — Vectorize This¶

Rewrite the following loop as a single vectorized NumPy expression. Verify the results match.

import math

data = [3.5, 7.2, 1.1, 8.8, 4.5, 6.3, 2.9, 9.1, 5.0, 3.7]
mean = sum(data) / len(data)
std  = math.sqrt(sum((x - mean)**2 for x in data) / len(data))

result = []
for x in data:
    if x > mean:
        result.append(math.log(x / mean))
    else:
        result.append(-(mean - x) / std)

Show answer

import numpy as np

data = np.array([3.5, 7.2, 1.1, 8.8, 4.5, 6.3, 2.9, 9.1, 5.0, 3.7])
mean = data.mean()
std  = data.std()

# np.where handles the conditional branch vectorized
result = np.where(
    data > mean,
    np.log(data / mean),
    -(mean - data) / std
)
print(result.round(4))
# Both approaches produce the same output.

Exercise M4 — Weekly Temperature Analysis¶

import numpy as np
rng = np.random.default_rng(0)
# 4 weeks × 7 days × 24 hours of temperature readings (Celsius)
temps = 20 + rng.standard_normal((4, 7, 24)) * 6
# Shape: (4, 7, 24)

Using only NumPy (no loops):

What is the average temperature for each week? (one number per week)
Which hour of the day is coldest on average across all weeks and days?
What fraction of all readings exceeded 30°C?
What is the daily temperature range (max - min) for each day of the first week?
Identify the (week, day) combination with the highest single-hour reading

Show answer

import numpy as np
rng = np.random.default_rng(0)
temps = 20 + rng.standard_normal((4, 7, 24)) * 6

# 1. Weekly averages: collapse days and hours
weekly_avg = temps.mean(axis=(1, 2))
print("Weekly averages:", weekly_avg.round(2))

# 2. Coldest hour: average over weeks and days, then find argmin
hourly_avg = temps.mean(axis=(0, 1))   # shape: (24,)
coldest_hour = np.argmin(hourly_avg)
print(f"Coldest hour: {coldest_hour} ({hourly_avg[coldest_hour]:.2f}°C)")

# 3. Fraction above 30°C
fraction_hot = np.mean(temps > 30)
print(f"Fraction > 30°C: {fraction_hot:.3f} ({fraction_hot*100:.1f}%)")

# 4. Daily range for week 0: max and min across hours (axis=2)
week0 = temps[0]          # shape: (7, 24)
daily_range = week0.max(axis=1) - week0.min(axis=1)
print("Daily range (week 0):", daily_range.round(2))

# 5. Week and day of highest single reading
# Collapse hours: find max per (week, day)
daily_max = temps.max(axis=2)    # shape: (4, 7)
flat_idx = np.argmax(daily_max)  # index in flattened array
week_idx, day_idx = np.unravel_index(flat_idx, daily_max.shape)
peak_temp = daily_max[week_idx, day_idx]
print(f"Peak reading: week {week_idx}, day {day_idx}, temp={peak_temp:.2f}°C")

Exercise M5 — Views vs Copies Detective¶

For each operation below, predict whether the result is a view or a copy. Then verify using .base.

import numpy as np

arr = np.arange(24).reshape(4, 6)

a = arr[1:3, :]        # prediction: ?
b = arr[[0, 2], :]     # prediction: ?
c = arr[arr > 10]      # prediction: ?
d = arr.T              # prediction: ?
e = arr.flatten()      # prediction: ?
f = arr.ravel()        # prediction: ?
g = arr.reshape(6, 4)  # prediction: ?

Show answer

import numpy as np

arr = np.arange(24).reshape(4, 6)

a = arr[1:3, :]
b = arr[[0, 2], :]
c = arr[arr > 10]
d = arr.T
e = arr.flatten()
f = arr.ravel()
g = arr.reshape(6, 4)

results = {
    'a = arr[1:3,:]   (basic slice)': a,
    'b = arr[[0,2],:] (fancy index)': b,
    'c = arr[arr>10]  (bool index) ': c,
    'd = arr.T        (transpose)  ': d,
    'e = arr.flatten()(flatten)    ': e,
    'f = arr.ravel()  (ravel)      ': f,
    'g = arr.reshape  (reshape)    ': g,
}

for name, x in results.items():
    kind = 'VIEW' if x.base is not None else 'COPY'
    print(f"{name} → {kind}")

# Expected output:
# a = arr[1:3,:]    → VIEW
# b = arr[[0,2],:]  → COPY
# c = arr[arr>10]   → COPY
# d = arr.T         → VIEW
# e = arr.flatten() → COPY
# f = arr.ravel()   → VIEW  (usually — depends on memory layout)
# g = arr.reshape() → VIEW  (usually)

Stretch — Harder, No Hints¶

These problems require combining multiple ideas. The expected output is shown; the path is yours to find.

Exercise S1 — Implement Euclidean Distance Without `np.linalg`¶

Write a function pairwise_distances(X) that takes a 2-D array of shape (n, d) (n points in d-dimensional space) and returns the (n, n) pairwise distance matrix. No loops, no np.linalg.norm on pairs.

import numpy as np
rng = np.random.default_rng(7)
X = rng.standard_normal((5, 3))
D = pairwise_distances(X)
# D[i, j] = Euclidean distance between point i and point j
# D should be symmetric, and D[i, i] = 0

Verify: - D is symmetric: np.allclose(D, D.T) → True - Diagonal is zero: np.allclose(np.diag(D), 0) → True

Show answer

import numpy as np

def pairwise_distances(X):
    # X: (n, d)
    # Expand to (n, 1, d) and (1, n, d), then broadcast
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))           # (n, n)

rng = np.random.default_rng(7)
X = rng.standard_normal((5, 3))
D = pairwise_distances(X)

print("Symmetric:", np.allclose(D, D.T))       # True
print("Zero diag:", np.allclose(np.diag(D), 0)) # True
print(D.round(3))

Exercise S2 — Complete Preprocessing Pipeline¶

import numpy as np
rng = np.random.default_rng(99)
raw = rng.standard_normal((200, 6)) * np.array([10, 2, 50, 0.5, 100, 1])
# Introduce some outliers
raw[::20] *= 5

Build a pipeline (no loops) that:

Removes any row where at least one feature is more than 4 standard deviations from that feature's mean
Applies min-max normalization to each feature (scale to [0, 1])
Computes the (6, 6) correlation matrix of the normalized features using only NumPy
Finds the pair of features with the highest absolute correlation (excluding self-correlation)

Show answer

import numpy as np
rng = np.random.default_rng(99)
raw = rng.standard_normal((200, 6)) * np.array([10, 2, 50, 0.5, 100, 1])
raw[::20] *= 5

# Step 1: Remove outlier rows
mean = raw.mean(axis=0)          # shape: (6,)
std  = raw.std(axis=0)           # shape: (6,)
z_scores = np.abs((raw - mean) / std)   # shape: (200, 6)
clean_mask = np.all(z_scores <= 4, axis=1)
clean = raw[clean_mask]
print(f"Rows after outlier removal: {clean.shape[0]}")

# Step 2: Min-max normalize each feature
col_min = clean.min(axis=0)      # shape: (6,)
col_max = clean.max(axis=0)      # shape: (6,)
normalized = (clean - col_min) / (col_max - col_min)

# Step 3: Correlation matrix using NumPy
# np.corrcoef expects (features, samples) — transpose first
corr = np.corrcoef(normalized.T)   # shape: (6, 6)
print("Correlation matrix:")
print(corr.round(3))

# Step 4: Find highest off-diagonal absolute correlation
# Mask out the diagonal (which is always 1)
mask = ~np.eye(6, dtype=bool)
abs_corr = np.abs(corr)
abs_corr_masked = abs_corr * mask   # zero out diagonal

flat_idx = np.argmax(abs_corr_masked)
i, j = np.unravel_index(flat_idx, corr.shape)
print(f"Highest correlation: features {i} and {j}, r = {corr[i, j]:.4f}")

Exercise S3 — Rolling Window Statistics¶

Compute the rolling mean and rolling standard deviation for a time series using only NumPy (no Pandas, no loops for the window computation).

import numpy as np
rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.standard_normal(500) * 2)
window = 20

Expected behavior: - rolling_mean[i] = mean of prices[i : i + window] for i in range(len(prices) - window + 1) - Result shape: (len(prices) - window + 1,) = (481,)

Show answer

import numpy as np
rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.standard_normal(500) * 2)
window = 20
n = len(prices) - window + 1   # 481

# Build a (n, window) array using stride tricks
# Each row is a sliding window view
from numpy.lib.stride_tricks import as_strided

item_size = prices.strides[0]
windows = as_strided(
    prices,
    shape=(n, window),
    strides=(item_size, item_size)
)
# windows[i] = prices[i : i + window]

rolling_mean = windows.mean(axis=1)
rolling_std  = windows.std(axis=1)

print(f"Shape: {rolling_mean.shape}")   # (481,)
print(f"First rolling mean: {rolling_mean[0]:.3f}")
print(f"Last rolling mean:  {rolling_mean[-1]:.3f}")

# Verify first value
expected_first = prices[:window].mean()
print(f"Manual check: {expected_first:.3f}")  # should match rolling_mean[0]

# Alternative without stride tricks (using broadcasting index array)
indices = np.arange(window) + np.arange(n)[:, np.newaxis]  # shape: (n, window)
rolling_mean_v2 = prices[indices].mean(axis=1)
print("Methods agree:", np.allclose(rolling_mean, rolling_mean_v2))

Progress Tracker¶

Exercise	Solved Without Help?	Understood Solution?
W1 — Create and Inspect	☐	☐
W2 — Border Matrix	☐	☐
W3 — Checkerboard	☐	☐
W4 — Reshape 3-D	☐	☐
M1 — Slicing Patterns	☐	☐
M2 — Boolean Filtering	☐	☐
M3 — Vectorize This	☐	☐
M4 — Temperature Analysis	☐	☐
M5 — Views vs Copies	☐	☐
S1 — Pairwise Distance	☐	☐
S2 — Preprocessing Pipeline	☐	☐
S3 — Rolling Window	☐	☐

05-broadcasting | 07-cheat-sheet

NumPy Exercises¶

Practice Problems — Warm-Up, Main, Stretch¶

Warm-Up — Array Basics¶

Exercise W1 — Create and Inspect¶

Exercise W2 — Border Matrix¶

Exercise W3 — Checkerboard¶

Exercise W4 — Reshape and Navigate 3-D¶

Main — Indexing, Vectorization, Broadcasting¶

Exercise M1 — Slicing Patterns¶

Exercise M2 — Boolean Filtering on a Matrix¶

Exercise M3 — Vectorize This¶

Exercise M4 — Weekly Temperature Analysis¶

Exercise M5 — Views vs Copies Detective¶

Stretch — Harder, No Hints¶

Exercise S1 — Implement Euclidean Distance Without np.linalg¶

Exercise S2 — Complete Preprocessing Pipeline¶

Exercise S3 — Rolling Window Statistics¶

Progress Tracker¶

Exercise S1 — Implement Euclidean Distance Without `np.linalg`¶