NumPy Exercises
Practice Problems: Warm-Up, Main, Stretch
How to Get the Most From These Exercises
Open a Jupyter notebook or Python file. Read the problem. Close these notes and work through it yourself. Only check the solution when you are genuinely stuck or want to compare approaches. A solution you struggled toward is worth ten you read cold. If your solution works but looks different from the provided one, work out why: there is usually a reason one approach is preferred.
Warm-Up: Array Basics
These confirm that you have the fundamentals and can navigate the API without hesitation.
Exercise W1: Create and Inspect
Create a 1-D array of even numbers from 2 to 30 inclusive. Print its shape, size, dtype, sum, and mean.
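Expected output (the dtype line shows the platform's default integer type, which may be int32 on Windows with NumPy versions older than 2.0):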
Array: [ 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30]
Shape: (15,)
Size: 15
dtype: int64
Sum: 240
Mean: 16.0
Show answer
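One possible solution, as a minimal sketch. It assumes np.arange with a step of 2 is acceptable here; the variable name evens is only illustrative. The stop value is 31 because np.arange excludes the stop.

```python
import numpy as np

# Even numbers from 2 to 30 inclusive; the stop value (31) is exclusive
evens = np.arange(2, 31, 2)

print("Array:", evens)
print("Shape:", evens.shape)
print("Size:", evens.size)
print("dtype:", evens.dtype)   # platform default integer type
print("Sum:", evens.sum())
print("Mean:", evens.mean())
```

np.linspace(2, 30, 15) would produce the same values, but as floats, so np.arange is the more natural fit when integer output is expected.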
Exercise W2: Border Matrix
Create a 6×6 integer matrix where border elements are 1 and interior elements are 0. Do not use a loop.
Expected output:
[[1 1 1 1 1 1]
 [1 0 0 0 0 1]
 [1 0 0 0 0 1]
 [1 0 0 0 0 1]
 [1 0 0 0 0 1]
 [1 1 1 1 1 1]]
Show answer
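One possible solution, sketched with two interchangeable approaches: slice assignment on an array of ones, and np.pad around a block of zeros. The names border and border_v2 are illustrative only.

```python
import numpy as np

# Approach 1: start with all ones, then zero the interior with one slice
border = np.ones((6, 6), dtype=int)
border[1:-1, 1:-1] = 0
print(border)

# Approach 2: pad a 4x4 block of zeros with a one-cell frame of ones
border_v2 = np.pad(np.zeros((4, 4), dtype=int), 1, constant_values=1)
print(np.array_equal(border, border_v2))
```

The slice version is probably the easier one to read; np.pad generalizes more readily if the frame needs to be wider than one cell.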
Exercise W3: Checkerboard
Create an 8×8 integer matrix with a checkerboard pattern of 0s and 1s. (0, 0) should be 0. Do not use a loop.
Expected output (partial):
[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 ...
Show answer
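One possible solution, as a sketch. It relies on the observation that a cell is 1 exactly when its row index plus column index is odd, which leaves (0, 0) as 0. The names board and board_v2 are illustrative.

```python
import numpy as np

# np.indices returns the row-index grid and the column-index grid
rows, cols = np.indices((8, 8))
board = (rows + cols) % 2
print(board)

# Alternative: fill every other cell with strided slice assignment
board_v2 = np.zeros((8, 8), dtype=int)
board_v2[1::2, ::2] = 1   # odd rows, even columns
board_v2[::2, 1::2] = 1   # even rows, odd columns
print(np.array_equal(board, board_v2))
```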
Exercise W4: Reshape and Navigate 3-D
Create an array containing integers 1 through 60. Reshape it to shape (3, 4, 5). Then:
1. Print the element at position [2, 3, 4]
2. Print the entire second "layer" (index 1 along axis 0)
3. Print all values greater than 45
Show answer
```python
import numpy as np

arr = np.arange(1, 61).reshape(3, 4, 5)

# 1. Single element
print(arr[2, 3, 4])  # Output: 60 (the last element)

# 2. Second layer (axis 0, index 1)
print(arr[1])
# Output:
# [[21 22 23 24 25]
#  [26 27 28 29 30]
#  [31 32 33 34 35]
#  [36 37 38 39 40]]

# 3. Values greater than 45 (boolean indexing returns a 1-D array)
print(arr[arr > 45])
# Output: [46 47 48 49 50 51 52 53 54 55 56 57 58 59 60]
```
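To see why position [2, 3, 4] holds 60, it can help to map the 3-D index to its flat, row-major position. The sketch below uses np.ravel_multi_index and np.unravel_index for that mapping; the variable name flat is illustrative.

```python
import numpy as np

arr = np.arange(1, 61).reshape(3, 4, 5)

# Flat (row-major) position of [2, 3, 4]: 2*20 + 3*5 + 4 = 59
flat = np.ravel_multi_index((2, 3, 4), arr.shape)
print(flat)            # 59
print(arr.flat[flat])  # 60, because the values start at 1

# Inverse mapping: flat position back to a 3-D index
print(np.unravel_index(59, arr.shape))
```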
Main: Indexing, Vectorization, Broadcasting
Realistic problems requiring you to combine multiple concepts.
Exercise M1: Slicing Patterns
Given arr = np.arange(20):
1. Extract every third element starting from index 1
2. Extract the last 6 elements in reverse order
3. Replace all odd-indexed elements with their negative (in-place)
Expected outputs:
1: [ 1 4 7 10 13 16 19]
2: [19 18 17 16 15 14]
3: [ 0 -1 2 -3 4 -5 6 -7 8 -9 10 -11 12 -13 14 -15 16 -17 18 -19]
Show answer
```python
import numpy as np

arr = np.arange(20)

# 1. Every third element, starting at index 1
print(arr[1::3])      # Output: [ 1  4  7 10 13 16 19]

# 2. Last 6 elements in reverse order
print(arr[-1:-7:-1])  # Output: [19 18 17 16 15 14]

# 3. Negate odd-indexed elements in-place
arr = np.arange(20)   # fresh array (optional: the slices above did not modify arr)
arr[1::2] = -arr[1::2]
print(arr)
# Output: 0, -1, 2, -3, 4, -5, ..., 18, -19 (printed as an array)
```
Exercise M2: Boolean Filtering on a Matrix
You have salary data for employees across three departments.
```python
import numpy as np

rng = np.random.default_rng(42)

# Shape: (20, 3), one row per employee; columns: [dept_id, years_exp, salary]
data = np.column_stack([
    rng.integers(0, 3, 20),          # dept: 0, 1, or 2
    rng.integers(1, 15, 20),         # years of experience (1 to 14)
    rng.integers(40000, 120000, 20)  # salary (upper bound is exclusive)
])
```
Without using any loop:
1. Find all employees in department 1
2. Find all employees with salary > 80,000 AND experience > 5 years
3. Count how many employees earn above the overall mean salary
4. Print the salary of the highest-paid employee in department 2
Show answer
```python
import numpy as np

rng = np.random.default_rng(42)
data = np.column_stack([
    rng.integers(0, 3, 20),
    rng.integers(1, 15, 20),
    rng.integers(40000, 120000, 20)
])

# 1. Department 1 employees
dept1 = data[data[:, 0] == 1]
print(f"Dept 1 employees: {len(dept1)}")

# 2. High salary AND experienced (parentheses are required around each comparison)
mask = (data[:, 2] > 80000) & (data[:, 1] > 5)
high_earners = data[mask]
print(f"High salary + experienced: {len(high_earners)}")

# 3. Count above mean salary
mean_salary = data[:, 2].mean()
count_above = np.sum(data[:, 2] > mean_salary)
print(f"Mean salary: {mean_salary:.0f}, Above mean: {count_above}")

# 4. Top salary in department 2
dept2_salaries = data[data[:, 0] == 2, 2]
print(f"Highest salary in dept 2: {dept2_salaries.max()}")
```
Exercise M3: Vectorize This
Rewrite the following loop as a single vectorized NumPy expression. Verify the results match.
```python
import math

data = [3.5, 7.2, 1.1, 8.8, 4.5, 6.3, 2.9, 9.1, 5.0, 3.7]
mean = sum(data) / len(data)
std = math.sqrt(sum((x - mean)**2 for x in data) / len(data))

result = []
for x in data:
    if x > mean:
        result.append(math.log(x / mean))
    else:
        result.append(-(mean - x) / std)
```
Show answer
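```python
import numpy as np

data = np.array([3.5, 7.2, 1.1, 8.8, 4.5, 6.3, 2.9, 9.1, 5.0, 3.7])
mean = data.mean()
std = data.std()  # default ddof=0 matches the loop's division by len(data)

# np.where vectorizes the conditional. It evaluates both branches for every
# element and then selects, which is safe here because all values are positive.
result = np.where(
    data > mean,
    np.log(data / mean),
    -(mean - data) / std
)
print(result.round(4))
# Both approaches produce the same output.
```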
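The exercise also asks you to verify that the results match. A minimal sketch of that check, assuming np.allclose is an acceptable comparison; the names values, loop_result, and vec_result are illustrative:

```python
import math
import numpy as np

values = [3.5, 7.2, 1.1, 8.8, 4.5, 6.3, 2.9, 9.1, 5.0, 3.7]

# Loop version, following the logic given in the exercise
mean = sum(values) / len(values)
std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
loop_result = [
    math.log(x / mean) if x > mean else -(mean - x) / std
    for x in values
]

# Vectorized version
data = np.array(values)
vec_result = np.where(
    data > data.mean(),
    np.log(data / data.mean()),
    -(data.mean() - data) / data.std(),
)

# Compare with a tolerance rather than ==, since the two versions may
# round intermediate results differently
print("Match:", np.allclose(loop_result, vec_result))
```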
Exercise M4: Weekly Temperature Analysis
```python
import numpy as np

rng = np.random.default_rng(0)

# 4 weeks × 7 days × 24 hours of temperature readings (Celsius)
temps = 20 + rng.standard_normal((4, 7, 24)) * 6
# Shape: (4, 7, 24)
```
Using only NumPy (no loops):
1. What is the average temperature for each week? (one number per week)
2. Which hour of the day is coldest on average across all weeks and days?
3. What fraction of all readings exceeded 30°C?
4. What is the daily temperature range (max - min) for each day of the first week?
5. Identify the (week, day) combination with the highest single-hour reading
Show answer
```python
import numpy as np

rng = np.random.default_rng(0)
temps = 20 + rng.standard_normal((4, 7, 24)) * 6

# 1. Weekly averages: collapse days and hours
weekly_avg = temps.mean(axis=(1, 2))
print("Weekly averages:", weekly_avg.round(2))

# 2. Coldest hour: average over weeks and days, then find argmin
hourly_avg = temps.mean(axis=(0, 1))  # shape: (24,)
coldest_hour = np.argmin(hourly_avg)
print(f"Coldest hour: {coldest_hour} ({hourly_avg[coldest_hour]:.2f}°C)")

# 3. Fraction above 30°C (the mean of a boolean array is the fraction of True values)
fraction_hot = np.mean(temps > 30)
print(f"Fraction > 30°C: {fraction_hot:.3f} ({fraction_hot*100:.1f}%)")

# 4. Daily range for week 0: max and min across hours
week0 = temps[0]  # shape: (7, 24), so hours are now axis 1
daily_range = week0.max(axis=1) - week0.min(axis=1)
print("Daily range (week 0):", daily_range.round(2))

# 5. Week and day of highest single reading
# Collapse hours: find max per (week, day)
daily_max = temps.max(axis=2)    # shape: (4, 7)
flat_idx = np.argmax(daily_max)  # index in flattened array
week_idx, day_idx = np.unravel_index(flat_idx, daily_max.shape)
peak_temp = daily_max[week_idx, day_idx]
print(f"Peak reading: week {week_idx}, day {day_idx}, temp={peak_temp:.2f}°C")
```
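A variation on part 5, as a sketch: taking argmax over the full 3-D array and unravelling it also recovers the hour of the peak, without the intermediate per-day maximum. The name hour_idx is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
temps = 20 + rng.standard_normal((4, 7, 24)) * 6

# argmax over the whole array, then map the flat index back to 3-D
week_idx, day_idx, hour_idx = np.unravel_index(np.argmax(temps), temps.shape)
peak = temps[week_idx, day_idx, hour_idx]
print(f"Peak: week {week_idx}, day {day_idx}, hour {hour_idx}, temp={peak:.2f}°C")
```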
Exercise M5: Views vs Copies Detective
For each operation below, predict whether the result is a view or a copy. Then verify your predictions. The .base attribute is the usual quick check, but np.shares_memory(arr, x) is more reliable, because .base can be non-None for some results that are in fact copies.
```python
import numpy as np

arr = np.arange(24).reshape(4, 6)

a = arr[1:3, :]        # prediction: ?
b = arr[[0, 2], :]     # prediction: ?
c = arr[arr > 10]      # prediction: ?
d = arr.T              # prediction: ?
e = arr.flatten()      # prediction: ?
f = arr.ravel()        # prediction: ?
g = arr.reshape(6, 4)  # prediction: ?
```
Show answer
```python
import numpy as np

arr = np.arange(24).reshape(4, 6)

a = arr[1:3, :]
b = arr[[0, 2], :]
c = arr[arr > 10]
d = arr.T
e = arr.flatten()
f = arr.ravel()
g = arr.reshape(6, 4)

results = {
    'a = arr[1:3,:]    (basic slice)': a,
    'b = arr[[0,2],:]  (fancy index)': b,
    'c = arr[arr>10]   (bool index) ': c,
    'd = arr.T         (transpose)  ': d,
    'e = arr.flatten() (flatten)    ': e,
    'f = arr.ravel()   (ravel)      ': f,
    'g = arr.reshape   (reshape)    ': g,
}

for name, x in results.items():
    # np.shares_memory tests for actual memory overlap with arr. Checking
    # `x.base is not None` alone can mislead: NumPy may attach an internal
    # base array to some fancy-indexing results even though they are copies.
    kind = 'VIEW' if np.shares_memory(arr, x) else 'COPY'
    print(f"{name} → {kind}")

# Expected output (labels abbreviated):
# a = arr[1:3,:]    → VIEW
# b = arr[[0,2],:]  → COPY
# c = arr[arr>10]   → COPY
# d = arr.T         → VIEW
# e = arr.flatten() → COPY
# f = arr.ravel()   → VIEW (usually; depends on memory layout)
# g = arr.reshape() → VIEW (usually)
```
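The "usually" next to ravel() can be made concrete. The sketch below, with illustrative names f_view and f_copy, shows ravel() returning a view for a C-contiguous array and a copy for its transpose, and confirms the difference by writing through each result.

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)

# ravel() on a C-contiguous array can reuse the existing buffer
f_view = arr.ravel()
print(np.shares_memory(arr, f_view))   # True

# arr.T is not C-contiguous, so ravel() has to copy to produce C order
f_copy = arr.T.ravel()
print(np.shares_memory(arr, f_copy))   # False

# Writing through a view changes the original; writing to a copy does not
f_view[0] = -1
f_copy[1] = -99                        # f_copy[1] started as arr[1, 0]
print(arr[0, 0], arr[1, 0])            # -1 6
```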
Stretch: Harder, No Hints
These problems require combining multiple ideas. The expected output is shown; the path is yours to find.
Exercise S1: Implement Euclidean Distance Without np.linalg
Write a function pairwise_distances(X) that takes a 2-D array of shape (n, d) (n points in d-dimensional space) and returns the (n, n) pairwise distance matrix. No loops, no np.linalg.norm on pairs.
```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((5, 3))

D = pairwise_distances(X)  # the function you are asked to write
# D[i, j] = Euclidean distance between point i and point j
# D should be symmetric, and D[i, i] = 0
```
Verify:
- D is symmetric: np.allclose(D, D.T) → True
- Diagonal is zero: np.allclose(np.diag(D), 0) → True
Show answer
```python
import numpy as np

def pairwise_distances(X):
    # X: (n, d)
    # Expand to (n, 1, d) and (1, n, d), then broadcast
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))          # (n, n)

rng = np.random.default_rng(7)
X = rng.standard_normal((5, 3))
D = pairwise_distances(X)

print("Symmetric:", np.allclose(D, D.T))         # True
print("Zero diag:", np.allclose(np.diag(D), 0))  # True
print(D.round(3))
```
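The broadcasting solution builds an (n, n, d) intermediate array, which can become large. A possible alternative, sketched here under the hypothetical name pairwise_distances_gram, uses the identity ||a - b||² = ||a||² + ||b||² - 2 a·b so that only (n, n) arrays are needed. It is less numerically exact than the direct difference, so the sketch clips negative rounding errors and sets the diagonal to zero explicitly.

```python
import numpy as np

def pairwise_distances_gram(X):
    # Hypothetical alternative; X has shape (n, d)
    sq_norms = (X ** 2).sum(axis=1)  # (n,)
    gram = X @ X.T                   # (n, n) matrix of dot products
    sq_dists = sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :] - 2 * gram
    # Rounding can push tiny values below zero; clip before the square root
    np.maximum(sq_dists, 0, out=sq_dists)
    # The distance from a point to itself is zero by definition
    np.fill_diagonal(sq_dists, 0)
    return np.sqrt(sq_dists)

rng = np.random.default_rng(7)
X = rng.standard_normal((5, 3))

# Reference result from the broadcasting approach
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
D_broadcast = np.sqrt((diff ** 2).sum(axis=-1))

D_gram = pairwise_distances_gram(X)
print("Methods agree:", np.allclose(D_broadcast, D_gram))
```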
Exercise S2: Complete Preprocessing Pipeline
```python
import numpy as np

rng = np.random.default_rng(99)
raw = rng.standard_normal((200, 6)) * np.array([10, 2, 50, 0.5, 100, 1])

# Introduce some outliers (every 20th row is scaled by 5)
raw[::20] *= 5
```
Build a pipeline (no loops) that:
1. Removes any row where at least one feature is more than 4 standard deviations from that feature's mean
2. Applies min-max normalization to each feature (scale to [0, 1])
3. Computes the (6, 6) correlation matrix of the normalized features using only NumPy
4. Finds the pair of features with the highest absolute correlation (excluding self-correlation)
Show answer
```python
import numpy as np

rng = np.random.default_rng(99)
raw = rng.standard_normal((200, 6)) * np.array([10, 2, 50, 0.5, 100, 1])
raw[::20] *= 5

# Step 1: Remove outlier rows
mean = raw.mean(axis=0)                # shape: (6,)
std = raw.std(axis=0)                  # shape: (6,)
z_scores = np.abs((raw - mean) / std)  # shape: (200, 6)
clean_mask = np.all(z_scores <= 4, axis=1)
clean = raw[clean_mask]
print(f"Rows after outlier removal: {clean.shape[0]}")

# Step 2: Min-max normalize each feature
col_min = clean.min(axis=0)  # shape: (6,)
col_max = clean.max(axis=0)  # shape: (6,)
normalized = (clean - col_min) / (col_max - col_min)

# Step 3: Correlation matrix using NumPy
# np.corrcoef treats each row as a variable by default, so transpose first
corr = np.corrcoef(normalized.T)  # shape: (6, 6)
print("Correlation matrix:")
print(corr.round(3))

# Step 4: Find highest off-diagonal absolute correlation
# Mask out the diagonal (which is always 1)
mask = ~np.eye(6, dtype=bool)
abs_corr = np.abs(corr)
abs_corr_masked = abs_corr * mask  # zero out diagonal
flat_idx = np.argmax(abs_corr_masked)
i, j = np.unravel_index(flat_idx, corr.shape)
print(f"Highest correlation: features {i} and {j}, r = {corr[i, j]:.4f}")
```
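If "using only NumPy" is read as "without np.corrcoef", the correlation matrix can also be built from standardized columns. The sketch below uses stand-in random data under the illustrative name features rather than the pipeline's normalized array, checks the result against np.corrcoef, and picks the strongest pair from the upper triangle with np.triu_indices.

```python
import numpy as np

rng = np.random.default_rng(99)
features = rng.standard_normal((200, 6))  # stand-in for the normalized data

# Center each column, then scale by its (population) standard deviation
centered = features - features.mean(axis=0)
standardized = centered / features.std(axis=0)

# Correlation is the average product of standardized columns
corr_manual = standardized.T @ standardized / features.shape[0]
print("Matches np.corrcoef:", np.allclose(corr_manual, np.corrcoef(features.T)))

# Search only the upper triangle (k=1 skips the diagonal), so each
# pair of features is considered once
i_upper, j_upper = np.triu_indices(6, k=1)
best = np.argmax(np.abs(corr_manual[i_upper, j_upper]))
print(f"Strongest pair: features {i_upper[best]} and {j_upper[best]}")
```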
Exercise S3: Rolling Window Statistics
Compute the rolling mean and rolling standard deviation for a time series using only NumPy (no Pandas, no loops for the window computation).
```python
import numpy as np

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.standard_normal(500) * 2)
window = 20
```
Expected behavior:
- rolling_mean[i] = mean of prices[i : i + window] for i in range(len(prices) - window + 1)
- Result shape: (len(prices) - window + 1,) = (481,)
Show answer
```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.standard_normal(500) * 2)
window = 20
n = len(prices) - window + 1  # 481

# Build an (n, window) sliding-window view using stride tricks.
# as_strided does no bounds checking, so shape and strides must be exact;
# writeable=False guards against accidental writes to overlapping memory.
item_size = prices.strides[0]  # bytes between consecutive elements
windows = as_strided(
    prices,
    shape=(n, window),
    strides=(item_size, item_size),
    writeable=False,
)
# windows[i] = prices[i : i + window]

rolling_mean = windows.mean(axis=1)
rolling_std = windows.std(axis=1)
print(f"Shape: {rolling_mean.shape}")  # (481,)
print(f"First rolling mean: {rolling_mean[0]:.3f}")
print(f"Last rolling mean: {rolling_mean[-1]:.3f}")

# Verify first value
expected_first = prices[:window].mean()
print(f"Manual check: {expected_first:.3f}")  # should match rolling_mean[0]

# Alternative without stride tricks (using a broadcast index array)
indices = np.arange(window) + np.arange(n)[:, np.newaxis]  # shape: (n, window)
rolling_mean_v2 = prices[indices].mean(axis=1)
print("Methods agree:", np.allclose(rolling_mean, rolling_mean_v2))
```
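Two further options, sketched as alternatives rather than as the intended answer. NumPy 1.20 and later provide sliding_window_view, which builds the same window array with bounds checking. For the mean alone, a cumulative sum avoids materializing windows at all. The names csum and rolling_mean_cumsum are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.standard_normal(500) * 2)
window = 20

# Read-only (481, 20) view; row i is prices[i : i + window]
windows = sliding_window_view(prices, window)
rolling_mean = windows.mean(axis=1)
rolling_std = windows.std(axis=1)
print(windows.shape)

# Rolling mean from a cumulative sum: each window sum is the
# difference of two prefix sums
csum = np.concatenate(([0.0], np.cumsum(prices)))
rolling_mean_cumsum = (csum[window:] - csum[:-window]) / window
print("Methods agree:", np.allclose(rolling_mean, rolling_mean_cumsum))
```

The cumulative-sum version does a fixed amount of work per output value, but it accumulates floating-point error over long series, so the window-based mean is the safer reference.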
Progress Tracker
| Exercise | Solved Without Help? | Understood Solution? |
|---|---|---|
| W1 — Create and Inspect | ☐ | ☐ |
| W2 — Border Matrix | ☐ | ☐ |
| W3 — Checkerboard | ☐ | ☐ |
| W4 — Reshape 3-D | ☐ | ☐ |
| M1 — Slicing Patterns | ☐ | ☐ |
| M2 — Boolean Filtering | ☐ | ☐ |
| M3 — Vectorize This | ☐ | ☐ |
| M4 — Temperature Analysis | ☐ | ☐ |
| M5 — Views vs Copies | ☐ | ☐ |
| S1 — Pairwise Distance | ☐ | ☐ |
| S2 — Preprocessing Pipeline | ☐ | ☐ |
| S3 — Rolling Window | ☐ | ☐ |