Day 02 - Part 2: Pandas Basics
Pandas is the tool you will likely use more than any other in data science work. Almost every dataset you analyze, model you prepare data for, and report you generate starts with a DataFrame. This session builds the muscle memory you need before any real analysis can happen.
Learning Objectives
By the end of this session, you will be able to:
- Explain what a `Series` and a `DataFrame` are and how they relate to each other
- Understand the Index as a first-class object, not just row numbers
- Create DataFrames from dictionaries and lists
- Load CSV and Excel files with full control over which rows, columns, and types get loaded
- Select columns and rows confidently using `[]`, `.loc`, and `.iloc`
- Filter rows using boolean indexing and `.query()`
- Sort data and retrieve top/bottom records
- Run a first-look analysis with `.describe()`, `.value_counts()`, `.info()`, and `.corr()`
- Detect and quantify missing values
- Save results back to disk correctly
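As a preview of several of these objectives, here is a minimal sketch on a toy table. The data, column names, and index labels are invented for illustration and are not part of the course datasets.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Chloe", "Dev"],
        "score": [88, 92, 75, 95],
        "city": ["Lima", "Oslo", "Lima", "Pune"],
    },
    index=["a", "b", "c", "d"],  # a labelled Index instead of 0..3
)

# A single column is a Series that shares the DataFrame's Index.
scores = df["score"]

# .loc selects by label, .iloc selects by integer position.
by_label = df.loc["b", "name"]
by_position = df.iloc[1, 0]

# Boolean indexing and .query() can express the same filter.
high_mask = df[df["score"] > 80]
high_query = df.query("score > 80")

# Sort descending and keep the top two rows by score.
top_two = df.sort_values("score", ascending=False).head(2)

print(type(scores).__name__)
print(by_label, by_position)
print(high_mask.equals(high_query))
print(top_two["name"].tolist())
```

Each of these operations is covered in detail in the numbered files of this module; the sketch only shows how they fit together on one small DataFrame.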
Today's Roadmap
Pandas Basics
│
├── 01 → Series and DataFrames
│       Series | DataFrame | Index | dtypes | memory | column selection
│
├── 02 → Reading CSV and Excel Files
│       read_csv parameters | encoding | messy CSVs | read_excel | saving
│
├── 03 → Filtering and Sorting
│       boolean indexing | .loc vs .iloc | .query() | sort_values | nlargest
│
├── 04 → Basic Data Analysis
│       describe | info | value_counts | corr | groupby preview | missing values
│
├── 05 → Practice Exercises
│       three levels: warm-up, main, stretch
│
└── 06 → Interview Questions
        collapsible model answers for 12 questions
Time Allocation
| # | Topic | Duration | Type |
|---|---|---|---|
| 01 | Series and DataFrames | 35 min | Lecture + Code |
| 02 | Reading CSV and Excel Files | 30 min | Demo |
| 03 | Filtering and Sorting | 35 min | Lecture + Practice |
| 04 | Basic Data Analysis | 30 min | Guided Analysis |
| 05 | Practice Exercises | 50 min | Hands-on |
| 06 | Interview Questions | Reference | Revision |
Total: ~3 hours
Prerequisites
- Python basics (functions, lists, dictionaries, loops)
- NumPy fundamentals (arrays, dtypes, vectorized operations)
- Jupyter Notebook, VS Code, or another Python environment
- Pandas installed
Setup Check
import pandas as pd
import numpy as np
print(pd.__version__)
# Output: 2.x.x (any 1.5+ works for this material)
If pandas is missing, install it with your package manager, for example `pip install pandas openpyxl`.
openpyxl is required for reading and writing .xlsx files, so install it alongside pandas.
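One way to confirm the environment before starting is to check which packages can be found for import. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and the package list simply mirrors the prerequisites above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Packages this module relies on (openpyxl is only needed for .xlsx files).
required = ["pandas", "numpy", "openpyxl"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only locates a package without importing it, so the check stays fast and does not fail if one of the packages is absent.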
Files in This Module
| File | Topic |
|---|---|
| `01-series-and-dataframes.md` | Core objects, Index, dtypes, memory, column selection |
| `02-reading-csv-excel.md` | Loading and saving files with full parameter control |
| `03-filtering-and-sorting.md` | Boolean indexing, .loc vs .iloc, query, sort |
| `04-basic-data-analysis.md` | Summary statistics, missing values, groupby preview |
| `05-practice-exercises.md` | Warm-up, main, and stretch exercises |
| `06-interview-questions.md` | 12 interview questions with model answers |
Study Strategy
- Type every code example yourself. Do not copy-paste. The act of typing builds pattern recognition.
- Add `print()` calls liberally. Intermediate results reveal how pandas thinks.
- Always inspect a dataset before analyzing it: shape, dtypes, missing values, first rows.
- Read error messages all the way to the bottom. In a Python traceback, the exception type and message usually appear in the last line or two.
- Keep the NumPy notes nearby. Pandas is built on NumPy and borrows its logic for operations.
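The "inspect before analyzing" habit can be sketched as a short routine. The DataFrame below is hypothetical toy data with deliberate gaps, used only to show what each check reports.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with deliberate missing values.
df = pd.DataFrame(
    {
        "product": ["pen", "book", "lamp", "desk"],
        "price": [1.5, 12.0, np.nan, 150.0],
        "stock": [100, 40, 15, np.nan],
    }
)

# Count missing values per column.
missing_counts = df.isna().sum()

print(df.shape)        # (rows, columns)
print(df.dtypes)       # one dtype per column
print(missing_counts)  # missing values per column
print(df.head(3))      # first rows
```

Note that `stock` holds whole numbers but is stored as `float64`, because a NumPy-backed integer column cannot represent `NaN`. Spotting this kind of detail early is the reason to check dtypes and missing values together.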