Day 02 — Part 2: Pandas Basics¶

Pandas is the tool you will use more than any other in data science work. Every dataset you analyze, every model you prepare data for, every report you generate — it starts with a DataFrame. This session builds the muscle memory you need before any real analysis can happen.

Session Overview

Difficulty: Beginner–Intermediate Reading time: ~2 hours | Exercises: ~1 hour Prerequisites: Python basics · NumPy arrays (Day 02 Part 1)

Learning Objectives¶

By the end of this session, you will be able to:

Explain what a Series and DataFrame are and how they relate to each other
Understand the Index as a first-class object, not just row numbers
Create DataFrames from dictionaries and lists
Load CSV and Excel files with full control over which rows, columns, and types get loaded
Select columns and rows confidently using [], .loc, and .iloc
Filter rows using boolean indexing and .query()
Sort data and retrieve top/bottom records
Run a first-look analysis with .describe(), .value_counts(), .info(), and .corr()
Detect and quantify missing values
Save results back to disk correctly

Today's Roadmap¶

Pandas Basics
│
├── 01 → Series and DataFrames
│         Series | DataFrame | Index | dtypes | memory | column selection
│
├── 02 → Reading CSV and Excel Files
│         read_csv parameters | encoding | messy CSVs | read_excel | saving
│
├── 03 → Filtering and Sorting
│         boolean indexing | .loc vs .iloc | .query() | sort_values | nlargest
│
├── 04 → Basic Data Analysis
│         describe | info | value_counts | corr | groupby preview | missing values
│
├── 05 → Practice Exercises
│         three levels: warm-up, main, stretch
│
└── 06 → Interview Questions
          collapsible model answers for 12 questions

Time Allocation¶

#	Topic	Duration	Type
01	Series and DataFrames	35 min	Lecture + Code
02	Reading CSV and Excel Files	30 min	Demo
03	Filtering and Sorting	35 min	Lecture + Practice
04	Basic Data Analysis	30 min	Guided Analysis
05	Practice Exercises	50 min	Hands-on
06	Interview Questions	Reference	Revision

Total: ~3 hours

Prerequisites¶

Python basics (functions, lists, dictionaries, loops)
NumPy fundamentals (arrays, dtypes, vectorized operations)
Jupyter Notebook, VS Code, or another Python environment
Pandas installed

Setup Check¶

import pandas as pd
import numpy as np

print(pd.__version__)
# Output: 2.x.x  (any 1.5+ works for this material)

If Pandas is missing:

pip install pandas openpyxl

openpyxl is required for reading and writing .xlsx files. Install it alongside pandas.

Files in This Module¶

File	Topic
`01-series-and-dataframes.md`	Core objects, Index, dtypes, memory, column selection
`02-reading-csv-excel.md`	Loading and saving files with full parameter control
`03-filtering-and-sorting.md`	Boolean indexing, .loc vs .iloc, query, sort
`04-basic-data-analysis.md`	Summary statistics, missing values, groupby preview
`05-practice-exercises.md`	Warm-up, main, and stretch exercises
`06-interview-questions.md`	12 interview questions with model answers

Study Strategy¶

Type every code example yourself. Do not copy-paste. The act of typing builds pattern recognition.
Add print() calls liberally. Intermediate results reveal how pandas thinks.
Always inspect a dataset before analyzing it — shape, dtypes, missing values, first rows.
Read error messages top to bottom. The useful part is often the last two lines.
Keep the NumPy notes nearby. Pandas is built on NumPy and borrows its logic for operations.

Previous: NumPy Cheat Sheet | Next: Series and DataFrames