🤖 05 — Transformers Overview

Transformers are attention-based deep learning architectures that dominate modern language tasks.

They power systems like:

  • BERT
  • GPT-style models
  • translation systems
  • summarizers
  • semantic search

Core Ideas

| Concept | Meaning |
| --- | --- |
| Token | a piece of text (a word or subword) |
| Embedding | a numeric vector representing a token |
| Attention | a mechanism for focusing on relevant tokens |
| Pretraining | learning from huge text corpora |
| Fine-tuning | adapting a pretrained model to a specific task |
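The attention row above can be made concrete with a minimal sketch of scaled dot-product attention. This is an illustrative toy in pure Python (real models use tensor libraries and learned projections); the vectors below are made-up 2-dimensional embeddings.

```python
import math

def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]  # scaled similarity
    weights = softmax(scores)                              # where to "focus"
    # Output is the weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Toy example: three tokens; the query matches the first key best,
# so the output leans toward the first value vector.
keys   = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out = attention([1.0, 0.0], keys, values)
```

Because the attention weights sum to one, the output is always a convex blend of the value vectors, pulled toward the tokens most similar to the query.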

When to Use Classic NLP vs Transformers

| Situation | Good Choice |
| --- | --- |
| small dataset, simple classifier | TF-IDF + Logistic Regression |
| high-accuracy text task | pretrained transformer |
| limited compute | classic NLP baseline |
| semantic understanding | transformer embeddings |
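The classic baseline in the first row starts from TF-IDF features. A minimal sketch of the weighting itself (raw term frequency times smoothed-free idf) is below; in practice a library such as scikit-learn's `TfidfVectorizer` automates this and feeds the vectors into a classifier.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (tf * log(N / df))."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

docs = ["the cat sat", "the dog ran", "the cat ran"]
vecs = tfidf(docs)
```

Note that "the" appears in every document, so its idf is log(3/3) = 0 and its weight vanishes, while rare, discriminative words like "sat" score highest. That is exactly why TF-IDF works well as a simple classifier input.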

Landmark Paper

"Attention Is All You Need" (Vaswani et al., 2017) introduced the transformer architecture and its self-attention mechanism.
Next

➡️ 06-exercises