🗄️ Why SQL Matters
- Understand Relational Databases (RDBMS)
- Connect to databases using Python's `sqlite3`
- Write SQL queries (SELECT, INSERT, WHERE)
- Use SQLAlchemy for Object-Relational Mapping (ORM)
- Load SQL data directly into Pandas DataFrames
🗄️ SQLite Basics
Work with lightweight, file-based databases using Python's built-in sqlite3 module. Perfect for local projects and prototypes.
🔗 SQLAlchemy ORM
Use SQLAlchemy to map database tables to Python classes, enabling object-oriented database operations without raw SQL.
🐼 Pandas + SQL
Load SQL query results directly into Pandas DataFrames with pd.read_sql() for seamless data analysis workflows.
Database: A library building.
Tables: Different sections (Fiction, Non-Fiction, Reference).
Rows: Individual books on the shelves.
Columns: Book attributes (Title, Author, ISBN, Year).
SQL Queries: Asking the librarian for specific books using a request form.
ORM (SQLAlchemy): Having a personal assistant who translates your plain-English requests into library catalog language.
Pandas Integration: The librarian hands you a spreadsheet (DataFrame) with all the books matching your request, ready for analysis.
Data rarely lives in CSV files in production. It lives in databases. Python gives you the tools to extract, transform, and load (ETL) that data efficiently.
Real-World Applications:
- E-commerce: Query product databases to analyze sales trends
- Finance: Extract transaction history for fraud detection models
- Healthcare: Pull patient records for research studies
- Social Media: Analyze user engagement from massive database tables
Open notebook-sessions/week6/session1_sql_databases.ipynb and write a short note on when to choose raw SQL vs ORM. Add a tiny example for each.
Start with SQLite: It's built into Python and requires no setup—perfect for learning SQL fundamentals.
Learn SQL First, ORM Later: Understanding raw SQL makes you appreciate ORMs. Don't skip to SQLAlchemy without knowing basic SELECT and JOIN.
Use DB Browser for SQLite: Visualize your database structure with a GUI tool before writing Python code.
Practice Pandas Integration: pd.read_sql() is your bridge from databases to DataFrames—master it for seamless data pipelines.
Mind Your Connections: Always close database connections (conn.close()) or use context managers (with) to avoid resource leaks.
💾 SQLite Basics
SQLite is a lightweight, file-based database built into Python. No installation required!
# ❌ BAD: String concatenation for queries
import sqlite3
user_input = "admin'; DROP TABLE users; --"
# DANGEROUS! Allows SQL Injection attacks
query = f"SELECT * FROM users WHERE name = '{user_input}'"
print(f"Executing: {query}")
# Executing: SELECT * FROM users WHERE name = 'admin'; DROP TABLE users; --'
Never insert user input directly into a query string. Malicious users can destroy your database.
Practice safe query techniques to prevent SQL injection:
- Create a `users` table with columns: `id`, `username`, `email`, `role`
- Write a safe INSERT using `?` placeholders for all user-provided values
- Create a search function that accepts a username parameter safely
- Test with malicious input like `'; DROP TABLE users; --` to verify protection
- Compare the executed query string when using f-strings vs parameterized queries
Bonus: Create a login function that uses parameterized queries to check username AND password.
# 🔰 NOVICE: Manual connection management
import sqlite3
# Connect to file (or memory)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INT, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.commit()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())
conn.close() # Don't forget to close!
# ⭐ BEST PRACTICE: Context Manager + Parameterized Queries
import sqlite3
# 'with' commits on success and rolls back on error;
# note: in sqlite3 it does NOT close the connection
with sqlite3.connect(':memory:') as conn:
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE users (id INT, name TEXT)")
    # Safe parameter substitution (?)
    data = (1, 'Alice')
    cursor.execute("INSERT INTO users VALUES (?, ?)", data)
    conn.commit()
    # Querying
    cursor.execute("SELECT * FROM users WHERE name=?", ('Alice',))
    print(cursor.fetchall())
conn.close()  # still close explicitly when finished
Write SELECT statements with WHERE, ORDER BY, and LIMIT:
- Create an `employees` table with: `id`, `name`, `department`, `salary`, `hire_date`
- Insert at least 10 employees across different departments
- Query: Find all employees in the "Engineering" department
- Query: Find employees with salary > 50000, ordered by salary DESC
- Query: Get the top 5 highest-paid employees using LIMIT
- Query: Find employees hired after a specific date using WHERE
Bonus: Combine multiple conditions with AND/OR operators.
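As a starting point, here is a minimal sketch of these query patterns against a small in-memory `employees` table (the names and salaries are invented for illustration):

```python
import sqlite3

with sqlite3.connect(":memory:") as conn:
    cur = conn.cursor()
    cur.execute("""CREATE TABLE employees
                   (id INTEGER PRIMARY KEY, name TEXT, department TEXT,
                    salary REAL, hire_date TEXT)""")
    cur.executemany(
        "INSERT INTO employees (name, department, salary, hire_date) "
        "VALUES (?, ?, ?, ?)",
        [("Ada", "Engineering", 95000, "2021-03-01"),
         ("Bob", "Sales", 48000, "2022-07-15"),
         ("Cleo", "Engineering", 72000, "2020-01-10"),
         ("Dan", "HR", 55000, "2023-02-20")])

    # WHERE: filter by department
    cur.execute("SELECT name FROM employees WHERE department = ?",
                ("Engineering",))
    eng = cur.fetchall()      # [('Ada',), ('Cleo',)]

    # WHERE + ORDER BY + LIMIT: top 2 earners above 50000
    cur.execute("SELECT name, salary FROM employees WHERE salary > ? "
                "ORDER BY salary DESC LIMIT 2", (50000,))
    top2 = cur.fetchall()     # [('Ada', 95000.0), ('Cleo', 72000.0)]
    print(eng, top2)
conn.close()  # sqlite3's 'with' commits but does not close
```

Note how the `REAL` column affinity stores the integer salaries as floats, which is why `95000` comes back as `95000.0`.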
Practice INNER JOIN and LEFT JOIN between tables:
- Create two tables: `employees(id, name, department_id)` and `departments(id, name, location)`
- Insert sample data with some employees having a valid department_id and some with NULL
- Write an INNER JOIN to list employees with their department names
- Write a LEFT JOIN to show ALL employees, even those without a department
- Add a third table `projects(id, name, employee_id)` and join all three
- Count how many employees are in each department using JOIN + GROUP BY
Bonus: Find departments with no employees using a LEFT JOIN from `departments` or a subquery (SQLite supports RIGHT JOIN only from version 3.39).
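The INNER vs LEFT JOIN difference can be sketched with the two tables from the exercise (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employees
               (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)""")
cur.executemany("INSERT INTO departments VALUES (?, ?)",
                [(1, "Engineering"), (2, "Sales")])
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, "Ada", 1), (2, "Bob", 2), (3, "Cleo", None)])

# INNER JOIN drops rows without a match (Cleo disappears)
cur.execute("""SELECT e.name, d.name FROM employees e
               INNER JOIN departments d ON e.department_id = d.id
               ORDER BY e.id""")
inner = cur.fetchall()   # [('Ada', 'Engineering'), ('Bob', 'Sales')]

# LEFT JOIN keeps every employee; unmatched departments come back as None
cur.execute("""SELECT e.name, d.name FROM employees e
               LEFT JOIN departments d ON e.department_id = d.id
               ORDER BY e.id""")
left = cur.fetchall()    # last row is ('Cleo', None)
print(inner, left)
conn.close()
```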
- SQL Injection Vulnerabilities: Never use f-strings or string concatenation for queries with user input. Always use parameterized queries (`cursor.execute(query, params)`).
- Forgetting to Commit: INSERT, UPDATE, and DELETE require `conn.commit()` to save changes. Forgetting this means your data changes are lost.
- Not Closing Connections: Always call `conn.close()` when you are done (sqlite3's `with` manages transactions, not closing). Unclosed connections cause resource leaks and lock issues.
- Type Mismatches: SQLite has dynamic typing, which can cause unexpected behavior. Be explicit about data types when creating tables.
- Why is SQL injection considered one of the most dangerous security vulnerabilities? Can you think of real-world examples where this caused damage?
- When would you choose SQLite over a full database server like PostgreSQL or MySQL? What are SQLite's limitations?
- How does using parameterized queries (`?` placeholders) protect against SQL injection attacks?
- What's the difference between `fetchone()`, `fetchmany()`, and `fetchall()`? When would you use each?
Challenge: Create a SQLite database `school.db` with a `students` table:
- Columns: `id` (INTEGER PRIMARY KEY), `name` (TEXT), `grade` (INTEGER), `gpa` (REAL)
- Insert 5 students with different grades and GPAs
- Query students with GPA > 3.5
- Update a student's grade
- Delete a student by ID
Bonus: Use `contextlib.closing()` (or a `try`/`finally`) to guarantee the connection closes; a plain `with conn:` only manages transactions in sqlite3.
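One possible sketch of the challenge, using an in-memory database instead of `school.db` so it stays self-contained (the sample students are invented):

```python
import sqlite3
from contextlib import closing

# closing() guarantees conn.close(); the inner 'with conn' commits
with closing(sqlite3.connect(":memory:")) as conn, conn:
    cur = conn.cursor()
    cur.execute("""CREATE TABLE students
                   (id INTEGER PRIMARY KEY, name TEXT,
                    grade INTEGER, gpa REAL)""")
    cur.executemany("INSERT INTO students (name, grade, gpa) VALUES (?, ?, ?)",
                    [("Ana", 10, 3.9), ("Ben", 11, 3.2), ("Cara", 12, 3.7)])
    # Read: students above a GPA threshold
    cur.execute("SELECT name FROM students WHERE gpa > ? ORDER BY id", (3.5,))
    honor_roll = [row[0] for row in cur.fetchall()]
    # Update a student's grade
    cur.execute("UPDATE students SET grade = ? WHERE name = ?", (12, "Ben"))
    # Delete a student by ID
    cur.execute("DELETE FROM students WHERE id = ?", (3,))
    cur.execute("SELECT COUNT(*) FROM students")
    remaining = cur.fetchone()[0]
print(honor_roll, remaining)  # ['Ana', 'Cara'] 2
```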
Open notebook-sessions/week6/session1_sql_databases.ipynb and create a table, insert rows using ? parameters, and query them. Demonstrate a safe lookup by name.
🔗 SQLAlchemy ORM
Write Python, not SQL. SQLAlchemy maps Python classes to database tables.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker
# 1. Setup Engine
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()
# 2. Define Model (Table)
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
Base.metadata.create_all(engine)
# 3. Interact via Session
Session = sessionmaker(bind=engine)
session = Session()
# Add
new_user = User(name='Alice')
session.add(new_user)
session.commit()
# Query
user = session.query(User).first()
print(f"User: {user.name} (ID: {user.id})")
- Forgetting to Commit Sessions: Like raw SQL, ORM changes require `session.commit()`. Without it, your INSERT/UPDATE/DELETE operations are rolled back.
- N+1 Query Problem: Accessing relationships in loops causes separate queries for each item. Use `joinedload()` or `subqueryload()` to eagerly load relationships.
- Not Closing Sessions: Always `session.close()` or use context managers. Unclosed sessions hold database connections and can exhaust connection pools.
- Mixing ORM and Raw SQL: Using `session.execute()` for raw SQL bypasses ORM tracking. Be careful mixing paradigms; stick to one approach per transaction.
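To make the N+1 point concrete, here is a hedged sketch (SQLAlchemy 1.4+ style; the `Department`/`Employee` models are invented for illustration) where `joinedload()` pulls a relationship in a single query instead of one lazy query per row:

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import (declarative_base, relationship,
                            sessionmaker, joinedload)

engine = create_engine("sqlite:///:memory:")
Base = declarative_base()

class Department(Base):
    __tablename__ = "departments"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    employees = relationship("Employee", back_populates="department")

class Employee(Base):
    __tablename__ = "employees"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    department_id = Column(Integer, ForeignKey("departments.id"))
    department = relationship("Department", back_populates="employees")

Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:          # context manager closes the session
    eng = Department(name="Engineering")
    session.add_all([eng, Employee(name="Ada", department=eng)])
    session.commit()                # without commit, inserts are rolled back

    # joinedload fetches employees AND their departments in one query,
    # avoiding a lazy-load query per employee (the N+1 problem)
    people = (session.query(Employee)
              .options(joinedload(Employee.department)).all())
    names = [(p.name, p.department.name) for p in people]
print(names)  # [('Ada', 'Engineering')]
```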
- What are the advantages of using an ORM like SQLAlchemy over raw SQL? What are the trade-offs in terms of performance and control?
- How does SQLAlchemy's declarative base pattern make database schema changes easier to manage across a team?
- When would you choose raw SQL queries over SQLAlchemy ORM methods? Give examples of scenarios where each is preferable.
- What is the N+1 query problem, and why is it a performance issue in production applications?
Build a complete Create, Read, Update, Delete application using SQLite and Python:
- Create: Function `add_contact(name, email, phone)` that inserts a new contact
- Read: Function `get_all_contacts()` that returns all contacts as a list
- Read: Function `search_contacts(query)` that searches by name or email
- Update: Function `update_contact(id, **kwargs)` that updates any field
- Delete: Function `delete_contact(id)` that removes a contact by ID
- Use parameterized queries for ALL database operations
- Include proper error handling with try/except blocks
Advanced: Add a simple command-line interface (CLI) menu to interact with your CRUD functions.
Extra Credit: Refactor to use SQLAlchemy ORM instead of raw sqlite3.
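A partial sketch of the CRUD functions, varying the exercise signatures slightly by passing the connection explicitly (so the functions are easy to test); every query is parameterized:

```python
import sqlite3

def make_db():
    # Hypothetical helper: in-memory stand-in for a contacts database
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE contacts
                    (id INTEGER PRIMARY KEY, name TEXT,
                     email TEXT, phone TEXT)""")
    return conn

def add_contact(conn, name, email, phone):
    # Create: parameterized INSERT; returns the new row's id
    cur = conn.execute(
        "INSERT INTO contacts (name, email, phone) VALUES (?, ?, ?)",
        (name, email, phone))
    conn.commit()
    return cur.lastrowid

def search_contacts(conn, query):
    # Read: the LIKE pattern is built in Python but still passed as a parameter
    pattern = f"%{query}%"
    cur = conn.execute(
        "SELECT name, email FROM contacts WHERE name LIKE ? OR email LIKE ?",
        (pattern, pattern))
    return cur.fetchall()

def delete_contact(conn, contact_id):
    # Delete: remove by primary key
    conn.execute("DELETE FROM contacts WHERE id = ?", (contact_id,))
    conn.commit()

conn = make_db()
add_contact(conn, "Ada Lovelace", "ada@example.com", "555-0100")
add_contact(conn, "Bob Ross", "bob@example.com", "555-0101")
results = search_contacts(conn, "ada")
print(results)  # [('Ada Lovelace', 'ada@example.com')]
conn.close()
```

SQLite's `LIKE` is case-insensitive for ASCII by default, which is why `"ada"` matches "Ada Lovelace".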
Open notebook-sessions/week6/session1_sql_databases.ipynb and define a User model, create the schema, insert a user, and query by name using SQLAlchemy.
🐼 Pandas & SQL
The ultimate duo. Read SQL query results directly into a DataFrame.
import pandas as pd
import sqlite3
# Setup dummy data
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE data (id INT, name TEXT)")
conn.execute("INSERT INTO data VALUES (1, 'Alice'), (2, 'Bob')")
conn.commit()
# ⭐ READ SQL to DATAFRAME
df = pd.read_sql("SELECT * FROM data", conn)
print(df)
# ⭐ WRITE DATAFRAME to SQL
df['new_col'] = 'Active'
df.to_sql('new_table', conn, if_exists='replace', index=False)
conn.close()
- Loading Huge Tables: `pd.read_sql()` loads entire result sets into memory. For large tables, use the `chunksize` parameter or write better WHERE clauses to filter data in SQL first.
- if_exists='replace' Dangers: Using `to_sql()` with `if_exists='replace'` deletes the entire table. Use `'append'` for adding rows or `'fail'` to prevent accidental overwrites.
- Index Confusion: By default, `to_sql()` writes the DataFrame index as a column. Set `index=False` unless you explicitly want the index in your table.
- Data Type Mismatches: Pandas infers SQL data types, which can cause issues. Use the `dtype` parameter in `to_sql()` to explicitly specify column types.
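The `chunksize` point can be sketched like this: with `chunksize` set, `pd.read_sql()` returns an iterator of small DataFrames instead of one big frame (the `logs` table is a tiny invented example):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, msg TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?)",
                 [(i, f"event {i}") for i in range(10)])
conn.commit()

# Chunks of at most 4 rows; only one chunk is in memory at a time
total = 0
for chunk in pd.read_sql("SELECT * FROM logs", conn, chunksize=4):
    total += len(chunk)   # process each piece, then let it go
print(total)  # 10
conn.close()
```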
Use GROUP BY and aggregate functions (COUNT, SUM, AVG) for data analysis:
- Create a `sales` table with: `id`, `product_name`, `category`, `quantity`, `price`, `sale_date`
- Insert at least 20 sales records across different products and categories
- Query: Calculate total revenue per category using SUM(quantity * price)
- Query: Find the average order value (AVG) per product
- Query: Count the number of sales per day using GROUP BY sale_date
- Query: Find categories with more than 5 sales using HAVING
- Load results into a Pandas DataFrame and create a summary report
Bonus: Add MIN/MAX to find the cheapest and most expensive sale per category.
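A minimal sketch of the SQL-then-Pandas flow above, with a tiny invented `sales` table (three rows instead of twenty) and the revenue aggregation done in SQL:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales
                (id INTEGER PRIMARY KEY, product_name TEXT, category TEXT,
                 quantity INTEGER, price REAL, sale_date TEXT)""")
conn.executemany(
    "INSERT INTO sales (product_name, category, quantity, price, sale_date) "
    "VALUES (?, ?, ?, ?, ?)",
    [("Laptop", "Electronics", 2, 900.0, "2024-01-01"),
     ("Mouse", "Electronics", 5, 20.0, "2024-01-01"),
     ("Desk", "Furniture", 1, 300.0, "2024-01-02")])
conn.commit()

# Aggregate in SQL, then load the small result into Pandas
df = pd.read_sql(
    """SELECT category, SUM(quantity * price) AS revenue
       FROM sales GROUP BY category ORDER BY revenue DESC""", conn)
revenue = dict(zip(df["category"], df["revenue"]))
print(revenue)  # {'Electronics': 1900.0, 'Furniture': 300.0}
conn.close()
```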
Build an end-to-end analytics pipeline combining SQL and Pandas:
- Create a SQLite database with related tables: `customers`, `orders`, `order_items`, `products`
- Write a complex JOIN query combining all tables to get order details
- Use `pd.read_sql()` to load the joined data into a DataFrame
- Calculate: customer lifetime value, top 10 products by revenue, monthly sales trends
- Use Pandas `groupby()` for additional aggregations not done in SQL
- Export your analysis results back to new SQL tables using `to_sql()`
Advanced: Create a function that generates weekly sales reports automatically.
- Why is it often better to filter data with SQL WHERE clauses instead of loading everything into Pandas first?
- When would you use `pd.read_sql_query()` vs. `pd.read_sql_table()`? What are the differences?
- How does `chunksize` in `pd.read_sql()` help with memory management? Give an example scenario where this is critical.
- What are the advantages of using Pandas DataFrames as an intermediate layer between databases and visualization tools?
Challenge: Build a complete ETL pipeline:
- Create a SQLite database `sales.db` with a `transactions` table (columns: `id`, `product`, `amount`, `date`)
- Insert 20 sample transactions with different products and dates
- Use `pd.read_sql()` to query transactions from the last 7 days
- Calculate total sales per product in Pandas
- Write the aggregated results back to a new table `product_summary` using `to_sql()`
Bonus: Use a SQLAlchemy engine instead of a raw sqlite3 connection for better compatibility.
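A hedged sketch of the bonus: a SQLAlchemy engine driving both `to_sql()` and `read_sql()` (newer pandas versions warn on raw DBAPI connections other than `sqlite3`, so the engine is the more portable choice; the transaction data is invented):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory engine; SQLAlchemy reuses one connection per thread here,
# so the table survives across to_sql/read_sql calls
engine = create_engine("sqlite:///:memory:")

df = pd.DataFrame({"product": ["pen", "pad", "pen"],
                   "amount": [1.5, 3.0, 2.5]})
df.to_sql("transactions", engine, if_exists="replace", index=False)

# Aggregate in SQL, then write the summary back as a new table
summary = pd.read_sql(
    "SELECT product, SUM(amount) AS total FROM transactions GROUP BY product",
    engine)
summary.to_sql("product_summary", engine, if_exists="replace", index=False)
totals = dict(zip(summary["product"], summary["total"]))
print(totals)  # {'pad': 3.0, 'pen': 4.0}
```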
Open notebook-sessions/week6/session2_sql_databases_group.ipynb and load a table into a DataFrame with pd.read_sql. Add a new column and write it back using to_sql.
📚 Supplemental Materials: Extended SQL & Related Topics
Deepen your database skills with additional resources and explore how SQL integrates with your broader Python data toolkit.
- Original SQL Lesson — Alternative explanations and additional SQL patterns
- Advanced Pandas — Complement SQL queries with DataFrame operations
- Data Visualization — Visualize query results effectively
For additional SQL coverage and alternative explanations, explore the original lesson materials:
Topics covered: SQLite fundamentals, SQLAlchemy ORM, Pandas-SQL integration, and best practices.
Many SQL operations have Pandas equivalents. Master both to choose the right tool:
🐼 Open Advanced Pandas Lesson →
Comparison: SQL JOINs ↔ pd.merge(), GROUP BY ↔ df.groupby(), WHERE ↔ boolean filtering
- Use SQL: Large datasets, persistent storage, complex joins, data integrity constraints
- Use Pandas: Exploratory analysis, quick prototyping, complex transformations, visualization
- Use Both: Query subset with SQL, analyze in Pandas, write results back to database
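The comparison above can be sketched side by side in Pandas, assuming two tiny invented DataFrames:

```python
import pandas as pd

employees = pd.DataFrame({"id": [1, 2, 3],
                          "name": ["Ada", "Bob", "Cleo"],
                          "dept_id": [10, 20, None]})
departments = pd.DataFrame({"dept_id": [10, 20],
                            "dept": ["Eng", "Sales"]})

# SQL LEFT JOIN  <->  pd.merge(..., how="left")
merged = employees.merge(departments, on="dept_id", how="left")
# SQL WHERE      <->  boolean filtering
engineers = merged[merged["dept"] == "Eng"]
# SQL GROUP BY   <->  df.groupby() (NaN groups are dropped, like NULLs)
per_dept = merged.groupby("dept")["name"].count().to_dict()

print(per_dept)  # {'Eng': 1, 'Sales': 1}
```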
📜 SQL Cheat Sheet
# ═══ SQLITE3 ═══
import sqlite3
with sqlite3.connect('db.sqlite') as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id=?", (1,))
    row = cursor.fetchone()
# ═══ PANDAS ═══
import pandas as pd
# Read
df = pd.read_sql("SELECT * FROM users", conn)
# Write
df.to_sql('table_name', conn, if_exists='append', index=False)
# ═══ BASIC SQL SYNTAX ═══
# SELECT col1, col2 FROM table_name WHERE col1 > 5
# INSERT INTO table_name (col1) VALUES (val1)
# UPDATE table_name SET col1 = val1 WHERE id = 1
# DELETE FROM table_name WHERE id = 1
- Master SQLite First: Python's built-in `sqlite3` is perfect for learning SQL fundamentals without server setup.
- Always Use Parameterized Queries: Protect against SQL injection by using `?` placeholders instead of f-strings or string concatenation.
- Remember to Commit: INSERT, UPDATE, and DELETE require `conn.commit()` to persist changes; forgetting this loses your data modifications.
- ORMs Simplify Complex Apps: SQLAlchemy lets you work with databases using Python objects, making schema changes and relationships easier to manage.
- Pandas + SQL = ETL Power: `pd.read_sql()` and `to_sql()` create seamless pipelines from databases to DataFrames and back.
- Filter in SQL, Not Pandas: Use WHERE clauses to reduce data volume before loading into Pandas; memory and performance depend on it.
- Close Your Connections: Always `conn.close()` or use `with` statements to avoid resource leaks and connection pool exhaustion.
- Understand Data Types: Be explicit about column types when creating tables and writing DataFrames to avoid type mismatch issues.