Top 10 Python Libraries Every Developer Should Know

30 Packages Down to 10: What Survived the Purge

A year ago, my requirements.txt had thirty entries. Thirty. Some I’d installed for a weekend project in 2022 and never touched again. Others were dependencies of dependencies I couldn’t even name. One rainy Saturday in Bangalore, I sat down with a chai and started deleting.

Not randomly. I had rules. If I hadn’t imported it in six months, gone. If another package did the same job better, gone. If I couldn’t explain to a junior dev why it mattered, gone. By the end of that afternoon, twenty packages were cut. And honestly? My projects ran cleaner. My installs were faster. My cognitive load dropped by half.

What remained were ten libraries I’d bet my career on. Not the trendiest. Not always the newest. But the ones that kept showing up across every single project I built — web apps, data pipelines, CLI tools, APIs, automation scripts. These ten earned their place by being irreplaceable.

Here’s what survived, why each one matters, and where I think people get them wrong.

1. Requests — Because `urllib` Is Pain

Let me be blunt: urllib is fine if you enjoy suffering. Requests exists because Kenneth Reitz looked at Python’s built-in HTTP handling back around 2011 and thought, “developers deserve better.” He was right. (Under the hood, Requests is built on urllib3, so you get its connection handling without touching its API.)

It’s consistently among the most downloaded packages on PyPI for a reason. GET requests, POST calls, session management, cookies, auth headers: all of it reads like pseudocode. I’ve introduced Requests to interns who were writing functional API calls within fifteen minutes. No other HTTP library I’ve used has that onboarding speed.


# pip install requests
import requests

# GET request
response = requests.get("https://jsonplaceholder.typicode.com/posts/1", timeout=10)
print(f"Status: {response.status_code}")
data = response.json()
print(f"Title: {data['title']}")

# POST request with JSON body
new_post = {
    "title": "Hello from Python",
    "body": "This post was created with the requests library.",
    "userId": 1,
}
response = requests.post(
    "https://jsonplaceholder.typicode.com/posts",
    json=new_post,
    headers={"X-Custom-Header": "byteyogi"},
    timeout=10,
)
print(f"Created post ID: {response.json()['id']}")

# Session for connection pooling and persistent cookies
session = requests.Session()
session.headers.update({"Authorization": "Bearer my-token"})
r1 = session.get("https://httpbin.org/headers", timeout=10)
print(r1.json()["headers"]["Authorization"])

Where people go wrong: they skip the timeout parameter. I’ve seen production services hang indefinitely because someone forgot timeout=10. Always set it. Always. And use Session() objects when you’re hitting the same API multiple times — connection pooling alone saves noticeable latency on anything beyond a hobby script.
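
Here’s a minimal sketch of how I’d make the timeout hard to forget. TimeoutSession and build_session are hypothetical names of my own, not part of Requests; the pieces they use (Session, HTTPAdapter, and urllib3’s Retry) are real. The retry numbers are assumptions you should tune for your API.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class TimeoutSession(requests.Session):
    """Hypothetical helper: a Session that applies a default timeout."""

    def __init__(self, default_timeout=10):
        super().__init__()
        self.default_timeout = default_timeout

    def request(self, method, url, **kwargs):
        # Only fill in the timeout when the caller did not pass one.
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)


def build_session(default_timeout=10):
    """Build a pooled session with a default timeout and a few retries."""
    session = TimeoutSession(default_timeout=default_timeout)
    # Retry transient gateway errors with exponential backoff (assumed values).
    retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = build_session()
# session.get("https://httpbin.org/headers") would now use timeout=10 by default.
```

Every call made through this session reuses pooled connections and gets a timeout unless you explicitly override it.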

Could you switch to HTTPX for async work? Sure. We’ll get to that. But for synchronous, straightforward HTTP calls, Requests hasn’t been dethroned in over a decade. I don’t see that changing anytime soon.

2. Pandas — Messy Data’s Worst Enemy

Some people complain Pandas is bloated. They’re not entirely wrong — it pulls in NumPy, pytz, dateutil, and more. But here’s my take: if you work with tabular data and you’re not using Pandas, you’re writing five times more code for the same result. Full stop.

DataFrames changed how I think about data. Before Pandas, I was writing nested loops to filter CSV rows. Now? One line. Groupby, aggregate, pivot, merge — operations that would’ve taken me an hour in raw Python take maybe thirty seconds.


# pip install pandas
import pandas as pd

# Create a DataFrame from a dictionary
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "department": ["Engineering", "Marketing", "Engineering", "Sales", "Marketing"],
    "salary": [95000, 72000, 88000, 68000, 79000],
    "years": [5, 3, 7, 2, 4],
})

# Basic analysis
print(df.describe())
print(f"\nAverage salary: ${df['salary'].mean():,.0f}")
print(f"Total headcount: {len(df)}")

# Filtering and grouping
engineers = df[df["department"] == "Engineering"]
print(f"\nEngineers:\n{engineers}")

dept_stats = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    headcount=("name", "count"),
    avg_tenure=("years", "mean"),
).round(1)
print(f"\nDepartment stats:\n{dept_stats}")

# Read and write CSV
df.to_csv("employees.csv", index=False)
df_loaded = pd.read_csv("employees.csv")

Fair warning though: Pandas has a learning curve that sneaks up on you. .apply() looks simple until you realize it’s basically a disguised for-loop and murders performance on large datasets. Vectorized operations — .str, .dt, boolean indexing — are what make Pandas fast. Learn those first.
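
To make that concrete, here’s a small sketch with made-up data that computes the same columns twice: once with .apply() and once with vectorized operations. On three rows you won’t notice a speed difference; the point is that the vectorized versions are drop-in equivalents.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "name": ["  alice ", "BOB", "Charlie  "],
    "salary": [95000, 72000, 88000],
    "joined": ["2019-03-01", "2021-07-15", "2017-11-30"],
})

# Row-by-row: .apply() calls a Python function once per element.
slow_names = df["name"].apply(lambda s: s.strip().title())
slow_bonus = df["salary"].apply(lambda x: x * 0.1 if x > 80000 else 0.0)

# Vectorized: the same results via .str, a boolean mask, and .dt.
fast_names = df["name"].str.strip().str.title()
fast_bonus = (df["salary"] * 0.1).where(df["salary"] > 80000, 0.0)
join_year = pd.to_datetime(df["joined"]).dt.year

print(fast_names.tolist())
print(fast_bonus.tolist())
print(join_year.tolist())
```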

And for anyone processing datasets over a few gigabytes, look into Polars. Seriously. Polars has been eating Pandas’ lunch on performance benchmarks for years now. But Pandas still wins on ecosystem integration. Every data science tutorial, every Kaggle notebook, every Stack Overflow answer: Pandas code everywhere. That momentum matters when you’re debugging at 2 AM.

3. NumPy — The Foundation Under Everything

You might never write import numpy directly. Doesn’t matter. NumPy is already running under your Pandas, your scikit-learn, your TensorFlow, your Matplotlib. Remove NumPy from the Python ecosystem and most of the data science stack collapses overnight.

Why care, then, if it’s working behind the scenes? Because understanding NumPy arrays makes you a better Python programmer. Once you grok vectorized operations, you stop writing slow loops. Your brain shifts from “process each element” to “operate on the whole array.” Massive difference in both speed and code clarity.


# pip install numpy
import numpy as np

# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
zeros = np.zeros((3, 4))
random_data = np.random.randn(1000)  # 1000 random normal values

# Vectorized operations (no loops needed)
print("Squared:", arr ** 2)
print("Mean:", random_data.mean())
print("Std:", random_data.std())

# Matrix operations
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print("Matrix multiply:\n", a @ b)
print("Element-wise multiply:\n", a * b)
print("Determinant:", np.linalg.det(a))

# Boolean indexing
data = np.random.randint(0, 100, size=20)
above_50 = data[data > 50]
print(f"Values above 50: {above_50}")

My hot take: most developers learn Pandas before NumPy, and that’s backwards. Spend a weekend with NumPy first. Understand broadcasting, array shapes, dtype quirks. When you come back to Pandas, half the “magic” suddenly makes sense because DataFrames are just labeled NumPy arrays with extra machinery bolted on.

Performance-wise, a NumPy array doing element-wise multiplication often runs 10-100x faster than the equivalent Python list comprehension, depending on array size and dtype. That gap tends to widen as your data grows. For number crunching at scale, nothing in pure Python comes close.
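
If broadcasting and dtypes sound abstract, this short sketch (with invented numbers) shows the three ideas I’d start with: a 1-D array stretching across rows, a column vector stretching across columns, and a fixed-width integer dtype silently wrapping around.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])      # shape (3,)
quantities = np.array([[1, 2, 3],
                       [4, 5, 6]])         # shape (2, 3)

# Broadcasting: the (3,) array is applied to each row of the (2, 3) array.
totals = quantities * prices               # shape (2, 3)

# A (2, 1) column vector broadcasts across columns instead.
discount = np.array([[0.9], [0.5]])
discounted = totals * discount

# The same multiplication as a pure-Python nested loop, for comparison.
loop_totals = [
    [q * p for q, p in zip(row, prices.tolist())]
    for row in quantities.tolist()
]

# dtype quirk: fixed-width integers wrap around instead of growing.
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)

print(totals.shape, discounted.shape, wrapped)
```

The loop and the broadcast version produce the same numbers; the difference is that NumPy does the work in compiled code over the whole array.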

4. Flask — The Framework That Gets Out of Your Way

Django fans, don’t come at me. I love Django too. But when I need an API running in twenty minutes — a webhook receiver, a microservice, an internal tool — Flask is what I reach for every time.

Flask gives you routing, request handling, and templating. That’s it. No ORM, no admin panel, no authentication system baked in. Some people see that as a weakness. I think it’s Flask’s greatest strength. You pick your database layer. You pick your auth strategy. You wire it together your way.


# pip install flask
from flask import Flask, request, jsonify

app = Flask(__name__)
books = []

@app.route("/")
def home():
    return jsonify({"message": "Welcome to the Book API", "version": "1.0"})

@app.route("/books", methods=["GET", "POST"])
def handle_books():
    if request.method == "POST":
        book = request.get_json()
        book["id"] = len(books) + 1
        books.append(book)
        return jsonify(book), 201
    return jsonify(books)

if __name__ == "__main__":
    app.run(debug=True)

Real talk: Flask doesn’t scale itself. Gunicorn or uWSGI behind Nginx, maybe a task queue for heavy lifting — that’s on you. Flask won’t hold your hand. For large teams building monolithic apps with dozens of models, Django’s batteries-included approach probably saves more time. But for everything else? Flask’s minimalism is a feature, not a bug.
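
Since serving is left to a WSGI server, one common way to structure a Flask app is an application factory. This is a minimal sketch, not the only layout; create_app, the /health route, and the module name in the comment are my own assumptions.

```python
from flask import Flask, jsonify


def create_app(config=None):
    """Application factory: build and configure a fresh Flask app."""
    app = Flask(__name__)
    app.config.update(config or {})

    @app.route("/health")
    def health():
        return jsonify({"status": "ok"})

    return app


# A WSGI server imports the factory instead of calling app.run(), e.g.
# (assuming this file is wsgi.py):  gunicorn "wsgi:create_app()"
app = create_app({"TESTING": True})

# Flask's built-in test client exercises routes without starting a server.
with app.test_client() as client:
    response = client.get("/health")
    print(response.status_code, response.get_json())
```

The factory keeps configuration out of module scope, which makes it easier to spin up separate app instances for tests and for production.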

I’ve shipped at least fifteen Flask apps to production since 2019. Most are still running. The ones I built with heavier frameworks? Half got rewritten because the framework’s opinions clashed with the project’s needs.

5. FastAPI — Flask’s Younger, Faster Sibling

Okay, confession: FastAPI almost knocked Flask off my list entirely. Almost. Built by Sebastián Ramírez around 2018, it’s basically what happens when someone looks at Flask, Django REST Framework, and modern Python type hints, then combines the best parts of all three.

Automatic request validation through Pydantic models. Async support out of the box. Auto-generated OpenAPI docs at /docs. Type hints aren’t just decorative — they drive the entire validation and serialization pipeline. Write your types, get your validation free.


# pip install fastapi uvicorn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI(title="Book API", version="2.0")

class Book(BaseModel):
    title: str = Field(..., min_length=1, max_length=200)
    author: str
    year: int = Field(..., ge=1000, le=2030)
    isbn: str | None = None

books_db: dict[int, Book] = {}

@app.post("/books", status_code=201)
def create_book(book: Book) -> dict:
    book_id = len(books_db) + 1
    books_db[book_id] = book
    return {"id": book_id, **book.model_dump()}

@app.get("/books/{book_id}")
def get_book(book_id: int) -> dict:
    if book_id not in books_db:
        raise HTTPException(status_code=404, detail="Book not found")
    return {"id": book_id, **books_db[book_id].model_dump()}

# Run with: uvicorn main:app --reload
# Auto-generated docs at: http://localhost:8000/docs

So why keep Flask at all? Honestly, ecosystem maturity. Flask has more than fifteen years of battle-tested extensions, tutorials, and Stack Overflow answers. FastAPI’s ecosystem is growing fast (incredibly fast, actually), but Flask’s depth of third-party support still edges it out for certain use cases. Give it another two years, though. I suspect FastAPI will be the default recommendation for new Python web projects by 2028.

My rule of thumb: new API project with a modern Python codebase? FastAPI. Legacy project or team unfamiliar with type hints? Flask. Both are excellent choices. Neither is wrong.
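
If you want to see the “write your types, get your validation free” part without starting a server, you can poke at the Pydantic model directly. This sketch assumes Pydantic v2 and uses a trimmed-down Book model; FastAPI turns the same kind of validation errors into a 422 response.

```python
from pydantic import BaseModel, Field, ValidationError


class Book(BaseModel):
    title: str = Field(..., min_length=1, max_length=200)
    author: str
    year: int = Field(..., ge=1000, le=2030)


# Valid input: by default the string "1965" is coerced to an int.
book = Book(title="Dune", author="Frank Herbert", year="1965")
print(book.model_dump())

# Invalid input: the empty title and out-of-range year are both reported.
try:
    Book(title="", author="Nobody", year=999)
    errors = []
except ValidationError as exc:
    errors = exc.errors()
    for err in errors:
        print(err["loc"], err["msg"])
```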

6. SQLAlchemy — Your Database Doesn’t Matter Anymore

Pick a database. PostgreSQL, MySQL, SQLite, Oracle, Microsoft SQL Server. SQLAlchemy doesn’t care. Write your models once, swap the connection string, and your code migrates across database engines without rewriting queries.

That’s the promise, and in my experience, it actually delivers about 90% of the time. Edge cases around database-specific features (PostgreSQL’s JSONB, MySQL’s FULLTEXT indexes) require raw SQL or dialect-specific extensions. But for standard CRUD operations? Flawless portability.


# pip install sqlalchemy
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.orm import declarative_base, Session

engine = create_engine("sqlite:///products.db", echo=False)
Base = declarative_base()

class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    price = Column(Float, nullable=False)
    category = Column(String(50))

    def __repr__(self):
        return f"Product(id={self.id}, name='{self.name}', price={self.price})"

Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Product(name="Laptop", price=999.99, category="Electronics"),
        Product(name="Headphones", price=149.99, category="Electronics"),
        Product(name="Notebook", price=12.99, category="Office"),
    ])
    session.commit()

    electronics = session.query(Product).filter_by(category="Electronics").all()
    print(electronics)

Strong opinion incoming: ORMs aren’t for everyone, and people who say “just write raw SQL” aren’t wrong. But they’re ignoring the maintenance angle. With SQLAlchemy, your schema is defined in Python. Migrations (via Alembic) are version-controlled. Relationships are explicit. When you join a project with forty tables, having models as code beats reading a 2000-line SQL dump.

Version 2.0, released in early 2023, overhauled the API to feel more modern — cleaner session handling, better type support, improved async compatibility. If you tried SQLAlchemy years ago and bounced off, give 2.0 another shot. Different experience entirely.

7. Pytest — Tests You’ll Actually Write

Python ships with unittest. Nobody wants to use it. All those self.assertEqual calls, the mandatory class inheritance, the setUp/tearDown ceremony — it feels like writing Java in Python. Pytest strips all that away.

Plain functions. Plain assert statements. Run pytest and it finds your tests automatically. I went from writing tests grudgingly to writing them enthusiastically once I switched from unittest. Not exaggerating.


# pip install pytest
# Save as test_calculator.py
import pytest

def add(a, b):
    return a + b

def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b

# Tests
def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

def test_divide():
    assert divide(10, 2) == 5.0
    assert divide(7, 2) == 3.5

def test_divide_by_zero():
    with pytest.raises(ValueError, match="Cannot divide by zero"):
        divide(10, 0)

# Parametrized tests
@pytest.mark.parametrize("a, b, expected", [
    (1, 1, 2),
    (0, 0, 0),
    (-1, -1, -2),
    (100, 200, 300),
])
def test_add_parametrized(a, b, expected):
    assert add(a, b) == expected

# Run with: pytest test_calculator.py -v

What makes Pytest genuinely special is fixtures. Fixtures let you define reusable test setup — database connections, mock servers, temporary files — as dependency-injected function arguments. Compose them, scope them (function-level, module-level, session-level), and share them across your entire test suite. Once you’ve built a good fixture library for your project, writing new tests takes minutes instead of hours.
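
Here’s a rough sketch of what that looks like in practice. The make_db and count_users helpers and the users table are hypothetical; the fixture mechanics (yield for teardown, one fixture depending on another) are standard Pytest.

```python
# Save as test_users.py and run with: pytest test_users.py -v
import sqlite3

import pytest


def make_db():
    """Plain helper: an in-memory SQLite database with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


@pytest.fixture
def db():
    # Setup runs before the test; code after yield runs as teardown.
    conn = make_db()
    yield conn
    conn.close()


@pytest.fixture
def db_with_users(db):
    # Fixtures compose: this one builds on the db fixture above.
    db.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    db.commit()
    return db


def test_empty_db(db):
    assert count_users(db) == 0


def test_seeded_db(db_with_users):
    assert count_users(db_with_users) == 2
```

Each test just names the fixture it needs as an argument, and Pytest handles creating it and cleaning it up.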

Parametrized tests are the other killer feature. One test function, multiple input sets, each reported as a separate test case. Covers edge cases without copy-pasting the same test body fifteen times. Combined with pytest-cov for coverage reporting, you’ve got a testing setup that’d make any CI pipeline happy.

8. Click — CLIs That Don’t Embarrass You

Quick: how do you parse command-line arguments in Python? If your answer is argparse, I won’t judge. But I will say Click does the same job with half the boilerplate and twice the readability.

Built by Armin Ronacher (same person behind Flask), Click turns functions into CLI commands through decorators. Arguments, options, flags, help text, input validation — all declarative. Your CLI code reads almost like a specification rather than implementation.


# pip install click
import click

@click.group()
@click.version_option(version="1.0.0")
def cli():
    """A sample CLI application built with Click."""
    pass

@cli.command()
@click.argument("name")
@click.option("--greeting", "-g", default="Hello", help="Greeting to use")
@click.option("--count", "-c", default=1, help="Number of greetings")
def greet(name, greeting, count):
    """Greet someone by name."""
    for _ in range(count):
        click.echo(f"{greeting}, {name}!")

@cli.command()
@click.argument("directory", type=click.Path(exists=True))
@click.option("--extension", "-e", default=".py", help="File extension to count")
def count_files(directory, extension):
    """Count files with a given extension in a directory."""
    from pathlib import Path
    files = list(Path(directory).rglob(f"*{extension}"))
    click.echo(f"Found {len(files)} {extension} files in {directory}")

if __name__ == "__main__":
    cli()

# Usage: python app.py greet "World" --count 3
# Usage: python app.py count-files ./src --extension .py

Unpopular opinion: if your CLI has more than two commands, skip argparse entirely. Subcommand handling in argparse requires nested subparsers that get ugly fast. Click’s @group and @command decorators handle nesting cleanly at any depth.
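
As a quick illustration of nesting, here’s a sketch of a hypothetical tool with a db subgroup, exercised through Click’s own CliRunner test helper so it runs without a shell. The command names and messages are invented for the example.

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Top-level entry point (hypothetical tool)."""


@cli.group()
def db():
    """Database commands."""


@db.command()
@click.option("--steps", default=1, help="Number of migrations to apply")
def migrate(steps):
    """Apply pending migrations."""
    click.echo(f"Applying {steps} migration(s)")


@db.command()
def status():
    """Show migration status."""
    click.echo("Database is up to date")


# CliRunner invokes the CLI in-process, which is handy for tests.
runner = CliRunner()
result = runner.invoke(cli, ["db", "migrate", "--steps", "2"])
print(result.output)
```

Adding another level is just another @group() hanging off db; nothing about the existing commands has to change.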

There’s also Typer, which is basically Click reimagined with type hints (built by the same person who made FastAPI, naturally). Typer is worth a look for new projects. But Click’s maturity — stable since roughly 2014 — and the sheer volume of production CLIs built on it still make it my default.

9. Rich — Make Your Terminal Look Like It Belongs in 2026

Most Python scripts dump plain text to stdout. That’s fine for quick scripts. But when you’re building developer tools, admin dashboards, or anything that a human will stare at repeatedly, Rich transforms the experience.

Colored output. Formatted tables. Progress bars that actually look good. Syntax-highlighted code in the terminal. Markdown rendering. Rich does all of it with maybe five lines of code. Will McGugan built something genuinely beautiful here, and the Python community adopted it fast, with over 50,000 GitHub stars as of early 2025.


# pip install rich
from rich.console import Console
from rich.table import Table
from rich.progress import track
from rich import print as rprint
import time

console = Console()

# Rich print with markup
rprint("[bold green]Success![/bold green] File uploaded.")
rprint("[red]Error:[/red] Connection timeout after [bold]30s[/bold]")

# Tables
table = Table(title="Server Status")
table.add_column("Service", style="cyan")
table.add_column("Status", style="green")
table.add_column("Uptime", justify="right")

table.add_row("Web Server", "Running", "14 days")
table.add_row("Database", "Running", "14 days")
table.add_row("Cache", "Degraded", "2 hours")
console.print(table)

# Progress bars
for step in track(range(50), description="Processing..."):
    time.sleep(0.05)

# Syntax highlighting
from rich.syntax import Syntax
code = '''
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
'''
syntax = Syntax(code, "python", theme="monokai", line_numbers=True)
console.print(syntax)

I started using Rich in late 2022 for a deployment script at work. Before Rich, our team would squint at walls of unformatted log lines trying to spot failures. After adding Rich tables and color-coded status indicators, deploy reviews went from ten minutes to two. Not because the deploys changed — because the output was finally readable.
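
A stripped-down sketch of that idea: color each status cell by its value instead of styling the whole column. The service names, statuses, and the STATUS_STYLES mapping are made up for illustration.

```python
from rich.console import Console
from rich.table import Table

# Hypothetical mapping from status text to a Rich style.
STATUS_STYLES = {"Running": "green", "Degraded": "yellow", "Down": "bold red"}


def build_status_table(services):
    """Build a table where each status cell is colored by its value."""
    table = Table(title="Deploy Status")
    table.add_column("Service", style="cyan")
    table.add_column("Status")
    for name, status in services:
        style = STATUS_STYLES.get(status, "white")
        # [/] closes the most recently opened markup tag.
        table.add_row(name, f"[{style}]{status}[/]")
    return table


services = [("Web Server", "Running"), ("Database", "Running"), ("Cache", "Degraded")]

# record=True keeps a copy of the output so it can be exported as plain text.
console = Console(record=True, width=60)
console.print(build_status_table(services))
plain_text = console.export_text()
```

The exported plain text is useful when the same report also needs to land in a log file or a chat message.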

Pair Rich with Click and you’ve got CLIs that look professional with minimal effort. Those two libraries together cover 90% of what you need for polished command-line tools.

10. HTTPX — Requests, But Ready for the Future

Here’s where I’ll probably catch some heat. HTTPX is a Requests-compatible HTTP client that adds async support, HTTP/2, and a cleaner architecture. And yes, I still listed Requests at number one.

Why? Because most Python code is still synchronous. Requests works perfectly for that world. But HTTPX is where things are heading. Async Python isn’t a niche anymore: FastAPI runs on it, Django added async views in 3.1, and any application making multiple concurrent HTTP calls benefits enormously from async.


# pip install "httpx[http2]"  (the http2 extra is required for http2=True below)
import httpx
import asyncio

# Synchronous usage (the API closely mirrors requests)
response = httpx.get("https://jsonplaceholder.typicode.com/posts/1")
print(response.json()["title"])

# Async usage for concurrent requests
async def fetch_all_posts():
    async with httpx.AsyncClient() as client:
        tasks = [
            client.get(f"https://jsonplaceholder.typicode.com/posts/{i}")
            for i in range(1, 6)
        ]
        responses = await asyncio.gather(*tasks)
        for r in responses:
            data = r.json()
            print(f"Post {data['id']}: {data['title'][:40]}...")

asyncio.run(fetch_all_posts())

# HTTP/2 support
async def http2_example():
    async with httpx.AsyncClient(http2=True) as client:
        r = await client.get("https://httpbin.org/get")
        print(f"HTTP version: {r.http_version}")

asyncio.run(http2_example())

Concrete example: I needed to fetch data from twelve different APIs for an aggregation service last year. Synchronous requests, one after another — about eight seconds total. Switched to HTTPX async with asyncio.gather, same twelve calls ran in roughly 1.2 seconds. That’s not a micro-optimization. That’s a fundamental architectural improvement.

HTTP/2 support matters too, even if you haven’t thought about it. Multiplexed connections and header compression are things modern web infrastructure takes for granted. Requests can’t do it. HTTPX can, with one parameter, http2=True, once you’ve installed the optional extra (pip install "httpx[http2]").

My prediction: within three years, HTTPX will be the default HTTP library recommendation for new Python projects. Requests will still work, still be maintained, still be everywhere. But HTTPX is the future. Start using it now so you’re not rewriting later.

What About TensorFlow? Django? Celery?

Yeah, I know. People expected TensorFlow and Django on a list like this. Maybe Celery or Scrapy too. Here’s why they didn’t make my cut.

TensorFlow is extraordinary if you’re doing machine learning. But “every developer” doesn’t need it. ML is a specialization, and recommending TensorFlow to a backend developer building REST APIs is like recommending a tractor to someone who needs a bicycle. If you’re into ML, absolutely learn TensorFlow (or PyTorch — I’d probably pick PyTorch in 2026, honestly). But it’s not a universal tool.

Django is a full-stack framework, and it’s opinionated in ways that don’t suit every project. I use Django for large web applications with admin panels and ORM-heavy data models. For APIs? FastAPI or Flask. For microservices? Flask. Django earns its place in specific contexts, not across-the-board.

Celery handles distributed task queues brilliantly. But you don’t need Celery until you need Celery. It’s complex to set up, requires a message broker (Redis or RabbitMQ), and adds operational overhead. When you hit that scale, you’ll know. Until then, threading or asyncio covers most background task needs.
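
For a sense of what “threading covers it” means, here’s a minimal sketch using the standard library’s ThreadPoolExecutor. The send_welcome_email function is a hypothetical stand-in for any slow I/O task.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_welcome_email(user_id):
    """Stand-in for slow I/O work (hypothetical task)."""
    time.sleep(0.05)
    return f"email sent to user {user_id}"


executor = ThreadPoolExecutor(max_workers=4)

# submit() returns immediately; the caller carries on while the task runs.
futures = [executor.submit(send_welcome_email, uid) for uid in (1, 2, 3)]

# Later, collect results (or exceptions) when they are actually needed.
results = [f.result(timeout=5) for f in futures]
print(results)

executor.shutdown(wait=True)
```

The catch is that these tasks live inside your process: if it restarts, queued work is gone. Needing tasks to survive restarts or run on other machines is usually the signal that a real task queue has earned its place.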

Now It’s Your Turn

Go look at your own requirements.txt right now. Not tomorrow. Right now. Count the packages. How many do you actually use? How many are dead weight from a project you abandoned? How many could be replaced by something better?
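
If you’d rather not eyeball it, here’s a rough sketch of a first-pass audit using only the standard library. The function names are my own, and it makes one big assumption: that a package’s name on PyPI matches its import name. That isn’t always true (Pillow imports as PIL, beautifulsoup4 as bs4), so treat the output as a list of suspects, not a verdict.

```python
import ast
import re


def imported_modules(source: str) -> set[str]:
    """Return top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def unused_requirements(requirements: list[str], sources: list[str]) -> list[str]:
    """List requirement names that never appear as an import in the sources."""
    used = set()
    for src in sources:
        used |= {name.lower() for name in imported_modules(src)}

    unused = []
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Strip version specifiers and extras: "requests[socks]>=2.31" -> "requests"
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower()
        if name.replace("-", "_") not in used:
            unused.append(name)
    return unused


# Invented example inputs; in practice, read requirements.txt and your .py files.
reqs = ["requests>=2.31", "pandas", "flask==3.0.0", "# dev tools", "rich"]
code = ["import requests\nfrom flask import Flask", "import pandas as pd"]
print(unused_requirements(reqs, code))
```

It also won’t see packages that are only used as command-line tools or plugins, so check before you delete.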

Do the purge. Pick a Saturday afternoon, brew some strong coffee (or chai, I won’t judge), and start deleting. Keep what matters, cut what doesn’t, and I guarantee your projects will feel lighter for it. You might end up with a different ten than mine — and that’s fine. The point isn’t matching my list. It’s knowing your list and being able to defend every entry on it.

Here’s my dare: cut your requirements.txt by at least 30% this week. Post your surviving list in the comments. I want to see what you kept and, more importantly, what you had the guts to throw away.