[Book cover: C++17 · USACO Silver · Diagrams · Problems]


🏆 C++ for Competitive Programming: A USACO Guide

From Zero to USACO Gold

The complete beginner's roadmap to competitive programming in C++, designed around USACO competition preparation.

No prior experience required. Written for clarity, depth, and contest readiness.


🎯 What Is This Book?

This book is a structured, self-contained course for students who want to learn competitive programming in C++, built specifically around USACO (USA Computing Olympiad) preparation.

Unlike scattered online resources, this book gives you a single linear path: from writing your very first C++ program, through data structures and graph algorithms, all the way to solving USACO Gold problems with confidence. Every chapter builds on the previous one, with detailed worked examples, annotated C++ code, and SVG diagrams that make abstract algorithms visual and concrete.

If you've ever felt overwhelmed looking at USACO editorials, or if you know some programming but don't know what to learn next — this book was written for you.


✅ What You'll Learn

What You'll Learn — 6 Parts Overview

📊 Book Statistics

| Metric | Value |
| --- | --- |
| Parts / Chapters | 7 parts / 26 chapters |
| Code Examples | 150+ (all C++17, compilable) |
| Practice Problems | 130+ (labeled Easy/Medium/Hard) |
| SVG Diagrams | 35+ custom visualizations |
| Algorithm Templates | 20+ contest-ready templates |
| Appendices | 6 (Quick Ref, Problem Set, Tricks, Templates, Math, Debugging) |
| Estimated Completion | 8–12 weeks (1–2 chapters/week) |
| Target Level | USACO Bronze → USACO Gold |

🗺️ Learning Path

Learning Path to USACO Silver

🚀 Quick Start (5 Minutes)

Step 1: Install C++ Compiler

Windows: Install MSYS2, then: pacman -S mingw-w64-x86_64-gcc

macOS: xcode-select --install in Terminal

Linux: sudo apt install g++ build-essential

Verify: g++ --version (should show version ≥ 9)

Step 2: Get an Editor

VS Code + C/C++ extension + Code Runner extension

Step 3: Competition Template

Copy this to template.cpp — use it as your starting point for every problem:

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    // freopen("problem.in", "r", stdin);   // uncomment for file I/O
    // freopen("problem.out", "w", stdout);

    // Your solution here

    return 0;
}

Step 4: Compile & Run

g++ -o sol solution.cpp -std=c++17 -O2 -Wall
./sol < input.txt

Step 5: Start Reading

Go to Chapter 2.1 and write your first C++ program. Then solve all practice problems before moving on. Don't skip the problems — that's where 80% of learning happens.


📚 How to Use This Book

The Reading Strategy That Works

  1. Read actively: Code every example yourself. Don't just read — type it out.
  2. Do the problems: Each chapter has 5–7 problems. Attempt every one before reading hints.
  3. Read hints when stuck (after 20–30 minutes of genuine effort)
  4. Review the Chapter Summary before moving on — it's a quick checklist.
  5. Return to earlier chapters when a later chapter references them.

Practice Problems Guide

Each practice problem is labeled:

  • 🟢 Easy — Directly applies the chapter's main technique
  • 🟡 Medium — Requires combining ideas or a minor insight
  • 🔴 Hard — Challenging; partial credit counts!
  • 🏆 Challenge — Beyond chapter scope; try when ready

All hints are hidden by default (click to expand). Struggle first!

Reading Schedule

| Stage | Chapters | Recommended Time |
| --- | --- | --- |
| Foundations | 2.1–2.3 | 1–2 weeks |
| Data Structures | 3.1–3.11 | 2–3 weeks |
| Greedy | 4.1–4.2 | 1 week |
| Graphs | 5.1–5.4 | 2–3 weeks |
| DP | 6.1–6.3 | 3–4 weeks |
| USACO Contest Guide | 7.1–7.3 | 1 week |

📖 Chapter Overview

Part 2: C++ Foundations (1–2 weeks)

| Chapter | Topic | Key Skills |
| --- | --- | --- |
| Ch.2.1: First C++ Program | Hello World, variables, I/O | cin, cout, int, long long |
| Ch.2.2: Control Flow | Conditions and loops | if/else, for, while, break |
| Ch.2.3: Functions & Arrays | Reusable code, collections | Arrays, vectors, recursion |

Part 3: Core Data Structures (2–3 weeks)

| Chapter | Topic | Key Skills |
| --- | --- | --- |
| Ch.3.1: STL Essentials | Powerful built-in containers | sort, map, set, stack, queue |
| Ch.3.2: Arrays & Prefix Sums | Range queries in O(1) | 1D/2D prefix sums, difference arrays |
| Ch.3.3: Sorting & Searching | Efficient ordering and lookup | sort, binary search, BS on answer |
| Ch.3.4: Two Pointers & Sliding Window | Linear-time array techniques | Two pointer, fixed/variable windows |
| Ch.3.5: Monotonic Stack & Monotonic Queue | Monotonic data structures | Next greater element, sliding window max |
| Ch.3.6: Stacks, Queues & Deques | Order-based data structures | stack, queue, deque; LIFO/FIFO patterns |
| Ch.3.7: Hashing Techniques | Fast key lookup and collision handling | unordered_map/set, polynomial hashing, rolling hash |
| Ch.3.8: Maps & Sets | Key-value lookup and uniqueness | map, set, multiset |
| Ch.3.9: Introduction to Segment Trees | Range queries with updates | Segment tree build/query/update |
| Ch.3.10: Fenwick Tree (BIT) | Efficient prefix-sum with point updates | Binary Indexed Tree, BIT update/query, inversion count |
| Ch.3.11: Binary Trees | Tree data structure fundamentals | Traversals, BST operations, balanced trees |

Part 4: Greedy Algorithms (1 week)

| Chapter | Topic | Key Skills |
| --- | --- | --- |
| Ch.4.1: Greedy Fundamentals | When greedy works (and fails) | Activity selection, exchange argument |
| Ch.4.2: Greedy in USACO | Contest-focused greedy | Scheduling, binary search + greedy |

Part 5: Graph Algorithms (2–3 weeks)

| Chapter | Topic | Key Skills |
| --- | --- | --- |
| Ch.5.1: Introduction to Graphs | Modeling relationships | Adjacency list, graph types |
| Ch.5.2: BFS & DFS | Graph traversal | Shortest path, multi-source BFS, cycle detection, topo sort |
| Ch.5.3: Trees & Special Graphs | Tree algorithms | DSU, Kruskal's MST, tree diameter, LCA, Euler tour |
| Ch.5.4: Shortest Paths | Weighted graph shortest paths | Dijkstra, Bellman-Ford, Floyd-Warshall |

Part 6: Dynamic Programming (3–4 weeks)

| Chapter | Topic | Key Skills |
| --- | --- | --- |
| Ch.6.1: Introduction to DP | Memoization and tabulation | Fibonacci, coin change |
| Ch.6.2: Classic DP Problems | Core DP patterns | LIS, 0/1 Knapsack, grid paths |
| Ch.6.3: Advanced DP Patterns | Harder techniques | Bitmask DP, interval DP, tree DP, digit DP |

Part 7: USACO Contest Guide (Read anytime)

| Chapter | Topic | Key Skills |
| --- | --- | --- |
| Ch.7.1: Understanding USACO | Format, divisions, scoring, problem taxonomy | Contest strategy, upsolving, pattern recognition |
| Ch.7.2: Problem-Solving Strategies | How to think about problems | Algorithm selection, debugging |
| Ch.7.3: Ad Hoc Problems | Observation-based problems with no standard algorithm | Invariants, parity, cycle detection, constructive thinking |

Appendix & Reference

| Section | Content |
| --- | --- |
| Appendix A: C++ Quick Reference | STL cheat sheet, complexity table |
| Appendix B: USACO Problem Set | Curated problem list by topic and difficulty |
| Appendix C: Competitive Programming Tricks | Fast I/O, macros, modular arithmetic |
| Appendix D: Contest-Ready Templates | DSU, Segment Tree, BFS, Dijkstra, binary search, modpow |
| Appendix E: Math Foundations | Modular arithmetic, combinatorics, number theory, probability |
| Appendix F: Debugging Guide | Common bugs, debugging techniques, AddressSanitizer |
| Glossary | 35+ competitive programming terms defined |
| 📊 Knowledge Map | Interactive chapter dependency graph (click nodes to explore prerequisites) |

🔧 Setup Instructions

Compiler Setup

| Platform | Command |
| --- | --- |
| Windows (MSYS2) | pacman -S mingw-w64-x86_64-gcc |
| macOS | xcode-select --install |
| Linux (Debian/Ubuntu) | sudo apt install g++ build-essential |

Verify with: g++ --version

# Development (shows warnings, helpful for debugging)
g++ -o sol solution.cpp -std=c++17 -O2 -Wall -Wextra

# Contest (fast, silent)
g++ -o sol solution.cpp -std=c++17 -O2

Running with I/O Redirection

# Run with input file
./sol < input.txt

# Run and save output
./sol < input.txt > output.txt

# Compare output to expected
diff output.txt expected.txt

🌐 External Resources

| Resource | What It's Best For |
| --- | --- |
| usaco.org | Official USACO problems + editorials |
| usaco.guide | Community guide, curated problems by topic |
| codeforces.com | Additional practice problems, contests |
| cp-algorithms.com | Deep dives into specific algorithms |
| atcoder.jp | High-quality educational problems (AtCoder Beginner) |

🏅 Who Is This Book For?

Middle school / high school students preparing for USACO Bronze through Silver

Complete beginners with no prior programming experience (Part 2 starts from zero)

Intermediate programmers who know Python or Java and want to learn C++ for competitive programming

Self-learners who want a structured, complete curriculum instead of scattered tutorials

Coaches and teachers looking for a comprehensive curriculum for their students

This book is NOT for:

  • USACO Gold/Platinum (advanced data structures, network flow, geometry)
  • General software engineering (no databases, web development, etc.)

🐄 Ready? Let's Begin!

Turn to Chapter 2.1 and write your first C++ program.

The path from complete beginner to USACO Gold is roughly 200–400 hours of focused practice over 2–6 months. It won't always be easy, but every Gold competitor you admire started exactly where you are now.

The only way to get better is to write code, struggle with problems, and keep going. 🐄


Last updated: 2026 · Targets: USACO Bronze → Gold · C++ Standard: C++17 · 35+ SVG diagrams · 150+ code examples · 130+ practice problems

⚡ Part 2: C++ Foundations

Master the building blocks of competitive programming in C++. From your first "Hello World" to functions and arrays.

📚 3 Chapters · ⏱️ Estimated 1-2 weeks · 🎯 Target: Write and compile C++ programs

Part 2: C++ Foundations

Before you can solve algorithmic problems, you need to speak the language. Part 2 is your crash course in C++ — from the very first program to functions, arrays, and vectors. You'll build the foundational skills needed for all later chapters.

What You'll Learn

| Chapter | Topic | Key Skills |
| --- | --- | --- |
| Chapter 2.1 | Your First C++ Program | Variables, input/output, compilation |
| Chapter 2.2 | Control Flow | if/else, loops, break/continue |
| Chapter 2.3 | Functions & Arrays | Reusable code, arrays, vectors |

Why C++?

Competitive programmers overwhelmingly choose C++ for two reasons:

  1. Speed — C++ programs run faster than Python or Java, which matters when you have tight time limits (typically 1–2 seconds for up to 10^8 operations)
  2. The STL — C++'s Standard Template Library gives you ready-made implementations of nearly every data structure and algorithm you'll ever need

Note: USACO accepts C++, Java, and Python. But C++ is by far the most common choice among top competitors, and this book focuses on it exclusively.

Tips for Part 2

  • Type the code yourself. Don't copy-paste. Your fingers need to learn the syntax.
  • Break things. Deliberately introduce errors and see what happens. Reading compiler errors is a skill.
  • Run every example. Seeing output appear on screen cements understanding far better than just reading.

Let's dive in!

📖 Chapter 2.1 ⏱️ ~60 min read 🎯 Beginner

Chapter 2.1: Your First C++ Program

📝 Before You Continue: This is the very first chapter — no prerequisites! You don't need to have any programming experience. Just work through this chapter from top to bottom and you'll write your first real C++ program by the end.

Welcome! By the end of this chapter, you will have:

  • Set up a working C++ environment (takes 5 minutes using an online compiler)
  • Written, compiled, and run your first C++ program
  • Understood what every single line of code does
  • Learned about variables, data types, and input/output
  • Solved 13 practice problems with full solutions

2.1.0 Setting Up Your Environment

Before writing any code, you need a place to write and run it. There are two options: online compilers (recommended for beginners — no installation required) and local setup (optional, for when you want to work offline).

You only need a web browser. Open any of these sites:

| Site | URL | Notes |
| --- | --- | --- |
| Codeforces IDE | codeforces.com | Create a free account, then click "Submit code" on any problem to get a code editor |
| Replit | replit.com | Create a "C++ project", get a full editor + terminal |
| Ideone | ideone.com | Paste code, select C++17, click "Run" — simplest option |
| OnlineGDB | onlinegdb.com | Good debugger built in |

Using Ideone (simplest for beginners):

  1. Go to ideone.com
  2. Select "C++17 (gcc 8.3)" from the language dropdown
  3. Paste your code in the text area
  4. Click the green "Run" button
  5. See output in the bottom panel

That's it! No installation, no configuration.

If you want to write and run C++ code offline on your own computer, we highly recommend CLion — a professional C/C++ IDE by JetBrains. It features intelligent code completion, one-click build & run, and a built-in debugger, all of which will significantly boost your productivity.

💡 Free for Students! CLion is a paid product, but JetBrains offers a free educational license for students. Simply apply with your .edu email on the JetBrains Student License page.

Installation Steps:

Step 1: Install a C++ Compiler (CLion requires an external compiler)

| OS | How to Install |
| --- | --- |
| Windows | Install MSYS2. After installation, run the following in the MSYS2 terminal: pacman -S mingw-w64-x86_64-gcc, then add C:\msys64\mingw64\bin to your system PATH |
| Mac | Open Terminal and run: xcode-select --install. Click "Install" in the dialog that appears and wait about 5 minutes |
| Linux | Ubuntu/Debian: sudo apt install g++ cmake; Fedora: sudo dnf install gcc-c++ cmake |

Step 2: Install CLion

  1. Go to the CLion download page and download the installer for your OS
  2. Run the installer and follow the prompts (keep the default options)
  3. On first launch, choose "Activate" → sign in with your JetBrains student account, or start a free 30-day trial

Step 3: Create Your First Project

  1. Open CLion and click "New Project"
  2. Select "C++ Executable" and set the Language standard to C++17
  3. Click "Create" — CLion will automatically generate a project with a main.cpp file
  4. Write your code in main.cpp, then click the green ▶ Run button in the top-right corner to compile and run
  5. The output will appear in the "Run" panel at the bottom

🔧 CLion Auto-Detects Compilers: On first launch, CLion automatically scans for installed compilers (GCC / Clang / MSVC). If detection succeeds, you'll see a green checkmark ✅ in Settings → Build → Toolchains. If not detected, verify that the compiler from Step 1 is correctly installed and added to your PATH.

Useful CLion Features for Competitive Programming:

  • Built-in Terminal: The Terminal tab at the bottom lets you type test input directly
  • Debugger: Set breakpoints, step through code line by line, and inspect variable values — an essential tool for tracking down bugs
  • Code Formatting: Ctrl + Alt + L (Mac: Cmd + Option + L) automatically tidies up your code indentation

How to Compile and Run (Local)

Once you have g++ installed, here's how to compile and run:

g++ -o hello hello.cpp -std=c++17

Let's break down that command piece by piece:

| Part | Meaning |
| --- | --- |
| g++ | The name of the C++ compiler program |
| -o hello | -o means "output file name"; hello is the name we're giving our program |
| hello.cpp | The source file we want to compile (our C++ code) |
| -std=c++17 | Use the C++17 version of C++ (has the most features) |

Then to run it:

./hello        # Linux/Mac: ./ means "in current directory"
hello.exe      # Windows (the .exe is added automatically)

🤔 Why ./hello and not just hello? On Linux/Mac, the system won't run programs from the current folder by default (for security). The ./ explicitly says "look in the current directory."


2.1.1 Hello, World!

Every programming journey starts the same way. Here is the simplest complete C++ program:

#include <iostream>    // tells the compiler we want to use input/output

int main() {           // every C++ program starts executing from main()
    std::cout << "Hello, World!" << std::endl;  // print to the screen
    return 0;          // 0 = success, program ended normally
}

Run it, and you should see:

Hello, World!

What every line means:

Line 1: #include <iostream> This is a preprocessor directive — an instruction that runs before the actual compilation. It says "copy-paste the contents of the iostream library into my program." The iostream library provides cin (read input) and cout (print output). Without this line, your program can't print anything.

Think of it like: before you can cook, you need to bring the ingredients into the kitchen.

Line 3: int main() This declares the main function — the starting point of every C++ program. When you run a C++ program, the computer always starts executing from the first line inside main(). The int means this function returns an integer (the exit code). Every C++ program must have exactly one main.

Line 4: std::cout << "Hello, World!" << std::endl; This prints text. Let's break it down:

  • std::cout — the "console output" stream (think of it as the screen)
  • << — the "put into" operator; sends data into the stream
  • "Hello, World!" — the text to print (the quotes are not printed)
  • << std::endl — adds a newline (like pressing Enter)
  • ; — every statement in C++ ends with a semicolon

Line 5: return 0; Exits main and tells the operating system the program finished successfully. (A non-zero return would signal an error.)

The Compilation Pipeline

Visual: The Compilation Pipeline

C++ Compilation Pipeline

The diagram above shows the three-stage journey from source code to executable: your .cpp file is fed to the g++ compiler, which produces a runnable binary. Understanding this pipeline will help you make sense of compiler errors when they appear.


2.1.2 The Competitive Programmer's Template

When solving USACO problems, you'll use a standard template. Here it is, fully explained:

#include <bits/stdc++.h>      // "batteries included" — includes ALL standard libraries
using namespace std;           // lets us write cout instead of std::cout

int main() {
    ios_base::sync_with_stdio(false);  // disables syncing C and C++ I/O (faster)
    cin.tie(NULL);                      // unties cin from cout (faster input)

    // Your solution code goes here

    return 0;
}

Why #include <bits/stdc++.h>?

This is a GCC-specific header that includes every standard library at once. Instead of writing:

#include <iostream>
#include <vector>
#include <algorithm>
#include <map>
// ... 20 more lines

You write one line. In competitive programming, this is universally accepted and saves time.

Note: bits/stdc++.h only works with GCC (the compiler USACO judges use). It's fine for competitive programming, but don't use it in production software.

Why using namespace std;?

The standard library puts everything inside a namespace called std. Without this line, you'd write std::cout, std::vector, std::sort everywhere. With using namespace std;, you write cout, vector, sort — much cleaner.

The I/O Speed Lines

ios_base::sync_with_stdio(false);
cin.tie(NULL);

These two lines make cin and cout much faster. Without them, reading large inputs can be 10× slower and cause "Time Limit Exceeded" (TLE) even if your algorithm is correct. Always include them.

🐛 Common Bug: After using these speed lines, don't mix cin/cout with scanf/printf. Pick one style.


2.1.3 Variables and Data Types

A variable is a named location in memory that stores a value. In C++, every variable has a type — the type tells the computer how much memory to reserve and what kind of data will go in it.

🧠 Mental Model: Variables are like labeled boxes

When you write:   int score = 100;

The computer does three things:
  1. Creates a box big enough to hold an integer (4 bytes)
  2. Puts the label "score" on the box
  3. Puts the number 100 inside the box

Variable Memory Box

The Essential Types for Competitive Programming

#include <bits/stdc++.h>
using namespace std;

int main() {
    // int: whole numbers, range: -2,147,483,648 to +2,147,483,647 (about ±2 billion)
    int apples = 42;
    int temperature = -5;

    // long long: big whole numbers, range: about ±9.2 × 10^18
    long long population = 7800000000LL;  // the LL suffix means "this is a long long literal"
    long long trillion = 1000000000000LL;

    // double: decimal/fractional numbers
    double pi = 3.14159265358979;
    double percentage = 99.5;

    // bool: true or false only
    bool isRaining = true;
    bool finished = false;

    // char: a single character (stored as a number 0-255)
    char grade = 'A';     // single quotes for characters
    char newline = '\n';  // special: newline character

    // string: a sequence of characters
    string name = "Alice";         // double quotes for strings
    string greeting = "Hello!";

    // Print them all:
    cout << "Apples: " << apples << "\n";
    cout << "Population: " << population << "\n";
    cout << "Pi: " << pi << "\n";
    cout << "Is raining: " << isRaining << "\n";  // prints 1 for true, 0 for false
    cout << "Grade: " << grade << "\n";
    cout << "Name: " << name << "\n";

    return 0;
}

Visual: C++ Data Types Reference

C++ Variable Types

Choosing the Right Type

| Situation | Type to Use |
| --- | --- |
| Counting things, small numbers | int |
| Numbers that might exceed 2 billion | long long |
| Decimal/fractional answers | double |
| Yes/no flags | bool |
| Single letters or characters | char |
| Words or sentences | string |

Variable Naming Rules

Variable names follow strict rules in C++. Getting these right is essential — bad names lead to bugs, and illegal names won't compile at all.

The Formal Rules (Enforced by the Compiler)

Legal names must:

  • Start with a letter (a-z, A-Z) or underscore _
  • Contain only letters, digits (0-9), and underscores
  • Not be a C++ reserved keyword

These will NOT compile:

| Illegal Name | Why It's Wrong |
| --- | --- |
| 3apples | Starts with a digit |
| my score | Contains a space |
| my-score | Contains a hyphen (interpreted as minus) |
| int | Reserved keyword |
| class | Reserved keyword |
| return | Reserved keyword |

⚠️ Case sensitive! score, Score, and SCORE are three completely different variables. This is a common source of bugs — be consistent.

Common Naming Styles

There are several widely-used naming conventions in C++. You don't have to pick one for competitive programming, but knowing them helps you read other people's code:

| Style | Example | Typically Used For |
| --- | --- | --- |
| camelCase | numStudents, totalScore | Local variables, function parameters |
| PascalCase | MyClass, GraphNode | Classes, structs, type names |
| snake_case | num_students, total_score | Variables, functions (C/Python style) |
| ALL_CAPS | MAX_N, MOD, INF | Constants, macros |
| Single letter | n, m, i, j | Loop indices, math-style competitive programming |

In competitive programming, camelCase and single-letter names are most common. In production code at companies, snake_case or camelCase are standard depending on the style guide.

Best Practices for Naming

1. Be descriptive — make the purpose clear from the name:

// ✅ Good — instantly clear what each variable stores
int numCows = 5;
long long totalMilk = 0;
string cowName = "Bessie";
int maxScore = 100;

// ❌ Bad — legal but confusing
int x = 5;            // What is x? Count? Index? Value?
long long t = 0;      // What is t? Time? Total? Temporary?
string n = "Bessie";  // n usually means "number" — misleading for a name!

2. Use conventional single-letter names only when the meaning is obvious:

// ✅ Acceptable — these are universally understood conventions
for (int i = 0; i < n; i++) { ... }    // i, j, k for loop indices
int n, m;                                // n = count, m = second dimension
cin >> n >> m;                           // in competitive programming, everyone does this

// ❌ Confusing — single letters with no clear convention
int q = 5;   // Is q a count? A query? A coefficient?
char z = 'A'; // Why z?

3. Constants should be ALL_CAPS to stand out:

const int MAX_N = 200005;        // maximum array size
const int MOD = 1000000007;      // modular arithmetic constant
const long long INF = 1e18;      // "infinity" for comparisons
const double PI = 3.14159265359; // mathematical constant

4. Avoid names that look too similar to each other:

// ❌ Easy to mix up
int total1 = 10;
int totall = 20;  // is this "total-L" or "total-1" with a typo?

int O = 0;        // the letter O looks like the digit 0
int l = 1;        // lowercase L looks like the digit 1

// ✅ Better alternatives
int totalA = 10;
int totalB = 20;

5. Don't start names with underscores followed by uppercase letters:

// ❌ Technically compiles, but reserved by the C++ standard
int _Score = 100;   // names like _X are reserved for the compiler/library
int __value = 42;   // double underscore is ALWAYS reserved

// ✅ Safe alternatives
int score = 100;
int myValue = 42;

Naming in Competitive Programming vs. Production Code

| Aspect | Competitive Programming | Production / School Projects |
| --- | --- | --- |
| Variable length | Short is fine: n, m, dp, adj | Descriptive: numStudents, adjacencyList |
| Loop variables | i, j, k always | i, j, k still fine |
| Constants | MAXN, MOD, INF | kMaxSize, kModulus (Google style) |
| Comments | Minimal — speed matters | Thorough — readability matters |
| Goal | Write fast, solve fast | Write code others can maintain |

💡 For this book: We'll use a mix — descriptive names for clarity in explanations, but shorter names when solving problems under time pressure. The important thing is: you should always be able to look at a variable name and immediately know what it stores.

Deep Dive: char, string, and Character-Integer Conversions

Earlier in this chapter we briefly introduced char and string. Since many USACO problems involve character processing, digit extraction, and string manipulation, let's take a deeper look at these essential types.


char and ASCII — Every Character is a Number

A char in C++ is stored as a 1-byte integer (whether plain char is signed or unsigned is up to the compiler, but every ASCII character fits either way). Each character is mapped to a number according to the ASCII table (American Standard Code for Information Interchange), which covers values 0–127. You don't need to memorize the whole table, but knowing a few key ranges is extremely useful:

ASCII Table Key Ranges

 Key relationships:
 • 'a' - 'A' = 32     (difference between lower and upper case)
 • '0' has ASCII value 48 (not 0!)
 • Digits, uppercase letters, and lowercase letters
   are each in CONSECUTIVE ranges
#include <bits/stdc++.h>
using namespace std;

int main() {
    char ch = 'A';

    // A char IS an integer — you can print its numeric value
    cout << ch << "\n";        // prints: A  (as character)
    cout << (int)ch << "\n";   // prints: 65 (its ASCII value)

    // You can do arithmetic on chars!
    char next = ch + 1;       // 'A' + 1 = 66 = 'B'
    cout << next << "\n";     // prints: B

    // Compare chars (compares their ASCII values)
    cout << ('a' < 'z') << "\n";   // 1 (true, because 97 < 122)
    cout << ('A' < 'a') << "\n";   // 1 (true, because 65 < 97)

    return 0;
}

char ↔ int Conversions — The Most Common Technique

In competitive programming, you constantly need to convert between character digits and integer values. Here's the complete guide:

1. Digit character → Integer value (e.g., '7' → 7)

char ch = '7';
int digit = ch - '0';    // '7' - '0' = 55 - 48 = 7
cout << digit << "\n";   // prints: 7

// This works because digit characters '0'~'9' have consecutive ASCII values:
// '0'=48, '1'=49, ..., '9'=57
// So ch - '0' gives the actual numeric value (0~9)

2. Integer value → Digit character (e.g., 7 → '7')

int digit = 7;
char ch = '0' + digit;   // 48 + 7 = 55 = '7'
cout << ch << "\n";      // prints: 7 (as the character '7')

// Works for digits 0~9 only

3. Uppercase ↔ Lowercase conversion

char upper = 'C';
char lower = upper + 32;           // 'C'(67) + 32 = 'c'(99)
cout << lower << "\n";            // prints: c

// More readable approach using the difference:
char lower2 = upper - 'A' + 'a';  // 'C'-'A' = 2, 'a'+2 = 'c'
cout << lower2 << "\n";           // prints: c

// Reverse: lowercase → uppercase
char ch = 'f';
char upper2 = ch - 'a' + 'A';    // 'f'-'a' = 5, 'A'+5 = 'F'
cout << upper2 << "\n";           // prints: F

// Using built-in functions (recommended for clarity):
cout << (char)toupper('g') << "\n";  // prints: G
cout << (char)tolower('G') << "\n";  // prints: g

4. Check character types (very useful in USACO)

char ch = '5';

// Check if digit
if (ch >= '0' && ch <= '9') {
    cout << "It's a digit!\n";
}

// Check if uppercase letter
if (ch >= 'A' && ch <= 'Z') {
    cout << "Uppercase!\n";
}

// Check if lowercase letter
if (ch >= 'a' && ch <= 'z') {
    cout << "Lowercase!\n";
}

// Or use built-in functions:
// isdigit(ch), isupper(ch), islower(ch), isalpha(ch), isalnum(ch)
if (isdigit(ch)) cout << "Digit!\n";
if (isalpha(ch)) cout << "Letter!\n";

5. A Classic Pattern: Extract Digits from a String

string s = "abc123def";
int sum = 0;
for (char ch : s) {
    if (ch >= '0' && ch <= '9') {
        sum += ch - '0';  // convert digit char to int and add
    }
}
cout << "Sum of digits: " << sum << "\n";  // 1+2+3 = 6

string Detailed Guide

string is C++'s built-in text type. Unlike a single char, a string holds a sequence of characters and provides many useful operations.

Basic operations:

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Creating strings
    string s1 = "Hello";
    string s2 = "World";
    string empty = "";           // empty string
    string repeated(5, 'x');     // "xxxxx" — 5 copies of 'x'

    // Length
    cout << s1.size() << "\n";   // 5 (same as s1.length())

    // Concatenation (joining strings)
    string s3 = s1 + " " + s2;  // "Hello World"
    s1 += "!";                   // s1 is now "Hello!"

    // Access individual characters (0-indexed, just like arrays)
    cout << s3[0] << "\n";      // 'H'
    cout << s3[6] << "\n";      // 'W'

    // Modify individual characters
    s3[0] = 'h';                // "hello World"

    // Comparison (lexicographic, i.e., dictionary order)
    cout << ("apple" < "banana") << "\n";   // 1 (true)
    cout << ("abc" == "abc") << "\n";       // 1 (true)
    cout << ("abc" < "abd") << "\n";        // 1 (true, compares char by char)

    return 0;
}

Iterating over a string:

string s = "USACO";

// Method 1: index-based loop
for (int i = 0; i < (int)s.size(); i++) {
    cout << s[i] << " ";  // U S A C O
}
cout << "\n";

// Method 2: range-based for loop (cleaner)
for (char ch : s) {
    cout << ch << " ";    // U S A C O
}
cout << "\n";

// Method 3: range-based with reference (for modifying in-place)
for (char& ch : s) {
    ch = tolower(ch);     // convert each char to lowercase
}
cout << s << "\n";        // "usaco"

Useful string functions:

string s = "Hello, World!";

// Substring: s.substr(start, length)
string sub = s.substr(7, 5);     // "World" (starting at index 7, take 5 chars)
string sub2 = s.substr(7);       // "World!" (from index 7 to end)

// Find: s.find("text") — returns index or string::npos if not found
size_t pos = s.find("World");    // 7  (size_t, not int!)
if (s.find("xyz") == string::npos) {
    cout << "Not found!\n";
}

// Append
s.append(" Hi");                 // "Hello, World! Hi"
// or equivalently: s += " Hi";

// Insert
s.insert(5, "!!");               // "Hello!!, World! Hi"

// Erase: s.erase(start, count)
s.erase(5, 2);                   // removes 2 chars starting at index 5 → "Hello, World! Hi"

// Replace: s.replace(start, count, "new text")
string msg = "I love cats";
msg.replace(7, 4, "dogs");       // "I love dogs"

Reading strings from input:

// cin >> reads ONE WORD (stops at whitespace)
string word;
cin >> word;    // input "Hello World" → word = "Hello"

// getline reads the ENTIRE LINE (including spaces)
string line;
getline(cin, line);   // input "Hello World" → line = "Hello World"

// ⚠️ Remember: after cin >>, call cin.ignore() before getline!
int n;
cin >> n;
cin.ignore();          // consume the leftover '\n'
string fullLine;
getline(cin, fullLine); // now this reads correctly

Converting between string and numbers:

// String → Integer
string numStr = "42";
int num = stoi(numStr);         // stoi = "string to int" → 42
long long big = stoll("123456789012345"); // stoll = "string to long long"

// String → Double
double d = stod("3.14");       // stod = "string to double" → 3.14

// Integer → String
int x = 255;
string s = to_string(x);       // "255"
string s2 = to_string(3.14);   // "3.140000"

char Arrays (C-Style Strings) — Know They Exist

In C (and old C++ code), strings were stored as arrays of char ending with a special null character '\0'. You'll rarely need these in competitive programming (use string instead), but you should recognize them:

// C-style string (char array)
char greeting[] = "Hello";  // actually stores: H e l l o \0 (6 chars!)
// The '\0' (null terminator) marks the end of the string

// WARNING: you must ensure the array is big enough to hold the string + '\0'
char name[20];              // can hold up to 19 characters + '\0'

// Reading into a char array (rarely needed)
// cin >> name;             // works, but limited by array size
// scanf("%s", name);       // C-style, also works

// Converting between char array and string
string s = greeting;        // char array → string (automatic)
// string → char array: use s.c_str() to get a const char*

Why string is better than char[] for competitive programming:

| Feature | char[] (C-style) | string (C++) |
|---|---|---|
| Size | Must predefine max size | Grows automatically |
| Concatenation | strcat() — manual, error-prone | s1 + s2 — simple |
| Comparison | strcmp() — returns int | s1 == s2 — natural |
| Length | strlen() — O(N) each call | s.size() — O(1) |
| Safety | Buffer overflow risk | Safe, managed by C++ |

Pro Tip for USACO: Always use string unless a problem specifically requires char arrays. String operations are cleaner, safer, and easier to debug. The only common use of char arrays in competitive programming is when reading very large inputs with scanf/printf for speed — but with sync_with_stdio(false), string + cin/cout is fast enough for 99% of USACO problems.


Quick Reference: Character/String Cheat Sheet

| Task | Code | Example |
|---|---|---|
| Digit char → int | ch - '0' | '7' - '0' → 7 |
| Int → digit char | '0' + digit | '0' + 3 → '3' |
| Uppercase → lowercase | ch - 'A' + 'a' or tolower(ch) | 'C' → 'c' |
| Lowercase → uppercase | ch - 'a' + 'A' or toupper(ch) | 'f' → 'F' |
| Is digit? | ch >= '0' && ch <= '9' or isdigit(ch) | '5' → true |
| Is letter? | isalpha(ch) | 'A' → true |
| String length | s.size() or s.length() | "abc" → 3 |
| Substring | s.substr(start, len) | "Hello".substr(1,3) → "ell" |
| Find in string | s.find("text") | returns index or npos |
| String → int | stoi(s) | stoi("42") → 42 |
| Int → string | to_string(n) | to_string(42) → "42" |
| Traverse string | for (char ch : s) | iterate each character |

⚠️ Integer Overflow — The #1 Bug in Competitive Programming

What happens when a number gets too big for its type?

// Imagine int as a dial that goes from -2,147,483,648 to 2,147,483,647
// When you go past the maximum, it WRAPS AROUND to the minimum!

int x = 2147483647;  // maximum int value
cout << x << "\n";   // prints: 2147483647
x++;                 // add 1... what happens?
cout << x << "\n";   // prints: -2147483648  (OVERFLOW! Wrapped around!)

This is like an old car odometer that hits 999999 and rolls back to 000000. The number wraps around.

How to avoid overflow:

int a = 1000000000;    // 1 billion — fits in int
int b = 1000000000;    // 1 billion — fits in int
// int wrong = a * b;  // OVERFLOW! a*b = 10^18, doesn't fit in int

long long correct = (long long)a * b;  // Cast one to long long before multiplying
cout << correct << "\n";  // 1000000000000000000 ✓

// Rule of thumb: if N can be up to 10^9 and you multiply two such values, use long long

Pro Tip: When in doubt, use long long. It's slightly slower than int but prevents overflow bugs that are very hard to spot.


2.1.4 Input and Output with cin and cout

Printing Output with cout

int score = 95;
string name = "Alice";

cout << "Score: " << score << "\n";     // Score: 95
cout << name << " got " << score << "\n"; // Alice got 95

// "\n" vs endl
cout << "Line 1" << "\n";   // fast — just a newline character
cout << "Line 2" << endl;   // slow — flushes buffer AND adds newline

Pro Tip: Always use "\n" instead of endl. endl flushes the output buffer, which is much slower. In problems with lots of output, using endl can cause Time Limit Exceeded!

Reading Input with cin

int n;
cin >> n;    // reads one integer from input

string s;
cin >> s;    // reads one word (stops at whitespace — spaces, tabs, newlines)

double x;
cin >> x;    // reads a decimal number

cin >> automatically skips whitespace between values. This means spaces, tabs, and newlines are all treated the same way. So these two inputs are read identically:

Input style 1 (all on one line):   42 hello 3.14
Input style 2 (on separate lines):
42
hello
3.14

Both work with:

int a; string b; double c;
cin >> a >> b >> c;  // reads all three regardless of formatting

Reading Multiple Values — The Most Common USACO Pattern

USACO problems almost always start with: "Read N, then read N values." Here's how:

Typical USACO input:
5          ← first line: N (the number of items)
10 20 30 40 50   ← next line(s): the N items
int n;
cin >> n;              // read N

for (int i = 0; i < n; i++) {
    int x;
    cin >> x;          // read each item
    cout << x * 2 << "\n";  // process it
}

Complexity Analysis:

  • Time: O(N) — read N numbers and process each one in O(1)
  • Space: O(1) — only one variable x, no storage of all data

For the input 5\n10 20 30 40 50, this would print:

20
40
60
80
100

Reading a Full Line (Including Spaces)

Sometimes input has multiple words on a line. cin >> only reads one word at a time, so use getline:

string fullName;
getline(cin, fullName);  // reads the entire line, including spaces
cout << "Name: " << fullName << "\n";

🐛 Common Bug: Mixing cin >> and getline can cause problems. After cin >> n, there's a leftover \n in the buffer. If you then call getline, it will read that empty newline instead of the next line. Fix: call cin.ignore() after cin >> before using getline.

Controlling Decimal Output

double y = 3.14159;

cout << y << "\n";                            // 3.14159 (default)
cout << fixed << setprecision(2) << y << "\n"; // 3.14 (exactly 2 decimal places)
cout << fixed << setprecision(6) << y << "\n"; // 3.141590 (6 decimal places)

2.1.5 Basic Arithmetic

#include <bits/stdc++.h>
using namespace std;

int main() {
    int a = 17, b = 5;

    cout << a + b << "\n";   // 22  (addition)
    cout << a - b << "\n";   // 12  (subtraction)
    cout << a * b << "\n";   // 85  (multiplication)
    cout << a / b << "\n";   // 3   (INTEGER division — truncates toward zero!)
    cout << a % b << "\n";   // 2   (modulo — the REMAINDER after division)

    // Integer division example:
    // 17 ÷ 5 = 3 remainder 2
    // So: 17 / 5 = 3  and  17 % 5 = 2

    double x = 17.0, y = 5.0;
    cout << x / y << "\n";   // 3.4 (real division when operands are doubles)

    // Shorthand assignment operators:
    int n = 10;
    n += 5;    // same as: n = n + 5   → n is now 15
    n -= 3;    // same as: n = n - 3   → n is now 12
    n *= 2;    // same as: n = n * 2   → n is now 24
    n /= 4;    // same as: n = n / 4   → n is now 6
    n++;       // same as: n = n + 1   → n is now 7
    n--;       // same as: n = n - 1   → n is now 6

    cout << n << "\n";  // 6

    return 0;
}

🤔 Why does integer division truncate?

When both operands are integers, C++ does integer division — it discards the fractional part. 17 / 5 gives 3, not 3.4. This is intentional and very useful (e.g., to find which "group" something falls into).

// How many full hours in 200 minutes?
int minutes = 200;
int hours = minutes / 60;     // 200 / 60 = 3 (not 3.33...)
int remaining = minutes % 60; // 200 % 60 = 20
cout << hours << " hours and " << remaining << " minutes\n";  // 3 hours and 20 minutes
// To get decimal division, at least ONE operand must be a double:
int a = 7, b = 2;
cout << a / b << "\n";           // 3    (integer division)
cout << (double)a / b << "\n";   // 3.5  (cast a to double first)
cout << a / (double)b << "\n";   // 3.5  (cast b to double)
cout << 7.0 / 2 << "\n";        // 3.5  (literal 7.0 is a double)

2.1.6 Your First USACO-Style Program

Let's put everything together and write a complete program that reads input and produces output — just like a real USACO problem.

Problem: Read two integers N and M. Print their sum, difference, product, integer quotient, and remainder.

Thinking through it:

  1. We need two variables to store N and M
  2. We use cin to read them
  3. We use cout to print each result
  4. Since N and M could be large, should we use long long? Let's be safe.

💡 Beginner's Problem-Solving Flow:

When facing a problem, don't rush to write code. First think through the steps in plain language:

  1. Understand the problem: What is the input? What is the output? What are the constraints?
  2. Work through an example by hand: Use the sample input, manually compute the output, confirm you understand the problem
  3. Think about data ranges: How large can N and M be? Could there be overflow?
  4. Write pseudocode: Read → Compute → Output
  5. Translate to C++: Convert pseudocode to real code line by line

This problem: read two numbers → perform five operations → output five results. Very straightforward!

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    long long n, m;
    cin >> n >> m;  // read both numbers on one line

    cout << n + m << "\n";  // sum
    cout << n - m << "\n";  // difference
    cout << n * m << "\n";  // product
    cout << n / m << "\n";  // integer quotient
    cout << n % m << "\n";  // remainder

    return 0;
}

Complexity Analysis:

  • Time: O(1) — only a fixed number of arithmetic operations
  • Space: O(1) — only two variables

Sample Input:

17 5

Sample Output:

22
12
85
3
2

⚠️ Common Mistakes in Chapter 2.1

| # | Mistake | Example | Why It's Wrong | Fix |
|---|---|---|---|---|
| 1 | Integer overflow | int a = 1e9; int b = a * a; | a * a = 10^18 exceeds int max ~2.1×10^9; the result "wraps around" to a wrong value | Use long long |
| 2 | Using endl | cout << x << endl; | endl flushes the output buffer — 10x+ slower than "\n" for large output, may cause TLE | Use "\n" |
| 3 | Forgetting I/O speedup | Missing sync_with_stdio and cin.tie | By default cin/cout syncs with C's scanf/printf, which is very slow for large input | Always add the two speed lines |
| 4 | Integer division surprise | 7/2 expected to give 3.5 but gives 3 | When both operands are integers, C++ truncates the fractional part | Cast to double: (double)7/2 |
| 5 | Missing semicolon | cout << x | Every C++ statement must end with ;, otherwise compilation fails | cout << x; |
| 6 | Mixing cin >> and getline | cin >> n then getline(cin, s) | cin >> leaves a \n in the buffer, so getline reads an empty line | Add cin.ignore() in between |

Chapter Summary

📌 Key Takeaways

| Concept | Key Points | Why It Matters |
|---|---|---|
| #include <bits/stdc++.h> | Includes all standard libraries at once | Saves time in contests; no need to remember each header |
| using namespace std; | Omits the std:: prefix | Cleaner code; universal practice in competitive programming |
| int main() | The sole entry point of the program | Every C++ program must have exactly one main |
| cin >> x / cout << x | Read input / write output | The core I/O method for USACO |
| int vs long long | ~2×10^9 vs ~9.2×10^18 | Wrong type = overflow = wrong answer (the most common contest bug) |
| "\n" vs endl | "\n" is 10x faster | Determines AC vs TLE for large output |
| a / b and a % b | Integer division and remainder | Core tools for time conversion, grouping, etc. |
| I/O speed lines | sync_with_stdio(false) + cin.tie(NULL) | Essential in the contest template; forgetting may cause TLE |

❓ FAQ

Q1: Does bits/stdc++.h slow down compilation?

A: Yes, compilation time may increase by 1-2 seconds. But in contests, compilation time is not counted toward the time limit, so it doesn't affect results. Don't use it in production projects.

Q2: Which should I default to — int or long long?

A: Rule of thumb — when in doubt, use long long. It's slightly slower than int (nearly imperceptible on modern CPUs), but prevents overflow. Especially note: if two int values are multiplied, the result may need long long.

Q3: Why can't I use scanf/printf in USACO?

A: You actually can! But after adding sync_with_stdio(false), you cannot mix cin/cout with scanf/printf. Beginners are advised to stick with cin/cout — it's safer.

Q4: Can I omit return 0;?

A: In C++11 and later, if main() reaches the end without a return, the compiler automatically returns 0. So technically it can be omitted, but writing it is clearer.

Q5: My code runs correctly locally, but gets Wrong Answer (WA) on the USACO judge. What could be wrong?

A: The three most common reasons: ① Integer overflow (used int when long long was needed); ② Not handling all edge cases; ③ Wrong output format (extra or missing spaces/newlines).

🔗 Connections to Later Chapters

  • Chapter 2.2 (Control Flow) builds on this chapter by adding if/else conditionals and for/while loops, enabling you to handle "repeat N times" tasks
  • Chapter 2.3 (Functions & Arrays) introduces functions (organizing code into reusable blocks) and arrays (storing a collection of data) — core tools for solving USACO problems
  • Chapter 3.1 (STL Essentials) introduces STL tools like vector and sort, greatly simplifying the logic you write manually in this chapter
  • The integer overflow prevention techniques learned in this chapter will appear throughout the book, especially in Chapter 3.2 (Prefix Sums) and Chapters 6.1–6.3 (DP)

Practice Problems

Work through all problems in order — they get progressively harder. Each has a complete solution you can reveal after trying it yourself.


🌡️ Warm-Up Problems

These problems only require 1-3 lines of new code each. They're meant to help you practice typing C++ and running programs.


Warm-up 2.1.1 — Personal Greeting Write a program that prints exactly this (with your own name):

Hello, Alice!
My favorite number is 7.
I am learning C++.

(You can hardcode all values — no input needed.)

💡 Solution (click to reveal)

Approach: Just print three lines with cout. No input needed.

#include <bits/stdc++.h>
using namespace std;

int main() {
    cout << "Hello, Alice!\n";
    cout << "My favorite number is 7.\n";
    cout << "I am learning C++.\n";
    return 0;
}

Key points:

  • Each cout statement ends with "\n" — the \n character starts a new line
  • You can also chain multiple << operators on one cout line
  • No cin needed when there's no input

Warm-up 2.1.2 — Five Lines Print the numbers 1 through 5, each on its own line. Use exactly 5 separate cout statements (no loops yet — we cover loops in Chapter 2.2).

💡 Solution (click to reveal)

Approach: Five separate cout statements, one per number.

#include <bits/stdc++.h>
using namespace std;

int main() {
    cout << 1 << "\n";
    cout << 2 << "\n";
    cout << 3 << "\n";
    cout << 4 << "\n";
    cout << 5 << "\n";
    return 0;
}

Key points:

  • cout << 1 << "\n" prints the number 1 followed by a newline
  • We'll learn to do this with a loop in Chapter 2.2 — but this manual approach works fine for small counts

Warm-up 2.1.3 — Double It Read one integer from input. Print that integer multiplied by 2.

Sample Input: 7 Sample Output: 14

💡 Solution (click to reveal)

Approach: Read into a variable, multiply by 2, print.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    cout << n * 2 << "\n";
    return 0;
}

Key points:

  • cin >> n reads one integer and stores it in n
  • We can do arithmetic directly inside cout: n * 2 is computed first, then printed
  • Use long long n if n might be very large (up to 10^9), since n * 2 could overflow int

Warm-up 2.1.4 — Sum of Two Read two integers on the same line. Print their sum.

Sample Input: 15 27 Sample Output: 42

💡 Solution (click to reveal)

Approach: Read two integers, add them, print.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int a, b;
    cin >> a >> b;
    cout << a + b << "\n";
    return 0;
}

Key points:

  • cin >> a >> b reads two values in one statement — works whether they're on the same line or different lines
  • Declaring two variables on the same line: int a, b; is equivalent to int a; int b;

Warm-up 2.1.5 — Say Hi Read a single word (a first name, no spaces). Print Hi, [name]!

Sample Input: Bob Sample Output: Hi, Bob!

💡 Solution (click to reveal)

Approach: Read a string, then print it inside the greeting message.

#include <bits/stdc++.h>
using namespace std;

int main() {
    string name;
    cin >> name;
    cout << "Hi, " << name << "!\n";
    return 0;
}

Key points:

  • string name; declares a variable that holds text
  • cin >> name reads one word (stops at the first space)
  • Notice how cout can chain: literal string + variable + literal string

🏋️ Core Practice Problems

These problems require combining input, arithmetic, and output. Think through the math before coding.


Problem 2.1.6 — Age in Days Read a person's age in whole years. Print their approximate age in days (use 365 days per year, ignore leap years).

Sample Input: 15 Sample Output: 5475

💡 Solution (click to reveal)

Approach: Multiply years by 365. Since age × 365 fits in an int (max age ~150 → 150×365 = 54750, well within int range), int is fine here.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int years;
    cin >> years;
    cout << years * 365 << "\n";
    return 0;
}

Key points:

  • years * 365 is computed as integers — no overflow risk here
  • If you wanted to include hours, minutes, seconds, you'd use long long to be safe

Problem 2.1.7 — Seconds Converter Read a number of seconds S (1 ≤ S ≤ 10^9). Convert it to hours, minutes, and remaining seconds.

Sample Input: 3661 Sample Output:

1 hours
1 minutes
1 seconds
💡 Solution (click to reveal)

Approach: Use integer division and modulo. First divide by 3600 to get hours, then use the remainder (mod 3600), divide by 60 to get minutes, remaining is seconds.

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long s;
    cin >> s;

    long long hours = s / 3600;         // 3600 seconds per hour
    long long remaining = s % 3600;     // seconds left after removing full hours
    long long minutes = remaining / 60; // 60 seconds per minute
    long long seconds = remaining % 60; // seconds left after removing full minutes

    cout << hours << " hours\n";
    cout << minutes << " minutes\n";
    cout << seconds << " seconds\n";

    return 0;
}

Key points:

  • We use long long because S can be up to 10^9 (safe in int, but long long is a good habit)
  • The key insight: s % 3600 gives the seconds after removing full hours, then we can divide that by 60 to get minutes
  • Check: 3661 → 3661/3600=1 hour, 3661%3600=61, 61/60=1 minute, 61%60=1 second ✓

Problem 2.1.8 — Rectangle Read the length L and width W of a rectangle. Print its area and perimeter.

Sample Input: 6 4 Sample Output:

Area: 24
Perimeter: 20
💡 Solution (click to reveal)

Approach: Area = L × W, Perimeter = 2 × (L + W).

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long L, W;
    cin >> L >> W;

    cout << "Area: " << L * W << "\n";
    cout << "Perimeter: " << 2 * (L + W) << "\n";

    return 0;
}

Key points:

  • Order of operations: 2 * (L + W) — the parentheses ensure we add L+W first, then multiply by 2
  • Using long long in case L and W are large (if L,W up to 10^9, L*W could be up to 10^18)

Problem 2.1.9 — Temperature Converter Read a temperature in Celsius. Print the equivalent in Fahrenheit. Formula: F = C × 9/5 + 32

Sample Input: 100 Sample Output: 212.00

💡 Solution (click to reveal)

Approach: Apply the formula. Since we need a decimal output, use double. The tricky part is the integer division trap: 9/5 in integer math = 1, not 1.8!

#include <bits/stdc++.h>
using namespace std;

int main() {
    double celsius;
    cin >> celsius;

    double fahrenheit = celsius * 9.0 / 5.0 + 32.0;

    cout << fixed << setprecision(2) << fahrenheit << "\n";

    return 0;
}

Key points:

  • Use 9.0 / 5.0 (or 9.0/5) instead of 9/5 — the latter is integer division giving 1, not 1.8!
  • fixed << setprecision(2) forces exactly 2 decimal places in the output
  • Check: 100°C → 100 × 9.0/5.0 + 32 = 180 + 32 = 212 ✓

Problem 2.1.10 — Coin Counter Read four integers: the number of quarters (25¢), dimes (10¢), nickels (5¢), and pennies (1¢). Print the total value in cents.

Sample Input:

3 2 1 4

(3 quarters, 2 dimes, 1 nickel, 4 pennies)

Sample Output: 104

💡 Solution (click to reveal)

Approach: Multiply each coin count by its value, sum them all.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int quarters, dimes, nickels, pennies;
    cin >> quarters >> dimes >> nickels >> pennies;

    int total = quarters * 25 + dimes * 10 + nickels * 5 + pennies * 1;

    cout << total << "\n";

    return 0;
}

Key points:

  • Each coin type multiplied by its value in cents: quarters=25, dimes=10, nickels=5, pennies=1
  • Check: 3×25 + 2×10 + 1×5 + 4×1 = 75 + 20 + 5 + 4 = 104 ✓
  • If coin counts can be very large, switch to long long

🏆 Challenge Problems

These require more thought — especially around data types and problem-solving.


Challenge 2.1.11 — Overflow Detector Read two integers A and B (each up to 10^9). Compute their product TWO ways: as an int and as a long long. Print both results. Observe the difference when overflow occurs.

Sample Input: 1000000000 3 Sample Output:

int product: -1294967296
long long product: 3000000000

(The int result is wrong due to overflow; long long is correct.)

💡 Solution (click to reveal)

Approach: Read both numbers as long long, then compute the product both ways — once forcing integer math, once with long long. This demonstrates overflow visually.

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long a, b;
    cin >> a >> b;

    // Cast to int FIRST to force the multiplication to happen in int
    // (signed overflow is technically undefined behavior in C++; on typical
    // contest compilers it wraps around, which is what we observe here)
    int int_product = (int)a * (int)b;

    // Long long multiplication — no overflow for values up to 10^9
    long long ll_product = a * b;

    cout << "int product: " << int_product << "\n";
    cout << "long long product: " << ll_product << "\n";

    return 0;
}

Key points:

  • (int)a * (int)b — both operands are cast to int before multiplication, so the multiplication overflows
  • a * b where a,b are long long — multiplication is done in long long space, no overflow
  • The actual output for 10^9 × 3: correct is 3×10^9, but int wraps around because max int ≈ 2.147×10^9 < 3×10^9, so the result overflows to -1294967296
  • Lesson: Always use long long when multiplying values that could each be around 10^5 or larger (10^5 × 10^5 = 10^10, which already overflows int)

Challenge 2.1.12 — USACO-Style Large Multiply You're given two integers N and M (1 ≤ N, M ≤ 10^9). Print their product. (This seems simple, but requires long long.)

Sample Input: 1000000000 1000000000 Sample Output: 1000000000000000000

💡 Solution (click to reveal)

Approach: N and M fit individually in int, but N × M = 10^18 — which doesn't fit in int (max ~2.1×10^9) and barely fits in long long (max ~9.2×10^18). Must use long long.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    long long n, m;
    cin >> n >> m;

    cout << n * m << "\n";

    return 0;
}

Key points:

  • Reading into long long variables is the key — cin >> n can handle values up to 9.2×10^18
  • If you read into int variables: int n, m; cin >> n >> m; cout << n * m; — this overflows silently and gives the wrong answer
  • In USACO, always check the constraints: if N can be 10^9, and you might multiply N by N, you need long long

Challenge 2.1.13 — Quadrant Problem (USACO 2016 February Bronze) Read two non-zero integers x and y. Determine which quadrant of the coordinate plane the point (x, y) is in:

  • Quadrant 1: x > 0 and y > 0
  • Quadrant 2: x < 0 and y > 0
  • Quadrant 3: x < 0 and y < 0
  • Quadrant 4: x > 0 and y < 0

Print just the number: 1, 2, 3, or 4.

Sample Input 1: 3 5 → Output: 1
Sample Input 2: -1 2 → Output: 2
Sample Input 3: -4 -7 → Output: 3
Sample Input 4: 8 -3 → Output: 4

💡 Solution (click to reveal)

Approach: Check the signs of x and y. Each combination of positive/negative x and y maps to exactly one quadrant. We use if/else-if chains (covered fully in Chapter 2.2, but straightforward here).

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int x, y;
    cin >> x >> y;

    if (x > 0 && y > 0) {
        cout << 1 << "\n";
    } else if (x < 0 && y > 0) {
        cout << 2 << "\n";
    } else if (x < 0 && y < 0) {
        cout << 3 << "\n";
    } else {  // x > 0 && y < 0
        cout << 4 << "\n";
    }

    return 0;
}

Key points:

  • The && operator means "AND" — both conditions must be true
  • Since the problem guarantees x ≠ 0 and y ≠ 0, we don't need to handle those edge cases
  • The four cases are mutually exclusive (exactly one will be true for any input), so else-if chains work perfectly
  • We could simplify using a formula, but the explicit if/else is clearer and equally fast
📖 Chapter 2.2 ⏱️ ~60 min read 🎯 Beginner

Chapter 2.2: Control Flow

📝 Prerequisites: Chapter 2.1 (variables, cin/cout, basic arithmetic)


2.2.0 What is "Control Flow"?

So far, every program we wrote ran top to bottom — line 1, line 2, line 3, done. Like reading a book straight through.

But real programs need to make decisions and repeat things. That's what "control flow" means — controlling the flow (order) of execution.

Think of it like a "Choose Your Own Adventure" book:

  • Sometimes you're told "if you want to fight the dragon, turn to page 47; otherwise turn to page 52"
  • Sometimes you're told "repeat this section until you escape the dungeon"

C++ gives us exactly this with:

  • if/else — make decisions based on conditions
  • for/while loops — repeat a section of code

Here's a visual overview:

Control Flow Overview

In the loop diagram: the program keeps going back to Step 2 until the condition becomes false, then it exits to Step 3.


2.2.1 The if Statement

The if statement lets your program make a decision: "if this condition is true, do this thing."

Basic if

#include <bits/stdc++.h>
using namespace std;

int main() {
    int score;
    cin >> score;

    if (score >= 90) {
        cout << "Excellent!\n";
    }

    cout << "Done.\n";  // always runs regardless of score
    return 0;
}

If score is 95: prints Excellent! then Done.
If score is 80: prints only Done. (the if-block is skipped)

if / else

int score;
cin >> score;

if (score >= 60) {
    cout << "Pass\n";
} else {
    cout << "Fail\n";
}

The else block runs only when the if condition is false. Exactly one of the two blocks will run.

if / else if / else Chains

When you have multiple conditions to check:

int score;
cin >> score;

if (score >= 90) {
    cout << "A\n";
} else if (score >= 80) {
    cout << "B\n";
} else if (score >= 70) {
    cout << "C\n";
} else if (score >= 60) {
    cout << "D\n";
} else {
    cout << "F\n";
}

C++ checks these conditions in order, from top to bottom, and runs the first one that's true. Once it runs one block, it skips all the remaining else if/else blocks.

So if score = 85:

  1. Is 85 >= 90? No → skip
  2. Is 85 >= 80? Yes → print "B", then jump past all the else-ifs

🤔 Why does this work? When we reach else if (score >= 80), we already know score < 90 (because if it were ≥ 90, the first condition would have caught it). Each else if implicitly assumes all the previous conditions were false.

Comparison Operators

| Operator | Meaning | Example |
|---|---|---|
| == | Equal to | a == b |
| != | Not equal to | a != b |
| < | Less than | a < b |
| > | Greater than | a > b |
| <= | Less than or equal to | a <= b |
| >= | Greater than or equal to | a >= b |

Logical Operators (Combining Conditions)

| Operator | Meaning | Example |
|---|---|---|
| && | AND — both must be true | x > 0 && y > 0 |
| \|\| | OR — at least one must be true | x == 0 \|\| y == 0 |
| ! | NOT — flips true to false | !finished |
int x, y;
cin >> x >> y;

if (x > 0 && y > 0) {
    cout << "Both positive\n";
}

if (x < 0 || y < 0) {
    cout << "At least one is negative\n";
}

bool done = false;
if (!done) {
    cout << "Still working...\n";
}

🐛 Common Bug: = vs ==

This is one of the most common mistakes for beginners (and even experienced programmers!):

int x = 5;

// DANGEROUS BUG:
if (x = 10) {   // This ASSIGNS 10 to x, doesn't compare!
                 // x becomes 10, and since 10 is nonzero, this is always TRUE
    cout << "x is 10\n";  // This ALWAYS runs, even though x started as 5!
}

// CORRECT:
if (x == 10) {  // This COMPARES x with 10
    cout << "x is 10\n";  // Only runs when x actually equals 10
}

The = operator assigns (stores a value). The == operator compares (checks if two values are equal). They look similar but do completely different things.

Pro Tip: Some programmers write 10 == x instead of x == 10 — if you accidentally type = instead of ==, it becomes 10 = x which is a compile error (you can't assign to a literal). This is called a "Yoda condition."

Nested if Statements

You can put if statements inside other if statements:

int age, income;
cin >> age >> income;

if (age >= 18) {
    cout << "Adult\n";
    if (income > 50000) {
        cout << "High income adult\n";
    } else {
        cout << "Standard income adult\n";
    }
} else {
    cout << "Minor\n";
}

Be careful: each else matches the nearest preceding if that doesn't already have an else.


2.2.2 The while Loop

A while loop repeats a block of code as long as its condition is true. When the condition becomes false, execution continues after the loop.

while (condition) {
    body (runs over and over)
}
#include <bits/stdc++.h>
using namespace std;

int main() {
    int i = 1;             // 1. Initialize before the loop
    while (i <= 5) {       // 2. Check condition — if false, skip the loop
        cout << i << "\n"; // 3. Run the body
        i++;               // 4. Update — VERY IMPORTANT! Forget this → infinite loop
    }
    // After loop: i = 6, condition 6 <= 5 is false, loop exits
    return 0;
}

Output:

1
2
3
4
5

🐛 Common Bug: Infinite Loop

If you forget to update the variable (step 4 above), the condition never becomes false and the loop runs forever!

int i = 1;
while (i <= 5) {
    cout << i << "\n";
    // BUG: forgot i++ — this prints "1" forever!
}

If your program seems stuck, press Ctrl+C to stop it.

When to use while vs for

  • Use while when you don't know in advance how many iterations you need
  • Use for when you do know the count (we'll cover for next)

Classic while use case: read until a condition is met.

// Common USACO pattern: read until end of input
int x;
while (cin >> x) {    // cin >> x returns false when input runs out
    cout << x * 2 << "\n";
}

do-while Loop

A do-while loop always runs its body at least once, then checks the condition:

int n;
do {
    cin >> n;
} while (n <= 0);   // keep re-reading until user gives a positive number

This is useful when you want to execute something before checking whether to repeat. It's rare in competitive programming but worth knowing.


2.2.3 The for Loop

The for loop is the most used loop in competitive programming. It packages initialization, condition-check, and update into one clean line:

for (initialization; condition; update) {
    body
}

This is equivalent to:

initialization;
while (condition) {
    body
    update;
}

Visual: For Loop Flowchart

For Loop Flowchart

The flowchart above traces the execution: initialization runs once, then the condition is checked before every iteration. When false, the loop exits.

Common for Patterns

// Count from 0 to 9 (standard competitive programming pattern)
for (int i = 0; i < 10; i++) {
    cout << i << " ";
}
// Prints: 0 1 2 3 4 5 6 7 8 9

// Count from 1 to n (inclusive)
int n = 5;
for (int i = 1; i <= n; i++) {
    cout << i << " ";
}
// Prints: 1 2 3 4 5

// Count backwards
for (int i = 10; i >= 1; i--) {
    cout << i << " ";
}
// Prints: 10 9 8 7 6 5 4 3 2 1

// Count by steps of 2
for (int i = 0; i <= 10; i += 2) {
    cout << i << " ";
}
// Prints: 0 2 4 6 8 10

🧠 Loop Tracing: Understanding Exactly What Happens

When learning loops, trace through them manually. Here's how:

Code: for (int i = 0; i < 4; i++) cout << i * i << " ";

Loop Trace Example

Practice tracing loops on paper before running them — it builds intuition and helps spot bugs.

The Most Common USACO Loop Pattern

Read N numbers and process each one:

int n;
cin >> n;

for (int i = 0; i < n; i++) {
    int x;
    cin >> x;
    // process x here
    cout << x * 2 << "\n";
}

Pro Tip: In competitive programming, for (int i = 0; i < n; i++) with 0-based indexing is standard. It matches how arrays are indexed (Chapter 2.3), so everything lines up neatly.

2.2.4 Nested Loops

You can put a loop inside another loop. The inner loop runs completely for each single iteration of the outer loop.

Nested Loop Clock Analogy

// Print a 4x4 multiplication table
for (int i = 1; i <= 4; i++) {         // outer: rows
    for (int j = 1; j <= 4; j++) {     // inner: columns
        cout << i * j << "\t";          // \t = tab character
    }
    cout << "\n";  // newline after each row
}

Output:

1   2   3   4
2   4   6   8
3   6   9   12
4   8   12  16

Tracing the first two rows:

i=1: j=1→print 1, j=2→print 2, j=3→print 3, j=4→print 4, then newline
i=2: j=1→print 2, j=2→print 4, j=3→print 6, j=4→print 8, then newline
...

⚠️ Nested Loop Time Complexity

💡 Why should you care about loop counts? In competitions, your program typically needs to finish within 1-2 seconds. A modern computer can execute roughly 10^8 to 10^9 simple operations per second. So if you can estimate how many times your loop body executes in total, you can determine whether it will exceed the time limit (TLE). This is the core idea behind "time complexity analysis" — we'll study it in greater depth in later chapters.

A single loop of N iterations does N operations. Two nested loops of N do N × N = N² operations.

| Loops | Operations | Safe for N ≤ | Example |
|---|---|---|---|
| 1 | N | ~10^8 | Iterating through an array to compute a sum |
| 2 (nested) | N² | ~10^4 | Comparing all pairs |
| 3 (nested) | N³ | ~450 | Enumerating all triplets |

If N = 1000 and you have two nested loops, that's 10^6 operations — fine. But if N = 100,000, that's 10^10 — too slow!

🧠 Quick Rule of Thumb: After seeing the range of N, use the table above to work backwards and determine the maximum number of nested loops you can afford. For example, N ≤ 10^5 → you can only use O(N) or O(N log N) algorithms; N ≤ 5000 → O(N²) is acceptable. This technique is extremely useful in USACO!


2.2.5 Switch Statements

When you have a variable and want to check many specific values, switch is cleaner than a long chain of if/else if:

int day;
cin >> day;

switch (day) {
    case 1:
        cout << "Monday\n";
        break;   // IMPORTANT: break exits the switch
    case 2:
        cout << "Tuesday\n";
        break;
    case 3:
        cout << "Wednesday\n";
        break;
    case 4:
        cout << "Thursday\n";
        break;
    case 5:
        cout << "Friday\n";
        break;
    case 6:
    case 7:
        cout << "Weekend!\n";  // cases 6 and 7 share this code
        break;
    default:
        cout << "Invalid day\n";  // runs if no case matches
}

When to use switch vs if-else

| Use switch when... | Use if-else when... |
|---|---|
| Checking one variable against exact integer/char values | Comparing ranges (x > 10, x < 5) |
| 3+ specific values to check | Only 1-2 conditions |
| Cases are mutually exclusive | Complex boolean logic |

🐛 Common Bug: Forgetting break — Without break, execution "falls through" to the next case!

int x = 2;
switch (x) {
    case 1:
        cout << "one\n";
    case 2:
        cout << "two\n";   // this runs
    case 3:
        cout << "three\n"; // ALSO runs (fall-through!) because no break after case 2
}
// Output: two\nthree\n  (surprising!)

2.2.6 break and continue

break — Exit the Loop Immediately

// Find the first number divisible by 7 between 1 and 100
for (int i = 1; i <= 100; i++) {
    if (i % 7 == 0) {
        cout << "First multiple of 7: " << i << "\n";  // prints 7
        break;  // stop searching — we found it
    }
}

continue — Skip to the Next Iteration

// Print all numbers 1 to 10 except multiples of 3
for (int i = 1; i <= 10; i++) {
    if (i % 3 == 0) {
        continue;  // skip the rest of this iteration, go to i++
    }
    cout << i << " ";
}
// Output: 1 2 4 5 7 8 10

break in Nested Loops

break only exits the innermost loop. To exit multiple levels, use a flag variable:

bool found = false;
int target = 25;

for (int i = 0; i < 10 && !found; i++) {    // outer loop also checks !found
    for (int j = 0; j < 10; j++) {
        if (i * j == target) {
            cout << i << " * " << j << " = " << target << "\n";
            found = true;
            break;   // exits inner loop; outer loop exits too because of !found
        }
    }
}

2.2.7 Classic Loop Patterns in Competitive Programming

These patterns appear in nearly every USACO solution. Learn them cold.

Pattern 1: Read N Numbers, Compute Sum

int n;
cin >> n;

long long sum = 0;
for (int i = 0; i < n; i++) {
    int x;
    cin >> x;
    sum += x;
}
cout << sum << "\n";

Complexity Analysis:

  • Time: O(N) — iterate through N numbers, each processed in O(1)
  • Space: O(1) — only one accumulator variable sum

Pattern 2: Find Maximum (and Minimum) in a List

int n;
cin >> n;

int maxVal, minVal;
cin >> maxVal;    // read first element
minVal = maxVal;  // initialize both max and min to first element

for (int i = 1; i < n; i++) {   // start from 2nd element (index 1)
    int x;
    cin >> x;
    if (x > maxVal) maxVal = x;
    if (x < minVal) minVal = x;
}

cout << "Max: " << maxVal << "\n";
cout << "Min: " << minVal << "\n";

Complexity Analysis:

  • Time: O(N) — iterate through N numbers, each comparison in O(1)
  • Space: O(1) — only two variables maxVal and minVal

🤔 Why initialize to the first element? Don't initialize max to 0! What if all numbers are negative? Initializing to the first element guarantees we start with a real value from the input.

Pattern 3: Count How Many Satisfy a Condition

int n;
cin >> n;

int count = 0;
for (int i = 0; i < n; i++) {
    int x;
    cin >> x;
    if (x % 2 == 0) {   // condition: even number
        count++;
    }
}
cout << "Even count: " << count << "\n";

Pattern 4: Print a Star Triangle Pattern

int n;
cin >> n;

for (int row = 1; row <= n; row++) {     // row goes from 1 to n
    for (int col = 1; col <= row; col++) { // print `row` stars per row
        cout << "*";
    }
    cout << "\n";  // newline after each row
}

For n=4, output:

*
**
***
****

Pattern 5: Compute Sum of Digits

int n;
cin >> n;

int digitSum = 0;
while (n > 0) {
    digitSum += n % 10;  // last digit
    n /= 10;             // remove last digit
}
cout << digitSum << "\n";

Tracing for n = 12345:

n=12345: digitSum += 5, n becomes 1234
n=1234:  digitSum += 4, n becomes 123
n=123:   digitSum += 3, n becomes 12
n=12:    digitSum += 2, n becomes 1
n=1:     digitSum += 1, n becomes 0
n=0: loop exits. digitSum = 15 ✓

2.2.8 Complete Example: USACO-Style Problem

Problem: You have N cows. Each cow has a milk production rating. Find the highest-rated cow's rating and count how many cows produce above-average milk.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    // We need to store all values so each one can be compared to the average.
    // (vector is previewed here; Chapter 2.3 covers arrays/vectors fully.)

    // First pass: find sum and max
    long long sum = 0;
    int maxMilk = INT_MIN;   // smallest int, so the first rating always replaces it
    vector<int> milk(n);   // store all values (preview of Chapter 2.3)

    for (int i = 0; i < n; i++) {
        cin >> milk[i];
        sum += milk[i];
        if (milk[i] > maxMilk) maxMilk = milk[i];
    }

    double avg = (double)sum / n;

    // Second pass: count above-average
    int aboveAvg = 0;
    for (int i = 0; i < n; i++) {
        if (milk[i] > avg) aboveAvg++;
    }

    cout << "Maximum: " << maxMilk << "\n";
    cout << "Above average: " << aboveAvg << "\n";

    return 0;
}

Sample Input:

5
10 20 30 40 50

Sample Output:

Maximum: 50
Above average: 2

(Average is 30; cows with 40 and 50 are above average → 2 cows)

Complexity Analysis:

  • Time: O(N) — two passes (read + count), each O(N), total O(2N) = O(N)
  • Space: O(N) — uses vector<int> milk(n) to store all data

⚠️ Common Mistakes in Chapter 2.2

| # | Mistake | Example | Why It's Wrong | Fix |
|---|---|---|---|---|
| 1 | Confusing = with == | if (x = 10) | = is assignment, not comparison; the condition sees the assigned value 10, which is nonzero, so it is always true | Use == for comparison |
| 2 | Forgetting i++ causing infinite loop | while (i < n) { ... } without i++ | Condition is always true, program hangs | Ensure the loop variable is updated |
| 3 | Forgetting break in switch | case 2: cout << "two"; without break | Execution "falls through" to the next case | Add break; at the end of each case |
| 4 | Off-by-one error | for (int i = 0; i <= n; i++) should be < n | Loops one extra time, may go out of bounds or overcount | Carefully verify < vs <= |
| 5 | Initializing max to 0 | int maxVal = 0; when all numbers are negative | 0 is larger than all inputs, result is wrong | Initialize to the first element or INT_MIN |
| 6 | Reusing the same variable name in nested loops | Outer for (int i...) and inner for (int i...) | Inner i shadows outer i, causing unexpected outer loop behavior | Use different names for inner and outer loops (e.g., i and j) |

Chapter Summary

📌 Key Takeaways

| Concept | Syntax | When to Use | Why It Matters |
|---|---|---|---|
| if | if (cond) { ... } | Execute when a condition is true | Foundation of program decisions; used in almost every problem |
| if/else | if (...) {...} else {...} | Choose between two options | Handles yes/no type decisions |
| if/else if/else | chained | Choose among multiple options | Grading scales, classification scenarios |
| while | while (cond) {...} | Repeat when count is unknown | Reading until end of input, simulating processes |
| for | for (int i=0; i<n; i++) {...} | Repeat when count is known | Most commonly used loop in competitive programming |
| Nested loops | Loop inside loop | Need to iterate over all pairs | Watch out for O(N²) complexity limits |
| break | break; | Exit immediately after finding target | Early termination saves time |
| continue | continue; | Skip current iteration | Filter out elements that don't need processing |
| switch | switch(x) { case 1: ... } | Check one variable against multiple exact values | Cleaner code than long if-else chains |
| && / \|\| / ! | logical operators | Combine multiple conditions | Building blocks for complex decisions |

🧩 Five Classic Loop Patterns Quick Reference

| Pattern | Purpose | Complexity | Section |
|---|---|---|---|
| Read N + Sum | Read N numbers and compute their sum | O(N) | 2.2.7 Pattern 1 |
| Find Max/Min | Find the maximum/minimum value | O(N) | 2.2.7 Pattern 2 |
| Count Condition | Count how many elements satisfy a condition | O(N) | 2.2.7 Pattern 3 |
| Star Triangle | Print patterns using nested loops | O(N²) | 2.2.7 Pattern 4 |
| Digit Sum | Extract and sum individual digits | O(log₁₀N) | 2.2.7 Pattern 5 |

❓ FAQ (Frequently Asked Questions)

Q1: Can for and while replace each other? When should I use which?

A: Yes, any for loop can be rewritten as a while loop, and vice versa. Rule of thumb: if you know the number of iterations (e.g., "loop N times"), use for; if you don't know the count (e.g., "read until end of input"), use while. In competitions, for is used about 90% of the time.

Q2: How many levels deep can nested loops go? Is there a limit?

A: Syntactically there's no limit, but in practice you should be cautious beyond 3 levels. Two nested loops give O(N²), three give O(N³). When N ≥ 1000, three nested loops can easily time out. If you find yourself needing more than 3 levels of nesting, it usually means you need a more efficient algorithm (covered in later chapters).

Q3: break only exits the innermost loop. How do I break out of multiple nested loops at once?

A: Two common approaches: ① Use a bool found = false flag variable, and have the outer loop also check !found; ② Wrap the nested loops in a function and use return to exit directly. Approach ① is more common — see Section 2.2.6 for a complete example.

Q4: Which is faster, switch or if-else if?

A: For a small number of cases (< 10), performance is virtually identical. The advantage of switch is code readability, not speed. In competitions, you can freely choose either. If conditions involve range comparisons (like x > 10), you must use if-else.

Q5: My program produces correct output, but after submission it shows TLE (Time Limit Exceeded). What should I do?

A: Step one: estimate your algorithm's complexity. Look at the range of N → use the "nested loop complexity table" from this chapter to estimate total operations → if it exceeds 10^8, you need to optimize. Common optimization strategies include: reducing the number of loop levels, replacing brute-force search with sorting + binary search (Chapter 3.3), and replacing repeated summation with prefix sums (Chapter 3.2).

🔗 Connections to Later Chapters

  • Chapter 2.3 (Functions & Arrays) will let you encapsulate the loop patterns from this chapter into functions, and use arrays to store collections of data
  • Chapter 3.2 (Arrays & Prefix Sums) will teach you how to optimize O(N²) range sum queries to O(N) preprocessing + O(1) per query — one of the solutions for when "nested loops are too slow"
  • Chapter 3.3 (Sorting & Searching) will teach you binary search, optimizing the O(N) linear search from this chapter to O(log N)
  • The five classic loop patterns learned in this chapter (summation, finding max/min, counting, nested iteration, digit processing) are the foundational building blocks for all algorithms in this book
  • Nested loop complexity analysis is the first step toward understanding time complexity (a theme throughout the entire book)

Practice Problems


🌡️ Warm-Up Problems


Warm-up 2.2.1 — Count to Ten Print the numbers 1 through 10, each on its own line. Use a for loop.

💡 Solution (click to reveal)

Approach: A for loop from 1 to 10 (inclusive).

#include <bits/stdc++.h>
using namespace std;

int main() {
    for (int i = 1; i <= 10; i++) {
        cout << i << "\n";
    }
    return 0;
}

Key points:

  • i <= 10 (not i < 10) because we want to include 10
  • Alternatively: for (int i = 1; i < 11; i++) — same result

Warm-up 2.2.2 — Even Numbers Print all even numbers from 2 to 20, each on its own line.

💡 Solution (click to reveal)

Approach: Two options — loop by 2s, or loop every number and check if even.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Option 1: step by 2
    for (int i = 2; i <= 20; i += 2) {
        cout << i << "\n";
    }
    return 0;
}

Key points:

  • i += 2 increments by 2 each time instead of the usual 1
  • Alternative: for (int i = 1; i <= 20; i++) { if (i % 2 == 0) cout << i << "\n"; }

Warm-up 2.2.3 — Sign Check Read one integer. Print Positive if it's > 0, Negative if it's < 0, Zero if it's 0.

Sample Input: -5 → Output: Negative

💡 Solution (click to reveal)

Approach: Three-way if/else if/else to cover all cases.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;

    if (n > 0) {
        cout << "Positive\n";
    } else if (n < 0) {
        cout << "Negative\n";
    } else {
        cout << "Zero\n";
    }

    return 0;
}

Key points:

  • The else clause at the end catches exactly n == 0 (since the two conditions above cover n>0 and n<0)

Warm-up 2.2.4 — Multiplication Table of 3 Print the first 10 multiples of 3 (i.e., 3, 6, 9, ..., 30), each on its own line.

💡 Solution (click to reveal)

Approach: Loop from 1 to 10, print i*3 each time.

#include <bits/stdc++.h>
using namespace std;

int main() {
    for (int i = 1; i <= 10; i++) {
        cout << i * 3 << "\n";
    }
    return 0;
}

Key points:

  • Alternative: for (int i = 3; i <= 30; i += 3) — same result

Warm-up 2.2.5 — Sum of Five Read exactly 5 integers (on separate lines or the same line). Print their sum.

Sample Input: 3 7 2 8 5 → Output: 25

💡 Solution (click to reveal)

Approach: Read 5 times in a loop, accumulate sum.

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long sum = 0;
    for (int i = 0; i < 5; i++) {
        int x;
        cin >> x;
        sum += x;
    }
    cout << sum << "\n";
    return 0;
}

Key points:

  • sum should be long long in case the integers are large
  • We read exactly 5 times since the problem says "exactly 5 integers"

🏋️ Core Practice Problems


Problem 2.2.6 — FizzBuzz The classic programming challenge: print numbers from 1 to 100. But:

  • If the number is divisible by 3, print Fizz instead
  • If divisible by 5, print Buzz instead
  • If divisible by both 3 and 5, print FizzBuzz instead

First few lines of output:

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
💡 Solution (click to reveal)

Approach: Loop 1 to 100 and check divisibility for each number. Check the combined case (divisible by both 3 and 5) FIRST; otherwise a number like 15 would match the Fizz branch and never reach the FizzBuzz branch.

#include <bits/stdc++.h>
using namespace std;

int main() {
    for (int i = 1; i <= 100; i++) {
        if (i % 3 == 0 && i % 5 == 0) {
            cout << "FizzBuzz\n";
        } else if (i % 3 == 0) {
            cout << "Fizz\n";
        } else if (i % 5 == 0) {
            cout << "Buzz\n";
        } else {
            cout << i << "\n";
        }
    }
    return 0;
}

Key points:

  • Check i % 3 == 0 && i % 5 == 0 FIRST — if you check i % 3 == 0 first, then 15 would print "Fizz" and never reach the FizzBuzz case
  • A number divisible by both 3 and 5 is divisible by 15: i % 15 == 0 also works

Problem 2.2.7 — Minimum of N Read N (1 ≤ N ≤ 1000), then read N integers. Print the minimum value.

Sample Input:

5
8 3 7 1 9

Sample Output: 1

💡 Solution (click to reveal)

Approach: Initialize min to the first value read, then update whenever we see something smaller.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    int first;
    cin >> first;
    int minVal = first;  // initialize to first element

    for (int i = 1; i < n; i++) {   // read remaining n-1 elements
        int x;
        cin >> x;
        if (x < minVal) {
            minVal = x;
        }
    }

    cout << minVal << "\n";
    return 0;
}

Key points:

  • Initialize minVal to the first element read, never to 0 (a 0-initialized minimum would wrongly stay 0 whenever all inputs are positive), then handle the remaining elements in the loop
  • Alternatively, use INT_MAX as the initial value: int minVal = INT_MAX; — this is guaranteed to be larger than any int, so the first element will always update it

Problem 2.2.8 — Count Positives Read N (1 ≤ N ≤ 1000), then read N integers. Print how many of them are strictly positive (> 0).

Sample Input:

6
3 -1 0 5 -2 7

Sample Output: 3

💡 Solution (click to reveal)

Approach: Maintain a counter, increment when the condition (x > 0) is met.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    int count = 0;
    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;
        if (x > 0) {
            count++;
        }
    }

    cout << count << "\n";
    return 0;
}

Key points:

  • count starts at 0 and increments only when x > 0
  • 0 is NOT positive (not negative either — it's zero), so x > 0 correctly excludes it

Problem 2.2.9 — Star Triangle Read N. Print a right triangle of * characters with N rows, where row i has i stars.

Sample Input: 4

Sample Output:

*
**
***
****
💡 Solution (click to reveal)

Approach: Nested loops — outer loop over rows, inner loop prints the right number of stars.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;

    for (int row = 1; row <= n; row++) {
        for (int star = 1; star <= row; star++) {
            cout << "*";
        }
        cout << "\n";
    }

    return 0;
}

Key points:

  • Row 1 has 1 star, row 2 has 2 stars, ..., row N has N stars
  • The inner loop runs exactly row times for each value of row
  • Alternative using string: cout << string(row, '*') << "\n"; — creates a string of row copies of *

Problem 2.2.10 — Sum of Digits Read a positive integer N (1 ≤ N ≤ 10^9). Print the sum of its digits.

Sample Input: 12345 → Sample Output: 15
Sample Input: 9999 → Sample Output: 36

💡 Solution (click to reveal)

Approach: Use the modulo trick. N % 10 gives the last digit. N / 10 removes the last digit. Repeat until N becomes 0.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;

    int digitSum = 0;
    while (n > 0) {
        digitSum += n % 10;  // add last digit
        n /= 10;             // remove last digit
    }

    cout << digitSum << "\n";
    return 0;
}

Key points:

  • n % 10 extracts the ones digit (e.g., 12345 % 10 = 5)
  • n /= 10 is integer division, removing the last digit (e.g., 12345 / 10 = 1234)
  • The loop continues until n = 0 (all digits extracted)
  • Trace: 12345 → +5 → 1234 → +4 → 123 → +3 → 12 → +2 → 1 → +1 → 0. Sum = 15 ✓

🏆 Challenge Problems


Challenge 2.2.11 — Collatz Sequence The Collatz sequence starting from N works as follows:

  • If N is even: next = N / 2
  • If N is odd: next = N * 3 + 1
  • Stop when N = 1

Read N. Print the entire sequence (including N and 1). Also print how many steps it takes to reach 1.

Sample Input: 6
Sample Output:

6 3 10 5 16 8 4 2 1
Steps: 8
💡 Solution (click to reveal)

Approach: Use a while loop. Keep applying the rule until we reach 1. Count steps.

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long n;
    cin >> n;

    int steps = 0;
    cout << n;         // print starting number

    while (n != 1) {
        if (n % 2 == 0) {
            n = n / 2;
        } else {
            n = n * 3 + 1;
        }
        cout << " " << n;  // print each next number
        steps++;
    }
    cout << "\n";
    cout << "Steps: " << steps << "\n";

    return 0;
}

Key points:

  • Use long long — even starting from small numbers, the sequence can reach large intermediate values (e.g., N=27 reaches 9232!)
  • The Collatz conjecture says this always reaches 1, but it's not proven for all N
  • We print N before the loop (as the starting value), then print each new value after each step

Challenge 2.2.12 — Prime Check Read N (2 ≤ N ≤ 10^6). Print prime if N is prime, composite otherwise.

A number is prime if it has no divisors other than 1 and itself.

Sample Input: 17 → Output: prime
Sample Input: 100 → Output: composite

💡 Solution (click to reveal)

Approach: Trial division — check if any number from 2 to √N divides N. If none do, N is prime. We only need to check up to √N because if N = a×b and a > √N, then b < √N (so we would have found b already).

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;

    bool isPrime = true;

    if (n < 2) {
        isPrime = false;
    } else {
        // Check divisors from 2 to sqrt(n)
        for (int i = 2; (long long)i * i <= n; i++) {
            if (n % i == 0) {
                isPrime = false;
                break;  // found a divisor, no need to continue
            }
        }
    }

    cout << (isPrime ? "prime" : "composite") << "\n";
    return 0;
}

Key points:

  • We check i * i <= n instead of i <= sqrt(n) to avoid floating-point issues (and it's slightly faster)
  • The (long long)i * i cast guards against integer overflow in i * i; with n ≤ 10^6 here, i never exceeds 1000 and plain int would suffice, but the cast is a safe habit for problems with larger limits
  • break exits the loop early as soon as we find any divisor — no need to keep checking
  • Time complexity: O(√N), so this handles N up to 10^6 easily (√10^6 = 1000 iterations)

Challenge 2.2.13 — Highest Rated Cow Read N (1 ≤ N ≤ 1000), then read N pairs of (cow name, rating). Find and print the name of the cow with the highest rating.

Sample Input:

4
Bessie 95
Elsie 82
Moo 95
Daisy 88

Sample Output: Bessie
(If there's a tie, print the name of the first one that appeared.)

💡 Solution (click to reveal)

Approach: Track the best rating and name seen so far. Update whenever we see a strictly higher rating.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    string bestName;
    int bestRating = -1;  // initialize to -1 so any real rating beats it

    for (int i = 0; i < n; i++) {
        string name;
        int rating;
        cin >> name >> rating;

        if (rating > bestRating) {
            bestRating = rating;
            bestName = name;
        }
    }

    cout << bestName << "\n";
    return 0;
}

Key points:

  • Initialize bestRating = -1 (or use INT_MIN) so the first cow always becomes the new best
  • We use > (strictly greater), not >=, so in case of a tie, we keep the first one seen (the problem asks for first)
  • Mixing cin >> name >> rating reads a string and then an int from the same line — this works perfectly

📖 Chapter 2.3 ⏱️ ~65 min read 🎯 Beginner

Chapter 2.3: Functions & Arrays

📝 Prerequisites: Chapters 2.1 & 2.2 (variables, loops, if/else)

As your programs grow larger, you need ways to organize code (functions) and store collections of data (arrays and vectors). This chapter introduces both — two of the most important tools in competitive programming.


2.3.1 Functions — What and Why

🍕 The Recipe Analogy

A function is like a pizza recipe:

- Input (parameters):   ingredients — flour, cheese, tomatoes
- Process (body):       the cooking steps
- Output (return value): the finished pizza

Just like you can make many pizzas using one recipe,
you can call a function many times with different inputs.

pizza("thin crust", "pepperoni")  → one pizza
pizza("thick crust", "mushroom")  → another pizza

Without functions, if you need to compute "is this number prime?" in five different places, you'd copy-paste the same 10 lines of code five times. Then if you find a bug, you have to fix it in all five places!

When to Write a Function

Use a function when:

  1. You repeat the same logic 3+ times in your program
  2. A block of code does one clear, named thing (e.g., "check if prime", "compute distance")
  3. Your main is getting too long to read comfortably

Basic Function Syntax

returnType functionName(parameter1Type param1, parameter2Type param2, ...) {
    // function body
    return value;  // must match returnType; omit for void functions
}

Your First Functions

#include <bits/stdc++.h>
using namespace std;

// ---- FUNCTION DEFINITIONS (must come BEFORE they are used, or use prototypes) ----

// Takes one integer, returns its square
int square(int x) {
    return x * x;
}

// Takes two integers, returns the larger one
int maxOf(int a, int b) {
    if (a > b) return a;
    else return b;
}

// void function: does something but doesn't return a value
void printSeparator() {
    cout << "====================\n";
}

// ---- MAIN ----
int main() {
    cout << square(5) << "\n";       // calls square with x=5, prints 25
    cout << square(12) << "\n";      // calls square with x=12, prints 144

    cout << maxOf(7, 3) << "\n";     // prints 7
    cout << maxOf(-5, -2) << "\n";   // prints -2

    printSeparator();                // prints the divider line
    cout << "Done!\n";
    printSeparator();

    return 0;
}

🤔 Why do functions come before main?

C++ reads your file top-to-bottom. When it sees a call like square(5), it needs to already know what square means. If square is defined after main, the compiler will say "I've never heard of square!"

Solution 1: Define all functions above main (simplest approach).

Solution 2: Use a function prototype — a forward declaration telling the compiler "this function exists, I'll define it later":

#include <bits/stdc++.h>
using namespace std;

int square(int x);       // prototype — just the signature, no body
int maxOf(int a, int b); // prototype

int main() {
    cout << square(5) << "\n";   // OK! compiler knows square exists
    return 0;
}

// Full definitions can come after main
int square(int x) {
    return x * x;
}

int maxOf(int a, int b) {
    return (a > b) ? a : b;
}

2.3.2 Void Functions vs Return Functions

void functions: Do something, return nothing

// void functions perform an action
void printLine(int n) {
    for (int i = 0; i < n; i++) {
        cout << "-";
    }
    cout << "\n";
}

// Calling a void function — just call it, don't try to capture a value
printLine(10);    // prints: ----------
printLine(20);    // prints: --------------------

Return functions: Compute and give back a value

// Returns the absolute value of x
int absoluteValue(int x) {
    if (x < 0) return -x;
    return x;
}

// Calling a return function — capture the result in a variable or use it directly
int result = absoluteValue(-7);
cout << result << "\n";           // 7
cout << absoluteValue(-3) << "\n"; // 3 (used directly)

Multiple return statements

A function can have multiple return statements — execution stops at the first one reached:

string classify(int n) {
    if (n < 0) return "negative";   // exits here if n < 0
    if (n == 0) return "zero";      // exits here if n == 0
    return "positive";              // exits here otherwise
}

cout << classify(-5) << "\n";   // negative
cout << classify(0) << "\n";    // zero
cout << classify(3) << "\n";    // positive

2.3.3 Pass by Value vs Pass by Reference

When you pass a variable to a function, there are two ways it can happen. Understanding this is crucial.

Pass by Value (default): Function gets a COPY

void addOne_byValue(int x) {
    x++;  // modifies the LOCAL COPY — original is unchanged
    cout << "Inside function: " << x << "\n";  // 6
}

int main() {
    int n = 5;
    addOne_byValue(n);
    cout << "After function: " << n << "\n";   // still 5! original unchanged
    return 0;
}

Think of it like a photocopy: the function works on a photocopy of the paper. Changes to the photocopy don't affect the original.

Pass by Reference (&): Function works on the ORIGINAL

void addOne_byRef(int& x) {  // & means "reference to the original"
    x++;  // modifies the ORIGINAL variable directly
    cout << "Inside function: " << x << "\n";  // 6
}

int main() {
    int n = 5;
    addOne_byRef(n);
    cout << "After function: " << n << "\n";   // now 6! original was changed
    return 0;
}

When to use each

| Use pass by value when... | Use pass by reference when... |
|---|---|
| Function shouldn't modify original | Function needs to modify original |
| Small types (int, double, char) | Returning multiple values |
| You want safety (no side effects) | Large types (avoiding expensive copy) |

Multiple Return Values via References

A C++ function can only return one value. But you can "return" multiple values through reference parameters:

// Computes both quotient AND remainder simultaneously
void divmod(int a, int b, int& quotient, int& remainder) {
    quotient = a / b;
    remainder = a % b;
}

int main() {
    int q, r;
    divmod(17, 5, q, r);  // q and r are modified by the function
    cout << "17 / 5 = " << q << " remainder " << r << "\n";
    // prints: 17 / 5 = 3 remainder 2
    return 0;
}

2.3.4 Recursion

A recursive function is one that calls itself. It's perfect for problems that break down into smaller versions of the same problem.

Classic Example: Factorial

5! = 5 × 4 × 3 × 2 × 1 = 120
   = 5 × (4!)              ← same problem, smaller input!

💡 Three-Step Recursive Thinking:

  1. Find "self-similarity": Can the original problem be broken into smaller problems of the same type? 5! = 5 × 4!, and 4! and 5! are the same type ✓
  2. Identify the base case: What is the smallest case? 0! = 1, cannot be broken down further
  3. Write the inductive step: n! = n × (n-1)!, call yourself with smaller input

This thinking process will be used repeatedly in Graph Algorithms (Chapter 5.1) and Dynamic Programming (Chapters 6.1–6.3).

int factorial(int n) {
    if (n == 0) return 1;            // BASE CASE: stop recursing
    return n * factorial(n - 1);    // RECURSIVE CASE: reduce to smaller problem
}

Tracing factorial(4):

factorial(4)
= 4 * factorial(3)
= 4 * (3 * factorial(2))
= 4 * (3 * (2 * factorial(1)))
= 4 * (3 * (2 * (1 * factorial(0))))
= 4 * (3 * (2 * (1 * 1)))   ← base case!
= 4 * (3 * (2 * 1))
= 4 * (3 * 2)
= 4 * 6
= 24  ✓

Every recursive function needs:

  1. A base case — stops the recursion (prevents infinite recursion)
  2. A recursive case — calls itself with a smaller input

🐛 Common Bug: Forgetting the base case → infinite recursion → "Stack Overflow" crash!


2.3.5 Arrays — Fixed Collections

🏠 The Mailbox Analogy

An array is like a row of mailboxes on a street:
- All mailboxes are the same size (same type)
- Each has a number on the door (the index, starting from 0)
- You can go directly to any mailbox by its number

Array Index Visual

Visual: Array Memory Layout

Array Memory Layout

Arrays are stored as consecutive blocks of memory. Each element sits right next to the previous one, allowing O(1) random access.

Array Basics

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Declare an array of 5 integers (elements are uninitialized — garbage values!)
    int arr[5];

    // Assign values one by one
    arr[0] = 10;
    arr[1] = 20;
    arr[2] = 30;
    arr[3] = 40;
    arr[4] = 50;

    // Declare AND initialize at the same time
    int nums[5] = {1, 2, 3, 4, 5};

    // Initialize all elements to zero
    int zeros[100] = {};          // all 100 elements = 0
    int zeros2[100];
    fill(zeros2, zeros2 + 100, 0); // another way

    // Access and print
    cout << arr[2] << "\n";       // 30

    // Loop through the array
    for (int i = 0; i < 5; i++) {
        cout << nums[i] << " ";   // 1 2 3 4 5
    }
    cout << "\n";

    return 0;
}

🐛 The Off-By-One Error — The #1 Array Bug

Arrays are 0-indexed: if you declare int arr[5], valid indices are 0, 1, 2, 3, 4. There is NO arr[5]!

int arr[5] = {10, 20, 30, 40, 50};

// WRONG: loop goes from i=0 to i=5 inclusive — index 5 doesn't exist!
for (int i = 0; i <= 5; i++) {   // BUG: <= 5 should be < 5
    cout << arr[i];               // CRASH or garbage value when i=5
}

// CORRECT: loop from i=0 to i=4 (i < 5 ensures i never reaches 5)
for (int i = 0; i < 5; i++) {    // i goes: 0, 1, 2, 3, 4 ✓
    cout << arr[i];               // always valid
}

This is called an "off-by-one error" — going one element past the end. It's the single most common array bug in competitive programming.

🤔 Why start at 0? C++ inherited this from C, which was designed close to hardware. The index is actually an offset from the start of the array. The first element is at offset 0 (no offset from the beginning).

Global Arrays for Large Sizes

Local variables inside main live on the "stack," which has limited space (about 1-8 MB). For competitive programming with N up to 10^6, you need global arrays, which live in a different memory region with far more room:

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 1000001;  // max size + 1 (common convention)
int arr[MAXN];              // declared globally — safe for large sizes
// Global arrays are automatically initialized to 0!

int main() {
    int n;
    cin >> n;
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }
    return 0;
}

Pro Tip: Global arrays are initialized to 0 automatically. Local arrays are NOT — they contain garbage values until you assign them!


2.3.6 Common Array Algorithms

Find Sum, Max, Min

int n;
cin >> n;

vector<int> arr(n);    // we'll learn vectors soon; this works like an array
for (int i = 0; i < n; i++) cin >> arr[i];

// Sum
long long sum = 0;
for (int i = 0; i < n; i++) sum += arr[i];
cout << "Sum: " << sum << "\n";

// Max (initialize to first element!)
int maxVal = arr[0];
for (int i = 1; i < n; i++) {
    if (arr[i] > maxVal) maxVal = arr[i];
}
cout << "Max: " << maxVal << "\n";

// Min (same idea)
int minVal = arr[0];
for (int i = 1; i < n; i++) {
    minVal = min(minVal, arr[i]);  // min() is a built-in function
}
cout << "Min: " << minVal << "\n";

Complexity Analysis:

  • Time: O(N) — each algorithm only needs one pass through the array
  • Space: O(1) — only a few extra variables (not counting the input array itself)

Reverse an Array

int arr[] = {1, 2, 3, 4, 5};
int n = 5;

// Swap elements from both ends, moving toward the middle
for (int i = 0, j = n - 1; i < j; i++, j--) {
    swap(arr[i], arr[j]);  // swap() is a built-in function
}
// arr is now {5, 4, 3, 2, 1}

Complexity Analysis:

  • Time: O(N) — each pair of elements is swapped once, N/2 swaps total
  • Space: O(1) — in-place swap, no extra array needed

Two-Dimensional Arrays

A 2D array is like a table or grid. Perfect for maps, grids, matrices:

int grid[3][4];  // 3 rows, 4 columns

// Fill with i * 10 + j
for (int r = 0; r < 3; r++) {
    for (int c = 0; c < 4; c++) {
        grid[r][c] = r * 10 + c;
    }
}

// Print
for (int r = 0; r < 3; r++) {
    for (int c = 0; c < 4; c++) {
        cout << grid[r][c] << "\t";
    }
    cout << "\n";
}

Output:

0   1   2   3
10  11  12  13
20  21  22  23

2.3.7 Vectors — Dynamic Arrays

Arrays have a major limitation: their size must be known at compile time (or must be declared large enough in advance). Vectors solve this — they can grow and shrink as needed while your program is running.

Array vs Vector Comparison

| Feature | Array | Vector |
|---|---|---|
| Size | Fixed at compile time | Can grow/shrink at runtime |
| Read N elements | Must hardcode or use MAXN | push_back(x) works naturally |
| Memory location | Stack (fast, limited) | Heap (slightly slower, much larger) |
| Syntax | int arr[5] | vector<int> v(5) |
| Preferred in competitive programming | For fixed-size, simple cases | For most problems |

Vector Basics

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Create an empty vector
    vector<int> v;

    // Add elements to the back with push_back
    v.push_back(10);    // v = [10]
    v.push_back(20);    // v = [10, 20]
    v.push_back(30);    // v = [10, 20, 30]

    // Access by index (same as arrays, 0-indexed)
    cout << v[0] << "\n";     // 10
    cout << v[1] << "\n";     // 20

    // Useful functions
    cout << v.size() << "\n"; // 3 (number of elements)
    cout << v.front() << "\n"; // 10 (first element)
    cout << v.back() << "\n";  // 30 (last element)
    cout << v.empty() << "\n"; // 0 (false — not empty)

    // Remove last element
    v.pop_back();   // v = [10, 20]

    // Clear all elements
    v.clear();      // v = []
    cout << v.empty() << "\n"; // 1 (true — now empty)

    return 0;
}

Creating Vectors With Initial Values

vector<int> zeros(10, 0);       // ten 0s: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
vector<int> ones(5, 1);         // five 1s: [1, 1, 1, 1, 1]
vector<int> primes = {2, 3, 5, 7, 11};  // initialized from list
vector<int> empty;              // empty vector

Iterating Over a Vector

vector<int> v = {10, 20, 30, 40, 50};

// Method 1: index-based (like arrays)
for (int i = 0; i < (int)v.size(); i++) {
    cout << v[i] << " ";
}
cout << "\n";

// Method 2: range-based for loop (cleaner, preferred)
for (int x : v) {
    cout << x << " ";
}
cout << "\n";

// Method 3: range-based with reference (use when modifying)
for (int& x : v) {
    x *= 2;  // doubles each element in-place
}

🤔 Why (int)v.size() in the index-based loop? v.size() returns an unsigned integer. If you compare int i with an unsigned value, C++ can behave unexpectedly (especially if i goes negative). Casting to (int) is the safe habit.

The Standard USACO Pattern with Vectors

int n;
cin >> n;

vector<int> arr(n);         // create vector of size n
for (int i = 0; i < n; i++) {
    cin >> arr[i];          // read into each position
}

// Now process arr...
sort(arr.begin(), arr.end());  // sort ascending

2D Vectors

int rows = 3, cols = 4;
vector<vector<int>> grid(rows, vector<int>(cols, 0));  // 3×4 grid of 0s

// Access: grid[r][c]
grid[1][2] = 42;
cout << grid[1][2] << "\n";  // 42

2.3.8 Passing Arrays and Vectors to Functions

Arrays

When you pass an array to a function, the function receives a pointer to the first element. Changes inside the function affect the original:

void fillSquares(int arr[], int n) {  // arr[] syntax for array parameter
    for (int i = 0; i < n; i++) {
        arr[i] = i * i;   // modifies the original!
    }
}

int main() {
    int arr[5] = {0};
    fillSquares(arr, 5);
    // arr is now {0, 1, 4, 9, 16}
    for (int i = 0; i < 5; i++) cout << arr[i] << " ";
    cout << "\n";
    return 0;
}

Vectors

Vectors by default are copied when passed to functions (expensive for large vectors!). Use & to pass by reference:

// Pass by value — makes a copy (SLOW for large vectors)
void printVec(vector<int> v) {
    for (int x : v) cout << x << " ";
}

// Pass by reference — no copy, CAN modify original (use for output params)
void sortVec(vector<int>& v) {
    sort(v.begin(), v.end());
}

// Pass by const reference — no copy, CANNOT modify (best for read-only)
void printVecFast(const vector<int>& v) {
    for (int x : v) cout << x << " ";
}

Pro Tip: For any vector parameter that you're only reading (not modifying), always write const vector<int>&. It avoids the copy and also signals to readers that the function won't change the vector.


⚠️ Common Mistakes in Chapter 2.3

| # | Mistake | Example | Why It's Wrong | Fix |
|---|---|---|---|---|
| 1 | Off-by-one array out-of-bounds | arr[n] when array size is n | Valid indices are 0 to n-1; arr[n] is out of bounds | Use i < n instead of i <= n |
| 2 | Forgot recursive base case | int f(int n) { return n*f(n-1); } | Never stops, causes stack overflow crash | Add if (n == 0) return 1; |
| 3 | Recursive function receives invalid (e.g. negative) argument | factorial(-1) | Base case only handles n == 0; negative values cause infinite recursion → stack overflow | Validate input before calling, or add a guard at the function entry: if (n < 0) return -1; |
| 4 | Vector passed by value causes performance issue | void f(vector<int> v) | Copies entire vector, very slow when N is large | Use const vector<int>& v |
| 5 | Local array uninitialized | int arr[100]; sum += arr[50]; | Local arrays are not auto-zeroed; they contain garbage values | Use = {} to initialize, or use global arrays |
| 6 | Array too large inside main | int main() { int arr[1000000]; } | Exceeds the stack memory limit (usually 1-8 MB), program crashes | Declare large arrays outside main (global) |
| 7 | Function defined after call | main calls square(5) but square is defined below main | The compiler does not recognize functions it hasn't seen yet | Define the function before main, or use a function prototype |

Chapter Summary

📌 Key Takeaways

| Concept | Key Points | Why It Matters |
|---|---|---|
| Functions | Define once, call anywhere | Reduce duplicate code, improve readability |
| Return types | int, double, bool, void | Use different return types for different scenarios |
| Pass by value | Function gets a copy, original unchanged | Safe, no side effects |
| Pass by reference (&) | Function operates on original variable | Can modify original, avoids copying large objects |
| Recursion | Function calls itself, must have base case | Foundation of divide & conquer, backtracking, DP |
| Arrays | Fixed size, 0-indexed, O(1) random access | Most fundamental data structure in competitive programming |
| Global arrays | Avoid stack overflow, auto-initialized to 0 | Must use global arrays when N exceeds 10^5 |
| vector<int> | Dynamic array, variable size | Preferred data container in competitive programming |
| push_back / pop_back | Add/remove at end | O(1) operation, primary way to build dynamic collections |
| Prefix Sum | Preprocess O(N), query O(1) | Core technique for range sum queries, covered in depth in Chapter 3.2 |

❓ FAQ

Q1: Which is better, arrays or vectors?

A: Both are common in competitive programming. Rule of thumb: if the size is fixed and known, global arrays are simplest; if the size changes dynamically or needs to be passed to functions, use vector. Many contestants default to vector because it is more flexible and less error-prone.

Q2: Is there a limit to recursion depth? Can it crash?

A: Yes. Each function call allocates space on the stack, and the default stack size is about 1-8 MB. In practice, about 10^4 ~ 10^5 levels of recursion are supported. If exceeded, the program crashes with a "stack overflow". In contests, if recursion depth may exceed 10^4, consider switching to an iterative (loop) approach.

Q3: When should I use pass by reference (&)?

A: Two cases: ① You need to modify the original variable inside the function; ② The parameter is a large object (like vector or string) and you want to avoid copy overhead. For small types like int and double, copy overhead is negligible, so pass by value is fine.

Q4: Can a function return an array or vector?

A: Arrays cannot be returned directly, but vector can! vector<int> solve() { ... return result; } is perfectly valid. Modern C++ compilers optimize the return process (called RVO), so the entire vector is not actually copied.

Q5: Why does the prefix sum array have one extra element (size n+1 instead of n)?

A: prefix[0] = 0 is a "sentinel value" that makes the formula prefix[R+1] - prefix[L] work in all cases. Without this sentinel, querying [0, R] would require special handling when L=0. This is a very common programming trick: use an extra sentinel value to simplify boundary handling.

🔗 Connections to Later Chapters

  • Chapter 3.1 (STL Essentials) will introduce tools like sort, binary_search, and pair, letting you accomplish in one line what this chapter implements by hand
  • Chapter 3.2 (Prefix Sums) will dive deeper into the prefix sum technique introduced in Problem 3.10, including 2D prefix sums and difference arrays
  • Chapter 5.1 (Introduction to Graphs) will build on the recursion foundation in Section 2.3.4 to teach graph traversals like DFS and BFS
  • Chapters 6.1–6.3 (Dynamic Programming): the core idea of "breaking large problems into smaller ones" is closely related to recursion; this chapter's recursive thinking is important groundwork
  • The function encapsulation and array/vector operations learned in this chapter will be used continuously in every subsequent chapter

Practice Problems


🌡️ Warm-Up Problems


Warm-up 2.3.1 — Square Function Write a function int square(int x) that returns x². In main, read one integer and print its square.

Sample Input: 7
Sample Output: 49

💡 Solution (click to reveal)

Approach: Write the function above main, call it with the input.

#include <bits/stdc++.h>
using namespace std;

int square(int x) {
    return x * x;
}

int main() {
    int n;
    cin >> n;
    cout << square(n) << "\n";
    return 0;
}

Key points:

  • Function defined above main so the compiler knows about it
  • return x * x; — C++ evaluates x * x and returns the result
  • Use long long if x can be large (e.g., x up to 10^9, then x² up to 10^18)

Warm-up 2.3.2 — Max of Two Write a function int myMax(int a, int b) that returns the larger of two integers. In main, read two integers and print the larger.

Sample Input: 13 7
Sample Output: 13

💡 Solution (click to reveal)

Approach: Compare a and b, return whichever is larger.

#include <bits/stdc++.h>
using namespace std;

int myMax(int a, int b) {
    if (a > b) return a;
    return b;
}

int main() {
    int a, b;
    cin >> a >> b;
    cout << myMax(a, b) << "\n";
    return 0;
}

Key points:

  • C++ has a built-in max(a, b) function — but writing your own teaches the concept
  • Alternative using ternary operator: return (a > b) ? a : b;

Warm-up 2.3.3 — Reverse Array Declare an array of exactly 5 integers: {1, 2, 3, 4, 5}. Print them in reverse order (no input needed).

Expected Output:

5 4 3 2 1
💡 Solution (click to reveal)

Approach: Loop from index 4 down to 0 (backwards).

#include <bits/stdc++.h>
using namespace std;

int main() {
    int arr[5] = {1, 2, 3, 4, 5};

    for (int i = 4; i >= 0; i--) {
        cout << arr[i];
        if (i > 0) cout << " ";
    }
    cout << "\n";

    return 0;
}

Key points:

  • Loop from index n-1 = 4 down to 0 (inclusive), using i--
  • The if (i > 0) cout << " " avoids a trailing space — but for USACO, a trailing space is usually acceptable

Warm-up 2.3.4 — Vector Sum Create a vector, push the values 10, 20, 30, 40, 50 into it using push_back, then print their sum.

Expected Output: 150

💡 Solution (click to reveal)

Approach: Create empty vector, push 5 values, loop to sum.

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> v;
    v.push_back(10);
    v.push_back(20);
    v.push_back(30);
    v.push_back(40);
    v.push_back(50);

    long long sum = 0;
    for (int x : v) {
        sum += x;
    }

    cout << sum << "\n";
    return 0;
}

Key points:

  • Range-for for (int x : v) iterates over every element
  • accumulate(v.begin(), v.end(), 0LL) is a one-liner alternative

Warm-up 2.3.5 — Hello N Times Write a void function sayHello(int n) that prints "Hello!" exactly n times. Call it from main after reading n.

Sample Input: 3 Sample Output:

Hello!
Hello!
Hello!
💡 Solution (click to reveal)

Approach: A void function with a for loop inside.

#include <bits/stdc++.h>
using namespace std;

void sayHello(int n) {
    for (int i = 0; i < n; i++) {
        cout << "Hello!\n";
    }
}

int main() {
    int n;
    cin >> n;
    sayHello(n);
    return 0;
}

Key points:

  • void means the function returns nothing — no return value; needed (can use bare return; to exit early)
  • The n in sayHello's parameter is a separate copy from the n in main (pass by value)

🏋️ Core Practice Problems


Problem 2.3.6 — Array Reverse Read N (1 ≤ N ≤ 100), then read N integers. Print them in reverse order.

Sample Input:

5
1 2 3 4 5

Sample Output: 5 4 3 2 1

💡 Solution (click to reveal)

Approach: Store in a vector, then print from the last index to the first.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    vector<int> arr(n);
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }

    for (int i = n - 1; i >= 0; i--) {
        cout << arr[i];
        if (i > 0) cout << " ";
    }
    cout << "\n";

    return 0;
}

Key points:

  • vector<int> arr(n) creates a vector of size n (all zeros initially)
  • We read into arr[i] just like an array
  • Print from n-1 down to 0 inclusive

Problem 2.3.7 — Running Average Read N (1 ≤ N ≤ 100), then read N integers one at a time. After reading each integer, print the average of all integers read so far (as a decimal with 2 decimal places).

Sample Input:

4
10 20 30 40

Sample Output:

10.00
15.00
20.00
25.00
💡 Solution (click to reveal)

Approach: Keep a running sum. After each new input, divide by how many we've read so far.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    long long sum = 0;
    for (int i = 1; i <= n; i++) {
        int x;
        cin >> x;
        sum += x;
        double avg = (double)sum / i;
        cout << fixed << setprecision(2) << avg << "\n";
    }

    return 0;
}

Key points:

  • sum is updated with each new element; i is the count of elements read so far
  • (double)sum / i — cast to double before dividing so we get decimal result
  • fixed << setprecision(2) forces exactly 2 decimal places

Problem 2.3.8 — Frequency Count Read N (1 ≤ N ≤ 100) integers. Each integer is between 1 and 10 inclusive. Print how many times each value from 1 to 10 appears.

Sample Input:

7
3 1 2 3 3 1 7

Sample Output:

1 appears 2 times
2 appears 1 times
3 appears 3 times
4 appears 0 times
5 appears 0 times
6 appears 0 times
7 appears 1 times
8 appears 0 times
9 appears 0 times
10 appears 0 times
💡 Solution (click to reveal)

Approach: Use an array (or vector) as a "tally counter" — index 1 through 10 holds the count for that value.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    int freq[11] = {};  // indices 0-10; we'll use 1-10. Initialize all to 0.

    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;
        freq[x]++;    // increment the count for value x
    }

    for (int v = 1; v <= 10; v++) {
        cout << v << " appears " << freq[v] << " times\n";
    }

    return 0;
}

Key points:

  • freq[x]++ is a very common pattern — use the VALUE as the INDEX in a frequency array
  • We declare freq[11] with indices 0-10 so that freq[10] is valid (index 10 for value 10)
  • int freq[11] = {} — the = {} zero-initializes all elements

Problem 2.3.9 — Two Sum Read N (1 ≤ N ≤ 100) integers and a target value T. Print YES if any two different elements in the array sum to T, NO otherwise.

Sample Input:

5 9
1 4 5 6 3

(N=5, T=9, then the array) Sample Output: YES (because 4+5=9 or 3+6=9)

💡 Solution (click to reveal)

Approach: Check all pairs (i, j) where i < j. If any pair sums to T, print YES.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, t;
    cin >> n >> t;

    vector<int> arr(n);
    for (int i = 0; i < n; i++) cin >> arr[i];

    bool found = false;
    for (int i = 0; i < n && !found; i++) {
        for (int j = i + 1; j < n; j++) {
            if (arr[i] + arr[j] == t) {
                found = true;
                break;
            }
        }
    }

    cout << (found ? "YES" : "NO") << "\n";

    return 0;
}

Key points:

  • Inner loop starts at j = i + 1 to avoid using the same element twice and checking duplicate pairs
  • break + the && !found condition in the outer loop ensures we stop as soon as we find a match
  • This is O(N²) — fine for N ≤ 100. For N up to 10^5, you'd use a set (Chapter 3.1)

Problem 2.3.10 — Prefix Sums Read N (1 ≤ N ≤ 1000), then N integers. Then read Q queries (1 ≤ Q ≤ 1000), each with two integers L and R (0-indexed, inclusive). For each query, print the sum of elements from index L to R.

Sample Input:

5
1 2 3 4 5
3
0 2
1 3
2 4

Sample Output:

6
9
12
💡 Solution (click to reveal)

Why not sum directly for each query? Brute force: each query loops from L to R, time complexity O(N), all queries total O(N×Q). When N=10^5 and Q=10^5, that is 10^10 operations, far exceeding the time limit.

Optimization idea: Preprocess the array once in O(N), then each query takes only O(1). Total time O(N+Q), much faster! This is the core idea of prefix sums (covered in depth in Chapter 3.2).

Approach: Build a prefix sum array where prefix[i] = sum of arr[0..i-1]. Then sum from L to R = prefix[R+1] - prefix[L]. This gives O(1) per query instead of O(N).

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    vector<long long> arr(n), prefix(n + 1, 0);

    for (int i = 0; i < n; i++) {
        cin >> arr[i];
        prefix[i + 1] = prefix[i] + arr[i];  // build prefix sum
    }
    // prefix[0] = 0
    // prefix[1] = arr[0]
    // prefix[2] = arr[0] + arr[1]
    // prefix[i] = arr[0] + arr[1] + ... + arr[i-1]

    int q;
    cin >> q;
    while (q--) {
        int l, r;
        cin >> l >> r;
        // sum from l to r (inclusive) = prefix[r+1] - prefix[l]
        cout << prefix[r + 1] - prefix[l] << "\n";
    }

    return 0;
}

Key points:

  • prefix[i] = sum of the first i elements (prefix[0] = 0 is a sentinel)
  • Sum of arr[L..R] = prefix[R+1] - prefix[L] — subtracting the part before L
  • Check with sample: arr=[1,2,3,4,5], prefix=[0,1,3,6,10,15]. Query [0,2]: prefix[3]-prefix[0]=6-0=6 ✓

Complexity Analysis:

  • Time: O(N + Q) — preprocess O(N) + each query O(1) × Q queries
  • Space: O(N) — prefix sum array uses N+1 space

💡 Brute force vs optimized: Brute force O(N×Q) vs prefix sum O(N+Q). When N=Q=10^5, the former takes 10^10 operations (TLE), the latter only 2×10^5 operations (instant).


🏆 Challenge Problems


Challenge 2.3.11 — Rotate Array Read N (1 ≤ N ≤ 1000) and K (0 ≤ K < N). Read N integers. Print the array rotated right by K positions (the last K elements wrap to the front).

Sample Input:

5 2
1 2 3 4 5

Sample Output: 4 5 1 2 3

💡 Solution (click to reveal)

Approach: The new array has element at original position (i - K + N) % N at position i. Equivalently, print elements starting from index N-K, wrapping around.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, k;
    cin >> n >> k;

    vector<int> arr(n);
    for (int i = 0; i < n; i++) cin >> arr[i];

    // Print n elements starting from index (n - k) % n, wrapping around
    for (int i = 0; i < n; i++) {
        int idx = (n - k + i) % n;
        cout << arr[idx];
        if (i < n - 1) cout << " ";
    }
    cout << "\n";

    return 0;
}

Key points:

  • Right rotate by K: last K elements come first, then first N-K elements
  • (n - k + i) % n maps new position i to old position — the % n handles the wraparound
  • Check: n=5, k=2. i=0: idx=(5-2+0)%5=3 → arr[3]=4. i=1: idx=4 → arr[4]=5. i=2: idx=0 → arr[0]=1. Correct!

Challenge 2.3.12 — Merge Sorted Arrays Read N₁, then N₁ sorted integers. Read N₂, then N₂ sorted integers. Print the merged sorted array.

Sample Input:

3
1 3 5
4
2 4 6 8

Sample Output: 1 2 3 4 5 6 8

💡 Solution (click to reveal)

Approach: Use two pointers — one for each array. At each step, take the smaller of the two current elements.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n1;
    cin >> n1;
    vector<int> a(n1);
    for (int i = 0; i < n1; i++) cin >> a[i];

    int n2;
    cin >> n2;
    vector<int> b(n2);
    for (int i = 0; i < n2; i++) cin >> b[i];

    // Two-pointer merge
    int i = 0, j = 0;
    vector<int> result;

    while (i < n1 && j < n2) {
        if (a[i] <= b[j]) {
            result.push_back(a[i++]);  // take from a, advance i
        } else {
            result.push_back(b[j++]);  // take from b, advance j
        }
    }
    // One array may have leftover elements
    while (i < n1) result.push_back(a[i++]);
    while (j < n2) result.push_back(b[j++]);

    for (int idx = 0; idx < (int)result.size(); idx++) {
        cout << result[idx];
        if (idx < (int)result.size() - 1) cout << " ";
    }
    cout << "\n";

    return 0;
}

Key points:

  • Two pointers i and j scan through arrays a and b simultaneously
  • We always pick the smaller current element — this maintains sorted order
  • After the while loop, one array might still have elements — copy those directly

Challenge 2.3.13 — Smell Distance (Inspired by USACO Bronze)

N cows are standing in a line. Each cow has a position p[i] and a smell radius s[i]. A cow can smell another if the distance between them is at most the sum of their radii. Read N, then N pairs (position, radius). Print the number of pairs of cows that can smell each other.

Sample Input:

4
1 2
5 1
8 3
15 1

Sample Output: 1

(Pair (0,1): dist=|1-5|=4, radii sum=2+1=3. 4>3, NO. Pair (0,2): dist=|1-8|=7, sum=2+3=5. 7>5, NO. Pair (1,2): dist=|5-8|=3, sum=1+3=4. 3≤4, YES. Pair (0,3): 14>3 NO. Pair (1,3): 10>2 NO. Pair (2,3): 7>4 NO. Total: 1.)

💡 Solution (click to reveal)

Approach: Check all pairs (i, j) where i < j. For each pair, compute the distance and compare to the sum of their radii.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    vector<long long> pos(n), rad(n);
    for (int i = 0; i < n; i++) {
        cin >> pos[i] >> rad[i];
    }

    int count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            long long dist = abs(pos[i] - pos[j]);
            long long sumRad = rad[i] + rad[j];
            if (dist <= sumRad) {
                count++;
            }
        }
    }

    cout << count << "\n";
    return 0;
}

Key points:

  • Check all pairs (i, j) with i < j to avoid counting the same pair twice
  • abs(pos[i] - pos[j]) computes the absolute distance between positions
  • Use long long in case positions and radii are large

🏗️ Part 3: Core Data Structures

The data structures that appear in nearly every USACO Bronze and Silver problem — prefix sums, sorting, two pointers, stacks, maps, and segment trees.

📚 11 Chapters · ⏱️ Estimated 2-3 weeks · 🎯 Target: Solve USACO Bronze problems

Part 3: Core Data Structures

Estimated time: 2–3 weeks

Part 3 is where competitive programming starts getting exciting. You'll learn the data structures that appear in nearly every USACO Bronze and Silver problem — and techniques that can turn O(N²) brute force into O(N) elegance.


What Topics Are Covered

| Chapter | Topic | The Big Idea |
|---|---|---|
| Chapter 3.1 | STL Essentials | Master the powerful built-in containers: sort, map, set, queue, stack |
| Chapter 3.2 | Arrays & Prefix Sums | Answer range sum queries in O(1) after O(N) preprocessing |
| Chapter 3.3 | Sorting & Searching | Sort + binary search turns many O(N²) problems into O(N log N) |
| Chapter 3.4 | Two Pointers & Sliding Window | Efficiently process subarrays/pairs with two coordinated pointers |
| Chapter 3.5 | Monotonic Stack & Monotonic Queue | Next greater element, sliding window max/min in O(N) |
| Chapter 3.6 | Stacks, Queues & Deques | Order-based data structures for LIFO/FIFO processing |
| Chapter 3.7 | Hashing Techniques | Fast key lookup, polynomial hashing, rolling hash |
| Chapter 3.8 | Maps & Sets | O(log N) lookup, unique collections, frequency counting |
| Chapter 3.9 | Introduction to Segment Trees | Efficient range queries and point updates in O(log N) |
| Chapter 3.10 | Fenwick Tree (BIT) | Efficient prefix-sum with point updates, inversion count |
| Chapter 3.11 | Binary Trees | Tree traversals, BST operations, balanced trees |

What You'll Be Able to Solve After This Part

After completing Part 3, you'll be ready to tackle:

  • USACO Bronze: Most Bronze problems use Part 3 techniques

    • Range queries (how many cows of type X in positions L to R?)
    • Sorting problems (closest pair, ranking, scheduling)
    • Frequency counting (how many times does each value appear?)
    • Stack-based problems (balanced brackets, monotonic processing)
  • USACO Silver Intro:

    • Binary search on the answer (aggressive cows, rope cutting)
    • Sliding window maximum/minimum
    • Difference arrays for range updates

Key Algorithms Introduced

| Technique | Chapter | USACO Relevance |
|---|---|---|
| 1D Prefix Sum | 3.2 | Breed counting, range queries |
| 2D Prefix Sum | 3.2 | Rectangle sum queries on grids |
| Difference Array | 3.2 | Range update, point query |
| std::sort with custom comparator | 3.3 | Nearly every Silver problem |
| Binary search (lower_bound, upper_bound) | 3.3 | Counting, range queries |
| Binary search on answer | 3.3 | Aggressive cows, painter's partition |
| Monotonic stack | 3.5 | Next greater element, histogram |
| Sliding window (monotonic deque) | 3.5 | Window min/max |
| Frequency map (unordered_map) | 3.7 | Counting occurrences |
| Ordered set operations | 3.8 | K-th element, range queries |

Prerequisites

Before starting Part 3, make sure you can:

  • Write and compile a C++ program from scratch (Chapter 2.1)
  • Use for loops and nested loops correctly (Chapter 2.2)
  • Work with arrays and vector<int> (Chapter 2.3)

Note: Chapter 3.1 (STL Essentials) is the first chapter of this part and will teach you std::sort, map, set, and other key STL containers before you need them in later chapters.


Tips for This Part

  1. Chapter 3.2 (Prefix Sums) is the most frequently tested technique in Bronze. Make sure you can implement it from scratch in 5 minutes.
  2. Chapter 3.3 (Binary Search) introduces "binary search on the answer" — this is a Silver-level technique that separates good solutions from great ones.
  3. Don't skip the practice problems. Each chapter's problems are specifically chosen to build the intuition you need.
  4. After finishing Chapter 3.3, you have enough tools for most USACO Bronze problems. Try solving 5–10 Bronze problems before continuing.

🏆 USACO Tip: At USACO Bronze, the most common techniques are: simulation (Chapters 2.1–2.3), sorting (Chapter 3.3), and prefix sums (Chapter 3.2). If you master these, you can solve almost any Bronze problem.

Let's dive in!

Arrays & Prefix Sums

📖 Chapter 3.2 ⏱️ ~55 min read 🎯 Intermediate

Chapter 3.2: Arrays & Prefix Sums

📝 Before You Continue: Make sure you're comfortable with arrays, vectors, and basic loops (Chapters 2.2–2.3). You'll also want to understand long long overflow (Chapter 2.1).

Imagine you have an array of N numbers, and someone asks you 100,000 times: "What is the sum of elements from index L to index R?" A naive approach recomputes the sum from scratch each time — that's O(N) per query, or O(N × Q) total. With N = Q = 10^5, that's 10^10 operations. Way too slow.

Prefix sums solve this in O(N) preprocessing and O(1) per query. This is one of the most elegant and useful techniques in all of competitive programming.

💡 Key Insight: Prefix sums transform a "range query" problem into a subtraction. Instead of summing L to R every time, you precompute cumulative sums and subtract two of them. This trades O(Q) repeated work for one-time O(N) preprocessing.


3.2.1 The Prefix Sum Idea

The prefix sum of an array is a new array where each element stores the cumulative sum up to that index.

Visual: Prefix Sum Array

Prefix Sum Visualization

The diagram above shows how the prefix sum array is constructed from the original array, and how a range query sum(L, R) = P[R] - P[L-1] is computed in O(1) time. The blue cells highlight a query range while the red and green cells show the two prefix values being subtracted.

Given array: A = [3, 1, 4, 1, 5, 9, 2, 6] (1-indexed for clarity)

Index:  1  2  3  4  5  6  7  8
A:      3  1  4  1  5  9  2  6
P:      3  4  8  9  14 23 25 31

Where P[i] = A[1] + A[2] + ... + A[i].

Why 1-Indexing?

Using 1-indexed arrays lets us define P[0] = 0 (the "empty prefix" sums to zero). This makes the query formula P[R] - P[L-1] work even when L = 1 — we'd compute P[R] - P[0] = P[R], which is correct.

Building the Prefix Sum Array

// Solution: Build Prefix Sum Array — O(N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    // Step 1: Read input (1-indexed)
    vector<int> A(n + 1);
    for (int i = 1; i <= n; i++) cin >> A[i];

    // Step 2: Build prefix sums
    vector<long long> P(n + 1, 0);  // P[0] = 0 (base case)
    for (int i = 1; i <= n; i++) {
        P[i] = P[i - 1] + A[i];   // ← KEY LINE: each P[i] = all elements up to i
    }

    return 0;
}

Complexity Analysis:

  • Time: O(N) — one pass through the array
  • Space: O(N) — stores the prefix array

Step-by-step trace for A = [3, 1, 4, 1, 5]:

i=1: P[1] = P[0] + A[1] = 0 + 3 = 3
i=2: P[2] = P[1] + A[2] = 3 + 1 = 4
i=3: P[3] = P[2] + A[3] = 4 + 4 = 8
i=4: P[4] = P[3] + A[4] = 8 + 1 = 9
i=5: P[5] = P[4] + A[5] = 9 + 5 = 14

3.2.2 Range Sum Queries in O(1)

Once you have the prefix sum array, the sum from index L to R is:

sum(L, R) = P[R] - P[L-1]

Why? P[R] = sum of elements 1..R. P[L-1] = sum of elements 1..(L-1). Their difference = sum of elements L..R.

💡 Key Insight: Think of P[i] as "the total sum of the first i elements." To get the sum of a window [L, R], you subtract the "prefix before L" from the "prefix through R." It's like: big triangle minus smaller triangle = trapezoid.

// Solution: Range Sum Queries — Preprocessing O(N), Each Query O(1)
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
long long A[MAXN];
long long P[MAXN];

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, q;
    cin >> n >> q;

    // Step 1: Read array
    for (int i = 1; i <= n; i++) cin >> A[i];

    // Step 2: Build prefix sum — O(n)
    P[0] = 0;
    for (int i = 1; i <= n; i++) {
        P[i] = P[i - 1] + A[i];
    }

    // Step 3: Answer q range sum queries — O(1) each
    for (int i = 0; i < q; i++) {
        int l, r;
        cin >> l >> r;
        cout << P[r] - P[l - 1] << "\n";  // ← KEY LINE: range sum formula
    }

    return 0;
}

Sample Input:

8 3
3 1 4 1 5 9 2 6
1 4
3 7
2 6

Sample Output:

9
21
20

Verification:

  • sum(1,4) = P[4] - P[0] = 9 - 0 = 9 → A[1]+A[2]+A[3]+A[4] = 3+1+4+1 = 9 ✓
  • sum(3,7) = P[7] - P[2] = 25 - 4 = 21 → A[3]+...+A[7] = 4+1+5+9+2 = 21 ✓
  • sum(2,6) = P[6] - P[1] = 23 - 3 = 20 → A[2]+...+A[6] = 1+4+1+5+9 = 20 ✓

⚠️ Common Mistake: Writing P[R] - P[L] instead of P[R] - P[L-1]. The formula includes both endpoints L and R — you want to subtract the sum before L, not the sum at L.

Total Complexity: O(N + Q) — perfect for N, Q up to 10^5.


3.2.3 USACO Example: Breed Counting

This is a classic USACO Bronze problem (2015 December).

Problem: N cows in a line. Each cow is breed 1, 2, or 3. Answer Q queries: how many cows of breed B are in positions L to R?

Solution: Maintain one prefix sum array per breed.

// Solution: Multi-Breed Prefix Sums — O(N + Q)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, q;
    cin >> n >> q;

    vector<int> breed(n + 1);
    vector<vector<long long>> P(4, vector<long long>(n + 1, 0));
    // P[b][i] = number of cows of breed b in positions 1..i

    // Step 1: Build prefix sums for each breed
    for (int i = 1; i <= n; i++) {
        cin >> breed[i];
        for (int b = 1; b <= 3; b++) {
            P[b][i] = P[b][i - 1] + (breed[i] == b ? 1 : 0);  // ← KEY LINE
        }
    }

    // Step 2: Answer each query in O(1)
    for (int i = 0; i < q; i++) {
        int l, r, b;
        cin >> l >> r >> b;
        cout << P[b][r] - P[b][l - 1] << "\n";
    }

    return 0;
}

🏆 USACO Tip: Many USACO Bronze problems involve "count elements satisfying property X in a range." If Q is large, always consider prefix sums.


3.2.4 USACO-Style Problem Walkthrough: Farmer John's Grass Fields

🔗 Related Problem: This is a fictional USACO-style problem inspired by "Breed Counting" and "Tallest Cow" — both classic Bronze problems.

Problem Statement: Farmer John has N fields in a row. Field i has grass[i] units of grass. He needs to answer Q queries: "What is the total grass in fields L through R (inclusive)?" With N, Q up to 10^5, he needs each query answered in O(1).

Sample Input:

6 4
4 2 7 1 8 3
1 3
2 5
4 6
1 6

Sample Output:

13
18
12
25

Step-by-Step Solution:

Step 1: Understand the problem. We have an array [4, 2, 7, 1, 8, 3] and need range sums.

Step 2: Build the prefix sum array.

Index:  0  1  2  3  4  5  6
grass:  -  4  2  7  1  8  3
P:      0  4  6  13 14 22 25

Step 3: Answer queries using P[R] - P[L-1]:

  • Query (1,3): P[3] - P[0] = 13 - 0 = 13
  • Query (2,5): P[5] - P[1] = 22 - 4 = 18
  • Query (4,6): P[6] - P[3] = 25 - 13 = 12
  • Query (1,6): P[6] - P[0] = 25 - 0 = 25

Complete C++ Solution:

// Farmer John's Grass Fields — Prefix Sum Solution O(N + Q)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, q;
    cin >> n >> q;

    // Step 1: Read grass values and build prefix sum simultaneously
    vector<long long> P(n + 1, 0);
    for (int i = 1; i <= n; i++) {
        long long g;
        cin >> g;
        P[i] = P[i - 1] + g;   // ← KEY LINE: incremental prefix sum
    }

    // Step 2: Answer each query in O(1)
    while (q--) {
        int l, r;
        cin >> l >> r;
        cout << P[r] - P[l - 1] << "\n";
    }

    return 0;
}

Why is this O(N + Q)?

  • Building prefix sums: one loop, N iterations → O(N)
  • Each query: one subtraction → O(1) per query, O(Q) total
  • Total: O(N + Q) — much better than the O(NQ) brute force

⚠️ Common Mistake: Using int instead of long long for the prefix sum. If grass values are up to 10^9 and N = 10^5, the total could be up to 10^14 — way beyond int's range of ~2×10^9.


3.2.5 Difference Arrays

The difference array is the inverse of prefix sums. It's useful when you need to add a value to a range of positions, then query final values.

Problem: Start with all zeros. Apply M updates: "add V to all positions from L to R." Then print the final array.

Naively, each update is O(R-L+1). With a difference array, each update is O(1), and reconstruction is O(N).

💡 Key Insight: Instead of adding V to every position in [L, R] (slow), we record "+V at position L" and "-V at position R+1" (fast). When we later do a prefix sum of these markers, the +V and -V "cancel out" outside [L,R], so the net effect is exactly adding V to [L,R].

// Solution: Difference Array for Range Updates — O(N + M)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    vector<long long> diff(n + 2, 0);  // difference array (extra space for R+1 case)

    // Step 1: Process all range updates in O(1) each
    for (int i = 0; i < m; i++) {
        int l, r, v;
        cin >> l >> r >> v;
        diff[l] += v;      // ← KEY LINE: mark start of range
        diff[r + 1] -= v;  // ← KEY LINE: mark end+1 to undo the addition
    }

    // Step 2: Reconstruct the final array by taking prefix sums of diff
    long long running = 0;
    for (int i = 1; i <= n; i++) {
        running += diff[i];
        cout << running;
        if (i < n) cout << " ";
    }
    cout << "\n";

    return 0;
}

Sample Input:

5 3
1 3 2
2 5 3
3 4 -1

Step-by-step trace:

📝 Indexing note: In the trace below, the diff array is 1-indexed (diff[1]..diff[n+1]), consistent with the code's vector<long long> diff(n + 2, 0). The bracketed numbers show diff[1], diff[2], ..., diff[6] (with n = 5 the vector has 7 slots in total, of which diff[1..6] are actually used).

初始状态:           diff[1..6] = [0,  0,  0,  0,  0,  0]

After update(1,3,+2): diff[1]+=2, diff[4]-=2
                      diff[1..6] = [2,  0,  0, -2,  0,  0]

After update(2,5,+3): diff[2]+=3, diff[6]-=3
                      diff[1..6] = [2,  3,  0, -2,  0, -3]

After update(3,4,-1): diff[3] += -1 (i.e., diff[3] -= 1), diff[5] -= (-1) (i.e., diff[5] += 1)
                      diff[1..6] = [2,  3, -1, -2,  1, -3]

Prefix sum reconstruction:

i=1: running = 0+2 = 2 → result[1] = 2
i=2: running = 2+3 = 5 → result[2] = 5
i=3: running = 5-1 = 4 → result[3] = 4
i=4: running = 4-2 = 2 → result[4] = 2
i=5: running = 2+1 = 3 → result[5] = 3


Sample Output:

2 5 4 2 3

Complexity Analysis:

  • Time: O(N + M) — O(1) per update, O(N) reconstruction
  • Space: O(N) — just the difference array

⚠️ Common Mistake: Declaring diff with size N+1 instead of N+2. When R=N, you write to diff[R+1] = diff[N+1], which needs to exist!


3.2.6 2D Prefix Sums

For 2D grids, you can extend prefix sums to answer rectangular range queries in O(1).

Given an R×C grid, define P[r][c] = sum of all elements in the rectangle from (1,1) to (r,c).

Building the 2D Prefix Sum

P[r][c] = A[r][c] + P[r-1][c] + P[r][c-1] - P[r-1][c-1]

The subtraction removes the overlap (otherwise the top-left rectangle is counted twice).

💡 Key Insight (Inclusion-Exclusion): Visualize the four rectangles:

  • P[r-1][c] = the "top" rectangle
  • P[r][c-1] = the "left" rectangle
  • P[r-1][c-1] = the "top-left corner" (counted in BOTH above — so subtract once)
  • A[r][c] = the single new cell

Step-by-Step 2D Prefix Sum Worked Example

Let's trace through a 4×4 grid:

Original Grid A:

       c=1  c=2  c=3  c=4
r=1:    1    2    3    4
r=2:    5    6    7    8
r=3:    9   10   11   12
r=4:   13   14   15   16

Building P step by step (left-to-right, top-to-bottom):

P[1][1] = A[1][1] = 1

P[1][2] = A[1][2] + P[0][2] + P[1][1] - P[0][1] = 2 + 0 + 1 - 0 = 3
P[1][3] = A[1][3] + P[0][3] + P[1][2] - P[0][2] = 3 + 0 + 3 - 0 = 6
P[1][4] = 4 + 0 + 6 - 0 = 10

P[2][1] = A[2][1] + P[1][1] + P[2][0] - P[1][0] = 5 + 1 + 0 - 0 = 6
P[2][2] = A[2][2] + P[1][2] + P[2][1] - P[1][1] = 6 + 3 + 6 - 1 = 14
P[2][3] = 7 + 6 + 14 - 3 = 24
P[2][4] = 8 + 10 + 24 - 6 = 36

P[3][1] = 9 + 6 + 0 - 0 = 15
P[3][2] = 10 + 14 + 15 - 6 = 33
P[3][3] = 11 + 24 + 33 - 14 = 54
P[3][4] = 12 + 36 + 54 - 24 = 78

P[4][1] = 13 + 15 + 0 - 0 = 28
P[4][2] = 14 + 33 + 28 - 15 = 60
P[4][3] = 15 + 54 + 60 - 33 = 96
P[4][4] = 16 + 78 + 96 - 54 = 136

Resulting prefix sum grid P:

       c=1  c=2  c=3  c=4
r=1:    1    3    6   10
r=2:    6   14   24   36
r=3:   15   33   54   78
r=4:   28   60   96  136

Query: Sum of subgrid (r1=2, c1=2) to (r2=3, c2=3):

ans = P[3][3] - P[1][3] - P[3][1] + P[1][1] = 54 - 6 - 15 + 1 = 34

Verify: A[2][2]+A[2][3]+A[3][2]+A[3][3] = 6+7+10+11 = 34 ✓

Visualization of the inclusion-exclusion:

2D Prefix Sum Inclusion-Exclusion

// Solution: 2D Prefix Sums — Build O(R×C), Query O(1)
#include <bits/stdc++.h>
using namespace std;

const int MAXR = 1001, MAXC = 1001;
int A[MAXR][MAXC];
long long P[MAXR][MAXC];

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int R, C;
    cin >> R >> C;

    for (int r = 1; r <= R; r++)
        for (int c = 1; c <= C; c++)
            cin >> A[r][c];

    // Step 1: Build 2D prefix sum — O(R × C)
    for (int r = 1; r <= R; r++) {
        for (int c = 1; c <= C; c++) {
            P[r][c] = A[r][c]
                    + P[r-1][c]    // rectangle above
                    + P[r][c-1]    // rectangle to the left
                    - P[r-1][c-1]; // ← KEY LINE: remove overlap (counted twice)
        }
    }

    // Step 2: Answer each query in O(1)
    int q;
    cin >> q;
    while (q--) {
        int r1, c1, r2, c2;
        cin >> r1 >> c1 >> r2 >> c2;
        long long ans = P[r2][c2]
                      - P[r1-1][c2]    // subtract top strip
                      - P[r2][c1-1]    // subtract left strip
                      + P[r1-1][c1-1]; // add back top-left corner
        cout << ans << "\n";
    }

    return 0;
}

Complexity Analysis:

  • Build time: O(R × C)
  • Query time: O(1) per query
  • Space: O(R × C)

⚠️ Common Mistake: Forgetting to add P[r1-1][c1-1] back in the query formula. The top strip and left strip both include the top-left corner, so it gets subtracted twice — you need to add it back once!


3.2.7 USACO Example: Max Subarray Sum

Problem (variation of Kadane's algorithm): Find the contiguous subarray with the maximum sum.

// Solution: Kadane's Algorithm — O(N) Time, O(1) Space
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> A(n);
    for (int &x : A) cin >> x;

    // Kadane's Algorithm: O(n)
    long long maxSum = LLONG_MIN;  // LLONG_MIN = smallest long long
    long long current = 0;

    for (int i = 0; i < n; i++) {
        current += A[i];
        maxSum = max(maxSum, current);
        if (current < 0) current = 0;  // ← KEY LINE: restart if sum goes negative
    }

    cout << maxSum << "\n";

    return 0;
}

💡 Key Insight: Why reset current to 0 when it goes negative? Because a negative prefix sum hurts any future subarray. If the running sum so far is -5, any future subarray starting fresh (sum 0) will always beat continuing from -5.

Alternative with prefix sums: The max subarray sum equals max over all pairs (i,j) of P[j] - P[i-1]. For each j, this is maximized when P[i-1] is minimized. Track the running minimum of prefix sums!

// Alternative: Min Prefix Trick — also O(N)
long long maxSum = LLONG_MIN, minPrefix = 0, prefix = 0;
for (int x : A) {
    prefix += x;
    maxSum = max(maxSum, prefix - minPrefix);  // best sum ending here
    minPrefix = min(minPrefix, prefix);         // track minimum prefix seen so far
    // ⚠️ Note: minPrefix must be updated AFTER maxSum.
    // Updating minPrefix first would let the empty subarray (length 0, sum 0)
    // join the comparison, wrongly returning 0 instead of the largest
    // negative element when the array is all-negative.
}

⚠️ Common Mistakes in Chapter 3.2

  1. Off-by-one in range queries: P[R] - P[L] instead of P[R] - P[L-1]. Always verify on a small example.
  2. Overflow: Prefix sums of large values can exceed int range (2×10^9). Use long long for the prefix array even if elements are int.
  3. 2D query formula: Forgetting the +P[r1-1][c1-1] term in the 2D query — a very easy slip.
  4. Difference array size: Declaring diff[n+1] when you need diff[n+2] (because you write to index r+1 which could be n+1).
  5. 1-indexing vs 0-indexing: If you use 0-indexed prefix sums, the query formula changes to P[R+1] - P[L]. Pick one convention and stick to it within a problem.

Chapter Summary

📌 Key Takeaways

| Technique | Build Time | Query Time | Space | Use Case |
|---|---|---|---|---|
| 1D prefix sum | O(N) | O(1) | O(N) | Range sum on 1D array |
| 2D prefix sum | O(RC) | O(1) | O(RC) | Range sum on 2D grid |
| Difference array | O(N+M) | O(1)* | O(N) | Range addition updates |
| Kadane's algorithm | O(N) | — | O(1) | Maximum subarray sum |

*After O(N) reconstruction pass to read all values.

🧩 Core Formula Quick Reference

| Operation | Formula | Notes |
|---|---|---|
| 1D range sum | P[R] - P[L-1] | P[0] = 0 is the sentinel value |
| 2D rectangle sum | P[r2][c2] - P[r1-1][c2] - P[r2][c1-1] + P[r1-1][c1-1] | Inclusion-exclusion: subtract twice, add once |
| Difference array update | diff[L] += V; diff[R+1] -= V; | Array size should be N+2 |
| Restore from difference | Take prefix sum of diff | Result is the final array |

❓ FAQ

Q1: What is the relationship between prefix sums and difference arrays?

A: They are inverse operations. Taking the prefix sum of an array gives the prefix sum array; taking the difference (adjacent element differences) of the prefix sum array restores the original. Conversely, taking the prefix sum of a difference array also restores the original. This is analogous to integration and differentiation in mathematics.

Q2: When to use prefix sums vs. difference arrays?

A: Rule of thumb — look at the operation type:

  • Multiple range sum queries → prefix sum (preprocess O(N), query O(1))
  • Multiple range add/subtract operations → difference array (update O(1), restore O(N) at the end)
  • If both operations alternate, you need a more advanced data structure (like Segment Tree in Chapter 3.9)

Q3: Can prefix sums handle dynamic modifications? (array elements change)

A: No. Prefix sums are a one-time preprocessing; the array cannot change afterward. If elements are modified, use Fenwick Tree (BIT) or Segment Tree, which support point updates and range queries in O(log N) time.

Q4: Why are there two versions of Kadane's algorithm (current=0 vs minPrefix)?

A: Both are essentially the same, both O(N). The first (classic Kadane) is more intuitive: restart when the current subarray sum goes negative. The second (min-prefix method) uses prefix sum thinking: max subarray = max(P[j] - P[i-1]) = max(P[j]) - min(P[i]). Choose based on personal preference.

Q5: What are the space constraints for 2D prefix sums?

A: If R, C are both up to 10^4, the P array needs 10^8 long long values (about 800MB) — exceeding memory limits. Generally R×C ≤ 10^6~10^7 is safe. For larger grids, consider compression or offline processing.

🔗 Connections to Later Chapters

  • Chapter 3.4 (Two Pointers): sliding window can also do range queries, but only for fixed-size or monotonically moving windows; prefix sums are more general
  • Chapter 3.3 (Sorting & Searching): binary search can combine with prefix sums — e.g., binary search on the prefix sum array for the first position ≥ target
  • Chapter 3.9 (Segment Trees): solves "dynamic update + range query" problems that prefix sums cannot handle
  • Chapters 6.1–6.3 (DP): many state transitions involve range sums; prefix sums are an important tool for optimizing DP
  • The difference array idea ("+V at start, -V after end") recurs in sweep line algorithms, event sorting, and other advanced techniques

Practice Problems

Problem 3.2.1 — Range Sum 🟢 Easy Read N integers and Q queries. Each query gives L and R. Print the sum of elements from index L to R (1-indexed).

Hint Build a prefix sum array P where P[i] = A[1]+...+A[i]. Answer each query as P[R] - P[L-1].

Problem 3.2.2 — Range Add, Point Query 🟢 Easy Start with N zeros. Process M operations: each operation adds V to all positions from L to R. After all operations, print the value at each position. (Use difference array)

Hint Use diff[L] += V and diff[R+1] -= V for each update, then take prefix sums of diff.

Problem 3.2.3 — Rectangular Sum 🟡 Medium Read an N×M grid of integers and Q queries. Each query gives (r1,c1,r2,c2). Print the sum of the subgrid.

Hint Build a 2D prefix sum. Query = P[r2][c2] - P[r1-1][c2] - P[r2][c1-1] + P[r1-1][c1-1].

Problem 3.2.4 — USACO 2016 January Bronze: Mowing the Field 🔴 Hard (Challenge) Farmer John mows grass along a path. Cells visited more than once contribute to "double-mowed" area. Use a 2D array and count cells visited at least twice.

Hint Simulate the path, marking cells in a 2D visited array. Count cells with value ≥ 2 at the end.

Problem 3.2.5 — Maximum Subarray (Negative numbers allowed) 🟡 Medium Read N integers (possibly negative). Find the maximum possible sum of a contiguous subarray. What if all numbers are negative?

Hint Use Kadane's algorithm. If all numbers are negative, the answer is the single largest element (that's why we initialize maxSum to `LLONG_MIN`, not 0).

🏆 Challenge Problem: Cows and Paint Buckets An N×M grid contains paint buckets, each with a positive value. You can select any rectangular subgrid. Your score is the maximum value in your subgrid minus the sum of all border cells of your subgrid. Find the optimal rectangle. (N, M ≤ 500)

Solution approach: 2D prefix sums for sums + careful enumeration of all rectangles.

📖 Chapter 3.3 ⏱️ ~60 min read 🎯 Intermediate

Chapter 3.3: Sorting & Searching

📝 Before You Continue: You should be comfortable with arrays, vectors, and basic loops (Chapters 2.2–2.3). Familiarity with std::sort from Chapter 3.1 helps, but this chapter covers it in depth.

Sorting and searching are two of the most fundamental operations in computer science. In USACO, a huge fraction of problems become easy once you sort the data correctly. And binary search — the ability to search a sorted array in O(log n) — is a technique you'll reach for again and again.


3.3.1 Why Sorting Matters

Consider this problem: "Given N cow heights, find the two cows whose heights are closest together."

  • Unsorted approach: Compare every pair → O(N²). For N = 10^5, that's 10^10 operations. TLE.
  • Sorted approach: Sort the heights → O(N log N). Then the closest pair must be adjacent! Check N-1 pairs → O(N). Total: O(N log N). ✓

💡 Key Insight: Sorting transforms many O(N²) brute-force solutions into O(N log N) or O(N) solutions. When you see "find the pair with property X" or "find the minimum/maximum of something involving two elements," always consider sorting first.

Complexity Analysis:

  • Sorting: O(N log N) time, O(log N) space (for the recursion stack in Introsort / quicksort)
  • After sorting: adjacent comparisons or two-pointer techniques are O(N)

3.3.2 How Sorting Works (Conceptual)

You don't need to implement sorting algorithms yourself — std::sort does it for you. But understanding the ideas helps you reason about time complexity and choose the right approach.

Here are four classic sorting algorithms, each with an interactive visualization to help you understand how they work.

| Algorithm | Time Complexity | Space | Stable | Core Idea |
|---|---|---|---|---|
| Bubble Sort | O(N²) | O(1) | Yes | Swap adjacent elements; large values "bubble" to the end |
| Insertion Sort | O(N²) / O(N) best | O(1) | Yes | Insert each element into its correct position in the sorted region |
| Merge Sort | O(N log N) | O(N) | Yes | Divide and conquer: split recursively, then merge |
| Quicksort | O(N log N) avg | O(log N) | No | Divide and conquer: partition around a pivot, recurse |

🫧 Bubble Sort — O(N²)

Repeatedly scan the array, swapping adjacent elements that are out of order. Each pass "bubbles" the current maximum to the end:

Pass 1: [64,34,25,12,22,11,90] → 90 bubbles to end
Pass 2: [34,25,12,22,11,64,90] → 64 bubbles to second-to-last
...

Bubble sort is O(N²). Never use it on large inputs in competitive programming. We cover it only because it's conceptually the simplest.


🃏 Insertion Sort — O(N²) / O(N) best case

Divide the array into a left "sorted region" and a right "unsorted region." Each step takes the first element of the unsorted region and inserts it into the correct position in the sorted region:

Start: [64 | 34, 25, 12, 22, 11, 90]   ← | sorted on left
i=1:   [34, 64 | 25, 12, 22, 11, 90]   ← 34 inserted before 64
i=2:   [25, 34, 64 | 12, 22, 11, 90]   ← 25 inserted at front
i=3:   [12, 25, 34, 64 | 22, 11, 90]   ← 12 inserted at front
...

💡 Insertion sort's strength: Very fast on nearly-sorted arrays (approaches O(N)). std::sort switches to insertion sort for small subarrays.

void insertionSort(vector<int>& a) {
    int n = a.size();
    for (int i = 1; i < n; i++) {
        int key = a[i];   // element to insert
        int j = i - 1;
        // shift elements greater than key one position to the right
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;  // place key in its correct position
    }
}

🔀 Merge Sort — O(N log N) always

Divide and conquer: recursively split the array in half, then merge the two sorted halves back together:

[38, 27, 43, 3, 9, 82, 10]
        ↓ split recursively
[38,27,43,3]    [9,82,10]
[38,27] [43,3]  [9,82] [10]
[38][27][43][3] [9][82][10]
        ↓ merge bottom-up
[27,38] [3,43]  [9,82] [10]
  [3,27,38,43]    [9,10,82]
      [3,9,10,27,38,43,82] ✓

Merge sort is O(N log N) in all cases and is a stable sort.

void merge(vector<int>& a, int lo, int mid, int hi) {
    vector<int> tmp(a.begin() + lo, a.begin() + hi + 1);
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi) {
        if (tmp[i - lo] <= tmp[j - lo])
            a[k++] = tmp[i++ - lo];  // take smaller from left half
        else
            a[k++] = tmp[j++ - lo];  // take smaller from right half
    }
    while (i <= mid) a[k++] = tmp[i++ - lo];  // append remaining left
    while (j <= hi)  a[k++] = tmp[j++ - lo];  // append remaining right
}

void mergeSort(vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    mergeSort(a, lo, mid);       // sort left half
    mergeSort(a, mid + 1, hi);   // sort right half
    merge(a, lo, mid, hi);       // merge two sorted halves
}

⚡ Quicksort — O(N log N) average

Quicksort is one of the core algorithms underlying std::sort. Its key idea is divide and conquer:

  1. Pick a pivot element (typically the last element)
  2. Partition: move all elements ≤ pivot to the left, all > pivot to the right; pivot lands in its final position
  3. Recurse on the left and right subarrays
[8, 3, 6, 1, 9, 2, 7, 4]   ← pivot = 4
         ↓ partition
[3, 1, 2, 4, 9, 6, 7, 8]   ← 4 in final position; left ≤ 4, right > 4
 ↑_______↑  ↑  ↑__________↑
 left subarray  right subarray

Recurse on [3,1,2] → [1,2,3]
Recurse on [9,6,7,8] → [6,7,8,9]

Final: [1, 2, 3, 4, 6, 7, 8, 9] ✓
// Partition arr[lo..hi] using last element as pivot.
// Returns the final index of the pivot.
int partition(vector<int>& arr, int lo, int hi) {
    int pivot = arr[hi];   // choose last element as pivot
    int i = lo - 1;        // i points to end of "≤ pivot" region

    for (int j = lo; j < hi; j++) {
        if (arr[j] <= pivot) {
            i++;
            swap(arr[i], arr[j]);  // bring arr[j] into ≤ pivot region
        }
    }
    swap(arr[i + 1], arr[hi]);  // place pivot in its final position
    return i + 1;               // return pivot's index
}

void quickSort(vector<int>& arr, int lo, int hi) {
    if (lo >= hi) return;           // base case: subarray length ≤ 1
    int p = partition(arr, lo, hi); // p is pivot's final position
    quickSort(arr, lo, p - 1);      // sort left subarray
    quickSort(arr, p + 1, hi);      // sort right subarray
}

Visual: Quicksort Partition

Quicksort Partition

The diagram above illustrates how the partition operation rearranges elements around the pivot. Elements ≤ pivot move to the left; elements > pivot move to the right. The pivot then lands in its final sorted position.

⚠️ Worst case: If the pivot is always the max or min (e.g., already-sorted input), recursion depth degrades to O(N) and total time becomes O(N²). std::sort (Introsort) avoids this by detecting overly deep recursion and switching to heapsort, which guarantees O(N log N) in the worst case; implementations also commonly pick the pivot by median-of-three to make bad splits rare.

| Case | Time | Notes |
|---|---|---|
| Average | O(N log N) | Pivot roughly splits array in half |
| Worst | O(N²) | Pivot always extreme (sorted input) |
| Space | O(log N) | Recursion stack depth (average) |

3.3.3 std::sort in Practice

⚠️ Stability Note: std::sort is NOT stable — it uses Introsort (Quicksort + Heapsort + Insertion sort hybrid), which does not preserve the relative order of equal elements. If you need stable sorting, use std::stable_sort instead (see the comparison table in this section).

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> v(n);
    for (int &x : v) cin >> x;

    // Sort ascending
    sort(v.begin(), v.end());

    // Sort descending
    sort(v.begin(), v.end(), greater<int>());

    // Sort only part of a vector (indices 2 through 5 inclusive)
    sort(v.begin() + 2, v.begin() + 6);

    for (int x : v) cout << x << " ";
    cout << "\n";

    return 0;
}

Sorting by Multiple Criteria

Often you want to sort by one field, and break ties with another. With pair, this is automatic (sorts by .first, then .second):

vector<pair<int, string>> students;
students.push_back({85, "Alice"});
students.push_back({92, "Bob"});
students.push_back({85, "Charlie"});

sort(students.begin(), students.end());
// Result: {85, "Alice"}, {85, "Charlie"}, {92, "Bob"}
// Sorted by score first, then alphabetically by name

Custom Comparators

A comparator is a function that returns true if the first argument should come before the second in the sorted order.

The clearest way to write a comparator is as a standalone function:

struct Cow {
    string name;
    int weight;
    int height;
};

// Sort by weight ascending; break ties by height descending
bool cmpCow(const Cow &a, const Cow &b) {
    if (a.weight != b.weight) return a.weight < b.weight;  // lighter first
    return a.height > b.height;                             // taller first (tie-break)
}

int main() {
    vector<Cow> cows = {{"Bessie", 500, 140}, {"Elsie", 480, 135}, {"Moo", 500, 138}};

    sort(cows.begin(), cows.end(), cmpCow);

    for (auto &c : cows) {
        cout << c.name << " " << c.weight << " " << c.height << "\n";
    }
    // Output:
    // Elsie 480 135
    // Bessie 500 140
    // Moo 500 138
    return 0;
}

💡 Style Note: Defining cmp as a standalone function (rather than an inline lambda) makes the sorting logic easier to read, test, and reuse — especially when the comparison involves multiple fields.

Sorting Algorithm Stability

⚠️ Important: std::sort is NOT stable — equal elements may appear in any order after sorting. Use std::stable_sort if relative order of equal elements must be preserved.

Sorting Algorithm Stability Comparison

| Algorithm | Time Complexity | Space Complexity | Stable | C++ Function |
|---|---|---|---|---|
| std::sort | O(N log N) | O(log N) | No | sort() |
| std::stable_sort | O(N log² N) | O(N) | Yes | stable_sort() |
| std::partial_sort | O(N log K) | O(1) | No | partial_sort() |
| Counting Sort | O(N+K) | O(K) | Yes | Manual |
| Radix Sort | O(d(N+K)) | O(N+K) | Yes | Manual |
📝 Note: std::sort uses Introsort (a hybrid of Quicksort + Heapsort + Insertion sort). Because Quicksort is not stable, std::sort makes no guarantee on the relative order of equal elements. When you sort students by score and need students with the same score to remain in their original order, use std::stable_sort.

Visual: Sorting Algorithm Comparison

Sorting Algorithm Comparison

This chart compares the time complexity, space usage, and stability of common sorting algorithms, helping you choose the right one for each situation.

Counting Sort — O(N+K) for Small Value Ranges

When values are bounded integers in a small range [0, MAXVAL], counting sort beats std::sort by a wide margin:

// Counting sort: for integers in range [0, MAXVAL]
// Time O(N+MAXVAL), stable sort
void countingSort(vector<int>& arr, int maxVal) {
    vector<int> cnt(maxVal + 1, 0);
    for (int x : arr) cnt[x]++;
    int idx = 0;
    for (int v = 0; v <= maxVal; v++)
        while (cnt[v]--) arr[idx++] = v;
}
// USACO use case: faster than std::sort when value range is small (e.g., cow IDs 1-1000)

When to use counting sort in USACO:

  • Cow IDs in range [1, 1000], N = 10^6 → counting sort is O(N + 1000) vs O(N log N)
  • Grade values [0, 100] → trivially fast
  • Color categories [0, 3] → instant

Caution: If MAXVAL is large (e.g., 10^9), counting sort requires O(MAXVAL) memory — don't use it. Coordinate compress first (Section 3.3.6), then count.


3.3.4 Binary Search

Binary search finds a target in a sorted array in O(log n) — instead of O(n) for linear search.

Analogy: Searching for a word in a dictionary. You don't start from A and read every entry — you open to the middle, check if your word is before or after, then repeat. Each step cuts the search space in half: after k steps, you've gone from N candidates to N/2^k. When N/2^k < 1, you're done — that takes k = log₂(N) steps.

💡 Key Insight: Binary search works whenever you have a monotone predicate — a condition that is false false false ... true true true (or the reverse). You can binary search for the boundary between false and true in O(log N).

Visual: Binary Search in Action

Binary Search

The diagram above shows binary search finding 7 in [1,3,5,7,9,11,13]; since 7 sits exactly at the middle index, a single step suffices. The left (L), right (R), and mid (M) pointers are shown. The key insight: computing mid = left + (right - left) / 2 avoids integer overflow compared to (left + right) / 2.

// Solution: Binary Search — O(log N)
#include <bits/stdc++.h>
using namespace std;

// Returns index of target in sorted arr, or -1 if not found
int binarySearch(const vector<int> &arr, int target) {
    int lo = 0, hi = (int)arr.size() - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // ← KEY LINE: avoid overflow (don't use (lo+hi)/2)

        if (arr[mid] == target) {
            return mid;         // found!
        } else if (arr[mid] < target) {
            lo = mid + 1;       // target is in the right half
        } else {
            hi = mid - 1;       // target is in the left half
        }
    }

    return -1;  // not found
}

int main() {
    vector<int> v = {1, 3, 5, 7, 9, 11, 13, 15};
    cout << binarySearch(v, 7) << "\n";   // 3 (index)
    cout << binarySearch(v, 6) << "\n";   // -1 (not found)
    return 0;
}

Step-by-step trace for searching 7 in [1, 3, 5, 7, 9, 11, 13, 15]:

lo=0, hi=7: mid=3, arr[3]=7 → FOUND at index 3 ✓

Searching for 6:
lo=0, hi=7: mid=3, arr[3]=7 > 6 → hi=2
lo=0, hi=2: mid=1, arr[1]=3 < 6 → lo=2
lo=2, hi=2: mid=2, arr[2]=5 < 6 → lo=3
lo=3 > hi=2: loop ends → return -1 ✓

Why lo + (hi - lo) / 2? If lo and hi are both large (close to INT_MAX), then lo + hi overflows! This formula is equivalent but safe.

The STL Way: lower_bound and upper_bound

These are almost always what you actually want in competitive programming:

// STL Binary Search Operations — all O(log N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> v = {1, 3, 3, 5, 7, 9, 9, 11};

    // lower_bound: iterator to first element >= target
    auto lb = lower_bound(v.begin(), v.end(), 3);
    cout << *lb << "\n";                    // 3 (first 3)
    cout << lb - v.begin() << "\n";         // 1 (index)

    // upper_bound: iterator to first element > target
    auto ub = upper_bound(v.begin(), v.end(), 3);
    cout << *ub << "\n";                    // 5 (first element after all 3s)
    cout << ub - v.begin() << "\n";         // 3 (index)

    // Count occurrences: upper_bound - lower_bound
    int count_of_3 = upper_bound(v.begin(), v.end(), 3)
                   - lower_bound(v.begin(), v.end(), 3);
    cout << count_of_3 << "\n";   // 2

    // Check if value exists
    bool exists = binary_search(v.begin(), v.end(), 7);
    cout << exists << "\n";  // 1

    // Find largest value <= target (floor)
    auto it = upper_bound(v.begin(), v.end(), 6);
    if (it != v.begin()) {
        --it;
        cout << *it << "\n";  // 5 (largest value <= 6)
    }

    return 0;
}

⚠️ Common Mistake: Using lower_bound/upper_bound on an unsorted container. These functions assume sorted order — on unsorted data, they give wrong results with no error!


3.3.5 Binary Search on the Answer

This is one of the most powerful and commonly-tested techniques in USACO Silver. The idea:

Instead of searching for a value in an array, binary search over the answer space itself.

When does this apply? When:

  1. The answer is a number in some range [lo, hi]
  2. There's a function canAchieve(X) that checks if X is feasible
  3. The function is monotone: if X works, all values ≤ X also work (or all ≥ X work)

💡 Key Insight: Monotonicity means there's a "threshold" separating feasible from infeasible answers. Binary search finds this threshold in O(log(hi-lo)) calls to canAchieve. If each call takes O(f(N)), total time is O(f(N) × log(answer_range)).

Classic Example: Aggressive Cows (USACO 2011 March Silver)

Problem: N stalls at positions p[1..N], place C cows to maximize the minimum distance between any two cows.

Why binary search? If we can place cows with minimum gap D, we can also place them with gap D-1. So feasibility is monotone: there's a threshold D* such that every D ≤ D* is feasible and every D > D* is not. We binary search for D*, which is the answer.

The canPlace(minDist) function: Place the first cow at the leftmost stall, then greedily pick the next stall that is at least minDist away. Count how many cows we can place this way — if ≥ C, return true.

// Solution: Binary Search on Answer — O(N log N log(max_distance))
#include <bits/stdc++.h>
using namespace std;

int n, c;
vector<int> stalls;

// Can we place c cows such that the minimum gap between any two cows is >= minDist?
bool canPlace(int minDist) {
    int placed = 1;           // place first cow at stall 0
    int lastPos = stalls[0];  // position of last placed cow

    for (int i = 1; i < n; i++) {
        if (stalls[i] - lastPos >= minDist) {  // this stall is far enough
            placed++;
            lastPos = stalls[i];
        }
    }
    return placed >= c;  // did we place all c cows?
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> n >> c;
    stalls.resize(n);
    for (int &x : stalls) cin >> x;
    sort(stalls.begin(), stalls.end());  // must sort first!

    // Binary search on the answer: what's the maximum possible minimum distance?
    int lo = 1, hi = stalls.back() - stalls.front();
    int answer = 0;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (canPlace(mid)) {
            answer = mid;    // mid works, try larger
            lo = mid + 1;
        } else {
            hi = mid - 1;    // mid doesn't work, try smaller
        }
    }

    cout << answer << "\n";
    return 0;
}

Trace for stalls = [1, 2, 4, 8, 9], C = 3:

Sorted: [1, 2, 4, 8, 9]
lo=1, hi=8

mid=4: canPlace(4)?
  Place cow at 1. Next stall ≥ 1+4=5: that's 8. Place at 8.
  Next stall ≥ 8+4=12: none. Total placed=2 < 3. Return false.
  → hi = 3

mid=2: canPlace(2)?
  Place cow at 1. Next stall ≥ 3: that's 4. Place at 4.
  Next stall ≥ 6: that's 8. Place at 8. Total placed=3 ≥ 3. Return true.
  → answer=2, lo=3

mid=3: canPlace(3)?
  Place cow at 1. Next ≥ 4: that's 4. Place at 4.
  Next ≥ 7: that's 8. Place at 8. Total placed=3 ≥ 3. Return true.
  → answer=3, lo=4

lo=4 > hi=3: done. Answer = 3

Another Classic: Minimum Time to Complete Tasks (Rope Cutting)

Problem: Given N ropes of lengths L[i], cut K pieces of equal integer length (each rope may yield several pieces). What's the maximum possible piece length?

// Can we get K pieces of length >= len from the ropes?
bool canCut(vector<int> &ropes, long long len, int K) {
    long long count = 0;
    for (int r : ropes) count += r / len;  // pieces from each rope
    return count >= K;
}

// Binary search: maximize len such that canCut(len) is true
long long lo = 1, hi = *max_element(ropes.begin(), ropes.end());
long long answer = 0;
while (lo <= hi) {
    long long mid = lo + (hi - lo) / 2;
    if (canCut(ropes, mid, K)) {
        answer = mid;
        lo = mid + 1;
    } else {
        hi = mid - 1;
    }
}

Template for Binary Search on Answer:

// Generic template — adapt lo, hi, and check() for your problem
long long lo = min_possible_answer;
long long hi = max_possible_answer;
long long answer = lo;  // or -1 if no valid answer exists

while (lo <= hi) {
    long long mid = lo + (hi - lo) / 2;
    if (check(mid)) {       // mid is feasible
        answer = mid;       // save it
        lo = mid + 1;       // try to do better (or worse, depending on problem)
    } else {
        hi = mid - 1;       // mid not feasible, go lower
    }
}

🏆 USACO Tip: Whenever a USACO problem asks "find the maximum X such that [some condition]" or "find the minimum X such that [some condition]," consider binary search on the answer. It appears on USACO Silver contests again and again.


3.3.6 Coordinate Compression

Sometimes values are large (up to 10^9), but there are few distinct values. Coordinate compression maps them to small indices (0, 1, 2, ...).

// Solution: Coordinate Compression — O(N log N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> A = {100, 500, 200, 100, 700, 200};

    // Step 1: Get sorted unique values
    vector<int> sorted_unique = A;
    sort(sorted_unique.begin(), sorted_unique.end());
    sorted_unique.erase(unique(sorted_unique.begin(), sorted_unique.end()),
                        sorted_unique.end());
    // sorted_unique = {100, 200, 500, 700}

    // Step 2: Map each original value to its compressed index
    vector<int> compressed(A.size());
    for (int i = 0; i < (int)A.size(); i++) {
        compressed[i] = lower_bound(sorted_unique.begin(), sorted_unique.end(), A[i])
                        - sorted_unique.begin();
        // 100→0, 200→1, 500→2, 700→3
    }

    for (int x : compressed) cout << x << " ";
    cout << "\n";  // 0 2 1 0 3 1

    return 0;
}

⚠️ Common Mistakes in Chapter 3.3

  1. Sorting with wrong comparator: Your lambda must return true if a should come BEFORE b. If it returns true for a == b, you get undefined behavior (strict weak ordering violation).
  2. Binary search on unsorted array: lower_bound and upper_bound assume sorted order. On unsorted data, results are meaningless.
  3. Off-by-one in binary search: lo <= hi vs lo < hi matters. When in doubt, test your binary search on a 1-element and 2-element array.
  4. Wrong answer range in "binary search on answer": If the answer could be 0, set lo = 0, not lo = 1. If it could be very large, make sure hi is large enough (use long long if necessary).
  5. Integer overflow in mid computation: Always write mid = lo + (hi - lo) / 2, never (lo + hi) / 2.

Chapter Summary

📌 Key Takeaways

| Operation | Method | Time Complexity | Notes |
|---|---|---|---|
| Sort ascending | sort(v.begin(), v.end()) | O(N log N) | Uses Introsort |
| Sort descending | sort(..., greater<int>()) | O(N log N) | |
| Custom sort | Lambda comparator | O(N log N) | Must be a strict weak order |
| Find exact value | binary_search | O(log N) | Returns bool |
| First index ≥ x | lower_bound | O(log N) | Returns iterator |
| First index > x | upper_bound | O(log N) | Returns iterator |
| Count of value x | ub - lb | O(log N) | |
| Binary search on answer | Manual BS + check() | O(f(N) log V) | V = answer range |
| Coordinate compression | sort + unique + lower_bound | O(N log N) | Map large values to small indices |

🧩 Binary Search Template Quick Reference

| Scenario | lo/hi init | Update rule | Answer |
|---|---|---|---|
| Maximize value satisfying condition | lo=min, hi=max | check(mid) → ans=mid, lo=mid+1 | ans |
| Minimize value satisfying condition | lo=min, hi=max | check(mid) → hi=mid | lo (when loop ends) |
| Floating-point binary search | lo=min, hi=max | Loop ~100 times: check(mid) → hi=mid else lo=mid | lo ≈ hi |

❓ FAQ

Q1: Is sort's time complexity O(N log N) or O(N²)?

A: C++'s std::sort uses Introsort (a hybrid of Quicksort + Heapsort + Insertion sort), guaranteeing O(N log N) worst case. No need to worry about degrading to O(N²). But note: if your custom comparator doesn't satisfy strict weak ordering, behavior is undefined (may infinite loop or crash).

Q2: What's the difference between lo <= hi and lo < hi in binary search?

A: The two styles correspond to different templates:

  • while (lo <= hi): when search ends, lo > hi, answer is stored in answer variable. Good for "find target value" or "maximize value satisfying condition".
  • while (lo < hi): when search ends, lo == hi, answer is lo. Good for "minimize value satisfying condition". Both can solve all problems; the key is pairing with the correct update rule. Beginners should pick one style and stick with it.

Q3: What problems is "binary search on answer" applicable to? How to identify them?

A: Three signals: ① The problem asks "the maximum/minimum X such that..."; ② There exists a decision function check(X) that can determine feasibility in polynomial time; ③ The decision function is monotone (X feasible → X-1 also feasible, or vice versa). If all three hold, binary search on answer applies.

Q4: What is coordinate compression actually useful for?

A: When the value range is large (e.g., 10^9) but the number of distinct values is small (e.g., 10^5), coordinate compression maps large values to small indices 0~N-1. This lets you use arrays instead of maps (faster), or perform prefix sums/BIT operations over the value domain. Frequently needed in USACO Silver.

Q5: Why can't the sort comparator use <=?

A: C++ sorting requires the comparator to satisfy strict weak ordering: when a == b, comp(a,b) must return false. <= returns true when a==b, violating this rule. The result is undefined behavior — may infinite loop, crash, or produce incorrect ordering.

🔗 Connections to Later Chapters

  • Chapter 3.4 (Two Pointers): two-pointer techniques are often used after sorting — sort first O(N log N), then two pointers O(N)
  • Chapter 3.2 (Prefix Sums): prefix sum arrays are naturally ordered, enabling binary search on them (e.g., find first prefix sum ≥ target)
  • Chapters 4.1 & 5.4 (Greedy + Shortest Paths): Dijkstra internally uses a priority queue + greedy strategy, fundamentally related to sorting
  • Chapter 6.2 (DP): LIS (Longest Increasing Subsequence) can be optimized to O(N log N) using binary search
  • "Binary search on answer" is one of the most core techniques in USACO Silver, also frequently combined in Chapter 4.1 (Greedy)

Practice Problems

Problem 3.3.1 — Closest Pair 🟢 Easy Read N integers. Find the pair with the minimum difference. Print that difference.

Hint Sort the array. The closest pair must be adjacent after sorting — scan pairs and take the minimum difference.

Problem 3.3.2 — Room Allocation 🟡 Medium Read N events, each with start and end time. What is the maximum number of events that overlap at any single moment? (Hint: sort start/end times together and sweep)

Hint Create an array of events: (time, +1 for start, -1 for end). Sort by time. Sweep and maintain a running count of active events; track the maximum.

Problem 3.3.3 — Kth Smallest 🟡 Medium Read N integers. Find the K-th smallest element (1-indexed).

Solution sketch: Binary search on the answer X. Count how many elements ≤ X using a scan — this is O(N). Total: O(N log(max_value)).

Hint Alternatively, just sort and return v[K-1]. But try the binary search approach for practice!

Problem 3.3.4 — Aggressive Cows (USACO 2011 March Silver) 🔴 Hard N stalls at positions p[1..N], place C cows to maximize the minimum distance between any two cows. (Full implementation of the example above.)

Solution sketch: Sort stalls. Binary search on minimum distance D. For each D, greedily place cows: always place next cow at the earliest stall that is ≥ D away from the last cow.

Hint The check function `canPlace(D)` runs in `O(N)` by scanning sorted stalls greedily. Total time: `O(N log N)` sort + `O(N log(max_dist))` binary search.

Problem 3.3.5 — Binary Search on Answer: Painter's Partition 🔴 Hard N boards with widths w[1..N]. K painters work in parallel; each takes 1 unit of time per unit of width. Assign each painter a contiguous block of boards so as to minimize the finish time (the maximum total width any single painter paints).

Solution sketch: Binary search on the answer T (max time any painter works). Check: greedily assign boards to painters, starting a new painter whenever the current one would exceed T. If ≤ K painters suffice, T is feasible.

Hint Feasibility check: simulate greedily — run left to right, assign boards to current painter until adding the next board would exceed T. `O(N)` per check, `O(log(sum))` binary search iterations.

🏆 Challenge Problem: USACO 2016 February Silver: Fencing the Cows Enclose all N points with the minimum total length of fencing. This is the Convex Hull problem — look up the Graham scan or Jarvis march algorithms. While this is a Gold-level topic, thinking about it now will prime your intuition.


3.3.7 Advanced Binary Search on Answer — Three Examples

Example 1: Task Distribution (Minimize the Bottleneck)

Problem: M tasks with effort[i] and K workers; each worker takes a contiguous block of tasks. Minimize the maximum total effort any single worker is assigned (the bottleneck).

This is the "Painter's Partition" problem. Binary search on the answer (max time T), check if T is achievable.

// Check: can we distribute tasks among K workers so max work <= T?
bool canFinish(vector<int>& tasks, int K, long long T) {
    int workers = 1;
    long long current = 0;
    for (int t : tasks) {
        if (t > T) return false;  // single task exceeds T — impossible
        if (current + t > T) {
            workers++;             // start new worker
            current = t;
            if (workers > K) return false;
        } else {
            current += t;
        }
    }
    return true;
}

// Binary search on T
long long lo = *max_element(tasks.begin(), tasks.end());  // minimum possible T
long long hi = accumulate(tasks.begin(), tasks.end(), 0LL);  // maximum T (1 worker)

while (lo < hi) {
    long long mid = lo + (hi - lo) / 2;
    if (canFinish(tasks, K, mid)) hi = mid;  // mid works, try smaller
    else lo = mid + 1;                        // mid doesn't work, need larger
}
cout << lo << "\n";  // minimum possible maximum time

📝 Note: Here we binary search for the minimum feasible T, so we use hi = mid when feasible (not answer = mid; lo = mid+1). The two templates are mirror images.

Example 2: Kth Smallest in Multiplication Table

Problem: N×M multiplication table. Find the Kth smallest value.

The table has values i*j for 1≤i≤N, 1≤j≤M. Binary search on the answer X: count how many values are ≤ X.

// Count values <= X in N×M multiplication table
long long countLE(long long X, int N, int M) {
    long long count = 0;
    for (int i = 1; i <= N; i++) {
        count += min((long long)M, X / i);
        // Row i has values i, 2i, ..., Mi
        // Count of values <= X in row i: min(M, floor(X/i))
    }
    return count;
}

// Binary search for Kth smallest
long long lo = 1, hi = (long long)N * M;
while (lo < hi) {
    long long mid = lo + (hi - lo) / 2;
    if (countLE(mid, N, M) >= K) hi = mid;
    else lo = mid + 1;
}
cout << lo << "\n";

Complexity: O(N log(NM)) total (O(N) per check, O(log(NM)) iterations).

Example 3: USACO-Style Cable Length (Agri-Net inspired)

Problem: Given N farm locations and the cable lengths needed between pairs of farms, connect them all. Find the minimum L such that the farms can be fully connected using only cables of length ≤ L.

// Binary search on the minimum cable length L (monotone: larger L only helps)
// Check: does a spanning tree exist using only edges of length <= L?
// (This reduces to: is the graph connected when restricted to edges <= L?)
bool canConnect(vector<tuple<int,int,int>>& edges, int n, int L) {
    DSU dsu(n);
    for (auto [w, u, v] : edges) {
        if (w <= L) dsu.unite(u, v);
    }
    return dsu.components == 1;  // all nodes connected
}

3.3.8 lower_bound / upper_bound Complete Cheat Sheet

vector<int> v = {1, 3, 3, 5, 7, 9, 9, 11};
//                0  1  2  3  4  5  6   7

// ── lower_bound: first position >= x ──
lower_bound(all, 3)  → index 1  (first 3)
lower_bound(all, 4)  → index 3  (first element >= 4, which is 5)
lower_bound(all, 12) → index 8  (past-end: no element ≥ 12 exists in the array)

// ── upper_bound: first position > x ──
upper_bound(all, 3)  → index 3  (first element after all 3s)
upper_bound(all, 4)  → index 3  (same as above: no 4s)
upper_bound(all, 11) → index 8  (past-end)

// ── Derived operations ──
// Count occurrences of x:
ub(x) - lb(x) = upper_bound(all,3) - lower_bound(all,3) = 3-1 = 2 ✓

// Does x exist?
binary_search(all, x)  // O(log N), returns bool

// Largest value <= x (floor):
auto it = upper_bound(all, x);
if (it != v.begin()) cout << *prev(it);  // *--it

// Smallest value >= x (ceil):
auto it = lower_bound(all, x);
if (it != v.end()) cout << *it;

// Largest value < x (strict floor):
auto it = lower_bound(all, x);
if (it != v.begin()) cout << *prev(it);

// Count elements < x:
lower_bound(all, x) - v.begin()

// Count elements <= x:
upper_bound(all, x) - v.begin()

// Count elements in range [a, b]:
upper_bound(all, b) - lower_bound(all, a)

| Goal | Code | Note |
|---|---|---|
| First index ≥ x | lower_bound(v.begin(), v.end(), x) - v.begin() | Equals v.size() if all < x |
| First index > x | upper_bound(v.begin(), v.end(), x) - v.begin() | |
| Count of value x | upper_bound(...,x) - lower_bound(...,x) | |
| Largest value ≤ x | *prev(upper_bound(...,x)) | Check iterator ≠ begin |
| Smallest value ≥ x | *lower_bound(...,x) | Check iterator ≠ end |
| Does x exist? | binary_search(...) | Returns bool |

3.3.9 Binary Search with a Custom Predicate

For non-standard sorted structures or custom criteria:

// Binary search with custom predicate
// Find first index i where pred(i) is true, in range [lo, hi]
// Assumption: pred is monotone: false...false, true...true

int lo = 0, hi = n - 1, answer = -1;
while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (/* some condition on mid */) {
        answer = mid;
        hi = mid - 1;  // look for smaller index
    } else {
        lo = mid + 1;
    }
}

// Example: first index where arr[i] >= arr[i-1] + 10 (gap >= 10)
int lo = 1, hi = n - 1, firstLargeGap = -1;
while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (arr[mid] - arr[mid-1] >= 10) {
        firstLargeGap = mid;
        hi = mid - 1;
    } else {
        lo = mid + 1;
    }
}

// Floating point binary search (epsilon-based)
double lo_f = 0.0, hi_f = 1e9;
for (int iter = 0; iter < 100; iter++) {  // 100 halvings: interval shrinks by 2^100, below double precision
    double mid = (lo_f + hi_f) / 2;
    if (check(mid)) hi_f = mid;
    else lo_f = mid;
}
// Answer: lo_f (or hi_f, they converge to same value)

🏆 USACO Pro Tip: "Binary search on answer" is one of the most common Silver techniques. When you see "maximize/minimize X subject to [constraint]," ask yourself: Is the feasibility function monotone? If yes, binary search.


3.3.10 Ternary Search — Finding the Peak of a Unimodal Function

Binary search requires a monotone predicate (false→true boundary). For unimodal functions (increases then decreases), use ternary search to find the maximum.

💡 When to use: A function f is unimodal on [lo, hi] if it first strictly increases then strictly decreases (or is always one direction). Ternary search finds the maximum point in O(log((hi-lo)/eps)) evaluations.

USACO appearances: Problems where the answer depends on a continuous parameter (e.g., "find the optimal point on a line to minimize the sum of distances to a set of points") sometimes require ternary search.

// Ternary search: find maximum of unimodal function f on [lo, hi]
// Prerequisite: f increases then decreases (unimodal)
// Time: O(log((hi-lo)/eps)) for continuous, or O(log N) for integers

// f must be declared/defined before calling this
double ternarySearch(double lo, double hi) {
    for (int iter = 0; iter < 200; iter++) {
        double m1 = lo + (hi - lo) / 3;
        double m2 = hi - (hi - lo) / 3;
        if (f(m1) < f(m2)) lo = m1;  // maximum is in [m1, hi]
        else hi = m2;                 // maximum is in [lo, m2]
    }
    return (lo + hi) / 2;  // Maximum point (lo ≈ hi after convergence)
}

// Integer ternary search (when f is defined on integers):
int ternarySearchInt(int lo, int hi) {
    // Use > 2 rather than >= 2: keep at least 3 candidate values for the
    // final brute-force pass. Once hi - lo <= 2, (hi - lo) / 3 == 0, so
    // m1 and m2 collapse onto lo and hi; stopping here and enumerating the
    // survivors avoids any non-shrinking loop and handles boundaries correctly.
    while (hi - lo > 2) {
        int m1 = lo + (hi - lo) / 3;
        int m2 = hi - (hi - lo) / 3;
        if (f(m1) < f(m2)) lo = m1 + 1;
        else hi = m2 - 1;
    }
    // Check remaining candidates [lo, hi] (at most 3 elements)
    int best = lo;
    for (int x = lo + 1; x <= hi; x++)
        if (f(x) > f(best)) best = x;
    return best;
}

Contrast with binary search:

| | Binary Search | Ternary Search |
|---|---|---|
| Requires | Monotone predicate | Unimodal function |
| Finds | Boundary (false→true) | Peak (maximum/minimum) |
| Each step eliminates | Half the range | One-third of the range |
| Iterations for ε precision | log₂(range/ε) | log₃/₂(range/ε) ≈ 1.7× more |

⚠️ Note: Ternary search on integers requires care — use while (hi - lo > 2) to avoid infinite loops when the range shrinks to 2 or 3 elements, then brute-force the remaining candidates.


📖 Chapter 3.4 ⏱️ ~50 min read 🎯 Intermediate

Chapter 3.4: Two Pointers & Sliding Window

📝 Before You Continue: You should be comfortable with arrays, vectors, and std::sort (Chapters 2.3–3.3). This technique requires a sorted array for the classic two-pointer approach.

Two pointers and sliding window are among the most elegant tricks in competitive programming. They transform naive O(N²) solutions into O(N) by exploiting monotonicity: as one pointer moves forward, the other never needs to go backward.


3.4.1 The Two Pointer Technique

The idea: maintain two indices, left and right, into a sorted array. Move them toward each other (or in the same direction) based on the current sum/window.

When to use:

  • Finding a pair/triplet with a given sum in a sorted array
  • Checking if a sorted array contains two elements with a specific relationship
  • Problems where "if we can do X with window size k, we can do X with window size k-1"

Two Pointer Technique

The diagram shows how two pointers converge toward the center, each step eliminating an entire row/column of pairs from consideration.

Problem: Find All Pairs with Sum = Target

Naïve O(N²) approach:

// O(N²): check every pair
for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
        if (arr[i] + arr[j] == target) {
            cout << arr[i] << " + " << arr[j] << "\n";
        }
    }
}

Two Pointer O(N) approach (requires sorted array):

// Solution: Two Pointer — O(N log N) for sort + O(N) for search
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, target;
    cin >> n >> target;
    vector<int> arr(n);
    for (int &x : arr) cin >> x;

    sort(arr.begin(), arr.end());  // MUST sort first

    int left = 0, right = n - 1;
    while (left < right) {
        int sum = arr[left] + arr[right];
        if (sum == target) {
            cout << arr[left] << " + " << arr[right] << " = " << target << "\n";
            left++;
            right--;  // advance both pointers
        } else if (sum < target) {
            left++;   // sum too small: move left pointer right (increase sum)
        } else {
            right--;  // sum too large: move right pointer left (decrease sum)
        }
    }

    return 0;
}

Why Does This Work?

Key insight: After sorting, if arr[left] + arr[right] < target, then no element smaller than arr[right] can pair with arr[left] to reach target. So we safely advance left.

Similarly, if the sum is too large, no element larger than arr[left] can pair with arr[right] to reach target. So we safely decrease right.

Each step eliminates at least one element from consideration → O(N) total steps.

Complete Trace

Array = [1, 2, 3, 4, 5, 6, 7, 8], target = 9:

State: left=0(1), right=7(8)
  sum = 1+8 = 9 ✓ → print (1,8), left++, right--

State: left=1(2), right=6(7)
  sum = 2+7 = 9 ✓ → print (2,7), left++, right--

State: left=2(3), right=5(6)
  sum = 3+6 = 9 ✓ → print (3,6), left++, right--

State: left=3(4), right=4(5)
  sum = 4+5 = 9 ✓ → print (4,5), left++, right--

State: left=4, right=3 → left >= right, STOP

All pairs: (1,8), (2,7), (3,6), (4,5)

3-Sum Extension

Finding a triplet that sums to target: fix one element, use two pointers for the remaining pair.

// O(N²) — much better than O(N³) brute force
sort(arr.begin(), arr.end());
for (int i = 0; i < n - 2; i++) {
    int left = i + 1, right = n - 1;
    while (left < right) {
        int sum = arr[i] + arr[left] + arr[right];
        if (sum == target) {
            cout << arr[i] << " " << arr[left] << " " << arr[right] << "\n";
            left++; right--;
        } else if (sum < target) left++;
        else right--;
    }
}

3.4.2 Sliding Window — Fixed Size

A sliding window of fixed size K moves across an array, maintaining a running aggregate (sum, max, count of distinct, etc.).

Problem: Find the maximum sum of any contiguous subarray of size K.

Array: [2, 1, 5, 1, 3, 2], K=3
Windows: [2,1,5]=8, [1,5,1]=7, [5,1,3]=9, [1,3,2]=6
Answer: 9

Naïve O(NK): Compute sum from scratch for each window.

Sliding window O(N): Add the new element entering the window, subtract the element leaving.

// Solution: Sliding Window Fixed Size — O(N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, k;
    cin >> n >> k;
    vector<int> arr(n);
    for (int &x : arr) cin >> x;

    // Compute sum of first window
    long long windowSum = 0;
    for (int i = 0; i < k; i++) windowSum += arr[i];

    long long maxSum = windowSum;

    // Slide the window: add arr[i], remove arr[i-k]
    for (int i = k; i < n; i++) {
        windowSum += arr[i];        // new element enters window
        windowSum -= arr[i - k];   // old element leaves window
        maxSum = max(maxSum, windowSum);
    }

    cout << maxSum << "\n";
    return 0;
}

Trace for [2, 1, 5, 1, 3, 2], K=3:

Initial window [2,1,5]: sum=8, max=8
i=3: add 1, remove 2 → sum=7, max=8
i=4: add 3, remove 1 → sum=9, max=9
i=5: add 2, remove 5 → sum=6, max=9
Answer: 9 ✓

3.4.3 Sliding Window — Variable Size

The most powerful variant: the window expands when we need more, and shrinks when a constraint is violated.

Problem: Find the smallest contiguous subarray with sum ≥ target.

// Solution: Variable Window — O(N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, target;
    cin >> n >> target;
    vector<int> arr(n);
    for (int &x : arr) cin >> x;

    int left = 0;
    long long windowSum = 0;
    int minLen = INT_MAX;

    for (int right = 0; right < n; right++) {
        windowSum += arr[right];   // expand: add right element

        // Shrink window from left while constraint satisfied
        while (windowSum >= target) {
            minLen = min(minLen, right - left + 1);
            windowSum -= arr[left];
            left++;                // shrink: remove left element
        }
    }

    if (minLen == INT_MAX) cout << 0 << "\n";  // no such subarray
    else cout << minLen << "\n";

    return 0;
}

Why O(N)? Each element is added once (when right passes it) and removed at most once (when left passes it). Total operations: O(2N) = O(N).

Problem: Longest Subarray with At Most K Distinct Values

// Variable window: longest subarray with at most K distinct values
int left = 0, maxLen = 0;
map<int, int> freq;  // frequency of each value in window

for (int right = 0; right < n; right++) {
    freq[arr[right]]++;

    // Shrink while we have > k distinct values
    while ((int)freq.size() > k) {
        freq[arr[left]]--;
        if (freq[arr[left]] == 0) freq.erase(arr[left]);
        left++;
    }

    maxLen = max(maxLen, right - left + 1);
}
cout << maxLen << "\n";

3.4.4 USACO Example: Haybale Stacking

Problem (USACO 2012 November Bronze): N haybales in a line. M operations, each adds 1 to all bales in range [a, b]. How many bales have an odd number of additions at the end?

This is actually best solved with a difference array (Chapter 3.2). Here is a simpler, related warm-up that two pointers handle directly:

Problem: Given array of integers, find the longest subarray where all elements are ≥ K.

// Two pointer: longest contiguous subarray where all elements >= K
int left = 0, maxLen = 0;
for (int right = 0; right < n; right++) {
    if (arr[right] < K) {
        left = right + 1;  // reset window: current element violates constraint
    } else {
        maxLen = max(maxLen, right - left + 1);
    }
}

⚠️ Common Mistakes

  1. Not sorting before two-pointer: The two-pointer technique for pair sum only works on sorted arrays. Without sorting, you'll miss pairs or get wrong answers.
  2. Not advancing both pointers when a pair is found: When arr[left] + arr[right] == target, advance BOTH left++ AND right-- (with distinct values, neither endpoint can form another valid pair). Advancing neither loops forever; with duplicates, add explicit skip logic.
  3. Off-by-one in window size: The window [left, right] has size right - left + 1, not right - left.
  4. Forgetting to handle empty answer: For the "minimum subarray" problem, initialize minLen = INT_MAX and check if it changed before outputting.

Chapter Summary

📌 Key Takeaways

Technique                  Constraint      Time    Space      Key Idea
Two pointer (pairs)        Sorted array    O(N)    O(1)       Approach from both ends, eliminate impossible pairs
Two pointer (3-sum)        Sorted array    O(N²)   O(1)       Fix one, use two pointers on the rest
Sliding window (fixed)     Any             O(N)    O(1)       Add new element, remove old element
Sliding window (variable)  Any             O(N)    O(1)–O(N)  Expand right end, shrink left end

❓ FAQ

Q1: Does two-pointer always require sorting?

A: Not necessarily. "Opposite-direction two pointers" (like pair sum) require sorting; "same-direction two pointers" (like sliding window) do not. The key is monotonicity — pointers only move in one direction.

Q2: Both sliding window and prefix sum can compute range sums — which to use?

A: For fixed-size window sum/max, sliding window is more intuitive. For arbitrary range queries, prefix sum is more general. Sliding window can only handle "continuously moving windows"; prefix sum can answer any [L,R] query.

Q3: Can sliding window handle both "longest subarray satisfying condition" and "shortest subarray satisfying condition"?

A: Both, but with slightly different logic. "Longest": expand right until condition fails, then shrink left until condition holds again. "Shortest": expand right until condition holds, then shrink left until it no longer holds, recording the minimum length throughout.

Q4: How does two-pointer handle duplicate elements?

A: Depends on the problem. If you want "all distinct pair values", after finding a pair do left++; right-- and skip duplicate values. If you want "count of all pairs", you need to carefully count duplicates (may require extra counting logic).

🔗 Connections to Later Chapters

  • Chapter 3.2 (Prefix Sums): prefix sums and sliding window are complementary — prefix sums suit offline queries, sliding window suits online processing
  • Chapter 3.3 (Sorting): sorting is a prerequisite for two pointers — opposite-direction two pointers require a sorted array
  • Chapter 3.5 (Monotonic): monotonic deque can enhance sliding window — maintaining window min/max in O(N)
  • Chapters 6.1–6.3 (DP): some problems (like LIS variants) can be optimized with two pointers

Practice Problems

Problem 3.4.1 — Pair Sum Count 🟢 Easy Given N integers and a target T, count the number of pairs (i < j) with arr[i] + arr[j] = T.

Hint Sort the array first. Use two pointers from both ends. When a pair is found, both advance. Handle duplicate elements carefully.

Problem 3.4.2 — Maximum Average Subarray 🟡 Medium Find the contiguous subarray of length exactly K with the maximum average. Print the average as a fraction or decimal.

Hint Use fixed-size sliding window to find the maximum sum of K elements. Average = maxSum / K.

Problem 3.4.3 — Minimum Window Covering 🔴 Hard Given string S and string T, find the shortest substring of S that contains all characters of T.

Hint Variable sliding window. Use a frequency map for T characters needed. Expand right until all T chars covered; shrink left while still covered. Track minimum window length.

🏆 Challenge: USACO 2017 February Bronze — Why Did the Cow Cross the Road Given a grid with cows and their destinations, find which cow can reach its destination fastest. Use two-pointer / greedy on sorted intervals.

📖 Chapter 3.5 ⏱️ ~50 min read 🎯 Intermediate

Chapter 3.5: Monotonic Stack & Monotonic Queue

📝 Before You Continue: Make sure you're comfortable with two pointers / sliding window (Chapter 3.4) and basic stack/queue operations (Chapter 3.1). This chapter builds directly on those techniques.

Monotonic stacks and queues are elegant tools that solve "nearest greater/smaller element" and "sliding window extremum" problems in O(N) time — problems that would naively require O(N²).


3.5.1 Monotonic Stack: Next Greater Element

Problem: Given an array A of N integers, for each element A[i], find the next greater element (NGE): the index of the first element to the right of i that is greater than A[i]. If none exists, output -1.

Naive approach: O(N²) — for each i, scan right until finding a greater element.

Monotonic stack approach: O(N) — maintain a stack that is always decreasing from bottom to top. When we push a new element, pop all smaller elements first (they just found their NGE!).

💡 Key Insight: The stack contains indices of elements that haven't found their NGE yet. When A[i] arrives, every element in the stack that is smaller than A[i] has found its NGE (it's i!). We pop them and record the answer.

Monotonic stack states as they change (A=[2,1,5,6,2,3]):

flowchart LR
    subgraph i0["i=0, A[0]=2"]
        direction TB
        ST0["Stack: [0]↓\nbottom→top: [2]"]
    end
    subgraph i1["i=1, A[1]=1"]
        direction TB
        ST1["1<2, push directly\nStack: [0,1]↓\nbottom→top: [2,1]"]
    end
    subgraph i2["i=2, A[2]=5"]
        direction TB
        ST2["5>1: pop 1, NGE[1]=2\n5>2: pop 0, NGE[0]=2\nStack: [2]↓\nbottom→top: [5]"]
    end
    subgraph i3["i=3, A[3]=6"]
        direction TB
        ST3["6>5: pop 2, NGE[2]=3\nStack: [3]↓\nbottom→top: [6]"]
    end
    subgraph i4["i=4, A[4]=2"]
        direction TB
        ST4["2<6, push directly\nStack: [3,4]↓\nbottom→top: [6,2]"]
    end
    subgraph i5["i=5, A[5]=3"]
        direction TB
        ST5["3>2: pop 4, NGE[4]=5\n3<6, push\nStack: [3,5]↓\nno NGE for remaining"]
    end
    i0 --> i1 --> i2 --> i3 --> i4 --> i5
    style ST2 fill:#dcfce7,stroke:#16a34a
    style ST3 fill:#dcfce7,stroke:#16a34a
    style ST5 fill:#dcfce7,stroke:#16a34a

💡 Summary: The stack always stays monotonically decreasing (larger values at the bottom, smaller at the top). Each element is pushed at most once and popped at most once, for O(2N) = O(N) total operations.

Array A: [2, 1, 5, 6, 2, 3]
         idx: 0  1  2  3  4  5

Processing i=0 (A[0]=2): stack empty → push 0
Stack: [0]          // stack holds indices of unresolved elements

Processing i=1 (A[1]=1): A[1]=1 < A[0]=2 → just push
Stack: [0, 1]

Processing i=2 (A[2]=5): 
  A[2]=5 > A[1]=1 → pop 1, NGE[1] = 2  (A[2]=5 is next greater for A[1])
  A[2]=5 > A[0]=2 → pop 0, NGE[0] = 2  (A[2]=5 is next greater for A[0])
  Stack empty → push 2
Stack: [2]

Processing i=3 (A[3]=6): 
  A[3]=6 > A[2]=5 → pop 2, NGE[2] = 3
  Push 3
Stack: [3]

Processing i=4 (A[4]=2): A[4]=2 < A[3]=6 → just push
Stack: [3, 4]

Processing i=5 (A[5]=3): 
  A[5]=3 > A[4]=2 → pop 4, NGE[4] = 5
  A[5]=3 < A[3]=6 → stop, push 5
Stack: [3, 5]

End: remaining stack [3, 5] → NGE[3] = NGE[5] = -1 (no greater element to the right)

Result: NGE = [2, 2, 3, -1, 5, -1]
Verify: 
  A[0]=2, next greater is A[2]=5 ✓
  A[1]=1, next greater is A[2]=5 ✓
  A[2]=5, next greater is A[3]=6 ✓
  A[3]=6, no greater → -1 ✓

Complete Implementation

// Solution: Next Greater Element using Monotonic Stack — O(N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> A(n);
    for (int& x : A) cin >> x;

    vector<int> nge(n, -1);   // nge[i] = index of next greater element, -1 if none
    stack<int> st;             // monotonic decreasing stack (stores indices)

    for (int i = 0; i < n; i++) {
        // While the top of stack has a smaller value than A[i]
        // → the current element A[i] is the NGE of all those elements
        while (!st.empty() && A[st.top()] < A[i]) {
            nge[st.top()] = i;  // ← KEY: record NGE for stack top
            st.pop();
        }
        st.push(i);  // push current index (not yet resolved)
    }
    // Remaining elements in stack have no NGE → already initialized to -1

    for (int i = 0; i < n; i++) {
        cout << nge[i];
        if (i < n - 1) cout << " ";
    }
    cout << "\n";

    return 0;
}

Complexity Analysis:

  • Each element is pushed exactly once and popped at most once
  • Total operations: O(2N) = O(N)
  • Space: O(N) for the stack

⚠️ Common Mistake: Storing values instead of indices in the stack. Always store indices — you need to know where in the array to record the answer.


3.5.2 Variations: Previous Smaller, Previous Greater

By changing the comparison direction and where you read the answer, you get four related problems (each "Previous" variant can also be computed right-to-left with the record-on-pop pattern):

Problem                Stack Type   Traversal                                   Use Case
Next Greater Element   Decreasing   Left → Right (record answer on pop)         Stock price problems
Next Smaller Element   Increasing   Left → Right (record answer on pop)         Histogram problems
Previous Greater       Decreasing   Left → Right (read stack top before push)   Range problems
Previous Smaller       Increasing   Left → Right (read stack top before push)   Nearest smaller to left

Template for Previous Smaller Element:

// Previous Smaller Element: for each i, find the nearest j < i where A[j] < A[i]
vector<int> pse(n, -1);  // pse[i] = index of previous smaller, -1 if none
stack<int> st;

for (int i = 0; i < n; i++) {
    while (!st.empty() && A[st.top()] >= A[i]) {
        st.pop();  // pop elements that are >= A[i] (not the "previous smaller")
    }
    pse[i] = st.empty() ? -1 : st.top();  // stack top is the previous smaller
    st.push(i);
}

3.5.3 USACO Application: Largest Rectangle in Histogram

Problem: Given an array of heights H[0..N-1], find the area of the largest rectangle that fits under the histogram.

Key insight: For each bar i, the largest rectangle with height H[i] extends left and right until it hits a shorter bar. Use monotonic stack to find, for each i:

  • left[i] = previous smaller element index
  • right[i] = next smaller element index

Boundary computation for the largest rectangle in a histogram (H=[2,1,5,6,2,3]):

flowchart LR
    subgraph bars["Left/right boundaries for each bar"]
        direction TB
        B0["i=0, H=2\nleft=-1, right=1\nwidth=1, area=2"]
        B1["i=1, H=1\nleft=-1, right=6\nwidth=6, area=6"]
        B2["i=2, H=5\nleft=1, right=4\nwidth=2, area=10 ⭐"]
        B3["i=3, H=6\nleft=2, right=4\nwidth=1, area=6"]
        B4["i=4, H=2\nleft=1, right=6\nwidth=4, area=8"]
        B5["i=5, H=3\nleft=4, right=6\nwidth=1, area=3"]
    end
    note["Maximum area = 10\n(i=2, height=5, width=2)"]
    style B2 fill:#dcfce7,stroke:#16a34a
    style note fill:#f0fdf4,stroke:#16a34a

💡 Formula: width = right[i] - left[i] - 1, area = H[i] × width. The left boundary is the index of the first smaller element to the left; the right boundary is the index of the first smaller element to the right.

// Solution: Largest Rectangle in Histogram — O(N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> H(n);
    for (int& h : H) cin >> h;

    // Find previous smaller for each position
    vector<int> left(n), right(n);
    stack<int> st;

    // Previous smaller (left boundary)
    for (int i = 0; i < n; i++) {
        while (!st.empty() && H[st.top()] >= H[i]) st.pop();
        left[i] = st.empty() ? -1 : st.top();  // index before rectangle starts
        st.push(i);
    }

    while (!st.empty()) st.pop();

    // Next smaller (right boundary)
    for (int i = n - 1; i >= 0; i--) {
        while (!st.empty() && H[st.top()] >= H[i]) st.pop();
        right[i] = st.empty() ? n : st.top();  // index after rectangle ends
        st.push(i);
    }

    // Compute maximum area
    long long maxArea = 0;
    for (int i = 0; i < n; i++) {
        long long width = right[i] - left[i] - 1;  // width of rectangle
        long long area = (long long)H[i] * width;
        maxArea = max(maxArea, area);
    }

    cout << maxArea << "\n";
    return 0;
}

Trace for H = [2, 1, 5, 6, 2, 3]:

left  = [-1, -1, 1, 2, 1, 4]   (index of previous smaller, -1 = none)
right = [1, 6, 4, 4, 6, 6]     (index of next smaller, n=6 = none)

Widths:  1-(-1)-1=1, 6-(-1)-1=6, 4-1-1=2, 4-2-1=1, 6-1-1=4, 6-4-1=1
Areas:   2×1=2, 1×6=6, 5×2=10, 6×1=6, 2×4=8, 3×1=3

Maximum area = 10
  i=2: H[2]=5, left[2]=1, right[2]=4, width=4-1-1=2, area=5×2=10 ✓
  (bars at indices 2 and 3 both have height ≥ 5, so the rectangle of height 5 spans width 2)

📌 Note for Students: Always trace through your algorithm on the sample input before submitting. Small off-by-one errors in index boundary calculations are the #1 source of bugs in monotonic stack problems.


3.5.4 Monotonic Deque: Sliding Window Maximum

Problem: Given array A of N integers and window size K, find the maximum value in each window of size K as it slides from left to right. Output N-K+1 values.

Naive approach: O(NK) — scan each window for its maximum.

Monotonic deque approach: O(N) — maintain a decreasing deque (front = maximum of current window).

💡 Key Insight: We want the maximum in a sliding window. We maintain a deque of indices such that:

  1. The deque is decreasing in value (front is always the maximum)
  2. The deque only contains indices within the current window

When a new element arrives:

  • Remove all smaller elements from the back (they can never be the maximum while this new element is in the window)
  • Remove the front if it's outside the current window

Step-by-Step Trace

Array A: [1, 3, -1, -3, 5, 3, 6, 7], K = 3

Window [1,3,-1]: max = 3
Window [3,-1,-3]: max = 3
Window [-1,-3,5]: max = 5
Window [-3,5,3]: max = 5
Window [5,3,6]: max = 6
Window [3,6,7]: max = 7

i=0, A[0]=1: deque=[0]
i=1, A[1]=3: 3>1 → pop 0; deque=[1]
i=2, A[2]=-1: -1<3 → push; deque=[1,2]; window [0..2]: max=A[1]=3 ✓
i=3, A[3]=-3: -3<-1 → push; deque=[1,2,3]; window [1..3]: front=1 still in window, max=A[1]=3 ✓
i=4, A[4]=5: 5>-3→pop 3; 5>-1→pop 2; 5>3→pop 1; deque=[4]; window [2..4]: max=A[4]=5 ✓
i=5, A[5]=3: 3<5→push; deque=[4,5]; window [3..5]: front=4 in window, max=A[4]=5 ✓
i=6, A[6]=6: 6>3→pop 5; 6>5→pop 4; deque=[6]; window [4..6]: max=A[6]=6 ✓
i=7, A[7]=7: 7>6→pop 6; deque=[7]; window [5..7]: max=A[7]=7 ✓

Complete Implementation

// Solution: Sliding Window Maximum using Monotonic Deque — O(N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, k;
    cin >> n >> k;
    vector<int> A(n);
    for (int& x : A) cin >> x;

    deque<int> dq;   // monotonic decreasing deque, stores indices
    vector<int> result;

    for (int i = 0; i < n; i++) {
        // 1. Remove elements outside the current window
        while (!dq.empty() && dq.front() <= i - k) {
            dq.pop_front();   // ← KEY: expired window front
        }

        // 2. Maintain decreasing property
        //    Remove from back all elements smaller than A[i]
        //    (they'll never be the max while A[i] is in the window)
        while (!dq.empty() && A[dq.back()] <= A[i]) {
            dq.pop_back();    // ← KEY: pop smaller elements from back
        }

        dq.push_back(i);   // add current element

        // 3. Record maximum once first full window is formed
        if (i >= k - 1) {
            result.push_back(A[dq.front()]);  // front = maximum of current window
        }
    }

    for (int i = 0; i < (int)result.size(); i++) {
        cout << result[i];
        if (i + 1 < (int)result.size()) cout << "\n";
    }
    cout << "\n";

    return 0;
}

Complexity:

  • Each element is pushed/popped from the deque at most once → O(N) total
  • Space: O(K) for the deque

⚠️ Common Mistake #1: Forgetting to check dq.front() <= i - k for window expiration. The deque must only contain indices in [i-k+1, i].

⚠️ Common Mistake #2: Using < instead of <= when popping from the back. With <, equal elements are kept in the deque; the answers are still correct, but the deque accumulates redundant duplicates. Use <= so the deque stays strictly decreasing.


3.5.5 USACO Problem: Haybale Stacking (Monotonic Stack)

🔗 Inspiration: This problem type appears in USACO Bronze/Silver ("Haybale Stacking" style).

Problem: There are N positions on a number line. You have K operations: each operation sets all positions in [L, R] to 1. After all operations, output 1 for each position that was set, 0 otherwise.

Solution: Difference array (Chapter 3.2). But let's see a harder variant:

Harder Variant: Given an array H of N "heights," find for each position i the length of the longest run of consecutive positions ending at i in which every height is ≤ H[i] (the "span" of each bar in a histogram).

This is exactly the "stock span problem," and it is solved with a monotonic stack using the same pattern as Previous Greater Element.

// Stock Span Problem: for each day i, find how many consecutive days
// before i had price <= price[i]
// (the "span" of day i)
vector<int> stockSpan(vector<int>& prices) {
    int n = prices.size();
    vector<int> span(n, 1);
    stack<int> st;  // monotonic decreasing stack of indices

    for (int i = 0; i < n; i++) {
        while (!st.empty() && prices[st.top()] <= prices[i]) {
            st.pop();
        }
        span[i] = st.empty() ? (i + 1) : (i - st.top());
        st.push(i);
    }
    return span;
}
// span[i] = number of consecutive days up to and including i with price <= prices[i]

3.5.6 USACO-Style Problem: Barn Painting Temperatures

Problem: N readings, find the maximum value in each window of size K.

(This is the sliding window maximum — solution already shown in 3.5.4.)

A trickier USACO variant: Given N cows in a line, each with temperature T[i]. A "fever cluster" is a maximal contiguous subarray where all temperatures are above threshold X. Find the maximum cluster size for each of Q threshold queries.

Offline approach: Sort the queries by decreasing X, activate cows in decreasing temperature order, and track the largest cluster as adjacent active runs merge.


⚠️ Common Mistakes in Chapter 3.5

  1. Storing values instead of indices — Always store indices. You need them to check window bounds and to record answers.

  2. Wrong comparison in deque (< vs <=) — For sliding window MAXIMUM, pop when A[dq.back()] <= A[i] (strict non-increase). For MINIMUM, pop when A[dq.back()] >= A[i].

  3. Forgetting window expiration — In sliding window deque, always check dq.front() < i - k + 1 (or <= i - k) before recording the maximum.

  4. Stack bottom-top direction confusion — The "monotonic" property means: bottom-to-top, the stack is decreasing (for NGE) or increasing (for NSE). Draw it out if confused.

  5. Processing order for NGE vs PSE:

    • Next Greater Element: left-to-right traversal
    • Previous Greater Element: right-to-left traversal (OR: left-to-right, record stack.top() before pushing)

Chapter Summary

📌 Key Summary

Problem                          Data Structure               Time   Key Operation
Next Greater Element (NGE)       Monotone decreasing stack    O(N)   Pop when larger element found
Previous Smaller Element (PSE)   Monotone increasing stack    O(N)   Stack top is answer before push
Largest Rectangle in Histogram   Monotone stack (two passes)  O(N)   Left boundary + right boundary + width
Sliding Window Maximum           Monotone decreasing deque    O(N)   Maintain window + maintain decreasing property

🧩 Template Quick Reference

// Monotone decreasing stack (for NGE / Next Greater Element)
stack<int> st;
for (int i = 0; i < n; i++) {
    while (!st.empty() && A[st.top()] < A[i]) {
        answer[st.top()] = i;  // i is the NGE of st.top()
        st.pop();
    }
    st.push(i);
}

// Monotone decreasing deque (sliding window maximum)
deque<int> dq;
for (int i = 0; i < n; i++) {
    while (!dq.empty() && dq.front() <= i - k) dq.pop_front();  // remove expired
    while (!dq.empty() && A[dq.back()] <= A[i]) dq.pop_back();  // maintain monotone
    dq.push_back(i);
    if (i >= k - 1) ans.push_back(A[dq.front()]);
}

❓ FAQ

Q1: Should the monotone stack store values or indices?

A: Always store indices. Even if you only need values, storing indices is more flexible — you can get the value via A[idx], but not vice versa. Especially when computing widths (e.g., histogram problems), indices are required.

Q2: How do I decide between monotone stack and two pointers?

A: Look at the problem structure — if you need "for each element, find the first greater/smaller element to its left/right", use monotone stack. If you need "maintain the maximum of a sliding window", use monotone deque. If "two pointers moving toward each other from both ends", use two pointers.

Q3: Why is the time complexity of monotone stack O(N) and not O(N²)?

A: Amortized analysis. Each element is pushed at most once and popped at most once, totaling 2N operations, so O(N). Although a single while loop may pop multiple times, the total number of pops across all while loops never exceeds N.


Practice Problems

Problem 3.5.1 — Next Greater Element 🟢 Easy For each element in an array, find the first element to its right that is greater. Print -1 if none exists.

Hint Maintain a monotonic decreasing stack of indices. When processing A[i], pop all smaller elements from the stack (they found their NGE).

Problem 3.5.2 — Daily Temperatures 🟢 Easy For each day, find how many days you have to wait until a warmer temperature. (LeetCode 739 style)

Hint This is exactly NGE. Answer[i] = NGE_index[i] - i. Use monotonic decreasing stack.

Problem 3.5.3 — Sliding Window Maximum 🟡 Medium Find the maximum in each sliding window of size K.

Hint Use monotonic decreasing deque. Maintain deque indices in range [i-k+1, i]. Front = max.

Problem 3.5.4 — Largest Rectangle in Histogram 🟡 Medium Find the largest rectangle that fits in a histogram.

Hint For each bar, find the previous smaller (left boundary) and next smaller (right boundary). Width = right - left - 1. Area = height × width.

Problem 3.5.5 — Trapping Rain Water 🔴 Hard Given an elevation map, compute how much water can be trapped after raining. (Classic problem)

Hint For each position i, water = min(max_left[i], max_right[i]) - height[i]. Can be solved with: (1) prefix/suffix max arrays O(N), (2) two pointers O(N), or (3) monotonic stack O(N).

🏆 Challenge: USACO 2016 February Silver: Fencing the Cows Given a polygon, find if a point is inside. Use ray casting — involves careful implementation with edge cases.

📖 Chapter 3.6 ⏱️ ~50 min read 🎯 Intermediate

Chapter 3.6: Stacks, Queues & Deques

These three data structures control the order in which elements are processed. Each has a unique "personality" that makes it perfect for specific types of problems.

  • Stack: Last In, First Out (like a stack of plates)
  • Queue: First In, First Out (like a line at a store)
  • Deque: Double-ended — insert/remove from both ends

3.6.1 Stack Deep Dive

We introduced stack in Chapter 3.1. Let's use it to solve real problems.

Visual: Stack Operations

Stack Operations

The diagram above illustrates the LIFO (Last In, First Out) property with step-by-step push and pop operations. Note how pop() always removes the most-recently-pushed element — this is what makes stacks ideal for matching brackets, DFS, and undo operations.

The Balanced Brackets Problem

Problem: Given a string of brackets ()[]{}, determine if they're properly nested.

#include <bits/stdc++.h>
using namespace std;

bool isBalanced(const string &s) {
    stack<char> st;

    for (char ch : s) {
        if (ch == '(' || ch == '[' || ch == '{') {
            st.push(ch);   // opening bracket: push onto stack
        } else {
            // closing bracket: must match the most recent opening
            if (st.empty()) return false;   // no matching opening bracket

            char top = st.top();
            st.pop();

            // Check if it matches
            if (ch == ')' && top != '(') return false;
            if (ch == ']' && top != '[') return false;
            if (ch == '}' && top != '{') return false;
        }
    }

    return st.empty();  // all brackets matched if stack is empty
}

int main() {
    cout << isBalanced("()[]{}") << "\n";    // 1 (true)
    cout << isBalanced("([]){}") << "\n";    // 1 (true)
    cout << isBalanced("([)]")   << "\n";    // 0 (false)
    cout << isBalanced("(()")    << "\n";    // 0 (false — unmatched '(')
    return 0;
}

The "Next Greater Element" Problem

Problem: For each element in an array, find the next element to its right that is strictly greater. If none exists, output -1.

This is a classic monotonic stack problem.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> A(n);
    for (int &x : A) cin >> x;

    vector<int> answer(n, -1);  // default: -1 (no greater element)
    stack<int> st;              // stores indices of elements awaiting their answer

    for (int i = 0; i < n; i++) {
        // While stack is non-empty and current element > element at stack's top index
        while (!st.empty() && A[i] > A[st.top()]) {
            answer[st.top()] = A[i];  // A[i] is the next greater element for st.top()
            st.pop();
        }
        st.push(i);  // push current index (waiting for a larger element later)
    }

    for (int x : answer) cout << x << " ";
    cout << "\n";

    return 0;
}

Trace for [3, 1, 4, 1, 5, 9, 2, 6]:

  • i=0: push 0. Stack: [0]
  • i=1: A[1]=1 ≤ A[0]=3, push 1. Stack: [0,1]
  • i=2: A[2]=4 > A[1]=1 → answer[1]=4, pop. A[2]=4 > A[0]=3 → answer[0]=4, pop. Push 2.
  • i=3: push 3. Stack: [2,3]
  • i=4: A[4]=5 > A[3]=1 → answer[3]=5. A[4]=5 > A[2]=4 → answer[2]=5. Push 4.
  • i=5: A[5]=9 > A[4]=5 → answer[4]=9. Push 5. Stack: [5]
  • i=6: push 6. Stack: [5,6]
  • i=7: A[7]=6 > A[6]=2 → answer[6]=6. Push 7.
  • Remaining on stack (5, 7): answer stays -1.

Output: 4 4 5 5 9 -1 6 -1

Key insight: A monotonic stack maintains elements in a strictly increasing or decreasing order. When a new element breaks that order, it "solves" all the elements it's greater than. This is O(n) because each element is pushed and popped at most once.


3.6.2 Queue and BFS Preparation

The queue's FIFO property makes it perfect for Breadth-First Search (BFS), which we cover in Chapter 5.2. Here we focus on the queue itself and related patterns.

Visual: Queue Operations

Queue Operations

The queue processes elements in order of arrival: the front element is always dequeued next, while new elements join at the back. This FIFO property ensures BFS visits nodes level-by-level, guaranteeing shortest-path distances.

Simulation with a Queue

Problem: A theme park ride has N groups of people. Each group has size[i]. The ride holds at most M people per run. Simulate how many runs are needed to take everyone.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    queue<int> groups;
    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;
        groups.push(x);
    }

    int runs = 0;
    while (!groups.empty()) {
        int capacity = m;   // remaining capacity for this run
        runs++;

        // Fit as many whole groups as possible from the front of the line.
        // (Assumes every group fits on an empty ride: size[i] <= m;
        //  otherwise that group could never board and this would loop forever.)
        while (!groups.empty() && groups.front() <= capacity) {
            capacity -= groups.front();  // fit this group
            groups.pop();
        }
    }

    cout << runs << "\n";
    return 0;
}

3.6.3 Deque — Double-Ended Queue

A deque (pronounced "deck") supports O(1) insertion and removal at both the front and back.

#include <bits/stdc++.h>
using namespace std;

int main() {
    deque<int> dq;

    dq.push_back(1);    // [1]
    dq.push_back(2);    // [1, 2]
    dq.push_front(0);   // [0, 1, 2]
    dq.push_front(-1);  // [-1, 0, 1, 2]

    cout << dq.front() << "\n";  // -1
    cout << dq.back() << "\n";   // 2

    dq.pop_front();  // [-1 removed] → [0, 1, 2]
    dq.pop_back();   // [2 removed]  → [0, 1]

    cout << dq.front() << "\n";  // 0
    cout << dq.size() << "\n";   // 2

    // Random access (like a vector)
    cout << dq[0] << "\n";  // 0
    cout << dq[1] << "\n";  // 1

    return 0;
}

3.6.4 Monotonic Deque — Sliding Window Maximum

Problem: Given an array A of N integers and a window of size K, find the maximum value in each window as it slides from left to right.

Naive approach: for each window, scan all K elements → O(N×K). Too slow for large K.

Monotonic deque approach: O(N).

The deque stores indices of elements in decreasing order of their values. The front is always the maximum.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, k;
    cin >> n >> k;
    vector<int> A(n);
    for (int &x : A) cin >> x;

    deque<int> dq;  // stores indices; values A[dq[i]] are decreasing
    vector<int> maxInWindow;

    for (int i = 0; i < n; i++) {
        // Remove elements outside the window (front is too old)
        while (!dq.empty() && dq.front() <= i - k) {
            dq.pop_front();
        }

        // Remove elements from back that are smaller than A[i]
        // (they can never be the maximum for future windows)
        while (!dq.empty() && A[dq.back()] <= A[i]) {
            dq.pop_back();
        }

        dq.push_back(i);  // add current index

        // Window is full starting at i = k-1
        if (i >= k - 1) {
            maxInWindow.push_back(A[dq.front()]);  // front is always the max
        }
    }

    for (int x : maxInWindow) cout << x << " ";
    cout << "\n";

    return 0;
}

Sample Input:

8 3
1 3 -1 -3 5 3 6 7

Sample Output:

3 3 5 5 6 7

Windows: [1,3,-1]=3, [3,-1,-3]=3, [-1,-3,5]=5, [-3,5,3]=5, [5,3,6]=6, [3,6,7]=7.


3.6.5 Stack-Based: Largest Rectangle in Histogram

A classic competitive programming problem: given N bars of heights h[0..N-1], find the largest rectangle that fits within the histogram.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> h(n);
    for (int &x : h) cin >> x;

    stack<int> st;   // stores indices of bars in increasing height order
    long long maxArea = 0;

    for (int i = 0; i <= n; i++) {
        int currentH = (i == n) ? 0 : h[i];  // sentinel 0 at the end

        while (!st.empty() && h[st.top()] > currentH) {
            int height = h[st.top()];   // height of the rectangle
            st.pop();
            int width = st.empty() ? i : i - st.top() - 1;  // width
            maxArea = max(maxArea, (long long)height * width);
        }

        st.push(i);
    }

    cout << maxArea << "\n";
    return 0;
}

⚠️ Common Mistakes in Chapter 3.6

#   Mistake                                                  Why It's Wrong                                              Fix
1   Calling top()/front() on empty stack/queue               Undefined behavior, program crashes                         Check !st.empty() first
2   Wrong comparison direction in monotonic stack            "Next Greater" needs > but used <, gets "Next Smaller"      Read carefully, verify with examples
3   Forgetting to remove expired elements in sliding window  Front index of deque is out of window range, wrong result   while (!dq.empty() && dq.front() <= i - k)
4   Forgetting sentinel in histogram max rectangle           Remaining stack elements unprocessed, missing final answer  Use height 0 when i == n
5   Confusing stack and deque                                stack can only access top, cannot traverse middle elements  Use deque when two-end operations needed

Chapter Summary

📌 Key Takeaways

Structure        Operations                    Key Use Cases                     Why It Matters
stack<T>         push/pop/top — O(1)           Bracket matching, undo/redo, DFS  Core tool for LIFO logic
queue<T>         push/pop/front — O(1)         BFS, simulating queues            Core tool for FIFO logic
deque<T>         push/pop front & back — O(1)  Sliding window, BFS variants      Versatile container with two-end access
Monotonic stack  O(n) total                    Next Greater/Smaller Element      High-frequency USACO Silver topic
Monotonic deque  O(n) total                    Sliding Window Max/Min            O(N) solution for window extremes

❓ FAQ

Q1: Why is the monotonic stack O(N) and not O(N²)? It looks like there's a nested loop.

A: Key observation — each element is pushed at most once and popped at most once. Although the inner while loop may pop multiple elements at once, the total number of pops globally is ≤ N. So total operations ≤ 2N = O(N). This analysis method is called amortized analysis.

Q2: When to use stack vs deque?

A: If you only need LIFO (one-end access), use stack; if you need two-end operations (e.g., sliding window needs front removal + back addition), use deque. stack is actually backed by deque internally, but restricts the interface to only expose the top.

Q3: Must BFS use queue? Can I use vector?

A: Technically you can simulate with vector + index, but queue is clearer and less error-prone. In contests, use queue directly. The only exception is 0-1 BFS (shortest path with only 0 and 1 weights), which requires deque.

Q4: Why can the "largest rectangle" problem be solved with a stack?

A: The stack maintains an increasing sequence of bars. When a shorter bar is encountered, it means the top bar's "rightward extension" ends here. At that point, we can compute the rectangle area with the top bar's height. Each bar is pushed/popped once, total complexity O(N).

🔗 Connections to Later Chapters

  • Chapter 5.2 (Graph BFS/DFS): queue is the core container for BFS, stack can be used for iterative DFS
  • Chapter 3.4 (Two Pointers): the sliding window technique combines well with the monotonic deque from this chapter
  • Chapters 6.1–6.3 (DP): certain optimization techniques (e.g., DP-optimized sliding window extremes) directly use the monotonic deque from this chapter
  • The monotonic stack also appears as an alternative to Chapter 3.9 (Segment Trees) — many problems solvable by segment trees can also be solved in O(N) with a monotonic stack

Practice Problems

Problem 3.6.1 — Stock Span Read N daily stock prices. For each day, find the number of consecutive days up to that day where the price was ≤ today's price (including today). (Classic monotonic stack problem)

Problem 3.6.2 — Circular Queue Implement a circular queue of size K. Process operations: PUSH x (add x to back), POP (remove from front). Print "OVERFLOW" if push on full queue, "UNDERFLOW" if pop on empty.

Problem 3.6.3 — Sliding Window Minimum Same as the sliding window maximum example, but find the minimum.

Problem 3.6.4 — Expression Evaluation Read a simple expression with integers and +, - operators (no parentheses). Evaluate it using a stack.

Problem 3.6.5 — USACO 2020 January Bronze: Loan Repayment (Simplified) You have N stacks of hay. Each day, you can take one bale from any non-empty stack. Model this with a priority_queue: always take from the tallest stack. Simulate for D days and print the remaining bales.

📖 Chapter 3.7 ⏱️ ~50 min read 🎯 Intermediate

Chapter 3.7: Hashing Techniques

📝 Before You Continue: You should know STL containers (Chapter 3.1) and string basics (Chapter 2.3). This chapter covers hashing principles and advanced competitive programming usage.

Hashing is one of the most important "tools" in competitive programming: it turns complex comparison problems into O(1) numeric comparisons. But hashing is also the easiest technique to get "hacked"—this chapter teaches both how to use it well and how to prevent being hacked.


3.7.1 unordered_map vs map: Internals & Performance

Internal Implementation Comparison

| Feature | map | unordered_map |
| --- | --- | --- |
| Internal structure | Red-black tree (balanced BST) | Hash table |
| Lookup time | O(log N) | O(1) avg, O(N) worst |
| Insert time | O(log N) | O(1) avg, O(N) worst |
| Iteration order | Ordered (ascending by key) | Unordered |
| Memory usage | O(N), smaller constant | O(N), larger constant |
| Worst case | O(log N) (stable) | O(N) (hash collision) |
#include <bits/stdc++.h>
using namespace std;

int main() {
    // map: ordered, O(log N)
    map<int, int> m;
    m[3] = 30; m[1] = 10; m[2] = 20;
    for (auto [k, v] : m) cout << k << ":" << v << " ";
    // output: 1:10 2:20 3:30  ← ordered!

    // unordered_map: unordered, O(1) average
    unordered_map<int, int> um;
    um[3] = 30; um[1] = 10; um[2] = 20;
    // iteration order undefined, but lookup is very fast

    // performance difference: N=10^6 operations
    // map: ~300ms; unordered_map: ~80ms (roughly)
}

When to Choose Which?

  • Use map: need ordered iteration, need lower_bound/upper_bound, extreme key range (high hash collision risk)
  • Use unordered_map: pure lookup/insert, key is integer or string, large N (> 10^5)

3.7.2 Anti-Hack: Custom Hash

Problem: unordered_map's default integer hash is essentially hash(x) = x, allowing attackers to construct many hash collisions, degrading operations to O(N) and causing TLE.

On platforms like Codeforces, this is a common hack technique.

Solution: splitmix64 Hash

// Anti-hack custom hasher — uses splitmix64
struct custom_hash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

// Usage:
unordered_map<int, int, custom_hash> safe_map;
unordered_set<int, custom_hash> safe_set;

⚠️ Contest tip: When using unordered_map on Codeforces, always add custom_hash. USACO test data won't deliberately construct hacks, but it's a good habit.


3.7.3 String Hashing (Polynomial Hash)

String hashing maps a string to an integer, turning string comparison into numeric comparison (O(1)).

Core Formula

For string s[0..n-1], define the hash value as:

hash(s) = s[0]·B^(n-1) + s[1]·B^(n-2) + ... + s[n-1]·B^0  (mod M)

where B is the base (typically 131 or 13331) and M is a large prime (typically 10⁹+7 or 10⁹+9).

Prefix Hash + Substring Hash O(1)

// String hashing: O(N) preprocessing, O(1) substring hash
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

const ull BASE = 131;
// Use unsigned long long natural overflow (equivalent to mod 2^64)
// Or specify MOD manually:
// const ull MOD = 1e9 + 7;

struct StringHash {
    int n;
    vector<ull> h, pw;

    StringHash(const string& s) : n(s.size()), h(n + 1, 0), pw(n + 1, 1) {
        for (int i = 0; i < n; i++) {
            h[i + 1] = h[i] * BASE + (s[i] - 'a' + 1);  // 1-indexed prefix hash
            pw[i + 1] = pw[i] * BASE;                      // BASE^(i+1)
        }
    }

    // Get hash of substring s[l..r] (0-indexed)
    ull get(int l, int r) {
        return h[r + 1] - h[l] * pw[r - l + 1];  // ← KEY formula
    }
};

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    string s = "abcabc";
    StringHash sh(s);

    // Compare if two substrings are equal
    // s[0..2] = "abc", s[3..5] = "abc"
    cout << (sh.get(0, 2) == sh.get(3, 5) ? "Equal" : "Not Equal") << "\n";  // Equal

    // Compare s[0..1] = "ab" vs s[3..4] = "ab"
    cout << (sh.get(0, 1) == sh.get(3, 4) ? "Equal" : "Not Equal") << "\n";  // Equal
}

Hash Formula Derivation:

h[r+1] = s[0]*B^r + s[1]*B^(r-1) + ... + s[r]*B^0
h[l]   = s[0]*B^(l-1) + ... + s[l-1]*B^0

h[r+1] - h[l] * B^(r-l+1)
= (s[0]*B^r + ... + s[r]*B^0)
  - (s[0]*B^r + ... + s[l-1]*B^(r-l+1))
= s[l]*B^(r-l) + s[l+1]*B^(r-l-1) + ... + s[r]*B^0
= hash(s[l..r]) ✓

The diagram below visualizes how the prefix hash array is built, and how the get(l, r) formula extracts any substring's hash in O(1):

String Polynomial Hash


3.7.4 Double Hashing (Avoiding Collisions)

Single hash (mod M) gives a per-pair collision probability of ≈ 1/M. Across N substrings compared against each other, the expected number of collisions is ≈ N²/(2M) (the birthday bound).

  • If M = 10⁹+7 and N = 10⁶: expected collisions ≈ 10¹²/(2×10⁹) = 500. Not safe.
  • Solution: double hashing — use two different (B, M) pairs simultaneously; the per-pair collision probability drops to 1/(M₁×M₂) ≈ 10⁻¹⁸.
// Double hashing: two (BASE, MOD) pairs used simultaneously, extremely low collision probability
struct DoubleHash {
    static constexpr ull B1 = 131, M1 = 1000000007;  // integer literals: 1e9+7 is a double,
    static constexpr ull B2 = 137, M2 = 1000000009;  // not a valid constant initializer here

    int n;
    vector<ull> h1, h2, pw1, pw2;

    DoubleHash(const string& s) : n(s.size()),
        h1(n+1,0), h2(n+1,0), pw1(n+1,1), pw2(n+1,1) {
        for (int i = 0; i < n; i++) {
            ull c = s[i] - 'a' + 1;
            h1[i+1] = (h1[i] * B1 + c) % M1;
            h2[i+1] = (h2[i] * B2 + c) % M2;
            pw1[i+1] = pw1[i] * B1 % M1;
            pw2[i+1] = pw2[i] * B2 % M2;
        }
    }

    // Return pair<ull,ull> as the hash "fingerprint" of substring s[l..r]
    pair<ull,ull> get(int l, int r) {
        ull v1 = (h1[r+1] - h1[l] * pw1[r-l+1] % M1 + M1) % M1;
        ull v2 = (h2[r+1] - h2[l] * pw2[r-l+1] % M2 + M2) % M2;
        return {v1, v2};
    }
};

3.7.5 Application: String Matching (Rabin-Karp)

// Rabin-Karp string matching: find all occurrences of pattern P in text T
// Time: O(N+M) average, O(NM) worst case (but extremely fast in practice)
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

vector<int> rabinKarp(const string& T, const string& P) {
    int n = T.size(), m = P.size();
    if (m > n) return {};

    const ull BASE = 131;
    ull patHash = 0, textHash = 0, pow_m = 1;

    // Compute pow_m = BASE^(m-1) — the weight of the leftmost character (natural overflow)
    for (int i = 0; i < m - 1; i++) pow_m *= BASE;

    // Initial hash
    for (int i = 0; i < m; i++) {
        patHash = patHash * BASE + P[i];
        textHash = textHash * BASE + T[i];
    }

    vector<int> result;
    for (int i = 0; i + m <= n; i++) {
        if (textHash == patHash) {
            // Verify when hashes match (avoid false positives from collision)
            if (T.substr(i, m) == P) result.push_back(i);
        }
        if (i + m < n) {
            // Rolling hash: remove leftmost char, add rightmost char
            textHash = textHash - T[i] * pow_m;   // remove leftmost
            textHash = textHash * BASE + T[i + m]; // add rightmost
        }
    }
    return result;
}

3.7.6 Application: Longest Common Substring

Problem: Given strings S and T, find the length of their longest common substring.

Approach: Binary search on the answer (length L of longest common substring), then use a hash set to check if any substring of length L appears in both strings.

// Longest common substring: O(N log N) — binary search + hashing
int longestCommonSubstring(const string& S, const string& T) {
    StringHash hs(S), ht(T);
    int ns = S.size(), nt = T.size();

    auto check = [&](int len) -> bool {
        unordered_set<ull> setS;
        for (int i = 0; i + len <= ns; i++)
            setS.insert(hs.get(i, i + len - 1));
        for (int j = 0; j + len <= nt; j++)
            if (setS.count(ht.get(j, j + len - 1)))
                return true;
        return false;
    };

    int lo = 0, hi = min(ns, nt);
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (check(mid)) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}

⚠️ Common Mistakes

  1. Bad modulus choice: use a large prime; non-prime (and small) moduli raise the collision rate. Recommended: 10⁹+7 and 10⁹+9, which also pair well for double hashing.

  2. unordered_map hacked: On platforms like Codeforces, the default hash can be attacked. Always use custom_hash.

  3. Substring hash subtraction underflow: h[r+1] - h[l] * pw[r-l+1] may be negative (with signed integers). Use unsigned long long natural overflow, or (... % M + M) % M to ensure non-negative.

  4. BASE doesn't match character set: For lowercase letters (26 types), BASE must be > 26 (typically 31 or 131). For all ASCII characters (128 types), BASE must be > 128 (use 131 or 137).

  5. Hash collision causing WA: Even with double hashing, collisions are theoretically possible. If uncertain, add direct string comparison when hashes match.


Chapter Summary

📌 Core Comparison Table

| Tool | Time Complexity | Use Case |
| --- | --- | --- |
| map<K,V> | O(log N) | Need ordering, need range queries |
| unordered_map<K,V> | O(1) average | Only need lookup/insert, key order not required |
| String hash (single) | O(N) preprocess, O(1) query | Substring comparison, pattern matching |
| String hash (double) | O(N) preprocess, O(1) query | High-precision scenarios, avoid collisions |

❓ FAQ

Q1: Which is better — unsigned long long natural overflow double hash or manual mod hash?

A: ull natural overflow (equivalent to mod 2⁶⁴) is simpler to code, and a single comparison collides with probability only about 1/2⁶⁴ ≈ 5×10⁻²⁰. But crafted data (e.g. Thue–Morse-style anti-tests) can deliberately force collisions under mod 2⁶⁴ — double hashing with explicit prime moduli is safer then. Both work in contests; ull is more common.

Q2: What can string hashing do that KMP cannot?

A: String hashing excels at multi-string comparison (e.g., finding longest common substring, palindromic substrings), while KMP only excels at single-pattern matching. Hash + binary search can solve many string problems in O(N log N) that would require more complex KMP implementations.

Q3: Should I use BASE 31 or 131?

A: Use 31 for lowercase letters only (a prime greater than 26, so each character maps to a distinct nonzero digit). Use 131 for mixed case or digits (a prime greater than 128, covers full ASCII). The key is: BASE must be larger than the character set size and ideally a prime.


Practice Problems

Problem 3.7.1 — Two Sum with Hash 🟢 Easy Given array A, find if any two distinct elements sum to target X. Use unordered_set.

Hint For each A[i], check if (X - A[i]) is already in the hash set. Insert A[i] after checking.

Problem 3.7.2 — Substring Check 🟢 Easy Given string T and pattern P, check if P appears in T. Print all starting indices.

Hint Use Rabin-Karp rolling hash, or just use `string::find` for practice, then implement manually.

Problem 3.7.3 — Longest Palindromic Substring 🟡 Medium Find the length of the longest palindromic substring.

Hint A palindrome s[l..r] satisfies: hash(s[l..r]) == hash(reverse(s)[n-1-r..n-1-l]). Binary search + hash on both forward and reversed string.

Problem 3.7.4 — Count Distinct Substrings 🟡 Medium Given string S of length N (N ≤ 5000), count the number of distinct substrings.

Hint Insert all O(N²) substring hashes into an unordered_set, count distinct values. Use double hash to avoid collisions.

Problem 3.7.5 — String Periods 🔴 Hard Find the smallest period of a string S (smallest k such that S is a repetition of S[0..k-1]).

Hint Try each k that divides n. S has period k exactly when hash(S[0..n-k-1]) == hash(S[k..n-1]) — an O(1) check per candidate. Total O(n + d(n)) where d(n) is the number of divisors of n.
📖 Chapter 3.8 ⏱️ ~55 min read 🎯 Intermediate

Chapter 3.8: Maps & Sets

Maps and sets are the workhorses of frequency counting, lookup, and tracking unique elements. In this chapter, we go deep into their practical use in USACO problems.


3.8.1 map vs unordered_map — Choosing Wisely

Visual: Map Internal Structure (BST)

Map Structure

std::map stores key-value pairs in a balanced BST (Red-Black tree). This gives O(log N) for all operations and keeps keys sorted automatically — great when you need lower_bound/upper_bound queries. Use unordered_map when you only need O(1) lookups and don't care about order.

| Feature | map | unordered_map |
| --- | --- | --- |
| Underlying structure | Red-black tree | Hash table |
| Insert/lookup time | O(log n) | O(1) average, O(n) worst |
| Iterates in | Sorted key order | Arbitrary order |
| Min/Max key | Available via .begin()/.rbegin() | Not available |
| Keys must be | Comparable (has <) | Hashable |
| Use when | You need sorted keys or find min/max | You need fastest possible lookup |

For most USACO problems, either works fine. Use unordered_map for speed when keys are integers or strings, map when you need ordered iteration.

Example: Frequency Map

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    unordered_map<int, int> freq;
    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;
        freq[x]++;   // increment count; creates with 0 if not present
    }

    // Find the element with highest frequency
    int maxFreq = 0, maxVal = INT_MIN;
    for (auto &[val, count] : freq) {   // structured binding (C++17)
        if (count > maxFreq || (count == maxFreq && val < maxVal)) {
            maxFreq = count;
            maxVal = val;
        }
    }

    cout << "Most frequent: " << maxVal << " (" << maxFreq << " times)\n";

    return 0;
}

3.8.2 Map Operations — Complete Reference

#include <bits/stdc++.h>
using namespace std;

int main() {
    map<string, int> scores;

    // Insert
    scores["Alice"] = 95;
    scores["Bob"] = 87;
    scores["Charlie"] = 92;
    scores.insert({"Dave", 78});    // another way
    scores.emplace("Eve", 88);      // most efficient way

    // Lookup
    cout << scores["Alice"] << "\n";  // 95
    // WARNING: scores["Unknown"] creates it with value 0!

    // Safe lookup
    if (scores.count("Frank")) {
        cout << scores["Frank"] << "\n";
    } else {
        cout << "Frank not found\n";
    }

    // Using find() — returns iterator
    auto it = scores.find("Bob");
    if (it != scores.end()) {
        cout << it->first << ": " << it->second << "\n";  // Bob: 87
    }

    // Update
    scores["Alice"] += 5;    // Alice now has 100

    // Erase
    scores.erase("Charlie");

    // Iterate in sorted key order (map always gives sorted order)
    for (const auto &[name, score] : scores) {
        cout << name << ": " << score << "\n";
    }
    // Alice: 100
    // Bob: 87
    // Dave: 78
    // Eve: 88

    // Size and empty check
    cout << scores.size() << "\n";   // 4
    cout << scores.empty() << "\n";  // 0 (false)

    // Clear all entries
    scores.clear();

    return 0;
}

3.8.3 Set Operations — Complete Reference

#include <bits/stdc++.h>
using namespace std;

int main() {
    set<int> s = {5, 3, 8, 1, 9, 2};
    // s = {1, 2, 3, 5, 8, 9} (always sorted!)

    // Insert
    s.insert(4);   // s = {1, 2, 3, 4, 5, 8, 9}
    s.insert(3);   // already there, no change

    // Erase
    s.erase(8);    // s = {1, 2, 3, 4, 5, 9}

    // Lookup
    cout << s.count(3) << "\n";  // 1 (exists)
    cout << s.count(7) << "\n";  // 0 (not found)

    // Iterator-based queries
    auto it = s.lower_bound(4);  // first element >= 4
    cout << *it << "\n";         // 4

    auto it2 = s.upper_bound(4); // first element > 4
    cout << *it2 << "\n";        // 5

    // Min and Max
    cout << *s.begin() << "\n";   // 1 (min)
    cout << *s.rbegin() << "\n";  // 9 (max)

    // Remove minimum
    s.erase(s.begin());   // removes 1
    cout << *s.begin() << "\n";  // 2

    // Iterate
    for (int x : s) cout << x << " ";
    cout << "\n";  // 2 3 4 5 9

    return 0;
}

3.8.4 USACO Problem: Cow IDs

Problem (USACO 2017 February Bronze): Bessie wants to find the N-th smallest number that doesn't appear in a set of "taken" IDs. Given a set of taken IDs and N, find the N-th available ID.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, q;
    cin >> n >> q;

    // Sorted vector instead of a set: set iterators are not random-access,
    // so taken.lower_bound(x) - taken.begin() would not even compile.
    vector<int> taken(n);
    for (auto &x : taken) cin >> x;
    sort(taken.begin(), taken.end());

    // For each query, find the k-th positive integer NOT in taken
    while (q--) {
        long long k; cin >> k;

        // Binary search: smallest x such that x - (# taken values <= x) >= k
        long long lo = 1, hi = 2e9;
        while (lo < hi) {
            long long mid = lo + (hi - lo) / 2;
            // available numbers in [1, mid] = mid - (# taken values <= mid)
            long long cnt = upper_bound(taken.begin(), taken.end(), mid) - taken.begin();
            if (mid - cnt >= k) hi = mid;
            else lo = mid + 1;
        }

        cout << lo << "\n";
    }

    return 0;
}

3.8.5 Multiset — Sorted Bag with Duplicates

A multiset is like a set, but allows duplicate values:

#include <bits/stdc++.h>
using namespace std;

int main() {
    multiset<int> ms;
    ms.insert(3);
    ms.insert(1);
    ms.insert(3);   // duplicate allowed
    ms.insert(5);
    ms.insert(1);

    // ms = {1, 1, 3, 3, 5}

    cout << ms.count(3) << "\n";  // 2 (how many 3s)
    cout << ms.count(2) << "\n";  // 0

    // Remove ONE occurrence of 3
    ms.erase(ms.find(3));  // removes only one 3
    // ms = {1, 1, 3, 5}

    // Remove ALL occurrences of 1
    ms.erase(1);  // removes all 1s
    // ms = {3, 5}

    cout << *ms.begin() << "\n";   // 3 (min)
    cout << *ms.rbegin() << "\n";  // 5 (max)

    return 0;
}

Running Median with Two Multisets

Keep track of the median of a stream of numbers using a max-multiset (lower half) and a min-multiset (upper half):

#include <bits/stdc++.h>
using namespace std;

int main() {
    multiset<int> lo;  // lower half of the numbers; its max is *lo.rbegin()
    multiset<int> hi;  // upper half of the numbers; its min is *hi.begin()

    int n;
    cin >> n;

    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;

        // Add to appropriate half
        if (lo.empty() || x <= *lo.rbegin()) {
            lo.insert(x);
        } else {
            hi.insert(x);
        }

        // Rebalance: sizes should differ by at most 1
        while (lo.size() > hi.size() + 1) {
            hi.insert(*lo.rbegin());
            lo.erase(lo.find(*lo.rbegin()));
        }
        while (hi.size() > lo.size()) {
            lo.insert(*hi.begin());
            hi.erase(hi.begin());
        }

        // Print median
        if (lo.size() == hi.size()) {
            // Even count: average of two middle values
            double median = (*lo.rbegin() + *hi.begin()) / 2.0;
            cout << fixed << setprecision(1) << median << "\n";
        } else {
            // Odd count: middle value is in lo
            cout << *lo.rbegin() << "\n";
        }
    }

    return 0;
}

3.8.6 Practical Patterns

Pattern 1: Counting Distinct Elements

vector<int> data = {1, 5, 3, 1, 2, 5, 5, 3};
set<int> distinct(data.begin(), data.end());
cout << "Distinct count: " << distinct.size() << "\n";  // 4

Pattern 2: Group by Frequency, Sort by Value

vector<int> nums = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
map<int, int> freq;
for (int x : nums) freq[x]++;

// Group values by their frequency
map<int, vector<int>> byFreq;
for (auto &[val, cnt] : freq) {
    byFreq[cnt].push_back(val);
}

// Print in order of frequency
for (auto &[cnt, vals] : byFreq) {
    for (int v : vals) cout << v << " (×" << cnt << ")\n";
}

Pattern 3: Offline Queries with Sorting

Sort queries along with events to process them together in O((N+Q) log N):

// Example: for each query point, count how many events have value <= query point
// Sort both arrays, sweep through with two pointers

⚠️ Common Mistakes in Chapter 3.8

| # | Mistake | Why It's Wrong | Fix |
| --- | --- | --- | --- |
| 1 | map[key] accessing non-existent key | Auto-creates entry with value 0, pollutes data | Use m.count(key) or m.find(key) to check first |
| 2 | multiset::erase(value) deletes all equal values | Expected to delete one, deleted all | Use ms.erase(ms.find(value)) to delete just one |
| 3 | Modifying map/set size during iteration | Iterator invalidated, crash or skipped elements | Use it = m.erase(it) for safe deletion |
| 4 | unordered_map hacked to degrade to O(N) | Adversary constructs hash-collision data, TLE | Switch to map or use custom hash function |
| 5 | Forgetting set doesn't store duplicates | size() doesn't grow after inserting duplicate, count wrong | Use multiset when duplicates needed |

Chapter Summary

📌 Key Takeaways

| Structure | Ordered | Duplicates | Key Feature | Why It Matters |
| --- | --- | --- | --- | --- |
| map<K,V> | Yes (sorted) | No (unique keys) | Key-value mapping, O(log N) | Frequency counting, ID→attribute mapping |
| unordered_map<K,V> | No | No | O(1) average lookup | 5-10x faster than map for large data |
| set<T> | Yes (sorted) | No | Ordered unique set | Deduplication, range queries (lower_bound) |
| unordered_set<T> | No | No | O(1) membership test | Just need to check "seen before?" |
| multiset<T> | Yes (sorted) | Yes | Ordered multiset | Dynamic median, sliding window |

🧩 "Which Container to Use" Quick Reference

| Need | Recommended Container | Reason |
| --- | --- | --- |
| Count occurrences of each element | map / unordered_map | freq[x]++ in one line |
| Deduplicate and sort | set | Auto-dedup + auto-sort |
| Check if element was seen | unordered_set | O(1) lookup |
| Dynamic ordered set + find extremes | set / multiset | O(1) access to min/max |
| Need lower_bound / upper_bound | set / map | Only ordered containers support this |
| Value→index mapping | map / unordered_map | Coordinate compression etc. |

❓ FAQ

Q1: What's the difference between map's [] operator and find?

A: m[key] auto-creates a default value (0 for int) when key doesn't exist; m.find(key) only searches, doesn't create. If you just want to check if a key exists, use m.count(key) or m.find(key) != m.end().

Q2: Both multiset and priority_queue can get extremes — which to use?

A: priority_queue can only get the max (or min) and delete it, doesn't support deletion by value. multiset supports finding and deleting any value, more flexible. If you only need to repeatedly get the extreme, priority_queue is simpler; if you need to delete specific elements (e.g., removing elements leaving a sliding window), use multiset.

Q3: When can unordered_map be slower than map?

A: Two situations: ① When hash collisions are severe (many keys hash to the same bucket), degrades to O(N); ② In contests, adversaries deliberately construct data to hack unordered_map. Solution: use a custom hash function, or switch to map.

Q4: Is C++17 structured binding auto &[key, val] safe? Can I use it in contests?

A: USACO and most contest platforms support C++17, so for (auto &[key, val] : m) is safe to use. It's cleaner than entry.first/entry.second.

🔗 Connections to Later Chapters

  • Chapter 3.3 (Sorting & Searching): coordinate compression often combines with map (value → compressed index)
  • Chapter 3.9 (Segment Trees): ordered set's lower_bound can replace simple segment tree queries
  • Chapters 5.1–5.2 (Graphs): map is commonly used to store adjacency lists for sparse graphs
  • Chapter 4.1 (Greedy): multiset combined with greedy strategies can efficiently maintain dynamic optimal choices
  • The map frequency counting pattern appears throughout the book and is one of the most fundamental tools in competitive programming

Practice Problems

Problem 3.8.1 — Two Sum Read N integers and a target T. Find two values in the array that sum to T. Print their indices (1-indexed). (Hint: use a map to store value → index)

Problem 3.8.2 — Anagram Groups Read N words. Group them by their sorted-letter form. Print each group on one line, sorted alphabetically.

  • Example: "eat tea tan ate nat bat" → groups: {ate, eat, tea}, {bat}, {nat, tan}

Problem 3.8.3 — Interval Overlap Count Read N intervals [L_i, R_i]. For each integer point from 1 to M, count how many intervals contain it. Output the maximum overlap count. (Hint: use difference array, or sort events and sweep with a set)

Problem 3.8.4 — Cow Photography (USACO Bronze Inspired) N cows each have a unique ID. Read N lists (each a permutation of IDs). Find the ordering that's consistent with all lists (the "true" order). (Hint: use maps to count pairwise orderings)

Problem 3.8.5 — Running Distinct Count Read N integers one by one. After each new integer, print the count of distinct values seen so far. (Hint: maintain an unordered_set; its size is the answer)

📖 Chapter 3.9 ⏱️ ~70 min read 🎯 Advanced

Chapter 3.9: Introduction to Segment Trees

📝 Before You Continue: You should understand prefix sums (Chapter 3.2), arrays, and recursion (Chapter 2.3). Segment trees are a more advanced data structure — make sure you're comfortable with recursion before diving in.

Segment trees are one of the most powerful data structures in competitive programming. They solve a fundamental problem that prefix sums cannot: range queries with updates.


3.9.1 The Problem: Why We Need Segment Trees

Consider this challenge:

  • Array A of N integers
  • Q1: What is the sum of A[l..r]? (Range sum query)
  • Q2: Update A[i] = x (Point update)

Prefix sum solution: Range query in O(1), but update requires O(N) to recompute all prefix sums. For M mixed queries, total: O(NM) — too slow for N,M = 10^5.

Segment tree solution: Both query and update in O(log N). For M mixed queries: O(M log N)

| Data Structure | Build | Query | Update | Best For |
| --- | --- | --- | --- | --- |
| Simple array | O(N) | O(N) | O(1) | Only updates |
| Prefix sum | O(N) | O(1) | O(N) | Only queries |
| Segment Tree | O(N) | O(log N) | O(log N) | Both queries + updates |
| Fenwick Tree (BIT) | O(N log N) | O(log N) | O(log N) | Simpler code, prefix sums only |

The diagram shows a segment tree built on array [1, 3, 5, 7, 9, 11]. Each internal node stores the sum of its range. A query for range [2,4] (sum=21) is answered by combining just 2 nodes — O(log N) instead of O(N).


3.9.2 Structure: What Is a Segment Tree?

A segment tree is a binary tree (stored compactly in an array) where:

  • Each leaf corresponds to a single array element
  • Each internal node stores the aggregate (sum, min, max, etc.) of its range
  • The root covers the entire array [0..N-1]
  • A node covering [l..r] has two children: [l..mid] and [mid+1..r]

For an array of N elements, the tree has at most 4N nodes (we use a 1-indexed tree array of size 4N as a safe upper bound).

Array: [1, 3, 5, 7, 9, 11]  (indices 0..5)

Tree (1-indexed, node i has children 2i and 2i+1):
         [0..5]=36
        /          \
  [0..2]=9       [3..5]=27
   /     \        /      \
[0..1]=4 [2]=5  [3..4]=16  [5]=11
  /   \          /    \
[0]=1 [1]=3   [3]=7  [4]=9

The diagram below shows the full segment tree structure, with the nodes visited by the query sum([2,4]) highlighted in blue:

Segment Tree Structure


3.9.3 Building the Segment Tree

// Solution: Segment Tree Build — O(N)
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;
int tree[4 * MAXN];  // segment tree array (4x array size for safety)
int arr[MAXN];       // original array

// Build: recursively fill tree[]
// node = current tree node index (start with 1)
// start, end = range this node covers
void build(int node, int start, int end) {
    if (start == end) {
        // Leaf node: stores the array element
        tree[node] = arr[start];
    } else {
        int mid = (start + end) / 2;
        // Build left and right children first
        build(2 * node, start, mid);        // left child
        build(2 * node + 1, mid + 1, end);  // right child
        // Internal node: sum of children
        tree[node] = tree[2 * node] + tree[2 * node + 1];
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    for (int i = 0; i < n; i++) cin >> arr[i];

    build(1, 0, n - 1);  // build from node 1, covering [0..n-1]

    return 0;
}

Build trace for [1, 3, 5, 7, 9, 11]:

build(1, 0, 5):
  build(2, 0, 2):
    build(4, 0, 1):
      build(8, 0, 0): tree[8] = arr[0] = 1
      build(9, 1, 1): tree[9] = arr[1] = 3
      tree[4] = tree[8] + tree[9] = 4
    build(5, 2, 2): tree[5] = arr[2] = 5
    tree[2] = tree[4] + tree[5] = 9
  build(3, 3, 5):
    build(6, 3, 4):
      ...
    tree[3] = 27
  tree[1] = 9 + 27 = 36

3.9.4 Range Query

Query sum of arr[l..r]:

Key idea: Recursively descend the tree. At each node covering [start..end]:

  • If [start..end] is completely inside [l..r]: return this node's value (done!)
  • If [start..end] is completely outside [l..r]: return 0 (no contribution)
  • Otherwise: recurse into both children, sum the results
// Range Query: sum of arr[l..r] — O(log N)
// node = current tree node, [start, end] = range it covers
// [l, r] = query range
int query(int node, int start, int end, int l, int r) {
    if (r < start || end < l) {
        // Case 1: Current segment completely outside query range
        return 0;   // identity for sum (use INT_MAX for min queries)
    }
    if (l <= start && end <= r) {
        // Case 2: Current segment completely inside query range
        return tree[node];   // ← KEY LINE: use this node directly!
    }
    // Case 3: Partial overlap — recurse into children
    int mid = (start + end) / 2;
    int leftSum  = query(2 * node, start, mid, l, r);
    int rightSum = query(2 * node + 1, mid + 1, end, l, r);
    return leftSum + rightSum;
}

// Usage: sum of arr[2..4]
int result = query(1, 0, n - 1, 2, 4);
cout << result << "\n";  // 5 + 7 + 9 = 21

Query trace for [2..4] on tree of [1,3,5,7,9,11]:

query(1, 0, 5, 2, 4):
  query(2, 0, 2, 2, 4): [0..2] partially overlaps [2..4]
    query(4, 0, 1, 2, 4): [0..1] outside [2..4] → return 0
    query(5, 2, 2, 2, 4): [2..2] inside [2..4] → return 5
    return 0 + 5 = 5
  query(3, 3, 5, 2, 4): [3..5] partially overlaps [2..4]
    query(6, 3, 4, 2, 4): [3..4] inside [2..4] → return 16
    query(7, 5, 5, 2, 4): [5..5] outside [2..4] → return 0
    return 16 + 0 = 16
  return 5 + 16 = 21 ✓

Only 4 nodes visited — O(log N)!


3.9.5 Point Update

Update arr[i] = x (change a single element):

// Point Update: set arr[idx] = val — O(log N)
void update(int node, int start, int end, int idx, int val) {
    if (start == end) {
        // Leaf: update the value
        arr[idx] = val;
        tree[node] = val;
    } else {
        int mid = (start + end) / 2;
        if (idx <= mid) {
            update(2 * node, start, mid, idx, val);      // update in left child
        } else {
            update(2 * node + 1, mid + 1, end, idx, val); // update in right child
        }
        // Update this internal node after child changes
        tree[node] = tree[2 * node] + tree[2 * node + 1];
    }
}

// Usage: set arr[2] = 10
update(1, 0, n - 1, 2, 10);

3.9.6 Complete Implementation

Here's the full, contest-ready segment tree:

// Solution: Segment Tree — O(N) build, O(log N) query/update
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;
long long tree[4 * MAXN];

void build(int node, int start, int end, long long arr[]) {
    if (start == end) {
        tree[node] = arr[start];
        return;
    }
    int mid = (start + end) / 2;
    build(2 * node, start, mid, arr);
    build(2 * node + 1, mid + 1, end, arr);
    tree[node] = tree[2 * node] + tree[2 * node + 1];
}

long long query(int node, int start, int end, int l, int r) {
    if (r < start || end < l) return 0;
    if (l <= start && end <= r) return tree[node];
    int mid = (start + end) / 2;
    return query(2 * node, start, mid, l, r)
         + query(2 * node + 1, mid + 1, end, l, r);
}

void update(int node, int start, int end, int idx, long long val) {
    if (start == end) {
        tree[node] = val;
        return;
    }
    int mid = (start + end) / 2;
    if (idx <= mid) update(2 * node, start, mid, idx, val);
    else update(2 * node + 1, mid + 1, end, idx, val);
    tree[node] = tree[2 * node] + tree[2 * node + 1];
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, q;
    cin >> n >> q;
    static long long arr[MAXN];  // static: a large array on the stack risks overflow
    for (int i = 0; i < n; i++) cin >> arr[i];

    build(1, 0, n - 1, arr);

    while (q--) {
        int type;
        cin >> type;
        if (type == 1) {
            // Point update: set arr[i] = v
            int i; long long v;
            cin >> i >> v;
            update(1, 0, n - 1, i, v);
        } else {
            // Range query: sum of arr[l..r]
            int l, r;
            cin >> l >> r;
            cout << query(1, 0, n - 1, l, r) << "\n";
        }
    }

    return 0;
}

Sample Input:

6 5
1 3 5 7 9 11
2 2 4
1 2 10
2 2 4
2 0 5
1 0 0

Sample Output:

21
26
41

(The first query [2,4] = 5+7+9 = 21; after update arr[2]=10, the second query [2,4] = 10+7+9 = 26; the third query [0,5] = 1+3+10+7+9+11 = 41; the final operation, update arr[0]=0, produces no output.)


3.9.7 Segment Tree vs. Fenwick Tree (BIT)

| Feature | Segment Tree | Fenwick Tree (BIT) |
|---|---|---|
| Code complexity | Medium (~30 lines) | Simple (~15 lines) |
| Range query | Any associative op | Prefix sums only |
| Range update | Yes (with lazy prop) | Yes (with tricks) |
| Point update | O(log N) | O(log N) |
| Space | O(4N) | O(N) |
| When to use | Range min/max, complex queries | Prefix sum with updates |

💡 Key Insight: If you need range sum with updates, a Fenwick tree is simpler. If you need range minimum, range maximum, or any other aggregate that isn't a prefix operation, use a segment tree.


3.9.8 Range Minimum Query Variant

Just change the aggregate from + to min:

// Range Minimum Segment Tree — same structure, different operation
void build_min(int node, int start, int end, int arr[]) {
    if (start == end) { tree[node] = arr[start]; return; }
    int mid = (start + end) / 2;
    build_min(2*node, start, mid, arr);
    build_min(2*node+1, mid+1, end, arr);
    tree[node] = min(tree[2*node], tree[2*node+1]);  // ← changed to min
}

int query_min(int node, int start, int end, int l, int r) {
    if (r < start || end < l) return INT_MAX;   // ← identity for min
    if (l <= start && end <= r) return tree[node];
    int mid = (start + end) / 2;
    return min(query_min(2*node, start, mid, l, r),
               query_min(2*node+1, mid+1, end, l, r));
}

⚠️ Common Mistakes

  1. Array size too small: Always allocate tree[4 * MAXN]. Using 2 * MAXN will cause out-of-bounds for non-power-of-2 sizes.
  2. Wrong identity for out-of-range: For sum queries, return 0. For min queries, return INT_MAX. For max queries, return INT_MIN.
  3. Forgetting to update the parent node: After updating a child, you MUST recompute the parent: tree[node] = tree[2*node] + tree[2*node+1].
  4. 0-indexed vs 1-indexed confusion: This implementation uses 0-indexed arrays but 1-indexed tree nodes. Be consistent.
  5. Using segment tree when prefix sum suffices: If there are no updates, prefix sum (O(1) query) beats segment tree (O(log N) query). Use the simpler tool when appropriate.

Chapter Summary

📌 Key Takeaways

| Operation | Time | Key Code Line |
|---|---|---|
| Build | O(N) | tree[node] = tree[2*node] + tree[2*node+1] |
| Point update | O(log N) | Recurse to leaf, update upward |
| Range query | O(log N) | Return early if fully inside/outside |
| Space | O(4N) | Allocate tree[4 * MAXN] |

❓ FAQ

Q1: When to choose segment tree vs prefix sum?

A: Simple rule — if the array never changes, prefix sum is better (O(1) query vs O(log N)). If the array gets modified (point updates), use segment tree or BIT. If you need range updates (add a value to a range), use segment tree with lazy propagation.

Q2: Why does the tree array need size 4N?

A: The tree is stored in an array as if it were a complete binary tree. When N is not a power of 2, the recursion still hands out node indices down to the depth of the next power of 2, so indices can exceed 2N. In the worst case about 4N slots are needed, so 4*MAXN is a safe upper bound.

Q3: Which is better, Fenwick Tree (BIT) or Segment Tree?

A: BIT code is shorter (~15 lines vs 30 lines), has smaller constants, but can only handle "prefix-decomposable" operations (like sum). Segment Tree is more general (can do range min/max, GCD, etc.) and supports more complex operations (like lazy propagation). In contests: use BIT when possible, switch to Segment Tree when BIT is insufficient.

Q4: What types of queries can segment trees handle?

A: Any operation satisfying the associative law: sum (+), minimum (min), maximum (max), GCD, XOR, product, etc. The key is having an "identity element" (e.g., 0 for sum, INT_MAX for min, INT_MIN for max).

Q5: What is Lazy Propagation? When is it needed?

A: When you need to "add V to every element in range [L,R]" (range update), the naive approach updates every leaf from L to R (O(N)), which is too slow. Lazy Propagation stores updates "lazily" in internal nodes and only pushes them down when a child node actually needs to be queried, optimizing range updates to O(log N) as well.

🔗 Connections to Later Chapters

  • Chapter 3.2 (Prefix Sums): the "simplified version" of segment trees — use prefix sums when there are no update operations
  • Chapters 5.1–5.2 (Graphs): Euler Tour + segment tree can efficiently handle path queries on trees
  • Chapters 6.1–6.3 (DP): some DP optimizations require segment trees to maintain range min/max of DP values
  • Segment tree is a core data structure at USACO Gold level, mastering it solves a large number of Gold problems

Practice Problems

Problem 3.9.1 — Classic Range Sum 🟢 Easy Implement a segment tree. Handle N elements and Q queries: either update a single element or query the sum of a range.

Hint Use the complete implementation from Section 3.9.6. Distinguish query type by a flag (1 = update, 2 = query).

Problem 3.9.2 — Range Minimum 🟡 Medium Same as above but query the minimum of a range. Handle point updates.

Hint Change `+` to `min` in the tree operations. Return `INT_MAX` for out-of-range. The identity element for min is +∞.

Problem 3.9.3 — Number of Inversions 🔴 Hard Count the number of pairs (i,j) where i < j and arr[i] > arr[j].

Hint Process elements left to right. For each element x, query how many elements already inserted are > x (using a segment tree indexed by value). Then insert x. Total inversions = sum of these counts.

🏆 Challenge: USACO 2016 February Gold: Fencing the Cows A problem requiring range max queries with updates. Try solving it with both a Fenwick tree and a segment tree to understand the tradeoffs.


3.9.9 Lazy Propagation — Range Updates in O(log N)

The segment tree so far handles point updates (change one element). But what about range updates: "add V to all elements in [L, R]"?

Without lazy propagation, we'd need O(N) updates (one per element). With lazy propagation, we achieve O(log N) range updates.

💡 Key Insight: Instead of immediately updating all affected leaf nodes, we "lazily" defer the update — store it at the highest applicable node and only push it down when we actually need the children.

How Lazy Propagation Works

Each node now stores two values:

  • tree[node]: the actual aggregated value (range sum) for this range
  • lazy[node]: a pending update that hasn't been pushed to children yet

The push-down rule: When we visit a node with a pending lazy update, we:

  1. Apply the lazy update to the node's value
  2. Pass the lazy update to both children (push down)
  3. Clear the lazy for this node
Example: Array = [1, 2, 3, 4, 5], update "add 10 to [1..3]"

Initial tree:
         [15]            ← sum of [0..4]
        /      \
     [6]        [9]      ← sum of [0..2], [3..4]
    /   \      /   \
  [3]  [3]  [4]   [5]   ← sum of [0..1], [2], [3], [4]
  / \
 [1] [2]

After update "add 10 to [1..3]" with lazy propagation:
We need to update indices 1, 2, 3 (0-indexed).

At node covering [0..2]:
  - Only partially inside [1..3], so recurse down
  
At node covering [0..1]:
  - Partially inside [1..3], so recurse down
  - At leaf [1]: update arr[1] += 10. tree = [1, 12, 3, 4, 5]
  
At leaf [2]:
  - Fully inside [1..3]: store lazy, don't recurse!
  - lazy[covering [2]] = +10
  - tree[node] += 10 × (length of [2]) = +10
  
At node covering [3..4]:
  - Partially inside, recurse to [3]
  - Leaf [3]: += 10

Complete Lazy Propagation Implementation

// Solution: Segment Tree with Lazy Propagation
// Supports: range add update, range sum query — O(log N) each
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const int MAXN = 100005;

ll tree[4 * MAXN];   // tree[node] = sum of range
ll lazy[4 * MAXN];   // lazy[node] = pending add value (0 means no pending)

// ── PUSH DOWN: apply pending lazy to children ──
// Called before we recurse into children
void pushDown(int node, int start, int end) {
    if (lazy[node] == 0) return;  // no pending update, nothing to do
    
    int mid = (start + end) / 2;
    int left = 2 * node, right = 2 * node + 1;
    
    // Update left child's sum: add lazy * (number of elements in left child)
    tree[left]  += lazy[node] * (mid - start + 1);
    tree[right] += lazy[node] * (end - mid);
    
    // Pass lazy to children
    lazy[left]  += lazy[node];
    lazy[right] += lazy[node];
    
    // Clear current node's lazy (it's been pushed down)
    lazy[node] = 0;
}

// ── BUILD: construct tree from array ──
void build(int node, int start, int end, ll arr[]) {
    lazy[node] = 0;  // no pending updates initially
    if (start == end) {
        tree[node] = arr[start];
        return;
    }
    int mid = (start + end) / 2;
    build(2*node, start, mid, arr);
    build(2*node+1, mid+1, end, arr);
    tree[node] = tree[2*node] + tree[2*node+1];
}

// ── RANGE UPDATE: add val to all elements in [l, r] ──
void update(int node, int start, int end, int l, int r, ll val) {
    if (r < start || end < l) return;  // out of range: no-op
    
    if (l <= start && end <= r) {
        // Current segment fully inside [l, r]: apply lazy here, don't recurse
        tree[node] += val * (end - start + 1);  // ← KEY: multiply by range length
        lazy[node] += val;                        // store pending for children
        return;
    }
    
    // Partial overlap: push down existing lazy, then recurse
    pushDown(node, start, end);  // ← CRITICAL: push before recursing!
    
    int mid = (start + end) / 2;
    update(2*node,   start, mid, l, r, val);
    update(2*node+1, mid+1, end, l, r, val);
    
    // Update current node from children
    tree[node] = tree[2*node] + tree[2*node+1];
}

// ── RANGE QUERY: sum of elements in [l, r] ──
ll query(int node, int start, int end, int l, int r) {
    if (r < start || end < l) return 0;  // out of range
    
    if (l <= start && end <= r) {
        return tree[node];  // fully inside: return stored sum (already includes lazy!)
    }
    
    // Partial overlap: push down, then recurse
    pushDown(node, start, end);  // ← CRITICAL: push before recursing!
    
    int mid = (start + end) / 2;
    ll leftSum  = query(2*node,   start, mid, l, r);
    ll rightSum = query(2*node+1, mid+1, end, l, r);
    return leftSum + rightSum;
}

// ── COMPLETE EXAMPLE ──
int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n, q;
    cin >> n >> q;
    
    static ll arr[MAXN];  // static: a large array on the stack risks overflow
    for (int i = 0; i < n; i++) cin >> arr[i];
    
    build(1, 0, n-1, arr);
    
    while (q--) {
        int type;
        cin >> type;
        
        if (type == 1) {
            // Range update: add val to [l, r]
            int l, r; ll val;
            cin >> l >> r >> val;
            update(1, 0, n-1, l, r, val);
        } else {
            // Range query: sum of [l, r]
            int l, r;
            cin >> l >> r;
            cout << query(1, 0, n-1, l, r) << "\n";
        }
    }
    
    return 0;
}

Visual Trace: Range Update with Lazy

Array: [1, 2, 3, 4, 5, 6]  (0-indexed)

Initial tree (sums):
tree[1]  = 21  [0..5]
tree[2]  =  6  [0..2]    tree[3]  = 15  [3..5]
tree[4]  =  3  [0..1]    tree[5]  =  3  [2..2]    tree[6]  =  9  [3..4]    tree[7]  =  6  [5..5]
tree[8]  =  1  [0..0]    tree[9]  =  2  [1..1]   tree[12]  =  4  [3..3]   tree[13]  =  5  [4..4]

update(1, 0, 5, 1, 4, +10):  (add 10 to indices 1..4)

  At node 1 [0..5]: partial overlap, pushDown(1)—no lazy. Recurse.
    At node 2 [0..2]: partial overlap, pushDown(2)—no lazy. Recurse.
      At node 4 [0..1]: partial overlap, pushDown(4)—no lazy. Recurse.
        At node 8 [0..0]: outside [1..4]. Return.
        At node 9 [1..1]: FULLY inside [1..4].
          tree[9] += 10×1 = 12. lazy[9] = 10. Return.
      tree[4] = tree[8] + tree[9] = 1 + 12 = 13.
      At node 5 [2..2]: FULLY inside [1..4].
        tree[5] += 10×1 = 13. lazy[5] = 10. Return.
    tree[2] = 13 + 13 = 26.
    At node 3 [3..5]: partial overlap. pushDown(3)—no lazy. Recurse.
      At node 6 [3..4]: FULLY inside [1..4].
        tree[6] += 10×2 = 29. lazy[6] = 10. Return.   ← lazy stored for later!
      At node 7 [5..5]: outside [1..4]. Return.
    tree[3] = 29 + 6 = 35.
  tree[1] = 26 + 35 = 61. ✓ (original 21 + 10×4 = 61)

query(1, 0, 5, 2, 3): sum of [2..3]
  At node 1 [0..5]: partial. pushDown(1)—no lazy. Recurse.
  At node 2 [0..2]: partial. pushDown(2)—no lazy. Recurse.
    At node 4 [0..1]: outside [2..3]. Return 0.
    At node 5 [2..2]: FULLY inside. Return tree[5] = 13. ✓ (arr[2] = 3+10 = 13)
  At node 3 [3..5]: partial. pushDown(3)—no lazy. Recurse.
    At node 6 [3..4]: partial. pushDown(6)! (lazy[6] = 10)
      tree[12] += 10×1 = 14, lazy[12] = 10.
      tree[13] += 10×1 = 15, lazy[13] = 10.
      lazy[6] = 0.
      At node 12 [3..3]: FULLY inside. Return tree[12] = 14. ✓ (arr[3] = 4+10 = 14)
      At node 13 [4..4]: outside. Return 0.
  Result = 13 + 14 = 27. ✓

Complexity Analysis

Build
O(N)
Range Update
O(log N)
Range Query
O(log N)
Space
O(4N) tree + O(4N) lazy

Why O(log N)? At each level of the tree, at most two visited nodes are only partially covered by the query range; every other visited node is fully covered (or fully outside) and stops immediately. So the recursion expands along at most two root-to-leaf paths, visiting O(log N) nodes in total.

⚠️ Lazy Propagation Common Mistakes

Wrong — Forget pushDown before recursion
// BAD: This gives wrong answers!
void update(int node, int start, int end, int l, int r, ll val) {
    if (r < start || end < l) return;
    if (l <= start && end <= r) {
        tree[node] += val * (end - start + 1);
        lazy[node] += val;
        return;
    }
    // FORGOT: pushDown(node, start, end); ← BUG!
    int mid = (start + end) / 2;
    update(2*node,   start, mid, l, r, val);
    update(2*node+1, mid+1, end, l, r, val);
    tree[node] = tree[2*node] + tree[2*node+1];
}
Correct — Always pushDown before recursion
// GOOD: Push pending lazy before going to children
void update(int node, int start, int end, int l, int r, ll val) {
    if (r < start || end < l) return;
    if (l <= start && end <= r) {
        tree[node] += val * (end - start + 1);
        lazy[node] += val;
        return;
    }
    pushDown(node, start, end);  // ← ALWAYS before recursing!
    int mid = (start + end) / 2;
    update(2*node,   start, mid, l, r, val);
    update(2*node+1, mid+1, end, l, r, val);
    tree[node] = tree[2*node] + tree[2*node+1];
}

Top 4 Lazy Propagation Bugs:

  1. Forgetting pushDown before recursion — children receive parent's lazy on top of their own, giving wrong query results
  2. Wrong size multiplier — tree[node] += val instead of tree[node] += val * (end - start + 1). The node stores a SUM, so adding val to each of (end-start+1) elements adds val*(size) to the sum.
  3. Not initializing lazy[] to 0 — use memset(lazy, 0, sizeof(lazy)) or initialize in build()
  4. Mixing lazy for different operations — if you have both "range add" and "range multiply" lazy, the order matters. You need two separate lazy arrays and a careful push-down combining both.

Generalizing Lazy Propagation

The pattern works for any operation where:

  • The aggregate is an associative operation (sum, min, max, XOR...)
  • The update distributes over the aggregate (sum += k*n when adding k to n elements)

Common variants:

| Update | Query | Lazy stores | Push-down formula |
|---|---|---|---|
| Range Add | Range Sum | Add delta | tree[child] += lazy * size; lazy[child] += lazy |
| Range Set | Range Sum | Set value | tree[child] = lazy * size; lazy[child] = lazy |
| Range Add | Range Min | Add delta | tree[child] += lazy; lazy[child] += lazy |
| Range Set | Range Min | Set value | tree[child] = lazy; lazy[child] = lazy |
📖 Chapter 3.10 ⏱️ ~60 min read 🎯 Advanced

Chapter 3.10: Fenwick Tree (Binary Indexed Tree)

📝 Before You Continue: You should already know prefix sums (Chapter 3.2) and bitwise operations. This chapter complements Segment Tree (Chapter 3.9) — BIT code is shorter, with smaller constants, but supports fewer operations.

Fenwick Tree (also known as Binary Indexed Tree / BIT) is one of the most commonly used data structures in competitive programming: under 15 lines of code, yet supports point updates and prefix queries in O(log N) time.


3.10.1 The Core Idea: What Is lowbit?

Bitwise Principle of lowbit

For any positive integer x, lowbit(x) = x & (-x) returns the value of the lowest set bit in the binary representation of x.

x  =  6  →  binary: 0110
-x = -6  →  two's complement: 1010  (bitwise NOT + 1)
x & (-x) = 0010 = 2   ← lowest set bit corresponds to 2^1 = 2

Examples:

| x | Binary | -x (two's complement) | x & (-x) | Meaning |
|---|---|---|---|---|
| 1 | 0001 | 1111 | 0001 = 1 | Manages 1 element |
| 2 | 0010 | 1110 | 0010 = 2 | Manages 2 elements |
| 3 | 0011 | 1101 | 0001 = 1 | Manages 1 element |
| 4 | 0100 | 1100 | 0100 = 4 | Manages 4 elements |
| 6 | 0110 | 1010 | 0010 = 2 | Manages 2 elements |
| 8 | 1000 | 1000 | 1000 = 8 | Manages 8 elements |

BIT Tree Index Intuition

The elegance of BIT: tree[i] does not store a single element — it stores the sum of a range ending at index i whose length is exactly lowbit(i).

BIT tree structure (n=8):

flowchart BT
    T1["tree[1]\nA[1]\nmanages 1"]
    T2["tree[2]\nA[1..2]\nmanages 2"]
    T3["tree[3]\nA[3]\nmanages 1"]
    T4["tree[4]\nA[1..4]\nmanages 4"]
    T5["tree[5]\nA[5]\nmanages 1"]
    T6["tree[6]\nA[5..6]\nmanages 2"]
    T7["tree[7]\nA[7]\nmanages 1"]
    T8["tree[8]\nA[1..8]\nmanages 8"]
    T1 --> T2
    T3 --> T4
    T2 --> T4
    T5 --> T6
    T7 --> T8
    T6 --> T8
    T4 --> T8
    style T8 fill:#dbeafe,stroke:#3b82f6
    style T4 fill:#e0f2fe,stroke:#0284c7
    style T2 fill:#f0f9ff,stroke:#38bdf8
    style T6 fill:#f0f9ff,stroke:#38bdf8

Jump path for the query prefix(7):

flowchart LR
    Q7["i=7\nadd tree[7]=A[7]"] -->|"7-lowbit(7)=6"| Q6
    Q6["i=6\nadd tree[6]=A[5..6]"] -->|"6-lowbit(6)=4"| Q4
    Q4["i=4\nadd tree[4]=A[1..4]"] -->|"4-lowbit(4)=0"| Q0
    Q0(["i=0, stop\n3 steps = O(log 7)"])
    style Q0 fill:#dcfce7,stroke:#16a34a

💡 Jump pattern: queries do i -= lowbit(i) (jump down), updates do i += lowbit(i) (jump up). Each jump removes the lowest set bit, so there are at most log N steps.

Index i:  1    2    3    4    5    6    7    8
Range managed by tree[i]:
  tree[1] = A[1]            (length lowbit(1)=1)
  tree[2] = A[1]+A[2]       (length lowbit(2)=2)
  tree[3] = A[3]            (length lowbit(3)=1)
  tree[4] = A[1]+...+A[4]   (length lowbit(4)=4)
  tree[5] = A[5]            (length lowbit(5)=1)
  tree[6] = A[5]+A[6]       (length lowbit(6)=2)
  tree[7] = A[7]            (length lowbit(7)=1)
  tree[8] = A[1]+...+A[8]   (length lowbit(8)=8)

Jump path for updating position 3:

flowchart LR
    U3["i=3\nupdate tree[3]"] -->|"3+lowbit(3)=4"| U4
    U4["i=4\nupdate tree[4]"] -->|"4+lowbit(4)=8"| U8
    U8["i=8\nupdate tree[8]"] -->|"8+lowbit(8)=16>n"| U_end
    U_end(["i>n, stop\n3 steps = O(log N)"])
    style U_end fill:#dcfce7,stroke:#16a34a

When querying prefix sum prefix(7), jump down via i -= lowbit(i):

  • i=7: add tree[7] (manages A[7]), then 7 - lowbit(7) = 7 - 1 = 6
  • i=6: add tree[6] (manages A[5..6]), then 6 - lowbit(6) = 6 - 2 = 4
  • i=4: add tree[4] (manages A[1..4]), then 4 - lowbit(4) = 4 - 4 = 0, stop

Total: 3 steps = O(log N).

When updating position 3, jump up via i += lowbit(i):

  • i=3: update tree[3], then 3 + lowbit(3) = 3 + 1 = 4
  • i=4: update tree[4], then 4 + lowbit(4) = 4 + 4 = 8
  • i=8: update tree[8], 8 > n, stop

3.10.2 Point Update + Prefix Query — Complete Code

// ══════════════════════════════════════════════════════════════
// Fenwick Tree (Binary Indexed Tree) — Classic Implementation
// Supports: Point Update O(log N), Prefix Sum Query O(log N)
// Arrays are 1-INDEXED (critical!)
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 300005;

int n;
long long tree[MAXN];  // BIT array, 1-indexed

// ── lowbit: returns the value of the lowest set bit ──
// x & (-x) works because:
//   -x in two's complement = ~x + 1
//   The lowest set bit of x is preserved, all higher bits cancel out
// Example: x=6 (0110), -x=1010, x&(-x)=0010=2
inline int lowbit(int x) {
    return x & (-x);
}

// ── update: add val to position i ──
// Walk UP the tree: i += lowbit(i)
// Each ancestor that covers position i gets updated
void update(int i, long long val) {
    for (; i <= n; i += lowbit(i))
        tree[i] += val;
    // Time: O(log N) — at most log2(N) iterations
}

// ── query: return prefix sum A[1..i] ──
// Walk DOWN the tree: i -= lowbit(i)
// Decompose [1..i] into O(log N) non-overlapping ranges
long long query(int i) {
    long long sum = 0;
    for (; i > 0; i -= lowbit(i))
        sum += tree[i];
    return sum;
    // Time: O(log N) — at most log2(N) iterations
}

// ── build: initialize BIT from an existing array A[1..n] ──
// Method 1: N individual updates — O(N log N)
void build_slow(long long A[]) {
    fill(tree + 1, tree + n + 1, 0LL);
    for (int i = 1; i <= n; i++)
        update(i, A[i]);
}

// Method 2: O(N) build using the "direct parent" trick
void build_fast(long long A[]) {
    for (int i = 1; i <= n; i++) {
        tree[i] += A[i];
        int parent = i + lowbit(i);  // direct parent in BIT
        if (parent <= n)
            tree[parent] += tree[i];
    }
}

// ── Full Example ──
int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int q;
    cin >> n >> q;

    static long long A[MAXN];  // static: ~2.4 MB would risk stack overflow; statics are zero-initialized
    for (int i = 1; i <= n; i++) cin >> A[i];
    build_fast(A);  // O(N) initialization

    while (q--) {
        int type;
        cin >> type;
        if (type == 1) {
            // Point update: A[i] += val
            int i; long long val;
            cin >> i >> val;
            update(i, val);
        } else {
            // Prefix query: sum of A[1..r]
            int r;
            cin >> r;
            cout << query(r) << "\n";
        }
    }
    return 0;
}

3.10.3 Range Query = prefix(r) - prefix(l-1)

Range query sum(l, r) is identical to the prefix sum technique:

// Range sum query: sum of A[l..r]
// Time: O(log N) — two prefix queries
long long range_query(int l, int r) {
    return query(r) - query(l - 1);
    // query(r)   = A[1] + A[2] + ... + A[r]
    // query(l-1) = A[1] + A[2] + ... + A[l-1]
    // difference = A[l] + A[l+1] + ... + A[r]
}

// Example usage:
// A = [3, 1, 4, 1, 5, 9, 2, 6]  (1-indexed)
// range_query(3, 6) = query(6) - query(2)
//                  = (3+1+4+1+5+9) - (3+1)
//                  = 23 - 4 = 19
// Verify: A[3]+A[4]+A[5]+A[6] = 4+1+5+9 = 19 ✓

3.10.4 Comparison: Prefix Sum vs BIT vs Segment Tree

| Operation | Prefix Sum Array | Fenwick Tree (BIT) | Segment Tree |
|---|---|---|---|
| Build | O(N) | O(N) or O(N log N) | O(N) |
| Prefix Query | O(1) | O(log N) | O(log N) |
| Range Query | O(1) | O(log N) | O(log N) |
| Point Update | O(N) rebuild | O(log N) | O(log N) |
| Range Update | O(N) | O(log N) (Difference BIT) | O(log N) (lazy tag) |
| Range Min/Max | O(1) (sparse table) | ❌ Not supported | ✓ Supported |
| Code Complexity | Minimal | Simple (10 lines) | Complex (50+ lines) |
| Constant Factor | Smallest | Very small | Larger |
| Space | O(N) | O(N) | O(4N) |

When to choose BIT?

  • ✅ Only need prefix/range sum + point update
  • ✅ Need extremely concise code (fewer bugs in contest)
  • ✅ Counting inversions, merge sort counting problems
  • ❌ Need range min/max → use Segment Tree
  • ❌ Need complex range operations (range multiply, etc.) → use Segment Tree

3.10.5 Interactive Visualization: BIT Update Process


3.10.6 Range Update + Point Query (Difference BIT)

Standard BIT supports "point update + prefix query". Using the difference array technique, it can instead support "range update + point query".

Principle

Let difference array D[i] = A[i] - A[i-1] (D[1] = A[1]), then:

  • A[i] = D[1] + D[2] + ... + D[i] (i.e., A[i] is the prefix sum of D)
  • Adding val to all A[l..r] is equivalent to: D[l] += val; D[r+1] -= val
// ══════════════════════════════════════════════════════════════
// Difference BIT: Range Update + Point Query
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 300005;
int n;
long long diff_bit[MAXN];  // BIT over difference array D[]

inline int lowbit(int x) { return x & (-x); }

// Update D[i] += val in the difference BIT
void diff_update(int i, long long val) {
    for (; i <= n; i += lowbit(i))
        diff_bit[i] += val;
}

// Query A[i] = sum of D[1..i] = prefix query on diff BIT
long long diff_query(int i) {
    long long s = 0;
    for (; i > 0; i -= lowbit(i))
        s += diff_bit[i];
    return s;
}

// Range update: add val to all A[l..r]
// Equivalent to: D[l] += val, D[r+1] -= val
void range_update(int l, int r, long long val) {
    diff_update(l, val);       // D[l] += val
    diff_update(r + 1, -val);  // D[r+1] -= val
}

// Point query: return current value of A[i]
// A[i] = D[1] + D[2] + ... + D[i] = prefix_sum(D, i)
long long point_query(int i) {
    return diff_query(i);
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int q;
    cin >> n >> q;
    // Initialize: read A[i], build diff BIT
    for (int i = 1; i <= n; i++) {
        long long x; cin >> x;
        // Equivalent to range_update(i, i, x): D[i] += x, D[i+1] -= x,
        // so after all iterations D[i] = A[i] - A[i-1], as required
        diff_update(i, x);
        if (i + 1 <= n) diff_update(i + 1, -x);
    }
    // Equivalent alternative: diff_update(1, A[1]); then for i >= 2,
    // diff_update(i, A[i] - A[i-1]), one update per element.

    while (q--) {
        int type; cin >> type;
        if (type == 1) {
            int l, r; long long val;
            cin >> l >> r >> val;
            range_update(l, r, val);  // A[l..r] += val, O(log N)
        } else {
            int i; cin >> i;
            cout << point_query(i) << "\n";  // query A[i], O(log N)
        }
    }
    return 0;
}

Advanced: Range Update + Range Query (Dual BIT)

To support both range update + range query simultaneously, use two BITs:

// ══════════════════════════════════════════════════════════════
// Double BIT: Range Update + Range Query
// Formula: sum(1..r) = prefix_B1(r) * r - prefix_B2(r)
// where B1 is a BIT over D[], and B2 is a BIT over (i-1)*D[i]
// ══════════════════════════════════════════════════════════════
long long B1[MAXN], B2[MAXN];  // Two BITs

inline int lowbit(int x) { return x & (-x); }

void add(long long* b, int i, long long v) {
    for (; i <= n; i += lowbit(i)) b[i] += v;
}
long long sum(long long* b, int i) {
    long long s = 0;
    for (; i > 0; i -= lowbit(i)) s += b[i];
    return s;
}

// Range update: add val to A[l..r]
void range_add(int l, int r, long long val) {
    add(B1, l, val);
    add(B1, r + 1, -val);
    add(B2, l, val * (l - 1));     // compensate for prefix formula
    add(B2, r + 1, -val * r);
}

// Prefix sum A[1..r]
long long prefix_sum(int r) {
    return sum(B1, r) * r - sum(B2, r);
}

// Range sum A[l..r]
long long range_sum(int l, int r) {
    return prefix_sum(r) - prefix_sum(l - 1);
}

3.10.7 USACO-Style Problem: Counting Inversions with BIT

Problem Statement

Counting Inversions (O(N log N))

Given an integer array A of length N (distinct elements, range 1..N), count the number of inversions.

Inversion: a pair of indices (i, j) where i < j but A[i] > A[j].

Constraints: N ≤ 3×10⁵, requires O(N log N) solution.

Sample Input:

5
3 1 4 2 5

Sample Output:

3

Explanation: Inversions are (3,1), (3,2), (4,2), total 3 pairs.


Solution: BIT Inversion Count

// ══════════════════════════════════════════════════════════════
// Counting Inversions using Fenwick Tree — O(N log N)
//
// Key Idea:
//   Process A[i] from left to right.
//   For each A[i], the number of inversions with A[i] as the
//   RIGHT element = count of already-processed values > A[i]
//                 = (elements processed so far) - (elements <= A[i])
//                 = i-1 - prefix_query(A[i])
//   Sum over all i gives total inversions.
//
// BIT role: track frequency of seen values.
//   After seeing value v: update(v, +1)
//   Query # of values <= x: query(x)
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const int MAXN = 300005;

int n;
int bit[MAXN];  // BIT for frequency counting; bit[v] tracks how many times v appeared

inline int lowbit(int x) { return x & (-x); }

// Add 1 to position v (we saw value v)
void update(int v) {
    for (; v <= n; v += lowbit(v))
        bit[v]++;
}

// Count how many values in [1..v] have been seen
int query(int v) {
    int cnt = 0;
    for (; v > 0; v -= lowbit(v))
        cnt += bit[v];
    return cnt;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> n;
    
    ll inversions = 0;
    
    for (int i = 1; i <= n; i++) {
        int a;
        cin >> a;
        
        // Count inversions where a is the RIGHT element:
        // # of already-seen values GREATER than a
        // = (i-1 elements seen so far) - (# of seen values <= a)
        int less_or_equal = query(a);         // # of seen values in [1..a]
        int greater = (i - 1) - less_or_equal; // # of seen values in [a+1..n]
        inversions += greater;
        
        // Mark that we've now seen value a
        update(a);
    }
    
    cout << inversions << "\n";
    return 0;
}

/*
Trace for A = [3, 1, 4, 2, 5]:

i=1, a=3: seen=[], query(3)=0, greater=0-0=0. inversions=0. update(3).
i=2, a=1: seen=[3], query(1)=0, greater=1-0=1. inversions=1. update(1).
           (3 > 1: that's 1 inversion: (3,1) ✓)
i=3, a=4: seen=[3,1], query(4)=2, greater=2-2=0. inversions=1. update(4).
           (no element > 4 was seen before)
i=4, a=2: seen=[3,1,4], query(2)=1, greater=3-1=2. inversions=3. update(2).
           (3>2 and 4>2: 2 inversions: (3,2),(4,2) ✓)
i=5, a=5: seen=[3,1,4,2], query(5)=4, greater=4-4=0. inversions=3. update(5).

Final: 3 ✓
*/

Complexity Analysis:

  • Time: O(N log N) — N iterations, each O(log N) for update + query
  • Space: O(N) for BIT

Extension: If array elements are not in range 1..N, first apply coordinate compression before using BIT:

// Coordinate compression for arbitrary values
vector<int> A(n);
for (int i = 0; i < n; i++) cin >> A[i];

// Step 1: sort and deduplicate
vector<int> sorted_A = A;
sort(sorted_A.begin(), sorted_A.end());
sorted_A.erase(unique(sorted_A.begin(), sorted_A.end()), sorted_A.end());

// Step 2: replace each value with its rank (1-indexed)
for (int i = 0; i < n; i++) {
    A[i] = lower_bound(sorted_A.begin(), sorted_A.end(), A[i]) - sorted_A.begin() + 1;
    // A[i] is now in [1..M] where M = sorted_A.size()
}
// Now use BIT with n = sorted_A.size()

3.10.8 Common Mistakes

❌ Mistake 1: Wrong lowbit Implementation

// ❌ WRONG — common typo/confusion
int lowbit(int x) { return x & (x - 1); }  // This CLEARS the lowest bit, not returns it!
// x=6 (0110): x&(x-1) = 0110&0101 = 0100 = 4 (WRONG, should be 2)

int lowbit(int x) { return x % 2; }         // Only tests the last bit (0 or 1), not the value of the lowest set bit

// ✅ CORRECT
int lowbit(int x) { return x & (-x); }
// x = 6 = 00000110; -6 = 11111010 (8-bit two's complement)
// 00000110 & 11111010 = 00000010 = 2 ✓

Memory trick: x & (-x) reads as "x AND negative-x". In two's complement, -x = ~x + 1: the zero bits below x's lowest set bit stay 0, the lowest set bit itself stays 1, and every higher bit is flipped — so the AND keeps exactly the lowest set bit.

❌ Mistake 2: 0-indexed Array (the 0-index trap)

BIT must use 1-indexed arrays. Index 0 causes an infinite loop!

// ❌ WRONG — calling update(0) loops forever:
// lowbit(0) = 0 & 0 = 0, so i += lowbit(i) never increases i,
// and the condition i <= n stays true.
// (query(0) merely returns 0 — its loop condition i > 0 fails immediately —
// but an off-by-one that produces index 0 still gives wrong answers.)

int query_silently_wrong(int i) {  // i is 0-indexed
    int s = 0;
    for (; i > 0; i -= lowbit(i))  // never touches bit[0] — A[0] is silently skipped
        s += bit[i];
    return s;
}

// ❌ WRONG — forgetting +1 when converting to 1-indexed
int arr[n]; // 0-indexed A[0..n-1]
for (int i = 0; i < n; i++) {
    update(i, arr[i]);    // BUG: should be update(i+1, arr[i])
}

// ✅ CORRECT — always shift to 1-indexed
for (int i = 0; i < n; i++) {
    update(i + 1, arr[i]);  // convert 0-indexed i to 1-indexed i+1
}
// And remember: query(r+1) - query(l) for 0-indexed range [l, r]

❌ Mistake 3: Integer Overflow in Large Sum

// ❌ WRONG — tree[] should be long long for large sums
int tree[MAXN];   // overflow if sum > 2^31

// ✅ CORRECT
long long tree[MAXN];

// Also: when counting inversions, inversions can be up to N*(N-1)/2 ≈ 4.5×10^10 for N=3×10^5
// Always use long long for the result counter!
long long inversions = 0;  // ✅ not int!

❌ Mistake 4: Forgetting to Clear BIT Between Test Cases

// ❌ WRONG — in problems with multiple test cases
int T; cin >> T;
while (T--) {
    // forgot to clear tree[]!
    // Old data from previous test case corrupts results
    solve();
}

// ✅ CORRECT — reset before each test case
int T; cin >> T;
while (T--) {
    fill(tree + 1, tree + n + 1, 0LL);  // clear BIT
    solve();
}

3.10.9 Chapter Summary

📋 Formula Quick Reference

| Operation | Code | Description |
|---|---|---|
| lowbit | `x & (-x)` | Value of lowest set bit of x |
| Point Update | `for(;i<=n;i+=lowbit(i)) t[i]+=v` | Propagate upward |
| Prefix Query | `for(;i>0;i-=lowbit(i)) s+=t[i]` | Decompose downward |
| Range Query | `query(r) - query(l-1)` | Difference formula |
| Range Update (Diff BIT) | `upd(l,+v); upd(r+1,-v)` | Difference array |
| Inversion Count | `(i-1) - query(a[i])` | Count while processing each element |
| Indexing | must be 1-indexed | 0-indexed → infinite loop |

❓ FAQ

Q1: Both BIT and Segment Tree support prefix sum + point update. Which should I choose?

A: Use BIT whenever possible. BIT code is only ~10 lines, has smaller constant factors (empirically 2–3× faster), and leaves far less room for bugs. Choose a Segment Tree only when you need range min/max (RMQ), range coloring, or more complex range operations. In contests, BIT is the default weapon; Segment Tree is the heavy artillery.

Q2: Can BIT support Range Minimum Query (RMQ)?

A: Standard BIT cannot support RMQ, because the min operation has no "inverse" (cannot "undo" a merged min value like subtraction). For range min/max, use Segment Tree or Sparse Table. There is a "static BIT for RMQ" technique, but it only works without updates and has limited practical use.

Q3: Can BIT support 2D (2D BIT)?

A: Yes! 2D BIT solves 2D prefix sum + point update problems, with complexity O(log N × log M). The code structure uses two nested loops:

// 2D BIT update
void update2D(int x, int y, long long v) {
    for (int i = x; i <= N; i += lowbit(i))
        for (int j = y; j <= M; j += lowbit(j))
            bit[i][j] += v;
}

Less common in USACO, but occasionally needed for 2D coordinate counting problems.


3.10.10 Practice Problems

🟢 Easy 1: Range Sum Query (Single-point Update)

Given an array of length N, support two operations:

  1. 1 i x: Increase A[i] by x
  2. 2 l r: Query A[l] + A[l+1] + ... + A[r]

Constraints: N, Q ≤ 10⁵.

Hint: Direct BIT application. Use update(i, x) and query(r) - query(l-1).

🟢 Easy 2: Number of Elements ≤ K

Given N operations, each either inserts an integer (range 1..10⁶) or queries "how many of the currently inserted integers are ≤ K?"

Hint: BIT maintains a frequency array over the value domain. update(v, 1) inserts value v, query(K) is the answer.

🟡 Medium 1: Range Add, Point Query

Given an array of length N (initially all zeros), support two operations:

  1. 1 l r x: Add x to every element in A[l..r]
  2. 2 i: Query the current value of A[i]

Constraints: N, Q ≤ 3×10⁵.

Hint: Use Difference BIT (Section 3.10.6).

🟡 Medium 2: Counting Inversions (with Coordinate Compression)

Given an array of length N with elements in range 1..10⁹ (possibly repeated). Count the number of inversions.

Constraints: N ≤ 3×10⁵.

Hint: First apply coordinate compression, then use BIT counting (the Section 3.10.7 method). Watch equal elements: only pairs (i, j) with i < j and A[i] > A[j] (strictly greater) count as inversions — query(A[i]), which counts values ≤ A[i], handles this correctly.

🔴 Hard: Range Add, Range Sum (Double BIT)

Given an array of length N, support two operations:

  1. 1 l r x: Add x to every element in A[l..r]
  2. 2 l r: Query A[l] + ... + A[r]

Constraints: N, Q ≤ 3×10⁵, elements and x can reach 10⁹.

Hint: Use Dual BIT (Dual BIT method at end of Section 3.10.6). Formula: prefix_sum(r) = (r+1) · B1.query(r) − B2.query(r), where B1 maintains the difference array D[j] and B2 maintains the weighted values j·D[j]. Derivation: let D[i] be the difference array; then A[1]+...+A[r] = Σᵢ₌₁ʳ Σⱼ₌₁ⁱ D[j] = Σⱼ₌₁ʳ D[j]·(r−j+1) = (r+1)·Σⱼ₌₁ʳ D[j] − Σⱼ₌₁ʳ j·D[j].


💡 Chapter Connection: BIT and Segment Tree are the two most commonly paired data structures in USACO. BIT handles 80% of scenarios with 1/5 the code of Segment Tree. After mastering BIT, return to Chapter 3.9 to learn Segment Tree lazy propagation—the territory BIT cannot reach.

📖 Chapter 3.11 · ⏱️ ~60 min read · 🎯 Intermediate · Tags: Tree, Graph

Chapter 3.11: Binary Trees

Prerequisites You should be comfortable with: recursion (Chapter 2.3), pointers / structs in C++, and basic graph concepts (adjacency, nodes, edges). This chapter is a prerequisite for Chapter 5.2 (Graph Algorithms) and Chapter 5.3 (Trees & Special Graphs).

Binary trees are the foundation of some of the most important data structures in competitive programming — from Binary Search Trees (BST) to Segment Trees to Heaps. Understanding them deeply will make graph algorithms, DP on trees, and USACO Gold problems significantly more approachable.


3.11.1 Binary Tree Fundamentals

A binary tree is a hierarchical data structure where:

  • Each node has at most 2 children: a left child and a right child
  • There is exactly one root node (no parent)
  • Each non-root node has exactly one parent
🌳
Core Terminology
Root — topmost node (depth 0)
Leaf — node with no children
Internal node — node with at least one child
Height — longest path from root to any leaf
Depth — distance from root to that node
Subtree — a node and all its descendants

Visual Example

Binary Tree Structure

In this tree:

  • Height = 2 (longest root-to-leaf path: A → B → D)
  • Root = A, Leaves = D, E, F
  • B is parent of D and E; D is left child of B, E is right child of B

C++ Node Definition

Throughout this chapter, we use a consistent struct TreeNode:

// Solution: Basic Binary Tree Node
#include <bits/stdc++.h>
using namespace std;

struct TreeNode {
    int val;
    TreeNode* left;
    TreeNode* right;
    
    // Constructor: initialize with value, no children
    TreeNode(int v) : val(v), left(nullptr), right(nullptr) {}
};

💡 Why raw pointers? In competitive programming, we often manage memory manually for speed. nullptr (C++11) is far safer than an uninitialized pointer — always initialize left and right to nullptr.


3.11.2 Binary Search Trees (BST)

A Binary Search Tree is a binary tree with a crucial ordering property:

BST Property: left < node < right
Search: O(log N) average · Insert: O(log N) average · Delete: O(log N) average · Worst case (degenerate tree): O(N)

BST Property: For every node v:

  • All values in the left subtree are strictly less than v.val
  • All values in the right subtree are strictly greater than v.val
       [5]          ← valid BST
      /    \
    [3]    [8]
   /   \   /  \
  [1] [4] [7] [10]

  left of 5 = {1, 3, 4} — all < 5  ✓
  right of 5 = {7, 8, 10} — all > 5  ✓
// Solution: BST Search — O(log N) average, O(N) worst case
// Returns pointer to node with value 'target', or nullptr if not found
TreeNode* search(TreeNode* root, int target) {
    // Base case: empty tree or found the target
    if (root == nullptr || root->val == target) {
        return root;
    }
    // BST property: go left if target is smaller
    if (target < root->val) {
        return search(root->left, target);
    }
    // Go right if target is larger
    return search(root->right, target);
}

Iterative version (avoids stack overflow for large trees):

// Solution: BST Search Iterative
TreeNode* searchIterative(TreeNode* root, int target) {
    while (root != nullptr) {
        if (target == root->val) return root;       // found
        else if (target < root->val) root = root->left;   // go left
        else root = root->right;                     // go right
    }
    return nullptr;  // not found
}

3.11.2.2 BST Insert

// Solution: BST Insert — O(log N) average
// Returns the (potentially new) root of the subtree
TreeNode* insert(TreeNode* root, int val) {
    // If we've reached a null spot, create the new node here
    if (root == nullptr) {
        return new TreeNode(val);
    }
    if (val < root->val) {
        root->left = insert(root->left, val);   // recurse left
    } else if (val > root->val) {
        root->right = insert(root->right, val); // recurse right
    }
    // val == root->val: duplicate, ignore (or handle as needed)
    return root;
}

// Usage:
// TreeNode* root = nullptr;
// root = insert(root, 5);
// root = insert(root, 3);
// root = insert(root, 8);

3.11.2.3 BST Delete

Deletion is the trickiest BST operation. There are 3 cases:

  1. Node has no children (leaf): simply delete it
  2. Node has one child: replace node with its child
  3. Node has two children: replace with inorder successor (smallest in right subtree), then delete the successor
// Solution: BST Delete — O(log N) average
// Helper: find minimum node in a subtree
TreeNode* findMin(TreeNode* node) {
    while (node->left != nullptr) node = node->left;
    return node;
}

// Delete node with value 'val' from tree rooted at 'root'
TreeNode* deleteNode(TreeNode* root, int val) {
    if (root == nullptr) return nullptr;  // value not found
    
    if (val < root->val) {
        // Case: target is in left subtree
        root->left = deleteNode(root->left, val);
    } else if (val > root->val) {
        // Case: target is in right subtree
        root->right = deleteNode(root->right, val);
    } else {
        // Found the node to delete!
        
        // Case 1: No children (leaf)
        if (root->left == nullptr && root->right == nullptr) {
            delete root;
            return nullptr;
        }
        // Case 2a: Only right child
        else if (root->left == nullptr) {
            TreeNode* temp = root->right;
            delete root;
            return temp;
        }
        // Case 2b: Only left child
        else if (root->right == nullptr) {
            TreeNode* temp = root->left;
            delete root;
            return temp;
        }
        // Case 3: Two children — replace with inorder successor
        else {
            TreeNode* successor = findMin(root->right);  // smallest in right subtree
            root->val = successor->val;                  // copy successor's value
            root->right = deleteNode(root->right, successor->val);  // delete successor
        }
    }
    return root;
}

3.11.2.4 BST Degeneration Problem

⚠️ Critical Issue: If you insert values in sorted order (1, 2, 3, 4, 5...), the BST becomes a linked list:

[1]
  \
  [2]
    \
    [3]        ← This is O(N) per operation, not O(log N)!
      \
      [4]
        \
        [5]

This is why balanced BSTs (AVL trees, Red-Black trees) exist. In C++, std::set and std::map are implemented as Red-Black trees — always O(log N).

🔗 Key takeaway: In competitive programming, use std::set / std::map instead of writing your own BST. They are always balanced. Learn BST fundamentals to understand why they work, then use the STL in contests (see Chapter 3.8).

3.11.3 Tree Traversals

Traversal = visiting every node exactly once. There are 4 fundamental traversals:

| Traversal | Order | Use Case |
|---|---|---|
| Preorder | Root → Left → Right | Copy tree, prefix expression |
| Inorder | Left → Root → Right | Sorted output from BST |
| Postorder | Left → Right → Root | Delete tree, postfix expression |
| Level-order | BFS by depth | Shortest path, level operations |

3.11.3.1 Preorder Traversal

// Solution: Preorder Traversal — O(N) time, O(H) space (H = height)
// Visit order: Root, Left subtree, Right subtree
void preorder(TreeNode* root) {
    if (root == nullptr) return;   // base case
    cout << root->val << " ";      // process ROOT first
    preorder(root->left);          // then left subtree
    preorder(root->right);         // then right subtree
}

// For the tree:    [5]
//                 /    \
//               [3]    [8]
//              /   \
//            [1]   [4]
// Preorder: 5 3 1 4 8

Iterative Preorder (using stack):

// Solution: Preorder Iterative
void preorderIterative(TreeNode* root) {
    if (root == nullptr) return;
    stack<TreeNode*> stk;
    stk.push(root);
    
    while (!stk.empty()) {
        TreeNode* node = stk.top(); stk.pop();
        cout << node->val << " ";    // process current
        
        // Push RIGHT first (so LEFT is processed first — LIFO!)
        if (node->right) stk.push(node->right);
        if (node->left)  stk.push(node->left);
    }
}

3.11.3.2 Inorder Traversal

// Solution: Inorder Traversal — O(N) time
// Visit order: Left subtree, Root, Right subtree
// KEY PROPERTY: Inorder traversal of a BST gives SORTED output!
void inorder(TreeNode* root) {
    if (root == nullptr) return;
    inorder(root->left);           // left subtree first
    cout << root->val << " ";      // then ROOT
    inorder(root->right);          // then right subtree
}

// For BST with values {1, 3, 4, 5, 8}:
// Inorder: 1 3 4 5 8  ← sorted! This is the most important BST property

🔑 Key Insight: Inorder traversal of any BST always produces a sorted sequence. This is why std::set can be iterated in sorted order — it uses inorder traversal internally.

Iterative Inorder (slightly trickier):

// Solution: Inorder Iterative
void inorderIterative(TreeNode* root) {
    stack<TreeNode*> stk;
    TreeNode* curr = root;
    
    while (curr != nullptr || !stk.empty()) {
        // Go as far left as possible
        while (curr != nullptr) {
            stk.push(curr);
            curr = curr->left;
        }
        // Process the leftmost unprocessed node
        curr = stk.top(); stk.pop();
        cout << curr->val << " ";
        
        // Move to right subtree
        curr = curr->right;
    }
}

3.11.3.3 Postorder Traversal

// Solution: Postorder Traversal — O(N) time
// Visit order: Left subtree, Right subtree, Root
// Used for: deleting trees, evaluating expression trees
void postorder(TreeNode* root) {
    if (root == nullptr) return;
    postorder(root->left);         // left subtree first
    postorder(root->right);        // then right subtree
    cout << root->val << " ";      // ROOT last
}

// For BST [1, 3, 4, 5, 8]:
// Postorder: 1 4 3 8 5  (root 5 is always last)

// ── Memory cleanup using postorder ──
void deleteTree(TreeNode* root) {
    if (root == nullptr) return;
    deleteTree(root->left);   // delete left first
    deleteTree(root->right);  // then right
    delete root;              // then this node (safe: children already deleted)
}

3.11.3.4 Level-Order Traversal (BFS)

// Solution: Level-Order Traversal (BFS) — O(N) time, O(W) space (W = max width)
// Uses a queue: process nodes level by level
void levelOrder(TreeNode* root) {
    if (root == nullptr) return;
    
    queue<TreeNode*> q;
    q.push(root);
    
    while (!q.empty()) {
        int levelSize = q.size();  // number of nodes at current level
        
        for (int i = 0; i < levelSize; i++) {
            TreeNode* node = q.front(); q.pop();
            cout << node->val << " ";
            
            if (node->left)  q.push(node->left);
            if (node->right) q.push(node->right);
        }
        cout << "\n";  // newline between levels
    }
}

// For the BST [5, 3, 8, 1, 4]:
// Level 0: 5
// Level 1: 3 8
// Level 2: 1 4

Traversal Summary Table

Tree:           [5]
               /    \
             [3]    [8]
            /   \   /
          [1]  [4] [7]

Preorder:   5 3 1 4 8 7
Inorder:    1 3 4 5 7 8    ← sorted!
Postorder:  1 4 3 7 8 5
Level-order: 5 | 3 8 | 1 4 7

3.11.4 Tree Height and Balance

3.11.4.1 Computing Tree Height

// Solution: Tree Height — O(N) time, O(H) space for recursion stack
// Height = length of longest root-to-leaf path
// Convention: height of null tree = -1, leaf node height = 0
int height(TreeNode* root) {
    if (root == nullptr) return -1;  // empty subtree has height -1
    
    int leftHeight  = height(root->left);   // height of left subtree
    int rightHeight = height(root->right);  // height of right subtree
    
    return 1 + max(leftHeight, rightHeight);  // +1 for current node
}
// Time: O(N) — visit every node exactly once
// Space: O(H) — recursion stack depth = tree height

// Alternative: some define height as number of nodes on longest path
// Then: leaf has height 1, and empty tree has height 0
// Be careful about which convention your problem uses!

3.11.4.2 Checking Balance

A balanced binary tree requires that for every node, the heights of its left and right subtrees differ by at most 1.

// Solution: Check Balanced BST — O(N) time
// Returns -1 if unbalanced, otherwise returns the height of subtree
int checkBalanced(TreeNode* root) {
    if (root == nullptr) return 0;  // empty is balanced; note: this helper uses node-count height (leaf → 1)
    
    int leftH = checkBalanced(root->left);
    if (leftH == -1) return -1;     // left subtree is unbalanced
    
    int rightH = checkBalanced(root->right);
    if (rightH == -1) return -1;    // right subtree is unbalanced
    
    // Check balance at current node: heights can differ by at most 1
    if (abs(leftH - rightH) > 1) return -1;  // unbalanced!
    
    return 1 + max(leftH, rightH);   // return height if balanced
}

bool isBalanced(TreeNode* root) {
    return checkBalanced(root) != -1;
}
How checkBalanced works:

  1. Start from leaves (base case: null → height 0)
  2. For each node, recursively get left and right subtree heights
  3. If either subtree is unbalanced, immediately return -1 (early exit)
  4. If |leftH - rightH| > 1, this node is unbalanced → return -1
  5. Otherwise return the actual height for the parent to use

3.11.4.3 Counting Nodes

// Solution: Count Nodes — O(N)
int countNodes(TreeNode* root) {
    if (root == nullptr) return 0;
    return 1 + countNodes(root->left) + countNodes(root->right);
}

// Count leaves specifically
int countLeaves(TreeNode* root) {
    if (root == nullptr) return 0;
    if (root->left == nullptr && root->right == nullptr) return 1;  // leaf!
    return countLeaves(root->left) + countLeaves(root->right);
}

3.11.5 Lowest Common Ancestor (LCA) — Brute Force

The LCA of two nodes u and v in a rooted tree is the deepest node that is an ancestor of both.

          [1]
         /    \
       [2]    [3]
      /   \      \
    [4]   [5]   [6]
   /
  [7]

LCA(4, 5) = 2     (both 4 and 5 are descendants of 2)
LCA(4, 6) = 1     (deepest common ancestor is the root 1)
LCA(2, 4) = 2     (node 2 is ancestor of 4 and ancestor of itself)

O(N) Brute Force LCA

// Solution: LCA Brute Force — O(N) per query
// Strategy: find path from root to each node, then find last common node

// Step 1: Find path from root to target node
bool findPath(TreeNode* root, int target, vector<int>& path) {
    if (root == nullptr) return false;
    
    path.push_back(root->val);  // add current node to path
    
    if (root->val == target) return true;  // found target!
    
    // Try left then right
    if (findPath(root->left, target, path)) return true;
    if (findPath(root->right, target, path)) return true;
    
    path.pop_back();  // backtrack: target not in this subtree
    return false;
}

// Step 2: Find LCA using two paths
int lca(TreeNode* root, int u, int v) {
    vector<int> pathU, pathV;
    
    findPath(root, u, pathU);   // path from root to u
    findPath(root, v, pathV);   // path from root to v
    
    // Find last common node in both paths
    int result = root->val;
    int minLen = min(pathU.size(), pathV.size());
    
    for (int i = 0; i < minLen; i++) {
        if (pathU[i] == pathV[i]) {
            result = pathU[i];  // still common
        } else {
            break;  // diverged
        }
    }
    return result;
}
LCA complexity comparison: Brute Force — O(N) per query, no preprocessing. Binary Lifting — O(log N) per query after O(N log N) build time.

💡 USACO Note: For USACO Silver problems, the O(N) brute force LCA is NOT always sufficient. With N ≤ 10^5 nodes and Q ≤ 10^5 queries, the total is O(NQ) = O(10^10) — too slow. Use it only when N, Q ≤ 5000. Chapter 5.3 covers O(log N) LCA with binary lifting for harder problems.


3.11.6 Complete BST Implementation

Here's a complete, contest-ready BST with all operations:

// Solution: Complete BST Implementation
#include <bits/stdc++.h>
using namespace std;

struct TreeNode {
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int v) : val(v), left(nullptr), right(nullptr) {}
};

struct BST {
    TreeNode* root;
    BST() : root(nullptr) {}
    
    // ── Insert ──
    TreeNode* _insert(TreeNode* node, int val) {
        if (!node) return new TreeNode(val);
        if (val < node->val) node->left  = _insert(node->left,  val);
        else if (val > node->val) node->right = _insert(node->right, val);
        return node;
    }
    void insert(int val) { root = _insert(root, val); }
    
    // ── Search ──
    bool search(int val) {
        TreeNode* curr = root;
        while (curr) {
            if (val == curr->val) return true;
            curr = (val < curr->val) ? curr->left : curr->right;
        }
        return false;
    }
    
    // ── Inorder (sorted output) ──
    void _inorder(TreeNode* node, vector<int>& result) {
        if (!node) return;
        _inorder(node->left, result);
        result.push_back(node->val);
        _inorder(node->right, result);
    }
    vector<int> getSorted() {
        vector<int> result;
        _inorder(root, result);
        return result;
    }
    
    // ── Height ──
    int _height(TreeNode* node) {
        if (!node) return -1;
        return 1 + max(_height(node->left), _height(node->right));
    }
    int height() { return _height(root); }
};

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    BST bst;
    vector<int> vals = {5, 3, 8, 1, 4, 7, 10};
    for (int v : vals) bst.insert(v);
    
    cout << "Sorted: ";
    for (int v : bst.getSorted()) cout << v << " ";
    cout << "\n";
    // Output: 1 3 4 5 7 8 10
    
    cout << "Height: " << bst.height() << "\n";  // 2
    cout << "Search 4: " << bst.search(4) << "\n";  // 1 (true)
    cout << "Search 6: " << bst.search(6) << "\n";  // 0 (false)
    
    return 0;
}

3.11.7 USACO-Style Practice Problem

Problem: "Cow Family Tree" (USACO Bronze Style)

Problem Statement:

Farmer John has N cows numbered 1 to N. Cow 1 is the ancestor of all cows (the "root"). For each cow i (2 ≤ i ≤ N), its parent is cow parent[i]. The depth of a cow is defined as the number of edges from the root (cow 1) to that cow (so cow 1 has depth 0).

Given the tree and M queries, each asking "what is the depth of cow x?", answer all queries.

Input:

  • Line 1: N, M (1 ≤ N, M ≤ 100,000)
  • Lines 2 to N: each line contains i parent[i]
  • Next M lines: each contains a single integer x

Output: For each query, print the depth of cow x.

Sample Input:

5 3
2 1
3 1
4 2
5 3
4
5
1

Sample Output:

2
2
0
  • Cow 4's path: 4→2→1, depth = 2
  • Cow 5's path: 5→3→1, depth = 2
  • Cow 1: root, depth = 0

Solution Approach: Use DFS/BFS to compute depth of each node.

// Solution: Cow Family Tree — Depth Query
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;
vector<int> children[MAXN];  // adjacency list: children[i] = list of i's children
int depth[MAXN];             // depth[i] = depth of node i

// DFS to compute depths
void dfs(int node, int d) {
    depth[node] = d;
    for (int child : children[node]) {
        dfs(child, d + 1);  // children have depth+1
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n, m;
    cin >> n >> m;
    
    for (int i = 2; i <= n; i++) {
        int child, par;
        cin >> child >> par;             // each input line is "i parent[i]"
        children[par].push_back(child);  // par is the parent of child
    }
    
    dfs(1, 0);  // start DFS from root (cow 1) at depth 0
    
    while (m--) {
        int x;
        cin >> x;
        cout << depth[x] << "\n";
    }
    
    return 0;
}
// Time: O(N + M)
// Space: O(N)
💡 Extension: What if we want sum of values on path to root?
// Instead of depth, compute path sum (sum of node values on path to root)
int pathSum[MAXN];  // pathSum[i] = sum of values from root to i
int nodeVal[MAXN];  // nodeVal[i] = value of node i

void dfs(int node, int cumSum) {
    pathSum[node] = cumSum + nodeVal[node];
    for (int child : children[node]) {
        dfs(child, pathSum[node]);
    }
}
// Query: just return pathSum[x] in O(1)

3.11.8 Building a Tree from Traversals

A classic problem: given preorder and inorder traversals, reconstruct the original tree.

Key insight:

  • The first element of preorder is always the root
  • In the inorder array, the root splits it into left and right subtrees
// Solution: Reconstruct Tree from Preorder + Inorder — O(N^2) naive
TreeNode* build(vector<int>& pre, int preL, int preR,
                vector<int>& in,  int inL,  int inR) {
    if (preL > preR) return nullptr;
    
    int rootVal = pre[preL];  // first preorder element = root
    TreeNode* root = new TreeNode(rootVal);
    
    // Find root in inorder array
    int rootIdx = inL;
    while (in[rootIdx] != rootVal) rootIdx++;
    
    int leftSize = rootIdx - inL;  // number of nodes in left subtree
    
    // Recursively build left and right subtrees
    root->left  = build(pre, preL+1, preL+leftSize, in, inL, rootIdx-1);
    root->right = build(pre, preL+leftSize+1, preR, in, rootIdx+1, inR);
    
    return root;
}

TreeNode* buildTree(vector<int>& preorder, vector<int>& inorder) {
    int n = preorder.size();
    return build(preorder, 0, n-1, inorder, 0, n-1);
}

⚠️ Common Mistakes

Wrong — Null pointer crash
// BAD: No null check!
void inorder(TreeNode* root) {
    inorder(root->left);  // CRASH if root is null
    cout << root->val;
    inorder(root->right);
}
Correct — Always check null
// GOOD: Base case first
void inorder(TreeNode* root) {
    if (root == nullptr) return;  // ← critical!
    inorder(root->left);
    cout << root->val;
    inorder(root->right);
}
Wrong — Stack overflow on large input
// BAD: Recursive DFS on a 10^5-node
// degenerate (skewed) tree = 10^5 recursion depth.
// The default ~8 MB stack overflows somewhere around 10^4-10^5 frames!
void dfsRecursive(TreeNode* root) {
    if (!root) return;
    process(root);
    dfsRecursive(root->left);
    dfsRecursive(root->right);
}
Correct — Iterative is stack-safe
// GOOD: Use explicit stack for large trees
void dfsIterative(TreeNode* root) {
    stack<TreeNode*> stk;
    if (root) stk.push(root);
    while (!stk.empty()) {
        TreeNode* node = stk.top(); stk.pop();
        process(node);
        if (node->right) stk.push(node->right);
        if (node->left)  stk.push(node->left);
    }
}

Top 5 BST/Tree Bugs

  1. Forgetting nullptr base case — causes segfault immediately
  2. Not returning the (potentially new) root from insert/delete — tree structure broken
  3. Stack overflow — use iterative traversal for N > 10^5
  4. Memory leak — always delete nodes you remove (or use smart pointers)
  5. Using unbalanced BST when STL set would work — use std::set in contests

Chapter Summary

📌 Key Takeaways

| Concept | Key Point | Time Complexity |
|---|---|---|
| BST Search | Follow left/right based on comparison | O(log N) avg, O(N) worst |
| BST Insert | Find correct position, insert at null | O(log N) avg |
| BST Delete | 3 cases: leaf, one child, two children | O(log N) avg |
| Inorder | Left → Root → Right | O(N) |
| Preorder | Root → Left → Right | O(N) |
| Postorder | Left → Right → Root | O(N) |
| Level-order | BFS by level | O(N) |
| Height | max(leftH, rightH) + 1 | O(N) |
| Balance Check | \|leftH - rightH\| ≤ 1 at every node | O(N) |
| LCA (brute) | Find paths, compare | O(N) per query |

❓ FAQ

Q1: When should I use BST vs std::set?

A: In competitive programming, almost always use std::set. std::set is backed by a red-black tree (balanced BST), guaranteeing O(log N); a hand-written BST may degenerate to O(N). Only consider writing your own BST when you need custom BST behavior (e.g., tracking subtree sizes for "K-th largest" queries), or use __gnu_pbds::tree (Policy-Based Data Structure).

Q2: What is the relationship between Segment Tree and BST?

A: Segment Tree (Chapter 3.9) is a complete binary tree, but not a BST—nodes store range aggregate values (like range sums), not ordered keys. Both are binary trees with similar structure, but completely different purposes. Understanding BST pointer/recursion operations makes Segment Tree code easier to understand.

Q3: Which traversal—preorder/inorder/postorder—is most common in contests?

A: Inorder is most important—it outputs the BST's sorted sequence. Postorder is common for tree DP (compute children before parent). Level-order (BFS) is used when processing by level. Preorder is less common, but useful for serializing/deserializing trees.

Q4: Which is better, recursive or iterative implementation?

A: Recursive code is concise and easy to understand (preferred in contests). But when N ≥ 10^5 and the tree may degenerate, recursion risks stack overflow (the default stack is about 8 MB, supporting roughly 10^4–10^5 levels). USACO problems usually have non-degenerate trees, so recursion is usually fine; when in doubt, iterative is safer.

Q5: How important is LCA in competitive programming?

A: Very important! LCA is the foundation of tree DP and path queries. It appears occasionally in USACO Silver and is almost always tested in USACO Gold. The O(N) brute-force LCA learned here handles N ≤ 5000. The O(log N) Binary Lifting LCA is covered in detail in Chapter 5.3 (Trees & Special Graphs).

🔗 Connections to Other Chapters

  • Chapter 2.3 (Functions & Arrays): foundation of recursion—binary tree traversal is a perfect application of recursion
  • Chapter 3.8 (Maps & Sets): std::set / std::map are backed by balanced BST; understanding BST helps you use them better
  • Chapter 3.9 (Segment Trees): Segment Tree is a complete binary tree; the recursive structure of build/query/update is identical to BST traversal
  • Chapter 5.2 (Graph Algorithms): trees are special undirected graphs (connected, acyclic); all tree algorithms are special cases of graph algorithms
  • Chapter 5.3 (Trees & Special Graphs): LCA Binary Lifting, Euler Tour—built directly on this chapter's foundation

Practice Problems

Problem 3.11.1 — BST Validator 🟢 Easy Given a binary tree (not necessarily a BST), determine if it satisfies the BST property (all left subtree values < node < all right subtree values for every node).

Hint Common mistake: only checking `root->left->val < root->val` is NOT enough (doesn't verify the full subtree). Pass `minVal` and `maxVal` bounds down the recursion: `isValidBST(root, INT_MIN, INT_MAX)`.
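
The hinted bounds-passing approach can be sketched as follows. The `Node` struct is an assumed minimal tree type (not from the problem statement), and `long long` bounds sidestep the edge case where node values equal INT_MIN or INT_MAX:

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Node {
    int val;
    Node *left = nullptr, *right = nullptr;
    Node(int v) : val(v) {}
};

// Valid iff every value in the subtree lies strictly inside (lo, hi).
bool isValidBST(Node* root, long long lo = LLONG_MIN, long long hi = LLONG_MAX) {
    if (!root) return true;
    if (root->val <= lo || root->val >= hi) return false;
    return isValidBST(root->left, lo, root->val) &&   // left subtree: values < root->val
           isValidBST(root->right, root->val, hi);    // right subtree: values > root->val
}
```

A tree such as 5 → (right) 8 → (8's left) 4 passes the naive parent-child check but fails here: node 4 violates the lower bound of 5 inherited from its grandparent.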

Problem 3.11.2 — BST Inorder K-th Smallest 🟢 Easy Given a BST, find the K-th smallest element.

Hint Inorder traversal of a BST gives elements in sorted order. Count nodes as you do inorder traversal. Stop when you've visited K nodes.
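
A sketch of the counting inorder traversal. Here k is passed by reference and decremented at each visited node; -1 is a "not found yet" sentinel, which assumes non-negative node values (the `Node` struct is illustrative):

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Node {
    int val;
    Node *left = nullptr, *right = nullptr;
    Node(int v) : val(v) {}
};

// Inorder traversal visits BST values in sorted order; stop at the k-th.
// Returns -1 while the k-th value hasn't been reached (sentinel; assumes val >= 0).
int kthSmallest(Node* root, int& k) {
    if (!root) return -1;
    int res = kthSmallest(root->left, k);
    if (k == 0) return res;            // already found in the left subtree
    if (--k == 0) return root->val;    // this node is the k-th visited
    return kthSmallest(root->right, k);
}
```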

Problem 3.11.3 — Tree Diameter 🟡 Medium Given a binary tree (not a BST), find the longest path between any two nodes (the diameter). The path does not need to pass through the root.

Hint For each node, the longest path through it = leftHeight + rightHeight + 2. Compute this for all nodes and take the maximum. You can do this in a single DFS by returning height and updating a global `maxDiameter` variable.
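
A sketch of the single-DFS version. Heights are counted in edges (empty subtree = -1), which is the convention that makes the leftHeight + rightHeight + 2 formula in the hint work out:

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Node {
    Node *left = nullptr, *right = nullptr;
};

int maxDiameter = 0;  // longest path seen so far, measured in edges

// Returns subtree height in edges (-1 for an empty tree), updating maxDiameter.
int height(Node* root) {
    if (!root) return -1;
    int lh = height(root->left);
    int rh = height(root->right);
    maxDiameter = max(maxDiameter, lh + rh + 2);  // best path passing through this node
    return max(lh, rh) + 1;
}
```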

Problem 3.11.4 — Flatten BST to Sorted Array (USACO Style) 🟡 Medium You are given a BST with N nodes. N cows are each assigned a "score" (the node value). Find the median cow score (the ⌈N/2⌉-th smallest value).

Hint Do an inorder traversal to get a sorted array, then return element at index (N-1)/2 (0-indexed). Time: `O(N)`.

Problem 3.11.5 — Maximum Path Sum 🔴 Hard Given a binary tree where nodes can have negative values, find the path (between any two nodes) with the maximum sum. A path can go up and down through the tree.

Hint For each node v, the best path through v uses some prefix of the left branch and some prefix of the right branch. Use DFS: for each node, return the maximum "one-sided" path (going down only). Maintain a global maximum considering both sides. Handle negative branches by clamping to 0 (`max(0, leftMax) + max(0, rightMax) + node->val`).
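
The hint's clamping idea in code (a sketch; the `Node` struct is illustrative). `best` starts at LLONG_MIN so a tree whose values are all negative still reports the correct, negative answer:

```cpp
#include <bits/stdc++.h>
using namespace std;

struct Node {
    int val;
    Node *left = nullptr, *right = nullptr;
    Node(int v) : val(v) {}
};

long long best = LLONG_MIN;  // global maximum over all "bent" paths

// Returns the best downward-only path sum starting at root.
long long down(Node* root) {
    if (!root) return 0;
    long long l = max(0LL, down(root->left));   // drop negative branches
    long long r = max(0LL, down(root->right));
    best = max(best, l + r + root->val);        // path bending at this node
    return max(l, r) + root->val;
}
```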

End of Chapter 3.11 — Next: Chapter 4.1: Greedy Fundamentals

⚡ Part 4: Greedy Algorithms

Elegant algorithms with no complex recurrences — just one clever observation. Learn when greedy works, how to prove it, and powerful greedy + binary search combos.

📚 2 Chapters · ⏱️ Estimated 1-2 weeks · 🎯 Target: Activity selection, scheduling, binary search + greedy

Part 4: Greedy Algorithms

Estimated time: 1–2 weeks

Greedy algorithms are elegant: no complex recurrences, no state explosions — just one clever observation that makes everything fall into place. The challenge is knowing when greedy works and being able to prove it when it does.


What Topics Are Covered

| Chapter | Topic | The Big Idea |
|---|---|---|
| Chapter 4.1 | Greedy Fundamentals | When greedy works; exchange argument proofs |
| Chapter 4.2 | Greedy in USACO | Real USACO problems solved with greedy |

What You'll Be Able to Solve After This Part

After completing Part 4, you'll be ready to tackle:

  • USACO Bronze:

    • Simulation with greedy decisions (process events optimally)
    • Simple sorting-based greedy
  • USACO Silver:

    • Activity selection (maximum non-overlapping intervals)
    • Scheduling problems (EDF, minimize lateness)
    • Greedy + binary search on answer
    • Huffman-style merge problems (priority queue)

Key Greedy Patterns

| Pattern | Sort By | Application |
|---|---|---|
| Activity selection | End time ↑ | Max non-overlapping intervals |
| Earliest deadline first | Deadline ↑ | Minimize maximum lateness |
| Interval stabbing | End time ↑ | Min points to cover all intervals |
| Interval covering | Start time ↑ | Min intervals to cover a range |
| Fractional knapsack | Value/weight ↓ | Maximize value with capacity |
| Huffman merge | Use min-heap | Minimum cost encoding |

Prerequisites

Before starting Part 4, make sure you can:

  • Sort with custom comparators (Chapter 3.3)
  • Use priority_queue (Chapter 3.1)
  • Binary search on the answer (Chapter 3.3) — used in Chapter 4.2

The Greedy Mindset

Before coding a greedy solution, ask:

  1. What's the "obvious best" choice at each step?
  2. Can I make an exchange argument? If I swap the greedy choice with any other choice, does the solution only get worse (or stay the same)?
  3. Can I find a counterexample? Try small cases where the greedy might fail.

If you can answer (1), sketch the exchange argument in (2), and fail to find a counterexample in (3), your greedy is likely correct.


Tips for This Part

  1. Greedy is the hardest part to "verify." Unlike DP where you just need the right recurrence, greedy requires a correctness argument. Practice sketching exchange argument proofs.
  2. When greedy fails, DP is usually the fix. The coin change example (Chapter 4.1) shows this perfectly.
  3. Chapter 4.2 has real USACO problems — work through the code carefully, not just the high-level idea.
  4. Greedy + binary search (Chapter 4.2) is a powerful combination that appears frequently in Silver. The greedy solves the "check" function, and binary search finds the optimal answer.

💡 Key Insight: Sorting is the engine of most greedy algorithms. The sort criterion embodies the "greedy choice" — choosing the best element first. The exchange argument proves that this criterion is optimal.

🏆 USACO Tip: In USACO Silver, if a problem asks "maximum X subject to constraint Y" or "minimum cost to achieve Z," first try binary search on the answer with a greedy check. This combination solves a surprising fraction of Silver problems.

📖 Chapter 4.1 ⏱️ ~55 min read 🎯 Intermediate

Chapter 4.1: Greedy Fundamentals

📝 Before You Continue: You should be comfortable with sorting (Chapter 3.3) and basic priority_queue usage (Chapter 3.1). Some problems also use interval reasoning.

A greedy algorithm is like a traveler who always takes the nearest oasis — no map, no planning, just the best move visible right now. For the right problems, this always works out. For others, it leads to disaster.


4.1.1 What Makes a Problem "Greedy-Solvable"?

A greedy approach works when the problem has the greedy choice property: making the locally optimal choice at each step leads to a globally optimal solution.

Contrast with DP

Consider making change for 11 cents:

  • Coins: {1, 5, 6, 9}
  • Greedy: 9 + 1 + 1 = 3 coins
  • Optimal: 6 + 5 = 2 coins

Here greedy fails. The greedy choice (always take the largest coin) doesn't lead to the global optimum.

But with US coins {1, 5, 10, 25, 50}:

  • 41 cents: Greedy → 25 + 10 + 5 + 1 = 4 coins ✓ (optimal)

US coins have a special structure that makes greedy work. Always verify!
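
Both cases above can be checked mechanically by comparing the greedy count against a DP count (a preview of Part 6). This is a sketch using only the coin lists from the text:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Greedy: repeatedly take the largest coin that still fits.
int greedyCoins(int amount, vector<int> coins) {
    sort(coins.rbegin(), coins.rend());  // largest coin first
    int cnt = 0;
    for (int c : coins)
        while (amount >= c) { amount -= c; cnt++; }
    return cnt;
}

// DP: true minimum number of coins (considers all choices).
int dpCoins(int amount, const vector<int>& coins) {
    const int INF = 1e9;
    vector<int> dp(amount + 1, INF);
    dp[0] = 0;
    for (int v = 1; v <= amount; v++)
        for (int c : coins)
            if (c <= v && dp[v - c] + 1 < dp[v])
                dp[v] = dp[v - c] + 1;
    return dp[amount];
}
```

With coins {1, 5, 6, 9}, greedy returns 3 for 11 cents while DP returns 2; with US coins {1, 5, 10, 25, 50}, both return 4 for 41 cents.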

💡 Key Insight: Greedy works when there's a "no regret" property — once you make the greedy choice, you'll never need to undo it. If you can always swap any non-greedy choice for the greedy one without making things worse, greedy is optimal.

Greedy vs. DP decision paths:

flowchart TD
    Start["Optimization problem"] --> Q1{"Can you find a counterexample?"}
    Q1 -->|"Yes, greedy fails"| DP["Use DP\nconsider all choices"]
    Q1 -->|"No, try to prove it"| Q2{"Does an exchange argument show the greedy choice is safe?"}
    Q2 -->|"Yes"| Greedy["Use greedy\ntake the local optimum at each step"]
    Q2 -->|"Unsure"| Both["Try greedy first\nswitch to DP if it gets WA"]
    style Greedy fill:#dcfce7,stroke:#16a34a
    style DP fill:#dbeafe,stroke:#3b82f6
    style Both fill:#fef9ec,stroke:#d97706

4.1.2 The Exchange Argument

The exchange argument is the standard proof technique for greedy algorithms:

  1. Assume there's an optimal solution O that makes a different choice than our greedy at some step
  2. Show that we can "swap" our greedy choice for theirs without making things worse
  3. By repeated swaps, transform O into the greedy solution — it remains optimal throughout
  4. Conclude: the greedy solution is optimal

💡 Key Insight: The exchange argument works by showing that greedy choices are "at least as good" as any alternative. You don't need to show greedy is uniquely optimal — just that no swap can improve it.

Visual: Greedy Exchange Argument

Greedy Exchange Argument

The diagram illustrates the exchange argument: if two adjacent elements are "out of order" relative to the greedy criterion, swapping them produces a solution that is at least as good. By repeatedly applying swaps we can transform any solution into the greedy solution without losing value.

Let's see this in action.


4.1.3 Activity Selection Problem

Problem: Given N activities, each with a start time s[i] and end time f[i], select the maximum number of non-overlapping activities.

Visual: Activity Selection Gantt Chart

Activity Selection

The Gantt chart shows all activities on a timeline. Selected activities (green) are non-overlapping and maximally many. Rejected activities (gray) are skipped because they overlap with an already-selected one. The greedy rule is: always pick the activity with the earliest end time that doesn't conflict.

Greedy Algorithm:

  1. Sort activities by end time
  2. Always select the activity that ends earliest among those compatible with previously selected activities

Activity selection greedy walkthrough:

flowchart LR
    subgraph sorted["Sorted by end time"]
        direction TB
        A1["A(1,3)"] 
        B1["B(2,5)"]
        C1["C(5,7)"]
        D1["D(6,8)"]
        F1["F(8,11)"]
    end
    subgraph select["Greedy selection"]
        direction TB
        S1["lastEnd=-1\nselect A(1,3) ✓\nlastEnd=3"]
        S2["B starts at 2 < lastEnd=3\nskip B ✗"]
        S3["C starts at 5 ≥ lastEnd=3\nselect C(5,7) ✓\nlastEnd=7"]
        S4["D starts at 6 < lastEnd=7\nskip D ✗"]
        S5["F starts at 8 ≥ lastEnd=7\nselect F(8,11) ✓\nlastEnd=11"]
        S1 --> S2 --> S3 --> S4 --> S5
    end
    sorted --> select
    style S1 fill:#dcfce7,stroke:#16a34a
    style S3 fill:#dcfce7,stroke:#16a34a
    style S5 fill:#dcfce7,stroke:#16a34a
    style S2 fill:#fef2f2,stroke:#dc2626
    style S4 fill:#fef2f2,stroke:#dc2626

💡 Why sort by end time? Selecting the activity that ends earliest leaves the most time for later activities. Sorting by start time could pick an activity that starts early but ends very late, blocking a large chunk of the timeline.

// Solution: Activity Selection — O(N log N)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    vector<pair<int,int>> activities(n);  // {end_time, start_time}
    for (int i = 0; i < n; i++) {
        int s, f;
        cin >> s >> f;
        activities[i] = {f, s};  // sort by end time
    }

    sort(activities.begin(), activities.end());  // ← KEY LINE: sort by end time

    int count = 0;
    int lastEnd = -1;  // end time of the last selected activity

    for (auto [f, s] : activities) {
        if (s >= lastEnd) {      // this activity starts after the last one ends
            count++;
            lastEnd = f;         // update last end time
        }
    }

    cout << count << "\n";
    return 0;
}

Complete Walkthrough: USACO-Style Activity Selection

Problem: Given activities: [(1,3), (2,5), (3,9), (6,8), (5,7), (8,11), (10,12)] (format: start, end)

Step 1 — Sort by end time:

Activity:  A      B      C      D      E      F      G
(s,e):  (1,3)  (2,5)  (5,7)  (6,8)  (3,9)  (8,11) (10,12)

Sorted: A(1,3), B(2,5), C(5,7), D(6,8), E(3,9), F(8,11), G(10,12)

Step 2 — Greedy selection (lastEnd = -1 initially):

Activity A (1,3):  start=1 ≥ lastEnd=-1 ✓ SELECT. lastEnd = 3. Count = 1
Activity B (2,5):  start=2 ≥ lastEnd=3? NO (2 < 3). SKIP.
Activity C (5,7):  start=5 ≥ lastEnd=3 ✓ SELECT. lastEnd = 7. Count = 2
Activity D (6,8):  start=6 ≥ lastEnd=7? NO (6 < 7). SKIP.
Activity E (3,9):  start=3 ≥ lastEnd=7? NO (3 < 7). SKIP.
Activity F (8,11): start=8 ≥ lastEnd=7 ✓ SELECT. lastEnd = 11. Count = 3
Activity G (10,12):start=10 ≥ lastEnd=11? NO (10 < 11). SKIP.

Result: 3 activities selected — A(1,3), C(5,7), F(8,11)

ASCII Timeline Diagram:

Time:  0  1  2  3  4  5  6  7  8  9  10 11 12
       |  |  |  |  |  |  |  |  |  |  |  |  |
A:        [===]                                   ✓ SELECTED
B:           [======]                             ✗ overlaps A
C:                   [======]                     ✓ SELECTED
D:                      [======]                  ✗ overlaps C
E:              [============]                    ✗ overlaps A and C
F:                            [======]            ✓ SELECTED
G:                               [======]         ✗ overlaps F

Selected: A ===    C ===    F ===
          1-3      5-7      8-11

Formal Exchange Argument Proof (Activity Selection)

Claim: Sorting by end time and greedily selecting is optimal.

Proof:

Let G = greedy solution, O = some other optimal solution. Both select k activities.

Step 1 — Show first selections can be made equivalent: Let a₁ be the first activity selected by G (earliest-ending activity overall). Let b₁ be the first activity selected by O.

Since G sorts by end time, end(a₁) ≤ end(b₁).

Now "swap" b₁ for a₁ in O: replace b₁ with a₁. Does O remain feasible?

  • a₁ ends no later than b₁, so a₁ conflicts with at most as many activities as b₁ did
  • All activities in O that came after b₁ and didn't conflict with b₁ also don't conflict with a₁ (since a₁ ends ≤ b₁ ends)
  • So O' (with a₁ replacing b₁) is still a valid selection of k activities ✓

Step 2 — Induction: After the first selection, G picks the earliest-ending activity compatible with a₁, and O' has a₁ as its first activity. Apply the same argument to the remaining activities.

Conclusion: By induction, any optimal solution O can be transformed into G (the greedy solution) without losing optimality. Therefore G is optimal. ∎

💡 Key Insight from the proof: The greedy choice (earliest end time) is "safe" because it leaves the most remaining time for future activities. Choosing any later-ending first activity can only hurt future flexibility.


4.1.4 Interval Scheduling Maximization vs. Minimization

Visual: Interval Scheduling on a Number Line

Interval Scheduling

The number line diagram shows multiple intervals and the greedy selection process. By sorting by end time and always taking the next non-overlapping interval, we achieve the maximum number of selected intervals. Green intervals are selected; gray ones are rejected due to overlap.

Maximization: Maximum Non-Overlapping Intervals

→ Sort by end time, greedy select as above.

Minimization: Minimum "Points" to Stab All Intervals

Problem: Given N intervals, find the minimum number of points such that each interval contains at least one point.

Greedy: Sort by end time. For each interval whose left endpoint is to the right of the last placed point, place a new point at its right endpoint.

sort(intervals.begin(), intervals.end());  // intervals stored as {end, start}, so default pair sort = by end time

int points = 0;
int lastPoint = INT_MIN;

for (auto [end, start] : intervals) {
    if (start > lastPoint) {  // this interval not yet covered
        lastPoint = end;       // place point at its end (covers as many future intervals as possible)
        points++;
    }
}

cout << points << "\n";

Minimization: Minimum Intervals to Cover a Range

Problem: Cover the range [0, T] with minimum intervals from a given set.

Greedy: Sort by start time. At each step, among all intervals starting at or before the current position, pick the one that extends furthest to the right.

sort(intervals.begin(), intervals.end());  // intervals stored as {start, end}, so default pair sort = by start time

int covered = 0;    // currently covered up to 'covered'
int count = 0;
int i = 0;
int n = intervals.size();

while (covered < T) {
    int farthest = covered;

    // Among all intervals that start at or before 'covered', pick the farthest-reaching
    while (i < n && intervals[i].first <= covered) {
        farthest = max(farthest, intervals[i].second);
        i++;
    }

    if (farthest == covered) {
        cout << "Impossible\n";
        return 0;
    }

    covered = farthest;
    count++;
}

cout << count << "\n";

4.1.5 The Scheduling Problem: Minimize Lateness

Problem: N jobs with deadlines d[i] and processing times t[i]. Schedule all jobs on one machine to minimize maximum lateness (how much the latest job exceeds its deadline).

Lateness of job i = max(0, finish_time[i] - d[i]).

Greedy: Sort jobs by deadline (Earliest Deadline First — EDF).

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;

    vector<pair<int,int>> jobs(n);  // {deadline, processing_time}
    for (int i = 0; i < n; i++) cin >> jobs[i].second >> jobs[i].first;

    sort(jobs.begin(), jobs.end());  // sort by deadline

    int time = 0;
    int maxLateness = 0;

    for (auto [deadline, proc] : jobs) {
        time += proc;                          // finish time of this job
        int lateness = max(0, time - deadline); // how late is it?
        maxLateness = max(maxLateness, lateness);
    }

    cout << maxLateness << "\n";
    return 0;
}

Proof sketch: Suppose job B is scheduled immediately before job A even though d[A] ≤ d[B]. Swap them. A's lateness can only decrease (it finishes earlier). B now finishes exactly when A used to, so B's new lateness is time − d[B] ≤ time − d[A], which was A's old lateness. The maximum lateness therefore never increases, and repeated swaps turn any schedule into EDF. So EDF is optimal.


4.1.6 Huffman Coding (Greedy Tree Building)

Problem: Given N symbols with frequencies, build a binary tree minimizing total encoding length (frequency × depth summed over all symbols).

Greedy: Always merge the two symbols/groups with smallest frequency.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;

    priority_queue<long long, vector<long long>, greater<long long>> pq;  // min-heap
    for (int i = 0; i < n; i++) {
        long long f; cin >> f;
        pq.push(f);
    }

    long long totalCost = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();
        long long b = pq.top(); pq.pop();
        totalCost += a + b;  // cost of merging a and b
        pq.push(a + b);      // merged group has frequency a+b
    }

    cout << totalCost << "\n";
    return 0;
}

⚠️ Common Mistakes in Chapter 4.1

  1. Applying greedy to DP problems: Just because greedy is simpler doesn't mean it's correct. Always test your greedy on small counterexamples. Coin change with arbitrary denominations is a classic trap.
  2. Wrong sort criterion: Sorting by start time instead of end time for activity selection is a classic bug. The justification for WHY we sort a certain way (the exchange argument) is what tells you the correct criterion.
  3. Off-by-one in overlap check: s >= lastEnd (allows adjacent activities) vs. s > lastEnd (requires a gap). Check which interpretation the problem intends.
  4. Assuming greedy works without proof: Always verify with a small example or brief exchange argument. If you can't find a counterexample AND you can sketch why the greedy choice is "safe," it's likely correct.
  5. Forgetting to sort: Greedy algorithms almost always begin with a sort. Forgetting to sort means the greedy "order" doesn't exist.

Chapter Summary

📌 Key Takeaways

| Problem | Greedy Strategy | Sort By | Time |
|---|---|---|---|
| Max non-overlapping intervals | Pick earliest-ending | End time ↑ | O(N log N) |
| Min points to stab intervals | Place point at end of each uncovered interval | End time ↑ | O(N log N) |
| Min intervals to cover range | Pick farthest-reaching at each step | Start time ↑ | O(N log N) |
| Minimize max lateness | Earliest Deadline First (EDF) | Deadline ↑ | O(N log N) |
| Huffman coding | Merge two smallest frequencies | Min-heap | O(N log N) |

❓ FAQ

Q1: How do I tell if a problem can be solved greedily?

A: Three signals: ① After sorting, there's a clear processing order; ② You can use an exchange argument to show the greedy choice is never worse than any alternative; ③ You can't find a counterexample. If you find one (e.g., coin change with {1,5,6,9}), greedy fails — use DP instead.

Q2: What's the real difference between greedy and DP?

A: Greedy makes the locally optimal choice at each step and never looks back. DP considers all possible choices and builds the global optimum from subproblem solutions. Greedy is a special case of DP — it works when the local optimum happens to equal the global optimum.

Q3: What is the "binary search on answer + greedy check" pattern?

A: When a problem asks to "minimize the maximum" or "maximize the minimum," binary search on the answer X and use a greedy check(X) to verify feasibility. See the Convention problem in Chapter 4.2.

Q4: Why sort Activity Selection by end time instead of start time?

A: Sorting by end time ensures we always pick the activity that "frees up resources" earliest, leaving the most room for future activities. Sorting by start time might select an activity that starts early but ends very late, blocking all subsequent ones.

🔗 Connections to Other Chapters

  • Chapters 6.1–6.3 (DP) are the "upgrade" of greedy — when greedy fails, DP considers all choices
  • Chapter 3.3 (Sorting & Binary Search) is the prerequisite — almost every greedy algorithm starts with a sort
  • Chapter 4.2 applies greedy to real USACO problems, showcasing the classic "binary search on answer + greedy check" pattern
  • Chapter 5.3 (Kruskal's MST) is fundamentally greedy — sort edges and greedily pick the minimum, one of the most classic greedy algorithms

Practice Problems

Problem 4.1.1 — Meeting Rooms II 🟡 Medium N meetings with start/end times. Find the minimum number of rooms needed so all meetings can occur simultaneously.

Solution sketch: Sort by start time. Use a min-heap of end times (when each room becomes free). For each meeting, if its start ≥ earliest-free room, reuse that room. Otherwise, add a new room.

Hint The minimum rooms needed = the maximum number of meetings happening simultaneously. Use a priority queue (min-heap) to track when rooms become available.
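
The min-heap idea from the sketch above, as a sketch in code. Meetings are assumed to be {start, end} pairs, and a room freed at time t is reusable by a meeting starting at t (flip the `<=` if the problem says otherwise):

```cpp
#include <bits/stdc++.h>
using namespace std;

int minRooms(vector<pair<int,int>> meetings) {              // {start, end}
    sort(meetings.begin(), meetings.end());                 // process by start time
    priority_queue<int, vector<int>, greater<int>> freeAt;  // end times, min-heap
    for (auto [s, e] : meetings) {
        if (!freeAt.empty() && freeAt.top() <= s)
            freeAt.pop();      // reuse the room that frees up earliest
        freeAt.push(e);        // this meeting occupies a room until time e
    }
    return (int)freeAt.size(); // total rooms ever in use
}
```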

Problem 4.1.2 — Gas Station 🔴 Hard N gas stations in a circle. Station i has gas[i] liters and requires cost[i] to reach the next. Can you complete the circuit? If yes, find the starting station.

Solution sketch: If total gas ≥ total cost, a solution exists. Greedy: try each starting station. If tank drops negative, reset starting station to the next one.

Hint Key insight: if the total gas ≥ total cost, there's always exactly one valid starting station. Track cumulative gas balance; when it goes negative, the starting station must be after the current failed position.
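
The cumulative-balance idea from the hint, sketched as a function (returns -1 when no start works):

```cpp
#include <bits/stdc++.h>
using namespace std;

int startStation(const vector<int>& gas, const vector<int>& cost) {
    int total = 0;   // overall balance: a full circuit exists iff total >= 0
    int tank = 0;    // balance accumulated since the current candidate start
    int start = 0;
    for (int i = 0; i < (int)gas.size(); i++) {
        total += gas[i] - cost[i];
        tank  += gas[i] - cost[i];
        if (tank < 0) {      // can't reach station i+1 from 'start'
            start = i + 1;   // no station in [start, i] can be the answer either
            tank = 0;
        }
    }
    return total >= 0 ? start : -1;
}
```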

Problem 4.1.3 — Minimum Platforms 🟡 Medium Given arrival and departure times for N trains, find the minimum number of platforms needed so no train waits.

Hint Create events: +1 for each arrival, -1 for each departure. Sort by time. Sweep and track the running count; the maximum is the answer.
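
The event sweep from the hint, as a sketch. Departures at time t are encoded as {t, -1} so they sort before arrivals {t, +1}; this means a platform freed at t can be reused at t (flip the tie-break if the problem disagrees):

```cpp
#include <bits/stdc++.h>
using namespace std;

int minPlatforms(const vector<int>& arr, const vector<int>& dep) {
    vector<pair<int,int>> events;
    for (int t : arr) events.push_back({t, +1});  // a train arrives
    for (int t : dep) events.push_back({t, -1});  // a train departs
    sort(events.begin(), events.end());           // ties: -1 sorts before +1
    int cur = 0, best = 0;
    for (auto [t, delta] : events) {
        cur += delta;              // running count of trains in the station
        best = max(best, cur);     // peak simultaneous trains = platforms needed
    }
    return best;
}
```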

Problem 4.1.4 — Fractional Knapsack 🟢 Easy You can take fractions of items. Weight w[i], value v[i], capacity W. Maximize value.

Solution sketch: Sort by value/weight ratio (highest first). Take as much as possible of each item until knapsack is full.

Hint Greedy works here (unlike 0/1 knapsack) because you can take fractions. Always take from the highest value/weight ratio item first.
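
A sketch of the ratio-sorted greedy; items are assumed to be {value, weight} pairs with positive weights:

```cpp
#include <bits/stdc++.h>
using namespace std;

double fractionalKnapsack(vector<pair<double,double>> items,  // {value, weight}
                          double capacity) {
    sort(items.begin(), items.end(), [](auto& a, auto& b) {
        return a.first / a.second > b.first / b.second;       // value/weight ratio, descending
    });
    double total = 0;
    for (auto [v, w] : items) {
        if (capacity <= 0) break;
        double take = min(w, capacity);  // whole item, or the fraction that still fits
        total += v * (take / w);
        capacity -= take;
    }
    return total;
}
```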

Problem 4.1.5 — Jump Game 🟡 Medium Array A of non-negative integers. From position i, you can jump up to A[i] steps forward. Can you reach the last position from position 0?

Solution sketch: Track farthest = furthest position reachable so far. At each position i ≤ farthest, update farthest = max(farthest, i + A[i]). If we reach farthest ≥ n-1, return true.

Hint If you can reach position i, you can reach all positions ≤ i + A[i]. Greedily maintain the farthest reachable position.
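
The hint as code: one pass, tracking the farthest reachable index.

```cpp
#include <bits/stdc++.h>
using namespace std;

bool canReachEnd(const vector<int>& a) {
    int farthest = 0;                         // farthest index reachable so far
    for (int i = 0; i < (int)a.size(); i++) {
        if (i > farthest) return false;       // stuck: position i is unreachable
        farthest = max(farthest, i + a[i]);
    }
    return true;                              // reached (or passed) the last index
}
```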

🏆 Challenge Problem (USACO-style): Fencing the Cows (Activity Selection Variant). Farmer John has N fence segments on the x-axis, each defined by [L_i, R_i]. He wants to select a minimum set of "anchor points" such that every fence segment contains at least one anchor point. (This is the interval stabbing problem: greedy with end-time sorting.)

📖 Chapter 4.2 ⏱️ ~60 min read 🎯 Advanced

Chapter 4.2: Greedy in USACO

USACO problems that yield to greedy solutions are some of the most satisfying to solve — once you see the insight, the code practically writes itself. This chapter walks through several USACO-style problems where greedy is the key.


4.2.1 Pattern Recognition: Is It Greedy?

Before coding, ask yourself:

  1. Can I sort the input in some clever way?
  2. Is there a "natural" order to process elements that always leads to the best result?
  3. Can I argue that taking the "obvious best" at each step never hurts?

If yes to any of these, try greedy. If your greedy fails a test case, reconsider — maybe it's actually a DP problem.


4.2.2 USACO Bronze: Cow Sorting

Problem: N cows in a line. Each cow has a "grumpiness" value g[i]. To sort them in increasing order, you can swap two adjacent cows, but you pay g[i] + g[j] for swapping cows i and j. Minimize total cost.

Key Insight: With adjacent swaps, each inversion (pair (i, j) where i < j but g[i] > g[j]) requires exactly one swap. The total cost is the sum of (g[i] + g[j]) over all inversions. There is no freedom to reduce this — every inversion pair must be swapped exactly once, and any ordering of swaps gives the same total cost.

⚠️ Common Misconception: The formula sumG + (n-2) × minG is NOT the correct answer for general Cow Sorting. That expression only coincidentally equals the answer in edge cases (e.g., n=2). The correct cost is always the sum over all inversions.

Counting inversions in O(N²):

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<long long> g(n);
    for (long long &x : g) cin >> x;

    // Total cost = sum of (g[i] + g[j]) for every inversion pair i < j where g[i] > g[j]
    // Equivalently: for each element g[i], add g[i] * (# elements it must "cross"):
    //   (# elements to its left that are > g[i]) + (# elements to its right that are < g[i])
    // Both counts together = total inversions involving g[i].

    long long totalCost = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (g[i] > g[j]) {
                totalCost += g[i] + g[j];  // this inversion costs g[i]+g[j]
            }
        }
    }

    cout << totalCost << "\n";
    return 0;
}
// Time: O(N²). For N up to 10^5, accumulate inversion costs during a merge sort or with a BIT in O(N log N)

Example:

Input: g = [3, 1, 2]
Inversions: (3,1) → cost 4; (3,2) → cost 5
Total: 9

Verification: Bubble sort on [3,1,2]:

  • Swap(3,1) = cost 4 → [1,3,2]
  • Swap(3,2) = cost 5 → [1,2,3]
  • Total = 9

4.2.3 USACO Bronze: The Cow Signal (Greedy Simulation)

Many USACO Bronze problems are pure simulation with a greedy twist: process events in time order and maintain the optimal state.

Problem: N cows each leave the barn at time t[i] and must reach the pasture. The barn-pasture road has capacity C (at most C cows at once). Cows travel instantaneously but must wait if capacity is full. What is the time when the last cow arrives?

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, c;
    cin >> n >> c;

    vector<int> t(n);
    for (int &x : t) cin >> x;
    sort(t.begin(), t.end());  // process cows in order of departure time

    int ans = 0;
    // Process in groups of c
    for (int i = 0; i < n; i += c) {
        // Group starts at t[i] (the earliest cow in this batch)
        // But batch can't start before previous batch finished
        ans = max(ans, t[i]);  // this batch must start at least when earliest cow is ready
        ans++;  // takes 1 time unit
    }

    cout << ans << "\n";
    return 0;
}

4.2.4 USACO Silver: Paired Up

Problem: N cows in two groups (group A and B). Each cow in A must be paired with one in group B. Pairing cow with value a with cow with value b gives profit f(a, b). Maximize total profit.

For specific profit functions, greedy sorting works. The classic version: profit = min(a, b), maximize sum.

Greedy: Sort both groups. Pair the largest A with the largest B, etc.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;

    vector<int> A(n), B(n);
    for (int &x : A) cin >> x;
    for (int &x : B) cin >> x;

    sort(A.begin(), A.end());
    sort(B.begin(), B.end());

    long long total = 0;
    for (int i = 0; i < n; i++) {
        total += min(A[i], B[i]);  // pair i-th smallest with i-th smallest
    }

    cout << total << "\n";
    return 0;
}

This works because if you pair (a_large, b_small) and (a_small, b_large) instead of (a_large, b_large) and (a_small, b_small), you get min(a_large, b_small) + min(a_small, b_large) ≤ min(a_large, b_large) + min(a_small, b_small). Always match in sorted order.


4.2.5 USACO Silver: Convention

Problem (USACO 2018 December Silver): N cows arrive at times t[1..N] at a bus stop. There are M buses, each holding C cows. A bus departs when full or at a scheduled time. Assign cows to buses to minimize the maximum waiting time for any cow.

Approach: Binary search on the answer + greedy check.

This is a "binary search on the answer with greedy verification" problem:

#include <bits/stdc++.h>
using namespace std;

int n, m, c;
vector<long long> cows;   // sorted arrival times

// Can we schedule all cows with max wait <= maxWait?
bool canDo(long long maxWait) {
    int busesUsed = 0;
    int i = 0;  // current cow index

    while (i < n) {
        busesUsed++;
        if (busesUsed > m) return false;  // ran out of buses

        // This bus serves cows starting from cow i
        // The bus must depart by cows[i] + maxWait
        long long depart = cows[i] + maxWait;

        // Fill bus with as many cows as possible (capacity c, all with arrival <= depart)
        int count = 0;
        while (i < n && count < c && cows[i] <= depart) {
            i++;
            count++;
        }
    }

    return true;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> n >> m >> c;
    cows.resize(n);
    for (long long &x : cows) cin >> x;
    sort(cows.begin(), cows.end());

    // Binary search on the maximum wait time
    long long lo = 0, hi = 1e14;
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        if (canDo(mid)) hi = mid;
        else lo = mid + 1;
    }

    cout << lo << "\n";
    return 0;
}

4.2.6 USACO Bronze: Herding (Greedy Observation)

Problem: 3 cows at positions a, b, c on a number line. In one move, you can move any cow to any empty position. Find the minimum moves to get all 3 cows into consecutive positions.

Insight: 2 moves are always sufficient (you can move the outer two to surround the middle). Can 1 move work? Can 0 work? Check these cases.

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long a, b, c;
    cin >> a >> b >> c;

    // Make sure a <= b <= c
    long long pos[3] = {a, b, c};
    sort(pos, pos + 3);
    a = pos[0]; b = pos[1]; c = pos[2];

    // 0 moves: already consecutive
    if (c - a == 2) { cout << 0; return 0; }

    // 1 move: exactly one cow moves, so the other two stay put.
    // The two staying cows must fit inside some consecutive triple {x, x+1, x+2},
    // i.e. they must be at distance 1 or 2 from each other:
    //   - (a, b) stay, c moves next to them: possible iff b - a <= 2
    //   - (b, c) stay, a moves next to them: possible iff c - b <= 2
    //   - (a, c) stay, b moves: requires c - a == 2, already handled above
    if (b - a <= 2 || c - b <= 2) { cout << 1; return 0; }

    // Otherwise 2 moves always suffice: move a next to b, then move c next to them.
    cout << 2;
    return 0;
}

4.2.7 Common Greedy Patterns in USACO

| Pattern | Description | Sort By |
| --- | --- | --- |
| Activity selection | Max non-overlapping intervals | End time |
| Scheduling | Minimize completion time / lateness | Deadline or ratio |
| Greedy + binary search | Check feasibility, find optimal via BS | Various |
| Pairing | Optimal matching of two sorted lists | Both arrays |
| Simulation | Process events in time order | Event time |
| Sweep line | Maintain active set as you move across time | Start/end events |
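The first pattern in the table, activity selection, is short enough to sketch in full. The helper below is illustrative (its name and the convention that touching intervals are compatible are our assumptions, not from a specific USACO problem): sort by end time, then greedily keep every interval that starts at or after the last accepted end.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Activity selection sketch: maximum number of non-overlapping intervals.
// Intervals are {start, end}; two intervals that merely touch (one ends
// where the next starts) are treated as compatible here.
int maxNonOverlapping(vector<pair<int,int>> intervals) {
    // The greedy key: sort by END time
    sort(intervals.begin(), intervals.end(),
         [](const pair<int,int>& a, const pair<int,int>& b) {
             return a.second < b.second;
         });
    int count = 0;
    int lastEnd = INT_MIN;
    for (auto [s, e] : intervals) {
        if (s >= lastEnd) {   // compatible with the last chosen interval
            count++;
            lastEnd = e;
        }
    }
    return count;
}
```

The exchange argument from Chapter 4.1 proves this: any optimal solution can be transformed, one swap at a time, into the one that always takes the earliest-ending compatible interval.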

Chapter Summary

📌 Key Takeaways

Greedy algorithms in USACO often involve:

  1. Sorting the input in a clever order
  2. Scanning once (or twice) with a simple update rule
  3. Occasionally combining with binary search on the answer

| USACO Greedy Pattern | Description | Sort By |
| --- | --- | --- |
| Activity selection | Max non-overlapping intervals | End time |
| Scheduling | Minimize completion time / lateness | Deadline or ratio |
| Greedy + binary search | Check feasibility, find optimal via BS | Various |
| Pairing | Optimal matching of two sorted lists | Both arrays |
| Simulation | Process events in time order | Event time |
| Sweep line | Maintain active set as you scan | Start/end events |

❓ FAQ

Q1: What is the template for "binary search on answer + greedy check"?

A: Outer layer: binary search on answer X (lo=min possible, hi=max possible). Inner layer: write a check(X) function that uses a greedy strategy to verify whether X is feasible. Adjust lo/hi based on the result. The key requirement is that check must be monotone (if X is feasible, so is X+1, or vice versa).
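A minimal sketch of that template, using the classic "place k cows in sorted stalls, maximize the minimum gap" problem as the monotone check (this example problem and all names are illustrative, not from this chapter):

```cpp
#include <bits/stdc++.h>
using namespace std;

// check(X): can k cows be placed in sorted stall positions so that every
// pair is at least X apart? Greedy: always place in the leftmost valid stall.
bool check(const vector<long long>& stalls, int k, long long X) {
    int placed = 1;                    // first cow in the first stall
    long long last = stalls[0];
    for (size_t i = 1; i < stalls.size(); i++) {
        if (stalls[i] - last >= X) {   // greedy: place as early as possible
            placed++;
            last = stalls[i];
        }
    }
    return placed >= k;
}

// Binary search on the answer: largest X with check(X) true.
// Relies on monotonicity: if X works, every smaller X works too.
long long largestMinDistance(vector<long long> stalls, int k) {
    sort(stalls.begin(), stalls.end());
    long long lo = 1, hi = stalls.back() - stalls.front(), ans = 0;
    while (lo <= hi) {
        long long mid = lo + (hi - lo) / 2;
        if (check(stalls, k, mid)) { ans = mid; lo = mid + 1; }
        else                       { hi = mid - 1; }
    }
    return ans;
}
```

For stalls {1, 2, 4, 8, 9} and k = 3, placing cows at 1, 4, 8 gives a minimum gap of 3, which the search finds.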

Q2: How are USACO greedy problems different from LeetCode greedy problems?

A: USACO greedy problems typically require proving correctness (exchange argument) and are often combined with binary search and sorting. LeetCode tends to focus on simpler "always pick max/min" greedy. USACO Silver greedy problems are noticeably harder than LeetCode Medium.

Q3: When should I use priority_queue to assist greedy?

A: When you repeatedly need to extract the "current best" element (e.g., Huffman coding, minimum meeting rooms, repeatedly picking max/min values). priority_queue reduces "find the best" from O(N) to O(log N).
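As a concrete sketch of that pattern, here is the classic minimum merge cost (Huffman-style) loop: repeatedly extract the two smallest piles and merge them, paying their combined size. The function name and scenario are illustrative:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Greedy with a priority_queue: repeatedly merge the two smallest piles,
// paying their combined size each time. greater<> makes this a min-heap.
long long minMergeCost(vector<long long> piles) {
    priority_queue<long long, vector<long long>, greater<long long>> pq(
        piles.begin(), piles.end());
    long long cost = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();   // smallest pile
        long long b = pq.top(); pq.pop();   // second smallest
        cost += a + b;                       // pay for this merge
        pq.push(a + b);                      // merged pile goes back in
    }
    return cost;
}
```

Each extract/insert is O(log N), so merging N piles costs O(N log N) total instead of the O(N²) a linear scan for the minimum would give.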

🔗 Connections to Other Chapters

  • Chapter 4.1 covered the theory of greedy and exchange arguments; this chapter applies them to real USACO problems
  • Chapter 3.3 (Binary Search) introduced the "binary search on answer" pattern used directly in the Convention problem here
  • Chapter 7.1 (Understanding USACO) and Chapter 7.2 (Problem-Solving Strategies) will further discuss how to recognize greedy vs DP in contests
  • Chapter 3.1 (STL) introduced priority_queue, which appears frequently in greedy simulations in this chapter

Practice Problems

Problem 4.2.1 — USACO 2016 December Bronze: Counting Haybales N haybales at positions on a number line. Q queries: how many haybales are in [L, R]? (Prefix sums, but practice the sorting mindset)

Problem 4.2.2 — USACO 2019 February Bronze: Sleepy Cow Sorting N cows labeled 1 to N (not in order). Move cows to sort them. Each move takes one cow from the end and inserts it somewhere. Minimum moves? (Greedy: find the longest already-sorted suffix)

Problem 4.2.3 — Task Scheduler N tasks labeled A–Z. Must wait k steps between two instances of the same task. Minimum time to complete all tasks? (Greedy: always schedule the most frequent remaining task)

Problem 4.2.4 — USACO 2018 February Silver: Convention II Cows arrive at a watering hole with arrival times and drink durations. The most senior waiting cow goes next. Simulate and find the maximum wait time. (Greedy simulation with priority queue)

Problem 4.2.5 — Weighted Job Scheduling N jobs with start, end, and profit. Select non-overlapping jobs to maximize total profit. (This one requires DP, NOT greedy — a good lesson in when greedy fails!)

🕸️ Part 5: Graph Algorithms

Learn to see graphs in problems and solve them efficiently. BFS, DFS, trees, Union-Find, and Kruskal's MST — the core of USACO Silver.

📚 4 Chapters · ⏱️ Estimated 2-3 weeks · 🎯 Target: Reach USACO Silver level

Part 5: Graph Algorithms

Estimated time: 2–3 weeks

Graphs are everywhere in competitive programming: mazes, networks, family trees, city maps. Part 5 teaches you to see graphs in problems and solve them efficiently.


What Topics Are Covered

| Chapter | Topic | The Big Idea |
| --- | --- | --- |
| Chapter 5.1 | Introduction to Graphs | Representing graphs; adjacency lists; types of graphs |
| Chapter 5.2 | BFS & DFS | Traversal, shortest paths, flood fill, connected components |
| Chapter 5.3 | Trees & Special Graphs | Tree traversals; Union-Find; Kruskal's MST |
| Chapter 5.4 | Shortest Paths | Dijkstra, Bellman-Ford, Floyd-Warshall, SPFA |

What You'll Be Able to Solve After This Part

After completing Part 5, you'll be ready to tackle:

  • USACO Bronze:

    • Flood fill (count connected regions in a grid)
    • Reachability problems (can cow A reach cow B?)
    • Simple BFS shortest paths in grids/graphs
  • USACO Silver:

    • BFS/DFS on implicit graphs (states rather than explicit nodes)
    • Multi-source BFS (distance to nearest obstacle/fire)
    • Union-Find for dynamic connectivity
    • Graph connectivity under edge additions
    • Tree problems (subtree sums, depths, LCA)

Key Algorithms Introduced

| Technique | Chapter | Time Complexity | USACO Relevance |
| --- | --- | --- | --- |
| DFS (recursive & iterative) | 5.2 | O(V + E) | Connectivity, cycle detection |
| BFS | 5.2 | O(V + E) | Shortest path (unweighted) |
| Grid BFS | 5.2 | O(R × C) | Maze problems, flood fill |
| Multi-source BFS | 5.2 | O(V + E) | Distance to nearest source |
| Connected components | 5.2 | O(V + E) | Counting disconnected regions |
| Tree traversals (pre/post-order) | 5.3 | O(N) | Subtree aggregation |
| Union-Find (DSU) | 5.3 | O(α(N)) ≈ O(1) | Dynamic connectivity |
| Kruskal's MST | 5.3 | O(E log E) | Minimum spanning tree |
| Dijkstra's algorithm | 5.4 | O((V + E) log V) | SSSP on non-negative weighted graphs |
| Bellman-Ford | 5.4 | O(V × E) | SSSP with negative edges; detect negative cycles |
| Floyd-Warshall | 5.4 | O(V³) | All-pairs shortest paths on small graphs |
| SPFA | 5.4 | O(V × E) worst | Practical Bellman-Ford with queue optimization |

Prerequisites

Before starting Part 5, make sure you can:

  • Use vector<vector<int>> for adjacency lists (Chapters 2.3–3.1)
  • Use queue and stack from STL (Chapter 3.1, 3.5)
  • Work with 2D arrays and grid traversal (Chapter 2.3)
  • Understand basic nested loops (Chapter 2.2)
  • Use priority_queue (Chapter 3.1) — needed for Chapter 5.4 (Dijkstra)

Tips for This Part

  1. Chapter 5.1 is mostly setup — read it to understand graph representation, but the real algorithms start in Chapter 5.2.
  2. Chapter 5.2 (BFS) is one of the most important chapters for USACO Silver. Grid BFS appears in roughly 1/3 of Silver problems.
  3. The dist[v] == -1 pattern for unvisited nodes in BFS is the key. Never mark visited when you pop — always when you push.
  4. Chapter 5.3's Union-Find is faster to code than BFS for connectivity questions. Memorize the 15-line template — you'll use it constantly.
  5. Chapter 5.4 (Dijkstra) is essential for weighted shortest path problems. Use priority_queue<pair<int,int>> with the standard template — it's the most common Silver/Gold graph algorithm.

💡 Key Insight: Most USACO graph problems are actually grid problems in disguise. A grid cell (r,c) becomes a graph node; adjacent cells become edges. BFS on this implicit graph finds shortest paths.

🏆 USACO Tip: Whenever you see "shortest path," "minimum steps," or "fewest moves" in a problem, think BFS immediately. Whenever you see "are these connected?" or "how many groups?", think DSU.

📖 Chapter 5.1 ⏱️ ~50 min read 🎯 Intermediate

Chapter 5.1: Introduction to Graphs

A graph is one of the most versatile mathematical structures ever invented. It models relationships between things — roads between cities, friendships between people, connections between web pages. In USACO, graphs represent mazes, networks, and relationships between cows.


5.1.1 What Is a Graph?

A graph consists of:

  • Vertices (also called nodes): the "things" (cities, cows, cells)
  • Edges: the connections between them (roads, friendships)

This graph has 6 vertices (1–6) and 6 edges.

Visual: Graph Basics Reference

Graph Basics

This reference diagram shows the key graph terminology — vertices, edges, directed vs undirected, weighted edges, and common graph properties — all in one view.

Types of Graphs

| Type | Description | Example |
| --- | --- | --- |
| Undirected | Edges have no direction; if A-B, then B-A | Friendships |
| Directed | Edges go one way; A→B doesn't mean B→A | Twitter follows |
| Weighted | Edges have costs/distances | Road distances |
| Unweighted | All edges equal | Maze connections |
| Tree | Connected, no cycles, N-1 edges for N nodes | File system |
| DAG | Directed Acyclic Graph | Dependencies |

Most USACO Bronze/Silver problems use unweighted, undirected graphs or simple grids.


5.1.2 Graph Representation

The most important decision when coding a graph algorithm is how to store the graph.

Visual: Graph Structure and Adjacency List

The left side shows an undirected graph with 5 nodes and their edges. The right side shows the adjacency list — for each node, a list of its neighbors. This representation uses O(V + E) space, which is optimal for sparse graphs typical in USACO problems.

Adjacency List (USE THIS)

Store each vertex's neighbors as a list. This is the standard in competitive programming.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;  // n vertices, m edges
    cin >> n >> m;

    // adj[u] = list of vertices connected to u
    vector<vector<int>> adj(n + 1);  // 1-indexed: vertices 1..n

    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        adj[u].push_back(v);  // edge u → v
        adj[v].push_back(u);  // edge v → u (undirected: add both directions)
    }

    // Print adjacency list
    for (int u = 1; u <= n; u++) {
        cout << u << " -> ";
        for (int v : adj[u]) cout << v << " ";
        cout << "\n";
    }

    return 0;
}

Input:

6 6
1 2
1 3
2 4
3 5
4 6
5 6

Output:

1 -> 2 3
2 -> 1 4
3 -> 1 5
4 -> 2 6
5 -> 3 6
6 -> 4 5

Space complexity: O(V + E). For V = 10^5 and E = 2×10^5, this is fine.

Adjacency Matrix (When to Use)

A 2D array where adj[u][v] = 1 if there's an edge from u to v.

bool adj[1001][1001] = {};  // global, zero-initialized

// Add edge u-v
adj[u][v] = true;
adj[v][u] = true;  // undirected

// Check if edge exists: O(1)
if (adj[u][v]) { ... }

Space complexity: O(V²). For V = 10^5, that's 10^10 bytes — way too much! Only use for V ≤ 1000.

When to Use Which

| Condition | Use |
| --- | --- |
| V ≤ 1000 and need O(1) edge lookup | Adjacency matrix |
| V up to 10^5 (or larger) | Adjacency list |
| Almost all pairs are connected (dense graph) | Adjacency matrix |
| Few edges compared to pairs (sparse graph) | Adjacency list |

Default in competitive programming: Always use adjacency list unless V is very small.


5.1.3 Reading Graph Input

USACO graphs come in several formats. Here are the patterns:

Standard: Edge List

5 4        ← n vertices, m edges
1 2        ← edge between 1 and 2
2 3
3 4
4 5
int n, m;
cin >> n >> m;
vector<vector<int>> adj(n + 1);
for (int i = 0; i < m; i++) {
    int u, v;
    cin >> u >> v;
    adj[u].push_back(v);
    adj[v].push_back(u);
}

Tree: Parent Array

5          ← n nodes
1 1 2 3    ← parent[2]=1, parent[3]=1, parent[4]=2, parent[5]=3 (node 1 is root)
int n;
cin >> n;
vector<vector<int>> children(n + 1);
for (int i = 2; i <= n; i++) {
    int parent;
    cin >> parent;
    children[parent].push_back(i);  // parent → child edge
}

Grid Graph

A grid where cells are nodes; edges connect adjacent cells (up/down/left/right):

4 4        ← rows × columns
....
.##.
....
....
int R, C;
cin >> R >> C;
vector<string> grid(R);
for (int r = 0; r < R; r++) cin >> grid[r];

// To iterate over neighbors of cell (r, c):
int dr[] = {-1, 1, 0, 0};  // row offsets for up/down/left/right
int dc[] = {0, 0, -1, 1};  // col offsets

for (int d = 0; d < 4; d++) {
    int nr = r + dr[d];  // neighbor row
    int nc = c + dc[d];  // neighbor col
    if (nr >= 0 && nr < R && nc >= 0 && nc < C) {
        // (nr, nc) is a valid neighbor
    }
}

5.1.4 Trees vs. Graphs

A tree is a special type of graph with these properties:

  • N nodes and exactly N-1 edges
  • Connected (every node reachable from every other)
  • No cycles (acyclic)

Visual: Rooted Tree Structure

Tree Structure

A rooted tree has a designated root node at depth 0. Each node has a parent (except the root) and zero or more children. Leaf nodes have no children. This structure naturally represents hierarchies and enables efficient tree DP algorithms.

Trees appear constantly in USACO — they represent hierarchies, family trees, and many other structures.

        1          ← root
       / \
      2   3
     / \   \
    4   5   6

Key tree vocabulary:

  • Root: The topmost node (usually node 1)
  • Parent: The node directly above in the hierarchy
  • Children: Nodes directly below
  • Leaf: A node with no children
  • Depth: Distance from the root (root has depth 0)
  • Height: Length of the longest path from a node to a leaf

Representing a Rooted Tree

vector<vector<int>> children(n + 1);  // children[u] = list of u's children
vector<int> parent(n + 1, 0);         // parent[u] = u's parent (0 for the root)

// Read tree as undirected graph, then root it with DFS
vector<vector<int>> adj(n + 1);
for (int i = 0; i < n - 1; i++) {
    int u, v;
    cin >> u >> v;
    adj[u].push_back(v);
    adj[v].push_back(u);
}

// Root at node 1 using DFS
function<void(int, int)> root_tree = [&](int u, int par) {
    parent[u] = par;
    for (int v : adj[u]) {
        if (v != par) {
            children[u].push_back(v);
            root_tree(v, u);  // recursive DFS
        }
    }
};
root_tree(1, 0);
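The vocabulary above (depth, children) follows the same DFS pattern: each child's depth is its parent's depth plus one. A self-contained sketch (function name is ours; it hard-codes an edge list instead of reading cin, and assumes 1-indexed nodes rooted at node 1):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Compute depth[u] for every node of a tree given as an undirected
// edge list, rooting the tree at node 1 (depth 0).
vector<int> computeDepths(int n, const vector<pair<int,int>>& edges) {
    vector<vector<int>> adj(n + 1);
    for (auto [u, v] : edges) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    vector<int> depth(n + 1, 0);
    function<void(int,int)> dfs = [&](int u, int par) {
        for (int v : adj[u]) {
            if (v != par) {
                depth[v] = depth[u] + 1;  // one level below the parent
                dfs(v, u);
            }
        }
    };
    dfs(1, 0);  // root at node 1
    return depth;
}
```

On the 6-node tree drawn earlier (edges 1-2, 1-3, 2-4, 2-5, 3-6), nodes 2 and 3 get depth 1 and nodes 4, 5, 6 get depth 2.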

5.1.5 Weighted Graphs

For weighted graphs (edges with costs), store the weight alongside each neighbor:

vector<vector<pair<int,int>>> adj(n + 1);
// adj[u] = list of {v, weight} pairs

// Add weighted edge u-v with weight w
adj[u].push_back({v, w});
adj[v].push_back({u, w});

// Iterate neighbors with weights
for (auto [v, w] : adj[u]) {
    cout << "Edge " << u << "-" << v << " weight " << w << "\n";
}

Chapter Summary

📌 Key Takeaways

| Concept | Key Points | Why It Matters |
| --- | --- | --- |
| Graph | Vertices + edges; model "relationships" | Almost all USACO Silver+ problems involve graphs |
| Undirected | Add v to adj[u] and u to adj[v] | Forgetting both directions is the most common bug |
| Directed | Add only v to adj[u] | Twitter follows, dependency relations, etc. |
| Adjacency list | vector<vector<int>> adj(n+1) | Default choice, O(V+E) space |
| Adjacency matrix | bool adj[1001][1001] | Only use when V ≤ 1000 |
| Grid graph | 4-direction neighbors + boundary check | Most common graph input in USACO |
| Tree | Connected acyclic, N-1 edges | Special graph, supports efficient algorithms |

❓ FAQ

Q1: Why use vector<vector<int>> for adjacency list instead of linked lists?

A: C++ vector uses contiguous memory, is cache-friendly, and is much faster than linked lists. In contests, list is almost never used; vector<vector<int>> is the standard approach.

Q2: Should graph vertices be 0-indexed or 1-indexed?

A: USACO problems are usually 1-indexed. We recommend declaring the adjacency list with size n+1: vector<vector<int>> adj(n+1). This wastes index 0 but makes code clearer and less error-prone.

Q3: What is the only difference between directed and undirected graphs?

A: When reading edges, undirected graphs add two (u→v and v→u), directed graphs add only one (u→v). The subsequent BFS/DFS code is identical.

Q4: Does a grid graph need an explicit adjacency list?

A: No! Grid graph "neighbors" can be computed implicitly via direction arrays dr[]/dc[], no need to store an adjacency list—saves memory and is cleaner.

🔗 Connections to Later Chapters

  • Chapter 5.2 (BFS & DFS) runs on the adjacency list built in this chapter—this chapter is a prerequisite for Chapter 5.2
  • Chapter 5.3 (Trees & DSU) uses this chapter's tree representation and adds Union-Find
  • Graph traversal from Chapters 5.1–5.2 is the foundation for "Tree DP" and "DP on DAG" in Chapters 6.1–6.3 (DP)
  • Grid graph representation is used throughout the book—BFS shortest path, Flood Fill, grid DP, etc.

Practice Problems

Problem 5.1.1 — Degree Count Read an undirected graph with N vertices and M edges. Print the degree (number of edges) of each vertex.

Problem 5.1.2 — Is It a Tree? Read a connected graph. Determine if it's a tree (exactly N-1 edges and no cycles).

Problem 5.1.3 — Reachability Read a directed graph and two vertices S and T. Print "YES" if T is reachable from S following directed edges, "NO" otherwise. (You'll need DFS from Chapter 5.2 to fully solve this, but you can set it up now)

Problem 5.1.4 — Leaf Count Read a rooted tree. Count how many nodes are leaves (have no children).

Problem 5.1.5 — Grid to Graph Read an N×M grid. Cells with '.' are passable; '#' are walls. Print the number of edges in the implicit graph (connect adjacent '.' cells).


Visual: Graph Adjacency List

Graph Adjacency List

The left side shows a 5-node weighted graph visually. The right side shows the corresponding adjacency list in C++: vector<pair<int,int>> adj[] where each entry is a {neighbor, weight} pair. This is the standard representation for most USACO graph problems.

📖 Chapter 5.2 ⏱️ ~75 min read 🎯 Intermediate

Chapter 5.2: BFS & DFS

📝 Before You Continue: Make sure you understand graph representation (Chapter 5.1), queues and stacks (Chapter 3.6), and basic 2D array traversal (Chapter 2.3).

Graph traversal algorithms explore every node reachable from a starting point. They're the foundation of dozens of graph algorithms. DFS (Depth-First Search) dives deep before backtracking. BFS (Breadth-First Search) explores layer by layer. Knowing which to use and when is a skill you'll develop throughout your competitive programming career.


5.2.1 Depth-First Search (DFS)

DFS works like exploring a maze: you keep going forward until you hit a dead end, then backtrack and try another path.

Visual: DFS Traversal Order

DFS Traversal

DFS dives as deep as possible before backtracking. The numbered circles show the visit order, red dashed arrows show backtracking. The call stack on the right illustrates how recursion naturally implements the LIFO behaviour needed for DFS.

Recursive DFS

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<int> adj[MAXN];
bool visited[MAXN];

void dfs(int u) {
    visited[u] = true;           // mark current node as visited
    cout << u << " ";            // process u (print it, in this example)

    for (int v : adj[u]) {       // for each neighbor v
        if (!visited[v]) {       // if not yet visited
            dfs(v);              // recursively explore v
        }
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // DFS from node 1
    dfs(1);
    cout << "\n";

    return 0;
}

Important: Always mark nodes as visited before recursing, not after! This prevents infinite loops on cycles.

Iterative DFS (Using a Stack)

For very large graphs, recursive DFS can cause a stack overflow (too deep recursion). The iterative version uses an explicit stack:

void dfs_iterative(int start, int n) {
    vector<bool> visited(n + 1, false);
    stack<int> st;

    st.push(start);

    while (!st.empty()) {
        int u = st.top();
        st.pop();

        if (visited[u]) continue;  // may have been pushed multiple times
        visited[u] = true;
        cout << u << " ";

        for (int v : adj[u]) {
            if (!visited[v]) {
                st.push(v);
            }
        }
    }
}

5.2.2 Connected Components

A connected component is a maximal set of vertices where every vertex can reach every other vertex. Finding components is a very common USACO task.

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<int> adj[MAXN];
int comp[MAXN];   // comp[v] = component ID of vertex v

void dfs(int u, int id) {
    comp[u] = id;
    for (int v : adj[u]) {
        if (comp[v] == 0) {   // 0 means unvisited (use 0 as sentinel, 1-index components from 1)
            dfs(v, id);
        }
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    int numComponents = 0;
    for (int u = 1; u <= n; u++) {
        if (comp[u] == 0) {
            numComponents++;
            dfs(u, numComponents);  // assign component ID
        }
    }

    cout << "Number of components: " << numComponents << "\n";

    // Print component sizes
    vector<int> size(numComponents + 1, 0);
    for (int u = 1; u <= n; u++) size[comp[u]]++;
    for (int i = 1; i <= numComponents; i++) {
        cout << "Component " << i << ": " << size[i] << " nodes\n";
    }

    return 0;
}

5.2.3 Breadth-First Search (BFS)

BFS explores all nodes at distance 1, then all at distance 2, then distance 3, and so on. This makes it perfect for finding shortest paths in unweighted graphs.

Visual: BFS Level-by-Level Traversal

BFS Traversal

BFS spreads outward like ripples in a pond. Each "level" of nodes is colored differently, showing that all nodes at distance d from the source are discovered before any node at distance d+1. The queue at the bottom shows the processing order.

BFS Template

// Solution: BFS Shortest Path — O(V + E)
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<int> adj[MAXN];

// Returns array of shortest distances from source to all vertices
// dist[v] = -1 means unreachable
vector<int> bfs(int source, int n) {
    vector<int> dist(n + 1, -1);
    queue<int> q;

    dist[source] = 0;     // distance to source is 0
    q.push(source);       // seed the queue with the source

    while (!q.empty()) {
        int u = q.front();
        q.pop();

        for (int v : adj[u]) {
            if (dist[v] == -1) {          // not yet visited
                dist[v] = dist[u] + 1;   // ← KEY LINE: one hop further
                q.push(v);
            }
        }
    }

    return dist;
}

Why BFS Finds Shortest Paths

BFS processes nodes in order of their distance from the source. The first time BFS visits a node, it's via the shortest path. This is because BFS never visits a node at distance d+1 before visiting all nodes at distance d.

💡 Key Insight: Think of BFS as dropping a stone in water — ripples spread outward one layer at a time. All cells at distance 1 are processed before any cell at distance 2. This level-by-level processing guarantees the first visit to any node is via the shortest path.

BFS vs. DFS for shortest path:

  • BFS: guaranteed shortest path in unweighted graphs ✓
  • DFS: does NOT guarantee shortest path ✗

Complexity Analysis:

  • Time: O(V + E) — each vertex and edge is processed at most once
  • Space: O(V) — for the distance array and queue

Complete BFS Shortest Path Trace on a Small Graph

Let's trace BFS starting from node 1 in this graph:

1 — 2 — 3
|       |
4 — 5   6
    |
    7 — 8

Edges: 1-2, 2-3, 1-4, 3-6, 4-5, 5-7, 7-8

BFS Trace:

Start: dist = [-1, 0, -1, -1, -1, -1, -1, -1, -1]  (1-indexed, source=1)
Queue: [1]

Process 1: neighbors 2, 4
  → dist[2] = 1, dist[4] = 1
  Queue: [2, 4]

Process 2: neighbors 1, 3
  → 1 already visited; dist[3] = 2
  Queue: [4, 3]

Process 4: neighbors 1, 5
  → 1 already visited; dist[5] = 2
  Queue: [3, 5]

Process 3: neighbors 2, 6
  → 2 already visited; dist[6] = 3
  Queue: [5, 6]

Process 5: neighbors 4, 7
  → 4 already visited; dist[7] = 3
  Queue: [6, 7]

Process 6: neighbor 3 → already visited
Process 7: neighbors 5, 8
  → 5 already visited; dist[8] = 4
  Queue: [8]

Process 8: neighbor 7 → already visited. Queue empty.

Final distances from node 1:
Node: 1  2  3  4  5  6  7  8
Dist: 0  1  2  1  2  3  3  4

5.2.4 Grid BFS — The Most Common USACO Pattern

Many USACO problems give you a grid with passable (.) and blocked (#) cells. BFS finds the shortest path from one cell to another.

Visual: Grid BFS Distance Flood Fill

Grid BFS

Starting from the center cell (distance 0), BFS expands to all reachable cells, recording the minimum number of steps to reach each one. Cells colored more blue are farther away. This is exactly how USACO flood-fill and shortest-path problems work on grids.

USACO-Style Grid BFS Problem: Maze Shortest Path

Problem: Given a 5×5 maze with walls (#) and open cells (.), find the shortest path from top-left (0,0) to bottom-right (4,4). Print the length, or -1 if no path exists.

The Maze:

. . . # .
# # . # .
. . . . .
. # # # .
. . . . .

BFS Trace — Distance Array Filling:

Starting at (0,0), BFS expands level by level. Here's the distance each cell gets assigned:

Step 0 — Initialize:
dist[0][0] = 0, queue: [(0,0)]

Step 1 — Process (0,0):
  Neighbors: (0,1)='.', (1,0)='#'(wall)
  dist[0][1] = 1. Queue: [(0,1)]

Step 2 — Process (0,1):
  Neighbors: (0,0)=visited, (0,2)='.', (1,1)='#'
  dist[0][2] = 2. Queue: [(0,2)]

Step 3 — Process (0,2):
  Neighbors: (0,1)=visited, (0,3)='#', (1,2)='.'
  dist[1][2] = 3. Queue: [(1,2)]

Step 4 — Process (1,2):
  Neighbors: (0,2)=visited, (1,1)='#', (1,3)='#', (2,2)='.'
  dist[2][2] = 4. Queue: [(2,2)]

Step 5 — Process (2,2):
  Neighbors: (1,2)=visited, (2,1)='.', (2,3)='.', (3,2)='#'
  dist[2][1] = 5, dist[2][3] = 5. Queue: [(2,1),(2,3)]

...continuing BFS...

Final distance array (# = wall; in this maze every open cell turns out to be reachable):
    c=0  c=1  c=2  c=3  c=4
r=0:  0    1    2    #    8
r=1:  #    #    3    #    7
r=2:  6    5    4    5    6
r=3:  7    #    #    #    7
r=4:  8    9   10    9    8

Shortest path length = dist[4][4] = 8

Path reconstruction: Follow the path backward from (4,4), always moving to the cell with distance one less:

(4,4)=8 → (3,4)=7 → (2,4)=6 → (2,3)=5 → (2,2)=4 → (1,2)=3 → (0,2)=2 → (0,1)=1 → (0,0)=0
Path length: 8 steps ✓

ASCII Visualization of the path:

S → ↓ # .
# # ↓ # .
. . → → ↓
. # # # ↓
. . . . E

Complete C++ Code:

// Solution: Grid BFS Shortest Path — O(R × C)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int R, C;
    cin >> R >> C;
    vector<string> grid(R);
    for (int r = 0; r < R; r++) cin >> grid[r];

    // Find start (S) and end (E), or use fixed corners
    int sr = 0, sc = 0, er = R-1, ec = C-1;

    // BFS distance array: -1 = unvisited
    vector<vector<int>> dist(R, vector<int>(C, -1));
    queue<pair<int,int>> q;

    // Step 1: Seed BFS from source
    dist[sr][sc] = 0;
    q.push({sr, sc});

    // Step 2: Direction arrays (up, down, left, right)
    int dr[] = {-1, 1, 0, 0};
    int dc[] = {0, 0, -1, 1};

    // Step 3: BFS expansion
    while (!q.empty()) {
        auto [r, c] = q.front();
        q.pop();

        for (int d = 0; d < 4; d++) {
            int nr = r + dr[d];
            int nc = c + dc[d];

            if (nr >= 0 && nr < R           // in-bounds row
                && nc >= 0 && nc < C        // in-bounds col
                && grid[nr][nc] != '#'       // not a wall
                && dist[nr][nc] == -1) {     // ← KEY LINE: not yet visited

                dist[nr][nc] = dist[r][c] + 1;
                q.push({nr, nc});
            }
        }
    }

    // Step 4: Output result
    if (dist[er][ec] == -1) {
        cout << -1 << "\n";   // no path
    } else {
        cout << dist[er][ec] << "\n";
    }

    return 0;
}

Sample Input (the maze above):

5 5
...#.
##.#.
.....
.###.
.....

Sample Output:

8
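The backward walk described in the trace can also be turned into code. Here is a sketch of a helper (reconstructPath is our name, not part of the program above) that takes the filled dist array and returns one shortest path, source to destination:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Walk BACKWARD from the end cell: from a cell at distance d, some
// 4-neighbor has distance d-1. Assumes dist[][] was filled by a grid BFS
// (-1 = unvisited/wall).
vector<pair<int,int>> reconstructPath(const vector<vector<int>>& dist,
                                      int er, int ec) {
    int R = dist.size(), C = dist[0].size();
    int dr[] = {-1, 1, 0, 0};
    int dc[] = {0, 0, -1, 1};

    vector<pair<int,int>> path;
    int r = er, c = ec;
    if (dist[r][c] == -1) return path;        // end cell unreachable
    path.push_back({r, c});
    while (dist[r][c] > 0) {
        bool stepped = false;
        for (int d = 0; d < 4 && !stepped; d++) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr >= 0 && nr < R && nc >= 0 && nc < C
                && dist[nr][nc] == dist[r][c] - 1) {  // one step closer
                r = nr; c = nc;
                path.push_back({r, c});
                stepped = true;
            }
        }
        if (!stepped) break;                  // defensive: malformed dist
    }
    reverse(path.begin(), path.end());        // source → destination order
    return path;
}
```

Because every cell at distance d has at least one BFS predecessor at distance d-1, the walk always terminates at the source in exactly dist[er][ec] steps.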

⚠️ Common Mistake: Using DFS instead of BFS for shortest path in a maze. DFS might find A path, but not the SHORTEST path. Always use BFS for shortest distances in unweighted grids.


5.2.5 USACO Example: Flood Fill

USACO loves "flood fill" problems: find all connected cells of the same type, or count connected regions.

Problem: Count the number of distinct connected regions of '.' cells in a grid. (Like counting islands.)

#include <bits/stdc++.h>
using namespace std;

int R, C;
vector<string> grid;
vector<vector<bool>> visited;

void floodFill(int r, int c) {
    if (r < 0 || r >= R || c < 0 || c >= C) return;  // out of bounds
    if (visited[r][c]) return;                          // already visited
    if (grid[r][c] == '#') return;                      // wall

    visited[r][c] = true;

    floodFill(r - 1, c);  // up
    floodFill(r + 1, c);  // down
    floodFill(r, c - 1);  // left
    floodFill(r, c + 1);  // right
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> R >> C;
    grid.resize(R);
    visited.assign(R, vector<bool>(C, false));

    for (int r = 0; r < R; r++) cin >> grid[r];

    int regions = 0;
    for (int r = 0; r < R; r++) {
        for (int c = 0; c < C; c++) {
            if (!visited[r][c] && grid[r][c] == '.') {
                regions++;
                floodFill(r, c);
            }
        }
    }

    cout << regions << "\n";
    return 0;
}

5.2.6 Multi-Source BFS

Sometimes you start BFS from multiple source nodes simultaneously. For example: "Find the minimum distance from each cell to the nearest fire."

// Multi-source BFS: start from all fire cells at once
queue<pair<int,int>> q;
vector<vector<int>> dist(R, vector<int>(C, -1));

// Push ALL sources first
for (int r = 0; r < R; r++) {
    for (int c = 0; c < C; c++) {
        if (grid[r][c] == 'F') {  // fire cell
            dist[r][c] = 0;
            q.push({r, c});
        }
    }
}

// Run BFS from all sources simultaneously
while (!q.empty()) {
    auto [r, c] = q.front();
    q.pop();
    for (int d = 0; d < 4; d++) {
        int nr = r + dr[d], nc = c + dc[d];
        if (nr >= 0 && nr < R && nc >= 0 && nc < C   // in bounds
            && grid[nr][nc] != '#'                    // not a wall
            && dist[nr][nc] == -1) {                  // not yet visited
            dist[nr][nc] = dist[r][c] + 1;
            q.push({nr, nc});
        }
    }
}

5.2.7 DFS vs. BFS — When to Use Each

| Task | Use | Why |
| --- | --- | --- |
| Shortest path (unweighted) | BFS ✓ | Level-by-level guarantees shortest |
| Connectivity / connected components | Either | Both work; DFS often simpler recursively |
| Cycle detection | DFS ✓ | Recursion stack tracks current path |
| Topological sort | DFS ✓ | Post-order gives reverse topological order |
| Flood fill | Either (DFS often simpler) | DFS recursion is concise |
| Bipartite check | BFS or DFS | 2-color with either |
| Distance to ALL nodes | BFS ✓ | BFS naturally computes all distances |
| Tree traversals (pre/in/post order) | DFS ✓ | Recursion maps naturally to tree structure |

💡 Key Insight: Use BFS whenever you need "the minimum number of steps." Use DFS whenever you just need to visit all nodes or check properties of paths.
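The bipartite check mentioned above deserves a quick sketch: 2-color the graph with BFS, and report failure the moment an edge joins two same-colored nodes (which means an odd cycle). The function name and 1-indexing are our assumptions:

```cpp
#include <bits/stdc++.h>
using namespace std;

// BFS 2-coloring: color each node 0 or 1, neighbors must alternate.
// Returns false iff some edge connects two nodes of the same color.
bool isBipartite(int n, const vector<vector<int>>& adj) {
    vector<int> color(n + 1, -1);            // -1 = uncolored
    for (int s = 1; s <= n; s++) {           // handle disconnected graphs
        if (color[s] != -1) continue;
        color[s] = 0;
        queue<int> q;
        q.push(s);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adj[u]) {
                if (color[v] == -1) {
                    color[v] = 1 - color[u]; // opposite color
                    q.push(v);
                } else if (color[v] == color[u]) {
                    return false;            // same-color edge: odd cycle
                }
            }
        }
    }
    return true;
}
```

A triangle (odd cycle) fails the check; a square (even cycle) passes.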


⚠️ Common Mistakes in Chapter 5.2

  1. Using DFS for shortest path: DFS explores one path deeply and doesn't guarantee minimum steps. Always use BFS for unweighted shortest paths.
  2. Forgetting bounds check: nr >= 0 && nr < R && nc >= 0 && nc < C — missing any one of these four conditions causes out-of-bounds crashes.
  3. Not marking visited before pushing to queue: If you mark visited when popping instead of pushing, the same node can be pushed multiple times, causing O(V²) time instead of O(V+E).
  4. Stack overflow in recursive DFS: For grids with N×M = 10^6, recursive DFS can exceed the default stack size. Use iterative DFS or increase stack size.
  5. Using wrong starting point: In grid problems, make sure you're BFSing from the correct cell (0-indexed vs 1-indexed confusion).

Chapter Summary

📌 Key Takeaways

| Algorithm | Data Structure | Time | Space | Best For |
| --- | --- | --- | --- | --- |
| DFS (recursive) | Call stack | O(V+E) | O(V) | Connectivity, cycle detection, tree problems |
| DFS (iterative) | Explicit stack | O(V+E) | O(V) | Same, avoids stack overflow |
| BFS | Queue | O(V+E) | O(V) | Shortest path, layer traversal |
| Multi-source BFS | Queue (multi-source pre-fill) | O(V+E) | O(V) | Distance from each node to nearest source |
| 3-Color DFS | Color array | O(V+E) | O(V) | Directed graph cycle detection |
| Topological Sort | DFS/BFS (Kahn) | O(V+E) | O(V) | Sorting/DP on DAG |

❓ FAQ

Q1: Both BFS and DFS have time complexity O(V+E). Why can BFS find shortest paths but DFS cannot?

A: The key is visit order. BFS uses a queue to guarantee "process all nodes at distance d before distance d+1," so the first time a node is reached is always via the shortest path. DFS uses a stack (or recursion) and may take a long path to a node, missing shorter ones.

Q2: When does recursive DFS cause stack overflow? How to fix it?

A: The default stack size is typically 1–8 MB, and each recursion level uses roughly 100–200 bytes, so overflow becomes possible once recursion depth exceeds about 10^4–10^5. Solutions: ① switch to iterative DFS with an explicit stack; ② raise the stack limit, e.g. run `ulimit -s unlimited` before executing on Linux, or pass the MinGW linker flag `-Wl,--stack=268435456` on Windows (the exact mechanism is platform-dependent).

Q3: In Grid BFS, why use dist == -1 for unvisited instead of a visited array?

A: Using dist[r][c] == -1 kills two birds with one stone: it records both "visited or not" and "distance to reach." One fewer array, cleaner code.

Q4: When to use DFS topological sort vs. Kahn's BFS topological sort?

A: DFS topological sort has shorter code (just reverse postorder), but Kahn's is more intuitive and can detect cycles (if final sorted length < N, there is a cycle). Both are common in contests; choose whichever you're more comfortable with.

🔗 Connections to Later Chapters

  • Chapter 5.3 (Trees & DSU): Tree Traversal (pre/postorder) is essentially DFS
  • Chapters 5.3 & 6.1–6.3 (DP): "DP on DAG" requires topological sort first, then compute DP in topological order
  • Chapter 4.1 (Greedy): Some graph greedy problems need BFS to compute distances as input
  • BFS shortest path is a simplified version of Dijkstra (Gold level)—Dijkstra handles weighted graphs, BFS handles unweighted
  • Multi-source BFS is extremely common in USACO Silver and is a must-master core technique

Practice Problems

Problem 5.2.1 — Island Count 🟢 Easy
Read an N×M grid of '.' (water) and '#' (land). Count the number of islands (connected groups of '#').

Hint: Do DFS/BFS from each unvisited '#' cell. Each DFS call marks a full island. Count how many DFS calls you make.
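One possible sketch following the hint (function names are illustrative, not from an official editorial):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch for Problem 5.2.1: count islands of '#' with recursive DFS.
// Assumes the grid is small enough that recursion depth is safe.
int R_, C_;
vector<string> grid_;
vector<vector<bool>> vis_;

void dfsIsland(int r, int c) {
    vis_[r][c] = true;
    int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
    for (int d = 0; d < 4; d++) {
        int nr = r + dr[d], nc = c + dc[d];
        if (nr >= 0 && nr < R_ && nc >= 0 && nc < C_
            && grid_[nr][nc] == '#' && !vis_[nr][nc])
            dfsIsland(nr, nc);
    }
}

int countIslands(vector<string> grid) {
    grid_ = grid;
    R_ = grid.size(); C_ = grid[0].size();
    vis_.assign(R_, vector<bool>(C_, false));
    int islands = 0;
    for (int r = 0; r < R_; r++)
        for (int c = 0; c < C_; c++)
            if (grid_[r][c] == '#' && !vis_[r][c]) {
                islands++;           // each DFS launched here marks one island
                dfsIsland(r, c);
            }
    return islands;
}
```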

Problem 5.2.2 — Maze Shortest Path 🟢 Easy
Read an N×M maze with 'S' (start), 'E' (end), '.' (passable), '#' (wall). Find the minimum steps from S to E, or -1 if impossible.

Hint: BFS from S. When you reach E, output `dist[E]`. If E is never reached, output -1.

Problem 5.2.3 — Bipartite Check 🟡 Medium
A graph is bipartite if you can color each node black or white such that every edge connects a black node to a white node. Given a graph, determine if it's bipartite.

Solution sketch: BFS and 2-color. When you visit a node, color it the opposite of its parent. If you ever find an edge between two same-colored nodes, it's not bipartite.

Hint: Assign color 0 to the source. For each neighbor, assign color 1 - parent_color. If a neighbor already has the same color as the current node, return false.
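The 2-coloring sketch above can be written out as follows (illustrative function name; handles disconnected graphs by restarting from every uncolored node):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch for Problem 5.2.3: BFS 2-coloring. adj is 0-indexed.
bool isBipartite(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<int> color(n, -1);              // -1 = uncolored
    for (int s = 0; s < n; s++) {
        if (color[s] != -1) continue;      // already handled this component
        color[s] = 0;
        queue<int> q; q.push(s);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adj[u]) {
                if (color[v] == -1) {
                    color[v] = 1 - color[u];   // opposite of parent
                    q.push(v);
                } else if (color[v] == color[u]) {
                    return false;              // same-colored edge: not bipartite
                }
            }
        }
    }
    return true;
}
```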

Problem 5.2.4 — Multi-Source BFS: Nearest Fire 🟡 Medium
Given a grid with fire cells 'F', empty cells '.', and walls '#', find the minimum distance from each empty cell to the nearest fire cell.

Solution sketch: Initialize the BFS queue with ALL fire cells at distance 0. Run BFS normally. Each empty cell gets the distance to its nearest fire.

Hint: Multi-source BFS = push all sources into the queue at step 0. The BFS then naturally computes the minimum distance from any source to each cell.

Problem 5.2.5 — USACO 2016 February Bronze: Milk Pails 🔴 Hard
Starting from a state (0, 0) of two buckets with capacities A and B, operations: fill A (→ capacity A), fill B, pour A into B, pour B into A, empty A, empty B. Find the minimum operations to get exactly X gallons in either bucket.

Solution sketch: BFS on states (a, b) where a ∈ [0,A] and b ∈ [0,B]. Each state is a node, each operation is an edge. BFS from (0,0) finds minimum operations.

Hint: Total states: `O(A×B)`. BFS explores at most `O(A×B)` states, each with 6 transitions. Make sure to mark visited states to avoid cycles.
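A hedged sketch of the state-graph BFS described above (the `milkPails` helper is illustrative, not the official solution code):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch for Problem 5.2.5: BFS over states (a, b).
// Returns the minimum number of operations to get exactly X gallons
// in either pail, or -1 if X is unreachable.
int milkPails(int A, int B, int X) {
    vector<vector<int>> dist(A + 1, vector<int>(B + 1, -1));
    queue<pair<int,int>> q;
    dist[0][0] = 0;
    q.push({0, 0});
    while (!q.empty()) {
        auto [a, b] = q.front(); q.pop();
        if (a == X || b == X) return dist[a][b];
        int pourAB = min(a, B - b);           // amount moved pouring A into B
        int pourBA = min(b, A - a);           // amount moved pouring B into A
        pair<int,int> next[6] = {
            {A, b}, {a, B},                   // fill A, fill B
            {0, b}, {a, 0},                   // empty A, empty B
            {a - pourAB, b + pourAB},         // pour A into B
            {a + pourBA, b - pourBA}          // pour B into A
        };
        for (auto [na, nb] : next) {
            if (dist[na][nb] == -1) {         // visited check avoids cycles
                dist[na][nb] = dist[a][b] + 1;
                q.push({na, nb});
            }
        }
    }
    return -1;
}
```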

🏆 Challenge Problem: USACO 2015 December Silver: Switching on the Lights
You have an N×N grid of rooms, each with a light. Each room contains switches that toggle lights in other rooms, and you can only move through lit rooms. Turn on as many lights as possible. Model this as a BFS/DFS reachability problem where turning on a light may reveal new switches.

This requires multi-source BFS + careful state management.


5.2.8 Multi-Source BFS — In Depth

Multi-source BFS starts from multiple source nodes simultaneously. The key: push all sources into the queue at distance 0 before starting BFS.

Why does this work? BFS processes nodes level by level. If multiple nodes start at "level 0," BFS naturally propagates from all of them in parallel — exactly as if you had a virtual super-source connected to all real sources at cost 0.

Level 0:    [S₁][S₂][S₃]    ← all fire sources / all starting nodes
Level 1:   neighbors of S₁, S₂, S₃
Level 2:   their neighbors not yet visited
...

Complete Example: Spreading Fire

Problem: Given an N×M grid with fire cells ('F'), empty cells ('.'), and walls ('#'), compute the minimum distance from each '.' cell to the nearest fire cell.

// Solution: Multi-Source BFS — O(N×M)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int R, C;
    cin >> R >> C;
    vector<string> grid(R);
    for (auto& row : grid) cin >> row;

    vector<vector<int>> dist(R, vector<int>(C, -1));
    queue<pair<int,int>> q;

    // ← KEY: Push ALL fire sources at distance 0 before starting BFS
    for (int r = 0; r < R; r++) {
        for (int c = 0; c < C; c++) {
            if (grid[r][c] == 'F') {
                dist[r][c] = 0;
                q.push({r, c});
            }
        }
    }

    int dr[] = {-1, 1, 0, 0};
    int dc[] = {0, 0, -1, 1};

    while (!q.empty()) {
        auto [r, c] = q.front(); q.pop();
        for (int d = 0; d < 4; d++) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr >= 0 && nr < R && nc >= 0 && nc < C
                && grid[nr][nc] != '#' && dist[nr][nc] == -1) {
                dist[nr][nc] = dist[r][c] + 1;
                q.push({nr, nc});
            }
        }
    }

    // Print distance grid
    for (int r = 0; r < R; r++) {
        for (int c = 0; c < C; c++) {
            if (dist[r][c] == -1) cout << " # ";
            else cout << " " << dist[r][c] << " ";
        }
        cout << "\n";
    }

    return 0;
}

BFS Level Visualization:

Level 0:    [F₁][F₂]          ← all fire sources enter queue together
Level 1:   [ 1 ][ 1 ][ 1 ]    ← cells adjacent to any fire source
Level 2:  [ 2 ][ 2 ][ 2 ][ 2 ]
Level 3: [ 3 ][ 3 ][ 3 ][ 3 ][ 3 ]

Multi-source BFS level-by-level expansion:

flowchart TD
    subgraph L0["Level 0 — initial fire sources"]
        F1(["F₁\ndist=0"])
        F2(["F₂\ndist=0"])
    end
    subgraph L1["Level 1 — first ring of spread"]
        N1(["dist=1"])
        N2(["dist=1"])
        N3(["dist=1"])
    end
    subgraph L2["Level 2 — second ring of spread"]
        N4(["dist=2"])
        N5(["dist=2"])
    end
    F1 --> N1
    F1 --> N2
    F2 --> N2
    F2 --> N3
    N1 --> N4
    N3 --> N5
    style F1 fill:#fca5a5,stroke:#dc2626
    style F2 fill:#fca5a5,stroke:#dc2626
    style L0 fill:#fff1f2
    style L1 fill:#fef9ec
    style L2 fill:#f0fdf4

Each cell gets the minimum distance to any fire source — guaranteed by BFS's level-order property.

USACO Application: "Icy Perimeter" Style

Multi-source BFS is useful when you need:

  • "Distance from each cell to nearest [thing]"
  • "Spreading from multiple starting points" (fire, infection, flood)
  • "Simultaneous evacuation from multiple exits"

5.2.9 Cycle Detection with DFS — White/Gray/Black Coloring

For directed graphs, cycle detection uses 3-color DFS:

  • White (0): Not yet visited
  • Gray (1): Currently in DFS call stack (being processed)
  • Black (2): Fully processed (all descendants explored)

A back edge (edge to a gray node) indicates a cycle.

// Solution: Cycle Detection in Directed Graph — O(V+E)
#include <bits/stdc++.h>
using namespace std;

int n;
vector<int> adj[100001];
vector<int> color;   // 0=white, 1=gray, 2=black
bool hasCycle = false;

void dfs(int u) {
    color[u] = 1;  // mark as "in progress" (gray)

    for (int v : adj[u]) {
        if (color[v] == 0) {
            dfs(v);              // unvisited: recurse
        } else if (color[v] == 1) {
            hasCycle = true;     // ← back edge: v is an ancestor of u → cycle!
        }
        // color[v] == 2: already fully processed, safe to skip
    }

    color[u] = 2;  // mark as "done" (black)
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int m;
    cin >> n >> m;
    color.assign(n + 1, 0);

    for (int i = 0; i < m; i++) {
        int u, v; cin >> u >> v;
        adj[u].push_back(v);  // directed edge u → v
    }

    for (int u = 1; u <= n; u++) {
        if (color[u] == 0) dfs(u);
    }

    cout << (hasCycle ? "HAS CYCLE" : "NO CYCLE") << "\n";
    return 0;
}

⚠️ Undirected graph cycle detection: For undirected graphs, use a simpler method: during DFS, if you visit a node that's already visited AND it's not the parent of the current node, there's a cycle. Alternatively, use DSU: if an edge connects two already-connected nodes, it creates a cycle.
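The parent-tracking method just described can be sketched as follows (assumes a simple graph; parallel edges between the same pair of nodes would need extra handling):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch: cycle detection in an UNDIRECTED graph.
// A visited neighbor that is not the DFS parent closes a cycle.
bool dfsCycle(int u, int parent, const vector<vector<int>>& adj,
              vector<bool>& vis) {
    vis[u] = true;
    for (int v : adj[u]) {
        if (!vis[v]) {
            if (dfsCycle(v, u, adj, vis)) return true;
        } else if (v != parent) {
            return true;   // back edge to a non-parent node: cycle found
        }
    }
    return false;
}

bool hasUndirectedCycle(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<bool> vis(n, false);
    for (int u = 0; u < n; u++)            // cover disconnected graphs
        if (!vis[u] && dfsCycle(u, -1, adj, vis)) return true;
    return false;
}
```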


5.2.10 Topological Sort with DFS

Topological sort orders the nodes of a directed acyclic graph (DAG) such that for every edge u → v, u comes before v.

DFS approach: When a node finishes (all descendants processed), add it to the front of the result list. This gives reverse topological order.

Finish order (post-order):  E, D, C, B, A
Topological order (reverse): A, B, C, D, E

// Solution: Topological Sort via DFS — O(V+E)
#include <bits/stdc++.h>
using namespace std;

vector<int> adj[100001];
vector<bool> visited;
vector<int> topoOrder;

void dfs(int u) {
    visited[u] = true;
    for (int v : adj[u]) {
        if (!visited[v]) dfs(v);
    }
    topoOrder.push_back(u);  // ← add AFTER all children processed (post-order)
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;
    visited.assign(n + 1, false);

    for (int i = 0; i < m; i++) {
        int u, v; cin >> u >> v;
        adj[u].push_back(v);
    }

    for (int u = 1; u <= n; u++) {
        if (!visited[u]) dfs(u);
    }

    // Reverse post-order = topological order
    reverse(topoOrder.begin(), topoOrder.end());

    for (int u : topoOrder) cout << u << " ";
    cout << "\n";

    return 0;
}

Alternative: Kahn's Algorithm (BFS-based Topological Sort)

// Kahn's Algorithm: Process nodes with in-degree 0 first — O(V+E)
vector<int> inDeg(n + 1, 0);
for (int u = 1; u <= n; u++)
    for (int v : adj[u])
        inDeg[v]++;

queue<int> q;
for (int u = 1; u <= n; u++)
    if (inDeg[u] == 0) q.push(u);  // start with nodes having no prerequisites

vector<int> order;
while (!q.empty()) {
    int u = q.front(); q.pop();
    order.push_back(u);
    for (int v : adj[u]) {
        inDeg[v]--;
        if (inDeg[v] == 0) q.push(v);
    }
}

// If order.size() != n, there's a cycle (not a DAG)
if ((int)order.size() != n) cout << "CYCLE DETECTED\n";
else for (int u : order) cout << u << " ";

Kahn's algorithm: how in-degrees change:

flowchart LR
    subgraph init["Initial in-degrees"]
        direction TB
        A0(["A\nin=0"])
        B0(["B\nin=1"])
        C0(["C\nin=2"])
        D0(["D\nin=1"])
    end
    subgraph step1["Process A (in=0)"]
        direction TB
        A1(["A\ndone"])
        B1(["B\nin=0↓"])
        C1(["C\nin=2"])
        D1(["D\nin=1"])
    end
    subgraph step2["Process B (in=0)"]
        direction TB
        A2(["A\ndone"])
        B2(["B\ndone"])
        C2(["C\nin=1↓"])
        D2(["D\nin=0↓"])
    end
    subgraph step3["Process D, then C (in=0)"]
        direction TB
        A3(["A"])
        B3(["B"])
        C3(["C\nin=0↓"])
        D3(["D\ndone"])
    end
    init -->|"enqueue A"| step1
    step1 -->|"enqueue B"| step2
    step2 -->|"enqueue D, C"| step3
    style A0 fill:#dcfce7,stroke:#16a34a
    style B1 fill:#dcfce7,stroke:#16a34a
    style D2 fill:#dcfce7,stroke:#16a34a
    style C3 fill:#dcfce7,stroke:#16a34a

💡 Cycle detection: if the final order.size() < n, some nodes never reached in-degree 0 (they lie on a cycle). This built-in cycle check is a key advantage of Kahn's algorithm over DFS topological sort.

💡 Key Application: Topological sort is essential for DP on DAGs. If the dependency graph is a DAG, process nodes in topological order — each node's DP state depends only on previously-processed nodes.
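As a concrete instance of the DP-on-DAG pattern, here is a sketch that computes the longest path (counted in edges) of a DAG by relaxing along Kahn's order; the function name is illustrative:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch: DP on a DAG — longest path (in edges) via Kahn's order.
// Processing nodes in topological order guarantees dp[u] is final
// before any successor of u is processed.
int longestPathDAG(int n, const vector<vector<int>>& adj) {
    vector<int> inDeg(n, 0);
    for (int u = 0; u < n; u++)
        for (int v : adj[u]) inDeg[v]++;
    queue<int> q;
    for (int u = 0; u < n; u++)
        if (inDeg[u] == 0) q.push(u);
    vector<int> dp(n, 0);            // dp[u] = longest path ending at u
    int best = 0;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        best = max(best, dp[u]);
        for (int v : adj[u]) {
            dp[v] = max(dp[v], dp[u] + 1);   // relax edge u → v
            if (--inDeg[v] == 0) q.push(v);
        }
    }
    return best;
}
```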

Example DAG and BFS levels visualization: BFS DAG Levels

Visual: Grid BFS Distances from Source

BFS Grid Distances

The diagram shows a 5×5 grid BFS where each cell displays its minimum distance from the source (0,0). Walls are shown in dark gray. Note how the BFS "flood fills" outward in concentric rings, never revisiting a cell — guaranteeing minimum distances.

📖 Chapter 5.3 ⏱️ ~80 min read 🎯 Advanced

Chapter 5.3: Trees & Special Graphs

Trees are graphs with a special structure that enables elegant and efficient algorithms. This chapter covers tree traversals, and one of the most important data structures in competitive programming: Union-Find (also called Disjoint Set Union or DSU).


5.3.1 Tree Traversals

Given a rooted tree, there are three classic ways to visit every node with DFS:

  • Pre-order: Visit node, then children (node before subtree)
  • In-order: Visit left child, node, right child (only for binary trees)
  • Post-order: Visit children, then node (subtree before node)

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<int> children[MAXN];

// Pre-order: parent before children (useful for computing subtree info top-down)
void preorder(int u) {
    cout << u << " ";  // process u first
    for (int v : children[u]) preorder(v);
}

// Post-order: children before parent (useful for subtree aggregation bottom-up)
void postorder(int u) {
    for (int v : children[u]) postorder(v);
    cout << u << " ";  // process u after all children
}

// Calculate subtree size (post-order style)
int subtreeSize[MAXN];
void calcSize(int u) {
    subtreeSize[u] = 1;  // start with just this node
    for (int v : children[u]) {
        calcSize(v);
        subtreeSize[u] += subtreeSize[v];  // add child subtree sizes
    }
}

// Calculate depth of each node (pre-order style)
int depth[MAXN];
void calcDepth(int u, int d) {
    depth[u] = d;
    for (int v : children[u]) {
        calcDepth(v, d + 1);
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    for (int i = 2; i <= n; i++) {
        int p; cin >> p;
        children[p].push_back(i);
    }

    cout << "Pre-order: ";
    preorder(1);
    cout << "\n";

    cout << "Post-order: ";
    postorder(1);
    cout << "\n";

    calcSize(1);
    cout << "Subtree sizes: ";
    for (int i = 1; i <= n; i++) cout << subtreeSize[i] << " ";
    cout << "\n";

    calcDepth(1, 0);
    cout << "Depths: ";
    for (int i = 1; i <= n; i++) cout << depth[i] << " ";
    cout << "\n";

    return 0;
}

5.3.2 Lowest Common Ancestor (LCA) — Naive

The LCA of two nodes u and v is the deepest node that is an ancestor of both.

For small trees, a naive approach: march both nodes up to the root, find where they first meet.

Naive LCA: walking both nodes upward:

flowchart TD
    subgraph tree["Tree structure (root = 1)"]
        N1(["1\ndepth=0"])
        N2(["2\ndepth=1"])
        N3(["3\ndepth=1"])
        N4(["4\ndepth=2"])
        N5(["5\ndepth=2"])
        N6(["6\ndepth=3"])
        N1 --> N2
        N1 --> N3
        N2 --> N4
        N2 --> N5
        N4 --> N6
    end
    subgraph lca["Computing LCA(6, 5)"]
        direction LR
        S1["u=6, v=5\ndepth[6]=3, depth[5]=2"] -->|"lift u to equal depth"| S2
        S2["u=4, v=5\nsame depth = 2"] -->|"u≠v, move both up"| S3
        S3["u=2, v=2\nu==v → LCA=2 ✓"]
    end
    style S3 fill:#dcfce7,stroke:#16a34a

💡 Two phases: ① lift the deeper node until both nodes are at the same depth; ② move both up in lockstep until they meet. The meeting point is the LCA.

#include <bits/stdc++.h>
using namespace std;

int parent[100001];
int depth_arr[100001];

// Naive LCA: walk both nodes up until they meet — O(depth) per query
int lca(int u, int v) {
    while (depth_arr[u] > depth_arr[v]) u = parent[u];  // bring u up to same depth as v
    while (depth_arr[v] > depth_arr[u]) v = parent[v];  // bring v up to same depth as u
    while (u != v) {  // now both at same depth; walk up together
        u = parent[u];
        v = parent[v];
    }
    return u;
}

For Silver problems, naive LCA (O(N) per query) is often sufficient. Gold uses binary lifting (O(log N) per query).

How binary lifting builds the ancestor table:

flowchart LR
    subgraph anc0["anc[v][0] (direct parent)"]
        direction TB
        v6a(["6"]) -->|"parent"| v4a(["4"])
        v4a -->|"parent"| v2a(["2"])
        v2a -->|"parent"| v1a(["1"])
    end
    subgraph anc1["anc[v][1] (2¹ = 2 levels up)"]
        direction TB
        v6b(["6"]) -->|"2 levels up"| v2b(["2"])
        v4b(["4"]) -->|"2 levels up"| v1b(["1"])
    end
    subgraph anc2["anc[v][2] (2² = 4 levels up)"]
        direction TB
        v6c(["6"]) -->|"4 levels up"| v1c(["1"])
    end
    anc0 -->|"anc[v][k] = anc[anc[v][k-1]][k-1]"| anc1
    anc1 --> anc2
    style anc2 fill:#f0f4ff,stroke:#4A6CF7

💡 Doubling idea: anc[v][k] = v's 2^k-th ancestor = the 2^(k-1)-th ancestor of v's 2^(k-1)-th ancestor. A query decomposes the depth difference into binary and jumps 2^k levels at a time, O(log N) steps in total.
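For readers curious about the Gold-level technique, here is a compact sketch of binary-lifting LCA; the `LCA` struct and its interface are illustrative, not a required template:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch: binary-lifting LCA. up[v][k] = the 2^k-th ancestor of v.
// Build is O(N log N); each query is O(log N). Nodes are 0-indexed,
// and the root's parent is itself.
struct LCA {
    int LOG;
    vector<vector<int>> up;
    vector<int> depth;
    LCA(const vector<int>& parent, const vector<int>& dep) : depth(dep) {
        int n = parent.size();
        LOG = 1;
        while ((1 << LOG) < n) LOG++;
        up.assign(n, vector<int>(LOG + 1));
        for (int v = 0; v < n; v++) up[v][0] = parent[v];
        for (int k = 1; k <= LOG; k++)
            for (int v = 0; v < n; v++)
                up[v][k] = up[up[v][k - 1]][k - 1];  // 2^k = 2^(k-1) + 2^(k-1)
    }
    int query(int u, int v) {
        if (depth[u] < depth[v]) swap(u, v);
        int diff = depth[u] - depth[v];
        for (int k = 0; k <= LOG; k++)           // lift u to v's depth
            if (diff >> k & 1) u = up[u][k];
        if (u == v) return u;
        for (int k = LOG; k >= 0; k--)           // jump both as far as possible
            if (up[u][k] != up[v][k]) { u = up[u][k]; v = up[v][k]; }
        return up[u][0];                         // their common parent
    }
};
```

Using the tree from the diagram (relabeled 0-indexed), `LCA({0,0,0,1,1,3}, {0,1,1,2,2,3})` answers queries like `query(5, 4)` in O(log N).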


5.3.3 Union-Find (Disjoint Set Union)

Union-Find is a data structure that efficiently answers two questions:

  1. Find: Which group does element X belong to?
  2. Union: Merge the groups containing X and Y.

Why is this useful? It efficiently tracks connected components as edges are added one by one, which is used in Kruskal's MST algorithm, detecting cycles, and many USACO problems.

Visual: Union-Find Operations

The diagram shows Union-Find evolving: initially all nodes are separate (each is its own root), then after union(0,1) and union(1,2) a tree forms. Path compression (shown at bottom) flattens the tree so future find() calls are nearly O(1).

Union-Find Structure

This static reference diagram shows the Union-Find tree structure with path compression and union by rank, illustrating how the data structure maintains near-constant time operations.

Basic Implementation

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
int parent[MAXN];  // parent[x] = parent of x in the tree
int rankArr[MAXN]; // used for union by rank

// Initialize: each element is its own group
void init(int n) {
    for (int i = 1; i <= n; i++) {
        parent[i] = i;      // parent of i is itself
        rankArr[i] = 0;     // initial rank is 0
    }
}

// Find: returns the "representative" (root) of x's group
// Uses PATH COMPRESSION: flattens the tree for future queries
int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]);  // path compression!
    }
    return parent[x];
}

// Union: merge groups containing x and y
// Uses UNION BY RANK: attach smaller tree under larger tree
void unite(int x, int y) {
    int px = find(x), py = find(y);
    if (px == py) return;  // already in same group

    // Attach tree with lower rank under tree with higher rank
    if (rankArr[px] < rankArr[py]) swap(px, py);
    parent[py] = px;
    if (rankArr[px] == rankArr[py]) rankArr[px]++;
}

// Check if x and y are in the same group
bool connected(int x, int y) {
    return find(x) == find(y);
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;
    init(n);

    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        unite(u, v);
    }

    // Count connected components
    set<int> roots;
    for (int i = 1; i <= n; i++) roots.insert(find(i));
    cout << "Connected components: " << roots.size() << "\n";

    // Answer connectivity queries
    int q;
    cin >> q;
    while (q--) {
        int u, v;
        cin >> u >> v;
        cout << (connected(u, v) ? "YES" : "NO") << "\n";
    }

    return 0;
}

Time complexity: With path compression and union by rank, both find and unite run in nearly O(1) — specifically O(α(n)) where α is the inverse Ackermann function, which is effectively constant for all practical inputs.

With vs. without union by rank:

flowchart LR
    subgraph bad["❌ Without union by rank (degenerates into a chain)"]
        direction TB
        R1(["1"]) --> R2(["2"]) --> R3(["3"]) --> R4(["4"]) --> R5(["5"])
        note_bad["find(5) takes 5 steps\nO(N) per query"]
    end
    subgraph good["✅ With union by rank (tree stays shallow)"]
        direction TB
        Root(["1"])
        C2(["2"])
        C3(["3"])
        C4(["4"])
        C5(["5"])
        Root --> C2
        Root --> C3
        Root --> C4
        Root --> C5
        note_good["find(any node) takes ≤ 2 steps\nO(1) per query"]
    end
    bad -->|"apply union by rank"| good
    style good fill:#dcfce7,stroke:#16a34a
    style bad fill:#fef2f2,stroke:#dc2626

💡 Union-by-rank rule: attach the tree with the smaller rank under the one with the larger rank, which keeps tree height O(log N). Combined with path compression, the amortized cost drops to O(α(N)) ≈ O(1).

Why Union-Find is Powerful

Compare with BFS/DFS for connectivity queries:

  • BFS/DFS: O(N+M) per query (rebuilds from scratch)
  • Union-Find: O(α(N)) per query after O((N+M)α(N)) preprocessing

For Q queries after reading all edges: BFS = O(Q(N+M)) vs DSU = O((N+M+Q)α(N)).


5.3.4 Cycle Detection with DSU

Problem: Given a graph, determine if it has a cycle. If so, report which edge creates a cycle.

DSU cycle detection step by step:

flowchart LR
    subgraph e1["Add edge 1-2"]
        direction TB
        A1(["1"]) --- B1(["2"])
        note1["find(1)≠find(2)\n→ merge, no cycle"]
    end
    subgraph e2["Add edge 2-3"]
        direction TB
        A2(["1"]) --- B2(["2"]) --- C2(["3"])
        note2["find(2)≠find(3)\n→ merge, no cycle"]
    end
    subgraph e3["Add edge 1-3"]
        direction TB
        A3(["1"]) --- B3(["2"]) --- C3(["3"])
        A3 -.-|"dashed = new edge"| C3
        note3["find(1)==find(3)\n→ already connected: cycle! ⚠️"]
    end
    e1 --> e2 --> e3
    style note3 fill:#fef2f2,stroke:#dc2626
    style e3 fill:#fff1f2,stroke:#fca5a5

💡 Core test: before adding edge (u, v), if find(u) == find(v), then u and v are already in the same connected component, so adding this edge must create a cycle.

#include <bits/stdc++.h>
using namespace std;

int parent[100001];

int find(int x) {
    if (parent[x] != x) parent[x] = find(parent[x]);
    return parent[x];
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;
    for (int i = 1; i <= n; i++) parent[i] = i;

    bool hasCycle = false;
    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        if (find(u) == find(v)) {
            cout << "Cycle created by edge " << u << "-" << v << "\n";
            hasCycle = true;
        } else {
            parent[find(u)] = find(v);  // simple union (no rank for brevity)
        }
    }

    if (!hasCycle) cout << "No cycle\n";
    return 0;
}

5.3.5 Minimum Spanning Tree (Kruskal's Algorithm)

A minimum spanning tree (MST) of a weighted graph connects all vertices with total edge weight minimized, using exactly N-1 edges.

Kruskal's algorithm:

  1. Sort all edges by weight
  2. Process edges in order; add an edge if it connects two different components (using DSU)
  3. Stop when N-1 edges are added

#include <bits/stdc++.h>
using namespace std;

int parent[100001];

int find(int x) {
    if (parent[x] != x) parent[x] = find(parent[x]);
    return parent[x];
}

bool unite(int x, int y) {
    x = find(x); y = find(y);
    if (x == y) return false;  // already connected
    parent[x] = y;
    return true;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;
    for (int i = 1; i <= n; i++) parent[i] = i;

    // Read edges as (weight, u, v)
    vector<tuple<int,int,int>> edges(m);
    for (auto &[w, u, v] : edges) cin >> u >> v >> w;

    // Sort by weight
    sort(edges.begin(), edges.end());

    long long totalCost = 0;
    int edgesAdded = 0;

    for (auto [w, u, v] : edges) {
        if (unite(u, v)) {        // connects two different components
            totalCost += w;
            edgesAdded++;
            if (edgesAdded == n - 1) break;  // MST complete
        }
    }

    if (edgesAdded == n - 1) {
        cout << "MST cost: " << totalCost << "\n";
    } else {
        cout << "Graph is disconnected; no MST\n";
    }

    return 0;
}

5.3.6 USACO Example: The Fence

Problem (USACO-style): A farm has N fields and M fences. Each fence connects two fields. Fields in the same connected component form a "pasture." After adding each fence, output the size of the largest pasture.

#include <bits/stdc++.h>
using namespace std;

int parent[100001];
int sz[100001];   // sz[root] = size of component rooted at 'root'

int find(int x) {
    if (parent[x] != x) parent[x] = find(parent[x]);
    return parent[x];
}

void unite(int x, int y) {
    x = find(x); y = find(y);
    if (x == y) return;
    if (sz[x] < sz[y]) swap(x, y);
    parent[y] = x;
    sz[x] += sz[y];
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;
    for (int i = 1; i <= n; i++) { parent[i] = i; sz[i] = 1; }

    int maxSz = 1;  // largest pasture so far (components only ever grow)
    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        unite(u, v);
        // Only the component containing u changed, so one find() suffices.
        // (Scanning all n roots per edge would be O(N*M), too slow for 10^5.)
        maxSz = max(maxSz, sz[find(u)]);
        cout << maxSz << "\n";
    }

    return 0;
}

Chapter Summary

📌 Key Takeaways

Technique | Time per Operation | Use Case | Why It Matters
Tree DFS (pre/post order) | O(N) total | Subtree sums, depth calculation | Foundation for tree DP
Naive LCA | O(depth) per query | Small trees | Understanding the LCA concept
Binary Lifting LCA | O(log N) per query | Large-tree path queries | Gold-level core technique
Union-Find find/union | O(α(N)) ≈ O(1) | Dynamic connectivity | Kruskal MST, online connectivity
Kruskal's MST | O(E log E) | Minimum spanning tree | Common in USACO Silver/Gold
Euler Tour | O(N) preprocessing | Subtree → range queries | Combined with a Segment Tree for tree problems
Tree Diameter | O(N) (two BFS passes) | Longest path in a tree | Common interview/contest problem

❓ FAQ

Q1: Can Union-Find use only one of "path compression" or "union by rank"?

A: Yes. Path compression alone gives amortized O(log N). Union by rank alone gives O(log N). Both together achieve O(α(N)). In contests, at least use path compression (more impactful); union by rank can be simplified to union by size.
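The union-by-size simplification mentioned here can be sketched as follows (the struct name is illustrative); a bonus is that sz[root] doubles as the component size, which contest problems often need anyway:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch: DSU with path compression + union by size (no rank array).
struct DSUBySize {
    vector<int> parent, sz;
    DSUBySize(int n) : parent(n), sz(n, 1) {
        iota(parent.begin(), parent.end(), 0);  // each node is its own root
    }
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }
    bool unite(int x, int y) {
        x = find(x); y = find(y);
        if (x == y) return false;
        if (sz[x] < sz[y]) swap(x, y);   // attach smaller tree under larger
        parent[y] = x;
        sz[x] += sz[y];                  // root's sz tracks component size
        return true;
    }
};
```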

Q2: What is the difference between Kruskal and Prim? When to use which?

A: Kruskal sorts edges + DSU, suited for sparse graphs (E ≪ V²), concise code. Prim is like Dijkstra with a priority queue, suited for dense graphs. In contests, use Kruskal 90% of the time.

Q3: What is the difference between Euler Tour and DFS order?

A: Essentially the same. "DFS order" usually refers to in_time and out_time; "Euler Tour" sometimes means the full entry/exit sequence (length 2N). In this book they are the same thing—the key is [in[u], out[u]] corresponds to u's subtree.

Q4: Why can tree diameter be found with "two BFS passes"?

A: Proof: Let the diameter be u→v. Starting BFS from any node s, the farthest node must be u or v (provable by contradiction). Then BFS from that endpoint finds the other endpoint and the diameter length.

Q5: What is the difference between multiset::erase(ms.find(val)) and ms.erase(val)?

A: Not this chapter's content (Chapter 3.8), but related to DSU sz tracking. ms.erase(val) removes all elements equal to val; ms.erase(ms.find(val)) removes only one. When tracking group sizes in DSU, watch for similar "delete one vs delete all" issues.
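A quick demonstration of that "delete one vs. delete all" difference (the helper is illustrative):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Demo of Q5's multiset pitfall.
// Returns {count of 5s after erasing one copy, count after erasing by value}.
pair<int,int> eraseDemo() {
    multiset<int> ms = {5, 5, 5, 7};
    ms.erase(ms.find(5));        // erase(iterator): removes exactly ONE 5
    int afterOne = ms.count(5);
    ms.erase(5);                 // erase(value): removes ALL remaining 5s
    int afterAll = ms.count(5);
    return {afterOne, afterAll};
}
```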

🔗 Connections to Later Chapters

  • Chapter 3.9 (Segment Tree) + Euler Tour = efficient subtree queries (update + query both O(log N))
  • Chapter 6.1 (DP Introduction): Tree DP builds directly on this chapter's tree traversal—postorder DFS aggregates bottom-up
  • Chapter 4.1 (Greedy): MST is a classic greedy application—Kruskal greedily selects minimum edges
  • Union-Find is powerful for offline processing—sort all queries/edges first, then add with DSU incrementally
  • Binary Lifting for LCA is one of the core techniques at USACO Gold level

Practice Problems

Problem 5.3.1 — Subtree Sum
Read a rooted tree with values at each node. For each node, output the sum of values in its subtree.

Problem 5.3.2 — Network Components
Read a graph. Add edges one by one. After each edge, print the number of connected components.

Problem 5.3.3 — Redundant Edge
Read a tree (N nodes, N-1 edges) plus one extra edge that creates a cycle. Find the extra edge. (Hint: use DSU — the edge that unites two already-connected nodes is the answer.)

Problem 5.3.4 — Friend Groups
N students. Read M pairs of friendships. Friends of friends are also friends (transitivity). Print the number of friend groups and the size of the largest one.

Problem 5.3.5 — USACO 2016 February Silver: Fencing the Cows (Inspired)
Read a weighted graph. Find the minimum cost to connect all nodes (MST using Kruskal's). Print the total MST weight, or "IMPOSSIBLE" if the graph is not connected.


5.3.7 Kruskal's MST — Complete Worked Example

Let's trace Kruskal's algorithm on a 6-node graph with 9 edges.

Graph:

Nodes: 0,1,2,3,4,5
Edges (sorted by weight):
  0-1: w=1
  2-3: w=2
  0-2: w=3
  1-3: w=4
  3-4: w=5
  2-4: w=6
  4-5: w=7
  1-4: w=8
  3-5: w=9

Kruskal's Algorithm Trace:

Initial: 6 components {0},{1},{2},{3},{4},{5}

Process edge 0-1 (w=1): find(0)=0, find(1)=1 → DIFFERENT → ACCEPT ✓
  Tree edges: {0-1}. Components: {0,1},{2},{3},{4},{5}

Process edge 2-3 (w=2): find(2)=2, find(3)=3 → DIFFERENT → ACCEPT ✓
  Tree edges: {0-1, 2-3}. Components: {0,1},{2,3},{4},{5}

Process edge 0-2 (w=3): find(0)=root_of_01, find(2)=root_of_23 → DIFFERENT → ACCEPT ✓
  Tree edges: {0-1, 2-3, 0-2}. Components: {0,1,2,3},{4},{5}

Process edge 1-3 (w=4): find(1)=root_of_0123, find(3)=root_of_0123 → SAME → SKIP ✗
  (Adding this would create a cycle: 0-1-3-2-0)

Process edge 3-4 (w=5): find(3)=root_of_0123, find(4)=4 → DIFFERENT → ACCEPT ✓
  Tree edges: {0-1, 2-3, 0-2, 3-4}. Components: {0,1,2,3,4},{5}

Process edge 2-4 (w=6): find(2)=root_of_01234, find(4)=root_of_01234 → SAME → SKIP ✗

Process edge 4-5 (w=7): find(4)=root_of_01234, find(5)=5 → DIFFERENT → ACCEPT ✓
  Tree edges: {0-1, 2-3, 0-2, 3-4, 4-5}.
  edgesAdded = 5 = n-1 = 5. DONE!

MST total weight: 1 + 2 + 3 + 5 + 7 = 18

Kruskal edge-selection steps:

flowchart LR
    subgraph s0["Start: 6 separate components"]
        direction LR
        n0a([0])
        n1a([1])
        n2a([2])
        n3a([3])
        n4a([4])
        n5a([5])
    end
    subgraph s1["Add 0-1 (w=1) and 2-3 (w=2)"]
        direction LR
        n01b(["0——1"])
        n23b(["2——3"])
        n4b([4])
        n5b([5])
    end
    subgraph s2["Add 0-2 (w=3); skip 1-3 (w=4), it would form a cycle"]
        direction LR
        n0123c(["0—1—2—3"])
        n4c([4])
        n5c([5])
    end
    subgraph s3["Add 3-4 (w=5) and 4-5 (w=7): MST complete"]
        direction LR
        n012345d(["0—1—2—3—4—5"])
    end
    s0 --> s1 --> s2 --> s3
    style s3 fill:#dcfce7,stroke:#16a34a

Complete C++ implementation with worked example:

// Solution: Kruskal's MST — O(E log E)
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> parent, rank_;
    DSU(int n) : parent(n), rank_(n, 0) {
        iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    bool unite(int x, int y) {
        x = find(x); y = find(y);
        if (x == y) return false;
        if (rank_[x] < rank_[y]) swap(x, y);
        parent[y] = x;
        if (rank_[x] == rank_[y]) rank_[x]++;
        return true;
    }
};

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    // Read edges as {weight, u, v}
    vector<tuple<int,int,int>> edges(m);
    for (auto& [w, u, v] : edges) cin >> u >> v >> w;

    // Sort by weight (ascending)
    sort(edges.begin(), edges.end());

    DSU dsu(n);
    long long mstWeight = 0;
    int edgesAdded = 0;
    vector<pair<int,int>> mstEdges;

    for (auto [w, u, v] : edges) {
        if (dsu.unite(u, v)) {           // different components → safe to add
            mstWeight += w;
            mstEdges.push_back({u, v});
            if (++edgesAdded == n - 1) break;  // MST complete (N-1 edges)
        }
    }

    if (edgesAdded < n - 1) {
        cout << "IMPOSSIBLE: graph is disconnected\n";
    } else {
        cout << "MST weight: " << mstWeight << "\n";
        cout << "MST edges:\n";
        for (auto [u, v] : mstEdges) cout << u << " - " << v << "\n";
    }

    return 0;
}

5.3.8 Tree Diameter

The diameter of a tree is the longest path between any two nodes (measured in number of edges, or total weight for weighted trees).

Algorithm (Two BFS/DFS approach):

  1. BFS/DFS from any node u. Find the farthest node v.
  2. BFS/DFS from v. The farthest node from v is one endpoint of the diameter.
  3. The distance found in step 2 is the diameter.

Why does this work? The farthest node from any node is always one endpoint of a diameter.

Two-BFS tree-diameter procedure:

flowchart LR
    subgraph bfs1["1st BFS: from an arbitrary node s=1"]
        direction TB
        S1(["s=1\ndist=0"])
        N2a(["2\ndist=1"])
        N3a(["3\ndist=2"])
        N4a(["4\ndist=3 ← farthest"])
        S1 --> N2a --> N3a --> N4a
        note1["farthest node = 4\n(one diameter endpoint)"]
    end
    subgraph bfs2["2nd BFS: from endpoint u=4"]
        direction TB
        U(["u=4\ndist=0"])
        N3b(["3\ndist=1"])
        N2b(["2\ndist=2"])
        N1b(["1\ndist=3"])
        N5b(["5\ndist=4 ← farthest"])
        U --> N3b --> N2b --> N1b --> N5b
        note2["farthest node = 5\ndiameter length = 4"]
    end
    bfs1 -->|"restart from the farthest node"| bfs2
    style N4a fill:#dbeafe,stroke:#3b82f6
    style N5b fill:#dcfce7,stroke:#16a34a
    style note2 fill:#dcfce7,stroke:#16a34a

💡 Correctness proof (by contradiction): Let the true diameter endpoints be p and q. Starting from any node s, the farthest node u must be p or q (if it were not, the distance from u to p or q would exceed the diameter, a contradiction). The farthest node from u is then the other endpoint, and that distance is the diameter.

// Solution: Tree Diameter (Two BFS) — O(N)
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<pair<int,int>> adj[MAXN];  // {neighbor, edge_weight}

// BFS from src, returns {farthest_node, farthest_distance}
pair<int,int> bfsFarthest(int src, int n) {
    vector<int> dist(n + 1, -1);
    queue<int> q;
    dist[src] = 0;
    q.push(src);
    int farthest = src;

    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (auto [v, w] : adj[u]) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + w;
                q.push(v);
                if (dist[v] > dist[farthest]) farthest = v;
            }
        }
    }
    return {farthest, dist[farthest]};
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    for (int i = 0; i < n - 1; i++) {
        int u, v, w;
        cin >> u >> v >> w;
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }

    // Step 1: BFS from node 1, find farthest node u
    auto [u, _] = bfsFarthest(1, n);

    // Step 2: BFS from u, find farthest node v and its distance
    auto [v, diameter] = bfsFarthest(u, n);

    cout << "Diameter: " << diameter << "\n";
    cout << "Endpoints: " << u << " and " << v << "\n";

    return 0;
}

Unweighted version: Set all edge weights to 1, or use a simpler BFS that counts hops.


5.3.9 Lowest Common Ancestor (LCA) — Concept

The Lowest Common Ancestor (LCA) of two nodes u and v in a rooted tree is the deepest node that is an ancestor of both u and v.

Naive LCA (O(depth) per query): Walk both nodes up to the same depth, then walk together until they meet.

// Naive LCA — O(depth) per query, depth can be O(N) worst case
int lca_naive(int u, int v, int* depth, int* parent) {
    // Equalize depths
    while (depth[u] > depth[v]) u = parent[u];
    while (depth[v] > depth[u]) v = parent[v];
    // Now same depth — walk up together
    while (u != v) {
        u = parent[u];
        v = parent[v];
    }
    return u;
}

Binary Lifting LCA (O(log N) per query, O(N log N) preprocessing):

Store anc[v][k] = 2^k-th ancestor of v.

const int LOG = 17;  // log2(10^5) ≈ 17
int anc[MAXN][LOG];  // anc[v][k] = 2^k-th ancestor of v
int depth_arr[MAXN];

void preprocess(int root, int n) {
    // DFS first to fill depth_arr[] and anc[v][0] = direct parent
    // (set anc[root][0] = root so lifts past the root stay at the root).
    // Then build higher powers. k must be the OUTER loop so that
    // anc[x][k-1] is already final for every node x when we read it:
    //   anc[v][k] = anc[anc[v][k-1]][k-1]
    // (2^k-th ancestor = 2^(k-1)-th ancestor of the 2^(k-1)-th ancestor)
    for (int k = 1; k < LOG; k++)
        for (int v = 1; v <= n; v++)
            anc[v][k] = anc[anc[v][k-1]][k-1];
}

int lca(int u, int v) {
    if (depth_arr[u] < depth_arr[v]) swap(u, v);
    int diff = depth_arr[u] - depth_arr[v];
    // Lift u up by diff levels using binary lifting
    for (int k = 0; k < LOG; k++)
        if ((diff >> k) & 1) u = anc[u][k];
    if (u == v) return u;
    // Now same depth — binary lift both
    for (int k = LOG - 1; k >= 0; k--)
        if (anc[u][k] != anc[v][k]) {
            u = anc[u][k];
            v = anc[v][k];
        }
    return anc[u][0];
}

💡 When to use LCA: Path queries on trees (e.g., "sum of values on path from u to v"), distance queries between nodes, finding "meeting points" on tree paths.


5.3.10 Euler Tour of Tree

An Euler tour flattens a tree into a linear array, enabling range queries on subtrees using regular array data structures (e.g., segment tree).

Idea: Record entry and exit times for each node during DFS. The subtree of node u corresponds to the contiguous range [in[u], out[u]] in the Euler tour array.

// Euler Tour preprocessing (in/out times via DFS)
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100001;
vector<int> children[MAXN];
int in_time[MAXN], out_time[MAXN], timer_val = 0;
int euler_arr[MAXN];  // euler_arr[in_time[v]] = val[v]
int val[MAXN];        // value at each node

void dfs_euler(int u) {
    in_time[u] = ++timer_val;      // entry time
    euler_arr[timer_val] = val[u]; // record value in euler array

    for (int v : children[u]) {
        dfs_euler(v);
    }

    out_time[u] = timer_val;       // exit time (same as in_time for leaf)
}

int main() {
    int n;
    cin >> n;
    for (int i = 2; i <= n; i++) {
        int p; cin >> p;
        children[p].push_back(i);
    }
    for (int i = 1; i <= n; i++) cin >> val[i];

    dfs_euler(1);

    // Now: subtree of node u = euler_arr[in_time[u]..out_time[u]]
    // Use a segment tree or prefix sums on euler_arr for subtree queries!

    // Example: sum of values in subtree of node 3
    // answer = sum(euler_arr[in_time[3]..out_time[3]])
    cout << "Subtree of 3 covers indices: "
         << in_time[3] << " to " << out_time[3] << "\n";

    return 0;
}

Euler tour in/out timestamps:

flowchart TD
    subgraph tree["Tree structure"]
        R(["1\nin=1, out=7"])
        A(["2\nin=2, out=4"])
        B(["5\nin=5, out=7"])
        C(["3\nin=3, out=3"])
        D(["4\nin=4, out=4"])
        E(["6\nin=6, out=6"])
        F(["7\nin=7, out=7"])
        R --> A
        R --> B
        A --> C
        A --> D
        B --> E
        B --> F
    end
    subgraph arr["Euler tour array"]
        direction LR
        P1["[1]\nnode 1"]
        P2["[2]\nnode 2"]
        P3["[3]\nnode 3"]
        P4["[4]\nnode 4"]
        P5["[5]\nnode 5"]
        P6["[6]\nnode 6"]
        P7["[7]\nnode 7"]
        P1 --- P2 --- P3 --- P4 --- P5 --- P6 --- P7
    end
    tree -->|"subtree of 2 = range [2,4]"| arr
    style P2 fill:#dbeafe,stroke:#3b82f6
    style P3 fill:#dbeafe,stroke:#3b82f6
    style P4 fill:#dbeafe,stroke:#3b82f6

💡 Key property: the subtree of node u maps to the contiguous range [in[u], out[u]] of the Euler tour. Subtree queries become range queries; paired with a segment tree this gives O(log N) subtree updates and queries.


DSU Union-Find

The diagram shows all three key DSU states side by side: isolated nodes, trees after unions, and the flattened tree after path compression.

📖 Chapter 5.4 ⏱️ ~80 min read 🎯 Advanced Graph Greedy DP

Chapter 5.4: Shortest Paths

Prerequisites This chapter requires: Chapter 5.1 (Introduction to Graphs) — adjacency list representation, BFS. Chapter 5.2 (BFS & DFS) — BFS for shortest paths in unweighted graphs. Chapter 3.1 (STL) — priority_queue, vector. Make sure you understand how BFS works before reading about Dijkstra.

Finding the shortest path between nodes is one of the most fundamental problems in graph theory. It appears in GPS navigation, network routing, game AI, and — most importantly for us — USACO problems. This chapter covers four algorithms (Dijkstra, Bellman-Ford, Floyd-Warshall, SPFA) and explains when to use each.


5.4.1 Problem Definition

Single-Source Shortest Path (SSSP)

Given a weighted graph G = (V, E) and a source node s, find the shortest distance from s to every other node.

SSSP Example Graph

From source A:

  • dist[A] = 0
  • dist[B] = 1
  • dist[C] = 5
  • dist[D] = 5 (A→B→D = 1+4)
  • dist[E] = 8 (A→B→D→E = 1+4+3)

All-Pairs Shortest Path (APSP)

Find shortest distances between all pairs of nodes. Used when you need distances from multiple sources, or between every pair.

Why Not Just BFS?

BFS finds shortest path in unweighted graphs (each edge = distance 1). With weights:

  • Some paths have many low-weight edges
  • Others have a few high-weight edges
  • BFS ignores weights entirely → wrong answer

5.4.2 Dijkstra's Algorithm

The most important shortest path algorithm, and the default choice for USACO problems on weighted graphs with non-negative edges.

Time
O((V+E) log V)
Space
O(V + E)
Constraint
Non-negative weights
Type
Single-Source

Core Idea: Greedy + Priority Queue

Dijkstra is a greedy algorithm:

  1. Maintain a set of "settled" nodes (shortest distance finalized)
  2. Always process the unvisited node with smallest current distance next
  3. When processing a node, try to relax its neighbors (update their distances if we found a shorter path)

Why greedy works: If all edge weights are non-negative, the node currently at minimum distance cannot be improved by going through any other node (all alternatives would be ≥ current distance).

Step-by-Step Trace

Dijkstra Trace Graph

Start: node 0  |  Initial: dist = [0, ∞, ∞, ∞, ∞]

| Step | Process Node | Relaxations | dist array | Queue |
|------|--------------|-------------|------------|-------|
| 1 | node 0 (dist=0) | 0→1: min(∞, 0+4)=4; 0→2: min(∞, 0+2)=2; 0→3: min(∞, 0+5)=5 | [0, 4, 2, 5, ∞] | {(2,2),(4,1),(5,3)} |
| 2 | node 2 (dist=2) | 2→3: min(5, 2+1)=3 ← improved! | [0, 4, 2, 3, ∞] | {(3,3),(4,1),(5,3_old)} |
| 3 | node 3 (dist=3) | 3→1: min(4, 3+1)=4 (no change); 3→4: min(∞, 3+3)=6 | [0, 4, 2, 3, 6] | {(4,1),(6,4),(5,3_old)} |
| 4 | node 1 (dist=4) | No relaxation possible | [0, 4, 2, 3, 6] | {(6,4)} |
| 5 | node 4 (dist=6) | Done! | [0, 4, 2, 3, 6] | {} |

Final: dist = [0, 4, 2, 3, 6]

Complete Dijkstra Implementation

// Solution: Dijkstra's Algorithm with Priority Queue — O((V+E) log V)
#include <bits/stdc++.h>
using namespace std;

typedef pair<int, int> pii;   // {distance, node}
typedef long long ll;

const ll INF = 1e18;          // use long long to avoid int overflow!
const int MAXN = 100005;

// Adjacency list: adj[u] = list of {weight, v}
vector<pii> adj[MAXN];

vector<ll> dijkstra(int src, int n) {
    vector<ll> dist(n + 1, INF);   // dist[i] = shortest distance to node i
    dist[src] = 0;
    
    // Min-heap: {distance, node}
    // C++ priority_queue is a max-heap by default, so pass greater<pii> for a min-heap
    priority_queue<pii, vector<pii>, greater<pii>> pq;
    pq.push({0, src});
    
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();  // get node with minimum distance
        
        // KEY: Skip if we've already found a better path to u
        // (outdated entry in the priority queue)
        if (d > dist[u]) continue;
        
        // Relax all neighbors of u
        for (auto [w, v] : adj[u]) {
            ll newDist = dist[u] + w;
            if (newDist < dist[v]) {
                dist[v] = newDist;          // update distance
                pq.push({newDist, v});       // add updated entry to queue
            }
        }
    }
    return dist;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n, m;
    cin >> n >> m;
    
    for (int i = 0; i < m; i++) {
        int u, v, w;
        cin >> u >> v >> w;
        adj[u].push_back({w, v});
        adj[v].push_back({w, u});  // undirected graph
    }
    
    int src;
    cin >> src;
    
    vector<ll> dist = dijkstra(src, n);
    
    for (int i = 1; i <= n; i++) {
        if (dist[i] == INF) cout << -1 << "\n";
        else cout << dist[i] << "\n";
    }
    
    return 0;
}

Reconstructing the Shortest Path

Path reconstruction by backtracking:

flowchart LR
    subgraph fwd["Forward: record prev_node while Dijkstra runs"]
        direction LR
        S(["src=0"]) -->|"w=2"| C(["2"])
        C -->|"w=1"| D(["3"])
        D -->|"w=1"| B(["1"])
        D -->|"w=3"| E(["4"])
        note_fwd["prev[2]=0, prev[3]=2\nprev[1]=3, prev[4]=3"]
    end
    subgraph back["Backtrack: walk from destination to source"]
        direction RL
        E2(["4"]) -->|"prev[4]=3"| D2(["3"])
        D2 -->|"prev[3]=2"| C2(["2"])
        C2 -->|"prev[2]=0"| S2(["0"])
        note_back["reversed path: 4→3→2→0\nafter reversing: 0→2→3→4"]
    end
    fwd -->|"reconstruct by backtracking"| back
    style note_back fill:#dcfce7,stroke:#16a34a

💡 Implementation note: record prev_node[v] = u, meaning "on the shortest path to v, the node before v is u". To rebuild the path, follow prev_node from the destination back to the source, then reverse to get the forward order.

// Solution: Dijkstra with Path Reconstruction
vector<int> prev_node(MAXN, -1);  // prev_node[v] = previous node on shortest path to v

vector<ll> dijkstraWithPath(int src, int n) {
    vector<ll> dist(n + 1, INF);
    dist[src] = 0;
    priority_queue<pii, vector<pii>, greater<pii>> pq;
    pq.push({0, src});
    
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;
        
        for (auto [w, v] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                prev_node[v] = u;       // track where we came from
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}

// Reconstruct path from src to dst
vector<int> getPath(int src, int dst) {
    vector<int> path;
    for (int v = dst; v != -1; v = prev_node[v]) {
        path.push_back(v);
    }
    reverse(path.begin(), path.end());
    return path;
}
Common Mistake — Missing stale check
// BAD: Processes stale entries in queue
while (!pq.empty()) {
    auto [d, u] = pq.top(); pq.pop();
    // NO CHECK for d > dist[u]!
    // Will re-process nodes with outdated distances
    // Still correct, but re-scans a node's neighbors once per stale entry (slower)
    for (auto [w, v] : adj[u]) {
        if (d + w < dist[v]) {
            dist[v] = d + w;
            pq.push({dist[v], v});
        }
    }
}
Correct — Skip stale entries
// GOOD: Skip outdated priority queue entries
while (!pq.empty()) {
    auto [d, u] = pq.top(); pq.pop();
    if (d > dist[u]) continue;  // ← stale entry, skip!
    
    for (auto [w, v] : adj[u]) {
        if (dist[u] + w < dist[v]) {
            dist[v] = dist[u] + w;
            pq.push({dist[v], v});
        }
    }
}

Key Points for Dijkstra

🚫 CRITICAL: Dijkstra does NOT work with negative edge weights! If any edge weight is negative, Dijkstra may produce incorrect results. The algorithm's correctness relies on the greedy assumption that once a node is settled (popped from the priority queue), its distance is final — negative edges break this assumption. For graphs with negative weights, use Bellman-Ford or SPFA instead.

  • Only works with non-negative weights. Negative edges break the greedy assumption (see warning above).
  • Use long long for distances when edge weights can be large. dist[u] + w can overflow int.
  • Use greater<pii> to make priority_queue a min-heap.
  • The if (d > dist[u]) continue; check is essential for correctness and performance.

5.4.3 Bellman-Ford Algorithm

When edges can have negative weights, Dijkstra fails. Bellman-Ford handles negative weights — and even detects negative cycles.

Time
O(V × E)
Negative Edges
✓ Supported
Neg. Cycle
✓ Detectable
Type
Single-Source

Core Idea: Relaxation V-1 Times

Key insight: any shortest path in a graph with V nodes uses at most V-1 edges (no repeated nodes). So if we relax ALL edges V-1 times, we're guaranteed to find the correct shortest paths.

Algorithm:
1. dist[src] = 0, dist[all others] = INF
2. Repeat V-1 times:
   For every edge (u, v, w):
     if dist[u] + w < dist[v]:
       dist[v] = dist[u] + w   (relax!)
3. Check for negative cycles:
   If ANY edge can still be relaxed → negative cycle exists!

Bellman-Ford relaxation process (4 nodes, 5 edges):

flowchart LR
    subgraph iter0["Initial state"]
        direction LR
        A0(["A\ndist=0"]) 
        B0(["B\ndist=∞"])
        C0(["C\ndist=∞"])
        D0(["D\ndist=∞"])
    end
    subgraph iter1["Pass 1"]
        direction LR
        A1(["A\ndist=0"])
        B1(["B\ndist=2"])
        C1(["C\ndist=∞→5"])
        D1(["D\ndist=∞"])
    end
    subgraph iter2["Pass 2"]
        direction LR
        A2(["A\ndist=0"])
        B2(["B\ndist=2"])
        C2(["C\ndist=5→1"])
        D2(["D\ndist=∞→6"])
    end
    subgraph iter3["Pass 3 (converged)"]
        direction LR
        A3(["A\ndist=0"])
        B3(["B\ndist=2"])
        C3(["C\ndist=1"])
        D3(["D\ndist=6→3"])
    end
    iter0 -->|"relax A→B(2), A→C(5)"| iter1
    iter1 -->|"relax B→D(4), B→C(-1)"| iter2
    iter2 -->|"relax C→D(2)"| iter3

💡 Key observation: each relaxation pass finalizes the shortest distance of at least one more node, so after V-1 passes every node's shortest distance is final (assuming no negative cycle).

Bellman-Ford Implementation

// Solution: Bellman-Ford Algorithm — O(V * E)
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef tuple<int, int, int> Edge;  // {from, to, weight}

const ll INF = 1e18;

// Returns shortest distances, or empty if negative cycle detected
vector<ll> bellmanFord(int src, int n, vector<Edge>& edges) {
    vector<ll> dist(n + 1, INF);
    dist[src] = 0;
    
    // Relax all edges V-1 times
    for (int iter = 0; iter < n - 1; iter++) {
        bool updated = false;
        for (auto [u, v, w] : edges) {
            if (dist[u] != INF && dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                updated = true;
            }
        }
        if (!updated) break;  // early termination: already converged
    }
    
    // Check for negative cycles (one more relaxation pass)
    for (auto [u, v, w] : edges) {
        if (dist[u] != INF && dist[u] + w < dist[v]) {
            // Negative cycle reachable from source!
            return {};  // signal: negative cycle exists
        }
    }
    
    return dist;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n, m;
    cin >> n >> m;
    
    vector<Edge> edges;
    for (int i = 0; i < m; i++) {
        int u, v, w;
        cin >> u >> v >> w;
        edges.push_back({u, v, w});
        // For undirected: also add {v, u, w}
    }
    
    int src;
    cin >> src;
    
    vector<ll> dist = bellmanFord(src, n, edges);
    
    if (dist.empty()) {
        cout << "Negative cycle detected!\n";
    } else {
        for (int i = 1; i <= n; i++) {
            cout << (dist[i] == INF ? -1 : dist[i]) << "\n";
        }
    }
    return 0;
}

Why Bellman-Ford Works

After k iterations of the outer loop, dist[v] contains the shortest path from src to v using at most k edges. After V-1 iterations, all shortest paths are found, since a shortest path never repeats a node and therefore uses at most V-1 edges (assuming no negative cycles).

Negative Cycle Detection: A negative cycle means you can keep decreasing distance indefinitely. If the V-th relaxation still improves a distance, that node is on or reachable from a negative cycle.


5.4.4 Floyd-Warshall Algorithm

For finding shortest paths between all pairs of nodes.

Time
O(V³)
Space
O(V²)
Negative Edges
✓ Supported
Type
All-Pairs

Core Idea: DP Through Intermediate Nodes

dp[k][i][j] = shortest distance from i to j using only nodes {1, 2, ..., k} as intermediate nodes.

Recurrence:

dp[k][i][j] = min(dp[k-1][i][j],          // don't use node k
                   dp[k-1][i][k] + dp[k-1][k][j])  // use node k

Since we only need the previous layer, we can collapse to 2D:

// Solution: Floyd-Warshall All-Pairs Shortest Path — O(V^3)
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll INF = 1e18;
const int MAXV = 505;

ll dist[MAXV][MAXV];  // dist[i][j] = shortest distance from i to j

void floydWarshall(int n) {
    // ⚠️ CRITICAL: k MUST be the OUTERMOST loop!
    // Invariant: after processing k, dist[i][j] = shortest path from i to j
    //            using only nodes {1..k} as intermediates.
    // If k were inner, dist[i][k] or dist[k][j] might not yet reflect all
    // intermediate nodes up to k-1, breaking the DP correctness.
    for (int k = 1; k <= n; k++) {        // ← OUTER: intermediate node
        for (int i = 1; i <= n; i++) {    // ← MIDDLE: source
            for (int j = 1; j <= n; j++) { // ← INNER: destination
                // Can we go i→k→j faster than i→j directly?
                if (dist[i][k] != INF && dist[k][j] != INF) {
                    dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]);
                }
            }
        }
    }
    // After Floyd-Warshall, dist[i][i] < 0 iff node i is on a negative cycle
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n, m;
    cin >> n >> m;
    
    // Initialize: distance to self = 0, all others = INF
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            dist[i][j] = (i == j) ? 0 : INF;
    
    // Read edges
    for (int i = 0; i < m; i++) {
        int u, v; ll w;
        cin >> u >> v >> w;
        dist[u][v] = min(dist[u][v], w);  // handle multiple edges
        dist[v][u] = min(dist[v][u], w);  // undirected
    }
    
    floydWarshall(n);
    
    // Query: shortest path from u to v
    int q; cin >> q;
    while (q--) {
        int u, v; cin >> u >> v;
        cout << (dist[u][v] == INF ? -1 : dist[u][v]) << "\n";
    }
    return 0;
}

Floyd-Warshall Complexity

  • Time: O(V³) — three nested loops, each running V times
  • Space: O(V²) — the 2D distance array
  • Practical limit: V ≤ 500 or so (500³ = 1.25 × 10⁸ is borderline)
  • For V > 1000, use Dijkstra from each source: O(V × (V+E) log V)

Floyd-Warshall DP transition:

flowchart LR
    subgraph before["Before introducing node k"]
        i1([i]) -->|"dist[i][j]"| j1([j])
    end
    subgraph after["After introducing node k"]
        i2([i]) -->|"dist[i][k]"| k2([k])
        k2 -->|"dist[k][j]"| j2([j])
        i2 -.->|"min(dist[i][j],\ndist[i][k]+dist[k][j])"| j2
    end
    before -->|"k as intermediate node"| after

💡 Why must k be the outermost loop? When node k is processed as an intermediate, dist[i][k] and dist[k][j] must already be final over intermediates {1..k-1}. If k were an inner loop, those values could change mid-pass, breaking the DP's correctness.


5.4.5 Algorithm Comparison Table

| Algorithm | Time Complexity | Negative Edges | Negative Cycles | Multi-Source | Best For |
|-----------|-----------------|----------------|-----------------|--------------|----------|
| BFS | O(V + E) | ✗ No | ✗ No | ✓ Yes (multi-source BFS) | Unweighted graphs |
| Dijkstra | O((V+E) log V) | ✗ No | ✗ No | ✗ (run once per source) | Weighted, non-negative edges |
| Bellman-Ford | O(V × E) | ✓ Yes | ✓ Detects | ✗ (run once per source) | Negative edges, detecting neg cycles |
| SPFA | O(V × E) worst, O(E) avg | ✓ Yes | ✓ Detects | ✗ (run once per source) | Sparse graphs with neg edges |
| Floyd-Warshall | O(V³) | ✓ Yes | ✓ Detects (diag) | ✓ Yes (all pairs) | Dense graphs, all-pairs queries |

When to Use Which?

flowchart TD
    Start(["Negative edge weights?"])
    Start -->|"Yes"| NegEdge["Bellman-Ford or SPFA\nor Floyd-Warshall (all pairs)"]
    Start -->|"No"| NoNeg["V ≤ 500 and need all-pairs distances?"]
    NoNeg -->|"Yes"| Floyd["Floyd-Warshall\nO(V³)"]
    NoNeg -->|"No"| Unweighted["Unweighted (all edge weights = 1)?"]
    Unweighted -->|"Yes"| BFS["BFS\nO(V+E)"]
    Unweighted -->|"No"| ZeroOne["Edge weights only 0 or 1?"]
    ZeroOne -->|"Yes"| BFS01["0-1 BFS\nO(V+E)"]
    ZeroOne -->|"No"| Dijkstra["Dijkstra\nO((V+E) log V)"]

    style NegEdge fill:#fef3c7,stroke:#d97706
    style Floyd fill:#dbeafe,stroke:#3b82f6
    style BFS fill:#dcfce7,stroke:#16a34a
    style BFS01 fill:#dcfce7,stroke:#16a34a
    style Dijkstra fill:#f0f4ff,stroke:#4A6CF7

5.4.6 SPFA — Bellman-Ford with Queue Optimization

SPFA (Shortest Path Faster Algorithm) is an optimized Bellman-Ford that only adds a node to the queue when its distance is updated, avoiding redundant relaxations.

Worst Time
O(V × E)
Average Time
O(E) in practice
Neg. Edges
✓ Handled

⚠️ SPFA Worst Case: SPFA's worst-case time complexity is O(V × E) — identical to plain Bellman-Ford. On adversarially constructed graphs (common in competitive programming "anti-SPFA" test cases), SPFA degrades to O(VE) and may TLE. A node can enter the queue up to V times; with E edges processed per queue entry, the total is O(VE). In most random/practical cases it's fast (O(E) average), but for USACO, prefer Dijkstra when all weights are non-negative.

// Solution: SPFA (Bellman-Ford + Queue Optimization)
#include <bits/stdc++.h>
using namespace std;
typedef pair<int,int> pii;
typedef long long ll;

const ll INF = 1e18;
const int MAXN = 100005;
vector<pii> adj[MAXN];

vector<ll> spfa(int src, int n) {
    vector<ll> dist(n + 1, INF);
    vector<bool> inQueue(n + 1, false);
    vector<int> cnt(n + 1, 0);   // cnt[v] = number of times v entered queue
    
    queue<int> q;
    dist[src] = 0;
    q.push(src);
    inQueue[src] = true;
    
    while (!q.empty()) {
        int u = q.front(); q.pop();
        inQueue[u] = false;
        
        for (auto [w, v] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                
                if (!inQueue[v]) {
                    q.push(v);
                    inQueue[v] = true;
                    cnt[v]++;
                    
                    // Negative cycle detection: if a node enters queue >= n times
                    // (a node can enter at most n-1 times without a neg cycle;
                    //  using > n is also safe but detects one step later)
                    if (cnt[v] >= n) return {};  // negative cycle!
                }
            }
        }
    }
    return dist;
}

5.4.7 BFS as Dijkstra for Unweighted Graphs

When all edge weights are 1 (unweighted graph), BFS is exactly Dijkstra with a simple queue:

  • Dijkstra's priority queue naturally processes nodes in order of distance
  • In an unweighted graph, all edges have weight 1, so nodes at distance d are processed before distance d+1
  • BFS naturally explores level-by-level, which is exactly "by distance"
// Solution: BFS for Unweighted Shortest Path — O(V + E)
// Equivalent to Dijkstra when all weights = 1
vector<int> bfsShortestPath(int src, int n) {
    vector<int> dist(n + 1, -1);
    queue<int> q;
    
    dist[src] = 0;
    q.push(src);
    
    while (!q.empty()) {
        int u = q.front(); q.pop();
        
        for (auto [w, v] : adj[u]) {
            if (dist[v] == -1) {       // unvisited
                dist[v] = dist[u] + 1; // all weights = 1
                q.push(v);
            }
        }
    }
    return dist;
}

Why is BFS correct for unweighted graphs? Because BFS explores nodes in strictly increasing order of their distance. The first time you reach a node v, you've found the shortest path (fewest edges = minimum distance when all weights are 1).

0-1 BFS: A powerful trick when edge weights are only 0 or 1 (use deque instead of queue):

0-1 BFS deque operations:

flowchart LR
    subgraph dq["Deque state"]
        direction LR
        Front[["front (small distance)"]] --- M1["..."] --- M2["..."] --- Back[["back (large distance)"]]
    end
    subgraph rule["Push rules"]
        direction TB
        W0["edge weight w=0\n→ push_front"] 
        W1["edge weight w=1\n→ push_back"]
    end
    subgraph why["Why is this correct?"]
        direction TB
        Exp["the front always holds a node at the current minimum distance\nw=0 edges add no distance → same priority as the current node\nw=1 edges add distance → wait at the back"]
    end
    rule --> dq
    dq --> why
    style W0 fill:#dbeafe,stroke:#3b82f6
    style W1 fill:#fef3c7,stroke:#d97706

💡 Efficiency: 0-1 BFS runs in O(V+E), beating Dijkstra's O((V+E) log V). When edge weights are only 0 and 1, prefer it.

// Solution: 0-1 BFS — O(V + E), handles {0,1} weight edges
vector<int> bfs01(int src, int n) {
    vector<int> dist(n + 1, INT_MAX);
    deque<int> dq;
    
    dist[src] = 0;
    dq.push_front(src);
    
    while (!dq.empty()) {
        int u = dq.front(); dq.pop_front();
        
        for (auto [w, v] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                if (w == 0) dq.push_front(v);   // 0-weight: add to front
                else        dq.push_back(v);    // 1-weight: add to back
            }
        }
    }
    return dist;
}

5.4.8 USACO Example: Farm Tours

Problem Statement (USACO 2003 Style)

Farmer John wants to take a round trip: travel from farm 1 to farm N, then return from N to farm 1, using no road twice. Roads are bidirectional. Find the minimum total distance of such a round trip.

Constraints: N ≤ 1000, M ≤ 10,000, weights ≤ 1000.

Input Format:

N M
u1 v1 w1
u2 v2 w2
...

Analysis:

  • We need to go 1→N and N→1 without repeating any edge
  • Key insight: this equals finding two edge-disjoint paths from 1 to N with minimum total cost
  • Alternative insight: the "return trip" N→1 is just another path 1→N in the original graph
  • Simplification for this problem: Find the shortest path from 1 to N twice, but with different edges

For this USACO-style problem, a simpler interpretation: since roads are bidirectional and we can use each road at most once in each direction, find:

  • Shortest path 1→N
  • Shortest path N→1 (using possibly different roads)
  • These can be found independently with Dijkstra

But the real challenge: "using no road twice" means globally, not just per direction.

Greedy approach for this version: Find shortest path 1→N, then find shortest path on remaining graph N→1. This greedy doesn't always work, but for USACO Bronze/Silver, many problems simplify to just running Dijkstra twice.

// Solution: Farm Tours — Two Dijkstra (simplified version)
// Run Dijkstra from both endpoints, find min round-trip distance
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef pair<ll, int> pli;
const ll INF = 1e18;
const int MAXN = 1005;

vector<pair<int,int>> adj[MAXN];  // {weight, dest}

vector<ll> dijkstra(int src, int n) {
    vector<ll> dist(n + 1, INF);
    priority_queue<pli, vector<pli>, greater<pli>> pq;
    dist[src] = 0;
    pq.push({0, src});
    
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;
        
        for (auto [w, v] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n, m;
    cin >> n >> m;
    
    for (int i = 0; i < m; i++) {
        int u, v, w;
        cin >> u >> v >> w;
        adj[u].push_back({w, v});
        adj[v].push_back({w, u});  // bidirectional
    }
    
    // Run Dijkstra from farm 1 and farm N
    vector<ll> distFrom1 = dijkstra(1, n);
    vector<ll> distFromN = dijkstra(n, n);
    
    // Simplified answer: distFrom1[n] + distFromN[1]
    // (go 1→N by a shortest path, return N→1 by a shortest path;
    //  in an undirected graph these are equal, and edges may be reused,
    //  which is the relaxed interpretation discussed above)
    ll answer = distFrom1[n] + distFromN[1];
    
    if (answer >= INF) cout << "NO VALID TRIP\n";
    else cout << answer << "\n";
    
    // For the "no road reuse" constraint, see flow algorithms (beyond Silver)
    
    return 0;
}
💡 Extended: Finding Two Edge-Disjoint Paths

The true "no road reuse" version requires min-cost flow (a Gold+ topic). The key insight is:

  • Model each undirected edge as two directed edges with capacity 1
  • Find min-cost flow of 2 units from node 1 to node N
  • This equals two edge-disjoint paths with minimum total cost

For USACO Silver, you'll rarely need min-cost flow — the simpler Dijkstra approach suffices.


5.4.9 Dijkstra on Grids

Many USACO problems involve grid-based shortest paths. The graph is implicit:

// Solution: Dijkstra on Grid — find shortest path from (0,0) to (R-1,C-1)
// Each cell has a "cost" to enter
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef tuple<ll,int,int> tli;

const ll INF = 1e18;
int dx[] = {0,0,1,-1};
int dy[] = {1,-1,0,0};

ll dijkstraGrid(vector<vector<int>>& grid) {
    int R = grid.size(), C = grid[0].size();
    vector<vector<ll>> dist(R, vector<ll>(C, INF));
    priority_queue<tli, vector<tli>, greater<tli>> pq;
    
    dist[0][0] = grid[0][0];
    pq.push({grid[0][0], 0, 0});
    
    while (!pq.empty()) {
        auto [d, r, c] = pq.top(); pq.pop();
        if (d > dist[r][c]) continue;
        
        for (int k = 0; k < 4; k++) {
            int nr = r + dx[k], nc = c + dy[k];
            if (nr < 0 || nr >= R || nc < 0 || nc >= C) continue;
            
            ll newDist = dist[r][c] + grid[nr][nc];
            if (newDist < dist[nr][nc]) {
                dist[nr][nc] = newDist;
                pq.push({newDist, nr, nc});
            }
        }
    }
    return dist[R-1][C-1];
}

⚠️ Common Mistakes — The Dirty Five

Mistake 1 — Int overflow
// BAD: int overflow when adding large distances
vector<int> dist(n+1, 1e9);  // int is too small once distances are summed

// dist[u] = 9×10^8, w = 9×10^8
// dist[u] + w overflows int!
if (dist[u] + w < dist[v]) { ... }
Fix — Use long long
// GOOD: always use long long for distances
const ll INF = 1e18;
vector<ll> dist(n+1, INF);

// No overflow with long long (max ~9.2×10^18)
if (dist[u] + w < dist[v]) { ... }
Mistake 2 — Wrong priority queue direction
// BAD: This is a MAX-heap, not min-heap!
priority_queue<pii> pq;   // default is max-heap
pq.push({dist[v], v});
// Will process FARTHEST node first — wrong!
Fix — Use greater
// GOOD: explicitly specify min-heap
priority_queue<pii, vector<pii>, greater<pii>> pq;
pq.push({dist[v], v});
// Now processes NEAREST node first ✓

5 Classic Dijkstra Bugs:

  1. Using int instead of long long — distance sum overflows → wrong answers silently
  2. Max-heap instead of min-heap — forgetting greater<pii> → processes wrong node first
  3. Missing stale entry check (if (d > dist[u]) continue) → still correct, but stale heap entries get re-expanded, which can slow the solution down dramatically on dense graphs
  4. Forgetting dist[src] = 0 — all distances remain INF
  5. Using Dijkstra with negative edges — undefined behavior, may loop infinitely or give wrong answer
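Putting all five fixes together, here is a minimal reference Dijkstra that avoids each bug. The standalone-function shape and parameter names are illustrative; the adjacency format ({weight, node}) matches the chapter's earlier code.

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
typedef pair<ll,int> pli;

const ll INF = 1e18;  // Bug 1 fix: long long INF, not an int constant

// adj[u] = list of {weight, neighbor}; nodes numbered 1..n
vector<ll> dijkstra(int src, int n, vector<vector<pli>>& adj) {
    vector<ll> dist(n + 1, INF);
    // Bug 2 fix: greater<> makes this a min-heap
    priority_queue<pli, vector<pli>, greater<pli>> pq;
    dist[src] = 0;                  // Bug 4 fix: source distance is 0
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;  // Bug 3 fix: skip stale entries
        for (auto [w, v] : adj[u]) {
            // Bug 5 caveat: this assumes w >= 0
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
```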

Chapter Summary

📌 Key Takeaways

| Algorithm      | Complexity            | Handles Neg | Use When                          |
|----------------|-----------------------|-------------|-----------------------------------|
| BFS            | O(V+E)                | N/A         | Unweighted graphs                 |
| Dijkstra       | O((V+E) log V)        | No          | Non-negative weighted SSSP        |
| Bellman-Ford   | O(VE)                 | Yes         | Negative edges, detect neg cycles |
| SPFA           | O(VE) worst, fast avg | Yes         | Sparse graphs, neg edges          |
| Floyd-Warshall | O(V³)                 | Yes         | All-pairs, V ≤ 500                |
| 0-1 BFS        | O(V+E)                | N/A         | Edges with weight 0 or 1 only     |
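The 0-1 BFS row is the only algorithm in the table without code in this chapter, so here is a minimal sketch (function name and adjacency format are illustrative choices): weight-0 edges push to the front of a deque and weight-1 edges to the back, keeping the deque sorted like Dijkstra's heap but with O(1) operations.

```cpp
#include <bits/stdc++.h>
using namespace std;

// 0-1 BFS: adj[u] = list of {weight (0 or 1), neighbor}; nodes 0..n-1
vector<int> zeroOneBFS(int src, int n, vector<vector<pair<int,int>>>& adj) {
    const int INF = 1e9;
    vector<int> dist(n, INF);
    deque<int> dq;
    dist[src] = 0;
    dq.push_back(src);
    while (!dq.empty()) {
        int u = dq.front(); dq.pop_front();
        for (auto [w, v] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                if (w == 0) dq.push_front(v);  // weight 0: same "level", front
                else        dq.push_back(v);   // weight 1: next "level", back
            }
        }
    }
    return dist;
}
```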

❓ FAQ

Q1: Why can't Dijkstra handle negative edges?

A: Dijkstra's greedy assumption is "the node with the current shortest distance cannot be improved by later paths." With negative edges, this assumption fails—a longer path through a negative edge may end up shorter.

Concrete counterexample: Nodes A, B, C. Edges: A→B=2, A→C=1, B→C=−20.

  • Dijkstra processes A first (dist=0), relaxing dist[B]=2 and dist[C]=1
  • C now has the smallest tentative distance, so Dijkstra pops C and "settles" dist[C]=1 as final
  • Only afterwards is B processed, revealing the cheaper path A→B→C with cost 2+(−20)=−18, but C has already been settled with the wrong value

General explanation: When node u is popped and settled, Dijkstra considers dist[u] optimal. But if there is a negative edge (v, u, w) with w < 0, there may be a path src→...→v→u with total weight < current dist[u], while v has not yet been processed.

Conclusion: With negative edges, you must use Bellman-Ford (O(VE)) or SPFA (average O(E), worst O(VE)).
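For reference, a minimal Bellman-Ford sketch matching the conclusion above (names and the empty-vector convention for "negative cycle found" are illustrative choices):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const ll INF = 1e18;

// Bellman-Ford: edges = list of {u, v, w}; nodes 1..n.
// Returns shortest distances from src, or an empty vector if a
// negative cycle reachable from src exists.
vector<ll> bellmanFord(int src, int n, vector<array<ll,3>>& edges) {
    vector<ll> dist(n + 1, INF);
    dist[src] = 0;
    for (int i = 0; i < n - 1; i++) {           // V-1 relaxation rounds
        for (auto [u, v, w] : edges) {
            if (dist[u] != INF && dist[u] + w < dist[v])
                dist[v] = dist[u] + w;
        }
    }
    for (auto [u, v, w] : edges) {              // one extra round
        if (dist[u] != INF && dist[u] + w < dist[v])
            return {};                          // still improving: negative cycle
    }
    return dist;
}
```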

Q2: What is the difference between SPFA and Bellman-Ford?

A: SPFA is a queue-optimized version of Bellman-Ford. Bellman-Ford traverses all edges each round; SPFA only updates neighbors of nodes whose distance improved, using a queue to track which nodes need processing. In practice SPFA is much faster (average O(E)), but the theoretical worst case is the same (O(VE)). On some contest platforms SPFA can be hacked to worst case, so with negative edges consider Bellman-Ford; without negative edges always use Dijkstra.
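A minimal SPFA sketch for comparison (names are illustrative). The inQueue array prevents a node from sitting in the queue twice; a node re-enters only when its distance improves again.

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const ll INF = 1e18;

// SPFA: queue-optimized Bellman-Ford. adj[u] = list of {weight, neighbor}.
// Assumes no negative cycle is reachable from src.
vector<ll> spfa(int src, int n, vector<vector<pair<ll,int>>>& adj) {
    vector<ll> dist(n + 1, INF);
    vector<bool> inQueue(n + 1, false);
    queue<int> q;
    dist[src] = 0;
    q.push(src); inQueue[src] = true;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        inQueue[u] = false;
        for (auto [w, v] : adj[u]) {
            if (dist[u] + w < dist[v]) {        // only improved nodes re-enter
                dist[v] = dist[u] + w;
                if (!inQueue[v]) { q.push(v); inQueue[v] = true; }
            }
        }
    }
    return dist;
}
```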

Q3: Why must the k loop be the outermost in Floyd-Warshall?

A: This is the most common Floyd-Warshall implementation error! The DP invariant is: after the k-th outer loop iteration, dist[i][j] represents the shortest path from i to j using only nodes {1, 2, ..., k} as intermediates. When processing intermediate node k, dist[i][k] and dist[k][j] must already be fully computed based on {1..k-1}. If k is in the inner loop, dist[i][k] may have just been updated in the same outer loop iteration, leading to incorrect results. Remember: k is outermost, i and j are inner — order matters!
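The invariant is easiest to see next to the code; a minimal sketch (names illustrative) with k outermost:

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const ll INF = 1e18;

// Floyd-Warshall: dist is an (n+1)x(n+1) matrix with dist[i][i] = 0 and
// dist[i][j] = edge weight or INF. Updated in place.
// Invariant: after iteration k, dist[i][j] is the shortest path using
// only intermediates from {1..k} — which is why k MUST be outermost.
void floydWarshall(int n, vector<vector<ll>>& dist) {
    for (int k = 1; k <= n; k++)
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++)
                if (dist[i][k] != INF && dist[k][j] != INF)
                    dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]);
}
```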

Q4: How to determine whether a USACO problem needs Dijkstra or BFS?

A: Key question: Are edges weighted?

  • Unweighted graph (all edges weight 1, or the answer is a minimum number of edges) → BFS, O(V+E), faster and simpler code
  • Weighted graph (different non-negative weights) → Dijkstra
  • Edge weights only 0 or 1 → 0-1 BFS (faster than Dijkstra, O(V+E))
  • Has negative edges → Bellman-Ford/SPFA

Q5: When to use Floyd-Warshall?

A: When you need shortest distances between all pairs, and V ≤ 500 (since O(V³) ≈ 1.25×10⁸ is barely feasible at V=500). Typical scenario: given multiple sources and targets, query distance between any pair. For V > 500, running Dijkstra once per node (O(V × (V+E) log V)) is faster.

🔗 Connections to Other Chapters

  • Chapter 5.2 (BFS & DFS): BFS is "Dijkstra for unweighted graphs"; this chapter is a direct extension of BFS
  • Chapter 3.11 (Binary Trees): Dijkstra's priority queue is a binary heap; understanding heaps helps analyze complexity
  • Chapter 5.3 (Trees & Special Graphs): Shortest path on a tree is the unique root-to-node path (DFS/BFS suffices)
  • Chapter 6.1 (DP Introduction): Floyd-Warshall is essentially DP (state = "using first k nodes"); many shortest path variants can be modeled with DP
  • USACO Gold: Shortest path + DP combinations (e.g., DP on shortest path DAG), shortest path + binary search, shortest path + data structure optimization

Practice Problems

Problem 5.4.1 — Classic Dijkstra 🟢 Easy Given N cities and M roads with travel time, find the shortest travel time from city 1 to city N. If unreachable, output -1. (N ≤ 10^5, M ≤ 5×10^5, weights ≤ 10^9)

Hint Standard Dijkstra. Use `long long` for distances (max path ≤ N × max_weight = 10^5 × 10^9 = 10^14). Initialize `dist[1] = 0`, all others INF.

Problem 5.4.2 — BFS on Grid 🟢 Easy A robot is on an R×C grid. Some cells are walls. Find the shortest path (in steps) from top-left to bottom-right. Output -1 if impossible.

Hint Use BFS. Each step moves to an adjacent (4-directional) non-wall cell. Distance = number of steps = number of edges, so unweighted BFS applies.

Problem 5.4.3 — Negative Edge Detection 🟡 Medium Given a directed graph with possibly negative edge weights, determine:

  1. The shortest distance from node 1 to node N
  2. Whether any negative cycle exists that is reachable from node 1
Hint Use Bellman-Ford. Run V-1 relaxation iterations. Then do one more: if any distance improves, there's a negative cycle. Report the distance (it is effectively -INF if some negative cycle is both reachable from node 1 and can itself reach node N).

Problem 5.4.4 — Multi-Source BFS 🟡 Medium A zombie outbreak starts at K infected cities. Find the minimum time for zombies to reach each city (spread 1 city per time unit via roads).

Multi-source BFS spreading process:

flowchart LR
    subgraph t0["Initial: all K infection sources enqueued together"]
        direction TB
        Z1(["Z1\nt=0"])
        Z2(["Z2\nt=0"])
        Z3(["Z3\nt=0"])
        note0["queue = [Z1, Z2, Z3]"]
    end
    subgraph t1["Round 1: spread outward"]
        direction TB
        A1(["Z1\nt=0"]) --- B1(["A\nt=1"])
        C1(["Z2\nt=0"]) --- D1(["B\nt=1"])
        E1(["Z3\nt=0"]) --- F1(["C\nt=1"])
    end
    subgraph t2["Round 2: keep spreading"]
        direction TB
        G2(["A\nt=1"]) --- H2(["D\nt=2"])
        I2(["B\nt=1"]) --- J2(["E\nt=2"])
        note2["visited nodes are not re-enqueued"]
    end
    end
    t0 --> t1 --> t2
    style Z1 fill:#fef2f2,stroke:#dc2626
    style Z2 fill:#fef2f2,stroke:#dc2626
    style Z3 fill:#fef2f2,stroke:#dc2626

💡 Equivalent formulation: Multi-source BFS = add a virtual start node S, connect S with distance 0 to all K infection sources, then run single-source BFS. Enqueuing all sources simultaneously implements exactly this idea.

Hint Multi-source BFS: initialize the queue with all K infected cities at time 0. Run BFS normally. This is equivalent to adding a virtual "source" node connected to all K cities with weight 0.
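The hint can be sketched directly (names illustrative); the only change from ordinary BFS is the initialization loop that enqueues every source at time 0.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Multi-source BFS: all K sources start at distance 0.
// adj = unweighted adjacency list, nodes 0..n-1; -1 marks unreachable.
vector<int> multiSourceBFS(vector<int>& sources, int n,
                           vector<vector<int>>& adj) {
    vector<int> dist(n, -1);
    queue<int> q;
    for (int s : sources) {        // enqueue every source at time 0
        dist[s] = 0;
        q.push(s);
    }
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {   // visited nodes are not re-enqueued
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}
```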

Problem 5.4.5 — All-Pairs with Floyd 🟡 Medium Given N cities (N ≤ 300) and M roads, answer Q queries: "Is city u reachable from city v within distance D?"

Hint Run Floyd-Warshall to get all-pairs shortest paths in `O(N³)`. Each query is then `O(1)`: check `dist[u][v] <= D`.

Problem 5.4.6 — Dijkstra + Binary Search 🔴 Hard A delivery drone can carry a maximum weight of W. There are N cities connected by roads, each road has a weight limit. Find the path from city 1 to city N that maximizes the minimum weight limit along the path (i.e., the heaviest cargo the drone can carry).

Hint This is "Maximum Bottleneck Path" — find the path where the minimum edge weight is maximized. Two approaches: (1) Binary search on the answer W, then check if a path exists using only edges with weight ≥ W. (2) Run a modified Dijkstra where `dist[v]` = maximum minimum edge weight on any path to v. Use max-heap, update: `dist[v] = max(dist[v], min(dist[u], weight(u,v)))`.
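Approach (2) from the hint, sketched as a modified Dijkstra (names illustrative). Note it deliberately uses the default max-heap, and "better" now means a larger bottleneck rather than a smaller distance.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Maximum bottleneck path: best[v] = the largest achievable
// "minimum edge weight" over all paths from src to v.
// adj[u] = list of {limit, neighbor}; nodes 1..n.
vector<int> maxBottleneck(int src, int n, vector<vector<pair<int,int>>>& adj) {
    vector<int> best(n + 1, 0);
    priority_queue<pair<int,int>> pq;   // default max-heap: best node first
    best[src] = INT_MAX;                // no edge restricts the start
    pq.push({best[src], src});
    while (!pq.empty()) {
        auto [b, u] = pq.top(); pq.pop();
        if (b < best[u]) continue;      // stale entry
        for (auto [w, v] : adj[u]) {
            int cand = min(b, w);       // bottleneck of the path through u
            if (cand > best[v]) {
                best[v] = cand;
                pq.push({cand, v});
            }
        }
    }
    return best;
}
```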

End of Chapter 5.4 — Next: Chapter 6.1: Introduction to DP

🧠 Part 6: Dynamic Programming

The most powerful and most feared topic in competitive programming. Master memoization, tabulation, and classic DP patterns for USACO Silver.

📚 3 Chapters · ⏱️ Estimated 3-4 weeks · 🎯 Target: Reach USACO Silver level

Part 6: Dynamic Programming

Estimated time: 3–4 weeks

Dynamic programming is the most powerful and most feared topic in competitive programming. Once you master it, you'll be able to solve problems that seem impossible by brute force. Take your time with this part — it's worth it.


What Topics Are Covered

| Chapter     | Topic                | The Big Idea                           |
|-------------|----------------------|----------------------------------------|
| Chapter 6.1 | Introduction to DP   | Memoization, tabulation, the DP recipe |
| Chapter 6.2 | Classic DP Problems  | LIS, 0/1 Knapsack, grid path counting  |
| Chapter 6.3 | Advanced DP Patterns | Bitmask DP, interval DP, tree DP       |

What You'll Be Able to Solve After This Part

After completing Part 6, you'll be ready to tackle:

  • USACO Bronze:

    • Simple counting problems (how many ways to do X?)
    • Basic optimization (minimum cost to do Y?)
  • USACO Silver:

    • Longest increasing subsequence (and variants)
    • Knapsack-style resource allocation
    • Grid path problems (max value path, count paths)
    • 1D DP with careful state definition (Hoof-Paper-Scissors, etc.)
  • DP on intervals or trees (Chapter 6.3)


Key DP Patterns to Master

| Pattern              | Chapter | Example Problem                |
|----------------------|---------|--------------------------------|
| 1D DP (sequential)   | 6.1     | Fibonacci, climbing stairs     |
| 1D DP (optimization) | 6.1     | Coin change (minimum coins)    |
| 1D DP (counting)     | 6.1     | Coin change (number of ways)   |
| 2D DP                | 6.2     | 0/1 Knapsack, grid paths       |
| LIS (O(N²))          | 6.2     | Longest increasing subsequence |
| LIS (O(N log N))     | 6.2     | Fast LIS with binary search    |
| Bitmask DP           | 6.3     | TSP, assignment problem        |
| Interval DP          | 6.3     | Matrix chain multiplication    |
| Tree DP              | 6.3     | Independent set on trees       |

Prerequisites

Before starting Part 6, make sure you can:

  • Write recursive functions and understand the call stack (Chapter 2.3)
  • Use 2D vectors comfortably (Chapter 2.3)
  • Understand binary search (Chapter 3.3) — needed for O(N log N) LIS
  • Solve basic BFS problems (Chapter 5.2) — DP and BFS share "state space exploration" intuition

The DP Mindset

DP is not about memorizing formulas — it's about asking the right questions:

  1. What is the "state"? What information do I need to describe a subproblem?
  2. What is the "transition"? How does the answer to a bigger state depend on smaller states?
  3. What are the base cases? What are the simplest subproblems with known answers?
  4. What order do I fill the table? Dependencies must be computed before they're used.

💡 Key Insight: If you find yourself writing the same computation multiple times in a recursive solution, DP is the fix. Cache the result the first time, reuse it every subsequent time.


Tips for This Part

  1. Start with Chapter 6.1 carefully. Don't rush to knapsack before you truly understand Fibonacci DP. The "why" of DP is more important than the "what."
  2. Write both memoization and tabulation for the same problem. Converting between them deepens understanding.
  3. Chapter 6.2's LIS has two implementations: O(N²) (easy to understand) and O(N log N) (fast, needed for large N). Learn both.
  4. Chapter 6.3 is Silver/Gold level. If you're targeting Bronze, you can skip Chapter 6.3 initially and return to it later.
  5. Most DP bugs come from wrong initialization. For min-cost problems, initialize to INF, not 0. For counting problems, initialize the base case to 1, not 0.

⚠️ Warning: The #1 DP bug: forgetting to check dp[w-c] != INF before using it in a minimization DP. The "impossible" marker then leaks into real answers as INF + 1 (and with INF = INT_MAX, the addition overflows).

The #2 DP bug: wrong loop order for 0/1 knapsack vs. unbounded knapsack. Backward iteration = each item used at most once. Forward iteration = unlimited use.
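The two loop directions side by side, as a minimal sketch (function names and the {weight, value} item format are illustrative):

```cpp
#include <bits/stdc++.h>
using namespace std;

// 0/1 knapsack: each item used at most once -> iterate w BACKWARD,
// so dp[w - wt] still refers to the state *without* the current item.
int knapsack01(vector<pair<int,int>>& items, int W) {  // {weight, value}
    vector<int> dp(W + 1, 0);
    for (auto [wt, val] : items)
        for (int w = W; w >= wt; w--)
            dp[w] = max(dp[w], dp[w - wt] + val);
    return dp[W];
}

// Unbounded knapsack: unlimited copies -> iterate w FORWARD,
// so dp[w - wt] may already include the current item.
int knapsackUnbounded(vector<pair<int,int>>& items, int W) {
    vector<int> dp(W + 1, 0);
    for (auto [wt, val] : items)
        for (int w = wt; w <= W; w++)
            dp[w] = max(dp[w], dp[w - wt] + val);
    return dp[W];
}
```

With a single item of weight 3 and value 10 and W = 6, the 0/1 version yields 10 while the unbounded version yields 20: the loop direction alone decides whether the item can repeat.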

📖 Chapter 6.1 ⏱️ ~65 min read 🎯 Intermediate

Chapter 6.1: Introduction to Dynamic Programming

📝 Before You Continue: Make sure you understand recursion (Chapter 2.3), arrays/vectors (Chapters 2.3–3.1), and basic loop patterns (Chapter 2.2). DP builds directly on recursion concepts.

Dynamic programming (DP) is often described as "clever recursion with memory." Let's build up this intuition from scratch, starting with the simplest possible example: Fibonacci numbers.

💡 Key Insight: DP solves problems with two properties:

  1. Overlapping subproblems — the same sub-computation appears many times
  2. Optimal substructure — the optimal solution to a big problem can be built from optimal solutions to smaller problems

When both are true, DP transforms exponential time into polynomial time.


6.1.1 The Problem with Naive Recursion

The Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, ...

Definition: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2) for n ≥ 2.

Visual: Fibonacci Recursion Tree and Memoization

The recursion tree for fib(5) exposes the problem: fib(3) is computed twice (red nodes). Memoization caches each result the first time it's computed, reducing 2^N calls to just N unique calls — the fundamental insight behind dynamic programming.

Fibonacci Memoization

The static diagram above shows how memoization eliminates redundant computations: each unique subproblem is solved only once and its result is cached for future lookups.

The naïve recursive implementation:

int fib(int n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    return fib(n-1) + fib(n-2);  // recursive
}

This is correct, but devastatingly slow. Let's see why:

fib(5)
├── fib(4)
│   ├── fib(3)
│   │   ├── fib(2)
│   │   │   ├── fib(1) = 1
│   │   │   └── fib(0) = 0
│   │   └── fib(1) = 1
│   └── fib(2)           ← COMPUTED AGAIN!
│       ├── fib(1) = 1
│       └── fib(0) = 0
└── fib(3)               ← COMPUTED AGAIN!
    ├── fib(2)            ← COMPUTED AGAIN!
    │   ├── fib(1) = 1
    │   └── fib(0) = 0
    └── fib(1) = 1

fib(3) is computed twice. fib(2) three times. For fib(50), the number of calls exceeds 10^10. This is exponential time: O(2^n).

The core insight: we're recomputing the same subproblems over and over. DP fixes this.


6.1.2 Memoization (Top-Down DP)

Memoization = recursion + cache. Before computing, check if we've already computed this value. If yes, return the cached result. If no, compute it, cache it, return it.

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100;
long long memo[MAXN];  // memo[n] = cached value of F(n)
bool computed[MAXN];   // track which values are computed

long long fib_memo(int n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    if (computed[n]) return memo[n];  // already computed? return cached value

    memo[n] = fib_memo(n-1) + fib_memo(n-2);  // compute and cache
    computed[n] = true;
    return memo[n];
}

int main() {
    memset(computed, false, sizeof(computed));  // initialize cache as empty

    for (int i = 0; i <= 20; i++) {
        cout << "F(" << i << ") = " << fib_memo(i) << "\n";
    }
    return 0;
}

Or using -1 as the sentinel:

📝 Note: The following is an equivalent rewrite of fib_memo above. Differences: ① it uses -1 as the "not yet computed" sentinel, removing the separate computed[] array; ② the function is renamed fib and the code is more compact. The two versions behave identically; just don't mix snippets from the two (each has its own global memo array).

// Version 2: -1 sentinel (equivalent to fib_memo above, more compact)
const int MAXN = 100;
long long memo[MAXN];

long long fib(int n) {
    if (n <= 1) return n;
    if (memo[n] != -1) return memo[n];
    return memo[n] = fib(n-1) + fib(n-2);
}

int main() {
    fill(memo, memo + MAXN, -1LL);  // initialize every entry to -1 ("not computed")
    cout << fib(50) << "\n";        // 12586269025
    return 0;
}

Now each value is computed exactly once. Time complexity: O(N). 🎉


6.1.3 Tabulation (Bottom-Up DP)

Tabulation builds the answer from the ground up — compute small subproblems first, use them to compute larger ones.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n = 50;
    vector<long long> dp(n + 1);

    // Base cases
    dp[0] = 0;
    dp[1] = 1;

    // Fill the table bottom-up
    for (int i = 2; i <= n; i++) {
        dp[i] = dp[i-1] + dp[i-2];  // use already-computed values
    }

    cout << dp[n] << "\n";  // 12586269025
    return 0;
}

We can even optimize space: since each Fibonacci number only depends on the previous two, we only need O(1) space:

long long a = 0, b = 1;
for (int i = 2; i <= n; i++) {
    long long c = a + b;
    a = b;
    b = c;
}
cout << b << "\n";

Memoization vs. Tabulation

Execution paths of the two approaches compared (using fib(4) as the example):

flowchart LR
    subgraph topdown["🔽 Top-Down (memoized recursion)"]
        direction TB
        F4a(["fib(4)"])
        F3a(["fib(3)"])
        F2a(["fib(2)"])
        F1a(["fib(1)=1"])
        F0a(["fib(0)=0"])
        F2b(["fib(2)\n📦 cache hit!"])
        F4a --> F3a
        F4a --> F2b
        F3a --> F2a
        F3a --> F1a
        F2a --> F1a
        F2a --> F0a
        style F2b fill:#dcfce7,stroke:#16a34a
    end
    subgraph bottomup["🔼 Bottom-Up (tabulation)"]
        direction LR
        D0["dp[0]=0"] --> D1["dp[1]=1"] --> D2["dp[2]=1"] --> D3["dp[3]=2"] --> D4["dp[4]=3"]
        note["fill the table in order; each cell computed once"]
    end
    style topdown fill:#f0f4ff,stroke:#4A6CF7
    style bottomup fill:#f0fdf4,stroke:#16a34a

💡 Core difference: Top-Down computes on demand (only the subproblems actually reached); Bottom-Up fills the entire table in order (every subproblem). Their time complexity is the same, but Bottom-Up avoids recursion-stack overhead.

| Aspect                | Memoization (Top-Down)                    | Tabulation (Bottom-Up)            |
|-----------------------|-------------------------------------------|-----------------------------------|
| Approach              | Recursive with caching                    | Iterative table filling           |
| Memory usage          | Only computed states                      | All states (even unused)          |
| Implementation        | Often more intuitive                      | May need to figure out fill order |
| Stack overflow risk   | Yes (deep recursion)                      | No                                |
| Speed                 | Slightly slower (function call overhead)  | Slightly faster                   |
| Subproblems computed  | Only reachable ones                       | All (even unreachable)            |
| Debugging             | Easier (follow recursion)                 | Harder (need correct fill order)  |
| USACO preference      | Great for understanding                   | Great for final solutions         |

🏆 USACO Tip: In competition, bottom-up tabulation is slightly preferred because it avoids potential stack overflow (critical on problems with N = 10^5) and is often faster. But start with top-down if you're having trouble seeing the recurrence — it's a great way to think through the problem.

In competitive programming, both are valid. Practice both until you can convert easily between them.


6.1.4 The DP Recipe

Every DP problem follows the same recipe:

The four-step DP method as a flowchart:

flowchart TD
    S1["① Define the state\nWhat does dp[i] represent?"] --> S2
    S2["② Write the recurrence\nHow is dp[i] built from smaller states?"] --> S3
    S3["③ Set the base cases\nWhat is the answer to the smallest subproblem?"] --> S4
    S4["④ Choose the fill order\nSmall to large? Large to small?"] --> S5
    S5{"Can space be compressed?"}
    S5 -->|"depends only on the last 1-2 rows"| S6["rolling array / 1D optimization"]
    S5 -->|"depends on the whole table"| S7["keep the full 2D table"]
    style S1 fill:#dbeafe,stroke:#3b82f6
    style S2 fill:#dbeafe,stroke:#3b82f6
    style S3 fill:#dbeafe,stroke:#3b82f6
    style S4 fill:#dbeafe,stroke:#3b82f6
    style S6 fill:#dcfce7,stroke:#16a34a

  1. Define the state: What information uniquely describes a subproblem?
  2. Define the recurrence: How does dp[state] depend on smaller states?
  3. Identify base cases: What are the simplest subproblems with known answers?
  4. Determine order: In what order should we fill the table?

Let's apply this to Fibonacci:

  1. State: dp[i] = the i-th Fibonacci number
  2. Recurrence: dp[i] = dp[i-1] + dp[i-2]
  3. Base cases: dp[0] = 0, dp[1] = 1
  4. Order: i from 2 to n (each depends on smaller i)

6.1.5 Coin Change — Classic DP

Problem: You have coins of denominations coins[]. What is the minimum number of coins needed to make amount W? You can use each coin type unlimited times.

Example: coins = [1, 5, 6, 9], W = 11

Let's first try the greedy approach (always pick the largest coin ≤ remaining):

  • Greedy: 9 + 1 + 1 = 3 coins ← not optimal!
  • Optimal: 5 + 6 = 2 coins ← DP finds this

This is why greedy fails here and we need DP.

Visual: Coin Change DP Table

The DP table shows how dp[i] (minimum coins to make amount i) is filled left to right. For coins {1,3,4}, notice that dp[3]=1 (just use coin 3) and dp[6]=2 (use two 3s). Each cell builds on previous cells using the recurrence.

Coin Change DP

This static reference shows the complete coin change DP table, with arrows indicating how each cell's value depends on previous cells via the recurrence dp[w] = 1 + min(dp[w-c]).

DP Definition

Coin change state transitions (coins=[1,5,6], W=7):

flowchart LR
    D0(["dp[0]=0"])
    D1(["dp[1]=1"])
    D5(["dp[5]=1"])
    D6(["dp[6]=1"])
    D7(["dp[7]=2"])
    D0 -->|"use coin 1"| D1
    D0 -->|"use coin 5"| D5
    D0 -->|"use coin 6"| D6
    D1 -->|"use coin 6"| D7
    D5 -->|"use coin 1\ndp[5]+1=2"| D6_2(["dp[6]=min(1,2)=1"])
    D6 -->|"use coin 1\ndp[6]+1=2"| D7
    note1["dp[7] = min(dp[6]+1, dp[2]+1, dp[1]+1)\n     = min(2, 3, 2) = 2\nbest: 1+6 or 6+1"]
    style D0 fill:#f0fdf4,stroke:#16a34a
    style D7 fill:#dbeafe,stroke:#3b82f6
    style note1 fill:#fef9ec,stroke:#d97706

💡 Transition direction: each dp[w] is reached from dp[w-c] (the amount remaining after using coin c). Arrow direction = state dependency direction.

  • State: dp[w] = minimum coins to make exactly amount w
  • Recurrence: dp[w] = 1 + min over all coins c where c ≤ w: dp[w - c] (use coin c, then solve the remaining w-c optimally)
  • Base case: dp[0] = 0 (zero coins to make amount 0)
  • Answer: dp[W]
  • Order: fill w from 1 to W

Complete Walkthrough: coins = [1, 5, 6, 9], W = 11

dp[0] = 0 (base case)

dp[1]:  try coin 1: dp[0]+1=1          → dp[1] = 1
dp[2]:  try coin 1: dp[1]+1=2          → dp[2] = 2
dp[3]:  try coin 1: dp[2]+1=3          → dp[3] = 3
dp[4]:  try coin 1: dp[3]+1=4          → dp[4] = 4
dp[5]:  try coin 1: dp[4]+1=5
        try coin 5: dp[0]+1=1          → dp[5] = 1  ← use the 5-coin!
dp[6]:  try coin 1: dp[5]+1=2
        try coin 5: dp[1]+1=2
        try coin 6: dp[0]+1=1          → dp[6] = 1  ← use the 6-coin!
dp[7]:  try coin 1: dp[6]+1=2
        try coin 5: dp[2]+1=3
        try coin 6: dp[1]+1=2          → dp[7] = 2  ← 1+6 or 6+1
dp[8]:  try coin 1: dp[7]+1=3
        try coin 5: dp[3]+1=4
        try coin 6: dp[2]+1=3          → dp[8] = 3
dp[9]:  try coin 1: dp[8]+1=4
        try coin 5: dp[4]+1=5
        try coin 6: dp[3]+1=4
        try coin 9: dp[0]+1=1          → dp[9] = 1  ← use the 9-coin!
dp[10]: try coin 1: dp[9]+1=2
        try coin 5: dp[5]+1=2
        try coin 6: dp[4]+1=5
        try coin 9: dp[1]+1=2          → dp[10] = 2  ← 1+9, 5+5, or 9+1
dp[11]: try coin 1: dp[10]+1=3
        try coin 5: dp[6]+1=2
        try coin 6: dp[5]+1=2
        try coin 9: dp[2]+1=3          → dp[11] = 2  ← 5+6 or 6+5!

dp table: [0, 1, 2, 3, 4, 1, 1, 2, 3, 1, 2, 2]

Answer: dp[11] = 2 (coins 5 and 6) ✓

// Solution: Minimum Coin Change — O(N × W)
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, W;
    cin >> n >> W;

    vector<int> coins(n);
    for (int &c : coins) cin >> c;

    const int INF = 1e9;
    vector<int> dp(W + 1, INF);  // dp[w] = min coins to make w
    dp[0] = 0;                    // base case

    // Step 1: Fill dp table bottom-up
    for (int w = 1; w <= W; w++) {
        for (int c : coins) {
            if (c <= w && dp[w - c] != INF) {
                dp[w] = min(dp[w], dp[w - c] + 1);  // ← KEY LINE
            }
        }
    }

    // Step 2: Output result
    if (dp[W] == INF) {
        cout << "Impossible\n";
    } else {
        cout << dp[W] << "\n";
    }

    return 0;
}

Sample Input:

4 11
1 5 6 9

Sample Output:

2

Complexity Analysis:

  • Time: O(N × W) — for each amount w (1..W), try all N coins
  • Space: O(W) — just the dp array

Reconstructing the Solution

How do we print which coins were used? Track parent[w] = which coin was used last:

vector<int> dp(W + 1, INF);
vector<int> lastCoin(W + 1, -1);  // which coin gave optimal solution for w
dp[0] = 0;

for (int w = 1; w <= W; w++) {
    for (int c : coins) {
        if (c <= w && dp[w-c] + 1 < dp[w]) {
            dp[w] = dp[w-c] + 1;
            lastCoin[w] = c;   // record that coin c was used
        }
    }
}

// Trace back the solution
vector<int> solution;
int w = W;
while (w > 0) {
    solution.push_back(lastCoin[w]);
    w -= lastCoin[w];
}
for (int c : solution) cout << c << " ";
cout << "\n";

6.1.6 Number of Ways — Coin Change Variant

Problem: How many different ways can you make amount W using the given coins? (Order matters: [1,5] and [5,1] are different.)

// Ordered ways (permutations — order matters)
vector<long long> ways(W + 1, 0);
ways[0] = 1;  // one way to make 0: use no coins

for (int w = 1; w <= W; w++) {
    for (int c : coins) {
        if (c <= w) {
            ways[w] += ways[w - c];  // ← KEY LINE
        }
    }
}

If order doesn't matter (combinations — [1,5] same as [5,1]):

// Unordered ways (combinations — order doesn't matter)
vector<long long> ways(W + 1, 0);
ways[0] = 1;

for (int c : coins) {           // outer loop: coins (each coin is considered once)
    for (int w = c; w <= W; w++) {  // inner loop: amounts
        ways[w] += ways[w - c];
    }
}

💡 Key Insight: The order of loops matters for counting combinations vs. permutations! When coins are in the outer loop, each coin is "introduced" once and order is ignored. When amounts are in the outer loop, each amount is formed fresh each time, allowing all orderings.
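To see the difference concretely, here are both loop orders packaged as functions (names are illustrative). For coins {1, 2} and W = 3 the ordered count is 3 (1+1+1, 1+2, 2+1) while the unordered count is 2 ({1,1,1} and {1,2}):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Amounts in the OUTER loop: counts ordered sequences (permutations).
long long countOrdered(vector<int>& coins, int W) {
    vector<long long> ways(W + 1, 0);
    ways[0] = 1;
    for (int w = 1; w <= W; w++)
        for (int c : coins)
            if (c <= w) ways[w] += ways[w - c];
    return ways[W];
}

// Coins in the OUTER loop: counts unordered multisets (combinations).
long long countUnordered(vector<int>& coins, int W) {
    vector<long long> ways(W + 1, 0);
    ways[0] = 1;
    for (int c : coins)
        for (int w = c; w <= W; w++)
            ways[w] += ways[w - c];
    return ways[W];
}
```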


⚠️ Common Mistakes in Chapter 6.1

  1. Initializing dp with 0 instead of INF: For minimization problems, dp[w] = 0 means "0 coins" which will never get improved. Use dp[w] = INF and only dp[0] = 0.
  2. Not checking dp[w-c] != INF before using it: the "impossible" marker leaks into real answers, and with INF = INT_MAX the addition overflows to a negative number. Always check that the subproblem is solvable.
  3. Wrong loop order for knapsack variants: For unbounded (unlimited coins), loop amounts forward. For 0/1 (each used once), loop amounts backward. Getting this wrong gives wrong answers silently.
  4. Using INT_MAX as INF then adding 1: INT_MAX + 1 overflows to negative. Use 1e9 or 1e18 as INF.
  5. Forgetting the base case: dp[0] = 0 is crucial. Without it, nothing ever gets set.

Chapter Summary

📌 Key Takeaways

| Concept                 | Key Points                                        | When to Use                              |
|-------------------------|---------------------------------------------------|------------------------------------------|
| Overlapping subproblems | Same computation repeated exponentially           | Duplicate calls in recursion tree        |
| Memoization (top-down)  | Cache recursive results; easy to write            | When recursive structure is clear        |
| Tabulation (bottom-up)  | Iterative table-filling; no stack overflow        | Final contest solution; large N          |
| DP state                | Information that uniquely identifies a subproblem | Define carefully — determines everything |
| DP recurrence           | How dp[state] depends on smaller states           | "Transition equation"                    |
| Base case               | Known answer for the simplest subproblem          | Usually dp[0] = some trivial value       |

🧩 DP Four-Step Method Quick Reference

| Step                    | Question                                          | Fibonacci Example                 |
|-------------------------|---------------------------------------------------|-----------------------------------|
| 1. Define state         | "What does dp[i] represent?"                      | dp[i] = the i-th Fibonacci number |
| 2. Write recurrence     | "Which smaller states does dp[i] depend on?"      | dp[i] = dp[i-1] + dp[i-2]         |
| 3. Determine base case  | "What is the answer for the smallest subproblem?" | dp[0]=0, dp[1]=1                  |
| 4. Determine fill order | "i from small to large? Large to small?"          | i from 2 to n                     |

❓ FAQ

Q1: How do I tell if a problem is a DP problem?

A: Two signals: ① the problem asks for an "optimal value" or "number of ways" (not "output the specific solution"); ② there are overlapping subproblems (the same subproblem is computed multiple times in brute-force recursion). If greedy can be proven correct, DP is usually not needed; otherwise it's likely DP.

Q2: Should I use top-down or bottom-up?

A: While learning, use top-down (more naturally expresses recursive thinking); for contest submission, use bottom-up (faster, no stack overflow). Both are correct. If you can quickly write bottom-up, go with it directly.

Q3: What is the "no aftereffect" property?

A: A core prerequisite of DP: once dp[i] is determined, later computations never "come back" to change it. In other words, dp[i]'s value depends only on the "past" (smaller states), never the "future". If this property is violated, DP cannot be applied directly.

Q4: What value should INF be set to?

A: For int use 1e9 (= 10^9), for long long use 1e18 (= 10^18). Do not use INT_MAX, because INT_MAX + 1 overflows to a negative number.

🔗 Connections to Later Chapters

  • Chapter 6.2 (Classic DP): extends to LIS, knapsack, grid paths — all applications of the four-step DP method from this chapter
  • Chapter 6.3 (Advanced DP): enters bitmask DP, interval DP, tree DP — more complex state definitions but same thinking
  • Chapter 3.2 (Prefix Sums): difference arrays can sometimes replace simple DP, and prefix sum arrays can speed up interval computations in DP
  • Chapter 4.1 (Greedy) vs DP: greedy-solvable problems are a special case of DP (local optimum = global optimum at each step); when greedy fails, DP is needed

Practice Problems

Problem 6.1.1 — Climbing Stairs 🟢 Easy You can climb 1 or 2 stairs at a time. How many ways to climb N stairs? (Same as Fibonacci — ways[n] = ways[n-1] + ways[n-2])

Hint This is exactly Fibonacci! ways[1]=1, ways[2]=2. Or start with ways[0]=1, ways[1]=1, then ways[n] = ways[n-1] + ways[n-2].

Problem 6.1.2 — Minimum Coin Change 🟡 Medium Given coin denominations [1, 3, 4] and target 6, find the minimum coins. (Expected answer: 2 coins — use 3+3)

Hint Build `dp[0..6]` using the coin change recurrence. Greedy gives 4+1+1=3 coins, but dp finds 3+3=2.

Problem 6.1.3 — Tile Tiling 🟡 Medium A 2×N board can be tiled with 1×2 dominoes (placed horizontally or vertically). How many ways?

Solution sketch: dp[n] = dp[n-1] + dp[n-2]. Vertical tile fills one column alone; two horizontal tiles fill two columns together.

Hint Same recurrence as Fibonacci! The key insight: when you place a vertical domino at column n, you recurse on n-1; when you place two horizontal dominoes at columns n-1 and n, you recurse on n-2.

Problem 6.1.4 — Bounded Coin Change 🔴 Hard Same as coin change, but you can use each coin at most once (0/1 knapsack). Find the minimum coins.

Solution sketch: Similar to 0/1 knapsack. Use a 2D dp[i][w] = min coins using first i coins to make w. Or the space-optimized version with backward iteration.

Hint This is a 0/1 knapsack variant. Key difference: when you use coin i, you can't use it again. In the 1D space-optimized version, iterate w from W down to coins[i] to prevent reuse.
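A sketch of the space-optimized version the hint describes (function name illustrative). The two changes from the unbounded code in Section 6.1.5 are: coins move to the outer loop, and w runs backward.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Bounded coin change: each coin usable at most once.
// Backward iteration over w guarantees dp[w - c] excludes coin c itself.
int minCoinsBounded(vector<int>& coins, int W) {
    const int INF = 1e9;
    vector<int> dp(W + 1, INF);
    dp[0] = 0;
    for (int c : coins)
        for (int w = W; w >= c; w--)      // backward: prevents reuse
            if (dp[w - c] != INF)         // subproblem solvable?
                dp[w] = min(dp[w], dp[w - c] + 1);
    return dp[W];  // INF means impossible
}
```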

Problem 6.1.5 — USACO Bronze: Haybale Stacking 🔴 Hard Given N operations "add 1 to all positions from L to R", determine the final value at each position.

Use difference array from Chapter 3.2. This is also solvable by thinking of it as "build the answer" (DP-like perspective).

Hint Difference array: `diff[L]++`, `diff[R+1]--`. Then the prefix sum of diff gives the final values.
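The hint as a minimal sketch (function name and the {L, R} operation format are illustrative):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Difference array: apply "add 1 to [L, R]" in O(1) per operation,
// then a single prefix-sum pass recovers the final values. 1-indexed.
vector<long long> applyRangeAdds(int n, vector<pair<int,int>>& ops) {
    vector<long long> diff(n + 2, 0);
    for (auto [L, R] : ops) {
        diff[L] += 1;        // range starts here
        diff[R + 1] -= 1;    // one past the end cancels it
    }
    vector<long long> val(n + 1, 0);
    for (int i = 1; i <= n; i++)
        val[i] = val[i - 1] + diff[i];  // prefix sum of diff
    return val;             // val[1..n] = final values
}
```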

🏆 Challenge Problem: Unique Paths with Obstacles An N×M grid has '.' cells and '#' obstacles. Count paths from (1,1) to (N,M) moving only right or down. Answer modulo 10^9+7. (N, M ≤ 1000)


Visual: Fibonacci Recursion Tree

Fibonacci Recursion Tree

The diagram shows naive recursion for fib(6). Red dashed nodes are duplicate subproblems — computed multiple times. Green nodes show where memoization caches results. Without memoization: O(2^N). With memoization: O(N). This is the fundamental insight behind dynamic programming.

📖 Chapter 6.2 ⏱️ ~80 min read 🎯 Advanced

Chapter 6.2: Classic DP Problems

📝 Before You Continue: Make sure you've mastered Chapter 6.1's core DP concepts — states, recurrences, and base cases. You should be able to implement Fibonacci and basic coin change from scratch.

In this chapter, we tackle three of the most important and widely-applied DP problems in competitive programming. Mastering these patterns will help you recognize and solve dozens of USACO problems.


6.2.1 Longest Increasing Subsequence (LIS)

Problem: Given an array A of N integers, find the length of the longest subsequence where elements are strictly increasing. A subsequence doesn't need to be contiguous.

Example: A = [3, 1, 8, 2, 5]

  • LIS: [1, 2, 5] → length 3
  • Or: [3, 8] → length 2 (not the longest)
  • Or: [1, 5] → length 2

💡 Key Insight: A subsequence can skip elements but must maintain relative order. The key DP insight: for each index i, ask "what's the longest increasing subsequence that ends at A[i]?" Then the answer is the maximum over all i.

LIS O(N²) transition diagram (A=[3,1,8,2,5]):

flowchart LR
    subgraph arr["Array A"]
        direction LR
        A0(["A[0]=3\ndp=1"])
        A1(["A[1]=1\ndp=1"])
        A2(["A[2]=8\ndp=2"])
        A3(["A[3]=2\ndp=2"])
        A4(["A[4]=5\ndp=3"])
    end
    A0 -->|"3<8"| A2
    A1 -->|"1<8"| A2
    A1 -->|"1<2"| A3
    A1 -->|"1<5"| A4
    A3 -->|"2<5"| A4
    note["Answer: max(dp)=3\nLIS=[1,2,5]"]
    style A4 fill:#dcfce7,stroke:#16a34a
    style note fill:#f0fdf4,stroke:#16a34a

💡 Transition rule: dp[i] = 1 + max(dp[j]) over all j < i with A[j] < A[i]. Each arrow marks a "can extend" relation.

LIS Visualization

The diagram above illustrates the LIS structure: arrows show which earlier elements each position can extend from, and highlighted elements form the longest increasing subsequence.

The diagram shows the array [3,1,4,1,5,9,2,6] with the LIS 1→4→5→6 highlighted in green. Each dp[i] value below the array shows the LIS length ending at that position. Arrows connect elements that extend the subsequence.

O(N²) DP Solution

  • State: dp[i] = length of the longest increasing subsequence ending at index i
  • Recurrence: dp[i] = 1 + max(dp[j]) for all j < i where A[j] < A[i]
  • Base case: dp[i] = 1 (a subsequence of just A[i])
  • Answer: max(dp[0], dp[1], ..., dp[N-1])

Step-by-step trace for A = [3, 1, 8, 2, 5]:

dp[0] = 1  (LIS ending at 3: just [3])

dp[1] = 1  (LIS ending at 1: just [1], since no j<1 with A[j]<1)

dp[2] = 2  (LIS ending at 8: A[0]=3 < 8 → dp[0]+1=2; A[1]=1 < 8 → dp[1]+1=2)
            Best: 2 ([3,8] or [1,8])

dp[3] = 2  (LIS ending at 2: A[1]=1 < 2 → dp[1]+1=2)
            Best: 2 ([1,2])

dp[4] = 3  (LIS ending at 5: A[1]=1 < 5 → dp[1]+1=2; A[3]=2 < 5 → dp[3]+1=3)
            Best: 3 ([1,2,5])

LIS length = max(dp) = 3
// Solution: LIS O(N²) — simple but too slow for N > 5000
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> A(n);
    for (int &x : A) cin >> x;

    vector<int> dp(n, 1);  // every element alone is a subsequence of length 1

    for (int i = 1; i < n; i++) {
        for (int j = 0; j < i; j++) {
            if (A[j] < A[i]) {              // A[i] can extend the subsequence ending at A[j]
                dp[i] = max(dp[i], dp[j] + 1);  // ← KEY LINE
            }
        }
    }

    cout << *max_element(dp.begin(), dp.end()) << "\n";
    return 0;
}

Sample Input: 5 / 3 1 8 2 5 → Output: 3

Complexity Analysis:

  • Time: O(N²) — double loop
  • Space: O(N) — dp array

For N ≤ 5000, O(N²) is fast enough. For N up to 10^5, we need the O(N log N) approach.


O(N log N) LIS with Binary Search (Patience Sorting)

The key idea: instead of tracking exact dp values, maintain a tails array where tails[k] = the smallest possible tail element of any increasing subsequence of length k+1 seen so far.

Why is this useful? Because if we can maintain this array, we can use binary search to find where to place each new element.

💡 Key Insight (Patience Sorting): Imagine dealing cards to piles. Each pile is a decreasing sequence (like Solitaire). A card goes on the leftmost pile whose top is ≥ it. If no such pile exists, start a new pile. The number of piles equals the LIS length! The tails array is exactly the tops of these piles.

Step-by-step trace for A = [3, 1, 8, 2, 5]:

Process 3: tails = [], no element ≥ 3, so push: tails = [3]
  → LIS length so far: 1

Process 1: tails = [3], lower_bound(1) hits index 0 (3 ≥ 1), replace:
  tails = [1]
  → LIS length still 1; but now the best 1-length subsequence ends in 1 (better!)

Process 8: tails = [1], lower_bound(8) hits end, push: tails = [1, 8]
  → LIS length: 2 (e.g., [1, 8])

Process 2: tails = [1, 8], lower_bound(2) hits index 1 (8 ≥ 2), replace:
  tails = [1, 2]
  → LIS length still 2; but best 2-length subsequence now ends in 2 (better!)

Process 5: tails = [1, 2], lower_bound(5) hits end, push: tails = [1, 2, 5]
  → LIS length: 3 (e.g., [1, 2, 5]) ✓

Answer = tails.size() = 3

ASCII Patience Sorting Visualization:

Cards dealt: 3, 1, 8, 2, 5

After 3:    After 1:    After 8:    After 2:    After 5:
[3]         [1]         [1][8]      [1][2]      [1][2][5]
Pile 1      Pile 1      P1  P2      P1  P2      P1  P2  P3

Number of piles = LIS length = 3 ✓
// Solution: LIS O(N log N) — fast enough for N up to 10^5
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    vector<int> A(n);
    for (int &x : A) cin >> x;

    vector<int> tails;  // tails[i] = smallest tail of any IS of length i+1

    for (int x : A) {
        // Find first tail >= x (for strictly increasing: use lower_bound)
        auto it = lower_bound(tails.begin(), tails.end(), x);

        if (it == tails.end()) {
            tails.push_back(x);   // x extends the longest subsequence
        } else {
            *it = x;              // ← KEY LINE: replace to maintain smallest possible tail
        }
    }

    cout << tails.size() << "\n";
    return 0;
}

⚠️ Note: tails doesn't store the actual LIS elements, just its length. The elements in tails are maintained in sorted order, which is why binary search works.

⚠️ Common Mistake: Using lower_bound gives LIS for strictly increasing (A[j] < A[i]). For non-decreasing (A[j] ≤ A[i]), use upper_bound instead.
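A minimal side-by-side sketch of the two variants (helper names are ours) makes the difference concrete on an array of duplicates:

```cpp
#include <bits/stdc++.h>
using namespace std;

// lower_bound → strictly increasing LIS: equal elements REPLACE a tail,
// so duplicates cannot chain.
int lisStrict(const vector<int>& A) {
    vector<int> tails;
    for (int x : A) {
        auto it = lower_bound(tails.begin(), tails.end(), x);
        if (it == tails.end()) tails.push_back(x); else *it = x;
    }
    return tails.size();
}

// upper_bound → non-decreasing LIS: equal elements land PAST existing
// copies, so duplicates chain freely.
int lisNonDecreasing(const vector<int>& A) {
    vector<int> tails;
    for (int x : A) {
        auto it = upper_bound(tails.begin(), tails.end(), x);
        if (it == tails.end()) tails.push_back(x); else *it = x;
    }
    return tails.size();
}
```

On A = [2, 2, 2], `lisStrict` returns 1 while `lisNonDecreasing` returns 3 — the two variants only diverge when the input contains duplicates.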

Complexity Analysis:

  • Time: O(N log N) — N elements, each with O(log N) binary search
  • Space: O(N) — the tails array

LIS Application in USACO

Many USACO Silver problems reduce to LIS:

  • "Minimum number of groups to partition a sequence so each group is non-increasing" → same as LIS length (by Dilworth's theorem)
  • Sorting with restrictions often becomes LIS
  • 2D LIS: sort by one dimension, find LIS of the other

🔗 Related Problem: USACO 2015 February Silver: "Censoring" — involves finding a pattern that's a subsequence.
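The "2D LIS" trick from the list above can be sketched as follows (a hedged illustration in the Russian-doll-envelopes style; `lis2D` is our name): sort by the first coordinate ascending, break ties by the second coordinate descending, then run strict LIS on the second coordinates. The descending tie-break prevents two pairs with equal first coordinates from chaining.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Longest chain of pairs with BOTH coordinates strictly increasing.
int lis2D(vector<pair<int,int>> v) {
    sort(v.begin(), v.end(), [](const pair<int,int>& a, const pair<int,int>& b) {
        if (a.first != b.first) return a.first < b.first;
        return a.second > b.second;             // ties: second coord descending
    });
    vector<int> tails;                           // strict LIS on second coords
    for (auto& [x, y] : v) {
        auto it = lower_bound(tails.begin(), tails.end(), y);
        if (it == tails.end()) tails.push_back(y); else *it = y;
    }
    return tails.size();
}
```

For pairs {5,4},{6,4},{6,7},{2,3}, the sorted order is (2,3),(5,4),(6,7),(6,4); the strict LIS on 3,4,7,4 has length 3.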


6.2.2 The 0/1 Knapsack Problem

Problem: You have N items. Item i has weight w[i] and value v[i]. Your knapsack holds total weight W. Choose items to maximize total value without exceeding weight W. Each item can be used at most once (0/1 = take it or leave it).

Example:

  • Items: (weight=2, value=3), (weight=3, value=4), (weight=4, value=5), (weight=5, value=6)
  • W = 8
  • Items 1+2+3 don't fit (weight 2+3+4 = 9 > 8). Feasible picks include: items 1+2 (weight 5, value 7), items 1+4 (weight 7, value 9), items 2+4 (weight 8, value 10). Answer: 10.

Visual: Knapsack DP Table

The 2D table shows dp[item][capacity]. Each row adds one item, and each cell represents the best value achievable with that capacity. The answer (10, at capacity W = 8) is in the bottom-right corner. Highlighted cells show where new items changed the optimal value.

Knapsack DP Table

This static reference shows the complete knapsack DP table with the take/skip decisions highlighted for each item at each capacity level.

DP Formulation

0/1 knapsack decision diagram (for item i):

flowchart TD
    State["current state: dp[i-1][w]"] --> Dec{"item i\nweight[i]=wi, value[i]=vi"}
    Dec -->|"skip"| Skip["dp[i][w] = dp[i-1][w]"]
    Dec -->|"take\n(requires wi ≤ w)"| Take["dp[i][w] = dp[i-1][w-wi] + vi"]
    Skip --> Max["dp[i][w] = max(skip, take)"]
    Take --> Max
    style Dec fill:#fef9ec,stroke:#d97706
    style Max fill:#dcfce7,stroke:#16a34a

💡 Key difference: in 0/1 knapsack each item can be used only once, so the "take" branch transitions from the previous row dp[i-1], not the current row. This is exactly why the 1D space optimization must iterate w backwards.

  • State: dp[i][w] = maximum value using items 1..i with total weight ≤ w
  • Recurrence:
    • Don't take item i: dp[i][w] = dp[i-1][w]
    • Take item i (only if weight[i] ≤ w): dp[i][w] = dp[i-1][w - weight[i]] + value[i]
    • Take the maximum: dp[i][w] = max(don't take, take)
  • Base case: dp[0][w] = 0 (no items = zero value)
  • Answer: dp[N][W]
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, W;
    cin >> n >> W;

    vector<int> weight(n + 1), value(n + 1);
    for (int i = 1; i <= n; i++) cin >> weight[i] >> value[i];

    // dp[i][w] = max value using first i items with weight limit w
    vector<vector<int>> dp(n + 1, vector<int>(W + 1, 0));

    for (int i = 1; i <= n; i++) {
        for (int w = 0; w <= W; w++) {
            dp[i][w] = dp[i-1][w];  // option 1: don't take item i

            if (weight[i] <= w) {    // option 2: take item i (if it fits)
                dp[i][w] = max(dp[i][w], dp[i-1][w - weight[i]] + value[i]);
            }
        }
    }

    cout << dp[n][W] << "\n";
    return 0;
}

Space-Optimized 0/1 Knapsack — O(W) Space

We only need the previous row dp[i-1], so we can use a 1D array. Crucial: iterate w from W down to weight[i] (otherwise item i gets used multiple times):

vector<int> dp(W + 1, 0);

for (int i = 1; i <= n; i++) {
    // Iterate BACKWARDS to prevent using item i more than once
    for (int w = W; w >= weight[i]; w--) {
        dp[w] = max(dp[w], dp[w - weight[i]] + value[i]);
    }
}

cout << dp[W] << "\n";

Why backwards? When computing dp[w], we need dp[w - weight[i]] from the previous item's row (not current item's). Iterating backwards ensures dp[w - weight[i]] hasn't been updated by item i yet.
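A tiny experiment makes this concrete (helper names are ours): with a single item of weight 2 and value 3 and W = 6, backward iteration yields 3 (one copy), while forward iteration yields 9 (three copies — unbounded behavior).

```cpp
#include <bits/stdc++.h>
using namespace std;

// One item: weight 2, value 3. Backward iteration: when dp[w] is computed,
// dp[w-2] still holds the previous row's value, so the item is used once.
int oneItemBackward(int W) {
    vector<int> dp(W + 1, 0);
    for (int w = W; w >= 2; w--) dp[w] = max(dp[w], dp[w - 2] + 3);
    return dp[W];  // 3 for W = 6
}

// Forward iteration: dp[w-2] may already include the item, so it stacks —
// this is the unbounded knapsack by accident.
int oneItemForward(int W) {
    vector<int> dp(W + 1, 0);
    for (int w = 2; w <= W; w++) dp[w] = max(dp[w], dp[w - 2] + 3);
    return dp[W];  // 9 for W = 6 (three copies)
}
```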

Unbounded Knapsack (Unlimited Items)

If each item can be used multiple times, iterate forwards:

for (int i = 1; i <= n; i++) {
    for (int w = weight[i]; w <= W; w++) {  // FORWARDS — allows reuse
        dp[w] = max(dp[w], dp[w - weight[i]] + value[i]);
    }
}

6.2.3 Grid Path Counting

Problem: Count the number of paths from the top-left corner (1,1) to the bottom-right corner (N,M) of a grid, moving only right or down. Some cells are blocked.

Example: 3×3 grid with no blockages → 6 paths (C(4,2) = 6).

Visual: Grid Path DP Values

Grid DP

Each cell shows the number of paths from (0,0) to that cell. The recurrence dp[i][j] = dp[i-1][j] + dp[i][j-1] adds paths arriving from above and from the left. The Pascal's triangle pattern emerges naturally when there are no obstacles.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, m;
    cin >> n >> m;

    vector<string> grid(n);
    for (int r = 0; r < n; r++) cin >> grid[r];

    // dp[r][c] = number of paths to reach (r, c)
    vector<vector<long long>> dp(n, vector<long long>(m, 0));

    // Base case: starting cell (if not blocked)
    if (grid[0][0] != '#') dp[0][0] = 1;

    // Fill first row (can only come from the left)
    for (int c = 1; c < m; c++) {
        if (grid[0][c] != '#') dp[0][c] = dp[0][c-1];
    }

    // Fill first column (can only come from above)
    for (int r = 1; r < n; r++) {
        if (grid[r][0] != '#') dp[r][0] = dp[r-1][0];
    }

    // Fill rest of the grid
    for (int r = 1; r < n; r++) {
        for (int c = 1; c < m; c++) {
            if (grid[r][c] == '#') {
                dp[r][c] = 0;  // blocked — no paths through here
            } else {
                dp[r][c] = dp[r-1][c] + dp[r][c-1];  // from above + from left
            }
        }
    }

    cout << dp[n-1][m-1] << "\n";
    return 0;
}
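When the path count must be reported modulo 10^9+7 (as in the challenge problem of Chapter 6.1), reduce at every addition so no intermediate value overflows. A sketch as a reusable function (`countPathsMod` is our name):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Grid-path DP with modular reduction at each step — safe even for
// 1000×1000 grids, where the raw count is astronomically large.
long long countPathsMod(const vector<string>& grid) {
    const long long MOD = 1e9 + 7;
    int n = grid.size(), m = grid[0].size();
    vector<vector<long long>> dp(n, vector<long long>(m, 0));
    if (grid[0][0] != '#') dp[0][0] = 1;
    for (int r = 0; r < n; r++)
        for (int c = 0; c < m; c++) {
            if (grid[r][c] == '#') { dp[r][c] = 0; continue; }
            if (r > 0) dp[r][c] = (dp[r][c] + dp[r-1][c]) % MOD;  // from above
            if (c > 0) dp[r][c] = (dp[r][c] + dp[r][c-1]) % MOD;  // from left
        }
    return dp[n-1][m-1];
}
```

On an open 3×3 grid this returns 6, matching the C(4,2) count above.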

Grid Maximum Value Path

Problem: Find the path from (1,1) to (N,M) (moving right or down) that maximizes the sum of values.

vector<vector<int>> val(n, vector<int>(m));
for (int r = 0; r < n; r++)
    for (int c = 0; c < m; c++)
        cin >> val[r][c];

vector<vector<long long>> dp(n, vector<long long>(m, 0));
dp[0][0] = val[0][0];

for (int c = 1; c < m; c++) dp[0][c] = dp[0][c-1] + val[0][c];
for (int r = 1; r < n; r++) dp[r][0] = dp[r-1][0] + val[r][0];

for (int r = 1; r < n; r++) {
    for (int c = 1; c < m; c++) {
        dp[r][c] = max(dp[r-1][c], dp[r][c-1]) + val[r][c];
    }
}

cout << dp[n-1][m-1] << "\n";

6.2.4 USACO DP Example: Hoof Paper Scissors

Problem (USACO January contest, "Hoof, Paper, Scissors"): Bessie plays N rounds of Hoof-Paper-Scissors (like Rock-Paper-Scissors: Paper beats Hoof, Scissors beats Paper, Hoof beats Scissors). She knows the opponent's moves in advance. She can change her gesture at most K times. Maximize wins.

State: dp[i][j][g] = max wins in the first i rounds, having changed j times, currently playing gesture g.

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, k;
    cin >> n >> k;

    // 0=Hoof, 1=Paper, 2=Scissors
    vector<int> opp(n + 1);
    for (int i = 1; i <= n; i++) {
        char c; cin >> c;
        if (c == 'H') opp[i] = 0;
        else if (c == 'P') opp[i] = 1;
        else opp[i] = 2;
    }

    // dp[j][g] = max wins using j changes so far, currently playing gesture g
    // (2D since we process rounds iteratively)
    const int NEG_INF = -1e9;
    vector<vector<int>> dp(k + 1, vector<int>(3, NEG_INF));

    // Initialize: before round 1, 0 changes, any starting gesture
    for (int g = 0; g < 3; g++) dp[0][g] = 0;

    for (int i = 1; i <= n; i++) {
        vector<vector<int>> ndp(k + 1, vector<int>(3, NEG_INF));

        for (int j = 0; j <= k; j++) {
            for (int g = 0; g < 3; g++) {
                if (dp[j][g] == NEG_INF) continue;

                int win = (g == (opp[i] + 1) % 3) ? 1 : 0;  // g beats opp[i]? (P>H, S>P, H>S)

                // Option 1: don't change gesture
                ndp[j][g] = max(ndp[j][g], dp[j][g] + win);

                // Option 2: change gesture (costs 1 change)
                if (j < k) {
                    for (int ng = 0; ng < 3; ng++) {
                        if (ng != g) {
                            int nwin = (ng == (opp[i] + 1) % 3) ? 1 : 0;
                            ndp[j+1][ng] = max(ndp[j+1][ng], dp[j][g] + nwin);
                        }
                    }
                }
            }
        }

        dp = ndp;
    }

    int ans = 0;
    for (int j = 0; j <= k; j++)
        for (int g = 0; g < 3; g++)
            ans = max(ans, dp[j][g]);

    cout << ans << "\n";
    return 0;
}

6.2.5 Interval DP — Matrix Chain and Burst Balloons Patterns

Interval DP is a powerful DP technique where the state represents a contiguous subarray or subrange, and we combine solutions of smaller intervals to solve larger ones.

💡 Key Insight: When the optimal solution for a range [l, r] depends on how we split that range at some point k, and the sub-problems for [l, k] and [k+1, r] are independent, interval DP applies.

The Interval DP Framework

Interval DP fill order (n = 4):

flowchart LR
    subgraph len1["len=1 (base cases)"]
        direction TB
        L11["dp[1][1]"] 
        L22["dp[2][2]"]
        L33["dp[3][3]"]
        L44["dp[4][4]"]
    end
    subgraph len2["len=2"]
        direction TB
        L12["dp[1][2]"]
        L23["dp[2][3]"]
        L34["dp[3][4]"]
    end
    subgraph len3["len=3"]
        direction TB
        L13["dp[1][3]"]
        L24["dp[2][4]"]
    end
    subgraph len4["len=4 (answer)"]
        direction TB
        L14["dp[1][4] ⭐"]
    end
    len1 -->|"depends on"| len2
    len2 -->|"depends on"| len3
    len3 -->|"depends on"| len4
    style L14 fill:#dcfce7,stroke:#16a34a
    style len4 fill:#f0fdf4,stroke:#16a34a

💡 Fill-order key point: fill the table by increasing interval length. When computing dp[l][r], every shorter sub-interval dp[l][k] and dp[k+1][r] is already available.

State:   dp[l][r] = optimal solution for the subproblem on interval [l, r]
Base:    dp[i][i] = cost/value for a single element (often 0 or trivial)
Order:   Fill by increasing interval LENGTH (len = 1, 2, 3, ..., n)
         This ensures dp[l][k] and dp[k+1][r] are computed before dp[l][r]
Transition:
         dp[l][r] = min/max over all split points k in [l, r-1] of:
                    dp[l][k] + dp[k+1][r] + cost(l, k, r)
Answer:  dp[1][n]  (or dp[0][n-1] for 0-indexed)

Enumeration order matters! We enumerate by interval length, not by left endpoint. This guarantees all sub-intervals are solved before we need them.

Classic Example: Matrix Chain Multiplication

Problem: Given N matrices A₁, A₂, ..., Aₙ where matrix Aᵢ has dimensions dim[i-1] × dim[i], find the parenthesization that minimizes the total number of scalar multiplications.

Why DP? Different parenthesizations have wildly different costs:

  • (A₁A₂)A₃: cost = p×q×r + p×r×s (where shapes are p×q, q×r, r×s)
  • A₁(A₂A₃): cost = q×r×s + p×q×s

State: dp[l][r] = minimum multiplications to compute the product Aₗ × Aₗ₊₁ × ... × Aᵣ

Transition: Try every split point k ∈ [l, r-1]. When we split at k:

  • Left product Aₗ...Aₖ has cost dp[l][k], resulting shape dim[l-1] × dim[k]
  • Right product Aₖ₊₁...Aᵣ has cost dp[k+1][r], resulting shape dim[k] × dim[r]
  • Multiplying these two results costs dim[l-1] × dim[k] × dim[r]
// Solution: Matrix Chain Multiplication — O(N³) time, O(N²) space
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;  // number of matrices

    // dim[i-1] × dim[i] is the shape of matrix i (1-indexed)
    // So we need n+1 dimensions
    vector<int> dim(n + 1);
    for (int i = 0; i <= n; i++) cin >> dim[i];
    // Matrix i has shape dim[i-1] × dim[i]

    // dp[l][r] = min cost to compute product of matrices l..r
    vector<vector<long long>> dp(n + 1, vector<long long>(n + 1, 0));
    const long long INF = 1e18;

    // Fill dp by increasing interval length
    for (int len = 2; len <= n; len++) {          // interval length
        for (int l = 1; l + len - 1 <= n; l++) {  // left endpoint
            int r = l + len - 1;                   // right endpoint
            dp[l][r] = INF;

            // Try every split point k
            for (int k = l; k < r; k++) {
                long long cost = dp[l][k]                    // left subproblem
                               + dp[k+1][r]                  // right subproblem
                               + (long long)dim[l-1] * dim[k] * dim[r]; // merge cost
                dp[l][r] = min(dp[l][r], cost);
            }
        }
    }

    cout << dp[1][n] << "\n";  // min cost to multiply all n matrices
    return 0;
}

Complexity Analysis:

  • States: O(N²) — all pairs (l, r) with l ≤ r
  • Transition: O(N) per state — try all split points k
  • Total Time: O(N³)
  • Space: O(N²)

Example trace for N=4, dims = [10, 30, 5, 60, 10]:

Matrices: A1(10×30), A2(30×5), A3(5×60), A4(60×10)

len=2:
  dp[1][2] = dim[0]*dim[1]*dim[2] = 10*30*5 = 1500
  dp[2][3] = dim[1]*dim[2]*dim[3] = 30*5*60 = 9000
  dp[3][4] = dim[2]*dim[3]*dim[4] = 5*60*10 = 3000

len=3:
  dp[1][3]: try k=1: dp[1][1]+dp[2][3]+10*30*60 = 0+9000+18000 = 27000
             try k=2: dp[1][2]+dp[3][3]+10*5*60  = 1500+0+3000  = 4500
             dp[1][3] = 4500
  dp[2][4]: try k=2: dp[2][2]+dp[3][4]+30*5*10  = 0+3000+1500  = 4500
             try k=3: dp[2][3]+dp[4][4]+30*60*10 = 9000+0+18000 = 27000
             dp[2][4] = 4500

len=4:
  dp[1][4]: try k=1: dp[1][1]+dp[2][4]+10*30*10 = 0+4500+3000  = 7500
             try k=2: dp[1][2]+dp[3][4]+10*5*10  = 1500+3000+500 = 5000 ← min!
             try k=3: dp[1][3]+dp[4][4]+10*60*10 = 4500+0+6000  = 10500
             dp[1][4] = 5000

Answer: 5000 scalar multiplications (parenthesization: (A1 A2)(A3 A4))

Other Classic Interval DP Problems

1. Burst Balloons (LeetCode 312):

  • dp[l][r] = max coins from bursting all balloons between l and r
  • Key twist: think of k as the last balloon to burst in [l, r] (not first split!)
  • dp[l][r] = max over k of (nums[l-1]*nums[k]*nums[r+1] + dp[l][k-1] + dp[k+1][r])

2. Optimal Binary Search Tree:

  • dp[l][r] = min cost of BST for keys l..r with given access frequencies
  • Split at root k: dp[l][r] = dp[l][k-1] + dp[k+1][r] + sum_freq(l, r)

3. Palindrome Partitioning:

  • dp[l][r] = min cuts to partition s[l..r] into palindromes
  • dp[l][r] = 0 if s[l..r] is already a palindrome, else min over k of (dp[l][k] + dp[k+1][r] + 1)
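As one concrete instance, pattern 1 above can be sketched like this (a hedged implementation; `burstBalloons` and the array names are ours):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Burst Balloons: pad the array with virtual 1s on both sides.
// dp[l][r] = max coins from bursting every balloon strictly between
// indices l and r, where k is the LAST balloon burst in that open
// interval — so when k bursts, its neighbors are exactly a[l] and a[r].
long long burstBalloons(const vector<int>& balloons) {
    int n = balloons.size();
    vector<long long> a(n + 2, 1);               // virtual 1s at both ends
    for (int i = 0; i < n; i++) a[i + 1] = balloons[i];

    vector<vector<long long>> dp(n + 2, vector<long long>(n + 2, 0));
    for (int len = 2; len <= n + 1; len++) {     // len = r - l, increasing
        for (int l = 0; l + len <= n + 1; l++) {
            int r = l + len;
            for (int k = l + 1; k < r; k++) {    // k bursts last in (l, r)
                dp[l][r] = max(dp[l][r],
                               dp[l][k] + dp[k][r] + a[l] * a[k] * a[r]);
            }
        }
    }
    return dp[0][n + 1];
}
```

On the classic input [3, 1, 5, 8] this yields 167 (burst order 1, 5, 3, 8: 3·1·5 + 3·5·8 + 1·3·8 + 1·8·1).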

Template Summary

// Generic Interval DP Template
// Assumes 1-indexed, n elements
void intervalDP(int n) {
    vector<vector<int>> dp(n + 1, vector<int>(n + 1, 0));

    // Base case: intervals of length 1
    for (int i = 1; i <= n; i++) dp[i][i] = base_case(i);

    // Fill by increasing length
    for (int len = 2; len <= n; len++) {
        for (int l = 1; l + len - 1 <= n; l++) {
            int r = l + len - 1;
            dp[l][r] = INF;  // or -INF for maximization

            for (int k = l; k < r; k++) {  // split at k (or k+1)
                int val = dp[l][k] + dp[k+1][r] + cost(l, k, r);
                dp[l][r] = min(dp[l][r], val);  // or max
            }
        }
    }
    // Answer is dp[1][n]
}

⚠️ Common Mistake: Iterating over left endpoint l in the outer loop and length in the inner loop. This is wrong — when you compute dp[l][r], the sub-intervals dp[l][k] and dp[k+1][r] must already be computed. Always iterate by length in the outer loop.

// WRONG — dp[l][k] might not be ready yet!
for (int l = 1; l <= n; l++)
    for (int r = l + 1; r <= n; r++)
        ...

// CORRECT — all shorter intervals are computed first
for (int len = 2; len <= n; len++)
    for (int l = 1; l + len - 1 <= n; l++) {
        int r = l + len - 1;
        ...
    }

⚠️ Common Mistakes in Chapter 6.2

  1. LIS: using upper_bound for strictly increasing: For strictly increasing, use lower_bound. For non-decreasing, use upper_bound. Mixing them up silently computes the wrong variant whenever the input has duplicates — e.g., on [2, 2, 2] the strict LIS has length 1, but the non-decreasing version has length 3.
  2. 0/1 Knapsack: iterating weight forward: Iterating w from 0 to W (forward) allows using item i multiple times — that's unbounded knapsack, not 0/1. Always iterate backwards for 0/1.
  3. Grid paths: forgetting to handle blocked cells: If grid[r][c] == '#', set dp[r][c] = 0 (not dp[r-1][c] + dp[r][c-1]).
  4. Overflow in grid path counting: Even for small grids, the number of paths can be astronomically large. Use long long or modular arithmetic.
  5. LIS: thinking tails contains the actual LIS: It doesn't! tails contains the smallest possible tail elements for subsequences of each length. The actual LIS must be reconstructed separately.

Chapter Summary

📌 Key Takeaways

| Problem | State Definition | Recurrence | Complexity |
|---|---|---|---|
| LIS (O(N²)) | dp[i] = LIS length ending at A[i] | dp[i] = max(dp[j]+1), j<i and A[j]<A[i] | O(N²) |
| LIS (O(N log N)) | tails[k] = min tail of IS with length k+1 | binary search + replace | O(N log N) |
| 0/1 Knapsack (2D) | dp[i][w] = max value using first i items, capacity ≤ w | max(skip, take) | O(NW) |
| 0/1 Knapsack (1D) | dp[w] = max value with capacity ≤ w | reverse iterate w | O(NW) |
| Grid Path | dp[r][c] = path count to reach (r,c) | dp[r-1][c] + dp[r][c-1] | O(RC) |

❓ FAQ

Q1: In the O(N log N) LIS solution, does the tails array store the actual LIS?

A: No! tails stores "the minimum tail element of increasing subsequences of each length". Its length equals the LIS length, but the elements themselves may not form a valid increasing subsequence. To reconstruct the actual LIS, you need to record each element's "predecessor".
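A sketch of that reconstruction on top of the O(N log N) solution (`recoverLIS` and the array names are ours): besides the tails values, track WHICH index currently ends the best subsequence of each length, and give every element a predecessor link; walking the links back from the final tail yields one valid LIS.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Recover one actual LIS in O(N log N). Assumes A is non-empty.
vector<int> recoverLIS(const vector<int>& A) {
    int n = A.size();
    if (n == 0) return {};
    vector<int> tails;          // tails[k] = value ending best IS of length k+1
    vector<int> tailIdx;        // tailIdx[k] = index in A of that value
    vector<int> parent(n, -1);  // parent[i] = previous index in the IS ending at i

    for (int i = 0; i < n; i++) {
        int pos = lower_bound(tails.begin(), tails.end(), A[i]) - tails.begin();
        if (pos == (int)tails.size()) { tails.push_back(A[i]); tailIdx.push_back(i); }
        else { tails[pos] = A[i]; tailIdx[pos] = i; }
        if (pos > 0) parent[i] = tailIdx[pos - 1];  // link to the length-pos tail
    }

    vector<int> lis;            // walk predecessor links backwards
    for (int i = tailIdx.back(); i != -1; i = parent[i]) lis.push_back(A[i]);
    reverse(lis.begin(), lis.end());
    return lis;
}
```

For A = [3, 1, 8, 2, 5] this recovers [1, 2, 5], matching the trace earlier in the chapter.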

Q2: Why does 0/1 knapsack require reverse iteration over w?

A: Because dp[w] needs the "previous row's" dp[w - weight[i]]. If iterating forward, dp[w - weight[i]] may already be updated by the current row (equivalent to using item i multiple times). Reverse iteration ensures each item is used at most once.

Q3: What is the only difference between unbounded knapsack (items usable unlimited times) and 0/1 knapsack code?

A: Just the inner loop direction. 0/1 knapsack: w from W down to weight[i] (reverse). Unbounded knapsack: w from weight[i] up to W (forward).

Q4: What if the grid path can also move up or left?

A: Then simple grid DP no longer works (because there would be cycles). You need BFS/DFS or more complex DP. Standard grid path DP only applies to "right/down only" movement.

🔗 Connections to Later Chapters

  • Chapter 3.3 (Sorting & Binary Search): binary search is the core of O(N log N) LIS — lower_bound on the tails array
  • Chapter 6.3 (Advanced DP): extends knapsack to bitmask DP (item sets → bitmask), extends grid DP to interval DP
  • Chapter 4.1 (Greedy): interval scheduling problems can sometimes be converted to LIS (via Dilworth's theorem)
  • LIS is extremely common in USACO Silver — 2D LIS, weighted LIS, LIS counting variants appear frequently

Practice Problems

Problem 6.2.1 — LIS Length 🟢 Easy Read N integers. Find the length of the longest strictly increasing subsequence.

Hint Use the `O(N log N)` approach with `lower_bound` on the `tails` array. Answer is `tails.size()`.

Problem 6.2.2 — Number of LIS 🔴 Hard Read N integers. Find the number of longest increasing subsequences. (Answer modulo 10^9+7.)

Solution sketch: Maintain both dp[i] (LIS length ending at i) and cnt[i] (number of such LIS). When dp[j]+1 > dp[i]: update dp[i] and reset cnt[i] = cnt[j]. When equal: add cnt[j] to cnt[i].

Hint This requires the `O(N²)` approach. For each i, find all j < i where `A[j] < A[i]` and `dp[j] + 1 == dp[i]`. Sum up their `cnt[j]` values.

Problem 6.2.3 — 0/1 Knapsack 🟡 Medium N items with weights and values, capacity W. Find maximum value. (N, W ≤ 1000)

Hint Space-optimized 1D dp: iterate items in outer loop, weights BACKWARDS (W down to weight[i]) in inner loop.

Problem 6.2.4 — Collect Stars 🟡 Medium An N×M grid has stars ('*') and obstacles ('#'). Moving only right or down from (1,1) to (N,M), collect as many stars as possible. Print the maximum stars collected.

Hint `dp[r][c]` = max stars collected to reach (r,c). For each cell, `dp[r][c]` = max(`dp[r-1][c]`, `dp[r][c-1]`) + (1 if grid[r][c]=='*').

Problem 6.2.5 — Variations of Knapsack 🔴 Hard Read N items each with weight w[i] and value v[i]. Capacity W.

  • Variant A: Each item available up to k[i] times (bounded knapsack)
  • Variant B: Must fill the knapsack exactly (no extra space allowed)
  • Variant C: Minimize weight while achieving value ≥ target V

Solution sketch: (A) Treat each item as k[i] copies for 0/1 knapsack, or use monotonic deque optimization. (B) Initialize dp[0] = 0, all other dp[w] = INF, answer is dp[W]. (C) Swap the roles of weight and value in the DP.

Hint For variant B: change "INF means unreachable" to "INF means infeasible". Only states reachable from `dp[0]`=0 will have finite values. For variant C: `dp[v]` = minimum weight to achieve exactly value v.

🏆 Challenge Problem: USACO 2019 January Silver: Grass Planting Each of N fields has a certain grass density. Farmer John can re-plant any number of non-overlapping intervals. Design a DP to maximize the number of fields with the specific grass density he wants after at most K re-plantings. (Interval DP combined with 1D DP)


Visual: LIS via Patience Sorting

LIS Patience Sort

This diagram illustrates LIS using the patience sorting analogy. Each "pile" represents a potential subsequence endpoint. The number of piles equals the LIS length. Binary search finds where each card goes in O(log N), giving an O(N log N) overall algorithm.

Visual: Knapsack DP Table

Knapsack DP Table

The 0/1 Knapsack DP table: rows = items considered, columns = capacity. Each cell shows the maximum value achievable. Blue cells show single-item contributions, green cells show combinations, and the starred cell is the optimal answer.

📖 Chapter 6.3 ⏱️ ~55 min read 🎯 Advanced

Chapter 6.3: Advanced DP Patterns

📝 Before You Continue: You must have completed Chapter 6.1 (Introduction to DP) and Chapter 6.2 (Classic DP Problems). Advanced patterns build on memoization, tabulation, and the classic DP problems (LIS, knapsack, grid paths).

This chapter covers DP techniques that appear at USACO Silver and above: bitmask DP, interval DP, tree DP, and digit DP. Each has a characteristic structure that, once recognized, makes the problem tractable.


6.3.1 Bitmask DP

When to use: Problems involving subsets of a small set (N ≤ 20), where the state includes "which elements have been selected."

Core idea: Represent the set of selected elements as a bitmask (integer). Bit i is 1 if element i is included.

{0, 2, 3} in a set of 5 elements → bitmask = 0b01101 = 13
bit 0 = 1 (element 0 ∈ set)
bit 1 = 0 (element 1 ∉ set)
bit 2 = 1 (element 2 ∈ set)
bit 3 = 1 (element 3 ∈ set)
bit 4 = 0 (element 4 ∉ set)

Essential Bitmask Operations

// Element operations
int mask = 0;
mask |= (1 << i);      // add element i to set
mask &= ~(1 << i);     // remove element i from set
bool has_i = (mask >> i) & 1;  // check if element i is in set

// Enumerate all subsets of mask
for (int sub = mask; sub > 0; sub = (sub - 1) & mask) {
    // process subset 'sub'
}
// Include the empty subset too: add sub=0 after the loop

// Count bits set (number of elements in set)
int count = __builtin_popcount(mask);      // for int masks
int countLL = __builtin_popcountll(mask);  // use the ll version for long long masks

// Enumerate all masks with exactly k bits set
for (int mask = 0; mask < (1 << n); mask++) {
    if (__builtin_popcount(mask) == k) { /* ... */ }
}
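A quick sanity check for the submask-enumeration idiom above (`countSubmasks` is our name): a mask with k bits set has exactly 2^k submasks, counting the empty one.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Counts all submasks of 'mask' using the (sub - 1) & mask idiom.
// The loop visits every non-empty submask; we add 1 for the empty set.
int countSubmasks(int mask) {
    int cnt = 1;                                 // the empty subset
    for (int sub = mask; sub > 0; sub = (sub - 1) & mask) cnt++;
    return cnt;                                  // equals 2^popcount(mask)
}
```

For mask = 0b01101 (3 bits set) this returns 8 = 2³, confirming the idiom enumerates each submask exactly once.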

Classic: Traveling Salesman Problem (TSP) — O(2^N × N²)

Problem: N cities, complete weighted graph. Find the minimum-cost tour that visits every city exactly once and returns to the start (a Hamiltonian cycle; drop the final return edge for a Hamiltonian path).

State: dp[mask][u] = minimum cost to visit exactly the cities in mask, ending at city u.

Transition: To extend to city v not in mask:

dp[mask | (1<<v)][v] = min(dp[mask | (1<<v)][v], dp[mask][u] + dist[u][v])
// Solution: TSP with Bitmask DP — O(2^N × N^2)
// Works for N ≤ 20 (2^20 × 400 ≈ 4×10^8 — tight; N≤18 is safer)
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll INF = 1e18;

int n;
int dist[20][20];
ll dp[1 << 20][20];

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> n;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            cin >> dist[i][j];

    // Initialize: INF everywhere
    for (int mask = 0; mask < (1 << n); mask++)
        fill(dp[mask], dp[mask] + n, INF);

    // Base case: start at city 0, only city 0 visited
    dp[1][0] = 0;  // mask=1 (bit 0 set), at city 0, cost=0

    // Fill DP
    for (int mask = 1; mask < (1 << n); mask++) {
        for (int u = 0; u < n; u++) {
            if (!(mask & (1 << u))) continue;  // u not in current set
            if (dp[mask][u] == INF) continue;

            // Try extending to city v not yet visited
            for (int v = 0; v < n; v++) {
                if (mask & (1 << v)) continue;  // v already visited
                int newMask = mask | (1 << v);
                dp[newMask][v] = min(dp[newMask][v], dp[mask][u] + dist[u][v]);
            }
        }
    }

    // Answer: minimum over all ending cities to return to city 0
    // (or just minimum over all ending cities for Hamiltonian PATH, not cycle)
    int fullMask = (1 << n) - 1;  // all cities visited
    ll ans = INF;
    for (int u = 1; u < n; u++) {  // end at any city except 0
        ans = min(ans, dp[fullMask][u] + dist[u][0]);  // return to 0 for cycle
    }

    cout << ans << "\n";
    return 0;
}

⚠️ Memory Warning: dp[1<<20][20] uses 2^20 × 20 × 8 bytes ≈ 168 MB. For N=20, this is close to typical 256MB memory limits. If distances fit in int, use int dp instead of long long to halve memory to ~84MB.


6.3.2 Interval DP

When to use: Problems where the answer for a larger interval can be built from answers for smaller intervals. Keywords: "merge," "split," "burst," "matrix chain."

Core structure:

dp[l][r] = optimal answer for subproblem on interval [l, r]
Base case: dp[i][i] = trivial (single element)
Transition: dp[l][r] = min/max over k ∈ [l, r-1] of:
              dp[l][k] + dp[k+1][r] + cost(l, k, r)
Fill order: by INCREASING interval length (len = r - l + 1)

Classic: Matrix Chain Multiplication — O(N³)

Problem: Multiply N matrices in sequence. Matrix i (1-indexed) has dimensions dims[i-1] × dims[i]. The number of scalar multiplications to multiply A (p×q) by B (q×r) is p*q*r. Find the parenthesization that minimizes total multiplications.

State: dp[l][r] = minimum multiplications to compute the product of matrices l through r.

// Solution: Matrix Chain Multiplication — O(N^3), O(N^2) space
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll INF = 1e18;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    // dims[i] = rows of matrix i; dims[i+1] = cols of matrix i
    vector<int> dims(n + 1);
    for (int i = 0; i <= n; i++) cin >> dims[i];

    // dp[l][r] = min multiplications to compute M_l × M_{l+1} × ... × M_r
    vector<vector<ll>> dp(n + 1, vector<ll>(n + 1, 0));

    // Fill by increasing interval length
    for (int len = 2; len <= n; len++) {          // len = number of matrices
        for (int l = 1; l + len - 1 <= n; l++) {
            int r = l + len - 1;
            dp[l][r] = INF;

            // Try all split points k (split after matrix k)
            for (int k = l; k < r; k++) {
                // Cost: compute [l..k], compute [k+1..r], then multiply the results
                // Result of [l..k]: dims[l-1] × dims[k]
                // Result of [k+1..r]: dims[k] × dims[r]
                ll cost = dp[l][k] + dp[k+1][r]
                        + (ll)dims[l-1] * dims[k] * dims[r]; // ← KEY: cost of final multiply
                dp[l][r] = min(dp[l][r], cost);
            }
        }
    }

    cout << dp[1][n] << "\n";
    return 0;
}

Worked Example:

3 matrices: A(10×30), B(30×5), C(5×60)
dims = [10, 30, 5, 60]

dp[1][1] = dp[2][2] = dp[3][3] = 0 (single matrices, no multiplication)

len=2:
  dp[1][2] = dp[1][1] + dp[2][2] + 10*30*5 = 0 + 0 + 1500 = 1500
  dp[2][3] = dp[2][2] + dp[3][3] + 30*5*60 = 0 + 0 + 9000 = 9000

len=3:
  dp[1][3]: try k=1 and k=2
    k=1: dp[1][1] + dp[2][3] + 10*30*60 = 0 + 9000 + 18000 = 27000
    k=2: dp[1][2] + dp[3][3] + 10*5*60 = 1500 + 0 + 3000 = 4500  ← minimum!
  dp[1][3] = 4500

Answer: 4500 (parenthesize as (A×B)×C)
Verify: (10×30)×5 = 1500 ops, then (10×5)×60 = 3000 ops, total = 4500 ✓

Classic: Burst Balloons (Variant of Interval DP)

Problem: N balloons with values. Burst balloon i: earn left_value × value[i] × right_value. Find maximum coins.

// Solution: Burst Balloons — O(N^3) time, O(N^2) space
// dp[l][r] = max coins from bursting ALL balloons in [l, r] (inclusive);
// val[l-1] and val[r+1] are the surviving boundaries, never burst here.
// Key insight: think about which balloon is burst LAST in [l, r]
// (the last balloon sees val[l-1] and val[r+1] as its neighbors).
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;

    // Add sentinel balloons with value 1 on both ends: val[0] = val[n+1] = 1
    vector<int> val(n + 2);
    val[0] = val[n + 1] = 1;
    for (int i = 1; i <= n; i++) cin >> val[i];

    vector<vector<ll>> dp(n + 2, vector<ll>(n + 2, 0));

    for (int len = 1; len <= n; len++) {
        for (int l = 1; l + len - 1 <= n; l++) {
            int r = l + len - 1;
            for (int k = l; k <= r; k++) {
                // k is the LAST balloon burst in [l, r].
                // When k is burst, everything else in [l, r] is already gone,
                // so its neighbors are the boundaries l-1 and r+1.
                ll coins = dp[l][k-1] + dp[k+1][r]
                         + (ll)val[l-1] * val[k] * val[r+1];
                dp[l][r] = max(dp[l][r], coins);
            }
        }
    }
    cout << dp[1][n] << "\n";
    return 0;
}

6.3.3 Tree DP

When to use: DP on a tree, where the state of a node depends on its subtree (post-order) or its ancestors (pre-order).

Pattern: Subtree DP (Post-Order)

dp[u] = some value computed from dp[children of u]
Process nodes in post-order (leaves first, root last)

Classic: Tree Knapsack / Maximum Independent Set on Tree

Problem: N nodes, each with value val[u]. Select a subset S maximizing total value, subject to: if u ∈ S, then no child of u is in S.

State: dp[u][0] = max value from subtree of u if u is NOT selected. dp[u][1] = max value from subtree of u if u IS selected.

// Solution: Max Independent Set on Tree — O(N)
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;
vector<int> children[MAXN];
int val[MAXN];
long long dp[MAXN][2];  // dp[u][0/1] = max value if u excluded/included

// DFS post-order: compute dp[u] after computing all dp[children]
void dfs(int u) {
    dp[u][1] = val[u];  // include u: get val[u]
    dp[u][0] = 0;        // exclude u: get 0 from this node

    for (int v : children[u]) {
        dfs(v);  // ← process child first (post-order)

        // If we INCLUDE u: children must be EXCLUDED
        dp[u][1] += dp[v][0];

        // If we EXCLUDE u: children can be either included or excluded
        dp[u][0] += max(dp[v][0], dp[v][1]);
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, root;
    cin >> n >> root;
    for (int i = 1; i <= n; i++) cin >> val[i];

    for (int i = 0; i < n - 1; i++) {
        int u, v;
        cin >> u >> v;
        children[u].push_back(v);
        // Note: if the tree is given as undirected edges, need to root it first
    }

    dfs(root);
    cout << max(dp[root][0], dp[root][1]) << "\n";
    return 0;
}

Tree Diameter (Two DFS)

// Tree Diameter: longest path between any two nodes
// Method: Two DFS
// 1. DFS from any node u → find farthest node v
// 2. DFS from v → find farthest node w
// dist(v, w) = diameter

int farthest_node, max_dist;

void dfs_diameter(int u, int parent, int d, vector<int> adj[]) {
    if (d > max_dist) {
        max_dist = d;
        farthest_node = u;
    }
    for (int v : adj[u]) {
        if (v != parent) dfs_diameter(v, u, d + 1, adj);
    }
}

int tree_diameter(int n, vector<int> adj[]) {
    // First DFS from node 1
    max_dist = 0; farthest_node = 1;
    dfs_diameter(1, -1, 0, adj);

    // Second DFS from farthest node found
    int v = farthest_node;
    max_dist = 0;
    dfs_diameter(v, -1, 0, adj);

    return max_dist;  // this is the diameter
}

6.3.4 Digit DP

When to use: Count numbers in range [1, N] satisfying some property related to their digits.

Core idea: Build the number digit by digit (left to right), maintaining a "tight" constraint (whether we're still bounded by N's digits).

State: dp[position][tight][...other state...]

  • position: which digit we're currently deciding (0 = leftmost)
  • tight: are we still constrained by N? (1 = yes, can't exceed N's digit; 0 = no, can use 0-9 freely)
  • Other state: whatever property we're tracking (sum of digits, count of zeros, etc.)

Classic: Count numbers in [1, N] with digit sum divisible by K

// Solution: Digit DP — O(|digits| × 10 × K) time, O(|digits| × K) space
#include <bits/stdc++.h>
using namespace std;

string num;     // N as a string
int K;
// dp[pos][tight][sum % K] = count of valid numbers
// Here we use top-down memoization
map<tuple<int,int,int>, long long> memo;

// pos: current digit position (0-indexed)
// tight: are we bounded by num[pos]?
// rem: current digit sum mod K
long long solve(int pos, bool tight, int rem) {
    if (pos == (int)num.size()) {
        return rem == 0 ? 1 : 0;  // complete number: valid iff digit sum ≡ 0 (mod K)
    }

    auto key = make_tuple(pos, tight, rem);
    if (memo.count(key)) return memo[key];

    int limit = tight ? (num[pos] - '0') : 9;  // max digit we can place here
    long long result = 0;

    for (int d = 0; d <= limit; d++) {
        bool new_tight = tight && (d == limit);
        result += solve(pos + 1, new_tight, (rem + d) % K);
    }

    return memo[key] = result;
}

// Count numbers in [1, N] with digit sum divisible by K
long long count_up_to(long long N) {
    num = to_string(N);
    memo.clear();
    long long ans = solve(0, true, 0);
    // Subtract 1 because 0 itself has digit sum 0 (divisible by K)
    // but we want [1, N], not [0, N]
    return ans - 1;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    long long L, R;
    cin >> L >> R >> K;

    // Count in [L, R] = count_up_to(R) - count_up_to(L-1)
    cout << count_up_to(R) - count_up_to(L - 1) << "\n";
    return 0;
}

💡 Key Insight: The tight flag is crucial. When tight=true, we can only use digits up to num[pos]. Once we place a digit less than num[pos], all subsequent digits are free (0–9), so tight becomes false. This "peeling off" of the upper bound is what makes digit DP correct.


6.3.5 DP Optimization: When Standard DP Is Too Slow

Slope Trick (O(N log N) for Convex/Concave DP)

For DPs where dp[i], viewed as a function of a parameter x, stays convex and piecewise linear. Instead of storing the whole function, you store only the points where its slope changes — typically in one or two heaps.

Divide & Conquer Optimization (O(N²) per layer → O(N log N))

When the optimal split point opt[i][j] is monotone:

  • opt[i][j] ≤ opt[i][j+1] (or a similar monotone property)
  • Each DP layer drops from O(N²) transitions to O(N log N)

Standard interval DP: O(N^3)
With D&C optimization: O(N^2 log N)
With Knuth's optimization: O(N^2) (requires the quadrangle inequality)

📌 USACO Relevance: These optimizations are typically USACO Gold/Platinum level. For Silver, mastery of the four patterns in this chapter (bitmask, interval, tree, digit) is sufficient.


Chapter Summary

📌 Pattern Recognition Guide

| Pattern | Clue in Problem | State | Transition |
|---|---|---|---|
| Bitmask DP | "subset," N ≤ 20, assign tasks | dp[mask][last] | Flip bit, try next element |
| Interval DP | "merge," "split," "parenthesize" | dp[l][r] | Split at k, combine |
| Tree DP | "tree," subtree property | dp[node][state] | Aggregate from children |
| Digit DP | "count numbers with property" | dp[pos][tight][...] | Try each digit d |

🧩 Core Framework Quick Reference

// Bitmask DP framework
for (int mask = 0; mask < (1<<n); mask++)
    for (int u = 0; u < n; u++) if (mask & (1<<u))
        for (int v = 0; v < n; v++) if (!(mask & (1<<v)))
            dp[mask|(1<<v)][v] = min(dp[mask|(1<<v)][v], dp[mask][u] + cost[u][v]);

// Interval DP framework
for (int len = 2; len <= n; len++)           // enumerate interval length
    for (int l = 1; l+len-1 <= n; l++) {     // enumerate left endpoint
        int r = l + len - 1;
        for (int k = l; k < r; k++)           // enumerate split point
            dp[l][r] = min(dp[l][r], dp[l][k] + dp[k+1][r] + cost(l,k,r));
    }

// Tree DP framework (post-order traversal)
void dfs(int u, int parent) {
    for (int v : adj[u]) if (v != parent) {
        dfs(v, u);
        dp[u] = update(dp[u], dp[v]);  // update current node with child info
    }
}

// Digit DP framework
long long solve(int pos, bool tight, int state) {
    if (pos == len) return (state == target) ? 1 : 0;
    if (memo[pos][tight][state] != -1) return memo[pos][tight][state];
    int lim = tight ? (num[pos]-'0') : 9;
    long long res = 0;
    for (int d = 0; d <= lim; d++)
        res += solve(pos+1, tight && (d==lim), next_state(state, d));
    return memo[pos][tight][state] = res;
}

❓ FAQ

Q1: Why must interval DP enumerate by length first?

A: Because dp[l][r] depends on dp[l][k] and dp[k+1][r], both of which have length less than r-l+1. So all shorter intervals must be computed before dp[l][r]. Enumerating by length from small to large satisfies this requirement. If you enumerate l and r directly, you may compute dp[l][r] before its dependencies are ready.

Q2: In tree DP, how do you handle an unrooted tree (given undirected edges)?

A: Choose any node as root (usually node 1), then use DFS to turn undirected edges into directed edges (parent→child direction). Pass a parent parameter in DFS to avoid going back to the parent.

void dfs(int u, int par) {
    for (int v : adj[u]) {
        if (v != par) {  // only visit children, not parent
            dfs(v, u);
            // update dp[u] using dp[v] here
        }
    }
}

Q3: In digit DP, can tight=true and tight=false share the same memoization array?

A: Yes, which is exactly why tight is part of the state. dp[pos][1][rem] and dp[pos][0][rem] are different states, recording "count under upper bound constraint" and "count when free" respectively. Note that tight=false states can be reused across multiple calls (once tight becomes false, the remaining digits are unconstrained).


Practice Problems

Problem 6.3.1 — Bitmask DP: Task Assignment 🟡 Medium N workers, N tasks. Worker i can do task j in time[i][j] hours. Assign each task to exactly one worker to minimize total time. (N ≤ 15)

Hint dp[mask] = minimum total time to assign the tasks in `mask` to the first popcount(mask) workers. Transition: worker popcount(mask) is the next to be assigned, and picks some task j not in `mask`.

Problem 6.3.2 — Interval DP: Palindrome Partitioning 🟡 Medium Find the minimum number of cuts to partition a string into palindromes.

Hint First precompute isPalin[l][r] with interval DP. Then dp[i] = min cuts for s[0..i].

Problem 6.3.3 — Tree DP: Maximum Matching 🔴 Hard Find the maximum matching in a tree (maximum set of edges with no shared vertex).

Hint dp[u][0] = max matching in subtree of u when u is NOT matched. dp[u][1] = max matching in subtree of u when u IS matched (to one child).

Problem 6.3.4 — Digit DP: Count Lucky Numbers 🟡 Medium A "lucky" number only contains digits 4 and 7. Count lucky numbers in [1, N].

Hint This can be solved without DP (just enumerate 2^k possibilities for k ≤ 18 digits). But practice the digit DP framework: state = (position, tight, has_only_4_7_so_far).

Problem 6.3.5 — Mixed: USACO 2019 December Platinum 🔴 Hard Cow Poetry — combinatorics + DP. Count poem arrangements with specific rhyme schemes.

Hint Group lines by their suffix hash. Use DP to count valid arrangements.

🏆 Part 7: USACO Contest Guide

Not algorithms — contest strategy. Learn how to compete: read problems, manage time, debug under pressure, and think strategically about scoring partial credit.

📚 3 Chapters · ⏱️ Read anytime · 🎯 Target: Promote from Bronze to Silver

Part 7: USACO Contest Guide

Read anytime — no prerequisites

Part 7 is different from the rest of the book. Instead of teaching algorithms, it teaches you how to compete — how to read problems, manage time, debug under pressure, and think strategically about scoring.


What Topics Are Covered

| Chapter | Topic | The Big Idea |
|---|---|---|
| Chapter 7.1 | Understanding USACO | Contest format, divisions, scoring, partial credit |
| Chapter 7.2 | Problem-Solving Strategies | How to approach problems you've never seen before |
| Chapter 7.3 | Ad Hoc Problems | Observation-based problems with no standard algorithm |

When to Read This Part

  • Before your first USACO contest: Read Chapter 7.1 to understand the format
  • When you're stuck on practice problems: Chapter 7.2's algorithm decision tree helps
  • After finishing Parts 2-6: Chapter 7.2's checklist tells you if you're ready for Silver

Key Topics in This Part

Chapter 7.1: Understanding USACO

  • Contest schedule (4 contests/year: December, January, February, US Open)
  • Division structure: Bronze → Silver → Gold → Platinum
  • Scoring: ~1000 points, need 750+ to promote
  • Partial credit strategy: how to score points even without a perfect solution
  • Common mistakes and how to avoid them

Chapter 7.2: Problem-Solving Strategies

  • The Algorithm Decision Tree: Given constraints, what algorithm fits?
    • N ≤ 20 → brute force/bitmask
    • N ≤ 1000 → O(N²)
    • N ≤ 10^5 → O(N log N)
    • Grid + shortest path → BFS
    • Optimal decisions → DP or greedy
  • Testing methodology: sample cases, edge cases, stress testing
  • Debugging tips: cerr, assert, AddressSanitizer
  • The Bronze → Silver checklist

Chapter 7.3: Ad Hoc Problems

  • What is ad hoc: no standard algorithm; requires problem-specific insight
  • The ad hoc mindset: small cases → find pattern → prove invariant → implement
  • 6 categories: observation/pattern, simulation shortcut, constructive, invariant/impossibility, greedy observation, geometry/grid
  • Core techniques: parity arguments, pigeonhole, coordinate compression, symmetry reduction, think backwards
  • 9 practice problems (Easy → Hard → Challenge) with hints
  • Silver-level ad hoc patterns: observation + BFS/DP/binary search

Contest Day Checklist

Refer to this on contest day:

  • Template compiled and tested
  • Read ALL THREE problems before coding anything
  • Work through examples by hand
  • Identify constraints and appropriate algorithm tier
  • Code the easiest problem first
  • Test with sample cases before submitting
  • For partial credit: code brute force for small cases if stuck
  • With 30 min left: stop adding code, focus on testing
  • Double-check: long long where needed? Array bounds correct?

🏆 USACO Tip: The best investment of time in the week before a contest is to re-solve 5-10 problems you've seen before, from memory. Speed + accuracy matter as much as knowledge.

📖 Chapter 7.1 ⏱️ ~40 min read 🎯 All Levels

Chapter 7.1: Understanding USACO

Before you can ace a competition, you need to understand how it works. This chapter covers everything about USACO's structure, rules, and scoring that you need to know to compete effectively.


7.1.1 What Is USACO?

The USA Computing Olympiad (USACO) is the premier competitive programming contest for pre-college students in the United States. Established in 1993, it selects the US team for the International Olympiad in Informatics (IOI).

Key facts:

  • Completely free and open to anyone
  • Competed from home, on your own computer
  • Problems involve algorithms and data structures
  • No math competition, no trivia — pure algorithmic thinking

7.1.2 Contest Format

Schedule

USACO holds 4 contests per year:

  • December contest (typically first or second week)
  • January contest
  • February contest
  • US Open (March/April) — a bit harder, 5 hours instead of 4

Contests open on a Friday and close after 4 hours of actual competition time (you choose when to start, within a 3-day window).

Problems

Each contest has 3 problems, all to be solved within the 4-hour contest window (US Open: 5 hours).

Input/Output

  • Problems use file I/O OR standard I/O (newer contests use standard I/O)
  • For file I/O: input from problem.in, output to problem.out
  • Template for file I/O:
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Redirect cin/cout to files
    freopen("problem.in", "r", stdin);
    freopen("problem.out", "w", stdout);

    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    // Your solution here

    return 0;
}

Important: Starting from 2020, most USACO problems use standard I/O. Always check the problem statement!


7.1.3 The Four Divisions

USACO has four competitive divisions, each with distinct difficulty:

Visual: USACO Divisions Pyramid


The pyramid shows USACO's four divisions from entry-level Bronze at the base to elite Platinum at the top. Each tier requires mastery of the concepts below it. The percentages indicate roughly what fraction of contestants compete at each level.

🥉 Bronze

  • Audience: Beginners with basic programming knowledge
  • Algorithms: Simulation, brute force, basic loops, simple arrays
  • Typical complexity: O(N²) or O(N³) for small N, sometimes O(N) with insights
  • N constraints: Usually ≤ 1000 or very small
  • Promotion threshold: Score 750/1000 or higher (exact threshold varies)

🥈 Silver

  • Audience: Intermediate programmers
  • Algorithms: Sorting, binary search, BFS/DFS, prefix sums, basic DP, greedy
  • Typical complexity: O(N log N) or O(N)
  • N constraints: Up to 10^5
  • Promotion threshold: Score 750+/1000

🥇 Gold

  • Audience: Advanced programmers
  • Algorithms: Dijkstra, segment trees, advanced DP, network flow, LCA
  • Typical complexity: O(N log N) to O(N log² N)
  • N constraints: Up to 10^5 to 10^6

💎 Platinum

  • Audience: Top competitors
  • Algorithms: Difficult combinatorics, advanced data structures, geometry
  • Top performers qualify for the USACO Finalist camp and possibly the IOI team (4 selected per year)

7.1.4 Scoring

How Scoring Works

Each problem has multiple test cases (typically 10–15). You earn partial credit for each test case you pass.

  • Each problem is worth approximately 333 points
  • Total: ~1000 points per contest
  • Exact breakdown depends on the contest

The All-Or-Nothing Myth

People think you need the perfect solution. You don't! Partial credit from simpler cases (smaller N, special structures) can get you to 750+ for promotion. In Bronze especially, many partial credit strategies exist.

Partial Credit Strategies

If you can't solve a problem fully:

  1. Solve small cases: If N ≤ 20, brute force with O(N!) or O(2^N) often passes several test cases
  2. Solve special cases: If the graph is a tree, or all values are equal, solve those first
  3. Output always the same answer: If you think the answer is always "YES" or some constant, try it for the first few test cases
  4. Time out gracefully: Make sure your partial solution doesn't crash — USACO grades each test case independently, so a slow solution still earns points on the cases it does finish

7.1.5 Time Management in Contests

The 4-Hour Strategy

First 30 minutes: Read all 3 problems. Don't code yet. Just understand them and think.

  • Identify which problem looks easiest
  • Note any edge cases or trick conditions
  • Start forming approaches in your head

Hours 1-2: Solve the easiest problem (usually problem 1 or 2).

  • Implement, test against examples, debug
  • Aim for 100% on at least one problem

Hours 2-3: Tackle the second-easiest problem.

  • If stuck, consider partial credit approaches

Final hour: Either finish the third problem or consolidate/debug existing solutions.

  • With 30 minutes left: stop adding new code; focus on testing and fixing bugs

Reading the Problem

Spend 5–10 minutes reading each problem before writing any code:

  • Re-read the constraints (N, values, special conditions)
  • Work through the examples manually on paper
  • Think: "What algorithm does this remind me of?"

If You're Stuck

  1. Try small examples manually — what pattern do you see?
  2. Think about simpler versions: what if N=1? N=2? N=10?
  3. Consider: is this a graph problem? A DP? A sorting/greedy problem?
  4. Write brute force first — it might be fast enough, or it helps you understand the structure

7.1.6 Common Mistake Patterns

1. Off-by-One Errors

// Wrong: misses last element
for (int i = 0; i < n - 1; i++) { ... }

// Wrong: accesses arr[n] — out of bounds!
for (int i = 0; i <= n; i++) { cout << arr[i]; }

// Correct
for (int i = 0; i < n; i++) { ... }      // 0-indexed
for (int i = 1; i <= n; i++) { ... }     // 1-indexed

2. Integer Overflow

int a = 1e9, b = 1e9;
int wrong = a * b;            // OVERFLOW
long long right = (long long)a * b;  // Correct

3. Uninitialized Variables

int ans;  // uninitialized — has garbage value!
// Always initialize:
int ans = 0;
int best = INT_MIN;

4. Wrong Answer on Empty Input / Edge Cases

// What if n = 0?
int maxVal = arr[0];  // crash if n = 0!
// Check: if (n == 0) { cout << 0; return 0; }

5. Using endl Instead of "\n"

// Slow (flushes buffer every time)
for (int i = 0; i < n; i++) cout << arr[i] << endl;

// Fast
for (int i = 0; i < n; i++) cout << arr[i] << "\n";

6. Forgetting to Handle All Cases

Read the problem carefully. "What if all cows have the same height?" "What if N=1?" Test these edge cases.


7.1.7 Bronze Problem Types Cheat Sheet

| Category | Description | Key Technique |
|---|---|---|
| Simulation | Follow instructions step by step | Implement carefully; use arrays/maps |
| Counting | Count elements satisfying some condition | Loops, prefix sums, hash maps |
| Geometry | Points, rectangles on a grid | Index carefully, avoid float errors |
| Sorting-based | Sort and check properties | std::sort + scan |
| String processing | Manipulate character sequences | String indexing, maps |
| Ad hoc | Clever observation, no standard algo | Read carefully, find the pattern (see Chapter 7.3) |

Chapter Summary

📌 Key Takeaways

| Topic | Key Points |
|---|---|
| Format | 4 contests per year, 4 hours each, 3 problems |
| Divisions | Bronze → Silver → Gold → Platinum |
| Scoring | ~1000 points per contest, need 750+ to advance |
| Partial credit | Brute force on small data still earns points |
| Time management | Read all problems first, start with the easiest |
| Common bugs | Overflow, off-by-one, uninitialized variables |

❓ FAQ

Q1: What language does USACO use? Is C++ recommended?

A: USACO supports C++, Java, Python. C++ is strongly recommended — it's the fastest (Python is 10-50x slower), with a rich STL. Java works too, but is ~2x slower than C++ and more verbose. This book uses C++ throughout.

Q2: How long does it take to advance from Bronze to Silver?

A: It varies. Students with programming background typically take 2-6 months (5-10 hours of practice per week). Complete beginners may need 6-12 months. The key is not the time, but effective practice — solve problems + read editorials + reflect.

Q3: Can you look things up online during the contest?

A: You can look up general reference materials (like C++ reference, algorithm tutorials), but cannot look up existing USACO editorials or get help from others. USACO is open-resource but independently completed.

Q4: Is there a penalty for wrong answers?

A: No. USACO allows unlimited resubmissions, and only the last submission counts. So submitting a partially correct solution first, then optimizing, is a smart strategy.

Q5: When should you give up on a problem and move to the next?

A: If you've been stuck on a problem for 40+ minutes with no new ideas, consider moving to the next. But before switching, submit your current code to get partial credit. Come back if you have time at the end.

🔗 Connections to Other Chapters

  • Chapters 2.1-2.3 (Part 2) cover all C++ knowledge needed for Bronze
  • Chapters 3.1-3.11 (Part 3) cover core data structures and algorithms for Silver
  • Chapters 5.1-5.4 (Part 5) cover graph theory at the Silver/Gold boundary
  • Chapters 4.1-4.2, 6.1-6.3 (Parts 4, 6) cover greedy and DP for Silver/Gold
  • Chapter 7.2 continues this chapter with deeper problem-solving strategies and thinking methods
  • Chapter 7.3 gives a full deep dive into ad hoc problems — the 10–15% of Bronze problems that require creative observation rather than standard algorithms

7.1.8 Complete Bronze Problem Taxonomy

Bronze problems fall into these 10 categories. Knowing the taxonomy helps you recognize patterns instantly.

| # | Category | Description | Key Approach | Example |
|---|---|---|---|---|
| 1 | Simulation | Follow given rules step by step | Implement carefully, use arrays | "Simulate N cows moving" |
| 2 | Counting / Iteration | Count elements satisfying a condition | Nested loops, prefix sums | "Count pairs with sum K" |
| 3 | Sorting + Scan | Sort, then scan with a simple check | std::sort + linear scan | "Find median, find closest pair" |
| 4 | Grid / 2D array | Process cells in a 2D grid | Index carefully, BFS/DFS | "Count connected regions" |
| 5 | String processing | Manipulate character sequences | String indexing, maps | "Find most frequent substring" |
| 6 | Brute Force Search | Try all possibilities | Nested loops over small N | "Try all subsets of ≤ 20 items" |
| 7 | Geometry (integer) | Points, rectangles on a grid | Integer arithmetic, no floats | "Area of overlapping rectangles" |
| 8 | Math / Modular | Number theory, patterns | Modular arithmetic, formulas | "Nth element of sequence" |
| 9 | Data Structure | Use the right container | Map, set, priority queue | "Who arrives first?" |
| 10 | Ad Hoc / Observation | Clever insight, no standard algo | Read carefully, find pattern | "Unique USACO-flavored problems" — see Chapter 7.3 for deep dive |

Bronze Category Breakdown (estimated frequency):

Simulation:         ████████████ ~30%
Counting/Loops:     ████████     ~20%
Sorting+Scan:       ██████       ~15%
Grid/2D:            █████        ~12%
Ad Hoc:             █████        ~12%
Other:              ████         ~11%

7.1.9 Silver Problem Taxonomy

Silver problems require more sophisticated algorithms. Here are the main categories:

| Category | Key Algorithms | N Constraint | Time Needed |
|---|---|---|---|
| Sorting + Greedy | Sort + sweep, interval scheduling | N ≤ 10^5 | O(N log N) |
| Binary Search | BS on answer, parametric search | N ≤ 10^5 | O(N log N) or O(N log² N) |
| BFS/DFS | Shortest path, components, flood fill | N ≤ 10^5 | O(N + M) |
| Prefix Sums | 1D/2D range queries, difference arrays | N ≤ 10^5 | O(N) |
| Basic DP | 1D DP, LIS, knapsack, grid paths | N ≤ 5000 | O(N²) or O(N log N) |
| DSU | Dynamic connectivity, Kruskal's MST | N ≤ 10^5 | O(N α(N)) |
| Graph + DP | DP on trees, DAG paths | N ≤ 10^5 | O(N) or O(N log N) |

Time Complexity Limits for USACO

This is crucial: USACO problems have tight time limits (typically 2–4 seconds). Use this table to determine the required algorithm complexity.

| N (input size) | Required Complexity | Allowed Algorithms |
|---|---|---|
| N ≤ 10 | O(N!) | Permutation brute force |
| N ≤ 20 | O(2^N × N) | Bitmask DP, full search |
| N ≤ 100 | O(N³) | Floyd-Warshall, interval DP |
| N ≤ 1,000 | O(N²) | Standard DP, pairwise |
| N ≤ 10,000 | O(N²) with small constants | Optimized O(N²) sometimes OK |
| N ≤ 100,000 | O(N log N) | Sort, BFS, binary search, DSU |
| N ≤ 1,000,000 | O(N) | Linear algorithms, prefix sums |
| N ≤ 10^9 | O(log N) | Binary search, math formulas |

⚠️ Rule of thumb: ~10^8 simple operations per second. With N=10^5, O(N²) = 10^10 operations → TLE. You need O(N log N) or better.


7.1.10 How to Upsolve — When You're Stuck

"Upsolving" means solving a problem you couldn't solve during the contest, after looking at hints or the editorial. It's the most important skill for improving at USACO.

Step-by-Step Upsolving Process

Step 1: Struggle first (30–60 min)

  • Don't look at the editorial immediately. Struggling builds intuition.
  • Try small examples (N=2, N=3). What's the pattern?
  • Think: "What algorithm does this smell like?"

Step 2: Get a hint, not the solution

  • Look at just the first line of the editorial: "This is a BFS problem" or "Sort first."
  • Try again with just that hint.

Step 3: Read the full editorial

  • Read slowly. Understand why the algorithm works, not just what it does.
  • Ask yourself: "What insight am I missing? Why didn't I think of this?"

Step 4: Implement from scratch

  • Don't copy the editorial's code. Write it yourself.
  • This is where real learning happens.

Step 5: Identify your gap

  • Was the issue recognizing the algorithm type? → Study more problem patterns.
  • Was the issue implementation? → Practice coding faster, learn STL better.
  • Was the issue the observation/insight? → Practice thinking about properties and invariants.

Common Reasons People Get Stuck

| Reason | Fix |
|---|---|
| Don't recognize the algorithm | Study more patterns; classify every problem you solve |
| Know the algorithm but can't implement it | Code templates from memory daily |
| Algorithm is correct but wrong answer | Check edge cases: N=1, all same values, empty input |
| Algorithm is correct but TLE | Review complexity; look for unnecessary O(N) loops inside O(N) loops |
| Panicked during contest | Practice under timed conditions |

The "Algorithm Recognition" Mental Checklist

When reading a USACO problem, ask yourself:

1. What's N? (N≤20 → bitmask; N≤10^5 → O(N log N))
2. Is there a graph/grid? → BFS/DFS
3. Is there a "minimum/maximum subject to constraint"? → Binary search on answer
4. Can the problem be modeled as: "best subsequence"? → DP
5. "Minimize max" or "maximize min"? → Binary search or greedy
6. "Connect/disconnect" queries? → DSU
7. "Range queries"? → Prefix sums or segment tree
8. Seems combinatorial with small N? → Try all cases (bitmask or permutations)

7.1.11 USACO Patterns Cheat Sheet

| Pattern | Recognition Keywords | Algorithm | Example Problem |
|---|---|---|---|
| Shortest path on grid | "minimum steps", "maze", "BFS" | BFS | Maze navigation |
| Nearest X to each cell | "closest fire", "distance to nearest" | Multi-source BFS | Fire spreading |
| Sort + scan | "close together", "largest gap" | Sort, adjacent pairs | Closest pair of cows |
| Binary search on answer | "maximize minimum distance", "minimize maximum" | BS + check | Aggressive Cows |
| Sliding window | "subarray sum", "contiguous", "window" | Two pointers | Max sum subarray of size K |
| Connected components | "regions", "islands", "groups" | DFS/BFS flood fill | Count farm regions |
| Dynamic connectivity | "union groups", "add connections" | DSU | Fence connectivity |
| Minimum spanning tree | "connect cheapest", "road network" | Kruskal's | Farm cable network |
| Counting pairs | "how many pairs satisfy" | Sort + two pointers or BS | Pairs with sum |
| 1D DP | "optimal sequence of decisions" | DP array | Coin change, LIS |
| Grid DP | "paths in grid", "rectangular regions" | 2D DP | Grid path max sum |
| Activity selection | "maximum non-overlapping events" | Sort by end time, greedy | Job scheduling |
| Prefix sum range query | "sum of range [l,r]", "2D rectangle sum" | Prefix sum | Range sum queries |
| Topological order | "prerequisites", "dependency order" | Topo sort | Course prerequisites |
| Bipartite check | "2-colorable", "odd cycle?" | BFS 2-coloring | Team division |

7.1.12 Contest Strategy Refined

The First 5 Minutes Are Critical

Before writing a single line of code:

  1. Read all 3 problems (titles and constraints first)
  2. Estimate difficulty: Which is easiest? (Usually problem 1 at Bronze/Silver)
  3. Note key constraints: N ≤ ?, time limit, special conditions
  4. Mentally classify each problem using the taxonomy above

Partial Credit Strategy

Even if you can't solve a problem fully, earn partial credit:

Bronze (N ≤ ~1000 usually):
  - Brute force O(N²) or O(N³) often passes several test cases
  - "Solve small cases" approach: N ≤ 20 → brute force

Silver (N ≤ 10^5 usually):
  - O(N²) solution often passes 4-6/15 test cases (partial credit!)
  - Implement the brute force FIRST, then optimize
  
Always:
  - Make sure your code compiles and runs (no runtime errors)
  - Output something for every test case, even if wrong
  - A wrong answer beats a crash

Debugging Checklist

Before submitting:

  • Correct output for all given examples?
  • Edge case: N=1?
  • Integer overflow? (use long long when values > 10^9)
  • Array out of bounds? (size arrays carefully)
  • Off-by-one in loops?
  • Using "\n" not endl?
  • Reading correct number of test cases?
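The overflow item above deserves special care: the most common form of the bug is that the multiplication itself happens in 32-bit `int` before the result is ever stored in a `long long`. A minimal illustration:

```cpp
// Squaring a large int: the multiplication must happen in 64 bits.
long long square_ok(int n) {
    return 1LL * n * n;  // 1LL promotes the whole expression to long long BEFORE multiplying
    // return (long long)(n * n);  // BUG: n * n overflows in 32-bit int first
}
```

For n = 100000, `n * n` is 10^10, far beyond `int`'s limit of about 2.1 × 10^9; multiplying by `1LL` first avoids the overflow.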
📖 Chapter 7.2 ⏱️ ~45 min read 🎯 All Levels

Chapter 7.2: Problem-Solving Strategies

Knowing algorithms is necessary but not sufficient. You also need to know how to think when facing a problem you've never seen before. This chapter teaches you a systematic approach.


7.2.1 How to Read a Competitive Programming Problem

USACO problems follow a consistent structure. Learn to parse it efficiently.

Problem Structure

  1. Story/Setup — a theme (usually cows 🐄). Mostly flavor text — don't get distracted.
  2. Task/Objective — the actual question. Read this very carefully.
  3. Input format — how to read the data.
  4. Output format — exactly what to print.
  5. Sample input/output — the examples.
  6. Constraints — the most important section for algorithm choice.

Reading Discipline

Step 1: Read the task/objective first, then the input/output format.

Step 2: Read the constraints. These tell you:

  • N ≤ 20 → maybe O(2^N) or O(N!)
  • N ≤ 1000 → probably O(N²) or O(N² log N)
  • N ≤ 10^5 → must be O(N log N) or O(N)
  • N ≤ 10^6 → must be O(N) or O(N log N)
  • Values up to 10^9 → might need long long
  • Values up to 10^18 → definitely long long

Step 3: Work through the sample manually. Verify your understanding.

Step 4: Look for hidden constraints. "All values are distinct." "The graph is a tree." "N is even." These often unlock simpler solutions.


7.2.2 Identifying the Algorithm Type

After reading the problem, ask yourself these questions in order:

Visual: Problem-Solving Flowchart

Problem Solving Flow

The flowchart above captures the complete contest workflow. The key step is mapping input constraints to algorithm complexity — use the complexity table below to make that decision quickly.

Visual: Complexity vs Input Size

Complexity Table

This reference table tells you immediately whether your chosen algorithm will pass. If N = 10⁵ and you have an O(N²) solution, it will TLE. This table should be your first mental check when designing an approach.

Question 1: Can I brute force it?

  • If N ≤ 15, brute force all subsets: O(2^N)
  • If N ≤ 8, try all permutations: O(N!)
  • Even if brute force is too slow for full credit, it's good for partial credit and for verifying your correct solution

Question 2: Does it involve a grid or graph?

  • Grid with shortest path question → BFS
  • Grid/graph with connectivity → DFS or Union-Find
  • Graph with weighted edges, shortest path → Dijkstra (Gold topic)
  • Tree structure → Tree DP or LCA

Question 3: Does it involve sorted data?

  • Finding closest elements → Sort + adjacent scan
  • Range queries → Binary search or prefix sums
  • "Can we achieve value X?" type question → Binary search on answer
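The "binary search on answer" pattern mentioned above can be sketched as a reusable helper (a sketch: `check` stands for whatever feasibility test the problem defines, and it must be monotone):

```cpp
#include <functional>

// Largest x in [lo, hi] with check(x) true, assuming check is monotone:
// true, true, ..., true, false, ..., false. Requires check(lo) to be true.
long long bs_on_answer(long long lo, long long hi,
                       const std::function<bool(long long)>& check) {
    while (lo < hi) {
        long long mid = lo + (hi - lo + 1) / 2;  // round up so lo = mid always progresses
        if (check(mid)) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}
```

For example, `bs_on_answer(0, 100, [](long long x){ return x * x <= 50; })` finds the largest x with x² ≤ 50.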

Question 4: Does it involve optimal decisions over a sequence?

  • "Maximum/minimum cost path" → DP
  • "Maximum number of non-overlapping intervals" → Greedy
  • "Minimum operations to transform X to Y" → BFS (if small state space) or DP

Question 5: Does it involve counting?

  • Counting subsets → Bitmask DP (if small N) or combinatorics
  • Counting paths in a DAG → DP
  • Frequency of elements → Hash map

The Algorithm Decision Tree

Is N ≤ 20?
├── YES → Try brute force (O(2^N) or O(N!))
└── NO
    Is it a graph/grid problem?
    ├── YES
    │   Is it about shortest path?
    │   ├── YES (unweighted) → BFS
    │   ├── YES (weighted) → Dijkstra (Gold)
    │   └── NO (connectivity) → DFS / Union-Find
    └── NO
        Does sorting help?
        ├── YES → Sort + scan / binary search
        └── NO
            Does it have "overlapping subproblems"?
            ├── YES → Dynamic Programming
            └── NO → Greedy / simulation

7.2.3 Testing with Examples

Always Test the Given Examples First

Before submitting, verify your solution produces exactly the right output for all provided examples.

# Compile
g++ -o sol solution.cpp -std=c++17

# Test with sample input
echo "5
3 1 4 1 5" | ./sol

# Or from file
./sol < sample.in

Create Your Own Test Cases

The provided examples are easy. Create:

  1. Minimum case: N=1, N=0, empty input
  2. Maximum case: N at max constraint, all values at max
  3. All same values: N elements all equal
  4. Already sorted / reverse sorted
  5. Special structures: Complete graph, path graph, star graph (for graph problems)

Stress Testing

Write a brute-force solution for small N, then compare against your optimized solution on random inputs:

// brute.cpp — simple O(N^3) solution
// sol.cpp — your O(N log N) solution

// stress_test.sh:
for i in {1..1000}; do
    # Generate random test
    python3 gen.py > test.in
    # Run both solutions
    ./brute < test.in > expected.out
    ./sol < test.in > got.out
    # Compare
    if ! diff -q expected.out got.out > /dev/null; then
        echo "MISMATCH on test $i"
        cat test.in
        break
    fi
done
echo "All tests passed!"

Stress testing catches subtle bugs that sample cases miss.


7.2.4 Debugging Tips for C++

Strategy 1: Print Everything

When something's wrong, add cerr statements to trace your program's execution. cerr goes to standard error (separate from standard output):

cerr << "At node " << u << ", dist = " << dist[u] << "\n";
cerr << "Array state: ";
for (int x : arr) cerr << x << " ";
cerr << "\n";

Why cerr not cout? cout goes to standard output where the judge checks your answer. cerr goes to standard error, which the judge usually ignores. So your debug output doesn't pollute your answer.

Strategy 2: Use assert for Invariants

assert(n >= 1 && n <= 100000);   // crashes with a message if condition fails
assert(dist[v] >= 0);            // check BFS invariant

Strategy 3: Check Array Bounds

Common out-of-bounds patterns:

int arr[100];
arr[100] = 5;   // Bug! Valid indices are 0-99

// Use this to detect bounds issues while debugging:
// Compile with -fsanitize=address (AddressSanitizer)
// g++ -fsanitize=address,undefined -o sol sol.cpp

Strategy 4: Rubber Duck Debugging

Explain your code line by line, out loud or in writing. The act of explaining forces you to notice inconsistencies. Many bugs are found this way — not by staring at the screen, but by articulating what each line is supposed to do.

Strategy 5: Reduce the Problem

If your code fails on a large input, manually create the smallest input that still fails. Fix that. Repeat.

Strategy 6: Read Compiler Warnings

g++ -Wall -Wextra -o sol sol.cpp

The -Wall -Wextra flags enable all warnings. Read them! Uninitialized variables, unused variables, signed/unsigned mismatches — all common USACO bugs.


7.2.5 USACO-Specific Debugging

Check Your I/O

The #1 cause of Wrong Answer on correct algorithms: wrong input/output format.

  • Did you read the right number of values?
  • Are you printing the right number of lines?
  • Is there a trailing space or missing newline?

Test Timing

To check if your solution is fast enough:

time ./sol < large_input.in

USACO typically allows 2–4 seconds. If your solution takes 10 seconds locally, it'll time out.

Estimate Complexity First

Before coding, calculate: "My algorithm is O(N²). N = 10^5. That's 10^10 operations. Way too slow."

Rough guide for what runs in 1 second with C++:

  • 10^8 simple operations
  • 10^7 complex operations (like map lookups)
  • 10^5 × 10^3 = 10^8 for nested loops with simple body

7.2.6 From Bronze to Silver Checklist

Use this checklist to evaluate your readiness for Silver:

Algorithms to Know

  • Prefix sums (1D and 2D)
  • Binary search (including on the answer)
  • BFS and DFS on graphs and grids
  • Union-Find (DSU)
  • Sorting with custom comparators
  • Basic DP (1D DP, 2D DP, knapsack)
  • STL: map, set, priority_queue, vector, sort

Problem-Solving Skills

  • Can identify whether a problem needs BFS vs. DFS vs. DP vs. Greedy
  • Can implement BFS from scratch in 10 minutes
  • Can implement DSU from scratch in 5 minutes
  • Can model grid problems as graphs
  • Knows how to binary search on the answer
  • Comfortable with 2D arrays and grid traversal

Contest Skills

  • Can write a clean template with fast I/O in 30 seconds
  • Never forget long long when needed
  • Always test with sample cases before submitting
  • Can read and understand constraints quickly
  • Has practiced at least 20 Bronze problems
  • Has solved at least 5 Silver problems (even with hints)

Practice Plan

  1. Solve all easily available USACO Bronze problems (2016–2024)
  2. For each problem you can't solve in 2 hours: read editorial, implement from scratch
  3. After solving 30+ Bronze problems, attempt Silver: start with 2016–2018 Silver
  4. Keep a problem log: problem name, techniques used, key insight

7.2.7 Resources

Official

  • USACO website: usaco.org — contest archive, editorials
  • USACO training: train.usaco.org — old but good structured curriculum

Unofficial

  • USACO Guide: usaco.guide — excellent community-written guide, highly recommended
  • Codeforces: codeforces.com — more problems and contests
  • AtCoder: atcoder.jp — high-quality educational problems

Books

  • Competitive Programmer's Handbook by Antti Laaksonen — free PDF, excellent
  • Introduction to Algorithms (CLRS) — the bible for theory (heavy reading)

Chapter Summary

📌 Key Takeaways

| Skill | Practice Until... |
|---|---|
| Reading | Understand the problem within 3 minutes |
| Algorithm ID | Guess the right approach 70%+ of the time |
| Implementation | Finish standard problems in ≤30 minutes |
| Debugging | Locate and fix bugs within 30 minutes |
| Testing | Develop the habit of testing edge cases before submitting |

🧩 "Problem-Solving Mindset" Quick Checklist

| Step | Question to Ask Yourself |
|---|---|
| 1. Check N range | N ≤ 20 → brute force/bitmask; N ≤ 10^5 → O(N log N) |
| 2. Graph/grid? | Yes → BFS/DFS/DSU |
| 3. Optimize a value? | "maximize minimum" or "minimize maximum" → binary search on answer |
| 4. Overlapping subproblems? | Yes → DP |
| 5. Sort then greedy? | Yes → Greedy |
| 6. Range queries? | Yes → prefix sum / segment tree |

❓ FAQ

Q1: What to do when you encounter a completely unfamiliar problem type?

A: ① First write a brute force for small data to get partial credit; ② Draw diagrams, manually compute small examples to find patterns; ③ Try simplifying the problem (if 2D, think about the 1D version first); ④ If still stuck, move to the next problem and come back later.

Q2: How to improve "problem recognition" ability?

A: Deliberate categorized practice. After each problem, record its "tags" (BFS, DP, greedy, binary search, etc.). After enough practice, you'll immediately associate similar constraints and keywords with the right algorithm. The Pattern Cheat Sheet in Chapter 7.1 of this book is a good starting point.

Q3: In a contest, should you write brute force first or go straight to the optimal solution?

A: Write brute force first. Brute force code usually takes only 5 minutes and serves three purposes: ① gets partial credit; ② helps you understand the problem; ③ can be used for stress testing to verify the optimal solution. Even if you're confident in your solution, it's recommended to write brute force first.

Q4: How to use stress testing for efficient debugging?

A: Write three programs: brute.cpp (correct brute force), sol.cpp (your optimized solution), gen.cpp (random data generator). Run them in a loop and compare outputs. When a discrepancy is found, that small test case is your debugging clue. This is the most powerful debugging technique in competitive programming.

🔗 Connections to Other Chapters

  • The algorithm decision tree in this chapter covers the core algorithms from all chapters in this book
  • Chapter 7.1 covers USACO contest rules and problem categories; this chapter covers "how to solve problems"
  • The Bronze-to-Silver Checklist summarizes all knowledge points from Chapters 2.1–6.3
  • The Stress Testing technique in this chapter can be applied to Practice Problems in all chapters

The journey from Bronze to Silver is about volume of practice combined with deliberate reflection. After each problem you solve — or fail to solve — ask: "What was the key insight? How do I recognize this type faster next time?"

Good luck, and enjoy the cows. 🐄

📖 Chapter 7.3 ⏱️ ~50 min read 🎯 Bronze → Silver

Chapter 7.3: Ad Hoc Problems

"Ad hoc" is Latin for "for this purpose." An ad hoc problem has no standard algorithm — you must invent a solution specifically for that problem.

Ad hoc problems are the most creative and often the most frustrating category in competitive programming. They don't fit neatly into "BFS" or "DP" or "greedy." Instead, they require you to observe a key property of the problem and exploit it directly.

At USACO Bronze, roughly 10–15% of problems are ad hoc. At Silver, they appear less frequently but are often the hardest problem on the set. Learning to recognize and solve them is a crucial skill.


7.3.1 What Is an Ad Hoc Problem?

Definition

An ad hoc problem is one where:

  • No standard algorithm (BFS, DP, greedy, etc.) directly applies
  • The solution relies on a clever observation or mathematical insight specific to the problem
  • Once you see the key insight, the implementation is usually simple

How to Recognize Ad Hoc Problems

When reading a problem, if you ask yourself "What algorithm is this?" and the answer is "...none of the above," it's probably ad hoc.

Common signals:

  • The problem involves a small, specific structure (e.g., a 3×3 grid, a sequence of length ≤ 10)
  • The problem asks about a property that seems hard to compute directly
  • The constraints are unusual (e.g., N ≤ 50, or values are very small)
  • The problem has a "trick" that makes it much simpler than it looks
  • The problem involves simulation but with a hidden shortcut

Ad Hoc vs. Other Categories

| Category | Key Feature | Example |
|---|---|---|
| Simulation | Follow rules step by step; no shortcut needed | "Simulate N cows moving for T steps" |
| Greedy | Local optimal choice leads to global optimum | "Schedule jobs to minimize lateness" |
| DP | Overlapping subproblems, optimal substructure | "Minimum coins to make change" |
| Ad Hoc | Clever observation eliminates brute force | "Find the pattern; implement it directly" |

💡 Key distinction: Simulation problems are also "ad hoc" in spirit, but they're straightforward to implement once understood. True ad hoc problems require an insight that isn't obvious from the problem statement.


7.3.2 The Ad Hoc Mindset

Solving ad hoc problems requires a different mental approach than algorithmic problems.

Step 1: Understand the Problem Deeply

Don't rush to code. Spend 5–10 minutes just thinking about the problem:

  • What is the problem really asking?
  • What makes this problem hard?
  • What would make it easy?

Step 2: Try Small Cases

Work through examples with N = 2, 3, 4 by hand. Look for patterns:

  • Does the answer follow a formula?
  • Is there a symmetry or invariant?
  • Can you reduce the problem to a simpler form?

Step 3: Look for Invariants

An invariant is a property that doesn't change as the problem evolves. Finding invariants often unlocks ad hoc solutions.

Example: In a problem where you can swap adjacent elements, the parity of the number of inversions is an invariant. If the initial and target configurations have different parities, the answer is "impossible."

Step 4: Consider the Extremes

  • What happens when all values are equal?
  • What happens when N = 1?
  • What happens when all values are at their maximum?

Extreme cases often reveal the structure of the solution.

Step 5: Think About What You're Really Computing

Sometimes the problem description obscures a simpler underlying computation. Ask: "Is there a formula for this?"


7.3.3 Ad Hoc Problem Categories

Ad hoc problems at USACO Bronze/Silver fall into several recurring patterns:

Category 1: Observation / Pattern Finding

The key is to find a mathematical pattern or formula.

Typical structure: Given some sequence or structure, find a property that can be computed directly.

Example problem: You have N cows in a circle. Each cow either faces left or right. A cow is "happy" if it faces the same direction as both its neighbors. How many cows are happy?

Brute force: Check each cow's neighbors — O(N). This is already optimal, but the insight is recognizing that you just need to count "same-same-same" triples.
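A sketch of that count (assuming the directions are given as a string of 'L'/'R' characters, with the circle closing between the last and first cow):

```cpp
#include <string>

// Cow i is "happy" if it faces the same way as both circular neighbors.
int count_happy(const std::string& s) {
    int n = s.size(), happy = 0;
    for (int i = 0; i < n; i++) {
        char prev = s[(i + n - 1) % n];  // wrap around the circle
        char next = s[(i + 1) % n];
        if (s[i] == prev && s[i] == next) happy++;
    }
    return happy;
}
```

The `% n` indexing is the standard trick for circular neighbors.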


Category 2: Simulation with a Shortcut

The problem looks like a simulation, but the naive simulation is too slow. There's a mathematical shortcut.

Typical structure: "Repeat this operation T times" where T is huge (up to 10^9).

Key insight: The state space is finite, so the sequence must eventually cycle. Find the cycle length, then use modular arithmetic.

Example:

// Naive: simulate T steps — O(T), too slow if T = 10^9
// Smart: find cycle length C, then simulate T % C steps — O(C)

// next_state() and answer() are problem-specific helpers (not shown here)
int simulate(vector<int> state, int T) {
    map<vector<int>, int> seen;
    int step = 0;
    while (step < T) {
        if (seen.count(state)) {
            int cycle_start = seen[state];
            int cycle_len = step - cycle_start;
            int remaining = (T - step) % cycle_len;
            // simulate 'remaining' more steps
            for (int i = 0; i < remaining; i++) {
                state = next_state(state);
            }
            return answer(state);
        }
        seen[state] = step;
        state = next_state(state);
        step++;
    }
    return answer(state);
}

Category 3: Constructive / Build the Answer

Instead of searching for the answer, construct it directly.

Typical structure: "Find any configuration satisfying these constraints" or "Is it possible to achieve X?"

Key insight: Think about what constraints must be satisfied, then build a solution that satisfies them.

Example: Given N and K, construct a permutation of 1..N in which every two adjacent elements differ by at least K.

Insight: Interleave the upper half with the lower half: with m = ⌊N/2⌋, write m+1, 1, m+2, 2, ... Adjacent differences are then m or m+1, so this works whenever K ≤ m.


Category 4: Invariant / Impossibility

Prove that something is impossible by finding an invariant that the target state violates.

Typical structure: "Can you transform state A into state B using these operations?"

Key insight: Find a quantity that is preserved (or changes in a predictable way) under each operation. If A and B have different values of this quantity, transformation is impossible.

Classic example: The 15-puzzle (sliding tiles). The solvability depends on the parity of the permutation combined with the blank tile's position.


Category 5: Greedy Observation

The problem looks like it needs DP, but a simple greedy observation makes it trivial.

Typical structure: Optimization problem where the greedy choice is non-obvious.

Example: You have N items with values v[i]. You can take at most K items. Maximize total value.

Obvious greedy: Sort by value descending, take top K. (This is trivial once you see it, but the problem might be disguised.)
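The greedy, written out (a sketch):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Take at most k items: sort descending, sum the top k values.
long long best_k(std::vector<int> v, int k) {
    std::sort(v.rbegin(), v.rend());          // largest first
    k = std::min<int>(k, v.size());
    return std::accumulate(v.begin(), v.begin() + k, 0LL);  // 0LL avoids int overflow
}
```

The `0LL` initial value matters: it forces the accumulation into `long long`.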


Category 6: Geometry / Grid Observation

Problems on grids or with geometric constraints often have elegant observations.

Typical structure: Count something on a grid, or determine if a configuration is reachable.

Key insight: Often involves parity (checkerboard coloring), symmetry, or a clever coordinate transformation.


7.3.4 Worked Examples

Example 1: The Fence Painting Problem

Problem: Farmer John has a fence of length N. He paints it with two colors: red (positions a to b) and blue (positions c to d). What fraction of the fence is painted?

Naive approach: Use an array of size N, mark painted positions, count. O(N).

Ad hoc insight: The painted region is the union of two intervals. Use inclusion-exclusion:

  • Painted = |[a,b]| + |[c,d]| - |[a,b] ∩ [c,d]|
  • Intersection of [a,b] and [c,d] = [max(a,c), min(b,d)] if max(a,c) ≤ min(b,d), else 0
#include <bits/stdc++.h>
using namespace std;

int main() {
    int a, b, c, d;
    cin >> a >> b >> c >> d;
    
    int red = b - a;
    int blue = d - c;
    
    // Intersection
    int inter_start = max(a, c);
    int inter_end = min(b, d);
    int overlap = max(0, inter_end - inter_start);
    
    cout << red + blue - overlap << "\n";
    return 0;
}

Why this is ad hoc: The key insight (inclusion-exclusion on intervals) isn't a "standard algorithm" — it's a direct observation about the structure of the problem.


Example 2: Cow Lineup

Problem: N cows stand in a line. Each cow has a breed (integer 1 to K). Find the shortest contiguous subarray that contains at least one cow of every breed that appears in the array.

This looks like: Sliding window (Chapter 3.4). But wait — what if K is very large and most breeds appear only once?

Ad hoc insight: If a breed appears only once, the subarray must include that cow. So the answer must span from the leftmost "unique" cow to the rightmost "unique" cow. Then check if this span already contains all breeds.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    map<int, int> cnt;
    for (int i = 0; i < n; i++) {
        cin >> a[i];
        cnt[a[i]]++;
    }
    
    // Find breeds that appear exactly once
    set<int> unique_breeds;
    for (auto& [breed, c] : cnt) {
        if (c == 1) unique_breeds.insert(breed);
    }
    
    if (unique_breeds.empty()) {
        // Use sliding window for the general case
        // ... (standard two-pointer approach)
    } else {
        // Must include all unique-breed cows
        int lo = n, hi = -1;
        for (int i = 0; i < n; i++) {
            if (unique_breeds.count(a[i])) {
                lo = min(lo, i);
                hi = max(hi, i);
            }
        }
        // Check if [lo, hi] contains all breeds
        // ...
    }
    return 0;
}

Example 3: Cycle Detection in Simulation

Problem: A sequence of N numbers undergoes a transformation: each number is replaced by the sum of its digits. Starting from value X, how many steps until you reach a single-digit number? (N up to 10^18)

Naive approach: Simulate step by step. But what if it takes millions of steps?

Ad hoc insight: The sum of digits of a number ≤ 10^18 is at most 9×18 = 162. After one step, the value is ≤ 162. After two steps, it's ≤ 9+9 = 18. After three steps, it's a single digit. So the answer is at most 3 steps for any starting value!

#include <bits/stdc++.h>
using namespace std;

long long digit_sum(long long x) {
    long long s = 0;
    while (x > 0) { s += x % 10; x /= 10; }
    return s;
}

int main() {
    long long x;
    cin >> x;
    int steps = 0;
    while (x >= 10) {
        x = digit_sum(x);
        steps++;
    }
    cout << steps << "\n";
    return 0;
}

The insight: Recognizing that the value shrinks so rapidly that brute force is actually fast.


Example 4: Grid Coloring Invariant

Problem: You have an N×M grid. You can flip any 2×2 square (toggle all 4 cells between 0 and 1). Starting from all zeros, can you reach a target configuration?

Ad hoc insight: Consider the "checkerboard parity." Color the grid like a checkerboard (black/white). Each 2×2 flip toggles exactly 2 black and 2 white cells. Therefore, the number of black cells that are 1 and the number of white cells that are 1 always have the same parity (both start at 0, both change by ±2 or 0 with each flip).

If the target has an odd number of black 1-cells or an odd number of white 1-cells, it's impossible.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, m;
    cin >> n >> m;
    vector<string> grid(n);
    for (auto& row : grid) cin >> row;
    
    int black_ones = 0, white_ones = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            if (grid[i][j] == '1') {
                if ((i + j) % 2 == 0) black_ones++;
                else white_ones++;
            }
        }
    }
    
    // Both must be even for the configuration to be reachable
    if (black_ones % 2 == 0 && white_ones % 2 == 0) {
        cout << "YES\n";
    } else {
        cout << "NO\n";
    }
    return 0;
}

7.3.5 Common Ad Hoc Techniques

Technique 1: Parity Arguments

Many impossibility results come from parity. If an operation always changes some quantity by an even amount, then the parity of that quantity is an invariant.

When to use: "Can you transform A into B?" problems.

How to apply:

  1. Identify what each operation does to some quantity Q
  2. If every operation changes Q by an even amount, then Q mod 2 is invariant
  3. If A and B have different Q mod 2, the answer is "impossible"
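For the adjacent-swap example mentioned earlier in this chapter, the invariant quantity is the parity of the inversion count, which you can compute directly (a brute-force O(N²) sketch, fine for small N):

```cpp
#include <vector>

// Parity (0 or 1) of the number of inversions. Each adjacent swap changes the
// inversion count by exactly ±1, so it flips this parity — two configurations
// whose parities differ cannot be linked by an even number of adjacent swaps.
int inversion_parity(const std::vector<int>& a) {
    int n = a.size(), inv = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (a[i] > a[j]) inv ^= 1;  // track parity only
    return inv;
}
```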

Technique 2: Pigeonhole Principle

If you have N+1 items in N categories, at least one category has ≥ 2 items.

When to use: "Prove that something must exist" or "find a guaranteed collision."

Example: In any sequence of N²+1 numbers, there exists either an increasing subsequence of length N+1 or a decreasing subsequence of length N+1 (Erdős–Szekeres theorem).
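A pigeonhole guarantee you can exploit directly in code: among more than n values, two must share a remainder mod n. A sketch of finding such a pair:

```cpp
#include <map>
#include <utility>
#include <vector>

// First pair of indices (i, j), i < j, with v[i] ≡ v[j] (mod n).
// If v has more than n elements, pigeonhole guarantees such a pair exists.
std::pair<int, int> collision_mod(const std::vector<long long>& v, int n) {
    std::map<long long, int> first;              // remainder -> first index seen
    for (int i = 0; i < (int)v.size(); i++) {
        long long r = ((v[i] % n) + n) % n;      // normalize negatives
        auto it = first.find(r);
        if (it != first.end()) return {it->second, i};
        first[r] = i;
    }
    return {-1, -1};                             // possible only when v.size() <= n
}
```

This pattern shows up in problems like "some contiguous subarray has sum divisible by N" (apply it to prefix sums).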


Technique 3: Coordinate Compression

When values are large but the number of distinct values is small, map values to indices 0, 1, 2, ...

vector<int> vals = {1000000, 3, 999, 42, 1000000};
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
// vals is now {3, 42, 999, 1000000}

// Map original value to compressed index:
auto compress = [&](int x) {
    return lower_bound(vals.begin(), vals.end(), x) - vals.begin();
};
// compress(1000000) = 3, compress(3) = 0, etc.

Technique 4: Symmetry Reduction

If the problem has symmetry, you only need to consider one representative from each equivalence class.

Example: If the problem is symmetric under rotation, you can fix one element's position and only consider the remaining (N-1)! arrangements instead of N!.


Technique 5: Think Backwards

Sometimes it's easier to work backwards from the target state to the initial state.

Example: "What's the minimum number of operations to reach state B from state A?" might be easier as "What's the minimum number of reverse-operations to reach state A from state B?"


Technique 6: Reformulate the Problem

Restate the problem in a different form that reveals structure.

Example: "Find the maximum number of non-overlapping intervals" can be reformulated as "find the minimum number of points that 'stab' all intervals" (they're equivalent by LP duality — but you don't need to know that; just recognize the reformulation).


7.3.6 USACO Bronze Ad Hoc Examples

Here are patterns from actual USACO Bronze problems (paraphrased):

Pattern: "Minimum operations to sort"

Problem type: Given a sequence, find the minimum number of swaps/moves to sort it.

Key insight: Often the answer is N minus the length of the longest already-sorted subsequence, or related to the number of cycles in the permutation.

Cycle decomposition approach:

// For sorting a permutation with minimum swaps:
// Answer = N - (number of cycles in the permutation)
vector<int> perm = {3, 1, 4, 2};  // 1-indexed values
int n = perm.size();
vector<bool> visited(n, false);
int cycles = 0;
for (int i = 0; i < n; i++) {
    if (!visited[i]) {
        cycles++;
        int j = i;
        while (!visited[j]) {
            visited[j] = true;
            j = perm[j] - 1;  // follow the permutation (0-indexed)
        }
    }
}
cout << n - cycles << "\n";  // minimum swaps

Pattern: "Reachability with constraints"

Problem type: Can you reach position B from position A, given movement rules?

Key insight: Often reduces to a parity or modular arithmetic condition.

Example: On a number line, you can move +3 or -5. Can you reach position T from position 0?

Insight: You can reach any position that is a multiple of gcd(3, 5) = 1, so you can reach any integer. But if the moves were +4 and -6, you could only reach multiples of gcd(4, 6) = 2.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int a, b, target;
    cin >> a >> b >> target;
    // Can reach target using moves +a and -b (or +b and -a)?
    // Equivalent: can we write target = x*a - y*b for non-negative x, y?
    // Key: target must be divisible by gcd(a, b)
    if (target % __gcd(a, b) == 0) {
        cout << "YES\n";
    } else {
        cout << "NO\n";
    }
    return 0;
}

Pattern: "Count valid configurations"

Problem type: Count the number of ways to arrange/assign things satisfying constraints.

Key insight: Often the constraints reduce the count dramatically. Look for what's forced.

Example: N cows, each either black or white. Constraint: no two adjacent cows are the same color. How many valid colorings?

Insight: Once you fix the first cow's color, the entire line is determined, so the answer is 2 for any N ≥ 1. (If the cows stood in a circle instead, odd N would make the constraints contradictory and the answer 0.)


7.3.7 Practice Problems

🟢 Easy

P1. Fence Painting (USACO 2012 November Bronze) Farmer John paints fence posts a to b red, then c to d blue (blue overwrites red). How many posts are painted red? Blue? Both?

💡 Hint

Use an array of size 100 (posts are numbered 1–100). Mark red posts, then mark blue posts (overwriting). Count each color.

Alternatively: red_only = max(0, b-a) - overlap, where overlap = max(0, min(b,d) - max(a,c)).


P2. Digit Sum Steps Starting from integer X (1 ≤ X ≤ 10^9), repeatedly replace X with the sum of its digits until X < 10. How many steps does it take?

💡 Hint

Just simulate! The value drops so fast (sum of digits of a 9-digit number is at most 81) that you'll reach a single digit in at most 3 steps.


P3. Cow Checkerboard (ad hoc grid) An N×N grid (N ≤ 100) is colored like a checkerboard. You can swap any two adjacent cells (horizontally or vertically). Can you transform the initial configuration into the target configuration?

💡 Hint

Adjacent cells have opposite checkerboard colors, so a swap either changes nothing or moves a single '1' between a black cell and a white cell. The total number of 1s is therefore invariant. In fact, since adjacent swaps on a connected grid can realize any rearrangement of values, the transformation is possible if and only if both configurations contain the same number of 1s.


🟡 Medium

P4. Permutation Sorting Given a permutation of 1..N, find the minimum number of adjacent swaps to sort it.

💡 Hint

The minimum number of adjacent swaps equals the number of inversions in the permutation (pairs (i,j) where i < j but perm[i] > perm[j]). Count inversions using merge sort or a Fenwick tree in O(N log N).


P5. Cycle Simulation (USACO-style) A function f maps {1, ..., N} to itself. Starting from position 1, repeatedly apply f. After exactly K steps (K up to 10^18), where are you?

💡 Hint

Starting from 1, the sequence must eventually cycle (since the state space is finite). Find the cycle start and length using Floyd's algorithm or a visited array. Then use modular arithmetic to find the position after K steps.
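A sketch of that hint: record the step at which each node is first visited, then jump into the cycle with modular arithmetic (here f is 0-indexed, i.e., f[i] is the successor of node i):

```cpp
#include <vector>

// Node reached after K applications of f, starting at node s.
int after_k(const std::vector<int>& f, int s, long long K) {
    std::vector<long long> when(f.size(), -1);  // first-visit step of each node
    std::vector<int> order;                     // order[t] = node at step t
    long long step = 0;
    int cur = s;
    while (when[cur] == -1) {
        when[cur] = step++;
        order.push_back(cur);
        cur = f[cur];
    }
    if (K < step) return order[K];              // haven't entered the cycle yet
    long long start = when[cur];                // step where the cycle begins
    long long len = step - start;               // cycle length
    return order[start + (K - start) % len];
}
```

This runs in O(N) regardless of K, so K up to 10^18 is no problem.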


P6. Rectangle Union Area Given M axis-aligned rectangles (M ≤ 100, coordinates ≤ 1000), find the total area covered (counting overlapping regions only once).

💡 Hint

Since coordinates are ≤ 1000, use a 1000×1000 boolean grid. Mark each cell covered by at least one rectangle. Count marked cells. O(M × max_coord²) = O(100 × 10^6) — might be tight; optimize by only iterating over each rectangle's area.


🔴 Hard

P7. Reachability on a Torus (invariant problem) On an N×M grid (with wraparound — a torus), you start at (0,0). Each step, you move either (+a, 0) or (0, +b) (mod N and mod M respectively). Can you reach every cell?

💡 Hint

You can reach cell (x, y) if and only if x is a multiple of gcd(a, N) and y is a multiple of gcd(b, M). You can reach every cell if and only if gcd(a, N) = 1 and gcd(b, M) = 1.


P8. Minimum Swaps to Group N cows stand in a circle. Each cow is either type A or type B. You want all type-A cows to be contiguous. What is the minimum number of swaps (each swap exchanges any two cows) needed?

💡 Hint

Let K = number of type-A cows. Consider all windows of size K in the circular arrangement. For each window, count how many type-B cows are inside (these need to be swapped out). The answer is the minimum over all windows. This is O(N) with a sliding window.


🏆 Challenge

P9. Lights Out (classic ad hoc) You have a 5×5 grid of lights, each on or off. Pressing a light toggles it and all its orthogonal neighbors. Given an initial configuration, find the minimum number of presses to turn all lights off, or report it's impossible.

💡 Hint

Key insight: pressing a light twice is the same as not pressing it. So each light is either pressed 0 or 1 times. There are 2^25 ≈ 33 million possibilities — too many to brute force directly.

Better insight: once you decide the first row's presses (2^5 = 32 possibilities), the rest is forced — in each subsequent row you must press exactly below the lights still on in the row above. Try all 32 first-row configurations and check whether the last row ends up all off.


7.3.8 Ad Hoc in USACO Silver

At Silver level, ad hoc problems are rarer but harder. They often combine an observation with a standard algorithm.

Silver Ad Hoc Patterns

| Pattern | Description | Example |
|---|---|---|
| Observation + BFS | Key insight reduces the state space, then BFS | "Cows can only move to cells of the same color" → BFS on reduced graph |
| Observation + DP | Insight reveals DP structure | "Optimal solution always has this property" → DP with that property |
| Observation + Binary Search | Insight makes the check function simple | "Answer is monotone" → binary search on answer |
| Pure observation | No standard algorithm needed | "The answer is always ⌈N/2⌉" |

How to Approach Silver Ad Hoc

  1. Don't panic when you can't identify the algorithm type
  2. Work small examples — N=2, N=3, N=4 — and look for patterns
  3. Ask: "What's special about this problem?" — what property makes it different from a generic version?
  4. Consider: "What if I could solve it for a simpler version?" — then generalize
  5. Trust your observations — if you notice a pattern in small cases, it's probably correct

Chapter Summary

📌 Key Takeaways

| Concept | Key Point |
|---|---|
| Definition | Ad hoc = no standard algorithm; requires problem-specific insight |
| Recognition | Can't identify algorithm type → probably ad hoc |
| Approach | Small cases → find pattern → prove it → implement |
| Invariants | Find quantities preserved by operations → prove impossibility |
| Simulation shortcut | Large T → find cycle → use modular arithmetic |
| Parity | Many impossibility results come from parity arguments |
| Constructive | Build the answer directly instead of searching |

🧩 Ad Hoc Problem-Solving Checklist

When you suspect a problem is ad hoc:

  • Try N = 1, 2, 3, 4 — compute answers by hand
  • Look for a formula — does the answer follow a simple pattern?
  • Check parity — is there an invariant that rules out some configurations?
  • Look for cycles — if simulating, does the state repeat?
  • Consider the extremes — what if all values are equal? All maximum?
  • Reformulate — can you restate the problem in a simpler way?
  • Think backwards — is the reverse problem easier?
  • Trust small-case patterns — if it works for N=2,3,4,5, it probably works in general

❓ FAQ

Q1: How do I know if a problem is ad hoc or just a standard algorithm I haven't learned yet?

A: This is genuinely hard to tell. A good heuristic: if the problem has small constraints (N ≤ 100) and doesn't involve graphs, DP, or sorting in an obvious way, it's likely ad hoc. If N ≤ 10^5 and you can't identify the algorithm, you might be missing a standard technique — check the problem tags after solving.

Q2: I found the pattern in small cases but can't prove it. Should I just submit?

A: In a contest, yes — submit and move on. In practice, try to understand why the pattern holds. Unproven patterns sometimes fail on edge cases. But partial credit from a pattern-based solution is better than nothing.

Q3: Ad hoc problems feel impossible. How do I get better at them?

A: Practice is the only way. Solve 20–30 ad hoc problems, and after each one, write down: "What was the key insight? How could I have found it faster?" Over time, you'll build a library of techniques (parity, cycles, invariants, etc.) that you recognize in new problems.

Q4: Is there a systematic way to find invariants?

A: Yes. For each operation in the problem, ask: "What quantities does this operation change? By how much?" If an operation always changes quantity Q by a multiple of K, then Q mod K is an invariant. Common invariants: parity (mod 2), sum mod K, number of inversions mod 2.

🔗 Connections to Other Chapters

  • Chapter 7.1 (Understanding USACO): Ad hoc is one of the 10 Bronze problem categories; this chapter gives it the depth it deserves
  • Chapter 7.2 (Problem-Solving Strategies): The algorithm decision tree ends with "Greedy / simulation" — ad hoc problems fall outside the tree entirely
  • Chapter 3.4 (Two Pointers): The sliding window technique appears in several ad hoc problems (e.g., P8 above)
  • Chapter 3.2 (Prefix Sums): Many ad hoc counting problems use prefix sums as a sub-step
  • Appendix E (Math Foundations): GCD, modular arithmetic, and number theory underpin many ad hoc insights

🐄 Final thought: Ad hoc problems are where competitive programming becomes an art. There's no formula — just careful observation, creative thinking, and the satisfaction of finding an elegant solution to a problem that seemed impossible. Embrace the struggle.

Appendix A: C++ Quick Reference

This appendix is your cheat sheet. Keep it handy during practice sessions. Everything here has been covered in the book; this is the condensed reference form.


A.1 The Competition Template

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    // freopen("problem.in", "r", stdin);   // uncomment for file I/O (use actual problem name)
    // freopen("problem.out", "w", stdout);  // uncomment for file I/O

    // Your code here

    return 0;
}

A.2 Common Data Types

| Type | Size | Range | Use When |
|---|---|---|---|
| int | 32-bit | ±2.1 × 10^9 | Default integer |
| long long | 64-bit | ±9.2 × 10^18 | Large numbers, products |
| double | 64-bit | ~15 significant digits | Decimals |
| bool | 1 byte | true/false | Flags |
| char | 8-bit | -128 to 127 | Single characters |
| string | variable | any length | Text |

Safe maximum values:

INT_MAX   = 2,147,483,647   ≈ 2.1 × 10^9
LLONG_MAX = 9,223,372,036,854,775,807 ≈ 9.2 × 10^18

A.3 STL Containers — Operations Cheat Sheet

vector<T>

vector<int> v;              // empty
vector<int> v(n, 0);        // n zeros
vector<int> v = {1,2,3};    // from list

v.push_back(x);     // add to end — O(1) amortized
v.pop_back();       // remove last — O(1)
v[i]                // access index i — O(1)
v.front()           // first element
v.back()            // last element
v.size()            // number of elements
v.empty()           // true if empty
v.clear()           // remove all
v.resize(k, val)    // resize to k, fill new with val
v.insert(v.begin()+i, x)  // insert at index i — O(n)
v.erase(v.begin()+i)      // remove at index i — O(n)

pair<A,B>

pair<int,int> p = {3, 5};
p.first             // 3
p.second            // 5
make_pair(a, b)     // create pair
// Comparison: by .first, then .second

map<K,V>

map<string,int> m;
m[key] = val;           // insert/update — O(log n)
m[key]                  // access (creates if absent!) — O(log n)
m.find(key)             // iterator; .end() if not found — O(log n)
m.count(key)            // 0 or 1 — O(log n)
m.erase(key)            // remove — O(log n)
m.size()                // number of entries
for (auto &[k,v] : m)   // iterate in sorted key order

set<T>

set<int> s;
s.insert(x)             // add — O(log n)
s.erase(x)              // remove x if present — O(log n)
s.count(x)              // 0 or 1 — O(log n)
s.find(x)               // iterator — O(log n)
s.lower_bound(x)        // first element >= x
s.upper_bound(x)        // first element > x
*s.begin()              // minimum element
*s.rbegin()             // maximum element

stack<T>

stack<int> st;
st.push(x)      // push — O(1)
st.pop()        // pop (no return!) — O(1)
st.top()        // peek at top — O(1)
st.empty()      // true if empty
st.size()       // count

queue<T>

queue<int> q;
q.push(x)       // enqueue — O(1)
q.pop()         // dequeue (no return!) — O(1)
q.front()       // front element — O(1)
q.back()        // back element — O(1)
q.empty()
q.size()

priority_queue<T> (max-heap)

priority_queue<int> pq;                               // max-heap
priority_queue<int, vector<int>, greater<int>> pq2;   // min-heap

pq.push(x)      // insert — O(log n)
pq.pop()        // remove top — O(log n)
pq.top()        // view top (max) — O(1)
pq.empty()
pq.size()

unordered_map<K,V> / unordered_set<T>

Same interface as map/set, but O(1) average (no ordered iteration).


A.4 STL Algorithms Cheat Sheet

// All assume #include <bits/stdc++.h>

// SORT
sort(v.begin(), v.end());                          // ascending
sort(v.begin(), v.end(), greater<int>());          // descending
sort(v.begin(), v.end(), [](int a, int b){...});   // custom

// BINARY SEARCH (requires sorted container)
binary_search(v.begin(), v.end(), x)               // bool: exists?
lower_bound(v.begin(), v.end(), x)                 // iterator to first >= x
upper_bound(v.begin(), v.end(), x)                 // iterator to first > x

// MIN/MAX
min(a, b)               // minimum of two
max(a, b)               // maximum of two
min({a, b, c})          // minimum of many (C++11)
*min_element(v.begin(), v.end())   // min of container
*max_element(v.begin(), v.end())   // max of container

// ACCUMULATE
accumulate(v.begin(), v.end(), 0LL)   // sum (use 0LL for long long)

// FILL
fill(v.begin(), v.end(), x)           // fill all with x
memset(arr, 0, sizeof(arr))           // zero a C-array (fast)

// REVERSE
reverse(v.begin(), v.end())           // reverse in place

// COUNT
count(v.begin(), v.end(), x)          // count occurrences of x

// UNIQUE (removes consecutive duplicates — sort first!)
auto it = unique(v.begin(), v.end());
v.erase(it, v.end());

// SWAP
swap(a, b)              // swap two values

// PERMUTATION (useful for brute force)
sort(v.begin(), v.end());
do {
    // process current permutation
} while (next_permutation(v.begin(), v.end()));

// GCD / LCM (C++17)
gcd(a, b)                           // GCD — std::gcd from <numeric>
lcm(a, b)                           // LCM — std::lcm from <numeric>
// Legacy (pre-C++17): __gcd(a, b)  // still works but prefer std::gcd

A.5 Time Complexity Reference Table

Visual: Complexity vs N Reference

Complexity Table

The color-coded table above gives an at-a-glance feasibility check. When reading a problem, find N in the columns and your algorithm's complexity in the rows to see if it will pass within 1 second.

| N | Max feasible complexity | Algorithm tier |
|---|---|---|
| N ≤ 12 | O(N! × N) | All permutations |
| N ≤ 20 | O(2^N × N) | All subsets + linear work |
| N ≤ 500 | O(N³) | 3 nested loops, interval DP |
| N ≤ 5000 | O(N²) | 2 nested loops, O(N²) DP |
| N ≤ 10^5 | O(N log N) | Sort, BFS, binary search |
| N ≤ 10^6 | O(N) | Linear scan, prefix sums |
| N ≤ 10^8 | O(N) or O(N / 32) | Pure loop or bitsets |

A.6 Common Pitfalls

Integer Overflow

// WRONG
int a = 1e9, b = 1e9;
int product = a * b;  // overflow!

// CORRECT
long long product = (long long)a * b;

// WRONG
int n = 1e5;
int arr[n * n];  // n*n = 10^10, way too large

// Check: if any intermediate value might exceed 2 × 10^9, use long long

Off-by-One

// WRONG: accesses arr[n]
for (int i = 0; i <= n; i++) cout << arr[i];

// CORRECT
for (int i = 0; i < n; i++) cout << arr[i];   // 0-indexed
for (int i = 1; i <= n; i++) cout << arr[i];  // 1-indexed

// Prefix sum: P[i] = sum of first i elements
// Query sum from L to R (1-indexed): P[R] - P[L-1]
// NOT P[R] - P[L]  ← off by one!

Modifying Container While Iterating

// WRONG
for (auto it = s.begin(); it != s.end(); ++it) {
    if (*it % 2 == 0) s.erase(it);  // iterator invalidated!
}

// CORRECT
set<int> toErase;
for (int x : s) if (x % 2 == 0) toErase.insert(x);
for (int x : toErase) s.erase(x);

map Creating Entries on Access

map<string,int> m;
if (m["missing_key"])  // creates "missing_key" with value 0!

// CORRECT: check first
if (m.count("missing_key") && m["missing_key"])  // safe
// Or:
auto it = m.find("missing_key");
if (it != m.end() && it->second) { ... }

Double Comparison

double a = 0.1 + 0.2;
if (a == 0.3)  // might be false due to floating point!

// CORRECT: use epsilon comparison
const double EPS = 1e-9;
if (abs(a - 0.3) < EPS) { ... }

Stack Overflow from Deep Recursion

// DFS on large graphs can cause stack overflow
// For trees with N = 10^5 nodes in a line (like a chain), recursion depth = 10^5
// Fix: increase stack size, or use iterative DFS

// On Linux/Mac, increase the shell's stack limit before running:
// ulimit -s unlimited
// On Windows (MinGW), request a larger stack at link time:
// g++ -Wl,--stack,268435456 sol.cpp

A.7 Useful #define and typedef

// Common shortcuts (personal taste — don't overdo it)
typedef long long ll;
typedef pair<int,int> pii;
typedef vector<int> vi;

#define pb push_back
#define all(v) (v).begin(), (v).end()
#define sz(v) ((int)(v).size())

// Example usage:
ll x = 1e18;
pii p = {3, 5};
vi v = {1, 2, 3};
sort(all(v));

A.8 C++17 Useful Features

// Structured bindings — unpack pairs/tuples cleanly
auto [x, y] = make_pair(3, 5);
for (auto [key, val] : mymap) { ... }

// If with initializer
if (auto it = m.find(key); it != m.end()) {
    // use it->second
}

// __gcd and gcd
int g = gcd(12, 8);   // C++17: use std::gcd from <numeric>
int l = lcm(4, 6);    // C++17: use std::lcm from <numeric>

// Compile with: g++ -std=c++17 -O2 -o sol sol.cpp

Appendix B: USACO Problem Set

This appendix provides a curated list of 20 USACO problems organized by topic. These problems are carefully selected to reinforce the techniques covered in this book. All are available for free on usaco.org.


How to Use This Problem Set

Work through these problems roughly in order. For each problem:

  1. Read the problem carefully and try to solve it independently for at least 1–2 hours
  2. If stuck, look at the hint below (not the full editorial)
  3. If still stuck after another 30 minutes, read the editorial on the USACO website
  4. After solving (or reading the editorial), implement the solution yourself from scratch

Most learning happens when you struggle and then understand — not when you read a solution passively.


Section 1: Simulation & Brute Force (Bronze)

Problem 1: Blocked Billboard

Contest: USACO 2017 December Bronze Topic: 2D geometry, rectangles Link: usaco.org — 2017 December Bronze

Description: Two billboards and a truck (all rectangles). Find the area of the billboards not covered by the truck.

Key Insight: Compute the intersection of the truck with each billboard. Area of billboard - area of intersection = visible area.

Techniques: 2D rectangle intersection, careful arithmetic Difficulty: ⭐⭐


Problem 2: The Cow-Signal

Contest: USACO 2016 February Bronze Topic: 2D array manipulation Link: usaco.org — 2016 February Bronze

Description: Given a pattern of characters in a K×L grid, "scale" it up by factor R (repeat each character R times in each direction).

Key Insight: Character at position (i,j) in the output comes from ((i-1)/R + 1, (j-1)/R + 1) in the input.

Techniques: 2D array indexing, integer division Difficulty:


Problem 3: Shell Game

Contest: USACO 2016 January Bronze Topic: Simulation Link: usaco.org — 2016 January Bronze

Description: Elsie plays a shell game. Track where a ball ends up after a sequence of swaps.

Key Insight: Track the ball's position through each swap. The ball starts under one of the three shells; try all three starting positions.

Techniques: Simulation, brute force over starting positions Difficulty:


Problem 4: Counting Haybales

Contest: USACO 2016 November Bronze Topic: Sorting, searching Link: usaco.org — 2016 November Bronze

Description: N haybales at positions. Q queries asking how many haybales are in range [A, B].

Key Insight: Sort haybale positions, then use binary search (lower_bound/upper_bound) for each query.

Techniques: Sorting, binary search Difficulty: ⭐⭐


Problem 5: Mowing the Field

Contest: USACO 2016 January Bronze Topic: Grid simulation Link: usaco.org — 2016 January Bronze

Description: FJ mows a field by following N instructions. Count how many cells he mows more than once.

Key Insight: Track all visited positions in a set/map. When a cell is visited again, it's double-mowed.

Techniques: Set/map for tracking visited cells, direction simulation Difficulty: ⭐⭐


Section 2: Arrays & Prefix Sums (Bronze/Silver)

Problem 6: Breed Counting

Contest: USACO 2015 December Bronze Topic: Prefix sums Link: usaco.org — 2015 December Bronze

Description: N cows each with breed 1, 2, or 3. Q queries: how many cows of breed B in range [L, R]?

Key Insight: Build a prefix sum array for each of the 3 breeds. Answer each query in O(1).

Techniques: Prefix sums, multiple arrays Difficulty: ⭐⭐


Problem 7: Hoof, Paper, Scissors

Contest: USACO 2019 January Silver Topic: DP Link: usaco.org — 2019 January Silver

Description: Bessie plays N rounds of Hoof-Paper-Scissors. She can change her gesture at most K times. Maximize wins.

Key Insight: DP state: (round, changes used, current gesture). See Chapter 6.2 for full solution.

Techniques: 3D DP Difficulty: ⭐⭐⭐


Section 3: Sorting & Binary Search (Bronze/Silver)

Problem 8: Angry Cows

Contest: USACO 2016 February Bronze Topic: Sorting, simulation Link: usaco.org — 2016 February Bronze

Description: Cows placed on a number line. One cow fires a "blast" that spreads outward, setting off other cows. Find the minimum initial blast radius to set off all cows.

Key Insight: Binary search on the blast radius. For a given radius, simulate which cows get set off.

Techniques: Binary search on answer, sorting, simulation Difficulty: ⭐⭐⭐


Problem 9: Aggressive Cows

Contest: USACO 2011 March Silver Topic: Binary search on answer Link: usaco.org — 2011 March Silver

Description: N stalls at given positions. Place C cows to maximize the minimum distance between any two cows.

Key Insight: Binary search on the answer (minimum distance). For each candidate distance, greedily check if C cows can be placed.

Techniques: Binary search on answer, greedy check Difficulty: ⭐⭐⭐


Problem 10: Convention

Contest: USACO 2018 February Silver Topic: Binary search on answer + greedy Link: usaco.org — 2018 February Silver

Description: N cows arrive at times t[i] and board M buses of capacity C. Minimize the maximum waiting time.

Key Insight: Binary search on the maximum wait time. For each candidate, greedily assign cows to buses.

Techniques: Binary search on answer, greedy simulation, sorting Difficulty: ⭐⭐⭐


Section 4: Graph Algorithms (Silver)

Problem 11: Closing the Farm

Contest: USACO 2016 January Silver Topic: DSU (Union-Find), offline processing Link: usaco.org — 2016 January Silver

Description: A farm has N fields and M paths. Remove fields one by one. After each removal, determine if the remaining fields are still all connected.

Key Insight: Reverse the process — add fields in reverse order. Use DSU to track connectivity as fields are added.

Techniques: DSU, reverse processing Difficulty: ⭐⭐⭐


Problem 12: Moocast

Contest: USACO 2016 February Silver Topic: DSU / BFS Link: usaco.org — 2016 February Silver

Description: N cows on a field. Cow i has walkie-talkie range p[i]. Can cow i directly contact cow j? Find the minimum range such that all cows can communicate (directly or via relays).

Key Insight: Binary search on the minimum range. For a given range, build a graph and check connectivity.

Techniques: Binary search on answer, BFS/DFS connectivity, or Kruskal's MST Difficulty: ⭐⭐⭐


Problem 13: BFS Shortest Path

Contest: USACO 2016 February Bronze: Milk Pails (modified) Topic: BFS on state space Link: usaco.org — 2016 February Bronze

Description: Two buckets with capacities X and Y. Fill/empty/pour operations. Find minimum operations to get exactly M liters in either bucket.

Key Insight: Model (amount in bucket 1, amount in bucket 2) as a graph state. BFS finds the minimum operations.

Techniques: BFS on state graph Difficulty: ⭐⭐⭐


Problem 14: Grass Cownoisseur

Contest: USACO 2015 December Silver Topic: SCC (Strongly Connected Components), BFS on DAG Link: usaco.org — 2015 December Silver

Description: Directed graph of pastures. Bessie can reverse one edge for free. Find the maximum number of pastures reachable in a round trip from pasture 1.

Key Insight: Contract SCCs into super-nodes, then BFS on the DAG. For each edge that could be reversed, check improvement.

Techniques: SCC, BFS, graph contraction Difficulty: ⭐⭐⭐⭐ (Gold-level thinking, Silver contest)


Section 5: Dynamic Programming (Silver)

Problem 15: Rectangular Pasture

Contest: USACO 2021 January Silver Topic: 2D prefix sums, DP Link: usaco.org — 2021 January Silver

Description: N cows on a 2D grid (all at distinct x and y coordinates). Count the number of axis-aligned rectangles that contain exactly K cows.

Key Insight: Sort by x, then for each pair of columns, use a DP over rows. 2D prefix sums for fast rectangle counting.

Techniques: 2D prefix sums, combinatorics Difficulty: ⭐⭐⭐


Problem 16: Lemonade Line

Contest: USACO 2017 February Bronze Topic: Greedy Link: usaco.org — 2017 February Bronze

Description: N cows. Cow i will join a lemonade line if there are at most p[i] cows already in line. Find the maximum number of cows in line.

Key Insight: Sort cows by patience (p[i]) in decreasing order. Greedily add each cow if possible.

Techniques: Sorting, greedy Difficulty: ⭐⭐


Problem 17: Tallest Cow

Contest: USACO 2016 February Silver Topic: Difference arrays Link: usaco.org — 2016 February Silver

Description: N cows in a line. H[i] is the height of cow i. Given pairs (A, B) meaning cow A can see cow B (implies all cows between them are shorter), find maximum possible height of each cow.

Key Insight: Use difference arrays to track height constraints. For each (A, B) pair, all cows strictly between A and B must be shorter than both.

Techniques: Difference arrays, prefix sums Difficulty: ⭐⭐⭐


Section 6: Mixed (Silver)

Problem 18: Balancing Act

Contest: USACO 2018 January Silver Topic: Tree DP, centroid Link: usaco.org — 2018 January Silver

Description: Find the "centroid" of a tree — the node whose removal creates the most balanced partition (minimizes the size of the largest remaining component).

Key Insight: Compute subtree sizes via DFS. For each node, the largest component when it's removed is max(subtree size of each child, N - subtree size of this node).

Techniques: Tree DP, subtree sizes Difficulty: ⭐⭐⭐


Problem 19: Concatenation Nation

Contest: USACO 2016 January Bronze Topic: String manipulation, sorting Link: usaco.org — 2016 January Bronze

Description: Given N strings, for each pair (i, j) with i < j, form the string s_i + s_j. Count how many such concatenated strings are palindromes.

Key Insight: Check each pair; O(N² × L) where L is string length. For N ≤ 1000, this works.

Techniques: String manipulation, palindrome check Difficulty: ⭐⭐


Problem 20: Berry Picking

Contest: USACO 2020 January Silver Topic: Greedy, DP Link: usaco.org — 2020 January Silver

Description: Bessie fills K baskets from N trees (each basket can hold berries from only one tree; K is even). Elsie takes the K/2 fullest baskets, and Bessie keeps the rest. Maximize the berries Bessie keeps.

Key Insight: In an optimal solution, all of Elsie's baskets hold the same amount b. Try each candidate b: a tree with B[i] berries yields ⌊B[i]/b⌋ full baskets, and the leftover partial baskets (taken largest first) go to Bessie.

Techniques: Sorting, brute force over the common basket size, greedy Difficulty: ⭐⭐⭐⭐


Quick Reference: Problems by Technique

| Technique | Problems |
|---|---|
| Simulation | 1, 2, 3, 5 |
| Sorting | 4, 8, 9, 10, 16 |
| Prefix Sums | 6, 17 |
| Binary Search | 4, 8, 9, 10, 12 |
| BFS / DFS | 13, 14 |
| Union-Find | 11, 12 |
| Dynamic Programming | 7, 15, 18, 20 |
| Greedy | 16, 20 |
| String / Ad hoc | 19 |

Tips for Practicing

  1. Use the USACO training gate at train.usaco.org for auto-grading
  2. Read editorials at usaco.org after each problem — even for problems you solved
  3. Keep a problem journal — write the key insight for each problem you solve
  4. Difficulty progression: do easy problems from recent years, then medium from older years

Additional Problem Sources

| Source | URL | Best For |
|---|---|---|
| USACO Archive | usaco.org | USACO-specific practice |
| USACO Guide | usaco.guide | Structured curriculum with problems |
| Codeforces | codeforces.com | Volume practice, diverse problems |
| AtCoder Beginner | atcoder.jp | High-quality beginner problems |
| LeetCode | leetcode.com | Data structure fundamentals |
| CSES | cses.fi/problemset | Classic algorithm problems |

CSES Problem Set at cses.fi/problemset is especially recommended — it has ~300 carefully curated problems covering all USACO Silver topics, auto-graded, free.

Appendix C: C++ Competitive Programming Tricks

This appendix collects the most useful C++ tricks, macros, templates, and code snippets that competitive programmers use daily. These techniques can save significant time in contests and help your code run faster.


C.1 Fast I/O

The most important performance optimization for I/O-heavy problems:

// Always include these at the start of main()
ios_base::sync_with_stdio(false);  // disconnect C and C++ I/O streams
cin.tie(NULL);                      // untie cin from cout

// Why they help:
// sync_with_stdio(false): by default, C++ syncs with C I/O (printf/scanf)
//   for compatibility. Turning this off makes cin/cout much faster.
// cin.tie(NULL): by default, cin flushes cout before each read.
//   Untying eliminates this unnecessary flush.

File I/O (USACO traditional problems):

freopen("problem.in",  "r", stdin);   // redirect cin to file (replace "problem" with actual name)
freopen("problem.out", "w", stdout);  // redirect cout to file
// After these lines, cin/cout work as normal but read/write files
// Example: for "Blocked Billboard", use "billboard.in" / "billboard.out"

Even faster: manual reading with getchar_unlocked (Linux):

inline int readInt() {
    int x = 0; bool neg = false;
    char c = getchar_unlocked();
    while (c != '-' && (c < '0' || c > '9')) c = getchar_unlocked();
    if (c == '-') { neg = true; c = getchar_unlocked(); }
    while (c >= '0' && c <= '9') { x = x*10 + c-'0'; c = getchar_unlocked(); }
    return neg ? -x : x;
}
// Typically 3-5× faster than cin for large integer inputs

C.2 Common Macros and Typedefs

// Shorter type names
typedef long long ll;
typedef unsigned long long ull;
typedef long double ld;
typedef pair<int,int> pii;
typedef pair<ll,ll> pll;
typedef vector<int> vi;
typedef vector<ll> vll;
typedef vector<pii> vpii;

// Shorthand operations
#define pb push_back
#define pf push_front
#define all(v) (v).begin(), (v).end()
#define rall(v) (v).rbegin(), (v).rend()
#define sz(v) ((int)(v).size())
#define fi first
#define se second

// Loop macros (use sparingly — can hurt readability)
#define FOR(i, a, b) for(int i = (a); i < (b); i++)
#define REP(i, n) FOR(i, 0, n)

// Min/max shortcuts
#define chmin(a, b) a = min(a, b)
#define chmax(a, b) a = max(a, b)

// Usage examples:
// vi v; v.pb(5);        → v.push_back(5)
// sort(all(v));         → sort(v.begin(), v.end())
// cout << sz(v) << "\n";→ cout << (int)v.size() << "\n"
// FOR(i, 1, n+1) { ... }→ for(int i = 1; i < n+1; i++) { ... }

C.3 GCC Pragmas for Speed

// These pragmas can give 2-4× speedup on GCC compilers (used on USACO judges)
#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt")

// Place these BEFORE #include lines
// Warning: "O3" and "avx2" may cause subtle numerical differences
//   (usually fine for integer problems, be careful with floating point)

// Safer version (just O2 without vector instructions):
#pragma GCC optimize("O2")

// Full competitive template with pragmas:
#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2")
#include <bits/stdc++.h>
using namespace std;
// ... rest of your code

C.4 Useful Math: GCD, LCM, Modular Arithmetic

#include <bits/stdc++.h>
using namespace std;

// ─── GCD and LCM ───────────────────────────────────────────────────────────

// C++17: std::gcd and std::lcm from <numeric>
#include <numeric>
int g = gcd(12, 8);            // 4
int l = lcm(4, 6);             // 12

// C++14 and earlier: __gcd from <algorithm>
int g2 = __gcd(12, 8);         // 4
long long l2 = 4LL / __gcd(4, 6) * 6;  // 12 (careful: divide first to avoid overflow)

// Custom GCD (Euclidean algorithm):
ll mygcd(ll a, ll b) { return b ? mygcd(b, a%b) : a; }
ll mylcm(ll a, ll b) { return a / mygcd(a,b) * b; }  // divide first!

// ─── Modular Arithmetic ─────────────────────────────────────────────────────

const ll MOD = 1e9 + 7;  // standard USACO/Codeforces modulus

// Add: (a + b) % MOD
ll addmod(ll a, ll b) { return (a + b) % MOD; }

// Subtract: (a - b + MOD) % MOD  ← always add MOD before % to avoid negatives
ll submod(ll a, ll b) { return (a - b + MOD) % MOD; }

// Multiply: (a * b) % MOD
ll mulmod(ll a, ll b) { return (a % MOD) * (b % MOD) % MOD; }

// Power: a^b mod MOD using fast exponentiation — O(log b)
ll power(ll base, ll exp, ll mod = MOD) {
    ll result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;  // odd exponent
        base = base * base % mod;                    // square
        exp >>= 1;                                   // halve exponent
    }
    return result;
}

// Modular inverse (a^{-1} mod p, where p is prime):
ll modinv(ll a, ll mod = MOD) { return power(a, mod-2, mod); }
// This uses Fermat's little theorem: a^{p-1} ≡ 1 (mod p) for prime p
// So a^{-1} ≡ a^{p-2} (mod p)

// Modular division: (a / b) mod p = (a * b^{-1}) mod p
ll divmod(ll a, ll b) { return mulmod(a, modinv(b)); }

// Example: C(n, k) mod p using precomputed factorials
const int MAXN = 200001;
ll fact[MAXN], inv_fact[MAXN];

void precompute_factorials() {
    fact[0] = 1;
    for (int i = 1; i < MAXN; i++) fact[i] = fact[i-1] * i % MOD;
    inv_fact[MAXN-1] = modinv(fact[MAXN-1]);
    for (int i = MAXN-2; i >= 0; i--) inv_fact[i] = inv_fact[i+1] * (i+1) % MOD;
}

ll C(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fact[n] * inv_fact[k] % MOD * inv_fact[n-k] % MOD;
}

C.5 Useful Code Snippets

Disjoint Set Union (DSU / Union-Find) Template

// DSU — complete template with size tracking
struct DSU {
    vector<int> parent, sz;

    DSU(int n) : parent(n+1), sz(n+1, 1) {
        iota(parent.begin(), parent.end(), 0);  // parent[i] = i
    }

    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }

    bool unite(int x, int y) {
        x = find(x); y = find(y);
        if (x == y) return false;              // already same component
        if (sz[x] < sz[y]) swap(x, y);        // union by size
        parent[y] = x;
        sz[x] += sz[y];
        return true;                            // successfully merged
    }

    bool connected(int x, int y) { return find(x) == find(y); }
    int size(int x) { return sz[find(x)]; }     // size of x's component
};

// Usage:
DSU dsu(n);
dsu.unite(1, 2);
cout << dsu.connected(1, 3) << "\n";   // 0 (false)
cout << dsu.size(1) << "\n";           // 2

Segment Tree (Point Update, Range Query)

// Segment Tree — supports:
//   point_update(i, val): set position i to val
//   query(l, r): sum of [l, r]
// All operations O(log N)

struct SegTree {
    int n;
    vector<ll> tree;

    SegTree(int n) : n(n), tree(4*n, 0) {}

    void update(int node, int start, int end, int idx, ll val) {
        if (start == end) {
            tree[node] = val;
            return;
        }
        int mid = (start + end) / 2;
        if (idx <= mid) update(2*node, start, mid, idx, val);
        else            update(2*node+1, mid+1, end, idx, val);
        tree[node] = tree[2*node] + tree[2*node+1];  // merge
    }

    ll query(int node, int start, int end, int l, int r) {
        if (r < start || end < l) return 0;           // out of range
        if (l <= start && end <= r) return tree[node]; // fully in range
        int mid = (start + end) / 2;
        return query(2*node, start, mid, l, r)
             + query(2*node+1, mid+1, end, l, r);
    }

    void update(int i, ll val) { update(1, 1, n, i, val); }
    ll query(int l, int r) { return query(1, 1, n, l, r); }
};

// Usage:
SegTree st(n);
st.update(3, 10);           // set position 3 to 10
cout << st.query(1, 5);     // sum of positions 1..5

BFS Template

// Grid BFS — shortest path in unweighted grid
int bfs_grid(vector<string>& grid, int sr, int sc, int er, int ec) {
    int R = grid.size(), C = grid[0].size();
    vector<vector<int>> dist(R, vector<int>(C, -1));
    queue<pair<int,int>> q;
    int dr[] = {-1, 1, 0, 0};
    int dc[] = {0, 0, -1, 1};

    dist[sr][sc] = 0;
    q.push({sr, sc});

    while (!q.empty()) {
        auto [r, c] = q.front(); q.pop();
        for (int d = 0; d < 4; d++) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr >= 0 && nr < R && nc >= 0 && nc < C
                && grid[nr][nc] != '#' && dist[nr][nc] == -1) {
                dist[nr][nc] = dist[r][c] + 1;
                q.push({nr, nc});
            }
        }
    }
    return dist[er][ec];
}

Binary Search on Answer Template

// Binary search on answer — maximize X such that check(X) is true
// Precondition: check is monotone (false...false...true...true)
template<typename T, typename F>
T binary_search_ans(T lo, T hi, F check) {
    T ans = lo - 1;  // sentinel: returned when no X satisfies check
    while (lo <= hi) {
        T mid = lo + (hi - lo) / 2;
        if (check(mid)) { ans = mid; lo = mid + 1; }
        else { hi = mid - 1; }
    }
    return ans;
}

// Usage example: find max D such that canPlace(D) is true
int result = binary_search_ans(1, maxDist, canPlace);

C.6 Built-in Functions Worth Knowing

// ─── Integer operations ─────────────────────────────────────────────────────

__builtin_popcount(x)      // count set bits in x (int)
__builtin_popcountll(x)    // count set bits in x (long long)
__builtin_clz(x)           // count leading zeros (int, x > 0)
__builtin_ctz(x)           // count trailing zeros (int, x > 0)

// Examples:
__builtin_popcount(0b1011) == 3       // three 1-bits
__builtin_ctz(0b1000)      == 3       // three trailing zeros
__builtin_clz(1)           == 31      // 31 leading zeros (for 32-bit int)
(31 - __builtin_clz(x))              // floor(log2(x))

// ─── Bit tricks ─────────────────────────────────────────────────────────────

// Check if x is a power of 2:
bool isPow2 = (x > 0) && !(x & (x-1));

// Extract lowest set bit:
int lsb = x & (-x);

// Turn off lowest set bit:
x = x & (x-1);

// Iterate all nonempty subsets of a bitmask (for bitmask DP):
for (int sub = mask; sub > 0; sub = (sub-1) & mask) {
    // process subset 'sub' of 'mask' (note: the empty subset is skipped)
}

// ─── Useful STL functions ────────────────────────────────────────────────────

// next_permutation: iterate all permutations
sort(v.begin(), v.end());    // start from sorted order
do {
    // v is current permutation
} while (next_permutation(v.begin(), v.end()));

// __gcd: greatest common divisor (available before C++17)
int g = __gcd(a, b);

// std::gcd, std::lcm (C++17 <numeric>):
#include <numeric>
int g = gcd(a, b);
int l = lcm(a, b);

C.7 The Full Competition Template

// ────────────────────────────────────────────────────────────────────────────
// Competitive Programming Template — C++17
// ────────────────────────────────────────────────────────────────────────────
#pragma GCC optimize("O2")
#include <bits/stdc++.h>
using namespace std;

// Type aliases
typedef long long ll;
typedef pair<int,int> pii;
typedef vector<int> vi;

// Convenience macros
#define pb push_back
#define all(v) (v).begin(), (v).end()
#define sz(v) ((int)(v).size())
#define fi first
#define se second

// Constants
const ll MOD = 1e9 + 7;
const ll INF = 1e18;
const int MAXN = 200005;

// Fast power mod
ll power(ll base, ll exp, ll mod = MOD) {
    ll res = 1; base %= mod;
    for (; exp > 0; exp >>= 1) {
        if (exp & 1) res = res * base % mod;
        base = base * base % mod;
    }
    return res;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    // Uncomment for file I/O:
    // freopen("problem.in", "r", stdin);
    // freopen("problem.out", "w", stdout);

    // ── Your solution here ──

    return 0;
}

C.8 Common Patterns and Idioms

// ─── Reading N integers into a vector ────────────────────────────────────────
int n; cin >> n;
vi a(n);
for (int &x : a) cin >> x;

// ─── 2D vector initialization ────────────────────────────────────────────────
int R, C;
vector<vector<int>> grid(R, vector<int>(C, 0));

// ─── Sorting with custom criterion ───────────────────────────────────────────
sort(all(v), [](const auto &a, const auto &b) {
    return a.weight < b.weight;  // sort by weight ascending
});

// ─── Finding min/max with index ───────────────────────────────────────────────
auto maxIt = max_element(all(v));
int maxVal = *maxIt;
int maxIdx = maxIt - v.begin();

// ─── Erase duplicates from sorted vector ─────────────────────────────────────
sort(all(v));
v.erase(unique(all(v)), v.end());

// ─── String splitting by character ───────────────────────────────────────────
vector<string> split(const string &s, char delim) {
    vector<string> parts;
    stringstream ss(s);
    string part;
    while (getline(ss, part, delim)) parts.pb(part);
    return parts;
}

// ─── Integer square root (exact, no float issues) ───────────────────────────
ll isqrt(ll n) {
    ll r = sqrtl(n);
    while (r*r > n) r--;
    while ((r+1)*(r+1) <= n) r++;
    return r;
}

// ─── Checking if a number is prime ───────────────────────────────────────────
bool isPrime(ll n) {
    if (n < 2) return false;
    if (n == 2) return true;
    if (n % 2 == 0) return false;
    for (ll i = 3; i * i <= n; i += 2) {
        if (n % i == 0) return false;
    }
    return true;
}

// ─── Sieve of Eratosthenes (all primes up to N) ─────────────────────────────
vector<bool> sieve(int N) {
    vector<bool> is_prime(N+1, true);
    is_prime[0] = is_prime[1] = false;
    for (int i = 2; i * i <= N; i++) {
        if (is_prime[i]) {
            for (int j = i*i; j <= N; j += i)
                is_prime[j] = false;
        }
    }
    return is_prime;
}

C.9 Debugging Tips

// Use cerr for debug output (judges usually ignore stderr)
#ifdef DEBUG
    #define dbg(x) cerr << #x << " = " << x << "\n"
    #define dbgv(v) cerr << #v << ": "; for(auto x:v) cerr << x << " "; cerr << "\n"
#else
    #define dbg(x)
    #define dbgv(v)
#endif
// Compile with: g++ -DDEBUG -o sol sol.cpp  (enables debug output)
// Compile without: g++ -o sol sol.cpp  (removes debug output)

// Usage:
int x = 42;
dbg(x);         // prints: x = 42  (only in debug mode)
vi v = {1,2,3};
dbgv(v);        // prints: v: 1 2 3  (only in debug mode)

// Compile with sanitizers to catch memory errors and UB:
// g++ -fsanitize=address,undefined -O1 -o sol sol.cpp
// These are invaluable for catching:
//   - Out-of-bounds array access
//   - Integer overflow (with -fsanitize=signed-integer-overflow)
//   - Use of uninitialized memory
//   - Null pointer dereference

Fenwick Tree (BIT) — Prefix Sum with Updates

[Diagram: Binary Indexed Tree]

The Binary Indexed Tree (BIT or Fenwick Tree) uses the lowest set bit trick to achieve O(log N) prefix sum queries and updates. Each index i is "responsible" for the range [i - lowbit(i) + 1, i] where lowbit(i) = i & (-i).

// Fenwick Tree / BIT — O(log N) update and prefix query
struct BIT {
    int n;
    vector<long long> tree;
    BIT(int n) : n(n), tree(n + 1, 0) {}

    // Add val to position i (1-indexed)
    void update(int i, long long val) {
        for (; i <= n; i += i & (-i))
            tree[i] += val;
    }

    // Prefix sum [1..i]
    long long query(int i) {
        long long sum = 0;
        for (; i > 0; i -= i & (-i))
            sum += tree[i];
        return sum;
    }

    // Range sum [l..r]
    long long query(int l, int r) { return query(r) - query(l - 1); }
};

Appendix D: Contest-Ready Algorithm Templates

🏆 Quick Reference: These templates are battle-tested, copy-paste ready, and designed to work correctly in competitive programming. Each is annotated with complexity and typical use cases.


D.1 DSU / Union-Find

Use when: Dynamic connectivity, Kruskal's MST, cycle detection, grouping elements.

Complexity: O(α(N)) per operation, effectively O(1) (the inverse Ackermann function α(N) is at most 4 for any practical N).

// =============================================================
// DSU (Disjoint Set Union) with Path Compression + Union by Rank
// =============================================================
struct DSU {
    vector<int> parent, rank_;
    int components;  // number of connected components

    DSU(int n) : parent(n), rank_(n, 0), components(n) {
        iota(parent.begin(), parent.end(), 0);  // parent[i] = i
    }

    // Find with path compression
    int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }

    // Union by rank — returns true if actually merged (different components)
    bool unite(int x, int y) {
        x = find(x); y = find(y);
        if (x == y) return false;  // already connected
        if (rank_[x] < rank_[y]) swap(x, y);
        parent[y] = x;
        if (rank_[x] == rank_[y]) rank_[x]++;
        components--;
        return true;
    }

    bool connected(int x, int y) { return find(x) == find(y); }
};

// Example usage:
int main() {
    int n = 5;
    DSU dsu(n);
    dsu.unite(0, 1);
    dsu.unite(2, 3);
    cout << dsu.connected(0, 1) << "\n";  // 1 (true)
    cout << dsu.connected(0, 2) << "\n";  // 0 (false)
    cout << dsu.components << "\n";       // 3
    return 0;
}

D.2 Segment Tree (Point Update, Range Sum)

Use when: Range sum/min/max queries with point updates.

Complexity: O(N) build, O(log N) per query/update.

// =============================================================
// Segment Tree — Point Update, Range Sum Query
// =============================================================
struct SegTree {
    int n;
    vector<long long> tree;

    SegTree(int n) : n(n), tree(4 * n, 0) {}

    void build(vector<long long>& arr, int node, int start, int end) {
        if (start == end) { tree[node] = arr[start]; return; }
        int mid = (start + end) / 2;
        build(arr, 2*node, start, mid);
        build(arr, 2*node+1, mid+1, end);
        tree[node] = tree[2*node] + tree[2*node+1];
    }
    void build(vector<long long>& arr) { build(arr, 1, 0, n-1); }

    void update(int node, int start, int end, int idx, long long val) {
        if (start == end) { tree[node] = val; return; }
        int mid = (start + end) / 2;
        if (idx <= mid) update(2*node, start, mid, idx, val);
        else update(2*node+1, mid+1, end, idx, val);
        tree[node] = tree[2*node] + tree[2*node+1];
    }
    // Update arr[idx] = val
    void update(int idx, long long val) { update(1, 0, n-1, idx, val); }

    long long query(int node, int start, int end, int l, int r) {
        if (r < start || end < l) return 0;  // identity for sum
        if (l <= start && end <= r) return tree[node];
        int mid = (start + end) / 2;
        return query(2*node, start, mid, l, r)
             + query(2*node+1, mid+1, end, l, r);
    }
    // Query sum of arr[l..r]
    long long query(int l, int r) { return query(1, 0, n-1, l, r); }
};

// Example usage:
int main() {
    vector<long long> arr = {1, 3, 5, 7, 9, 11};
    SegTree st(arr.size());
    st.build(arr);
    cout << st.query(2, 4) << "\n";   // 5+7+9 = 21
    st.update(2, 10);                 // arr[2] = 10
    cout << st.query(2, 4) << "\n";   // 10+7+9 = 26
    return 0;
}

D.3 BFS Template

Use when: Shortest path in unweighted graph/grid, level-order traversal, multi-source distances.

Complexity: O(V + E).

// =============================================================
// BFS — Shortest Path in Unweighted Graph
// =============================================================
#include <bits/stdc++.h>
using namespace std;

// Returns dist[] where dist[v] = shortest distance from src to v
// dist[v] = -1 if unreachable
vector<int> bfs(int src, int n, vector<vector<int>>& adj) {
    vector<int> dist(n, -1);
    queue<int> q;
    dist[src] = 0;
    q.push(src);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}

// Grid BFS (4-directional)
const int dr[] = {-1, 1, 0, 0};
const int dc[] = {0, 0, -1, 1};

int gridBFS(vector<string>& grid, int sr, int sc, int er, int ec) {
    int R = grid.size(), C = grid[0].size();
    vector<vector<int>> dist(R, vector<int>(C, -1));
    queue<pair<int,int>> q;
    dist[sr][sc] = 0;
    q.push({sr, sc});
    while (!q.empty()) {
        auto [r, c] = q.front(); q.pop();
        for (int d = 0; d < 4; d++) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr >= 0 && nr < R && nc >= 0 && nc < C
                && grid[nr][nc] != '#' && dist[nr][nc] == -1) {
                dist[nr][nc] = dist[r][c] + 1;
                q.push({nr, nc});
            }
        }
    }
    return dist[er][ec];  // -1 if unreachable
}

D.4 DFS Template

Use when: Connected components, cycle detection, topological sort, flood fill.

Complexity: O(V + E).

// =============================================================
// DFS — Iterative and Recursive Templates
// =============================================================

vector<vector<int>> adj;
vector<int> color;  // 0=white, 1=gray (in stack), 2=black (done)

// Recursive DFS with cycle detection (directed graph)
bool hasCycle = false;
void dfs(int u) {
    color[u] = 1;  // mark as "in progress"
    for (int v : adj[u]) {
        if (color[v] == 0) dfs(v);
        else if (color[v] == 1) hasCycle = true;  // back edge → cycle!
    }
    color[u] = 2;  // mark as "done"
}

// Topological sort using DFS post-order
vector<int> topoOrder;
void dfsToposort(int u) {
    color[u] = 1;
    for (int v : adj[u]) {
        if (color[v] == 0) dfsToposort(v);
    }
    color[u] = 2;
    topoOrder.push_back(u);  // add to order AFTER processing all children
}
// Reverse topoOrder for correct topological sequence

// Iterative DFS (avoids stack overflow for large graphs)
void dfsIterative(int src, int n) {
    vector<bool> visited(n, false);
    stack<int> st;
    st.push(src);
    while (!st.empty()) {
        int u = st.top(); st.pop();
        if (visited[u]) continue;
        visited[u] = true;
        // Process u here
        for (int v : adj[u]) {
            if (!visited[v]) st.push(v);
        }
    }
}

D.5 Dijkstra's Algorithm

Use when: Shortest path in weighted graph with non-negative edge weights.

Complexity: O((V + E) log V).

// =============================================================
// Dijkstra's Shortest Path — O((V+E) log V)
// =============================================================
#include <bits/stdc++.h>
using namespace std;

typedef pair<long long, int> pli;  // {distance, node}
const long long INF = 1e18;

vector<long long> dijkstra(int src, int n,
                            vector<vector<pair<int,int>>>& adj) {
    // adj[u] = { {v, weight}, ... }
    vector<long long> dist(n, INF);
    priority_queue<pli, vector<pli>, greater<pli>> pq;  // min-heap

    dist[src] = 0;
    pq.push({0, src});

    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();

        if (d > dist[u]) continue;  // ← KEY LINE: skip outdated entries

        for (auto [v, w] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }

    return dist;  // dist[v] = shortest distance src → v, INF if unreachable
}

// Example usage:
int main() {
    int n = 5;
    vector<vector<pair<int,int>>> adj(n);
    // Add edge u-v with weight w (undirected):
    auto addEdge = [&](int u, int v, int w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    };
    addEdge(0, 1, 4);
    addEdge(0, 2, 1);
    addEdge(2, 1, 2);
    addEdge(1, 3, 1);
    addEdge(2, 3, 5);

    auto dist = dijkstra(0, n, adj);
    cout << dist[3] << "\n";  // 4 (path: 0→2→1→3 with cost 1+2+1=4)
    return 0;
}

D.6 Binary Search Templates

Use when: Searching in sorted arrays, or "binary search on answer" (parametric search).

Complexity: O(log N) per search, O(f(N) × log V) for binary search on answer.

// =============================================================
// Binary Search Templates
// =============================================================

// 1. Find exact value (returns index or -1)
int binarySearch(vector<int>& arr, int target) {
    int lo = 0, hi = (int)arr.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] == target) return mid;
        else if (arr[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

// 2. First index where arr[i] >= target (lower_bound)
int lowerBound(vector<int>& arr, int target) {
    int lo = 0, hi = (int)arr.size();
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] < target) lo = mid + 1;
        else hi = mid;
    }
    return lo;  // arr.size() if all elements < target
}

// 3. First index where arr[i] > target (upper_bound)
int upperBound(vector<int>& arr, int target) {
    int lo = 0, hi = (int)arr.size();
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] <= target) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

// 4. Binary search on answer — find maximum X where check(X) is true
// Template: adapt lo, hi, and check() for your problem
long long bsOnAnswer(long long lo, long long hi,
                     function<bool(long long)> check) {
    long long answer = lo - 1;  // sentinel: no valid answer
    while (lo <= hi) {
        long long mid = lo + (hi - lo) / 2;
        if (check(mid)) {
            answer = mid;
            lo = mid + 1;  // try to do better
        } else {
            hi = mid - 1;
        }
    }
    return answer;
}

// STL wrappers (prefer these in practice):
// lower_bound(v.begin(), v.end(), x) → iterator to first element >= x
// upper_bound(v.begin(), v.end(), x) → iterator to first element >  x
// binary_search(v.begin(), v.end(), x) → bool, whether x exists

lower_bound / upper_bound cheat sheet:

Goal               | Code
First index ≥ x    | lower_bound(v.begin(), v.end(), x) - v.begin()
First index > x    | upper_bound(v.begin(), v.end(), x) - v.begin()
Count of x         | upper_bound(..., x) - lower_bound(..., x)
Largest value ≤ x  | *prev(upper_bound(..., x)) if it exists
Smallest value ≥ x | *lower_bound(..., x) if < end

D.7 Modular Arithmetic Template

Use when: Large numbers, combinatorics, DP with large values.

Complexity: O(1) per operation, O(log exp) for modpow.

// =============================================================
// Modular Arithmetic Template
// =============================================================
const long long MOD = 1e9 + 7;  // or 998244353 for NTT-friendly

long long mod(long long x) { return ((x % MOD) + MOD) % MOD; }
long long add(long long a, long long b) { return (a + b) % MOD; }
long long sub(long long a, long long b) { return mod(a - b); }
long long mul(long long a, long long b) { return a % MOD * (b % MOD) % MOD; }

// Fast power: base^exp mod MOD — O(log exp)
long long power(long long base, long long exp, long long mod = MOD) {
    long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;  // if last bit is 1
        base = base * base % mod;                    // square the base
        exp >>= 1;                                   // shift right
    }
    return result;
}

// Modular inverse (base^(MOD-2) mod MOD, only when MOD is prime)
long long inv(long long x) { return power(x, MOD - 2); }

// Modular division
long long divide(long long a, long long b) { return mul(a, inv(b)); }

// Precompute factorials for combinations
const int MAXN = 200005;
long long fact[MAXN], inv_fact[MAXN];

void precompute_factorials() {
    fact[0] = 1;
    for (int i = 1; i < MAXN; i++) fact[i] = fact[i-1] * i % MOD;
    inv_fact[MAXN-1] = inv(fact[MAXN-1]);
    for (int i = MAXN-2; i >= 0; i--) inv_fact[i] = inv_fact[i+1] * (i+1) % MOD;
}

// C(n, k) = n choose k mod MOD
long long C(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fact[n] * inv_fact[k] % MOD * inv_fact[n-k] % MOD;
}

D.8 Fast Power (Binary Exponentiation)

Use when: Computing a^b for large b (standalone or modular).

Complexity: O(log b).

// =============================================================
// Binary Exponentiation — a^b in O(log b)
// =============================================================

// Integer power (no mod) — careful of overflow for large a,b
long long fastPow(long long a, long long b) {
    long long result = 1;
    while (b > 0) {
        if (b & 1) result *= a;  // if current bit is 1
        a *= a;                   // square a
        b >>= 1;                  // next bit
    }
    return result;
}

// Modular power — a^b mod m
long long modPow(long long a, long long b, long long m) {
    long long result = 1;
    a %= m;
    while (b > 0) {
        if (b & 1) result = result * a % m;
        a = a * a % m;
        b >>= 1;
    }
    return result;
}

// Matrix exponentiation — M^b for matrix M (for Fibonacci in O(log N) etc.)
typedef vector<vector<long long>> Matrix;
// Note: uses MOD from D.7 (const long long MOD = 1e9 + 7)

Matrix multiply(const Matrix& A, const Matrix& B) {
    int n = A.size();
    Matrix C(n, vector<long long>(n, 0));
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            if (A[i][k])
                for (int j = 0; j < n; j++)
                    C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % MOD;
    return C;
}

Matrix matPow(Matrix M, long long b) {
    int n = M.size();
    Matrix result(n, vector<long long>(n, 0));
    for (int i = 0; i < n; i++) result[i][i] = 1;  // identity matrix
    while (b > 0) {
        if (b & 1) result = multiply(result, M);
        M = multiply(M, M);
        b >>= 1;
    }
    return result;
}

// Example: Fibonacci(N) in O(log N) using matrix exponentiation
// [F(n+1)]   [1 1]^n   [F(1)]
// [F(n)  ] = [1 0]   * [F(0)]
long long fibonacci(long long n) {
    if (n <= 1) return n;
    Matrix M = {{1, 1}, {1, 0}};
    Matrix result = matPow(M, n - 1);
    return result[0][0];  // F(n)
}

D.9 Other Useful Templates

Prefix Sum (1D and 2D)

// 1D Prefix Sum
vector<long long> prefSum(n + 1, 0);
for (int i = 1; i <= n; i++) prefSum[i] = prefSum[i-1] + arr[i];
// Query sum of arr[l..r] (1-indexed): prefSum[r] - prefSum[l-1]

// 2D Prefix Sum
vector<vector<long long>> psum(N + 1, vector<long long>(M + 1, 0));  // works for runtime N, M
for (int i = 1; i <= N; i++)
    for (int j = 1; j <= M; j++)
        psum[i][j] = grid[i][j] + psum[i-1][j] + psum[i][j-1] - psum[i-1][j-1];
// Query sum of rectangle [r1,c1]..[r2,c2]:
// psum[r2][c2] - psum[r1-1][c2] - psum[r2][c1-1] + psum[r1-1][c1-1]

Competitive Programming Header

// Standard competitive programming template
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef pair<int,int> pii;
typedef vector<int> vi;
typedef vector<ll> vll;

#define all(x) x.begin(), x.end()
#define sz(x) (int)(x).size()
#define pb push_back
#define mp make_pair

const int INF = 1e9;
const ll LINF = 1e18;
const int MOD = 1e9 + 7;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    // Your solution here
    return 0;
}

Quick Reference Card

Algorithm              | Complexity                   | Header to include
DSU (Union-Find)       | O(α(N)) per op               |
Segment Tree           | O(N) build, O(log N) per op  |
BFS                    | O(V+E)                       | <queue>
DFS                    | O(V+E)                       | <stack>
Dijkstra               | O((V+E) log V)               | <queue>
Binary search          | O(log N)                     | <algorithm>
Sort                   | O(N log N)                   | <algorithm>
Modular exponentiation | O(log exp)                   |
lower/upper_bound      | O(log N)                     | <algorithm>

All examples compiled and tested with C++17 (-std=c++17 -O2).

📎 Appendix E ⏱️ ~50 min read 🎯 Reference Math

Appendix E: Math Foundations for Competitive Programming

💡 About This Appendix: Competitive programming often requires mathematical tools beyond basic arithmetic. This appendix covers the essential math you'll encounter in USACO Bronze, Silver, and Gold — with contest-ready code templates for each topic.


E.1 Modular Arithmetic

Why Do We Need Modular Arithmetic?

Many problems ask you to output an answer "modulo 10⁹ + 7". This isn't arbitrary — it prevents integer overflow when answers are astronomically large.

Consider: "How many permutations of N elements?" Answer: N! For N = 20, that's 2,432,902,008,176,640,000 — larger than long long's max (~9.2 × 10¹⁸). For N = 100, it's completely unrepresentable.

Solution: Compute everything modulo a prime M (typically 10⁹ + 7).

(a + b) mod M = ((a mod M) + (b mod M)) mod M
(a × b) mod M = ((a mod M) × (b mod M)) mod M
(a - b) mod M = ((a mod M) - (b mod M) + M) mod M   ← note the +M!

Common MOD Values

Constant  | Value         | Why This Value?
1e9 + 7   | 1,000,000,007 | Prime, fits in int (< 2³¹), widely used
1e9 + 9   | 1,000,000,009 | Prime, alternative to 1e9+7
998244353 | 998,244,353   | NTT-friendly prime (for polynomial operations)

Basic Modular Operations Template

// Solution: Modular Arithmetic Basics
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll MOD = 1e9 + 7;  // standard competitive programming MOD

// Safe addition: (a + b) % MOD
ll addMod(ll a, ll b) {
    return (a % MOD + b % MOD) % MOD;
}

// Safe subtraction: (a - b + MOD) % MOD (handle negative result)
ll subMod(ll a, ll b) {
    return ((a % MOD) - (b % MOD) + MOD) % MOD;  // +MOD prevents negative!
}

// Safe multiplication: (a * b) % MOD
// Key: a and b are at most MOD-1 ≈ 10^9, so a*b ≈ 10^18 which fits long long
ll mulMod(ll a, ll b) {
    return (a % MOD) * (b % MOD) % MOD;
}

// Example: Compute sum of first N integers modulo MOD
ll sumFirstN(ll n) {
    // Formula: n*(n+1)/2, but careful with division — need modular inverse!
    // For now: just accumulate with addMod
    ll result = 0;
    for (ll i = 1; i <= n; i++) {
        result = addMod(result, i);
    }
    return result;
}

⚠️ Critical Bug: (a - b) % MOD can be negative in C++ if a < b! Always use (a - b + MOD) % MOD.

E.1.1 Fast Exponentiation (Binary Exponentiation)

Computing a^n mod M naively takes O(N) multiplications. Fast exponentiation (exponentiation by squaring) does it in O(log N).

Key insight: a^n = a^(n/2) × a^(n/2)          if n is even
              a^n = a × a^((n-1)/2) × a^((n-1)/2)  if n is odd

Example: a^13 = a^(1101 in binary)
       = a^8 × a^4 × a^1
       → about 5 multiplications (3 squarings + 2 products) instead of 12!

// Solution: Fast Modular Exponentiation — O(log n)
// Computes (base^exp) % mod
ll power(ll base, ll exp, ll mod = MOD) {
    ll result = 1;
    base %= mod;                  // reduce base first
    
    while (exp > 0) {
        if (exp & 1) {            // if current bit is 1
            result = result * base % mod;
        }
        base = base * base % mod; // square the base
        exp >>= 1;                // shift to next bit
    }
    return result;
}

// Example usage:
// power(2, 10) = 1024 % MOD = 1024
// power(2, 100, MOD) = 2^100 mod (10^9+7)

E.1.2 Modular Inverse (Fermat's Little Theorem)

The modular inverse of a modulo M is a number a⁻¹ such that a × a⁻¹ ≡ 1 (mod M).

This lets us do modular division: a / b mod M = a × b⁻¹ mod M.

Fermat's Little Theorem: If M is prime and gcd(a, M) = 1, then:

a^(M-1) ≡ 1 (mod M) ⟹ a^(M-2) ≡ a⁻¹ (mod M)

// Solution: Modular Inverse using Fermat's Little Theorem
// Only works when MOD is PRIME and gcd(a, MOD) = 1
ll modInverse(ll a, ll mod = MOD) {
    return power(a, mod - 2, mod);
}

// Division with modular arithmetic:
ll divMod(ll a, ll b) {
    return mulMod(a, modInverse(b));
}

// Example: (n! / k!) mod MOD
// = n! × (k!)^(-1) mod MOD
// = n! × modInverse(k!) mod MOD

E.1.3 Precomputing Factorials and Inverses

For problems requiring many combinations C(n, k):

// Solution: Precomputed Factorials for O(1) Combination Queries
const int MAXN = 1000005;
ll fact[MAXN], inv_fact[MAXN];

void precompute() {
    fact[0] = 1;
    for (int i = 1; i < MAXN; i++) {
        fact[i] = fact[i-1] * i % MOD;
    }
    inv_fact[MAXN-1] = modInverse(fact[MAXN-1]);
    for (int i = MAXN-2; i >= 0; i--) {
        inv_fact[i] = inv_fact[i+1] * (i+1) % MOD;
    }
}

// C(n, k) = n! / (k! * (n-k)!)
ll C(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fact[n] * inv_fact[k] % MOD * inv_fact[n-k] % MOD;
}

// Usage: precompute() once, then C(n, k) in O(1)

E.2 GCD and LCM

Euclidean Algorithm

The Greatest Common Divisor (GCD) of two numbers is the largest number that divides both.

Euclidean Algorithm: Based on gcd(a, b) = gcd(b, a % b).

// Solution: GCD — O(log(min(a,b)))
int gcd(int a, int b) {
    while (b != 0) {
        a %= b;
        swap(a, b);
    }
    return a;
}
// Or recursively:
// int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

// C++17: std::gcd from <numeric>
// int g = gcd(a, b);           // std::gcd, C++17 (recommended)
// int g = __gcd(a, b);         // legacy GCC built-in, still works

Trace: gcd(48, 18):

gcd(48, 18) → gcd(18, 48%18=12) → gcd(12, 18%12=6) → gcd(6, 0) = 6

LCM and the Overflow Trap

// Solution: LCM — be careful with overflow!

// WRONG: overflows for large a, b
long long lcmWrong(long long a, long long b) {
    return a * b / gcd(a, b);  // a*b can overflow even long long!
}

// CORRECT: divide first, then multiply
long long lcm(long long a, long long b) {
    return a / gcd(a, b) * b;  // divide BEFORE multiplying
}
// a / gcd(a,b) is always an integer, so no precision loss
// Then * b: max value is around 10^18 which fits in long long

lcm(a, b) = a × b / gcd(a, b) = (a / gcd(a, b)) × b

⚠️ Always divide before multiplying to avoid overflow!

Extended Euclidean Algorithm

Finds integers x, y such that ax + by = gcd(a, b) — useful for modular inverse when MOD is not prime:

// Solution: Extended Euclidean Algorithm — O(log(min(a,b)))
// Returns gcd(a,b), and sets x,y such that a*x + b*y = gcd(a,b)
long long extgcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1;
    long long g = extgcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Modular inverse using extgcd (works even when MOD is not prime):
long long modInverseExtGcd(long long a, long long mod) {
    long long x, y;
    long long g = extgcd(a, mod, x, y);
    if (g != 1) return -1;  // no inverse exists (gcd != 1)
    return (x % mod + mod) % mod;
}

E.3 Prime Numbers and Sieves

Trial Division

// Solution: Trial Division Primality Test — O(sqrt(N))
bool isPrime(long long n) {
    if (n < 2) return false;
    if (n == 2) return true;
    if (n % 2 == 0) return false;
    for (long long i = 3; i * i <= n; i += 2) {
        if (n % i == 0) return false;
    }
    return true;
}
// Efficient because: if n has a factor > sqrt(n), it must also have one <= sqrt(n)
// Only check odd numbers after 2 (halves the iterations)

Sieve of Eratosthenes

Find all primes up to N efficiently:

// Solution: Sieve of Eratosthenes — O(N log log N) time, O(N) space
// After running, isPrime[i] = true iff i is prime
const int MAXN = 1000005;
bool isPrime[MAXN];

void sieve(int n) {
    fill(isPrime, isPrime + n + 1, true);  // assume all prime initially
    isPrime[0] = isPrime[1] = false;        // 0 and 1 are not prime
    
    for (int i = 2; (long long)i * i <= n; i++) {
        if (isPrime[i]) {
            // Mark all multiples of i as composite
            for (int j = i * i; j <= n; j += i) {
                isPrime[j] = false;
                // Start from i*i (smaller multiples already marked by smaller primes)
            }
        }
    }
}

// Count primes up to N:
void countPrimes(int n) {
    sieve(n);
    int count = 0;
    for (int i = 2; i <= n; i++) {
        if (isPrime[i]) count++;
    }
    cout << count << "\n";
}

Why start the inner loop at i²? All multiples of i smaller than i² (i.e., 2i, 3i, ..., (i-1)·i) were already marked by smaller primes (2, 3, ..., i-1).

Linear Sieve (Euler Sieve) — O(N)

The Euler sieve marks each composite number exactly once:

// Solution: Linear Sieve (Euler Sieve) — O(N) time
// Also computes smallest prime factor (SPF) for each number
const int MAXN = 1000005;
int spf[MAXN];      // smallest prime factor
vector<int> primes;

void linearSieve(int n) {
    fill(spf, spf + n + 1, 0);
    for (int i = 2; i <= n; i++) {
        if (spf[i] == 0) {          // i is prime
            spf[i] = i;
            primes.push_back(i);
        }
        for (int j = 0; j < (int)primes.size() && primes[j] <= spf[i] && (long long)i * primes[j] <= n; j++) {
            spf[i * primes[j]] = primes[j];  // mark composite
        }
    }
}

// Fast prime factorization using SPF:
// O(log N) per factorization
vector<int> factorize(int n) {
    vector<int> factors;
    while (n > 1) {
        factors.push_back(spf[n]);
        n /= spf[n];
    }
    return factors;
}

E.4 Binary Representations and Bit Manipulation

Fundamental Bit Operations

// Solution: Common Bit Operations Reference
int n = 42;   // binary: 101010

// ── AND (&): both bits must be 1 ──
int a = 6 & 3;     // 110 & 011 = 010 = 2

// ── OR (|): at least one bit is 1 ──
int b = 6 | 3;     // 110 | 011 = 111 = 7

// ── XOR (^): exactly one bit is 1 ──
int c = 6 ^ 3;     // 110 ^ 011 = 101 = 5

// ── NOT (~): flip all bits (two's complement) ──
int d = ~6;        // = -7 (in two's complement)

// ── Left shift (<<): multiply by 2^k ──
int e = 1 << 4;    // = 16 = 2^4

// ── Right shift (>>): divide by 2^k (arithmetic) ──
int f = 32 >> 2;   // = 8 = 32/4

Essential Bit Tricks

// Solution: Competitive Programming Bit Tricks

// ── Check if n is odd ──
bool isOdd(int n) { return n & 1; }  // last bit is 1 iff odd

// ── Check if n is a power of 2 ──
bool isPow2(int n) { return n > 0 && (n & (n-1)) == 0; }
// Why? Powers of 2: 1=001, 2=010, 4=100. n-1 flips all lower bits.
// 4 & 3 = 100 & 011 = 000. Non-powers: 6 & 5 = 110 & 101 = 100 ≠ 0.

// ── Get k-th bit (0-indexed from right) ──
bool getBit(int n, int k) { return (n >> k) & 1; }

// ── Set k-th bit to 1 ──
int setBit(int n, int k) { return n | (1 << k); }

// ── Clear k-th bit (set to 0) ──
int clearBit(int n, int k) { return n & ~(1 << k); }

// ── Toggle k-th bit ──
int toggleBit(int n, int k) { return n ^ (1 << k); }

// ── lowbit: lowest set bit (used in Fenwick tree!) ──
int lowbit(int n) { return n & (-n); }
// Example: lowbit(12) = lowbit(1100) = 0100 = 4

// ── Count number of set bits (popcount) ──
int popcount(int n) { return __builtin_popcount(n); }   // use built-in!
// For long long: __builtin_popcountll(n)

// ── Swap two numbers without temp variable ──
void swapXOR(int &a, int &b) {
    a ^= b;
    b ^= a;
    a ^= b;
}
// (usually just use std::swap — this is mainly a curiosity)

// ── Find position of lowest set bit ──
int lowestBitPos(int n) { return __builtin_ctz(n); }  // count trailing zeros
// __builtin_clz(n) = count leading zeros

Subset Enumeration

A powerful technique: enumerate all subsets of a set represented as a bitmask.

// Solution: Subset Enumeration with Bitmasks
// Enumerate all subsets of an N-element set

void enumerateAllSubsets(int n) {
    // Total subsets = 2^n
    for (int mask = 0; mask < (1 << n); mask++) {
        // 'mask' represents a subset: bit i set = element i is included
        cout << "Subset: {";
        for (int i = 0; i < n; i++) {
            if (mask & (1 << i)) {
                cout << i << " ";
            }
        }
        cout << "}\n";
    }
}

// Enumerate all NON-EMPTY subsets of a given set 'S'
void enumerateSubsetsOf(int S) {
    for (int sub = S; sub > 0; sub = (sub - 1) & S) {
        // Process subset 'sub'
        // The trick: (sub - 1) & S gives the next smaller subset of S
        // This visits all 2^|S| - 1 non-empty subsets, O(1) per step
    }
}

// Classic use: bitmask DP
// dp[mask] = minimum cost to visit the set of cities represented by mask
// dp[0] = 0 (start: no cities visited)
// dp[mask | (1 << v)] = min(dp[mask | (1 << v)], dp[mask] + cost[last][v])

E.5 Combinatorics Basics

Counting Formulas

Permutation: P(n, k) = n! / (n-k)! — ordered selection of k from n
Combination: C(n, k) = n! / (k! × (n-k)!) — unordered selection of k from n

// Solution: Combinatorics with Modular Arithmetic
// Assumes precompute() from E.1.3 has been called

// C(n, k) = n! / (k! * (n-k)!)
ll combination(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fact[n] * inv_fact[k] % MOD * inv_fact[n-k] % MOD;
}

// P(n, k) = n! / (n-k)!
ll permutation(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fact[n] * inv_fact[n-k] % MOD;
}

// Stars and Bars: number of ways to put n identical balls into k distinct boxes
// = C(n + k - 1, k - 1)
ll starsAndBars(int n, int k) {
    return combination(n + k - 1, k - 1);
}

Pascal's Triangle — Computing C(n, k) without Precomputation

When n is small (n ≤ 2000), Pascal's triangle is simpler:

// Solution: Pascal's Triangle DP — O(n^2) precomputation
const int MAXN = 2005;
ll C[MAXN][MAXN];

void buildPascal() {
    for (int i = 0; i < MAXN; i++) {
        C[i][0] = C[i][i] = 1;
        for (int j = 1; j < i; j++) {
            C[i][j] = (C[i-1][j-1] + C[i-1][j]) % MOD;
        }
    }
}
// Then C[n][k] is the answer for any 0 <= k <= n < MAXN
// This avoids modular inverse entirely — useful when MOD might not be prime

Pascal's Rule: C(n, k) = C(n-1, k-1) + C(n-1, k)

This comes from: "choose k items from n" = "include item n and choose k-1 from n-1" + "exclude item n and choose k from n-1".

Key Combinatorial Identities

// Useful identities in competitive programming:

// Hockey Stick Identity: sum of C(r+k, k) for k=0..n = C(n+r+1, n)
// Useful for: summing binomial coefficients along a diagonal (common in counting DP)

// Vandermonde's Identity: sum_k C(m,k)*C(n,r-k) = C(m+n, r)
// Useful for: counting problems with two groups

// Inclusion-Exclusion:
// |A ∪ B| = |A| + |B| - |A ∩ B|
// |A ∪ B ∪ C| = |A| + |B| + |C| - |A∩B| - |A∩C| - |B∩C| + |A∩B∩C|
// Generalizes to n sets with 2^n terms (or bitmask enumeration)

E.6 Common Mathematical Results for Complexity Analysis

Harmonic Series

1 + 1/2 + 1/3 + ... + 1/N ≈ ln(N) ≈ 0.693 × log₂(N)

This explains why the Sieve of Eratosthenes runs in O(N log log N):

  • Total work = N/2 + N/3 + N/5 + N/7 + ... (for each prime p, mark N/p multiples)
  • Sum over primes ≈ N × ln(ln(N))

And why Fenwick tree operations are O(log N): each lowbit step removes (or adds) one set bit, so a query touches at most log₂(N) indices.

Key Estimates

| Expression | Approximation | Notes |
|---|---|---|
| log₂(10⁵) | ≈ 17 | Depth of BST/segment tree on 10⁵ elements |
| log₂(10⁹) | ≈ 30 | Binary search on a 10⁹ range |
| √(10⁶) | = 1000 | Trial division up to √N for N ≤ 10⁶ |
| 2²⁰ | ≈ 10⁶ | Bitmask DP limit (20 items) |
| 20! | ≈ 2.4 × 10¹⁸ | Barely fits in long long |
| 13! | ≈ 6 × 10⁹ | Just over the int limit |

Operations Per Second Estimate

| Time Limit | Max Operations (safe) |
|---|---|
| 1 second | ~10⁸ simple operations |
| 2 seconds | ~2 × 10⁸ |
| 3 seconds | ~3 × 10⁸ |

Using this, you can estimate if your algorithm is fast enough:

  • N = 10⁵, O(N log N) → ~1.7 × 10⁶ ops → fast
  • N = 10⁵, O(N²) → 10¹⁰ ops → too slow
  • N = 10⁵, O(N√N) → ~3 × 10⁷ ops → borderline (usually OK with 2s limit)

E.7 Complete Math Template

Here's a single file with all the templates from this appendix:

// Solution: Complete Math Template for Competitive Programming
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
typedef unsigned long long ull;

// ═══════════════════════════════════════════════
// MODULAR ARITHMETIC
// ═══════════════════════════════════════════════
const ll MOD = 1e9 + 7;

ll power(ll base, ll exp, ll mod = MOD) {
    ll result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

ll modInverse(ll a, ll mod = MOD) {
    return power(a, mod - 2, mod);
}

// ═══════════════════════════════════════════════
// FACTORIALS (precomputed up to MAXN)
// ═══════════════════════════════════════════════
const int MAXN = 1000005;
ll fact[MAXN], inv_fact[MAXN];

void precomputeFactorials() {
    fact[0] = 1;
    for (int i = 1; i < MAXN; i++) fact[i] = fact[i-1] * i % MOD;
    inv_fact[MAXN-1] = modInverse(fact[MAXN-1]);
    for (int i = MAXN-2; i >= 0; i--) inv_fact[i] = inv_fact[i+1] * (i+1) % MOD;
}

ll C(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fact[n] * inv_fact[k] % MOD * inv_fact[n-k] % MOD;
}

// ═══════════════════════════════════════════════
// GCD / LCM
// ═══════════════════════════════════════════════
ll gcd(ll a, ll b) { return b == 0 ? a : gcd(b, a % b); }
ll lcm(ll a, ll b)  { return a / gcd(a, b) * b; }

// ═══════════════════════════════════════════════
// PRIME SIEVE
// ═══════════════════════════════════════════════
const int MAXP = 1000005;
bool notPrime[MAXP];
vector<int> primes;

void sieve(int n = MAXP - 1) {
    notPrime[0] = notPrime[1] = true;
    for (int i = 2; i <= n; i++) {
        if (!notPrime[i]) {
            primes.push_back(i);
            for (long long j = (long long)i*i; j <= n; j += i)
                notPrime[j] = true;
        }
    }
}

bool isPrime(int n) { return n >= 2 && !notPrime[n]; }

// ═══════════════════════════════════════════════
// BIT TRICKS
// ═══════════════════════════════════════════════
bool isOdd(int n)       { return n & 1; }
bool isPow2(int n)      { return n > 0 && !(n & (n-1)); }
int  lowbit(int n)      { return n & (-n); }
int  popcount(int n)    { return __builtin_popcount(n); }
int  ctz(int n)         { return __builtin_ctz(n); }  // count trailing zeros

// ═══════════════════════════════════════════════
// EXTENDED GCD
// ═══════════════════════════════════════════════
ll extgcd(ll a, ll b, ll &x, ll &y) {
    if (!b) { x = 1; y = 0; return a; }
    ll x1, y1, g = extgcd(b, a%b, x1, y1);
    x = y1; y = x1 - a/b * y1;
    return g;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    
    precomputeFactorials();
    sieve();
    
    // Test: C(10, 3) = 120
    cout << C(10, 3) << "\n";
    
    // Test: 2^100 mod (10^9+7)
    cout << power(2, 100) << "\n";
    
    // Test: first few primes
    for (int i = 0; i < 10; i++) cout << primes[i] << " ";
    cout << "\n";
    
    return 0;
}

E.8 Number Theory Quick Reference

Divisibility Rules (useful for manual checks)

| Divisor | Rule |
|---|---|
| 2 | Last digit is even |
| 3 | Sum of digits divisible by 3 |
| 4 | Last two digits form a number divisible by 4 |
| 5 | Last digit is 0 or 5 |
| 9 | Sum of digits divisible by 9 |
| 10 | Last digit is 0 |
| 11 | Alternating sum of digits divisible by 11 |

Integer Square Root

// Safe integer square root (avoids floating point errors)
ll isqrt(ll n) {
    ll x = sqrtl(n);              // floating point approximation
    while (x * x > n) x--;        // correct downward if needed
    while ((x+1) * (x+1) <= n) x++; // correct upward if needed
    return x;
}

Ceiling Division

// Ceiling division: ceil(a/b) for positive integers
ll ceilDiv(ll a, ll b) {
    return (a + b - 1) / b;
    // Or: (a - 1) / b + 1  (same thing for a > 0)
}

❓ FAQ

Q1: When should I use long long?

A: When values might exceed 2 × 10⁹ (roughly the int limit). Typical cases: ① multiplying two large int values (10⁹ × 10⁹ = 10¹⁸); ② summing path weights (N edges, each weight 10⁶, total up to 10¹¹); ③ factorials/combinations (use long long for intermediate calculations even with modular arithmetic). Rule of thumb: use long long whenever there's multiplication in competitive programming code.

Q2: Why use 10⁹ + 7 as the modulus instead of 10⁹?

A: 10⁹ is not prime (= 2⁹ × 5⁹), so Fermat's little theorem can't be used to compute modular inverses. 10⁹ + 7 = 1,000,000,007 is prime, and (10⁹ + 7)² < 2⁶³ (the long long limit), so multiplying two numbers after taking the modulus won't overflow long long.

Q3: How does the bit-manipulation trick in fast exponentiation work?

A: Write the exponent n in binary: n = b_k × 2^k + ... + b_1 × 2 + b_0. Then a^n = a^(b_k × 2^k) × ... × a^(b_1 × 2) × a^b_0. Each loop iteration squares the base (representing a to the power of 2^k), and multiplies into the result when the current bit is 1. This requires only log₂(n) multiplications.

Q4: Why does the Sieve of Eratosthenes start marking from i×i?

A: Multiples 2i, 3i, ..., (i-1)i have already been marked by the smaller primes 2, 3, ..., i-1. For example, 6 = 2×3 was marked by 2; 7×5=35 was marked by 5. Starting from i×i avoids redundant work and optimizes the constant factor.

Q5: Why does n & (n-1) check if n is a power of 2?

A: Powers of 2 have exactly one 1-bit in binary (e.g., 8 = 1000). Subtracting 1 flips the lowest 1-bit to 0 and all lower 0-bits to 1 (e.g., 7 = 0111). So n & (n-1) clears the lowest 1-bit. If n is a power of 2 (only one 1-bit), the result is 0; otherwise it's nonzero.


End of Appendix E — See also: Algorithm Templates | Competitive Programming Tricks

📖 Appendix F ⏱️ ~30 min read 🎯 All Levels

Appendix F: Debugging Guide — Common Bugs & How to Fix Them

💡 Why This Appendix? Even correct algorithmic thinking fails when bugs slip through. This guide is a systematic catalogue of the most common bugs in competitive programming C++ code, organized by category. Bookmark it and check here first when your solution gives WA (Wrong Answer), TLE (Time Limit Exceeded), RE (Runtime Error), or MLE (Memory Limit Exceeded).


F.1 Integer Overflow

The most common source of Wrong Answer in C++.

Problem: int is Too Small

int holds values up to 2,147,483,647 (≈ 2.1 × 10⁹). Many problems exceed this.

// ❌ WRONG: n*n can overflow when n = 10^5
int n = 100000;
int result = n * n;  // = 10^10 → overflows int (max ~2×10^9)!

// ✅ CORRECT: cast to long long before multiplication
long long result = (long long)n * n;  // = 10^10, fits in long long
// OR:
long long n_ll = n;
long long result2 = n_ll * n_ll;

When to Use long long

| Situation | Use long long? |
|---|---|
| Array values up to 10⁹, need range sums | ✅ Yes (sum can be 10⁹ × 10⁵ = 10¹⁴) |
| Prefix sums of up to 10⁵ elements | ✅ Yes (safe default) |
| Matrix entries, intermediate DP values | ✅ Yes |
| Distances in shortest path (Dijkstra) | ✅ Yes (dist[u] + w can overflow int) |
| Simple counters (0 to N where N ≤ 10⁶) | int is fine |
| Indices and loop variables | int is fine |

Dangerous Operations

// ❌ Overflow examples:
int a = 1e9, b = 1e9;
cout << a + b;     // overflow (answer > INT_MAX)
cout << a * 2;     // overflow
cout << a * a;     // catastrophic overflow

// ❌ Comparison overflow:
if (a * b > 1e18) ...  // a*b itself may have overflowed!

// ✅ Safe versions:
cout << (long long)a + b;
cout << (long long)a * 2;
cout << (long long)a * a;
if ((long long)a * b > (long long)1e18) ...  // compare as long long

INF Value Choice

// ❌ WRONG: Using INT_MAX as infinity in Dijkstra
const int INF = INT_MAX;
if (dist[u] + w < dist[v]) ...  // dist[u] + w OVERFLOWS if dist[u]=INT_MAX!

// ✅ CORRECT: Use a safe sentinel
const long long INF = 1e18;   // for long long distances
const int INF_INT = 1e9;       // for int distances (leave headroom for addition)

F.2 Off-By-One Errors

The second most common source of WA.

Array Indexing

// ❌ WRONG: Array out of bounds (accessing index n)
int A[n];
for (int i = 0; i <= n; i++) cout << A[i];  // A[n] is undefined!

// ✅ CORRECT
for (int i = 0; i < n; i++) cout << A[i];   // indices 0..n-1
// OR (1-indexed):
for (int i = 1; i <= n; i++) cout << A[i];  // indices 1..n

Prefix Sum Formula

// ❌ WRONG: Off-by-one in range sum
// sum(L, R) should be P[R] - P[L-1], NOT P[R] - P[L]
cout << P[R] - P[L];    // missing element A[L]!

// ✅ CORRECT
cout << P[R] - P[L-1];  // P[0]=0 handles the L=1 case correctly

Binary Search Boundaries

// Finding first index where A[i] >= target (lower_bound behavior):

// ❌ WRONG: Common binary search mistakes
int lo = 0, hi = n - 1;
while (lo < hi) {
    int mid = (lo + hi) / 2;
    if (A[mid] < target) lo = mid;      // BUG: should be lo = mid + 1
    else hi = mid - 1;                   // BUG: should be hi = mid
}

// ✅ CORRECT: Standard lower_bound template
int lo = 0, hi = n;  // hi = n (not n-1!) to allow "not found" answer
while (lo < hi) {
    int mid = (lo + hi) / 2;
    if (A[mid] < target) lo = mid + 1;  // target is in [mid+1, hi]
    else hi = mid;                       // target is in [lo, mid]
}
// lo = hi = first index with A[i] >= target; lo=n means not found

Loop Bounds

// ❌ Common mistake: loop runs one too few or many times
for (int i = 1; i < n; i++) ...    // misses i=n if you meant i=0 to n-1
for (int i = 0; i <= n-1; i++) ... // OK but confusing; prefer i < n

// DP table filling: check if the recurrence accesses i-1
// ❌ If dp[i] uses dp[i-1], and i starts at 0, then dp[-1] is undefined!
for (int i = 0; i <= n; i++) {
    dp[i] = dp[i-1] + ...;  // BUG when i=0: dp[-1]!
}

// ✅ Start at i=1, or initialize dp[0] as base case separately
dp[0] = BASE_CASE;
for (int i = 1; i <= n; i++) {
    dp[i] = dp[i-1] + ...;  // safe: dp[i-1] always valid
}

F.3 Uninitialized Variables

// ❌ WRONG: dp array not initialized
int dp[1005][1005];  // a local array like this contains garbage values in C++!
// dp[i][j] might be non-zero from previous test cases or OS memory

// ✅ CORRECT options:
// Option 1: memset (fills bytes, use 0 or 0x3f for near-infinity)
memset(dp, 0, sizeof(dp));          // fills with 0
memset(dp, 0x3f, sizeof(dp));       // fills with ~1.06e9 (useful as "infinity" for int)

// Option 2: vector with explicit initialization
vector<vector<int>> dp(n+1, vector<int>(m+1, 0));

// Option 3: fill (for a 2D array, treat it as one flat block)
fill(&dp[0][0], &dp[0][0] + 1005 * 1005, 0);

// ⚠️ WARNING: memset(dp, -1, sizeof(dp)) fills each BYTE with 0xFF
// For int: 0xFFFFFFFF = -1 (works for "unvisited" marker)
// For long long: 0xFFFFFFFFFFFFFFFF = -1 (also works)
// But memset(dp, 1, sizeof(dp)) gives 0x01010101 = 16843009, not 1!

Global vs Local Arrays

// Global arrays are zero-initialized by default in C++
// Local (stack) arrays are NOT initialized

int globalArr[100005];     // ✅ initialized to 0
int globalDP[1005][1005];  // ✅ initialized to 0

int main() {
    int localArr[1000];    // ❌ NOT initialized (garbage values)
    int localDP[100][100]; // ❌ NOT initialized
    
    // Tip: Declare large arrays globally to avoid stack overflow AND ensure init
}

F.4 Stack Overflow (Recursion Too Deep)

// C++ default stack size is typically 1-8 MB
// Deep recursion can exceed this → Runtime Error (segfault)

// ❌ Dangerous: DFS/recursion on tree of depth 10^5
void dfs(int u) { for (int v : children[u]) dfs(v); }  // stack overflow on long chains!

// ✅ FIX 1: Convert to iterative using explicit stack
void dfs_iterative(int start) {
    stack<int> st;
    st.push(start);
    while (!st.empty()) {
        int u = st.top(); st.pop();
        for (int v : children[u]) st.push(v);
    }
}

// ✅ FIX 2: Increase stack size (platform-specific, contest judges often allow this)
// On Linux, compile and run with: ulimit -s unlimited && ./sol

// Rule of thumb:
// Recursion depth up to ~10^4: usually safe
// Recursion depth up to ~10^5: risky, consider iterative
// Recursion depth up to ~10^6: almost certainly stack overflow → use iterative

F.5 Modular Arithmetic Bugs

// When the problem asks for answer mod 10^9+7:
const int MOD = 1e9 + 7;

// ❌ WRONG: Forgot to mod, result overflows long long
long long dp = 1;
for (int i = 0; i < n; i++) dp *= A[i];  // overflows after ~18 large multiplications!

// ❌ WRONG: Subtraction underflow (result is negative mod)
long long ans = (a - b) % MOD;  // if a < b, result is negative in C++!

// ✅ CORRECT: Add MOD before taking mod of a subtraction
long long ans = ((a - b) % MOD + MOD) % MOD;  // guaranteed non-negative

// ❌ WRONG: Forgetting to mod intermediate values in DP
dp[i][j] = dp[i-1][j] + dp[i][j-1];  // can overflow if iterations are many

// ✅ CORRECT: Mod every addition
dp[i][j] = (dp[i-1][j] + dp[i][j-1]) % MOD;

// ✅ CORRECT modular exponentiation:
long long modpow(long long base, long long exp, long long mod) {
    long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;  // ← mod after each multiply!
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

F.6 Graph / BFS / DFS Bugs

// ❌ BFS: Forgetting to mark visited BEFORE entering queue
// This causes nodes to be processed multiple times!
queue<int> q;
q.push(src);
while (!q.empty()) {
    int u = q.front(); q.pop();
    visited[u] = true;  // ❌ Marking AFTER dequeue → same node pushed multiple times
    for (int v : adj[u]) if (!visited[v]) q.push(v);
}

// ✅ CORRECT: Mark visited when ADDING to queue
visited[src] = true;
queue<int> q;
q.push(src);
while (!q.empty()) {
    int u = q.front(); q.pop();
    for (int v : adj[u]) {
        if (!visited[v]) {
            visited[v] = true;  // ✅ Mark BEFORE pushing
            q.push(v);
        }
    }
}

// ❌ DFS: Forgetting to reset visited between test cases
// In problems with multiple test cases, reinitialize visited[]!
memset(visited, false, sizeof(visited));

// ❌ Dijkstra: Using int instead of long long for distances
int dist[MAXN];  // ❌ if edge weights can be up to 10^9, sum overflows!
long long dist[MAXN];  // ✅

F.7 I/O Bugs

// ❌ WRONG: Missing ios_base::sync_with_stdio(false) for large I/O
// Without this, cin/cout are synced with C stdio → very slow!
// For N = 10^6 inputs, this can be the difference between AC and TLE.

// ✅ ALWAYS add at start of main() for competitive programming:
ios_base::sync_with_stdio(false);
cin.tie(NULL);

// ❌ WRONG: Using endl (flushes buffer every line → slow)
for (int i = 0; i < n; i++) cout << ans[i] << endl;  // slow!

// ✅ CORRECT: Use "\n" instead
for (int i = 0; i < n; i++) cout << ans[i] << "\n";  // fast

// ❌ WRONG: Mixing cin and scanf/printf after disabling sync
ios_base::sync_with_stdio(false);
scanf("%d", &n);  // BUG: mixing C and C++ I/O after desync!

// ✅ CORRECT: Pick ONE and stick with it
// Either use cin/cout exclusively, or scanf/printf exclusively

// USACO file I/O (when required):
freopen("problem.in", "r", stdin);
freopen("problem.out", "w", stdout);
// After these lines, cin/cout work with files automatically

F.8 2D Array Bounds and Directions

// Grid BFS: off-by-one in boundary checking
int dx[] = {0, 0, 1, -1};
int dy[] = {1, -1, 0, 0};

// ✅ CORRECT: Check all four bounds before touching the grid
for (int d = 0; d < 4; d++) {
    int nx = x + dx[d], ny = y + dy[d];
    if (nx >= 0 && ny >= 0 && nx < n && ny < m) { /* safe: grid[nx][ny] */ }
    // ❌ Common mistake: dropping one of the four conditions (e.g., nx >= 0 → index -1)
}

// ❌ WRONG: Wrong dimensions (swapping rows and columns)
// If grid is N rows × M columns:
// A[row][col]: row goes 0..N-1, col goes 0..M-1
// Bounds: row < N, col < M  (NOT row < M!)

// ❌ WRONG: Visiting same cell multiple times (forgetting dist check)
// In multi-source BFS for distance:
if (!visited[nx][ny]) {  // ✅ Only visit unvisited cells
    visited[nx][ny] = true;
    dist[nx][ny] = dist[x][y] + 1;
    q.push({nx, ny});
}

F.9 DP-Specific Bugs

// ❌ WRONG: 0/1 Knapsack inner loop direction
// Must iterate capacity from HIGH to LOW to prevent reusing items!
for (int i = 0; i < n; i++) {
    for (int j = W; j >= weight[i]; j--) {  // ✅ HIGH to LOW
        dp[j] = max(dp[j], dp[j - weight[i]] + value[i]);
    }
}
// If you iterate j from LOW to HIGH:
for (int j = weight[i]; j <= W; j++) {  // ❌ LOW to HIGH = unbounded knapsack!
    dp[j] = max(dp[j], dp[j - weight[i]] + value[i]);
}

// ❌ WRONG: LIS with binary search — using upper_bound vs lower_bound
// For STRICTLY increasing LIS: use lower_bound (find first >= x, replace)
// For NON-DECREASING LIS: use upper_bound (find first > x, replace)
auto it = lower_bound(tails.begin(), tails.end(), x);  // strictly increasing
auto it = upper_bound(tails.begin(), tails.end(), x);  // non-decreasing

// ❌ WRONG: Forgetting base cases
// dp[0] or dp[i][0] or dp[0][j] MUST be explicitly set before the main loop
dp[0][0] = 0;  // always initialize base cases!

F.10 Memory Limit Exceeded (MLE)

// Common causes of MLE:

// ❌ Array too large for the problem
int dp[10005][10005];  // = 10^8 ints = 400MB → exceeds typical 256MB limit!

// Calculate: N*M*sizeof(type) bytes
// int: 4 bytes, long long: 8 bytes
// 256MB = 256 × 10^6 bytes
// Max int array: 64 × 10^6 elements
// Max long long array: 32 × 10^6 elements

// ✅ Space optimization for 1D DP:
// If dp[i] only depends on dp[i-1], use rolling array:
int cur = 0;                   // which slot holds the current value
vector<long long> dp(2, 0);    // dp[cur] = current, dp[1 - cur] = next
for (int i = 0; i < n; i++) {
    dp[1 - cur] = f(dp[cur]);  // compute the next value into the other slot
    cur = 1 - cur;             // alternate between 0 and 1
}

// ✅ Space optimization for 2D DP (knapsack-style):
// If dp[i][j] only depends on dp[i-1][...], keep only two rows
vector<int> prev_row(W+1, 0), curr_row(W+1, 0);

Quick Diagnosis Checklist

When you get WA/RE/TLE, go through this checklist:

Wrong Answer (WA):

  • Integer overflow? (Add long long casts or change types)
  • Off-by-one in array bounds, loop bounds, range sum formula?
  • Uninitialized array? (Add memset or use vector with init)
  • Wrong DP transition direction? (0/1 knapsack: high-to-low)
  • Wrong binary search template? (Verify on [1,2,3] for target 2)
  • Edge cases: empty input, N=0, N=1, all equal elements?

Runtime Error (RE):

  • Array out of bounds? (Add bounds checks or use vector)
  • Stack overflow from deep recursion? (Convert to iterative)
  • Null/invalid pointer dereference?
  • Division by zero?

Time Limit Exceeded (TLE):

  • Missing ios_base::sync_with_stdio(false); cin.tie(NULL);?
  • O(N²) algorithm when N=10⁵ needs O(N log N)?
  • Unnecessary recomputation in DP? (Need memoization)
  • BFS visiting nodes multiple times? (Mark visited before pushing)

Memory Limit Exceeded (MLE):

  • 2D array too large? (Calculate N×M×sizeof bytes)
  • Recursive DFS with implicit call stack too deep?
  • Dynamic memory allocation in tight loop?

💡 Pro Tip: Print your intermediate values! cerr << "DEBUG: dp[3] = " << dp[3] << "\n"; cerr goes to stderr (not stdout), so it won't affect your output in competitive programming judges. Remove all cerr lines before final submission.

Glossary of Competitive Programming Terms

This glossary defines 35+ key terms used throughout this book and in competitive programming generally. When you encounter an unfamiliar term, look it up here first.


A

Algorithm A step-by-step procedure for solving a problem. An algorithm must be correct (give the right answer), finite (eventually terminate), and well-defined (each step is unambiguous). Examples: binary search, BFS, merge sort.

Adjacency List A way to represent a graph where each vertex stores a list of its neighbors. Space: O(V + E). The standard representation in competitive programming.

Adjacency Matrix A 2D array where matrix[u][v] = 1 if there's an edge from u to v. Space: O(V²). Use only for dense graphs with V ≤ 1000.

Amortized Time The average time per operation over a sequence of operations. Example: vector::push_back is O(1) amortized even though occasional doubling is O(N).


B

Base Case In recursion and DP, the simplest subproblem with a known answer (requires no further recursion). Example: fib(0) = 0, fib(1) = 1.

BFS (Breadth-First Search) A graph traversal that explores nodes level by level (all nodes at distance 1, then distance 2, ...). Uses a queue. Guarantees shortest path in unweighted graphs. Time: O(V + E).

Big-O Notation A mathematical notation describing the upper bound on an algorithm's time or space growth. "O(N log N)" means "at most c × N × log(N) operations for some constant c." Used to compare algorithm efficiency.

Binary Search An O(log N) search algorithm on a sorted array. Each step eliminates half the remaining candidates by comparing with the midpoint. The most important application: "binary search on the answer" for optimization problems.

Brute Force A naive solution that tries all possibilities. Usually O(N²) or O(2^N). Correct but too slow for large inputs. Useful for: partial credit, verifying optimized solutions, small test cases.


C

Comparator A function that defines a sorting order. Takes two elements and returns true if the first should come before the second. Used with std::sort.

Competitive Programming A type of programming contest where participants solve algorithmic problems within a time limit. USACO, Codeforces, LeetCode, and IOI are popular platforms.

Connected Component A maximal subgraph where every pair of vertices is connected by a path. Find components with DFS/BFS or Union-Find.

Coordinate Compression Mapping a large range of values (e.g., up to 10^9) to small consecutive indices (0, 1, 2, ...) without changing relative order. Enables using arrays instead of hash maps.


D

DAG (Directed Acyclic Graph) A directed graph with no cycles. Key property: has a topological ordering. Examples: dependency graphs, task scheduling.

DFS (Depth-First Search) A graph traversal that explores as deep as possible before backtracking. Uses a stack (or recursion). Good for: connectivity, cycle detection, topological sort. Time: O(V + E).

Difference Array A technique for O(1) range updates. Store differences between consecutive elements; range add [L,R] becomes diff[L]++ and diff[R+1]--. Reconstruct with prefix sums.

DP (Dynamic Programming) An optimization technique that solves problems by breaking them into overlapping subproblems and caching results. Two properties needed: optimal substructure + overlapping subproblems. See: memoization, tabulation.

DSU (Disjoint Set Union) See Union-Find.


E

Edge A connection between two vertices in a graph. Can be directed (one-way) or undirected (two-way). May have a weight.

Exchange Argument A proof technique for greedy algorithms. Show that swapping the greedy choice with any other choice never worsens the solution.


F

Flood Fill An algorithm (usually DFS or BFS) that marks all connected cells of the same "color" in a grid. Used to count connected regions.


G

Graph A data structure consisting of vertices (nodes) and edges (connections). Models relationships, networks, maps, etc.

Greedy Algorithm An algorithm that makes the locally optimal choice at each step, hoping for a globally optimal result. Works when the "greedy choice property" holds. Examples: activity selection, Huffman coding, Kruskal's MST.


H

Hash Map (unordered_map) A data structure that stores key-value pairs with O(1) average lookup. Implemented with hash tables. No ordering guarantee. Use when you need fast lookup but don't need sorted keys.


I

Interval DP A DP pattern where the state is a subarray [l, r] and you try all split points. Classic examples: matrix chain multiplication, palindrome partitioning. Time: O(N³).


K

Knapsack Problem A DP problem: given items with weights and values, maximize value within a weight limit. "0/1 knapsack" means each item used at most once. "Unbounded knapsack" means unlimited uses.


L

LIS (Longest Increasing Subsequence) The longest subsequence of an array where each element is strictly greater than the previous. O(N²) DP or O(N log N) with binary search.

LCA (Lowest Common Ancestor) The deepest node that is an ancestor of both u and v in a rooted tree. Naive: O(depth) per query. Binary lifting: O(log N) per query after O(N log N) preprocessing.


M

Memoization Caching the results of recursive function calls to avoid recomputation. "Top-down DP." A memo table stores computed values; before computing, check if the answer is already known.

MST (Minimum Spanning Tree) A spanning tree of a weighted graph with minimum total edge weight. Kruskal's algorithm: sort edges + DSU. Prim's algorithm: priority queue + visited set. Both O(E log E).

Monotone / Monotonic Consistently increasing or decreasing. A function is monotone if it never reverses direction. Key for binary search on answer: the feasibility function must be monotone.


O

Off-By-One Error A bug where an index or count is wrong by exactly 1. Very common in loops (< n vs <= n), binary search, prefix sums (P[L-1] vs P[L]).

Optimal Substructure A property: the optimal solution to a problem can be built from optimal solutions to its subproblems. Required for DP to work correctly.

Overflow When a value exceeds the maximum representable value for its type. int max is ~2×10^9; long long max is ~9.2×10^18. Multiplying two 10^9 ints overflows int — cast to long long first.


P

Prefix Sum An array where P[i] = sum of all elements from index 0 (or 1) through i. Enables O(1) range sum queries: sum(L,R) = P[R] - P[L-1].


R

Recurrence Relation A formula expressing a DP value in terms of smaller DP values. Example: fib(n) = fib(n-1) + fib(n-2). Defines the DP transition.


S

Segment Tree A data structure for range queries and updates in O(log N). More powerful than prefix sums (supports updates). A Gold/Platinum topic.

Sparse Graph A graph with few edges relative to V². In practice: E = O(V). Use adjacency lists.

State (DP) The set of information that uniquely identifies a DP subproblem. Example in knapsack: (item_index, remaining_capacity). Choosing the right state is the key skill in DP.

Subtree All nodes in a tree that are descendants of a given node (including itself). Tree DP often computes aggregate values over subtrees.


T

Tabulation Building a DP table iteratively from base cases to larger subproblems. "Bottom-up DP." No recursion, no stack overflow risk.

Time Limit Exceeded (TLE) A verdict meaning your program ran too long (it may or may not have been on track to a correct answer). USACO's limits are typically 2 seconds for C/C++ and 4 seconds for Java/Python. If you get TLE, optimize the algorithm, not just the constant factors.

Topological Sort An ordering of vertices in a DAG such that for every directed edge u→v, u comes before v. Computed with DFS (reverse post-order) or Kahn's algorithm (BFS-based).

Two Pointers A technique using two indices moving through an array, either in the same direction (sliding window) or toward each other from both ends (e.g., pair-sum search on a sorted array). Converts O(N²) pair searches into O(N). Works on sorted arrays or when the condition is monotone.


U

Union-Find (DSU) A data structure supporting two operations: find(x) (which group is x in?) and union(x,y) (merge groups of x and y). With path compression + union by rank: O(α(N)) ≈ O(1) per operation. Used for dynamic connectivity, Kruskal's MST, cycle detection.


V

Vertex (Node) A fundamental unit of a graph. Vertices have indices (usually 1-indexed in USACO).


W

Wrong Answer (WA) A verdict meaning your program ran but produced incorrect output. Check edge cases, off-by-ones, and overflow.

📊 Knowledge Dependency Map

This interactive map shows prerequisite relationships between all chapters. Click any node to highlight its prerequisites (red) and dependent chapters (green).


*(Interactive SVG map: chapter nodes grouped into Foundation, Data Structures, Graph Algorithms, Dynamic Programming, and Greedy clusters, with red prerequisite edges and green "unlocks" edges.)*


How to Read This Map

| Color | Meaning |
|-------|---------|
| 🔵 Blue nodes | C++ Foundation chapters (Ch.2.1–3.1) |
| 🟢 Green nodes | Core Data Structure chapters |
| 🟠 Orange nodes | Graph Algorithm chapters |
| 🟣 Purple nodes | Dynamic Programming chapters |
| 🔴 Red nodes | Greedy Algorithm chapters |
| Red highlighted edges | Prerequisites of the selected chapter |
| Green highlighted edges | Chapters unlocked by the selected chapter |

Tip: Click any node to reveal its full dependency chain. Click again (or press "↺ Clear Selection") to reset.