📖 Chapter 6.3 ⏱️ ~55 min read 🎯 Advanced

Chapter 6.3: Advanced DP Patterns

📝 Before You Continue: You must have completed Chapter 6.1 (Introduction to DP) and Chapter 6.2 (Classic DP Problems). Advanced patterns build on memoization, tabulation, and the classic DP problems (LIS, knapsack, grid paths).

This chapter covers DP techniques that appear at USACO Silver and above: bitmask DP, interval DP, tree DP, and digit DP. Each has a characteristic structure that, once recognized, makes the problem tractable.

6.3.1 Bitmask DP

When to use: Problems involving subsets of a small set (N ≤ 20), where the state includes "which elements have been selected."

Core idea: Represent the set of selected elements as a bitmask (integer). Bit i is 1 if element i is included.

{0, 2, 3} in a set of 5 elements → bitmask = 0b01101 = 13
bit 0 = 1 (element 0 ∈ set)
bit 1 = 0 (element 1 ∉ set)
bit 2 = 1 (element 2 ∈ set)
bit 3 = 1 (element 3 ∈ set)
bit 4 = 0 (element 4 ∉ set)

Essential Bitmask Operations

// Element operations
int mask = 0;
mask |= (1 << i);      // add element i to set
mask &= ~(1 << i);     // remove element i from set
bool has_i = (mask >> i) & 1;  // check if element i is in set

// Enumerate all subsets of mask
for (int sub = mask; sub > 0; sub = (sub - 1) & mask) {
    // process subset 'sub'
}
// Include the empty subset too: add sub=0 after the loop

// Count bits set (number of elements in set)
int count = __builtin_popcount(mask);   // for int
int count = __builtin_popcountll(mask); // for long long

// Enumerate all masks with exactly k bits set
for (int mask = 0; mask < (1 << n); mask++) {
    if (__builtin_popcount(mask) == k) { /* ... */ }
}

Classic: Traveling Salesman Problem (TSP) — O(2^N × N²)

Problem: N cities, complete weighted graph. Find the minimum-cost Hamiltonian path (visit every city exactly once).

State: dp[mask][u] = minimum cost to visit exactly the cities in mask, ending at city u.

Transition: To extend to city v not in mask:

dp[mask | (1<<v)][v] = min(dp[mask][v], dp[mask][u] + dist[u][v])

// Solution: TSP with Bitmask DP — O(2^N × N^2)
// Works for N ≤ 20 (2^20 × 400 ≈ 4×10^8 — tight; N≤18 is safer)
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll INF = 1e18;

int n;
int dist[20][20];
ll dp[1 << 20][20];

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> n;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            cin >> dist[i][j];

    // Initialize: INF everywhere
    for (int mask = 0; mask < (1 << n); mask++)
        fill(dp[mask], dp[mask] + n, INF);

    // Base case: start at city 0, only city 0 visited
    dp[1][0] = 0;  // mask=1 (bit 0 set), at city 0, cost=0

    // Fill DP
    for (int mask = 1; mask < (1 << n); mask++) {
        for (int u = 0; u < n; u++) {
            if (!(mask & (1 << u))) continue;  // u not in current set
            if (dp[mask][u] == INF) continue;

            // Try extending to city v not yet visited
            for (int v = 0; v < n; v++) {
                if (mask & (1 << v)) continue;  // v already visited
                int newMask = mask | (1 << v);
                dp[newMask][v] = min(dp[newMask][v], dp[mask][u] + dist[u][v]);
            }
        }
    }

    // Answer: minimum over all ending cities to return to city 0
    // (or just minimum over all ending cities for Hamiltonian PATH, not cycle)
    int fullMask = (1 << n) - 1;  // all cities visited
    ll ans = INF;
    for (int u = 1; u < n; u++) {  // end at any city except 0
        ans = min(ans, dp[fullMask][u] + dist[u][0]);  // return to 0 for cycle
    }

    cout << ans << "\n";
    return 0;
}

⚠️ Memory Warning: dp[1<<20][20] uses 2^20 × 20 × 8 bytes ≈ 168MB（而非 160MB）. For N=20, this is close to typical 256MB memory limits. If distances fit in int, use int dp instead of long long to halve memory to ~84MB.

6.3.2 Interval DP

When to use: Problems where the answer for a larger interval can be built from answers for smaller intervals. Keywords: "merge," "split," "burst," "matrix chain."

Core structure:

dp[l][r] = optimal answer for subproblem on interval [l, r]
Base case: dp[i][i] = trivial (single element)
Transition: dp[l][r] = min/max over k ∈ [l, r-1] of:
              dp[l][k] + dp[k+1][r] + cost(l, k, r)
Fill order: by INCREASING interval length (len = r - l + 1)

Classic: Matrix Chain Multiplication — O(N³)

Problem: Multiply N matrices in sequence. Matrix i has dimensions dims[i] × dims[i+1]. The number of scalar multiplications to multiply A (p×q) by B (q×r) is p*q*r. Find the parenthesization that minimizes total multiplications.

State: dp[l][r] = minimum multiplications to compute the product of matrices l through r.

// Solution: Matrix Chain Multiplication — O(N^3), O(N^2) space
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll INF = 1e18;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n;
    cin >> n;
    // dims[i] = rows of matrix i; dims[i+1] = cols of matrix i
    vector<int> dims(n + 1);
    for (int i = 0; i <= n; i++) cin >> dims[i];

    // dp[l][r] = min multiplications to compute M_l × M_{l+1} × ... × M_r
    vector<vector<ll>> dp(n + 1, vector<ll>(n + 1, 0));

    // Fill by increasing interval length
    for (int len = 2; len <= n; len++) {          // len = number of matrices
        for (int l = 1; l + len - 1 <= n; l++) {
            int r = l + len - 1;
            dp[l][r] = INF;

            // Try all split points k (split after matrix k)
            for (int k = l; k < r; k++) {
                // Cost: compute [l..k], compute [k+1..r], then multiply the results
                // Result of [l..k]: dims[l-1] × dims[k]
                // Result of [k+1..r]: dims[k] × dims[r]
                ll cost = dp[l][k] + dp[k+1][r]
                        + (ll)dims[l-1] * dims[k] * dims[r]; // ← KEY: cost of final multiply
                dp[l][r] = min(dp[l][r], cost);
            }
        }
    }

    cout << dp[1][n] << "\n";
    return 0;
}

Worked Example:

3 matrices: A(10×30), B(30×5), C(5×60)
dims = [10, 30, 5, 60]

dp[1][1] = dp[2][2] = dp[3][3] = 0 (single matrices, no multiplication)

len=2:
  dp[1][2] = dp[1][1] + dp[2][2] + 10*30*5 = 0 + 0 + 1500 = 1500
  dp[2][3] = dp[2][2] + dp[3][3] + 30*5*60 = 0 + 0 + 9000 = 9000

len=3:
  dp[1][3]: try k=1 and k=2
    k=1: dp[1][1] + dp[2][3] + 10*30*60 = 0 + 9000 + 18000 = 27000
    k=2: dp[1][2] + dp[3][3] + 10*5*60 = 1500 + 0 + 3000 = 4500  ← minimum!
  dp[1][3] = 4500

Answer: 4500 (parenthesize as (A×B)×C)
Verify: (10×30)×5 = 1500 ops, then (10×5)×60 = 3000 ops, total = 4500 ✓

Classic: Burst Balloons (Variant of Interval DP)

Problem: N balloons with values. Burst balloon i: earn left_value × value[i] × right_value. Find maximum coins.

// dp[l][r] = max coins from bursting ALL balloons in (l, r) exclusively
// (l and r are boundaries, not burst)
// Key insight: think about which balloon is burst LAST in [l, r]
// (The last balloon sees l and r as neighbors)

// Add sentinel balloons: val[-1] = val[n] = 1
vector<int> val(n + 2);
val[0] = val[n + 1] = 1;
for (int i = 1; i <= n; i++) cin >> val[i];

vector<vector<ll>> dp(n + 2, vector<ll>(n + 2, 0));

for (int len = 1; len <= n; len++) {
    for (int l = 1; l + len - 1 <= n; l++) {
        int r = l + len - 1;
        for (int k = l; k <= r; k++) {
            // k is the LAST balloon burst in [l, r]
            // When k is burst, its neighbors are l-1 and r+1 (sentinels)
            ll cost = dp[l][k-1] + dp[k+1][r]
                    + (ll)val[l-1] * val[k] * val[r+1];
            dp[l][r] = max(dp[l][r], cost);
        }
    }
}
cout << dp[1][n] << "\n";

6.3.3 Tree DP

When to use: DP on a tree, where the state of a node depends on its subtree (post-order) or its ancestors (pre-order).

Pattern: Subtree DP (Post-Order)

dp[u] = some value computed from dp[children of u]
Process nodes in post-order (leaves first, root last)

Classic: Tree Knapsack / Maximum Independent Set on Tree

Problem: N nodes, each with value val[u]. Select a subset S maximizing total value, subject to: if u ∈ S, then no child of u is in S.

State: dp[u][0] = max value from subtree of u if u is NOT selected. dp[u][1] = max value from subtree of u if u IS selected.

// Solution: Max Independent Set on Tree — O(N)
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 100005;
vector<int> children[MAXN];
int val[MAXN];
long long dp[MAXN][2];  // dp[u][0/1] = max value if u excluded/included

// DFS post-order: compute dp[u] after computing all dp[children]
void dfs(int u) {
    dp[u][1] = val[u];  // include u: get val[u]
    dp[u][0] = 0;        // exclude u: get 0 from this node

    for (int v : children[u]) {
        dfs(v);  // ← process child first (post-order)

        // If we INCLUDE u: children must be EXCLUDED
        dp[u][1] += dp[v][0];

        // If we EXCLUDE u: children can be either included or excluded
        dp[u][0] += max(dp[v][0], dp[v][1]);
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int n, root;
    cin >> n >> root;
    for (int i = 1; i <= n; i++) cin >> val[i];

    for (int i = 0; i < n - 1; i++) {
        int u, v;
        cin >> u >> v;
        children[u].push_back(v);
        // Note: if the tree is given as undirected edges, need to root it first
    }

    dfs(root);
    cout << max(dp[root][0], dp[root][1]) << "\n";
    return 0;
}

Tree Diameter (Two DFS)

// Tree Diameter: longest path between any two nodes
// Method: Two DFS
// 1. DFS from any node u → find farthest node v
// 2. DFS from v → find farthest node w
// dist(v, w) = diameter

int farthest_node, max_dist;

void dfs_diameter(int u, int parent, int d, vector<int> adj[]) {
    if (d > max_dist) {
        max_dist = d;
        farthest_node = u;
    }
    for (int v : adj[u]) {
        if (v != parent) dfs_diameter(v, u, d + 1, adj);
    }
}

int tree_diameter(int n, vector<int> adj[]) {
    // First DFS from node 1
    max_dist = 0; farthest_node = 1;
    dfs_diameter(1, -1, 0, adj);

    // Second DFS from farthest node found
    int v = farthest_node;
    max_dist = 0;
    dfs_diameter(v, -1, 0, adj);

    return max_dist;  // this is the diameter
}

6.3.4 Digit DP

When to use: Count numbers in range [1, N] satisfying some property related to their digits.

Core idea: Build the number digit by digit (left to right), maintaining a "tight" constraint (whether we're still bounded by N's digits).

State: dp[position][tight][...other state...]

position: which digit we're currently deciding (0 = leftmost)
tight: are we still constrained by N? (1 = yes, can't exceed N's digit; 0 = no, can use 0-9 freely)
Other state: whatever property we're tracking (sum of digits, count of zeros, etc.)

Classic: Count numbers in [1, N] with digit sum divisible by K

// Solution: Digit DP — O(|digits| × 10 × K) time, O(|digits| × K) space
#include <bits/stdc++.h>
using namespace std;

string num;     // N as a string
int K;
// dp[pos][tight][sum % K] = count of valid numbers
// Here we use top-down memoization
map<tuple<int,int,int>, long long> memo;

// pos: current digit position (0-indexed)
// tight: are we bounded by num[pos]?
// rem: current digit sum mod K
long long solve(int pos, bool tight, int rem) {
    if (pos == (int)num.size()) {
        return rem == 0 ? 1 : 0;  // complete number: valid iff digit sum ≡ 0 (mod K)
    }

    auto key = make_tuple(pos, tight, rem);
    if (memo.count(key)) return memo[key];

    int limit = tight ? (num[pos] - '0') : 9;  // max digit we can place here
    long long result = 0;

    for (int d = 0; d <= limit; d++) {
        bool new_tight = tight && (d == limit);
        result += solve(pos + 1, new_tight, (rem + d) % K);
    }

    return memo[key] = result;
}

// Count numbers in [1, N] with digit sum divisible by K
long long count_up_to(long long N) {
    num = to_string(N);
    memo.clear();
    long long ans = solve(0, true, 0);
    // Subtract 1 because 0 itself has digit sum 0 (divisible by K)
    // but we want [1, N], not [0, N]
    return ans - 1;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    long long L, R;
    cin >> L >> R >> K;

    // Count in [L, R] = count_up_to(R) - count_up_to(L-1)
    cout << count_up_to(R) - count_up_to(L - 1) << "\n";
    return 0;
}

💡 Key Insight: The tight flag is crucial. When tight=true, we can only use digits up to num[pos]. Once we place a digit less than num[pos], all subsequent digits are free (0–9), so tight becomes false. This "peeling off" of the upper bound is what makes digit DP correct.

6.3.5 DP Optimization: When Standard DP Is Too Slow

Slope Trick (O(N log N) for Convex/Concave DP)

For DPs of the form dp[i] = min_{j<i} (dp[j] + cost(j, i)) where the cost function has "convex" structure.

Divide & Conquer Optimization (O(N² → N log N))

When the optimal split point opt[i][j] is monotone:

opt[i][j] ≤ opt[i][j+1] (or similar monotone property)
Reduces cubic DP to O(N log N) per DP dimension

Standard interval DP: O(N^3)
With D&C optimization: O(N^2 log N)
With Knuth's optimization: O(N^2) (requires additional condition)

📌 USACO Relevance: These optimizations are typically USACO Gold/Platinum level. For Silver, mastery of the four patterns in this chapter (bitmask, interval, tree, digit) is sufficient.

Chapter Summary

📌 Pattern Recognition Guide

Pattern	Clue in Problem	State	Transition
Bitmask DP	"subset," N ≤ 20, assign tasks	`dp[mask][last]`	Flip bit, try next element
Interval DP	"merge," "split," "parenthesize"	`dp[l][r]`	Split at k, combine
Tree DP	"tree," subtree property	`dp[node][state]`	Aggregate from children
Digit DP	"count numbers with property"	`dp[pos][tight][...]`	Try each digit d

🧩 Core Framework Quick Reference

// Bitmask DP framework
for (int mask = 0; mask < (1<<n); mask++)
    for (int u = 0; u < n; u++) if (mask & (1<<u))
        for (int v = 0; v < n; v++) if (!(mask & (1<<v)))
            dp[mask|(1<<v)][v] = min(dp[mask|(1<<v)][v], dp[mask][u] + cost[u][v]);

// Interval DP framework
for (int len = 2; len <= n; len++)           // enumerate interval length
    for (int l = 1; l+len-1 <= n; l++) {     // enumerate left endpoint
        int r = l + len - 1;
        for (int k = l; k < r; k++)           // enumerate split point
            dp[l][r] = min(dp[l][r], dp[l][k] + dp[k+1][r] + cost(l,k,r));
    }

// Tree DP framework (post-order traversal)
void dfs(int u, int parent) {
    for (int v : adj[u]) if (v != parent) {
        dfs(v, u);
        dp[u] = update(dp[u], dp[v]);  // update current node with child info
    }
}

// Digit DP framework
long long solve(int pos, bool tight, int state) {
    if (pos == len) return (state == target) ? 1 : 0;
    if (memo[pos][tight][state] != -1) return memo[pos][tight][state];
    int lim = tight ? (num[pos]-'0') : 9;
    long long res = 0;
    for (int d = 0; d <= lim; d++)
        res += solve(pos+1, tight && (d==lim), next_state(state, d));
    return memo[pos][tight][state] = res;
}

❓ FAQ

Q1: Why must interval DP enumerate by length first?

A: Because dp[l][r] depends on dp[l][k] and dp[k+1][r], both of which have length less than r-l+1. So all shorter intervals must be computed before dp[l][r]. Enumerating by length from small to large satisfies this requirement. If you enumerate l and r directly, you may compute dp[l][r] before its dependencies are ready.

Q2: In tree DP, how do you handle an unrooted tree (given undirected edges)?

A: Choose any node as root (usually node 1), then use DFS to turn undirected edges into directed edges (parent→child direction). Pass a parent parameter in DFS to avoid going back to the parent.

void dfs(int u, int par) {
    for (int v : adj[u]) {
        if (v != par) {  // only visit children, not parent
            dfs(v, u);
            // Updated dp[u]
        }
    }
}

Q3: In digit DP, can tight=true and tight=false share the same memoization array?

A: Yes, which is exactly why tight is part of the state. dp[pos][1][rem] and dp[pos][0][rem] are different states, recording "count under upper bound constraint" and "count when free" respectively. Note that tight=false states can be reused across multiple calls (once tight becomes false, the remaining digits are unconstrained).

Practice Problems

Problem 6.3.1 — Bitmask DP: Task Assignment 🟡 Medium N workers, N tasks. Worker i can do task j in time[i][j] hours. Assign each task to exactly one worker to minimize total time. (N ≤ 15)

Hint

dp[mask] = minimum time to complete the tasks in `mask`, with worker popcount(mask)-1 assigned to the "new" task. Actually: dp[mask] = min time to assign the first popcount(mask) workers to the tasks in mask.

Problem 6.3.2 — Interval DP: Palindrome Partitioning 🟡 Medium Find the minimum number of cuts to partition a string into palindromes.

Hint

First precompute isPalin[l][r] with interval DP. Then dp[i] = min cuts for s[0..i].

Problem 6.3.3 — Tree DP: Maximum Matching 🔴 Hard Find the maximum matching in a tree (maximum set of edges with no shared vertex).

Hint

dp[u][0] = max matching in subtree of u when u is NOT matched. dp[u][1] = max matching in subtree of u when u IS matched (to one child).

Problem 6.3.4 — Digit DP: Count Lucky Numbers 🟡 Medium A "lucky" number only contains digits 4 and 7. Count lucky numbers in [1, N].

Hint

This can be solved without DP (just enumerate 2^k possibilities for k ≤ 18 digits). But practice the digit DP framework: state = (position, tight, has_only_4_7_so_far).

Problem 6.3.5 — Mixed: USACO 2019 December Platinum 🔴 Hard Cow Poetry — combinatorics + DP. Count poem arrangements with specific rhyme schemes.

Hint

Group lines by their suffix hash. Use DP to count valid arrangements.

C++ for Competitive Programming: A USACO Guide