Chapter 3.2: Arrays and Prefix Sums
📝 Prerequisites: Make sure you are comfortable with arrays, vectors, and basic loops (Chapters 2.2–2.3). You should also understand
long longoverflow (Chapter 2.1).
Imagine you have an array of N numbers, and someone asks you 100,000 times: "What is the sum of the elements from index L to index R?" The naive approach recomputes the sum every time — O(N) per query, for a total of O(N × Q). When N = Q = 10^5, that means 10^10 operations, which is far too slow.
Prefix sums solve this problem with O(N) preprocessing and O(1) per query. This is one of the most elegant and practical techniques in competitive programming.
💡 Key idea: Prefix sums turn a "range query" problem into a single subtraction. Instead of summing from L to R every time, you precompute cumulative sums and subtract two values. This replaces repeated
O(Q)work with one-timeO(N)preprocessing.
3.2.1 The Prefix Sum Idea
The prefix sum of an array is a new array where each element stores the cumulative sum up to the current index.
Visual: Prefix Sum Array
The figure above shows how the prefix sum array is built from the original array, and how sum(L, R) = P[R] - P[L-1] computes a range sum in O(1) time. The blue cells mark the query range, while the red and green cells show the two prefix values being subtracted.
Given array: A = [3, 1, 4, 1, 5, 9, 2, 6] (using 1-indexing for explanation)
Index: 1 2 3 4 5 6 7 8
A: 3 1 4 1 5 9 2 6
P: 3 4 8 9 14 23 25 31
Here P[i] = A[1] + A[2] + ... + A[i].
Why use 1-indexing?
Using 1-indexed arrays lets us define P[0] = 0 (the "empty prefix" has sum zero). This makes the query formula P[R] - P[L-1] work even when L = 1: we compute P[R] - P[0] = P[R], which is correct.
Building the prefix sum array
📄 View code: Building the prefix sum array
// Build prefix sum array — O(N)
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int n;
cin >> n;
// Step 1: read input (1-indexed)
vector<int> A(n + 1);
for (int i = 1; i <= n; i++) cin >> A[i];
// Step 2: build prefix sums
vector<long long> P(n + 1, 0); // P[0] = 0 (base case)
for (int i = 1; i <= n; i++) {
P[i] = P[i - 1] + A[i]; // ← key line: each P[i] = sum of all elements up to i
}
return 0;
}
Complexity analysis:
- Time:
O(N)— one pass over the array - Space:
O(N)— stores the prefix array
Step-by-step trace for A = [3, 1, 4, 1, 5]:
i=1: P[1] = P[0] + A[1] = 0 + 3 = 3
i=2: P[2] = P[1] + A[2] = 3 + 1 = 4
i=3: P[3] = P[2] + A[3] = 4 + 4 = 8
i=4: P[4] = P[3] + A[4] = 8 + 1 = 9
i=5: P[5] = P[4] + A[5] = 9 + 5 = 14
3.2.2 O(1) Range Sum Queries
Once we have the prefix sum array, the sum from index L to R is:
sum(L, R) = P[R] - P[L-1]
Why? P[R] is the sum of elements 1..R. P[L-1] is the sum of elements 1..(L-1). Their difference is exactly the sum of elements L..R.
💡 Key idea: Think of P[i] as "the total sum of the first i elements." To get the sum of the window [L, R], subtract "the prefix before L" from "the prefix up to R." It is like: big triangle minus small triangle = trapezoid.
📄 Complete C++ code
// Range sum queries — preprocessing O(N), each query O(1)
#include <bits/stdc++.h>
using namespace std;
const int MAXN = 100001;
long long A[MAXN];
long long P[MAXN];
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int n, q;
cin >> n >> q;
// Step 1: read the array
for (int i = 1; i <= n; i++) cin >> A[i];
// Step 2: build prefix sums — O(n)
P[0] = 0;
for (int i = 1; i <= n; i++) {
P[i] = P[i - 1] + A[i];
}
// Step 3: answer q range sum queries — each O(1)
for (int i = 0; i < q; i++) {
int l, r;
cin >> l >> r;
cout << P[r] - P[l - 1] << "\n"; // ← key line: range sum formula
}
return 0;
}
Sample input:
8 3
3 1 4 1 5 9 2 6
1 4
3 7
2 6
Sample output:
9
21
20
Verification:
sum(1,4) = P[4] - P[0] = 9 - 0 = 9→ A[1]+A[2]+A[3]+A[4] = 3+1+4+1 = 9 ✓sum(3,7) = P[7] - P[2] = 25 - 4 = 21→ A[3]+...+A[7] = 4+1+5+9+2 = 21 ✓sum(2,6) = P[6] - P[1] = 23 - 3 = 20→ A[2]+...+A[6] = 1+4+1+5+9 = 20 ✓
⚠️ Common mistake: Writing
P[R] - P[L]instead ofP[R] - P[L-1]. The formula includes both endpoints L and R — you need to subtract the sum before L, not the sum at L.
Total complexity: O(N + Q) — easily fast enough for N and Q up to 10^5.
3.2.3 USACO Example: Breed Counting
This is a classic USACO Bronze problem (December 2015).
Problem: N cows stand in a line. Each cow has breed 1, 2, or 3. Answer Q queries: how many cows of breed B are in positions L through R?
Solution: Maintain one prefix sum array for each breed.
📄 Complete C++ code
// Multi-breed prefix sums — O(N + Q)
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int n, q;
cin >> n >> q;
vector<int> breed(n + 1);
vector<vector<long long>> P(4, vector<long long>(n + 1, 0));
// P[b][i] = number of cows of breed b in positions 1..i
// Step 1: build prefix sums for each breed
for (int i = 1; i <= n; i++) {
cin >> breed[i];
for (int b = 1; b <= 3; b++) {
P[b][i] = P[b][i - 1] + (breed[i] == b ? 1 : 0); // ← key line
}
}
// Step 2: answer each query in O(1)
for (int i = 0; i < q; i++) {
int l, r, b;
cin >> l >> r >> b;
cout << P[b][r] - P[b][l - 1] << "\n";
}
return 0;
}
🏆 USACO tip: Many USACO Bronze problems ask you to "count elements satisfying property X inside a range." If Q is large, always consider prefix sums.
3.2.4 USACO-Style Walkthrough: FJ's Grass Fields
🔗 Related problems: This fictional USACO-style problem is inspired by "Breed Counting" and "Tallest Cow" — both classic Bronze problems.
Problem: FJ has N consecutive fields. The i-th field has grass[i] units of grass. He needs to answer Q queries: "What is the total amount of grass from field L to field R, inclusive?" N and Q can be as large as 10^5, and each query must be answered in O(1).
Sample input:
6 4
4 2 7 1 8 3
1 3
2 5
4 6
1 6
Sample output:
13
18
12
25
Step-by-step solution:
Step 1: Understand the problem. We have the array [4, 2, 7, 1, 8, 3], and we need range sums.
Step 2: Build the prefix sum array.
Index: 0 1 2 3 4 5 6
Grass: - 4 2 7 1 8 3
P: 0 4 6 13 14 22 25
Step 3: Answer queries with P[R] - P[L-1]:
- Query (1,3):
P[3] - P[0] = 13 - 0 = 13✓ - Query (2,5):
P[5] - P[1] = 22 - 4 = 18✓ - Query (4,6):
P[6] - P[3] = 25 - 13 = 12✓ - Query (1,6):
P[6] - P[0] = 25 - 0 = 25✓
Complete C++ solution:
📄 Complete C++ code
// FJ's Grass Fields — prefix sum solution O(N + Q)
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int n, q;
cin >> n >> q;
// Step 1: read grass amounts and build prefix sums at the same time
vector<long long> P(n + 1, 0);
for (int i = 1; i <= n; i++) {
long long g;
cin >> g;
P[i] = P[i - 1] + g; // ← key line: incremental prefix sum
}
// Step 2: answer each query in O(1)
while (q--) {
int l, r;
cin >> l >> r;
cout << P[r] - P[l - 1] << "\n";
}
return 0;
}
Why is this O(N + Q)?
- Build prefix sums: one loop with N iterations →
O(N) - Each query: one subtraction →
O(1)per query,O(Q)total - Total:
O(N + Q)— far better than brute forceO(NQ)
⚠️ Common mistake: Using
intinstead oflong longfor prefix sums. If each grass amount can be up to 10^9 and N = 10^5, the total can reach 10^14 — far beyond the range ofint(about 2×10^9).
3.2.5 2D Prefix Sums
For a 2D grid, prefix sums can be extended to answer rectangular range queries in O(1) time.
Given an R×C grid, define P[r][c] as the sum of all elements in the rectangle from (1,1) to (r,c).
Building a 2D prefix sum
P[r][c] = A[r][c] + P[r-1][c] + P[r][c-1] - P[r-1][c-1]
The subtraction removes overlap; otherwise the top-left rectangle would be counted twice.
💡 Key idea (inclusion-exclusion): Imagine four rectangles:
P[r-1][c]= the rectangle aboveP[r][c-1]= the rectangle on the leftP[r-1][c-1]= the top-left corner (counted in both of the previous two, so subtract it once)A[r][c]= the single new cell
Step-by-step 2D prefix sum example
Trace a 4×4 grid:
Original grid A:
c=1 c=2 c=3 c=4
r=1: 1 2 3 4
r=2: 5 6 7 8
r=3: 9 10 11 12
r=4: 13 14 15 16
Build P step by step (left to right, top to bottom):
📄 Code trace
P[1][1] = A[1][1] = 1
P[1][2] = 2 + 0 + 1 - 0 = 3
P[1][3] = 3 + 0 + 3 - 0 = 6
P[1][4] = 4 + 0 + 6 - 0 = 10
P[2][1] = 5 + 1 + 0 - 0 = 6
P[2][2] = 6 + 3 + 6 - 1 = 14
P[2][3] = 7 + 6 + 14 - 3 = 24
P[2][4] = 8 + 10 + 24 - 6 = 36
P[3][1] = 9 + 6 + 0 - 0 = 15
P[3][2] = 10 + 14 + 15 - 6 = 33
P[3][3] = 11 + 24 + 33 - 14 = 54
P[3][4] = 12 + 36 + 54 - 24 = 78
P[4][1] = 13 + 15 + 0 - 0 = 28
P[4][2] = 14 + 33 + 28 - 15 = 60
P[4][3] = 15 + 54 + 60 - 33 = 96
P[4][4] = 16 + 78 + 96 - 54 = 136
Prefix sum grid P:
c=1 c=2 c=3 c=4
r=1: 1 3 6 10
r=2: 6 14 24 36
r=3: 15 33 54 78
r=4: 28 60 96 136
Query: sum of subgrid (r1=2, c1=2) to (r2=3, c2=3):
ans = P[3][3] - P[1][3] - P[3][1] + P[1][1]
= 54 - 6 - 15 + 1
= 34
Verification: A[2][2]+A[2][3]+A[3][2]+A[3][3] = 6+7+10+11 = 34 ✓
Inclusion-exclusion visualization:
📄 Complete C++ code
// 2D prefix sums — build O(R×C), query O(1)
#include <bits/stdc++.h>
using namespace std;
const int MAXR = 1001, MAXC = 1001;
int A[MAXR][MAXC];
long long P[MAXR][MAXC];
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int R, C;
cin >> R >> C;
for (int r = 1; r <= R; r++)
for (int c = 1; c <= C; c++)
cin >> A[r][c];
// Step 1: build 2D prefix sums — O(R × C)
for (int r = 1; r <= R; r++) {
for (int c = 1; c <= C; c++) {
P[r][c] = A[r][c]
+ P[r-1][c] // rectangle above
+ P[r][c-1] // rectangle on the left
- P[r-1][c-1]; // ← key line: remove overlap (counted twice)
}
}
// Step 2: answer each query in O(1)
int q;
cin >> q;
while (q--) {
int r1, c1, r2, c2;
cin >> r1 >> c1 >> r2 >> c2;
long long ans = P[r2][c2]
- P[r1-1][c2] // subtract top strip
- P[r2][c1-1] // subtract left strip
+ P[r1-1][c1-1]; // add back top-left corner
cout << ans << "\n";
}
return 0;
}
Complexity analysis:
- Build time:
O(R × C) - Query time:
O(1)each - Space:
O(R × C)
⚠️ Common mistake: Forgetting to add back
P[r1-1][c1-1]in the query formula. The top strip and left strip both contain the top-left corner, so it gets subtracted twice and must be added back once.
3.2.6 Difference Arrays
Earlier, we learned prefix sums: they are good at solving "many range-sum queries."
Now consider a problem that looks like the opposite:
You have an array of length N, initially all zeros. Then there are many operations, each adding a value
Vto the entire interval[L, R]. Finally, output the whole array.
If we simulate directly, each operation loops from L to R:
for (int i = L; i <= R; i++) {
A[i] += V;
}
This is correct, but if both N and the number of operations M are 100000, the worst case is 10^10 additions — too slow.
The core problem solved by difference arrays is:
How can we express "add V to the whole interval" without actually modifying every element in the interval?
Start with a real-life intuition
Imagine a straight track with 8 positions:
Position: 1 2 3 4 5 6 7 8
Value: 0 0 0 0 0 0 0 0
Now we want to add 5 to all positions in [3, 6].
The naive method writes +5 on positions 3, 4, 5, and 6.
A difference array does not do that. It records only two facts:
Starting from position 3, +5 becomes active
Starting from position 7, +5 stops being active
That is:
diff[3] += 5;
diff[7] -= 5;
Why is that enough? Because at the end we scan from left to right and accumulate these "change markers."
diff: 0 0 +5 0 0 0 -5 0
running: 0 0 5 5 5 5 0 0
position: 1 2 3 4 5 6 7 8
Positions 3 through 6 become exactly 5, and position 7 returns to 0.
This is the entire intuition of difference arrays:
A range update does not directly modify the range. Instead, put a "start changing" marker at the range start, and put a "stop changing" marker after the range end.
What exactly does a difference array store?
If the original array is:
A: 2 5 5 9 7
Then the difference array diff stores "how much the current position changed compared with the previous position":
diff[1] = A[1] - 0 = 2
diff[2] = A[2] - A[1] = 3
diff[3] = A[3] - A[2] = 0
diff[4] = A[4] - A[3] = 4
diff[5] = A[5] - A[4] = -2
So:
A: 2 5 5 9 7
diff: 2 3 0 4 -2
Taking the prefix sum of diff restores A:
2
2 + 3 = 5
2 + 3 + 0 = 5
2 + 3 + 0 + 4 = 9
2 + 3 + 0 + 4 - 2 = 7
Therefore, you can understand difference arrays as:
diff[i]records "how much the value changes starting from position i."
Why does range addition only need two positions?
Suppose we want to add V to every element in [L, R].
Starting from position L, all later values should be larger by V, so:
diff[L] += V;
But this influence should only last through R. Starting from R+1, the extra V should disappear, so we cancel it:
diff[R + 1] -= V;
Together, this is the most important formula for difference arrays:
diff[L] += V;
diff[R + 1] -= V;
Note: this is why
diffis usually allocated with sizen + 2. WhenR = n, we still accessdiff[n + 1].
Complete example: three range updates
Start with a zero array of length 5:
A: 0 0 0 0 0
Perform three operations:
1) Add 2 to [1, 3]
2) Add 3 to [2, 5]
3) Add -1 to [3, 4]
We do not directly modify A; we only modify diff.
Operation 1: [1, 3] +2
diff[1] += 2;
diff[4] -= 2;
diff[1..6] = [2, 0, 0, -2, 0, 0]
Meaning: add 2 starting from 1, and cancel this +2 starting from 4.
Operation 2: [2, 5] +3
diff[2] += 3;
diff[6] -= 3;
diff[1..6] = [2, 3, 0, -2, 0, -3]
Operation 3: [3, 4] -1
diff[3] += -1;
diff[5] -= -1; // equivalent to diff[5] += 1
diff[1..6] = [2, 3, -1, -2, 1, -3]
Now take the prefix sum of diff to restore the final array:
i=1: 2 -> A[1] = 2
i=2: 2 + 3 = 5 -> A[2] = 5
i=3: 5 - 1 = 4 -> A[3] = 4
i=4: 4 - 2 = 2 -> A[4] = 2
i=5: 2 + 1 = 3 -> A[5] = 3
Final result:
A: 2 5 4 2 3
Complete C++ implementation
📄 Complete C++ code
// Difference array for range updates — O(N + M)
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int n, m;
cin >> n >> m;
// diff[i] means: starting from position i, how much does the current value change?
// Allocate one extra position: when the update right endpoint r = n, we need diff[n + 1]
vector<long long> diff(n + 2, 0);
// Step 1: process all range updates
for (int i = 0; i < m; i++) {
int l, r;
long long v;
cin >> l >> r >> v;
diff[l] += v; // from l onward, adding v becomes active
diff[r + 1] -= v; // from r+1 onward, cancel this addition
}
// Step 2: take prefix sums of diff to get the final array
long long cur = 0;
for (int i = 1; i <= n; i++) {
cur += diff[i];
cout << cur;
if (i < n) cout << " ";
}
cout << "\n";
return 0;
}
Sample input:
5 3
1 3 2
2 5 3
3 4 -1
Sample output:
2 5 4 2 3
When should you use a difference array?
Difference arrays are suitable for this scenario:
Many range updates, then final results are queried together
For example:
- Add
Vto every element in[L, R] - Paint many wall segments, each operation painting one interval
- Increase traffic volume on many road intervals
- Add 1 to online user count during a time interval
But note:
A normal difference array is not suitable for dynamic problems where updates and range queries are interleaved.
If a problem requires updates and queries to appear alternately, you usually need a Fenwick tree or a segment tree.
Common mistakes with difference arrays
- Array too small:
diffshould have sizen + 2, because we may accessdiff[r + 1]. - Forgetting the final prefix sum:
diffitself is not the answer; only after taking prefix sums do we get the final array. - Writing
diff[r + 1] -= vasdiff[r] -= v: this prevents positionrfrom receiving the update. - Potential overflow: after many range additions, values can become large, so use
long long.
One-sentence memory trick:
Start adding at the left endpoint, stop adding after the right endpoint; scan once at the end, and changes become answers.
3.2.7 2D Difference Arrays
A one-dimensional difference array solves "add a value to a continuous interval."
A two-dimensional difference array solves the analogous problem on a grid:
You have an
R × Cgrid, initially all zeros. Each operation addsVto an entire rectangular region. Finally, output the whole grid.
For example:
Add V to every cell in the rectangle from top-left (r1, c1) to bottom-right (r2, c2)
If we simulate directly, we must enumerate every cell inside the rectangle. When rectangles are large and operations are numerous, this becomes very slow.
The idea of 2D difference arrays is exactly the same as in 1D:
Do not really modify the whole rectangle. Only place markers on the rectangle boundary. At the end, use a 2D prefix sum to spread those markers into the final result.
From the 1D formula to the 2D formula
For a 1D interval [L, R] + V, the markers are:
diff[L] += V;
diff[R + 1] -= V;
This means:
The effect starts at L and stops at R+1
A 2D rectangle has two directions: rows and columns.
If we only write:
diff[r1][c1] += V;
then after taking a 2D prefix sum, V affects the entire lower-right rectangle starting from (r1,c1), not just the rectangle [r1,c1]..[r2,c2] that we want.
So we need to cancel the influence to the right and below, and then add back the lower-right area that was canceled twice.
The four-corner formula is:
diff[r1][c1] += V; // effect starts from the top-left corner
diff[r1][c2 + 1] -= V; // past the right boundary, cancel horizontal influence
diff[r2 + 1][c1] -= V; // past the bottom boundary, cancel vertical influence
diff[r2 + 1][c2 + 1] += V; // lower-right area was canceled twice, add it back once
These four markers are the core of 2D difference arrays.
Understanding four-corner markers visually
Suppose we want to add V to this rectangle:
c1 c2
↓ ↓
. . . . .
r1 -> . +V +V +V .
. +V +V +V .
r2 -> . +V +V +V .
. . . . .
We place four markers:
Top-left corner: +V starts the influence
Right of top edge: -V prevents the influence from continuing right
Below left edge: -V prevents the influence from continuing downward
Lower-right outside: +V fixes the area that was subtracted twice
That is:
(r1, c1) place +V
(r1, c2 + 1) place -V
(r2 + 1, c1) place -V
(r2 + 1, c2+1) place +V
Complete example: two rectangle updates
A 3 × 3 grid, initially all zeros.
Perform two updates:
1) update(1, 1, 2, 2, +5)
2) update(2, 2, 3, 3, +3)
Operation 1: add 5 to the top-left 2 × 2
diff[1][1] += 5;
diff[1][3] -= 5;
diff[3][1] -= 5;
diff[3][3] += 5;
Operation 2: add 3 to the bottom-right 2 × 2
diff[2][2] += 3;
diff[2][4] -= 3;
diff[4][2] -= 3;
diff[4][4] += 3;
After all markers, diff is roughly:
c=1 c=2 c=3 c=4
r=1: 5 0 -5 0
r=2: 0 3 0 -3
r=3: -5 0 5 0
r=4: 0 -3 0 3
Now take a 2D prefix sum:
diff[r][c] += diff[r-1][c] + diff[r][c-1] - diff[r-1][c-1];
The final grid is:
c=1 c=2 c=3
r=1: 5 5 0
r=2: 5 8 3
r=3: 0 3 3
The cell (2,2) equals 8 because it is covered by both the +5 and +3 rectangles.
Complete C++ implementation
📄 Complete C++ code
// 2D difference array — each rectangle update O(1), final rebuild O(R × C)
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int R, C, M;
cin >> R >> C >> M;
// Allocate an extra sentinel border: when r2=R or c2=C, we access r2+1 or c2+1
vector<vector<long long>> diff(R + 2, vector<long long>(C + 2, 0));
// Step 1: mark each rectangle update in O(1)
while (M--) {
int r1, c1, r2, c2;
long long v;
cin >> r1 >> c1 >> r2 >> c2 >> v;
diff[r1][c1] += v;
diff[r1][c2 + 1] -= v;
diff[r2 + 1][c1] -= v;
diff[r2 + 1][c2 + 1] += v;
}
// Step 2: rebuild the final grid with a 2D prefix sum
for (int r = 1; r <= R; r++) {
for (int c = 1; c <= C; c++) {
diff[r][c] += diff[r - 1][c]
+ diff[r][c - 1]
- diff[r - 1][c - 1];
}
}
// Step 3: output the final grid
for (int r = 1; r <= R; r++) {
for (int c = 1; c <= C; c++) {
cout << diff[r][c];
if (c < C) cout << " ";
}
cout << "\n";
}
return 0;
}
When should you use a 2D difference array?
2D difference arrays are suitable for:
Many rectangular region additions, then finally obtain the whole grid
Common problem types include:
- Paint many rectangular regions, then ask for each cell's color or coverage count
- Stack multiple rectangular billboards and compute the final brightness
- Increase height, temperature, or weight over rectangular regions on a map
- Process many rectangle updates offline
Common mistakes with 2D difference arrays
- Wrong signs at the four corners: remember the pattern is
+ - - +. - Array too small: you must be able to access
r2 + 1andc2 + 1, so allocate at leastR + 2byC + 2. - Forgetting the final 2D prefix sum: the four-corner markers are only changes, not the final grid.
- Wrong rebuild formula: it should be
top + left - top-left, namely:
diff[r][c] += diff[r-1][c] + diff[r][c-1] - diff[r-1][c-1];
One-sentence memory trick:
For rectangle updates, look at the four corners: top-left add, top-right subtract, bottom-left subtract, bottom-right add; then take a 2D prefix sum.
3.2.8 USACO Example: Maximum Subarray Sum
Problem (a variation of Kadane's algorithm): Find the maximum sum of a contiguous subarray.
📄 Complete C++ code
// Kadane's algorithm — O(N) time, O(1) space
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int n;
cin >> n;
vector<int> A(n);
for (int &x : A) cin >> x;
// Kadane's algorithm: O(n)
long long maxSum = LLONG_MIN; // smallest long long
long long current = 0;
for (int i = 0; i < n; i++) {
current += A[i];
maxSum = max(maxSum, current);
if (current < 0) current = 0; // ← key line: restart when the sum becomes negative
}
cout << maxSum << "\n";
return 0;
}
💡 Key idea: Why reset
currentto 0 when it becomes negative? Because a negative prefix sum only hurts any future subarray. If the current running sum is -5, any future subarray starting fresh (sum 0) is better than continuing from -5.
Alternative using prefix sums: The maximum subarray sum is the maximum value of P[j] - P[i-1] over all pairs (i,j). For each j, this is maximized when P[i-1] is as small as possible. Track the running minimum prefix sum!
// Alternative: minimum-prefix trick — also O(N)
long long maxSum = LLONG_MIN, minPrefix = 0, prefix = 0;
for (int x : A) {
prefix += x;
maxSum = max(maxSum, prefix - minPrefix); // best sum ending here
minPrefix = min(minPrefix, prefix); // track the smallest prefix seen so far
// ⚠️ Note: minPrefix must be updated after maxSum.
// If minPrefix is updated first, it effectively allows an empty subarray (length 0),
// which makes all-negative arrays incorrectly return 0 instead of the largest negative value.
}
3.2.8 USACO Real Problem Practice: Turning Range Problems into Prefix Differences
In USACO, prefix sums and difference arrays are usually not presented as "please write a prefix sum." Instead, they are hidden behind signals like these:
| Problem signal | First-choice technique | Why |
|---|---|---|
Many queries for the count/sum of some kind of element in [L,R] | Prefix sums | The original array does not change, and there are many queries |
Many operations that add to [L,R], then a final query/output | Difference array | Many updates, but no interleaved queries |
| Many rectangle queries on a grid | 2D prefix sums | Rectangle sums can be answered with inclusion-exclusion |
| Many rectangle additions on a grid | 2D difference array | Four-corner markers, then final rebuild |
Real Problem 1: Breed Counting (USACO 2015 December Silver) — Multi-category Prefix Sums
Problem link: USACO 2015 December Silver P3: Breed Counting
Pattern: multiple prefix sum arrays
Difficulty level: Silver beginner
Problem interpretation
N cows stand in a row. Each cow's breed is one of 1, 2, or 3. Then there are Q queries. Each query gives an interval [L,R], and you must output how many cows of each of the three breeds appear in that interval.
Key conditions:
- The order of cows is fixed and never changes.
- There are many queries, so scanning
[L,R]each time is too slow. - There are only 3 breeds, so we can build one prefix sum for each breed.
Idea
For each breed b, build a prefix array:
prefix[b][i] = number of cows of breed b among the first i cows
Then the number of breed b cows in interval [L,R] is:
prefix[b][R] - prefix[b][L-1]
This is exactly the same as ordinary prefix sums, except that "summing values" becomes "counting whether a category appears."
Complete CPP code
✅ Complete code: Breed Counting
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
// USACO official file I/O can be enabled when needed:
// freopen("bcount.in", "r", stdin);
// freopen("bcount.out", "w", stdout);
int n, q;
cin >> n >> q;
vector<array<int, 4>> prefix(n + 1); // only indices 1..3 are used
for (int i = 1; i <= n; i++) {
int breed;
cin >> breed;
prefix[i] = prefix[i - 1]; // inherit counts from the first i-1 cows
prefix[i][breed]++; // add 1 to the current breed
}
while (q--) {
int l, r;
cin >> l >> r;
for (int b = 1; b <= 3; b++) {
int count = prefix[r][b] - prefix[l - 1][b];
cout << count << " \n"[b == 3];
}
}
return 0;
}
Complexity: preprocessing O(N × 3), each query O(3), total O(N+Q), space O(3N).
Common pitfalls
- Forgetting
prefix[i] = prefix[i-1]. If you only increment the current breed, all previous counts are lost. - Out-of-bounds at
L-1. With 1-indexed prefix sums,prefix[0]is the sentinel, so queries withL=1are safe. - Wrong output format. Each query outputs 3 numbers followed by a newline.
- Confusing file names. The official file I/O names are
bcount.in/out, notbreed.in/out.
Extension
If the number of breeds increases from 3 to 10^5, but each query asks about only one breed, you can store the positions of each breed in a map<breed, vector<int>>, then use binary search to count occurrences in the interval. When choosing a data structure, always look at "number of categories × query pattern."
Real Problem 2: Haybale Stacking (USACO 2012 January Bronze) — Difference Array for Range Addition
Problem link: USACO 2012 January Bronze P2: Haybale Stacking
Pattern: difference array + final rebuild
Difficulty level: Bronze/Silver foundation
Problem interpretation
There are N hay stacks, all initially height 0. Then K instructions follow. Each instruction adds 1 to every hay stack in interval [A,B]. After all operations, output the median height.
Key conditions:
Ncan be up to10^6, andKcan be up to25000.- Directly looping over
[A,B]for each instruction can reachO(NK), which is too slow. - The problem only asks for the result after all operations; there are no intermediate queries.
Idea
A difference array turns one range addition into two endpoint markers:
diff[A] += 1
diff[B+1] -= 1
Finally, scan from left to right and take prefix sums to recover the final height of every position. Then sort the heights and take the median.
Complete CPP code
✅ Complete code: Haybale Stacking
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
// USACO official file I/O can be enabled when needed:
// freopen("stacking.in", "r", stdin);
// freopen("stacking.out", "w", stdout);
int n, k;
cin >> n >> k;
vector<int> diff(n + 2, 0); // must write to b+1, which may be n+1
for (int i = 0; i < k; i++) {
int a, b;
cin >> a >> b;
diff[a] += 1;
diff[b + 1] -= 1;
}
vector<int> height(n + 1);
int current = 0;
for (int i = 1; i <= n; i++) {
current += diff[i];
height[i] = current;
}
sort(height.begin() + 1, height.end());
cout << height[(n + 1) / 2] << "\n"; // N is odd, so the median index is fixed
return 0;
}
Complexity: range updates O(K), rebuild O(N), sorting O(N log N), space O(N).
Common pitfalls
diffis too small. It must have sizen+2, because whenB=n, we writediff[n+1].- Forgetting to rebuild. The difference array is not the final height array; you must take prefix sums first.
- Wrong median index. In this problem
Nis odd. With a 1-indexed height array, the median is at(N+1)/2. - Overusing segment trees. There are no intermediate queries, so a difference array is simpler and faster.
Extension
If the problem becomes "after each range addition, immediately query some position or interval," a difference array is no longer enough; you need a Fenwick tree or segment tree. This is exactly what Chapters 5.8 and 5.7 will solve.
⚠️ Common Mistakes in Chapter 3.2
- Off-by-one in range queries: writing
P[R] - P[L]instead ofP[R] - P[L-1]. Always test with a small example. - Overflow: prefix sums of large values can exceed the range of
int(2×10^9). Even if the elements areint, the prefix array should uselong long. - 2D query formula: forgetting the
+P[r1-1][c1-1]term in a 2D query is very easy. - Difference array size: declaring
diff[n+1]when you needdiff[n+2](because you write to indexr+1, which can ben+1). - 1-indexed vs 0-indexed: with 0-indexed prefix sums, the query formula becomes
P[R+1] - P[L]. Pick one convention inside a problem and stick to it. - 2D difference array size: declaring
diff[R+1][C+1]when you needdiff[R+2][C+2]— the four-corner update writes to(r2+1, c2+1), so it must be in bounds. - 2D difference rebuild order: the 2D prefix-sum rebuild must process cells from left to right and top to bottom (the same order as building a 2D prefix sum). A mixed order produces wrong results.
Chapter Summary
📌 Key takeaways
| Technique | Build time | Query time | Space | Use case |
|---|---|---|---|---|
| 1D prefix sum | O(N) | O(1) | O(N) | Range sums on a 1D array |
| 2D prefix sum | O(RC) | O(1) | O(RC) | Rectangle sums on a 2D grid |
| Difference array | O(N+M) | O(1)* | O(N) | Range addition updates |
| 2D difference array | O(RC+M) | O(1)* | O(RC) | Rectangle additions on a 2D grid |
| Kadane's algorithm | O(N) | — | O(1) | Maximum subarray sum |
*Values can be read only after an O(N) rebuild pass.
🧩 Core formula quick reference
| Operation | Formula | Notes |
|---|---|---|
| 1D range sum | P[R] - P[L-1] | P[0] = 0 is the sentinel value |
| 2D rectangle sum | P[r2][c2] - P[r1-1][c2] - P[r2][c1-1] + P[r1-1][c1-1] | Inclusion-exclusion: subtract twice, add once |
| Difference array update | diff[L] += V; diff[R+1] -= V; | Array size should be N+2 |
| 2D difference update | diff[r1][c1]+=V; diff[r1][c2+1]-=V; diff[r2+1][c1]-=V; diff[r2+1][c2+1]+=V | Four-corner markers |
| Restore from difference | Take prefix sums of diff (1D or 2D) | The result is the final array |
❓ FAQ
Q1: What is the relationship between prefix sums and difference arrays?
A: They are inverse operations. Taking prefix sums accumulates values; taking differences records adjacent changes. After range updates are stored as +V and -V markers, taking prefix sums spreads those markers into the final values, and +V/-V cancel outside
[L,R].
Q2: When should I use prefix sums vs. difference arrays?
A: Rule of thumb — look at the operation type:
- Many range-sum queries → prefix sums (preprocess
O(N), queryO(1))- Many range addition/subtraction operations → difference arrays (update
O(1), restoreO(N)at the end)- If updates and queries are interleaved, use a more advanced data structure (such as the segment tree in Chapter 5.7)
Q3: Can prefix sums handle dynamic modifications? (array elements change)
A: No. Prefix sums are one-time preprocessing, and the array cannot change afterward. If elements are modified, use a Fenwick tree (BIT) or segment tree, which support point updates and range queries in
O(log N).
Q4: Why are there two versions of Kadane's algorithm (current=0 vs minPrefix)?
A: They are essentially the same and both run in
O(N). The first version (classic Kadane) is more intuitive: restart when the current subarray sum becomes negative. The second version (minimum prefix) uses prefix-sum thinking: max subarray = max(P[j] - P[i]) = max(P[j]) - min(P[i]). Choose whichever you prefer.
Q5: What are the space limits for 2D prefix sums?
A: If R and C are both 10^4, the P array needs 10^8
long longvalues (about 800MB), which exceeds many memory limits. In general, R×C ≤ 10^6~10^7 is safe. For larger grids, consider compression or offline processing.
🔗 Connections to later chapters
- Chapter 3.4 (Two Pointers): sliding windows can also handle range-like queries, but only for fixed-size or monotonically moving windows; prefix sums are more general
- Chapter 3.3 (Sorting and Searching): binary search can be combined with prefix sums — for example, binary search the prefix sum array for the first position ≥ a target
- Chapter 5.7 (Segment Trees): solves the "dynamic update + range query" problems that prefix sums cannot handle
- Chapters 6.1–6.3 (Dynamic Programming): many state transitions involve range sums; prefix sums are an important tool for optimizing DP
- The difference array idea ("+V at the start, -V after the end") appears repeatedly in sweep line algorithms, event sorting, and other advanced techniques
Practice Problems
Problem 3.2.1 — Range Sum 🟢 Easy
Read N integers and Q queries. Each query gives L and R. Print the sum of elements from index L to R (1-indexed).
Hint
Build a prefix sum array P where P[i] = A[1]+...+A[i], then answer each query with P[R] - P[L-1].✅ Complete solution
Core idea: Precompute prefix sums in O(N), then answer each query in O(1) using P[R] - P[L-1].
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false); cin.tie(NULL);
int n, q; cin >> n >> q;
vector<long long> P(n + 1, 0);
for (int i = 1; i <= n; i++) {
int x; cin >> x;
P[i] = P[i-1] + x;
}
while (q--) {
int l, r; cin >> l >> r;
cout << P[r] - P[l-1] << "\n";
}
}
Complexity: O(N + Q) — far better than naive O(N × Q).
Problem 3.2.2 — Range Addition, Point Values 🟢 Easy
Start with N zeros. Process M operations: each operation adds V to every position from L to R. After all operations, print the value at every position.
Hint
For each update, do `diff[L] += V` and `diff[R+1] -= V`, then take prefix sums of diff.✅ Complete solution
Core idea: Difference array. Each range addition affects only 2 positions in diff; final values are obtained by prefix sums.
#include <bits/stdc++.h>
using namespace std;
int main() {
int n, m; cin >> n >> m;
vector<long long> diff(n + 2, 0);
while (m--) {
int l, r; long long v; cin >> l >> r >> v;
diff[l] += v;
diff[r+1] -= v;
}
long long cur = 0;
for (int i = 1; i <= n; i++) {
cur += diff[i];
cout << cur << " \n"[i == n];
}
}
Complexity: O(N + M) — each update is O(1), and the final scan is O(N).
Problem 3.2.3 — Rectangular Sum 🟡 Medium
Read an N×M grid and Q queries. Each query gives (r1,c1,r2,c2). Print the sum of the subgrid.
Hint
Use 2D prefix sums. Query = P[r2][c2] - P[r1-1][c2] - P[r2][c1-1] + P[r1-1][c1-1].✅ Complete solution
Core idea: 2D prefix sums. P[i][j] is the sum from (1,1) to (i,j), and any rectangle query is answered by inclusion-exclusion.
#include <bits/stdc++.h>
using namespace std;
int main() {
ios_base::sync_with_stdio(false); cin.tie(NULL);
int n, m, q; cin >> n >> m >> q;
vector<vector<long long>> P(n+1, vector<long long>(m+1, 0));
for (int i = 1; i <= n; i++)
for (int j = 1; j <= m; j++) {
int x; cin >> x;
P[i][j] = x + P[i-1][j] + P[i][j-1] - P[i-1][j-1];
}
while (q--) {
int r1, c1, r2, c2; cin >> r1 >> c1 >> r2 >> c2;
cout << P[r2][c2] - P[r1-1][c2] - P[r2][c1-1] + P[r1-1][c1-1] << "\n";
}
}
Complexity: O(N × M + Q).
Problem 3.2.4 — USACO 2016 January Bronze: Mowing the Field 🔴 Hard
FJ mows grass along a path. Cells visited more than once form a "double-mowed" area. Count how many cells are visited at least twice.
Hint
Simulate the path, mark visit counts in a 2D structure, then count cells with value ≥ 2.✅ Complete solution
Core idea: Direct simulation — no advanced data structure is needed. Walk along the path and increment the visit counter for every cell reached.
#include <bits/stdc++.h>
using namespace std;
int main() {
int n; cin >> n;
map<pair<int,int>, int> cnt;
int x = 0, y = 0; cnt[{x,y}]++;
while (n--) {
char dir; int steps; cin >> dir >> steps;
int dx = (dir=='E') - (dir=='W');
int dy = (dir=='N') - (dir=='S');
while (steps--) {
x += dx; y += dy;
cnt[{x,y}]++;
}
}
int doubleMowed = 0;
for (auto& [pos, c] : cnt) if (c >= 2) doubleMowed++;
cout << doubleMowed << "\n";
}
Complexity: O(total steps × log), dominated by map operations.
Problem 3.2.5 — 2D Range Addition 🟡 Medium
Given an N×M grid initially filled with zeros, process Q operations. Each operation adds V to the rectangle from [r1,c1] to [r2,c2]. Output the final grid.
Hint
Use a 2D difference array: mark 4 corners for each update, then rebuild with a 2D prefix sum.✅ Complete solution
Core idea: 2D difference array. Each rectangle update touches only 4 corners. The final grid is the 2D prefix sum of the difference array.
#include <bits/stdc++.h>
using namespace std;
int main() {
int n, m, q; cin >> n >> m >> q;
vector<vector<long long>> D(n+2, vector<long long>(m+2, 0));
while (q--) {
int r1, c1, r2, c2; long long v;
cin >> r1 >> c1 >> r2 >> c2 >> v;
D[r1][c1] += v;
D[r1][c2+1] -= v;
D[r2+1][c1] -= v;
D[r2+1][c2+1] += v;
}
for (int i = 1; i <= n; i++)
for (int j = 1; j <= m; j++)
D[i][j] += D[i-1][j] + D[i][j-1] - D[i-1][j-1];
for (int i = 1; i <= n; i++)
for (int j = 1; j <= m; j++)
cout << D[i][j] << " \n"[j == m];
}
Complexity: O(Q + N × M).
Problem 3.2.6 — Maximum Subarray (Kadane's Algorithm) 🟡 Medium
Read N integers (possibly negative). Find the maximum sum of a contiguous subarray.
Hint
Use Kadane's algorithm. If all numbers are negative, the answer is the largest single element.✅ Complete solution
Core idea: At each position, either start a new subarray or extend the current one. cur = max(A[i], cur + A[i]), and track the best value.
#include <bits/stdc++.h>
using namespace std;
int main() {
int n; cin >> n;
long long best = LLONG_MIN, cur = 0;
for (int i = 0; i < n; i++) {
long long x; cin >> x;
cur = max(x, cur + x);
best = max(best, cur);
}
cout << best << "\n";
}
Why start over when cur+x < x? Because a negative running sum only hurts future terms — discard it and restart from the current element.
Trace for [-2, 1, -3, 4, -1, 2, 1, -5, 4]:
x=-2: cur=-2, best=-2
x=1: cur=max(1,-2+1)=1, best=1
x=-3: cur=max(-3,1-3)=-2, best=1
x=4: cur=max(4,-2+4)=4, best=4
x=-1: cur=max(-1,4-1)=3, best=4
x=2: cur=5, best=5
x=1: cur=6, best=6 ✓
x=-5: cur=1, best=6
x=4: cur=5, best=6
Complexity: O(N) time, O(1) space.
Difference Array Practice
The following problems specifically train difference arrays. First decide which category the problem belongs to:
- 1D range addition, final array output: use a 1D difference array.
- 1D range coverage counts, final answer count: use a 1D difference array + scan.
- 2D rectangle addition or coverage counts: use a 2D difference array.
- Intermediate queries exist: ordinary difference arrays are usually not enough; use a Fenwick tree or segment tree.
Problem 3.2.7 — Maximum Value After Range Additions 🟢 Easy
You have an array of length N, initially all 0. Then there are M operations. Each operation adds V to all positions in interval [L,R]. After all operations, output the maximum value in the array.
Hint
You do not need to store every final value. Use a difference array to record all range additions, then scan left to right with a running prefix sum while maintaining the maximum.✅ Complete solution
Core idea: Each range addition changes only two positions: diff[L] += V, diff[R+1] -= V. During the final scan, the current prefix sum is the final value at the current position.
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, m;
cin >> n >> m;
vector<long long> diff(n + 2, 0);
while (m--) {
int l, r;
long long v;
cin >> l >> r >> v;
diff[l] += v;
diff[r + 1] -= v;
}
long long cur = 0;
long long best = LLONG_MIN;
for (int i = 1; i <= n; i++) {
cur += diff[i];
best = max(best, cur);
}
cout << best << "\n";
return 0;
}
Complexity: O(N + M) time, O(N) space.
Problem 3.2.8 — Count Positions Covered at Least K Times 🟡 Medium
There are N positions on a number line, numbered 1..N. You are given M intervals [L,R], and each interval covers those positions once. Count how many positions are covered at least K times.
Hint
Each interval cover is equivalent to adding `1` to a range. Use a difference array to compute the coverage count of each position, then count positions with value `>= K`.✅ Complete solution
Core idea: This problem does not ask you to output the final array; it asks you to count positions satisfying a condition. Rebuild the difference array and count on the fly.
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, m, k;
cin >> n >> m >> k;
vector<int> diff(n + 2, 0);
while (m--) {
int l, r;
cin >> l >> r;
diff[l] += 1;
diff[r + 1] -= 1;
}
int cur = 0;
int answer = 0;
for (int i = 1; i <= n; i++) {
cur += diff[i];
if (cur >= k) answer++;
}
cout << answer << "\n";
return 0;
}
Complexity: O(N + M) time, O(N) space.
Problem 3.2.9 — Range Additions on an Existing Array 🟡 Medium
Given an original array A of length N. Then there are M operations, each adding V to all elements in interval [L,R]. After all operations, output the final array.
Hint
The initial array is not all zeros. You can use a difference array only to record the "extra additions," then add those increments back to the original array; or you can first convert the original array into a difference array and modify it directly.✅ Complete solution
Core idea: The simplest implementation is to let diff record only the increments caused by operations. During the final scan, diff tells us how much has been added to the current position, and we add it to A[i].
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, m;
cin >> n >> m;
vector<long long> a(n + 1);
for (int i = 1; i <= n; i++) {
cin >> a[i];
}
vector<long long> diff(n + 2, 0);
while (m--) {
int l, r;
long long v;
cin >> l >> r >> v;
diff[l] += v;
diff[r + 1] -= v;
}
long long add = 0;
for (int i = 1; i <= n; i++) {
add += diff[i];
cout << a[i] + add << " \n"[i == n];
}
return 0;
}
Complexity: O(N + M) time, O(N) space.
Problem 3.2.10 — Rectangle Coverage Counts 🟡 Medium
You have an R × C grid, initially with coverage count 0 in every cell. You are given M rectangles, each represented by its top-left corner (r1,c1) and bottom-right corner (r2,c2). Each rectangle increases the coverage count of every cell inside it by 1. Count how many cells are covered at least K times.
Hint
This is a 2D difference array problem. For each rectangle update, mark four corners: top-left `+1`, right outside top edge `-1`, below left edge `-1`, and lower-right outside `+1`. Then take a 2D prefix sum and count the answer.✅ Complete solution
Core idea: The four-corner marker pattern for 2D difference arrays is + - - +. During reconstruction, use the 2D prefix formula: D[r][c] += D[r-1][c] + D[r][c-1] - D[r-1][c-1].
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int R, C, M, K;
cin >> R >> C >> M >> K;
vector<vector<int>> diff(R + 2, vector<int>(C + 2, 0));
while (M--) {
int r1, c1, r2, c2;
cin >> r1 >> c1 >> r2 >> c2;
diff[r1][c1] += 1;
diff[r1][c2 + 1] -= 1;
diff[r2 + 1][c1] -= 1;
diff[r2 + 1][c2 + 1] += 1;
}
int answer = 0;
for (int r = 1; r <= R; r++) {
for (int c = 1; c <= C; c++) {
diff[r][c] += diff[r - 1][c]
+ diff[r][c - 1]
- diff[r - 1][c - 1];
if (diff[r][c] >= K) answer++;
}
}
cout << answer << "\n";
return 0;
}
Complexity: O(R × C + M) time, O(R × C) space.
Problem 3.2.11 — Decide Whether You Need a Difference Array 🟡 Medium
For each of the following four problem types, what data structure or algorithm should you consider first?
- The array does not change, and there are many queries asking for the sum of
[L,R]. - There are many operations that add a value to
[L,R], and after all operations we output the final array. - There are many operations that add a value to
[L,R], and after each operation we immediately query the value at some position. - There are many operations that add a value to a rectangular region, and after all operations we report the final value of every grid cell.
Hint
Distinguish the key properties: query or modification? One-dimensional or two-dimensional? Are there intermediate queries?✅ Complete solution
- Prefix sums. The array is fixed and there are many range-sum queries; after preprocessing, each query is
O(1). - 1D difference array. There are many range additions, and the result is restored once at the end.
- Fenwick tree or segment tree. Updates and queries are interleaved, so an ordinary difference array cannot directly handle intermediate queries.
- 2D difference array. Batch rectangle additions, then rebuild the grid once at the end.
The point of this problem is not writing code, but learning to identify the correct tool from the problem statement.
🏆 Challenge Problem: Cows and Paint Buckets
An N×M grid contains paint buckets, each with a positive value. Choose any rectangular subgrid. Score = (maximum value inside the subgrid) - (sum of boundary cells). Find the best rectangle. (N, M ≤ 500)
✅ Solution idea
Naively enumerating all rectangles costs O(N²M²), which is too slow if each rectangle also requires scanning. A better plan:
- Use 2D prefix sums for O(1) sum queries.
- For maximum value inside a subgrid, preprocess a 2D sparse table (or per-row RMQ) to support O(1) maximum queries.
- Boundary sum = total sum - inner sum (both computed with prefix sums).
Total: O(N²M²) rectangle enumeration × O(1) per query, which can fit for N,M ≤ 500.