📖 Chapter 5.8 ⏱️ ~60 min 🎯 Advanced

Chapter 5.8: Fenwick Tree (BIT)

📝 Prerequisites: Understanding of prefix sums (Chapter 3.2) and bitwise operations. This chapter complements Segment Trees (Chapter 5.7): BIT has shorter code and smaller constants, but supports fewer operations.

A Fenwick Tree, also known as a Binary Indexed Tree / BIT, is one of the most commonly used data structures in competitive programming. With fewer than 15 lines of core code, it supports point updates and prefix sum queries in O(log N) time.

Learning path for this chapter:
1. Start from a simple question: why does a normal prefix sum array become slow when updates happen?
2. Learn lowbit: it tells us how many original elements each small box should manage.
3. Understand exactly which interval tree[i] stores.
4. Then study query and update separately: first calculate by hand, then read the code.
5. Finally, learn advanced uses such as difference BIT, inversion counting, and value-domain BIT.

🧒 Kid-friendly analogy: Imagine a row of students, and each student has some candies. The teacher often asks, “How many candies do the first 7 students have in total?” The teacher also often says, “Student 3 gets 5 more candies.” If we count students 1 to 7 every time, it is slow. If one student changes and we rebuild all prefix sums every time, that is also slow. A Fenwick Tree groups students into many small groups of different sizes, and each group leader records the total candies in that group. Then both asking and changing become much faster.

5.8.1 Start from the Problem: Why Do We Need a Fenwick Tree?

Before learning Fenwick Tree, let us review two very simple methods.

Method 1: Direct Array

Suppose the array is:

A = [3, 1, 4, 1, 5, 9, 2, 6]

If we want the sum of A[1..7], we add from the beginning:

3 + 1 + 4 + 1 + 5 + 9 + 2 = 25

Advantage: Updating is fast. For example, A[3] += 10 directly changes one number.
Disadvantage: Querying is slow. One prefix sum query may need to add many numbers.

Method 2: Prefix Sum Array

A prefix sum array stores “the total from the beginning to the current position” in advance:

prefix[i] = A[1] + A[2] + ... + A[i]

Then querying A[1..7] is fast: just look at prefix[7].

But what happens if A[3] changes?

prefix[3], prefix[4], prefix[5], ..., prefix[n]

Many later prefix sums need to change too.

So the conflict is:

Operation	Direct Array	Prefix Sum Array
Update one position	Fast, `O(1)`	Slow, may need to change many values
Query prefix sum	Slow, may need to add many values	Fast, `O(1)`

Fenwick Tree solves exactly this conflict:

Can we make updates not too slow and queries not too slow either?

Yes. BIT makes both operations O(log N).

Direct array: fast update, slow query
Prefix sum: fast query, slow update
Fenwick Tree: both update and query are fairly fast

5.8.2 Core Building Block: What Is `lowbit`?

The most important building block of a Fenwick Tree is only one thing: lowbit.

int lowbit(int x) {
    return x & (-x);
}

Do not rush to memorize the formula. First understand what question it answers:

Looking from right to left, what value does the first 1 in x represent?

For example, x = 6:

Binary of 6 = 0110
Looking from right to left, the first 1 is at bit 1
This bit represents 2
So lowbit(6) = 2

More examples:

lowbit(1) = lowbit(0001) = 1
lowbit(2) = lowbit(0010) = 2
lowbit(3) = lowbit(0011) = 1
lowbit(4) = lowbit(0100) = 4
lowbit(6) = lowbit(0110) = 2
lowbit(8) = lowbit(1000) = 8

Bitwise Principle of `lowbit`

For any positive integer x, lowbit(x) = x & (-x) returns the value represented by the lowest set bit in the binary representation of x.

Take x = 6 as an example:

x  =  6  →  binary: 0110
-x = -6  →  two's complement: 1010 (bitwise NOT + 1)
x & (-x) = 0010 = 2   ← the lowest set bit corresponds to 2^1 = 2

If you are not familiar with two's complement yet, that is okay. For now, treat lowbit(x) as a small tool:

It helps us find the value represented by the rightmost 1 in x.

Examples:

x	Binary	`lowbit(x)`	Beginner-friendly meaning
1	0001	1	The rightmost 1 represents 1
2	0010	2	The rightmost 1 represents 2
3	0011	1	The rightmost 1 represents 1
4	0100	4	The rightmost 1 represents 4
6	0110	2	The rightmost 1 represents 2
8	1000	8	The rightmost 1 represents 8

Why Is `lowbit` Important for BIT?

In a Fenwick Tree, lowbit(i) is not just a bitwise result. It means:

How many original array elements the small box tree[i] should manage.

For example:

lowbit(6) = 2  →  tree[6] manages 2 numbers
lowbit(8) = 8  →  tree[8] manages 8 numbers
lowbit(3) = 1  →  tree[3] manages 1 number

So remember this sentence first:

lowbit(i) decides the size of the box tree[i].

5.8.3 What Exactly Does `tree[i]` Store?

The elegance of BIT is this: tree[i] does not store a single element. It stores the sum of an interval.

More precisely:

tree[i] = A[i - lowbit(i) + 1] + ... + A[i]

That means tree[i] manages:

An interval ending at i, with length lowbit(i)

If we imagine tree[i] as a small box:

Right endpoint of the box: always i.
Length of the box: decided by lowbit(i).
What the box stores: the sum of that segment of A.

Full Box Assignment for n = 8

Suppose the original array is indexed from 1 to 8. Then each box manages this range:

Index i:  1    2    3    4    5    6    7    8
Range managed by tree[i]:
  tree[1] = A[1]            (length lowbit(1)=1)
  tree[2] = A[1]+A[2]       (length lowbit(2)=2)
  tree[3] = A[3]            (length lowbit(3)=1)
  tree[4] = A[1]+...+A[4]   (length lowbit(4)=4)
  tree[5] = A[5]            (length lowbit(5)=1)
  tree[6] = A[5]+A[6]       (length lowbit(6)=2)
  tree[7] = A[7]            (length lowbit(7)=1)
  tree[8] = A[1]+...+A[8]   (length lowbit(8)=8)

BIT structure (n=8): each tree[i] covers exactly lowbit(i) elements ending at index i.

BIT Tree Structure

Why Is This Grouping Useful?

Because these boxes have different sizes: some manage 1 number, some manage 2, some manage 4, and some manage 8.

It is like a class with:

1-person groups, 2-person groups, 4-person groups, 8-person groups

When the teacher asks “How many candies do the first 7 students have?”, we do not need to ask one by one. We can ask a few group leaders directly:

First 7 = [1..4] + [5..6] + [7]

This corresponds exactly to:

tree[4] + tree[6] + tree[7]

5.8.4 Prefix Query: How Do We Quickly Compute `A[1..i]`?

The BIT query function is usually called query(i), and it means:

A[1] + A[2] + ... + A[i]

The core operation is:

i -= lowbit(i);

This means:

I have already taken the interval ending at i, so next I jump to the position before that interval and continue taking more boxes.

Manually Compute `query(7)`

We want to compute:

A[1] + A[2] + A[3] + A[4] + A[5] + A[6] + A[7]

Start from i = 7:

Current i	`lowbit(i)`	Add which box	This box manages	Next step
7	1	`tree[7]`	`A[7]`	`7 - 1 = 6`
6	2	`tree[6]`	`A[5] + A[6]`	`6 - 2 = 4`
4	4	`tree[4]`	`A[1] + A[2] + A[3] + A[4]`	`4 - 4 = 0`

So:

query(7) = tree[7] + tree[6] + tree[4]
         = A[7] + (A[5]+A[6]) + (A[1]+A[2]+A[3]+A[4])
         = A[1] + A[2] + ... + A[7]

These three intervals are exactly non-overlapping and cover everything:

[1..4] + [5..6] + [7]

Jump path for querying prefix(7) (i -= lowbit(i) jumping down):

Fenwick Query Path

Code:

long long query(int i) {
    long long sum = 0;
    while (i > 0) {
        sum += tree[i];
        i -= lowbit(i);
    }
    return sum;
}

💡 Memory tip: Query means “jump backward”. Each jump takes a whole box of precomputed sum.

5.8.5 Point Update: Why Do We Keep Jumping Upward?

If A[3] increases by val, which tree[i] values are affected?

Answer: every box whose managed range contains A[3] must also increase by val.

From the box assignment above:

tree[3] = A[3]
tree[4] = A[1]+A[2]+A[3]+A[4]
tree[8] = A[1]+...+A[8]

So when updating A[3], we must update:

tree[3], tree[4], tree[8]

The core operation is:

i += lowbit(i);

This means:

After updating the current box, jump to the next larger box that also contains this position.

Manually Compute `update(3, val)`

Current i	`lowbit(i)`	Update which box	Next step
3	1	`tree[3] += val`	`3 + 1 = 4`
4	4	`tree[4] += val`	`4 + 4 = 8`
8	8	`tree[8] += val`	`8 + 8 = 16`, beyond n, stop

Jump path for updating position 3 (i += lowbit(i) jumping up):

Fenwick Update Path

Code:

void update(int i, long long val) {
    while (i <= n) {
        tree[i] += val;
        i += lowbit(i);
    }
}

💡 Memory tip: Update means “jump forward”. Each jump tells one affected group leader: “Your total has changed!”

Query vs Update Directions

Operation	Jump Direction	Jump Code	Intuition
Query `query(i)`	Backward	`i -= lowbit(i)`	Take the current box, then continue taking previous boxes
Update `update(i,val)`	Forward	`i += lowbit(i)`	Notify all larger boxes that contain this position

Each jump changes the binary representation significantly, so there are at most about log N jumps.

5.8.6 Combine Query and Update into Complete Code

We have already calculated query and update by hand. Now we combine them into the classic Fenwick Tree template.

First remember three key points:

The array is indexed from 1: because lowbit(0)=0, index 0 would get stuck.
Query jumps backward: i -= lowbit(i).
Update jumps forward: i += lowbit(i).

📄 View Code: 5.8.6 Combine Query and Update into Complete Code

// ══════════════════════════════════════════════════════════════
// Fenwick Tree (BIT) — Classic Implementation
// Supports: point update O(log N), prefix sum query O(log N)
// Array MUST use 1-indexed indexing (critical!)
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 300005;

int n;
long long tree[MAXN];  // BIT array, 1-indexed

// ── lowbit: return the value of the lowest set bit ──
inline int lowbit(int x) {
    return x & (-x);
}

// ── Update: add val to position i ──
// Traverse upward: i += lowbit(i)
// Every ancestor node covering position i is updated
void update(int i, long long val) {
    for (; i <= n; i += lowbit(i))
        tree[i] += val;
    // Time: O(log N) — at most log2(N) iterations
}

// ── Query: return prefix sum A[1..i] ──
// Traverse downward: i -= lowbit(i)
// Decompose [1..i] into O(log N) non-overlapping intervals
long long query(int i) {
    long long sum = 0;
    for (; i > 0; i -= lowbit(i))
        sum += tree[i];
    return sum;
    // Time: O(log N) — at most log2(N) iterations
}

// ── Build: initialize BIT from an existing array A[1..n] ──
// Method 1: N separate updates — O(N log N)
void build_slow(long long A[]) {
    fill(tree + 1, tree + n + 1, 0LL);
    for (int i = 1; i <= n; i++)
        update(i, A[i]);
}

// Method 2: O(N) build (using the direct parent relationship)
void build_fast(long long A[]) {
    for (int i = 1; i <= n; i++) {
        tree[i] += A[i];
        int parent = i + lowbit(i);  // direct parent in BIT
        if (parent <= n)
            tree[parent] += tree[i];
    }
}

// Method 3: O(N) build (using prefix sums)
// Principle: tree[i] = sum(A[i-lowbit(i)+1 .. i])
//          = prefix[i] - prefix[i - lowbit(i)]
void build_prefix(long long A[], long long prefix[]) {
    // Compute prefix sums first
    for (int i = 1; i <= n; i++) prefix[i] = prefix[i-1] + A[i];
    // Directly compute each node using prefix sums
    for (int i = 1; i <= n; i++)
        tree[i] = prefix[i] - prefix[i - lowbit(i)];
}

// ── Complete Example ──
int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int q;
    cin >> n >> q;

    long long A[MAXN] = {};
    for (int i = 1; i <= n; i++) cin >> A[i];
    build_fast(A);  // O(N) initialization

    while (q--) {
        int type;
        cin >> type;
        if (type == 1) {
            // Point update: A[i] += val
            int i; long long val;
            cin >> i >> val;
            update(i, val);
        } else {
            // Prefix query: sum of A[1..r]
            int r;
            cin >> r;
            cout << query(r) << "\n";
        }
    }
    return 0;
}

5.8.7 Range Query: Subtract Two Prefix Sums

Once we know prefix queries, range queries become very natural.

If we want to compute A[l..r], we can first compute A[1..r], then subtract the part we do not need, A[1..l-1]:

A[l..r] = A[1..r] - A[1..l-1]

This is exactly the same idea as prefix sums. The difference is that here query is done by BIT in O(log N) time.

📄 View Code: Range Query

// Range sum: sum of A[l..r]
// Time: O(log N) — two prefix queries
long long range_query(int l, int r) {
    return query(r) - query(l - 1);
    // query(r)   = A[1] + A[2] + ... + A[r]
    // query(l-1) = A[1] + A[2] + ... + A[l-1]
    // difference = A[l] + A[l+1] + ... + A[r]
}

// Example:
// A = [3, 1, 4, 1, 5, 9, 2, 6]  (1-indexed)
// range_query(3, 6) = query(6) - query(2)
//                  = (3+1+4+1+5+9) - (3+1)
//                  = 23 - 4 = 19
// Verify: A[3]+A[4]+A[5]+A[6] = 4+1+5+9 = 19 ✓

5.8.8 When Should We Use BIT? Prefix Sum vs BIT vs Segment Tree

At this point, you may ask: since we already have prefix sums and segment trees, why learn BIT?

In short:

Prefix sum: best for arrays without updates.
BIT: best for “point update + range sum”.
Segment tree: best for more complex range operations.

Operation	Prefix Sum Array	Fenwick Tree (BIT)	Segment Tree
Build	`O(N)`	`O(N)` or `O(N log N)`	`O(N)`
Prefix Query	`O(1)`	`O(log N)`	`O(log N)`
Range Query	`O(1)`	`O(log N)`	`O(log N)`
Point Update	`O(N)` rebuild	`O(log N)` ✓	`O(log N)` ✓
Range Update	`O(N)`	`O(log N)` (difference BIT)	`O(log N)` (lazy propagation)
Range Min/Max	`O(1)` (sparse table)	❌ Not supported	✓ Supported
Code Complexity	Very simple	Simple (10 lines)	Complex (50+ lines)
Constant Factor	Smallest	Very small	Larger
Space	`O(N)`	`O(N)`	`O(4N)`

When to choose BIT?

✅ Only need prefix/range sum + point update
✅ Need minimal code to reduce bugs in contests
✅ Inversion counting and frequency counting problems
❌ Need range minimum/maximum → use Segment Tree
❌ Need complex range operations such as range multiplication → use Segment Tree

5.8.9 Interactive Visualization: BIT Update Process

5.8.10 Range Update + Point Query (Difference BIT)

The standard BIT we learned above supports:

Point update + prefix query

Now let us go one step further. What if a problem often says “add val to a whole interval”?

Updating each position one by one is too slow. We can borrow the idea of a difference array and let BIT maintain that difference array. Then we can support:

Range update + point query

Principle

Let the difference array be D[i] = A[i] - A[i-1] (D[1] = A[1]).

You can think of a difference array as a “change notice sheet”:

D[l] += val: starting from l, every position increases by val.
D[r+1] -= val: starting from r+1, cancel the increase we just added.

So adding val to all elements in A[l..r] only needs two changes:

D[l] += val
D[r+1] -= val

And A[i] equals the prefix sum of the difference array:

A[i] = D[1] + D[2] + ... + D[i]

📄 Full C++ Code

// ══════════════════════════════════════════════════════════════
// Difference BIT: Range Update + Point Query
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 300005;
int n;
long long diff_bit[MAXN];  // BIT on the difference array D[]

inline int lowbit(int x) { return x & (-x); }

// Update position i in the difference BIT: D[i] += val
void diff_update(int i, long long val) {
    for (; i <= n; i += lowbit(i))
        diff_bit[i] += val;
}

// Query A[i] = D[1..i] sum = prefix query on the difference BIT
long long diff_query(int i) {
    long long s = 0;
    for (; i > 0; i -= lowbit(i))
        s += diff_bit[i];
    return s;
}

// Range update: add val to every element in A[l..r]
// Equivalent to: D[l] += val, D[r+1] -= val
void range_update(int l, int r, long long val) {
    diff_update(l, val);       // D[l] += val
    diff_update(r + 1, -val);  // D[r+1] -= val
}

// Point query: return the current value of A[i]
// A[i] = D[1] + D[2] + ... + D[i] = prefix_sum(D, i)
long long point_query(int i) {
    return diff_query(i);
}

Advanced: Range Update + Range Query (Double BIT)

To support both range updates and range queries, use two BITs:

📄 Full C++ Code: Double BIT

// ══════════════════════════════════════════════════════════════
// Double BIT: Range Update + Range Query
// Formula: sum(1..r) = B1[r] * r - B2[r]
// Where B1 is BIT on D[], B2 is BIT on (i-1)*D[i]
// ══════════════════════════════════════════════════════════════
long long B1[MAXN], B2[MAXN];

inline int lowbit(int x) { return x & (-x); }

void add(long long* b, int i, long long v) {
    for (; i <= n; i += lowbit(i)) b[i] += v;
}
long long sum(long long* b, int i) {
    long long s = 0;
    for (; i > 0; i -= lowbit(i)) s += b[i];
    return s;
}

// Range update: add val to A[l..r]
void range_add(int l, int r, long long val) {
    add(B1, l, val);
    add(B1, r + 1, -val);
    add(B2, l, val * (l - 1));     // compensation for prefix formula
    add(B2, r + 1, -val * r);
}

// Prefix sum A[1..r]
long long prefix_sum(int r) {
    return sum(B1, r) * r - sum(B2, r);
}

// Range sum A[l..r]
long long range_sum(int l, int r) {
    return prefix_sum(r) - prefix_sum(l - 1);
}

5.8.11 USACO-Style Problem: Counting Inversions with BIT

Problem Description

Count Inversions (O(N log N))

Given an integer array A of length N (distinct elements, range 1..N), count the number of inversions.

An inversion is a pair of indices (i, j) such that i < j but A[i] > A[j].

Sample Input:

5
3 1 4 2 5

Sample Output:

Explanation: The inversions are (3,1), (3,2), and (4,2), for a total of 3 pairs.

Understand Inversions with a Small Example First

Array:

[3, 1, 4, 2, 5]

Scan from left to right:

3 appears before 1, and 3 > 1, so (3,1) is an inversion.
3 appears before 2, and 3 > 2, so (3,2) is an inversion.
4 appears before 2, and 4 > 2, so (4,2) is an inversion.

There are 3 inversions in total.

Why Can BIT Help?

When processing the current number a, all numbers to its left have already appeared. We only need to know:

Among the numbers already seen on the left, how many are > a?

If BIT maintains “how many times each value has appeared”, then we can quickly get:

number of values already seen - number of seen values <= a

This is exactly what the code computes:

(i - 1) - query(a)

Solution: BIT Inversion Counting

📄 View Code: Solution: BIT Inversion Counting

// ══════════════════════════════════════════════════════════════
// Count Inversions with Fenwick Tree — O(N log N)
//
// Core idea:
//   Process A[i] from left to right.
//   For each A[i], the number of inversions with A[i] as the right endpoint
//   = number of already-processed values greater than A[i]
//   = (number of elements processed so far) - (number of processed elements <= A[i])
//   = i-1 - prefix_query(A[i])
//   Sum over all i to get the total number of inversions.
//
// BIT's role: track the frequency of seen values.
//   After seeing value v: update(v, +1)
//   Query count of values <= x: query(x)
// ══════════════════════════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const int MAXN = 300005;

int n;
int bit[MAXN];  // frequency count BIT

inline int lowbit(int x) { return x & (-x); }

// Add 1 at position v (we have seen value v)
void update(int v) {
    for (; v <= n; v += lowbit(v))
        bit[v]++;
}

// Count seen values in [1..v]
int query(int v) {
    int cnt = 0;
    for (; v > 0; v -= lowbit(v))
        cnt += bit[v];
    return cnt;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cin >> n;

    ll inversions = 0;

    for (int i = 1; i <= n; i++) {
        int a;
        cin >> a;

        // Count inversions with a as the right endpoint:
        // number of already-seen values greater than a
        // = (i-1 elements seen so far) - (count of seen values <= a)
        int less_or_equal = query(a);          // count of seen values in [1..a]
        int greater = (i - 1) - less_or_equal; // count of seen values in [a+1..n]
        inversions += greater;

        // Mark that we have seen value a
        update(a);
    }

    cout << inversions << "\n";
    return 0;
}

/*
Trace for A = [3, 1, 4, 2, 5]:

i=1, a=3: seen=[], query(3)=0, greater=0-0=0. inversions=0. update(3).
i=2, a=1: seen=[3], query(1)=0, greater=1-0=1. inversions=1. update(1).
           (3 > 1: 1 inversion: (3,1) ✓)
i=3, a=4: seen=[3,1], query(4)=2, greater=2-2=0. inversions=1. update(4).
           (no seen element > 4)
i=4, a=2: seen=[3,1,4], query(2)=1, greater=3-1=2. inversions=3. update(2).
           (3>2 and 4>2: 2 inversions: (3,2),(4,2) ✓)
i=5, a=5: seen=[3,1,4,2], query(5)=4, greater=4-4=0. inversions=3. update(5).

Final: 3 ✓
*/

Complexity Analysis:

Time: O(N log N) — N iterations, each with an O(log N) update and query
Space: O(N) for BIT

Extension: If array values are not in the range 1..N, apply coordinate compression first:

📄 Full C++ Code

// Coordinate compression for arbitrary values
vector<int> A(n);
for (int i = 0; i < n; i++) cin >> A[i];

// Step 1: sort and deduplicate
vector<int> sorted_A = A;
sort(sorted_A.begin(), sorted_A.end());
sorted_A.erase(unique(sorted_A.begin(), sorted_A.end()), sorted_A.end());

// Step 2: replace each value with its rank (1-indexed)
for (int i = 0; i < n; i++) {
    A[i] = lower_bound(sorted_A.begin(), sorted_A.end(), A[i]) - sorted_A.begin() + 1;
    // A[i] is now in [1..M], where M = sorted_A.size()
}
// Now use BIT with n = sorted_A.size()

5.8.12 Value-Domain Fenwick Tree: Global K-th Smallest

A value-domain BIT is another common use of BIT.

The BITs above usually treat the index as an array position:

position 1, position 2, position 3, ...

A value-domain BIT treats the index as the value itself:

value 1, value 2, value 3, ...

That is, bit[v] means how many times value v appears. This lets us efficiently query “the k-th smallest element in the sequence”.

For example, the current sequence is:

[1, 2, 2, 4, 7]

The frequencies are:

value 1 appears 1 time
value 2 appears 2 times
value 4 appears 1 time
value 7 appears 1 time

The 3rd smallest is 2, because after sorting:

1, 2, 2, 4, 7
      ↑
  3rd smallest

Naive Approach: Binary Search + Prefix Query, O(log² N)

📄 View Code: Naive Approach: Binary Search + Prefix Query, O(log² N)

// Find the k-th smallest value in a BIT over value domain [1..MAXV]
int kth_binary_search(int k) {
    int lo = 1, hi = MAXV;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (query(mid) >= k)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

Doubling Optimization: O(log N)

Binary search is already easy to understand, but every binary-search step calls query, so the complexity is O(log² N).

Using BIT's tree structure, the doubling method can find the k-th smallest in O(log N):

📄 View Code: Doubling Optimization

// Global k-th smallest (doubling method) — O(log N)
// Prerequisite: BIT maintains value-domain frequency, bit[v] = count of v
int kth(int k) {
    int sum = 0, x = 0;
    // Determine the answer bit by bit from highest to lowest
    for (int i = (int)log2(MAXV); i >= 0; --i) {
        int nx = x + (1 << i);
        if (nx <= MAXV && sum + bit[nx] < k) {
            x = nx;          // take this whole segment and continue expanding right
            sum += bit[nx];
        }
        // Otherwise, the answer is in [x+1, x + 2^(i-1)], so do not expand
    }
    return x + 1;  // x is the last position where sum < k; answer is x+1
}

// Complete usage:
// Insert value v: update(v, 1)
// Delete value v: update(v, -1)
// Query the k-th smallest: kth(k)

💡 Principle: BIT's tree structure makes bit[x] the sum of the interval ending at x with length lowbit(x). During doubling, each step tries setting one bit of x to 1: if the prefix sum after taking that segment is still < k, the answer must be to the right, so we expand; otherwise, we keep searching on the left. There are O(log V) steps in total.

💡 Chapter connection: BIT and Segment Tree are two of the most commonly paired data structures in USACO. BIT handles 80% of scenarios with about 1/5 of the code. After mastering BIT, return to Chapter 5.7 to learn lazy propagation in Segment Trees — that is the territory BIT cannot reach.

5.8.13 Common Mistakes

❌ Mistake 1: Wrong `lowbit` Implementation

// ❌ Wrong — common typo
int lowbit(int x) { return x & (x - 1); }  // This clears the lowest bit; it does not return it!
// x=6 (0110): x&(x-1) = 0110&0101 = 0100 = 4 (wrong, should be 2)

// ✅ Correct
int lowbit(int x) { return x & (-x); }
// x=6: -6 = ...11111010 (two's complement)
// 0110 & 11111010 = 0010 = 2 ✓

Memory trick: x & (-x) reads as “x AND negative x”. -x is bitwise NOT plus 1. It preserves the lowest set bit, clears the bits below it, flips the bits above it, and the AND operation keeps only the lowest set bit.

❌ Mistake 2: 0-indexed Array Trap

BIT must use 1-indexed arrays. 0-indexing causes infinite loops!

// ❌ Wrong — 0-indexing causes an infinite loop!
// If i = 0: query loop does i -= lowbit(0) = 0 - 0 = 0 → infinite loop!

// ✅ Correct — convert to 1-indexed
for (int i = 0; i < n; i++) {
    update(i + 1, arr[i]);  // convert 0-indexed i to 1-indexed i+1
}
// Note: for a 0-indexed range [l, r], use query(r+1) - query(l)

❌ Mistake 3: Integer Overflow for Large Sums

// ❌ Wrong — tree[] should be long long for large sums
int tree[MAXN];   // overflows when the sum exceeds 2^31

// ✅ Correct
long long tree[MAXN];

// Also: inversion count can be up to N*(N-1)/2 ≈ 4.5×10^10 (N=3×10^5)
// Always use long long for the result counter!
long long inversions = 0;  // ✅ not int!

❌ Mistake 4: Forgetting to Clear BIT Between Test Cases

📄 View Code: Mistake 4

// ❌ Wrong — multiple test cases
int T; cin >> T;
while (T--) {
    // Forgot to clear tree[]!
    // Old data from the previous test case contaminates the result
    solve();
}

// ✅ Correct — reset before each test case
int T; cin >> T;
while (T--) {
    fill(tree + 1, tree + n + 1, 0LL);  // clear BIT
    solve();
}

5.8.14 Chapter Summary

📋 Quick Reference

Operation	Code	Description
lowbit	`x & (-x)`	Value of the lowest set bit in x
Point update	`for(;i<=n;i+=lowbit(i)) t[i]+=v`	Propagate upward
Prefix query	`for(;i>0;i-=lowbit(i)) s+=t[i]`	Decompose downward
Range query	`query(r) - query(l-1)`	Difference of two prefixes
Range update (difference BIT)	`upd(l,+v); upd(r+1,-v)`	Difference array
Inversion counting	`(i-1) - query(a[i])`	Count for each element
Value-domain BIT k-th smallest	Find the smallest `x` with `query(x) >= k` in a frequency BIT	Treat values as indices
Array must be	1-indexed	0-indexing → infinite loop

🧩 Complete Learning Path from Easy to Advanced

Start from the conflict: direct arrays have slow queries, prefix sums have slow updates.
Learn lowbit: it decides how many numbers each tree[i] manages.
Understand box assignment: tree[i] stores an interval sum ending at i.
Master the two directions: query jumps backward, update jumps forward.
Apply range query: sum(l,r)=query(r)-query(l-1).
Advance to difference BIT: turn range updates into two point updates on the difference array.
Apply BIT to counting problems: inversions, frequency statistics, and k-th smallest can all use BIT.

❓ FAQ

Q1: Both BIT and Segment Tree support prefix sum + point update. Which one should I choose?

A: Use BIT whenever possible. BIT has only about 10 lines of core code, smaller constants, is often 2-3x faster in practice, and is less likely to contain bugs. Use Segment Tree only when you need range min/max (RMQ), range coloring, or more complex range operations. In contests, BIT is the “default weapon”; Segment Tree is the “heavy artillery”.

Q2: Can BIT support range minimum queries (RMQ)?

A: Standard BIT cannot support RMQ, because minimum has no inverse operation. You cannot “undo” a minimum merge like you can subtract a prefix sum. Range min/max usually requires Segment Tree or Sparse Table. There is a static BIT RMQ technique, but it only works without updates and has limited practical use.

Q3: Can BIT be two-dimensional (2D BIT)?

A: Yes. 2D BIT solves 2D prefix sum + point update problems with complexity O(log N × log M). The code uses two nested loops:
// 2D BIT update
void update2D(int x, int y, long long v) {
    for (int i = x; i <= N; i += lowbit(i))
        for (int j = y; j <= M; j += lowbit(j))
            bit[i][j] += v;
}
It is not very common in USACO, but it occasionally appears in 2D coordinate counting problems.

2D Fenwick Tree (2D BIT)

5.8.15 Practice Problems

🟢 Easy 1: Range Sum (Point Update) Given an array of length N, support two operations:

1 i x: add x to A[i]
2 l r: query A[l] + A[l+1] + ... + A[r]

Hint: Direct application of BIT. Use update(i, x) and query(r) - query(l-1).

🟢 Easy 2: Count Elements Less Than K Given N operations. Each operation either inserts an integer (range 1..10^6) or asks: “Among all currently inserted integers, how many are ≤ K?”

Hint: BIT maintains a frequency array over the value domain. update(v, 1) inserts value v, and query(K) is the answer.

🟡 Medium 1: Range Add, Point Query Given an array of length N, initially all zeros, support two operations:

1 l r x: add x to every element in A[l..r]
2 i: query the current value of A[i]

Hint: Use difference BIT (Section 5.8.10).

🟡 Medium 2: Inversion Count with Coordinate Compression Given an array of length N, with values in the range 1..10^9 and possible duplicates, count the number of inversions.

Hint: Coordinate compress first, then use BIT counting (a variant of Section 5.8.11). Note that equal elements do not count: an inversion requires i < j and A[i] > A[j] strictly.

🔴 Hard: Range Add, Range Sum (Double BIT) Given an array of length N, support two operations:

1 l r x: add x to every element in A[l..r]
2 l r: query A[l] + ... + A[r]

Hint: Use Double BIT. Formula: prefix_sum(r) = B1[r] * r - B2[r], where B1 maintains the difference array and B2 maintains the weighted difference array.

✅ Full Solutions for All BIT Practice Problems

🟢 Easy 1: Range Sum

#include <bits/stdc++.h>
using namespace std;
const int MAXN = 100005;
int n, q;
long long tree[MAXN];
int lowbit(int x) { return x & (-x); }
void update(int i, long long val) { for (; i <= n; i += lowbit(i)) tree[i] += val; }
long long query(int i) { long long s=0; for (; i > 0; i -= lowbit(i)) s += tree[i]; return s; }
int main() {
    ios_base::sync_with_stdio(false); cin.tie(NULL);
    cin >> n >> q;
    while (q--) {
        int t; cin >> t;
        if (t == 1) { int i; long long x; cin >> i >> x; update(i, x); }
        else { int l, r; cin >> l >> r; cout << query(r) - query(l-1) << "\n"; }
    }
}

🟡 Medium 1: Range Add, Point Query (Difference BIT) Core idea: maintain the difference array in BIT. range_add(l,r,x) = update(l,x) + update(r+1,-x). Point query = query(i).

void range_add(int l, int r, long long x) { update(l, x); update(r+1, -x); }
long long point_query(int i) { return query(i); }

🟡 Medium 2: Inversion Count

// Coordinate compress first. Then for each element x:
// inversions += (number of inserted elements) - query(compressed x)
// Then insert x: update(compressed x, 1)

🔴 Hard: Range Add, Range Sum (Double BIT)

// prefix_sum(r) = (r+1)*sum(D[1..r]) - sum(i*D[i], i=1..r)
// = (r+1)*B1.query(r) - B2.query(r)
// Where B1 stores D[i], B2 stores i*D[i]
struct DoubleBIT {
    long long B1[MAXN], B2[MAXN];
    int n;
    DoubleBIT(int n) : n(n) { memset(B1,0,sizeof(B1)); memset(B2,0,sizeof(B2)); }
    void add(int i, long long v) {
        for (int x=i; x<=n; x+=x&-x) { B1[x]+=v; B2[x]+=v*i; }
    }
    void range_add(int l, int r, long long v) { add(l,v); add(r+1,-v); }
    long long prefix(int i) {
        long long s=0; for(int x=i;x>0;x-=x&-x) s+=(i+1)*B1[x]-B2[x]; return s;
    }
    long long range_query(int l, int r) { return prefix(r)-prefix(l-1); }
};

C++ for Competitive Programming: A USACO Guide