Chapter 3.7: Hashing Techniques
📝 Before You Continue: You should know STL containers (Chapter 3.1) and string basics (Chapter 2.3). This chapter covers hashing principles and advanced competitive programming usage.
Hashing is one of the most important "tools" in competitive programming: it turns complex comparison problems into O(1) numeric comparisons. But hashing is also the easiest technique to get "hacked"—this chapter teaches both how to use it well and how to prevent being hacked.
3.7.1 unordered_map vs map: Internals & Performance
Internal Implementation Comparison
| Feature | map | unordered_map |
|---|---|---|
| Internal structure | Red-black tree (balanced BST) | Hash table |
| Lookup time | O(log N) | O(1) avg, O(N) worst |
| Insert time | O(log N) | O(1) avg, O(N) worst |
| Iteration order | Ordered (ascending by key) | Unordered |
| Memory usage | O(N), smaller constant | O(N), larger constant |
| Worst case | O(log N) (stable) | O(N) (hash collision) |
#include <bits/stdc++.h>
using namespace std;
int main() {
// map: ordered, O(log N)
map<int, int> m;
m[3] = 30; m[1] = 10; m[2] = 20;
for (auto [k, v] : m) cout << k << ":" << v << " ";
// output: 1:10 2:20 3:30 ← ordered!
// unordered_map: unordered, O(1) average
unordered_map<int, int> um;
um[3] = 30; um[1] = 10; um[2] = 20;
// iteration order undefined, but lookup is very fast
// performance difference: N=10^6 operations
// map: ~300ms; unordered_map: ~80ms (roughly)
}
When to Choose Which?
- Use
map: need ordered iteration, needlower_bound/upper_bound, extreme key range (high hash collision risk) - Use
unordered_map: pure lookup/insert, key is integer or string, large N (> 10^5)
3.7.2 Anti-Hack: Custom Hash
Problem: unordered_map's default integer hash is essentially hash(x) = x, allowing attackers to construct many hash collisions, degrading operations to O(N) and causing TLE.
On platforms like Codeforces, this is a common hack technique.
Solution: splitmix64 Hash
// Anti-hack custom hasher — uses splitmix64
struct custom_hash {
static uint64_t splitmix64(uint64_t x) {
x += 0x9e3779b97f4a7c15;
x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
return x ^ (x >> 31);
}
size_t operator()(uint64_t x) const {
static const uint64_t FIXED_RANDOM =
chrono::steady_clock::now().time_since_epoch().count();
return splitmix64(x + FIXED_RANDOM);
}
};
// Usage:
unordered_map<int, int, custom_hash> safe_map;
unordered_set<int, custom_hash> safe_set;
⚠️ Contest tip: When using
unordered_mapon Codeforces, always addcustom_hash. USACO test data won't deliberately construct hacks, but it's a good habit.
3.7.3 String Hashing (Polynomial Hash)
String hashing maps a string to an integer, turning string comparison into numeric comparison (O(1)).
Core Formula
For string s[0..n-1], define the hash value as:
hash(s) = s[0]·B^(n-1) + s[1]·B^(n-2) + ... + s[n-1]·B^0 (mod M)
where B is the base (typically 131 or 131117) and M is a large prime (typically 10⁹+7 or 10⁹+9).
Prefix Hash + Substring Hash O(1)
// String hashing: O(N) preprocessing, O(1) substring hash
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
const ull BASE = 131;
// Use unsigned long long natural overflow (equivalent to mod 2^64)
// Or specify MOD manually:
// const ull MOD = 1e9 + 7;
struct StringHash {
int n;
vector<ull> h, pw;
StringHash(const string& s) : n(s.size()), h(n + 1, 0), pw(n + 1, 1) {
for (int i = 0; i < n; i++) {
h[i + 1] = h[i] * BASE + (s[i] - 'a' + 1); // 1-indexed prefix hash
pw[i + 1] = pw[i] * BASE; // BASE^(i+1)
}
}
// Get hash of substring s[l..r] (0-indexed)
ull get(int l, int r) {
return h[r + 1] - h[l] * pw[r - l + 1]; // ← KEY formula
}
};
int main() {
ios_base::sync_with_stdio(false);
cin.tie(NULL);
string s = "abcabc";
StringHash sh(s);
// Compare if two substrings are equal
// s[0..2] = "abc", s[3..5] = "abc"
cout << (sh.get(0, 2) == sh.get(3, 5) ? "Equal" : "Not Equal") << "\n"; // Equal
// Compare s[0..1] = "ab" vs s[3..4] = "ab"
cout << (sh.get(0, 1) == sh.get(3, 4) ? "Equal" : "Not Equal") << "\n"; // Equal
}
Hash Formula Derivation:
h[r+1] = s[0]*B^r + s[1]*B^(r-1) + ... + s[r]*B^0
h[l] = s[0]*B^(l-1) + ... + s[l-1]*B^0
h[r+1] - h[l] * B^(r-l+1)
= (s[0]*B^r + ... + s[r]*B^0)
- (s[0]*B^r + ... + s[l-1]*B^(r-l+1))
= s[l]*B^(r-l) + s[l+1]*B^(r-l-1) + ... + s[r]*B^0
= hash(s[l..r]) ✓
下图直观展示了前缀哈希数组的构建过程,以及如何用 get(l, r) 公式在 O(1) 内提取任意子串的哈希值:
3.7.4 Double Hashing (Avoiding Collisions)
Single hash (mod M) has collision probability ≈ 1/M. For N substring comparisons, expected collisions ≈ N²/(2M).
- If
M = 10⁹+7, N = 10⁶: collision probability ≈10¹²/(2×10⁹) = 500times! Not safe. - Solution: double hashing, using two different (B, M) pairs simultaneously, collision probability drops to
1/(M₁×M₂) ≈ 10⁻¹⁸.
// Double hashing: two (BASE, MOD) pairs used simultaneously, extremely low collision probability
struct DoubleHash {
static const ull B1 = 131, M1 = 1e9 + 7;
static const ull B2 = 137, M2 = 1e9 + 9;
int n;
vector<ull> h1, h2, pw1, pw2;
DoubleHash(const string& s) : n(s.size()),
h1(n+1,0), h2(n+1,0), pw1(n+1,1), pw2(n+1,1) {
for (int i = 0; i < n; i++) {
ull c = s[i] - 'a' + 1;
h1[i+1] = (h1[i] * B1 + c) % M1;
h2[i+1] = (h2[i] * B2 + c) % M2;
pw1[i+1] = pw1[i] * B1 % M1;
pw2[i+1] = pw2[i] * B2 % M2;
}
}
// Return pair<ull,ull> as the hash "fingerprint" of substring s[l..r]
pair<ull,ull> get(int l, int r) {
ull v1 = (h1[r+1] - h1[l] * pw1[r-l+1] % M1 + M1) % M1;
ull v2 = (h2[r+1] - h2[l] * pw2[r-l+1] % M2 + M2) % M2;
return {v1, v2};
}
};
3.7.5 Application: String Matching (Rabin-Karp)
// Rabin-Karp string matching: find all occurrences of pattern P in text T
// Time: O(N+M) average, O(NM) worst case (but extremely fast in practice)
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
vector<int> rabinKarp(const string& T, const string& P) {
int n = T.size(), m = P.size();
if (m > n) return {};
const ull BASE = 131;
ull patHash = 0, textHash = 0, pow_m = 1;
// Compute BASE^m (natural overflow)
for (int i = 0; i < m - 1; i++) pow_m *= BASE;
// Initial hash
for (int i = 0; i < m; i++) {
patHash = patHash * BASE + P[i];
textHash = textHash * BASE + T[i];
}
vector<int> result;
for (int i = 0; i + m <= n; i++) {
if (textHash == patHash) {
// Verify when hashes match (avoid false positives from collision)
if (T.substr(i, m) == P) result.push_back(i);
}
if (i + m < n) {
// Rolling hash: remove leftmost char, add rightmost char
textHash = textHash - T[i] * pow_m; // remove leftmost
textHash = textHash * BASE + T[i + m]; // add rightmost
}
}
return result;
}
3.7.6 Application: Longest Common Substring
Problem: Given strings S and T, find the length of their longest common substring.
Approach: Binary search on the answer (length L of longest common substring), then use a hash set to check if any substring of length L appears in both strings.
// Longest common substring: O(N log N) — binary search + hashing
int longestCommonSubstring(const string& S, const string& T) {
StringHash hs(S), ht(T);
int ns = S.size(), nt = T.size();
auto check = [&](int len) -> bool {
unordered_set<ull> setS;
for (int i = 0; i + len <= ns; i++)
setS.insert(hs.get(i, i + len - 1));
for (int j = 0; j + len <= nt; j++)
if (setS.count(ht.get(j, j + len - 1)))
return true;
return false;
};
int lo = 0, hi = min(ns, nt);
while (lo < hi) {
int mid = (lo + hi + 1) / 2;
if (check(mid)) lo = mid;
else hi = mid - 1;
}
return lo;
}
⚠️ Common Mistakes
-
Bad modulus choice: Don't use numbers other than
10⁹+7; especially avoid non-prime moduli (high collision rate). Recommended:10⁹+7and10⁹+9as a double hash pair. -
unordered_maphacked: On platforms like Codeforces, the default hash can be attacked. Always usecustom_hash. -
Substring hash subtraction underflow:
h[r+1] - h[l] * pw[r-l+1]may be negative (with signed integers). Useunsigned long longnatural overflow, or(... % M + M) % Mto ensure non-negative. -
BASE doesn't match character set: For lowercase letters (26 types), BASE must be > 26 (typically 31 or 131). For all ASCII characters (128 types), BASE must be > 128 (use 131 or 137).
-
Hash collision causing WA: Even with double hashing, collisions are theoretically possible. If uncertain, add direct string comparison when hashes match.
Chapter Summary
📌 Core Comparison Table
| Tool | Time Complexity | Use Case |
|---|---|---|
map<K,V> | O(log N) | Need ordering, need range queries |
unordered_map<K,V> | O(1) amortized | Only need lookup/insert, key order not required |
| String hash (single) | O(N) preprocess, O(1) query | Substring comparison, pattern matching |
| String hash (double) | O(N) preprocess, O(1) query | High-precision scenarios, avoid collisions |
❓ FAQ
Q1: Which is better — unsigned long long natural overflow double hash or manual mod hash?
A:
ullnatural overflow (equivalent to mod 2⁶⁴) is simpler to code, and 2⁶⁴ is large enough that single-hash collision probability is already very low (≈ 10⁻¹⁸). But crafted data can deliberately cause collisions — double hashing is safer then. Both work in contests;ullis more common.
Q2: What can string hashing do that KMP cannot?
A: String hashing excels at multi-string comparison (e.g., finding longest common substring, palindromic substrings), while KMP only excels at single-pattern matching. Hash + binary search can solve many string problems in O(N log N) that would require more complex KMP implementations.
Q3: Should I use BASE 31 or 131?
A: Use 31 for lowercase letters only (a prime less than 37, avoids too-small hash space). Use 131 for mixed case or digits (a prime greater than 128, covers full ASCII). The key is: BASE must be larger than the character set size and ideally a prime.
Practice Problems
Problem 3.7.1 — Two Sum with Hash 🟢 Easy
Given array A, find if any two distinct elements sum to target X. Use unordered_set.
Hint
For each A[i], check if (X - A[i]) is already in the hash set. Insert A[i] after checking.Problem 3.7.2 — Substring Check 🟢 Easy Given string T and pattern P, check if P appears in T. Print all starting indices.
Hint
Use Rabin-Karp rolling hash, or just use `string::find` for practice, then implement manually.Problem 3.7.3 — Longest Palindromic Substring 🟡 Medium Find the length of the longest palindromic substring.
Hint
A palindrome s[l..r] satisfies: hash(s[l..r]) == hash(reverse(s)[n-1-r..n-1-l]). Binary search + hash on both forward and reversed string.Problem 3.7.4 — Count Distinct Substrings 🟡 Medium Given string S of length N (N ≤ 5000), count the number of distinct substrings.
Hint
Insert all O(N²) substring hashes into an unordered_set, count distinct values. Use double hash to avoid collisions.Problem 3.7.5 — String Periods 🔴 Hard Find the smallest period of a string S (smallest k such that S is a repetition of S[0..k-1]).