C++ Discretization Explained

什码情况

License: CC 4.0 BY-SA

Tags: algorithms, C++, data structures, discretization, ACM

Article link: https://blog.youkuaiyun.com/user_longling/article/details/146959426

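1. What is discretization?

Discretization maps a set of original values onto a small range of integer indices while preserving their relative order. It is commonly used for coordinate compression, data compression, and making data-structure queries cheaper.

In C++, discretization is mainly used in the following situations:

Segment tree / Fenwick tree: when the value range is large (on the order of 10^9) but the number of distinct values n is small, discretization maps the values to 0 ~ n-1 and reduces memory use.
Sweep-line algorithms: when processing interval events (such as rectangle or segment coverage in the 2D plane), discretization turns y coordinates into array indices.
Sorting and deduplication: in data analysis, hash mapping and similar tasks, discretization replaces scattered values with compact indices, which makes lookups cheaper.
Dynamic programming / merge-sort optimizations: when values are used as array indices (for example as DP states), discretization can reduce memory use and speed up lookups.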

2. Common discretization methods

The core idea is sort + deduplicate + map. Common C++ implementations are:

(1) Discretization with `vector` and `unordered_map`

#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
using namespace std;

vector<int> discretize(const vector<int> &arr) {
    vector<int> sorted_arr = arr;    // copy the original array
    sort(sorted_arr.begin(), sorted_arr.end());  // sort
    sorted_arr.erase(unique(sorted_arr.begin(), sorted_arr.end()), sorted_arr.end()); // deduplicate

    unordered_map<int, int> mapping; // original value -> discretized index
    for (size_t i = 0; i < sorted_arr.size(); i++) {
        mapping[sorted_arr[i]] = static_cast<int>(i);  // map to [0, n-1]
    }

    vector<int> compressed(arr.size()); // stores the mapped values
    for (size_t i = 0; i < arr.size(); i++) {
        compressed[i] = mapping[arr[i]];
    }
    return compressed;
}

int main() {
    vector<int> data = {100, 200, 300, 100, 50};
    vector<int> compressed = discretize(data);
    for (int x : compressed) cout << x << " ";  // prints: 1 2 3 1 0
    return 0;
}

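(2) Discretization lookups with `lower_bound()`

lower_bound() performs a binary search on a sorted array, so it finds the index of a value in O(log n) and can serve as the mapping directly.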

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

vector<int> discretize_with_lower_bound(const vector<int> &arr) {
    vector<int> sorted_arr = arr;                 // copy the original array
    sort(sorted_arr.begin(), sorted_arr.end());   // sort
    sorted_arr.erase(unique(sorted_arr.begin(), sorted_arr.end()), sorted_arr.end()); // deduplicate

    vector<int> compressed;
    compressed.reserve(arr.size());
    for (int x : arr) {
        // position of x in the sorted, deduplicated array = discretized index
        int idx = static_cast<int>(lower_bound(sorted_arr.begin(), sorted_arr.end(), x) - sorted_arr.begin());
        compressed.push_back(idx);
    }
    return compressed;
}

int main() {
    vector<int> data = {50, 300, 200, 100, 300};
    vector<int> compressed = discretize_with_lower_bound(data);
    for (int x : compressed) cout << x << " ";  // prints: 0 3 2 1 3
    return 0;
}

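💡 What lower_bound(sorted_arr.begin(), sorted_arr.end(), x) - sorted_arr.begin() does:

lower_bound() returns an iterator to the first element of sorted_arr that is not less than x. Because every x comes from the original array, that element is x itself.
Subtracting sorted_arr.begin() turns the iterator into an index, which is the discretized value.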
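
The explanation above relies on x being present in sorted_arr. When a query value may not have been collected, lower_bound() still returns a position (that of the next larger element, or end()), so the result should be checked. A minimal sketch of such a check, where the helper name find_index and the -1 sentinel are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the discretized index of x, or -1 if x is
// not among the collected values. `sorted_unique` must already be sorted
// and deduplicated.
int find_index(const std::vector<int>& sorted_unique, int x) {
    // Debug-only sanity check of the precondition.
    assert(std::is_sorted(sorted_unique.begin(), sorted_unique.end()));

    auto it = std::lower_bound(sorted_unique.begin(), sorted_unique.end(), x);
    if (it == sorted_unique.end() || *it != x) {
        return -1;  // x was never collected, so it has no index
    }
    return static_cast<int>(it - sorted_unique.begin());
}
```

With sorted_unique = {50, 100, 200, 300}, looking up 200 gives index 2, while looking up 150 gives -1 instead of silently returning the index of 200.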

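3. Where discretization is used

(1) Coordinate compression for a segment tree

Discretization maps a large coordinate range onto a much smaller one and reduces memory use. For example:

vector<int> y_coords; // y coordinates after discretization

If the y coordinates lie in [1, 10^9] but only 100 distinct values actually occur, discretization converts them to [0, 99], so a much smaller segment tree array is enough.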
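
As a rough illustration of this case, the sketch below sizes a sum segment tree by the number of distinct y values rather than by the coordinate range. The names SegTree and count_in_range are assumptions made for this example, and the iterative tree layout is just one of several possible implementations.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Iterative sum segment tree over compressed indices 0..n-1.
struct SegTree {
    int n;
    std::vector<long long> t;
    explicit SegTree(int n_) : n(n_), t(2 * n_, 0) {}

    void add(int pos, long long delta) {          // add delta at index pos
        assert(0 <= pos && pos < n);
        for (pos += n; pos >= 1; pos /= 2) t[pos] += delta;
    }
    long long query(int l, int r) const {         // sum over the half-open range [l, r)
        long long s = 0;
        for (l += n, r += n; l < r; l /= 2, r /= 2) {
            if (l & 1) s += t[l++];
            if (r & 1) s += t[--r];
        }
        return s;
    }
};

// Hypothetical helper: counts how many values in ys lie in [lo, hi].
long long count_in_range(const std::vector<int>& ys, int lo, int hi) {
    std::vector<int> coords = ys;
    std::sort(coords.begin(), coords.end());
    coords.erase(std::unique(coords.begin(), coords.end()), coords.end());

    SegTree st(static_cast<int>(coords.size()));  // sized by distinct values, not by 10^9
    for (int y : ys) {
        int idx = static_cast<int>(std::lower_bound(coords.begin(), coords.end(), y) - coords.begin());
        st.add(idx, 1);
    }
    // Translate the query bounds into compressed indices.
    int l = static_cast<int>(std::lower_bound(coords.begin(), coords.end(), lo) - coords.begin());
    int r = static_cast<int>(std::upper_bound(coords.begin(), coords.end(), hi) - coords.begin());
    return st.query(l, r);
}
```

Note that the query bounds do not have to be collected values: lower_bound() and upper_bound() map them to the nearest compressed positions.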

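(2) Sweep-line algorithms

Discretization is very common in rectangle coverage problems (such as computing the area of a union of rectangles): the y coordinates are discretized first, and the problem is then solved with a sweep line plus a segment tree.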
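
A compact sketch of the rectangle-union case follows. To stay short it replaces the segment tree with a plain coverage-count array over the discretized y intervals, which costs O(n) per event instead of O(log n); the Rect struct and the union_area function are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { long long x1, y1, x2, y2; };  // lower-left (x1, y1), upper-right (x2, y2)

long long union_area(const std::vector<Rect>& rects) {
    // Step 1: discretize every y coordinate that appears.
    std::vector<long long> ys;
    for (const Rect& r : rects) { ys.push_back(r.y1); ys.push_back(r.y2); }
    std::sort(ys.begin(), ys.end());
    ys.erase(std::unique(ys.begin(), ys.end()), ys.end());

    // Step 2: turn each rectangle into an opening (+1) and a closing (-1) edge.
    struct Event { long long x; int lo, hi, delta; };
    std::vector<Event> events;
    for (const Rect& r : rects) {
        assert(r.x1 <= r.x2 && r.y1 <= r.y2);
        int lo = static_cast<int>(std::lower_bound(ys.begin(), ys.end(), r.y1) - ys.begin());
        int hi = static_cast<int>(std::lower_bound(ys.begin(), ys.end(), r.y2) - ys.begin());
        events.push_back({r.x1, lo, hi, +1});
        events.push_back({r.x2, lo, hi, -1});
    }
    std::sort(events.begin(), events.end(),
              [](const Event& a, const Event& b) { return a.x < b.x; });

    // cover[j] = number of rectangles currently covering [ys[j], ys[j+1]).
    std::vector<int> cover(ys.empty() ? 0 : ys.size() - 1, 0);
    long long area = 0, prev_x = 0;
    for (const Event& e : events) {
        // Covered y length between the previous event and this one.
        long long covered = 0;
        for (size_t j = 0; j < cover.size(); ++j) {
            if (cover[j] > 0) covered += ys[j + 1] - ys[j];
        }
        area += covered * (e.x - prev_x);
        for (int j = e.lo; j < e.hi; ++j) cover[j] += e.delta;
        prev_x = e.x;
    }
    return area;
}
```

For the two overlapping squares {0, 0, 2, 2} and {1, 1, 3, 3} this returns 7. Replacing the cover array with a segment tree that maintains the covered length would bring the cost down to O(n log n).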

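(3) 2D Fenwick tree / inversion counting

In a 2D Fenwick tree (binary indexed tree), or in inversion counting (where a Fenwick tree is a common alternative to merge sort), discretization maps the data to [0, n-1], so that values on the order of 10^9 do not dictate the array size.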
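
The inversion-counting case might look like the sketch below. It uses a one-dimensional Fenwick tree indexed by discretized rank rather than merge sort, and the function name count_inversions is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counts pairs (i, j) with i < j and a[i] > a[j].
long long count_inversions(const std::vector<int>& a) {
    // Discretize: the rank of a value is its 1-based position among the distinct values.
    std::vector<int> vals = a;
    std::sort(vals.begin(), vals.end());
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
    const int m = static_cast<int>(vals.size());

    std::vector<int> bit(m + 1, 0);  // Fenwick tree over ranks 1..m
    long long inversions = 0;

    // Scan from right to left: elements already inserted lie to the right of a[i].
    for (int i = static_cast<int>(a.size()) - 1; i >= 0; --i) {
        int rank = static_cast<int>(std::lower_bound(vals.begin(), vals.end(), a[i]) - vals.begin()) + 1;
        assert(1 <= rank && rank <= m);
        // Count inserted elements with a strictly smaller rank.
        for (int j = rank - 1; j > 0; j -= j & -j) inversions += bit[j];
        // Insert the current element.
        for (int j = rank; j <= m; j += j & -j) bit[j] += 1;
    }
    return inversions;
}
```

The tree has one slot per distinct value, so an input such as {1000000000, -5, 7, 7} needs only three slots even though its values span more than 10^9.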

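4. Notes & summary

✅ Key steps of discretization:

Collect the data: store every value that needs to be discretized in a vector<int>.
Sort and deduplicate: use sort() + unique() to obtain a sorted array without duplicates.
Build the mapping: use unordered_map or lower_bound() to convert values to indices.
Apply the mapping: convert the original array into discretized indices.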
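
The four steps above could be wrapped in a small reusable helper. The sketch below is one possible shape and not a standard library facility: the class name Compressor and its member names are assumptions, and it is templated so that it also works for long long coordinates. It additionally keeps the reverse mapping (index back to original value), which is often needed when printing answers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

template <typename T>
class Compressor {
public:
    // Step 1: collect every value that will need an index.
    void add(const T& v) { values_.push_back(v); built_ = false; }

    // Step 2: sort and deduplicate.
    void build() {
        std::sort(values_.begin(), values_.end());
        values_.erase(std::unique(values_.begin(), values_.end()), values_.end());
        built_ = true;
    }

    // Steps 3 and 4: map a collected value to its index in [0, size() - 1].
    int index_of(const T& v) const {
        assert(built_);
        return static_cast<int>(std::lower_bound(values_.begin(), values_.end(), v) - values_.begin());
    }

    // Reverse mapping: index back to the original value.
    const T& value_at(int idx) const {
        assert(built_ && 0 <= idx && idx < size());
        return values_[idx];
    }

    int size() const { return static_cast<int>(values_.size()); }

private:
    std::vector<T> values_;
    bool built_ = false;
};
```

A typical use is to call add() for every coordinate in the input, call build() once, and then use index_of() wherever an array index is needed.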

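✅ Points to note:

Discretization pays off when the value range is large but the number of distinct values is small; otherwise the sort and lookup overhead can outweigh the savings and hurt performance.
unordered_map gives O(1) average lookups and suits many repeated queries, at the cost of extra memory; lower_bound() needs no extra structure and costs O(log n) per query, which suits occasional or one-off lookups.
The discretized values fall in a small range, so they can be used directly as indices in Fenwick trees, segment trees, sweep-line structures and similar data structures.

📌 A typical discretization code template: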

sort(sorted_arr.begin(), sorted_arr.end());
sorted_arr.erase(unique(sorted_arr.begin(), sorted_arr.end()), sorted_arr.end());
int index = lower_bound(sorted_arr.begin(), sorted_arr.end(), x) - sorted_arr.begin();