[Algorithm Notes] Applications of Divide and Conquer: Binary Search

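Basic idea

Binary search is an efficient algorithm for finding a specific element in a sorted array. Each step halves the search interval, so the range shrinks quickly. The basic idea is to compare the search key with the element at the middle position of the array. Because the array is sorted, that single comparison tells us whether the key equals the middle element or lies to its left or right. The search then continues only in the corresponding half, until the key is found or the interval becomes empty, which means the key is not present.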

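Algorithm steps

The steps of binary search are as follows:

1. Initialize: set two pointers, low at the first position of the array and high at the last position.
2. Find the middle element: compute the middle position mid = (low + high) / 2, rounded down.
3. Compare the middle element with the target:
- If the middle element equals the target, the search succeeds; return the index mid.
- If the middle element is less than the target, set low to mid + 1, because the target can only be to the right of mid.
- If the middle element is greater than the target, set high to mid - 1, because the target can only be to the left of mid.
4. Repeat steps 2 and 3 until low is greater than high, which means the search interval is empty and the target is not in the array.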

Pseudocode

function binarySearch(arr, target):
    low = 0
    high = length(arr) - 1
    while low <= high:
        mid = floor((low + high) / 2)
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  // the target is not in the array
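To make the steps concrete, the sketch below follows the pseudocode but also records the (low, mid, high) triple of every iteration. The function name `binary_search_trace` and the sample data are illustrative choices, not part of the original notes.

```python
def binary_search_trace(arr, target):
    """Binary search that also records (low, mid, high) for each iteration."""
    low, high = 0, len(arr) - 1
    steps = []
    while low <= high:
        mid = (low + high) // 2          # integer (floor) division
        steps.append((low, mid, high))
        if arr[mid] == target:
            return mid, steps
        elif arr[mid] < target:
            low = mid + 1                # continue in the right half
        else:
            high = mid - 1               # continue in the left half
    return -1, steps


index, steps = binary_search_trace([1, 3, 5, 7, 9, 11], 9)
for low, mid, high in steps:
    print(f"low={low} mid={mid} high={high}")
print("index:", index)
```

Searching for 9 takes two iterations here: the first probe lands on 5 (index 2), the second on 9 (index 4).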

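Binary search is an efficient lookup algorithm that is particularly well suited to sorted arrays. Its main advantages and disadvantages are listed below.

Advantages

Efficient time complexity: binary search runs in O(log n) time, so it finds the target quickly even in very large data sets.
Fewer comparisons: compared with linear search, binary search halves the search interval on every iteration, so far fewer comparisons are needed.
Low space complexity: the iterative version only needs a constant amount of extra space for a few variables, so its space complexity is O(1).
Suited to static, sorted data: for data sets that rarely change and are already sorted, binary search is an ideal choice.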
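The difference in the number of comparisons can be measured directly. The following sketch counts loop iterations for a linear scan and for binary search on the same sorted list; the helper names and the one-million-element test list are assumptions made for illustration.

```python
def linear_search_steps(arr, target):
    """Linear search; returns (index, number of elements inspected)."""
    steps = 0
    for i, value in enumerate(arr):
        steps += 1
        if value == target:
            return i, steps
    return -1, steps


def binary_search_steps(arr, target):
    """Binary search; returns (index, number of iterations)."""
    low, high, steps = 0, len(arr) - 1, 0
    while low <= high:
        steps += 1
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid, steps
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, steps


data = list(range(1_000_000))
_, linear_steps = linear_search_steps(data, 999_999)
_, binary_steps = binary_search_steps(data, 999_999)
print(linear_steps, binary_steps)
```

For the last element of a million-element list, the linear scan inspects every element, while binary search needs at most 20 iterations.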

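Disadvantages

Data must be sorted: binary search requires sorted data. If the data is not sorted, it has to be sorted first, which adds O(n log n) time and extra complexity.
Poor fit for frequently changing data: if the collection changes often, keeping it sorted after each insertion or deletion (or re-sorting it) raises the maintenance cost.
Unfriendly memory access pattern: binary search jumps around the array instead of reading it sequentially, which can hurt cache performance, especially on large data sets.
Little benefit on small data sets: for small inputs the advantage is marginal, and a linear scan may even be faster, because the per-step constant factor of binary search (and the call overhead, if it is implemented recursively) matters more when n is small.
Implementation pitfalls: the idea is simple, but the details are easy to get wrong, such as integer overflow when computing the midpoint or an off-by-one error in the loop termination condition.
Basic form only answers exact-match queries: the version shown here returns the index of one matching element. Finding the nearest value, or all elements that satisfy a condition, requires a boundary-finding variant of binary search or a different algorithm.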
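For the nearest-value and multiple-match cases, Python's standard `bisect` module provides the boundary-finding variants `bisect_left` and `bisect_right`. The sketch below builds two small helpers on top of them; the names `equal_range` and `nearest` and the tie-breaking rule are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right


def equal_range(sorted_values, target):
    """Return (first, last_exclusive) indices of all entries equal to target."""
    return bisect_left(sorted_values, target), bisect_right(sorted_values, target)


def nearest(sorted_values, target):
    """Return the element closest to target (ties go to the smaller element)."""
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    pos = bisect_left(sorted_values, target)
    if pos == 0:
        return sorted_values[0]
    if pos == len(sorted_values):
        return sorted_values[-1]
    before, after = sorted_values[pos - 1], sorted_values[pos]
    return before if target - before <= after - target else after


values = [1, 3, 3, 3, 7, 10]
first, last = equal_range(values, 3)
print(first, last)          # the slice values[first:last] holds every 3
print(nearest(values, 8))
```

Both helpers stay within O(log n) comparisons because each one performs only one or two binary searches.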

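Binary search is very efficient at finding a specific element in a sorted array, but that efficiency depends on the data being sorted. In practice, choose the lookup algorithm according to the characteristics of the data and the requirements of the problem.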

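Application scenarios

Because of its efficiency, binary search is used in many development scenarios. Some common ones are:

Searching:
- Finding a specific element in a sorted array or other sorted data structure.
- Finding the index or position of a value in a sorted list.
Numerical computation:
- Finding a root of an equation, or a numerical solution that satisfies a given condition (the bisection method).
- Finding the boundary of the feasible region in optimization problems, including some dynamic programming solutions.
Game development:
- Computing a player's rank, or the percentile a score falls into, from a sorted score list.
- Adjusting the difficulty level dynamically by searching for a challenge level that matches the player's performance.
Data analysis and statistics:
- Finding the median, percentiles, or other order statistics in large sorted data sets.
- Finding the value at a specific point in time in time-series data.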
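Optimizing sorting algorithms:
- In binary insertion sort, binary search finds the insertion position of each element in the already sorted prefix, which reduces the number of comparisons (the element moves remain linear per insertion).
- The merge step of merge sort is a linear two-pointer scan and quicksort pivot selection does not rely on binary search; binary search only appears in refinements such as the galloping mode that Timsort uses while merging runs.
Algorithm competitions and interview questions:
- Binary search is a common problem-solving tool in programming contests and algorithm interviews.
- It is used to solve many variants, such as finding the closest element or the boundary of a run of equal elements.
Resource allocation and scheduling:
- In resource allocation problems, such as load balancing in cloud computing, searching for the smallest capacity or threshold that satisfies a feasibility check.
- In task scheduling, locating the position of a task in a queue that is kept sorted by priority or deadline.
Caches and database indexes:
- In databases, using binary search within a sorted index to locate records quickly.
- In caches, looking up entries that are kept in sorted order, for example by expiry time. A typical LRU cache, by contrast, relies on a hash table and a linked list rather than binary search.
Network protocols:
- In TCP congestion control, for example the binary search increase used by BIC TCP to adjust the congestion window.
Artificial intelligence and machine learning:
- Tuning a scalar parameter, such as a decision threshold, by bisection when the quantity of interest is monotonic in that parameter. Training a support vector machine (SVM) itself finds the optimal hyperplane through convex optimization rather than binary search.
  These scenarios all rely on a sorted or monotonic structure that lets each probe discard half of the remaining candidates. In practice, choosing the binary search variant that fits the situation can noticeably improve a program's efficiency.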
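As an example of the numerical-computation scenario, the bisection method applies the same halving idea to a continuous interval. The sketch below assumes `f` is continuous on `[lo, hi]` and that `f(lo)` and `f(hi)` have opposite signs; the function name, tolerance, and iteration cap are illustrative choices.

```python
def bisect_root(f, lo, hi, tol=1e-9, max_iter=200):
    """Approximate a root of f in [lo, hi] by repeatedly halving the interval.

    Assumes f is continuous and f(lo), f(hi) have opposite signs.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if (f_lo > 0) == (f_hi > 0):
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = lo + (hi - lo) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid        # root lies in the upper half
        else:
            hi = mid                     # root lies in the lower half
    return lo + (hi - lo) / 2


root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
print(root)  # an approximation of the square root of 2
```

Each iteration halves the interval, so the number of iterations grows with the logarithm of (hi - lo) / tol, mirroring the O(log n) behaviour on arrays.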

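Time complexity analysis

Worst case

The worst case occurs when the target is not in the array, or when it is found only after the interval has shrunk to a single element (for example, when it is the last element of the array). Every iteration halves the search interval until it is empty or the target is found. For an array of length n, binary search needs at most ⌊log2(n)⌋ + 1 iterations, so the worst-case time complexity is O(log n).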
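The worst-case iteration count can be checked by brute force. The sketch below runs the search for every present value and every gap between values, then compares the maximum with ⌊log2 n⌋ + 1, which equals `n.bit_length()` for a positive integer n. The helper names are hypothetical.

```python
def count_iterations(arr, target):
    """Run binary search and return how many loop iterations it used."""
    low, high, count = 0, len(arr) - 1, 0
    while low <= high:
        count += 1
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return count
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return count


def worst_case_iterations(n):
    """Maximum iterations over all present targets and all absent 'gap' targets."""
    arr = list(range(n))
    present = [count_iterations(arr, x) for x in arr]
    absent = [count_iterations(arr, x - 0.5) for x in range(n + 1)]
    return max(present + absent)


for n in (1, 2, 7, 8, 1000):
    # n.bit_length() == floor(log2(n)) + 1 for n >= 1
    print(n, worst_case_iterations(n), n.bit_length())
```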

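Best case

In the best case, the target sits exactly at the middle position of the array, and a single iteration finds it. The best-case time complexity is therefore O(1).

Average case

In the average case, the target may be at any position in the array. Every iteration halves the search interval until the target is found. Assuming the target is equally likely to be at any position, a successful search takes about log2(n) - 1 iterations on average, so the average-case time complexity is also O(log n).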
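The average can also be measured empirically. The sketch below averages the iteration count over every position of an array with n = 1023 elements (a size chosen here because the search tree is then perfectly balanced); the helper names are illustrative.

```python
import math


def count_iterations(arr, target):
    """Run binary search and return how many loop iterations it used."""
    low, high, count = 0, len(arr) - 1, 0
    while low <= high:
        count += 1
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return count
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return count


def average_successful_iterations(n):
    """Average iterations when every position is an equally likely target."""
    arr = list(range(n))
    return sum(count_iterations(arr, x) for x in arr) / n


n = 1023
average = average_successful_iterations(n)
print(average, math.log2(n))
```

For n = 1023 the average is slightly above 9, against a worst case of 10 iterations, which is consistent with the O(log n) average-case bound.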

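Space complexity analysis

The iterative binary search only needs a constant amount of extra space for a few variables such as low, high, and mid. Their space requirement does not depend on the size of the input array, so the space complexity is O(1).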
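The O(1) figure applies to the loop-based version. For contrast, here is a minimal recursive sketch that reports how deep the recursion went; since CPython does not eliminate tail calls, each halving costs one stack frame, giving O(log n) stack space. The `depth` parameter is added purely for illustration.

```python
def binary_search_recursive(arr, target, low=0, high=None, depth=1):
    """Recursive binary search; returns (index, recursion depth reached)."""
    if high is None:
        high = len(arr) - 1
    if low > high:
        return -1, depth
    mid = low + (high - low) // 2
    if arr[mid] == target:
        return mid, depth
    if arr[mid] < target:
        return binary_search_recursive(arr, target, mid + 1, high, depth + 1)
    return binary_search_recursive(arr, target, low, mid - 1, depth + 1)


data = list(range(1024))
index, depth = binary_search_recursive(data, 1023)
print(index, depth)
```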

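Proof

Time complexity proof

We can prove the time complexity of binary search by (strong) mathematical induction. Let T(n) be the maximum number of iterations on a search interval of n elements. Each iteration compares the target with the middle element and then either stops or continues in only one of the two halves, which contains at most ⌊n/2⌋ elements, so T(n) ≤ T(⌊n/2⌋) + 1. We claim that T(n) ≤ ⌊log2(n)⌋ + 1 for every n ≥ 1.
Base case:
When n = 1, a single comparison is enough, so T(1) = 1 = ⌊log2(1)⌋ + 1, which is O(1) and consistent with O(log n).
Induction hypothesis:
Assume that T(m) ≤ ⌊log2(m)⌋ + 1 holds for every interval length m with 1 ≤ m < n.
Inductive step:
For n ≥ 2, the first iteration leaves one half of at most ⌊n/2⌋ elements to search (an empty half ends the search immediately). By the induction hypothesis, T(n) ≤ T(⌊n/2⌋) + 1 ≤ (⌊log2(⌊n/2⌋)⌋ + 1) + 1 = ⌊log2(n)⌋ + 1, because ⌊log2(⌊n/2⌋)⌋ = ⌊log2(n)⌋ - 1 for n ≥ 2.
By induction, T(n) ≤ ⌊log2(n)⌋ + 1 for all n ≥ 1, so the time complexity of binary search is O(log n).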
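The same bound can be reached by unrolling the recurrence for the iteration count. The derivation below is a sketch in which T(n) denotes the maximum number of iterations on an interval of n elements, a symbol introduced here for illustration; only one half of the interval is searched after each comparison.

```latex
\begin{aligned}
T(1) &= 1,\\
T(n) &\le T\!\left(\lfloor n/2 \rfloor\right) + 1 \qquad (n \ge 2),\\
T(n) &\le T\!\left(\lfloor n/2^{j} \rfloor\right) + j \qquad \text{for } 1 \le j \le \lfloor \log_2 n \rfloor,\\
T(n) &\le T(1) + \lfloor \log_2 n \rfloor = \lfloor \log_2 n \rfloor + 1 = O(\log n).
\end{aligned}
```

The last line takes j = ⌊log2 n⌋, for which ⌊n/2^j⌋ = 1.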

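Space complexity proof

The extra space used by the iterative binary search consists of a few integer variables and the temporaries inside the loop. This does not grow with the size of the input array, so the space complexity is constant, that is, O(1).
In summary, the time complexity of binary search is O(log n) in the worst case, O(1) in the best case, and O(log n) in the average case. The space complexity is O(1). These conclusions follow from analysing the iterations of the algorithm and from the induction proof above.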

Code implementation

Python implementation

def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

C++ template implementation

// Define a template binary search function.
// Requires random-access iterators over a range sorted in ascending order.
template <typename Iterator, typename T>
Iterator binarySearch(Iterator begin, Iterator end, const T &key)
{
    const Iterator NotFound = end;
    while (begin < end)
    {
        const Iterator Middle = begin + (std::distance(begin, end) / 2);
        if (*Middle == key)
        {
            return Middle;
        }
        else if (*Middle > key)
        {
            end = Middle;
        }
        else
        {
            begin = Middle + 1;
        }
    }
    return NotFound;
}

Another version:

template <typename T>
int binarySearch(const vector<T> &arr, T target)
{
    int low = 0;
    int high = static_cast<int>(arr.size()) - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2; // avoids overflow of low + high
        if (arr[mid] == target)
        {
            return mid;
        }
        else if (arr[mid] < target)
        {
            low = mid + 1;
        }
        else
        {
            high = mid - 1;
        }
    }
    return -1; // the target is not in the array
}
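The overflow comment in the C++ code above refers to `low + high` exceeding the range of a 32-bit `int`. Python integers do not overflow, so the sketch below simulates a signed 32-bit sum with `ctypes.c_int32`. In C++ signed overflow is actually undefined behaviour; the wrap-around shown here is only what typically happens on two's-complement hardware. The function names are illustrative.

```python
import ctypes


def mid_naive_int32(low, high):
    """(low + high) / 2 with the sum wrapped to a signed 32-bit integer."""
    wrapped_sum = ctypes.c_int32(low + high).value
    return int(wrapped_sum / 2)  # C-style division truncates toward zero


def mid_safe_int32(low, high):
    """low + (high - low) / 2; for 0 <= low <= high no intermediate exceeds high."""
    return low + (high - low) // 2


low, high = 1_500_000_000, 2_000_000_000   # both fit in a signed 32-bit int
print(mid_naive_int32(low, high))          # negative: the sum wrapped around
print(mid_safe_int32(low, high))
```

With these indices the naive formula produces a negative midpoint, while the safe formula stays between low and high.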

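Further reading

Binary search already runs in O(log n) time, which is very efficient. Still, there are a few common refinements and variants for special situations:

Optimizations

Avoid integer overflow:
- When computing the midpoint, use low + (high - low) / 2 instead of (low + high) / 2, which avoids integer overflow when low + high exceeds the range of the integer type.
Tail-call optimization:
- If binary search is implemented recursively, the recursive call is in tail position, so a compiler or runtime that performs tail-call optimization can reduce the stack usage; otherwise the recursion can be rewritten as a loop by hand.
Loop invariants:
- Stating a loop invariant for the search interval makes the boundary updates easy to verify and allows variants that use a single comparison per iteration.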
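A loop invariant is easiest to see in the boundary-finding ("lower bound") variant, which uses a half-open interval and one comparison per iteration. The sketch below is one common formulation; the name `lower_bound` is borrowed from the C++ standard library for familiarity.

```python
def lower_bound(arr, target):
    """Return the first index i with arr[i] >= target, or len(arr) if none."""
    lo, hi = 0, len(arr)
    # Invariant: every element of arr[:lo] is < target,
    #            every element of arr[hi:] is >= target.
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] < target:
            lo = mid + 1      # arr[mid] belongs to the "< target" side
        else:
            hi = mid          # arr[mid] belongs to the ">= target" side
    return lo                 # lo == hi: the boundary between both sides


print(lower_bound([1, 3, 3, 5], 3))
print(lower_bound([1, 3, 3, 5], 4))
```

An exact-match search follows by checking whether the returned index is in range and holds the target.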

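A question: if there is binary search, why is there no multiway search?

Multiway search does exist as a concept, but binary search is already very efficient in both theory and practice, finding the target in logarithmic time. Multiway search, which splits the search interval into more than two parts at each step, is possible but uncommon on plain arrays, for the following reasons:

Limited efficiency gain: splitting into k parts reduces the number of rounds to about log_k(n), but each round needs up to k - 1 comparisons, so the total of roughly (k - 1) * log_k(n) comparisons is no better than the log2(n) of binary search. The time complexity remains O(log n) either way.
Implementation complexity: binary search is relatively simple and only maintains two pointers. Multiway search has to maintain several split points and sub-intervals, which makes the code more complex and more error-prone.
Cache behaviour: on a flat sorted array, a multiway search probes several widely separated positions per round, which is no friendlier to the cache than the jumps of binary search. Multiway search mainly pays off when the data is laid out in blocks, as in B-trees, where one node matches a cache line or disk page.
Limited applicability: binary search works on sorted arrays and on problems that can be reduced to a monotonic search. Multiway search may help in a few specific problems, but those are relatively rare and can often be solved better in other ways.
Recursion overhead: if multiway search is implemented recursively, every split still costs a function call. The recursion is shallower than for binary search, but each call does more work, so nothing is gained overall.
Choosing split points: how to divide the interval sensibly is a problem in itself. Ideally the split would follow the probability distribution of the data, but that information is usually unavailable, so it is hard to design a generally applicable multiway search.
In summary, multiway search is possible in theory, but in practice binary search already covers most scenarios and is more general, simpler, and at least as efficient. That is why multiway search on arrays has not become a mainstream search algorithm.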
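The comparison-count argument can be made concrete with a small calculation. The sketch below assumes a simplified cost model: each round of a k-way search compares the target against k - 1 pivots and keeps one of k roughly equal parts. The function name and the model itself are illustrative assumptions rather than a measurement of real code.

```python
def worst_case_comparisons(n, k):
    """Approximate worst-case pivot comparisons for a k-way search over n items.

    Model: each round costs k - 1 comparisons and shrinks the interval by a
    factor of k, so the number of rounds is the smallest r with k**r >= n.
    """
    rounds, reach = 0, 1
    while reach < n:
        reach *= k
        rounds += 1
    return (k - 1) * rounds


for k in (2, 3, 4, 8):
    print(k, worst_case_comparisons(1_000_000, k))
```

Under this model, splitting into more parts lowers the number of rounds but raises the total number of comparisons, with k = 2 giving the smallest total for one million items.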

Complete project code

Python code

def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


arr = [1, 3, 5, 7, 9, 11]
target = 7
result = binary_search(arr, target)
if result != -1:
    print(f"Element found at index {result}")
else:
    print("Element not found in the array")
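One way to gain confidence in an implementation like the one above is to compare it with a trusted reference on random inputs. The sketch below repeats the same `binary_search` and checks it against a lookup built on the standard `bisect_left`; the helper `reference_search`, the fixed seed, and the input sizes are illustrative choices. Values are kept distinct so that the expected index is unique.

```python
import random
from bisect import bisect_left


def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def reference_search(arr, target):
    """Exact-match lookup built on the standard library's bisect_left."""
    pos = bisect_left(arr, target)
    return pos if pos < len(arr) and arr[pos] == target else -1


random.seed(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    arr = sorted(random.sample(range(100), random.randint(0, 20)))
    target = random.randrange(100)
    assert binary_search(arr, target) == reference_search(arr, target)
print("all random checks passed")
```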

C++ code

#include <list>
#include <array>
#include <algorithm>
#include <cmath>
#include <functional>
#include <iostream>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

using namespace std;

class Person
{
public:
    Person() = default;
    ~Person() = default;

    Person(string name, int age, int score)
    {
        this->name = name;
        this->age = age;
        this->score = score;
    }

    // Ordering operators compare by score only, so that together they
    // define one consistent order for sorting and searching.
    bool operator>(const Person &other) const
    {
        return this->score > other.score;
    }

    bool operator<(const Person &other) const
    {
        return this->score < other.score;
    }

    // Equality compares the score, age and name of two Person objects.
    bool operator==(const Person &other) const
    {
        return this->score == other.score &&
               this->age == other.age &&
               this->name == other.name;
    }

    bool operator!=(const Person &other) const
    {
        return !(*this == other);
    }

    bool operator<=(const Person &other) const
    {
        return this->score <= other.score;
    }

    bool operator>=(const Person &other) const
    {
        return this->score >= other.score;
    }

    // Accessor functions for this class.
    const string &getName() const { return this->name; }
    int getAge() const { return this->age; }
    int getScore() const { return this->score; }

private:
    string name;
    int age = 0;
    int score = 0;
};

// Return whether modulus of elem1 is less than modulus of elem2
bool mod_lesser(int elem1, int elem2)
{
    // If elem1 is negative, make it positive
    if (elem1 < 0)
        elem1 = -elem1;
    // If elem2 is negative, make it positive
    if (elem2 < 0)
        elem2 = -elem2;
    // Return whether elem1 is less than elem2
    return elem1 < elem2;
}

// This is an absolute value comparison function.
template <typename T>
bool compareAbsoluteValue(T elem1, T elem2)
{
    return abs(elem1) < abs(elem2);
}

/**
 * @brief This case is copied from the link:
 * @link https://learn.microsoft.com/zh-cn/cpp/standard-library/algorithm-functions?view=msvc-170#binary_search @endlink
 */
void MicosoftSTLCase()
{
    using namespace std;

    // create a list of ints
    list<int> List1;

    // insert values into the list
    List1.push_back(50);
    List1.push_back(10);
    List1.push_back(30);
    List1.push_back(20);
    List1.push_back(25);
    List1.push_back(5);

    // sort the list
    List1.sort();

    // output the sorted list
    cout << "List1 = ( ";
    for (auto Iter : List1)
        cout << Iter << " ";
    cout << ")" << endl;

    // default binary search for 10
    if (binary_search(List1.begin(), List1.end(), 10))
        cout << "There is an element in list List1 with a value equal to 10."
             << endl;
    else
        cout << "There is no element in list List1 with a value equal to 10."
             << endl;

    // a binary_search under the binary predicate greater
    List1.sort(greater<int>());
    if (binary_search(List1.begin(), List1.end(), 10, greater<int>()))
        cout << "There is an element in list List1 with a value greater than 10 "
             << "under greater than." << endl;
    else
        cout << "No element in list List1 with a value greater than 10 "
             << "under greater than." << endl;

    // a binary_search under the user-defined binary predicate mod_lesser
    // create a vector of ints
    vector<int> v1;

    // insert values into the vector
    for (auto i = -2; i <= 4; ++i)
    {
        v1.push_back(i);
    }

    // sort the vector
    sort(v1.begin(), v1.end(), mod_lesser);

    // output the sorted vector
    cout << "Ordered using mod_lesser, vector v1 = ( ";
    for (auto Iter : v1)
        cout << Iter << " ";
    cout << ")" << endl;

    // binary search for -3
    if (binary_search(v1.begin(), v1.end(), -3, mod_lesser))
        cout << "There is an element with a value equivalent to -3 "
             << "under mod_lesser." << endl;
    else
        cout << "There is not an element with a value equivalent to -3 "
             << "under mod_lesser."
             << endl;

    // The extra section uses a custom absolute value comparison function.
    // create a vector of doubles
    vector<double> v2;
    for (auto i = -2.5; i <= 4.5; i += 0.5)
    {
        v2.push_back(i);
    }

    // sort the vector
    sort(v2.begin(), v2.end(), compareAbsoluteValue<double>);

    // output the sorted vector
    cout << "Ordered using compareAbsoluteValue<double>, vector v2 = ( ";
    for (auto Iter : v2)
        cout << Iter << " ";
    cout << ")" << endl;

    // binary search for 0.0
    if (binary_search(v2.begin(), v2.end(), 0.0, compareAbsoluteValue<double>))
        cout << "There is an element with a value equivalent to 0.0 "
             << "under compareAbsoluteValue<double>." << endl;
    else
        cout << "There is not an element with a value equivalent to 0.0 "
             << "under compareAbsoluteValue<double>." << endl;
}

/**
 * @brief This case is copied from the link:
 * @link https://en.cppreference.com/w/cpp/algorithm/binary_search @endlink
 */
void cppRefCase()
{
    vector<int> haystack{1, 3, 4, 5, 9};
    vector<int> needles{1, 2, 3};

    for (const auto needle : needles)
    {
        cout << "Searching for " << needle << '\n';
        if (binary_search(haystack.begin(), haystack.end(), needle))
            cout << "Found " << needle << '\n';
        else
            cout << "No dice!\n";
    }
}

bool myfunction(int i, int j) { return (i < j); }

/**
 * @brief This case is copied from the link:
 * @link https://cplusplus.com/reference/algorithm/binary_search/ @endlink
 */
void cppPlusCase()
{
    int myints[] = {1, 2, 3, 4, 5, 4, 3, 2, 1};
    vector<int> v(myints, myints + 9); // 1 2 3 4 5 4 3 2 1

    // using default comparison:
    sort(v.begin(), v.end());

    cout << "looking for a 3... ";
    if (binary_search(v.begin(), v.end(), 3))
        cout << "found!\n";
    else
        cout << "not found.\n";

    // using myfunction as comp:
    sort(v.begin(), v.end(), myfunction);

    cout << "looking for a 6... ";
    if (binary_search(v.begin(), v.end(), 6, myfunction))
        cout << "found!\n";
    else
        cout << "not found.\n";
}

// Define a template binary_search_function:
template <typename Iterator, typename T>
Iterator binarySearch(Iterator begin, Iterator end, const T &key)
{
    const Iterator NotFound = end;
    while (begin < end)
    {
        const Iterator Middle = begin + (distance(begin, end) / 2);
        if (*Middle == key)
        {
            return Middle;
        }
        else if (*Middle > key)
        {
            end = Middle;
        }
        else
        {
            begin = Middle + 1;
        }
    }
    return NotFound;
}

void templateSearchCase()
{
    vector<int> haystack{1, 3, 4, 5, 9};
    vector<int> needles{1, 2, 3};
    const auto hayStackEnd = haystack.end();

    for (const auto needle : needles)
    {
        cout << "Searching for " << needle << '\n';
        if (binarySearch<vector<int>::iterator, int>(haystack.begin(), hayStackEnd, needle) != hayStackEnd)
            cout << "Found " << needle << '\n';
        else
            cout << "No dice!\n";
    }

    vector<double> dHaystack{1.1, 1.3, 1.4, 1.5, 1.9, 2.0, 2.2, 2.3, 3.0, 3.14};
    vector<double> dNeedles{1.1, 1.3, 1.5, 1.9, 2.0, 3.0, 3.14, 4.0};
    const auto dHayStackEnd = dHaystack.end();

    for (const auto needle : dNeedles)
    {
        cout << "Searching for " << needle << '\n';
        if (binarySearch<vector<double>::iterator, double>(dHaystack.begin(), dHayStackEnd, needle) != dHayStackEnd)
            cout << "Found " << needle << '\n';
        else
            cout << "No dice!\n";
    }

    vector<Person> pHaystack{Person{"Jonah", 20, 75}, Person{"Albert", 45, 88}, Person{"Brenda", 60, 96}, Person{"Christina", 18, 110}, Person{"David", 19, 95}};
    vector<Person> pNeedles{Person{"Albert", 45, 88}, Person{"Christina", 18, 110}, Person{"David", 88, 95}, Person{"Erica", 50, 91}};

    // Binary search requires sorted input; the list above is not ordered by
    // score, so sort it first (Person::operator< compares scores).
    sort(pHaystack.begin(), pHaystack.end());
    const auto pHayStackEnd = pHaystack.end();

    for (const auto &needle : pNeedles)
    {
        cout << "Searching for " << needle.getName() << '\n';
        if (binarySearch<vector<Person>::iterator, Person>(pHaystack.begin(), pHayStackEnd, needle) != pHayStackEnd)
            cout << "Found " << needle.getName() << '\t' << needle.getAge() << '\t' << needle.getScore() << '\n';
        else
            cout << "No dice!\n";
    }
}

template <typename T>
int binarySearchVec(const vector<T> &arr, T target)
{
    int low = 0;
    int high = arr.size() - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2; // avoids overflow of low + high
        if (arr[mid] == target)
        {
            return mid;
        }
        else if (arr[mid] < target)
        {
            low = mid + 1;
        }
        else
        {
            high = mid - 1;
        }
    }
    return -1;
}

int main(int argc, char *argv[])
{
    MicosoftSTLCase();
    cppRefCase();
    cppPlusCase();
    templateSearchCase();
    return 0;
}