LEFT JOIN for big data

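Practicing algorithm problems:

First pass: 1. Look at the problem for 5 minutes; if no approach comes to mind, read the solution.

2. Use the solution to improve my own answer, commenting every line and writing down my reasoning.

3. Work out how far I got compared with the solution, and what would let me get it right next time (summarize the method).

4. Write it up on my own blog or social media channel.

5. Redo similar problems, choosing which topic areas to practice based on available time and current tasks.

6. Once I have gone through all of them once in C++, move on to medium-difficulty problems.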

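I. Problem

Write a C++ function that implements a LEFT JOIN of two tables under limited memory, using the sliding-window method. Neither table can be loaded into memory in full. For this exercise, it is enough to represent the two files as two 2D vectors.

II. Reflection

1. My own solution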

#include <algorithm>  // std::min
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// One row of a table
using Row = std::vector<std::string>;

// LEFT JOIN that walks the left table in fixed-size windows (chunks)
std::vector<Row> leftJoin(const std::vector<Row>& table1, const std::vector<Row>& table2,
                          size_t joinCol1, size_t joinCol2, size_t chunkSize) {
    std::vector<Row> result;  // Final joined rows
    if (chunkSize == 0) chunkSize = 1;  // Guard against an infinite loop
    const size_t table1Size = table1.size();
    // Number of columns used to pad unmatched rows; 0 if the right table is empty
    const size_t rightWidth = table2.empty() ? 0 : table2[0].size();

    // Build the hash index on the right table once, not once per window.
    // Note: this still keeps the whole right table in memory.
    std::unordered_multimap<std::string, Row> table2Map;
    for (const auto& row : table2) {
        if (joinCol2 < row.size()) {
            table2Map.emplace(row[joinCol2], row);
        }
    }

    for (size_t i = 0; i < table1Size; i += chunkSize) {
        const size_t end = std::min(i + chunkSize, table1Size);
        // Load the current window of the left table into memory
        std::vector<Row> table1Chunk(table1.begin() + i, table1.begin() + end);

        // Match every left row in the current window
        for (const auto& row1 : table1Chunk) {
            bool matched = false;
            if (joinCol1 < row1.size()) {
                auto range = table2Map.equal_range(row1[joinCol1]);
                for (auto it = range.first; it != range.second; ++it) {
                    Row joinedRow = row1;  // Copy the current left row
                    joinedRow.insert(joinedRow.end(), it->second.begin(), it->second.end());
                    result.push_back(joinedRow);
                    matched = true;
                }
            }
            // No match: pad the right-hand columns with "NULL"
            if (!matched) {
                Row joinedRow = row1;
                joinedRow.resize(row1.size() + rightWidth, "NULL");
                result.push_back(joinedRow);
            }
        }
    }
    return result;
}

int main() {
    // Sample table 1 (left table)
    std::vector<Row> table1 = {{"1", "Alice"}, {"2", "Bob"}, {"3", "Charlie"}};
    // Sample table 2 (right table)
    std::vector<Row> table2 = {{"1", "New York"}, {"2", "Los Angeles"}};
    // Join column indexes for the left and right tables
    size_t joinCol1 = 0;
    size_t joinCol2 = 0;
    // Window (chunk) size
    size_t chunkSize = 2;
    // Run the LEFT JOIN
    auto result = leftJoin(table1, table2, joinCol1, joinCol2, chunkSize);
    // Print the result, one row per line, columns separated by tabs
    for (const auto& row : result) {
        for (const auto& col : row) {
            std::cout << col << "\t";
        }
        std::cout << "\n";
    }
    return 0;
}
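
The solution above windows only the left table and still indexes the entire right table, while the problem says that neither table fits in memory. One possible way to close that gap is a block nested loop join, sketched below: index one left window at a time and stream the right table past it in windows as well. This is only a sketch under assumptions. The names leftJoinChunked and rightWidth are hypothetical and not part of the original solution, rightWidth (the number of right-hand columns) is assumed to be known in advance, and the result is still collected in a vector, whereas a real implementation would write it out to a file.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical variant: both tables are read in windows of at most chunkSize rows.
// rightWidth is passed in so that padding does not depend on reading right[0].
std::vector<Row> leftJoinChunked(const std::vector<Row>& left, const std::vector<Row>& right,
                                 size_t leftCol, size_t rightCol,
                                 size_t chunkSize, size_t rightWidth) {
    std::vector<Row> result;
    if (chunkSize == 0) chunkSize = 1;  // Guard against an infinite loop

    for (size_t i = 0; i < left.size(); i += chunkSize) {
        const size_t leftEnd = std::min(i + chunkSize, left.size());

        // Index only the current left window: join key -> position in the window
        std::unordered_multimap<std::string, size_t> index;
        for (size_t k = i; k < leftEnd; ++k) {
            if (leftCol < left[k].size()) index.emplace(left[k][leftCol], k - i);
        }
        // matches[p] collects the joined rows for the p-th row of the window
        std::vector<std::vector<Row>> matches(leftEnd - i);

        // Stream the right table past the window, one window at a time
        for (size_t j = 0; j < right.size(); j += chunkSize) {
            const size_t rightEnd = std::min(j + chunkSize, right.size());
            for (size_t r = j; r < rightEnd; ++r) {
                if (rightCol >= right[r].size()) continue;  // Row has no join column
                auto range = index.equal_range(right[r][rightCol]);
                for (auto it = range.first; it != range.second; ++it) {
                    Row joined = left[i + it->second];
                    joined.insert(joined.end(), right[r].begin(), right[r].end());
                    matches[it->second].push_back(std::move(joined));
                }
            }
        }

        // Emit in left-table order; unmatched rows are padded with "NULL"
        for (size_t p = 0; p < matches.size(); ++p) {
            if (matches[p].empty()) {
                Row padded = left[i + p];
                padded.resize(padded.size() + rightWidth, "NULL");
                result.push_back(std::move(padded));
            } else {
                for (auto& row : matches[p]) result.push_back(std::move(row));
            }
        }
    }
    return result;
}
```

With the sample tables from main, calling leftJoinChunked(table1, table2, 0, 0, 2, 2) gives the same three rows as the original solution. The trade-off is that the right table is scanned once per left window, so a larger chunkSize means fewer passes but more memory per window.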