【CSP】202403-1 Word Frequency Statistics

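Contents

Algorithm
1. Choosing the data structures
2. Reading the input
3. Counting the articles each word appears in
4. Printing the result

Code Example
Code Optimization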

Sample Input

4 3
5 1 2 3 2 1
1 1
3 2 2 2
2 3 2

Sample Output

2 3
3 6
2 2

Algorithm

1. Choosing the data structures

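vector<int>: stores the words of one article (duplicates included).
unordered_set<int>: collects the distinct words of one article (duplicates are removed automatically).
Two counting arrays of size m + 1, indexed directly by word id:
- totalCount[i]: the total number of occurrences of word i across all articles.
- articleCount[i]: the number of articles in which word i appears.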

2. Reading the input

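Read the number of articles n and the largest word id m; these fix the sizes of the counting arrays.
Process the articles one at a time:
- Read the article length l.
- Read the l words into the words array.
- Walk through words and add each occurrence to totalCount.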

3. Counting the articles each word appears in

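Deduplicate with a set:
- Insert the words of the words array into an unordered_set, which drops repeats.
- For each word in the set, add 1 to its articleCount, so each article is counted at most once per word.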

4. Printing the result

For word ids 1 through m in order, print articleCount and totalCount on one line, separated by a space.

Code Example

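#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

int main() {
    int n, m;  // n articles, word ids range over 1..m
    cin >> n >> m;
    vector<int> totalCount(m + 1, 0);    // total occurrences of word i across all articles
    vector<int> articleCount(m + 1, 0);  // number of articles that contain word i
    // Process each article
    for (int i = 0; i < n; i++) {
        int l;  // number of words in the current article
        cin >> l;
        // Store all words of the current article
        vector<int> words(l);
        for (int j = 0; j < l; ++j) {
            cin >> words[j];         // read each word
            totalCount[words[j]]++;  // every occurrence adds 1 to the total
        }
        // Collect the distinct words of this article (the set removes duplicates)
        unordered_set<int> seen;
        for (int word : words) {
            seen.insert(word);  // repeated words are ignored by the set
        }
        // Each distinct word counts once for this article
        for (int word : seen) {
            articleCount[word]++;
        }
    }
    // Output: one line per word id, from 1 to m
    for (int i = 1; i <= m; ++i) {
        cout << articleCount[i] << " " << totalCount[i] << endl;
    }
    return 0;
}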

Code Optimization

Avoid the unnecessary vector

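The code above stores each article in vector<int> words(l). Both counts can instead be updated as each word is read, so the per-article vector is not needed and memory use drops.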

Reduce set insertions

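When counting the distinct words of an article, check whether each word is already in the set as soon as it is read. A word is inserted, and its articleCount incremented, only on its first occurrence in the article, so repeated words cause no insertion and the separate pass over the set is no longer needed.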

【Code Example】

#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

int main() {
    int n, m;
    cin >> n >> m;
    vector<int> totalCount(m + 1, 0);    // total occurrences (1-based)
    vector<int> articleCount(m + 1, 0);  // number of articles containing the word (1-based)
    for (int i = 0; i < n; ++i) {
        int l;
        cin >> l;  // read the article length
        unordered_set<int> seen;  // distinct words of the current article
        for (int j = 0; j < l; ++j) {
            int word;
            cin >> word;
            totalCount[word]++;  // add to the total count
            if (seen.find(word) == seen.end()) {
                seen.insert(word);
                articleCount[word]++;  // first occurrence in this article: update the article count
            }
        }
    }
    // Output (1-based)
    for (int i = 1; i <= m; ++i) {
        cout << articleCount[i] << " " << totalCount[i] << endl;
    }
    return 0;
}