Implementing an Equality-Based GroupBy
Intro
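Some of the data in our system is problematic. It originates outside our system, so it is hard to fix at the source, and running GroupBy over it is painful. The default GroupBy relies on HashCode, and in some scenarios it is hard to produce hash codes that agree for every pair of keys we want to treat as equal. So I wrote an extension: a GroupBy that does not depend on HashCode and only needs an equality comparison.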
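To make the HashCode dependency concrete, here is a minimal sketch. StudentKey, IdOrNameComparer and HashCodeDemo are hypothetical names invented for this illustration and do not appear in the original code. The comparer treats two keys as equal when either the Id or the Name matches, but its GetHashCode can only look at one field (Id is used here as an assumption), so the built-in GroupBy never gets as far as calling Equals for keys whose hashes differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type, used only for this illustration.
public sealed record StudentKey(int Id, string? Name);

// Hypothetical comparer: keys are "equal" when either the Id or the Name matches.
public sealed class IdOrNameComparer : IEqualityComparer<StudentKey>
{
    public bool Equals(StudentKey? x, StudentKey? y) =>
        x is not null && y is not null && (x.Id == y.Id || x.Name == y.Name);

    // A hash based on Id alone disagrees with Equals whenever only the Name matches.
    public int GetHashCode(StudentKey key) => key.Id;
}

public static class HashCodeDemo
{
    // Counts the groups produced by the built-in, hash-based GroupBy.
    public static int CountGroups(IEnumerable<StudentKey> keys) =>
        keys.GroupBy(k => k, new IdOrNameComparer()).Count();

    public static void Main()
    {
        var keys = new[] { new StudentKey(0, "Ming"), new StudentKey(1, "Ming") };

        // The comparer says the two keys are equal...
        Console.WriteLine(new IdOrNameComparer().Equals(keys[0], keys[1])); // True

        // ...but GroupBy only calls Equals when the hash codes already match,
        // so the two keys still end up in separate groups.
        Console.WriteLine(CountGroups(keys)); // 2
    }
}
```

Hashing by Name instead just moves the problem to keys that only share an Id, which is why the rest of this post scans the existing groups with the equality function alone.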
Sample
Suppose we have the following data:
var students = new StudentResult[]
{
    new() { StudentName = "Ming", CourseName = "Chinese", Score = 80 },
    new() { StudentId = 1, StudentName = "Ming", CourseName = "English", Score = 60 },
    new() { StudentId = 2, StudentName = "Mike", CourseName = "English", Score = 70 },
    new() { StudentId = 1, CourseName = "Math", Score = 100 },
    new() { StudentName = "Mike", CourseName = "Chinese", Score = 60 },
};
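This data is a set of student scores, but the student information is incomplete: a record may carry an Id, a Name, or both. Assume every student's Id and Name are unique. How would you group these records by student and compute each student's total score?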
Implement
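The default implementation relies on HashCode (see the source links at the end of this post), and a hash code is hard to keep consistent across multiple fields, so I decided to write my own GroupBy extension.
GroupBy returns IEnumerable<IGrouping<TKey, T>>, and the Add method of the built-in Grouping class is internal, so we start by defining a simple IGrouping of our own: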
// Requires: using System.Collections; using System.Collections.Generic; using System.Linq;
private sealed class Grouping<TKey, T> : IGrouping<TKey, T>
{
    private readonly List<T> _items = new();

    public Grouping(TKey key) => Key = key ?? throw new ArgumentNullException(nameof(key));

    public TKey Key { get; }

    // A public Add also enables collection-initializer syntax on this type.
    public void Add(T t) => _items.Add(t);

    public int Count => _items.Count;

    public IEnumerator<T> GetEnumerator()
    {
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Next comes the equality-based GroupBy itself:
public static IEnumerable<IGrouping<TKey, T>> GroupByEquality<T, TKey>(
    this IEnumerable<T> source,
    Func<T, TKey> keySelector,
    Func<TKey, TKey, bool> comparer,
    Action<TKey, T>? keyAction = null)
{
    var groups = new List<Grouping<TKey, T>>();
    foreach (var item in source)
    {
        var key = keySelector(item);
        // Linear scan: find the first existing group whose key compares equal to this key.
        var group = groups.FirstOrDefault(x => comparer(x.Key, key));
        if (group is null)
        {
            group = new Grouping<TKey, T>(key);
            group.Add(item);
            groups.Add(group);
        }
        else
        {
            // Give the caller a chance to update the group's key from the matched item.
            keyAction?.Invoke(group.Key, item);
            group.Add(item);
        }
    }
    return groups;
}
Let's test our GroupBy:
// IsNullOrEmpty/IsNotNullOrEmpty are string extension methods, not part of the BCL.
var groups = students.GroupByEquality(
    x => new Student() { Id = x.StudentId, Name = x.StudentName },
    (s1, s2) => s1.Id == s2.Id || s1.Name == s2.Name,
    (k, x) =>
    {
        // Fill in whatever the group key is still missing from the newly matched item.
        if (k.Id <= 0 && x.StudentId > 0)
        {
            k.Id = x.StudentId;
        }
        if (k.Name.IsNullOrEmpty() && x.StudentName.IsNotNullOrEmpty())
        {
            k.Name = x.StudentName;
        }
    });
foreach (var group in groups)
{
    Console.WriteLine("-------------------------------------");
    Console.WriteLine($"{group.Key.Id} {group.Key.Name}, Total score: {group.Sum(x => x.Score)}");
    foreach (var result in group)
    {
        Console.WriteLine($"{result.StudentId} {result.StudentName}\n{result.CourseName} {result.Score}");
    }
}
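The output shows that the data has been split into two groups, but the records inside each group still carry incomplete information. We can improve the method slightly; the revised version looks like this: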
public static IEnumerable<IGrouping<TKey, T>> GroupByEquality<T, TKey>(
    this IEnumerable<T> source,
    Func<T, TKey> keySelector,
    Func<TKey, TKey, bool> comparer,
    Action<TKey, T>? keyAction = null,
    Action<T, TKey>? itemAction = null)
{
    var groups = new List<Grouping<TKey, T>>();
    foreach (var item in source)
    {
        var key = keySelector(item);
        var group = groups.FirstOrDefault(x => comparer(x.Key, key));
        if (group is null)
        {
            // The collection initializer calls Grouping.Add(item).
            group = new Grouping<TKey, T>(key)
            {
                item
            };
            groups.Add(group);
        }
        else
        {
            keyAction?.Invoke(group.Key, item);
            group.Add(item);
        }
    }
    if (itemAction != null)
    {
        // Sync the final key back to the items of every group with more than one element.
        foreach (var group in groups.Where(g => g.Count > 1))
        {
            foreach (var item in group)
                itemAction.Invoke(item, group.Key);
        }
    }
    return groups;
}
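The new itemAction parameter is only invoked for groups with more than one element: when a group holds a single element, its key came from that very element, so there is nothing to sync back. Now let's update the calling code: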
var groups = students.GroupByEquality(
    x => new Student() { Id = x.StudentId, Name = x.StudentName },
    (s1, s2) => s1.Id == s2.Id || s1.Name == s2.Name,
    (k, x) =>
    {
        // keyAction: complete the group key from the matched item.
        if (k.Id <= 0 && x.StudentId > 0)
        {
            k.Id = x.StudentId;
        }
        if (k.Name.IsNullOrEmpty() && x.StudentName.IsNotNullOrEmpty())
        {
            k.Name = x.StudentName;
        }
    },
    (x, k) =>
    {
        // itemAction: copy the completed key back onto each item.
        if (k.Id > 0 && x.StudentId <= 0)
        {
            x.StudentId = k.Id;
        }
        if (k.Name.IsNotNullOrEmpty() && x.StudentName.IsNullOrEmpty())
        {
            x.StudentName = k.Name;
        }
    });
foreach (var group in groups)
{
    Console.WriteLine("-------------------------------------");
    Console.WriteLine($"{group.Key.Id} {group.Key.Name}, Total score: {group.Sum(x => x.Score)}");
    foreach (var result in group)
    {
        Console.WriteLine($"{result.StudentId} {result.StudentName}\n{result.CourseName} {result.Score}");
    }
}
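With itemAction in place, the information in the key is synced back to every item in the group once grouping is done. Running the sample again shows that every record now has both an Id and a Name.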
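For readers who want to check the result themselves, the pieces above can be assembled into one runnable sketch. Several parts are assumptions rather than the author's code: the post never shows the Student and StudentResult types, so the shapes below (an int Id, a nullable string Name, an int Score, and a mutable class for the key so that keyAction can update it in place) are guesses based on how the properties are used. The StudentGroupingDemo class and its Run method are hypothetical wrappers, the null check in the Grouping constructor is omitted for brevity, and the BCL's string.IsNullOrEmpty stands in for the extension methods used in the post.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes: the post does not show these two types.
public sealed class Student
{
    public int Id { get; set; }
    public string? Name { get; set; }
}

public sealed class StudentResult
{
    public int StudentId { get; set; }
    public string? StudentName { get; set; }
    public string CourseName { get; set; } = "";
    public int Score { get; set; }
}

public static class StudentGroupingDemo
{
    private sealed class Grouping<TKey, T> : IGrouping<TKey, T>
    {
        private readonly List<T> _items = new();
        public Grouping(TKey key) => Key = key;
        public TKey Key { get; }
        public int Count => _items.Count;
        public void Add(T item) => _items.Add(item);
        public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }

    public static IEnumerable<IGrouping<TKey, T>> GroupByEquality<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector, Func<TKey, TKey, bool> comparer,
        Action<TKey, T>? keyAction = null, Action<T, TKey>? itemAction = null)
    {
        var groups = new List<Grouping<TKey, T>>();
        foreach (var item in source)
        {
            var key = keySelector(item);
            var group = groups.FirstOrDefault(g => comparer(g.Key, key));
            if (group is null)
            {
                group = new Grouping<TKey, T>(key);
                groups.Add(group);
            }
            else
            {
                keyAction?.Invoke(group.Key, item); // let the caller enrich the existing key
            }
            group.Add(item);
        }
        if (itemAction != null)
        {
            foreach (var group in groups.Where(g => g.Count > 1))
                foreach (var item in group)
                    itemAction(item, group.Key); // sync the final key back to each item
        }
        return groups;
    }

    // Hypothetical helper: groups the sample data from the post and returns the groups.
    public static List<IGrouping<Student, StudentResult>> Run()
    {
        var students = new StudentResult[]
        {
            new() { StudentName = "Ming", CourseName = "Chinese", Score = 80 },
            new() { StudentId = 1, StudentName = "Ming", CourseName = "English", Score = 60 },
            new() { StudentId = 2, StudentName = "Mike", CourseName = "English", Score = 70 },
            new() { StudentId = 1, CourseName = "Math", Score = 100 },
            new() { StudentName = "Mike", CourseName = "Chinese", Score = 60 },
        };
        return students.GroupByEquality(
            x => new Student { Id = x.StudentId, Name = x.StudentName },
            (s1, s2) => s1.Id == s2.Id || s1.Name == s2.Name,
            (k, x) =>
            {
                if (k.Id <= 0 && x.StudentId > 0) k.Id = x.StudentId;
                if (string.IsNullOrEmpty(k.Name) && !string.IsNullOrEmpty(x.StudentName)) k.Name = x.StudentName;
            },
            (x, k) =>
            {
                if (k.Id > 0 && x.StudentId <= 0) x.StudentId = k.Id;
                if (!string.IsNullOrEmpty(k.Name) && string.IsNullOrEmpty(x.StudentName)) x.StudentName = k.Name;
            }).ToList();
    }

    public static void Main()
    {
        foreach (var g in Run())
            Console.WriteLine($"{g.Key.Id} {g.Key.Name}, total score: {g.Sum(x => x.Score)}");
        // Prints "1 Ming, total score: 240" and then "2 Mike, total score: 130".
    }
}
```

Note that the result depends on input order. The first record produces the key (0, "Ming"), and it is only because the second record upgrades that key to Id 1 via keyAction that the later (0, "Mike") record does not match it on Id 0.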
More
We can also add an overload that takes an IEqualityComparer to support custom comparers:
public static IEnumerable<IGrouping<TKey, T>> GroupByEquality<T, TKey>(
    this IEnumerable<T> source,
    Func<T, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer,
    Action<TKey, T>? keyAction = null,
    Action<T, TKey>? itemAction = null) where TKey : notnull
{
    // Only the comparer's Equals method is used; its GetHashCode is never called.
    return GroupByEquality(source, keySelector, keyComparer.Equals, keyAction, itemAction);
}
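The overload works because the method group keyComparer.Equals converts to a Func<TKey, TKey, bool>, so GetHashCode is never involved. The sketch below illustrates just that point. EqualsOnlyComparer, MethodGroupDemo and CountGroups are hypothetical names for this illustration, and CountGroups is a simplified stand-in for the linear scan in GroupByEquality, not the method from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer whose GetHashCode deliberately throws,
// to show that an equality-only scan never calls it.
public sealed class EqualsOnlyComparer : IEqualityComparer<string>
{
    public bool Equals(string? x, string? y) =>
        string.Equals(x, y, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(string obj) =>
        throw new NotSupportedException("GetHashCode is not expected to be called.");
}

public static class MethodGroupDemo
{
    // Simplified stand-in for the scan in GroupByEquality: keeps one representative
    // per group and compares each new item against the representatives.
    public static int CountGroups<T>(IEnumerable<T> source, Func<T, T, bool> equals)
    {
        var representatives = new List<T>();
        foreach (var item in source)
        {
            if (!representatives.Any(r => equals(r, item)))
            {
                representatives.Add(item);
            }
        }
        return representatives.Count;
    }

    public static void Main()
    {
        IEqualityComparer<string> comparer = new EqualsOnlyComparer();
        var words = new[] { "apple", "Apple", "banana", "APPLE", "Banana" };

        // comparer.Equals is a method group that converts to Func<string, string, bool>.
        Console.WriteLine(CountGroups<string>(words, comparer.Equals)); // 2
    }
}
```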
References
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq/src/System/Linq/Grouping.cs
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Linq/src/System/Linq/Lookup.cs
https://github.com/WeihanLi/WeihanLi.Common/blob/05ba92b5439bfa8623ae9b3133bf78daf4a8f6b4/src/WeihanLi.Common/Extensions/EnumerableExtension.cs#L275
https://github.com/WeihanLi/WeihanLi.Common/blob/dev/samples/DotNetCoreSample/GroupByEqualitySample.cs#L10