看到一个帖子《excel吧-数据分组问题》,对一组数据分成4组,使每组的和值相近
目录
- 代码思路
- 1,分组形式、可分组数
- 代码1
- 代码2
- 代码2举例
- 2,数组所有分组形式
- 举例
- 这个问题可以转化为2步:第1步,获取一组数据的所有分组形式;第2步,对所有分组形式计算其方差,方差最小的则是和值最相近的一组
- 本文为第1步,获取一组数据的所有分组形式
代码思路
- n个元素分成m组,每组元素个数最小值为
1
,最大值为n-m+1
,可以通过组合获取所有分组形式 - 所有元素进行分组,即组合问题,4组组合数相乘就是一种分组形式的分组数(注意:因为组合不区分顺序,因此当分组内组合的指数为1时,不管底数是多少,分组数都为1)。通过观察上图,可以发现9种元素分成4组,有6种分组形式共18480种分组
- 有了分组形式和分组数,那就可以获取每种分组形式中的每个分组元素组成
- 函数调用:以下代码调用了《Excel·VBA数组冒泡排序函数》
bubble_sort
函数,《Excel·VBA数组组合函数、组合求和》combin_arr1
函数(如需使用代码需复制)
1,分组形式、可分组数
有2种代码及结果输出形式,主要使用第2种
代码1
Function 可分组数(ByVal n&, ByVal m&, Optional ByVal mode& = 1)'计算分组成不重复的组数,可选择最终返回组数,和每格内含元素个数的二维数组(从1开始计数)'n元素个数;m需要分成几组;mode为1时返回组数,为2时返回二维数组(组数行*m列)Dim arr, brr, crr, drr, x&, y&, i&, j&, t, tt, a, b, d, s, bb, k, krr, resReDim arr(1 To n - m + 1), brr(1 To n - m + 1) '组合法计算组数,最大值为n - m + 1x = n - m + 1: arr(1) = 1: brr(1) = m - 1 'arr元素个数,brr重复次数If m = 1 ThenIf mode = 1 Then可分组数 = 1: Exit FunctionElseIf mode = 2 ThenReDim res(1 To 1, 1 To 1): res(1, 1) = n: 可分组数 = res: Exit FunctionEnd IfEnd IfFor i = 2 To x '每个数字各最多需要的数量arr(i) = i: t = n \ i: tt = n / i '整除、除,判断是否相等If t = tt And t = m Then '整除,且正好分配为m组brr(i) = tElseFor j = t To 1 Step -1a = i * j + (m - j) '数字i有j个,其余为1,判断和是否<=nIf a <= n Then brr(i) = j: Exit ForNextEnd IfNexts = WorksheetFunction.Sum(brr): ReDim crr(1 To s)For i = x To 1 Step -1 '倒序、正序平均分组都在最后For j = 1 To brr(i)y = y + 1: crr(y) = arr(i) '所有数字按个数写入一个数组NextNext'对数组crr选m个进行组合,获取和值为n,且组合形式唯一的所有组合Dim dict As Object: Set dict = CreateObject("scripting.dictionary"): x = 0drr = combin_arr1(crr, m) '调用函数返回组合,一维嵌套数组For Each d In drr '遍历组合,和值等于n;再降序排序,写入字典s = WorksheetFunction.Sum(d)If s = n Then b = bubble_sort(d, "-"): bb = Join(b, "+"): dict(bb) = ""Next'对符合条件的组合形式,计算分成m组的组数,以及每种组合形式的组数For Each k In dict.keyskrr = Split(k, "+"): s = n: y = 1For i = 0 To m - 1 '分组中只有1个元素的无所谓顺序,排除If krr(i) > 1 Then y = y * Application.Combin(s, krr(i)): s = s - krr(i)Nextdict(k) = y: x = x + y 'y每种组合形式的组数,x总组数NextIf mode = 1 Then '输出结果可分组数 = xElseIf mode = 2 ThenReDim res(1 To x, 1 To m): i = 0For Each k In dict.keyskrr = Split(k, "+")For y = 1 To dict(k) '重复写入dict(k)行krr数组i = i + 1For j = 0 To m - 1res(i, j + 1) = krr(j)NextNextNext可分组数 = resEnd If
End Function
代码2
Function 可分组数2(ByVal n&, ByVal m&, Optional ByVal mode& = 1)'计算分组成不重复的组数,可选择最终返回总组数,或每种组合形式的组数的二维数组(从1开始计数)'n元素个数;m需要分成几组;mode为1时返回组数,为2时返回二维数组,1列组合形式1列组数Dim arr, brr, crr, drr, x&, y&, i&, j&, t, tt, a, b, d, s, bb, k, resReDim arr(1 To n - m + 1), brr(1 To n - m + 1) '组合法计算组数,最大值为n - m + 1x = n - m + 1: arr(1) = 1: brr(1) = m - 1 'arr元素个数,brr重复次数If m = 1 Or n = m ThenIf mode = 1 Then可分组数2 = 1ElseIf mode = 2 ThenReDim res(1 To 1, 1 To 2): res(1, 2) = 1res(1, 1) = WorksheetFunction.Rept("1", m): 可分组数2 = resEnd IfExit FunctionEnd IfFor i = 2 To x '每个数字各最多需要的数量arr(i) = i: t = n \ i: tt = n / i '整除、除,判断是否相等If t = tt And t = m Then '整除,且正好分配为m组brr(i) = tElseFor j = t To 1 Step -1a = i * j + (m - j) '数字i有j个,其余为1,判断和是否<=nIf a <= n Then brr(i) = j: Exit ForNextEnd IfNexts = WorksheetFunction.Sum(brr): ReDim crr(1 To s)For i = x To 1 Step -1 '倒序、正序平均分组都在最后For j = 1 To brr(i)y = y + 1: crr(y) = arr(i) '所有数字按个数写入一个数组NextNext'对数组crr选m个进行组合,获取和值为n,且组合形式唯一的所有组合Dim dict As Object: Set dict = CreateObject("scripting.dictionary"): x = 0drr = combin_arr1(crr, m) '调用函数返回组合,一维嵌套数组For Each d In drr '遍历组合,和值等于n;再降序排序,写入字典s = WorksheetFunction.Sum(d)If s = n Then b = bubble_sort(d, "-"): bb = Join(b, "+"): dict(bb) = ""Next'对符合条件的组合形式,计算分成m组的组数,以及每种组合形式的组数For Each k In dict.keyskrr = Split(k, "+"): s = n: y = 1For i = 0 To m - 1 '分组中只有1个元素的无所谓顺序,排除If krr(i) > 1 Then y = y * Application.Combin(s, krr(i)): s = s - krr(i)Nextdict(k) = y: x = x + y 'y每种组合形式的组数,x总组数NextIf mode = 1 Then '输出结果可分组数2 = xElseIf mode = 2 ThenReDim res(1 To dict.Count, 1 To 2): i = 0For Each k In dict.keysi = i + 1: res(i, 1) = k: res(i, 2) = dict(k)Next可分组数2 = resEnd If
End Function
代码2举例
Sub 可分组数2举例()arr = 可分组数2(9, 4, 2)If IsArray(arr) Then[a1].Resize(UBound(arr), UBound(arr, 2)) = arrElseDebug.Print arrEnd If
End Sub
生成的分组形式和分组数都和手工计算一致
代码1的输出结果是上图A列每行按"+"号拆分成4列及重复对应B列数字行数,最终生成结果为18480行*4列
2,数组所有分组形式
- 为方便后续计算方差,返回结果有分组和值和分组字符串2种形式。可以先调用函数获取和值计算方差及对应的行号,再调用函数获取字符串组成形式,输出行号对应的结果
- 为减少计算量,
last_row
参数可以控制是计算所有分组形式,还是仅计算后x行分组形式。因为brr数组越后面元素分布越均匀,当需要计算方差的数组数值之间差异较小时,last_row
较小则可以更快计算出结果;而如果数值差异较大的,可以适当增大last_row
以便计算正确的结果;last_row
等于0时,计算所有分组形式
Function 数组分组(ByVal data_arr, ByVal m&, Optional ByVal mode& = 1, Optional ByVal last_row& = 1)'对数组data_arr分为m组,结果返回二维数组(n行*m列),每列为和值/组成元素(数组从1开始计数)'data_arr元素数组;m需要分成几组;mode为1时返回和值,为2时返回字符串'为减少计算量,因为brr数组越后面元素分布越均匀,故last_row参数仅对brr数组的后last_row行进行分组Dim arr, brr, br, srr, sr, a, n&, i&, j&, x&, y&, r&, rr&, c&, t&, w&, res, trr, temp, s&ReDim arr(1 To 1000)If mode <> 1 And mode <> 2 Then Debug.Print "参数错误": Exit FunctionFor Each a In data_arr '多行多列的,按列从左往右读取,排除空值If Len(a) Then i = i + 1: arr(i) = aNextn = i: ReDim Preserve arr(1 To n): brr = 可分组数2(n, m, 2)If last_row > 0 And last_row < UBound(brr) Then 'last_row为2即仅计算brr数组后2行;为0则全部计算ReDim br(1 To last_row, 1 To 2)For i = 1 To last_rowbr(i, 1) = brr(i + UBound(brr) - last_row, 1): br(i, 2) = brr(i + UBound(brr) - last_row, 2)Nextbrr = brEnd Ifx = WorksheetFunction.Sum(Application.Index(brr, , 2))ReDim srr(1 To UBound(brr), 1 To m), sr(1 To UBound(brr), 1 To m)For i = 1 To UBound(brr) 'brr第1列转为数组temp = Split(brr(i, 1), "+"): t = brr(i, 2): s = nFor j = 1 To msrr(i, j) = temp(j - 1)NextFor j = 1 To m '计算重复次数If srr(i, j) > 1 Thent = t \ Application.Combin(s, srr(i, j)): sr(i, j) = t: s = s - srr(i, j)Elsesr(i, j) = 1End IfNextNexti = 1: r = 0: c = 1: rr = 0: ReDim res(1 To x, 1 To m)DoDo While c = 1 '第1列赋值crr = combin_arr1(arr, srr(i, c)): t = sr(i, c) '重复写入t次For Each a In crrFor j = 1 To tr = r + 1: res(r, c) = aNextNextIf i < UBound(brr) Then i = i + 1 Else Exit DoLoopi = 1: r = 1: rr = 0: c = 2: ReDim temp(1 To n) '除第1列的其他列,按列赋值Dots = "": y = 0 'trr数组记录剩余元素,temp临时数组For j = 1 To c - 1ts = ts & "++" & Join(res(r, j), "++") & "++"NextFor Each a In arr '排除前一列已使用元素,且前后+号避免部分重复元素被找到aa = "+" & CStr(a) & "+"If InStr(ts, aa) = 0 Theny = y + 1: temp(y) = aElsets = Replace(ts, aa, "", , 1)End IfNextReDim trr(1 To y)For j = 1 To y 'trr数组更新元素,且转换格式,否则导致求和错误trr(j) = CDbl(temp(j))NextIf c <> m Thencrr = combin_arr1(trr, srr(i, c)): w = 可分组数2(y, m - c + 1)If w = 1 Then '只赋值第1个,避免c递增后出错res(r, c) = crr(1): rr = rr + 1Elset = sr(i, c): r = r - 1For Each a In crrFor j = 1 To tr = r + 1: res(r, c) = a: rr = rr + 1NextNextEnd IfElseres(r, c) = trr: rr = rr + 1 '最后一列直接赋值,只有1组End Ifr = r + 1 '下一行If rr >= brr(i, 2) Then rr = 0: i = i + 1 'brr一行循环结束,进入下一轮If i > UBound(brr) Then i = 1: r = 1: c = c + 1Loop Until c > mLoop Until r = 1 '所有写入完成后,r=1If mode = 1 Then '返回结果,求和模式For i = 1 To xFor j = 1 To mres(i, j) = WorksheetFunction.Sum(res(i, j))NextNextElse '字符串模式For i = 1 To xFor j = 1 To mres(i, j) = Join(res(i, j), "+")NextNextEnd If数组分组 = res
End Function
举例
Sub 数组分组举例()tm = Timerarr = Array(1, 2, 3, 4, 5, 6, 7, 8, 9): a = 数组分组(arr, 4, 1, 0)[a1].Resize(UBound(a), UBound(a, 2)) = aDebug.Print "累计用时" & Format(Timer - tm, "0.00") '耗时
End Sub
mode
参数为1,last_row
参数为0,求和模式、输出所有分组形式(以下为部分截图)
mode
参数为2,last_row
参数为0,字符串模式、输出所有分组形式(以下为部分截图)
测试结果 | 9个元素分成4组 | 10个元素分成4组 |
---|---|---|
总分组数 | 18480 | 88110 |
耗时秒数 | 6.34 | 26.57 |