ChiMerge: a supervised chi-square binning algorithm (the listing below is Python, not MATLAB)

Implementation

import numpy as np
import pandas as pd
from collections import Counter


def chimerge(data, attr, label, max_intervals):
    distinct_vals = sorted(set(data[attr]))  # Sorted distinct values of the attribute
    labels = sorted(set(data[label]))  # All class labels, in a fixed order
    empty_count = {l: 0 for l in labels}  # Zero counts used to pad each Counter()
    # Start with one interval per distinct value
    intervals = [[v, v] for v in distinct_vals]
    while len(intervals) > max_intervals:  # Merge until max_intervals remain
        chi = []
        for i in range(len(intervals) - 1):
            # Chi2 value for the adjacent pair (i, i+1)
            obs0 = data[data[attr].between(intervals[i][0], intervals[i][1])]
            obs1 = data[data[attr].between(intervals[i + 1][0], intervals[i + 1][1])]
            total = len(obs0) + len(obs1)
            # Per-class counts, padded so both arrays share the same label order
            count_0 = np.array(list({**empty_count, **Counter(obs0[label])}.values()))
            count_1 = np.array(list({**empty_count, **Counter(obs1[label])}.values()))
            count_total = count_0 + count_1
            expected_0 = count_total * sum(count_0) / total
            expected_1 = count_total * sum(count_1) / total
            # Classes absent from both intervals give 0/0; silence the warning here
            with np.errstate(divide='ignore', invalid='ignore'):
                chi_ = (count_0 - expected_0) ** 2 / expected_0 + (count_1 - expected_1) ** 2 / expected_1
            chi_ = np.nan_to_num(chi_)  # Turn those NaN terms into 0
            chi.append(sum(chi_))  # Sum over classes gives Chi2 for this pair
        min_chi_index = chi.index(min(chi))  # First adjacent pair with the minimal Chi2
        # Merge that pair into a single interval
        merged = [intervals[min_chi_index][0], intervals[min_chi_index + 1][1]]
        intervals[min_chi_index:min_chi_index + 2] = [merged]
    for interval in intervals:
        print('[', interval[0], ',', interval[1], ']', sep='')
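
The chi-square step is the core of the listing above, so it may help to see it in isolation. The following is a minimal sketch, not part of the original post: the helper name `pair_chi2` and the class counts are hypothetical, chosen only to illustrate how the statistic is computed for one pair of adjacent intervals from their per-class counts.

```python
import numpy as np


def pair_chi2(count_0, count_1):
    """Chi-square statistic for two adjacent intervals (hypothetical helper).

    count_0, count_1: per-class observation counts, in the same class order.
    """
    count_0 = np.asarray(count_0, dtype=float)
    count_1 = np.asarray(count_1, dtype=float)
    count_total = count_0 + count_1   # per-class totals over both intervals
    total = count_total.sum()         # all observations in both intervals
    chi2 = 0.0
    for observed in (count_0, count_1):
        # Expected count = class total * (interval size / total)
        expected = count_total * observed.sum() / total
        mask = expected > 0           # classes absent from both intervals add 0
        chi2 += ((observed[mask] - expected[mask]) ** 2 / expected[mask]).sum()
    return float(chi2)


# Made-up counts for three classes in two neighbouring intervals
print(pair_chi2([4, 1, 0], [1, 4, 0]))  # different class mix: larger value
print(pair_chi2([3, 3, 0], [2, 2, 0]))  # same class proportions: zero
```

With these made-up counts the first pair scores about 3.6 and the second scores 0, so ChiMerge would merge the second pair first: a low chi-square value means the two intervals have similar class distributions.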

Usage example

iris = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
iris.columns = ['sepal_l', 'sepal_w', 'petal_l', 'petal_w', 'type']
for attr in ['sepal_l', 'sepal_w', 'petal_l', 'petal_w']:
    print('Interval for', attr)
    chimerge(data=iris, attr=attr, label='type', max_intervals=3)
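
Once the intervals are known, a natural next step is to map raw values to bin indices. The sketch below is an assumption about how that could be done and is not from the original post: `assign_bins` is a hypothetical helper, and the intervals are made-up numbers rather than actual iris results. It relies only on `np.searchsorted` over the interval upper bounds.

```python
import numpy as np


def assign_bins(values, intervals):
    """Map each value to the index of its interval (hypothetical helper).

    intervals: sorted list of [low, high] pairs, as printed by chimerge.
    A value in a gap between two intervals goes to the interval on its right;
    values above the last upper bound go to the last interval.
    """
    upper = np.array([high for _, high in intervals])
    # side="left" returns the first index i with value <= upper[i]
    idx = np.searchsorted(upper, values, side="left")
    return np.minimum(idx, len(intervals) - 1)


# Made-up intervals, not real ChiMerge output
intervals = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
print(assign_bins([1.5, 2.0, 2.5, 4.0, 7.0, 10.0], intervals))
```

Sending gap values to the right-hand interval is one possible convention; sending them to the left-hand one would work equally well as long as it is applied consistently.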


Original post: https://www.cnblogs.com/hichens/p/13585854.html
