Significance Test


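Significance tests fall into two main categories:

  • Statistical Significance Test

    Measure: p-value below the chosen significance level (commonly p-value < 0.05)

    Purpose: test whether the original distribution and the target distribution differ significantly

  • Practical Significance Test

    Measure: effect size (Cohen's d)

    Purpose: measure how large the difference between the original distribution and the target distribution is

The paper "NLPStatTest: A Toolkit for Comparing NLP System Performance" argues that in NLP, practical significance should be assessed in addition to statistical significance. The relevant passage from the paper follows.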
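For reference, the Cohen's d named in the list above can be written as a standardized mean difference. The form below is the equal-weight pooled variant that the Practical_significance.py script later in this post computes; the symbols are introduced here for illustration only (the bar denotes the sample mean of each system's scores and s its standard deviation), and other pooling conventions exist.

```latex
d = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\left(s_A^{2} + s_B^{2}\right) / 2}}
```

A value of d near 0 means the two score distributions largely overlap, regardless of how small the p-value is.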

2.2.3 Effect Size Estimation

In most experimental NLP papers employing significance testing, the p-value is the only quantity reported. However, the p-value is often misused and misinterpreted. For instance, statistical significance is easily conflated with practical significance; as a result, NLP researchers often run significance tests to show that the performances of two NLP systems are different (i.e., statistical significance), without measuring the degree or the importance of such a difference (i.e., practical significance).
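The distinction drawn in the quoted passage can be illustrated with a small sketch. The data are synthetic and every name in it (`compare`, `cohen_d`, the score distributions, the seed) is an assumption made for illustration, not something taken from the paper or the scripts in this post. With many paired samples, even a very small average gain produces a small p-value while the effect size stays close to zero.

```python
import numpy as np
from scipy import stats


def cohen_d(x, y):
    """Cohen's d with an equal-weight pooled standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled_sd = np.sqrt((np.std(x) ** 2 + np.std(y) ** 2) / 2.0)
    return (np.mean(x) - np.mean(y)) / pooled_sd


def compare(n=10000, seed=0):
    """Return (p_value, d) for two synthetic systems with a tiny mean gap."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-example scores of system B.
    scores_b = rng.normal(0.70, 0.05, n)
    # System A is better by only 0.002 on average, with per-example noise.
    scores_a = scores_b + rng.normal(0.002, 0.02, n)
    # Paired two-sided t-test on the same examples.
    p_value = stats.ttest_rel(scores_a, scores_b).pvalue
    return p_value, cohen_d(scores_a, scores_b)


if __name__ == "__main__":
    p_value, d = compare()
    print("p-value: {:.3g}".format(p_value))
    print("Cohen's d: {:.4f}".format(d))
```

Under these assumptions the test reports a significant difference, yet the standardized difference is far too small to matter in practice, which is the situation the paper warns about.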

Usage:

Statistical Significance Test:

python Statistical_significance.py file1 file2 0.05
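Both input files are read line by line and each line is converted with `float`, so they should hold one score per line, in the same example order for both systems. A minimal sketch of preparing such files follows; the helper names `write_scores` and `read_scores` and the sample values are hypothetical and are not part of the script.

```python
import os
import tempfile


def write_scores(path, scores):
    """Write one score per line, the layout the script reads."""
    with open(path, "w", encoding="utf-8") as f:
        for score in scores:
            f.write("{}\n".format(score))


def read_scores(path):
    """Read the file back the same way the script does."""
    with open(path, encoding="utf-8") as f:
        return [float(line) for line in f.read().splitlines()]


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    file1 = os.path.join(out_dir, "file1")
    file2 = os.path.join(out_dir, "file2")
    # Hypothetical per-example scores for systems A and B.
    write_scores(file1, [0.81, 0.77, 0.90, 0.68])
    write_scores(file2, [0.79, 0.75, 0.88, 0.70])
    print(read_scores(file1), read_scores(file2))
```

The third command-line argument (0.05 in the command above) is the significance level alpha.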
import sys
import numpy as np
from scipy import stats


### Normality check
# H0: the paired differences are normally distributed.
# Returns True when H0 is NOT rejected at level alpha.
def normality_check(data_A, data_B, name, alpha):
    alpha = float(alpha)
    diffs = [a - b for a, b in zip(data_A, data_B)]

    if name == "Shapiro-Wilk":
        # Shapiro-Wilk test for normality.
        shapiro_results = stats.shapiro(diffs)
        return shapiro_results[1] > alpha

    elif name == "Anderson-Darling":
        # Anderson-Darling reports critical values, not a p-value. For 'norm'
        # they correspond to the 15%, 10%, 5%, 2.5% and 1% levels, so pick the
        # one matching alpha and compare the test statistic against it.
        anderson_results = stats.anderson(diffs, 'norm')
        if alpha <= 0.01:
            sig_level = 4
        elif alpha <= 0.025:
            sig_level = 3
        elif alpha <= 0.05:
            sig_level = 2
        elif alpha <= 0.1:
            sig_level = 1
        else:
            sig_level = 0
        return anderson_results[0] < anderson_results[1][sig_level]

    else:
        # Kolmogorov-Smirnov goodness-of-fit test against a standard normal.
        ks_results = stats.kstest(diffs, 'norm')
        return ks_results[1] > alpha


## McNemar test
def calculateContingency(data_A, data_B, n):
    ABrr = 0  # A right, B right
    ABrw = 0  # A right, B wrong
    ABwr = 0  # A wrong, B right
    ABww = 0  # A wrong, B wrong
    for i in range(0, n):
        if data_A[i] == 1 and data_B[i] == 1:
            ABrr = ABrr + 1
        elif data_A[i] == 1 and data_B[i] == 0:
            ABrw = ABrw + 1
        elif data_A[i] == 0 and data_B[i] == 1:
            ABwr = ABwr + 1
        else:
            ABww = ABww + 1
    return np.array([[ABrr, ABrw], [ABwr, ABww]])


def mcNemar(table):
    statistic = float(np.abs(table[0][1] - table[1][0])) ** 2 / (table[1][0] + table[0][1])
    pval = 1 - stats.chi2.cdf(statistic, 1)
    return pval


# Permutation-randomization
# Repeat R times: randomly swap each m_i(A), m_i(B) between A and B with probability 0.5, calculate delta(A,B).
# Let r be the number of times that delta(A,B) >= orig_delta(A,B).
# Significance level: (r+1)/(R+1)
# Assume that a larger value (metric) is better.
def rand_permutation(data_A, data_B, n, R):
    delta_orig = float(sum([x - y for x, y in zip(data_A, data_B)])) / n
    r = 0
    for x in range(0, R):
        # Copy the lists so that swaps do not modify the original data.
        temp_A = list(data_A)
        temp_B = list(data_B)
        samples = [np.random.randint(1, 3) for i in range(n)]  # which samples to swap, without repetitions
        swap_ind = [i for i, val in enumerate(samples) if val == 1]
        for ind in swap_ind:
            temp_B[ind], temp_A[ind] = temp_A[ind], temp_B[ind]
        delta = float(sum([x - y for x, y in zip(temp_A, temp_B)])) / n
        if delta >= delta_orig:
            r = r + 1
    pval = float(r + 1.0) / (R + 1.0)
    return pval


# Bootstrap
# Repeat R times: randomly create new samples from the data with repetitions, calculate delta(A,B).
# Let r be the number of times that delta(A,B) > 2*orig_delta(A,B). Significance level: r/R
# This implementation follows the description in Berg-Kirkpatrick et al. (2012),
# "An Empirical Investigation of Statistical Significance in NLP".
def Bootstrap(data_A, data_B, n, R):
    delta_orig = float(sum([x - y for x, y in zip(data_A, data_B)])) / n
    r = 0
    for x in range(0, R):
        temp_A = []
        temp_B = []
        samples = np.random.randint(0, n, n)  # which samples to add to the subsample, with repetitions
        for samp in samples:
            temp_A.append(data_A[samp])
            temp_B.append(data_B[samp])
        delta = float(sum([x - y for x, y in zip(temp_A, temp_B)])) / n
        if delta > 2 * delta_orig:
            r = r + 1
    pval = float(r) / R
    return pval


def report(pval, alpha):
    if float(pval) <= float(alpha):
        print("\nTest result is significant with p-value: {}".format(pval))
    else:
        print("\nTest result is not significant with p-value: {}".format(pval))


def main():
    # Three arguments are required: file1, file2 and alpha.
    if len(sys.argv) < 4:
        print("You did not give enough arguments\n ")
        sys.exit(1)
    filename_A = sys.argv[1]
    filename_B = sys.argv[2]
    alpha = float(sys.argv[3])

    # Each file holds one score per line.
    with open(filename_A) as f:
        data_A = f.read().splitlines()
    with open(filename_B) as f:
        data_B = f.read().splitlines()

    data_A = list(map(float, data_A))
    data_B = list(map(float, data_B))

    print("\nPossible statistical tests: Shapiro-Wilk, Anderson-Darling, Kolmogorov-Smirnov, t-test, Wilcoxon, McNemar, Permutation, Bootstrap")
    name = input("\nEnter name of statistical test: ")

    ### Normality check
    if name == "Shapiro-Wilk" or name == "Anderson-Darling" or name == "Kolmogorov-Smirnov":
        if normality_check(data_A, data_B, name, alpha):
            answer = input("\nThe normality test did not reject normality, would you like to perform a t-test for checking significance of difference between results? (Y\\N) ")
            if answer == 'Y':
                name = "t-test"
            else:
                answer2 = input("\nWould you like to perform a different test (Permutation or Bootstrap)? If so enter name of test, otherwise type 'N' ")
                if answer2 == 'N':
                    print("\nbye-bye")
                    return
                name = answer2
        else:
            answer = input("\nThe normality test rejected normality, would you like to perform a non-parametric test for checking significance of difference between results? (Y\\N) ")
            if answer == 'Y':
                name = input("\nWhich test (Permutation or Bootstrap)? ")
            else:
                print("\nbye-bye")
                return

    ### Statistical tests
    if name == "t-test":
        # Paired Student's t-test on two related samples of scores.
        # ttest_rel is two-sided; halving the p-value gives the one-sided test,
        # which is only meaningful when the difference has the expected sign.
        t_results = stats.ttest_rel(data_A, data_B)
        pval = float(t_results[1]) / 2
    elif name == "Wilcoxon":
        # Wilcoxon signed-rank test.
        wilcoxon_results = stats.wilcoxon(data_A, data_B)
        pval = wilcoxon_results[1]
    elif name == "McNemar":
        print("\nThis test requires the results to be binary : A[1, 0, 0, 1, ...], B[1, 0, 1, 1, ...] for success or failure on the i-th example.")
        f_obs = calculateContingency(data_A, data_B, len(data_A))
        pval = mcNemar(f_obs)
    elif name == "Permutation":
        R = max(10000, int(len(data_A) * (1 / alpha)))
        pval = rand_permutation(data_A, data_B, len(data_A), R)
    elif name == "Bootstrap":
        R = max(10000, int(len(data_A) * (1 / alpha)))
        pval = Bootstrap(data_A, data_B, len(data_A), R)
    else:
        print("\nInvalid name of statistical test")
        sys.exit(1)

    report(pval, alpha)


if __name__ == "__main__":
    main()
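The Permutation option can also be expressed compactly with NumPy. The following is only a sketch of a one-sided paired (sign-flip) permutation test, not the script's own code: the function name `paired_permutation_pvalue`, its parameters and the seed are assumptions for illustration. It relies on the fact that swapping the two systems' scores on one example simply flips the sign of that example's difference.

```python
import numpy as np


def paired_permutation_pvalue(scores_a, scores_b, n_resamples=10000, seed=0):
    """One-sided paired permutation test; H1: system A scores higher than B."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = diffs.mean()
    # Each row is one resample: every difference keeps or flips its sign
    # with probability 0.5, which is equivalent to swapping A and B there.
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, diffs.size))
    permuted = (signs * diffs).mean(axis=1)
    # Count resamples whose mean difference is at least as large as observed.
    r = int(np.sum(permuted >= observed))
    return (r + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical scores where A beats B by 0.05 on every example.
    scores_b = np.linspace(0.5, 0.7, 20)
    scores_a = scores_b + 0.05
    print(paired_permutation_pvalue(scores_a, scores_b))
```

The sign matrix has n_resamples × n entries, so for very large test sets it may be preferable to draw the resamples in batches.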

Practical Significance Test:

python Practical_significance.py file1 file2
import sys
import numpy as np
from numpy import mean, std, sqrt


def read_data_from_file(file_name):
    # Expect one numeric value per line.
    with open(file_name, 'r', encoding='utf-8') as reader:
        data_file = []
        try:
            lines = reader.readlines()
            data_file = [float(line.strip()) for line in lines]
        except ValueError:
            print('Data format error, please check')
        if len(data_file) == 0:
            print('Empty file, exit')
            sys.exit(1)
    return data_file


def two_side_data_reader(file1_name, file2_name):
    data_file1 = read_data_from_file(file1_name)
    data_file2 = read_data_from_file(file2_name)
    return data_file1, data_file2


def cal_cohen_d(data1, data2):
    def cohen_d(x, y):
        # Mean difference divided by the equal-weight pooled standard deviation.
        # numpy's std defaults to the population form (ddof=0).
        return (mean(x) - mean(y)) / sqrt((std(x) ** 2 + std(y) ** 2) / 2.0)

    mean1 = np.mean(data1)
    mean2 = np.mean(data2)
    std1 = np.std(data1)
    std2 = np.std(data2)
    cohen = cohen_d(data1, data2)
    print('Data1 [mean:%.4f, std:%.4f]' % (mean1, std1))
    print('Data2 [mean:%.4f, std:%.4f]' % (mean2, std2))
    print("cohen's d value = %.4f" % (cohen))
    return cohen


if __name__ == '__main__':
    file_1 = sys.argv[1]
    file_2 = sys.argv[2]
    data1, data2 = two_side_data_reader(file_1, file_2)
    res = cal_cohen_d(data1, data2)
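The script prints the raw d value and leaves the interpretation to the reader. A small helper such as the one below could attach a rough label. It assumes Cohen's conventional benchmarks (about 0.2 for a small, 0.5 for a medium and 0.8 for a large effect), which do not come from this post. The function name `interpret_cohen_d` and the "negligible" label for values below 0.2 are likewise illustrative choices.

```python
def interpret_cohen_d(d):
    """Map a Cohen's d value to a rough verbal label (conventional cut-offs)."""
    magnitude = abs(d)  # the sign only indicates which system is better
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    for value in (0.1, -0.3, 0.65, 1.2):
        print(value, interpret_cohen_d(value))
```

These cut-offs are general rules of thumb; what counts as a practically meaningful difference depends on the task and metric.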
