Can We Measure Language Difficulty by the Numbers?

Without a doubt, the world is “growing smaller” in terms of our access to people and content from other countries and cultures. Even the COVID-19 pandemic, which has curtailed international travel, has led to increasing virtual interaction via the internet. Yet the barriers to fluent and proficient inter-language communications remain formidable.

Online Translation Versus Language Learning

The quality of machine translation has improved dramatically in recent years, thanks to the introduction of Artificial Intelligence methods such as neural networks to the task. The AI-driven optimization of translating has trickled down rapidly to consumer apps like Google Translate and Microsoft Translator, which simplify the usage of machine translators and improve the ability to convey meaning across linguistic frontiers.

There’s a huge difference between translating a language via software and learning a new language. For most adults, learning a new language is hard. But some people love linguistic challenges: for them, the hardest languages to learn may be the most enjoyable to conquer. The neuroplasticity of young brains, of course, makes new language acquisition a relative snap for children. But few adults have it so easy.

Online Language Learning and Its Challenges

Online language learning, now a $582 billion/year industry according to the ICEF, has made learning a new language easier and more convenient for millions. English language learning accounts for most of this total. While the popularity of English may not be surprising (it is the most spoken language and the main language of business worldwide), proficient English speakers are branching out to additional languages at a rapid pace.

Rosetta Stone, a leading provider of language courses, reports that Spanish topped the list of languages that British people were most eager to take on in 2018, with 23.1% of its UK learners learning the language last year. Four other European languages (French, English, Italian and German) rounded off the top five. Perhaps surprisingly, Mandarin Chinese, the most widely spoken native language, with more than a billion speakers, was not in the next tier.

No doubt the perception of that language’s difficulty played a role in its relatively low popularity ranking. Mandarin Chinese poses major hardships for a non-Chinese speaker. And yet more than 1.1 billion people speak, read, write and understand it fluently. So is it really hard? Or is it just unfamiliar to an English speaker? The question raises a major challenge: isn’t the perception of difficulty a totally relative matter, differing to some degree for each language learner, depending on background and education?

The challenge facing a data scientist, of course, is how language difficulty can be measured. If we wish to split hairs, there is a distinction between the difficulty of learning a language and the inherent difficulty of using it. But for purposes of this article, we will focus on evaluating ways to measure a language’s degree of difficulty, to borrow a term from gymnastics and other competitive sports.

Approach A: Ask the Foreign Service

Nearly a decade ago, Voxy posted an infographic (shown below), sourced from the Foreign Service Institute, which breaks language difficulty for native English speakers into three neat categories: easy, medium, and hard. The basis for comparison was how long, in calendar weeks and class hours, it takes to attain “proficiency” in different languages. The site did qualify its findings by noting that difficulty depends on the language’s complexity, how close it is to the learner’s own language (in this case, English), how many hours per week the learner studies, and the language resources available. The chart appears to assume roughly 25 hours of learning per week.

  • Easy (22–23 weeks, 575–600 class hours): the Romance languages (Spanish, Portuguese, French, Italian, and Romanian) all fell in this group, along with Dutch, Afrikaans, Norwegian, and Swedish.

  • Medium (44 weeks, 1110 class hours): Russian, Polish, Serbian, Finnish, Thai, Vietnamese, Greek, Hebrew, and Hindi.

  • Hard (88 weeks, 2220 class hours): Chinese, Japanese, Korean, and Arabic.
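As a quick arithmetic check on the figures in this list, the short Python sketch below divides class hours by calendar weeks for each category. The dictionary layout and the function name `weekly_hours_bounds` are illustrative choices, not anything published by Voxy or the Foreign Service Institute; only the week and hour figures come from the list above.

```python
# (weeks, class hours) as quoted in the list above; ranges are (low, high) pairs.
FSI_CATEGORIES = {
    "easy": ((22, 23), (575, 600)),
    "medium": ((44, 44), (1110, 1110)),
    "hard": ((88, 88), (2220, 2220)),
}


def weekly_hours_bounds(weeks, hours):
    """Return the (lowest, highest) weekly class hours consistent with the ranges."""
    weeks_low, weeks_high = weeks
    hours_low, hours_high = hours
    # Fewest hours spread over the most weeks gives the lower bound, and vice versa.
    return hours_low / weeks_high, hours_high / weeks_low


implied = {
    name: weekly_hours_bounds(weeks, hours)
    for name, (weeks, hours) in FSI_CATEGORIES.items()
}

for name, (low, high) in implied.items():
    print(f"{name}: {low:.1f} to {high:.1f} class hours per week")
```

Every category lands at or slightly above 25 class hours per week, so the three tiers differ in duration rather than in weekly intensity.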

While Voxy clearly intends the chart to be a teaching tool or subject of discussion, it’s not hard to pick apart weaknesses in its analytical method. First, who sets the bar for “proficiency”? How is the quality of instruction measured? How does one account for factors like second-language knowledge? For a data scientist, the results would appear disappointingly arbitrary.

Infographic: What Are The Hardest Languages To Learn? (source: Voxy)

Approach B: Scoring Language Learning Difficulty: A Polyglot’s Approach

A more intriguing approach to the problem, at least from a data science perspective, is offered by linguist Michael Campbell at Glossika. In a detailed blog post aptly titled “Language Difficulty,” he devised a scoring system for answering, numerically, the precise questions which intrigue us:

  1. Is there an objective method for measuring language difficulty?
  2. What are the most difficult languages in the world?

What distinguishes Campbell’s approach is that it is relative and data-based. Language difficulty is scored by the similarity between any two languages according to various criteria of linguistic complexity. Perhaps counter-intuitively, this relative framing actually makes an objective assessment of language learning difficulty possible, because it rests on numerical criteria that can be objectively assessed. Among the criteria he offers are:

Vocabulary Acquisition

He scores this by how close the target language is to the learner’s own language.

Languages are divided into families, branches, and sub-branches. For example, English belongs to the Indo-European family, to which languages like Russian, Armenian, and Greek also belong. By contrast, Arabic, Chinese, and Japanese belong to different families. Within the Indo-European family, English is a Germanic language with heavy Romance influence, and is therefore closer to languages like German and French. In terms of similarity, English is closest to German, despite grammatical differences. Similarly, Portuguese, Spanish and Italian belong to the same sub-branch, which makes learning one from another easier. Campbell assigns high importance to this criterion, with language-learning difficulty reflected in exponentially higher numbers. Same sub-branch: 0 points. Different sub-branch: 1 point. Different branch: 10 points. Different family: 100 points.
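The point scale described above can be sketched in a few lines of Python. This is a minimal illustration, not Campbell’s own implementation: the `Lineage` type, the function name `vocabulary_points`, and the family, branch, and sub-branch labels assigned to each language are assumptions made for the example and may not match the classification he uses.

```python
from typing import NamedTuple


class Lineage(NamedTuple):
    """Where a language sits in the family tree (labels are illustrative)."""
    family: str
    branch: str
    sub_branch: str


def vocabulary_points(learner: Lineage, target: Lineage) -> int:
    """Apply the 0 / 1 / 10 / 100 scale to a pair of languages."""
    if learner.family != target.family:
        return 100  # different family
    if learner.branch != target.branch:
        return 10   # same family, different branch
    if learner.sub_branch != target.sub_branch:
        return 1    # same branch, different sub-branch
    return 0        # same sub-branch


# Hypothetical lineage labels, chosen only to exercise each tier of the scale.
portuguese = Lineage("Indo-European", "Romance", "West Iberian")
spanish = Lineage("Indo-European", "Romance", "West Iberian")
german = Lineage("Indo-European", "Germanic", "West Germanic")
swedish = Lineage("Indo-European", "Germanic", "North Germanic")
russian = Lineage("Indo-European", "Slavic", "East Slavic")
japanese = Lineage("Japonic", "Japanese", "Japanese")

print(vocabulary_points(portuguese, spanish))  # same sub-branch
print(vocabulary_points(german, swedish))      # different sub-branch
print(vocabulary_points(spanish, russian))     # different branch
print(vocabulary_points(spanish, japanese))    # different family
```

Checking the coarsest level first means a pair from different families scores 100 regardless of what their branch labels happen to be.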

Syntax and Grammar for Fluency

Campbell, a linguist by profession, broke syntax and grammar down into a list of factors, such as:

  • Language type
  • Subject-Verb-Object order
  • Adjective-Noun order
  • Genitive (possessor)-Noun order
  • Determiner-Noun order
  • Relative (clause)-Noun order
  • Noun Declension
  • Tenses
  • Conjugation
  • Adposition

For each of these criteria, Campbell assigns one point, plus or minus, wherever the two languages differ. The results of his calculation are rendered in a matrix:

Matrix derived from The Glossika Blog

By comparing rows in this matrix, he can assign a score to the syntactical and grammatical differences between two languages, and thus to the difficulty of learning one starting from the other. The difficulty score for a German speaker learning French would be 6 points, for a Japanese speaker learning Spanish 13 points, and for a Chinese speaker learning Polish a whopping 34 points.
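The row comparison described above can be sketched in Python as a count of the features on which two languages differ. This is a simplified illustration under stated assumptions: it uses only three of the listed criteria, the feature values are broad textbook generalizations rather than entries from Campbell’s matrix, and it awards a flat one point per difference, so it is not expected to reproduce his published scores. The names `FEATURES`, `ROWS`, and `syntax_points` are hypothetical.

```python
# A small subset of the criteria listed earlier (illustrative only).
FEATURES = ("word_order", "adjective_noun_order", "adposition")

# Hypothetical feature rows; not copied from Campbell's matrix.
ROWS = {
    "English": {
        "word_order": "SVO",
        "adjective_noun_order": "adjective-noun",
        "adposition": "preposition",
    },
    "Spanish": {
        "word_order": "SVO",
        "adjective_noun_order": "noun-adjective",
        "adposition": "preposition",
    },
    "Japanese": {
        "word_order": "SOV",
        "adjective_noun_order": "adjective-noun",
        "adposition": "postposition",
    },
}


def syntax_points(learner: str, target: str, rows=ROWS, features=FEATURES) -> int:
    """Count one point for every feature on which the two rows differ."""
    return sum(1 for f in features if rows[learner][f] != rows[target][f])


for pair in (("English", "Spanish"), ("English", "Japanese"), ("Japanese", "Spanish")):
    print(pair, syntax_points(*pair))
```

Because the count is symmetric, this sketch gives the same score in both directions for any pair of languages; whether Campbell’s plus-or-minus scoring is also symmetric is not stated in the text.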

Phonology for Fluency

Campbell’s calculations account for differences in total phonemes (the contrastive sound units of a language) and allophones (the variants speakers actually pronounce), considering 12 points of articulation and the number of vowels and intonations.

Matrix derived from The Glossika Blog

Comparing rows in this matrix lets you calculate language difficulty with respect to these phonological criteria. The difficulty score for a German speaker learning French would be 1 point, for a Japanese speaker learning Spanish 11 points, and for a Chinese speaker learning Polish a whopping 15 points.

Data scientists will note that the scores assigned for various parameters are arbitrary and subjective, but there is merit in the attempt to break down degrees of difficulty into component factors.

For example, for an English speaker, the following are the score assignments according to language family:

Matrix derived from The Glossika Blog

It is hard to reconcile a score of 0 for German (So einfach ist das?) with a score of 5 for French or Spanish. And is Georgian vocabulary really 10 times harder to acquire than Polish? So the specific point values are certainly open to fine-tuning, though the method is intriguing, if a bit rough around the edges.

The Final Reckoning: What’s Unique About Ubykh?

Campbell’s 2016 article concluded with a list of some of the most difficult languages. He mentioned, in this connection, the Romany language of European gypsies, which is not even written down; Sentinelese, the language of the Indian Ocean island where would-be visitors are killed on arrival; polysynthetic languages like Greenlandic; and Ubykh, with no fewer than 84 consonants. Honorable mention goes to Bella Coola, a language written down only by linguists to record its grammar.

Two years later, Campbell wrote a follow-up piece applying his scoring system and setting it against the FSI rankings.

Matrix derived from The Glossika Blog

Non-linguists may be nonplussed by the dismissive way the author chalks up Thai, Vietnamese, Turkish and Finnish as “easy” (except, he hastens to say, for their utterly unfamiliar vocabularies). He confesses surprise that, per his ranking system, Korean beats out Taiwanese in difficulty. But he credits Ubykh, an extinct Northwest Caucasian language related to Circassian, with leaving even Korean in the dust.

Here you can learn Ubykh numbers and listen to a tale of futility that should appeal to every data scientist — in any language.

Translated from: https://towardsdatascience.com/can-we-measure-language-difficulty-by-the-numbers-3d591396934c
