Research on Machine Learning for Disaster Response: What Makes a Good Paper?

For the first time in 2021, a major Machine Learning conference will have a track devoted to disaster response. The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021) has a track on “NLP Applications for Emergency Situations and Crisis Management”.

I am delighted to be the Senior Area Chair for this track! I’ve worked in machine learning and disaster response for 20 years and I’m glad that more people are now looking into how machine learning can help people at the most critical times.

The majority of what goes into a paper on machine learning for disaster response should be the same as any other paper in applied science: reproducible methods that clearly advance our knowledge of how to deploy and evaluate machine learning technologies.

However, disaster response makes some parts of the science more important, and a few considerations are unique to it. Some will be familiar to researchers who have worked in healthcare, and others are drawn from international development. Here’s a summary of the points covered in this article:

  1. The majority of disaster response is helping the crisis-affected community help themselves. Therefore, tools that empower disaster-affected communities are most valuable, especially tools for speakers of low-resource languages.
  2. Information management is a much bigger problem than information discovery for professional disaster responders.
  3. Papers focused on machine learning for disaster response should not cut corners on the science.
  4. Papers promoting systems built by research labs have no place in disaster response.
  5. It is not possible to fully evaluate the sensitivity of data during a disaster, so responses to ongoing disasters should default to private data practices.
  6. Disaster response is often used as a cover for human rights violations, especially under authoritarian regimes, so research from authoritarian regimes that can violate human rights should be rejected.
  7. Researchers should not partner with non-operational aid organizations and should know how to spot the difference between operational and non-operational organizations.
  8. English social media processing is not interesting or useful for disaster responders.
  9. Ignore anything that relies on research published in “ISCRAM”.
  10. Apply the “Do no harm” principle to evaluating impact.

I’ll share examples from my own experience as I expand on each one.

1. Disaster response is mostly communities helping themselves

For any large disaster, we simply don’t have the resources to help most people directly. In the lead-up to a forest fire, your own preparation of your property will probably have a bigger impact than any preparation that a professional firefighter has time to provide. During a pandemic, you will be directly responsible for social distancing and sanitation. After an earthquake, your neighbors are much more likely to pull you from your collapsed house than a professional search & rescue team is.

The most important way to support these communities is with clear communication. Speakers of low-resource languages are more likely to be the victims of natural and man-made disasters. Therefore, any technology that helps get information to linguistically diverse communities will help them in disasters. In fact, I believe that the work I have done helping large companies deploy technology in more languages has had a bigger impact on disaster response than the time I spent working in refugee camps for the UN.

So, if you are improving the machine translation or the language support of devices and applications like search engines and online stores, then you are already working on the single most important problem for machine learning in disaster response. It helps disaster responders communicate with the affected community and it helps the community search for the right resources online to help themselves.

As I shared in Gretchen McCulloch’s recent Wired article, “Covid-19 Is History’s Biggest Translation Challenge”, I’ve seen the downsides of getting this wrong many times. For example, in Sierra Leone during the Ebola crisis, an international news agency broadcast Mande-language announcements in a Temne-speaking area, creating distrust because Mande was seen as the language of the political party that was in power at the time. As a result, Temne speakers were more likely to avoid healthcare clinics.

I worked with aid agencies to come up with the shocking conclusion that for every person who contracted Ebola, ten people died nearby from other preventable conditions because they were avoiding clinics. The amplification of wealthy nations’ fears into the local media, with too little attention to local languages, killed more people than Ebola itself.

On the positive side, getting it right can make a big impact. For example, I recruited and managed 2,000 Haitian Kreyol speakers following the 2010 earthquake to translate emergency messages. The translators’ work saved many lives. It also supported machine learning research for disaster response, with the data used in the Workshop on Machine Translation shared task in 2011 (WMT11) and as part of a multilingual disaster response dataset that is now widely used.

Any paper focused on low resource languages can make the argument that it will be useful for disaster response, because it can be part of the foundational technology that allows disaster-affected populations to more easily communicate and access information and services.

2. Information management is the biggest problem for professional disaster responders

Most of the work that goes into disaster response is logistics, and most disaster response professionals share information via spreadsheets and unstructured documents. It is a myth (from too many movies) that analytics and machine learning during a disaster are primarily focused on predicting where the next “hot-spot” will be. These use cases exist, but they are rare.

For example, a disaster response leader in charge of planning drinking water distributions might get hundreds of reports from different agencies or regions, each one containing pieces of the information needed to estimate the overall need. That information needs to be extracted reliably from each of those reports.

So, if you can develop machine learning systems that can extract information from semi-structured tables and forms in spreadsheets and PDF documents, then you are working on one of the most important problems for supporting disaster response professionals.
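
To make the drinking water example concrete, here is a minimal sketch of the rule-based baseline that a learned extraction system would need to beat. Everything in it is an assumption for illustration: the header aliases, the CSV-like report format, and the planning figure of 15 liters per person per day are hypothetical and do not come from any agency's actual reports.

```python
import csv
import io
import re

# Hypothetical header variants that different agencies might use for the same field.
HEADER_ALIASES = {
    "region": {"region", "district", "area"},
    "people": {"people", "population", "persons affected"},
}

# Assumed planning figure, for illustration only.
LITERS_PER_PERSON_PER_DAY = 15


def normalize_header(name):
    """Map a raw column header to a canonical field name, or None if unknown."""
    cleaned = re.sub(r"\s+", " ", name.strip().lower())
    for field, aliases in HEADER_ALIASES.items():
        if cleaned in aliases:
            return field
    return None


def parse_count(text):
    """Parse counts such as '1,200' or '~ 350'; return None if no digits are found."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def extract_rows(report_text):
    """Yield (region, people) pairs from one CSV-like report, skipping unusable rows."""
    reader = csv.reader(io.StringIO(report_text))
    header = next(reader, None)
    if header is None:
        return
    fields = [normalize_header(h) for h in header]
    if "region" not in fields or "people" not in fields:
        return  # a real system would flag this report for human review
    region_idx = fields.index("region")
    people_idx = fields.index("people")
    for row in reader:
        if len(row) <= max(region_idx, people_idx):
            continue
        people = parse_count(row[people_idx])
        if people is not None:
            yield row[region_idx].strip(), people


def daily_water_need(reports):
    """Sum people per region across reports and convert to liters per day."""
    totals = {}
    for report in reports:
        for region, people in extract_rows(report):
            totals[region] = totals.get(region, 0) + people
    return {region: n * LITERS_PER_PERSON_PER_DAY for region, n in totals.items()}
```

Calling `daily_water_need` on a list of report strings returns a per-region estimate in liters per day. Hand-written alias tables like `HEADER_ALIASES` break down quickly across hundreds of differently formatted reports, which is where machine learning for information extraction can help.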

A good example from 10 years ago that is still relevant is “Designing Adaptive Feedback for Improving Data Entry Accuracy”, by Kuang Chen, Joseph M. Hellerstein, and Tapan S. Parikh. The paper evaluates machine learning-assisted technologies to help professional data entry clerks digitize patient data from clinics in rural Uganda, supporting a large population of Congolese refugees.

3. Don’t cut corners on the science

You have probably seen in the COVID-19 news that infectious disease professionals are opposed to the widespread use of vaccines that have not gone through appropriate testing. If vaccines for the most widespread pandemic in living memory can wait for the science, so can your machine learning research.

But we can speed up the review process. For example, at the recent ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), we reviewed papers as they were submitted instead of waiting for the final deadline. However, we did not drop the standard for acceptance in any way.

4. System demonstration papers from research labs have no place in disaster response research

Toyota Land Cruisers are a cliche in the aid world: they have been the majority of vehicles that I have seen in some response situations.

Toyota Land Cruiser: used for predictability and reliability. Source: https://commons.wikimedia.org/wiki/File:World_Food_Programme_Landcruiser.jpg

Land Cruisers are not necessarily the ideal vehicle for aid, but they are the most reliable and predictable. When they do have problems, there will be many people who know how to repair them and many suitable replacement parts available.

Could an academic research lab design a vehicle more suitable for disaster response? No doubt. Should those vehicles actually be deployed in a critical situation? Absolutely not. Spare parts would be hard to come by, and the only experts would be a small number of academics who would quickly move on to other work and not be available to help with mechanical issues. So, the science from that lab might inform future development, but the lab should not build the actual vehicles.

The same applies to any software. Software can inform the science and be used in controlled environments during non-critical periods, especially if there are human-computer interaction (HCI) components that are important to test. However, machine learning researchers do not have the skills to create scalable technology or the ability to continue to support that software for years to come. Aid agencies don’t have the engineering capacity to take an academic system and make it scalable and reliable through extensive development and testing. There is simply no use for academic demos where the authors claim that their technology should be used for actual critical data.

Even when I was at Stanford, arguably the most industry-focused technical university, I did not work with colleagues there when I needed to develop tools quickly as a disaster responder. I worked with commercial software solutions, because reliability was more important than innovation, and it was only after the response that I led research on how machine learning could improve future responses.

5. Default to private data practices

During an ongoing disaster, it is not possible to determine whether data that seems non-sensitive today will in fact become sensitive later. So, don’t publish any personal data, including already open social media, as part of a paper during a disaster. Instead, wait until the affected population are no longer at risk and then engage privacy professionals to help decide what can and cannot be shared.

Even data that is seen as open can become sensitive when it is republished in a new context, and aggregated data (including machine learning models) can become more sensitive than its individual data points.

For example, during the Arab Spring, I saw a lot of people tweeting about their local conditions: road closures, refugees, etc. While they were “public” tweets, the tweets were clearly written with only a handful of followers in mind, and the authors didn’t realize that reporting road closures would also help paint a picture of troop movements. As an example of what not to do, some of these tweets were copied to UN-controlled websites and re-published, with no mechanism for the original authors to remove them from the UN sites. Many actors within the Middle East and North Africa saw the UN as a negative foreign influence (or invader), and the people tweeting were therefore seen as collaborators. Those actors didn’t care whether these people only intended to share information with a small number of followers.

So, you need to ask yourself: what is the effect of recontextualizing the data or model so that it is now published by myself or my organization?

6. Disaster response is often used as a cover for human rights violations

While crime typically goes down overall following disasters, a small number of predators and opportunists will try to gain advantage from the chaos. This is especially true of oppressive governments that use disasters as a cover to identify and silence their critics.

If you are a reviewer or a researcher considering research into personal information including ethnicity, religion, gender, or political preferences, then you should take into consideration the use case and whether it can be used for human rights violations, especially by Authoritarian Regimes. As a guide, look at the Democracy Index compiled by the Economist Intelligence Unit (EIU): https://en.wikipedia.org/wiki/Democracy_Index.

They rank countries and put them into four buckets: Full Democracies, Flawed Democracies, Hybrid Regimes, and Authoritarian Regimes. This is important because: independent research institutions cannot exist in Authoritarian Regimes.

If there are sensitive use cases, like identifying people complaining about the government on social media or expressing political preferences, then research in this area from those regimes cannot be trusted. This happens very frequently. In my presentation at KDD last year I talked about how countries used the last COVID outbreak (SARS-CoV-1) as a cover to identify dissidents.

A researcher working for an authoritarian regime is not independent of their government in the same way that a researcher at a public institution in a democracy is independent of their government. The researcher’s identity or nationality is not the problem: their employer or funder is the problem. Research that has human rights implications which is funded by authoritarian regimes should therefore be immediately rejected by the program chairs of machine learning conferences. Those researchers do not have the independence to prevent the negative use cases, no matter what their personal intentions are.

There are plenty of use cases that can help with disaster response, and the most important ones do not require sensitive data: general research into low-resource languages and information extraction from semi-structured documents. So, there is nothing stopping a researcher who was born into an authoritarian regime, or who chooses to be employed by one, from contributing to disaster response machine learning research.

Note that countries high on the Democracy Index can still commit human rights abuses, so this does not give a free pass to research from those countries. The EIU’s Democracy Index is heavily focused on country-internal factors. A good example of a grey area for any country is the military. The world’s militaries are also the largest disaster response organizations, so this makes for complicated evaluations that typically have to be investigated on a per-case basis.

Some cases are unequivocally positive. For example, in 2012 a small group of us (non-military disaster responders) took part in exercises hosted by the Naval Postgraduate School where we aimed to discover better ways to make damage assessments from aerial imagery following a disaster. We worked with Civil Air Patrol (part of the US Military), who fly over disasters to take images, and FEMA (part of the Department of Homeland Security), who use damage assessments from those images to help with the response. Just a few months later we used our new techniques to help respond to Hurricane Sandy. There is little doubt that this was completely positive.

However, as I said in my after-action report for the earthquake in Haiti in 2010, there was an uneasy tension because many people in Haiti saw the US Military as former occupiers. In other cases, like working with UNICEF to support maternal health in West Africa, we chose not to work with any US government organization, because that would be perceived as lacking independence when helping other nations. So, regardless of the employer of the researchers, the ethics of government involvement need to be considered on a per-case basis for every paper, especially when the disaster and responders are from multiple different nations.

7. Don’t partner with non-operational aid organizations

Most international development organizations that reach out to research institutions for help will not actually be doing disaster response work. To give a very high-level introduction to the aid industry, here’s a graphic showing how a lot of aid organizations work in disaster response:

High-level overview of how aid organizations are structured. A small number of large organizations that do aid at the national or international level are known as “Operational Organizations”, but most of them use local “Implementing Partners” for the actual disaster response work. Some local aid organizations might be wholly independent or joint independent and helping larger orgs. “Non-Operational Organizations” are the smallest but can erroneously look like they are big and operational. Source: https://www.kdnuggets.com/2020/04/5-ways-data-scientists-can-help-covid-19.html

If someone is asking you to help, how do you know if they are actually responding? The best organization to help is one operating locally. Does your local hospital or food distribution center for refugees need help? Start with them.

The non-operational organizations are typically small and use disasters as funding and publicity opportunities. Look for them talking about “partnerships” with bigger organizations like the WHO, but nowhere saying that they are an “implementing partner”. This is typically code for “not actually part of the response”. If they reach out to you, chances are that you are the product and they are telling potential funders something like “look, we have researchers from a prominent university bringing innovation to disaster response.”

The operational organizations like UNHCR, UNICEF, Red Cross, and Doctors Without Borders all have their own technology innovation teams, so there is no need to partner with non-operational organizations. Non-operational organizations’ incentives are not aligned with privacy, as they need publicity to continue to attract funding. For an example, consider how a non-operational UN organization edited a video interview with me following an Ebola outbreak in Uganda in 2011: they edited my statement that more privacy was needed to instead say that we needed less privacy, so that they could access that data.

If there are no operational aid agencies or implementing partners that need your help, then I recommend researching the fundamental building blocks of disaster response like supporting low-resource languages and information extraction from semi-structured documents.

Just like countries with “Democratic” in their name tend not to be democracies, a similar rule of thumb applies to organizations with “Disaster” in their title. Operational organizations are named after the people they help or the service they are providing: Médecins Sans Frontières, World Health Organization, United Nations Children’s Fund, etc. If an organization has the words “Crisis”, “Disaster” or “Humanitarian” in its title, it probably doesn’t do disaster response: nobody wants to get aid from an organization whose name reminds them of their trauma, and these organizations are named to maximize publicity and funding, not response.

8. English social media processing is not useful for disaster responders

In NLP more broadly, we know that English-only results for our models rarely tell us how well other languages work. English-speaking countries tend to have the most well-funded disaster response organizations already, so this is one area where English’s irrelevance is amplified.

Best practice is for disaster response organizations to use social media as a broadcast medium to communicate to crisis-affected populations; open social media should not be used as a direct communications channel for disaster-affected populations. The disaster response community reached this conclusion after the incidents in Libya mentioned above, a response to floods in Pakistan in 2010 where publicly discussed aid camps were threatened by terrorists, and an analysis of the response in Haiti, which found that despite many media articles praising social media, open social media was not a significant factor in the response.

There is an intersectional problem with English and Authoritarian Regimes, too. Oppressive regimes frequently target intellectuals who are seen as political opponents. Speaking English often marks someone as being more educated and as someone speaking to international audiences. The inability to respond to a disaster makes a government look weak and this especially exposes an authoritarian leader who leads by projecting strength. For example, in 2013 when Typhoon Haiyan hit the Philippines, a small number of English-speaking people there who were critical of the response were seen as in danger of reprisals from the government. So, we decided it was unethical to ever release English-language social media data as part of disaster response datasets.

My PhD showed how hard it is to adapt English social media communications to other domains in disasters, which was also summarized for the international development community in a paper coauthored by my PhD advisor, Christopher Manning. So, there is no excuse for ignoring this and conducting English-only research.

If the research is not for disaster responders, but aimed at supporting related professionals, then there is a stronger argument to be made. For example, there is interesting research in using machine learning to help mental-health professionals understand social reactions to disasters on online forums.

9. Ignore anything relying on research from “ISCRAM”

Every scientific subfield has a venue where rejected papers get accepted by a pseudo-anonymized set of researchers accepting each other’s work. For machine learning work applied to disaster response, this is “Information Systems for Crisis Response and Management” (ISCRAM, pronounced “I-SCAM” by those of us who actually work in disaster response).

In 2013 I wrote about “The Top NLP Conferences”, using disaster response research as an example.

In that article I noted that ISCRAM publishes junk science articles that have been rejected by mainstream NLP conferences. Every paper I have published on actual disaster response efforts has been plagiarized a year or so later in ISCRAM as a “simulation” by an authoritarian regime’s paper-mill, as part of their attempts to whitewash systems that are actually built for use cases that violate human rights.

Papers that rely on ISCRAM-published research alone cannot be trusted and should not be accepted to mainstream scientific venues.

10. Apply the “Do No Harm” principle when evaluating impact

The “Do No Harm” principle from medicine and disaster response circles should be applied to evaluating machine learning research. Organizations wouldn’t deploy a vaccine that killed 50% as many people as it saved, if the people killed wouldn’t have otherwise died. The same is true for most use cases with personal data, even if the data is already public: could the research be used to harm people who would not otherwise be harmed?

Machine learning researchers shouldn’t be making the case that there is a net benefit to their research, which they are probably not qualified to evaluate, in any case. If there is a clear negative use case from the research that would negatively impact people who would not otherwise be harmed, then that paper should be rejected on ethical grounds.

How should I get started on researching machine learning for disaster response?

If you want to help in an ongoing disaster and you don’t have experience, remember that people will have the least time to train you during a disaster. Don’t be surprised if your most valuable skill is in data cleaning or other skills that won’t result in research papers. If you turned up to a hospital to help without any medical training, you shouldn’t complain if they put a mop and bucket in your hands. The same applies to disaster response. See my recent KDNuggets article for more information on how you can help and what to avoid: 5 Ways Data Scientists Can Help Respond to COVID-19 and 5 Actions to Avoid.

If you don’t have experience and want to work on something that can be published at a scientific venue, consider use cases like supporting low resource languages and information extraction from semi-structured text. These use cases can also help with other areas of impact, like healthcare and the environment. So, there is great potential impact with much less chance of inadvertently causing harm.

Translated from: https://towardsdatascience.com/research-on-machine-learning-for-disaster-response-b65f3e97c018


linux shell 编程

shell的作用 shell是用户和系统内核之间的接口程序shell是命令解释器 shell程序 Shell程序的特点及用途: shell程序可以认为是将shell命令按照控制结构组织到一个文本文件中,批量的交给shell去执行 不同的shell解释器使用不同的shell命令语法 shell…

真实感人故事_您的数据可以告诉您真实故事吗?

真实感人故事Many are passionate about Data Analytics. Many love matplotlib and Seaborn. Many enjoy designing and working on Classifiers. We are quick to grab a data set and launch Jupyter Notebook, import pandas and NumPy and get to work. But wait a minute…

转:防止跨站攻击,安全过滤

转:http://blog.csdn.net/zpf0918/article/details/43952511 Spring MVC防御CSRF、XSS和SQL注入攻击 本文说一下SpringMVC如何防御CSRF(Cross-site request forgery跨站请求伪造)和XSS(Cross site script跨站脚本攻击)。 说说CSRF 对CSRF来说,其实Spring…

Linux c编程

c语言标准 ANSI CPOSIX(提高UNIX程序可移植性)SVID(POSIX的扩展超集)XPG(X/Open可移植性指南)GNU C(唯一能编译Linux内核的编译器) gcc 简介 名称: GNU project C an…

k均值算法 二分k均值算法_使用K均值对加勒比珊瑚礁进行分类

k均值算法 二分k均值算法Have you ever seen a Caribbean reef? Well if you haven’t, prepare yourself.您见过加勒比礁吗? 好吧,如果没有,请做好准备。 Today, we will be answering a question that, at face value, appears quite sim…

新建VUX项目

使用Vue-cli安装Vux2 特别注意配置vux-loader。来自为知笔记(Wiz)

衡量试卷难度信度_我们可以通过数字来衡量语言难度吗?

衡量试卷难度信度Without a doubt, the world is “growing smaller” in terms of our access to people and content from other countries and cultures. Even the COVID-19 pandemic, which has curtailed international travel, has led to increasing virtual interactio…

Linux 题目总结

守护进程的工作就是打开一个端口,并且等待(Listen)进入连接。 如果客户端发起一个连接请求,守护进程就创建(Fork)一个子进程响应这个连接,而主进程继续监听其他的服务请求。 xinetd能够同时监听…

《精通Spring4.X企业应用开发实战》读后感第二章

一、配置Maven\tomcat https://www.cnblogs.com/Miracle-Maker/articles/6476687.html https://www.cnblogs.com/Knowledge-has-no-limit/p/7240585.html 二、创建数据库表 DROP DATABASE IF EXISTS sampledb; CREATE DATABASE sampledb DEFAULT CHARACTER SET utf8; USE sampl…

视图可视化 后台_如何在单视图中可视化复杂的多层主题

视图可视化 后台Sometimes a dataset can tell many stories. Trying to show them all in a single visualization is great, but can be too much of a good thing. How do you avoid information overload without oversimplification?有时数据集可以讲述许多故事。 试图在…