COVID-19 Research Assistant

Scientists, researchers, doctors, and other medical professionals currently face challenges in finding answers to their high-priority scientific questions.

The rapid growth of new coronavirus literature makes it difficult for the medical research community to keep up. There is therefore a growing urgency for approaches in Natural Language Processing and AI that help medical professionals generate new insights in support of the ongoing fight against this infectious disease.

Objective:

We aim to help medical professionals accelerate their work in the fight against COVID-19. Giving them access to a wider range of research resources, all in one place, should reduce the time they spend searching.

Datasets challenge:

Kaggle provides freely accessible datasets as part of the COVID-19 Open Research Dataset (CORD-19) challenge.

Open Research Dataset Challenge (CORD-19)

The CORD-19 resource offers more than 158,000 scholarly articles, including over 75,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

We found these datasets well suited to indexing with the Watson Discovery AI search engine.

Watson Discovery is a search tool powered by machine learning which can continue to learn and improve over time.

Of the 158,000 scholarly articles provided, we have so far prepared only the “comm_use_subset”, which contains 9,120 articles, to feed into Watson Discovery.

Solution:

We are looking into building a smart conversational AI chatbot assistant to answer users' high-priority scientific questions.

Step 1: Data analysis: extract the text-only content from the JSON files:

We extract the full text of each article from the JSON files and save the results as TXT or HTML.

Watson Discovery limits each document to 50k characters, but the datasets are provided as JSON files, all of which exceed 50k characters because of the JSON markup. We therefore used the simple Python script below to extract the full-text article from each JSON file and save the result as TXT or HTML.

In our case, we saved the results in HTML format because WD does not support the TXT format. WD only supports document formats such as PDF, Word, Excel, PowerPoint, HTML, PNG, JPEG, and JSON.
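Since the output is saved with an .html extension, one option is to wrap the extracted text in minimal HTML markup and escape special characters. The following is only a sketch of that idea: the helper name to_minimal_html and the choice of tags are assumptions for illustration and are not part of the original script, which writes the plain text as-is.

```python
import html


def to_minimal_html(title, sections):
    """Wrap extracted text in minimal HTML (illustrative helper, not from the article).

    `sections` is a list of (heading, text) pairs in output order.
    """
    safe_title = html.escape(title)
    parts = ['<html><head><title>{}</title></head><body>'.format(safe_title)]
    parts.append('<h1>{}</h1>'.format(safe_title))
    for heading, text in sections:
        if heading:
            parts.append('<h2>{}</h2>'.format(html.escape(heading)))
        # One <p> per paragraph; paragraphs are separated by newlines in the text.
        for paragraph in text.split('\n'):
            if paragraph.strip():
                parts.append('<p>{}</p>'.format(html.escape(paragraph.strip())))
    parts.append('</body></html>')
    return '\n'.join(parts)


page = to_minimal_html('A <test> title',
                       [('Abstract', 'First paragraph.\nSecond & last.')])
print(page)
```

Escaping matters here because article text can contain characters such as < and & that would otherwise be read as markup.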

The script below writes each output file in the following order:

  • title
  • abstract
  • full-text article

This step gave us clean data formatted as text documents, which makes it easy to keep track of the number of characters in each file.
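To illustrate how the character count could be managed, here is a minimal sketch that splits a cleaned article into chunks that each stay within the 50k-character limit mentioned above. The splitting strategy (break at newlines, hard-cut only as a last resort) and the names MAX_CHARS and split_by_paragraph are assumptions; the article does not say how oversized documents were handled.

```python
MAX_CHARS = 50_000  # per-document limit quoted in the text above


def split_by_paragraph(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, breaking at newlines.

    A single paragraph longer than `limit` is hard-cut as a last resort.
    """
    chunks, current = [], ''
    for paragraph in text.split('\n'):
        # Hard-cut paragraphs that could never fit into one chunk.
        while len(paragraph) > limit:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        candidate = paragraph if not current else current + '\n' + paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


# Small limit used only to make the behaviour easy to see.
print(split_by_paragraph('aaa\nbbb\nccc', limit=7))
```

Each chunk could then be written to its own output file, for example with a numeric suffix in the file name.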

Python script:

import glob
import json
import os

# Directory containing the article JSON files:
articles_dir = 'D:/DEV/Kaggle/CORD-19-research-challenge/comm_use_subset/comm_use_subset'
# Output directory for the processed files:
output_dir = 'D:/DEV/Kaggle/CORD-19-research-challenge/output'

os.chdir(articles_dir)
for file in glob.glob('*.json'):
    print('Processing file: ', file)
    with open(file, 'r', encoding='utf8') as article_file:
        article = json.load(article_file)
    title = article['metadata']['title']
    abstract_sections = []
    abstract_texts = dict()

    body_sections = []
    body_texts = dict()

    # Read the abstract, merging paragraphs that share a section name
    for abst in article['abstract']:
        abst_section = abst['section']
        abst_text = abst['text']
        if abst_section not in abstract_sections:
            abstract_sections.append(abst_section)
            abstract_texts[abst_section] = abst_text
        else:
            abstract_texts[abst_section] = abstract_texts[abst_section] + '\n' + abst_text

    # Read the body, merging paragraphs that share a section name
    for body in article['body_text']:
        body_section = body['section']
        body_text = body['text']
        if body_section not in body_sections:
            body_sections.append(body_section)
            body_texts[body_section] = body_text
        else:
            body_texts[body_section] = body_texts[body_section] + '\n' + body_text

    # Write title, abstract sections, and body sections, in that order
    out_path = output_dir + '/clean.' + file.replace('.json', '.html')
    with open(out_path, 'w', encoding='utf8') as out_file:
        out_file.write(title)
        out_file.write('\n\n')
        # Write the abstract sections
        for a_section in abstract_sections:
            out_file.write('\n\n')
            out_file.write(a_section)
            out_file.write('\n')
            out_file.write(abstract_texts[a_section])
        # Write the body sections
        for b_section in body_sections:
            out_file.write('\n\n')
            out_file.write(b_section)
            out_file.write('\n')
            out_file.write(body_texts[b_section])
        out_file.write('\n')

Step 2: Feed Watson Discovery:

Create your free IBM Cloud account: https://ibm.biz/BdqbAU

Using the Watson Discovery AI search engine, we ingested the documents, wrote training queries, and rated the results so that WD's machine learning could learn from them.

Rate the most relevant article for an example question that a researcher might ask.

This task required a lot of reading and understanding of academic and scientific articles. We have built around 100 queries so far.
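As a rough illustration of what one rated training query might look like as data, the sketch below builds a record that pairs a question with relevance-rated documents. The field names (natural_language_query, examples, document_id, relevance) follow the shape of the Watson Discovery v1 training-data API as we understand it and should be checked against the official documentation; the question, the document IDs, and the rating scale are hypothetical.

```python
import json


def build_training_query(question, rated_documents):
    """Build one training-query record from a question and {document_id: relevance}."""
    if not question.strip():
        raise ValueError('question must not be empty')
    # Most relevant documents first, purely for readability.
    ranked = sorted(rated_documents.items(), key=lambda item: -item[1])
    examples = [
        {'document_id': doc_id, 'relevance': int(score)}
        for doc_id, score in ranked
    ]
    return {'natural_language_query': question, 'examples': examples}


record = build_training_query(
    'What is known about the incubation period?',  # hypothetical question
    {'doc-b': 0, 'doc-a': 10},                     # hypothetical IDs and ratings
)
print(json.dumps(record, indent=2))
```

Keeping the queries in a plain structure like this makes it easier to review and version them before they are sent to the service.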

Expected questions from users, with the most relevant answers.

Step 4: Integrate Watson Assistant with Watson Discovery:

Watson Assistant is a conversational AI platform that helps you provide customers with fast, straightforward, and accurate answers to their questions, across any application, device, or channel.

Calling Watson Assistant from JavaScript for the server connection:

const AssistantV1 = require('ibm-watson/assistant/v1');
const { getAuthenticatorFromEnvironment } = require('ibm-watson/auth');

// Need to manually set url and disableSslVerification to get around
// a current Cloud Pak for Data SDK issue IF the user uses the
// `CONVERSATION_` prefix in the run-time environment.
let auth;
let url;
let disableSSL = false;

try {
  // ASSISTANT should be used
  auth = getAuthenticatorFromEnvironment('ASSISTANT');
  url = process.env.ASSISTANT_URL;
  if (process.env.ASSISTANT_DISABLE_SSL === 'true') {
    disableSSL = true;
  }
} catch (e) {
  // but handle the case where the alternate CONVERSATION prefix is used
  auth = getAuthenticatorFromEnvironment('CONVERSATION');
  url = process.env.CONVERSATION_URL;
  if (process.env.CONVERSATION_DISABLE_SSL === 'true') {
    disableSSL = true;
  }
}
console.log('Assistant auth:', JSON.stringify(auth, null, 2));

const assistant = new AssistantV1({
  version: '2020-03-01',
  authenticator: auth,
  url: url,
  disableSslVerification: disableSSL
});

// The SDK uses workspaceId, but the Assistant tooling refers to this value as the SKILL ID.
assistant.workspaceId = process.env.ASSISTANT_SKILL_ID;

module.exports = assistant;
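The prefix fallback in the code above can also be sketched in Python to make the logic explicit. This is a simplification: it decides based on whether a URL variable is set, whereas the JavaScript code falls back when the authenticator lookup throws. The function name assistant_settings and the returned dictionary layout are assumptions for illustration only.

```python
def assistant_settings(env):
    """Return connection settings, preferring ASSISTANT_* over CONVERSATION_* keys.

    `env` is any mapping of environment variable names to values,
    for example os.environ.
    """
    for prefix in ('ASSISTANT', 'CONVERSATION'):
        url = env.get(prefix + '_URL')
        if url:
            return {
                'prefix': prefix,
                'url': url,
                'disable_ssl': env.get(prefix + '_DISABLE_SSL') == 'true',
            }
    raise KeyError('neither ASSISTANT_URL nor CONVERSATION_URL is set')


# Example with a plain dict standing in for the real environment.
settings = assistant_settings({
    'CONVERSATION_URL': 'https://example.invalid/assistant',
    'CONVERSATION_DISABLE_SSL': 'true',
})
print(settings)
```

Passing the environment in as a mapping keeps the function easy to test without touching real environment variables.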

Step 5: Test the app

The methodology is as follows:

  • The user interacts with Watson Assistant.
  • Watson Assistant invokes Watson Discovery.
  • Watson Discovery finds the best results for the query and returns them to the Assistant.
  • Watson Assistant displays the results to the user.
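The flow above can be sketched with two stand-in functions. This is a toy model only: the in-memory DOCUMENTS dictionary, the word-overlap ranking, and the function names are assumptions that replace the real Watson Discovery and Watson Assistant services so the request path can be followed end to end.

```python
import re

# Tiny in-memory stand-in for the Discovery collection (hypothetical documents).
DOCUMENTS = {
    'doc-1': 'Incubation period estimates for the novel coronavirus',
    'doc-2': 'Transmission of coronaviruses on surfaces',
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r'[a-z0-9]+', text.lower()))


def discovery_search(query, documents=DOCUMENTS):
    """Stand-in for Discovery: rank documents by number of shared words."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(text)), doc_id)
              for doc_id, text in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


def assistant_reply(user_message):
    """Stand-in for the Assistant: forward the question, then format the answer."""
    hits = discovery_search(user_message)
    if not hits:
        return 'Sorry, I could not find a relevant article.'
    return 'Most relevant article: ' + DOCUMENTS[hits[0]]


print(assistant_reply('What is the incubation period?'))
```

In the real system the ranking comes from Discovery's trained relevance model rather than word overlap, but the division of responsibilities is the same.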

Finally, we integrated Watson Assistant with Watson Discovery, configured the front-end app to use Watson Assistant, and deployed it on IBM Cloud. The app is running live, and we plan to keep it up for a while.

Live demo: https://covid19assistantcfc.mybluemix.net/

We are still generating real crisis questions from the abstracts and articles, so that we can keep training Discovery and rating the best answers for the bot.

Conclusion:

To conclude, a conversational AI chatbot like this one can help scientists and doctors in the research community save time and accelerate their work in the fight against COVID-19.

GitHub Repository for this project:

[1] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

[2] https://www.semanticscholar.org/cord19

[3] https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/historical_releases.html

[4] https://www.statnews.com/2020/03/16/database-launched-to-spur-ai-tools-to-fight-coronavirus/

[5] https://github.com/Call-for-Code/Solution-Starter-Kit-Communication-2020#the-idea

Source: https://medium.com/swlh/covid-19-research-assistant-using-ai-watson-discovery-to-analyze-open-research-dataset-by-kaggle-9807cf467626
