If You Work in “Small Science,” Are You Leveraging Data Repositories?


If you’re a scientist, especially one who conducts much of your research alone, you probably have more than one spreadsheet of important data that you just haven’t gotten around to writing up yet. Maybe you never will. Sitting idle on a hard drive, that “dark data” could prove very useful to someone in the future (or even someone in the present), especially as our climate and society change.

What are you going to do with those files? How are you going to preserve them?

If you’re like me, maybe you’ve felt the terror of losing data every time you moved your files to a new computer or moved your research to a new job. Did you remember to back up that spreadsheet from your brilliant pet project seven years ago? If you did back it up, are you sure you backed up the most recent version? It’s sobering to imagine that other people have gone through this and lost potentially valuable species records, survey data, and field observations.

Digital Data Repositories to the Rescue

In the years before I returned to graduate school, I worked for a science nonprofit on Nantucket Island, Massachusetts, and this problem haunted me all the time. Over nearly a decade there, I accumulated spreadsheets filled with very localized ecological data but had no way to organize, save, or share it. Fortunately, a solution is emerging in the form of digital repositories backed by robust metadata schemes and indexing services. Importantly, some of these repositories are open to everyone, and no university affiliation is required.

In May 2020, Meghan Mitchell, Christopher Tillman Neal, and I launched a digital repository for the Nantucket Biodiversity Initiative (NBI). The repository stores and protects environmental and ecological research data from around Nantucket, with a focus on projects funded by NBI. Visit the Nantucket Biodiversity Digital Repository and browse the files to learn about bat counts, spider surveys, sandplain grassland research, and much more.

A snapping turtle on Nantucket. Over half of Nantucket Island is conservation land, and scientific species inventories date back to the late 1800s. Much of that information would benefit from being published to a repository. Photo: Andrew Mckenna-Foster

We used Zenodo, a free platform that allows anyone to upload research-related files. Zenodo stores the files permanently, makes them searchable on the internet, and even assigns them a digital object identifier (DOI). However, uploading your files to a repository is the easy part of the solution; to make data useful far into the future, it is crucial to follow the core principles of data publishing and sharing. Uploading data with no context makes it one more piece of junk in the vastness of the internet.

Documenting Data Is Difficult but Absolutely Essential

Published data should be FAIR: Findable, Accessible, Interoperable, and Reusable. In practice, this means:

  • Describing the data with a clear summary, useful keywords, and author information (metadata)

  • Using a standard metadata scheme so that the information can be easily shared

  • Uploading the files in an open format (like CSV)

  • Licensing the data so that people and machines understand how it can be used

That is only the bare minimum. While Zenodo and other free repository platforms like figshare and Dataverse simplify this process, it still requires work and planning.
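Part of that work can be checked mechanically before anything is uploaded. The Python sketch below is a hypothetical pre-publication check, not the NBI workflow itself: it assembles a record using field names from Zenodo's deposition API (`upload_type`, `title`, `description`, `creators`, `keywords`, `license`, `access_right`) and flags records that miss the bare minimum described in the FAIR list above. The 50-character description threshold, the `OPEN_FORMATS` set, and the default `cc-by-4.0` license identifier are assumptions for illustration; check Zenodo's current API documentation for the license identifiers it accepts.

```python
from pathlib import PurePath

# Extensions treated as open formats in this sketch (an illustrative assumption, not a standard).
OPEN_FORMATS = {".csv", ".txt", ".json", ".pdf"}
# Arbitrary minimum length for a "solid" description (assumption).
MIN_DESCRIPTION_CHARS = 50


def build_metadata(title, description, creators, keywords, license_id="cc-by-4.0"):
    """Assemble a deposit record using field names from Zenodo's deposition API."""
    return {
        "upload_type": "dataset",
        "title": title,
        "description": description,
        "creators": [{"name": name} for name in creators],  # names as "Family, Given"
        "keywords": list(keywords),
        "license": license_id,
        "access_right": "open",
    }


def fair_problems(metadata, filenames):
    """Return human-readable problems; an empty list means the minimal checks pass."""
    problems = []
    if len(metadata.get("description", "").strip()) < MIN_DESCRIPTION_CHARS:
        problems.append("description is missing or too short")
    if not metadata.get("keywords"):
        problems.append("no keywords")
    if not metadata.get("creators"):
        problems.append("no author information")
    if not metadata.get("license"):
        problems.append("no license")
    # Flag any file whose extension is not in the open-format allow list.
    for name in filenames:
        if PurePath(name).suffix.lower() not in OPEN_FORMATS:
            problems.append(f"{name} is not in an open format")
    return problems
```

For instance, checking a record against the file list `["survey.xlsx"]` would report that the spreadsheet should be exported to CSV before upload. Returning a list of problems, rather than raising on the first one, lets a curator fix everything in a single pass.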

The meat of our project was working with NBI to create a workflow that curates and applies metadata to all reports and datasets before publication. If you want to set up a repository for yourself or your organization, this is where you should focus most of your energy. We built a documentation site on GitHub that describes the process in detail and is free to copy.

So, What Are the Outcomes?

The repository is growing as we curate and upload reports and data going back to 2005. More importantly,

  • NBI now has a permanent, accessible, and shareable library of the research it has supported.

  • Researchers who work on or near Nantucket now have a way to publish their data and reports.

  • People looking for data and information for the area can now browse current and past research. Importantly, they can cite any information they use, giving authors the credit they deserve.

  • I can sleep at night knowing the data I spent years collecting has a permanent home.

Charts showing what types of files have been uploaded to the digital repository
A summary of the repository as of August 2020. We use Zenodo’s API to harvest metadata from the Nantucket Biodiversity Digital Repository for visualization using Python. These charts are only possible because the workflow we designed controls how keywords are assigned.
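The harvesting step mentioned in the caption could look something like the Python sketch below. It is not the authors' script: it assumes that Zenodo's public records endpoint (`https://zenodo.org/api/records`) accepts a `communities` filter plus `page` and `size` paging parameters, and that each search hit carries `metadata.keywords` and `metadata.resource_type.type`, as in Zenodo's JSON responses at the time of writing. Verify these against Zenodo's current REST API documentation. Zenodo may also cap `size` for anonymous requests, so larger collections need to be fetched page by page.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Public records search endpoint; the query parameters used below are assumptions to verify.
ZENODO_RECORDS_URL = "https://zenodo.org/api/records"


def build_query_url(community: str, page: int = 1, size: int = 25) -> str:
    """Build a search URL for one page of records in a Zenodo community."""
    params = {"communities": community, "page": page, "size": size}
    return f"{ZENODO_RECORDS_URL}?{urlencode(params)}"


def fetch_page(community: str, page: int = 1, size: int = 25) -> dict:
    """Download one page of search results and parse the JSON body (needs network access)."""
    with urlopen(build_query_url(community, page, size)) as response:
        return json.load(response)


def tally_records(search_response: dict) -> tuple:
    """Count keywords and resource types across the hits in one search response."""
    keyword_counts = Counter()
    type_counts = Counter()
    for hit in search_response.get("hits", {}).get("hits", []):
        metadata = hit.get("metadata", {})
        # Normalize case and surrounding whitespace so "Spiders" and "spiders " count together.
        keyword_counts.update(k.strip().lower() for k in metadata.get("keywords", []))
        resource_type = metadata.get("resource_type", {}).get("type", "unknown")
        type_counts[resource_type] += 1
    return keyword_counts, type_counts
```

Usage would be along the lines of `keywords, types = tally_records(fetch_page("your-community-id"))`, where the community identifier is a placeholder; the two `Counter` objects can then be passed to a plotting library. The keyword normalization is a defensive extra: as the caption notes, charts like these depend on keywords being assigned consistently in the first place.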

As NBI continues to support research and add files to this repository, publishing the raw data, not just a project report, will be especially important. With that data in hand, researchers in 10, 50, or 100 years will be able to reproduce and directly compare data from species surveys, population surveys, and management regimes.

The Repository Is Already Being Used

The icing on the cake is that since the repository became operational, it has already proven useful: I recently shared a dataset on Nantucket tarantulas with another spider researcher who was looking for a way to cite our observations.

I hope you consider publishing your data whenever possible and choose to follow the FAIR principles. The open science community is growing rapidly and offers numerous resources for anyone to get started. I am always open to questions and collaborations so please contact me if you’re interested in working together.

Source: https://medium.com/swlh/if-you-work-in-small-science-are-you-leveraging-data-repositories-357cabfc2326
