How Data Visualization in VR Can Revolutionize Science

Astronomy has become a big data discipline, and the ever-growing databases of modern astronomy pose many new challenges for analysts. Scientists increasingly turn to artificial intelligence and machine learning algorithms to analyze multidimensional data sets. However, the challenge is not only methodological and technical: it is also a visual one! Data visualization is driving discovery in astronomy and also helps communicate new findings to the general public. The history of information graphics shows how vital the transformation of data into knowledge is for understanding the data at hand, a subject I have previously written about here.

The problem of visualizing complex data and exploring it interactively is by no means new or limited to research. Examples from digital information design in bioinformatics and medicine (e.g. Genome Valence by Ben Fry or Meviatis by Ricarda Schuhmann) show how visualization can support the understanding of structures within data sets and facilitate exploration. The representation of the data’s dimensions (i.e. its parameter values) can result in dynamic and aesthetic data sculptures. Such visualizations are often quite beautiful in themselves but, crucially, their interactive features enable users to quickly make comparisons and interpret the data.

Today’s digital media allows us to go beyond designing interactive on-screen three-dimensional applications. Both augmented reality (AR) and virtual reality (VR) make it possible for users to take a fresh look at their data and explore parameter spaces in 3D. There is so much potential for using these technologies in the field of information design. For VR, the advantages are obvious:

  • More space! VR offers a larger field of view than 2D images. This allows for multiple views to be arranged in space, making it easier to draw cross-references and connections.

  • More dimensions! Compared to 2D graphics, VR visualizations offer additional parameters that can represent data (e.g. sound, haptics, lighting, interaction).

  • More structure! The perception of space and depth is more intuitive, enabling shapes and volumes to be recognized more quickly.

  • More fun! Immersing yourself in the data and moving from overview to detail by scaling the space makes for a powerful experience.

Understanding the nature of the unknown

Inspired by the above research examples, the hypothesis I chose to explore for my bachelor's thesis in Information Design was:

The presentation of scientific data with new digital media, especially VR, offers great potential for data analysis in science.

I wanted to test this hypothesis on a data set from my previous research which I had been struggling to get an overview of. During my PhD in Astrophysics, I was involved in the EXTraS project, which aimed to automatically classify unknown and newly discovered X-ray sources in the cosmos. The sources were observed by the X-ray satellite XMM-Newton from the European Space Agency (ESA). I set about designing the Virtual Data Cosmos as a way of grouping data with similar properties and visualizing these groups.

As more and more data is collected by X-ray satellites, the data archives of these satellites grow every year. The records detail millions of sources that emit X-rays, and any newly found source could yield new physical discoveries. The classification of unknown sources is therefore hugely important in modern astronomy and, due to the sheer amount of data, intelligent algorithms are increasingly being adopted by astronomers worldwide.

The image below shows the entire sky at optical wavelengths as seen from Earth. This projection can be seen as analogous to a world map in which the galactic plane lies on the equator and the galactic center is in the center of the map. Just as in a normal world map, there are longitudes and latitudes, shown as white grid lines. This is typically referred to as a sky map. Laid over the optical image are white dots; each represents a region observed by the X-ray satellite XMM-Newton. Each white dot includes several unknown X-ray sources. The objective of the project was to classify each of these sources.

An optical sky map of the universe (Source: ESA) adapted to show the positions of unknown X-ray sources

In order to understand the nature of each X-ray source, astronomers compare its features (specifically the energetic and temporal properties observed) to those of objects with known classification types, such as binary stars or Seyfert galaxies. Questions like these help:

  • What are the correlations between the properties of the X-ray source and those of a known classification type?

  • Where are the differences?

  • Has the unknown object been discovered elsewhere in the electromagnetic spectrum which could yield further hints on its nature?

In order to describe the similarity between an unknown and a known X-ray source we astronomers use statistics as well as visualization. In this case, machine learning algorithms (supervised decision tree algorithms to be precise) automatically characterized every source in this large and complex data set by comparing their precise parameter values (e.g. observed X-ray intensity) with those of known objects. Ultimately, the algorithms calculate the probability of an X-ray source belonging to various classification types and allocate it to the class that is most likely.

For example: The X-ray source with ID 1 has a 45% probability of being a single star, a 30% probability of being a binary star and a 0.01% probability of being a galaxy. The algorithm therefore assigns the class with the highest probability as the final classification of the unknown source. In this case, source ID 1 would be classified as single star.
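
The final assignment step described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the dictionary layout are assumptions made for this example, and the probabilities are the ones quoted for source ID 1.

```python
def classify(probabilities):
    """Return the class label with the highest probability.

    `probabilities` maps a class label to the probability that the
    source belongs to that class (hypothetical data layout).
    """
    return max(probabilities, key=probabilities.get)


# Probabilities for source ID 1, as quoted in the text (0.01% = 0.0001).
source_1 = {"single star": 0.45, "binary star": 0.30, "galaxy": 0.0001}

print(classify(source_1))  # prints "single star"
```

The listed values do not sum to one because the remaining probability is spread over classes not mentioned in the example.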

Once the algorithm has classified all unknown sources in this way, the task of the astronomer is to carefully screen and check the results. How did the algorithm perform? Did it make mistakes? Since more than one algorithm was tested, one needs to compare the results of each to answer these questions. Did different algorithms classify the same unknown source into different classes? As a scientist, one also wants to know why an algorithm classified an object as it did. The astronomer needs to understand the relationship between different parameters and source classification types, and does this with the help of visualization.
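
One of these checks, finding sources that different algorithms placed in different classes, could look roughly like the sketch below. The algorithm names, source IDs and labels are invented for illustration and do not come from the EXTraS project.

```python
def find_disagreements(results):
    """Return the sources on which the algorithms disagree.

    `results` maps an algorithm name to a dict {source_id: class label}
    (hypothetical layout). Only sources classified by every algorithm
    are compared. Expects at least one algorithm.
    """
    shared_ids = set.intersection(*(set(labels) for labels in results.values()))
    disagreements = {}
    for source_id in sorted(shared_ids):
        per_algorithm = {name: labels[source_id] for name, labels in results.items()}
        if len(set(per_algorithm.values())) > 1:
            disagreements[source_id] = per_algorithm
    return disagreements


# Invented example results for two algorithms.
results = {
    "algorithm_a": {1: "single star", 2: "galaxy", 3: "binary star"},
    "algorithm_b": {1: "single star", 2: "binary star", 3: "binary star"},
}

print(find_disagreements(results))
# prints {2: {'algorithm_a': 'galaxy', 'algorithm_b': 'binary star'}}
```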

The limitations of traditional science viz

A typical method is to create multiple scatterplots in which the X-ray properties of unknown cosmic sources are compared with each other while taking into account the results of a single algorithm. This is done by assigning a unique color and symbol to a specific source classification and depicting X-ray sources with specific class symbols in the plot. We astronomers can then analyze whether the positions of sources depicted with the same symbol form patterns that help to distinguish different classification types.

Typical scatterplots used in astronomy to explore data set dimensions. Classification types (e.g. stars, galaxies) are coded by color and symbol.

For example: these scatterplots were created to investigate the relationships between parameter HR1 and parameters HR2, HR3, and HR4. The parameters are abstract properties used to describe specific radiation energies of the cosmic sources and visualizing them in the abstract plane enables us to look for patterns that may characterize the properties of different objects. The data points represent all unknown cosmic sources observed by the satellite.
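
The article does not define HR1 to HR4. In X-ray source catalogues these names usually denote hardness ratios, that is, normalized differences between the count rates measured in two adjacent energy bands. Assuming that is what is meant here, the following sketch shows how four such values arise from five bands. The function names and the count rates are made up for illustration.

```python
def hardness_ratio(soft_rate, hard_rate):
    """Normalized difference between two count rates, in the range [-1, 1]."""
    total = soft_rate + hard_rate
    if total == 0:
        raise ValueError("both count rates are zero; the ratio is undefined")
    return (hard_rate - soft_rate) / total


def hardness_ratios(band_rates):
    """Compute one ratio per pair of adjacent energy bands."""
    return [hardness_ratio(soft, hard) for soft, hard in zip(band_rates, band_rates[1:])]


# Made-up count rates in five energy bands, ordered from soft to hard.
rates = [2.0, 6.0, 3.0, 1.0, 1.0]

hr1, hr2, hr3, hr4 = hardness_ratios(rates)
print(hr1, hr4)  # prints 0.5 0.0
```

A value near +1 means the source is much brighter in the harder band, a value near -1 that it is much brighter in the softer band.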

In this case, green triangles represent the class of Seyfert galaxies, while purple squares depict the class of single variable stars that exist within our Milky Way. We see that the sources overlap if we only look at the HR1 parameter, but they occupy very different regions in the HR1-HR2 plane in the first scatterplot. Hence from that plot we can conclude that sources with low HR1 and HR2 values belong to the purple square (variable star) class.

But what about sources with high HR1 and HR2 values? Comparing only these parameters would put them in the galaxy (green) class. But many other classes also occupy this region, e.g. the blue triangles, which represent a kind of binary star system, and this confuses the picture. To get a clearer understanding, we need to compare the HR1-HR2 parameter plane with the other scatterplots. If we look at the second image, which illustrates the HR1-HR3 plane, we see that the sources shown with green and blue symbols are slightly more separated. By combining the information from the first and second plots, we can identify the specific combinations of HR1, HR2 and HR3 values that differentiate variable stars (purple), galaxies (green) and binary star systems (blue).

With each additional scatterplot we gradually form a mental model of a multidimensional parameter space in which each source class is located in a unique location. In principle this is what the algorithms do and is why our parameters are also known as the ‘dimensions’ of a data set. However, the larger the number of parameters and classes, the more difficult it is for humans to keep an overview of all relationships. It is simply not possible for us to imagine more than three dimensions at once.

In our sample, the size of the data set and the fact that there were more than 50 parameters made it impossible to get an overview of all the relationships between parameter values and source classifications. Far too many scatterplots were required and, due to the size of the data set, many regions were occupied by multiple source classes. The overlap of their symbols made it very difficult to see the data patterns.

In addition, these plots correspond to the classification by a single algorithm. So as we increase the number of algorithms in use, the number of plots would quickly become unmanageable. I concluded that this traditional 2D visualization did not allow a proper overview of the data, and was frustrated that the decision-making mechanisms of the algorithm remained opaque.
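
A quick calculation shows how fast the number of plots grows. Comparing every parameter with every other parameter takes one scatterplot per pair, and each algorithm needs its own set. The sketch below uses 50 parameters as a lower bound taken from the text; the function name and the choice of three algorithms are assumptions for illustration only.

```python
from math import comb


def scatterplot_count(n_parameters, n_algorithms=1):
    """Number of pairwise scatterplots needed for all parameter pairs."""
    return comb(n_parameters, 2) * n_algorithms


print(scatterplot_count(50))     # prints 1225
print(scatterplot_count(50, 3))  # prints 3675
```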

Designing the Virtual Data Cosmos

Visualizing the data directly

To come up with a new way to visualize this big data set, I first did some research on the history and principles of data visualization. I was fascinated by the creativity with which designers and scientists mapped their data.

Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency.

Edward Tufte coined the term ‘graphic excellence’ in data visualization. He postulated various properties that statistical graphics require to be successful. His theory was that data should be displayed directly without the user being distracted by the design itself. Furthermore, statistical graphics should serve a clear purpose (either description, exploration, tabulation or decoration) and should show several levels of detail, from a rough overview to the fine structure of the data.

Similar claims were made by a 2015 study on the visualization of big data in VR and AR. The authors concluded that for a data visualization to serve as an analysis tool, it requires the data concerned to be represented exactly. The implication for my work was that the data mapping had to be done through coding. This meant that the data values themselves would define the visual aesthetic of the virtual environment.

In addition, the interaction and scalability in a VR scene would allow the user to be fully immersed in the data and literally dive into it. One could easily move around and take different perspectives on the data set. Similarly, the user would be able to zoom out and get an overview, effectively holding the data in their hands. The data set could even be turned around and explored as though it were a physical object.

This, for me, was the most important aspect of the VR approach: it combined the advantage of data physicalization with the possibility to shape and manipulate the data environment, which is not possible in the real world.

A sketch illustrating two immersive moments in VR: holding the data in your hands versus diving into the data

Regardless of how the X-ray source data was organized, my principal idea was to pull the cluster of X-ray parameters and probabilities apart and display them in three-dimensional space. The goal was an interactive data visualization in VR in which the data could be explored directly. By interacting with a concrete virtual environment, anyone could explore this abstract data space.

My solution to the problem was the Virtual Data Cosmos. I’ll talk you through the design concept here. A detailed description of the design process will follow in the next article in this series.

Applying the design concept

I wanted to ensure that the visualization would first give users an overview of the data and only then allow them to go into the detail. By zooming in on their chosen classification type, they would finally reach the DNA of the X-ray source (i.e., they would find details of its spectral parameters) and therefore understand why the algorithm assigned the source to a certain class.

The VR experience consists of two spaces; users can choose to zoom in and out to seamlessly move from one space to the other:

  • The class room represents the entire cosmos and includes all data points, grouped according to their classification by the algorithms.

  • The parameter space represents the observed parameter values of a user-selected subsample of the X-ray sources, and their classification by a selected algorithm.

The starting point was to create the ‘class room’, within which each classification type has its own three-dimensional volume. The class room visualizes the classification results of the X-ray sources by the various algorithms and allows users to explore the probability distributions within the database. It prompts questions such as:

  • How did an algorithm classify the unknown X-ray sources?

  • What is the probability of a source belonging to that source class?

  • What could be an alternative classification?

A sketch of the VR concept showing the class room and parameter space, and how data points serve as a portal between the two spaces.

Visualizing the complete data set in the class room was a very exciting moment! For the first time since the start of the EXTraS project, we were able to clearly visualize more than 500,000 data points without compromise, and compare the results of various algorithms all at once. I felt that I finally got a clear overview of the results and could easily see the distribution of all classified X-ray sources.

Here are some screenshots from the VR class room:

Overview of the classification results in the class room.

Zooming in on the details of the data set in the class room.

The next step was to understand how an algorithm distinguished between different classes. By zooming in and comparing the features of various selected X-ray sources one enters the parameter space. There is a lot to view here, and again we faced the problem of how to visualize all parameter dimensions at once.

The desire to pull the data points apart eventually led to the final approach: to let each source perform a ‘walk’ through space, with every source starting from the same point. The parameter values of a source define the direction and length of each step. With this mapping, each source produces a unique path (or trace) in space, and objects with similar properties end up in similar locations in the virtual cosmos.

For example, the following image shows the possible walks of three sources belonging to different classes. This one image allows us to draw the same conclusions that we drew from comparing the three scatterplots above.

Example of parameter walks for three different source classes.

In this sketch, four steps are defined based on the values of the parameters HR1, HR2, HR3, and HR4. Their values mainly define the direction of the step, while the step length is defined by the selected algorithm.
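
To make the idea of a parameter walk concrete, here is a minimal sketch. The actual mapping used in the Virtual Data Cosmos is not given in this article, so the angle formulas below are purely hypothetical: each parameter value, assumed to lie between -1 and 1, is turned into an azimuth and an elevation, and every step has the same length `step_length`, standing in for the length chosen by the selected algorithm.

```python
import math


def parameter_walk(values, step_length=1.0):
    """Return the 3D points visited by a walk that starts at the origin.

    Each parameter value sets the direction of one step. The mapping from
    value to angles is a hypothetical choice made for this illustration.
    """
    x = y = z = 0.0
    path = [(x, y, z)]
    for value in values:
        azimuth = math.pi * value            # hypothetical mapping
        elevation = (math.pi / 4) * value    # hypothetical mapping
        x += step_length * math.cos(elevation) * math.cos(azimuth)
        y += step_length * math.cos(elevation) * math.sin(azimuth)
        z += step_length * math.sin(elevation)
        path.append((x, y, z))
    return path


# Invented HR1..HR4 values: two similar sources and one very different one.
source_a = [0.10, 0.20, -0.10, 0.00]
source_b = [0.12, 0.18, -0.10, 0.02]
source_c = [-0.80, -0.70, 0.90, 0.60]

end_a = parameter_walk(source_a)[-1]
end_b = parameter_walk(source_b)[-1]
end_c = parameter_walk(source_c)[-1]

# Similar parameter values lead to nearby end points.
print(math.dist(end_a, end_b) < math.dist(end_a, end_c))  # prints True
```

Because every source starts from the same point, the end point and the shape of the path depend only on the parameter values, which is why similar sources end up in similar regions.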

We see that the HR1 and HR2 steps already help us to separate variable stars from galaxies or binary star systems. The additional parameters then help to differentiate between the latter two classes.

We can see how an algorithm classified an object by the color of the object’s path. More detailed information on the data mapping will be given in a subsequent article.

This is a screenshot of the VR parameter space for a large number of sources that were classified into three different classes (named CV, BL and STAR):

Exploring the parameter space

In the image above, there are three classes: variable stars (blue), a very active kind of elliptical galaxy (light green) and normal stars (dark green). We can see that sources whose parameters generated a similar path have been assigned to the same class. We can also see situations where the parameter values caused the path to take on a strange shape, which confused the algorithm.

This representation yielded a much better understanding of why a machine-learning algorithm classified a source in a certain way and made clear why it failed to characterize other sources when their paths overlapped.

Summary

Creating the Virtual Data Cosmos convinced me not only of my hypothesis that VR offers great potential for data analysis in science, but also that the pure presentation of big data can create interesting and aesthetic virtual spaces when these are determined by the specific parameters of the data. This generative approach implies that by exploring the virtual world, users can actually examine an abstract parameter space that is not necessarily visual in nature. Because users can interact with the virtual elements, the visualization becomes an extremely useful tool.

The scalability in VR is just one advantage over traditional science viz methods. Additionally, the immersive data visualization is fun to work with. It encourages one to focus longer on the data and have a more complete sense of what information might otherwise be hidden.

There is of course plenty more to be explored in this area. Once I was free from using conventional methods to represent the data, designing the parameter space using the radiation properties of the sources raised many new questions for me. How could the parameters be separated more precisely? Are there better representations that would allow the parameter correlations to be analyzed even more clearly? I’ll talk more about how I improved upon the first version by manipulating the parameters in the next article in this series.

The example of the Virtual Data Cosmos illustrates how applying principles of data visualization in VR can support the sciences by enabling the creation of mental models for multidimensional data. This project shows just how thinking outside the box and coming up with new ways to visualize big data opens many exciting possibilities for science.

I hope I was able to inspire you to create your own VR data visualization experience. A walk-through of the VR experience I created is available at http://annok.de/vdc-2/

During my years in astronomy, data visualization has been a fundamental part of my research. Toward the end of my PhD, I encountered a challenge quite common in modern astronomy: understanding and visualizing the information in a big data set. Since I was also studying information design at the University of Applied Sciences, I started to explore data visualization and how it could be a tool for processing multidimensional data in science or industry. In this series of articles I will describe my adventure, which eventually led to the development of the Virtual Data Cosmos.

Source: https://medium.com/nightingale/how-data-visualization-in-vr-can-revolutionize-science-aece026a2207
