A New Tool for Analysing Neural Network Inference Performance

Measuring the inference time of a trained deep neural model on different hardware devices is a critical task when making deployment decisions. Should you deploy inference on 8 Nvidia V100s, on 12 P100s, or perhaps on 64 CPU cores?

When it comes to inference timing, apples-to-apples comparisons among devices are not rocket science. Nevertheless, the process is time-consuming, prone to errors, and requires expertise to perform correctly.

Fortunately, Deci AI has released a free service that does it for you. The Deci Inference Performance Simulator (DIPS) helps practitioners analyze their inference performance. DIPS measures model throughput, latency, cloud cost, model memory usage, and other important performance metrics. It provides a full analysis of how your model will behave and perform across various production environments, at no cost.

Why is measuring run-time performance painful?

In "How to measure deep learning performance", we provide practical guidelines for inference evaluation that include the following steps: (1) write a latency measurement script; (2) write a script to compute the optimal batch size for inference; (3) write a throughput measurement script; (4) launch several machines on the cloud to run all these scripts; and (5) summarize the obtained metrics and analyze the results.
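
Steps (1) to (3) can be sketched in a few lines of plain Python. This is a minimal illustration, not the scripts from the guidelines and not what DIPS runs internally: `measure_latency`, `measure_throughput`, `best_batch_size`, `toy_model`, and `make_batch` are hypothetical names, and `toy_model` is a stand-in for a real network. On an accelerator you would also have to synchronize the device before reading the clock, and a real batch-size search is bounded by device memory as well as by throughput.

```python
import statistics
import time


def measure_latency(predict, batch, warmup=10, repeats=50):
    """Return (mean, stdev) latency in seconds for one call of predict(batch)."""
    for _ in range(warmup):              # warm-up calls are run but not timed
        predict(batch)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def measure_throughput(predict, batch, **kwargs):
    """Samples processed per second at the given batch size."""
    mean_latency, _ = measure_latency(predict, batch, **kwargs)
    return len(batch) / mean_latency


def best_batch_size(predict, make_batch, candidates=(1, 2, 4, 8, 16, 32), **kwargs):
    """Pick the candidate batch size with the highest measured throughput."""
    results = {n: measure_throughput(predict, make_batch(n), **kwargs) for n in candidates}
    return max(results, key=results.get), results


def toy_model(batch):
    # Hypothetical stand-in for a network: a fixed per-call overhead
    # plus work that grows with the batch size.
    overhead = sum(i * i for i in range(2000))
    return [sum(x) + overhead for x in batch]


def make_batch(n, features=64):
    """Build a dummy batch of n samples with `features` values each."""
    return [[float(i)] * features for i in range(n)]


if __name__ == "__main__":
    mean, std = measure_latency(toy_model, make_batch(1))
    print(f"latency at batch size 1: {mean * 1e3:.3f} ms (std {std * 1e3:.3f} ms)")
    size, table = best_batch_size(toy_model, make_batch)
    print(f"highest throughput at batch size {size}: {table[size]:.0f} samples/s")
```

Steps (4) and (5) then amount to running the same script unchanged on each machine and collecting the printed numbers into one table.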

Performing these steps is not only time-consuming but also highly error-prone. For example, issues may arise when timing on the CPU, when measuring the transfer of data to and from the acceleration device, when measuring preprocessing, and so on.
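
One way to keep these pitfalls apart is to time each stage separately and to synchronize with the accelerator before every clock read. The sketch below is an assumption-laden illustration, not DIPS code: `StageTimer` and `run_once` are hypothetical names, and `sync` is an optional callable that blocks until the device is idle (with PyTorch on CUDA, `torch.cuda.synchronize` plays that role). With the default no-op `sync` it measures plain CPU wall-clock time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self, sync=None):
        # `sync` should block until all queued accelerator work has finished.
        self.sync = sync or (lambda: None)
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        self.sync()                      # do not charge earlier queued work to this stage
        start = time.perf_counter()
        try:
            yield
        finally:
            self.sync()                  # wait for this stage's own work to finish
            self.totals[name] += time.perf_counter() - start


def run_once(timer, raw, preprocess, to_device, model, to_host):
    """Run one inference and charge each step to its own stage."""
    with timer.stage("preprocess"):
        x = preprocess(raw)
    with timer.stage("host_to_device"):
        x = to_device(x)
    with timer.stage("compute"):
        y = model(x)
    with timer.stage("device_to_host"):
        return to_host(y)


if __name__ == "__main__":
    timer = StageTimer()
    identity = lambda v: v               # placeholder for real transfer functions
    for _ in range(100):
        run_once(timer, [1.0, 2.0, 3.0],
                 preprocess=lambda r: [v / 255.0 for v in r],
                 to_device=identity,
                 model=lambda x: [v * 2.0 for v in x],
                 to_host=identity)
    for name, seconds in timer.totals.items():
        print(f"{name}: {seconds * 1e3:.3f} ms total")
```

Reporting the stages separately makes it explicit whether a published latency figure includes preprocessing and data transfer or only the forward pass.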

The DIPS platform handles all of the above details, and more, so that you can obtain accurate inference timing. At Deci AI, our business is accelerating inference, and we created DIPS for our own internal use. When we saw that even practitioners struggle with the principles of timing inference, we realized that everyone would benefit if we released DIPS to the community. We firmly believe that helping others tackle this technical challenge will go a long way towards promoting unified timing calculations.

The DIPS Report

The DIPS service platform receives as input a neural model and returns a comprehensive report on the model’s inference performance.

The model can be completely untrained, because DIPS is only concerned with timing and costs.

In the next section, we describe how to input your own model. But first, let’s see what makes this tool so attractive. Below you can see the Results Summary taken from a typical DIPS report.

[Image: Results Summary from a typical DIPS report]

The model that gave rise to this report is Yolo v3, implemented in ONNX. (DIPS also supports PyTorch and TensorFlow.) As you can see, the report includes 5 categories and a list of key insights. For example, one conclusion is that the Tesla V100 yields the highest throughput and the lowest latency. Another non-trivial conclusion is that the T4 yields the best price for inference on 100K images. Other insights cover the capacity of the model on each hardware type (optimal batch size), the tradeoff between cost and performance for each hardware type, memory usage, and much more.
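
The reason the fastest device is not necessarily the cheapest follows from simple arithmetic: the cost of processing N images is N divided by the throughput, converted to hours, times the hourly price. The sketch below works this through with made-up numbers. The device names, throughputs, and prices are illustrative assumptions only; they are not taken from the DIPS report or from any cloud price list, and `cost_for_images` is a hypothetical helper.

```python
def cost_for_images(num_images, throughput_per_s, price_per_hour):
    """Cloud cost (in the currency of price_per_hour) to run inference on num_images."""
    hours = num_images / throughput_per_s / 3600.0
    return hours * price_per_hour


# Illustrative numbers only: NOT measured values and NOT real prices.
devices = {
    "fast_gpu": {"throughput_per_s": 400.0, "price_per_hour": 3.00},
    "budget_gpu": {"throughput_per_s": 150.0, "price_per_hour": 0.50},
}

costs = {name: cost_for_images(100_000, **spec) for name, spec in devices.items()}
cheapest = min(costs, key=costs.get)
fastest = max(devices, key=lambda name: devices[name]["throughput_per_s"])

for name, cost in costs.items():
    print(f"{name}: {cost:.4f} per 100K images")
print(f"cheapest: {cheapest}, fastest: {fastest}")
```

With these assumed figures the slower device wins on price because its hourly rate drops faster than its throughput does, which is the kind of tradeoff the report surfaces.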

Even experienced programmers might need several days of code writing to produce this kind of study and a similar report encompassing all these hardware devices. With DIPS, it will take you at most a few minutes!

DIPS also offers a deeper look into each section of the report. For example, anyone interested in computation cost can look at the report page that specifies the cost aspects for each hardware type. For the scenario above, the model cloud cost section of the report looks like this:

[Image: model cloud cost section of the DIPS report]

Using the information provided, you can optimize your cloud costs for the desired input batch sizes, and even compare the cost of several models on a specific hardware type.

How to Use DIPS

EMBED: https://www.youtube.com/watch?v=cC9nMFS1e_c

Let’s take a quick walk-through of how to use DIPS. You can find DIPS on Deci’s website. After entering some initial details (Step 1) you will land on the following page (Step 2):

[Image: DIPS model details page (Step 2)]

This page allows you to provide the minimal details needed for us to analyze your model. Fill in the following basic information:

  1. Model name — The name of the model you would like to analyze (any string is OK).

  2. Model framework — Choose one of the given frameworks. The minimal requirement for testing each framework is written in blue.

  3. Input dimension — The dimension of the tensor that should be used for the network. For example, if you work on ImageNet and PyTorch this will be (3,224,224).

  4. Inference hardware — The hardware you wish to test. You can choose up to 4 hardware types: Intel CPU, Nvidia V100, Nvidia T4, Nvidia K80.

  5. Choose how you want to give us access to the model. There are three options:

     a. Checkpoint link: share the model via a public link. When you select the framework, you’ll find specific instructions in blue under the framework field.

     b. Be contacted by Deci: Deci’s expert will contact you to get the model.

     c. Use an existing off-the-shelf model: choose one of several off-the-shelf models (e.g., ResNet 18/50, EfficientNet, MobileNet, and Yolo).
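
The input dimension in item 3 is given per sample, without the batch dimension. As a rough sketch of what that implies, the snippet below prepends a batch size to the ImageNet/PyTorch shape from the example and estimates the memory of the resulting input tensor. `batched_shape` and `tensor_bytes` are hypothetical helpers, and the 4 bytes per element assume float32 inputs.

```python
from math import prod


def batched_shape(input_dim, batch_size):
    """Prepend the batch dimension to a per-sample shape such as (3, 224, 224)."""
    return (batch_size, *input_dim)


def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for a dense tensor; 4 bytes per element assumes float32."""
    return prod(shape) * bytes_per_element


input_dim = (3, 224, 224)                # channels, height, width (PyTorch ordering)
for batch_size in (1, 32):
    shape = batched_shape(input_dim, batch_size)
    print(shape, f"{tensor_bytes(shape) / 2**20:.2f} MiB")
```

This covers only the input tensor; activations and weights usually dominate the memory figure that limits the largest usable batch size.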

As mentioned above, you don’t need to supply a trained model in order to use DIPS. An untrained model will give rise to the same inference timing (and cost) metrics.
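
The intuition is that, for a dense network, the amount of arithmetic is fixed by the layer shapes and not by the values stored in the weights. The toy example below counts the multiply-adds of a naive dense layer for two different weight matrices. It is a simplified illustration under that assumption: `dense_forward` and `linear_flops` are hypothetical helpers, and runtimes that exploit weight sparsity or models with input-dependent control flow can behave differently.

```python
import random


def linear_flops(batch, in_features, out_features):
    """FLOPs of a dense layer (2 per multiply-add); depends on shapes only."""
    return 2 * batch * in_features * out_features


def dense_forward(x, weights):
    """Naive dense layer that also counts the multiply-adds it performs."""
    ops = 0
    out = []
    for row in weights:                  # one row of weights per output feature
        acc = 0.0
        for xi, wi in zip(x, row):
            acc += xi * wi
            ops += 1
        out.append(acc)
    return out, ops


random.seed(0)
x = [random.random() for _ in range(128)]
# Randomly initialised weights, as in an untrained model.
untrained = [[random.gauss(0.0, 1.0) for _ in range(128)] for _ in range(64)]
# Arbitrary different values standing in for learned weights.
trained = [[0.01 * (i - j) for j in range(128)] for i in range(64)]

_, ops_untrained = dense_forward(x, untrained)
_, ops_trained = dense_forward(x, trained)
print(ops_untrained, ops_trained, linear_flops(1, 128, 64) // 2)
```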

Why you can relax when it comes to privacy

It’s natural that most users will be concerned about sharing models, weights, or data. For this reason, we built DIPS as a fully secure and private application, where all the data and model weights remain confidential. We also allow you to choose an off-the-shelf model from our model repository, so we use our own existing models for analysis. After analyzing the model, we immediately delete your model from our servers. We never save a copy of your model. Moreover, DIPS uses a secure transfer protocol with the highest encryption standards available. At Deci, we are committed to ensuring that no one will use or distribute any of the input models. If you still have privacy concerns, you can upload an open source model that has the same characteristics, or alter your own model.

Save time and prevent errors in measuring your model performance

DIPS is a new tool, available free of charge, for measuring the inference performance of deep learning architectures on different hardware platforms. It provides a unified approach to evaluating your model’s metrics with the simple click of a button. DIPS is openly available to the deep learning community to help save time and prevent errors in latency/throughput measurements.

Deci is committed to keeping any models evaluated using DIPS completely secure and private. So all that remains is for you to try DIPS from the following link and tell us what you think.

Translated from: https://towardsdatascience.com/a-new-tool-for-analysing-neural-network-inference-performance-13cc21d2efea
