Neural Network Inference

Measuring the inference time of a trained deep neural model on different hardware devices is a critical task when making deployment decisions. Should you deploy your inference on 8 Nvidia V100s, on 12 P100s, or on 64 CPU cores?

When it comes to inference timing, apples-to-apples comparisons among devices are not rocket science. Nevertheless, the process is time-consuming, prone to errors, and requires expertise to perform correctly.

Fortunately, Deci AI has released a free service that does it for you. The Deci Inference Performance Simulator (DIPS) helps practitioners analyze their inference performance. DIPS measures model throughput, latency, cloud cost, model memory usage, and other important performance metrics. It provides a full analysis of how your model will behave and perform across various production environments, at no cost.

Why is measuring run-time performance painful?

In our guide on how to measure deep learning performance, we provide practical guidelines for inference evaluation that include the following steps: (1) write a latency measurement script; (2) write a script to compute the optimal batch size for inference; (3) write a throughput measurement script; (4) launch several machines on the cloud to run all these scripts; and (5) summarize the obtained metrics and analyze the results.
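
To make steps (1) to (3) concrete, here is a minimal sketch of what such scripts might look like. It is not the code DIPS runs: `dummy_model`, `make_batch`, and the candidate batch sizes are hypothetical stand-ins, and the timing uses only the Python standard library on the CPU.

```python
import statistics
import time


def dummy_model(batch):
    """Hypothetical stand-in for a forward pass: cost grows with batch size."""
    return [sum(x * x for x in sample) for sample in batch]


def make_batch(batch_size, features=256):
    """Build a fake input batch (hypothetical helper, not a real data loader)."""
    return [[0.5] * features for _ in range(batch_size)]


def measure_latency(model, batch, warmup=5, repeats=30):
    """Return (mean, stdev) latency in milliseconds for one call on `batch`."""
    for _ in range(warmup):  # warm-up runs are excluded from the measurement
        model(batch)
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        model(batch)
        # On a GPU, execution is asynchronous: you would need to synchronize
        # with the device here, before reading the clock.
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


def measure_throughput(model, batch, repeats=30):
    """Return samples processed per second at the given batch size."""
    mean_ms, _ = measure_latency(model, batch, repeats=repeats)
    return len(batch) / (mean_ms / 1000.0)


def best_batch_size(model, batch_factory, candidates=(1, 2, 4, 8, 16, 32)):
    """Pick the candidate batch size with the highest measured throughput."""
    results = {bs: measure_throughput(model, batch_factory(bs)) for bs in candidates}
    return max(results, key=results.get), results


if __name__ == "__main__":
    mean_ms, std_ms = measure_latency(dummy_model, make_batch(1))
    print(f"latency at batch size 1: {mean_ms:.3f} ms (+/- {std_ms:.3f})")
    best, results = best_batch_size(dummy_model, make_batch)
    for bs, tput in results.items():
        print(f"batch size {bs:>2}: {tput:,.0f} samples/s")
    print(f"highest throughput at batch size {best}")
```

On real hardware the batch size search is also bounded by device memory, which this sketch ignores.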

Performing these steps is not only time-consuming, it is also highly error-prone. For example, issues may arise when timing on the CPU, when measuring the transfer of data to and from the acceleration device, when measuring preprocessing, and so on.
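
One way to reduce such errors is to time each stage separately, so that preprocessing and data transfer are not silently folded into the inference number. The sketch below illustrates the idea with the standard library only. The three stage functions are hypothetical placeholders (no real device copy happens here), and this is not how DIPS itself is implemented.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_totals = defaultdict(float)  # seconds accumulated per stage


@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent inside the `with` block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start


# Hypothetical stand-ins for the real pipeline stages.
def preprocess(raw):
    return [value / 255.0 for value in raw]


def transfer_to_device(data):
    return list(data)  # placeholder for a host-to-device copy


def infer(data):
    return sum(value * value for value in data)


def run_pipeline(raw_inputs):
    """Run every input through all three stages, timing each stage on its own."""
    outputs = []
    for raw in raw_inputs:
        with timed("preprocess"):
            data = preprocess(raw)
        with timed("transfer"):
            on_device = transfer_to_device(data)
        with timed("inference"):
            outputs.append(infer(on_device))
    return outputs


if __name__ == "__main__":
    run_pipeline([[128] * 10_000 for _ in range(50)])
    total = sum(stage_totals.values())
    for stage, seconds in stage_totals.items():
        print(f"{stage:<10} {seconds * 1000:8.2f} ms ({seconds / total:5.1%})")
```

Reporting the stages side by side makes it obvious which of them dominates, and which ones a quoted "inference time" actually includes.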

The DIPS platform deals with all the above details, and more, making it possible for you to obtain accurate inference timing. At Deci AI, our business is about accelerating inference, and we created DIPS for our own internal use. When we saw that even practitioners struggle with the principles of timing inference, we realized that everyone would benefit if we released DIPS to the community. We firmly believe that helping others tackle this technical challenge will go a long way towards promoting unified timing calculations.

The DIPS Report

The DIPS service platform takes a neural model as input and returns a comprehensive report on the model’s inference performance.

The model can be completely untrained, because DIPS is only concerned with timing and costs.

In the next section, we describe how to input your own model. But first, let’s see what makes this tool so attractive. Below you can see the Results Summary taken from a typical DIPS report.

The model that gave rise to this report is Yolo v3, in ONNX format. (DIPS also supports PyTorch and TensorFlow.) As you can see, the report includes five categories and a list of key insights. For example, one conclusion is that the Tesla V100 yields the highest throughput and lowest latency. Another, less obvious conclusion is that the T4 yields the best price for running inference on 100K images. Other insights cover the capacity of the model on each hardware type (optimal batch size), the tradeoff between cost and performance for each hardware type, memory usage, and much more.

Even experienced programmers might need several days of code writing to produce this kind of study and a similar report encompassing all these hardware devices. With DIPS, it will take you at most a few minutes!

DIPS also offers a deeper look into each section of the report. For example, anyone interested in computation cost can look at the report page that breaks down the cost for each hardware type. For the scenario above, the model cloud cost section of the report looks like this:

Using the information provided, you can optimize your cloud costs for the input batch sizes you need, and even compare the cost of several models on a specific hardware type.
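
The arithmetic behind such a cost comparison is simple: divide the number of images by the measured throughput to get the running time, then multiply by the hourly price of the machine. The sketch below shows that calculation. The device names, throughputs, and prices are made-up illustrative values, not DIPS results and not real cloud prices.

```python
def cost_for_images(num_images, throughput_per_sec, price_per_hour):
    """Cloud cost (in the currency of `price_per_hour`) to process `num_images`."""
    hours = num_images / throughput_per_sec / 3600.0
    return hours * price_per_hour


# Illustrative, made-up numbers: NOT measurements and NOT real cloud prices.
hypothetical_devices = {
    "device_a": {"throughput_per_sec": 400.0, "price_per_hour": 3.00},
    "device_b": {"throughput_per_sec": 150.0, "price_per_hour": 0.50},
}

if __name__ == "__main__":
    for name, spec in hypothetical_devices.items():
        cost = cost_for_images(100_000, **spec)
        print(f"{name}: ${cost:.4f} per 100K images")
```

With these assumed numbers, the faster device is the more expensive one per image, which is the kind of tradeoff between throughput and price that the report surfaces.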

How to Use DIPS

EMBED: https://www.youtube.com/watch?v=cC9nMFS1e_c

Let’s take a quick walk-through of how to use DIPS. You can find DIPS on Deci’s website. After entering some initial details (Step 1), you will land on the following page (Step 2):

This page allows you to provide the minimal details needed for us to analyze your model. Fill in the following basic information:

Model name: The name of the model you would like to analyze (any string is OK).
Model framework: Choose one of the given frameworks. The minimal requirement for testing each framework is written in blue.
Input dimension: The dimension of the input tensor the network expects. For example, if you work with ImageNet and PyTorch, this will be (3,224,224).
Inference hardware: The hardware you wish to test. You can choose up to 4 hardware types: Intel CPU, Nvidia V100, Nvidia T4, Nvidia K80.
Choose how you want to give us access to the model:
Checkpoint link: Share the model via a public link. When you select the framework, you’ll find specific instructions in blue under the framework field.
Be contacted by Deci: Deci’s expert will contact you to get the model.
Use an existing off-the-shelf model: You have the option of choosing one of several off-the-shelf models (e.g., ResNet 18/50, EfficientNet, MobileNet, and Yolo).
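
As a small illustration of the input dimension field, the sketch below takes the per-sample dimension from the example above, (3,224,224), prepends a batch dimension, and estimates how much memory one input batch occupies. The helper names are hypothetical, and the 4 bytes per element assumes float32 inputs. How DIPS handles the batch dimension internally is not specified here.

```python
import math


def batched_shape(input_dim, batch_size):
    """Prepend the batch dimension to a per-sample input dimension."""
    return (batch_size, *input_dim)


def input_bytes(shape, bytes_per_element=4):
    """Memory for one input tensor; 4 bytes per element assumes float32."""
    return math.prod(shape) * bytes_per_element


if __name__ == "__main__":
    input_dim = (3, 224, 224)  # channels-first, as in the ImageNet/PyTorch example
    for batch_size in (1, 8, 64):
        shape = batched_shape(input_dim, batch_size)
        print(shape, f"{input_bytes(shape) / 2**20:.2f} MiB")
```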

As mentioned above, you don’t need to supply a trained model in order to use DIPS. An untrained model will give rise to the same inference timing (and cost) metrics.

Why you can relax when it comes to privacy

It’s natural that most users will be concerned about sharing models, weights, or data. For this reason, we built DIPS as a fully secure and private application, where all the data and model weights remain confidential. We also allow you to choose an off-the-shelf model from our model repository, so we use our own existing models for analysis. After analyzing the model, we immediately delete your model from our servers. We never save a copy of your model. Moreover, DIPS uses a secure transfer protocol with the highest encryption standards available. At Deci, we are committed to ensuring that no one will use or distribute any of the input models. If you still have privacy concerns, you can upload an open source model that has the same characteristics, or alter your own model.

Save time and prevent errors in measuring your model performance

DIPS is a new tool, available free of charge, for measuring the inference performance of deep learning architectures on different hardware platforms. It provides a unified approach to evaluating your model’s metrics with the simple click of a button. DIPS is openly available to the deep learning community to help save time and prevent errors in latency/throughput measurements.

Deci is committed to keeping any models evaluated using DIPS completely secure and private. So all that remains is for you to try DIPS from the following link and tell us what you think.
