Dash is an ideal Python-based front end for your Databricks Spark backend

📌 Learn how to deliver AI for Big Data using Dash & Databricks in this recorded webinar with Peter Kim of Plotly and Prasad Kona of Databricks.

We’re delighted to announce that Plotly and Databricks are partnering to bring cloud-distributed Artificial Intelligence (AI) & Machine Learning (ML) to a vastly wider audience of business users. By integrating the Plotly Dash frontend with the Databricks backend, we are offering a seamless process to transform AI and ML models into production-ready, dynamic, interactive web applications. This partnership with Databricks empowers Python developers to easily and quickly build Dash apps that are connected to a Databricks Spark cluster. The direct integration, databricks-dash, is distributed by Plotly and available with Plotly’s Dash Enterprise.

Plotly’s Dash is a Python framework that enables developers to build interactive, data-rich analytical web apps in pure Python, with no JavaScript required. Traditional “full-stack” app development is done in teams with some members specializing in backend/server technologies like Python, some specializing in front-end technologies like React, and some specializing in data science. Dash provides a tightly-integrated backend and front-end, entirely written in Python. This means that data science teams producing models, visualizations and complex analyses no longer need to rely on backend specialists to expose these models to the front-end via APIs, and no longer need to rely on front-end specialists to build user interfaces to connect to these APIs. If you’re interested in Dash’s architecture, please see our “Dash is React for Python” article.
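
To make that concrete, here is a minimal sketch of a standard Dash app, a layout plus one callback, all in Python. The iris dataset ships with plotly.express, and the component names follow the Dash 1.x API used throughout this post:

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.express as px

df = px.data.iris()  # a small sample dataset bundled with Plotly

app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1("Hello Dash"),
    dcc.Dropdown(
        id='species-dropdown', clearable=False,
        value='setosa',
        options=[{'label': s, 'value': s} for s in df['species'].unique()]
    ),
    dcc.Graph(id='scatter'),
])

# The callback re-renders the figure whenever the dropdown changes.
@app.callback(
    Output('scatter', 'figure'),
    [Input('species-dropdown', 'value')]
)
def update_scatter(species):
    return px.scatter(df[df['species'] == species],
                      x="sepal_width", y="sepal_length", title=species)

if __name__ == "__main__":
    app.run_server(debug=True)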

Databricks’ unified platform for data and AI rests on top of Apache Spark, a distributed general-purpose cluster computing framework originally developed by the Databricks founders. Given enough hardware and network availability, Apache Spark scales horizontally naturally, thanks to its distributed architecture. Apache Spark offers a rich collection of APIs, the MLlib machine-learning library, and integration with popular Python scientific libraries (e.g. pandas, scikit-learn). The Databricks Data Science Workspace provides managed, optimized, and secure Spark clusters, letting developers and data scientists focus on building and optimizing models and worry less about infrastructure concerns such as speed, reliability, and fault tolerance. Databricks also abstracts away many manual administrative duties (such as creating a cluster, auto-scaling hardware, and managing users) and simplifies the development process by enabling users to create IPython-like notebooks.
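
To illustrate that pandas interoperability, here is a small hedged sketch: a distributed aggregation in PySpark whose (small) result is collected into a pandas DataFrame. The toy data stands in for whatever tables live on your cluster:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()

# Toy data for illustration; in practice this would be a cluster-side table.
df = spark.createDataFrame(
    [("toronto", 4.5), ("toronto", 3.0), ("montreal", 5.0)],
    ["city", "rating"],
)

# The groupBy/agg is executed across the cluster's executors.
avg_ratings = df.groupBy("city").agg(F.avg("rating").alias("avg_rating"))

# toPandas() collects the small result to the driver for plotting or modeling.
print(avg_ratings.toPandas())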

With Dash apps connected to Databricks Spark clusters, Dash + Databricks gives business users the powerful magic of Python and PySpark.

Databricks is the industry-leading Spark platform, and Plotly’s Dash is the industry-leading library for building UIs and web apps in Python. By using Dash and Databricks together, data scientists can quickly deliver production-ready AI and ML apps, backed by Databricks Spark clusters, to business users. A typical Dash + Databricks app is under a thousand lines of code written in Python (no JavaScript required). These Dash apps range from simple UIs for simulation models to complex dashboards acting as read/write interfaces to your Databricks Spark cluster and to large amounts of data stored in a data warehouse. With Dash apps connected to Databricks Spark clusters, Dash + Databricks gives business users the powerful magic of Python and PySpark.

Currently, there are two ways to integrate Dash with Databricks:

  1. databricks-dash supports a Notebook-like approach meant for quick Dash app prototyping within the Databricks notebook environment.

  2. databricks-connect supports a local, IDE-based development workflow meant for production deployment.

More details on each integration method follow:

databricks-dash

databricks-dash is a closed-source, custom library that can be installed and imported in any Databricks notebook. After a single import, developers can start building Dash applications in the Databricks notebook itself. Dash applications in Databricks notebooks use the same app layouts and callbacks as regular Dash applications. Any PySpark code written in Databricks notebooks, whether it drives complex models or simple ETL processes, can be integrated into Dash applications with minimal code migration. Once the underlying Flask (Python) server is running, the generated Dash application is hosted on your Databricks instance at a unique URL. It is important to note that these Dash applications on Databricks notebooks run on shared resources and lack a load balancer, so databricks-dash is great for quick prototyping and iterating but is not recommended for production deployments. For any data scientist or developer interested in taking a databricks-dash application to production, Plotly’s Dash Enterprise documentation provides all the steps to get there using databricks-connect.

Here is a minimal, self-contained example of using databricks-dash to create a Dash app from the Databricks notebook interface. After installing the databricks-dash library, run the example by copying and pasting the following code block into a Databricks notebook cell. There is also a video demo of databricks-dash that accompanies the code below.

# Imports
import plotly.express as px
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
from databricks_dash import DatabricksDash

# Load data
df = px.data.tips()

# Build the app
app = DatabricksDash(__name__)
server = app.server

app.layout = html.Div([
    html.H1("DatabricksDash Demo"),
    dcc.Graph(id='graph'),
    html.Label([
        "colorscale",
        dcc.Dropdown(
            id='colorscale-dropdown', clearable=False,
            value='plasma', options=[
                {'label': c, 'value': c}
                for c in px.colors.named_colorscales()
            ]
        )
    ]),
])

# Define callback to update the graph
@app.callback(
    Output('graph', 'figure'),
    [Input("colorscale-dropdown", "value")]
)
def update_figure(colorscale):
    return px.scatter(
        df, x="total_bill", y="tip", color="size",
        color_continuous_scale=colorscale,
        render_mode="webgl", title="Tips"
    )

if __name__ == "__main__":
    app.run_server(mode='inline', debug=True)

The result of this code block is this app:

Here is a slightly larger example that uses PySpark to perform data pre-processing on the Databricks cluster. The dashboard itself is styled using Dash Design Kit, so the dash-design-kit package must be installed along with databricks-dash. This example is based on the Databricks-connect application template but has been modified to use databricks_dash.DatabricksDash instead of dash.Dash.
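
The full template isn’t reproduced here, but a minimal sketch of the pattern, PySpark pre-processing on the cluster feeding a Dash Design Kit layout, might look like the following. It assumes a Databricks notebook, where a SparkSession is already available as spark; the table name is hypothetical, and the ddk components come from the closed-source dash-design-kit package:

# Sketch: PySpark pre-processing feeding a Dash Design Kit layout.
# Assumes a Databricks notebook, where `spark` already exists.
import dash_design_kit as ddk
import dash_core_components as dcc
import plotly.express as px
from databricks_dash import DatabricksDash

# Pre-aggregate on the cluster; only the small result is collected.
# `samples.nyctaxi.trips` is a hypothetical table name for illustration.
pdf = (spark.table("samples.nyctaxi.trips")
            .groupBy("pickup_zip")
            .count()
            .toPandas())

app = DatabricksDash(__name__)
app.layout = ddk.App([
    ddk.Header([ddk.Title("Trips per pickup ZIP")]),
    ddk.Card(dcc.Graph(figure=px.bar(pdf, x="pickup_zip", y="count"))),
])

if __name__ == "__main__":
    app.run_server(mode='inline', debug=True)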

[Image: A more complex Dash app within a Databricks notebook]

databricks-connect

databricks-connect is the recommended way to take PySpark models and Dash applications from Databricks notebooks to production. databricks-connect is a Spark client library distributed by Databricks that allows locally written Spark jobs to be run on a remote Databricks cluster. After installing and configuring databricks-connect and PySpark, developers and data scientists can run Dash and PySpark code in their favorite IDEs and no longer need to use Databricks notebooks. To make this happen, simply import PySpark as you would any other Python module, and write PySpark code alongside your Dash code base. We’ve made a video demo of how to use databricks-connect. The end result is a Dash application that can query our Databricks cluster for distributed processing, which is essential for big data use cases. This is important because using databricks-connect means our Dash application can be deployed to Plotly’s Dash Enterprise and be production-ready, which is the ideal workflow in Python!
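
As a rough sketch of that workflow (assuming databricks-connect has already been pointed at your cluster via databricks-connect configure): building a SparkSession locally transparently runs Spark jobs on the remote cluster, and the app itself is plain dash.Dash. The yelp_businesses table name is made up for illustration:

# Sketch: a local Dash app whose data layer runs on a remote
# Databricks cluster via databricks-connect (already configured).
import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
from pyspark.sql import SparkSession

# With databricks-connect installed, this session points at the
# remote cluster defined by `databricks-connect configure`.
spark = SparkSession.builder.getOrCreate()

# Hypothetical table name; the heavy lifting happens cluster-side.
pdf = spark.sql("SELECT city, AVG(stars) AS avg_stars "
                "FROM yelp_businesses GROUP BY city").toPandas()

app = dash.Dash(__name__)
server = app.server  # exposed for deployment on Dash Enterprise

app.layout = html.Div([
    html.H1("Average rating by city"),
    dcc.Graph(figure=px.bar(pdf, x="city", y="avg_stars")),
])

if __name__ == "__main__":
    app.run_server(debug=True)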

Here is an example of a Dash application built with databricks-connect. It uses Yelp’s open dataset and plots restaurants in Toronto, Calgary, and Montreal on a map. Clicking Submit triggers a Spark job on our Databricks cluster that filters and matches on the given criteria.
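
The app’s source isn’t shown here, but the Submit-button pattern might be sketched as below, reusing the app and spark objects from the previous sketch; the component ids, column names, and yelp_businesses table are hypothetical:

# Sketch: a Submit button whose callback launches a Spark job.
from dash.dependencies import Input, Output, State
import plotly.express as px

@app.callback(
    Output("map-graph", "figure"),
    [Input("submit-button", "n_clicks")],
    [State("city-dropdown", "value"),
     State("min-stars-slider", "value")],
)
def run_spark_query(n_clicks, city, min_stars):
    # Filtering happens on the cluster; only the matches come back.
    # (A real app should parameterize rather than interpolate strings.)
    pdf = (spark.table("yelp_businesses")
                .filter(f"city = '{city}' AND stars >= {min_stars}")
                .select("name", "latitude", "longitude", "stars")
                .toPandas())
    fig = px.scatter_mapbox(pdf, lat="latitude", lon="longitude",
                            hover_name="name", color="stars", zoom=10)
    fig.update_layout(mapbox_style="open-street-map")
    return fig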

[Image: A Dash app on Dash Enterprise, connecting to a Databricks Spark cluster through databricks-connect]

In summary, the two ways to integrate Dash with Databricks offer a choice between quick, notebook-style prototyping and high-performance production deployment of analytical apps. Both methods provide a path to Plotly’s Dash Enterprise as the recommended solution for putting AI/ML models and data directly in front of business users.

Databricks brings the best-in-class Python analytic processing backend, and Plotly’s Dash brings the best-in-class Python front-end! The documentation for installing, creating, and deploying databricks-dash applications will ship with Dash Enterprise 4.0 in July 2020.

We’ll be posting some more info about our Databricks partnership in the coming weeks on our Twitter and LinkedIn, so stay tuned! If you have any questions or would like to learn more about Plotly Dash and Databricks integration, email info@plotly.com, and we’ll get you started!

Translated from: https://medium.com/plotly/dash-is-an-ideal-front-end-for-your-databricks-spark-backend-212ee3cae6cc
