Dash is an ideal Python-based front end for your Databricks Spark backend

📌 Learn how to deliver AI for Big Data using Dash & Databricks in this recorded webinar with Peter Kim of Plotly and Prasad Kona of Databricks.

We’re delighted to announce that Plotly and Databricks are partnering to bring cloud-distributed Artificial Intelligence (AI) & Machine Learning (ML) to a vastly wider audience of business users. By integrating the Plotly Dash frontend with the Databricks backend, we are offering a seamless process to transform AI and ML models into production-ready, dynamic, interactive web applications. This partnership with Databricks empowers Python developers to easily and quickly build Dash apps that are connected to a Databricks Spark cluster. The direct integration, databricks-dash, is distributed by Plotly and available with Plotly’s Dash Enterprise.

Plotly’s Dash is a Python framework that enables developers to build interactive, data-rich analytical web apps in pure Python, with no JavaScript required. Traditional “full-stack” app development is done in teams with some members specializing in backend/server technologies like Python, some specializing in front-end technologies like React, and some specializing in data science. Dash provides a tightly-integrated backend and front-end, entirely written in Python. This means that data science teams producing models, visualizations and complex analyses no longer need to rely on backend specialists to expose these models to the front-end via APIs, and no longer need to rely on front-end specialists to build user interfaces to connect to these APIs. If you’re interested in Dash’s architecture, please see our “Dash is React for Python” article.
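
To make that concrete, here is a minimal sketch of a standard Dash app, a layout plus one callback, all in Python. The iris dataset ships with plotly.express, and the component names follow the Dash 1.x API used throughout this post:

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.express as px

df = px.data.iris()  # a small sample dataset bundled with Plotly

app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1("Hello Dash"),
    dcc.Dropdown(
        id='species-dropdown', clearable=False,
        value='setosa',
        options=[{'label': s, 'value': s} for s in df['species'].unique()]
    ),
    dcc.Graph(id='scatter'),
])

# The callback re-renders the figure whenever the dropdown changes.
@app.callback(
    Output('scatter', 'figure'),
    [Input('species-dropdown', 'value')]
)
def update_scatter(species):
    return px.scatter(df[df['species'] == species],
                      x="sepal_width", y="sepal_length", title=species)

if __name__ == "__main__":
    app.run_server(debug=True)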

Databricks’ unified platform for data and AI rests on top of Apache Spark, a distributed general-purpose cluster computing framework originally developed by the Databricks founders. Given enough hardware and network availability, Apache Spark scales horizontally naturally, thanks to its distributed architecture. Apache Spark offers a rich collection of APIs, the MLlib machine-learning library, and integration with popular Python scientific libraries (e.g. pandas, scikit-learn). The Databricks Data Science Workspace provides managed, optimized, and secure Spark clusters, letting developers and data scientists focus on building and optimizing models and worry less about infrastructure concerns such as speed, reliability, and fault tolerance. Databricks also abstracts away many manual administrative duties (such as creating a cluster, auto-scaling hardware, and managing users) and simplifies the development process by enabling users to create IPython-like notebooks.
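
To illustrate that pandas interoperability, here is a small hedged sketch: a distributed aggregation in PySpark whose (small) result is collected into a pandas DataFrame. The toy data stands in for whatever tables live on your cluster:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()

# Toy data for illustration; in practice this would be a cluster-side table.
df = spark.createDataFrame(
    [("toronto", 4.5), ("toronto", 3.0), ("montreal", 5.0)],
    ["city", "rating"],
)

# The groupBy/agg is executed across the cluster's executors.
avg_ratings = df.groupBy("city").agg(F.avg("rating").alias("avg_rating"))

# toPandas() collects the small result to the driver for plotting or modeling.
print(avg_ratings.toPandas())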

With Dash apps connected to Databricks Spark clusters, Dash + Databricks gives business users the powerful magic of Python and PySpark.

Databricks is the industry-leading Spark platform, and Plotly’s Dash is the industry-leading library for building UIs and web apps in Python. By using Dash and Databricks together, data scientists can quickly deliver production-ready AI and ML apps, backed by Databricks Spark clusters, to business users. A typical Dash + Databricks app is under a thousand lines of code written in Python (no JavaScript required). These Dash apps range from simple UIs for simulation models to complex dashboards acting as read/write interfaces to your Databricks Spark cluster and to large amounts of data stored in a data warehouse. With Dash apps connected to Databricks Spark clusters, Dash + Databricks gives business users the powerful magic of Python and PySpark.

Currently, there are two ways to integrate Dash with Databricks:

  1. databricks-dash supports a Notebook-like approach meant for quick Dash app prototyping within the Databricks notebook environment.

  2. databricks-connect supports a local, IDE-based development workflow meant for production deployment.

More details on each integration method follow:

databricks-dash

databricks-dash is a closed-source, custom library that can be installed and imported in any Databricks notebook. After a single import, developers can start building Dash applications in the Databricks notebook itself. Dash applications in Databricks notebooks use the same app layouts and callbacks as regular Dash applications. Any PySpark code written in Databricks notebooks, whether it drives complex models or simple ETL processes, can be integrated into Dash applications with minimal code migration. Once the underlying Flask (Python) server is running, the generated Dash application is hosted on your Databricks instance at a unique URL. It is important to note that these Dash applications on Databricks notebooks run on shared resources and lack a load balancer, so databricks-dash is great for quick prototyping and iterating but is not recommended for production deployments. For any data scientist or developer interested in taking a databricks-dash application to production, Plotly’s Dash Enterprise documentation provides all the steps to get there using databricks-connect.

Here is a minimal, self-contained example of using databricks-dash to create a Dash app from the Databricks notebook interface. After installing the databricks-dash library, run the example by copying and pasting the following code block into a Databricks notebook cell. There is also a video demo of databricks-dash that accompanies the code below.

# Imports
import plotly.express as px
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
from databricks_dash import DatabricksDash

# Load data
df = px.data.tips()

# Build the app
app = DatabricksDash(__name__)
server = app.server

app.layout = html.Div([
    html.H1("DatabricksDash Demo"),
    dcc.Graph(id='graph'),
    html.Label([
        "colorscale",
        dcc.Dropdown(
            id='colorscale-dropdown', clearable=False,
            value='plasma', options=[
                {'label': c, 'value': c}
                for c in px.colors.named_colorscales()
            ]
        )
    ]),
])

# Define callback to update the graph
@app.callback(
    Output('graph', 'figure'),
    [Input("colorscale-dropdown", "value")]
)
def update_figure(colorscale):
    return px.scatter(
        df, x="total_bill", y="tip", color="size",
        color_continuous_scale=colorscale,
        render_mode="webgl", title="Tips"
    )

if __name__ == "__main__":
    app.run_server(mode='inline', debug=True)

The result of this code block is this app:

Here is a slightly larger example that uses PySpark to perform data pre-processing on the Databricks cluster. The dashboard itself is styled using Dash Design Kit, so the dash-design-kit package must be installed along with databricks-dash. This example is based on the Databricks-connect application template but has been modified to use databricks_dash.DatabricksDash instead of dash.Dash.
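
The full template isn’t reproduced here, but a minimal sketch of the pattern, PySpark pre-processing on the cluster feeding a Dash Design Kit layout, might look like the following. It assumes a Databricks notebook, where a SparkSession is already available as spark; the table name is hypothetical, and the ddk components come from the closed-source dash-design-kit package:

# Sketch: PySpark pre-processing feeding a Dash Design Kit layout.
# Assumes a Databricks notebook, where `spark` already exists.
import dash_design_kit as ddk
import dash_core_components as dcc
import plotly.express as px
from databricks_dash import DatabricksDash

# Pre-aggregate on the cluster; only the small result is collected.
# `samples.nyctaxi.trips` is a hypothetical table name for illustration.
pdf = (spark.table("samples.nyctaxi.trips")
            .groupBy("pickup_zip")
            .count()
            .toPandas())

app = DatabricksDash(__name__)
app.layout = ddk.App([
    ddk.Header([ddk.Title("Trips per pickup ZIP")]),
    ddk.Card(dcc.Graph(figure=px.bar(pdf, x="pickup_zip", y="count"))),
])

if __name__ == "__main__":
    app.run_server(mode='inline', debug=True)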

[Image: A more complex Dash app within a Databricks notebook]

databricks-connect

databricks-connect is the recommended way to take PySpark models and Dash applications from Databricks notebooks to production. databricks-connect is a Spark client library distributed by Databricks that allows locally written Spark jobs to be run on a remote Databricks cluster. After installing and configuring databricks-connect and PySpark, developers and data scientists can run Dash and PySpark code in their favorite IDEs and no longer need to use Databricks notebooks. To make this happen, simply import PySpark as you would any other Python module, and write PySpark code alongside your Dash code base. We’ve made a video demo of how to use databricks-connect. The end result is a Dash application that can query our Databricks cluster for distributed processing, which is essential for big data use cases. This is important because using databricks-connect means our Dash application can be deployed to Plotly’s Dash Enterprise and be production-ready, which is the ideal workflow in Python!
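
As a rough sketch of that workflow (assuming databricks-connect has already been pointed at your cluster via databricks-connect configure): building a SparkSession locally transparently runs Spark jobs on the remote cluster, and the app itself is plain dash.Dash. The yelp_businesses table name is made up for illustration:

# Sketch: a local Dash app whose data layer runs on a remote
# Databricks cluster via databricks-connect (already configured).
import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
from pyspark.sql import SparkSession

# With databricks-connect installed, this session points at the
# remote cluster defined by `databricks-connect configure`.
spark = SparkSession.builder.getOrCreate()

# Hypothetical table name; the heavy lifting happens cluster-side.
pdf = spark.sql("SELECT city, AVG(stars) AS avg_stars "
                "FROM yelp_businesses GROUP BY city").toPandas()

app = dash.Dash(__name__)
server = app.server  # exposed for deployment on Dash Enterprise

app.layout = html.Div([
    html.H1("Average rating by city"),
    dcc.Graph(figure=px.bar(pdf, x="city", y="avg_stars")),
])

if __name__ == "__main__":
    app.run_server(debug=True)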

Here is an example of a Dash application built with databricks-connect. It uses Yelp’s open dataset and plots restaurants in Toronto, Calgary, and Montreal on a map. Clicking Submit triggers a Spark job on our Databricks cluster that filters and matches on the given criteria.
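
The app’s source isn’t shown here, but the Submit-button pattern might be sketched as below, reusing the app and spark objects from the previous sketch; the component ids, column names, and yelp_businesses table are hypothetical:

# Sketch: a Submit button whose callback launches a Spark job.
from dash.dependencies import Input, Output, State
import plotly.express as px

@app.callback(
    Output("map-graph", "figure"),
    [Input("submit-button", "n_clicks")],
    [State("city-dropdown", "value"),
     State("min-stars-slider", "value")],
)
def run_spark_query(n_clicks, city, min_stars):
    # Filtering happens on the cluster; only the matches come back.
    # (A real app should parameterize rather than interpolate strings.)
    pdf = (spark.table("yelp_businesses")
                .filter(f"city = '{city}' AND stars >= {min_stars}")
                .select("name", "latitude", "longitude", "stars")
                .toPandas())
    fig = px.scatter_mapbox(pdf, lat="latitude", lon="longitude",
                            hover_name="name", color="stars", zoom=10)
    fig.update_layout(mapbox_style="open-street-map")
    return fig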

[Image: A Dash app on Dash Enterprise, connecting to a Databricks Spark cluster through databricks-connect]

In summary, the two ways to integrate Dash with Databricks offer a choice between quick, notebook-style prototyping and high-performance production deployment of analytical apps. Both methods provide a path to Plotly’s Dash Enterprise as the recommended solution for putting AI/ML models and data directly in front of business users.

Databricks brings the best-in-class Python analytic processing backend, and Plotly’s Dash brings the best-in-class Python front-end! The documentation for installing, creating, and deploying databricks-dash applications will ship with Dash Enterprise 4.0 in July 2020.

We’ll be posting some more info about our Databricks partnership in the coming weeks on our Twitter and LinkedIn, so stay tuned! If you have any questions or would like to learn more about Plotly Dash and Databricks integration, email info@plotly.com, and we’ll get you started!

Translated from: https://medium.com/plotly/dash-is-an-ideal-front-end-for-your-databricks-spark-backend-212ee3cae6cc
