Task orchestration tools and workflows

Recently there’s been an explosion of new tools for orchestrating task and data workflows (sometimes referred to as “MLOps”). The sheer number of these tools makes it hard to choose between them and to understand how they overlap, so we decided to compare some of the most popular ones head to head.

Figure: Airflow is the most popular solution, followed by Luigi. There are newer contenders too, and they’re all growing fast. (Source: Author)

Overall Apache Airflow is both the most popular tool and also the one with the broadest range of features, but Luigi is a similar tool that’s simpler to get started with. Argo is the one teams often turn to when they’re already using Kubernetes, and Kubeflow and MLFlow serve more niche requirements related to deploying machine learning models and tracking experiments.

Before we dive into a detailed comparison, it’s useful to understand some broader concepts related to task orchestration.

What is task orchestration and why is it useful?

Smaller teams usually start out by managing tasks manually — such as cleaning data, training machine learning models, tracking results, and deploying the models to a production server. As the size of the team and the solution grows, so does the number of repetitive steps. It also becomes more important that these tasks are executed reliably.

The ways these tasks depend on each other also become more complex. When you start out, you might have a pipeline of tasks that needs to run once a week or once a month, and these tasks need to run in a specific order. As you grow, this pipeline becomes a network with dynamic branches. In some cases, tasks trigger other tasks, and those tasks might depend on several others running first.

This network can be modelled as a DAG (directed acyclic graph), in which each node is a task and each edge is a dependency between two tasks.
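
As a minimal sketch (Python 3.9+, standard library only), a DAG can be stored as a mapping from each task to the set of tasks it depends on. A topological sort then yields an order in which every task runs after its dependencies, and it fails if the graph contains a cycle. The task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: dict[str, set[str]] = {
    "clean_data": set(),
    "build_features": {"clean_data"},
    "train_model": {"build_features"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
    "publish_report": {"evaluate_model"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order: every task appears after all of its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag: dict[str, set[str]]) -> bool:
    """A DAG must not contain cycles; graphlib raises CycleError when it finds one."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    # Two tasks that depend on each other form a cycle, so this is not a valid DAG.
    print(is_acyclic({"train": {"deploy"}, "deploy": {"train"}}))  # False
```

Note that `deploy_model` and `publish_report` share a dependency but not each other, so either may come first; an orchestrator is free to run them in parallel.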

Workflow orchestration tools allow you to define DAGs by specifying all of your tasks and how they depend on each other. The tool then executes these tasks on schedule, in the correct order, retrying any that fail before running the next ones. It also monitors the progress and notifies your team when failures happen.
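
Stripped to its core, the loop such a tool runs might look like the sketch below: walk the DAG in dependency order, retry a failing task a fixed number of times, alert someone if it still fails, and skip everything downstream of the failure. This is an illustrative, stdlib-only sketch, not how Airflow or any other tool is implemented. `run_dag` and the `notify` callback (a stand-in for email or chat alerts) are hypothetical; the state names borrow Airflow’s terminology. Real tools also handle scheduling, parallelism, and persistent state.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(
    dag: dict[str, set[str]],
    tasks: dict[str, Callable[[], None]],
    max_retries: int = 2,
    notify: Callable[[str, Exception], None] = lambda name, err: print(f"ALERT: {name} failed: {err}"),
) -> dict[str, str]:
    """Run tasks in dependency order, retrying failures; return each task's final state."""
    state: dict[str, str] = {}
    for name in TopologicalSorter(dag).static_order():
        # Never start a task unless every upstream task succeeded.
        if any(state.get(dep) != "success" for dep in dag.get(name, set())):
            state[name] = "upstream_failed"
            continue
        for attempt in range(1 + max_retries):
            try:
                tasks[name]()
                state[name] = "success"
                break
            except Exception as err:  # a real tool would also log every attempt
                if attempt == max_retries:
                    state[name] = "failed"
                    notify(name, err)
    return state


if __name__ == "__main__":
    calls = {"train": 0}

    def flaky_train() -> None:
        # Fails on the first attempt only, to exercise the retry logic.
        calls["train"] += 1
        if calls["train"] == 1:
            raise RuntimeError("transient error")

    def broken_deploy() -> None:
        raise RuntimeError("deployment target unreachable")

    dag = {"clean": set(), "train": {"clean"}, "deploy": {"train"}, "report": {"deploy"}}
    tasks = {"clean": lambda: None, "train": flaky_train, "deploy": broken_deploy, "report": lambda: None}
    print(run_dag(dag, tasks, max_retries=1))
```

Here `train` succeeds on its retry, `deploy` exhausts its retries and triggers an alert, and `report` is never started because its upstream task failed.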

CI/CD tools such as Jenkins are commonly used to automatically test and deploy code, and there is a strong parallel between these tools and task orchestration tools — but there are important distinctions too. Even though in theory you can use these CI/CD tools to orchestrate dynamic, interlinked tasks, at a certain level of complexity you’ll find it easier to use more general tools like Apache Airflow instead.

Overall, the focus of any orchestration tool is ensuring centralized, repeatable, reproducible, and efficient workflows: a virtual command center for all of your automated tasks. With that context in mind, let’s see how some of the most popular workflow tools stack up.

Just tell me which one to use

You should probably use:

Apache Airflow if you want the most full-featured, mature tool and you can dedicate time to learning how it works, setting it up, and maintaining it.
Luigi if you need something with an easier learning curve than Airflow. It has fewer features, but it’s easier to get off the ground.
Argo if you’re already deeply invested in the Kubernetes ecosystem and want to manage all of your tasks as pods, defining them in YAML instead of Python.
Kubeflow if you want to use Kubernetes but still define your tasks with Python instead of YAML.
MLFlow if you care more about tracking experiments or tracking and deploying models using MLFlow’s predefined patterns than about finding a tool that can adapt to your existing custom workflows.

Comparison table

For a quick overview, we’ve compared the libraries when it comes to:

Maturity: based on the age of the project and the number of fixes and commits;
Popularity: based on adoption and GitHub stars;
Simplicity: based on ease of onboarding and adoption;
Breadth: based on how specialized vs. how adaptable each project is;
Language: based on the primary way you interact with the tool.

These are not rigorous or scientific benchmarks, but they’re intended to give you a quick overview of how the tools overlap and how they differ from each other. For more details, see the head-to-head comparison below.

Luigi vs. Airflow

Luigi and Airflow solve similar problems, but Luigi is far simpler. It’s contained in a single component, while Airflow has multiple modules that can be configured in different ways. Airflow has a larger community and some extra features, but a much steeper learning curve. In particular, Airflow is far more powerful when it comes to scheduling: its built-in scheduler triggers each DAG on the schedule you declare for it, and its web UI, including a calendar view, shows when DAGs have run and when they are due to run. With Luigi, you need to write more custom code (or rely on an external trigger such as cron) to run tasks on a schedule.
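
Part of what Airflow’s scheduler does for you is decide which runs are due. Airflow treats each scheduled run as covering a data interval and triggers it only after that interval has ended; with catchup enabled, it also creates runs for intervals it missed. The sketch below is a simplified, stdlib-only illustration of that idea (it ignores time zones, cron expressions, and custom timetables) and is not Airflow code; `due_intervals` is a hypothetical helper. With Luigi, you would typically combine cron with custom code along these lines.

```python
from datetime import datetime, timedelta


def due_intervals(
    start: datetime, every: timedelta, now: datetime
) -> list[tuple[datetime, datetime]]:
    """List every (interval_start, interval_end) window that has fully elapsed by `now`.

    A run for a window is only due once the window has closed, so a daily
    pipeline processes yesterday's data today. All missed windows are returned,
    which is how a scheduler can "catch up" after downtime.
    """
    windows = []
    interval_start = start
    while interval_start + every <= now:
        windows.append((interval_start, interval_start + every))
        interval_start += every
    return windows


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    now = datetime(2024, 1, 4, 9, 30)
    # Three runs are due: Jan 1-2, Jan 2-3, and Jan 3-4. The Jan 4-5 window is still open.
    for begin, end in due_intervals(start, timedelta(days=1), now):
        print(f"run covering {begin:%Y-%m-%d} to {end:%Y-%m-%d}")
```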

Both tools use Python and DAGs to define tasks and dependencies. Use Luigi if you have a small team and need to get started quickly. Use Airflow if you have a larger team and can take an initial productivity hit in exchange for more power once you’ve gotten over the learning curve.
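
In Luigi, each task declares its upstream tasks in `requires()`, its result in `output()` (typically a target such as `luigi.LocalTarget`), and its work in `run()`. By default, a task whose outputs already exist counts as complete and is skipped, which makes re-running a pipeline cheap. The sketch below imitates that pattern with the standard library only, so it runs without Luigi installed. The `Task` base class and `build` function are simplified stand-ins, not the real `luigi.Task` or `luigi.build`, and the task names are hypothetical.

```python
import tempfile
from pathlib import Path

# Scratch directory standing in for wherever real task outputs would live.
WORKDIR = Path(tempfile.mkdtemp())


class Task:
    """Simplified stand-in for luigi.Task (not the real class)."""

    def requires(self) -> list["Task"]:
        return []  # upstream tasks that must be complete first

    def output(self) -> Path:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def complete(self) -> bool:
        # Mirrors Luigi's default: a task is done once its output exists.
        return self.output().exists()


def build(task: Task, log: list[str]) -> None:
    """Run `task`, first building any incomplete upstream tasks; skip it if already done."""
    if task.complete():
        return
    for upstream in task.requires():
        build(upstream, log)
    task.run()
    log.append(type(task).__name__)


class CleanData(Task):
    def output(self) -> Path:
        return WORKDIR / "clean.csv"

    def run(self) -> None:
        self.output().write_text("x,y\n1,2\n")  # placeholder for real cleaning logic


class TrainModel(Task):
    def requires(self) -> list[Task]:
        return [CleanData()]

    def output(self) -> Path:
        return WORKDIR / "model.txt"

    def run(self) -> None:
        rows = CleanData().output().read_text().splitlines()
        self.output().write_text(f"trained on {len(rows) - 1} rows\n")


if __name__ == "__main__":
    log: list[str] = []
    build(TrainModel(), log)
    build(TrainModel(), log)  # second call does nothing: every output already exists
    print(log)  # ['CleanData', 'TrainModel']
```

Deleting only `model.txt` and building again would re-run `TrainModel` but not `CleanData`, because the cleaned data is still on disk.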

Luigi vs. Argo

Argo is built on top of Kubernetes, and each task runs as a separate Kubernetes pod. This is convenient if you already run most of your infrastructure on Kubernetes, but it adds complexity if you don’t. Luigi is a Python library and can be installed with Python package managers such as pip and conda, whereas Argo is a Kubernetes extension that is installed into a Kubernetes cluster. While both tools let you define your tasks as DAGs, with Luigi you write these definitions in Python, and with Argo you write them in YAML.
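
To show what an Argo DAG definition contains, the sketch below builds an Argo `Workflow` manifest as a Python dict and prints it as JSON, which has the same structure as the YAML you would normally write by hand. A `dag` template lists tasks, each task names a container template and its `dependencies`, and every task instance runs as its own pod. The step names, the `alpine:3.19` image, and the `argo_dag_workflow` helper are hypothetical; a real pipeline would use your own images and commands.

```python
import json


def argo_dag_workflow(name: str, deps: dict[str, list[str]], image: str) -> dict:
    """Build an Argo Workflow manifest in which each DAG task runs as its own pod."""
    tasks = []
    for step, upstream in deps.items():
        task = {
            "name": step,
            "template": "run-step",
            "arguments": {"parameters": [{"name": "step", "value": step}]},
        }
        if upstream:
            task["dependencies"] = upstream  # this task runs after the listed tasks
        tasks.append(task)
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {"name": "main", "dag": {"tasks": tasks}},
                {
                    # One container template shared by every step; each task instance is a pod.
                    "name": "run-step",
                    "inputs": {"parameters": [{"name": "step"}]},
                    "container": {
                        "image": image,
                        "command": ["echo"],
                        "args": ["running {{inputs.parameters.step}}"],
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    manifest = argo_dag_workflow(
        "ml-pipeline",
        {"clean-data": [], "train-model": ["clean-data"], "deploy-model": ["train-model"]},
        image="alpine:3.19",  # hypothetical; a real step would use your own image
    )
    print(json.dumps(manifest, indent=2))
```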

Use Argo if you’re already invested in Kubernetes and are happy to run every task as a pod. You should also consider it if the developers writing the DAG definitions are more comfortable with YAML than with Python. Use Luigi if you’re not running on Kubernetes and have Python expertise on the team.

Luigi vs. Kubeflow

Luigi is a Python-based library for general task orchestration, while Kubeflow is a Kubernetes-based tool built specifically for machine learning workflows, with prebuilt patterns for experiment tracking, hyperparameter optimization, and serving Jupyter notebooks. Kubeflow consists of two distinct components: Kubeflow and Kubeflow Pipelines. The latter is focused on model deployment and CI/CD, and it can be used independently of the main Kubeflow features.

Use Luigi if you need to orchestrate a variety of different tasks, from data cleaning through model deployment. Use Kubeflow if you already use Kubernetes and want to orchestrate common machine learning tasks such as experiment tracking and model training.

Luigi vs. MLFlow

Luigi is a general task orchestration system, while MLFlow is a more specialized tool for managing and tracking your machine learning lifecycle and experiments. You use Luigi to define general tasks and dependencies (such as training and deploying a model), whereas you import MLFlow directly into your machine learning code and use its helper functions to log information (such as the parameters you’re using) and artifacts (such as the trained models). You can also use MLFlow as a command-line tool to serve models built with common libraries (such as scikit-learn) or to deploy them to common platforms (such as AzureML or Amazon SageMaker).
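
MLFlow’s tracking API is a handful of calls you add to training code, such as `mlflow.start_run`, `mlflow.log_param`, `mlflow.log_metric`, and `mlflow.log_artifact`. To show what that kind of logging records without requiring MLFlow to be installed, here is a hypothetical, stdlib-only imitation. Its `Run` class and `start_run(root)` signature are illustrative, not MLFlow’s API, and a real tracking setup adds a UI for comparing runs on top of records like these.

```python
import json
import tempfile
import time
import uuid
from contextlib import contextmanager
from pathlib import Path


class Run:
    """Hypothetical minimal run record: parameters in, metrics out, artifacts on disk."""

    def __init__(self, root: Path) -> None:
        self.run_id = uuid.uuid4().hex
        self.dir = root / self.run_id
        self.dir.mkdir(parents=True)
        self.params: dict[str, str] = {}
        self.metrics: dict[str, list[float]] = {}

    def log_param(self, key: str, value) -> None:
        self.params[key] = str(value)  # parameters are stored as strings

    def log_metric(self, key: str, value: float) -> None:
        # Keep the full history so you can plot, e.g., loss per epoch.
        self.metrics.setdefault(key, []).append(float(value))

    def log_artifact(self, name: str, content: str) -> Path:
        path = self.dir / name
        path.write_text(content)
        return path


@contextmanager
def start_run(root: Path):
    """Open a run and persist its params and metrics to meta.json when the block exits."""
    run = Run(root)
    started = time.time()
    try:
        yield run
    finally:
        meta = {"params": run.params, "metrics": run.metrics, "duration_s": time.time() - started}
        (run.dir / "meta.json").write_text(json.dumps(meta, indent=2))


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp())
    with start_run(store) as run:
        run.log_param("learning_rate", 0.01)
        for loss in [0.9, 0.6, 0.4]:  # stand-in for a real training loop
            run.log_metric("loss", loss)
        run.log_artifact("model.txt", "weights would go here")
    print(json.loads((run.dir / "meta.json").read_text()))
```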

Airflow vs. Argo

Argo and Airflow both allow you to define your tasks as DAGs, but in Airflow you do this with Python, while in Argo you use YAML. Argo runs each task as a Kubernetes pod, while Airflow lives within the Python ecosystem. Canva evaluated both options before settling on Argo, and you can watch their talk on the topic for a detailed comparison and evaluation.
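
In Airflow, dependencies are usually declared in Python with the bitshift operator, as in `extract >> transform >> load`. Here `a >> b` means `b` runs after `a`, and lists allow fan-out and fan-in. The toy class below reimplements only that operator, so you can see how such a declaration turns into a list of edges. `Op` is hypothetical and is not Airflow’s `BaseOperator`, which does far more.

```python
class Op:
    """Toy task node that supports Airflow-style `a >> b` dependency declarations."""

    def __init__(self, task_id: str, edges: list[tuple[str, str]]) -> None:
        self.task_id = task_id
        self.edges = edges  # shared edge list standing in for the enclosing DAG

    def __rshift__(self, other: "Op | list[Op]") -> "Op | list[Op]":
        # `a >> b` (or `a >> [b, c]`) makes the right side run after `a`.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.edges.append((self.task_id, target.task_id))
        return other  # returning the right side lets declarations chain

    def __rrshift__(self, other: "list[Op]") -> "Op":
        # Handles `[a, b] >> c`, where the left operand is a plain list.
        for source in other:
            self.edges.append((source.task_id, self.task_id))
        return self


if __name__ == "__main__":
    edges: list[tuple[str, str]] = []
    extract, clean_orders, clean_users, train = (
        Op(name, edges) for name in ["extract", "clean_orders", "clean_users", "train"]
    )
    extract >> [clean_orders, clean_users] >> train  # fan out, then fan back in
    print(edges)
```

The resulting edge list is exactly the kind of dependency graph an orchestrator then sorts and executes.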

Use Airflow if you want a more mature tool and don’t care about Kubernetes. Use Argo if you’re already invested in Kubernetes and want to run a wide variety of tasks written in different stacks.

Airflow vs. Kubeflow

Airflow is a generic task orchestration platform, while Kubeflow focuses specifically on machine learning tasks, such as experiment tracking. Both tools allow you to define tasks using Python, but Kubeflow runs tasks on Kubernetes. Kubeflow is split into Kubeflow and Kubeflow Pipelines: the latter component allows you to specify DAGs, but it’s more focused on deployment and model serving than on general tasks.

Use Airflow if you need a mature, broad ecosystem that can run a variety of different tasks. Use Kubeflow if you already use Kubernetes and want more out-of-the-box patterns for machine learning solutions.

Airflow vs. MLFlow

Airflow is a generic task orchestration platform, while MLFlow is specifically built to optimize the machine learning lifecycle. This means that MLFlow has the functionality to run and track experiments and to train and deploy machine learning models, while Airflow has a broader range of use cases and can run any set of tasks. Airflow is a set of components and plugins for managing and scheduling tasks. MLFlow is a Python library you can import into your existing machine learning code, as well as a command-line tool you can use to train machine learning models written in scikit-learn and deploy them to Amazon SageMaker or AzureML.

Use MLFlow if you want an opinionated, out-of-the-box way of managing your machine learning experiments and deployments. Use Airflow if you have more complicated requirements and want more control over how you manage your machine learning lifecycle.

Argo vs. Kubeflow

Parts of Kubeflow (such as Kubeflow Pipelines) are built on top of Argo, but Argo is built to orchestrate any task, while Kubeflow focuses on tasks specific to machine learning, such as experiment tracking, hyperparameter tuning, and model deployment. Kubeflow Pipelines is a separate component of Kubeflow that focuses on model deployment and CI/CD and can be used independently of Kubeflow’s other features. Both tools rely on Kubernetes and are likely to be more interesting to you if you’ve already adopted it. With Argo, you define your tasks using YAML, while Kubeflow lets you use a Python interface instead.
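
Kubeflow Pipelines sits on top of Argo yet offers a Python interface. One way to picture that layering is as a small compiler from Python functions to an Argo-style workflow manifest. The toy below is purely illustrative: the `step` decorator, `compile_to_argo`, and the `example.com/pipeline:latest` image are hypothetical and are not the Kubeflow Pipelines SDK. The real SDK also handles containers, typed inputs and outputs, and artifact passing.

```python
import json
from typing import Callable

# Registry filled in by the decorator: step name -> names of upstream steps.
STEPS: dict[str, list[str]] = {}


def _k8s_name(fn: Callable) -> str:
    # Kubernetes-style names use hyphens rather than underscores.
    return fn.__name__.replace("_", "-")


def step(*upstream: Callable) -> Callable[[Callable], Callable]:
    """Hypothetical decorator: register a plain Python function as a pipeline step."""
    def register(fn: Callable) -> Callable:
        STEPS[_k8s_name(fn)] = [_k8s_name(u) for u in upstream]
        return fn
    return register


def compile_to_argo(name: str, image: str) -> dict:
    """Translate the registered steps into an Argo Workflow DAG, one container per step."""
    tasks = []
    for step_name, deps in STEPS.items():
        task = {"name": step_name, "template": step_name}
        if deps:
            task["dependencies"] = deps
        tasks.append(task)
    # Assumes the image knows how to run the step whose name it receives as an argument.
    templates = [{"name": s, "container": {"image": image, "args": [s]}} for s in STEPS]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": "main",
            "templates": [{"name": "main", "dag": {"tasks": tasks}}, *templates],
        },
    }


@step()
def load_data() -> None:
    """Placeholder for real data-loading code."""


@step(load_data)
def train_model() -> None:
    """Placeholder for real training code."""


@step(train_model)
def evaluate_model() -> None:
    """Placeholder for real evaluation code."""


if __name__ == "__main__":
    print(json.dumps(compile_to_argo("kf-demo", "example.com/pipeline:latest"), indent=2))
```

The authoring experience stays in Python (functions and decorators), while the output is the same kind of declarative DAG that Argo users write directly in YAML.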

Use Argo if you need to manage a DAG of general tasks running as Kubernetes pods. Use Kubeflow if you want a more opinionated tool focused on machine learning solutions.

Argo vs. MLFlow

Argo is a task orchestration tool that allows you to define your tasks as Kubernetes pods and run them as a DAG, defined with YAML. MLFlow is a more specialized tool that doesn’t allow you to define arbitrary tasks or the dependencies between them. Instead, you can import MLFlow into your existing (Python) machine learning code base as a Python library and use its helper functions to log artifacts and parameters to help with analysis and experiment tracking. You can also use MLFlow’s command-line tool to train scikit-learn models and deploy them to Amazon SageMaker or Azure ML, as well as to manage your Jupyter notebooks.

Use Argo if you need to manage generic tasks and want to run them on Kubernetes. Use MLFlow if you want an opinionated way to manage your machine learning lifecycle with managed cloud platforms.

Kubeflow vs. MLFlow

Kubeflow and MLFlow are both smaller, more specialized tools than general task orchestration platforms such as Airflow or Luigi. Kubeflow relies on Kubernetes, while MLFlow is a Python library that helps you add experiment tracking to your existing machine learning code. Kubeflow lets you build a full DAG where each step is a Kubernetes pod, while MLFlow has built-in functionality to deploy your scikit-learn models to Amazon SageMaker or Azure ML.

Use Kubeflow if you want to track your machine learning experiments and deploy your solutions in a more customized way, backed by Kubernetes. Use MLFlow if you want a simpler approach to experiment tracking and want to deploy to managed platforms such as Amazon SageMaker.

No silver bullet

While all of these tools have different focus points and different strengths, no tool is going to give you a headache-free process straight out of the box. Before sweating over which tool to choose, it’s usually important to ensure you have good processes, including a good team culture, blame-free retrospectives, and long-term goals. If you’re struggling with any machine learning problems, get in touch. We love talking shop, and you can schedule a free call with our CEO.
