This Is What Happens When You Machine-Learn JIRA Tickets

For software developers, one of the most debated and maybe even most hated questions is “…and how long will it take?”. I’ve been in those discussions myself, and they often lacked precise information on the requirements. What I’ve learned is this: if only sparse information is available, a reliable estimate is almost impossible. To make matters worse, developers find themselves under pressure after issuing a wild guess and then needing more time.

Experiencing the other side

When I started working in direct contact with customers, I (reluctantly) realized that the collaboration often benefits from providing a schedule. In my experience, time becomes an important factor when customers have plans and projects of their own that can’t continue without knowing when a missing piece will arrive.

Understanding each other (Business Understanding)

In the end, this comes down to a simple problem: “How can managers and developers work together on a project and each get what they need?”. Those who interact with a customer need information (estimates) to make the collaboration go smoothly. In turn, developers need accurate requirements, and some flexibility has to be respected as well.

Learn!

With that in mind, I decided to use a data-science-driven approach to gain insights into the estimation problem. You can find the code that I used and more detailed technical explanations in my GitHub repository. What I wanted to know was:

  1. As a baseline reference, what is the average time that a “New Feature”, “Bug”, etc. spends in implementation (i.e. status “in progress”)?
  2. Is it possible to estimate the time spent “in progress” by analyzing the text in the summary and description of a ticket?
  3. Which words in the description account for long / short durations?

Data Understanding

As data, I gathered around 20,000 tickets from the RTFACT repository of the JFrog open source project. For every ticket, the following is available: issue type (e.g. “Bug”, “New Feature”), summary, description, and time spent “in progress”. Some initial data exploration showed that, out of all the tickets, only about 10% (2258) have a nonzero “in progress” time. All the others have either not been worked on or were never put in that status.

To get a feeling for the data, I checked the ticket counts by issue type. As you can see in the next image, the counts vary widely between types, with Bugs being the most frequent.

[Image: ticket counts by issue type]

Prepare Data

As a first cleaning step, I only kept entries with a nonzero “in progress” time and removed outliers (outside of the 96% quantile). Now, keep in mind that statistical models can only understand numbers, not text. To translate strings of characters into numbers, I computed TF-IDF text analysis features. These are a way of numerically representing the occurrence and importance of certain words in a text document.
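The cleaning and feature steps could look roughly like the sketch below. It is not the code from the repository: the toy tickets are made up, the outlier rule is read here as "drop everything above the 96% quantile", and scikit-learn's `TfidfVectorizer` is an assumed choice of TF-IDF implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy tickets: (summary + description, hours "in progress").
tickets = [
    ("Login page throws error after submitting the form", 12.0),
    ("Add export of reports to CSV files", 40.0),
    ("Crash when uploading a large file", 0.0),
    ("Improve search performance on large repositories", 900.0),
    ("Typo in settings dialog", 3.0),
]

# Keep only tickets that actually spent time "in progress".
worked = [(text, hours) for text, hours in tickets if hours > 0]

# Drop outliers above the 96% quantile of the duration (assumed reading).
cutoff = np.quantile([hours for _, hours in worked], 0.96)
cleaned = [(text, hours) for text, hours in worked if hours <= cutoff]

# Turn the ticket text into TF-IDF features (one column per word).
texts = [text for text, _ in cleaned]
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(texts)

print(features.shape)  # (number of tickets, vocabulary size)
```

Each row of `features` is one ticket and each column one word, so a word that is frequent in one ticket but rare across all tickets gets a high weight.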

Data Modeling

Decision trees and random forests are powerful and insightful models for analyzing data. One branch of that model family is gradient boosted trees. These are my model of choice due to their performance (they have won several Kaggle competitions) and their interpretability, which mainly means that we can draw further insight from the decisions made in the trees.
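As a minimal sketch of what fitting gradient boosted trees looks like, the example below uses scikit-learn's `GradientBoostingRegressor` on synthetic data. Both the library and the data are assumptions for illustration; the article does not state which implementation was used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for TF-IDF features and durations (not the real data).
# Only the first feature carries signal.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 10.0 + 50.0 * X[:, 0] + rng.normal(0.0, 1.0, size=200)

# Hold out a test set so the model can be checked on unseen tickets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Importances hint at which features drive the splits in the trees.
importances = model.feature_importances_
print(importances.round(3))
```

The feature importances are one simple way to look inside the trees; they show how much each feature is used, but not whether it pushes the estimate up or down.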

Evaluate the Results

The first question asked for a baseline for the duration of a ticket. As you can see in the next image, the mean duration per issue type ranges between ~10h and ~100h. Note that the standard deviation is very large (~50 or higher), which calls for additional information for the estimate, e.g. from the boosted trees.

[Image: mean time “in progress” by issue type]
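A baseline of this kind is just a mean and a standard deviation per issue type. The sketch below shows the computation on made-up numbers; the issue types and hours are hypothetical and do not reproduce the values in the image.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (issue type, hours "in progress") pairs, not the real data.
tickets = [
    ("Bug", 4.0), ("Bug", 20.0), ("Bug", 90.0),
    ("New Feature", 30.0), ("New Feature", 150.0),
    ("Improvement", 8.0), ("Improvement", 12.0),
]

# Group the durations by issue type.
hours_by_type = defaultdict(list)
for issue_type, hours in tickets:
    hours_by_type[issue_type].append(hours)

# Mean and sample standard deviation per issue type.
baseline = {
    issue_type: (mean(hours), stdev(hours))
    for issue_type, hours in hours_by_type.items()
}

for issue_type, (avg, spread) in baseline.items():
    print(f"{issue_type}: mean={avg:.1f}h, std={spread:.1f}h")
```

When the spread is as large as the mean, as for the hypothetical "Bug" group here, the per-type average alone is a weak estimate.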

For the trees, the performance is good (and can be tuned to “great”), but only on the training set. On the test set, the model generalized badly. This is why I titled this article “early results”. As you can see in the next image, the ground truth (blue) deviates significantly from the estimated values (orange).

[Image: ground truth (blue) vs. estimated values (orange) on the test set]
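The gap between training and test performance can be made visible by computing the same error metric on both sets. The sketch below does this on synthetic, deliberately noisy data; the data, the hyperparameters, and the choice of mean absolute error are assumptions, not the setup from the article.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data with a weak signal and heavy noise (not the real tickets).
rng = np.random.default_rng(1)
X = rng.random((300, 20))
y = 20.0 * X[:, 0] + rng.normal(0.0, 30.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Many deep trees fit the training noise closely.
model = GradientBoostingRegressor(n_estimators=300, max_depth=4, random_state=1)
model.fit(X_train, y_train)

# Same metric on both sets: a large gap indicates poor generalization.
train_mae = mean_absolute_error(y_train, model.predict(X_train))
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"train MAE: {train_mae:.1f}, test MAE: {test_mae:.1f}")
```

A training error far below the test error is the typical sign of overfitting, which is more likely when only a small number of valid samples is available.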

I think it’s still interesting to take a look at which keywords contribute in a positive way (longer time “in progress”) or a negative way (shorter time “in progress”):

[Image: keywords with the largest positive / negative impact on the estimate]

As you can see from the results, the issue types “Bug” and “New Feature” have the largest positive impact on the estimate. On the other end of the spectrum are the words “error” and “com”, which have the largest negative impact. For the top 15 words that cause the highest positive / negative impact, see the figure below.
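The article does not say how the signed impact per word was computed. As a rough, model-free stand-in (explicitly not the author's method), one can compare the mean duration of tickets that contain a word with the mean duration of those that do not. The tickets and the helper `word_impact` below are hypothetical.

```python
from statistics import mean

# Hypothetical (ticket text, hours "in progress") pairs, not the real data.
tickets = [
    ("refactor storage layer", 120.0),
    ("error in login form", 6.0),
    ("error when saving settings", 10.0),
    ("refactor build scripts", 80.0),
    ("update docs", 4.0),
]

def word_impact(word, tickets):
    """Mean duration of tickets containing `word` minus mean of the rest."""
    with_word = [hours for text, hours in tickets if word in text.split()]
    without_word = [hours for text, hours in tickets if word not in text.split()]
    if not with_word or not without_word:
        return 0.0
    return mean(with_word) - mean(without_word)

vocabulary = sorted({word for text, _ in tickets for word in text.split()})
impacts = {word: word_impact(word, tickets) for word in vocabulary}

# Words sorted from the most negative to the most positive impact.
ranked = sorted(impacts, key=impacts.get)
print(impacts["refactor"], impacts["error"])
```

A positive value means tickets with that word took longer on average, a negative value means they were shorter. Unlike a tree-based attribution, this ignores interactions between words.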

Future Work

What else needs to be done?

  1. The dataset is not very large (the model had to be trained on only ~2200 valid samples). The next step would be to find a ticket repository with a larger number of valid tickets.
  2. Instead of only estimating the implementation time (time “in progress”), it might also be interesting to estimate the cycle time.
  3. Is it possible to estimate (classify) the “resolution” (Fixed, Duplicate, Won’t Fix, …) of a ticket?

Thanks

This was my first article on Medium! Thanks a lot for taking the time. If you have any feedback or insights that you’d like to share, I’d be glad to hear them.

Translated from: https://medium.com/@steffen.herbort/early-results-this-is-what-happens-when-you-machine-learn-jira-tickets-1ea0d82f39fa
