Learn the Business to Become a Great Data Scientist

Opinion

Many aspiring Data Scientists think that what they need to become a Data Scientist is:

  • Coding
  • Statistics
  • Math
  • Machine Learning
  • Deep Learning

And any other technical skills.

The above list is accurate; most of the qualifications you need as a Data Scientist right now are the ones I listed above. This is unavoidable, as many job listings currently name these skills as prerequisites. Just look at the example of Data Scientist job requirements and preferences below.

[Image: example Data Scientist job listing. Taken from indeed.com]

Most of the requirements sound technical: degree, coding, math, and stats. However, there is an underlying business understanding requirement that you might not notice at first in this job advertisement.

If you look closely, they require someone with experience in applying analytical methods to solve practical business problems. This implies that your everyday tasks would consist of solving business problems, which in turn means you need to understand what kind of business the company runs and how its processes work.

You might ask, “Why do I need to understand it? Just create the machine learning model and the problem is solved, isn’t it?” Well, that line of thinking is dangerous, and I will explain why.

As a reminder, I would argue that what makes you great as a Data Scientist is not only how good your coding skills are, how well you understand statistical theory, or even how well you have mastered business understanding, but a combination of all of these.

Of course, anybody can agree or disagree with my opinion, as I believe there is no one specific skill that makes you a great Data Scientist.

Data Scientist employment is hard. It is not easy to get into this field. With many applicants and many people with a similar set of skills, you need to stand out. Business understanding is the skill that would certainly separate you from all the other fish in the pond.

In my experience as a Data Scientist, no skill feels as underrated as business understanding. Early in my career, I even thought that you did not need to understand the business. How wrong I was.

I am not ashamed, though, to admit that I did not consider the business aspect essential at first, because many data science courses and books do not even teach it.

So, why is it crucial to learn the business, and how does it impact your employment as a Data Scientist?

Just imagine this situation. You work in the data department of a food company with candy as its main product, and the company plans to release a new sour candy product. The company then asks the sales department to sell the product. Now, the sales department knows that the company has a data department and asks the data team to provide new leads where they can sell the sour candy.

Before anybody complains, “This is not our job, we create machine learning models!” or “I work as a data scientist, not in the sales department”: no, this is precisely what Data Scientists do in a company. Many projects involve working with another department to solve a company problem.

Back to our scenario: how do you correctly approach this problem? You might think, “Just create a machine learning model to generate the leads.” Yes, that is on the right track, but how exactly would you create the model? On what basis? Is the business question even viable to solve using a machine learning model?

You can’t just suddenly use a machine learning model, right? This is why business understanding is so crucial for a Data Scientist. You need to understand how the candy business works in more detail. Keep asking questions like:

  • “What kind of business question exactly do we want to solve?”
  • “Would we even need a machine learning model?”
  • “What kind of attributes are related to candy sales?”
  • “What are the candy selling strategies and practices within and outside of the company?”

And many more questions you could think of related to the business.
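To make the question “Would we even need a machine learning model?” a little more concrete, here is a minimal sketch of a rule-based baseline for the sour candy leads. Everything in it is an assumption for illustration: the `Store` fields, the `rank_leads` helper, the `min_candy_units` threshold, and the sample numbers are all hypothetical and do not come from any real company data. The rule itself (stores that already sell a high share of sour products are likely good leads) is exactly the kind of business assumption you would need to confirm with the sales team first.

```python
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    candy_units: int  # hypothetical: total candy units sold last quarter
    sour_units: int   # hypothetical: units of sour-flavored products among them


def rank_leads(stores, min_candy_units=100):
    """Rank stores by the share of sour products in their candy sales.

    Stores below min_candy_units are skipped, since a share computed
    from very few sales is not a reliable signal. Keep the threshold
    above zero so the division below is always defined.
    """
    eligible = [s for s in stores if s.candy_units >= min_candy_units]
    return sorted(
        eligible,
        key=lambda s: s.sour_units / s.candy_units,
        reverse=True,
    )


# Made-up sample data, for illustration only.
stores = [
    Store("Store A", candy_units=1000, sour_units=150),
    Store("Store B", candy_units=400, sour_units=120),
    Store("Store C", candy_units=50, sour_units=40),
    Store("Store D", candy_units=800, sour_units=40),
]

leads = rank_leads(stores)
print([s.name for s in leads])
```

If a simple ranking like this already gives the sales department useful leads, a machine learning model may not be needed yet. If it does not, the reasons why it fails are usually a good starting point for choosing which attributes a model should use.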

It is important to know what kind of business your company runs and everything related to it, because your work as a Data Scientist requires you to make sense of the data.

While it is easy to say that business understanding is essential, it is not easy to gain it.

Education is one thing; for example, you might have a better chance of standing out when applying for a data science position at a PR company if your educational background is in communication rather than biology.

Work experience quickly covers this, though. Work experience under another job title in a similar industry would provide significant leverage, as you already understand the business process.

For a fresher, it might be a hard industry to break into, but there are many benefits to being a fresher as well. I remember Tyler Folkman’s LinkedIn post on why the industry should consider recent graduates, and I agree. A recent graduate could:

  1. Come with preparation
  2. Be hungry to learn about the business
  3. Make an impact

Freshers should be a target for companies that have established their data journeys. The company could teach them many things about the business more easily, as freshers have no experience at all in the business world. In my opinion, never count out the freshers.

I will also tell you about my own experience. When I first got a data project, I was not thinking about the business at all and just tried to build a machine learning model. And how disastrous it turned out to be.

I presented the model to the related parties full of excitement. My model results were good, I knew everything about the data, and I knew the theory of the model I used. Easy peasy, right? So wrong. It turned out that the users did not care about the model I used. They were more interested in knowing whether I had considered business approach “A”, or why I had used data that did not relate to the business at all. It ended with a discussion about my need for more business training.

It is embarrassing, but I am not ashamed to admit that it was my fault for not considering business understanding. I could be the best at model creation or statistics, but not knowing the business turned out to be a disaster. Since that day, I have tried to learn more about the business process itself, even before considering any of the technical things.

Conclusion

In my opinion, fresher or not, try to learn the business as much as possible.

Focus on one industry you are interested in: finance, banking, credit, automotive, candy, oil, etc. Every business has a different approach and strategy; you just need to focus on learning the industry you like.

Data Scientist employment is hard. It is not easy to get into this field. With many applicants and many people with a similar set of skills, you need to stand out. Business understanding is the skill that will undoubtedly separate you from all the other fish in the pond.

Translated from: https://towardsdatascience.com/learn-the-business-to-become-a-great-data-scientist-635fa6029fb6
