Here Is What I’ve Learned in 2 Years as a Data Scientist

It has been 2 years since I started my data science journey. Boy, it has been one heck of a roller coaster ride!

There were many highs and lows, and of course, countless cups of coffee and sleepless nights.

I failed a lot, learned a lot, and of course, grew a lot as a data scientist along the journey.

Over these 2 years, from writing on Medium, speaking at meetups and workshops, sharing my experience on LinkedIn, and consulting clients on data science projects, to my current role teaching data science, I have found joy and fulfilment in sharing and teaching to help others in data science and make an impact.

At the end of the day, it all boils down to one simple fact: I’m moving towards my mission of making data science accessible to everyone.

If you’re interested, feel free to check out my previous LinkedIn post on why I decided to transition from data scientist to data science instructor (a.k.a. teacher).

In this article, for the first time, I’ll consolidate everything I’ve learned in 2 years as a data scientist and condense it into 5 lessons.

If you’re just starting out in data science and wondering what to learn…

Or you’re looking for a job in data science…

Or you’re already working in the data science space…

I hope you’ll find these 5 lessons helpful to you as a data scientist!

Enough talking… Let’s get started!

5 Lessons I’ve Learned in 2 Years as a Data Scientist

1. Storytelling, NOT Presentation.

One of the most profound questions I’ve ever been asked in my data science career came from one of the great senior data scientists I worked with:

“Admond, what’s the story that we are gonna tell in the meeting later?”

The first time I heard this question, I was stunned for a second.

He didn’t ask what slides I’d prepared.

He didn’t ask what I was gonna share.

He didn’t ask what results I was gonna present.

NONE.

To be honest, I didn’t even understand why he placed so much emphasis on telling stories instead of presenting the facts we already had.

Before I began to appreciate the importance of telling stories, I made tons of mistakes.

Either stakeholders didn’t understand what I was saying, or the insights failed to convince and motivate them to take action.

Once I decided to improve my storytelling skills…

Once I started focusing on telling stories…

Things changed, for real.

Stakeholders and non-technical bosses began to understand what I was delivering, without my bombarding them with technical jargon and raw results. And they took action.

Facts tell, but stories sell.

If you want to be a good data scientist, focus on technical skills.

If you want to be a great data scientist, focus on storytelling skills.

So… How To Learn Storytelling Skills?

Want to learn storytelling skills? Learn from Vox.

Because they are masters of storytelling. Seriously.

They have always been able to explain complex issues or ideas in an engaging and understandable way.

If this is the first time you’ve heard of Vox, check out their YouTube video below.

Just observe how they explain societal phenomena and issues through storytelling, in the most intuitive way possible.

And this skill is very important when it comes to presenting insights or delivering a core message to your audience.

Vox: How wildlife trade is linked to coronavirus

2. Data Is Messy, Embrace It.

Forget about having Kaggle-like data in a real work environment, because most of the time you won’t have clean data.

Or worse, sometimes you don’t even have data to begin with, or you’re just not sure where to get or query it because it is scattered everywhere.

Data collection and data integrity checks are among the most important steps in any data science project, yet many junior data scientists are oblivious to them.

The reality is that you need to know where to get your data based on business requirements and the existing data architecture.

You might breathe a sigh of relief after you’ve got the data, but this is where the hard part begins — data integrity.

You need to check the collected data thoroughly, asking hard questions and seeking input from different stakeholders, to see whether it actually makes sense.

Without the right, accurate data in place in the first place, all of our data cleaning, EDA, machine learning model building, and deployment are simply a luxury.
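
To make "asking hard questions" of a dataset a little more concrete, here is a minimal sketch in plain Python of the kind of first-pass integrity checks I have in mind: duplicate keys, missing required fields, and values outside a plausible range. The `check_integrity` function, the `orders` records, and the bounds are all hypothetical illustrations, not a standard tool; in practice the rules should come from conversations with your stakeholders.

```python
from collections import Counter

def check_integrity(rows, id_field, required, ranges):
    """Return a list of human-readable issues found in `rows`.

    rows:     list of dicts, one per record
    id_field: column that should uniquely identify a record
    required: columns that must be present and non-empty
    ranges:   {column: (low, high)} bounds agreed with stakeholders (hypothetical)
    """
    issues = []

    # 1. Duplicate keys often point to a bad join or a double-loaded extract.
    counts = Counter(row.get(id_field) for row in rows)
    for key, n in counts.items():
        if n > 1:
            issues.append(f"duplicate {id_field}={key!r} appears {n} times")

    for i, row in enumerate(rows):
        # 2. Missing values in fields the business logic depends on.
        for col in required:
            if row.get(col) in (None, ""):
                issues.append(f"row {i}: missing {col}")
        # 3. Numeric values outside the range stakeholders consider plausible.
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if isinstance(value, (int, float)) and not low <= value <= high:
                issues.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    return issues

# Hypothetical extract with a few deliberate problems.
orders = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "", "amount": 35.5},
    {"order_id": 2, "customer": "C", "amount": -10.0},
]
for issue in check_integrity(orders, "order_id", ["customer"], {"amount": (0, 10_000)}):
    print(issue)
```

Each flagged line is a question to take back to the data owner (why does order 2 appear twice? can an amount really be negative?) rather than something to silently "clean" away.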

3. Soft Skills > Technical Skills

One of the most common questions for beginners in data science is this:

“What are the skills that I need to learn when starting out in data science?”

In my opinion, learning technical skills (programming, statistics, etc.) should be the priority when you are first starting out in data science.

Once we have a solid foundation in technical skills, we should focus more on building and improving our soft skills (communication, storytelling, etc.).

While this might seem a bit counter-intuitive compared with the usual way of learning data science skills, I truly believe in this approach.

WHY?

You see, data scientists are problem solvers.

We don’t just write some code, build some fancy machine learning models and call it a day.

From understanding a business problem and collecting and visualizing data, to prototyping, fine-tuning, and deploying models in real-world applications, every step requires teamwork, communication, and storytelling skills: to work with team members, to manage stakeholders’ expectations, and ultimately to drive business decisions and actions.

There is a famous quote:

“Without data you’re just another person with an opinion.”

— W. Edwards Deming

To me, getting data is only the first step. What’s more important is how you use data to drive business decisions and actions that make a real impact. Here is my slightly modified version of the quote:

“Without storytelling skills you’re just another person with data.”

You can perform the best data analytics in the world.

You can build the best machine learning model in the world.

You can also write the cleanest code in the world.

But if you can’t use your results to drive business decisions and actions, or to convince people to use what you’ve built, your results will just sit in your PowerPoint slides without any real impact.

Sad, but true.

4. Interpretable Models Matter, A Lot.

For most businesses (unless you’re working at a cutting-edge technology company), fancy or complex models are typically not the first choice for analytics or prediction.

Your boss and stakeholders want to understand what’s going on behind your results.

Therefore, you need to be able to explain what’s going on behind your results.

For instance: What caused this anomaly to be flagged, and why? Does it make sense in the business context? Why did the model make this particular prediction? Which factors contributed to it? Are our assumptions correct?

All of these questions essentially boil down to one simple question:

“What’s the pattern behind what we observe?”

Understanding what’s going on behind our models and results is crucial for convincing stakeholders to take action and, in turn, for driving business decisions.
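
One reason linear models are easy to explain is that a single prediction can be broken down feature by feature. The sketch below assumes an already-fitted logistic regression churn model with made-up coefficients (`INTERCEPT`, `COEFS`, the feature names, and the `explain` helper are all hypothetical) and shows how much each feature pushes one customer’s log-odds up or down.

```python
import math

# Hypothetical coefficients of an already-fitted logistic regression churn model.
INTERCEPT = -1.5
COEFS = {"months_inactive": 0.8, "support_tickets": 0.4, "tenure_years": -0.6}

def explain(customer):
    """Return (churn_probability, per-feature contributions) for one customer.

    Logistic regression models the log-odds as a plain sum,
        log_odds = INTERCEPT + sum(coef * value),
    so each coef * value term is that feature's share of this prediction.
    """
    contributions = {name: coef * customer[name] for name, coef in COEFS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # sigmoid of the log-odds
    return probability, contributions

customer = {"months_inactive": 3, "support_tickets": 2, "tenure_years": 1}
probability, contributions = explain(customer)
print(f"churn probability: {probability:.2f}")
# List the factors from most to least influential for this one prediction.
for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.2f} log-odds ({direction} churn risk)")
```

Because the intercept plus the contributions add up exactly to the log-odds, you can tell a stakeholder which factor drove this customer’s score, something that is much harder to do with a black-box model.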

Large enterprises simply can’t afford to deploy a black-box model in the real world and let it run wild without understanding how it works or when it fails.

This is exactly why simple models such as decision trees and logistic regression are still widely used in industry today.
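
To show why decision trees are considered readable, here is a from-scratch sketch of the simplest possible tree: a single split (a "decision stump") chosen by Gini impurity. The customer data, feature names, and the `fit_stump` helper are hypothetical; in a real project you would more likely use a library implementation such as scikit-learn’s `DecisionTreeClassifier`.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 0.0 means the group is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Most common class in a list of 0/1 labels (ties go to 0)."""
    return int(2 * sum(labels) > len(labels))

def fit_stump(rows, labels, feature_names):
    """Pick the single rule 'feature <= threshold' with the lowest weighted Gini.

    Returns (feature_name, threshold, class_if_true, class_if_false).
    """
    n = len(rows)
    best = None
    for j, name in enumerate(feature_names):
        # Try every observed value of this feature as a candidate threshold.
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a useful split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("no feature can split the data")
    _, name, threshold, if_true, if_false = best
    return name, threshold, if_true, if_false

# Hypothetical customers: (tenure_years, support_tickets) -> churned (1) or not (0)
features = ["tenure_years", "support_tickets"]
customers = [(0.5, 1), (1.0, 4), (1.5, 2), (3.0, 5), (4.0, 1), (6.0, 3)]
churned = [1, 1, 1, 0, 0, 0]

name, threshold, if_true, if_false = fit_stump(customers, churned, features)
print(f"IF {name} <= {threshold} THEN churn={if_true} ELSE churn={if_false}")
```

The learned model is a single sentence that a stakeholder can check against their own business intuition, which is exactly the property that keeps trees and logistic regression popular.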

5. Always See The Big Picture

I made this huge mistake when I was first starting out in data science.

I focused too much on code and errors and somehow lost sight of the big picture that truly mattered: end-to-end pipeline integration in production and how the solution performed in the real world.

In other words, I was so fixated on the technical side that I over-optimized my code and models without making a real impact on the overall project or business.

Unfortunately, I learned this the hard way.

Fortunately, I now use what I’ve learned to constantly remind myself to see the big picture.

Hopefully, you’ll begin to realize the importance of seeing the big picture in your day-to-day work as a data scientist.

And the first step is to understand the business domain and the problems you’re solving.

Be clear about what you and your team aim to achieve in a project, understand how your role fits into the big picture, and see how the different pieces work together as a whole toward common goals.

Final Thoughts

Thank you for reading.

My data science journey has definitely been a tough one, but I enjoyed the ride and learned a lot along the way.

And I’m still learning each and every day.

I hope you found this article helpful in some way and that you’ll apply these lessons in your work as a data scientist.

Now that I’ve become a data science instructor, you can expect more data science content from me in the future to help you learn and get into this field.

Check out my other articles if you want to learn more about data science.

If you’re interested in learning how to get into data science, feel free to check out my article How To Go Into Data Science, where I compiled and answered a list of common questions (or challenges) that beginners in data science face, along with guidance.

I hope you enjoyed reading this article and I look forward to having you as part of the data science community.

Remember, keep learning and never stop improving.

As always, if you have any questions or comments, feel free to leave your feedback below or reach me on LinkedIn. Till then, see you in the next post! 😄

About the Author

As a data scientist and data science instructor, Admond Lee is on a mission to make data science accessible to everyone. He is helping companies and digital marketing agencies track and achieve marketing ROI with actionable insights through innovative attribution and a data-driven approach.

His story and data science work have been featured in various publications, including KDnuggets, Medium, Tech in Asia, AI Time Journal, and business magazines. He has also been invited to speak at various workshops and meetups.

With his expertise in advanced social analytics and machine learning, Admond aims to bridge the gaps between digital marketing and data science.

Check out his website if you want to learn more about Admond’s story, his data science services, and how he can help you in the marketing space using data science.

You can connect with him on LinkedIn, Medium, Twitter, and Facebook.

Source: https://towardsdatascience.com/here-is-what-ive-learned-in-2-years-as-a-data-scientist-e13a24a74a72
