A Complete Data Science Portfolio Project

In this article, I would like to showcase what might be my simplest data science project ever.

I have spent hours training much more complex models in the past, and struggled to find the right parameters to build machine learning pipelines.

Despite its simplicity, if I could only feature one project on my resume, it would be this one.

Let me explain why.

Does the package determine the value of the gift?

As a child, I always got excited about the holidays because I would get gifts. (Just humour me here; I do have a point, I promise.) My aunt gave me a beautiful dress, perhaps more beautiful than any other gift I received that day.

Here’s the thing, though: I didn’t even want to open it. She had wrapped it shabbily in newspaper, and the gift seemed to have lost half its value before I even saw what was inside.

To answer the question above: no. The package by no means determines the value of the gift.

However, it can greatly influence your expectations of what’s inside and change the way you perceive it.

The machine learning models you spend weeks training are great. Demonstrate that. Don’t let them die in your Jupyter Notebook.

Recruiters have hundreds of resumes to read. It is almost impossible for them to read through all your code on GitHub and understand every one of your projects.

To stand out, you need to do something slightly different: create an interface they can interact with, perhaps a live dashboard they can play around with.

Even if it’s not the best dashboard or interface out there, it will spark interest, because you have built something they can actually use.

I wanted to do exactly that, which is why I came up with this portfolio project. In the next few sections, I will explain what I did without going too deep into the technical details.

Aim

I aimed to demonstrate skills in the following areas:

  • Data Collection
  • Data Wrangling
  • Data Visualization
  • Machine Learning
  • Web Development

To do so, I built the following components:

  • Front-End Interface
  • Movie Dashboard
  • Movie Recommender System

I will explain and demonstrate each component in detail.

Note: If you don’t want to read the entire article and just want to see the final product, scroll down to the ‘Links’ section.

Front-End Interface

In the past, I would create projects and let the code sit in my GitHub repository, occasionally writing an article on Medium to explain the project.

Here, I took a different approach.

I created a web page that explains the different components of the project. I wrote briefly about how users can interact with the systems I created and added links to my code and Medium article.

The entire project can be understood and accessed from a single page, which makes it much easier for people to engage with.

You can check out the site here (view it on a laptop or PC for a better UI experience).

Movie Dashboard

Next, I created a movie dashboard with Tableau.

The steps involved were:

Data Collection

I had to collect data from a variety of sources. I also wanted to visualize the Bechdel scores of these movies (a measure of female representation in Hollywood films), so I used an API to get that data.
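
The post doesn’t say which API supplied the Bechdel scores. As a minimal sketch, assume a JSON API such as the Bechdel Test Movie List’s, whose records carry `imdbid`, `title`, `year`, and a `rating` counting how many of the three Bechdel criteria a film passes (0 to 3). The endpoint URL and field names below are assumptions to verify against the API’s documentation, and unescaping HTML entities in titles is a defensive assumption too. The useful part is the parser: it turns raw records into a lookup keyed by IMDb ID, which is easy to merge with other movie data later.

```python
import html
import json
from urllib.request import urlopen

# ASSUMPTION: endpoint of the Bechdel Test Movie List API; check its docs.
BECHDEL_API_URL = "https://bechdeltest.com/api/v1/getAllMovies"


def fetch_json(url: str, timeout: float = 30.0):
    """Download and decode a JSON payload (performs a real network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def parse_bechdel_records(records):
    """Normalize raw API records into {imdb_id: {title, year, bechdel_rating}}.

    ASSUMPTION: each record has 'imdbid', 'title', 'year' and 'rating' keys,
    where rating is the number of Bechdel criteria passed (0-3).
    """
    scores = {}
    for record in records:
        raw_id = str(record.get("imdbid") or "").strip()
        if not raw_id:
            continue  # skip records that cannot be joined on later
        # Use the 'tt'-prefixed form so IDs line up with IMDb-style datasets.
        imdb_id = raw_id if raw_id.startswith("tt") else f"tt{raw_id}"
        scores[imdb_id] = {
            "title": html.unescape(record["title"]),  # e.g. '&amp;' -> '&'
            "year": int(record["year"]),
            "bechdel_rating": int(record["rating"]),
        }
    return scores


if __name__ == "__main__":
    # Offline demo on a hand-written sample (not real API output).
    sample = [
        {"imdbid": "1234567", "title": "Example &amp; Co", "year": "2000", "rating": "3"},
        {"imdbid": "", "title": "No ID", "year": "1999", "rating": "1"},
    ]
    print(parse_bechdel_records(sample))
```

In practice you would call `parse_bechdel_records(fetch_json(BECHDEL_API_URL))` once and cache the result, rather than hitting the API on every run.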

Data Wrangling

I cleaned the data and merged the datasets together. Once I was done, I could finally visualize it!
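
The cleaning and merging step can be sketched with the standard library alone. This is a minimal illustration, not the author’s code: the movie rows use hypothetical keys (`imdb_id`, `title`, `year`, `rating`), and the Bechdel scores are assumed to be a dict keyed by IMDb ID whose values carry a `bechdel_rating` field. The sketch trims text, coerces numeric fields, drops rows that cannot be identified or parsed, removes duplicate IDs, and then left-joins the scores so that movies without a Bechdel score are kept with `None`.

```python
def clean_movies(rows):
    """Tidy raw movie rows: trim text, coerce numbers, drop bad or duplicate rows.

    HYPOTHETICAL schema: each row is a dict with 'imdb_id', 'title', 'year', 'rating'.
    """
    cleaned = {}
    for row in rows:
        imdb_id = (row.get("imdb_id") or "").strip()
        title = (row.get("title") or "").strip()
        if not imdb_id or not title:
            continue  # cannot identify or display this movie
        try:
            year = int(row["year"])
            rating = float(row["rating"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed numeric fields
        # Keep the first occurrence of each IMDb ID; later duplicates are ignored.
        cleaned.setdefault(
            imdb_id,
            {"imdb_id": imdb_id, "title": title, "year": year, "rating": rating},
        )
    return list(cleaned.values())


def left_join(movies, bechdel_scores):
    """Attach a Bechdel rating to every movie, using None when no score exists."""
    merged = []
    for movie in movies:
        score = bechdel_scores.get(movie["imdb_id"])
        merged.append(
            {**movie, "bechdel_rating": score["bechdel_rating"] if score else None}
        )
    return merged


if __name__ == "__main__":
    raw = [
        {"imdb_id": " tt1 ", "title": " Movie A ", "year": "1999", "rating": "7.5"},
        {"imdb_id": "tt1", "title": "Movie A (dup)", "year": "1999", "rating": "7.0"},
        {"imdb_id": "tt2", "title": "Movie B", "year": "n/a", "rating": "6"},
    ]
    print(left_join(clean_movies(raw), {"tt1": {"bechdel_rating": 3}}))
```

With pandas, the join would typically be a one-liner such as `movies.merge(bechdel, on="imdb_id", how="left")`; the explicit loops above just make each cleaning rule visible.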

Data Visualization

Surprisingly, this took up a far larger share of my time than the other parts of the project.

I spent two days trying to create a visually appealing dashboard.

I first built one as a Python Dash app. I wasn’t too satisfied with the layout, so I tried creating a Shiny web app in R instead.

It turned out better than my Dash app, and I loved the functionality. However, I simply didn’t find the design appealing.

Finally, I decided to use Tableau. The dashboard took me only about an hour to create. If you want to get started with Tableau, you can read this tutorial I wrote.

You can view my dashboard here (view it on a laptop or PC for a better UI experience).

Recommender System

Finally, machine learning!

I created a simple recommendation system using the same data as the dashboard and deployed it as a Dash app.

Just enter a movie name, and the back-end recommendation system generates movie suggestions for you.
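
The post doesn’t describe the algorithm behind the recommender. As one plausible stand-in, here is a minimal content-based sketch: each movie is described by a hypothetical string of tags (for example genres and keywords), the tags become bag-of-words count vectors, and the movies whose vectors have the highest cosine similarity to the entered movie are returned. Title lookup is case-insensitive, and an unknown title yields an empty list.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts from a lowercased, whitespace-split string."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies, top_n=5):
    """Return up to top_n titles most similar to `title`.

    `movies` maps a title to a string of descriptive tags (HYPOTHETICAL data,
    e.g. genres and keywords). Movies with zero similarity are left out.
    """
    lookup = {name.lower(): name for name in movies}
    key = lookup.get(title.strip().lower())
    if key is None:
        return []  # unknown movie: nothing to compare against
    query = vectorize(movies[key])
    scored = [
        (cosine_similarity(query, vectorize(tags)), name)
        for name, tags in movies.items()
        if name != key
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


if __name__ == "__main__":
    # Tiny hand-made catalogue for illustration only.
    catalogue = {
        "Alien": "scifi horror space",
        "Aliens": "scifi action space",
        "Gravity": "scifi space drama",
        "Titanic": "romance drama disaster",
    }
    print(recommend("alien", catalogue, top_n=2))  # ['Aliens', 'Gravity']
```

In a Dash app, a callback wired to a text input could call `recommend()` and render the returned titles, which keeps the similarity logic independent of the UI.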

I actually built this recommendation system when I was just starting to learn machine learning.

I found the code in my Jupyter Notebook and decided to clean it up a bit to turn it into this simple application.

You can take a look at the recommendation system here (view it on a laptop or PC for a better UI experience).

That’s it!

Links

  • Front-End Interface
  • Movie Dashboard
  • Recommender System
  • Code (I apologize for the messy code; I will clean it up and re-upload it soon.)

I hope you enjoyed this article and found the tips above helpful. Jupyter Notebooks are great, but don’t let your projects just sit there.

Use your creativity to build something other people can interact with.

I’ve seen some incredible projects on GitHub with only one star. On the other hand, I’ve also seen some really simple projects gain a lot of attention just because of how they were presented.

Most importantly, though, work on projects you like and do what you find enjoyable!

Translated from: https://towardsdatascience.com/a-complete-data-science-portfolio-project-ebbced35ea84
