Data Science or Computer Science
Opinion
Table of Contents
- Introduction
- Examples
- When You Should Use Data Science
- Summary
Introduction
Both Data Science and Machine Learning are useful fields that apply several tools to predict, suggest, classify, and ultimately solve common business problems. You can create highly accurate models that automate previously manual tasks. Data Science can be powerful, saving companies money and time. However, you will find that you do not necessarily need Data Science to solve every problem you encounter. There are certain situations where human intervention is more important, or the situation does not allow for a generalized model.
I will describe five situations in which you may not want to use Data Science. As a Data Scientist, I have gradually learned, through experience, when Data Science and Machine Learning were not necessary. I hope these examples give you some insight and intuition for your own future situations.
Examples
There are many examples of when to use Data Science and when not to. Here are some situations that come to mind where a Data Science model is not necessary and could even make things worse:
Classifying some health implications
Depending on how severe an incorrect prediction is, using Data Science in some areas of healthcare can be extremely costly. Consider a harmless mistake first: a model that classifies a t-shirt as a sweater, or vice versa. Consumers seeing that wrong suggestion on an e-commerce site would be unfortunate, but nobody would be harmed. Now imagine you built a model to classify cancer. If it classifies someone as not having cancer when they actually do (a false negative), the result can be extremely harmful. Here, human judgment or human-in-the-loop (a combination of Data Science and human effort) may be the better route, rather than Data Science alone. A good rule of thumb is to ask:
If this model prediction is incorrect, what will be the consequences?
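To make this question concrete, here is a minimal Python sketch that scores a binary classifier by the cost of its mistakes rather than by accuracy alone. The cost weights, the label convention (1 = has the condition), and the two example models are hypothetical values chosen for illustration, not clinical figures.

```python
# Hypothetical cost weights: illustrative assumptions, not clinical values.
COST_FALSE_NEGATIVE = 100.0  # e.g., a missed diagnosis
COST_FALSE_POSITIVE = 1.0    # e.g., an unnecessary follow-up test


def error_costs(y_true, y_pred, cost_fn=COST_FALSE_NEGATIVE, cost_fp=COST_FALSE_POSITIVE):
    """Return (false_negatives, false_positives, total_cost) for binary labels (1 = positive)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn, fp, fn * cost_fn + fp * cost_fp


# Two hypothetical models with the same accuracy (4 of 6 correct) but different mistakes.
y_true = [1, 1, 0, 0, 0, 1]
model_a = [0, 0, 0, 0, 0, 1]  # misses two positive cases
model_b = [1, 1, 1, 1, 0, 1]  # flags two negative cases

print(error_costs(y_true, model_a))  # (2, 0, 200.0)
print(error_costs(y_true, model_b))  # (0, 2, 2.0)
```

Both models are equally "accurate", but under these assumed weights the one that misses positive cases is far more costly, which is exactly the consequence the rule of thumb asks you to weigh.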
That said, Data Science, Machine Learning, and AI are evolving quickly, and you can expect new technologies and improvements in model accuracy to keep emerging.
When you don't have enough data
This situation is more common. When you build a model, you need to make sure you have sufficient data. Just as bad data in leads to a bad model out, too little data can also produce a bad model. The model might even appear to perform well while failing to generalize to new situations: it could be overfitting, or the training data may simply not cover enough of the possible cases. Before you spend development time and resources on a model, check whether you have enough data.
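A quick sanity check before committing to a model might look like the sketch below. It assumes a classification task; the minimum-rows-per-class threshold and the maximum allowed gap between training and validation scores are hypothetical rules of thumb, not standard values, and the scores would come from whatever model and validation split you already use.

```python
from collections import Counter

# Hypothetical thresholds: tune these for your own problem.
MIN_ROWS_PER_CLASS = 50
MAX_TRAIN_VAL_GAP = 0.10


def underrepresented_classes(labels, min_rows=MIN_ROWS_PER_CLASS):
    """Return {class: count} for every class with fewer than min_rows examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_rows}


def looks_overfit(train_score, val_score, max_gap=MAX_TRAIN_VAL_GAP):
    """Flag a model whose training score is much higher than its validation score."""
    return (train_score - val_score) > max_gap


labels = ["sweater"] * 120 + ["t-shirt"] * 30
print(underrepresented_classes(labels))                 # {'t-shirt': 30}
print(looks_overfit(train_score=0.99, val_score=0.71))  # True
```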
When it’s a one-off task
This example depends a little more on the specific situation. A non-technical stakeholder or leader in your company may ask you to build a Data Science model; in that case, ask yourself whether Data Science is actually necessary.
If you are not producing results daily, weekly, or even monthly, you may not want to spend the time creating a complex model that includes scheduled ingestion of new data.
You could apply similar skills to answer the business problem and suggest a simpler route to the stakeholder: if, for example, you only need to output a single CSV file, you can answer the question with a simple Python function. (You may not need to explain in depth to your stakeholders why you are not using a Data Science model, since some stakeholders just want the result and do not care how you got it.) A small function that manually mimics the ideas behind a Data Science model may be all you need. If you know the situation well, you can create bins or weights yourself, apply them to features (columns), and compute your own score. Here is an example of what I am describing:
Example: 0.50 * feature_1 + 0.20 * feature_2 + 0.30 * feature_3 = score (scaled)
While this might not be the most ‘accurate’, if you need a quick way to organize data, a function like this or something similar could be sufficient.
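Here is a minimal Python sketch of that kind of function, producing the single CSV output a one-off request might need. It assumes that "scaled" means each feature is min-max scaled to [0, 1] before weighting; the column names, weights, and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical weights and column names; choose them from your own domain knowledge.
WEIGHTS = {"feature_1": 0.50, "feature_2": 0.20, "feature_3": 0.30}


def min_max_scale(values):
    """Scale numbers to the range [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(rows, weights=WEIGHTS):
    """Scale each feature column, then combine the columns with fixed weights."""
    scaled = {col: min_max_scale([row[col] for row in rows]) for col in weights}
    return [sum(weights[col] * scaled[col][i] for col in weights) for i in range(len(rows))]


rows = [
    {"id": "a", "feature_1": 10, "feature_2": 3, "feature_3": 100},
    {"id": "b", "feature_1": 20, "feature_2": 1, "feature_3": 300},
    {"id": "c", "feature_1": 30, "feature_2": 2, "feature_3": 200},
]
scores = weighted_scores(rows)

# Write the one-off result as CSV; here to a string, but open("scores.csv", "w", newline="") works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "score"])
for row, score in zip(rows, scores):
    writer.writerow([row["id"], round(score, 3)])
print(buffer.getvalue())
```

Because the weights are set by hand, there is no training step or ingestion schedule to maintain; changing the ranking is just a matter of editing `WEIGHTS`.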
When you don’t have labeled data
Sometimes you may want to classify thousands of observations, but too much of your dataset is unlabeled. There are ways around this problem, such as labeling software or unsupervised techniques that create new labels. However, if labeling through human effort or other software services takes too much time and money, you may want to reassess the situation. Perhaps you need to do more data engineering first, such as accessing an API, before you implement a Data Science model.
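Before choosing between manual labeling, labeling software, or more data engineering, a back-of-the-envelope estimate can show whether labeling is affordable at all. The sketch below assumes a flat time per label and a flat hourly rate; the function name and all numbers are hypothetical.

```python
def labeling_estimate(n_unlabeled, seconds_per_label, hourly_rate, budget):
    """Return (hours, cost, fits_budget) for labeling n_unlabeled observations by hand."""
    hours = n_unlabeled * seconds_per_label / 3600  # seconds -> hours
    cost = hours * hourly_rate
    return hours, cost, cost <= budget


# Hypothetical inputs: 20,000 observations, 30 seconds each, 25 per hour, budget of 2,000.
hours, cost, fits_budget = labeling_estimate(
    n_unlabeled=20_000, seconds_per_label=30, hourly_rate=25.0, budget=2_000.0
)
print(f"{hours:.1f} hours, cost {cost:,.2f}, fits budget: {fits_budget}")
# 166.7 hours, cost 4,166.67, fits budget: False
```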
When your budget is tight
Depending on how much data you are ingesting and predicting, training a model can be expensive. Your company may not have enough resources yet, and an expensive Data Science model may not be feasible.
This point also applies to ‘when you do not have enough time’. You may have a deadline approaching, and methods other than Data Science, such as Python functions and rules, can be beneficial.
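As an illustration of the rules-based alternative, the sketch below classifies products with hand-written keyword rules instead of a trained model, reusing the t-shirt versus sweater example from earlier. The categories, keywords, and function name are hypothetical; a real rule set would come from your own domain knowledge.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("sweater", ("sweater", "knit", "cardigan", "pullover")),
    ("t-shirt", ("t-shirt", "tee", "short sleeve")),
]


def classify_product(title, rules=RULES, default="other"):
    """Return the first category whose keywords appear in the lowercased product title."""
    text = title.lower()
    for category, keywords in rules:
        if any(keyword in text for keyword in keywords):
            return category
    return default


for title in ["Wool Knit Pullover", "Graphic Tee, short sleeve", "Denim Jacket"]:
    print(title, "->", classify_product(title))
```

Rules like these are cheap to write and easy to explain, at the cost of needing manual updates whenever the catalog changes.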
When You Should Use Data Science
There are countless situations in which you should use Data Science and Machine Learning. Essentially, you can flip the examples above, or check whether your problem fits a supervised, unsupervised, time-series, or similar setting.
You can also take the examples above and combine Data Science techniques with manual processes. Human-in-the-loop is becoming more common as a good bridge between these two practices.
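One common human-in-the-loop pattern is to accept confident model predictions automatically and send the rest to a person for review. The sketch below assumes the model returns a label with a confidence score between 0 and 1; the 0.90 threshold, the function name, and the sample predictions are hypothetical.

```python
# Hypothetical threshold; pick it based on the cost of errors in your own setting.
REVIEW_THRESHOLD = 0.90


def route_predictions(predictions, threshold=REVIEW_THRESHOLD):
    """Split (item_id, label, confidence) tuples into auto-accepted and human-review queues."""
    auto_accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((item_id, label))
        else:
            needs_review.append((item_id, label))
    return auto_accepted, needs_review


predictions = [("r1", "positive", 0.97), ("r2", "negative", 0.62), ("r3", "positive", 0.91)]
auto_accepted, needs_review = route_predictions(predictions)
print("auto:", auto_accepted)
print("review:", needs_review)
```

Raising the threshold sends more items to people and fewer mistakes through automatically; lowering it does the opposite.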
Some specific examples of when to use Data Science include, but are not limited to:
- Recommending movies to users
- Forecasting sales for a company
- Analyzing sentiment of reviews
- Predicting temperature for a given month
- Etc.
The examples of ‘when not to use Data Science’ are not meant to discourage you from using Data Science, but to stress that ‘just because you can does not mean you should’. Ultimately, it depends on your specific situation and on what the output will affect. Given the right environment, each example could turn out to be a valid use case for Data Science.
Summary
There are caveats to all of these examples, and you may end up using Data Science in these situations anyway. Data Science is evolving and new facets are emerging. Keep in mind that this article is opinion-based, and these points can change quickly. Feel free to comment below on when you think you should or should not use Data Science in a given situation. To summarize, here are the five examples of when you should not use Data Science:
- Classifying some health implications
- When you don’t have enough data
- When it’s a one-off task
- When you don’t have labeled data
- When your budget is tight
I hope you enjoyed my article. Thank you for reading!
Translated from: https://towardsdatascience.com/when-not-to-use-data-science-f2e42a3a77d3