Why you should avoid commercial data science platforms

Let’s start with an analogy

Stick with me, I promise it’s relevant.

If you’re selling vegetables in a grocery store, your business value lies in your loyal customers and your position on a high street with heavy footfall. You probably don’t have a fancy shop front; it’s just boxes of veg. It’s that, and your quality sales staff, that sell the veg to passers-by.

One day a salesman from High Tech Veg Retail Solutions Inc comes into your shop. He tells you “cardboard boxes are inefficient and unmanageable”. He has a product that will keep your veg in a locked fridge at the back of the shop; passers-by simply ask for a cauliflower and it is whizzed to them at top speed on a conveyor belt.

It does almost everything. The only downside is that, due to the complexity of the machine, you will only be able to stock half your current range of veg. And by the way, all the veg will still be stored in cardboard boxes inside the fridge.

On the upside, you can get rid of your quality staff and employ cheaper staff with fewer skills.

I’m sure you would send him on his way to find another victim.

Your business value is Intellectual Property

If you’re reading this article, then you are either considering AI and ML, or you are already using them and have heard that a much better commercial data science platform is available.

In the remainder of this article, I’m going to explain why you would be making a big mistake investing in a commercial data science solution.

Open source cardboard boxes

Those free cardboard boxes that are easily accessible on the shop front are your Open Source AI and ML toolsets, freely available and easily accessible.

They don’t hide anything. You can see everything you put in, and you can stand by the output, even for safety-critical applications, because you can describe how you got your results.

Every option for squeezing out that last 20% of your model, the part that produces 80% of its value, is available to you.

Any training you need is free, or at least very low cost, and is easily accessible 24 hours a day on many different websites.

The most common language adopted by open source tools is Python, a language taught at high school, college, and university.

Expensive cardboard boxes with a shiny sticker

This is what commercial AI and ML platforms offer.

Under the hood, they employ the same open source tools you can access for free. Yes, they have a fancy wrapper around them, a built-in conveyor belt, and a shiny sticker to boot.

The only way to access those free tools, though, is through the interface the platform provides. It’s a really pretty interface, but it only gives you access to a fraction of what the underlying open source tools are capable of.

I can’t think of any commercial data science platform that is not employing open source tools at its heart.

The 80/20 rule

The data scientists who could get that last 20% out of a model for you are now reduced to dragging, dropping, and clicking a mouse, and you’re losing 80% of your business value. I hear you say, “but the results are much faster on this vendor’s platform”. OK, so you’re losing 80% of your business value faster!

Also, ask yourself why this vendor’s platform is faster. It’s because the platform stops short of that last 20% that gets 80% of the value; that part is not the low-hanging fruit. It’s complex. It’s why data scientists dedicate their careers to the subject, and it’s why they are invaluable as data scientists and not mouse clickers.

Where is your business value now?

Let’s assume that this commercial platform, by some miracle, could get 100% of the value you can get from unrestricted open source tools. Where is your business value now? It’s locked into this vendor’s platform, a platform you’re spending a huge amount of money on.

You can’t extract your IP; it’s been converted into a proprietary format. Even if you could reverse engineer their generated code (see you in court), the best you would get is a result that is missing that last 20%, and how long did the reverse engineering take you?

The tail wagging the dog

AI and ML are improving all the time. Every few months a new feature comes out that wows the community and offers your business even more potential revenue.

Your vendor’s commercial application and UI are so tightly integrated with older versions of the open source software that you won’t see that update for another 6 to 12 months. Forget it: six months is a lifetime in AI and ML, and you just missed that opportunity.

Recruitment, retention, and training

Every data scientist you recruit will, for the most part, come fully trained on the open source tools they have been working with for years. Those just out of university will be full of enthusiasm and fresh ideas. The one thing they all have in common is that they are experts in the open source toolsets that let them turn their enthusiasm and ideas into reality.

Of course, you’re going to tell them in the interview to forget all the knowledge they have worked hard to accrue, because you have just invested a lot of money in a proprietary system that they have never heard of and that has half the data science capability they are used to.

The long and short of it is that you will find it hard to recruit staff and impossible to recruit talented staff. Any talented staff you currently have will soon be leaving as well.

Trust the grassroots

You will very rarely hear a data scientist raving about a commercial data science platform. For that reason, most of the vendors offering these products don’t target the grassroots. They go directly to senior managers, and even the CEO, looking for a top-down decision. Most CEOs understand the value of data science, but the details are complex and overwhelming. So when a well-trained salesman scares the living shit out of them with horror stories of open source woes, they tend to believe him.

Talk to your own loyal staff before forcing something on them. Find out what open source tools they currently use, and what could be done better if a small investment were made or if they were given the time to design and implement a more suitable stack. After all, they work in your business and they know your requirements, and I guarantee the costs will be orders of magnitude less than paying for a commercial platform.

In summary

If you have a data science requirement and money to invest, invest it wisely. Invest in talented individuals. Look at how you can make a small investment in infrastructure to get a big payback from the tools they already use. Your skilled staff will make your company more valuable, and you will retain 100% of your business IP. You don’t need a high-tech cardboard box; the free open source ones you already have are the best you can get.

Translated from: https://medium.com/swlh/why-you-should-avoid-commercial-data-science-platforms-6e9c4b5f3596
