Why You Should Avoid Commercial Data Science Platforms
Let’s start with an analogy
Stick with me, I promise it’s relevant.
If you’re selling vegetables in a grocery store, your business value lies in your loyal customers and your position on a busy high street with plenty of footfall. You probably don’t have a fancy shop front; it’s just boxes of veg. That, plus the quality sales staff who sell the veg to passers-by, is your business.
One day a salesman from High Tech Veg Retail Solutions Inc comes into your shop. He tells you that “cardboard boxes are inefficient and unmanageable”. He has a product that will keep your veg in a locked fridge at the back of the shop; passers-by can simply ask for a cauliflower and it will be whizzed to them at top speed on a conveyor belt.
It does almost everything. The only downsides are that, because of the machine’s complexity, you will only be able to stock half your current range of veg, and, by the way, all the veg will still be stored in cardboard boxes inside the fridge.
On the upside, you can get rid of your quality staff and employ cheaper, less-skilled staff instead.
I’m sure you would send him on his way to find another victim.
Your business value is intellectual property
If you’re reading this article, you are either considering AI and ML or already using them, and have heard that a much better commercial data science platform is available.
In the remainder of this article, I’m going to explain why you would be making a big mistake investing in a commercial data science solution.
Open-source cardboard boxes
Those free cardboard boxes sitting on the shop front are your open-source AI and ML toolsets: free and easily accessible.
They don’t hide anything. You can see everything you put in, and you can stand behind the output, even in safety-critical applications, because you can describe how you got your results.
Every option for squeezing out that last 20% of your model, the part that produces 80% of its value, is available to you.
Any training you need is free, or at least very low cost, and is available 24 hours a day on many different websites.
The most common language among open-source tools is Python, a language taught in high schools, colleges, and universities.
Expensive cardboard boxes with a shiny sticker
This is what commercial AI and ML platforms offer.
Under the hood, they use the same open-source tools you can access for free. Yes, they wrap them in a fancy package, with a conveyor belt built in and a shiny sticker to boot.
The only way to reach those free tools, though, is through the interface the platform gives you. It’s a really pretty interface, but it exposes only a fraction of what the underlying open-source tools are capable of.
I can’t think of a single commercial data science platform that doesn’t have open-source tools at its heart.
The 80/20 rule
The data scientists who could get that last 20% out of a model for you are now reduced to dragging, dropping, and clicking a mouse, and you’re losing 80% of your business value. I hear you say, “but the results are much faster on this vendor’s platform”. OK, so you’re losing 80% of your business value faster!
Also, ask yourself why this vendor’s platform is faster. It’s because the last 20% that delivers 80% of the value is not low-hanging fruit. It’s complex; that’s why data scientists dedicate their careers to the subject, and why they are invaluable as data scientists rather than as mouse clickers.
Where is your business value now?
Let’s assume that, by some miracle, this commercial platform could get 100% of the value you can get from unrestricted open-source tools. Where is your business value now? It’s locked into the vendor’s platform, a platform you’re spending a huge amount of money on.
You can’t extract your IP; it’s been converted into a proprietary format. Even if you could reverse engineer the generated code (see you in court), the best you would get is a result that is missing that last 20%, and how long would the reverse engineering have taken you?
The tail wagging the dog
AI and ML are improving all the time. Every few months a new feature comes out that wows the community and offers your business even more potential revenue.
Your vendor’s commercial application and UI are so tightly integrated with older versions of the open-source software that you won’t see that update for another 6 to 12 months. Forget it: six months is a lifetime in AI and ML, and you’ve just missed that opportunity.
Recruitment, retention, and training
Every data scientist you recruit will, for the most part, arrive fully trained on the open-source tools they have been working with for years. Those just out of university will be full of enthusiasm and fresh ideas. The one thing they all have in common is that they are experts in the open-source toolsets that let them turn their enthusiasm and ideas into reality.
Of course, you’re going to tell them in the interview to forget all the knowledge they worked so hard to accrue, because you have just invested a lot of money in a proprietary system they have never heard of, with half the data science capability they are used to.
The long and the short of it is that you will find it hard to recruit staff and impossible to recruit talented staff. Any talented staff you currently have will soon be leaving, too.
Trust the grassroots
You will very rarely hear a data scientist raving about a commercial data science platform. For that reason, most vendors offering these products don’t target the grassroots. They go directly to senior managers, or even the CEO, looking for a top-down decision. Most CEOs understand the value of data science, but the details are complex and overwhelming. So when a well-trained salesman scares the living shit out of them with horror stories of open-source woes, they tend to believe him.
Talk to your own loyal staff before forcing something on them. Find out which open-source tools they currently use, and what could be done better with a small investment or if they were given the time to design and implement a more suitable stack. After all, they work in your business and know your requirements, and I guarantee the costs will be orders of magnitude lower than paying for a commercial platform.
In summary
If you have a data science requirement and money to invest, invest it wisely. Invest in talented individuals. Look at how a small investment in infrastructure can get a big payback from the tools they already use. Your skilled staff will make your company more valuable, and you will retain 100% of your business IP. You don’t need a high-tech cardboard box; the free open-source ones you already have are the best you can get.
Source: https://medium.com/swlh/why-you-should-avoid-commercial-data-science-platforms-6e9c4b5f3596