Data Scientist or Fortune-Telling Psychic Wizard from the Future

Real Estate Sale Prices, Regression, and Classification: Data Science is the Future of Fortune Telling

As we all know, I am unusually blessed with totally-real psychic abilities.

My background as a psychic extends way back to my childhood. On my sixth birthday, my mother gave me a printed astrological forecast covering the next year of my life. I, of course, was disappointed. Not because I was too young for uncanny predictions of the future, but because I already had the psychic abilities needed to predict my fate. Each morning, I would read the patterns of Cheerio residue left over in my breakfast cereal bowl. Obviously. I had a system for making sure my future stayed bright!

In all seriousness, though, now that I'm a 20-year-old data scientist, I keep discovering more and more similarities between the skills of a fortune teller and those of a data scientist. Finally, I'll be able to put my years of seemingly useless, arcane knowledge to good use. You don't believe me?

Well, algorithms and machine learning are a perfect example of modern fortune telling in practice. Nowadays, nearly everyone has run into invasive Amazon ads customized to their own personal interests.

Machine learning is the process of teaching a computer to predict new data points from the data it has already seen. The main form of machine learning I focused on in my data science project, “Predicting Real Estate Sale Prices with the Ames, Iowa Housing Dataset,” is linear regression. This model fits a line of best fit through the dataset so that it can predict the price a house will sell for given its features (if it has, say, 20,000 sq. ft., a finished garage, no fence, etc.).
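
To make the “line of best fit” idea concrete, here is a minimal sketch of ordinary least squares with a single feature (living area). This is an illustration under assumptions, not the project's actual model: the real model used many features, and the helper names `fit_line` and `predict` plus the four data points are hypothetical toy values, not rows from the Ames dataset.

```python
# Minimal one-feature ordinary-least-squares fit: sale price ~ living area.
# The data below is made up for illustration; it is NOT from the Ames dataset.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted sale price for a house with living area x."""
    return slope * x + intercept

# Hypothetical (living area in sq. ft., sale price in $) pairs
areas = [1000, 1500, 2000, 2500]
prices = [120_000, 170_000, 220_000, 270_000]

slope, intercept = fit_line(areas, prices)
print(slope, intercept)                 # → 100.0 20000.0
print(predict(slope, intercept, 1800))  # → 200000.0
```

With more features (garage, fence, and so on), the same least-squares idea fits a plane or hyperplane instead of a line; libraries such as scikit-learn handle that case.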

The following infographic, for example, shows my analysis of the relationship between Real Estate Sale Price (the X-axis) and Gross Living Area (the Y-axis). Outliers have been removed from this particular set of data, which helps preserve the quality of my linear regression predictor. This relationship between Sale Price and Gross Living Area, along with many other features that are highly correlated with Sale Price, becomes one of my tools for predicting how a house with a given set of characteristics will be priced.
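
The post does not say how outliers were identified, so the sketch below assumes a common rule of thumb: drop any house whose living area falls outside 1.5 times the interquartile range (IQR) beyond the quartiles. It then compares the Pearson correlation between area and price before and after filtering. The function names and the six toy houses (including one deliberately planted outlier) are hypothetical.

```python
import statistics

def iqr_filter(rows, key):
    """Keep rows whose key value lies within [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed rule)."""
    values = [key(r) for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in rows if low <= key(r) <= high]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

# Hypothetical (gross living area in sq. ft., sale price in $) rows. The last row is a
# planted outlier (a huge house with a modest price). Not real Ames data.
houses = [
    (1200, 140_000), (1400, 160_000), (1600, 185_000),
    (1800, 200_000), (2000, 230_000), (5600, 160_000),
]

cleaned = iqr_filter(houses, key=lambda h: h[0])

before = pearson([h[0] for h in houses], [h[1] for h in houses])
after = pearson([h[0] for h in cleaned], [h[1] for h in cleaned])
print(len(cleaned), round(before, 2), round(after, 2))
```

In this toy data, the single outlier is enough to drag the correlation below zero; once it is filtered out, the correlation between area and price is close to 1. That is the sense in which removing outliers “preserves the quality” of a linear predictor.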

[Figure: Sale Price vs. Gross Living Area, outliers removed]

Ultimately, my linear regression model was able to predict sale prices with a root-mean-squared error (RMSE) of only about $27,000. Roughly speaking, this means a typical prediction lands about $27,000 away from the house's actual (non-predicted) sale price; because RMSE squares each error before averaging, large misses count more heavily than they would in a plain average of errors. Given that most houses sell for at least $50,000, this amount of error is relatively acceptable. However, my fortune-telling wizard powers now extend even further than just “Linear Regression.”
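
The difference between RMSE and a plain average error is easy to see with a few numbers. This sketch computes both RMSE and mean absolute error (MAE) on four hypothetical sale prices and predictions; the values are invented for illustration and are not the project's results.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error: square each miss, average them, then take the square root."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: the plain average size of each miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical sale prices and model predictions (illustrative only).
actual = [200_000, 150_000, 310_000, 180_000]
predicted = [190_000, 160_000, 250_000, 180_000]

print(mae(actual, predicted))             # → 20000.0
print(round(rmse(actual, predicted), 2))  # → 30822.07
```

Here one $60,000 miss pulls the RMSE well above the MAE. In general, RMSE is always at least as large as MAE, so an RMSE of about $27,000 means the average absolute miss is at most about $27,000.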

I can also use “logistic regression” and “k-nearest-neighbors” classifiers to sort data, predicting which category each of my data points will fall into. For instance, in my data science project “Tinder Problems or Relationship Advice?,” I scrape posts from the “Tinder” and “Relationship Advice” subreddits on Reddit. Using a variety of Natural Language Processing techniques, I build a model that can predict whether a given post originates from “Tinder” or “Relationship Advice.”
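
The post does not show how the classifier was built, so here is a deliberately tiny k-nearest-neighbors sketch for the same kind of task. It swaps in a plain bag-of-words representation and cosine similarity; both are assumptions and not necessarily the NLP techniques the project used. The six training posts and the helper names `vectorize`, `cosine`, and `knn_predict` are hypothetical.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words: lowercase word counts using a naive whitespace tokenizer."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(train, text, k=3):
    """train is a list of (text, label); return the majority label of the k most similar posts."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda item: cosine(query, vectorize(item[0])), reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical labeled posts (not scraped data).
train = [
    ("matched with someone and they unmatched after my first message", "Tinder"),
    ("what should my bio say to get more matches", "Tinder"),
    ("super like or swipe right on her profile", "Tinder"),
    ("my partner and I keep arguing about moving in together", "Relationship Advice"),
    ("how do I tell my boyfriend I feel neglected", "Relationship Advice"),
    ("my girlfriend and I have been together five years and feel stuck", "Relationship Advice"),
]

print(knn_predict(train, "she unmatched me right after we matched"))  # → Tinder
print(knn_predict(train, "my boyfriend and I keep arguing"))          # → Relationship Advice
```

A real pipeline would use far more posts, a proper tokenizer, and weighting such as TF-IDF; logistic regression would instead learn one weight per word and output a probability for each subreddit.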

Now, do I actually have the psychic ability to predict the future with ritual sacrifice? The world may never know. But, thankfully, I can just predict the future with Data Science skills like machine learning. I can build regression models to make numerical predictions and classifiers to predict categorical outcomes, and I don't even need to pull out my crystal ball.

And even better, unlike arcane sorcery, Data Science grounds all of its predictions in facts and previously gathered data. If anything, that’s the real magic of Data Science. I can take any amount of information in any field and, with enough time and effort, predict the future. What’s more magical than that?

Source: https://medium.com/@jjp2196/data-scientist-or-fortune-telling-psychic-wizard-from-the-future-5e7a93025fe5
