The Role of Statistics in the Industry
DATA SCIENCE AND MACHINE LEARNING
Statistics are everywhere, and most industries rely on statistics and statistical thinking to support their business. A solid grasp of statistics is also required to become a successful data scientist, and you need to demonstrate a keen interest in the discipline.
What is statistics?
Statistics is the discipline concerned with every aspect of learning from data. As a methodology, it covers the means and methods that allow us to work with data and to understand it. Statisticians use and develop data analysis methods and keep studying them to understand their properties.
When do those tools provide insight? When might they be misleading?
Researchers across many academic fields, as well as workers in many industries, are applying statistical methodology and contributing new approaches and techniques for data analysis. First, though, one piece of terminology needs to be precise: the difference between a statistic and the field of statistics.
We encounter numerical or graphical summaries of data every day. Examples include the average score of all students on a final exam, the proportion of employed and unemployed workers in a country, and the fluctuation of stock prices over a day. These are statistics.
The field of statistics, by contrast, is an academic discipline focused on research methodology. The essential work of statisticians is developing new statistical tools, calculating statistics from data, and collaborating with domain specialists to interpret the results properly.
Statistics is undoubtedly an evolving and continuously growing field, and it offers both challenges and opportunities.
In data science, many statistical methods are under continual study to understand how to use them properly. New application areas keep appearing, and they create the need for innovative analytical methods. For example, new ways of measuring things produce new kinds of data that need analysis. We therefore rely heavily on advances in computing. These advances not only make data analysis possible but also enable more sophisticated analysis of the large volumes of data being collected.
Statistics is a significant discipline, especially for data scientists, and there are many schools of thought about the field. It draws on new ideas from theory, from practice, and from related fields.
There are many viewpoints on the field of statistics:
* The ability to summarize data
* The idea of uncertainty
* The idea of decisions
* The idea of variation
* The art of forecasting
* The approach of measurement
* The principle of data collection
The Ability to Summarize Data
Data can be overwhelming, because understanding it generally involves reducing and summarizing it. The main goal of data reduction is to make a dataset comprehensible to a human observer. Statisticians have a range of techniques for summarizing data so that it becomes meaningful. They are trained to choose methods that are appropriate, precise, and effective.
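As a small illustration of data reduction, the sketch below condenses a list of numbers into a few descriptive statistics using Python's standard `statistics` module: the mean, the median, the sample standard deviation, and the quartiles. The exam scores are made-up values used only for illustration. The helper name `summarize` is hypothetical and is not taken from the original article.

```python
import statistics

def summarize(values):
    """Reduce a list of numbers to a handful of descriptive statistics (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical exam scores, for illustration only.
scores = [55, 62, 68, 70, 71, 74, 78, 81, 85, 93]
summary = summarize(scores)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Ten raw numbers become eight summary figures, and those figures describe both the centre and the spread of the data.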
The Idea of Uncertainty
Data can be misleading. A primary purpose of the field of statistics is to provide a structure and framework for evaluating data. Insights from data are generally not 100% accurate. Remarkably, however, we have ways to quantify how far reported findings may be from the truth. Opinion polls, for example, are reported with a margin of error, which indicates how much the published result may differ from actual public opinion.
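The margin of error for a poll is commonly computed from the normal approximation for a sample proportion, $\text{MoE} = z\sqrt{\hat{p}(1-\hat{p})/n}$, where $z \approx 1.96$ gives roughly 95% confidence. The sketch below applies that formula to a hypothetical poll. The 52% result and the 1,000 respondents are made-up figures, and the formula assumes a simple random sample.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Normal approximation: z * sqrt(p_hat * (1 - p_hat) / n);
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 52% of 1,000 respondents favour a proposal.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
print(f"Estimate: {p_hat:.0%} +/- {moe:.1%}")
print(f"Interval: {p_hat - moe:.3f} to {p_hat + moe:.3f}")
```

Because $n$ sits under a square root, quadrupling the sample size only halves the margin of error.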
The Idea of Decisions
Understanding data is critical, and it leads to the need to act on what we have discovered. In some areas of statistics, decision-making is the ultimate goal of any analysis. In our personal and professional lives, we make decisions under uncertainty, and we have to weigh the costs and benefits of different options.
For example, if a person finds that they might be at higher than average risk for a specific type of cancer, should they undergo a preventative procedure? Statistics can help in the decision-making process.
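Expected value is one common way to compare options under uncertainty: weight each possible outcome's cost by its probability and sum the results. The sketch below uses invented probabilities and costs in arbitrary units. They are not medical figures, and the option names are hypothetical. Real decisions of this kind would also weigh factors such as risk tolerance and quality of life, which a single expected-cost number does not capture.

```python
def expected_cost(outcomes):
    """Probability-weighted cost of a list of (probability, cost) outcomes."""
    return sum(prob * cost for prob, cost in outcomes)

# Hypothetical, made-up numbers in arbitrary cost units (not real medical data).
options = {
    "preventive procedure": [(1.0, 10)],       # a certain, modest cost
    "monitor only": [(0.08, 200), (0.92, 1)],  # small chance of a large cost
}

for name, outcomes in options.items():
    print(f"{name}: expected cost {expected_cost(outcomes):.2f}")

best = min(options, key=lambda name: expected_cost(options[name]))
print("Lower expected cost:", best)
```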
The Idea of Variation
When we summarize data, our primary focus is commonly on a typical or central value. From a statistical perspective, however, we also have to place a strong emphasis on understanding the variation in the data. For instance, suppose you know that Americans have, on average, around $8,000 of credit card bills each month. That gives you a good idea of the central value of the credit card debt distribution. Suppose you are also given a percentile, such as the amount that only about 10 percent of people exceed. That percentile tells you more about the variability in credit card debt.
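To show why the central value alone is not enough, the sketch below builds two invented groups of monthly balances that share the same $8,000 mean but spread out very differently. It reports the population standard deviation and the 90th percentile alongside the mean. All figures are hypothetical, and `spread_report` is an illustrative helper name.

```python
import statistics

def spread_report(values):
    """Pair a central value with two measures of variation (hypothetical helper)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")  # 9 cut points
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "p90": deciles[-1],  # 90th percentile: roughly 10% of values lie above it
    }

# Two hypothetical groups of monthly card balances (US$) with the same mean.
steady = [7_800, 7_900, 8_000, 8_000, 8_100, 8_200]
volatile = [1_000, 3_000, 6_000, 10_000, 13_000, 15_000]

reports = {
    name: spread_report(group)
    for name, group in [("steady", steady), ("volatile", volatile)]
}
for name, report in reports.items():
    print(name, report)
```

Both groups report the same mean, but the standard deviation and the 90th percentile immediately reveal how different they are.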
The Art of Forecasting
One of the fundamental responsibilities of statistics is forecasting, or prediction. You cannot know the future with absolute certainty. Still, if you use the available data effectively, you can sometimes make reasonably accurate predictions. Examples include weather forecasts, stock price forecasts, the risk of a flood, future demand for a new product entering the market, and the outcome of an election.
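One of the simplest forecasting techniques is to fit a straight-line trend by ordinary least squares and extend it forward. The sketch below applies that approach to invented monthly demand figures for a new product. It produces only a point forecast with no uncertainty band. Practical forecasting would usually add a prediction interval and check whether a linear trend is even appropriate. The data and the helper name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical monthly demand for a new product (months 1-6).
months = [1, 2, 3, 4, 5, 6]
demand = [120, 135, 149, 160, 178, 190]

intercept, slope = fit_line(months, demand)
forecast_month_7 = intercept + slope * 7  # extend the fitted trend one month ahead
print(f"Trend: {slope:.1f} extra units per month")
print(f"Forecast for month 7: {forecast_month_7:.0f} units")
```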
The Approach of Measurement
Say you are collecting lots of data. Some variables, such as a person's age or height, can be measured with quite high accuracy. Others are more challenging: blood pressure, for instance, varies from minute to minute, so it is harder to pin down. Then there are constructs such as mood, personality, and political ideology, which are much more difficult to define and quantify. Statistics plays a significant role here. It helps construct and evaluate useful approaches for measuring such hard-to-define concepts, and it helps assess the quality of the various methods.
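One simple statistical tactic for a noisy quantity like blood pressure is repeated measurement. Averaging $k$ independent readings shrinks random error by a factor of about $\sqrt{k}$. The simulation below assumes readings scatter normally around a fixed true value. Both the 120 mmHg value and the 8 mmHg noise level are hypothetical. Averaging does nothing for systematic bias, or for constructs such as mood that lack a single "true value".

```python
import random
import statistics

def averaged_reading(true_value, noise_sd, k, rng):
    """Average k independent noisy readings of the same quantity."""
    return statistics.mean(rng.gauss(true_value, noise_sd) for _ in range(k))

rng = random.Random(42)  # fixed seed so the sketch is reproducible
TRUE_BP = 120.0          # hypothetical "true" systolic pressure, in mmHg
NOISE_SD = 8.0           # hypothetical reading-to-reading variation, in mmHg

spread = {}
for k in (1, 4, 16):
    # Repeat the k-reading procedure many times and measure how much the averages scatter.
    estimates = [averaged_reading(TRUE_BP, NOISE_SD, k, rng) for _ in range(2000)]
    spread[k] = statistics.stdev(estimates)
    print(f"k={k:>2}: observed spread {spread[k]:.2f}, theory {NOISE_SD / k ** 0.5:.2f}")
```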
The Principle of Data Collection
Finally, statistics is the basis for principled data collection. Data can be costly and painful to collect, and limited resources restrict how much of it can be obtained. Yet with too little data, the findings will not be as informative as they could be. Statistics provides an excellent way to manage this trade-off, helping you collect enough data while respecting those resource limitations.
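A classic example of managing this trade-off is choosing a survey's sample size in advance. Solving the margin-of-error formula for a proportion, $E = z\sqrt{p(1-p)/n}$, for $n$ gives $n = z^2 p(1-p)/E^2$, rounded up. The sketch below assumes $p = 0.5$ (the worst case when the true proportion is unknown) and 95% confidence. It shows that every tightening of the target margin is paid for with a quadratically larger sample. The helper name is hypothetical.

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n whose margin of error for a proportion is at most `margin`.

    Solves margin = z * sqrt(p * (1 - p) / n) for n and rounds up.
    p = 0.5 is the conservative (worst-case) choice when p is unknown.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

for target in (0.05, 0.04, 0.03):
    print(f"+/-{target:.0%} needs about {required_sample_size(target)} respondents")
```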
Civilizations have gathered data on harvests and population sizes since ancient times. Today, randomness and variation can be defined more rigorously in mathematical terms. Modern statistics developed in the 19th century, addressing topics in genetics and econometrics. Statistical theory then progressed in the 20th century alongside many new application areas in science and industry. Computers later made it possible to automate data analysis, and this was followed by the rise of Big Data, data science, and machine learning.
Statistics has many intersections with its allied fields.
Computer science provides the algorithms, the data structures, and the programming languages for manipulating data. Mathematics provides the language and notation for expressing statistical concepts concisely. It also supplies the tools to evaluate and interpret the properties of analytical methods.
Probability theory, a branch of mathematics, is a critical part of the foundation of statistics: it lets us reason formally about randomness and uncertainty.
Data science, in turn, contributes database management, machine learning, and the infrastructure needed to carry out data analysis.
Conclusion
Statistics has evolved from a small field into a significant ally of research and industry. Its applications include computer vision, self-driving cars, facial recognition, and recommender systems for online search and online shopping.
Other applications include predictive analytics and precision medicine in healthcare, fraud detection, risk assessment for the environment and infrastructure, social and government services such as job training, and behavioural therapy. Statistics and statistical thinking help us understand the data and information that surround us.
About the Author
Wie Kiang is a researcher who is responsible for collecting, organizing, and analyzing opinions and data to solve problems, explore issues, and predict trends.
He works across almost every area of Machine Learning and Deep Learning. He carries out experiments and investigations in a range of topics, including Convolutional Neural Networks, Natural Language Processing, and Recurrent Neural Networks.
Source: https://towardsdatascience.com/the-role-of-statistics-in-the-industry-d360f3056e4b