Clustering Neighborhoods in Frankfurt am Main Using K-Means

Introduction

This blog post summarizes the results of the Capstone Project in the IBM Data Science Specialization on Coursera. In the project, the districts of Frankfurt am Main, Germany, are clustered according to their venue data using the K-Means clustering algorithm. The first section describes the business problem we will be dealing with. We then look at the data that can be used to solve the problem and the methodology for finding a solution.

Business Problem

A client is interested in opening a franchise of their Asian restaurant chain in the city of Frankfurt am Main, preferably close to the city center. It will be their first restaurant in the city, and they want us to find out which neighborhood/district would be the best place to open it. Additionally, the results of the clustering algorithm can be used by anyone interested in moving to Frankfurt who wants to know which cuisines are available in the various districts.

Data

The following datasets were used in this project:

  1. Street Directory of the city of Frankfurt am Main: https://offenedaten.frankfurt.de/dataset/strassenverzeichnis-der-stadt-frankfurt-am-main

  2. Foursquare API, to get the most common venues in each Frankfurt district.

  3. Demographics of Frankfurt am Main neighborhoods: https://offenedaten.frankfurt.de/dataset/stadtteilprofile-bevoelkerung

  4. Election Atlas 2015 (GeoJSON of Frankfurt neighborhoods): https://offenedaten.frankfurt.de/dataset/wahlatlas-2015-geodaten/resource/84dff094-ab75-431f-8c64-39606672f1da

Data Gathering and Cleaning

This project analyzes the districts of the city of Frankfurt am Main. The tabular datasets are available as CSV files, which can be loaded into a pandas DataFrame with the built-in pd.read_csv function.

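A minimal sketch of loading a CSV file with `pd.read_csv`. The inline sample, its German column names (`Strasse`, `Stadtteil`, `PLZ`), and the semicolon delimiter are assumptions standing in for the real download; check the actual file's header and separator before reusing this.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and the ";" delimiter are assumptions, not the real file layout.
sample_csv = io.StringIO(
    "Strasse;Stadtteil;PLZ\n"
    "Leipziger Straße;Bockenheim;60487\n"
    "Mainzer Landstraße;Gallus;60326\n"
)

# read_csv accepts a path, a URL, or a file-like object; `sep` sets the delimiter.
streets = pd.read_csv(sample_csv, sep=";")
print(streets.shape)
```

For the real file, pass its path or URL instead of the in-memory buffer, and add an `encoding=` argument if the file is not UTF-8.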

Data 1: Street directory of Frankfurt am Main:

This dataset is used to extract the names and postcodes of Frankfurt's districts. It is available as a CSV file via the link given above. Because it is a street directory, it is large: 4540 rows and 15 columns. Frankfurt has 46 city districts, so the dataset was shortened and cleaned to keep only the required data, namely the district names and postcodes. The resulting dataset contains 46 rows (one per district) and 3 columns.

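The reduction to one row per district can be sketched as a column selection followed by `drop_duplicates`. The column names and values are hypothetical, and keeping the first postcode seen for each district is an assumption of this sketch: a district can span several postcodes, and the post does not say how that was resolved.

```python
import pandas as pd

# Hypothetical slice of the street directory (the real file has 15 columns).
streets = pd.DataFrame({
    "Strasse": ["Leipziger Straße", "Adalbertstraße", "Mainzer Landstraße", "Frankenallee"],
    "Stadtteil": ["Bockenheim", "Bockenheim", "Gallus", "Gallus"],
    "PLZ": [60487, 60486, 60326, 60327],
})

# Keep only the needed columns, then collapse to one row per district.
# keep="first" (first postcode wins) is an assumption of this sketch.
districts = (
    streets[["Stadtteil", "PLZ"]]
    .drop_duplicates(subset="Stadtteil", keep="first")
    .sort_values("Stadtteil")
    .reset_index(drop=True)
)
print(districts)
```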

Data 2: Foursquare API:

The geographical coordinates of each district are used as input to the Foursquare API, which is queried to explore the district and retrieve its most common venues. Foursquare returns a JSON response from which the required data has to be extracted: only the name, category, and geographical coordinates of each venue. These are stored in a separate dataframe for use in clustering.

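Extracting the venue fields from the JSON response could look like the sketch below. It assumes the response layout of the legacy Foursquare v2 `venues/explore` endpoint commonly used in this course (`response["groups"][0]["items"][i]["venue"]`), and the sample payload is made up.

```python
import pandas as pd


def venues_from_response(district, payload):
    """Flatten one explore response into rows of venue data.

    Assumes the legacy v2 layout: response -> groups[0] -> items[*] -> venue.
    """
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        # Some venues may lack categories; fall back to None.
        categories = venue.get("categories") or [{"name": None}]
        rows.append({
            "District": district,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0]["name"],
        })
    return rows


# Trimmed, made-up response used only to exercise the function.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Noodle Bar",
               "location": {"lat": 50.12, "lng": 8.65},
               "categories": [{"name": "Asian Restaurant"}]}},
]}]}}

venues_df = pd.DataFrame(venues_from_response("Bockenheim", sample))
print(venues_df)
```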

Data 3: Frankfurt Demographics:

This dataset contains the population of each district of Frankfurt. It also contains useful data on the percentage of foreigners and, specifically, on the population of various ethnicities in each district. It has 46 rows (one per district) and 164 columns, so it had to be reduced before analysis: only the required columns were kept, such as the total population of each district and the number of foreigners. The column names, originally in German, were translated into English for easier understanding.

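Selecting the required columns and translating their names can be done in one step with a mapping dictionary. The German column names and all values below are made up for illustration; the real dataset's 164 column names will differ.

```python
import pandas as pd

# Made-up excerpt with hypothetical German column names.
demo_raw = pd.DataFrame({
    "Stadtteil": ["Bockenheim", "Gallus"],
    "Einwohner insgesamt": [1000, 2000],
    "Ausländer": [300, 700],
    "Asien und Australien": [150, 120],
    "Haushalte": [500, 900],  # an example of a column that is not needed
})

# Map only the required German columns to English names.
column_map = {
    "Stadtteil": "District",
    "Einwohner insgesamt": "Total Population",
    "Ausländer": "Foreigners",
    "Asien und Australien": "Asian and Australian",
}

# Selecting by the mapping's keys drops every other column.
demographics = demo_raw[list(column_map)].rename(columns=column_map)
print(demographics)
```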

Data 4: Frankfurt neighborhoods GeoJSON:

The GeoJSON file is required for plotting the choropleth maps used to analyze the demographics of Frankfurt's districts. The district names in this file must match the district names in the dataset to be plotted. A check showed that Bahnhofsviertel and Gutleutviertel are combined into a single district in the GeoJSON file, so the two corresponding rows were merged in the demographics dataset. District names containing letters with umlauts (ä, ö, ü) also did not match, so these districts were renamed to use the same characters as their counterparts in the GeoJSON file.

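A sketch of the two name fixes, assuming a hypothetical combined name (`Gutleut-/Bahnhofsviertel`) for the merged district and an `ae`/`oe`/`ue` transliteration of umlauts; both must be checked against the names actually used in the GeoJSON file. Summing the merged rows is only correct for count columns, not for percentages. The returned `missing` set lists any district names that still have no GeoJSON counterpart.

```python
import pandas as pd

# Hypothetical name of the combined district in the GeoJSON file.
MERGED_NAME = "Gutleut-/Bahnhofsviertel"

# Assumed transliteration; verify it against the GeoJSON feature names.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                         "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})


def align_with_geojson(df, geojson_names):
    """Merge the two districts and normalize umlauts so names match the GeoJSON."""
    out = df.copy()
    out["District"] = out["District"].replace(
        {"Bahnhofsviertel": MERGED_NAME, "Gutleutviertel": MERGED_NAME}
    )
    out["District"] = out["District"].str.translate(UMLAUTS)
    # Summing is only valid for count columns, not for percentages.
    out = out.groupby("District", as_index=False).sum()
    missing = set(out["District"]) - set(geojson_names)
    return out, missing


demographics = pd.DataFrame({
    "District": ["Bahnhofsviertel", "Gutleutviertel", "Höchst"],
    "Total Population": [100, 200, 300],  # made-up counts
})
geojson_names = [MERGED_NAME, "Hoechst"]  # hypothetical GeoJSON feature names

aligned, missing = align_with_geojson(demographics, geojson_names)
print(aligned)
print("Unmatched names:", missing)
```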

Methodology

Analytical Approach

We first use k-means clustering to group the 46 districts of Frankfurt. A geocoder provides the geographical coordinates of each district, and the Foursquare API is queried with these coordinates to get the most common venues in each district. Based on this information, the districts are clustered with k-means and each cluster is examined. Clusters with a larger number of Asian and similar-cuisine restaurants are of particular interest, since they indicate demand for Asian cuisine.

Next, the demographics data is used to find the districts with larger populations, and these are compared with the cluster data. Districts that have more Asian restaurants as well as a sizeable Asian population are considered ideal for opening a new Asian restaurant. Nearby districts with fewer Asian restaurants but a sizeable Asian population are also considered, since less competition makes them good prospects as well.

Photo by oxana v on Unsplash

The street directory dataset is filtered and sliced to obtain just a list of the districts of Frankfurt am Main along with their postal codes.

To plot the districts on a map with Folium, we need their geographical coordinates, which the dataset does not provide. We obtain the latitude and longitude of each district with geopy, a Python client for several popular geocoding web services.

geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe, using third-party geocoders and other data sources.

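The geocoding step can be sketched as a function that accepts any geocoder callable, so it runs offline with a stub. The query format (`"<district>, Frankfurt am Main, Germany"`) and the user-agent string are assumptions; geopy's `Nominatim` geocoder wrapped in its `RateLimiter` helper, shown in the comments, is one way to supply a real geocoder.

```python
from types import SimpleNamespace

import pandas as pd


def add_coordinates(districts, geocode):
    """Return a copy of `districts` with Latitude/Longitude columns added.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude` (as geopy geocoders do), or None.
    """
    lats, lngs = [], []
    for name in districts["District"]:
        location = geocode(f"{name}, Frankfurt am Main, Germany")
        lats.append(location.latitude if location is not None else None)
        lngs.append(location.longitude if location is not None else None)
    return districts.assign(Latitude=lats, Longitude=lngs)


# With geopy installed, a real geocoder could be plugged in like this:
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   geolocator = Nominatim(user_agent="frankfurt-districts")  # hypothetical agent name
#   geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)


def fake_geocode(query):
    """Offline stand-in so the example runs without network access (made-up coordinates)."""
    if query.startswith("Gallus"):
        return SimpleNamespace(latitude=50.1, longitude=8.6)
    return None


demo = add_coordinates(pd.DataFrame({"District": ["Gallus", "Unknownville"]}), fake_geocode)
print(demo)
```

Nominatim's usage policy allows at most about one request per second, which is why the commented example wraps `geocode` in a `RateLimiter`.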

Map of districts in Frankfurt am Main plotted using Folium

Next, the top 100 venues are fetched for each postal code with a call to the Foursquare API, which offers location data from all over the world for businesses and developers. Only a free developer account is needed. The required URL format for the API call is shown in the Python code below.

Python code for making a call to the Foursquare API
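A hedged sketch of building such a request URL. It assumes the legacy Foursquare v2 `venues/explore` endpoint and the parameter names commonly used in this course (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`); the radius and version date are placeholders, and newer Foursquare accounts may need a different Places API.

```python
from urllib.parse import urlencode

# Assumed legacy v2 endpoint; check Foursquare's current documentation.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=100, version="20180605"):
    """Return the request URL for venues around (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (YYYYMMDD), placeholder value
        "ll": f"{lat},{lng}",  # centre of the search
        "radius": radius,      # search radius in metres, placeholder value
        "limit": limit,        # up to 100 venues, as in the text
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


url = build_explore_url("MY_ID", "MY_SECRET", 50.1, 8.6)
print(url)
```

The resulting URL would then be fetched, for example with `requests.get(url).json()`, and the JSON flattened into one row per venue.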

The returned venues are stored in a new dataframe. Counting the unique venue categories in the data returned by Foursquare shows that there are 188 unique venue categories in Frankfurt.

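Counting categories is a one-liner with `nunique`; grouping by district also shows how many venues were returned for each. The table is a made-up miniature of the venue dataframe.

```python
import pandas as pd

# Made-up miniature of the venue dataframe.
venues_df = pd.DataFrame({
    "District": ["Bockenheim", "Bockenheim", "Gallus"],
    "Venue": ["Venue A", "Venue B", "Venue C"],
    "Venue Category": ["Asian Restaurant", "Café", "Asian Restaurant"],
})

n_categories = venues_df["Venue Category"].nunique()
venues_per_district = venues_df.groupby("District").size()
print(f"There are {n_categories} unique venue categories.")
print(venues_per_district)
```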

Next, the data must be prepared for the K-means clustering algorithm, which works only on numeric features and cannot use textual (categorical) data such as venue categories directly. The categories are therefore one-hot encoded, and the encoded rows are grouped by district name to obtain one row per district. Taking the mean of each one-hot column within a district gives the relative frequency of each venue category there, so all values lie between 0 and 1 on the same scale.

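The encoding and grouping steps can be sketched with `pd.get_dummies` and `groupby(...).mean()`, using a made-up venue table. Each row of the result sums to 1 because it holds the share of a district's venues that falls into each category.

```python
import pandas as pd

# Made-up venue table with the columns used in this post.
venues_df = pd.DataFrame({
    "District": ["Bockenheim", "Bockenheim", "Bockenheim", "Gallus"],
    "Venue Category": ["Asian Restaurant", "Café", "Asian Restaurant", "Hotel"],
})

# One 0/1 column per venue category.
onehot = pd.get_dummies(venues_df["Venue Category"], dtype=int)
onehot.insert(0, "District", venues_df["District"])

# Mean per district = share of that district's venues in each category.
grouped = onehot.groupby("District", as_index=False).mean()
print(grouped)
```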

To gain more insight into the data, the 10 most common venues in each district are extracted and stored in a separate dataframe.

Dataframe containing top 10 most common venues for each district
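One way to build the top-10 table from the grouped frequencies is to sort each district's row in descending order and keep the first category names. The column naming scheme (`1st Most Common Venue` and so on) and the sample data are assumptions; ties between equal frequencies are ordered arbitrarily.

```python
import pandas as pd


def top_venues_table(grouped, top_n=10):
    """Build a table with each district's top_n most frequent venue categories."""
    freqs = grouped.set_index("District")
    n = min(top_n, freqs.shape[1])  # there may be fewer categories than top_n
    suffix = {1: "st", 2: "nd", 3: "rd"}  # ordinal suffixes, valid up to 10
    columns = [f"{i}{suffix.get(i, 'th')} Most Common Venue" for i in range(1, n + 1)]
    rows = [
        row.sort_values(ascending=False).index[:n].tolist()
        for _, row in freqs.iterrows()
    ]
    return pd.DataFrame(rows, index=freqs.index, columns=columns).reset_index()


# Made-up category frequencies per district.
grouped = pd.DataFrame({
    "District": ["Bockenheim", "Gallus"],
    "Asian Restaurant": [0.5, 0.1],
    "Café": [0.3, 0.2],
    "Hotel": [0.2, 0.7],
})

top = top_venues_table(grouped, top_n=2)
print(top)
```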

Clustering Using K-Means

The one-hot encoded, grouped data is the input to the K-means algorithm from the scikit-learn library, with the number of clusters set to five. The district column is dropped because it is textual and the clustering must use only the encoded values. The resulting cluster labels are then added to the dataframe containing the ten most common venues for each district.

Python code for K-means clustering

Dataframe containing the cluster labels along with the top 10 venues for each district
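A minimal sketch of the clustering step with scikit-learn's `KMeans`, using made-up frequencies for six placeholder districts. `random_state=0` and `n_init=10` are assumptions added to make runs repeatable; the post itself only states that five clusters were used.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up category frequencies for six placeholder districts.
grouped = pd.DataFrame({
    "District": ["A", "B", "C", "D", "E", "F"],
    "Asian Restaurant": [0.6, 0.5, 0.0, 0.1, 0.0, 0.3],
    "Hotel": [0.0, 0.1, 0.7, 0.6, 0.0, 0.3],
    "Park": [0.4, 0.4, 0.3, 0.3, 1.0, 0.4],
})

k = 5  # number of clusters, as in the text
features = grouped.drop(columns="District")  # cluster on the encoded values only
kmeans = KMeans(n_clusters=k, random_state=0, n_init=10).fit(features)

# Attach the labels to the district names.
labelled = grouped[["District"]].copy()
labelled.insert(0, "Cluster Labels", kmeans.labels_)
print(labelled)
```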

The dataframe containing the cluster labels and top venues is then merged with the dataframe containing the latitude and longitude of each district, as shown in the image above. The merged data is then used to visualize the clusters on a map with Folium.

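The merge can be sketched with `DataFrame.merge` on the district column; `validate="one_to_one"` raises an error if a district appears more than once on either side. The coordinates, labels, and colour palette are made up, and the mapping from label to colour is arbitrary.

```python
import pandas as pd

# Made-up cluster labels and district coordinates.
labelled = pd.DataFrame({"District": ["Bockenheim", "Gallus"], "Cluster Labels": [2, 0]})
coords = pd.DataFrame({
    "District": ["Bockenheim", "Gallus", "Niederrad"],
    "Latitude": [50.12, 50.10, 50.08],
    "Longitude": [8.64, 8.64, 8.64],
})

# Inner join keeps only districts present in both tables.
merged = coords.merge(labelled, on="District", how="inner", validate="one_to_one")

# One colour per cluster label for the map markers (arbitrary palette).
palette = ["red", "purple", "lightgreen", "blue", "orange"]
merged["Color"] = merged["Cluster Labels"].map(lambda label: palette[label])
print(merged)
```

Each merged row could then be drawn on a Folium map, for example with `folium.CircleMarker(location=[lat, lng], radius=5, color=color, fill=True)`.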

Map of clustered districts in Frankfurt am Main

We then examine each cluster, name it based on its most common venues, and decide which cluster is suitable for opening a new Asian restaurant.

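To help name a cluster, one can count how often each category appears among its members' top venues. This is a sketch on a made-up table shaped like the top-venues dataframe with cluster labels; the column names are assumptions.

```python
import pandas as pd


def cluster_profile(top, label, n=3):
    """Count how often each category appears among one cluster's top venues."""
    venue_cols = [c for c in top.columns if c.endswith("Most Common Venue")]
    members = top[top["Cluster Labels"] == label]
    values = members[venue_cols].melt(value_name="Venue")["Venue"]
    return values.value_counts().head(n)


# Made-up table in the shape of the top-venues dataframe.
top = pd.DataFrame({
    "District": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 1, 1],
    "1st Most Common Venue": ["Hotel", "Hotel", "Asian Restaurant", "Café"],
    "2nd Most Common Venue": ["Café", "Bar", "Café", "Asian Restaurant"],
})

for label in sorted(top["Cluster Labels"].unique()):
    print(f"Cluster {label}:")
    print(cluster_profile(top, label))
```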

Observations

The purple and light green clusters contain the most districts and the largest number of venues. The light green cluster contains more restaurants, while the purple cluster contains more hotels, which points to tourists. The light green cluster offers a wide variety of cuisines, indicating that it caters to a varied clientele, and most of its districts are located close to the city center. These factors make it the most suitable cluster for opening a new Asian restaurant.

The purple cluster, on the other hand, does not contain many restaurants, but it has many hotels and is also close to the city center. The presence of hotels indicates an influx of tourists, some of them Asian, which means more prospective customers; an Asian restaurant at a location in this cluster not too far from the city center could flourish.

To determine which specific district is best suited for opening an Asian restaurant, we look at the district-level demographics of Frankfurt am Main and then explore districts from both the light green and purple clusters.

Data Exploration: Frankfurt Demographics

As described above, the demographics dataset contains the population of each district of Frankfurt, including the percentage of foreigners and the population of various ethnicities. After keeping only the required columns (total population, number of foreigners, and so on), it was merged with the dataset containing the latitude and longitude of each district. The resulting dataset is shown below.

Frankfurt demographics data overview

Data Visualization Using Choropleth Maps

The demographics data is then plotted on a choropleth map to visualize the population distribution across the city of Frankfurt. Together with the earlier clustering results, this map is used to select districts for further exploration.

District-wise population distribution in Frankfurt am Main

From this map, we observe that the central districts have the highest populations in Frankfurt, along with the district of Flughafen on the outskirts.

Next, we look at the distribution of the Asian and Australian population in Frankfurt.

District-wise distribution of Asian and Australian population in Frankfurt am Main

The maps above show that Bockenheim and Gallus have the largest Asian and Australian populations. Bockenheim belongs to the light green cluster and Gallus to the purple cluster. These two districts, together with Niederrad (also in the purple cluster, with a sizeable Asian population), are then explored to find out how many Asian or similar-cuisine restaurants each has.

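Counting the Asian or similar-cuisine restaurants in a district can be sketched as a keyword filter on the venue category. The keyword list is an assumption about which Foursquare categories count as "Asian or similar", and the venue table is made up. Matching is case-sensitive so that "Asian" does not also match the lowercase "asian" inside a word such as "Caucasian".

```python
import pandas as pd

# Assumed keywords for "Asian or similar" categories; adjust to the data.
ASIAN_KEYWORDS = ("Asian", "Chinese", "Japanese", "Sushi", "Thai",
                  "Vietnamese", "Korean", "Indian", "Ramen", "Noodle", "Dim Sum")


def asian_restaurants(venues_df, district):
    """Return the venues in `district` whose category matches any keyword."""
    pattern = "|".join(ASIAN_KEYWORDS)
    in_district = venues_df["District"] == district
    is_asian = venues_df["Venue Category"].str.contains(pattern, case=True, na=False)
    return venues_df[in_district & is_asian]


# Made-up venue table.
venues_df = pd.DataFrame({
    "District": ["Gallus", "Gallus", "Gallus", "Niederrad"],
    "Venue": ["V1", "V2", "V3", "V4"],
    "Venue Category": ["Vietnamese Restaurant", "Hotel", "Sushi Restaurant", "Thai Restaurant"],
})

gallus = asian_restaurants(venues_df, "Gallus")
counts = {d: len(asian_restaurants(venues_df, d)) for d in ["Bockenheim", "Gallus", "Niederrad"]}
print(counts)
```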

1. Bockenheim

Asian or similar cuisine restaurants in Bockenheim

2. Gallus

Asian or similar cuisine restaurants in Gallus

3. Niederrad

Asian or similar cuisine restaurants in Niederrad

Results and Discussion

By clustering the districts of Frankfurt, analyzing the city's district-level demographics, and combining the two findings, we arrived at three prospective neighborhoods that would be ideal for opening an Asian restaurant in the city.

1. Bockenheim:

Bockenheim falls in the light green cluster and is very close to the city center. It has 7 Asian restaurants, which shows that there is strong demand for Asian cuisine in the area. It also has the largest Asian population in the city, at 1586.

2. Gallus:

Gallus is in the purple cluster, which contains more hotels. It is not far from the city center and has 5 Asian restaurants, indicating that there is demand here as well. It has the second-largest Asian population in the city, at 1512. Gallus therefore seems a better option than Bockenheim for opening an Asian restaurant, owing to less competition, a similar Asian population, and more prospective customers in the form of tourists.

3. Niederrad:

Niederrad is also in the purple cluster, which has more hotels. It is likewise not far from the city center but has only 1 Asian restaurant, far fewer than both Bockenheim and Gallus. Niederrad also has a sizeable Asian population of 929, although smaller than the other two candidate districts. Since it is in the purple cluster, we can expect more tourists in this district; there are 3 hotels in the area, which translates into more prospective customers. Niederrad therefore also seems a good alternative to Gallus, owing to much less competition, proximity to the city center, and more tourists.

Conclusion

The neighborhoods of Frankfurt am Main were clustered, and the results were displayed on a map. The demographics were then studied, and based on the findings, three districts were identified as ideal solutions to the business problem of opening an Asian restaurant. The client can choose any of the three neighborhoods based on their preferences, confidence, and appetite for risk.

Source: https://medium.com/swlh/clustering-neighborhoods-in-frankfurt-am-main-using-k-means-bb805545fd00
