Making Recommendations Using Association Rules (R Programming)

Background

Retailers typically hold a wealth of customer transaction data: the type of items each customer purchased, their value, and the date they were purchased. Unless the retailer runs a loyalty rewards system, it may not have demographic information on its customers, such as height, age, gender, and address. Any suggestion about what a customer might want to buy in the future, i.e. which products to recommend, therefore has to be based on that customer's purchase history and on the purchase histories of other customers.

In collaborative filtering, recommendations are made to customers based on finding similarities between the purchase history of customers. So, if Customers A and B both purchase Product A, but customer B also purchases Product B, then it is likely that customer A may also be interested in Product B. This is a very simple example and there are various algorithms that can be used to find out how similar customers are in order to make recommendations.

One such algorithm is k-nearest neighbour where the objective is to find k customers that are most similar to the target customer. It involves choosing a k and a similarity metric (with Euclidean distance being most common). The basis of this algorithm is that points that are closest in space to each other are also likely to be most similar to each other.
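
As a rough illustration of the nearest-neighbour idea, the sketch below finds the k customers closest to a target customer using Euclidean distance. The purchase matrix, the customer labels, and the choice of k = 2 are all made up for this example and do not come from the retail dataset.

```r
# Hypothetical purchase counts: rows are customers, columns are products
purchases <- matrix(
  c(2, 0, 1,
    2, 1, 1,
    0, 3, 0,
    1, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("candle", "tent", "matches"))
)

k <- 2          # number of neighbours to keep (an arbitrary choice here)
target <- "A"   # customer we want recommendations for

# Pairwise Euclidean distances between all customers
d <- as.matrix(dist(purchases, method = "euclidean"))

# Drop the target itself, then keep the k smallest distances
others <- d[target, colnames(d) != target]
neighbours <- names(sort(others))[seq_len(k)]
print(neighbours)
```

Products bought by these neighbours but not by the target customer would then be candidates for a recommendation.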

Another technique is basket analysis, also known as association rules. The aim of this method is to find out which items are bought together (put in the same basket) and how often. The output of the algorithm is a series of if-then rules, e.g. if a customer buys a candle, then they are also likely to buy matches. Association rules can assist retailers with the following:

  • Modifying the store layout so that associated items are stocked together;
  • Sending emails to customers with product recommendations based on their previous purchases (e.g. we noticed you bought a candle, perhaps these matches may interest you?); and
  • Gaining insights into customer behaviour.

Let’s now apply association rules to a dummy dataset.

The dataset

A dataset of 2,178,282 observations/rows and 16 variables/features was provided.

The first thing I did with this dataset was quickly check for missing values (NAs). No missing values were found.
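
The original output of this check is not reproduced here. A minimal sketch of how such a check might look in base R is given below; the small data frame is a stand-in for the real data, and its column names are only illustrative.

```r
# Toy stand-in for the retail data; the column names are illustrative
retail_demo <- data.frame(
  BasketID    = c(1, 1, 2),
  ProductName = c("CANDLE", "MATCHES", "TENT"),
  SalesGross  = c(5.0, 1.5, 120.0)
)

# Count NAs in each column; all zeros means nothing is missing
na_counts <- colSums(is.na(retail_demo))
print(na_counts)

# Single yes/no answer for the whole data frame
print(anyNA(retail_demo))
```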

The variables were all read in as either numeric or string variables. To interpret the categorical variables meaningfully, they need to be converted to factors, so the following changes were made.

library(dplyr)

# Convert categorical columns to factors and ID/code columns to numeric.
# Code values that are not digits (e.g. "Freight") become NA, with a warning.
retail <- retail %>%
  mutate(
    MerchCategoryName = as.factor(MerchCategoryName),
    CategoryName      = as.factor(CategoryName),
    SubCategoryName   = as.factor(SubCategoryName),
    StoreState        = as.factor(StoreState),
    OrderType         = as.factor(OrderType),
    ProductName       = as.factor(ProductName),
    BasketID          = as.numeric(BasketID),
    MerchCategoryCode = as.numeric(MerchCategoryCode),
    CategoryCode      = as.numeric(CategoryCode),
    SubCategoryCode   = as.numeric(SubCategoryCode)
  )

Next, all the numeric variables were summarised (min, median, max, standard deviation, and mean) to identify any outliers in the data. This summary showed that the features MerchCategoryCode, CategoryCode, and SubCategoryCode contained a large number of NAs. On further inspection, most of these code values consisted of digits; the ones that had been converted to NAs contained characters such as “Freight” or the letter “C”. As these codes are not related to customer purchases, these observations were removed.

Negative gross sales and negative quantity indicate either erroneous values or customer returns. This may be interesting information; however, it is not related to our objective of analysis and as such these observations were omitted.
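
The two clean-up steps above (dropping rows with NA codes and omitting negative sales or quantities) could be sketched as follows. This is not the author's code: the data frame is a toy example, and Quantity is an assumed column name, while the other names appear in the article.

```r
# Toy rows; Quantity is an assumed column name
retail_demo <- data.frame(
  MerchCategoryCode = c(10, NA, 11, 12),
  CategoryCode      = c(101, NA, 110, 120),
  SubCategoryCode   = c(1001, NA, 1100, 1200),
  SalesGross        = c(37.6, 12.0, -20.0, 5.5),
  Quantity          = c(1, 1, -1, 2)
)

# Keep rows with valid numeric codes and non-negative sales and quantity
code_cols <- c("MerchCategoryCode", "CategoryCode", "SubCategoryCode")
keep <- complete.cases(retail_demo[, code_cols]) &
  retail_demo$SalesGross >= 0 &
  retail_demo$Quantity >= 0
retail_clean <- retail_demo[keep, ]
print(nrow(retail_clean))
```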

Data Exploration

It is always a good idea to explore the data to see if you can see any trends or patterns within the dataset. Later on, you can use an algorithm/machine learning model to validate these trends.

The graph below shows that the highest number of transactions comes from Victoria, followed by Queensland. If the retailer wants to know where there is room to increase sales, this plot may be useful, as the number of sales is proportionately low in all other states.

[Figure: number of transactions by state]

The plot below shows that most gross sales values fall between just above $0 and $40 (the median is $37.60).

[Figure: distribution of gross sales values]

We can also view this plot by state, as below. However, the transactions from Victoria and Queensland obscure the information for the other states. Boxplots may be better for visualisation.

[Figure: distribution of gross sales values by state]

The boxplots below (though hard to read because the outliers stretch the scale) show that most sales across all states are close to the overall median. There is an abnormally high outlier for NT and a couple for VIC. For our purpose, since we are only interested in understanding which products customers buy together in order to make recommendations, we do not need to deal with these outliers.

[Figure: boxplots of gross sales by state]

Now that we have had a look at sales by state, let’s try to get a better understanding of the products purchased by customers.

The plot below is coloured based on the frequency of purchases per item. Lighter shades of blue indicate higher frequencies.

Some key takeaways are:

  • No sales for team sports in ACT, NSW, SA, and WA. This could be because these products are not stocked there, or perhaps they need to be marketed better.
  • No sales for ski products in ACT, NSW, SA, and WA. I find this quite surprising, as NSW and ACT are close to some major ski resorts such as Thredbo. It is also odd that there are ski product sales in QLD, which has a warm climate throughout the year. Either these products have been mislabelled or they were not stocked in NSW and ACT.
  • Paint and panel sales in WA only.
  • Bike sales in VIC only.
  • Camping and apparel recorded the highest sales in VIC, followed by gas, fuel and BBQing.

[Figure: purchase frequency by product category and state]

Given the distribution of sales by product and state, any association rules we come up with will mainly be based on sales from VIC and QLD. Furthermore, as not all products were stocked or sold in all states, the association rules are expected to be limited to a small number of products. However, since I have already embarked on this mode of analysis, let’s continue and see what we get.

We have two years' worth of data, 2016 and 2017, so I decided to compare the number of transactions and the mean gross sales for the two years.

Despite the higher number of transactions in 2016 (about 2.5 times as many as in 2017), mean gross sales were higher in 2017 than in 2016. This seems quite counter-intuitive, so I decided to dig deeper by looking at monthly sales.

Year    # of Transactions    Mean Gross Sales ($)
2016    1,481,922            69.0
2017    593,315              86.0
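
A table like this can be produced with a simple grouped summary. The sketch below uses base R on made-up numbers; SalesGross is a column named in the article, while Year is assumed to have been derived from the purchase date.

```r
# Toy transactions; Year is an assumed derived column
sales <- data.frame(
  Year       = c(2016, 2016, 2016, 2017, 2017),
  SalesGross = c(60, 70, 77, 80, 92)
)

# Number of transactions and mean gross sales per year
n_by_year    <- tapply(sales$SalesGross, sales$Year, length)
mean_by_year <- tapply(sales$SalesGross, sales$Year, mean)
yearly <- data.frame(
  Year           = names(n_by_year),
  Transactions   = as.vector(n_by_year),
  MeanGrossSales = as.vector(mean_by_year)
)
print(yearly)
```

Grouping by month as well as year would give the monthly view discussed next.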

In 2016, the highest number of sales was recorded in January and March, with steep declines from September to November and then an increase in December. Transactions continued to decline through 2017, with another increase in December (the Christmas season).

Deduction: As the highest numbers of sales are for camping, apparel, and BBQ & gas products, it makes sense that sales of these products are high during the holiday season.

Recommendation to the retailer: It may be worth exploring whether stores have sufficient stock of these products in December and January, as they are the most popular.

Deduction: Despite the steady decline in the number of transactions, mean gross sales continued to increase month on month and were highest in December 2017. This indicates that fewer customers made purchases, but those who did bought products of greater value.

Recommendation: What can the retailer do to keep purchases steady throughout the year, rather than peaking at the end of the year? The retailer still has to pay overheads, employee salaries, and other costs to run its stores all year round.

Basket Analysis/Association Rules

Let’s go back to our objective.

Aim: To determine which products customers are likely to buy together, in order to make product recommendations.

I used the arules package and its read.transactions function to convert the dataset into a transaction object. A summary of this object gives the following output.
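
The exact call is not shown in the article. A minimal sketch of how read.transactions might be used is given below, assuming the data is in long format with one row per basket and product; the toy data, the temporary file, and the choice of the "single" format are assumptions made for illustration.

```r
library(arules)

# Toy long-format data: one row per (basket, product) pair
retail_demo <- data.frame(
  BasketID    = c(1, 1, 2, 3, 3, 3),
  ProductName = c("CANDLE", "MATCHES", "TENT", "CANDLE", "MATCHES", "TENT")
)

# Write to a temporary CSV, then read it back as a transactions object.
# format = "single" means each line holds one basket ID and one item.
csv_path <- tempfile(fileext = ".csv")
write.csv(retail_demo, csv_path, row.names = FALSE, quote = FALSE)
tr_demo <- read.transactions(
  csv_path,
  format = "single",
  sep    = ",",
  header = TRUE,
  cols   = c("BasketID", "ProductName")
)

summary(tr_demo)
```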

## transactions as itemMatrix in sparse format with
##  1019952 rows (elements/itemsets/transactions) and
##  21209 columns (items) and a density of 9.531951e-05
##
## most frequent items:
##       GAS BOTTLE REFILL 9KG*       GAS BOTTLE REFILL 4KG*
##                        30628                        11724
## 6 PACK BUTANE - WILD COUNTRY SNAP HOOK ALUMINIUM GRIPWELL
##                         9209                         7086
## PEG TENT GALV 225X6.3MM P04G                      (Other)
##                         6948                      1996372
##
## element (itemset/transaction) length distribution:
## sizes
##      1      2      3      4      5      6      7      8      9     10
## 546138 234643 109888  55319  30185  16656   9878   6018   3716   2332
##     11     12     13     14     15     16     17     18     19     20
##   1611    993    751    490    353    237    157    140     99     88
##     21     22     23     24     25     26     27     28     29     30
##     53     48     28     31     20     13     12     15      8      1
##     31     32     33     34     35     36     37     38     39     40
##      4      2      4      3      4      1      4      2      1      4
##     43     46
##      1      1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   1.000   1.000   2.022   2.000  46.000
##
## includes extended item information - examples:
##   labels
## 1     10
## 2     11
## 3  11/12

Based on the output above, we can conclude the following.

  • There are 1,019,952 collections (baskets) of items and 21,209 distinct items.
  • Density measures the proportion of non-zero cells in the sparse matrix. It is the total number of items purchased divided by the number of possible items in that matrix (baskets × items). You can use the density to calculate how many items were purchased: 1,019,952 × 21,209 × 0.0000953 ≈ 2,061,545. With the unrounded density of 9.531951e-05 the result is about 2,061,967, which matches the sum of the item frequencies in the summary.
  • Element (itemset/transaction) length distribution: This tells you how many transactions there are for 1-itemsets, 2-itemsets, and so on. The first row gives the number of items in a basket and the second row gives the number of transactions of that size.
  • The majority of baskets (87%) consist of 1 to 3 items.
  • The minimum number of items in a basket is 1 and the maximum is 46 (only one basket).
  • The most popular items are the 9 kg and 4 kg gas bottle refills, the 6-pack of butane, the Gripwell aluminium snap hook, and galvanised tent pegs.
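
The density calculation in the list above can be checked with a few lines of arithmetic. All figures are copied from the summary output; only the variable names are invented here.

```r
n_transactions <- 1019952       # rows of the transaction matrix
n_items        <- 21209         # columns of the transaction matrix
density        <- 9.531951e-05  # share of non-zero cells, from the summary

# Total number of (basket, item) purchases implied by the density
purchased <- n_transactions * n_items * density
print(round(purchased))

# Cross-check against the item frequencies listed in the same summary
freq_total <- 30628 + 11724 + 9209 + 7086 + 6948 + 1996372
print(freq_total)
```

The two results agree to within rounding of the printed density, which confirms that density is simply the purchased item count divided by baskets times items.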

We can look at this information graphically via absolute frequency and relative frequency plots.

Both plots are in descending order of frequency of purchase. The absolute frequency plot tells us that the highest number of sales is for gas-related products. The relative frequency plot shows the same ranking as a share of all transactions, i.e. the proportion of baskets that contain each item; on its own it does not show which items are bought together. Since the most frequent items are closely related gas and camping products, one recommendation to the retailer is to stock these products together in the store, or to send customers an EDM recommending related products that they have not yet purchased.

[Figure: absolute item frequency plot]
[Figure: relative item frequency plot]
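
Plots of this kind are typically drawn with itemFrequencyPlot from arules. The sketch below is an assumption about how that might look rather than the author's code; the baskets are toy data and topN = 3 is arbitrary.

```r
library(arules)

# Toy baskets standing in for the real transaction object
baskets <- list(
  c("GAS BOTTLE", "GAS REFILL"),
  c("GAS REFILL"),
  c("TENT", "PEG"),
  c("GAS REFILL", "PEG")
)
tr_demo <- as(baskets, "transactions")

# Absolute frequency: number of baskets containing each item
abs_freq <- sort(itemFrequency(tr_demo, type = "absolute"), decreasing = TRUE)
# Relative frequency: the same counts divided by the number of baskets
rel_freq <- sort(itemFrequency(tr_demo, type = "relative"), decreasing = TRUE)
print(abs_freq)
print(rel_freq)

# Bar charts of the most frequent items
itemFrequencyPlot(tr_demo, topN = 3, type = "absolute")
itemFrequencyPlot(tr_demo, topN = 3, type = "relative")
```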

The next step is to generate rules for our transaction object. The output is as follows.

## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##      10  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 1019
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[21209 item(s), 1019952 transaction(s)] done [2.52s].
## sorting and recoding items ... [317 item(s)] done [0.04s].
## creating transaction tree ... done [0.84s].
## checking subsets of size 1 2 done [0.04s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object ... done [0.25s].

The output above shows that seven rules were generated. Details of these rules are shown below.

## set of 7 rules
##
## rule length distribution (lhs + rhs):sizes
## 2
## 7
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##       2       2       2       2       2       2
##
## summary of quality measures:
##     support           confidence          lift            count
##  Min.   :0.001128   Min.   :0.5458   Min.   : 26.30   Min.   :1150
##  1st Qu.:0.001464   1st Qu.:0.6395   1st Qu.: 80.36   1st Qu.:1493
##  Median :0.001650   Median :0.6634   Median :154.58   Median :1683
##  Mean   :0.001652   Mean   :0.6759   Mean   :154.48   Mean   :1685
##  3rd Qu.:0.001668   3rd Qu.:0.7265   3rd Qu.:245.30   3rd Qu.:1701
##  Max.   :0.002524   Max.   :0.7898   Max.   :249.14   Max.   :2574
##
## mining info:
##  data ntransactions support confidence
##    tr       1019952   0.001        0.5

Each of these rules has support, confidence, and lift values.

Let’s start with support, which is the proportion of all transactions used to generate the rules (i.e. 1,019,952) that contain the two items together. For example, for the rule with the lowest count, 1150/1019952 = 0.0011 or 0.11%, where count is the number of transactions that contain the two items.

Confidence is the proportion of transactions in which the two items are bought together out of all transactions in which the first item (the left-hand side of the rule) is purchased. In other words, the probability of buying item B is conditional on the purchase of item A.

Mathematically, this looks like the following:

Confidence(A=>B) = P(A∩B) / P(A) = frequency(A,B) / frequency(A)

In the results above, confidence values range from 55% to 79%. In other words, given that a customer buys the item on the left-hand side of a rule, the probability that they also buy the item on the right-hand side is between 55% and 79%. Buying item A has a positive effect on buying item B, as all lift values are greater than 1.

Note: When I ran the algorithm, I experimented with higher support and confidence thresholds, since rules that pass stricter thresholds are backed by more transactions in which the two items are bought together. However, when I ran the algorithm with a confidence of 80% or more, I obtained zero rules.

This was expected, given how sparse the data is: 1-item baskets are the most common, and the majority of purchased items are camping or gas products.

Thus, the algorithm was run with the following parameters.

library(arules)
association.rules <- apriori(tr, parameter = list(supp = 0.001, conf = 0.5, maxlen = 10))

Lift indicates how strongly two items are associated. A lift greater than 1 indicates that items A and B are bought together more often than would be expected if they were purchased independently, so buying item A makes a purchase of item B more likely. A lift of 1 indicates no association, and a lift below 1 indicates that the items are bought together less often than expected. Mathematically, lift is calculated as follows.

Lift(A=>B) = Supp(A,B) / (Supp(A) * Supp(B))

All our rules have lift values well above 1 (from 26 to 249), indicating that buying item A is likely to lead to a purchase of item B.
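
As a worked check, the support and lift of the rule {GAS BOTTLE 9KG POL CODE 2 DC} => {GAS BOTTLE REFILL 9KG*} can be recomputed from figures reported in the outputs: its count of 1,683, its confidence of 0.7897701, and the 30,628 baskets containing the 9 kg refill. The variable names are made up for this sketch.

```r
n_transactions <- 1019952   # total baskets
count_ab       <- 1683      # baskets with both the 9 kg bottle and its refill
count_b        <- 30628     # baskets with the 9 kg refill (most frequent item)
confidence_ab  <- 0.7897701 # reported confidence of the rule

support_ab <- count_ab / n_transactions
support_b  <- count_b / n_transactions

# Lift(A=>B) = Supp(A,B) / (Supp(A) * Supp(B)) = Confidence(A=>B) / Supp(B)
lift_ab <- confidence_ab / support_b

# Confidence = count(A,B) / count(A), so count(A) can be backed out
count_a <- count_ab / confidence_ab

print(c(support = support_ab, lift = lift_ab, count_a = count_a))
```

The recomputed support (about 0.00165) and lift (about 26.3) agree with the reported values, and the back-calculation suggests roughly 2,131 baskets contained the 9 kg bottle.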

Rules inspection

Let’s now inspect the rules.

##     lhs                                       rhs                            support     confidence lift      count
## [1] {GAS BOTTLE 9KG POL CODE 2 DC}         => {GAS BOTTLE REFILL 9KG*}       0.001650078 0.7897701   26.30036 1683
## [2] {WEBER BABY Q (Q1000) ROASTING TRIVET} => {WEBER BABY Q CONVECTION TRAY} 0.001127504 0.6526674  241.45428 1150
## [3] {GAS BOTTLE 2KG CODE 4 DC}             => {GAS BOTTLE REFILL 2KG*}       0.001344181 0.7308102  154.58137 1371
## [4] {GAS BOTTLE 4KG POL CODE 2 DC}         => {GAS BOTTLE REFILL 4KG*}       0.001583408 0.7222719   62.83544 1615
## [5] {YTH L J PP THERMAL OE}                => {YTH LS TOP PP THERMAL OE}     0.001667726 0.6634165  249.13587 1701
## [6] {YTH LS TOP PP THERMAL OE}             => {YTH L J PP THERMAL OE}        0.001667726 0.6262887  249.13587 1701
## [7] {UNI L J PP THERMAL OE}                => {UNI L S TOP PP THERMAL OE}    0.002523648 0.5458015   97.88840 2574

Interpretation of the first rule is as follows:

If a customer buys the 9 kg gas bottle, there is a 79% chance that the customer will also buy its refill. This pattern occurs in 1,683 transactions in the dataset.
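
To show the whole workflow end to end, here is a small self-contained sketch that mines rules from toy baskets and lists them by lift. The baskets and the thresholds (supp = 0.3, conf = 0.8) are invented for this tiny example and are not the values used on the retail data.

```r
library(arules)

# Toy baskets; item names are illustrative
baskets <- list(
  c("GAS BOTTLE", "GAS REFILL"),
  c("GAS BOTTLE", "GAS REFILL", "PEG"),
  c("GAS REFILL"),
  c("TENT", "PEG"),
  c("TENT", "PEG", "GAS REFILL")
)
tr_demo <- as(baskets, "transactions")

# Thresholds chosen for this tiny example, not the article's values
rules_demo <- apriori(
  tr_demo,
  parameter = list(supp = 0.3, conf = 0.8, minlen = 2, maxlen = 10),
  control   = list(verbose = FALSE)
)

# Show the rules ordered from highest to lowest lift
inspect(sort(rules_demo, by = "lift"))
```

Each printed row reads the same way as the retail rules: the left-hand item, the recommended right-hand item, and the support, confidence, lift, and count behind the rule.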

Now, let’s look at these rules visually.

All rules have a confidence value greater than 0.5, with lift ranging from 26 to 249.

[Figure: confidence and lift of the seven rules]

The parallel coordinates plot for the seven rules shows how the purchase of one product influences the purchase of another. RHS is the item we propose the customer buy. For LHS, 2 is the most recent addition to the basket and 1 is the item that the customer previously purchased.

Looking at the first arrow, we can see that if a customer has the Weber Baby Q (Q1000) roasting trivet in their basket, then they are likely to purchase the Weber Baby Q convection tray.

The plots below would be more useful if we could visualise rules with more than two items.

[Figure: parallel coordinates plot for the seven rules]

Wrapping up

You have now learnt how to make recommendations to customers based on which items are most frequently purchased together, using apriori rules. However, there are some important things to note about this analysis.

  • The most popular/frequent items have confounded the analysis to some extent: we can make recommendations with confidence for only seven association rules. This is due to the uneven distribution of item frequencies across baskets.
  • Customer segmentation may be another approach for this dataset, where customers are grouped by spend (SalesGross), product type (e.g. CategoryCode), StoreState, and time of sale (e.g. month/year). However, it would be useful to have more features on customers to do this effectively.

Code and dataset: https://github.com/shedoesdatascience/basketanalysis

Original article: https://towardsdatascience.com/making-recommendations-using-association-rules-r-programming-1fd891dc8d2e
