sql 左联接 全联接_通过了解自我联接将您SQL技能提升到一个新的水平

sql 左联接 全联接

The last couple of blogs that I have written have been great for beginners ( Data Concepts Without Learning To Code or Developing A Data Scientist’s Mindset). But, I would really like to push myself to create content for other members of my audience as well. So today, we are going to take a step into more intermediate data analysis territory *dun dun dun*…. and discuss self joins: what they are and how to use them to take your analysis to the next level.

我写的最后两个博客对初学者非常有用( 无需学习编码或发展数据科学家思维方式的 数据概念 )。 但是,我真的很想推动自己为观众的其他成员创建内容。 因此,今天,我们将迈入更中间的数据分析领域* dun dun dun *...。 并讨论自我联接:它们是什么以及如何使用它们将您的分析提高到一个新的水平。

Disclaimer: This post assumes that you already understand how joins work in SQL. If you are not familiar with this concept yet, no worries at all! Save this article for later because I think it’ll definitely be useful as you master SQL in the future.

免责声明:本文假定您已经了解联接在SQL中的工作方式。 如果您还不熟悉这个概念,那就不用担心了! 请保存本文以供以后使用,因为我认为将来掌握SQL肯定会很有用。

什么是“自”联接? (What is a “Self” Join?)

A self join is actually as literal as it gets — it is joining a database table to itself. You can use any kind of join you want to perform a self join (left, right, inner, etc.) — what makes it the self join is that you use the same table on both sides. Just make sure that you select the correct join type for your specific scenario and desired outcome.

自我连接实际上就是获得的字面量-它是将数据库表连接到自身。 您可以使用任何类型的联接来执行自联接(左,右,内部等)—使之成为自联接的原因是您在两边都使用相同的表。 只需确保为您的特定方案和所需结果选择正确的联接类型即可。

我应该何时使用自我加入? (When Should I Use a Self Join?)

If you’ve been working or studying in the field of data analytics and data science for more than, say, 5 minutes, you’ll know that there are always 27 ways to solve a problem. Some are better than others of course, but sometimes the differences are almost indiscernible.

如果您在数据分析和数据科学领域从事了超过5分钟的工作或学习,那么您将知道总有27种方法可以解决问题。 当然,有些比其他的要好,但是有时差异几乎是看不到的。

That being said, there is probably never going to be one exact case where you MUST HAVE a self join or your analysis will shrivel up and die with nowhere to turn. *drop me a scenario in the comments below if you’ve got one, of course*

话虽这么说,可能永远不会有一个确切的案例,您必须进行自我加入,否则您的分析将崩溃而死,无处可去。 *如果有,请在下面的评论中给我一个方案,*

But, I do at least have some scenarios where I have used self joins to solve my analytics problems, at work or in personal analysis. Here’s my own spin on two of the best (AKA the ones that I remember and can think of a good example for).

但是,至少在某些情况下,我在工作中或个人分析中使用了自我联接来解决我的分析问题。 这是我自己选出的两个最好的(也就是我记得并可以想到的一个很好的例子)。

方案1:消息/响应 (Scenario #1: Message/Response)

Suppose that there exists a database table called Chats that holds all of the chat messages that have been sent or received by an online clothing store business.

假设存在一个名为Chats的数据库表,其中包含在线服装店业务已发送或接收的所有聊天消息。

Image for post

It would be extremely beneficial for the clothing store owner to know how long it usually takes her to respond to messages from her customers.

对于服装店的老板来说,知道她通常需要多长时间才能响应来自客户的消息,这将是非常有益的。

But, the messages from her customers and messages to her customers are in the same data source. So, we can use a self join to query the data and provide this analysis to the store owner. We will need one copy of the Chats table to get the initial message from the customer and one copy of the Chats table to get the response from the owner. Then, we can do some date math on the dates associated with those events to figure out how long the store owner is taking to respond.

但是,来自她的客户的消息和发给她的客户的消息位于同一数据源中。 因此,我们可以使用自我联接来查询数据并将此分析提供给商店所有者。 我们将需要一个Chats表副本来获取来自客户的初始消息,并需要一个Chats表副本来获取所有者的响应。 然后,我们可以对与这些事件相关的日期进行一些日期数学运算,以确定商店所有者需要花多长时间进行响应。

I would write this hypothetical self join query as the following:

我将这个假设的自我联接查询编写如下:

SELECT
msg.MessageDateTime AS CustomerMessageDateTime,
resp.MessageDateTime AS ResponseDateTime,
DATEDIFF(day, msg.MessageDateTime, resp.MessageDateTime)
AS DaysToRespond
FROM
Chats msg
INNER JOIN resp ON msg.MsgId = resp.RespondingTo

Note: This SQL query is written using Transact-SQL. Use whatever date functions work for your database at hand.

注意:此SQL查询是使用Transact-SQL编写的。 使用适用于您的数据库的任何日期函数。

This query is relatively straightforward, since the RespondingTo column gives us a one-to-one mapping of which original message to join back to.

该查询相对简单,因为RespondingTo列为我们提供了将原始消息加入其中的一对一映射。

方案2:开启/关闭 (Scenario #2: On/Off)

Let’s say this time you are presented a database table AccountActivity that holds a log of events that can occur on a yoga subscription site. The yoga site offers certain “premium trial periods” where customers can get a discounted membership rate for some period when they first join. The trials starting and ending date are tracked in this table with the Premium_Start and PremiumEnd event types.

假设这次是为您提供一个数据库表AccountActivity,该表包含一个瑜伽预订网站上可能发生的事件的日志。 瑜伽网站提供某些“高级试用期”,在此期间,客户在首次加入时可以享受一定的折扣会员价。 在此表中,使用Premium_Start和PremiumEnd事件类型跟踪审判的开始和结束日期。

Image for post

Suppose that some employees on the business side at this yoga subscription company are asking 1. how many people have the premium trial period currently active, and 2. how many used to have the premium trial period active but now they don’t.

假设这家瑜伽订阅公司的业务方面的一些员工在问:1.有多少人当前处于保费试用期,并且2.有多少人以前有保费试用期,但现在却没有。

Again, we’ve got the event for the premium period being started and the premium period being ended in the same database table (along with the other account activity as well).

再次,我们在同一数据库表中(同时还有其他帐户活动)开始了保费期开始和保费期结束的事件。

分析请求A:高级试用期内的帐户 (Analysis Request A: Accounts in Premium Trial Period)

To answer the first question, we need to find events where a premium membership was started but has not been ended yet. So, we need to join the AccountActivity table to itself to look for premium start and premium end event matches. But, we can’t use an inner join this time. We need the null rows in the end table… so left join it is.

要回答第一个问题,我们需要找到开始高级会员资格但尚未结束的活动。 因此,我们需要将AccountActivity表自身连接起来,以查找高级开始事件和高级结束事件匹配项。 但是,这次我们不能使用内部联接。 我们需要终端表中的空行…因此需要左连接。

SELECT
t_start.UserId,
t_start.EventDateTime AS PremiumTrialStart,
DATEDIFF(day, t_start.EventDateTime, GETDATE()) AS DaysInTrial
FROM
AccountActivity t_start
LEFT JOIN AccountActivity t_end ON t_start.UserId = t_end.UserId
AND t_start.EventType = 'Premium_Start'
AND t_end.EventType = 'Premium_End'
WHERE
t_end.EventDateTime IS NULL

Notice how we also check and make sure that the events we are joining are in the right order. We want the premium trial start on the left side of the join, and the premium trial end on the right side of the join. We also make sure that the User Id matches on both sides. We wouldn’t want to join events from two different customers!

请注意,我们还如何检查并确保我们加入的事件的顺序正确。 我们希望高级试用版在联接的左侧开始,而高级试用版在联接的右侧结束。 我们还确保用户ID在两侧都匹配。 我们不想参加来自两个不同客户的活动!

分析请求B:曾经处于高级试用期的帐户 (Analysis Request B: Accounts Who Used to Be in Premium Trial Period)

Regarding the second question, we want to find the customers whose sweet premium trial has come to an end. We are going to need to self join AccountActivity again, but this time we can switch it up to be a little stricter. We want matches from both the left and right, since, in this population, the trial has ended. So, we can choose an inner join this time.

关于第二个问题,我们想找到甜蜜溢价试用期已结束的客户。 我们将需要再次自行加入AccountActivity,但是这次我们可以将其更改为更严格一些。 我们需要左右匹配,因为在此人群中,审判已经结束。 因此,这次我们可以选择一个内部联接。

SELECT
t_start.UserId,
t_start.EventDateTime AS PremiumTrialStart,
DATEDIFF(day, t_start.EventDateTime, t_end.EventDateTime)
AS DaysInTrial
FROM
AccountActivity t_start
INNER JOIN AccountActivity t_end ON t_start.UserId = t_end.UserId
AND t_start.EventType = 'Premium_Start'
AND t_end.EventType = 'Premium_End'

See, self joins are pretty fun. They can be pretty useful in cases where you have events that are related to each other in the same database table. Thanks for reading, and happy querying. 🙂

看到,自我加入很有趣。 在同一数据库表中具有彼此相关的事件的情况下,它们非常有用。 感谢您的阅读和查询。 🙂

Originally published at https://datadreamer.io on August 13, 2020.

最初于 2020年8月13日 发布在 https://datadreamer.io

翻译自: https://towardsdatascience.com/take-your-sql-skills-to-the-next-level-by-understanding-the-self-join-75f1d52f2322

sql 左联接 全联接

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/388107.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

hadoop windows

1、安装JDK1.6或更高版本 官网下载JDK,安装时注意,最好不要安装到带有空格的路径名下,例如:Programe Files,否则在配置Hadoop的配置文件时会找不到JDK(按相关说法,配置文件中的路径加引号即可解决&#xff…

科学价值 社交关系 大数据_服务的价值:数据科学和用户体验研究美好生活

科学价值 社交关系 大数据A crucial part of building a product is understanding exactly how it provides your customers with value. Understanding this is understanding how you fit into the lives of your customers, and should be central to how you build on wha…

在Ubuntu下创建hadoop组和hadoop用户

一、在Ubuntu下创建hadoop组和hadoop用户 增加hadoop用户组,同时在该组里增加hadoop用户,后续在涉及到hadoop操作时,我们使用该用户。 1、创建hadoop用户组 2、创建hadoop用户 sudo adduser -ingroup hadoop hadoop 回车后会提示输入新的UNIX…

vs azure web_在Azure中迁移和自动化Chrome Web爬网程序的指南。

vs azure webWebscraping as a required skill for many data-science related jobs is becoming increasingly desirable as more companies slowly migrate their processes to the cloud.随着越来越多的公司将其流程缓慢迁移到云中,将Web爬网作为许多与数据科学相…

hadoop eclipse windows

首先说一下本人的环境: Windows7 64位系统 Spring Tool Suite Version: 3.4.0.RELEASE Hadoop2.6.0 一.简介 Hadoop2.x之后没有Eclipse插件工具,我们就不能在Eclipse上调试代码,我们要把写好的java代码的MapReduce打包成jar然后在Linux上运…

netstat 在windows下和Linux下查看网络连接和端口占用

假设忽然起个服务,告诉我8080端口被占用了,OK,我要去看一下是什么服务正在占用着,能不能杀 先假设我是在Windows下: 第一列: Proto 协议 第二列: 本地地址【ip端口】 第三列:远程地址…

selenium 解析网页_用Selenium进行网页搜刮

selenium 解析网页网页抓取系列 (WEB SCRAPING SERIES) 总览 (Overview) Selenium is a portable framework for testing web applications. It is open-source software released under the Apache License 2.0 that runs on Windows, Linux and macOS. Despite serving its m…

代理ARP协议(Proxy ARP)

代理ARP(Proxy-arp)的原理就是当出现跨网段的ARP请求时,路由器将自己的MAC返回给发送ARP广播请求发送者,实现MAC地址代理(善意的欺骗),最终使得主机能够通信。 图中R1和R3处于不同的局域网&…

hive 导入hdfs数据_将数据加载或导入运行在基于HDFS的数据湖之上的Hive表中的另一种方法。

hive 导入hdfs数据Preceding pen down the article, might want to stretch out appreciation to all the wellbeing teams beginning from cleaning/sterile group to Nurses, Doctors and other who are consistently battling to spare the mankind from continuous Covid-1…

对Faster R-CNN的理解(1)

目标检测是一种基于目标几何和统计特征的图像分割,最新的进展一般是通过R-CNN(基于区域的卷积神经网络)来实现的,其中最重要的方法之一是Faster R-CNN。 1. 总体结构 Faster R-CNN的基本结构如下图所示,其基础是深度全…

大数据业务学习笔记_学习业务成为一名出色的数据科学家

大数据业务学习笔记意见 (Opinion) A lot of aspiring Data Scientists think what they need to become a Data Scientist is :许多有抱负的数据科学家认为,成为一名数据科学家需要具备以下条件: Coding 编码 Statistic 统计 Math 数学 Machine Learni…

postman 请求参数为数组及JsonObject

2019独角兽企业重金招聘Python工程师标准>>> 1. (1)数组的请求方式(post) https://blog.csdn.net/qq_21205435/article/details/81909184 (2)数组的请求方式(get) http://localhost:port/list?ages10,20,30 后端接收方式: PostMa…

python 开发api_使用FastAPI和Python快速开发高性能API

python 开发apiIf you have read some of my previous Python articles, you know I’m a Flask fan. It is my go-to for building APIs in Python. However, recently I started to hear a lot about a new API framework for Python called FastAPI. After building some AP…

基于easyui开发Web版Activiti流程定制器详解(一)——目录结构

题外话(可略过): 前一段时间(要是没记错的话应该是3个月以前)发布了一个更新版本,很多人说没有文档看着比较困难,所以打算拿点时间出来详细给大家讲解一下,…

基于easyui开发Web版Activiti流程定制器详解(二)——文件列表

上一篇我们介绍了目录结构,这篇给大家整理一个文件列表以及详细说明,方便大家查找文件。 由于设计器文件主要保存在wf/designer和js/designer目录下,所以主要针对这两个目录进行详细说明。 wf/designer目录文件详解…

Power BI:M与DAX以及度量与计算列

When I embarked on my Power BI journey I was almost immediately slapped with an onslaught of foreign and perplexing terms that all seemed to do similar, but somehow different, things.当我开始Power BI之旅时,我几乎立刻受到了外国和困惑术语的冲击&am…

git 基本命令和操作

设置全局用户名密码 $ git config --global user.name runoob $ git config --global user.email testrunoob.comgit init:初始化仓库 创建新的 Git 仓库 git clone: 拷贝一个 Git 仓库到本地 : git clone [url]git add:将新增的文件添加到缓存 : git add test.htmlgit status …

基于easyui开发Web版Activiti流程定制器详解(三)——页面结构(上)

上一篇介绍了定制器相关的文件,这篇我们来看看整个定制器的界面部分,了解了页面结构有助于更好的理解定制器的实现,那么现在开始吧! 首先,我们来看看整体的结构: 整体结构比较简单…

基于easyui开发Web版Activiti流程定制器详解(四)——页面结构(下)

题外话: 这两天周末在家陪老婆和儿子没上来更新请大家见谅!上一篇介绍了调色板和画布区的页面结构,这篇讲解一下属性区的结构也是定制器最重要的一个页面。 属性区整体页面结构如图: 在这个区域可以定义工…

梯度下降法优化目标函数_如何通过3个简单的步骤区分梯度下降目标函数

梯度下降法优化目标函数Nowadays we can learn about domains that were usually reserved for academic communities. From Artificial Intelligence to Quantum Physics, we can browse an enormous amount of information available on the Internet and benefit from it.如…