SQL Optimization Tips: Become a SQL Wizard Using These Query Optimization Tips

Become a SQL Wizard!

It turns out storing data by rows and columns is convenient in a lot of situations, so relational databases have remained a cornerstone of data management in businesses across the globe. Structured Query Language (SQL) is a powerful query language that allows you to retrieve and manipulate data in relational databases. The basics of querying data using SQL are fairly easy to learn, and I highly recommend exploring them if you’re not familiar with them.

In this article, I’m going to give you my process for investigating slow-running queries. Even if you’re not into programming, SQL is a fantastic language to have in your toolbox for situations in which Excel doesn’t cut it! If you’re brand new, check out my intro to SQL.

Follow Along!

It is free to download and use! I use Microsoft SQL Server and SQL Server Management Studio at work and at home, so that is what I use in the examples.

Why is my Query Slow?

There are many reasons a query might be running slow, and it isn’t always obvious. I’ve written plenty of queries I thought would process easily, but ended up taking an absurd amount of time until I did a little tuning. If you’re new to query optimization, read through the SQL Server Query Engine 101 below. If you already know that stuff, skip ahead to the Query Optimization Tips!

SQL Server Query Engine 101

Although query syntax is fairly simple, there is a lot to understand under the hood of SQL Server. There is no way I can cover it all in an article, but I’ll give you the CliffsNotes version.

The SQL Server Engine is composed of 2 main parts: the Storage Engine and the Query Processor (Relational Engine). The Query Processor is the part of SQL Server that accepts all incoming queries and devises an Execution Plan for them. There is no guarantee the same plan will always be selected for a query. I’ll get deeper into execution plans later…

The 4 core steps of the Query Processor:

  • Parse: Checks that the query uses valid syntax
  • Bind: Checks that the objects exist and is responsible for name resolution
  • Optimize: Uses cost-based optimization to generate an optimal execution plan
  • Execute: Executes the execution plan

The Optimizer

The query optimizer arrives at the optimal plan by generating and assessing as many execution plans as possible in a given search space. The search space is all possible execution plans for the query. Any plan in the search space must return the query results.

Of course, it isn’t always possible for the optimizer to assess ALL possible plans. An exhaustive search could take a ridiculously long time and impact overall performance. For example, a complex query might have millions of possible plan combinations. The optimizer finds a balance between plan quality and search time.

Execution plans consist of physical entities called operators. Operators will make more sense once we look at the plans. To produce an estimated cost for the plan, the optimizer considers:

  • Physical operator costs and things like I/O and memory
  • Estimated number of records (cardinality estimate)

To help the query optimizer with cardinality estimates, SQL Server uses Statistics: stored information about the distribution of values in a table’s columns. The query optimizer adds up all these costs pretty quickly and determines which plan is good enough to use!

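As a rough illustration, the statistics objects that exist for a table can be listed through the sys.stats catalog view. The table name dbo.testing is an assumption borrowed from the examples later in this article; substitute one of your own tables.

```sql
-- List the statistics objects SQL Server keeps for one table.
-- dbo.testing is an assumed table name; if it does not exist,
-- OBJECT_ID returns NULL and the query simply returns no rows.
SELECT s.name AS stats_name,
       s.auto_created,                                  -- 1 = created automatically by the optimizer
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.testing');
```

A last_updated value that is very old on a table that changes often is one possible hint that the cardinality estimates are working from stale information.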

Execution Plans

You can see the query execution plan by right-clicking in the query window and selecting Include Actual Execution Plan or Display Estimated Execution Plan. Use the Estimated Execution Plan when you want to look for bottlenecks before running a large or complex query.

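If you prefer to stay in the query window, the estimated plan can also be requested with a SET option. This is only a sketch: dbo.testing is an assumed table name, and GO is the batch separator understood by SQL Server Management Studio and sqlcmd rather than a T-SQL statement.

```sql
-- Ask for the estimated plan as XML instead of running the query.
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO lines.
SET SHOWPLAN_XML ON;
GO

-- While SHOWPLAN_XML is ON, this statement is compiled but not executed.
SELECT * FROM dbo.testing;   -- assumed table name
GO

SET SHOWPLAN_XML OFF;
GO
```

To capture the actual plan alongside the real results, SET STATISTICS XML ON can be used in the same way.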

[Figure: Execution plan display options in Microsoft SQL Server Management Studio]

Exploring the Plan

The most common way to view the execution plan is the tree format that uses images to represent the operators. For example:

[Figure: Example execution plan generated from SELECT * FROM #tek]

In the example you can see two operators in the execution plan: SELECT and Table Scan. You also see an arrow that represents the flow of data. The thicker the arrow, the more records.

The first operator is called the Results operator and is mostly there to represent the SELECT. Beyond that, there are a lot of operators! Each performs a single function like scanning, filtering or performing an aggregation. An operator can represent a logical operation and/or a physical operation. Look them up when you need to instead of trying to memorize them all!

Query Optimization Tips

Although the optimizer tries, the plan executed by the query processor isn’t always going to be the best plan. For example, a bad cardinality estimate might result in the wrong operator being chosen. That’s why you need to learn some query optimization! Here are my top tips and troubleshooting techniques for queries.

SET STATISTICS IO ON

Running the command SET STATISTICS IO ON before the query provides information that can help troubleshoot the query. STATISTICS IO shows you the IO that was incurred for each object. It is useful for understanding what happened behind the scenes and how the data was retrieved.

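A minimal sketch of the pattern follows. The temp table #io_demo and its columns are made up for illustration so the snippet can run on its own; in practice you would wrap your own slow query instead.

```sql
-- Hypothetical demo table so the example is self-contained.
CREATE TABLE #io_demo (id INT NOT NULL, note VARCHAR(50) NOT NULL);
INSERT INTO #io_demo (id, note)
VALUES (1, 'first'), (2, 'second'), (3, 'third');

-- Turn on per-object IO reporting for this session.
SET STATISTICS IO ON;

-- The query under investigation; its IO figures are reported as messages.
SELECT id, note
FROM #io_demo
WHERE id >= 2;

-- Switch the reporting off again when you are done.
SET STATISTICS IO OFF;

DROP TABLE #io_demo;
```

The setting applies to the current session only, so it does not affect other connections.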

[Figure: SET STATISTICS IO ON]

When the query completes, click the Messages tab to see the output.

[Figure: Statistics IO output]

Notice the 19397 Logical Reads.

A lower number is better when it comes to reads (logical and physical). A logical read is a page read from the SQL Server buffer pool, the in-memory data cache. A physical read is a page that first had to be fetched from disk into the buffer pool.

To greatly reduce the number of logical reads, try adding an index on the table.

[Figure: Statistics IO output after adding a columnstore index]

Notice that logical reads are now 0. Performance improved significantly compared with querying the table without an index. You can also see that different types of reads occurred instead of the standard logical or physical reads. This is because I’m using a columnstore index. Columnstore indexes are typically used for large tables or data warehouses.

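The article does not show the statement used to build the index, so the following is only a sketch of what creating a clustered columnstore index can look like. The table dbo.cs_demo and the index name cci_cs_demo are hypothetical, and columnstore availability depends on your SQL Server version and edition.

```sql
-- Hypothetical table standing in for a large fact table.
CREATE TABLE dbo.cs_demo (
    id     INT            NOT NULL,
    amount DECIMAL(10, 2) NOT NULL
);

-- Store the whole table in columnar format.
-- A clustered columnstore index takes no column list: it covers every column.
CREATE CLUSTERED COLUMNSTORE INDEX cci_cs_demo ON dbo.cs_demo;

-- Clean up the demo object.
DROP TABLE dbo.cs_demo;
```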

Use Indexes

For good performance, it is imperative that tables have good indexing. Without getting too deep into the weeds, there are basically two types of indexes:

Clustered index: Clustered indexes sort and store the data rows in the table or view based on the key values.

Non-clustered index: A non-clustered index is a separate structure, ordered by the columns that make up the index, whose entries point back to the corresponding data rows.

A table without a clustered index is called a Heap. Most tables should have clustered indexes. If a table is a heap, it is still possible to add non-clustered indexes. Tables can have only 1 clustered index, but many non-clustered indexes.

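The difference between a heap, a clustered index and a non-clustered index can be sketched as follows. All object names here (dbo.index_demo and both index names) are hypothetical and chosen only for the example.

```sql
-- Created without a clustered index, so at this point the table is a heap.
CREATE TABLE dbo.index_demo (
    id            INT          NOT NULL,
    customer_name VARCHAR(100) NOT NULL,
    order_date    DATE         NOT NULL
);

-- The one clustered index: the data rows themselves are stored in id order.
CREATE CLUSTERED INDEX cix_index_demo_id
    ON dbo.index_demo (id);

-- One of possibly many non-clustered indexes: a separate structure keyed on
-- order_date, carrying customer_name along as an included column.
CREATE NONCLUSTERED INDEX ix_index_demo_order_date
    ON dbo.index_demo (order_date)
    INCLUDE (customer_name);

-- Clean up the demo object.
DROP TABLE dbo.index_demo;
```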

If you’re not sure what to include in the index, generate the Estimated Execution Plan. The Missing Indexes feature then provides information about missing indexes that could improve query performance.

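The same kind of suggestion is also exposed outside the plan viewer through the missing index dynamic management views. This is a related technique rather than the one shown in the screenshot, and it is only a sketch: it typically requires the VIEW SERVER STATE permission, and the figures are reset when the instance restarts.

```sql
-- Top missing-index suggestions recorded since the last restart.
SELECT TOP (10)
       d.statement          AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,                       -- how often the index could have been used
       gs.avg_user_impact                   -- estimated % improvement for those queries
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
    ON gs.group_handle = g.index_group_handle
ORDER BY gs.avg_user_impact DESC;
```

Treat the output as a list of candidates to evaluate, not as indexes to create automatically.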

[Figure: Missing Index feature]

Use caution

Use caution when creating non-clustered indexes since they take up space, and over-indexing is bad. The problem with blindly creating this index in the example is that SQL Server has decided that it is useful for a particular query (or handful of queries), but is ignorant of the rest of the workload. The index might not be a good fit, so be aware of what you’re doing.

Avoid Unions

When I am querying large tables, I do my best to avoid UNION. When I see queries using it, I hope I packed a sleeping bag because I might be there all night!

First, it is important to know the difference between UNION and UNION ALL. The UNION operator is used to combine the result sets of two or more SELECT statements, but it will exclude duplicates. UNION ALL includes duplicates; it essentially concatenates the two datasets. Because they behave differently, they have very different execution plans.

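A tiny example makes the difference in results concrete. The inline VALUES lists are made up for illustration; both inputs contain the value 2.

```sql
-- UNION removes duplicates: the result contains 1, 2 and 3 (three rows).
SELECT v.n FROM (VALUES (1), (2)) AS v(n)
UNION
SELECT w.n FROM (VALUES (2), (3)) AS w(n);

-- UNION ALL keeps everything: the result contains 1, 2, 2 and 3 (four rows).
SELECT v.n FROM (VALUES (1), (2)) AS v(n)
UNION ALL
SELECT w.n FROM (VALUES (2), (3)) AS w(n);
```

Neither query guarantees the order of the returned rows unless an ORDER BY is added.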

Let’s take a look at this situation:

-- table testing has 1,000,000 rows
-- table testing2 has 5,000,000 rows
-- I want to return all 6 million rows. Which should I use? Is there a faster way?

-- Option 1: UNION (removes duplicates)
select * from testing
union
select * from testing2

-- Option 2: UNION ALL (keeps duplicates)
select * from testing
union all
select * from testing2

Of the two options, UNION ALL will guarantee all rows are included. However, when comparing the execution plans in my test, UNION ALL was slower!

[Figure: UNION execution plan]
[Figure: UNION ALL execution plan]

Notice UNION ALL takes over 5 minutes in the example!

Instead of using UNION ALL, I like using a temporary table. I dump the data into a temporary table and then select all from the table.

-- Create the temp table from the larger table, then append the smaller one
select * into #tek from testing2
insert into #tek select * from testing

select * from #tek
--drop table #tek

[Figure: Using a temp table and a SELECT statement]

Notice this method took under 3 minutes to return 6 million rows, compared to UNION ALL, which took over 5! Since the data now lives in the temp table, it stays available for the rest of the session, allowing you to do more with it if needed.

Avoid Sort

Do what you can to avoid seeing the Sort operator in the Execution Plan. Sorts are slow and can take up a lot of resources, resulting in spills that eat up tempdb! If you see a warning sign in your execution plan, hover over it to see what it says.

[Figure: Sort operator with a warning]

Don’t use an ORDER BY clause in your query if you don’t have to. Ideally, if you need to sort by a specific column often, you can add a non-clustered index for that column to help avoid Sort operators in the plan.

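As a sketch of that idea, an index keyed on the sort column gives the optimizer a structure that is already in the requested order. The table dbo.sort_demo, its columns and the index name are hypothetical, and whether the Sort operator actually disappears depends on the plan the optimizer chooses.

```sql
-- Hypothetical table that is frequently read in created_at order.
CREATE TABLE dbo.sort_demo (
    id         INT         NOT NULL,
    created_at DATETIME2   NOT NULL,
    payload    VARCHAR(50) NOT NULL
);

-- Index keyed on the sort column; the INCLUDE list covers the other
-- selected columns so the query can be answered from the index alone.
CREATE NONCLUSTERED INDEX ix_sort_demo_created_at
    ON dbo.sort_demo (created_at)
    INCLUDE (id, payload);

-- The rows can now be read in index order instead of being sorted at run time.
SELECT id, created_at, payload
FROM dbo.sort_demo
ORDER BY created_at;

-- Clean up the demo object.
DROP TABLE dbo.sort_demo;
```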

Spot Lazy Table Spools

I’ve seen Table Spools cause queries to take hours when they should be taking minutes, or even seconds. The Table Spool operator is essentially used to create a temporary table in memory or on disk that stores the results of sub-queries that might be used multiple times in the execution plan. A lazy table spool builds that temporary table lazily, meaning it only reads and stores rows as they are needed. There are 5 or so different types of Spool operators, but all have a similar purpose.

Table Spools are tricky because they can sometimes show a low Cost %, but be a huge bottleneck in the execution plan. Hover over the operator to see how the estimated rows compare to the actual rows and collect additional info about the properties!

[Figure: Table Spools]

To avoid Table Spools, try using an index that includes all fields in the query. If that is not possible, you can try query hints to force the join order or specify the join operation. For example, try using INNER HASH JOIN instead of INNER JOIN. Forcing hash joins can boost speed significantly, but they can spill heavily into tempdb, so be very careful using query hints!

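The join hint mentioned above looks like this in practice. The temp tables #orders and #customers and their contents are made up so the sketch can run on its own; on tables this small the hint brings no benefit and is shown purely for the syntax.

```sql
-- Hypothetical demo tables.
CREATE TABLE #customers (customer_id INT NOT NULL, name VARCHAR(50) NOT NULL);
CREATE TABLE #orders (order_id INT NOT NULL, customer_id INT NOT NULL);

INSERT INTO #customers (customer_id, name) VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO #orders (order_id, customer_id) VALUES (10, 1), (11, 1), (12, 2);

-- INNER HASH JOIN tells the optimizer to use a hash join for this join
-- instead of choosing the physical join type itself.
SELECT o.order_id, c.name
FROM #orders AS o
INNER HASH JOIN #customers AS c
    ON c.customer_id = o.customer_id;

DROP TABLE #orders;
DROP TABLE #customers;
```

Compare the execution plan and the Statistics IO output with and without the hint before keeping it, since a hint removes choices the optimizer would otherwise be free to make.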

Final Thoughts

Understanding execution plans and optimizing SQL queries can be tedious and take a while to learn. I’ve been using SQL for years and still learn new techniques all the time! As long as you remember the following, you’re on your way to becoming a SQL Query tuning wizard:

  • Use table indexes
  • SET STATISTICS IO ON
  • Check the Execution Plan

Check out my other articles on SQL, Programming and Data Science if you enjoyed this article!

Translated from: https://medium.com/swlh/become-a-sql-wizard-using-these-query-optimization-tips-a932d18c762f
