SQL Optimization Tips

Become a SQL Wizard!

It turns out storing data by rows and columns is convenient in a lot of situations, so relational databases have remained a cornerstone of data management in businesses across the globe. Structured Query Language (SQL) is a powerful query language that allows you to retrieve and manipulate data in relational databases. The basics of querying data using SQL are fairly easy to learn, and I highly recommend exploring them if you’re not familiar with SQL.

In this article, I’m going to walk you through my process for investigating slow-running queries. Even if you’re not into programming, SQL is a fantastic language to have in your toolbox for situations in which Excel doesn’t cut it! If you’re brand new, check out my intro to SQL.

Follow Along!

I use Microsoft SQL Server and SQL Server Management Studio (SSMS) at work and at home, so that is what I use in the examples. SSMS is free to download and use, as are the Developer and Express editions of SQL Server.

Why is my Query Slow?

There are many reasons a query might be running slowly, and the cause isn’t always obvious. I’ve written plenty of queries I thought would run quickly, only to watch them take an absurd amount of time until I did a little tuning. If you’re new to query optimization, read through the SQL Server Query Engine 101 below. If you already know that stuff, skip ahead to the Query Optimization Tips!

SQL Server Query Engine 101

Although query syntax is fairly simple, there is a lot to understand under the hood of SQL Server. There is no way I can cover it all in one article, but I’ll give you the CliffsNotes version.

The SQL Server Engine is composed of 2 main parts: the Storage Engine and the Query Processor (Relational Engine). The Query Processor is the part of SQL Server that accepts all incoming queries and devises an Execution Plan for them. There is no guarantee the same plan will always be selected for a query. I’ll get deeper into execution plans later…

The 4 core steps of the Query Processor:

Parse: Checks that the query uses valid syntax
Bind: Checks that the objects exist and is responsible for name resolution
Optimize: Uses cost-based optimization to generate an optimal execution plan
Execute: Executes the execution plan

The Optimizer

The query optimizer arrives at its chosen plan by generating and assessing as many candidate execution plans as it can within a given search space. The search space is the set of all possible execution plans for the query. Every plan in the search space must return the same query results.

Of course, it isn’t always possible for the optimizer to assess ALL possible plans. An exhaustive search could take a ridiculously long time and impact overall performance. For example, a complex query might have millions of possible plan combinations. The optimizer finds a balance between plan quality and search time.

Execution plans consist of physical entities called operators. Operators will make more sense once we look at the plans. To produce an estimated cost for the plan, the optimizer considers:

Physical operator costs, including I/O and memory
Estimated number of records (Cardinality estimate)

To help the query optimizer with Cardinality estimates, SQL Server uses Statistics: stored information about the distribution of values in a table’s columns. The query optimizer adds up all these costs pretty quickly and determines which plan is good enough to use!
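
As a rough illustration of what Statistics look like in practice, the sketch below lists the statistics objects on a table and shows how fresh they are. It assumes the testing table used later in this article lives in the dbo schema; the schema name is my assumption, so adjust it to your own table.

```sql
-- Assumption: dbo.testing is the example table; replace with your own table name.
SELECT s.name AS stats_name,
       s.auto_created,            -- 1 if SQL Server created the statistics automatically
       sp.last_updated,
       sp.rows,                   -- rows in the table when statistics were last updated
       sp.rows_sampled,           -- rows actually sampled for the histogram
       sp.modification_counter    -- changes to the leading column since the last update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.testing');

-- If the statistics look stale, they can be refreshed manually.
UPDATE STATISTICS dbo.testing;
```

A modification counter that is large relative to the row count hints that the optimizer may be working from outdated cardinality estimates.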

Execution Plans

You can see the query execution plan by right-clicking in the query window and selecting Include Actual Execution Plan or Display Estimated Execution Plan. Use the Estimated Execution Plan when you want to look for bottlenecks before running a large or complex query.
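
The estimated plan can also be requested from a script instead of the menu. This is a minimal sketch, again assuming a table named dbo.testing (the schema is my assumption):

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO

-- This query is NOT executed; SQL Server returns its estimated plan as XML instead.
SELECT * FROM dbo.testing;
GO

SET SHOWPLAN_XML OFF;
GO
```

GO is the batch separator understood by SSMS and sqlcmd. To get the actual plan from a script, SET STATISTICS XML ON works in a similar way, except that the query really runs.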

Figure: Execution plan display options in Microsoft SQL Server Management Studio

Exploring the Plan

The most common way to view the execution plan is the tree format that uses images to represent the operators. For example:

In the example you can see two operators in the execution plan: SELECT and Table Scan. You also see an arrow that represents the flow of data. The thicker the arrow, the more records.

The first operator is called the Results operator and is mostly there to represent the SELECT. Beyond that, there are a lot of operators! Each performs a single function like scanning, filtering or performing an aggregation. An operator can represent a logical operation, a physical operation, or both. Look them up when you need to instead of trying to memorize them all!

Query Optimization Tips

Although the optimizer tries, the plan it hands to the query processor isn’t always going to be the best plan. For example, a bad cardinality estimate might lead it to pick the wrong operator. That’s why you need to learn some query optimization! Here are my top tips and troubleshooting techniques for queries.

SET STATISTICS IO ON

Running the command SET STATISTICS IO ON before the query provides information that can help troubleshoot it. STATISTICS IO shows you the IO that was incurred for each object. It is useful for understanding what happened behind the scenes and how the data was retrieved.

When the query completes, click the Messages tab to see the output.
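
A minimal sketch of the pattern, assuming the query under investigation reads from a table named dbo.testing (a placeholder for your own query):

```sql
SET STATISTICS IO ON;     -- report reads per table in the Messages tab
SET STATISTICS TIME ON;   -- optional: also report CPU and elapsed time

-- Placeholder query; substitute the slow query you are investigating.
SELECT * FROM dbo.testing;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The setting applies to the current session only, so it does not affect other users of the server.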

Notice the 19397 Logical Reads.

A lower number is better when it comes to reads (logical and physical). A logical read is when a data page is read from the SQL Server Buffer Pool in memory. A physical read is when the page first has to be fetched from disk into the buffer pool.

To greatly reduce the number of logical reads, try adding an index on the table.

Notice logical reads is 0. Performance improved significantly compared with querying the table without an index. You can also see that different types of reads occurred instead of the standard logical or physical reads. This is because I’m using a ColumnStore index. ColumnStore indexes are typically used for large data tables or data warehouses.
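
The article does not show the exact index definition, so the following is only a sketch of how a ColumnStore index might be created. The index name cci_testing and the dbo schema are my own placeholders, and I am assuming the table has no clustered index yet.

```sql
-- Assumption: dbo.testing is a large table that currently has no clustered index.
-- A clustered columnstore index stores the whole table in columnar format.
CREATE CLUSTERED COLUMNSTORE INDEX cci_testing
    ON dbo.testing;

-- Re-run the query with STATISTICS IO enabled to compare the reads.
SET STATISTICS IO ON;
SELECT * FROM dbo.testing;
SET STATISTICS IO OFF;
```

If the table must keep its existing rowstore layout, a nonclustered columnstore index on selected columns is the usual alternative.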

Use Indexes

For good performance, it is imperative that tables have good indexing. Without getting too deep into the weeds, there are basically two types of indexes:

Clustered index: Clustered indexes sort and store the data rows in the table or view based on the key values.

Non-clustered index: A non-clustered index is a separate structure, ordered by the columns that make up the index, whose entries point back to the data rows.

A table without a clustered index is called a Heap. Most tables should have clustered indexes. If a table is a heap, it is still possible to add non-clustered indexes. Tables can have only 1 clustered index, but many non-clustered indexes.
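
To make the two index types concrete, here is a small sketch. The column names id and created_at and the index names are hypothetical, since the article never shows the table definition, and it assumes dbo.testing starts out as a heap.

```sql
-- Assumption: dbo.testing is a heap with hypothetical columns id and created_at.

-- Only one clustered index is allowed: it defines how the rows themselves are stored.
CREATE CLUSTERED INDEX cix_testing_id
    ON dbo.testing (id);

-- Many non-clustered indexes are allowed: each is a separate structure.
CREATE NONCLUSTERED INDEX ix_testing_created_at
    ON dbo.testing (created_at);

-- Check what the table has: type_desc shows HEAP, CLUSTERED or NONCLUSTERED.
SELECT name, index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.testing');
```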

If you’re not sure what to include in the index, generate the Estimated Execution Plan. The Missing Indexes feature will then suggest indexes that could improve query performance.
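
The same missing-index suggestions are also exposed through dynamic management views, which is handy when you want to review them across many queries. A sketch of one way to list them (the ranking expression is my own rough heuristic, not an official formula):

```sql
-- Lists the missing-index suggestions SQL Server has collected since the last restart.
SELECT TOP (10)
       d.statement          AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,
       gs.avg_user_impact   -- estimated percentage improvement
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
     ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
     ON gs.group_handle = g.index_group_handle
ORDER BY gs.user_seeks * gs.avg_user_impact DESC;  -- rough heuristic for ranking
```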

Use Caution

Use caution when creating non-clustered indexes: they take up space, every one of them has to be maintained when data changes, and over-indexing hurts performance. The problem with blindly creating a suggested index is that SQL Server has decided it is useful for a particular query (or handful of queries), but it is ignorant of the rest of the workload. The index might not be a good fit, so be aware of what you’re doing.
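
One way to check whether existing indexes earn their keep is to compare how often they are read with how often they are updated. The query below is a sketch of that idea for the current database:

```sql
-- Non-clustered indexes with many updates but few seeks/scans/lookups
-- are candidates for review. Counters reset when SQL Server restarts.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id = i.object_id
      AND us.index_id = i.index_id
      AND us.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY us.user_updates DESC;
```

A NULL in the usage columns means the index has not been touched at all since the counters were last reset.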

Avoid Unions

When I am querying large tables, I do my best to avoid UNION. When I see queries using it, I hope I packed a sleeping bag because I might be there all night!

First, it is important to know the difference between UNION and UNION ALL. The UNION operator is used to combine the result sets of two or more SELECT statements, but it will exclude duplicates. UNION ALL includes duplicates; it essentially concatenates the two datasets. Because they behave differently, they have very different execution plans.

Let’s take a look at this situation:

--table testing has 1,000,000 rows
--table testing2 has 5,000,000 rows
--I want to return all 6 million rows. Which should I use? Is there a faster way?
select * from testing
union
select * from testing2

select * from testing
union all
select * from testing2

Of the two options, only UNION ALL guarantees that all rows are included, because UNION removes duplicates (which usually costs an extra sort or hash step). However, even UNION ALL turned out to be slow in my test!

Notice UNION ALL takes over 5 minutes in the example!

Instead of using UNION ALL, I like using a temporary table. I dump the data into a temporary table and then select all from the table.

select * into #tek from testing2
insert into #tek select * from testing
select * from #tek
--drop table #tek

Notice this method took under 3 minutes to return 6 million rows, compared to UNION ALL, which took over 5 minutes! Since the data now lives in the temp table, which persists for the rest of the session, you can also do more with it if needed.

Avoid Sort

Do what you can to avoid seeing the Sort operator in the Execution Plan. Sorts are slow and can take up a lot of resources, resulting in spills that eat up TempDB! If you see a warning sign on an operator in your execution plan, hover over it to see what it says.

Don’t use an ORDER BY clause in your query if you don’t have to. Ideally, if you need to sort by a specific column often, you can add a non-clustered index for that column to help avoid Sort operators in the plan.
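
As a sketch of the indexing idea, suppose you often sort dbo.testing by a column called created_at and return id alongside it. Both column names and the index name are hypothetical, chosen only for illustration.

```sql
-- Assumption: created_at and id are hypothetical columns of dbo.testing.
-- The index keeps created_at in sorted order; INCLUDE (id) lets it cover the query.
CREATE NONCLUSTERED INDEX ix_testing_created_at
    ON dbo.testing (created_at)
    INCLUDE (id);

-- With the index in place, the optimizer can read the rows in index order
-- rather than adding a Sort operator to the plan.
SELECT id, created_at
FROM dbo.testing
ORDER BY created_at;
```

Check the execution plan afterwards: whether the Sort actually disappears depends on the query and on the plan the optimizer chooses.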

Spot Lazy Table Spools

I’ve seen Table Spools cause queries to take hours when they should be taking minutes, or even seconds. The Table Spool operator essentially creates a temporary table in memory or on disk that stores the results of sub-queries that might be used multiple times in the execution plan. A lazy spool fills that temporary table on demand, meaning it only reads a row when that row is requested. There are 5 or so different types of Spool operators, but all have a similar purpose.

Table Spools are tricky because they can sometimes show a low Cost %, but be a huge bottleneck in the execution plan. Hover over the operator to see how the estimated rows compare to the actual rows and collect additional info about the properties!

To avoid Table Spools, try using an index that includes all fields in the query. If that is not possible, you can try Query Hints that force the order of the joins or specify the join operation. For example, try using INNER HASH JOIN instead of INNER JOIN. Forcing Hash joins can boost speed significantly, but they can cause a lot of spillover into TempDB, so be very careful using query hints!
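
Here is a sketch of what those hints look like in T-SQL. It joins the two example tables on a hypothetical id column, which the article does not actually define.

```sql
-- Assumption: both tables have a hypothetical id column to join on.

-- Join hint: forces a hash join for this specific join.
SELECT t1.id
FROM dbo.testing AS t1
INNER HASH JOIN dbo.testing2 AS t2
    ON t2.id = t1.id;

-- Query hints: apply to the whole statement.
-- HASH JOIN restricts every join to hash joins; FORCE ORDER keeps the written join order.
SELECT t1.id
FROM dbo.testing AS t1
INNER JOIN dbo.testing2 AS t2
    ON t2.id = t1.id
OPTION (HASH JOIN, FORCE ORDER);
```

Note that a join hint such as INNER HASH JOIN also makes SQL Server keep the join order as written, so it constrains the optimizer more than it might first appear. Compare the plans with and without the hint before keeping it.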

Final Thoughts

Understanding execution plans and optimizing SQL queries can be tedious and take a while to learn. I’ve been using SQL for years and still learn new techniques all the time! As long as you remember the following, you’re on your way to becoming a SQL Query tuning wizard:

Use table indexes
SET STATISTICS IO ON
Check the Execution Plan

Check out my other articles on SQL, Programming and Data Science if you enjoyed this article!

Original article: https://medium.com/swlh/become-a-sql-wizard-using-these-query-optimization-tips-a932d18c762f