Learning SQL: problems first, tools to follow

We don't pick up a hammer and then look for nails; that would be an unusual way of solving problems. The usual way is to identify the problem first, then look for the appropriate tools.

Over and over again I've seen people learn SQL by picking a SQL statement and then learning how to use it. In my opinion, this tool-based mindset is an inefficient way to learn, and flipping it can make a huge difference. Problems first, tools to follow!

If you are into data science, you know the capabilities of pandas and tidyverse for filtering, sorting, grouping, merging: all sorts of data handling operations. With SQL you will do similar things, but in a database environment and in a different language.

The purpose of this article is to demonstrate how to solve data handling problems in SQL using an approach similar to the one data scientists typically follow in a programming environment. You will not learn everything under the sun about SQL; rather, the objective is to show how to learn it.

SQL editor for your practice

If you have a relational database management system installed on your computer, fire it up. If not, w3schools has an online SQL editor that you can use right away in your browser.

You'll also notice quite a few datasets on the right side of the screen that you can use to practice along.

Now let's get into how to solve actual data handling problems using SQL.

Understanding data

Just as you would with your favorite programming library such as pandas, the first thing you need to do is load the dataset into the SQL environment.

And as with basic exploratory data analysis (EDA) in a typical data science project, you can check out the first few rows, count the total number of rows, see column names, data types, etc. Below are a few commands.

-- preview the data: * selects every column; list column names to select only some
SELECT *
FROM table_name
LIMIT 10; -- show 10 rows (SQL Server uses SELECT TOP 10 instead of LIMIT)

-- copy the data into a separate table
-- (SELECT ... INTO works in SQL Server and PostgreSQL; MySQL and SQLite use CREATE TABLE ... AS SELECT)
SELECT *
INTO new_table_name
FROM table_name;

-- count the number of rows in the dataset
SELECT COUNT(*)
FROM table_name;

-- count the unique values of a single column
SELECT COUNT(DISTINCT column_name)
FROM table_name;
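
If you want to try these exploration queries without a database server, one option is Python's built-in sqlite3 module. The sketch below is only an illustration: the Products table and its four rows are made-up sample data, and PRAGMA table_info is SQLite-specific, so other systems expose column names and types differently.

```python
import sqlite3

# Hypothetical sample data; table name, columns and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, CategoryID INTEGER, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [("Chai", 1, 18.0), ("Chang", 1, 19.0), ("Syrup", 2, 10.0), ("Tofu", 7, 23.25)],
)

# First few rows of the table
first_rows = conn.execute("SELECT * FROM Products LIMIT 2").fetchall()

# Total number of rows
row_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]

# Number of distinct values in a single column
category_count = conn.execute(
    "SELECT COUNT(DISTINCT CategoryID) FROM Products"
).fetchone()[0]

# Column names and declared types (SQLite-specific PRAGMA)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Products)")]

print(first_rows)
print(row_count, category_count)
print(columns)
```

This covers the same checks as the first steps of EDA in pandas: a quick look at the rows, the row count, the unique values, and the column names with their types.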

Working with columns

Databases are often quite large, and queries can take a long time to run. So if you know which specific columns you are interested in, you can work with a subset of the data by selecting only those columns.

You might also want to perform column operations such as renaming columns, creating new columns, etc.

-- select two columns from a multi-column dataset
SELECT column1, column2
FROM tableName;

-- rename a column
SELECT ProductName AS name
FROM productTable;

-- new conditional column (similar to an if statement)
SELECT ProductName, Price,
       CASE
           WHEN Price > 20 AND Price <= 40 THEN 'medium'
           WHEN Price > 40 THEN 'high'
           ELSE 'low'
       END AS newColName
FROM Products;
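
To see what the conditional column produces, here is a small sketch that runs a CASE expression through Python's sqlite3 module. The product names and prices are invented for the example, and price_band is a hypothetical column name. The boundary at 40 is written as Price <= 40 so that a non-integer price such as 40.5 lands in 'high'.

```python
import sqlite3

# Hypothetical sample data; names and prices are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Chai", 18.0), ("Tofu", 23.25), ("Ikura", 31.0),
     ("Crab Meat", 40.5), ("Caviar", 97.0)],
)

# Rename a column with AS and derive a new column with CASE.
query = """
SELECT ProductName AS name,
       CASE
           WHEN Price > 20 AND Price <= 40 THEN 'medium'
           WHEN Price > 40 THEN 'high'
           ELSE 'low'
       END AS price_band
FROM Products
ORDER BY Price
"""
bands = conn.execute(query).fetchall()

for name, band in bands:
    print(name, band)
```

CASE evaluates its WHEN branches top to bottom and returns the first match, which is why the order of the conditions matters when ranges overlap.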

Filtering rows

Filtering rows is probably the most important task you will do frequently with SQL. From a large dataset you'll often filter rows based on product type, range of values, etc.

If you are learning SQL, you should devote a substantial amount of time to learning the many different ways to filter data and the SQL statements you'll need.

-- select all records that start with the letter "S"
SELECT * FROM Products
WHERE ProductName LIKE 'S%';

-- select all records that end with "S"
SELECT * FROM Products
WHERE ProductName LIKE '%S';

-- select all records that do NOT start with "S" (bracket patterns such as '[^S]%' are SQL Server specific)
SELECT * FROM Products
WHERE ProductName NOT LIKE 'S%';

-- filter rows on specific values; AND binds tighter than OR, so the parentheses only make the default grouping explicit
SELECT * FROM table_name
WHERE firstName = 'Pilu'
   OR (lastName <> 'Milu'
       AND income <= 100
       AND city IN ('Arlington', 'Burlington', 'Fairfax'));

-- filter rows within a range of numbers (BETWEEN includes both endpoints)
SELECT *
FROM tableName
WHERE income BETWEEN 100 AND 200;

-- filter null values (the opposite is IS NOT NULL)
SELECT * FROM tableName
WHERE columnName IS NULL;
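
The filters above can be checked against a tiny table. The following is a minimal sketch using Python's sqlite3 module with invented rows; the names() helper is hypothetical and exists only to keep the example short. One assumption to keep in mind: SQLite's LIKE is case-insensitive for ASCII letters by default, which is not true of every database.

```python
import sqlite3

# Hypothetical sample data; one row has a missing (NULL) price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Syrup", 10.0), ("Chais", 18.0), ("Tofu", None), ("Scones", 150.0)],
)

def names(where_clause):
    """Return product names matching a fixed, trusted WHERE clause, sorted by name."""
    sql = f"SELECT ProductName FROM Products WHERE {where_clause} ORDER BY ProductName"
    return [row[0] for row in conn.execute(sql)]

starts_with_s = names("ProductName LIKE 'S%'")
ends_with_s = names("ProductName LIKE '%s'")
not_starting_with_s = names("ProductName NOT LIKE 'S%'")
in_range = names("Price BETWEEN 10 AND 18")   # both endpoints included
missing_price = names("Price IS NULL")

# AND is evaluated before OR unless parentheses say otherwise.
default_grouping = names("ProductName = 'Tofu' OR Price > 100 AND ProductName LIKE 'S%'")
forced_grouping = names("(ProductName = 'Tofu' OR Price > 100) AND ProductName LIKE 'S%'")

print(starts_with_s, ends_with_s, not_starting_with_s)
print(in_range, missing_price)
print(default_grouping, forced_grouping)
```

The last two queries differ only in their parentheses yet return different rows, which is a good reason to parenthesize any WHERE clause that mixes AND and OR.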

Joining datasets

You are using SQL in a relational database management system (RDBMS), which means you will often be working with multiple tables at a time, so they need to be joined before you are able to do advanced modeling.

There are basically four ways to join data (left, right, inner, and full outer joins), and you may need to google a little to see how each works, but I'm giving all the code below to perform these joins.

-- inner join (matching records only)
SELECT * FROM
table1 INNER JOIN table2
ON table1.ID = table2.ID;

-- full outer join (all left + all right)
SELECT * FROM
table1 FULL OUTER JOIN table2
ON table1.ID = table2.ID;

-- left join (all records from left + matching records from right)
SELECT * FROM
table1 LEFT JOIN table2
ON table1.ID = table2.ID;

-- right join (matching records from left + all records from right)
SELECT * FROM
table1 RIGHT JOIN table2
ON table1.ID = table2.ID;
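
A small worked example may make the difference between the join types easier to see. The sketch below uses Python's sqlite3 module with two made-up tables, Customers and Orders (hypothetical names and rows). Because older SQLite builds do not support RIGHT or FULL OUTER JOIN, the right join is shown here by swapping the table order in a LEFT JOIN, which returns the same rows.

```python
import sqlite3

# Hypothetical sample data: customer 2 and 3 have no orders,
# and order 12 points at a customer (ID 4) that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 4, 99.0);
""")

# Inner join: only rows with a match on both sides.
inner_rows = conn.execute("""
    SELECT c.Name, o.Amount
    FROM Customers AS c
    INNER JOIN Orders AS o ON c.ID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()

# Left join: every customer, with NULL (None) where no order matches.
left_rows = conn.execute("""
    SELECT c.Name, o.Amount
    FROM Customers AS c
    LEFT JOIN Orders AS o ON c.ID = o.CustomerID
    ORDER BY c.ID, o.OrderID
""").fetchall()

# Right-join equivalent: every order, with NULL where no customer matches.
right_equivalent_rows = conn.execute("""
    SELECT c.Name, o.Amount
    FROM Orders AS o
    LEFT JOIN Customers AS c ON c.ID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()

print(inner_rows)
print(left_rows)
print(right_equivalent_rows)
```

In pandas terms, these correspond roughly to merge with how="inner", how="left" and how="right".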

Doing calculations

Creating summary statistics, performing mathematical operations, and building models are what you do every day as a data scientist. SQL is not the right tool for much of that. However, if you need quick summary statistics, you can use aggregate functions to calculate column means, totals, min/max values, etc.

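-- new calculated column
SELECT Price,
       (Price * 2) AS NewCol
FROM Products;

-- aggregation by group
SELECT CategoryID, SUM(Price) AS total_price
FROM Products
GROUP BY CategoryID;

-- min/max values of a column
-- (a plain column such as ProductName cannot be selected next to MIN/MAX without GROUP BY in most databases)
SELECT MIN(Price) AS min_price, MAX(Price) AS max_price
FROM Products;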
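
As a quick check of the aggregate functions, here is a sketch run through Python's sqlite3 module on invented data (the table contents and the column aliases are assumptions for the example). It also shows one common way to find which product has the lowest price: sort by Price and keep the first row.

```python
import sqlite3

# Hypothetical sample data; values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, CategoryID INTEGER, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [("Chai", 1, 18.0), ("Chang", 1, 19.0), ("Syrup", 2, 10.0), ("Tofu", 7, 23.25)],
)

# Total and mean price per category
per_category = conn.execute("""
    SELECT CategoryID, SUM(Price) AS total_price, AVG(Price) AS mean_price
    FROM Products
    GROUP BY CategoryID
    ORDER BY CategoryID
""").fetchall()

# Minimum and maximum of a column across the whole table
price_range = conn.execute(
    "SELECT MIN(Price) AS min_price, MAX(Price) AS max_price FROM Products"
).fetchone()

# Name of the cheapest product: sort and take the first row
cheapest = conn.execute(
    "SELECT ProductName, Price FROM Products ORDER BY Price LIMIT 1"
).fetchone()

print(per_category)
print(price_range)
print(cheapest)
```

The GROUP BY query plays the same role as groupby followed by agg in pandas.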

Final notes

The purpose of this article was to introduce some basic SQL concepts and statements for querying data from a relational database management system. But the primary objective was to show how to learn SQL as a data scientist, with a mindset of solving a particular problem rather than focusing on SQL statements.