Programmer SQL Interview
Today, the word of the moment is DATA. This little combination of four letters is transforming how companies and their employees work, yet most people don't really know how data behaves or how to access it, and many assume it is only for the tech person on the IT team or for someone who knows how to code.
So this article explains a part of this complex world in an easy way, and we shall begin at the beginning: the data.
Basically, we can segment data into two groups: structured and unstructured.
Unstructured data is far more abundant than structured data, but structured data is what most people use day to day, for reasons such as:
- It can be displayed in rows and columns
- It requires less storage
- It is easier to manage and manipulate
I'm not saying that the unstructured 80% is unimportant, quite the opposite, but it is a bit more complex to deal with, so it is not the focus of this text.
With that explained, our objective is to learn a little about how to access and manipulate structured data.
Structured Data Concepts
Let's use an example with three tables:
Structured data is organized in tables. In this example we have three: Persons, Dept_Members, and Department.
Each table is organized in columns and rows.
Each column has a data type. The names and the number of available data types vary depending on the database you are using, but broadly we have strings, numbers, dates, and timestamps.
- Strings: any text.
- Numbers: any numeric value.
- Dates: calendar dates only, without hours, minutes, and seconds.
- Timestamps: dates with hours, minutes, and seconds.
As I said, some databases use different names or have more specific types, for example:
In Oracle, a string column can be declared as VARCHAR2 or CHAR. The difference between them is how they store the text: CHAR is fixed-length and pads shorter values with blanks (a bare CHAR with no length defaults to a single character), while VARCHAR2 is variable-length and stores values up to its declared maximum length. In Google BigQuery, by contrast, there is just the STRING data type for all text data.
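As a rough sketch of how these types show up when a table is defined, here is a hypothetical ORDERS table. The table and column names are invented for illustration, and the exact type names vary by database (Oracle, for instance, would use VARCHAR2 and NUMBER, while BigQuery would use STRING and INT64).

```sql
-- Hypothetical table: names and sizes are illustrative only.
CREATE TABLE ORDERS (
    ORDER_CD    VARCHAR(10),   -- string: any text, up to 10 characters
    QUANTITY    INTEGER,       -- number
    ORDER_DT    DATE,          -- date only, no time of day
    CREATED_AT  TIMESTAMP      -- date plus hours, minutes, and seconds
);
```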
Now that we have covered columns, what is left is rows. Each row is a record of the table, and one very important question is: "How can we differentiate one record from another? What separates them?"
The answer is the primary key.
The primary key is the column, or combination of columns, whose values uniquely identify each record in a table. Some tables have a dedicated identifier column (such as a sequential ID) that serves as the primary key too, but it does not show you which attributes make a record unique.
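To make this concrete, here is a minimal sketch of how a composite primary key could be declared for the Dept_Members table. The column names PERSON_ID and DEPT_ID are assumptions made for illustration, not the table's documented columns.

```sql
-- Hypothetical columns: one row per (person, department) pair.
CREATE TABLE DEPT_MEMBERS (
    PERSON_ID  INTEGER NOT NULL,
    DEPT_ID    INTEGER NOT NULL,
    -- The combination of both columns must be unique in the table.
    PRIMARY KEY (PERSON_ID, DEPT_ID)
);
```

With a definition like this, the database rejects a second row with the same PERSON_ID and DEPT_ID, while the same person can still appear in several departments.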
Some databases describe the primary key in the table documentation, but if you don't have this information, don't be afraid: explore your dataset!
And here comes the main goal of the article: how do I explore it?
Structured Query Language (SQL)
Structured Query Language is the standard declarative search language for relational databases.
The text above is the dictionary definition of SQL, but we can translate it as: the language that lets us get data from one table, or from a combination of tables, by describing how their data is related.
Boiling it down as far as possible, a standard SQL query has "only" (in huge quotes) three elements:
SELECT: where you define which columns you want to pick from your tables.
FROM: where you define which tables you are going to use and how they relate.
WHERE: where you define the conditions that decide which rows you do and do not want to see.
This is what a query looks like.
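A minimal sketch of such a query, assuming a table TABLE_1 inside a schema SCHEMA_1 with columns A, B, C, and D, where column A holds text codes:

```sql
SELECT              -- which columns to return
    A,
    B,
    C,
    D
FROM                -- which table to read, qualified by its schema
    SCHEMA_1.TABLE_1
WHERE               -- which rows to keep
    A = '0001';
```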
Now, what can we understand here?
We want to get data from columns A, B, C, and D of TABLE_1, which lives in SCHEMA_1 (a schema is like a folder of tables), and we want only the rows with code '0001' in column A.
That was easy, wasn't it? Let's look at a slightly more complex example.
In column C we have a number (it could be a sales quantity, stock projection, purchase order quantity, etc.) and we want to sum its values by column A (which could hold store or product codes) and column B (maybe a date).
We also want to order the result first by column A and then by column B.
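One way this aggregation could be written, reusing the same assumed table SCHEMA_1.TABLE_1 (the alias TOTAL_C is a hypothetical name chosen for the summed column):

```sql
SELECT
    A,
    B,
    SUM(C) AS TOTAL_C   -- add up C within each (A, B) group
FROM
    SCHEMA_1.TABLE_1
GROUP BY                -- one output row per distinct (A, B) pair
    A,
    B
ORDER BY                -- sort by A first, then by B
    A,
    B;
```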
When we want to aggregate a value based on other attributes, we have to say: "Look, I'm aggregating this guy here (C) this way (SUM) and by these two dudes (GROUP BY A and B)."
In the end, this isn't too different from the last one, right?
Relationship Between Tables
Until now, every example queried data from only one table at a time, but what do we do if we want to merge data from two or more tables?
The answer is simple: we say how they relate to each other by specifying which columns hold equivalent data.
Now our example is a sales table that has only center codes, product codes, and product stock quantities. I also want the product and center names, but both pieces of information live in other tables.
I also want to see only stock quantities higher than 100 units.
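A sketch of what such a query might look like. The names FT_SLS, DIM_PLNT, and PLNT_CD are the ones this article uses for the sales table, the center table, and the center code. Everything else (SCHEMA_1, DIM_PRDCT, PRDCT_CD, PLNT_NM, PRDCT_NM, STCK_QTY, SLS_DT, and the date range) is a hypothetical placeholder. The STRING type in the CAST follows BigQuery; many other databases would use VARCHAR there.

```sql
SELECT
    S.PLNT_CD,
    P.PLNT_NM,                          -- center name from DIM_PLNT
    S.PRDCT_CD,
    D.PRDCT_NM,                         -- product name from DIM_PRDCT
    SUM(S.STCK_QTY) AS STCK_QTY_TOTAL
FROM
    SCHEMA_1.FT_SLS S                   -- S, P, and D are table aliases
LEFT JOIN SCHEMA_1.DIM_PLNT P
    -- Assumes PLNT_CD is numeric in FT_SLS and text in DIM_PLNT.
    ON CAST(S.PLNT_CD AS STRING) = P.PLNT_CD
LEFT JOIN SCHEMA_1.DIM_PRDCT D
    ON S.PRDCT_CD = D.PRDCT_CD
WHERE
    S.SLS_DT BETWEEN DATE '2020-01-01' AND DATE '2020-01-31'
GROUP BY
    S.PLNT_CD,
    P.PLNT_NM,
    S.PRDCT_CD,
    D.PRDCT_NM
HAVING
    SUM(S.STCK_QTY) > 100;              -- keep only totals above 100 units
```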
Let's focus on the differences between the last example and this one. What is new here?
Tables can have nicknames (aliases). These are commonly used when a query involves more than one table.
With an alias you don't have to write the whole table location each time you reference it.
Aliases also matter because joined tables can share column names. We must state in the query whether we are using PLNT_CD from DIM_PLNT or from FT_SLS; otherwise the database doesn't know which table to take it from.
A join is the way you combine data from tables. Always think of two tables at a time: one is called the left table and the other the right table.
Left has a set of records called L, Right has another set called R, and some records exist in both.
Joins come in several types. The one shown in the example is the left join: it keeps every record of the left table and brings in values from the right table only where a matching record exists. Left records with no match are still returned, with the right table's columns empty (NULL).
When we compare columns to create a join between tables, we have to remember that, for the relationship to hold, the columns must be of the same data type.
In this example, we had to convert the PLNT_CD field of the sales table (FT_SLS) to STRING in the join; otherwise the join could not be made.
Inside the WHERE clause, we have a new construct called BETWEEN, which filters a range of data. It keeps values greater than or equal to the first parameter and less than or equal to the second, so both bounds are included.
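As a small illustration, the two filters below are equivalent. The table SCHEMA_1.TABLE_1 and the numeric column C are the same assumed placeholders as in the earlier examples, and the bounds 10 and 20 are arbitrary.

```sql
-- Written with BETWEEN: both bounds are included.
SELECT A, C
FROM SCHEMA_1.TABLE_1
WHERE C BETWEEN 10 AND 20;

-- The same filter spelled out with comparison operators.
SELECT A, C
FROM SCHEMA_1.TABLE_1
WHERE C >= 10 AND C <= 20;
```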
Lastly, when we apply SUM(), AVG(), or any other aggregate function in the query, we may want to filter on the aggregated results. HAVING lets us do that by filtering the grouped result of the query before it is shown.
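The difference between WHERE and HAVING can be sketched as follows, again on the assumed table SCHEMA_1.TABLE_1, with B taken to be a date column and the threshold of 50 chosen arbitrarily:

```sql
SELECT
    A,
    AVG(C) AS AVG_C
FROM
    SCHEMA_1.TABLE_1
WHERE
    B >= DATE '2020-01-01'   -- filters individual rows before grouping
GROUP BY
    A
HAVING
    AVG(C) > 50;             -- filters whole groups after aggregation
```

A condition on an aggregate such as AVG(C) cannot go in the WHERE clause, because WHERE is evaluated before the groups exist.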
We are getting close to the end of this article, but before we finish...
Tips
- If you don't know what you want to see in a table, you can just use * to return all of its columns.
SELECT
    *
FROM
    SCHEMA_1.TABLE_1
- Basically, we have two types of tables: dimension (DIM) and fact (FT).
Dimension tables store the attributes of an entity such as a store or a product: its name, address, shape, size, and so on. Think of them as holding up-to-date data, or today's value.
Fact tables store transactional data such as purchase orders or sales tickets, so they capture the data as it was at the moment of an event.
- Types of joins
There are several types of joins, and I could write another article on this theme alone, but I found this summary that explains a few of them.
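The four most common join types can be sketched like this, assuming two hypothetical tables TABLE_L and TABLE_R that share a key column K. Not every database supports all four (some lack FULL OUTER JOIN, for example).

```sql
-- INNER JOIN: only records whose key exists in both tables.
SELECT L.K AS L_K, R.K AS R_K
FROM TABLE_L L
INNER JOIN TABLE_R R ON L.K = R.K;

-- LEFT JOIN: every record of L; R columns are NULL where no match exists.
SELECT L.K AS L_K, R.K AS R_K
FROM TABLE_L L
LEFT JOIN TABLE_R R ON L.K = R.K;

-- RIGHT JOIN: every record of R; L columns are NULL where no match exists.
SELECT L.K AS L_K, R.K AS R_K
FROM TABLE_L L
RIGHT JOIN TABLE_R R ON L.K = R.K;

-- FULL OUTER JOIN: every record of both tables, matched where possible.
SELECT L.K AS L_K, R.K AS R_K
FROM TABLE_L L
FULL OUTER JOIN TABLE_R R ON L.K = R.K;
```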
- IN and NOT IN
Sometimes you need data not for a single product or location but for a list of them.
In these cases, you can use the IN or NOT IN operator in the WHERE clause to pass a list of desired values, instead of an endless repetition of ANDs and ORs testing one value at a time.
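For instance, on the assumed table SCHEMA_1.TABLE_1 with text codes in column A (the three codes listed are arbitrary examples):

```sql
-- Rows for three specific codes.
SELECT A, B, C
FROM SCHEMA_1.TABLE_1
WHERE A IN ('0001', '0002', '0003');

-- The same filter written as a chain of ORs.
SELECT A, B, C
FROM SCHEMA_1.TABLE_1
WHERE A = '0001' OR A = '0002' OR A = '0003';

-- Every row whose code is not one of those three.
SELECT A, B, C
FROM SCHEMA_1.TABLE_1
WHERE A NOT IN ('0001', '0002', '0003');
```

Rows where A is NULL are returned by none of these queries, since a comparison with NULL is never true.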
And this is it!
Well, with this I think you can now use SQL to access your data with a little more ease!
I hope this article has helped you!
Translated from: https://medium.com/swlh/sql-use-guide-for-non-programmers-5997af000c5f