Insights from Data with BigQuery: Challenge Lab Tutorial

This Medium article is a detailed walkthrough of the steps I took to solve the challenge lab for the Insights from Data with BigQuery skill badge on Google Cloud Platform (Qwiklabs). I got access to this lab through the Google Cloud Ready Facilitator Program. Thanks to Google!

So far, I have completed over 100 labs and 23 quests on Qwiklabs. My profile is referenced below.

This lab is recommended only for students who have completed the labs in the Insights from Data with BigQuery Quest. You also need working knowledge of SQL and BigQuery to solve it. Are you up for the challenge? Let’s go!

Dataset Used

This challenge lab uses the table bigquery-public-data.covid19_open_data.covid19_open_data. It contains COVID-19 data for countries worldwide and, as the queries below show, for subregions such as US states. We will use it throughout this skill badge tutorial.

A BigQuery tutorial can be found at the reference below:

Challenge Scenario

This challenge lab has 10 small tasks, all of which must be completed to score 100/100: nine SQL queries and one Data Studio report. This tutorial lists the steps I took to solve all ten. The tasks are as follows:

  1. Building a SQL query that outputs the total number of confirmed cases.

  2. Building a SQL query that outputs the worst affected areas.

  3. Building a SQL query that identifies the hotspots in the USA.

  4. Building a SQL query that outputs the fatality ratio.

  5. Building a SQL query that identifies a specific day according to the constraints.

  6. Building a SQL query that outputs the number of days with zero net new cases.

  7. Building a SQL query that outputs the doubling rate.

  8. Building a SQL query that outputs the recovery rate.

  9. Building a SQL query that outputs the CDGR (Cumulative Daily Growth Rate).

  10. Creating a Data Studio report.

Important Note

Before starting this lab, make sure you do only what the lab requires. Allocating extra resources or doing anything the lab does not ask for may get your account blocked by the Qwiklabs admins. Don’t worry if that happens. I ran into this problem myself, and the account can be unblocked quickly by contacting Qwiklabs support.

Loading the Dataset

  1. In the Cloud Console, once you are fully logged in, go to Menu > BigQuery.

  2. Click + Add Data, then click Explore Public Datasets in the left pane.

  3. Search for covid19_open_data and select “Covid-19 Open Data”. Click View Dataset to explore it further.

  4. Use the filter to locate the table covid19_open_data under the covid19_open_data dataset.

Detailed Tutorial of Task 1

Task 1 requires a query that outputs the total count of confirmed cases on Apr 15, 2020. The output should contain a single row holding the sum of confirmed cases across all countries in the dataset, in a column named total_cases_worldwide.

Copy the query below into the query editor and click RUN.

SELECT SUM(cumulative_confirmed) AS total_cases_worldwide
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE date = "2020-04-15"

Detailed Tutorial of Task 2

Task 2 requires a query that answers: “How many states in the US had more than 100 deaths on Apr 10, 2020?” The output field should be named count_of_states.

Hint: Do not include NULL values. (Important)

Copy the query below into the query editor and click RUN.

SELECT COUNT(*) AS count_of_states
FROM (
  SELECT subregion1_name AS state, SUM(cumulative_deceased) AS death_count
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "United States of America"
    AND date = '2020-04-10'
    AND subregion1_name IS NOT NULL
  GROUP BY subregion1_name
)
WHERE death_count > 100
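
The nested query first totals deaths per state and only then counts the states above the threshold. As a rough illustration of that two-step logic, here is a small Python sketch. The rows and the helper name count_states_over are invented for the example and are not taken from the dataset.

```python
from collections import defaultdict

# Invented (state, cumulative_deceased) rows for a single date.
# None stands in for a NULL subregion1_name.
rows = [
    ("State A", 80), ("State A", 40),
    ("State B", 90),
    ("State C", 150),
    (None, 5000),
]

def count_states_over(rows, threshold):
    """Sum deaths per state, skip NULL states, count totals above threshold."""
    deaths_by_state = defaultdict(int)
    for state, deaths in rows:
        if state is not None:                 # subregion1_name IS NOT NULL
            deaths_by_state[state] += deaths  # SUM(...) GROUP BY subregion1_name
    return sum(1 for total in deaths_by_state.values() if total > threshold)

print(count_states_over(rows, 100))  # 2
```

State A (120) and State C (150) pass the filter, State B (90) does not, and the NULL row is ignored no matter how large it is.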

Detailed Tutorial of Task 3

Task 3 requires a query that answers: “List all the states in the United States of America that had more than 1000 confirmed cases on Apr 10, 2020.” The output should have two columns, state and total_confirmed_cases, holding the state name and its confirmed cases, sorted by confirmed cases in descending order.

Copy the query below into the query editor and click RUN.

SELECT subregion1_name AS state, SUM(cumulative_confirmed) AS total_confirmed_cases
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = "United States of America"
  AND date = "2020-04-10"
GROUP BY subregion1_name
HAVING total_confirmed_cases > 1000
ORDER BY total_confirmed_cases DESC

Detailed Tutorial of Task 4

Task 4 requires a query that answers the following question: “What was the case-fatality ratio in Italy for the month of April 2020?”

Case-fatality ratio is defined as (total deaths / total confirmed cases) * 100. The output should have three columns named total_confirmed_cases, total_deaths and case_fatality_ratio.

Copy the query below into the query editor and click RUN.

SELECT
  SUM(cumulative_confirmed) AS total_confirmed_cases,
  SUM(cumulative_deceased) AS total_deaths,
  (SUM(cumulative_deceased) / SUM(cumulative_confirmed)) * 100 AS case_fatality_ratio
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = "Italy"
  AND date BETWEEN "2020-04-01" AND "2020-04-30"
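
To make the ratio concrete, here is a minimal Python sketch of the same formula. The function name and the totals passed to it are made up for illustration; they are not results from the dataset.

```python
def case_fatality_ratio(total_deaths, total_confirmed_cases):
    """(total deaths / total confirmed cases) * 100, as defined in the task."""
    if total_confirmed_cases == 0:
        raise ValueError("total_confirmed_cases must be non-zero")
    return total_deaths / total_confirmed_cases * 100

# Invented totals: 250 deaths out of 2,000 confirmed cases.
print(case_fatality_ratio(total_deaths=250, total_confirmed_cases=2000))  # 12.5
```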

Detailed Tutorial of Task 5

Task 5 requires a query that answers the following question: “On what day did the total number of deaths cross 10000 in Italy?”

The query should output the date in a column named “date”, in the format “yyyy-mm-dd”.

Copy the query below into the query editor and click RUN.

SELECT date
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE country_name = 'Italy'
  AND cumulative_deceased > 10000
ORDER BY date
LIMIT 1
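
The query keeps only the rows above the threshold, sorts them by date, and takes the first one. A small Python sketch of that idea follows. The rows and the helper name first_day_above are invented for the example and do not reflect real dataset values.

```python
from datetime import date

# Invented (date, cumulative_deceased) rows, deliberately out of order.
rows = [
    (date(2020, 1, 4), 10_400),
    (date(2020, 1, 2), 9_800),
    (date(2020, 1, 3), 10_050),
    (date(2020, 1, 1), 9_000),
]

def first_day_above(rows, threshold):
    """Earliest date whose cumulative count exceeds threshold, or None."""
    matching = [day for day, deaths in rows if deaths > threshold]  # WHERE
    return min(matching) if matching else None                      # ORDER BY date LIMIT 1

print(first_day_above(rows, 10_000).isoformat())  # 2020-01-03
```

Calling isoformat() on the result gives the yyyy-mm-dd format the task asks for.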

Detailed Tutorial of Task 6

Task 6 provides a query that must be updated to output the correct number of days in India between 21 Feb 2020 and 15 March 2020 on which the number of confirmed cases did not increase.

Copy the query below into the query editor and click RUN.

WITH india_cases_by_date AS (
  SELECT date, SUM(cumulative_confirmed) AS cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "India"
    AND date BETWEEN '2020-02-21' AND '2020-03-15'
  GROUP BY date
  ORDER BY date ASC
),
india_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases
  FROM india_cases_by_date
)
SELECT COUNT(date)
FROM india_previous_day_comparison
WHERE net_new_cases = 0
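
The key step is LAG, which places the previous day's total next to the current one so the two can be subtracted. Here is a rough Python sketch of that comparison, using an invented list of cumulative counts rather than real data.

```python
# Invented cumulative confirmed cases, one value per consecutive day.
cases = [3, 3, 3, 5, 5, 9]

# Pair each day with the one before it, like LAG(cases) OVER (ORDER BY date).
# The first day has no previous value, so it never appears in the pairs
# (in SQL its net_new_cases is NULL and fails the "= 0" filter).
net_new_cases = [today - previous for previous, today in zip(cases, cases[1:])]

zero_increase_days = sum(1 for delta in net_new_cases if delta == 0)
print(net_new_cases)       # [0, 0, 2, 0, 4]
print(zero_increase_days)  # 3
```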

Detailed Tutorial of Task 7

Using the query from Task 6 as a template, build a query that finds the dates on which confirmed cases in the US increased by more than 10% compared to the previous day, between March 22, 2020 and April 20, 2020.

The output should have four columns named Date, Confirmed_Cases_On_Day, Confirmed_Cases_Previous_Day and Percentage_Increase_In_Cases.

Copy the query below into the query editor and click RUN.

WITH us_cases_by_date AS (
  SELECT date, SUM(cumulative_confirmed) AS cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "United States of America"
    AND date BETWEEN '2020-03-22' AND '2020-04-20'
  GROUP BY date
  ORDER BY date ASC
),
us_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases,
    (cases - LAG(cases) OVER (ORDER BY date)) * 100 / LAG(cases) OVER (ORDER BY date) AS percentage_increase
  FROM us_cases_by_date
)
SELECT
  Date,
  cases AS Confirmed_Cases_On_Day,
  previous_day AS Confirmed_Cases_Previous_Day,
  percentage_increase AS Percentage_Increase_In_Cases
FROM us_previous_day_comparison
WHERE percentage_increase > 10
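
The percentage_increase column is the day-over-day difference divided by the previous day's total. A minimal Python sketch of that arithmetic, with invented counts, looks like this:

```python
# Invented cumulative confirmed cases per consecutive day (not real data).
cases = [1000, 1200, 1260, 1575]

rows = []
for previous_day, on_day in zip(cases, cases[1:]):
    # Same arithmetic as the percentage_increase column in the query.
    percentage_increase = (on_day - previous_day) * 100 / previous_day
    if percentage_increase > 10:  # WHERE percentage_increase > 10
        rows.append((on_day, previous_day, percentage_increase))

print(rows)  # [(1200, 1000, 20.0), (1575, 1260, 25.0)]
```

The step from 1200 to 1260 is only a 5% increase, so it is filtered out.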

Detailed Tutorial of Task 8

Task 8 requires a query that lists the recovery rates of countries on May 10, 2020, restricted to countries with more than 50K confirmed cases, sorted by recovery rate in descending order and limited to 10 rows. To score full marks, the output columns should be named country, recovered_cases, confirmed_cases and recovery_rate.

Copy the query below into the query editor and click RUN.

WITH cases_by_country AS (
  SELECT
    country_name AS country,
    SUM(cumulative_confirmed) AS cases,
    SUM(cumulative_recovered) AS recovered_cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE date = "2020-05-10"
  GROUP BY country_name
),
recovered_rate AS (
  SELECT country, cases, recovered_cases, (recovered_cases * 100) / cases AS recovery_rate
  FROM cases_by_country
)
SELECT country, cases AS confirmed_cases, recovered_cases, recovery_rate
FROM recovered_rate
WHERE cases > 50000
ORDER BY recovery_rate DESC
LIMIT 10
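
The query boils down to four steps: compute a rate per country, drop the small countries, sort, and cut off at ten. Here is a short Python sketch of those steps. The country names, totals, and the helper name top_recovery_rates are all placeholders invented for the example.

```python
# Invented per-country totals: (country, confirmed_cases, recovered_cases).
totals = [
    ("Country A", 80_000, 60_000),
    ("Country B", 40_000, 39_000),   # dropped: not above 50,000 confirmed cases
    ("Country C", 200_000, 50_000),
    ("Country D", 60_000, 54_000),
]

def top_recovery_rates(totals, min_cases=50_000, limit=10):
    """Return (country, recovered, confirmed, rate) rows, highest rate first."""
    rated = [
        (country, recovered, confirmed, recovered * 100 / confirmed)
        for country, confirmed, recovered in totals
        if confirmed > min_cases                                      # WHERE cases > 50000
    ]
    return sorted(rated, key=lambda row: row[3], reverse=True)[:limit]  # ORDER BY ... LIMIT

for row in top_recovery_rates(totals):
    print(row)
```

With these placeholder numbers, Country D (90%) comes first, then Country A (75%), then Country C (25%).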

Detailed Tutorial of Task 9

Task 9 requires a query that outputs the correct CDGR in the correct format. The CDGR, or Cumulative Daily Growth Rate, is calculated as:

((last_day_cases / first_day_cases)^(1 / days_diff)) - 1

where last_day_cases, first_day_cases and days_diff are defined as:

  • last_day_cases corresponds to the number of confirmed cases on May 10, 2020

  • first_day_cases corresponds to the number of confirmed cases on Feb 02, 2020

  • days_diff corresponds to the number of days between Feb 02 and May 10, 2020

Copy the query below into the query editor and click RUN.

WITH france_cases AS (
  SELECT date, SUM(cumulative_confirmed) AS total_cases
  FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE country_name = "France"
    AND date IN ('2020-01-24', '2020-05-10')
  GROUP BY date
  ORDER BY date
),
summary AS (
  SELECT
    total_cases AS first_day_cases,
    LEAD(total_cases) OVER (ORDER BY date) AS last_day_cases,
    DATE_DIFF(LEAD(date) OVER (ORDER BY date), date, DAY) AS days_diff
  FROM france_cases
  LIMIT 1
)
SELECT
  first_day_cases,
  last_day_cases,
  days_diff,
  POWER(last_day_cases / first_day_cases, 1 / days_diff) - 1 AS cdgr
FROM summary
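
As a sanity check on the formula, here is a minimal Python sketch of the same calculation. The function name cdgr and the case counts are invented for illustration. The days_diff value uses the Feb 02 and May 10 dates from the task description; the query above filters on 2020-01-24 instead, so its days_diff will differ.

```python
from datetime import date

def cdgr(first_day_cases, last_day_cases, days_diff):
    """((last_day_cases / first_day_cases) ^ (1 / days_diff)) - 1."""
    return (last_day_cases / first_day_cases) ** (1 / days_diff) - 1

# Days between the two dates named in the task description.
days_diff = (date(2020, 5, 10) - date(2020, 2, 2)).days
print(days_diff)  # 98

# Invented counts: doubling every day for 8 days (100 -> 25,600)
# should give a daily growth rate of about 1.0, i.e. 100% per day.
print(round(cdgr(100, 25_600, 8), 6))  # 1.0
```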

Detailed Tutorial of Task 10

Creating the Data Studio report takes several steps.

1. First, copy the query below into the query editor and click RUN.

SELECT
  date,
  SUM(cumulative_confirmed) AS country_cases,
  SUM(cumulative_deceased) AS country_deaths
FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE date BETWEEN '2020-03-15' AND '2020-04-30'
  AND country_name = 'United States of America'
GROUP BY date

2. Click EXPLORE DATA > Explore with Data Studio.

3. Give access to Data Studio and authorize it to control BigQuery.

If creating a report fails the very first time you log in to Data Studio, click the + Blank Report option and accept the Terms of Service. Then go back to the BigQuery page and click Explore with Data Studio again.

4. Create a new time series chart in the new Data Studio report by selecting Add a chart > Time series chart.

5. Add country_cases and country_deaths to the Metric field.

6. Click Save to commit the change.

Congratulations!!

This is the skill badge I got after completing this challenge lab :P

Google Cloud Skill Badge (Image by author)

With this, we have come to the end of this challenge lab. Thanks for reading and following along. I hope you loved it!

My Portfolio and LinkedIn :)

Original article: https://medium.com/swlh/insights-from-data-with-bigquery-challenge-lab-tutorial-f868992ef9dc
