[Data Warehouse] III. Offline Data Warehouse (Hive-Based Warehouse System)

Chapter 1 Data Warehouse Layering

1.1 Why Layer the Warehouse

DIM: dimension

1.2 Data Mart vs. Data Warehouse Concepts

1.3 Warehouse Naming Conventions

1.3.1 Table Naming

  • ODS-layer tables are named ods_<table>
  • DIM-layer tables are named dim_<table>
  • DWD-layer tables are named dwd_<table>
  • DWS-layer tables are named dws_<table>
  • DWT-layer tables are named dwt_<table>
  • ADS-layer tables are named ads_<table>
  • Temporary tables are named tmp_<table>

1.3.2 Script Naming

  • <source>_to_<target>_db/log.sh
  • User-behavior scripts take the log suffix; business-data scripts take the db suffix. For instance, the scripts used later in this project, hdfs_to_ods_log.sh and hdfs_to_ods_db.sh, follow this pattern.

1.3.3 Table Field Types

  • Quantities use bigint
  • Amounts use decimal(16, 2), i.e. 16 digits of precision in total, 2 of which are after the decimal point, as in the sketch below
  • Strings (names, descriptions, etc.) use string
  • Primary and foreign keys use string
  • Timestamps use bigint
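A minimal DDL sketch that follows these conventions (the table and column names here are purely illustrative, not part of the project):

CREATE TABLE tmp_type_demo (
    `id`           STRING        COMMENT 'primary key kept as string',
    `user_id`      STRING        COMMENT 'foreign key kept as string',
    `sku_num`      BIGINT        COMMENT 'quantity as bigint',
    `final_amount` DECIMAL(16,2) COMMENT 'amount: 16 digits total, 2 after the point',
    `sku_name`     STRING        COMMENT 'descriptive text as string',
    `create_ts`    BIGINT        COMMENT 'event timestamp as bigint (epoch milliseconds)'
);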

Chapter 2 Data Warehouse Theory

2.1 Normalization Theory

2.1.1 The Concept of Normal Forms

1) Definition

Data modeling must follow certain rules; in relational modeling, those rules are the normal forms.

2) Purpose

Applying normal forms reduces data redundancy.

Why reduce data redundancy?

(1) A decade or more ago, disks were expensive, so storage had to be minimized.

(2) There were no distributed systems back then, only single machines; the only option was adding disks, and the number of disks was limited.

(3) With redundant data, a single logical change requires updating multiple tables, making consistency hard to guarantee.

3) Drawback

The drawback of normal forms is that retrieving data requires Joins to piece the final result together.

4) Classification

The normal forms used in the industry are: first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF).
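A quick illustration of the trade-off (a hedged sketch, not taken from this project): in the denormalized table below, a user's city is repeated on every order, so changing one user's city means updating many rows; splitting it per 3NF removes the redundancy at the cost of a Join at query time.

-- redundant: user_city depends on user_id, not on the order key
CREATE TABLE order_flat (order_id STRING, user_id STRING, user_city STRING, amount DECIMAL(16,2));

-- normalized (3NF): the city now lives in exactly one place...
CREATE TABLE orders (order_id STRING, user_id STRING, amount DECIMAL(16,2));
CREATE TABLE users  (user_id STRING, user_city STRING);

-- ...but reads now need a Join to reassemble the row
SELECT o.order_id, u.user_city, o.amount
FROM orders o JOIN users u ON o.user_id = u.user_id;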

2.1.2 Functional Dependencies

2.1.3 Distinguishing the Three Normal Forms

2.2 Relational Modeling vs. Dimensional Modeling

Relational modeling and dimensional modeling are two data warehouse modeling techniques. Relational modeling was championed by Bill Inmon, dimensional modeling by Ralph Kimball.

2.2.1 Relational Modeling

Relational modeling abstracts complex data into two concepts, entities and relationships, and represents them in normalized form. As the relational model diagram shows, the result is relatively loose and fragmented, with a large number of physical tables.

The relational model strictly follows third normal form (3NF): data redundancy is low and consistency is easy to guarantee. But because the data is spread across many tables, queries are relatively complex, and in big-data scenarios query performance is relatively poor.

2.2.2 Dimensional Modeling

As the dimensional model diagram shows, the model is comparatively clear and concise.

The dimensional model takes data analysis as its starting point and does not follow third normal form, so some data redundancy exists. It is business-oriented, presenting the business as fact tables and dimension tables. Table structures are simple, so queries are simple and efficient.

2.3 Dimension Tables and Fact Tables (Important)

2.3.1 Dimension Tables

A dimension table generally holds descriptive information about the facts. Each dimension table corresponds to an object or concept in the real world, e.g. user, product, date, region.

Characteristics of dimension tables:

  • Wide (many attributes, many columns)
  • Few rows compared with fact tables: usually fewer than 100,000
  • Relatively static content: e.g. code/lookup tables

An example time dimension table:

Date ID      Day of Week  Day of Year  Quarter  Holiday
2020-01-01   2            1            1        New Year's Day
2020-01-02   3            2            1
2020-01-03   4            3            1
2020-01-04   5            4            1
2020-01-05   6            5            1

2.3.2 Fact Tables

        Each row in a fact table represents a business event (placing an order, paying, refunding, reviewing, etc.). The term "fact" refers to the measures of a business event (countable occurrences, quantities, amounts, etc.). For example: on May 21, 2020, teacher Song Song spent 250 yuan on JD.com buying one bottle of sea-dog ginseng pills. The dimensions are time, user, product, and merchant; the facts are 250 yuan and one bottle.

        Each fact table row contains: additive numeric measures, plus foreign keys linking to dimension tables - usually two or more foreign keys.

Characteristics of fact tables:

  • Very large
  • Relatively narrow: few columns (mainly foreign-key ids and measures)
  • Change frequently, with many new rows added every day

1) Transactional fact tables

        Each transaction or event - e.g. one sales-order record or one payment record - becomes one row in the fact table. Once the transaction is committed and the row inserted, the data is never changed again; the table is updated incrementally.

2) Periodic snapshot fact tables

        A periodic snapshot fact table does not keep every record; it keeps only data at fixed intervals - e.g. daily or monthly sales, or monthly account balances.

        For example, a shopping cart changes constantly as items are added and removed, but what matters for later analysis is how many items it holds at the end of each day.

3) Accumulating snapshot fact tables

        An accumulating snapshot fact table tracks the progress of a business fact. For example, the warehouse may need to store the timestamps of each stage an order passes through - placed, packed, shipped, received - to track the order lifecycle. As the business process advances, the fact table row is updated accordingly, as in the example and sketch below.

Order ID  User ID  Order Time  Pack Time  Ship Time  Receipt Time  Order Amount
...       ...      3-8         3-8        3-9        3-10          ...
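A hedged sketch of how such a row might be maintained in Hive (table and column names are assumptions for illustration only): each run overwrites the snapshot, filling in any stage timestamps that newly arrived.

-- illustrative only: merge today's new stage timestamps into the snapshot
INSERT OVERWRITE TABLE fact_order_accumulate
SELECT
    nvl(new.order_id, old.order_id),
    nvl(new.order_time, old.order_time),
    nvl(new.pack_time, old.pack_time),       -- keep the old value unless a new one arrived
    nvl(new.ship_time, old.ship_time),
    nvl(new.receipt_time, old.receipt_time)
FROM fact_order_accumulate old
FULL OUTER JOIN today_order_events new
ON old.order_id = new.order_id;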

2.4 Classification of Dimensional Models

Dimensional models are further divided into three kinds: star schema, snowflake schema, and constellation schema.

2.5 Data Warehouse Modeling (Absolutely Essential)

2.5.1 ODS Layer

1) User-behavior data on HDFS

2) Business data on HDFS

3) How do we plan to handle the user-behavior data and business data on HDFS?

(1) Keep the data exactly as received, without modification, so the layer doubles as a backup.
(2) Compress the data to save disk space (e.g. 100 GB of raw data can compress to roughly 10 GB).
(3) Create partitioned tables to prevent later full-table scans.

2.5.2 DIM Layer and DWD Layer

The DIM and DWD layers require a dimensional model, usually a star schema; taken together, the overall shape is typically a constellation schema.

Dimensional modeling generally follows four steps:

        Select the business process → Declare the grain → Identify the dimensions → Identify the facts

(1) Select the business process

        From the business systems, pick the business lines of interest - ordering, payment, refund, logistics, etc. One business line corresponds to one fact table.

(2) Declare the grain

The data grain is the level of detail or aggregation at which data is stored in the warehouse.

Declaring the grain means precisely defining what one row of the fact table represents. Choose the finest grain possible, so that the table can serve as many different needs as possible.

Typical grain declarations:

        One row of the order fact table represents one item of one order.
        One row of the payment fact table represents one payment record.

(3) Identify the dimensions

Dimensions describe the business facts, mainly answering "who, where, when".

The rule for choosing dimensions: will later requirements analyze metrics along this dimension? For example, to answer "when are the most orders placed", "which region places the most orders", and "which user places the most orders", the dimensions needed are time, region, and user.

(4) Identify the facts

Here "facts" means the business measures - counts, quantities, amounts, anything summable - e.g. order amount, number of orders.

In the DWD layer, modeling is driven by business process: for each concrete business process, build the finest-grained detail fact table. Fact tables may be moderately widened.

Fact-to-dimension associations are fairly flexible, but to support more complex requirements later, associate every dimension that can be associated.

The resulting bus matrix marks each fact table against the dimensions it associates with (time, user, region, product, coupon, activity - the marks were in the original figure); the measures column is reproduced below:

Fact table     Measures
Order          freight / discount amount / original amount / final amount
Order detail   item count / discount amount / original amount / final amount
Payment        payment amount
Add-to-cart    item count / amount
Favorite       count
Review         count
Order return   item count / amount
Refund         item count / amount
Coupon claim   count

This completes the dimensional modeling of the warehouse. The DWD layer is driven by business processes, while the DWS, DWT, and ADS layers are driven by requirements and no longer have anything to do with dimensional modeling. DWS and DWT both build wide tables, organized by subject area; a subject area is an angle from which problems are observed, and it corresponds to a dimension table.

2.5.3 DWS Layer and DWT Layer

The DWS and DWT layers are collectively called the wide-table layers. Their design ideas are largely the same; the following case illustrates them.

1) Problem: two requirements - count the number of orders per province, and compute the total order amount per province.
2) Naive approach: each one joins the province table with the order table, groups by province, then aggregates. The same data gets computed twice, and in practice there are many more overlapping scenarios like this.

So how should the design avoid the repeated computation?

        For the scenario above, build a region wide table whose primary key is the region ID and whose fields include order count, order amount, payment count, payment amount, and so on. All such metrics are computed together once and saved in the wide table, which effectively avoids recomputing the same data. A sketch follows below.
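A hedged sketch of what such a wide table and its daily load could look like (the table and column names here are illustrative, not the project's real DDL; the source is assumed to be an order fact table in DWD):

CREATE TABLE dws_area_stats_daycount (
    `province_id`  STRING        COMMENT 'region id, the primary key of the wide table',
    `order_count`  BIGINT        COMMENT 'orders placed today',
    `order_amount` DECIMAL(16,2) COMMENT 'order amount today'
)
PARTITIONED BY (`dt` STRING);

-- the join/group-by runs once; both "orders per province" and
-- "amount per province" then become simple reads of this table
INSERT OVERWRITE TABLE dws_area_stats_daycount PARTITION (dt = '2020-06-14')
SELECT
    province_id,
    count(*),
    sum(final_amount)
FROM dwd_order_info
WHERE dt = '2020-06-14'
GROUP BY province_id;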

3) Summary:

(1) Which wide tables to build: one per dimension.
(2) Wide-table fields: the fact tables' aggregated measures, viewed from the angle of that dimension.
(3) DWS vs. DWT: the DWS layer stores each subject object's summarized behavior for the current day (e.g. each region's order count and order amount today), while the DWT layer stores each subject object's accumulated behavior (e.g. each region's order count and order amount over the last 7 / 15 / 30 / 60 days).

2.5.4 ADS Layer

        Analyzes the metrics of each major subject area of the e-commerce system.

Chapter 3 Warehouse Environment Setup

3.1 Hive Environment Setup

3.1.1 Hive Engines

Hive execution engines: MR (the default), Tez, and Spark.
Hive on Spark: Hive both stores the metadata and parses/optimizes the SQL; the syntax is HQL; the execution engine is Spark, which runs the job as RDDs.
Spark on Hive: Hive only stores the metadata; Spark parses and optimizes the SQL; the syntax is Spark SQL; Spark runs the job as RDDs.

3.1.2 Hive on Spark Configuration

1) Compatibility note

Note: Hive 3.1.2 and Spark 3.0.0 as downloaded from the official sites are incompatible by default, because Hive 3.1.2 was built against Spark 2.4.5. Hive 3.1.2 therefore has to be recompiled.

Build steps: download the Hive 3.1.2 source from the official site and change the Spark version referenced in the pom file to 3.0.0. If it compiles, package it and take the jar; if it errors, fix the methods flagged by the compiler until it builds, then package the jar.

2) Deploy Spark on the node where Hive runs

        If Spark is already deployed, this step can be skipped, but verify that the SPARK_HOME environment variable is configured correctly.

(1) Spark download page:

Downloads | Apache Spark

(2) Upload and extract spark-3.0.0-bin-hadoop3.2.tgz

[seven@hadoop102 software]$ tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/
[seven@hadoop102 software]$ mv /opt/module/spark-3.0.0-bin-hadoop3.2 /opt/module/spark

(3) Configure the SPARK_HOME environment variable

[seven@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh

Add the following:

# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin

Source the file so it takes effect:

[seven@hadoop102 software]$ source /etc/profile.d/my_env.sh

3) Create a Spark configuration file for Hive

[seven@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf

Add the following (jobs will run with these parameters):

spark.master                           yarn
spark.eventLog.enabled                 true
spark.eventLog.dir                     hdfs://hadoop102:8020/spark-history
spark.executor.memory                  1g
spark.driver.memory                    1g

Create the following HDFS path to store the history logs:

[seven@hadoop102 software]$ hadoop fs -mkdir /spark-history

4) Upload the "pure" Spark jars to HDFS

Note 1: the regular Spark 3.0.0 distribution bundles Hive 2.3.7 support by default, which conflicts with the installed Hive 3.1.2. Hence the pure ("without-hadoop") Spark jars are used: they carry no hadoop or hive dependencies, avoiding the conflict.

Note 2: Hive jobs are ultimately executed by Spark, and Spark job resources are scheduled by Yarn, so a job may land on any node in the cluster. The Spark dependencies therefore have to be uploaded to an HDFS path that every node in the cluster can read.

(1) Upload and extract spark-3.0.0-bin-without-hadoop.tgz

[seven@hadoop102 software]$ tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz

(2) Upload the pure Spark jars to HDFS

[seven@hadoop102 software]$ hadoop fs -mkdir /spark-jars
[seven@hadoop102 software]$ hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars

5) Edit hive-site.xml

[seven@hadoop102 ~]$ vim /opt/module/hive/conf/hive-site.xml

Add the following:

<!-- Location of the Spark jars (note: port 8020 must match the NameNode port) -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://hadoop102:8020/spark-jars/*</value>
</property>

<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>

3.1.3 Testing Hive on Spark

(1) Start the hive client

[seven@hadoop102 hive]$ bin/hive

(2) Create a test table

hive (default)> create table student(id int, name string);

(3) Test with an insert

hive (default)> insert into table student values(1,'abc');

If the insert runs as a Spark job and succeeds, the configuration works.

3.2 Yarn Configuration

3.2.1 Increase the ApplicationMaster Resource Ratio

        The capacity scheduler caps the resources that concurrently running ApplicationMasters may occupy in each queue, via the yarn.scheduler.capacity.maximum-am-resource-percent parameter. Its default is 0.1, meaning ApplicationMasters may use at most 10% of a queue's total resources; this prevents AMs from crowding out the Map/Reduce tasks.

        In production, the default is fine. In a learning environment, however, cluster resources are scarce, and giving only 10% to ApplicationMasters can mean only one job runs at a time, because a single AM may already hit the 10% cap. So the value is raised here.

(1) On hadoop102, modify the following parameter in /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml

[seven@hadoop102 hadoop]$ vim capacity-scheduler.xml

<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.8</value>
</property>

(2) Distribute the capacity-scheduler.xml configuration file

[seven@hadoop102 hadoop]$ xsync capacity-scheduler.xml

(3) Stop any running jobs and restart the yarn cluster

[seven@hadoop103 hadoop-3.1.3]$ sbin/stop-yarn.sh
[seven@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh

3.3 Warehouse Development Environment

DBeaver or DataGrip can be used as the warehouse development tool. Both connect to Hive over JDBC, so HiveServer2 must be started first.

1. Start HiveServer2

[seven@hadoop102 hive]$ hiveserver2

2. Configure the DataGrip connection

1) Create the connection

2) Configure the connection properties

Configure all properties the same way as for Hive's beeline client. On first use, DataGrip will report a missing JDBC driver; download it as prompted.
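For reference, a typical HiveServer2 JDBC URL for this cluster would look like the following (assuming HiveServer2 runs on hadoop102 with its default port, 10000):

jdbc:hive2://hadoop102:10000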

3. Test it

Create a database named gmall and check that the creation succeeds.

1) Create the database

2) List the databases

3) Edit the connection to specify the target database

4) Set the current database to gmall

3.4 Data Preparation

When a company builds a warehouse, the business systems usually already contain some historical data. To simulate that real-world situation, some history has to be prepared here. Assume the warehouse goes live on 2020-06-14; the details follow.

1. User-behavior logs

User-behavior logs generally have no history, so only the logs for 2020-06-14 itself need to be prepared. Steps:

1) Start the log collection pipeline, including Flume, Kafka, etc.

2) On the two log servers (hadoop102, hadoop103), edit the /opt/module/applog/application.yml configuration file and set the mock.date parameter to 2020-06-14.

3) Run the log-generation script lg.sh.

4) Check that the corresponding files appear on HDFS.

2. Business data

Business data usually does have history; here, data for 2020-06-10 through 2020-06-14 is prepared. Steps:

1) On hadoop102, edit the /opt/module/db_log/application.properties file and set the three parameters mock.date, mock.clear, and mock.clear.user to the values shown in the figure.

2) Run the command that generates mock business data, producing the first day's (2020-06-10) history.

[seven@hadoop102 db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar

3) Edit /opt/module/db_log/application.properties again, setting the mock.date, mock.clear, and mock.clear.user parameters to the values shown in the figure.

4) Run the mock-data command again, producing the second day's (2020-06-11) history.
        [seven@hadoop102 db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar

5) From then on, change only the mock.date parameter in /opt/module/db_log/application.properties - to 2020-06-12, 2020-06-13, and 2020-06-14 in turn - and generate each date's data.

6) Run the mysql_to_hdfs_init.sh script to sync the generated business data to HDFS.
        [seven@hadoop102 bin]$ mysql_to_hdfs_init.sh all 2020-06-14

7) Check that the corresponding data appears on HDFS.

Chapter 4 Building the Warehouse - ODS Layer

1) Keep the data exactly as received, without modification, so the layer doubles as a backup.
2) Compress the data with LZO to reduce disk usage; 100 GB can compress to under 10 GB.
3) Create partitioned tables to prevent later full-table scans; partitioned tables are used heavily in real projects.
4) Create external tables. In real projects, apart from one's own temporary or internal tables, the vast majority of tables are external.

4.1 ODS Layer (User-Behavior Data)

4.1.1 Create the Log Table ods_log

1) Create a partitioned table with LZO compression support

(1) DDL

hive (gmall)>
drop table if exists ods_log;
CREATE EXTERNAL TABLE ods_log (`line` string)
PARTITIONED BY (`dt` string) -- partitioned by date
STORED AS -- storage format: read via LzoTextInputFormat
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_log';  -- where the data lives on HDFS

For Hive's LZO support, see: LanguageManual LZO - Apache Hive - Apache Software Foundation

(2) Partition plan

2) Load the data

hive (gmall)> 
load data inpath '/origin_data/gmall/log/topic_log/2020-06-14' into table ods_log partition(dt='2020-06-14');

Note: use the yyyy-MM-dd date format throughout; it is the date format Hive supports by default.

3) Build the index for the LZO files

[seven@hadoop102 bin]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_log/dt=2020-06-14

4.1.2 Single vs. Double Quotes in Shell

1) Create a test.sh file in /home/seven/bin

[seven@hadoop102 bin]$ vim test.sh

Add the following to the file:

#!/bin/bash
do_date=$1

echo '$do_date'
echo "$do_date"
echo "'$do_date'"
echo '"$do_date"'
echo `date`

2) Run it and check the output

[seven@hadoop102 bin]$ test.sh 2020-06-14
$do_date
2020-06-14
'2020-06-14'
"$do_date"
2020年 06月 18日 星期四 21:02:08 CST

3) Summary:

(1) Single quotes do not expand variables
(2) Double quotes expand variables
(3) Backticks ` execute the quoted command
(4) Single quotes nested inside double quotes: the variable is expanded
(5) Double quotes nested inside single quotes: the variable is not expanded

4.1.3 ODS Log-Table Load Script

1) Write the script

(1) Create the script in /home/seven/bin on hadoop102

[seven@hadoop102 bin]$ vim hdfs_to_ods_log.sh

Write the following in the script:

#!/bin/bash

# keep the database name in one variable for easy modification
APP=gmall

# if a date argument was passed, use it; otherwise default to yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo "================== log date: $do_date =================="

sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
"

hive -e "$sql"

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date

Note 1:

[ -n "$var" ] tests whether the variable's value is non-empty

-- non-empty: returns true

-- empty: returns false

Note: [ -n $var ] without quotes does not evaluate the value correctly; when using [ -n ... ], always wrap the variable in double quotes (" ").
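A quick, standalone demonstration of why the quotes matter (runnable in any bash shell):

a=""
[ -n "$a" ] && echo non-empty || echo empty   # prints: empty (correct)
[ -n $a ]   && echo non-empty || echo empty   # prints: non-empty (wrong - $a expands to nothing, so the test becomes [ -n ], which is always true)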

Note 2:

See date --help for usage of the date command.

(2) Make the script executable

[seven@hadoop102 bin]$ chmod 777 hdfs_to_ods_log.sh

2) Using the script

(1) Run it

[seven@hadoop102 module]$ hdfs_to_ods_log.sh 2020-06-14

(2) Check the loaded data

4.2 ODS Layer (Business Data)

The partition plan for the ODS-layer business tables is as follows:

The load strategy for the ODS-layer business tables is as follows:

4.2.1 Activity Info Table

DROP TABLE IF EXISTS ods_activity_info;
CREATE EXTERNAL TABLE ods_activity_info(`id` STRING COMMENT '编号',`activity_name` STRING  COMMENT '活动名称',`activity_type` STRING  COMMENT '活动类型',`start_time` STRING  COMMENT '开始时间',`end_time` STRING  COMMENT '结束时间',`create_time` STRING  COMMENT '创建时间'
) COMMENT '活动信息表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_info/';

4.2.2 Activity Rule Table

DROP TABLE IF EXISTS ods_activity_rule;
CREATE EXTERNAL TABLE ods_activity_rule(`id` STRING COMMENT '编号',`activity_id` STRING  COMMENT '活动ID',`activity_type` STRING COMMENT '活动类型',`condition_amount` DECIMAL(16,2) COMMENT '满减金额',`condition_num` BIGINT COMMENT '满减件数',`benefit_amount` DECIMAL(16,2) COMMENT '优惠金额',`benefit_discount` DECIMAL(16,2) COMMENT '优惠折扣',`benefit_level` STRING COMMENT '优惠级别'
) COMMENT '活动规则表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_rule/';

4.2.3 Level-1 Category Table

DROP TABLE IF EXISTS ods_base_category1;
CREATE EXTERNAL TABLE ods_base_category1(`id` STRING COMMENT 'id',`name` STRING COMMENT '名称'
) COMMENT '商品一级分类表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category1/';

4.2.4 Level-2 Category Table

DROP TABLE IF EXISTS ods_base_category2;
CREATE EXTERNAL TABLE ods_base_category2(`id` STRING COMMENT ' id',`name` STRING COMMENT '名称',`category1_id` STRING COMMENT '一级品类id'
) COMMENT '商品二级分类表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category2/';

4.2.5 Level-3 Category Table

DROP TABLE IF EXISTS ods_base_category3;
CREATE EXTERNAL TABLE ods_base_category3(`id` STRING COMMENT ' id',`name` STRING COMMENT '名称',`category2_id` STRING COMMENT '二级品类id'
) COMMENT '商品三级分类表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category3/';

4.2.6 Code Dictionary Table

DROP TABLE IF EXISTS ods_base_dic;
CREATE EXTERNAL TABLE ods_base_dic(`dic_code` STRING COMMENT '编号',`dic_name` STRING COMMENT '编码名称',`parent_code` STRING COMMENT '父编码',`create_time` STRING COMMENT '创建日期',`operate_time` STRING COMMENT '操作日期'
) COMMENT '编码字典表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_dic/';

4.2.7 Province Table

DROP TABLE IF EXISTS ods_base_province;
CREATE EXTERNAL TABLE ods_base_province (`id` STRING COMMENT '编号',`name` STRING COMMENT '省份名称',`region_id` STRING COMMENT '地区ID',`area_code` STRING COMMENT '地区编码',`iso_code` STRING COMMENT 'ISO-3166编码,供可视化使用',`iso_3166_2` STRING COMMENT 'IOS-3166-2编码,供可视化使用'
)  COMMENT '省份表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_province/';

4.2.8 Region Table

DROP TABLE IF EXISTS ods_base_region;
CREATE EXTERNAL TABLE ods_base_region (`id` STRING COMMENT '编号',`region_name` STRING COMMENT '地区名称'
)  COMMENT '地区表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_region/';

4.2.9 Brand (Trademark) Table

DROP TABLE IF EXISTS ods_base_trademark;
CREATE EXTERNAL TABLE ods_base_trademark (`id` STRING COMMENT '编号',`tm_name` STRING COMMENT '品牌名称'
)  COMMENT '品牌表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_trademark/';

4.2.10 Shopping Cart Table

DROP TABLE IF EXISTS ods_cart_info;
CREATE EXTERNAL TABLE ods_cart_info(`id` STRING COMMENT '编号',`user_id` STRING COMMENT '用户id',`sku_id` STRING COMMENT 'skuid',`cart_price` DECIMAL(16,2)  COMMENT '放入购物车时价格',`sku_num` BIGINT COMMENT '数量',`sku_name` STRING COMMENT 'sku名称 (冗余)',`create_time` STRING COMMENT '创建时间',`operate_time` STRING COMMENT '修改时间',`is_ordered` STRING COMMENT '是否已经下单',`order_time` STRING COMMENT '下单时间',`source_type` STRING COMMENT '来源类型',`source_id` STRING COMMENT '来源编号'
) COMMENT '加购表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_cart_info/';

4.2.11 Review Table

DROP TABLE IF EXISTS ods_comment_info;
CREATE EXTERNAL TABLE ods_comment_info(`id` STRING COMMENT '编号',`user_id` STRING COMMENT '用户ID',`sku_id` STRING COMMENT '商品sku',`spu_id` STRING COMMENT '商品spu',`order_id` STRING COMMENT '订单ID',`appraise` STRING COMMENT '评价',`create_time` STRING COMMENT '评价时间'
) COMMENT '商品评论表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_comment_info/';

4.2.12 Coupon Info Table

DROP TABLE IF EXISTS ods_coupon_info;
CREATE EXTERNAL TABLE ods_coupon_info(`id` STRING COMMENT '购物券编号',`coupon_name` STRING COMMENT '购物券名称',`coupon_type` STRING COMMENT '购物券类型 1 现金券 2 折扣券 3 满减券 4 满件打折券',`condition_amount` DECIMAL(16,2) COMMENT '满额数',`condition_num` BIGINT COMMENT '满件数',`activity_id` STRING COMMENT '活动编号',`benefit_amount` DECIMAL(16,2) COMMENT '减金额',`benefit_discount` DECIMAL(16,2) COMMENT '折扣',`create_time` STRING COMMENT '创建时间',`range_type` STRING COMMENT '范围类型 1、商品 2、品类 3、品牌',`limit_num` BIGINT COMMENT '最多领用次数',`taken_count` BIGINT COMMENT '已领用次数',`start_time` STRING COMMENT '开始领取时间',`end_time` STRING COMMENT '结束领取时间',`operate_time` STRING COMMENT '修改时间',`expire_time` STRING COMMENT '过期时间'
) COMMENT '优惠券表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_info/';

4.2.13 Coupon Usage Table

DROP TABLE IF EXISTS ods_coupon_use;
CREATE EXTERNAL TABLE ods_coupon_use(`id` STRING COMMENT '编号',`coupon_id` STRING  COMMENT '优惠券ID',`user_id` STRING  COMMENT 'skuid',`order_id` STRING  COMMENT 'spuid',`coupon_status` STRING  COMMENT '优惠券状态',`get_time` STRING  COMMENT '领取时间',`using_time` STRING  COMMENT '使用时间(下单)',`used_time` STRING  COMMENT '使用时间(支付)',`expire_time` STRING COMMENT '过期时间'
) COMMENT '优惠券领用表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_use/';

4.2.14 Favorites Table

DROP TABLE IF EXISTS ods_favor_info;
CREATE EXTERNAL TABLE ods_favor_info(`id` STRING COMMENT '编号',`user_id` STRING COMMENT '用户id',`sku_id` STRING COMMENT 'skuid',`spu_id` STRING COMMENT 'spuid',`is_cancel` STRING COMMENT '是否取消',`create_time` STRING COMMENT '收藏时间',`cancel_time` STRING COMMENT '取消时间'
) COMMENT '商品收藏表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_favor_info/';

4.2.15 Order Detail Table

DROP TABLE IF EXISTS ods_order_detail;
CREATE EXTERNAL TABLE ods_order_detail(`id` STRING COMMENT '编号',`order_id` STRING  COMMENT '订单号', `sku_id` STRING COMMENT '商品id',`sku_name` STRING COMMENT '商品名称',`order_price` DECIMAL(16,2) COMMENT '商品价格',`sku_num` BIGINT COMMENT '商品数量',`create_time` STRING COMMENT '创建时间',`source_type` STRING COMMENT '来源类型',`source_id` STRING COMMENT '来源编号',`split_final_amount` DECIMAL(16,2) COMMENT '分摊最终金额',`split_activity_amount` DECIMAL(16,2) COMMENT '分摊活动优惠',`split_coupon_amount` DECIMAL(16,2) COMMENT '分摊优惠券优惠'
) COMMENT '订单详情表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail/';

4.2.16 Order Detail-Activity Link Table

DROP TABLE IF EXISTS ods_order_detail_activity;
CREATE EXTERNAL TABLE ods_order_detail_activity(`id` STRING COMMENT '编号',`order_id` STRING  COMMENT '订单号',`order_detail_id` STRING COMMENT '订单明细id',`activity_id` STRING COMMENT '活动id',`activity_rule_id` STRING COMMENT '活动规则id',`sku_id` BIGINT COMMENT '商品id',`create_time` STRING COMMENT '创建时间'
) COMMENT '订单详情活动关联表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_activity/';

4.2.17 Order Detail-Coupon Link Table

DROP TABLE IF EXISTS ods_order_detail_coupon;
CREATE EXTERNAL TABLE ods_order_detail_coupon(`id` STRING COMMENT '编号',`order_id` STRING  COMMENT '订单号',`order_detail_id` STRING COMMENT '订单明细id',`coupon_id` STRING COMMENT '优惠券id',`coupon_use_id` STRING COMMENT '优惠券领用记录id',`sku_id` STRING COMMENT '商品id',`create_time` STRING COMMENT '创建时间'
) COMMENT '订单详情优惠券关联表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon/';

4.2.18 Order Table

DROP TABLE IF EXISTS ods_order_info;
CREATE EXTERNAL TABLE ods_order_info (`id` STRING COMMENT '订单号',`final_amount` DECIMAL(16,2) COMMENT '订单最终金额',`order_status` STRING COMMENT '订单状态',`user_id` STRING COMMENT '用户id',`payment_way` STRING COMMENT '支付方式',`delivery_address` STRING COMMENT '送货地址',`out_trade_no` STRING COMMENT '支付流水号',`create_time` STRING COMMENT '创建时间',`operate_time` STRING COMMENT '操作时间',`expire_time` STRING COMMENT '过期时间',`tracking_no` STRING COMMENT '物流单编号',`province_id` STRING COMMENT '省份ID',`activity_reduce_amount` DECIMAL(16,2) COMMENT '活动减免金额',`coupon_reduce_amount` DECIMAL(16,2) COMMENT '优惠券减免金额',`original_amount` DECIMAL(16,2)  COMMENT '订单原价金额',`feight_fee` DECIMAL(16,2)  COMMENT '运费',`feight_fee_reduce` DECIMAL(16,2)  COMMENT '运费减免'
) COMMENT '订单表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_info/';

4.2.19 Order Return Table

DROP TABLE IF EXISTS ods_order_refund_info;
CREATE EXTERNAL TABLE ods_order_refund_info(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户ID',
    `order_id` STRING COMMENT '订单ID',
    `sku_id` STRING COMMENT '商品ID',
    `refund_type` STRING COMMENT '退单类型',
    `refund_num` BIGINT COMMENT '退单件数',
    `refund_amount` DECIMAL(16,2) COMMENT '退单金额',
    `refund_reason_type` STRING COMMENT '退单原因类型',
    `refund_status` STRING COMMENT '退单状态',
    -- 退单状态应包含买家申请、卖家审核、卖家收货、退款完成等状态。此处未涉及到,故该表按增量处理
    `create_time` STRING COMMENT '退单时间'
) COMMENT '退单表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_refund_info/';

4.2.20 Order Status Log Table

DROP TABLE IF EXISTS ods_order_status_log;
CREATE EXTERNAL TABLE ods_order_status_log (`id` STRING COMMENT '编号',`order_id` STRING COMMENT '订单ID',`order_status` STRING COMMENT '订单状态',`operate_time` STRING COMMENT '修改时间'
)  COMMENT '订单状态表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_status_log/';

4.2.21 Payment Table

DROP TABLE IF EXISTS ods_payment_info;
CREATE EXTERNAL TABLE ods_payment_info(`id` STRING COMMENT '编号',`out_trade_no` STRING COMMENT '对外业务编号',`order_id` STRING COMMENT '订单编号',`user_id` STRING COMMENT '用户编号',`payment_type` STRING COMMENT '支付类型',`trade_no` STRING COMMENT '交易编号',`payment_amount` DECIMAL(16,2) COMMENT '支付金额',`subject` STRING COMMENT '交易内容',`payment_status` STRING COMMENT '支付状态',`create_time` STRING COMMENT '创建时间',`callback_time` STRING COMMENT '回调时间'
)  COMMENT '支付流水表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_payment_info/';

4.2.22 Refund Table

DROP TABLE IF EXISTS ods_refund_payment;
CREATE EXTERNAL TABLE ods_refund_payment(`id` STRING COMMENT '编号',`out_trade_no` STRING COMMENT '对外业务编号',`order_id` STRING COMMENT '订单编号',`sku_id` STRING COMMENT 'SKU编号',`payment_type` STRING COMMENT '支付类型',`trade_no` STRING COMMENT '交易编号',`refund_amount` DECIMAL(16,2) COMMENT '支付金额',`subject` STRING COMMENT '交易内容',`refund_status` STRING COMMENT '支付状态',`create_time` STRING COMMENT '创建时间',`callback_time` STRING COMMENT '回调时间'
)  COMMENT '退款表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_refund_payment/';

4.2.23 SKU Platform Attribute Table

DROP TABLE IF EXISTS ods_sku_attr_value;
CREATE EXTERNAL TABLE ods_sku_attr_value(`id` STRING COMMENT '编号',`attr_id` STRING COMMENT '平台属性ID',`value_id` STRING COMMENT '平台属性值ID',`sku_id` STRING COMMENT '商品ID',`attr_name` STRING COMMENT '平台属性名称',`value_name` STRING COMMENT '平台属性值名称'
) COMMENT 'sku平台属性表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value/';

4.2.24 Product (SKU) Table

DROP TABLE IF EXISTS ods_sku_info;
CREATE EXTERNAL TABLE ods_sku_info(`id` STRING COMMENT 'skuId',`spu_id` STRING COMMENT 'spuid',`price` DECIMAL(16,2) COMMENT '价格',`sku_name` STRING COMMENT '商品名称',`sku_desc` STRING COMMENT '商品描述',`weight` DECIMAL(16,2) COMMENT '重量',`tm_id` STRING COMMENT '品牌id',`category3_id` STRING COMMENT '品类id',`is_sale` STRING COMMENT '是否在售',`create_time` STRING COMMENT '创建时间'
) COMMENT 'SKU商品表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_info/';

4.2.25 SKU Sale Attribute Table

DROP TABLE IF EXISTS ods_sku_sale_attr_value;
CREATE EXTERNAL TABLE ods_sku_sale_attr_value(`id` STRING COMMENT '编号',`sku_id` STRING COMMENT 'sku_id',`spu_id` STRING COMMENT 'spu_id',`sale_attr_value_id` STRING COMMENT '销售属性值id',`sale_attr_id` STRING COMMENT '销售属性id',`sale_attr_name` STRING COMMENT '销售属性名称',`sale_attr_value_name` STRING COMMENT '销售属性值名称'
) COMMENT 'sku销售属性名称'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value/';

4.2.26 Product (SPU) Table

DROP TABLE IF EXISTS ods_spu_info;
CREATE EXTERNAL TABLE ods_spu_info(`id` STRING COMMENT 'spuid',`spu_name` STRING COMMENT 'spu名称',`category3_id` STRING COMMENT '品类id',`tm_id` STRING COMMENT '品牌id'
) COMMENT 'SPU商品表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_spu_info/';

4.2.27 User Table

DROP TABLE IF EXISTS ods_user_info;
CREATE EXTERNAL TABLE ods_user_info(`id` STRING COMMENT '用户id',`login_name` STRING COMMENT '用户名称',`nick_name` STRING COMMENT '用户昵称',`name` STRING COMMENT '用户姓名',`phone_num` STRING COMMENT '手机号码',`email` STRING COMMENT '邮箱',`user_level` STRING COMMENT '用户等级',`birthday` STRING COMMENT '生日',`gender` STRING COMMENT '性别',`create_time` STRING COMMENT '创建时间',`operate_time` STRING COMMENT '操作时间'
) COMMENT '用户表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_user_info/';

4.2.28 ODS Business-Table First-Day Load Script

1) Write the script

(1) Create hdfs_to_ods_db_init.sh in /home/seven/bin

[seven@hadoop102 bin]$ vim hdfs_to_ods_db_init.sh

Put the following in the script:

#!/bin/bash

APP=gmall

# the first-day load requires an explicit date argument
if [ -n "$2" ]; then
    do_date=$2
else
    echo "please pass a date argument"
    exit
fi

ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');"
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"

# base_province and base_region are not partitioned
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
"ods_order_info") hive -e "$ods_order_info" ;;
"ods_order_detail") hive -e "$ods_order_detail" ;;
"ods_sku_info") hive -e "$ods_sku_info" ;;
"ods_user_info") hive -e "$ods_user_info" ;;
"ods_payment_info") hive -e "$ods_payment_info" ;;
"ods_base_category1") hive -e "$ods_base_category1" ;;
"ods_base_category2") hive -e "$ods_base_category2" ;;
"ods_base_category3") hive -e "$ods_base_category3" ;;
"ods_base_trademark") hive -e "$ods_base_trademark" ;;
"ods_activity_info") hive -e "$ods_activity_info" ;;
"ods_cart_info") hive -e "$ods_cart_info" ;;
"ods_comment_info") hive -e "$ods_comment_info" ;;
"ods_coupon_info") hive -e "$ods_coupon_info" ;;
"ods_coupon_use") hive -e "$ods_coupon_use" ;;
"ods_favor_info") hive -e "$ods_favor_info" ;;
"ods_order_refund_info") hive -e "$ods_order_refund_info" ;;
"ods_order_status_log") hive -e "$ods_order_status_log" ;;
"ods_spu_info") hive -e "$ods_spu_info" ;;
"ods_activity_rule") hive -e "$ods_activity_rule" ;;
"ods_base_dic") hive -e "$ods_base_dic" ;;
"ods_order_detail_activity") hive -e "$ods_order_detail_activity" ;;
"ods_order_detail_coupon") hive -e "$ods_order_detail_coupon" ;;
"ods_refund_payment") hive -e "$ods_refund_payment" ;;
"ods_sku_attr_value") hive -e "$ods_sku_attr_value" ;;
"ods_sku_sale_attr_value") hive -e "$ods_sku_sale_attr_value" ;;
"ods_base_province") hive -e "$ods_base_province" ;;
"ods_base_region") hive -e "$ods_base_region" ;;
"all") hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value$ods_base_province$ods_base_region" ;;
esac

(2) Make it executable

[seven@hadoop102 bin]$ chmod +x hdfs_to_ods_db_init.sh

2) Using the script

(1) Run it

[seven@hadoop102 bin]$ hdfs_to_ods_db_init.sh all 2020-06-14

(2) Check that the data loaded successfully

4.2.29 ODS Business-Table Daily Load Script

1) Write the script

(1) Create hdfs_to_ods_db.sh in /home/seven/bin

[seven@hadoop102 bin]$ vim hdfs_to_ods_db.sh
Put the following in the script:

#!/bin/bash

APP=gmall

# if a date argument is given, use it; otherwise default to yesterday
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');"
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"

# base_province and base_region are static tables loaded on the first day only,
# so the case below intentionally has no branches for them
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
"ods_order_info") hive -e "$ods_order_info" ;;
"ods_order_detail") hive -e "$ods_order_detail" ;;
"ods_sku_info") hive -e "$ods_sku_info" ;;
"ods_user_info") hive -e "$ods_user_info" ;;
"ods_payment_info") hive -e "$ods_payment_info" ;;
"ods_base_category1") hive -e "$ods_base_category1" ;;
"ods_base_category2") hive -e "$ods_base_category2" ;;
"ods_base_category3") hive -e "$ods_base_category3" ;;
"ods_base_trademark") hive -e "$ods_base_trademark" ;;
"ods_activity_info") hive -e "$ods_activity_info" ;;
"ods_cart_info") hive -e "$ods_cart_info" ;;
"ods_comment_info") hive -e "$ods_comment_info" ;;
"ods_coupon_info") hive -e "$ods_coupon_info" ;;
"ods_coupon_use") hive -e "$ods_coupon_use" ;;
"ods_favor_info") hive -e "$ods_favor_info" ;;
"ods_order_refund_info") hive -e "$ods_order_refund_info" ;;
"ods_order_status_log") hive -e "$ods_order_status_log" ;;
"ods_spu_info") hive -e "$ods_spu_info" ;;
"ods_activity_rule") hive -e "$ods_activity_rule" ;;
"ods_base_dic") hive -e "$ods_base_dic" ;;
"ods_order_detail_activity") hive -e "$ods_order_detail_activity" ;;
"ods_order_detail_coupon") hive -e "$ods_order_detail_coupon" ;;
"ods_refund_payment") hive -e "$ods_refund_payment" ;;
"ods_sku_attr_value") hive -e "$ods_sku_attr_value" ;;
"ods_sku_sale_attr_value") hive -e "$ods_sku_sale_attr_value" ;;
"all") hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value" ;;
esac

(2) Make it executable

[seven@hadoop102 bin]$ chmod +x hdfs_to_ods_db.sh

2) Using the script

(1) Run it

[seven@hadoop102 bin]$ hdfs_to_ods_db.sh all 2020-06-14

(2) Check that the data loaded successfully

Chapter 5 Building the Warehouse - DIM Layer

5.1 Product Dimension Table (Full Snapshot)

1. DDL

DROP TABLE IF EXISTS dim_sku_info;
CREATE EXTERNAL TABLE dim_sku_info (`id` STRING COMMENT '商品id',`price` DECIMAL(16,2) COMMENT '商品价格',`sku_name` STRING COMMENT '商品名称',`sku_desc` STRING COMMENT '商品描述',`weight` DECIMAL(16,2) COMMENT '重量',`is_sale` BOOLEAN COMMENT '是否在售',`spu_id` STRING COMMENT 'spu编号',`spu_name` STRING COMMENT 'spu名称',`category3_id` STRING COMMENT '三级分类id',`category3_name` STRING COMMENT '三级分类名称',`category2_id` STRING COMMENT '二级分类id',`category2_name` STRING COMMENT '二级分类名称',`category1_id` STRING COMMENT '一级分类id',`category1_name` STRING COMMENT '一级分类名称',`tm_id` STRING COMMENT '品牌id',`tm_name` STRING COMMENT '品牌名称',`sku_attr_values` ARRAY<STRUCT<attr_id:STRING,value_id:STRING,attr_name:STRING,value_name:STRING>> COMMENT '平台属性',`sku_sale_attr_values` ARRAY<STRUCT<sale_attr_id:STRING,sale_attr_value_id:STRING,sale_attr_name:STRING,sale_attr_value_name:STRING>> COMMENT '销售属性',`create_time` STRING COMMENT '创建时间'
) COMMENT '商品维度表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_sku_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2. Partition plan

3. Data loading

1) The Hive index-file problem

(1) Count the rows in two different ways

hive (gmall)> select * from ods_log;
Time taken: 0.706 seconds, Fetched: 2955 row(s)

hive (gmall)> select count(*) from ods_log;
2959

(2) The two results differ.

        The reason: select * from ods_log launches no MR job; it reads directly with the DeprecatedLzoTextInputFormat specified in the ods_log DDL, which recognizes lzo.index files as index files.

        select count(*) from ods_log does launch an MR job, which goes through hive.input.format, whose default is CombineHiveInputFormat. That format treats the index files as small files to be merged, i.e. as ordinary data files; worse still, this prevents the LZO files from being split.

hive (gmall)> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

Fix: change CombineHiveInputFormat to HiveInputFormat
        hive (gmall)> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

2) First-day load

with
sku as (
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id,
           category3_id, tm_id, create_time
    from ods_sku_info
    where dt='2020-06-14'
),
spu as (
    select id, spu_name
    from ods_spu_info
    where dt='2020-06-14'
),
c3 as (
    select id, name, category2_id
    from ods_base_category3
    where dt='2020-06-14'
),
c2 as (
    select id, name, category1_id
    from ods_base_category2
    where dt='2020-06-14'
),
c1 as (
    select id, name
    from ods_base_category1
    where dt='2020-06-14'
),
tm as (
    select id, tm_name
    from ods_base_trademark
    where dt='2020-06-14'
),
attr as (
    select sku_id,
           collect_set(named_struct('attr_id',attr_id,'value_id',value_id,
               'attr_name',attr_name,'value_name',value_name)) attrs
    from ods_sku_attr_value
    where dt='2020-06-14'
    group by sku_id
),
sale_attr as (
    select sku_id,
           collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,
               'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ods_sku_sale_attr_value
    where dt='2020-06-14'
    group by sku_id
)
insert overwrite table dim_sku_info partition(dt='2020-06-14')
select
    sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale,
    sku.spu_id, spu.spu_name,
    sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name,
    sku.tm_id, tm.tm_name,
    attr.attrs, sale_attr.sale_attrs,
    sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;

3) Daily load

with
sku as (
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id,
           category3_id, tm_id, create_time
    from ods_sku_info
    where dt='2020-06-15'
),
spu as (
    select id, spu_name
    from ods_spu_info
    where dt='2020-06-15'
),
c3 as (
    select id, name, category2_id
    from ods_base_category3
    where dt='2020-06-15'
),
c2 as (
    select id, name, category1_id
    from ods_base_category2
    where dt='2020-06-15'
),
c1 as (
    select id, name
    from ods_base_category1
    where dt='2020-06-15'
),
tm as (
    select id, tm_name
    from ods_base_trademark
    where dt='2020-06-15'
),
attr as (
    select sku_id,
           collect_set(named_struct('attr_id',attr_id,'value_id',value_id,
               'attr_name',attr_name,'value_name',value_name)) attrs
    from ods_sku_attr_value
    where dt='2020-06-15'
    group by sku_id
),
sale_attr as (
    select sku_id,
           collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,
               'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ods_sku_sale_attr_value
    where dt='2020-06-15'
    group by sku_id
)
insert overwrite table dim_sku_info partition(dt='2020-06-15')
select
    sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale,
    sku.spu_id, spu.spu_name,
    sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name,
    sku.tm_id, tm.tm_name,
    attr.attrs, sale_attr.sale_attrs,
    sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;

5.2 Coupon Dimension Table (Full Snapshot)

1. DDL

DROP TABLE IF EXISTS dim_coupon_info;
CREATE EXTERNAL TABLE dim_coupon_info(`id` STRING COMMENT '购物券编号',`coupon_name` STRING COMMENT '购物券名称',`coupon_type` STRING COMMENT '购物券类型 1 现金券 2 折扣券 3 满减券 4 满件打折券',`condition_amount` DECIMAL(16,2) COMMENT '满额数',`condition_num` BIGINT COMMENT '满件数',`activity_id` STRING COMMENT '活动编号',`benefit_amount` DECIMAL(16,2) COMMENT '减金额',`benefit_discount` DECIMAL(16,2) COMMENT '折扣',`create_time` STRING COMMENT '创建时间',`range_type` STRING COMMENT '范围类型 1、商品 2、品类 3、品牌',`limit_num` BIGINT COMMENT '最多领取次数',`taken_count` BIGINT COMMENT '已领取次数',`start_time` STRING COMMENT '可以领取的开始日期',`end_time` STRING COMMENT '可以领取的结束日期',`operate_time` STRING COMMENT '修改时间',`expire_time` STRING COMMENT '过期时间'
) COMMENT '优惠券维度表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_coupon_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2. Partition plan

3. Data loading

1) First-day load

insert overwrite table dim_coupon_info partition(dt='2020-06-14')
select
    id, coupon_name, coupon_type, condition_amount, condition_num, activity_id,
    benefit_amount, benefit_discount, create_time, range_type, limit_num,
    taken_count, start_time, end_time, operate_time, expire_time
from ods_coupon_info
where dt='2020-06-14';

2) Daily load

insert overwrite table dim_coupon_info partition(dt='2020-06-15')
select
    id, coupon_name, coupon_type, condition_amount, condition_num, activity_id,
    benefit_amount, benefit_discount, create_time, range_type, limit_num,
    taken_count, start_time, end_time, operate_time, expire_time
from ods_coupon_info
where dt='2020-06-15';

5.3 Activity Dimension Table (Full Snapshot)

1. DDL

DROP TABLE IF EXISTS dim_activity_rule_info;
CREATE EXTERNAL TABLE dim_activity_rule_info(`activity_rule_id` STRING COMMENT '活动规则ID',`activity_id` STRING COMMENT '活动ID',`activity_name` STRING  COMMENT '活动名称',`activity_type` STRING  COMMENT '活动类型',`start_time` STRING  COMMENT '开始时间',`end_time` STRING  COMMENT '结束时间',`create_time` STRING  COMMENT '创建时间',`condition_amount` DECIMAL(16,2) COMMENT '满减金额',`condition_num` BIGINT COMMENT '满减件数',`benefit_amount` DECIMAL(16,2) COMMENT '优惠金额',`benefit_discount` DECIMAL(16,2) COMMENT '优惠折扣',`benefit_level` STRING COMMENT '优惠级别'
) COMMENT '活动信息表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_activity_rule_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2. Partition plan

3. Data loading

1) First-day load

insert overwrite table dim_activity_rule_info partition(dt='2020-06-14')
select
    ar.id, ar.activity_id, ai.activity_name, ar.activity_type,
    ai.start_time, ai.end_time, ai.create_time,
    ar.condition_amount, ar.condition_num, ar.benefit_amount,
    ar.benefit_discount, ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num,
           benefit_amount, benefit_discount, benefit_level
    from ods_activity_rule
    where dt='2020-06-14'
) ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ods_activity_info
    where dt='2020-06-14'
) ai
on ar.activity_id=ai.id;

2) Daily load

insert overwrite table dim_activity_rule_info partition(dt='2020-06-15')
select
    ar.id, ar.activity_id, ai.activity_name, ar.activity_type,
    ai.start_time, ai.end_time, ai.create_time,
    ar.condition_amount, ar.condition_num, ar.benefit_amount,
    ar.benefit_discount, ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num,
           benefit_amount, benefit_discount, benefit_level
    from ods_activity_rule
    where dt='2020-06-15'
) ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ods_activity_info
    where dt='2020-06-15'
) ai
on ar.activity_id=ai.id;

5.4 Region Dimension Table (Special)

1. DDL

DROP TABLE IF EXISTS dim_base_province;
CREATE EXTERNAL TABLE dim_base_province (`id` STRING COMMENT 'id',`province_name` STRING COMMENT '省市名称',`area_code` STRING COMMENT '地区编码',`iso_code` STRING COMMENT 'ISO-3166编码,供可视化使用',`iso_3166_2` STRING COMMENT 'IOS-3166-2编码,供可视化使用',`region_id` STRING COMMENT '地区id',`region_name` STRING COMMENT '地区名称'
) COMMENT '地区维度表'
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_base_province/'
TBLPROPERTIES ("parquet.compression"="lzo");

2. Data loading

Region data is relatively stable and rarely changes, so no daily load is needed.

insert overwrite table dim_base_province
select
    bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2,
    bp.region_id, br.region_name
from ods_base_province bp
join ods_base_region br on bp.region_id = br.id;

5.5 Date Dimension Table (Special)

1. DDL

DROP TABLE IF EXISTS dim_date_info;
CREATE EXTERNAL TABLE dim_date_info(`date_id` STRING COMMENT '日',`week_id` STRING COMMENT '周ID',`week_day` STRING COMMENT '周几',`day` STRING COMMENT '每月的第几天',`month` STRING COMMENT '第几月',`quarter` STRING COMMENT '第几季度',`year` STRING COMMENT '年',`is_workday` STRING COMMENT '是否是工作日',`holiday_id` STRING COMMENT '节假日'
) COMMENT '时间维度表'
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_date_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2. Data loading

Date dimension data usually does not come from the business systems; it is written by hand. And because it is fully predictable, there is no daily import - a whole year can be loaded at once.

1) Create a temporary table

DROP TABLE IF EXISTS tmp_dim_date_info;
CREATE EXTERNAL TABLE tmp_dim_date_info (`date_id` STRING COMMENT '日',`week_id` STRING COMMENT '周ID',`week_day` STRING COMMENT '周几',`day` STRING COMMENT '每月的第几天',`month` STRING COMMENT '第几月',`quarter` STRING COMMENT '第几季度',`year` STRING COMMENT '年',`is_workday` STRING COMMENT '是否是工作日',`holiday_id` STRING COMMENT '节假日'
) COMMENT '时间维度表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';

2) Upload the data file to the temporary table's HDFS path /warehouse/gmall/tmp/tmp_dim_date_info/

Its content looks like:

2020-01-01	1	3	1	1	1	2020	0	元旦
2020-01-02	1	4	2	1	1	2020	1	\N
2020-01-03	1	5	3	1	1	2020	1	\N
2020-01-04	1	6	4	1	1	2020	0	\N
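If hand-maintaining the file is inconvenient, the basic calendar columns can also be generated inside Hive - a hedged sketch (assumes Hive 2.2+ for dayofweek; the is_workday column below is a weekend-only heuristic, and holidays such as the 元旦 row above would still need manual patching):

insert overwrite table tmp_dim_date_info
select
    date_add('2020-01-01', i)                            date_id,
    weekofyear(date_add('2020-01-01', i))                week_id,
    dayofweek(date_add('2020-01-01', i))                 week_day,
    day(date_add('2020-01-01', i))                       day,
    month(date_add('2020-01-01', i))                     month,
    quarter(date_add('2020-01-01', i))                   quarter,
    year(date_add('2020-01-01', i))                      year,
    if(dayofweek(date_add('2020-01-01', i)) in (1,7), '0', '1') is_workday,
    null                                                 holiday_id
from (select posexplode(split(space(365), ' ')) as (i, x)) t;  -- i runs 0..365, covering all of leap-year 2020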

3) Run the following to load it into the date dimension table

insert overwrite table dim_date_info select * from tmp_dim_date_info;

4) Check that the data imported successfully

select * from dim_date_info;

5.6 User Dimension Table (Zipper Table)

5.6.1 Zipper Tables in Brief

1) What is a zipper table

A zipper table records each record's full lifecycle: every version of a row carries a start date and an end date, and the version whose end date is the open-ended 9999-99-99 is the current one. (This is exactly how dim_user_info is built in 5.6.2.)

2) Why use a zipper table

It suits slowly changing data: it keeps the complete history without having to store a full snapshot of the table every day.

3) How to use a zipper table

To get the latest snapshot, filter on end_date='9999-99-99'; to get a historical snapshot as of some date d, filter on start_date<=d and end_date>=d.

4) How a zipper table is formed

Each day the new and changed records are merged in: new versions open with today's start date and an end date of 9999-99-99, while superseded versions are closed out by setting their end date to yesterday. (See the illustration below and the SQL in 5.6.2.)
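A minimal illustration (values invented for demonstration): suppose user 1001's level changes on 2020-06-15. The zipper table then holds:

id    user_level  start_date  end_date
1001  1           2020-06-14  2020-06-14   <- closed-out old version
1001  2           2020-06-15  9999-99-99   <- current version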

5.6.2 Building the Zipper Table

1. DDL

DROP TABLE IF EXISTS dim_user_info;
CREATE EXTERNAL TABLE dim_user_info(`id` STRING COMMENT '用户id',`login_name` STRING COMMENT '用户名称',`nick_name` STRING COMMENT '用户昵称',`name` STRING COMMENT '用户姓名',`phone_num` STRING COMMENT '手机号码',`email` STRING COMMENT '邮箱',`user_level` STRING COMMENT '用户等级',`birthday` STRING COMMENT '生日',`gender` STRING COMMENT '性别',`create_time` STRING COMMENT '创建时间',`operate_time` STRING COMMENT '操作时间',`start_date` STRING COMMENT '开始日期',`end_date` STRING COMMENT '结束日期'
) COMMENT '用户表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_user_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2. Partition plan

3. Data loading

1) First-day load

        The first-day load initializes the zipper table: all historical users up to the initialization date are imported in one shot. The first partition of ods_user_info, the 2020-06-14 partition, already contains every historical user, so after some light processing (hashing the sensitive fields) that partition is loaded into the zipper table's 9999-99-99 partition.

insert overwrite table dim_user_info partition(dt='9999-99-99')
select
    id, login_name, nick_name,
    md5(name), md5(phone_num), md5(email),
    user_level, birthday, gender,
    create_time, operate_time,
    '2020-06-14', '9999-99-99'
from ods_user_info
where dt='2020-06-14';

2) Daily load

(1) Approach

The current (9999-99-99) partition is full-outer-joined with today's new and changed ODS data on the user id. For every id, the newer version wins and is written back to the 9999-99-99 partition; ids present on both sides are the changed users, whose superseded old versions are closed out with end_date = yesterday and written to yesterday's partition.

(2) SQL

with
tmp as
(
    select
        old.id old_id, old.login_name old_login_name, old.nick_name old_nick_name,
        old.name old_name, old.phone_num old_phone_num, old.email old_email,
        old.user_level old_user_level, old.birthday old_birthday, old.gender old_gender,
        old.create_time old_create_time, old.operate_time old_operate_time,
        old.start_date old_start_date, old.end_date old_end_date,
        new.id new_id, new.login_name new_login_name, new.nick_name new_nick_name,
        new.name new_name, new.phone_num new_phone_num, new.email new_email,
        new.user_level new_user_level, new.birthday new_birthday, new.gender new_gender,
        new.create_time new_create_time, new.operate_time new_operate_time,
        new.start_date new_start_date, new.end_date new_end_date
    from
    (
        select id, login_name, nick_name, name, phone_num, email, user_level,
               birthday, gender, create_time, operate_time, start_date, end_date
        from dim_user_info
        where dt='9999-99-99'
    ) old
    full outer join
    (
        select id, login_name, nick_name,
               md5(name) name, md5(phone_num) phone_num, md5(email) email,
               user_level, birthday, gender, create_time, operate_time,
               '2020-06-15' start_date, '9999-99-99' end_date
        from ods_user_info
        where dt='2020-06-15'
    ) new
    on old.id=new.id
)
insert overwrite table dim_user_info partition(dt)
select
    nvl(new_id,old_id),
    nvl(new_login_name,old_login_name),
    nvl(new_nick_name,old_nick_name),
    nvl(new_name,old_name),
    nvl(new_phone_num,old_phone_num),
    nvl(new_email,old_email),
    nvl(new_user_level,old_user_level),
    nvl(new_birthday,old_birthday),
    nvl(new_gender,old_gender),
    nvl(new_create_time,old_create_time),
    nvl(new_operate_time,old_operate_time),
    nvl(new_start_date,old_start_date),
    nvl(new_end_date,old_end_date),
    nvl(new_end_date,old_end_date) dt
from tmp
union all
select
    old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email,
    old_user_level, old_birthday, old_gender, old_create_time, old_operate_time,
    old_start_date,
    cast(date_add('2020-06-15',-1) as string),
    cast(date_add('2020-06-15',-1) as string) dt
from tmp
where new_id is not null and old_id is not null;

5.7 DIM-Layer First-Day Load Script

1) Write the script

(1) Create ods_to_dim_db_init.sh in /home/seven/bin

[seven@hadoop102 bin]$ vim ods_to_dim_db_init.sh

Put the following in the script:

#!/bin/bash

APP=gmall

# the first-day load requires an explicit date argument
if [ -n "$2" ]; then
    do_date=$2
else
    echo "please pass a date argument"
    exit
fi

dim_user_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_user_info partition(dt='9999-99-99')
select
    id, login_name, nick_name, md5(name), md5(phone_num), md5(email),
    user_level, birthday, gender, create_time, operate_time,
    '$do_date', '9999-99-99'
from ${APP}.ods_user_info
where dt='$do_date';
"

dim_sku_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
sku as (
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id,
           category3_id, tm_id, create_time
    from ${APP}.ods_sku_info
    where dt='$do_date'
),
spu as (
    select id, spu_name
    from ${APP}.ods_spu_info
    where dt='$do_date'
),
c3 as (
    select id, name, category2_id
    from ${APP}.ods_base_category3
    where dt='$do_date'
),
c2 as (
    select id, name, category1_id
    from ${APP}.ods_base_category2
    where dt='$do_date'
),
c1 as (
    select id, name
    from ${APP}.ods_base_category1
    where dt='$do_date'
),
tm as (
    select id, tm_name
    from ${APP}.ods_base_trademark
    where dt='$do_date'
),
attr as (
    select sku_id,
           collect_set(named_struct('attr_id',attr_id,'value_id',value_id,
               'attr_name',attr_name,'value_name',value_name)) attrs
    from ${APP}.ods_sku_attr_value
    where dt='$do_date'
    group by sku_id
),
sale_attr as (
    select sku_id,
           collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,
               'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ${APP}.ods_sku_sale_attr_value
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
select
    sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale,
    sku.spu_id, spu.spu_name,
    sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name,
    sku.tm_id, tm.tm_name,
    attr.attrs, sale_attr.sale_attrs,
    sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"

dim_base_province="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_base_province
select
    bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2,
    bp.region_id, br.region_name
from ${APP}.ods_base_province bp
join ${APP}.ods_base_region br on bp.region_id = br.id;
"

dim_coupon_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
select
    id, coupon_name, coupon_type, condition_amount, condition_num, activity_id,
    benefit_amount, benefit_discount, create_time, range_type, limit_num,
    taken_count, start_time, end_time, operate_time, expire_time
from ${APP}.ods_coupon_info
where dt='$do_date';
"

dim_activity_rule_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
select
    ar.id, ar.activity_id, ai.activity_name, ar.activity_type,
    ai.start_time, ai.end_time, ai.create_time,
    ar.condition_amount, ar.condition_num, ar.benefit_amount,
    ar.benefit_discount, ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num,
           benefit_amount, benefit_discount, benefit_level
    from ${APP}.ods_activity_rule
    where dt='$do_date'
) ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ${APP}.ods_activity_info
    where dt='$do_date'
) ai
on ar.activity_id=ai.id;
"

case $1 in
"dim_user_info") hive -e "$dim_user_info" ;;
"dim_sku_info") hive -e "$dim_sku_info" ;;
"dim_base_province") hive -e "$dim_base_province" ;;
"dim_coupon_info") hive -e "$dim_coupon_info" ;;
"dim_activity_rule_info") hive -e "$dim_activity_rule_info" ;;
"all") hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info$dim_base_province" ;;
esac

(2)增加执行权限

[seven@hadoop102 bin]$ chmod +x ods_to_dim_db_init.sh

2)脚本使用

(1)执行脚本

[seven@hadoop102 bin]$ ods_to_dim_db_init.sh all 2020-06-14

注意:该脚本不包含时间维度表的装载,时间维度表需手动装载数据,参考5.5节。

(2)查看数据是否导入成功

5.8 DIM层每日数据装载脚本

1)编写脚本

(1)在/home/seven/bin目录下创建脚本ods_to_dim_db.sh

[seven@hadoop102 bin]$ vim ods_to_dim_db.sh

在脚本中填写如下内容

#!/bin/bash

APP=gmall
# 如果输入了日期参数则取输入的日期;如果没输入日期则取当前时间的前一天
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dim_user_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
tmp as
(
    select
        old.id old_id, old.login_name old_login_name, old.nick_name old_nick_name, old.name old_name, old.phone_num old_phone_num, old.email old_email, old.user_level old_user_level, old.birthday old_birthday, old.gender old_gender, old.create_time old_create_time, old.operate_time old_operate_time, old.start_date old_start_date, old.end_date old_end_date,
        new.id new_id, new.login_name new_login_name, new.nick_name new_nick_name, new.name new_name, new.phone_num new_phone_num, new.email new_email, new.user_level new_user_level, new.birthday new_birthday, new.gender new_gender, new.create_time new_create_time, new.operate_time new_operate_time, new.start_date new_start_date, new.end_date new_end_date
    from
    (
        select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender, create_time, operate_time, start_date, end_date
        from ${APP}.dim_user_info
        where dt='9999-99-99'
        and start_date<'$do_date'
    ) old
    full outer join
    (
        select id, login_name, nick_name, md5(name) name, md5(phone_num) phone_num, md5(email) email, user_level, birthday, gender, create_time, operate_time, '$do_date' start_date, '9999-99-99' end_date
        from ${APP}.ods_user_info
        where dt='$do_date'
    ) new
    on old.id=new.id
)
insert overwrite table ${APP}.dim_user_info partition(dt)
select
    nvl(new_id,old_id), nvl(new_login_name,old_login_name), nvl(new_nick_name,old_nick_name), nvl(new_name,old_name), nvl(new_phone_num,old_phone_num), nvl(new_email,old_email), nvl(new_user_level,old_user_level), nvl(new_birthday,old_birthday), nvl(new_gender,old_gender), nvl(new_create_time,old_create_time), nvl(new_operate_time,old_operate_time), nvl(new_start_date,old_start_date), nvl(new_end_date,old_end_date),
    nvl(new_end_date,old_end_date) dt
from tmp
union all
select
    old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email, old_user_level, old_birthday, old_gender, old_create_time, old_operate_time, old_start_date,
    cast(date_add('$do_date',-1) as string),
    cast(date_add('$do_date',-1) as string) dt
from tmp
where new_id is not null and old_id is not null;
"

dim_sku_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
with
sku as
(
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
    from ${APP}.ods_sku_info
    where dt='$do_date'
),
spu as
(
    select id, spu_name
    from ${APP}.ods_spu_info
    where dt='$do_date'
),
c3 as
(
    select id, name, category2_id
    from ${APP}.ods_base_category3
    where dt='$do_date'
),
c2 as
(
    select id, name, category1_id
    from ${APP}.ods_base_category2
    where dt='$do_date'
),
c1 as
(
    select id, name
    from ${APP}.ods_base_category1
    where dt='$do_date'
),
tm as
(
    select id, tm_name
    from ${APP}.ods_base_trademark
    where dt='$do_date'
),
attr as
(
    select sku_id, collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ${APP}.ods_sku_attr_value
    where dt='$do_date'
    group by sku_id
),
sale_attr as
(
    select sku_id, collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ${APP}.ods_sku_sale_attr_value
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name, sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name, attr.attrs, sale_attr.sale_attrs, sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"

dim_base_province="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_base_province
select bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name
from ${APP}.ods_base_province bp
join ${APP}.ods_base_region br on bp.region_id = br.id;
"

dim_coupon_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time
from ${APP}.ods_coupon_info
where dt='$do_date';
"

dim_activity_rule_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
select ar.id, ar.activity_id, ai.activity_name, ar.activity_type, ai.start_time, ai.end_time, ai.create_time, ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level
    from ${APP}.ods_activity_rule
    where dt='$do_date'
) ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ${APP}.ods_activity_info
    where dt='$do_date'
) ai
on ar.activity_id=ai.id;
"

# 注意:all中不包含dim_base_province,地区维度一般不发生变化,只在首日装载一次即可
case $1 in
"dim_user_info"){
    hive -e "$dim_user_info"
};;
"dim_sku_info"){
    hive -e "$dim_sku_info"
};;
"dim_base_province"){
    hive -e "$dim_base_province"
};;
"dim_coupon_info"){
    hive -e "$dim_coupon_info"
};;
"dim_activity_rule_info"){
    hive -e "$dim_activity_rule_info"
};;
"all"){
    hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info"
};;
esac

(2)增加执行权限

[seven@hadoop102 bin]$ chmod +x ods_to_dim_db.sh

2)脚本使用

(1)执行脚本

[seven@hadoop102 bin]$ ods_to_dim_db.sh all 2020-06-14

(2)查看数据是否导入成功

第6章 数仓搭建-DWD

1)对用户行为数据解析。

2)对业务数据采用维度模型重新建模。

6.1 DWD层(用户行为日志)

6.1.1 日志解析思路

1)日志结构回顾

(1)页面埋点日志

(2)启动日志

2)日志解析思路

6.1.2 get_json_object函数使用

1)数据

[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]

2)取出第一个json对象

hive (gmall)>
select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]','$[0]');

结果是:{"name":"大郎","sex":"男","age":"25"}

3)取出第一个json的age字段的值

hive (gmall)>
select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]','$[0].age');

结果是:25
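4)后续解析日志时大量使用“$.外层字段.内层字段”形式的嵌套路径,补充一个取嵌套字段的示意(JSON字面量为假设数据):

hive (gmall)>
select get_json_object('{"common":{"ar":"110000","ba":"Xiaomi"},"ts":1592102400000}','$.common.ar');

结果是:110000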

6.1.3 启动日志表

启动日志解析思路:启动日志表中每行数据对应一个启动记录,一个启动记录应该包含日志中的公共信息和启动信息。先将所有包含start字段的日志过滤出来,然后使用get_json_object函数解析每个字段。

1)建表语句

DROP TABLE IF EXISTS dwd_start_log;
CREATE EXTERNAL TABLE dwd_start_log
(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `entry` STRING COMMENT 'icon手机图标 notice 通知 install 安装后启动',
    `loading_time` BIGINT COMMENT '启动加载时间',
    `open_ad_id` STRING COMMENT '广告页ID',
    `open_ad_ms` BIGINT COMMENT '广告总共播放时间',
    `open_ad_skip_ms` BIGINT COMMENT '用户跳过广告时点',
    `ts` BIGINT COMMENT '时间'
) COMMENT '启动日志表'
PARTITIONED BY (`dt` STRING) -- 按照时间创建分区
STORED AS PARQUET -- 采用parquet列式存储
LOCATION '/warehouse/gmall/dwd/dwd_start_log' -- 指定在HDFS上的存储位置
TBLPROPERTIES('parquet.compression'='lzo'); -- 采用LZO压缩

2)数据导入

hive (gmall)> 
insert overwrite table dwd_start_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.start') is not null;
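where条件中get_json_object(line,'$.start') is not null的原理:当路径不存在时get_json_object返回NULL,据此即可区分启动日志与其他日志(字面量为假设数据):

hive (gmall)>
select get_json_object('{"page":{"page_id":"home"}}','$.start');

结果是:NULL,说明该条日志不含start字段,会被where条件过滤掉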

3)查看数据

hive (gmall)> select * from dwd_start_log where dt='2020-06-14' limit 2;

6.1.4 页面日志表

页面日志解析思路:页面日志表中每行数据对应一个页面访问记录,一个页面访问记录应该包含日志中的公共信息和页面信息。先将所有包含page字段的日志过滤出来,然后使用get_json_object函数解析每个字段。

1)建表语句

DROP TABLE IF EXISTS dwd_page_log;
CREATE EXTERNAL TABLE dwd_page_log
(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `during_time` BIGINT COMMENT '持续时间毫秒',
    `page_item` STRING COMMENT '目标id',
    `page_item_type` STRING COMMENT '目标类型',
    `last_page_id` STRING COMMENT '上页id',
    `page_id` STRING COMMENT '页面ID',
    `source_type` STRING COMMENT '来源类型',
    `ts` BIGINT COMMENT '时间'
) COMMENT '页面日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_page_log'
TBLPROPERTIES('parquet.compression'='lzo');

2)数据导入

hive (gmall)>
insert overwrite table dwd_page_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.page') is not null;

3)查看数据

hive (gmall)>

select * from dwd_page_log where dt='2020-06-14' limit 2;

6.1.5 动作日志表

动作日志解析思路:动作日志表中每行数据对应用户的一个动作记录,一个动作记录应当包含公共信息、页面信息以及动作信息。先将包含action字段的日志过滤出来,再通过UDTF函数将action数组“炸开”(效果类似explode函数,见下方示意),最后使用get_json_object函数解析各个字段。
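“炸开”的效果示意(假设explode_json_array函数已按本节后文的方式注册,输入为假设的JSON数组字面量):

hive (gmall)>
select explode_json_array('[{"action_id":"cart_add"},{"action_id":"favor_add"}]');

结果是两行,每行一个JSON对象:
{"action_id":"cart_add"}
{"action_id":"favor_add"}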

1)建表语句

DROP TABLE IF EXISTS dwd_action_log;
CREATE EXTERNAL TABLE dwd_action_log
(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `during_time` BIGINT COMMENT '持续时间毫秒',
    `page_item` STRING COMMENT '目标id',
    `page_item_type` STRING COMMENT '目标类型',
    `last_page_id` STRING COMMENT '上页id',
    `page_id` STRING COMMENT '页面id',
    `source_type` STRING COMMENT '来源类型',
    `action_id` STRING COMMENT '动作id',
    `item` STRING COMMENT '目标id',
    `item_type` STRING COMMENT '目标类型',
    `ts` BIGINT COMMENT '时间'
) COMMENT '动作日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_action_log'
TBLPROPERTIES('parquet.compression'='lzo');

2)创建UDTF函数——设计思路

3)创建UDTF函数——编写代码

(1)创建一个maven工程hivefunction
(2)创建包名:com.seven.hive.udtf
(3)引入如下依赖

<dependencies>
    <!--添加hive依赖-->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>

(4)编码

package com.seven.hive.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;

import java.util.ArrayList;
import java.util.List;

public class ExplodeJSONArray extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        // 1 参数合法性检查
        if (argOIs.length != 1) {
            throw new UDFArgumentException("explode_json_array 只需要一个参数");
        }
        // 2 第一个参数必须为string
        // 判断参数是否为基础数据类型
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("explode_json_array 只接受基础类型参数");
        }
        // 将参数对象检查器强转为基础类型对象检查器
        PrimitiveObjectInspector argumentOI = (PrimitiveObjectInspector) argOIs[0];
        // 判断参数是否为String类型
        if (argumentOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("explode_json_array 只接受string类型的参数");
        }
        // 3 定义返回值名称和类型
        List<String> fieldNames = new ArrayList<String>();
        List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("items");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] objects) throws HiveException {
        // 1 获取传入的数据
        String jsonArray = objects[0].toString();
        // 2 将string转换为json数组
        JSONArray actions = new JSONArray(jsonArray);
        // 3 每次循环取出数组中的一个json对象,作为一行写出
        for (int i = 0; i < actions.length(); i++) {
            String[] result = new String[1];
            result[0] = actions.getString(i);
            forward(result);
        }
    }

    @Override
    public void close() throws HiveException {
    }
}

4)创建函数

(1)打包
(2)将hivefunction-1.0-SNAPSHOT.jar上传到hadoop102的/opt/module,然后再将该jar包上传到HDFS的/user/hive/jars路径下
[seven@hadoop102 module]$ hadoop fs -mkdir -p /user/hive/jars
[seven@hadoop102 module]$ hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars

(3)创建永久函数与开发好的java class关联

create function explode_json_array as 'com.seven.hive.udtf.ExplodeJSONArray' using jar 'hdfs://hadoop102:8020/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar';

(4)注意:如果修改了自定义函数重新生成jar包怎么处理?只需要替换HDFS路径上的旧jar包,然后重启Hive客户端即可。

5)数据导入

insert overwrite table dwd_action_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(action,'$.action_id'),
    get_json_object(action,'$.item'),
    get_json_object(action,'$.item_type'),
    get_json_object(action,'$.ts')
from ods_log lateral view explode_json_array(get_json_object(line,'$.actions')) tmp as action
where dt='2020-06-14'
and get_json_object(line,'$.actions') is not null;
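lateral view的作用是把UDTF“炸开”出的多行与原始日志行做连接,使每个action都能带上该行的公共信息和页面信息。一个最小示意(表与数据均为假设):

hive (gmall)>
select t.mid, tmp.action
from
(
    select 'mid_001' mid, '[{"action_id":"cart_add"},{"action_id":"favor_add"}]' actions
) t
lateral view explode_json_array(t.actions) tmp as action;

结果是两行:mid_001分别对应两个action的JSON对象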

6)查看数据
        
select * from dwd_action_log where dt='2020-06-14' limit 2;

6.1.6 曝光日志表

        曝光日志解析思路:曝光日志表中每行数据对应一个曝光记录,一个曝光记录应当包含公共信息、页面信息以及曝光信息。先将包含display字段的日志过滤出来,然后通过UDTF函数,将display数组“炸开”(类似于explode函数的效果),然后使用get_json_object函数解析每个字段。

1)建表语句

DROP TABLE IF EXISTS dwd_display_log;
CREATE EXTERNAL TABLE dwd_display_log
(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `during_time` BIGINT COMMENT '持续时间毫秒',
    `page_item` STRING COMMENT '目标id',
    `page_item_type` STRING COMMENT '目标类型',
    `last_page_id` STRING COMMENT '上页id',
    `page_id` STRING COMMENT '页面ID',
    `source_type` STRING COMMENT '来源类型',
    `ts` BIGINT COMMENT '时间',
    `display_type` STRING COMMENT '曝光类型',
    `item` STRING COMMENT '曝光对象id',
    `item_type` STRING COMMENT '曝光对象类型',
    `order` BIGINT COMMENT '曝光顺序',
    `pos_id` BIGINT COMMENT '曝光位置'
) COMMENT '曝光日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_display_log'
TBLPROPERTIES('parquet.compression'='lzo'); 

2)数据导入

insert overwrite table dwd_display_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts'),
    get_json_object(display,'$.display_type'),
    get_json_object(display,'$.item'),
    get_json_object(display,'$.item_type'),
    get_json_object(display,'$.order'),
    get_json_object(display,'$.pos_id')
from ods_log lateral view explode_json_array(get_json_object(line,'$.displays')) tmp as display
where dt='2020-06-14'
and get_json_object(line,'$.displays') is not null;

3)查看数据

select * from dwd_display_log where dt='2020-06-14' limit 2;

6.1.7 错误日志表

错误日志解析思路:错误日志表中每行数据对应一个错误记录,为方便定位错误,一个错误记录应当包含与之对应的公共信息、页面信息、曝光信息、动作信息、启动信息以及错误信息。先将包含err字段的日志过滤出来,然后使用get_json_object函数解析所有字段。

1)建表语句

DROP TABLE IF EXISTS dwd_error_log;
CREATE EXTERNAL TABLE dwd_error_log
(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `page_item` STRING COMMENT '目标id',
    `page_item_type` STRING COMMENT '目标类型',
    `last_page_id` STRING COMMENT '上页id',
    `page_id` STRING COMMENT '页面ID',
    `source_type` STRING COMMENT '来源类型',
    `entry` STRING COMMENT 'icon手机图标 notice 通知 install 安装后启动',
    `loading_time` STRING COMMENT '启动加载时间',
    `open_ad_id` STRING COMMENT '广告页ID',
    `open_ad_ms` STRING COMMENT '广告总共播放时间',
    `open_ad_skip_ms` STRING COMMENT '用户跳过广告时点',
    `actions` STRING COMMENT '动作',
    `displays` STRING COMMENT '曝光',
    `ts` STRING COMMENT '时间',
    `error_code` STRING COMMENT '错误码',
    `msg` STRING COMMENT '错误信息'
) COMMENT '错误日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_error_log'
TBLPROPERTIES('parquet.compression'='lzo');

说明:此处未对动作数组和曝光数组做展开处理,如需分析错误与单个动作或曝光的关联,可先使用explode_json_array函数将数组“炸开”,再使用get_json_object函数获取具体字段。

2)数据导入

insert overwrite table dwd_error_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.actions'),
    get_json_object(line,'$.displays'),
    get_json_object(line,'$.ts'),
    get_json_object(line,'$.err.error_code'),
    get_json_object(line,'$.err.msg')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.err') is not null;

3)查看数据

hive (gmall)>

select * from dwd_error_log where dt='2020-06-14' limit 2;

6.1.8 DWD层用户行为数据加载脚本

1)编写脚本

(1)在hadoop102的/home/seven/bin目录下创建脚本

[seven@hadoop102 bin]$ vim ods_to_dwd_log.sh

在脚本中编写如下内容

#!/bin/bash

APP=gmall
# 如果输入了日期参数则取输入的日期;如果没输入日期则取当前时间的前一天
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dwd_start_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_start_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.start') is not null;"

dwd_page_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_page_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.page') is not null;"

dwd_action_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_action_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(action,'$.action_id'),
    get_json_object(action,'$.item'),
    get_json_object(action,'$.item_type'),
    get_json_object(action,'$.ts')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.actions')) tmp as action
where dt='$do_date'
and get_json_object(line,'$.actions') is not null;"

dwd_display_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_display_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts'),
    get_json_object(display,'$.display_type'),
    get_json_object(display,'$.item'),
    get_json_object(display,'$.item_type'),
    get_json_object(display,'$.order'),
    get_json_object(display,'$.pos_id')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.displays')) tmp as display
where dt='$do_date'
and get_json_object(line,'$.displays') is not null;"

dwd_error_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_error_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.actions'),
    get_json_object(line,'$.displays'),
    get_json_object(line,'$.ts'),
    get_json_object(line,'$.err.error_code'),
    get_json_object(line,'$.err.msg')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.err') is not null;"

case $1 in
dwd_start_log )
    hive -e "$dwd_start_log"
;;
dwd_page_log )
    hive -e "$dwd_page_log"
;;
dwd_action_log )
    hive -e "$dwd_action_log"
;;
dwd_display_log )
    hive -e "$dwd_display_log"
;;
dwd_error_log )
    hive -e "$dwd_error_log"
;;
all )
    hive -e "$dwd_start_log$dwd_page_log$dwd_action_log$dwd_display_log$dwd_error_log"
;;
esac

(2)增加脚本执行权限

[seven@hadoop102 bin]$ chmod 777 ods_to_dwd_log.sh

2)脚本使用

(1)执行脚本

[seven@hadoop102 module]$ ods_to_dwd_log.sh all 2020-06-14

(2)查询导入结果

6.2 DWD层(业务数据)

业务数据方面DWD层的搭建主要注意点在于维度建模。

6.2.1 评价事实表(事务型事实表)

1)建表语句

DROP TABLE IF EXISTS dwd_comment_info;
CREATE EXTERNAL TABLE dwd_comment_info
(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户ID',
    `sku_id` STRING COMMENT '商品sku',
    `spu_id` STRING COMMENT '商品spu',
    `order_id` STRING COMMENT '订单ID',
    `appraise` STRING COMMENT '评价(好评、中评、差评、默认评价)',
    `create_time` STRING COMMENT '评价时间'
) COMMENT '评价事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_comment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_comment_info partition (dt)
select id, user_id, sku_id, spu_id, order_id, appraise, create_time, date_format(create_time,'yyyy-MM-dd')
from ods_comment_info
where dt='2020-06-14';
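首日装载采用动态分区:insert语句的最后一列date_format(create_time,'yyyy-MM-dd')就是该行要写入的dt分区值,历史数据因此被路由到各自的日期分区(动态分区需将hive.exec.dynamic.partition.mode设为nonstrict,6.2.10的脚本中已设置)。示意(字面量为假设数据):

hive (gmall)>
select date_format('2020-06-12 10:30:00','yyyy-MM-dd');

结果是:2020-06-12,即这样一条历史评价会被写入dt=2020-06-12分区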

(2)每日装载

insert overwrite table dwd_comment_info partition(dt='2020-06-15')
select id, user_id, sku_id, spu_id, order_id, appraise, create_time
from ods_comment_info where dt='2020-06-15';

6.2.2 订单明细事实表(事务型事实表)

1)建表语句

DROP TABLE IF EXISTS dwd_order_detail;
CREATE EXTERNAL TABLE dwd_order_detail
(
    `id` STRING COMMENT '订单编号',
    `order_id` STRING COMMENT '订单号',
    `user_id` STRING COMMENT '用户id',
    `sku_id` STRING COMMENT 'sku商品id',
    `province_id` STRING COMMENT '省份ID',
    `activity_id` STRING COMMENT '活动ID',
    `activity_rule_id` STRING COMMENT '活动规则ID',
    `coupon_id` STRING COMMENT '优惠券ID',
    `create_time` STRING COMMENT '创建时间',
    `source_type` STRING COMMENT '来源类型',
    `source_id` STRING COMMENT '来源编号',
    `sku_num` BIGINT COMMENT '商品数量',
    `original_amount` DECIMAL(16,2) COMMENT '原始价格',
    `split_activity_amount` DECIMAL(16,2) COMMENT '活动优惠分摊',
    `split_coupon_amount` DECIMAL(16,2) COMMENT '优惠券优惠分摊',
    `split_final_amount` DECIMAL(16,2) COMMENT '最终价格分摊'
) COMMENT '订单明细事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_detail/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_order_detail partition(dt)
select
    od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num,
    od.order_price*od.sku_num, -- 原始金额 = 单价*件数
    od.split_activity_amount, od.split_coupon_amount, od.split_final_amount,
    date_format(create_time,'yyyy-MM-dd')
from
(
    select * from ods_order_detail where dt='2020-06-14'
) od
left join
(
    select id, user_id, province_id from ods_order_info where dt='2020-06-14'
) oi
on od.order_id=oi.id
left join
(
    select order_detail_id, activity_id, activity_rule_id from ods_order_detail_activity where dt='2020-06-14'
) oda
on od.id=oda.order_detail_id
left join
(
    select order_detail_id, coupon_id from ods_order_detail_coupon where dt='2020-06-14'
) odc
on od.id=odc.order_detail_id;

(2)每日装载

insert overwrite table dwd_order_detail partition(dt='2020-06-15')
select
    od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num,
    od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount
from
(
    select * from ods_order_detail where dt='2020-06-15'
) od
left join
(
    select id, user_id, province_id from ods_order_info where dt='2020-06-15'
) oi
on od.order_id=oi.id
left join
(
    select order_detail_id, activity_id, activity_rule_id from ods_order_detail_activity where dt='2020-06-15'
) oda
on od.id=oda.order_detail_id
left join
(
    select order_detail_id, coupon_id from ods_order_detail_coupon where dt='2020-06-15'
) odc
on od.id=odc.order_detail_id;

6.2.3 退单事实表(事务型事实表)

1)建表语句

DROP TABLE IF EXISTS dwd_order_refund_info;
CREATE EXTERNAL TABLE dwd_order_refund_info
(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户ID',
    `order_id` STRING COMMENT '订单ID',
    `sku_id` STRING COMMENT '商品ID',
    `province_id` STRING COMMENT '地区ID',
    `refund_type` STRING COMMENT '退单类型',
    `refund_num` BIGINT COMMENT '退单件数',
    `refund_amount` DECIMAL(16,2) COMMENT '退单金额',
    `refund_reason_type` STRING COMMENT '退单原因类型',
    `create_time` STRING COMMENT '退单时间'
) COMMENT '退单事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_refund_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_order_refund_info partition(dt)
select
    ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time,
    date_format(ri.create_time,'yyyy-MM-dd')
from
(
    select * from ods_order_refund_info where dt='2020-06-14'
) ri
left join
(
    select id, province_id from ods_order_info where dt='2020-06-14'
) oi
on ri.order_id=oi.id;

(2)每日装载

insert overwrite table dwd_order_refund_info partition(dt='2020-06-15')
select
    ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time
from
(
    select * from ods_order_refund_info where dt='2020-06-15'
) ri
left join
(
    select id, province_id from ods_order_info where dt='2020-06-15'
) oi
on ri.order_id=oi.id;

4)查询加载结果

6.2.4 加购事实表(周期型快照事实表,每日快照)

1)建表语句

DROP TABLE IF EXISTS dwd_cart_info;
CREATE EXTERNAL TABLE dwd_cart_info
(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户ID',
    `sku_id` STRING COMMENT '商品ID',
    `source_type` STRING COMMENT '来源类型',
    `source_id` STRING COMMENT '来源编号',
    `cart_price` DECIMAL(16,2) COMMENT '加入购物车时的价格',
    `is_ordered` STRING COMMENT '是否已下单',
    `create_time` STRING COMMENT '创建时间',
    `operate_time` STRING COMMENT '修改时间',
    `order_time` STRING COMMENT '下单时间',
    `sku_num` BIGINT COMMENT '加购数量'
) COMMENT '加购事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_cart_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_cart_info partition(dt='2020-06-14')
select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num
from ods_cart_info
where dt='2020-06-14';

(2)每日装载

insert overwrite table dwd_cart_info partition(dt='2020-06-15')
select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num
from ods_cart_info
where dt='2020-06-15';

6.2.5 收藏事实表(周期型快照事实表,每日快照)

1)建表语句

DROP TABLE IF EXISTS dwd_favor_info;
CREATE EXTERNAL TABLE dwd_favor_info
(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户id',
    `sku_id` STRING COMMENT 'skuid',
    `spu_id` STRING COMMENT 'spuid',
    `is_cancel` STRING COMMENT '是否取消',
    `create_time` STRING COMMENT '收藏时间',
    `cancel_time` STRING COMMENT '取消时间'
) COMMENT '收藏事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_favor_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_favor_info partition(dt='2020-06-14')
select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time
from ods_favor_info
where dt='2020-06-14';

(2)每日装载

insert overwrite table dwd_favor_info partition(dt='2020-06-15')
select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time
from ods_favor_info
where dt='2020-06-15';

6.2.6 优惠券领用事实表(累积型快照事实表)

1)建表语句

DROP TABLE IF EXISTS dwd_coupon_use;
CREATE EXTERNAL TABLE dwd_coupon_use
(
    `id` STRING COMMENT '编号',
    `coupon_id` STRING COMMENT '优惠券ID',
    `user_id` STRING COMMENT '用户ID',
    `order_id` STRING COMMENT '订单id',
    `coupon_status` STRING COMMENT '优惠券状态',
    `get_time` STRING COMMENT '领取时间',
    `using_time` STRING COMMENT '使用时间(下单)',
    `used_time` STRING COMMENT '使用时间(支付)',
    `expire_time` STRING COMMENT '过期时间'
) COMMENT '优惠券领用事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_coupon_use/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_coupon_use partition(dt)
select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time, coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
from ods_coupon_use
where dt='2020-06-14';

(2)每日装载

a.装载逻辑

每日装载时,将dwd_coupon_use表9999-99-99分区中尚未完结的旧数据,与当日ODS新到的数据按id做full outer join,用nvl优先取新值进行合并;再由coalesce根据最新的使用(支付)时间和过期时间重新计算每条记录所属的分区:已完结的记录进入对应日期分区,未完结的记录仍写回9999-99-99分区。

b.装载语句

insert overwrite table dwd_coupon_use partition(dt)
select
    nvl(new.id,old.id), nvl(new.coupon_id,old.coupon_id), nvl(new.user_id,old.user_id), nvl(new.order_id,old.order_id), nvl(new.coupon_status,old.coupon_status), nvl(new.get_time,old.get_time), nvl(new.using_time,old.using_time), nvl(new.used_time,old.used_time), nvl(new.expire_time,old.expire_time),
    coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time
    from dwd_coupon_use
    where dt='9999-99-99'
) old
full outer join
(
    select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time
    from ods_coupon_use
    where dt='2020-06-15'
) new
on old.id=new.id;
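上述语句依赖nvl与coalesce的取值规则:nvl(新值,旧值)在新值非NULL时取新值,否则保留旧值;coalesce返回参数列表中第一个非NULL的值,用来决定记录落入的分区。示意(字面量均为假设数据):

hive (gmall)>
select nvl('2020-06-15',null);                      -- 返回2020-06-15:当日有新数据,取新值
select nvl(null,'9999-99-99');                      -- 返回9999-99-99:当日无新数据,保留旧值
select coalesce(null,'2020-06-15','9999-99-99');    -- 返回2020-06-15:第一个非NULL值决定分区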

6.2.7 支付事实表(累积快照事实表)

1)建表语句

DROP TABLE IF EXISTS dwd_payment_info;
CREATE EXTERNAL TABLE dwd_payment_info
(
    `id` STRING COMMENT '编号',
    `order_id` STRING COMMENT '订单编号',
    `user_id` STRING COMMENT '用户编号',
    `province_id` STRING COMMENT '地区ID',
    `trade_no` STRING COMMENT '交易编号',
    `out_trade_no` STRING COMMENT '对外交易编号',
    `payment_type` STRING COMMENT '支付类型',
    `payment_amount` DECIMAL(16,2) COMMENT '支付金额',
    `payment_status` STRING COMMENT '支付状态',
    `create_time` STRING COMMENT '创建时间', -- 调用第三方支付接口的时间
    `callback_time` STRING COMMENT '完成时间' -- 支付完成时间,即支付成功回调时间
) COMMENT '支付事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_payment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_payment_info partition(dt)
select
    pi.id, pi.order_id, pi.user_id, oi.province_id, pi.trade_no, pi.out_trade_no, pi.payment_type, pi.payment_amount, pi.payment_status, pi.create_time, pi.callback_time,
    nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(select * from ods_payment_info where dt='2020-06-14'
)pi
left join
(select id,province_id from ods_order_info where dt='2020-06-14'
)oi
on pi.order_id=oi.id;

(2)每日装载

insert overwrite table dwd_payment_info partition(dt)
select
    nvl(new.id,old.id), nvl(new.order_id,old.order_id), nvl(new.user_id,old.user_id), nvl(new.province_id,old.province_id), nvl(new.trade_no,old.trade_no), nvl(new.out_trade_no,old.out_trade_no), nvl(new.payment_type,old.payment_type), nvl(new.payment_amount,old.payment_amount), nvl(new.payment_status,old.payment_status), nvl(new.create_time,old.create_time), nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, order_id, user_id, province_id, trade_no, out_trade_no, payment_type, payment_amount, payment_status, create_time, callback_time
    from dwd_payment_info
    where dt = '9999-99-99'
) old
full outer join
(
    select pi.id, pi.out_trade_no, pi.order_id, pi.user_id, oi.province_id, pi.payment_type, pi.trade_no, pi.payment_amount, pi.payment_status, pi.create_time, pi.callback_time
    from
    (
        select * from ods_payment_info where dt='2020-06-15'
    ) pi
    left join
    (
        select id, province_id from ods_order_info where dt='2020-06-15'
    ) oi
    on pi.order_id=oi.id
) new
on old.id=new.id;

6.2.8 退款事实表(累积快照事实表)

1)建表语句

DROP TABLE IF EXISTS dwd_refund_payment;
CREATE EXTERNAL TABLE dwd_refund_payment
(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户ID',
    `order_id` STRING COMMENT '订单编号',
    `sku_id` STRING COMMENT 'SKU编号',
    `province_id` STRING COMMENT '地区ID',
    `trade_no` STRING COMMENT '交易编号',
    `out_trade_no` STRING COMMENT '对外交易编号',
    `payment_type` STRING COMMENT '支付类型',
    `refund_amount` DECIMAL(16,2) COMMENT '退款金额',
    `refund_status` STRING COMMENT '退款状态',
    `create_time` STRING COMMENT '创建时间', -- 调用第三方支付接口的时间
    `callback_time` STRING COMMENT '回调时间' -- 支付接口回调时间,即支付成功时间
) COMMENT '退款事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_refund_payment/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_refund_payment partition(dt)
select
    rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time,
    nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time
    from ods_refund_payment
    where dt='2020-06-14'
) rp
left join
(
    select id, user_id, province_id
    from ods_order_info
    where dt='2020-06-14'
) oi
on rp.order_id=oi.id;

(2)每日装载

insert overwrite table dwd_refund_payment partition(dt)
select
    nvl(new.id,old.id), nvl(new.user_id,old.user_id), nvl(new.order_id,old.order_id), nvl(new.sku_id,old.sku_id), nvl(new.province_id,old.province_id), nvl(new.trade_no,old.trade_no), nvl(new.out_trade_no,old.out_trade_no), nvl(new.payment_type,old.payment_type), nvl(new.refund_amount,old.refund_amount), nvl(new.refund_status,old.refund_status), nvl(new.create_time,old.create_time), nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time
    from dwd_refund_payment
    where dt='9999-99-99'
) old
full outer join
(
    select rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time
    from
    (
        select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time
        from ods_refund_payment
        where dt='2020-06-15'
    ) rp
    left join
    (
        select id, user_id, province_id
        from ods_order_info
        where dt='2020-06-15'
    ) oi
    on rp.order_id=oi.id
) new
on old.id=new.id;

4)查询加载结果

6.2.9 订单事实表(累积型快照事实表)

1)建表语句

DROP TABLE IF EXISTS dwd_order_info;
CREATE EXTERNAL TABLE dwd_order_info
(
    `id` STRING COMMENT '编号',
    `order_status` STRING COMMENT '订单状态',
    `user_id` STRING COMMENT '用户ID',
    `province_id` STRING COMMENT '地区ID',
    `payment_way` STRING COMMENT '支付方式',
    `delivery_address` STRING COMMENT '邮寄地址',
    `out_trade_no` STRING COMMENT '对外交易编号',
    `tracking_no` STRING COMMENT '物流单号',
    `create_time` STRING COMMENT '创建时间(未支付状态)',
    `payment_time` STRING COMMENT '支付时间(已支付状态)',
    `cancel_time` STRING COMMENT '取消时间(已取消状态)',
    `finish_time` STRING COMMENT '完成时间(已完成状态)',
    `refund_time` STRING COMMENT '退款时间(退款中状态)',
    `refund_finish_time` STRING COMMENT '退款完成时间(退款完成状态)',
    `expire_time` STRING COMMENT '过期时间',
    `feight_fee` DECIMAL(16,2) COMMENT '运费',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT '运费减免',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT '活动减免',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT '优惠券减免',
    `original_amount` DECIMAL(16,2) COMMENT '订单原始价格',
    `final_amount` DECIMAL(16,2) COMMENT '订单最终价格'
) COMMENT '订单事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_info/'
TBLPROPERTIES ("parquet.compression"="lzo");

2)分区规划

3)数据装载

(1)首日装载

insert overwrite table dwd_order_info partition(dt)
select
    oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way, oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time,
    times.ts['1002'] payment_time,
    times.ts['1003'] cancel_time,
    times.ts['1004'] finish_time,
    times.ts['1005'] refund_time,
    times.ts['1006'] refund_finish_time,
    oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount,
    case
        when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
        when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='2020-06-14' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
        when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
        when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select * from ods_order_info where dt='2020-06-14'
) oi
left join
(
    select order_id, str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from ods_order_status_log
    where dt='2020-06-14'
    group by order_id
) times
on oi.id=times.order_id;
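其中times子查询先用collect_set(concat(order_status,'=',operate_time))把同一订单的多条状态日志收集为数组,再经concat_ws拼接成一个字符串,最后用str_to_map转成“状态编码->时间”的map,供外层按状态编码取时间。取值示意(字面量为假设数据):

hive (gmall)>
select str_to_map('1001=2020-06-14 10:00:00,1002=2020-06-14 10:05:00',',','=')['1002'];

结果是:2020-06-14 10:05:00,即该订单支付状态(1002)对应的时间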

(2)每日装载

insert overwrite table dwd_order_info partition(dt)
select
    nvl(new.id,old.id), nvl(new.order_status,old.order_status), nvl(new.user_id,old.user_id), nvl(new.province_id,old.province_id), nvl(new.payment_way,old.payment_way), nvl(new.delivery_address,old.delivery_address), nvl(new.out_trade_no,old.out_trade_no), nvl(new.tracking_no,old.tracking_no), nvl(new.create_time,old.create_time), nvl(new.payment_time,old.payment_time), nvl(new.cancel_time,old.cancel_time), nvl(new.finish_time,old.finish_time), nvl(new.refund_time,old.refund_time), nvl(new.refund_finish_time,old.refund_finish_time), nvl(new.expire_time,old.expire_time), nvl(new.feight_fee,old.feight_fee), nvl(new.feight_fee_reduce,old.feight_fee_reduce), nvl(new.activity_reduce_amount,old.activity_reduce_amount), nvl(new.coupon_reduce_amount,old.coupon_reduce_amount), nvl(new.original_amount,old.original_amount), nvl(new.final_amount,old.final_amount),
    case
        when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
        when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='2020-06-15' and new.refund_time is null then '2020-06-15'
        when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
        when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select id, order_status, user_id, province_id, payment_way, delivery_address, out_trade_no, tracking_no, create_time, payment_time, cancel_time, finish_time, refund_time, refund_finish_time, expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount
    from dwd_order_info
    where dt='9999-99-99'
) old
full outer join
(
    select
        oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way, oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount
    from
    (
        select * from ods_order_info where dt='2020-06-15'
    ) oi
    left join
    (
        select order_id, str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ods_order_status_log
        where dt='2020-06-15'
        group by order_id
    ) times
    on oi.id=times.order_id
) new
on old.id=new.id;

6.2.10 DWD层业务数据首日装载脚本

1)编写脚本

(1)在/home/seven/bin目录下创建脚本ods_to_dwd_db_init.sh

[seven@hadoop102 bin]$ vim ods_to_dwd_db_init.sh

在脚本中填写如下内容

#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "请传入日期参数"
    exit
fi

dwd_order_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_info partition(dt)
select
    oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way, oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time,
    times.ts['1002'] payment_time,
    times.ts['1003'] cancel_time,
    times.ts['1004'] finish_time,
    times.ts['1005'] refund_time,
    times.ts['1006'] refund_finish_time,
    oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount,
    case
        when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
        when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='$do_date' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
        when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
        when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select * from ${APP}.ods_order_info where dt='$do_date'
) oi
left join
(
    select order_id, str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from ${APP}.ods_order_status_log
    where dt='$do_date'
    group by order_id
) times
on oi.id=times.order_id;"

dwd_order_detail="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt)
select
    od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num,
    od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount,
    date_format(create_time,'yyyy-MM-dd')
from
(
    select * from ${APP}.ods_order_detail where dt='$do_date'
) od
left join
(
    select id, user_id, province_id from ${APP}.ods_order_info where dt='$do_date'
) oi
on od.order_id=oi.id
left join
(
    select order_detail_id, activity_id, activity_rule_id from ${APP}.ods_order_detail_activity where dt='$do_date'
) oda
on od.id=oda.order_detail_id
left join
(
    select order_detail_id, coupon_id from ${APP}.ods_order_detail_coupon where dt='$do_date'
) odc
on od.id=odc.order_detail_id;"

dwd_payment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
    pi.id, pi.order_id, pi.user_id, oi.province_id, pi.trade_no, pi.out_trade_no, pi.payment_type, pi.payment_amount, pi.payment_status, pi.create_time, pi.callback_time,
    nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select * from ${APP}.ods_payment_info where dt='$do_date'
) pi
left join
(
    select id, province_id from ${APP}.ods_order_info where dt='$do_date'
) oi
on pi.order_id=oi.id;"

dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"

dwd_comment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt)
select id, user_id, sku_id, spu_id, order_id, appraise, create_time, date_format(create_time,'yyyy-MM-dd')
from ${APP}.ods_comment_info
where dt='$do_date';
"

dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"

dwd_coupon_use="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time, coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
from ${APP}.ods_coupon_use
where dt='$do_date';"

dwd_order_refund_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt)
select
    ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time,
    date_format(ri.create_time,'yyyy-MM-dd')
from
(
    select * from ${APP}.ods_order_refund_info where dt='$do_date'
) ri
left join
(
    select id, province_id from ${APP}.ods_order_info where dt='$do_date'
) oi
on ri.order_id=oi.id;"

dwd_refund_payment="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
    rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time,
    nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time
    from ${APP}.ods_refund_payment
    where dt='$do_date'
) rp
left join
(
    select id, user_id, province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
) oi
on rp.order_id=oi.id;"

case $1 in
dwd_order_info )
    hive -e "$dwd_order_info"
;;
dwd_order_detail )
    hive -e "$dwd_order_detail"
;;
dwd_payment_info )
    hive -e "$dwd_payment_info"
;;
dwd_cart_info )
    hive -e "$dwd_cart_info"
;;
dwd_comment_info )
    hive -e "$dwd_comment_info"
;;
dwd_favor_info )
    hive -e "$dwd_favor_info"
;;
dwd_coupon_use )
    hive -e "$dwd_coupon_use"
;;
dwd_order_refund_info )
    hive -e "$dwd_order_refund_info"
;;
dwd_refund_payment )
    hive -e "$dwd_refund_payment"
;;
all )
    hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
;;
esac

(2)增加执行权限

[seven@hadoop102 bin]$ chmod +x ods_to_dwd_db_init.sh

2)脚本使用

(1)执行脚本

[seven@hadoop102 bin]$ ods_to_dwd_db_init.sh all 2020-06-14

(2)查看数据是否导入成功

6.2.11 DWD层业务数据每日装载脚本

1)编写脚本

(1)在/home/seven/bin目录下创建脚本ods_to_dwd_db.sh

[seven@hadoop102 bin]$ vim ods_to_dwd_db.sh

在脚本中填写如下内容

#!/bin/bashAPP=gmall
# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$2" ] ;thendo_date=$2
else do_date=`date -d "-1 day" +%F`
fi# 假设某累积型快照事实表,某天所有的业务记录全部完成,则会导致9999-99-99分区的数据未被覆盖,从而导致数据重复,该函数根据9999-99-99分区的数据的末次修改时间判断其是否被覆盖了,如果未被覆盖,就手动清理
clear_data(){current_date=`date +%F`current_date_timestamp=`date -d "$current_date" +%s`last_modified_date=`hadoop fs -ls /warehouse/gmall/dwd/$1 | grep '9999-99-99' | awk '{print $6}'`last_modified_date_timestamp=`date -d "$last_modified_date" +%s`if [[ $last_modified_date_timestamp -lt $current_date_timestamp ]]; thenecho "clear table $1 partition(dt=9999-99-99)"hadoop fs -rm -r -f /warehouse/gmall/dwd/$1/dt=9999-99-99/*fi
}dwd_order_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_order_info partition(dt)
selectnvl(new.id,old.id),nvl(new.order_status,old.order_status),nvl(new.user_id,old.user_id),nvl(new.province_id,old.province_id),nvl(new.payment_way,old.payment_way),nvl(new.delivery_address,old.delivery_address),nvl(new.out_trade_no,old.out_trade_no),nvl(new.tracking_no,old.tracking_no),nvl(new.create_time,old.create_time),nvl(new.payment_time,old.payment_time),nvl(new.cancel_time,old.cancel_time),nvl(new.finish_time,old.finish_time),nvl(new.refund_time,old.refund_time),nvl(new.refund_finish_time,old.refund_finish_time),nvl(new.expire_time,old.expire_time),nvl(new.feight_fee,old.feight_fee),nvl(new.feight_fee_reduce,old.feight_fee_reduce),nvl(new.activity_reduce_amount,old.activity_reduce_amount),nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),nvl(new.original_amount,old.original_amount),nvl(new.final_amount,old.final_amount),casewhen new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='$do_date' and new.refund_time is null then '$do_date'when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')else '9999-99-99'end
from
(selectid,order_status,user_id,province_id,payment_way,delivery_address,out_trade_no,tracking_no,create_time,payment_time,cancel_time,finish_time,refund_time,refund_finish_time,expire_time,feight_fee,feight_fee_reduce,activity_reduce_amount,coupon_reduce_amount,original_amount,final_amountfrom ${APP}.dwd_order_infowhere dt='9999-99-99'
)old
full outer join
(selectoi.id,oi.order_status,oi.user_id,oi.province_id,oi.payment_way,oi.delivery_address,oi.out_trade_no,oi.tracking_no,oi.create_time,times.ts['1002'] payment_time,times.ts['1003'] cancel_time,times.ts['1004'] finish_time,times.ts['1005'] refund_time,times.ts['1006'] refund_finish_time,oi.expire_time,feight_fee,feight_fee_reduce,activity_reduce_amount,coupon_reduce_amount,original_amount,final_amountfrom(select*from ${APP}.ods_order_infowhere dt='$do_date')oileft join(selectorder_id,str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') tsfrom ${APP}.ods_order_status_logwhere dt='$do_date'group by order_id)timeson oi.id=times.order_id
)new
on old.id=new.id;"dwd_order_detail="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt='$do_date')
selectod.id,od.order_id,oi.user_id,od.sku_id,oi.province_id,oda.activity_id,oda.activity_rule_id,odc.coupon_id,od.create_time,od.source_type,od.source_id,od.sku_num,od.order_price*od.sku_num,od.split_activity_amount,od.split_coupon_amount,od.split_final_amount
from
(select*from ${APP}.ods_order_detailwhere dt='$do_date'
)od
left join
(selectid,user_id,province_idfrom ${APP}.ods_order_infowhere dt='$do_date'
)oi
on od.order_id=oi.id
left join
(selectorder_detail_id,activity_id,activity_rule_idfrom ${APP}.ods_order_detail_activitywhere dt='$do_date'
)oda
on od.id=oda.order_detail_id
left join
(selectorder_detail_id,coupon_idfrom ${APP}.ods_order_detail_couponwhere dt='$do_date'
)odc
on od.id=odc.order_detail_id;"dwd_payment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_id,old.order_id),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.payment_amount,old.payment_amount),
    nvl(new.payment_status,old.payment_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id,order_id,user_id,province_id,trade_no,out_trade_no,payment_type,payment_amount,payment_status,create_time,callback_time
    from ${APP}.dwd_payment_info
    where dt = '9999-99-99'
)old
full outer join
(
    select
        pi.id,pi.out_trade_no,pi.order_id,pi.user_id,oi.province_id,pi.payment_type,pi.trade_no,
        pi.payment_amount,pi.payment_status,pi.create_time,pi.callback_time
    from
    (
        select * from ${APP}.ods_payment_info where dt='$do_date'
    )pi
    left join
    (
        select id,province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on pi.order_id=oi.id
)new
on old.id=new.id;"

dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select id,user_id,sku_id,source_type,source_id,cart_price,is_ordered,create_time,operate_time,order_time,sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"

dwd_comment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt='$do_date')
select id,user_id,sku_id,spu_id,order_id,appraise,create_time
from ${APP}.ods_comment_info
where dt='$do_date';"

dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select id,user_id,sku_id,spu_id,is_cancel,create_time,cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"

dwd_coupon_use="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.coupon_id,old.coupon_id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.coupon_status,old.coupon_status),
    nvl(new.get_time,old.get_time),
    nvl(new.using_time,old.using_time),
    nvl(new.used_time,old.used_time),
    nvl(new.expire_time,old.expire_time),
    coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id,coupon_id,user_id,order_id,coupon_status,get_time,using_time,used_time,expire_time
    from ${APP}.dwd_coupon_use
    where dt='9999-99-99'
)old
full outer join
(
    select id,coupon_id,user_id,order_id,coupon_status,get_time,using_time,used_time,expire_time
    from ${APP}.ods_coupon_use
    where dt='$do_date'
)new
on old.id=new.id;"

dwd_order_refund_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt='$do_date')
select
    ri.id,
    ri.user_id,
    ri.order_id,
    ri.sku_id,
    oi.province_id,
    ri.refund_type,
    ri.refund_num,
    ri.refund_amount,
    ri.refund_reason_type,
    ri.create_time
from
(
    select * from ${APP}.ods_order_refund_info where dt='$do_date'
)ri
left join
(
    select id,province_id from ${APP}.ods_order_info where dt='$do_date'
)oi
on ri.order_id=oi.id;"

dwd_refund_payment="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.sku_id,old.sku_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.refund_amount,old.refund_amount),
    nvl(new.refund_status,old.refund_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id,user_id,order_id,sku_id,province_id,trade_no,out_trade_no,payment_type,refund_amount,refund_status,create_time,callback_time
    from ${APP}.dwd_refund_payment
    where dt='9999-99-99'
)old
full outer join
(
    select
        rp.id,user_id,order_id,sku_id,province_id,trade_no,out_trade_no,payment_type,refund_amount,refund_status,create_time,callback_time
    from
    (
        select id,out_trade_no,order_id,sku_id,payment_type,trade_no,refund_amount,refund_status,create_time,callback_time
        from ${APP}.ods_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select id,user_id,province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on rp.order_id=oi.id
)new
on old.id=new.id;"

case $1 in
    dwd_order_info )
        hive -e "$dwd_order_info"
        clear_data dwd_order_info
    ;;
    dwd_order_detail )
        hive -e "$dwd_order_detail"
    ;;
    dwd_payment_info )
        hive -e "$dwd_payment_info"
        clear_data dwd_payment_info
    ;;
    dwd_cart_info )
        hive -e "$dwd_cart_info"
    ;;
    dwd_comment_info )
        hive -e "$dwd_comment_info"
    ;;
    dwd_favor_info )
        hive -e "$dwd_favor_info"
    ;;
    dwd_coupon_use )
        hive -e "$dwd_coupon_use"
        clear_data dwd_coupon_use
    ;;
    dwd_order_refund_info )
        hive -e "$dwd_order_refund_info"
    ;;
    dwd_refund_payment )
        hive -e "$dwd_refund_payment"
        clear_data dwd_refund_payment
    ;;
    all )
        hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
        clear_data dwd_order_info
        clear_data dwd_payment_info
        clear_data dwd_coupon_use
        clear_data dwd_refund_payment
    ;;
esac
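
Note on the dwd_order_info load above: str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') pivots an order's status-log rows for the day into a single key-value map, so each status code can then be looked up as times.ts['code'] (e.g. '1002' for payment_time, '1003' for cancel_time). A minimal illustration of the building block, with made-up literal values:

hive (gmall)> select str_to_map('1001=2020-06-14 10:00:00,1002=2020-06-14 10:05:00',',','=')['1002'];
2020-06-14 10:05:00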

(2) Grant the script execute permission

[seven@hadoop102 bin]$ chmod 777 ods_to_dwd_db.sh

2) Script usage

(1) Run the script

[seven@hadoop102 bin]$ ods_to_dwd_db.sh all 2020-06-14

(2) Check whether the data was loaded successfully
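
For example, a quick spot check (illustrative; any of the tables loaded above works the same way):

hive (gmall)> show partitions dwd_order_detail;
hive (gmall)> select * from dwd_order_detail where dt='2020-06-14' limit 5;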

Chapter 7 Data Warehouse Construction: DWS Layer

7.1 System Functions

7.1.1 The nvl Function

1) Basic syntax

NVL(expression1, expression2)

If expression1 is NULL, NVL returns the value of expression2; otherwise it returns the value of expression1.

The purpose of this function is to convert a NULL into an actual value. The expressions may be numeric, character, or date types, but expression1 and expression2 must have the same data type.

2) Hands-on examples

hive (gmall)> select nvl(1,0);
1

hive (gmall)> select nvl(null,"hello");
hello

7.1.2 Date Functions

1) date_format (format a date by pattern)

hive (gmall)> select date_format('2020-06-14','yyyy-MM');
2020-06

2) date_add (add or subtract days)

hive (gmall)> select date_add('2020-06-14',-1);
2020-06-13

hive (gmall)> select date_add('2020-06-14',1);
2020-06-15

3) next_day

(1) Get the next Monday after the given date

hive (gmall)> select next_day('2020-06-14','MO');
2020-06-15

Note: the day argument is the English day name (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday); abbreviations such as 'MO' also work, as used above.

(2) Get the Monday of the current week

hive (gmall)> select date_add(next_day('2020-06-14','MO'),-7);
2020-06-08

4) last_day (get the last day of the month)

hive (gmall)> select last_day('2020-06-14');
2020-06-30

7.1.3 Complex Data Type Definitions

1) map type definition
        map<string,string>

2) array type definition
        array<string>

3) struct type definition
        struct<id:int,name:string,age:int>

4) array nested with struct
        array<struct<id:int,name:string,age:int>>
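
These types appear throughout the DWS tables below (for example, page_stats in 7.2.1). A minimal sketch of how each is accessed in a query; the literal values here are illustrative only:

hive (gmall)> select map('k1','v1')['k1'];                          -- map: look up by key
hive (gmall)> select array('a','b','c')[0];                        -- array: 0-based index
hive (gmall)> select s.name from (select named_struct('id',1,'name','x') s) t;  -- struct: dot notation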

7.2 DWS

7.2.1 Visitor Theme

1) Table creation statement

DROP TABLE IF EXISTS dws_visitor_action_daycount;
CREATE EXTERNAL TABLE dws_visitor_action_daycount
(
    `mid_id` STRING COMMENT 'device id',
    `brand` STRING COMMENT 'device brand',
    `model` STRING COMMENT 'device model',
    `is_new` STRING COMMENT 'whether this was a first visit',
    `channel` ARRAY<STRING> COMMENT 'channels',
    `os` ARRAY<STRING> COMMENT 'operating systems',
    `area_code` ARRAY<STRING> COMMENT 'area codes',
    `version_code` ARRAY<STRING> COMMENT 'app versions',
    `visit_count` BIGINT COMMENT 'number of visits',
    `page_stats` ARRAY<STRUCT<page_id:STRING,page_count:BIGINT,during_time:BIGINT>> COMMENT 'page visit statistics'
) COMMENT 'Daily device behavior table'
PARTITIONED BY(`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_visitor_action_daycount'
TBLPROPERTIES ("parquet.compression"="lzo");

2) Data loading

insert overwrite table dws_visitor_action_daycount partition(dt='2020-06-14')
select
    t1.mid_id,
    t1.brand,
    t1.model,
    t1.is_new,
    t1.channel,
    t1.os,
    t1.area_code,
    t1.version_code,
    t1.visit_count,
    t3.page_stats
from
(
    select
        mid_id,
        brand,
        model,
        -- In ods_page_log, within a single day the is_new field of one device may be all 1s,
        -- all 0s, or a mix of 0s and 1s (uninstall and reinstall), hence this handling
        if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
        collect_set(channel) channel,
        collect_set(os) os,
        collect_set(area_code) area_code,
        collect_set(version_code) version_code,
        sum(if(last_page_id is null,1,0)) visit_count
    from dwd_page_log
    where dt='2020-06-14'
    and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select
        mid_id,
        brand,
        model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            page_id,
            count(*) page_count,
            sum(during_time) during_time
        from dwd_page_log
        where dt='2020-06-14'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;

3) Check the loaded data
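
A quick sanity check on the new partition (illustrative):

hive (gmall)> select * from dws_visitor_action_daycount where dt='2020-06-14' limit 5;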

7.2.2 User Theme

1) Table creation statement

DROP TABLE IF EXISTS dws_user_action_daycount;
CREATE EXTERNAL TABLE dws_user_action_daycount
(
    `user_id` STRING COMMENT 'user id',
    `login_count` BIGINT COMMENT 'number of logins',
    `cart_count` BIGINT COMMENT 'number of add-to-cart actions',
    `favor_count` BIGINT COMMENT 'number of favorites',
    `order_count` BIGINT COMMENT 'number of orders',
    `order_activity_count` BIGINT COMMENT 'number of orders under a promotional activity',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'order discount amount (activities)',
    `order_coupon_count` BIGINT COMMENT 'number of orders with a coupon',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'order discount amount (coupons)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final order amount',
    `payment_count` BIGINT COMMENT 'number of payments',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `refund_order_count` BIGINT COMMENT 'number of refund orders',
    `refund_order_num` BIGINT COMMENT 'number of items in refund orders',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'refund order amount',
    `refund_payment_count` BIGINT COMMENT 'number of refund payments',
    `refund_payment_num` BIGINT COMMENT 'number of items refunded',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund payment amount',
    `coupon_get_count` BIGINT COMMENT 'number of coupons claimed',
    `coupon_using_count` BIGINT COMMENT 'number of coupons used (ordered)',
    `coupon_used_count` BIGINT COMMENT 'number of coupons used (paid)',
    `appraise_good_count` BIGINT COMMENT 'number of positive reviews',
    `appraise_mid_count` BIGINT COMMENT 'number of neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'number of negative reviews',
    `appraise_default_count` BIGINT COMMENT 'number of default reviews',
    `order_detail_stats` array<struct<sku_id:string,sku_num:bigint,order_count:bigint,activity_reduce_amount:decimal(16,2),coupon_reduce_amount:decimal(16,2),original_amount:decimal(16,2),final_amount:decimal(16,2)>> COMMENT 'order detail statistics'
) COMMENT 'Daily user behavior'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_user_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");

2) Data loading

(1) First-day (historical) load

with
tmp_login as
(
    select
        dt,
        user_id,
        count(*) login_count
    from dwd_page_log
    where user_id is not null
    and last_page_id is null
    group by dt,user_id
),
tmp_cf as
(
    select
        dt,
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where user_id is not null
    and action_id in ('cart_add','favor_add')
    group by dt,user_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),user_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select user_id,order_id,sku_id,refund_amount,callback_time
        from dwd_refund_payment
    )rp
    left join
    (
        select user_id,order_id,sku_id,refund_num
        from dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id
),
tmp_coupon as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,
        coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,
        nvl(coupon_get_count,0) coupon_get_count,
        nvl(coupon_using_count,0) coupon_using_count,
        nvl(coupon_used_count,0) coupon_used_count
    from
    (
        select
            date_format(get_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_get_count
        from dwd_coupon_use
        where get_time is not null
        group by user_id,date_format(get_time,'yyyy-MM-dd')
    )coupon_get
    full outer join
    (
        select
            date_format(using_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_using_count
        from dwd_coupon_use
        where using_time is not null
        group by user_id,date_format(using_time,'yyyy-MM-dd')
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.user_id=coupon_using.user_id
    full outer join
    (
        select
            date_format(used_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_used_count
        from dwd_coupon_use
        where used_time is not null
        group by user_id,date_format(used_time,'yyyy-MM-dd')
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_od as
(
    select
        dt,
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            date_format(create_time,'yyyy-MM-dd') dt,
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from dwd_order_detail
        group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id
    )t1
    group by dt,user_id
)
insert overwrite table dws_user_action_daycount partition(dt)
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats,
    coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
and tmp_login.dt=tmp_cf.dt
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;

(2) Daily load

with
tmp_login as
(
    select
        user_id,
        count(*) login_count
    from dwd_page_log
    where dt='2020-06-15'
    and user_id is not null
    and last_page_id is null
    group by user_id
),
tmp_cf as
(
    select
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where dt='2020-06-15'
    and user_id is not null
    and action_id in ('cart_add','favor_add')
    group by user_id
),
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    where (dt='2020-06-15' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
    group by user_id
),
tmp_pay as
(
    select
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    where dt='2020-06-15'
    group by user_id
),
tmp_ri as
(
    select
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by user_id
),
tmp_rp as
(
    select
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select user_id,order_id,sku_id,refund_amount
        from dwd_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select user_id,order_id,sku_id,refund_num
        from dwd_order_refund_info
        where dt>=date_add('2020-06-15',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.user_id
),
tmp_coupon as
(
    select
        user_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_using_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_used_count
    from dwd_coupon_use
    where (dt='2020-06-15' or dt='9999-99-99')
    and (date_format(get_time,'yyyy-MM-dd')='2020-06-15'
    or date_format(using_time,'yyyy-MM-dd')='2020-06-15'
    or date_format(used_time,'yyyy-MM-dd')='2020-06-15')
    group by user_id
),
tmp_comment as
(
    select
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    where dt='2020-06-15'
    group by user_id
),
tmp_od as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from dwd_order_detail
        where dt='2020-06-15'
        group by user_id,sku_id
    )t1
    group by user_id
)
insert overwrite table dws_user_action_daycount partition(dt='2020-06-15')
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats
from tmp_login
full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id
full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;

3) Check the loaded data
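
To peek inside the nested order_detail_stats column, Hive's inline() can expand the array of structs (illustrative):

hive (gmall)> select user_id, tmp.sku_id, tmp.order_count, tmp.final_amount
              from dws_user_action_daycount
              lateral view inline(order_detail_stats) tmp as sku_id,sku_num,order_count,activity_reduce_amount,coupon_reduce_amount,original_amount,final_amount
              where dt='2020-06-15'
              limit 5;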

7.2.3 SKU (Product) Theme

1) Table creation statement

DROP TABLE IF EXISTS dws_sku_action_daycount;
CREATE EXTERNAL TABLE dws_sku_action_daycount
(
    `sku_id` STRING COMMENT 'sku_id',
    `order_count` BIGINT COMMENT 'number of times ordered',
    `order_num` BIGINT COMMENT 'number of items ordered',
    `order_activity_count` BIGINT COMMENT 'number of orders under a promotional activity',
    `order_coupon_count` BIGINT COMMENT 'number of orders with a coupon',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (activities)',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (coupons)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original ordered amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final ordered amount',
    `payment_count` BIGINT COMMENT 'number of times paid',
    `payment_num` BIGINT COMMENT 'number of items paid',
    `payment_amount` DECIMAL(16,2) COMMENT 'paid amount',
    `refund_order_count` BIGINT COMMENT 'number of refund orders',
    `refund_order_num` BIGINT COMMENT 'number of items in refund orders',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'refund order amount',
    `refund_payment_count` BIGINT COMMENT 'number of refund payments',
    `refund_payment_num` BIGINT COMMENT 'number of items refunded',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund payment amount',
    `cart_count` BIGINT COMMENT 'number of add-to-cart actions',
    `favor_count` BIGINT COMMENT 'number of favorites',
    `appraise_good_count` BIGINT COMMENT 'number of positive reviews',
    `appraise_mid_count` BIGINT COMMENT 'number of neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'number of negative reviews',
    `appraise_default_count` BIGINT COMMENT 'number of default reviews'
) COMMENT 'Daily SKU behavior'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_sku_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");

2) Data loading

(1) First-day load

with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail od
    join
    (
        select order_id,callback_time
        from dwd_payment_info
        where callback_time is not null
    )pi on pi.order_id=od.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),sku_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select order_id,sku_id,refund_amount,callback_time
        from dwd_refund_payment
    )rp
    left join
    (
        select order_id,sku_id,refund_num
        from dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
),
tmp_cf as
(
    select
        dt,
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where action_id in ('cart_add','favor_add')
    group by dt,item
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt)
select
    sku_id,
    sum(order_count),sum(order_num),sum(order_activity_count),sum(order_coupon_count),
    sum(order_activity_reduce_amount),sum(order_coupon_reduce_amount),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_num),sum(payment_amount),
    sum(refund_order_count),sum(refund_order_num),sum(refund_order_amount),
    sum(refund_payment_count),sum(refund_payment_num),sum(refund_payment_amount),
    sum(cart_count),sum(favor_count),
    sum(appraise_good_count),sum(appraise_mid_count),sum(appraise_bad_count),sum(appraise_default_count),
    dt
from
(
    select
        dt,sku_id,
        order_count,order_num,order_activity_count,order_coupon_count,
        order_activity_reduce_amount,order_coupon_reduce_amount,order_original_amount,order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_order
    union all
    select
        dt,sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        payment_count,payment_num,payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_pay
    union all
    select
        dt,sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        refund_order_count,refund_order_num,refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_ri
    union all
    select
        dt,sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        refund_payment_count,refund_payment_num,refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_rp
    union all
    select
        dt,sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        cart_count,favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_cf
    union all
    select
        dt,sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        appraise_good_count,appraise_mid_count,appraise_bad_count,appraise_default_count
    from tmp_comment
)t1
group by dt,sku_id;

(2) Daily load

with
tmp_order as
(
    select
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    group by sku_id
),
tmp_pay as
(
    select
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and order_id in
    (
        select order_id from dwd_payment_info where dt='2020-06-15'
    )
    group by sku_id
),
tmp_ri as
(
    select
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by sku_id
),
tmp_rp as
(
    select
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select order_id,sku_id,refund_amount
        from dwd_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select order_id,sku_id,refund_num
        from dwd_order_refund_info
        where dt>=date_add('2020-06-15',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.sku_id
),
tmp_cf as
(
    select
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where dt='2020-06-15'
    and action_id in ('cart_add','favor_add')
    group by item
),
tmp_comment as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    where dt='2020-06-15'
    group by sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt='2020-06-15')
select
    sku_id,
    sum(order_count),sum(order_num),sum(order_activity_count),sum(order_coupon_count),
    sum(order_activity_reduce_amount),sum(order_coupon_reduce_amount),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_num),sum(payment_amount),
    sum(refund_order_count),sum(refund_order_num),sum(refund_order_amount),
    sum(refund_payment_count),sum(refund_payment_num),sum(refund_payment_amount),
    sum(cart_count),sum(favor_count),
    sum(appraise_good_count),sum(appraise_mid_count),sum(appraise_bad_count),sum(appraise_default_count)
from
(
    select
        sku_id,
        order_count,order_num,order_activity_count,order_coupon_count,
        order_activity_reduce_amount,order_coupon_reduce_amount,order_original_amount,order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_order
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        payment_count,payment_num,payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_pay
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        refund_order_count,refund_order_num,refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_ri
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        refund_payment_count,refund_payment_num,refund_payment_amount,
        0 cart_count,0 favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_rp
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        cart_count,favor_count,
        0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count
    from tmp_cf
    union all
    select
        sku_id,
        0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,
        0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_num,0 payment_amount,
        0 refund_order_count,0 refund_order_num,0 refund_order_amount,
        0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,
        0 cart_count,0 favor_count,
        appraise_good_count,appraise_mid_count,appraise_bad_count,appraise_default_count
    from tmp_comment
)t1
group by sku_id;

3) Check the loaded data
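
For example, the day's top SKUs by order volume (illustrative):

hive (gmall)> select sku_id, order_count, order_final_amount
              from dws_sku_action_daycount
              where dt='2020-06-15'
              order by order_count desc
              limit 10;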

7.2.4 Coupon Theme

1) Table creation statement

DROP TABLE IF EXISTS dws_coupon_info_daycount;
CREATE EXTERNAL TABLE dws_coupon_info_daycount
(
    `coupon_id` STRING COMMENT 'coupon id',
    `get_count` BIGINT COMMENT 'number of times claimed',
    `order_count` BIGINT COMMENT 'number of times used (ordered)',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'order discount amount with this coupon',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original amount of orders using this coupon',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final amount of orders using this coupon',
    `payment_count` BIGINT COMMENT 'number of times used (paid)',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'payment discount amount with this coupon',
    `payment_amount` DECIMAL(16,2) COMMENT 'total payment amount with this coupon',
    `expire_count` BIGINT COMMENT 'number of times expired'
) COMMENT 'Daily coupon statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_coupon_info_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");

2) Data loading

(1) First-day load

with
tmp_cu as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_expire.dt) dt,
        coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_expire.coupon_id) coupon_id,
        nvl(get_count,0) get_count,
        nvl(order_count,0) order_count,
        nvl(payment_count,0) payment_count,
        nvl(expire_count,0) expire_count
    from
    (
        select
            date_format(get_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) get_count
        from dwd_coupon_use
        group by date_format(get_time,'yyyy-MM-dd'),coupon_id
    )coupon_get
    full outer join
    (
        select
            date_format(using_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) order_count
        from dwd_coupon_use
        where using_time is not null
        group by date_format(using_time,'yyyy-MM-dd'),coupon_id
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.coupon_id=coupon_using.coupon_id
    full outer join
    (
        select
            date_format(used_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) payment_count
        from dwd_coupon_use
        where used_time is not null
        group by date_format(used_time,'yyyy-MM-dd'),coupon_id
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
    full outer join
    (
        select
            date_format(expire_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) expire_count
        from dwd_coupon_use
        where expire_time is not null
        group by date_format(expire_time,'yyyy-MM-dd'),coupon_id
    )coupon_expire
    on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_expire.dt
    and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_expire.coupon_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where coupon_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),coupon_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select order_id,coupon_id,split_coupon_amount,split_final_amount
        from dwd_order_detail
        where coupon_id is not null
    )od
    join
    (
        select order_id,callback_time
        from dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),coupon_id
)
insert overwrite table dws_coupon_info_daycount partition(dt)
select
    coupon_id,
    sum(get_count),sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_reduce_amount),sum(payment_amount),sum(expire_count),
    dt
from
(
    select
        dt,coupon_id,get_count,order_count,
        0 order_reduce_amount,0 order_original_amount,0 order_final_amount,
        payment_count,0 payment_reduce_amount,0 payment_amount,expire_count
    from tmp_cu
    union all
    select
        dt,coupon_id,0 get_count,0 order_count,
        order_reduce_amount,order_original_amount,order_final_amount,
        0 payment_count,0 payment_reduce_amount,0 payment_amount,0 expire_count
    from tmp_order
    union all
    select
        dt,coupon_id,0 get_count,0 order_count,
        0 order_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,payment_reduce_amount,payment_amount,0 expire_count
    from tmp_pay
)t1
group by dt,coupon_id;

(2) Daily load

with
tmp_cu as
(
    select
        coupon_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) order_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) payment_count,
        sum(if(date_format(expire_time,'yyyy-MM-dd')='2020-06-15',1,0)) expire_count
    from dwd_coupon_use
    where dt='9999-99-99'
    or dt='2020-06-15'
    group by coupon_id
),
tmp_order as
(
    select
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    and coupon_id is not null
    group by coupon_id
),
tmp_pay as
(
    select
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and coupon_id is not null
    and order_id in
    (
        select order_id from dwd_payment_info where dt='2020-06-15'
    )
    group by coupon_id
)
insert overwrite table dws_coupon_info_daycount partition(dt='2020-06-15')
select
    coupon_id,
    sum(get_count),sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_reduce_amount),sum(payment_amount),sum(expire_count)
from
(
    select
        coupon_id,get_count,order_count,
        0 order_reduce_amount,0 order_original_amount,0 order_final_amount,
        payment_count,0 payment_reduce_amount,0 payment_amount,expire_count
    from tmp_cu
    union all
    select
        coupon_id,0 get_count,0 order_count,
        order_reduce_amount,order_original_amount,order_final_amount,
        0 payment_count,0 payment_reduce_amount,0 payment_amount,0 expire_count
    from tmp_order
    union all
    select
        coupon_id,0 get_count,0 order_count,
        0 order_reduce_amount,0 order_original_amount,0 order_final_amount,
        0 payment_count,payment_reduce_amount,payment_amount,0 expire_count
    from tmp_pay
)t1
group by coupon_id;

3) Check the loaded data
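
For example, the claim → order → payment funnel per coupon (illustrative):

hive (gmall)> select coupon_id, get_count, order_count, payment_count, expire_count
              from dws_coupon_info_daycount
              where dt='2020-06-15'
              limit 10;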

7.2.5 Activity Theme

1) Table creation statement

DROP TABLE IF EXISTS dws_activity_info_daycount;
CREATE EXTERNAL TABLE dws_activity_info_daycount
(
    `activity_rule_id` STRING COMMENT 'activity rule id',
    `activity_id` STRING COMMENT 'activity id',
    `order_count` BIGINT COMMENT 'number of orders under this activity rule',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'order discount amount under this activity rule',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original order amount under this activity rule',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final order amount under this activity rule',
    `payment_count` BIGINT COMMENT 'number of payments under this activity rule',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'payment discount amount under this activity rule',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount under this activity rule'
) COMMENT 'Daily activity statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_activity_info_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");

2) Data loading

(1) First-day load

with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where activity_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select activity_rule_id,activity_id,order_id,split_activity_amount,split_final_amount
        from dwd_order_detail
        where activity_id is not null
    )od
    join
    (
        select order_id,callback_time
        from dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id
)
insert overwrite table dws_activity_info_daycount partition(dt)
select
    activity_rule_id,
    activity_id,
    sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_reduce_amount),sum(payment_amount),
    dt
from
(
    select
        dt,activity_rule_id,activity_id,order_count,
        order_reduce_amount,order_original_amount,order_final_amount,
        0 payment_count,0 payment_reduce_amount,0 payment_amount
    from tmp_order
    union all
    select
        dt,activity_rule_id,activity_id,0 order_count,
        0 order_reduce_amount,0 order_original_amount,0 order_final_amount,
        payment_count,payment_reduce_amount,payment_amount
    from tmp_pay
)t1
group by dt,activity_rule_id,activity_id;

(2) Daily load

with
tmp_order as
(
    select
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    and activity_id is not null
    group by activity_rule_id,activity_id
),
tmp_pay as
(
    select
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and activity_id is not null
    and order_id in
    (
        select order_id from dwd_payment_info where dt='2020-06-15'
    )
    group by activity_rule_id,activity_id
)
insert overwrite table dws_activity_info_daycount partition(dt='2020-06-15')
select
    activity_rule_id,
    activity_id,
    sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_reduce_amount),sum(payment_amount)
from
(
    select
        activity_rule_id,activity_id,order_count,
        order_reduce_amount,order_original_amount,order_final_amount,
        0 payment_count,0 payment_reduce_amount,0 payment_amount
    from tmp_order
    union all
    select
        activity_rule_id,activity_id,0 order_count,
        0 order_reduce_amount,0 order_original_amount,0 order_final_amount,
        payment_count,payment_reduce_amount,payment_amount
    from tmp_pay
)t1
group by activity_rule_id,activity_id;

3) Check the loaded data

7.2.6 Region Theme

1) Table creation statement

DROP TABLE IF EXISTS dws_area_stats_daycount;
CREATE EXTERNAL TABLE dws_area_stats_daycount
(
    `province_id` STRING COMMENT 'province id',
    `visit_count` BIGINT COMMENT 'number of visits',
    `login_count` BIGINT COMMENT 'number of logins',
    `visitor_count` BIGINT COMMENT 'number of visitors',
    `user_count` BIGINT COMMENT 'number of users',
    `order_count` BIGINT COMMENT 'number of orders',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final order amount',
    `payment_count` BIGINT COMMENT 'number of payments',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `refund_order_count` BIGINT COMMENT 'number of refund orders',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'refund order amount',
    `refund_payment_count` BIGINT COMMENT 'number of refund payments',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund payment amount'
) COMMENT 'Daily region statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_area_stats_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");

2) Data loading

(1) First-day load

with
tmp_vu as
(
    select
        dt,
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            dt,
            area_code,
            count(*) visit_count, -- visits by all visitors
            count(user_id) login_count, -- visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count, -- number of distinct visitors
            count(distinct(user_id)) user_count -- number of distinct users
        from dwd_page_log
        where last_page_id is null
        group by dt,area_code
    )tmp
    left join dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
),
tmp_ro as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from dwd_refund_payment
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
)
insert overwrite table dws_area_stats_daycount partition(dt)
select
    province_id,
    sum(visit_count),sum(login_count),sum(visitor_count),sum(user_count),
    sum(order_count),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_amount),
    sum(refund_order_count),sum(refund_order_amount),
    sum(refund_payment_count),sum(refund_payment_amount),
    dt
from
(
    select
        dt,province_id,visit_count,login_count,visitor_count,user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_vu
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        order_count,order_original_amount,order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_order
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        payment_count,payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_pay
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        refund_order_count,refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_ro
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,refund_payment_count,refund_payment_amount
    from tmp_rp
)t1
group by dt,province_id;

(2) Daily load

with
tmp_vu as
(
    select
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            area_code,
            count(*) visit_count, -- visits by all visitors
            count(user_id) login_count, -- visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count, -- number of distinct visitors
            count(distinct(user_id)) user_count -- number of distinct users
        from dwd_page_log
        where dt='2020-06-15'
        and last_page_id is null
        group by area_code
    )tmp
    left join dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    where (dt='2020-06-15' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
    group by province_id
),
tmp_pay as
(
    select
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    where dt='2020-06-15'
    group by province_id
),
tmp_ro as
(
    select
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by province_id
),
tmp_rp as
(
    select
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from dwd_refund_payment
    where dt='2020-06-15'
    group by province_id
)
insert overwrite table dws_area_stats_daycount partition(dt='2020-06-15')
select
    province_id,
    sum(visit_count),sum(login_count),sum(visitor_count),sum(user_count),
    sum(order_count),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_amount),
    sum(refund_order_count),sum(refund_order_amount),
    sum(refund_payment_count),sum(refund_payment_amount)
from
(
    select
        province_id,visit_count,login_count,visitor_count,user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_vu
    union all
    select
        province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        order_count,order_original_amount,order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_order
    union all
    select
        province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        payment_count,payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_pay
    union all
    select
        province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        refund_order_count,refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_ro
    union all
    select
        province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,refund_payment_count,refund_payment_amount
    from tmp_rp
)t1
group by province_id;

3) Check the loaded data
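
For example, a readable check joined to the province dimension (assuming dim_base_province carries a name field, as in the standard gmall dimension table):

hive (gmall)> select bp.name, st.order_count, st.order_final_amount
              from dws_area_stats_daycount st
              join dim_base_province bp on st.province_id=bp.id
              where st.dt='2020-06-15'
              limit 10;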

7.2.7 DWS Layer First-Day Data Loading Script

1) Write the script

(1) Create the script dwd_to_dws_init.sh in the /home/seven/bin directory

#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass in a date parameter"
    exit
fi

dws_visitor_action_daycount="
insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
select
    t1.mid_id,
    t1.brand,
    t1.model,
    t1.is_new,
    t1.channel,
    t1.os,
    t1.area_code,
    t1.version_code,
    t1.visit_count,
    t3.page_stats
from
(
    select
        mid_id,
        brand,
        model,
        -- In ods_page_log, within a single day the is_new field of one device may be all 1s,
        -- all 0s, or a mix of 0s and 1s (uninstall and reinstall), hence this handling
        if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
        collect_set(channel) channel,
        collect_set(os) os,
        collect_set(area_code) area_code,
        collect_set(version_code) version_code,
        sum(if(last_page_id is null,1,0)) visit_count
    from ${APP}.dwd_page_log
    where dt='$do_date'
    and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select
        mid_id,
        brand,
        model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            page_id,
            count(*) page_count,
            sum(during_time) during_time
        from ${APP}.dwd_page_log
        where dt='$do_date'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;
"

dws_area_stats_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_vu as
(
    select
        dt,
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            dt,
            area_code,
            count(*) visit_count, -- visits by all visitors
            count(user_id) login_count, -- visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count, -- number of distinct visitors
            count(distinct(user_id)) user_count -- number of distinct users
        from ${APP}.dwd_page_log
        where last_page_id is null
        group by dt,area_code
    )tmp
    left join ${APP}.dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
),
tmp_ro as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from ${APP}.dwd_refund_payment
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt)
select
    province_id,
    sum(visit_count),sum(login_count),sum(visitor_count),sum(user_count),
    sum(order_count),sum(order_original_amount),sum(order_final_amount),
    sum(payment_count),sum(payment_amount),
    sum(refund_order_count),sum(refund_order_amount),
    sum(refund_payment_count),sum(refund_payment_amount),
    dt
from
(
    select
        dt,province_id,visit_count,login_count,visitor_count,user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_vu
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        order_count,order_original_amount,order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_order
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        payment_count,payment_amount,
        0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_pay
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        refund_order_count,refund_order_amount,0 refund_payment_count,0 refund_payment_amount
    from tmp_ro
    union all
    select
        dt,province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,
        0 order_count,0 order_original_amount,0 order_final_amount,
        0 payment_count,0 payment_amount,
        0 refund_order_count,0 refund_order_amount,refund_payment_count,refund_payment_amount
    from tmp_rp
)t1
group by dt,province_id;
"

dws_user_action_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_login as
(selectdt,user_id,count(*) login_countfrom ${APP}.dwd_page_logwhere user_id is not nulland last_page_id is nullgroup by dt,user_id
),
tmp_cf as
(selectdt,user_id,sum(if(action_id='cart_add',1,0)) cart_count,sum(if(action_id='favor_add',1,0)) favor_countfrom ${APP}.dwd_action_logwhere user_id is not nulland action_id in ('cart_add','favor_add')group by dt,user_id
),
tmp_order as
(selectdate_format(create_time,'yyyy-MM-dd') dt,user_id,count(*) order_count,sum(if(activity_reduce_amount>0,1,0)) order_activity_count,sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,sum(activity_reduce_amount) order_activity_reduce_amount,sum(coupon_reduce_amount) order_coupon_reduce_amount,sum(original_amount) order_original_amount,sum(final_amount) order_final_amountfrom ${APP}.dwd_order_infogroup by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_pay as
(selectdate_format(callback_time,'yyyy-MM-dd') dt,user_id,count(*) payment_count,sum(payment_amount) payment_amountfrom ${APP}.dwd_payment_infogroup by date_format(callback_time,'yyyy-MM-dd'),user_id
),
tmp_ri as
(selectdate_format(create_time,'yyyy-MM-dd') dt,user_id,count(*) refund_order_count,sum(refund_num) refund_order_num,sum(refund_amount) refund_order_amountfrom ${APP}.dwd_order_refund_infogroup by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_rp as
(selectdate_format(callback_time,'yyyy-MM-dd') dt,rp.user_id,count(*) refund_payment_count,sum(ri.refund_num) refund_payment_num,sum(rp.refund_amount) refund_payment_amountfrom(selectuser_id,order_id,sku_id,refund_amount,callback_timefrom ${APP}.dwd_refund_payment)rpleft join(selectuser_id,order_id,sku_id,refund_numfrom ${APP}.dwd_order_refund_info)rion rp.order_id=ri.order_idand rp.sku_id=rp.sku_idgroup by date_format(callback_time,'yyyy-MM-dd'),rp.user_id
),
tmp_coupon as
(
    select coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,nvl(coupon_get_count,0) coupon_get_count,nvl(coupon_using_count,0) coupon_using_count,nvl(coupon_used_count,0) coupon_used_count
    from
    (
        select date_format(get_time,'yyyy-MM-dd') dt,user_id,count(*) coupon_get_count
        from ${APP}.dwd_coupon_use
        where get_time is not null
        group by user_id,date_format(get_time,'yyyy-MM-dd')
    )coupon_get
    full outer join
    (
        select date_format(using_time,'yyyy-MM-dd') dt,user_id,count(*) coupon_using_count
        from ${APP}.dwd_coupon_use
        where using_time is not null
        group by user_id,date_format(using_time,'yyyy-MM-dd')
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.user_id=coupon_using.user_id
    full outer join
    (
        select date_format(used_time,'yyyy-MM-dd') dt,user_id,count(*) coupon_used_count
        from ${APP}.dwd_coupon_use
        where used_time is not null
        group by user_id,date_format(used_time,'yyyy-MM-dd')
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
),
tmp_comment as
(
    select date_format(create_time,'yyyy-MM-dd') dt,user_id,sum(if(appraise='1201',1,0)) appraise_good_count,sum(if(appraise='1202',1,0)) appraise_mid_count,sum(if(appraise='1203',1,0)) appraise_bad_count,sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_od as
(
    select dt,user_id,collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select date_format(create_time,'yyyy-MM-dd') dt,user_id,sku_id,sum(sku_num) sku_num,count(*) order_count,cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,cast(sum(original_amount) as decimal(16,2)) original_amount,cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from ${APP}.dwd_order_detail
        group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id
    )t1
    group by dt,user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt)
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),nvl(cart_count,0),nvl(favor_count,0),
    nvl(order_count,0),nvl(order_activity_count,0),nvl(order_activity_reduce_amount,0),nvl(order_coupon_count,0),nvl(order_coupon_reduce_amount,0),nvl(order_original_amount,0),nvl(order_final_amount,0),
    nvl(payment_count,0),nvl(payment_amount,0),
    nvl(refund_order_count,0),nvl(refund_order_num,0),nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),nvl(refund_payment_num,0),nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),nvl(coupon_using_count,0),nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),nvl(appraise_mid_count,0),nvl(appraise_bad_count,0),nvl(appraise_default_count,0),
    order_detail_stats,
    coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
and tmp_login.dt=tmp_cf.dt
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;
"dws_activity_info_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_order as
(
    select date_format(create_time,'yyyy-MM-dd') dt,activity_rule_id,activity_id,count(*) order_count,sum(split_activity_amount) order_reduce_amount,sum(original_amount) order_original_amount,sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where activity_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id
),
tmp_pay as
(
    select date_format(callback_time,'yyyy-MM-dd') dt,activity_rule_id,activity_id,count(*) payment_count,sum(split_activity_amount) payment_reduce_amount,sum(split_final_amount) payment_amount
    from
    (
        select activity_rule_id,activity_id,order_id,split_activity_amount,split_final_amount
        from ${APP}.dwd_order_detail
        where activity_id is not null
    )od
    join
    (
        select order_id,callback_time
        from ${APP}.dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id
)
insert overwrite table ${APP}.dws_activity_info_daycount partition(dt)
select activity_rule_id,activity_id,sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),sum(payment_count),sum(payment_reduce_amount),sum(payment_amount),dt
from
(
    select dt,activity_rule_id,activity_id,order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount from tmp_order
    union all
    select dt,activity_rule_id,activity_id,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_reduce_amount,payment_amount from tmp_pay
)t1
group by dt,activity_rule_id,activity_id;
"

dws_sku_action_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_order as
(
    select date_format(create_time,'yyyy-MM-dd') dt,sku_id,count(*) order_count,sum(sku_num) order_num,sum(if(split_activity_amount>0,1,0)) order_activity_count,sum(if(split_coupon_amount>0,1,0)) order_coupon_count,sum(split_activity_amount) order_activity_reduce_amount,sum(split_coupon_amount) order_coupon_reduce_amount,sum(original_amount) order_original_amount,sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_pay as
(
    select date_format(callback_time,'yyyy-MM-dd') dt,sku_id,count(*) payment_count,sum(sku_num) payment_num,sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail od
    join
    (
        select order_id,callback_time
        from ${APP}.dwd_payment_info
        where callback_time is not null
    )pi
    on pi.order_id=od.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),sku_id
),
tmp_ri as
(
    select date_format(create_time,'yyyy-MM-dd') dt,sku_id,count(*) refund_order_count,sum(refund_num) refund_order_num,sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_rp as
(
    select date_format(callback_time,'yyyy-MM-dd') dt,rp.sku_id,count(*) refund_payment_count,sum(ri.refund_num) refund_payment_num,sum(refund_amount) refund_payment_amount
    from
    (
        select order_id,sku_id,refund_amount,callback_time
        from ${APP}.dwd_refund_payment
    )rp
    left join
    (
        select order_id,sku_id,refund_num
        from ${APP}.dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
),
tmp_cf as
(
    select dt,item sku_id,sum(if(action_id='cart_add',1,0)) cart_count,sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where action_id in ('cart_add','favor_add')
    group by dt,item
),
tmp_comment as
(
    select date_format(create_time,'yyyy-MM-dd') dt,sku_id,sum(if(appraise='1201',1,0)) appraise_good_count,sum(if(appraise='1202',1,0)) appraise_mid_count,sum(if(appraise='1203',1,0)) appraise_bad_count,sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt)
select sku_id,sum(order_count),sum(order_num),sum(order_activity_count),sum(order_coupon_count),sum(order_activity_reduce_amount),sum(order_coupon_reduce_amount),sum(order_original_amount),sum(order_final_amount),sum(payment_count),sum(payment_num),sum(payment_amount),sum(refund_order_count),sum(refund_order_num),sum(refund_order_amount),sum(refund_payment_count),sum(refund_payment_num),sum(refund_payment_amount),sum(cart_count),sum(favor_count),sum(appraise_good_count),sum(appraise_mid_count),sum(appraise_bad_count),sum(appraise_default_count),dt
from
(
    select dt,sku_id,order_count,order_num,order_activity_count,order_coupon_count,order_activity_reduce_amount,order_coupon_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_order
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_num,payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_pay
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,refund_order_count,refund_order_num,refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_ri
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,refund_payment_count,refund_payment_num,refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_rp
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,cart_count,favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_cf
    union all
    select dt,sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,appraise_good_count,appraise_mid_count,appraise_bad_count,appraise_default_count from tmp_comment
)t1
group by dt,sku_id;
"

dws_coupon_info_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_cu as
(
    select coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_exprie.dt) dt,coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_exprie.coupon_id) coupon_id,nvl(get_count,0) get_count,nvl(order_count,0) order_count,nvl(payment_count,0) payment_count,nvl(expire_count,0) expire_count
    from
    (
        select date_format(get_time,'yyyy-MM-dd') dt,coupon_id,count(*) get_count
        from ${APP}.dwd_coupon_use
        group by date_format(get_time,'yyyy-MM-dd'),coupon_id
    )coupon_get
    full outer join
    (
        select date_format(using_time,'yyyy-MM-dd') dt,coupon_id,count(*) order_count
        from ${APP}.dwd_coupon_use
        where using_time is not null
        group by date_format(using_time,'yyyy-MM-dd'),coupon_id
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.coupon_id=coupon_using.coupon_id
    full outer join
    (
        select date_format(used_time,'yyyy-MM-dd') dt,coupon_id,count(*) payment_count
        from ${APP}.dwd_coupon_use
        where used_time is not null
        group by date_format(used_time,'yyyy-MM-dd'),coupon_id
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
    full outer join
    (
        select date_format(expire_time,'yyyy-MM-dd') dt,coupon_id,count(*) expire_count
        from ${APP}.dwd_coupon_use
        where expire_time is not null
        group by date_format(expire_time,'yyyy-MM-dd'),coupon_id
    )coupon_exprie
    on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_exprie.dt
    and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_exprie.coupon_id
),
tmp_order as
(
    select date_format(create_time,'yyyy-MM-dd') dt,coupon_id,sum(split_coupon_amount) order_reduce_amount,sum(original_amount) order_original_amount,sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where coupon_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),coupon_id
),
tmp_pay as
(
    select date_format(callback_time,'yyyy-MM-dd') dt,coupon_id,sum(split_coupon_amount) payment_reduce_amount,sum(split_final_amount) payment_amount
    from
    (
        select order_id,coupon_id,split_coupon_amount,split_final_amount
        from ${APP}.dwd_order_detail
        where coupon_id is not null
    )od
    join
    (
        select order_id,callback_time
        from ${APP}.dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),coupon_id
)
insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt)
select coupon_id,sum(get_count),sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),sum(payment_count),sum(payment_reduce_amount),sum(payment_amount),sum(expire_count),dt
from
(
    select dt,coupon_id,get_count,order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,0 payment_reduce_amount,0 payment_amount,expire_count from tmp_cu
    union all
    select dt,coupon_id,0 get_count,0 order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount,0 expire_count from tmp_order
    union all
    select dt,coupon_id,0 get_count,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,payment_reduce_amount,payment_amount,0 expire_count from tmp_pay
)t1
group by dt,coupon_id;
"case $1 in"dws_visitor_action_daycount" )hive -e "$dws_visitor_action_daycount";;"dws_user_action_daycount" )hive -e "$dws_user_action_daycount";;"dws_activity_info_daycount" )hive -e "$dws_activity_info_daycount";;"dws_area_stats_daycount" )hive -e "$dws_area_stats_daycount";;"dws_sku_action_daycount" )hive -e "$dws_sku_action_daycount";;"dws_coupon_info_daycount" )hive -e "$dws_coupon_info_daycount";;"all" )hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount";;
esac

(2) Grant execute permission

[seven@hadoop102 bin]$ chmod +x dwd_to_dws_init.sh

2) Using the script

(1) Run the script

[seven@hadoop102 bin]$ dwd_to_dws_init.sh all 2020-06-14
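
The first positional argument is matched against the case block at the end of the script, so a single DWS table can also be loaded on its own instead of all of them. For example, to load only the user-action table:

[seven@hadoop102 bin]$ dwd_to_dws_init.sh dws_user_action_daycount 2020-06-14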

(2) Verify that the data was loaded successfully
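
A quick check is to query a few rows of one of the target tables for the loaded partition. A minimal sketch, assuming the gmall database and the date used above:

[seven@hadoop102 bin]$ hive -e "select * from gmall.dws_user_action_daycount where dt='2020-06-14' limit 10;"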

7.2.8 DWS layer daily data loading script

1) Write the script

(1) Create the script dwd_to_dws.sh in the /home/seven/bin directory

#!/bin/bash

APP=gmall
# If a date is passed in, use it; otherwise default to the day before the current date
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dws_visitor_action_daycount="
insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
select t1.mid_id,t1.brand,t1.model,t1.is_new,t1.channel,t1.os,t1.area_code,t1.version_code,t1.visit_count,t3.page_stats
from
(
    select mid_id,brand,model,
    if(array_contains(collect_set(is_new),'0'),'0','1') is_new, -- in ods_page_log, within the same day the is_new field of one device may be all 1s, all 0s, or a mix of 0s and 1s (uninstall and reinstall), hence this handling
    collect_set(channel) channel,collect_set(os) os,collect_set(area_code) area_code,collect_set(version_code) version_code,sum(if(last_page_id is null,1,0)) visit_count
    from ${APP}.dwd_page_log
    where dt='$do_date'
    and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select mid_id,brand,model,collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select mid_id,brand,model,page_id,count(*) page_count,sum(during_time) during_time
        from ${APP}.dwd_page_log
        where dt='$do_date'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;
"

dws_user_action_daycount="
with
tmp_login as
(
    select user_id,count(*) login_count
    from ${APP}.dwd_page_log
    where dt='$do_date'
    and user_id is not null
    and last_page_id is null
    group by user_id
),
tmp_cf as
(
    select user_id,sum(if(action_id='cart_add',1,0)) cart_count,sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where dt='$do_date'
    and user_id is not null
    and action_id in ('cart_add','favor_add')
    group by user_id
),
tmp_order as
(
    select user_id,count(*) order_count,sum(if(activity_reduce_amount>0,1,0)) order_activity_count,sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,sum(activity_reduce_amount) order_activity_reduce_amount,sum(coupon_reduce_amount) order_coupon_reduce_amount,sum(original_amount) order_original_amount,sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    where (dt='$do_date' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_pay as
(
    select user_id,count(*) payment_count,sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    where dt='$do_date'
    group by user_id
),
tmp_ri as
(
    select user_id,count(*) refund_order_count,sum(refund_num) refund_order_num,sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by user_id
),
tmp_rp as
(
    select rp.user_id,count(*) refund_payment_count,sum(ri.refund_num) refund_payment_num,sum(rp.refund_amount) refund_payment_amount
    from
    (
        select user_id,order_id,sku_id,refund_amount
        from ${APP}.dwd_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select user_id,order_id,sku_id,refund_num
        from ${APP}.dwd_order_refund_info
        where dt>=date_add('$do_date',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.user_id
),
tmp_coupon as
(
    select user_id,sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_get_count,sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_using_count,sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_used_count
    from ${APP}.dwd_coupon_use
    where (dt='$do_date' or dt='9999-99-99')
    and (date_format(get_time,'yyyy-MM-dd')='$do_date'
    or date_format(using_time,'yyyy-MM-dd')='$do_date'
    or date_format(used_time,'yyyy-MM-dd')='$do_date')
    group by user_id
),
tmp_comment as
(
    select user_id,sum(if(appraise='1201',1,0)) appraise_good_count,sum(if(appraise='1202',1,0)) appraise_mid_count,sum(if(appraise='1203',1,0)) appraise_bad_count,sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    where dt='$do_date'
    group by user_id
),
tmp_od as
(
    select user_id,collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select user_id,sku_id,sum(sku_num) sku_num,count(*) order_count,cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,cast(sum(original_amount) as decimal(16,2)) original_amount,cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from ${APP}.dwd_order_detail
        where dt='$do_date'
        group by user_id,sku_id
    )t1
    group by user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),nvl(cart_count,0),nvl(favor_count,0),
    nvl(order_count,0),nvl(order_activity_count,0),nvl(order_activity_reduce_amount,0),nvl(order_coupon_count,0),nvl(order_coupon_reduce_amount,0),nvl(order_original_amount,0),nvl(order_final_amount,0),
    nvl(payment_count,0),nvl(payment_amount,0),
    nvl(refund_order_count,0),nvl(refund_order_num,0),nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),nvl(refund_payment_num,0),nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),nvl(coupon_using_count,0),nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),nvl(appraise_mid_count,0),nvl(appraise_bad_count,0),nvl(appraise_default_count,0),
    order_detail_stats
from tmp_login
full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id
full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;
"dws_activity_info_daycount="
with
tmp_order as
(
    select activity_rule_id,activity_id,count(*) order_count,sum(split_activity_amount) order_reduce_amount,sum(original_amount) order_original_amount,sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date'
    and activity_id is not null
    group by activity_rule_id,activity_id
),
tmp_pay as
(
    select activity_rule_id,activity_id,count(*) payment_count,sum(split_activity_amount) payment_reduce_amount,sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and activity_id is not null
    and order_id in (select order_id from ${APP}.dwd_payment_info where dt='$do_date')
    group by activity_rule_id,activity_id
)
insert overwrite table ${APP}.dws_activity_info_daycount partition(dt='$do_date')
select activity_rule_id,activity_id,sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),sum(payment_count),sum(payment_reduce_amount),sum(payment_amount)
from
(
    select activity_rule_id,activity_id,order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount from tmp_order
    union all
    select activity_rule_id,activity_id,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_reduce_amount,payment_amount from tmp_pay
)t1
group by activity_rule_id,activity_id;
"

dws_sku_action_daycount="
with
tmp_order as
(
    select sku_id,count(*) order_count,sum(sku_num) order_num,sum(if(split_activity_amount>0,1,0)) order_activity_count,sum(if(split_coupon_amount>0,1,0)) order_coupon_count,sum(split_activity_amount) order_activity_reduce_amount,sum(split_coupon_amount) order_coupon_reduce_amount,sum(original_amount) order_original_amount,sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date'
    group by sku_id
),
tmp_pay as
(
    select sku_id,count(*) payment_count,sum(sku_num) payment_num,sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and order_id in (select order_id from ${APP}.dwd_payment_info where dt='$do_date')
    group by sku_id
),
tmp_ri as
(
    select sku_id,count(*) refund_order_count,sum(refund_num) refund_order_num,sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by sku_id
),
tmp_rp as
(
    select rp.sku_id,count(*) refund_payment_count,sum(ri.refund_num) refund_payment_num,sum(refund_amount) refund_payment_amount
    from
    (
        select order_id,sku_id,refund_amount
        from ${APP}.dwd_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select order_id,sku_id,refund_num
        from ${APP}.dwd_order_refund_info
        where dt>=date_add('$do_date',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.sku_id
),
tmp_cf as
(
    select item sku_id,sum(if(action_id='cart_add',1,0)) cart_count,sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where dt='$do_date'
    and action_id in ('cart_add','favor_add')
    group by item
),
tmp_comment as
(
    select sku_id,sum(if(appraise='1201',1,0)) appraise_good_count,sum(if(appraise='1202',1,0)) appraise_mid_count,sum(if(appraise='1203',1,0)) appraise_bad_count,sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
select sku_id,sum(order_count),sum(order_num),sum(order_activity_count),sum(order_coupon_count),sum(order_activity_reduce_amount),sum(order_coupon_reduce_amount),sum(order_original_amount),sum(order_final_amount),sum(payment_count),sum(payment_num),sum(payment_amount),sum(refund_order_count),sum(refund_order_num),sum(refund_order_amount),sum(refund_payment_count),sum(refund_payment_num),sum(refund_payment_amount),sum(cart_count),sum(favor_count),sum(appraise_good_count),sum(appraise_mid_count),sum(appraise_bad_count),sum(appraise_default_count)
from
(
    select sku_id,order_count,order_num,order_activity_count,order_coupon_count,order_activity_reduce_amount,order_coupon_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_order
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,payment_num,payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_pay
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,refund_order_count,refund_order_num,refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_ri
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,refund_payment_count,refund_payment_num,refund_payment_amount,0 cart_count,0 favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_rp
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,cart_count,favor_count,0 appraise_good_count,0 appraise_mid_count,0 appraise_bad_count,0 appraise_default_count from tmp_cf
    union all
    select sku_id,0 order_count,0 order_num,0 order_activity_count,0 order_coupon_count,0 order_activity_reduce_amount,0 order_coupon_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_num,0 payment_amount,0 refund_order_count,0 refund_order_num,0 refund_order_amount,0 refund_payment_count,0 refund_payment_num,0 refund_payment_amount,0 cart_count,0 favor_count,appraise_good_count,appraise_mid_count,appraise_bad_count,appraise_default_count from tmp_comment
)t1
group by sku_id;
"

dws_coupon_info_daycount="
with
tmp_cu as
(
    select coupon_id,sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) get_count,sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) order_count,sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count,sum(if(date_format(expire_time,'yyyy-MM-dd')='$do_date',1,0)) expire_count
    from ${APP}.dwd_coupon_use
    where dt='9999-99-99'
    or dt='$do_date'
    group by coupon_id
),
tmp_order as
(
    select coupon_id,sum(split_coupon_amount) order_reduce_amount,sum(original_amount) order_original_amount,sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date'
    and coupon_id is not null
    group by coupon_id
),
tmp_pay as
(
    select coupon_id,sum(split_coupon_amount) payment_reduce_amount,sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and coupon_id is not null
    and order_id in (select order_id from ${APP}.dwd_payment_info where dt='$do_date')
    group by coupon_id
)
insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt='$do_date')
select coupon_id,sum(get_count),sum(order_count),sum(order_reduce_amount),sum(order_original_amount),sum(order_final_amount),sum(payment_count),sum(payment_reduce_amount),sum(payment_amount),sum(expire_count)
from
(
    select coupon_id,get_count,order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,payment_count,0 payment_reduce_amount,0 payment_amount,expire_count from tmp_cu
    union all
    select coupon_id,0 get_count,0 order_count,order_reduce_amount,order_original_amount,order_final_amount,0 payment_count,0 payment_reduce_amount,0 payment_amount,0 expire_count from tmp_order
    union all
    select coupon_id,0 get_count,0 order_count,0 order_reduce_amount,0 order_original_amount,0 order_final_amount,0 payment_count,payment_reduce_amount,payment_amount,0 expire_count from tmp_pay
)t1
group by coupon_id;
"

dws_area_stats_daycount="
with
tmp_vu as
(
    select id province_id,visit_count,login_count,visitor_count,user_count
    from
    (
        select area_code,
        count(*) visit_count, -- number of visits by all visitors
        count(user_id) login_count, -- number of visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
        count(distinct(mid_id)) visitor_count, -- number of distinct visitors
        count(distinct(user_id)) user_count -- number of distinct logged-in users
        from ${APP}.dwd_page_log
        where dt='$do_date'
        and last_page_id is null
        group by area_code
    )tmp
    left join ${APP}.dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select province_id,count(*) order_count,sum(original_amount) order_original_amount,sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    where (dt='$do_date' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by province_id
),
tmp_pay as
(
    select province_id,count(*) payment_count,sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    where dt='$do_date'
    group by province_id
),
tmp_ro as
(
    select province_id,count(*) refund_order_count,sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by province_id
),
tmp_rp as
(
    select province_id,count(*) refund_payment_count,sum(refund_amount) refund_payment_amount
    from ${APP}.dwd_refund_payment
    where dt='$do_date'
    group by province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt='$do_date')
select province_id,sum(visit_count),sum(login_count),sum(visitor_count),sum(user_count),sum(order_count),sum(order_original_amount),sum(order_final_amount),sum(payment_count),sum(payment_amount),sum(refund_order_count),sum(refund_order_amount),sum(refund_payment_count),sum(refund_payment_amount)
from
(
    select province_id,visit_count,login_count,visitor_count,user_count,0 order_count,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_amount,0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount from tmp_vu
    union all
    select province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,order_count,order_original_amount,order_final_amount,0 payment_count,0 payment_amount,0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount from tmp_order
    union all
    select province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,0 order_count,0 order_original_amount,0 order_final_amount,payment_count,payment_amount,0 refund_order_count,0 refund_order_amount,0 refund_payment_count,0 refund_payment_amount from tmp_pay
    union all
    select province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,0 order_count,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_amount,refund_order_count,refund_order_amount,0 refund_payment_count,0 refund_payment_amount from tmp_ro
    union all
    select province_id,0 visit_count,0 login_count,0 visitor_count,0 user_count,0 order_count,0 order_original_amount,0 order_final_amount,0 payment_count,0 payment_amount,0 refund_order_count,0 refund_order_amount,refund_payment_count,refund_payment_amount from tmp_rp
)t1
group by province_id;
"

case $1 in
"dws_visitor_action_daycount" )
    hive -e "$dws_visitor_action_daycount"
;;
"dws_user_action_daycount" )
    hive -e "$dws_user_action_daycount"
;;
"dws_activity_info_daycount" )
    hive -e "$dws_activity_info_daycount"
;;
"dws_area_stats_daycount" )
    hive -e "$dws_area_stats_daycount"
;;
"dws_sku_action_daycount" )
    hive -e "$dws_sku_action_daycount"
;;
"dws_coupon_info_daycount" )
    hive -e "$dws_coupon_info_daycount"
;;
"all" )
    hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
;;
esac

(2) Grant execute permission

[seven@hadoop102 bin]$ chmod +x dwd_to_dws.sh

2) Using the script

(1) Run the script

[seven@hadoop102 bin]$ dwd_to_dws.sh all 2020-06-14
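
As with the first-time load, the first argument is dispatched by the case block at the end of the script, so a single DWS table can be reloaded for a given day, for example:

[seven@hadoop102 bin]$ dwd_to_dws.sh dws_user_action_daycount 2020-06-14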

(2) Verify that the data was loaded successfully
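
A simple sanity check is to count the rows written into the new partition of one of the target tables. A minimal sketch, assuming the gmall database:

[seven@hadoop102 bin]$ hive -e "select count(*) from gmall.dws_area_stats_daycount where dt='2020-06-14';"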
