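Incremental data collection is currently implemented by creating a Hive table partitioned on a chosen field,
and adding the corresponding incremental filter to the WHERE clause of the INSERT OVERWRITE statement.
I usually use the date field ETL_DATE.
Create the partitioned table in Hive with the following HQL: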
CREATE TABLE IF NOT EXISTS product_sell(
category_id BIGINT,
province_id BIGINT,
product_id BIGINT,
price DOUBLE,
sell_num BIGINT
)
PARTITIONED BY (ds string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Then insert the data using the date as the partition key. The shell script is as follows:
hive -e "hql";
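The "hql" placeholder above could be filled in roughly as follows. This is only a minimal sketch under assumptions: the source table product_sell_src, its ETL_DATE column holding yyyy-MM-dd strings, and the helper names build_hql and load_partition are all hypothetical and do not come from the original setup. It shows one day's increment being written into its own ds partition, with the same date used in the WHERE filter.

```shell
#!/usr/bin/env bash
# Sketch: load one day's increment into its own partition of product_sell.
# Assumptions (hypothetical, not from the original): the source table
# product_sell_src and its ETL_DATE column in yyyy-MM-dd format.

# Print the HQL for a given date. Kept separate from the Hive call so the
# statement can be inspected without a Hive installation.
build_hql() {
  local ds="$1"
  cat <<EOF
INSERT OVERWRITE TABLE product_sell PARTITION (ds='${ds}')
SELECT category_id, province_id, product_id, price, sell_num
FROM product_sell_src
WHERE ETL_DATE = '${ds}';
EOF
}

# Run the load for a given date; defaults to yesterday (GNU date syntax).
load_partition() {
  local ds="${1:-$(date -d 'yesterday' +%F)}"
  hive -e "$(build_hql "$ds")"
}
```

Calling load_partition 2024-01-01 would overwrite only the ds='2024-01-01' partition, so re-running the script for the same day should replace that day's data rather than duplicate it.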