Flink + Iceberg: writing data to HDFS and reading it back from Hive

Contents

1. Component versions
   Environment variable configuration
2. Hadoop configuration
   hadoop-env.sh
   core-site.xml
   hdfs-site.xml
   mapred-site.xml
   yarn-site.xml
3. Hive configuration
   hive-env.sh
   hive-site.xml
   Original JARs in HIVE LIB
4. Flink configuration: integrating with HDFS and YARN
   Modifying the Iceberg source code (for compatibility with Hive 4.0.1)
   config.yaml
   Flink lib directory
   TEZ
   /cluster/tez/conf/tez-site.xml
   TEZ LIB
   Starting the programs
   Starting PostgreSQL with Docker
   Running commands in the Hive CLI
   Running commands in Flink SQL
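
Before getting into the configuration, here is a rough sketch of the end-to-end flow this post builds toward: Flink SQL writes an Iceberg table through the Hive catalog onto HDFS, and the Hive CLI then queries the same table. This is a placeholder sketch only; the catalog and table names, metastore URI, and warehouse path are illustrative, it assumes the Iceberg Flink runtime jar is already in Flink's lib directory (covered later), and the real commands for this environment appear in the Flink SQL and Hive CLI sections at the end.

# placeholder sketch: write an Iceberg table from the Flink SQL client ...
cat > /tmp/iceberg_demo.sql <<'EOF'
CREATE CATALOG hive_catalog WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hive',
  -- metastore URI and warehouse path below are placeholders
  'uri' = 'thrift://10.10.10.99:9083',
  'warehouse' = 'hdfs://10.10.10.99:9000/user/hive/warehouse'
);
CREATE TABLE IF NOT EXISTS hive_catalog.`default`.demo (id BIGINT, name STRING);
INSERT INTO hive_catalog.`default`.demo VALUES (1, 'a'), (2, 'b');
EOF
$FLINK_HOME/bin/sql-client.sh -f /tmp/iceberg_demo.sql

# ... then read the same table back from Hive (Hive 4.0.x bundles Iceberg support)
$HIVE_HOME/bin/hive -e "SELECT * FROM default.demo;"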


1. Component versions

Name              Version
hadoop            3.4.1
flink             1.20.1
hive              4.0.1
kafka             3.9.0
zookeeper         3.9.3
tez               0.10.4
spark (hadoop3)   3.5.4
jdk               11.0.13
maven             3.9.9

Environment variable configuration

After editing and saving with vim, run source /etc/profile so the changes take effect.

LD_LIBRARY_PATH=/usr/local/lib
export LD_LIBRARY_PATH

# Java environment
export JAVA_HOME=/cluster/jdk
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export TEZ_HOME=/cluster/tez/
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME/*.jar:$TEZ_HOME/lib/*.jar

# Hadoop ecosystem
export HADOOP_HOME=/cluster/hadoop3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
HADOOP_CLASSPATH=`hadoop classpath`
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

# Hive configuration
export HIVE_HOME=/cluster/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf

# Spark configuration
export SPARK_HOME=/cluster/spark
export SPARK_LOCAL_IP=10.10.10.99
export SPARK_CONF_DIR=$SPARK_HOME/conf

# Flink configuration
export FLINK_HOME=/cluster/flink

# ZooKeeper/Kafka
export ZOOKEEPER_HOME=/cluster/zookeeper
export KAFKA_HOME=/cluster/kafka

# Other tools
export FLUME_HOME=/cluster/flume
export M2_HOME=/cluster/maven

# Dynamic link libraries
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH

# Combine everything into PATH
export PATH=$PATH:$HIVE_HOME/bin:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$M2_HOME/bin:$FLINK_HOME/bin:$ZOOKEEPER_HOME/bin

export LC_ALL=zh_CN.UTF-8
export LANG=zh_CN.UTF-8
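
A quick sanity check after sourcing the profile (a minimal sketch; the expected output just reflects the versions and paths listed above). Note that the HADOOP_CLASSPATH=`hadoop classpath` line only resolves if the hadoop binary is already on the PATH when the profile is sourced, since the PATH export comes further down the file.

source /etc/profile
echo $JAVA_HOME $HADOOP_HOME $HIVE_HOME $FLINK_HOME    # /cluster/jdk /cluster/hadoop3 /cluster/hive /cluster/flink
java -version                                          # expect 11.0.13
hadoop version                                         # expect Hadoop 3.4.1
echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep -i tez   # the Tez conf dir and jars should be listed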

2. Hadoop configuration

hadoop-env.sh

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

##
## THIS FILE ACTS AS THE MASTER FILE FOR ALL HADOOP PROJECTS.
## SETTINGS HERE WILL BE READ BY ALL HADOOP COMMANDS.  THEREFORE,
## ONE CAN USE THIS FILE TO SET YARN, HDFS, AND MAPREDUCE
## CONFIGURATION OPTIONS INSTEAD OF xxx-env.sh.
##
## Precedence rules:
##
## {yarn-env.sh|hdfs-env.sh} > hadoop-env.sh > hard-coded defaults
##
## {YARN_xyz|HDFS_xyz} > HADOOP_xyz > hard-coded defaults
##

# Many of the options here are built from the perspective that users
# may want to provide OVERWRITING values on the command line.
# For example:
#
#  JAVA_HOME=/usr/java/testing hdfs dfs -ls
#
# Therefore, the vast majority (BUT NOT ALL!) of these defaults
# are configured for substitution and not append.  If append
# is preferable, modify this file accordingly.

###
# Generic settings for HADOOP
###

# Technically, the only required environment variable is JAVA_HOME.
# All others are optional.  However, the defaults are probably not
# preferred.  Many sites configure these options outside of Hadoop,
# such as in /etc/profile.d

# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
# export JAVA_HOME=

# Location of Hadoop.  By default, Hadoop will attempt to determine
# this location based upon its execution path.
# export HADOOP_HOME=

# Location of Hadoop's configuration information.  i.e., where this
# file is living. If this is not defined, Hadoop will attempt to
# locate it based upon its execution path.
#
# NOTE: It is recommend that this variable not be set here but in
# /etc/profile.d or equivalent.  Some options (such as
# --config) may react strangely otherwise.
#
# export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

# The maximum amount of heap to use (Java -Xmx).  If no unit
# is provided, it will be converted to MB.  Daemons will
# prefer any Xmx setting in their respective _OPT variable.
# There is no default; the JVM will autoscale based upon machine
# memory size.
# export HADOOP_HEAPSIZE_MAX=

# The minimum amount of heap to use (Java -Xms).  If no unit
# is provided, it will be converted to MB.  Daemons will
# prefer any Xms setting in their respective _OPT variable.
# There is no default; the JVM will autoscale based upon machine
# memory size.
# export HADOOP_HEAPSIZE_MIN=

# Enable extra debugging of Hadoop's JAAS binding, used to set up
# Kerberos security.
# export HADOOP_JAAS_DEBUG=true

# Extra Java runtime options for all Hadoop commands. We don't support
# IPv6 yet/still, so by default the preference is set to IPv4.
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"

# For Kerberos debugging, an extended option set logs more information
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"

# Some parts of the shell code may do special things dependent upon
# the operating system.  We have to set this here. See the next
# section as to why....
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}

# Extra Java runtime options for some Hadoop commands
# and clients (i.e., hdfs dfs -blah).  These get appended to HADOOP_OPTS for
# such commands.  In most cases, # this should be left empty and
# let users supply it on the command line.
# export HADOOP_CLIENT_OPTS=""

#
# A note about classpaths.
#
# By default, Apache Hadoop overrides Java's CLASSPATH
# environment variable.  It is configured such
# that it starts out blank with new entries added after passing
# a series of checks (file/dir exists, not already listed aka
# de-deduplication).  During de-deduplication, wildcards and/or
# directories are *NOT* expanded to keep it simple. Therefore,
# if the computed classpath has two specific mentions of
# awesome-methods-1.0.jar, only the first one added will be seen.
# If two directories are in the classpath that both contain
# awesome-methods-1.0.jar, then Java will pick up both versions.

# An additional, custom CLASSPATH. Site-wide configs should be
# handled via the shellprofile functionality, utilizing the
# hadoop_add_classpath function for greater control and much
# harder for apps/end-users to accidentally override.
# Similarly, end users should utilize ${HOME}/.hadooprc .
# This variable should ideally only be used as a short-cut,
# interactive way for temporary additions on the command line.
# export HADOOP_CLASSPATH="/some/cool/path/on/your/machine"

# Should HADOOP_CLASSPATH be first in the official CLASSPATH?
# export HADOOP_USER_CLASSPATH_FIRST="yes"

# If HADOOP_USE_CLIENT_CLASSLOADER is set, the classpath along
# with the main jar are handled by a separate isolated
# client classloader when 'hadoop jar', 'yarn jar', or 'mapred job'
# is utilized. If it is set, HADOOP_CLASSPATH and
# HADOOP_USER_CLASSPATH_FIRST are ignored.
# export HADOOP_USE_CLIENT_CLASSLOADER=true

# HADOOP_CLIENT_CLASSLOADER_SYSTEM_CLASSES overrides the default definition of
# system classes for the client classloader when HADOOP_USE_CLIENT_CLASSLOADER
# (the remainder of this file keeps the stock hadoop-env.sh defaults and is omitted here)
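
The hadoop-env.sh above is essentially the stock file; JAVA_HOME, HADOOP_CONF_DIR and the extra classpath entries come from /etc/profile instead, which matches the file's own advice about configuring these options outside of Hadoop. The classpath comments also recommend the shellprofile mechanism over HADOOP_CLASSPATH for site-wide additions. A minimal sketch of that alternative, following the hooks described in Hadoop's Unix Shell Guide (the file name and the idea of registering the Tez paths this way are hypothetical, not something this setup actually uses):

# hypothetical $HADOOP_CONF_DIR/shellprofile.d/tez.sh
hadoop_add_profile tez
function _tez_hadoop_classpath
{
  # add the Tez configuration directory and jars to every hadoop command's classpath
  hadoop_add_classpath "${TEZ_CONF_DIR}"
  hadoop_add_classpath "${TEZ_HOME}/*"
  hadoop_add_classpath "${TEZ_HOME}/lib/*"
}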
