Many companies still build their big-data platforms on HDP. With the continued popularity of Apache Kyuubi, many people urgently need a way to integrate Apache Kyuubi into their existing HDP product. Integrating Apache Kyuubi into HDP mainly involves secondary development of Ambari. This article describes the integration process in detail.
1. Integration Version Information
Service | Version
Kyuubi | 1.4.1-incubating
Ambari | 2.7.3
HDP | 3.1.0
OS | CentOS 7.4.1708
Background: on top of HDP 3.1.0, our company replaced all components with their Apache releases, and then integrated Apache Kyuubi.
The main Apache component versions are as follows:
Service | Version
HDFS | 3.3.0
YARN | 3.3.0
MapReduce2 | 3.3.0
Hive | 3.1.2
Spark | 3.1.1
2. Integration Steps
Adding a custom component consists of two parts: packaging the component's executable files into an RPM, and adding the component's configuration, startup scripts, and so on in Ambari.
2.1 Building the RPM Package
When installing or integrating big-data components with Ambari, the component needs to be packaged in rpm format.
2.1.1 Download and Extract Apache Kyuubi
Download: https://kyuubi.apache.org/releases.html
For this integration we chose version 1.4.1-incubating.
Run tar zxf apache-kyuubi-1.4.1-incubating-bin.tgz. The layout of the extracted Kyuubi distribution:
apache-kyuubi-1.4.1-incubating-bin
├── DISCLAIMER
├── LICENSE
├── NOTICE
├── RELEASE
├── beeline-jars
├── bin
├── conf
│ ├── kyuubi-defaults.conf.template
│ ├── kyuubi-env.sh.template
│ └── log4j2.properties.template
├── docker
│ ├── Dockerfile
│ ├── helm
│ ├── kyuubi-configmap.yaml
│ ├── kyuubi-deployment.yaml
│ ├── kyuubi-pod.yaml
│ └── kyuubi-service.yaml
├── externals
│ └── engines
├── jars
├── licenses
├── logs
├── pid
└── work
2.1.2 Setting Up the RPM Build Environment
Install the rpm-build package:
yum install rpm-build
Install rpmdevtools:
yum install rpmdevtools
Create the workspace:
rpmdev-setuptree
rpmdev-setuptree creates the following workspace under the current user's home directory (/root/rpmbuild here):
rpmbuild
├── BUILD
├── RPMS
├── SOURCES
├── SPECS
└── SRPMS
2.1.3 Building the RPM Package
2.1.3.1 Edit the Spec File
Building an rpm package requires a spec-format file. Based on the directory structure of the extracted Kyuubi distribution, you need to write a spec file that covers all required directories and files. An excerpt:
%description
kyuubi
%files
%dir %attr(0755, root, root) "/usr/hdp/3.1.0.0-78/kyuubi"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/DISCLAIMER"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/LICENSE"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/NOTICE"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/RELEASE"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/beeline-jars"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/logs"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/pid"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/work"
%changelog
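The %files excerpt above needs the usual spec preamble. A minimal sketch follows; the Summary, Group, and AutoReqProv settings are assumptions, while the Name, Version, and Release are chosen to match the BUILDROOT directory name used in the next step:

```
Name: kyuubi_3_1_0_0_78
Version: 2.3.2.3.1.0.0
Release: 78
Summary: Apache Kyuubi 1.4.1-incubating for HDP 3.1.0.0-78
License: Apache-2.0
Group: Applications
BuildArch: x86_64
# Skip automatic dependency scanning for the bundled jars
AutoReqProv: no

%description
kyuubi

%files
# ... the full directory and file list, as in the excerpt above ...

%changelog
```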
2.1.3.2 Put the Files in Place
Create the directory:
cd /root/rpmbuild/BUILDROOT
mkdir -p /root/rpmbuild/BUILDROOT/kyuubi_3_1_0_0_78-2.3.2.3.1.0.0-78.x86_64/usr/hdp/3.1.0.0-78/kyuubi
Change to the directory where apache-kyuubi-1.4.1-incubating-bin.tgz was extracted and copy everything:
cp -r * /root/rpmbuild/BUILDROOT/kyuubi_3_1_0_0_78-2.3.2.3.1.0.0-78.x86_64/usr/hdp/3.1.0.0-78/kyuubi
Put the kyuubi.spec file into the /root/rpmbuild/SPECS folder:
cp kyuubi.spec /root/rpmbuild/SPECS/
2.1.3.3 Build the Package
cd /root/rpmbuild/SPECS
rpmbuild -ba kyuubi.spec
Check the rpm package:
The built rpm package is located in the /root/rpmbuild/RPMS directory.
2.1.4 Update the YUM Repository
Copy the rpm built in the previous step into the corresponding directory of the target yum repository, then run the update:
createrepo --update ./
2.2 Integrating Apache Kyuubi into Ambari
2.2.1 Custom Service Directory Structure
Adding a custom service to Ambari requires a set of configuration files. The following uses the Spark service as an example.
configuration
This directory holds the service's property configuration files, which back the property pages in the Ambari UI; default values, types, descriptions, and so on can be defined here.
package/scripts
This directory holds the service operation scripts, such as service start, service stop, and service check.
package/templates
This directory holds templates for the component's property files, corresponding to the definitions in the configuration directory. When a property is changed in the Ambari UI, the change is automatically rendered into the files generated from these templates, so they always contain the latest values that the service actually uses.
package/alerts
This directory holds alert definitions, such as alerts for the process being down or runtime failures.
quicklinks
This directory holds the quick-link configuration, which Ambari uses to render links that jump to the desired pages.
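As an illustration, a quicklinks/quicklinks.json for the Kyuubi component might look like the sketch below. The structure follows Ambari's quick-links schema; the port property and component name are taken from this article, but the link itself is a hypothetical example:

```json
{
  "name": "default",
  "description": "default quick links configuration",
  "configuration": {
    "protocol": {
      "type": "http"
    },
    "links": [
      {
        "name": "kyuubi_server",
        "label": "Kyuubi Server",
        "component_name": "KYUUBI_SEVER",
        "url": "%@://%@:%@",
        "port": {
          "http_property": "kyuubi.frontend.thrift.binary.bind.port",
          "http_default_port": "10009",
          "regex": "^(\\d+)$",
          "site": "kyuubi-defaults"
        }
      }
    ]
  }
}
```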
metrics.json
Used to define metrics-related configuration.
kerberos.json
Used to configure Kerberos authentication.
metainfo.xml
This file is crucial: it defines the service name, service type, operation scripts, components, metrics, quick links, and so on.
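A skeleton of how these pieces hang together in a stack service's metainfo.xml (element names follow Ambari's stack-definition schema; the version string here is an assumption):

```xml
<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>SPARK</name>
      <displayName>Spark</displayName>
      <version>3.1.1</version>
      <components>
        <!-- component definitions go here -->
      </components>
      <!-- osSpecifics, configuration-dependencies, quickLinksConfigurations, ... -->
    </service>
  </services>
</metainfo>
```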
2.2.2 Adding the Kyuubi Component
In our use case, Kyuubi mainly replaces Spark Thrift Server, so we integrate the Kyuubi component into the Spark service, achieving the effect shown in the figure below.
2.2.2.1 Edit metainfo.xml
In the ambari-server/src/main/resources/stacks/HDP/3.0/services/SPARK directory, first edit metainfo.xml to add the Kyuubi component to the Spark service:
<component>
<name>KYUUBI_SEVER</name>
<displayName>KYUUBI SEVER</displayName>
<category>SLAVE</category>
<cardinality>0+</cardinality>
<versionAdvertised>true</versionAdvertised>
<dependencies>
<dependency>
<name>HDFS/HDFS_CLIENT</name>
<scope>host</scope>
<auto-deploy>
<enabled>true</enabled>
</auto-deploy>
</dependency>
<dependency>
<name>MAPREDUCE2/MAPREDUCE2_CLIENT</name>
<scope>host</scope>
<auto-deploy>
<enabled>true</enabled>
</auto-deploy>
</dependency>
<dependency>
<name>YARN/YARN_CLIENT</name>
<scope>host</scope>
<auto-deploy>
<enabled>true</enabled>
</auto-deploy>
</dependency>
<dependency>
<name>SPARK/SPARK_CLIENT</name>
<scope>host</scope>
<auto-deploy>
<enabled>true</enabled>
</auto-deploy>
</dependency>
<dependency>
<name>HIVE/HIVE_METASTORE</name>
<scope>cluster</scope>
<auto-deploy>
<enabled>true</enabled>
</auto-deploy>
</dependency>
</dependencies>
<commandScript>
<script>scripts/kyuubi_server.py</script>
<scriptType>PYTHON</scriptType>
<timeout>600</timeout>
</commandScript>
<logs>
<log>
<logId>kyuubi_server</logId>
<primary>true</primary>
</log>
</logs>
</component>
Here kyuubi_server.py implements the install, configure, start, stop, and status operations.
<configuration-dependencies>
<config-type>core-site</config-type>
<config-type>spark-defaults</config-type>
<config-type>spark-env</config-type>
<config-type>spark-log4j-properties</config-type>
<config-type>spark-metrics-properties</config-type>
<config-type>spark-thrift-sparkconf</config-type>
<config-type>spark-hive-site-override</config-type>
<config-type>spark-thrift-fairscheduler</config-type>
<config-type>kyuubi-defaults</config-type>
<config-type>kyuubi-env</config-type>
<config-type>kyuubi-log4j-properties</config-type>
<config-type>ranger-spark-audit</config-type>
<config-type>ranger-spark-security</config-type>
</configuration-dependencies>
Inside the <configuration-dependencies> tag, add the kyuubi-related config types, including kyuubi-defaults, kyuubi-env, kyuubi-log4j-properties, ranger-spark-audit, and ranger-spark-security.
<osSpecific>
<osFamily>redhat7,amazonlinux2,redhat6,suse11,suse12</osFamily>
<packages>
<package>
<name>spark2_${stack_version}</name>
</package>
<package>
<name>spark2_${stack_version}-python</name>
</package>
<package>
<name>kyuubi_${stack_version}</name>
</package>
</packages>
</osSpecific>
Add the kyuubi rpm package name inside a <package> tag.
2.2.2.2 Edit the kyuubi_server.py File
#!/usr/bin/env python
import os

from resource_management import *


class KyuubiServer(Script):
  def install(self, env):
    self.install_packages(env)

  def configure(self, env, upgrade_type=None, config_dir=None):
    import kyuubi_params
    env.set_params(kyuubi_params)

    Directory([kyuubi_params.kyuubi_log_dir, kyuubi_params.kyuubi_pid_dir,
               kyuubi_params.kyuubi_metrics_dir, kyuubi_params.kyuubi_operation_log_dir],
              owner=kyuubi_params.kyuubi_user,
              group=kyuubi_params.kyuubi_group,
              mode=0775,
              create_parents=True
    )

    kyuubi_defaults = dict(kyuubi_params.config['configurations']['kyuubi-defaults'])
    PropertiesFile(format("{kyuubi_conf_dir}/kyuubi-defaults.conf"),
                   properties=kyuubi_defaults,
                   key_value_delimiter=" ",
                   owner=kyuubi_params.kyuubi_user,
                   group=kyuubi_params.kyuubi_group,
                   mode=0644
    )

    # create kyuubi-env.sh in the kyuubi conf dir
    File(os.path.join(kyuubi_params.kyuubi_conf_dir, 'kyuubi-env.sh'),
         owner=kyuubi_params.kyuubi_user,
         group=kyuubi_params.kyuubi_group,
         content=InlineTemplate(kyuubi_params.kyuubi_env_sh),
         mode=0644,
    )

    # create log4j.properties in the kyuubi conf dir
    File(os.path.join(kyuubi_params.kyuubi_conf_dir, 'log4j.properties'),
         owner=kyuubi_params.kyuubi_user,
         group=kyuubi_params.kyuubi_group,
         content=kyuubi_params.kyuubi_log4j_properties,
         mode=0644,
    )

  def start(self, env, upgrade_type=None):
    import kyuubi_params
    env.set_params(kyuubi_params)
    self.configure(env)
    Execute(kyuubi_params.kyuubi_start_cmd,
            user=kyuubi_params.kyuubi_user,
            environment={'JAVA_HOME': kyuubi_params.java_home})

  def stop(self, env, upgrade_type=None):
    import kyuubi_params
    env.set_params(kyuubi_params)
    self.configure(env)
    Execute(kyuubi_params.kyuubi_stop_cmd,
            user=kyuubi_params.kyuubi_user,
            environment={'JAVA_HOME': kyuubi_params.java_home})

  def status(self, env):
    import kyuubi_params
    env.set_params(kyuubi_params)
    check_process_status(kyuubi_params.kyuubi_pid_file)

  def get_user(self):
    import kyuubi_params
    return kyuubi_params.kyuubi_user

  def get_pid_files(self):
    import kyuubi_params
    return [kyuubi_params.kyuubi_pid_file]


if __name__ == "__main__":
  KyuubiServer().execute()
kyuubi_server.py defines the logic to install, configure, start, and stop the Kyuubi service.
kyuubi_params defines the configuration variables and related parameters; it is omitted here for brevity.
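As a rough illustration only, kyuubi_params might look like the sketch below. The real module derives these values from Ambari's Script.get_config(); the hard-coded dict, the pid-file naming, and the install path are assumptions consistent with the rest of this article:

```python
# kyuubi_params.py -- simplified sketch; the real Ambari module reads
# Script.get_config() instead of a hard-coded dict.
config = {
    'configurations': {
        'kyuubi-env': {
            'kyuubi_user': 'spark',
            'kyuubi_group': 'spark',
            'kyuubi_log_dir': '/var/log/kyuubi',
            'kyuubi_pid_dir': '/var/run/kyuubi',
        },
        'kyuubi-defaults': {
            'kyuubi.frontend.thrift.binary.bind.port': '10009',
        },
    }
}

kyuubi_env = config['configurations']['kyuubi-env']
kyuubi_user = kyuubi_env['kyuubi_user']
kyuubi_group = kyuubi_env['kyuubi_group']
kyuubi_log_dir = kyuubi_env['kyuubi_log_dir']
kyuubi_pid_dir = kyuubi_env['kyuubi_pid_dir']

# Install path matching the RPM layout built earlier
kyuubi_home = '/usr/hdp/3.1.0.0-78/kyuubi'
kyuubi_conf_dir = kyuubi_home + '/conf'
kyuubi_start_cmd = kyuubi_home + '/bin/kyuubi start'
kyuubi_stop_cmd = kyuubi_home + '/bin/kyuubi stop'
# Assumed pid-file naming, modeled on Kyuubi's daemon script convention
kyuubi_pid_file = (kyuubi_pid_dir + '/kyuubi-' + kyuubi_user +
                   '-org.apache.kyuubi.server.KyuubiServer.pid')
```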
2.2.2.3 Edit the kyuubi-defaults.xml File
<?xml version="1.0" encoding="UTF-8"?>
<configuration supports_final="true">
<property>
<name>kyuubi.ha.zookeeper.quorum</name>
<value>{{cluster_zookeeper_quorum}}</value>
<description>
The connection string for the zookeeper ensemble
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>kyuubi.frontend.thrift.binary.bind.port</name>
<value>10009</value>
<description>
Port of the machine on which to run the thrift frontend service via binary protocol.
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>kyuubi.ha.zookeeper.session.timeout</name>
<value>600000</value>
<description>
The timeout(ms) of a connected session to be idled
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>kyuubi.session.engine.initialize.timeout</name>
<value>300000</value>
<description>
Timeout for starting the background engine, e.g. SparkSQLEngine.
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>kyuubi.authentication</name>
<value>{{kyuubi_authentication}}</value>
<description>
Client authentication types
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>spark.master</name>
<value>yarn</value>
<description>
The deploying mode of spark application.
</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>spark.submit.deployMode</name>
<value>cluster</value>
<description>spark submit deploy mode</description>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>spark.yarn.queue</name>
<value>default</value>
<description>
The name of the YARN queue to which the application is submitted.
</description>
<depends-on>
<property>
<type>capacity-scheduler</type>
<name>yarn.scheduler.capacity.root.queues</name>
</property>
</depends-on>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>spark.yarn.driver.memory</name>
<value>4g</value>
<description>spark yarn driver momory</description>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>spark.executor.memory</name>
<value>4g</value>
<description>spark.executor.memory</description>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>spark.sql.extensions</name>
<value>org.apache.submarine.spark.security.api.RangerSparkSQLExtension</value>
<description>spark sql ranger extension</description>
<on-ambari-upgrade add="false"/>
</property>
</configuration>
The kyuubi.ha.zookeeper.quorum property is set to {{cluster_zookeeper_quorum}}, which is automatically replaced with the current ZooKeeper ensemble during Kyuubi installation.
The kyuubi.authentication property is set to {{kyuubi_authentication}}; during installation it is resolved according to whether Kerberos is enabled on the cluster (e.g. KERBEROS when enabled, NONE otherwise).
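The {{...}} placeholders are Jinja-style template variables. A minimal sketch of the substitution Ambari performs (illustrative only; Ambari's actual rendering happens inside its resource_management library):

```python
import re

def render(template, params):
    """Replace {{name}} placeholders with values from params."""
    return re.sub(r'\{\{(\w+)\}\}',
                  lambda m: str(params[m.group(1)]), template)

params = {
    'cluster_zookeeper_quorum': 'zk1:2181,zk2:2181,zk3:2181',
    'kyuubi_authentication': 'KERBEROS',
}
print(render('kyuubi.ha.zookeeper.quorum={{cluster_zookeeper_quorum}}', params))
# -> kyuubi.ha.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181
```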
2.2.2.4 Edit the kyuubi-env.xml File
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration supports_adding_forbidden="true">
<property>
<name>kyuubi_user</name>
<display-name>Kyuubi User</display-name>
<value>spark</value>
<property-type>USER</property-type>
<value-attributes>
<type>user</type>
<overridable>false</overridable>
<user-groups>
<property>
<type>cluster-env</type>
<name>user_group</name>
</property>
<property>
<type>kyuubi-env</type>
<name>kyuubi_group</name>
</property>
</user-groups>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>kyuubi_group</name>
<display-name>Kyuubi Group</display-name>
<value>spark</value>
<property-type>GROUP</property-type>
<description>kyuubi group</description>
<value-attributes>
<type>user</type>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>kyuubi_log_dir</name>
<display-name>Kyuubi Log directory</display-name>
<value>/var/log/kyuubi</value>
<description>Kyuubi Log Dir</description>
<value-attributes>
<type>directory</type>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>kyuubi_pid_dir</name>
<display-name>Kyuubi PID directory</display-name>
<value>/var/run/kyuubi</value>
<value-attributes>
<type>directory</type>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
<!-- kyuubi-env.sh -->
<property>
<name>content</name>
<description>This is the jinja template for kyuubi-env.sh file</description>
<value>
#!/usr/bin/env bash
export JAVA_HOME={{java_home}}
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/usr/hdp/current/spark-client
export SPARK_CONF_DIR=/etc/spark/conf
export KYUUBI_LOG_DIR={{kyuubi_log_dir}}
export KYUUBI_PID_DIR={{kyuubi_pid_dir}}
</value>
<value-attributes>
<type>content</type>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
</configuration>
This file mainly sets JAVA_HOME, HADOOP_CONF_DIR, KYUUBI_LOG_DIR, KYUUBI_PID_DIR, and related paths.
2.2.2.5 Edit the kyuubi-log4j-properties.xml File
<?xml version="1.0" encoding="UTF-8"?>
<configuration supports_final="false" supports_adding_forbidden="true">
<property>
<name>content</name>
<description>Kyuubi-log4j-Properties</description>
<value>
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %p %c{2}: %m%n
# Set the default kyuubi-ctl log level to WARN. When running the kyuubi-ctl, the
# log level for this class is used to overwrite the root logger's log level.
log4j.logger.org.apache.kyuubi.ctl.ServiceControlCli=ERROR
</value>
<value-attributes>
<type>content</type>
<show-property-name>false</show-property-name>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
</configuration>
2.2.2.6 Edit the ranger-spark-security.xml File
<?xml version="1.0"?>
<configuration>
<property>
<name>ranger.plugin.spark.service.name</name>
<value>{{repo_name}}</value>
<description>Name of the Ranger service containing policies for this SPARK instance</description>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>ranger.plugin.spark.policy.source.impl</name>
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
<description>Class to retrieve policies from the source</description>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>ranger.plugin.spark.policy.rest.url</name>
<value>{{policymgr_mgr_url}}</value>
<description>URL to Ranger Admin</description>
<on-ambari-upgrade add="false"/>
<depends-on>
<property>
<type>admin-properties</type>
<name>policymgr_external_url</name>
</property>
</depends-on>
</property>
<property>
<name>ranger.plugin.spark.policy.pollIntervalMs</name>
<value>30000</value>
<description>How often to poll for changes in policies?</description>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>ranger.plugin.spark.policy.cache.dir</name>
<value>/etc/ranger/{{repo_name}}/policycache</value>
<description>Directory where Ranger policies are cached after successful retrieval from the source</description>
<on-ambari-upgrade add="false"/>
</property>
</configuration>
This file configures the Spark Ranger plugin parameters.
2.2.2.7 Edit the ranger-spark-audit.xml File
<?xml version="1.0"?>
<configuration>
<property>
<name>xasecure.audit.is.enabled</name>
<value>true</value>
<description>Is Audit enabled?</description>
<value-attributes>
<type>boolean</type>
</value-attributes>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>xasecure.audit.destination.db</name>
<value>false</value>
<display-name>Audit to DB</display-name>
<description>Is Audit to DB enabled?</description>
<value-attributes>
<type>boolean</type>
</value-attributes>
<depends-on>
<property>
<type>ranger-env</type>
<name>xasecure.audit.destination.db</name>
</property>
</depends-on>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>xasecure.audit.destination.db.jdbc.driver</name>
<value>{{jdbc_driver}}</value>
<description>Audit DB JDBC Driver</description>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>xasecure.audit.destination.db.jdbc.url</name>
<value>{{audit_jdbc_url}}</value>
<description>Audit DB JDBC URL</description>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>xasecure.audit.destination.db.password</name>
<value>{{xa_audit_db_password}}</value>
<property-type>PASSWORD</property-type>
<description>Audit DB JDBC Password</description>
<value-attributes>
<type>password</type>
</value-attributes>
<on-ambari-upgrade add="false"/>
</property>
<property>
<name>xasecure.audit.destination.db.user</name>
<value>{{xa_audit_db_user}}</value>
<description>Audit DB JDBC User</description>
<on-ambari-upgrade add="false"/>
</property>
</configuration>
This file configures the Spark Ranger audit parameters.
2.2.2.8 Edit the alerts.json File
"KYUUBI_SEVER": [
{
"name": "kyuubi_server_status",
"label": "Kyuubi Server",
"description": "This host-level alert is triggered if the Kyuubi Server cannot be determined to be up.",
"interval": 1,
"scope": "ANY",
"source": {
"type": "SCRIPT",
"path": "DIF/3.0/services/SPARK/package/scripts/alerts/alert_kyuubi_server_port.py",
"parameters": [
{
"name": "check.command.timeout",
"display_name": "Command Timeout",
"value": 120.0,
"type": "NUMERIC",
"description": "The maximum time before check command will be killed by timeout",
"units": "seconds",
"threshold": "CRITICAL"
}
]
}
}
]
Add an alert definition to alerts.json that detects whether the Kyuubi Server is up. The alert runs every minute ("interval": 1), and the check command is killed if it exceeds the 120-second timeout. The check logic is implemented by alert_kyuubi_server_port.py.
2.2.2.9 Edit the alert_kyuubi_server_port.py File
The implementation of alert_kyuubi_server_port.py can be modeled on alert_spark_thrift_port.py and is not shown here. The principle is to periodically run a beeline connection and determine health by whether the connection succeeds.
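Ambari script alerts expose an execute(...) entry point that returns a (state, [message]) tuple. As a hypothetical sketch only, the check below substitutes a plain TCP connect for the beeline probe used by the real script:

```python
import socket

# Hypothetical sketch of alert_kyuubi_server_port.py. The real script runs a
# beeline connection test; a plain TCP connect to the Kyuubi thrift port
# stands in for it here.
KYUUBI_PORT_KEY = '{{kyuubi-defaults/kyuubi.frontend.thrift.binary.bind.port}}'
DEFAULT_PORT = 10009

def execute(configurations=None, parameters=None, host_name=None):
    """Return a (state, [message]) tuple as Ambari script alerts expect."""
    host = host_name or '127.0.0.1'
    port = DEFAULT_PORT
    if configurations and KYUUBI_PORT_KEY in configurations:
        port = int(configurations[KYUUBI_PORT_KEY])
    try:
        sock = socket.create_connection((host, port), timeout=5)
        sock.close()
        return ('OK', ['Kyuubi Server is reachable on port {0}'.format(port)])
    except (socket.error, OSError):
        return ('CRITICAL',
                ['Cannot connect to Kyuubi Server on port {0}'.format(port)])
```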
2.2.2.10 Edit the kerberos.json File
{
"name": "kyuubi_service_keytab",
"principal": {
"value": "spark/_HOST@${realm}",
"type" : "service",
"configuration": "kyuubi-defaults/kyuubi.kinit.principal",
"local_username" : "${spark-env/spark_user}"
},
"keytab": {
"file": "${keytab_dir}/spark.service.keytab",
"owner": {
"name": "${spark-env/spark_user}",
"access": "r"
},
"group": {
"name": "${cluster-env/user_group}",
"access": ""
},
"configuration": "kyuubi-defaults/kyuubi.kinit.keytab"
}
}
In kerberos.json, add the configuration that auto-generates the principal and keytab settings into kyuubi-defaults. When Kerberos is enabled on the cluster, the kyuubi.kinit.keytab and kyuubi.kinit.principal entries are added to kyuubi-defaults automatically.
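For example, with Kerberos enabled the generated entries in kyuubi-defaults.conf would look like the following (hostname, realm, and keytab directory are placeholder examples):

```
kyuubi.kinit.principal=spark/node1.example.com@EXAMPLE.COM
kyuubi.kinit.keytab=/etc/security/keytabs/spark.service.keytab
```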
2.2.2.11 Update the ambari-server and ambari-agent RPMs
Apply the modifications and new files described above to the corresponding directories in the ambari-server and ambari-agent RPM packages.
For an already-installed cluster, this can be done as follows:
1. Uninstall the spark service
2. Place the files added above into the following locations:
/var/lib/ambari-server/resources/stacks/HDP/3.0/services/SPARK
/var/lib/ambari-agent/cache/stacks/DIF/3.0/services/SPARK
3. Run sudo ambari-server restart on the ambari-server node
4. Run sudo ambari-agent restart on every ambari-agent node
5. Reinstall the spark service
3. Results
3.1 Installation
When installing spark, the Kyuubi Server component can now be selected.
Kyuubi parameters can be configured in the UI during installation.
3.2 After a Successful Installation
The installation succeeds.
The Kyuubi configuration pages display correctly.
3.3 Stopping the Kyuubi Server
Stop the Kyuubi Server from the UI.
The UI shows it has stopped.
The UI raises an alert that the Kyuubi Server is down.
Restart the Kyuubi Server.
The alert clears.
Connecting with beeline from the command line as the ambari-qa user succeeds, for example: beeline -u "jdbc:hive2://<kyuubi-host>:10009".
The YARN UI shows the ambari-qa user's application running successfully.
References:
Ambari集成Apache Kyuubi实践 (Practice of Integrating Apache Kyuubi via Ambari)