Table of Contents
1. Introduction
2. Prerequisites
3. Hadoop Cluster Roles
4. Role and Node Assignment
5. Adjusting Virtual Machine Memory
6. Zookeeper Cluster Deployment
7. Hadoop Cluster Deployment
7.1 Download the Hadoop package, extract it, and create a symlink
7.2 Edit the configuration file: hadoop-env.sh
7.3 Edit the configuration file: core-site.xml
7.4 Edit the configuration file: hdfs-site.xml
7.5 Edit the configuration file: mapred-env.sh
7.6 Edit the configuration file: mapred-site.xml
7.7 Edit the configuration file: yarn-env.sh
7.8 Edit the configuration file: yarn-site.xml
7.9 Edit the workers file
7.10 Distribute Hadoop to the other machines
7.11 Run on node2 and node3
7.12 Create the required directories
7.13 Configure environment variables
7.14 Format the NameNode (run on node1)
7.15 Start the HDFS cluster (run on node1 only)
7.16 Start the YARN cluster (run on node1 only)
7.17 Start the history server
7.18 Start the web proxy server
8. Verifying the Hadoop Cluster
8.1 Verify with jps on node1, node2, and node3 that all processes started
8.2 Verify HDFS in a browser at http://node1:9870
8.3 Verify YARN in a browser at http://node1:8088
1. Introduction
2. Prerequisites
3. Hadoop Cluster Roles
4. Role and Node Assignment
5. Adjusting Virtual Machine Memory
Big data software is designed to run as a cluster (a group of servers). Simulating that cluster with several virtual machines on a single computer puts considerable pressure on memory, so adjust each VM's memory allocation before continuing.
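A quick way to see how much memory each VM actually has available is the free command; a minimal check (not part of the original steps), run on each node:

# Show total/used/available memory in human-readable units
free -h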
6. Zookeeper Cluster Deployment
7. Hadoop Cluster Deployment
7.1 Download the Hadoop package, extract it, and create a symlink
# 1. Download
wget http://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

# 2. Extract
# Make sure the directory /export/server exists
tar -zxvf hadoop-3.3.0.tar.gz -C /export/server/

# 3. Create the symlink
ln -s /export/server/hadoop-3.3.0 /export/server/hadoop
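To confirm the extraction and symlink worked, you can list the link (a quick sanity check, not in the original steps):

ls -l /export/server/hadoop
# Expected: /export/server/hadoop -> /export/server/hadoop-3.3.0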
7.2 Edit the configuration file: hadoop-env.sh
This file sets environment variables used by Hadoop. They are temporary variables, available while Hadoop runs; to make them permanent, write them to /etc/profile.
# Add at the top of the file:
# Java installation path
export JAVA_HOME=/export/server/jdk
# Hadoop installation path
export HADOOP_HOME=/export/server/hadoop
# Hadoop HDFS configuration file path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN configuration file path
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN log directory
export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
# Hadoop HDFS log directory
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
# Users that run the Hadoop daemons
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root
7.3 Edit the configuration file: core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:8020</value>
    <description></description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description></description>
  </property>
</configuration>
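fs.defaultFS makes hdfs://node1:8020 the default filesystem, so HDFS paths resolve against it. For example, once the cluster is running (section 7.15), these two commands are equivalent:

# With fs.defaultFS set, a bare path implies hdfs://node1:8020
hadoop fs -ls /
hadoop fs -ls hdfs://node1:8020/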
7.4 Edit the configuration file: hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>700</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/nn</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
  </property>
  <property>
    <name>dfs.namenode.hosts</name>
    <value>node1,node2,node3</value>
    <description>List of permitted DataNodes.</description>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
    <description></description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
    <description></description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/dn</value>
  </property>
</configuration>
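Note that dfs.blocksize is given in bytes: 268435456 bytes is 256 × 1024 × 1024, i.e. a 256 MB HDFS block size. You can verify the arithmetic in the shell:

# Block size in MB
echo $((268435456 / 1024 / 1024))   # prints 256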
7.5 Edit the configuration file: mapred-env.sh
# Add the following environment variables at the top of the file
export JAVA_HOME=/export/server/jdk
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
7.6 Edit the configuration file: mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/data/mr-history/tmp</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/data/mr-history/done</value>
    <description></description>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
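With these settings, finished MapReduce jobs are served by the JobHistoryServer on node1:19888. Once it is started (section 7.17), a sketch of a quick reachability check, assuming the web UI is up:

# The JobHistoryServer web UI should answer on port 19888
curl -s http://node1:19888/jobhistory | head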
7.7 Edit the configuration file: yarn-env.sh
# Add the following environment variables at the top of the file
export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
7.8 Edit the configuration file: yarn-site.xml
<?xml version="1.0"?>
<!-- Site specific YARN configuration properties -->
<configuration>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://node1:19888/jobhistory/logs</value>
    <description></description>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>node1:8089</value>
    <description>proxy server hostname and port</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Configuration to enable or disable log aggregation</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>Where to aggregate logs to.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description></description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/nm-local</value>
    <description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/nm-log</value>
    <description>Comma-separated list of paths on the local filesystem where logs are written.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
    <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for Map Reduce applications.</description>
  </property>
</configuration>
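Since yarn.resourcemanager.scheduler.class switches the ResourceManager to the FairScheduler, you can confirm the active scheduler later through the ResourceManager's REST API; a sketch, assuming the YARN cluster from section 7.16 is up:

# Returns scheduler info as JSON; look for the fairScheduler section
curl -s http://node1:8088/ws/v1/cluster/scheduler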
7.9 Edit the workers file
# Full contents:
node1
node2
node3
7.10 Distribute Hadoop to the other machines
# Run on node1
cd /export/server

scp -r hadoop-3.3.0 node2:`pwd`/
scp -r hadoop-3.3.0 node3:`pwd`/
7.11 Run on node2 and node3
# Create the symlink
ln -s /export/server/hadoop-3.3.0 /export/server/hadoop
7.12 Create the required directories
# On node1 (the NameNode also needs /data/nn)
mkdir -p /data/nn
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local

# On node2
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local

# On node3
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local
7.13 Configure environment variables
# Append to /etc/profile (on node1, node2, and node3)
export HADOOP_HOME=/export/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
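After editing /etc/profile, reload it and check that the hadoop binary resolves (a quick verification sketch):

source /etc/profile
which hadoop      # should print /export/server/hadoop/bin/hadoop
hadoop version    # should report Hadoop 3.3.0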
7.14 Format the NameNode (run on node1)
hadoop namenode -format
The hadoop command lives in $HADOOP_HOME/bin; because PATH has been configured, it can be run from any directory.
7.15 Start the HDFS cluster (run on node1 only)
start-dfs.sh

# To stop, run:
stop-dfs.sh

The start-dfs.sh script lives in $HADOOP_HOME/sbin; because PATH has been configured, it can be run from any directory.
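Once HDFS is up, a quick way to confirm that all three DataNodes registered is dfsadmin (a sketch; run on any node):

# Should report 3 live datanodes (node1, node2, node3)
hdfs dfsadmin -report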
7.16 Start the YARN cluster (run on node1 only)
start-yarn.sh

# To stop, run:
stop-yarn.sh
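Similarly, you can confirm that all NodeManagers registered with the ResourceManager (a sketch):

# Should list 3 running nodes
yarn node -list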
7.17 Start the history server
mapred --daemon start historyserver

# To stop, replace start with stop
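A quick check that the daemon came up (a sketch; run on node1):

# Prints the JobHistoryServer process if it is running
jps | grep JobHistoryServer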
7.18 Start the web proxy server
yarn-daemon.sh start proxyserver

# To stop, replace start with stop
8. Verifying the Hadoop Cluster
8.1 Verify with jps on node1, node2, and node3 that all processes started
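As a rough guide (the exact set of daemons per node depends on the role assignment in section 4), jps output should include the processes started above:

# On node1, expect roughly:
#   NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager,
#   JobHistoryServer, WebAppProxyServer, QuorumPeerMain (Zookeeper)
# On node2 and node3, expect roughly:
#   DataNode, NodeManager, QuorumPeerMain
jps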
8.2 Verify HDFS; open http://node1:9870 in a browser
# Upload a test file to HDFS and read it back
hadoop fs -put test.txt /test.txt
hadoop fs -cat /test.txt
8.3 Verify YARN; open http://node1:8088 in a browser
# Create a file words.txt with the following content
itheima itcast hadoop
itheima hadoop hadoop
itheima itcast

# Upload the file to HDFS
hadoop fs -put words.txt /words.txt

# Run the following to verify that YARN works
hadoop jar /export/server/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount -Dmapred.job.queue.name=root.root /words.txt /output
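If the job succeeds, the word counts land in /output (typically part-r-00000 with the default single reducer). For this input the expected counts are hadoop 3, itcast 2, itheima 3:

hadoop fs -cat /output/part-r-00000
# Expected output (keys sorted alphabetically):
# hadoop  3
# itcast  2
# itheima 3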