I. Steps for installing and configuring the fully distributed mode:
1. Install the JDK; 2. Configure the hosts file; 3. Create the Hadoop run account; 4. Set up passwordless SSH login;
5. Download and extract the Hadoop tarball; 6. Configure the namenode by editing the site files; 7. Configure hadoop-env.sh;
8. Configure the masters and slaves files; 9. Copy Hadoop to every node; 10. Format the namenode;
11. Start Hadoop; 12. Use jps to verify that the daemons started successfully
1. Install the JDK (repeat on every node)
---- First, extract the tarball ----
[root@localhost ~]# tar -zxvf jdk-7u9-linux-i586.tar.gz

---- Move and rename the directory ----
[root@localhost ~]# mv jdk1.7.0_09 /jdk1.7

---- Add the following lines to /etc/profile ----
[root@localhost ~]# vi /etc/profile

export JAVA_HOME=/jdk1.7
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$PATH

---- Verify that JDK 1.7 was installed successfully ----
[root@localhost ~]# java -version
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) Client VM (build 23.5-b02, mixed mode)
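Note that /etc/profile is only read at login, so the new variables are not visible in the current shell until it is reloaded. This standard step is implied but not shown in the transcript above:

[root@localhost ~]# source /etc/profile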
2. Configure the hosts file (on every node)
[root@localhost 123]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6

192.168.1.151   node1
192.168.1.152   node2
192.168.1.153   node3
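A quick sanity check, not part of the original transcript, is to confirm from each node that all three names now resolve and are reachable (this assumes all three machines are already up):

[root@localhost ~]# for h in node1 node2 node3; do ping -c 1 $h; done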
3. Create the Hadoop run account (on every node)
[root@localhost ~]# useradd jack
[root@localhost ~]# passwd jack
Changing password for user jack.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
[root@localhost ~]# id jack
uid=500(jack) gid=500(jack) groups=500(jack)
4. Set up passwordless SSH login
[jack@node1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/jack/.ssh/id_rsa):
Created directory '/home/jack/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/jack/.ssh/id_rsa.
Your public key has been saved in /home/jack/.ssh/id_rsa.pub.
The key fingerprint is:
65:22:5b:af:69:09:7b:8f:8b:35:f6:b8:69:8c:f0:a1 jack@node1

[jack@node2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/jack/.ssh/id_rsa):
Created directory '/home/jack/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/jack/.ssh/id_rsa.
Your public key has been saved in /home/jack/.ssh/id_rsa.pub.
The key fingerprint is:
ab:18:29:89:57:82:f8:cc:3c:ed:47:05:b2:15:43:56 jack@node2

[jack@node3 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/jack/.ssh/id_rsa):
Created directory '/home/jack/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/jack/.ssh/id_rsa.
Your public key has been saved in /home/jack/.ssh/id_rsa.pub.
The key fingerprint is:
11:9f:7c:81:e2:dd:c8:44:1d:8a:24:15:28:bc:06:78 jack@node3

[jack@node1 ~]$ cd .ssh/
[jack@node1 .ssh]$ cat id_rsa.pub > authorized_keys

[jack@node2 ~]$ cd .ssh/
[jack@node2 .ssh]$ scp id_rsa.pub node1:/home/jack/
The authenticity of host 'node1 (192.168.1.151)' can't be established.
RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.1.151' (RSA) to the list of known hosts.
jack@node1's password:
id_rsa.pub                                    100%  392     0.4KB/s   00:00

[jack@node1 .ssh]$ cat /home/jack/id_rsa.pub >> authorized_keys

[jack@node3 ~]$ cd .ssh/
[jack@node3 .ssh]$ scp id_rsa.pub node1:/home/jack/
The authenticity of host 'node1 (192.168.1.151)' can't be established.
RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.1.151' (RSA) to the list of known hosts.
jack@node1's password:
id_rsa.pub                                    100%  392     0.4KB/s   00:00

[jack@node1 .ssh]$ cat /home/jack/id_rsa.pub >> authorized_keys

[jack@node1 .ssh]$ ls
authorized_keys  id_rsa  id_rsa.pub
[jack@node1 .ssh]$ rm id_rsa.pub
[jack@node1 .ssh]$ scp authorized_keys node2:/home/jack/.ssh/
The authenticity of host 'node2 (192.168.1.152)' can't be established.
RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.1.152' (RSA) to the list of known hosts.
jack@node2's password:
authorized_keys                               100% 1176     1.2KB/s   00:00
[jack@node1 .ssh]$ scp authorized_keys node3:/home/jack/.ssh/
The authenticity of host 'node3 (192.168.1.153)' can't be established.
RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3,192.168.1.153' (RSA) to the list of known hosts.
jack@node3's password:
authorized_keys                               100% 1176     1.2KB/s   00:00
[jack@node1 .ssh]$ chmod 400 authorized_keys

[jack@node2 .ssh]$ rm id_rsa.pub
[jack@node2 .ssh]$ chmod 400 authorized_keys

[jack@node3 .ssh]$ rm id_rsa.pub
[jack@node3 .ssh]$ chmod 400 authorized_keys
[jack@node3 .ssh]$ ssh node2
The authenticity of host 'node2 (192.168.1.152)' can't be established.
RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.1.152' (RSA) to the list of known hosts.
Last login: Wed May 15 21:57:50 2013 from 192.168.1.104
[jack@node2 ~]$
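Before moving on, it is worth verifying that every host can now reach the others without a password prompt. A minimal check, run as jack from each node in turn (not part of the original transcript); each ssh call should print a hostname without asking for a password:

[jack@node1 ~]$ for h in node1 node2 node3; do ssh $h hostname; done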
5. Download and extract the Hadoop tarball
[jack@node1 ~]$ tar -zxvf hadoop-0.20.2.tar.gz

[root@node1 jack]# mv hadoop-0.20.2 /hadoop-0.20.2
6. Configure the namenode by editing the site files
core-site.xml: configuration for Hadoop Core, such as I/O settings shared by HDFS and MapReduce.
hdfs-site.xml: configuration for the HDFS daemons: the namenode, the secondary namenode, and the datanodes.
mapred-site.xml: configuration for the MapReduce daemons: the jobtracker and the tasktrackers.
---- fs.default.name is the URI (protocol, hostname, port) of the cluster's namenode; every machine in the cluster must know the namenode's address. Datanodes register with the namenode so that their data becomes usable, and standalone client programs use this URI when talking to datanodes to obtain a file's block list. ----
---- hadoop.tmp.dir is the base directory the Hadoop filesystem relies on; many other paths default to locations under it. If hdfs-site.xml does not configure the namenode and datanode storage locations, they end up under this path by default. ----

[jack@node1 conf]$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.151:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/temp</value>
  </property>
</configuration>

---- dfs.replication determines how many copies of each file block the system keeps. For a real deployment it should be set to 3 (there is no hard upper limit, but extra replicas add little benefit while consuming more space). Fewer than three replicas can compromise reliability: a system failure may lose data. ----
---- dfs.data.dir is the local filesystem path where a datanode stores its data. The path need not be identical on every datanode, since each machine's environment may differ, but configuring the same path everywhere keeps administration simpler. By default it falls under hadoop.tmp.dir, a location suitable only for testing because data there can easily be lost, so this value should be overridden. ----
---- dfs.name.dir is the local path where the namenode stores the filesystem metadata. It matters only to the namenode; datanodes never use it. The warning above about /temp-style defaults applies here too. ----

[jack@node1 conf]$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/user/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/user/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

[jack@node1 conf]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.151:9001</value>
  </property>
</configuration>
Create the directories referenced above and hand them over to the jack account:

[root@node1 ~]# mkdir /temp
[root@node1 ~]# mkdir -p /user/hdfs/data
[root@node1 ~]# mkdir -p /user/hdfs/name
[root@node1 ~]# chown -R jack:jack /temp/
[root@node1 ~]# chown -R jack:jack /user/
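The transcript only creates these directories on node1. Since hadoop.tmp.dir and dfs.data.dir also apply on the datanodes, the same directories (apart from the name directory) presumably need to exist with the same ownership on node2 and node3 as well; a sketch of that assumed step:

[root@node2 ~]# mkdir /temp && mkdir -p /user/hdfs/data
[root@node2 ~]# chown -R jack:jack /temp/ /user/

(and likewise on node3)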
7. Configure hadoop-env.sh
hadoop-env.sh: sets the environment variables used by the scripts that run Hadoop.
[jack@node1 conf]$ vi hadoop-env.sh

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/jdk1.7

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
8. Configure the masters and slaves files
masters: lists the machines that run the secondary namenode.
slaves: lists the machines that run a datanode and a tasktracker.
[root@node1 conf]# cat masters
192.168.1.151
[root@node1 conf]# cat slaves
192.168.1.152
192.168.1.153
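Since step 2 already mapped the hostnames in /etc/hosts, the two files could equally well list hostnames instead of IP addresses; an equivalent variant (not from the original transcript):

[root@node1 conf]# cat masters
node1
[root@node1 conf]# cat slaves
node2
node3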
9. Copy Hadoop to every node
[root@node1 /]# scp -r hadoop-0.20.2 node3:/hadoop-0.20.2
[root@node1 /]# scp -r hadoop-0.20.2 node2:/hadoop-0.20.2
10. Format the namenode
Formatting initializes dfs.name.dir; re-formatting an existing cluster wipes all HDFS metadata, so this should be done only once, on a fresh install.
[jack@node1 /]$ cd hadoop-0.20.2/bin
[jack@node1 bin]$ ./hadoop namenode -format
13/05/05 20:02:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node1/192.168.1.151
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /user/hdfs/name ? (Y or N) Y
13/05/05 20:02:06 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
13/05/05 20:02:06 INFO namenode.FSNamesystem: supergroup=supergroup
13/05/05 20:02:06 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/05/05 20:02:06 INFO common.Storage: Image file of size 94 saved in 0 seconds.
13/05/05 20:02:06 INFO common.Storage: Storage directory /user/hdfs/name has been successfully formatted.
13/05/05 20:02:06 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.1.151
************************************************************/
11. Start Hadoop
[jack@node1 bin]$ ./start-all.sh
starting namenode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-namenode-node1.out
192.168.1.153: starting datanode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-datanode-node3.out
192.168.1.152: starting datanode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-datanode-node2.out
192.168.1.151: starting secondarynamenode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-node1.out
starting jobtracker, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-node1.out
192.168.1.152: starting tasktracker, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-node2.out
192.168.1.153: starting tasktracker, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-node3.out
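Besides jps (step 12 below), Hadoop 0.20 serves web status pages on the namenode (port 50070) and the jobtracker (port 50030). A quick reachability check, assuming curl is installed on the node:

[jack@node1 bin]$ curl -s -o /dev/null http://node1:50070/ && echo "namenode UI reachable"
[jack@node1 bin]$ curl -s -o /dev/null http://node1:50030/ && echo "jobtracker UI reachable"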
12. Use jps to verify that the daemons started successfully
[jack@node1 bin]$ jps
4375 NameNode
4696 Jps
4531 SecondaryNameNode
4592 JobTracker

[jack@node3 ~]$ jps
4435 Jps
4373 TaskTracker
4275 DataNode

[jack@node2 /]$ jps
3934 TaskTracker
3994 Jps
3836 DataNode
Note that root ended up being used for part of the installation, because a problem initially came up while setting up passwordless SSH login (it was eventually diagnosed and resolved), and a separate Hadoop account named echo was created later. This is why the log file names above contain "root" and the HDFS listings below show "echo" as the file owner.
Now run a quick test against the freshly installed cluster:
[jack@node1 ~]$ mkdir input
[jack@node1 ~]$ cd input/
[jack@node1 input]$ echo "hello world" > test1.txt
[jack@node1 input]$ echo "hello hadoop" > test2.txt
[jack@node1 input]$ cat test1.txt
hello world
[jack@node1 input]$ cat test2.txt
hello hadoop
[jack@node1 input]$ cd /hadoop-0.20.2/
[jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -put /home/jack/input in
[jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -ls in
Found 2 items
-rw-r--r--   1 echo supergroup         12 2013-05-06 15:23 /user/jack/in/test1.txt
-rw-r--r--   1 echo supergroup         13 2013-05-06 15:23 /user/jack/in/test2.txt
[jack@node1 hadoop-0.20.2]$ ls
bin        CHANGES.txt  docs                    hadoop-0.20.2-examples.jar  ivy      librecordio  NOTICE.txt  webapps
build.xml  conf         hadoop-0.20.2-ant.jar   hadoop-0.20.2-test.jar      ivy.xml  LICENSE.txt  README.txt
c++        contrib      hadoop-0.20.2-core.jar  hadoop-0.20.2-tools.jar     lib      logs         src
[jack@node1 hadoop-0.20.2]$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
13/05/06 15:24:01 INFO input.FileInputFormat: Total input paths to process : 2
13/05/06 15:24:02 INFO mapred.JobClient: Running job: job_201305061516_0001
13/05/06 15:24:03 INFO mapred.JobClient:  map 0% reduce 0%
13/05/06 15:24:30 INFO mapred.JobClient:  map 50% reduce 0%
13/05/06 15:24:46 INFO mapred.JobClient:  map 50% reduce 16%
13/05/06 15:24:51 INFO mapred.JobClient:  map 100% reduce 16%
13/05/06 15:25:02 INFO mapred.JobClient:  map 100% reduce 100%
13/05/06 15:25:04 INFO mapred.JobClient: Job complete: job_201305061516_0001
13/05/06 15:25:04 INFO mapred.JobClient: Counters: 17
13/05/06 15:25:04 INFO mapred.JobClient:   Job Counters
13/05/06 15:25:04 INFO mapred.JobClient:     Launched reduce tasks=1
13/05/06 15:25:04 INFO mapred.JobClient:     Launched map tasks=2
13/05/06 15:25:04 INFO mapred.JobClient:     Data-local map tasks=2
13/05/06 15:25:04 INFO mapred.JobClient:   FileSystemCounters
13/05/06 15:25:04 INFO mapred.JobClient:     FILE_BYTES_READ=55
13/05/06 15:25:04 INFO mapred.JobClient:     HDFS_BYTES_READ=25
13/05/06 15:25:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=180
13/05/06 15:25:04 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
13/05/06 15:25:04 INFO mapred.JobClient:   Map-Reduce Framework
13/05/06 15:25:04 INFO mapred.JobClient:     Reduce input groups=3
13/05/06 15:25:04 INFO mapred.JobClient:     Combine output records=4
13/05/06 15:25:04 INFO mapred.JobClient:     Map input records=2
13/05/06 15:25:04 INFO mapred.JobClient:     Reduce shuffle bytes=61
13/05/06 15:25:04 INFO mapred.JobClient:     Reduce output records=3
13/05/06 15:25:04 INFO mapred.JobClient:     Spilled Records=8
13/05/06 15:25:04 INFO mapred.JobClient:     Map output bytes=41
13/05/06 15:25:04 INFO mapred.JobClient:     Combine input records=4
13/05/06 15:25:04 INFO mapred.JobClient:     Map output records=4
13/05/06 15:25:04 INFO mapred.JobClient:     Reduce input records=4
[jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x   - echo supergroup          0 2013-05-06 15:23 /user/jack/in
drwxr-xr-x   - echo supergroup          0 2013-05-06 15:25 /user/jack/out
[jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -ls ./out
Found 2 items
drwxr-xr-x   - echo supergroup          0 2013-05-06 15:24 /user/jack/out/_logs
-rw-r--r--   1 echo supergroup         25 2013-05-06 15:24 /user/jack/out/part-r-00000
[jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -cat ./out/*
hadoop  1
hello   2
world   1
cat: Source must be a file.
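The trailing "cat: Source must be a file." error is harmless: the glob ./out/* also matches the _logs directory, which -cat cannot print. Pointing -cat at the result file listed above avoids it:

[jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -cat ./out/part-r-00000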
Reference: http://hadoop.apache.org/docs/r0.19.1/cn/cluster_setup.html