Hadoop Hangover: Launching a CDH4 Hadoop Cluster with Apache Whirr

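This post is about launching a CDH4 MRv1 or CDH4 YARN cluster on EC2 instances. It is said that with Whirr's help you can launch a cluster in 5 minutes! That is true if, and only if, everything works!

Hopefully this article helps you get there.

So, let's get going...

Download the stable release of Apache Whirr, i.e. whirr-0.8.1.tar.gz.
Extract the tarball: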

$ tar -xzvf whirr-0.8.1.tar.gz
$ cd whirr-0.8.1

Generate the key (you are already inside whirr-0.8.1, so no further cd is needed)

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr

Create a properties file to launch the cluster with the following configuration.

# Cluster name goes here
whirr.cluster-name=testcluster

# Change the number of machines in the cluster here
# Using 3 DN and TT and 1 JT and NN
# Ganglia is configured
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode+ganglia-monitor+ganglia-metad,3 hadoop-datanode+hadoop-tasktracker+ganglia-monitor

# Install JAVA (the key is set twice; keep only the install function you want)
whirr.java.install-function=install_openjdk
whirr.java.install-function=install_oab_java

# Install CDH4 MRv1
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
whirr.env.REPO=cdh4

# For EC2 set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
whirr.provider=aws-ec2
whirr.hardware-id=c1.xlarge

# Credentials should go here
whirr.identity=XXXXXXXXXXXXXXXXX
whirr.credential=XXXXXXXXXXXXXXXXXXXX
whirr.cluster-user=whirr
whirr.private-key-file=/home/ubuntu/.ssh/yourKey
whirr.public-key-file=/home/ubuntu/.ssh/yourKey.pub
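
The comment about AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY hints at keeping the keys out of the file. As far as I know, the recipes shipped with Whirr do this by writing `whirr.identity=${env:AWS_ACCESS_KEY_ID}` and `whirr.credential=${env:AWS_SECRET_ACCESS_KEY}` in place of the literal values. The sketch below is a small guard for that setup; `aws_env_ready` is a hypothetical helper name, not part of Whirr.

```shell
# Hypothetical helper (not part of Whirr): succeed only when both
# AWS variables are set to non-empty values in the current shell.
aws_env_ready() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

# Example use (placeholders, not real keys):
#   export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXX
#   export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXX
#   aws_env_ready && bin/whirr launch-cluster --config whirr_cdh.properties
```

Failing early in the shell is cheaper than discovering missing credentials halfway through a launch.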

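Now, let me tell you how to avoid the headaches!
- Cluster name: keep the cluster name simple. Avoid names like testCluster or testCluster1: no capital letters and no digits.
- Decide sensibly how many datanodes you need.
- If Java is not installed, the launch may not succeed. Make sure the image has Java. That said, this properties file takes care of it.
- For now it is better to stay on MRv1 and switch to MRv2 once a stable release is out.
- This is the minimal configuration for launching a Hadoop cluster. You can do a lot of performance tuning on top of it.
- I launched this cluster from an EC2 instance and initially hit an error about the user. Setting the property below solved it.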
```
whirr.cluster-user=whirr
```
Before launching, set appropriate permissions on ~/.ssh and the whirr-0.8.1 folder.
- OK, we are ready to launch the cluster. Name the properties file "whirr_cdh.properties".
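
The permission advice and the naming tip can both be checked before launching. The sketch below is one way to do it, under assumptions that are mine rather than the author's: the usual SSH modes (700 for the directory, 600 for the private key) and a naming rule that accepts lowercase ASCII letters only. Both function names are hypothetical.

```shell
# Hypothetical helper: accept a cluster name made only of lowercase ASCII
# letters (an assumed, strict reading of "no capitals or digits").
is_simple_cluster_name() {
  [ -n "$1" ] || return 1
  # Delete every lowercase letter; anything left over is a disallowed character.
  leftover=$(printf '%s' "$1" | LC_ALL=C tr -d 'a-z')
  [ -z "$leftover" ]
}

# Hypothetical helper: restrict an SSH directory and a private key to the owner.
# Modes 700 and 600 are the conventional SSH values, assumed here.
secure_ssh_key() {
  ssh_dir="$1"
  private_key="$2"
  chmod 700 "$ssh_dir" && chmod 600 "$private_key"
}

# Example use with the key generated earlier:
#   secure_ssh_key ~/.ssh ~/.ssh/id_rsa_whirr
#   is_simple_cluster_name testcluster && echo "name looks fine"
```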
```
$ cd whirr-0.8.1
$ bin/whirr launch-cluster --config whirr_cdh.properties
```
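In the console you can see links to the NameNode and JobTracker web UIs. At the end, it also shows how to ssh into the instances.
- By now the files should have been generated. You will see the following files in ~/.whirr/testcluster: instances, hadoop-proxy.sh and hadoop-site.xml
- Start the proxy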
```
$ sh hadoop-proxy.sh
```
Open another terminal and type the following. You should be able to access HDFS.

$ export HADOOP_CONF_DIR=~/.whirr/testcluster
$ hadoop fs -ls /

You can also download a Hadoop tarball and use

$ bin/hadoop --config ~/.whirr/testcluster fs -ls /

OK! I know you won't be satisfied until you have the web UI.

Now, launch Firefox (version 3.0 or later).
Download the FoxyProxy extension from this link: https://addons.mozilla.org/en-US/firefox/addon/2464.
Steps to configure and access the UI
Select Tools > FoxyProxy > Options
Click the “Add New Proxy” button.
Select “Manual Proxy Configuration”
Enter “localhost” for the “Host or IP Address” field.
Enter “6666” for the “Port” field.
Click on the “General” tab at the top of the dialog box.
Enter “EC2” for the “Proxy Name” field.
Click on the “URL Patterns” tab at the top of the dialog box.
Click the “Add New Pattern” button.
Enter “EC2” for the “Pattern Name” field.
Enter “*compute-1.amazonaws.com*, *.ec2.internal*, *.compute-1.internal*” for the “URL pattern” field (not case sensitive)
Select the “Whitelist” and “Wildcards” radio buttons.
Click the “OK” button to dismiss the new URL pattern dialog box.
Click the “OK” button to dismiss the new proxy dialog box.
Completely disable FoxyProxy for now.
You should be able to see 2 proxy names after closing, default and EC2.
Click on “Use proxy EC2 for all URLs” from the pop-up menu of FoxyProxy
Copy the URL of JobTracker (can be seen while running proxy, ec2-***-**-***-**.********.amazonaws.com) and paste it in the browser.
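
The same SOCKS proxy can also be exercised from the command line, which helps tell a proxy problem apart from a browser configuration problem. This is a sketch, assuming `hadoop-proxy.sh` is still running and listening on localhost:6666 as in the steps above, that curl is installed, and that the UIs sit on the stock Hadoop 1.x ports (50030 for the JobTracker, 50070 for the NameNode). The function names are hypothetical.

```shell
# Hypothetical helper: build the URL of a Hadoop web UI from host and port.
hadoop_ui_url() {
  printf 'http://%s:%s/\n' "$1" "$2"
}

# Hypothetical helper: fetch a UI page through the SOCKS proxy on localhost:6666.
# --socks5-hostname makes curl resolve the host name on the proxy side.
fetch_via_proxy() {
  curl --silent --location --max-time 20 \
       --socks5-hostname localhost:6666 \
       "$(hadoop_ui_url "$1" "$2")"
}

# Example use (replace the host with the JobTracker name printed by Whirr):
#   fetch_via_proxy ec2-***-**-***-**.********.amazonaws.com 50030 | head
```

If this prints HTML but the browser shows nothing, the FoxyProxy patterns are the more likely culprit.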

So, we're good!

If you want to launch MRv2 instead, use this configuration.

# Cluster name goes here.
whirr.cluster-name=yarncluster

# Change the number of machines in the cluster here
whirr.instance-templates=1 hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver,2 hadoop-datanode+yarn-nodemanager

# Install JAVA (the key is set twice; keep only the install function you want)
whirr.java.install-function=install_openjdk
whirr.java.install-function=install_oab_java

# Install CDH4 YARN
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
whirr.yarn.configure-function=configure_cdh_yarn
whirr.yarn.start-function=start_cdh_yarn
whirr.mr_jobhistory.start-function=start_cdh_mr_jobhistory
whirr.env.REPO=cdh4
whirr.env.MAPREDUCE_VERSION=2

# For EC2 set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
whirr.provider=aws-ec2
whirr.hardware-id=c1.xlarge

# Credentials should go here
whirr.identity=XXXXXXXXXXXXXXXXX
whirr.credential=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
whirr.cluster-user=whirr
whirr.private-key-file=/home/ubuntu/.ssh/yourKey
whirr.public-key-file=/home/ubuntu/.ssh/yourKey.pub

The rest of the process is the same!

Happy learning!

Reference: Hadoop Hangover: Launching a Hadoop Cluster CDH4 Using Apache Whirr, from our JCG partner Swathi V at the Techie(S)pArK blog.