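1. Install Cygwin
Reference post: http://hi.baidu.com/%BD%AB%D6%AE%B7%E7_%BE%B2%D6%AE%D4%A8/blog/item/8832551c7598551f314e15c2.html
Q1. In my install, step 9 of that post ("Open Cygwin to configure it. First type ssh-host-config and press Enter. You will be asked yes/no; type no and press Enter. When you see Have fun! it has succeeded") went a little differently: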
Administrator@03ad6b3ba2f34fe ~
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Generating /etc/ssh_host_ecdsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file
*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: []    -- just press Enter
*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'. Otherwise, it
*** Info: will start automatically after the next reboot.
*** Info: Host configuration finished. Have fun!
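Q2. During my first install the computer froze at the "create icons" step. Cygwin was already usable at that point, but I still wanted to reinstall from scratch, so I looked for a way to uninstall it. Someone suggested running the setup program and marking every selected package as uninstall. I believed it, and it ended badly: it did not remove everything. I then looked for a clean way to uninstall and tried "delete all Cygwin folders, then clean out the registry entries that mention cygwin". That worked. Whatever you do, don't use setup to uninstall!!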
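2. Install the JDK and Eclipse. No problems here; I have been writing Java for over a year since graduating.
3. Hadoop configuration
Reference post: http://hi.baidu.com/%BD%AB%D6%AE%B7%E7_%BE%B2%D6%AE%D4%A8/blog/item/a0ebb1db953a772033fa1c9a.html
Q1. Following the blogger's step 4, running ./hadoop jar ./../hadoop-0.20.2-examples.jar wordcount testin testout started reporting errors: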
INFO input.FileInputFormat: Total input paths to process : 2
INFO mapred.JobClient: Running job: job_201202131412_0007
INFO mapred.JobClient: map 0% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201202131412_0007_m_000003_0, Status : FAILED
java.io.FileNotFoundException: File D:/hadoop/temp/taskTracker/jobcache/job_201202131412_0007/attempt_201202131412_0007_m_000003_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at
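Yes, the person who left the comment under that post was me. However you look at it, this error says a file cannot be found. I found a fix online: set the following property in mapred-site.xml
<property><name>mapred.child.tmp</name><value>/hadoop/tmp</value></property>
After that, everything ran fine.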
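Since the fix depends on the property actually being in the file Hadoop reads, a quick check can save some guessing. The following is only a minimal sketch: the helper name has_property is made up for illustration, and it does a plain-text match rather than real XML parsing, so it assumes the <name> element is written on one line exactly as shown above.

```shell
# has_property FILE NAME
# Succeeds (exit status 0) if FILE contains a <name>NAME</name> element.
# Hypothetical helper for this post: a fixed-string grep, not an XML parser,
# so it will miss a property whose tag is split across lines or padded
# with whitespace.
has_property() {
    grep -qF "<name>$2</name>" "$1"
}

# Example call (the conf/ path is an assumption about where your
# mapred-site.xml lives):
#   has_property conf/mapred-site.xml mapred.child.tmp && echo "property is set"
```

If the check fails even though you edited the file, you may have edited a different copy of mapred-site.xml than the one Hadoop is using.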
4. Common commands
ssh localhost    log in
cd /cygdrive/d/hadoop-0.20.2    enter the Hadoop directory
ls    list all files in the current directory
In the /cygdrive/d/hadoop-0.20.2/bin directory:
./start-all.sh    start Hadoop
./hadoop namenode -format    format a new HDFS
./start-all.sh    start HDFS and Map/Reduce together
./hadoop dfs -mkdir testin    create the directory testin
./hadoop dfs -put /test/*.java testin    copy all .java files under /test into testin
./hadoop dfs -ls testin    list all files in testin
./hadoop dfs -rmr testout    delete the testout directory
./hadoop jar ./../hadoop-0.20.2-examples.jar wordcount testin testout    run the WordCount example
./hadoop dfs -cat testout/part-r-00000    print the file part-r-00000 in testout
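The WordCount commands above are always run in the same order, so they can be wrapped in a small script. This is a rough sketch, not something from the referenced posts: the names run, wordcount, HADOOP_BIN and DRY_RUN are invented here, and it assumes HDFS has already been formatted and started. Setting DRY_RUN=1 only prints the commands, which is handy for checking paths before touching the cluster.

```shell
# Directory holding the hadoop launcher; override it to match your install.
# (HADOOP_BIN and DRY_RUN are hypothetical names used only in this sketch.)
HADOOP_BIN="${HADOOP_BIN:-/cygdrive/d/hadoop-0.20.2/bin}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# wordcount LOCAL_FILE...
# Upload the given local files to testin, clear old output, run the
# WordCount example, then print the result.
wordcount() {
    run "$HADOOP_BIN/hadoop" dfs -mkdir testin || return 1
    run "$HADOOP_BIN/hadoop" dfs -put "$@" testin || return 1
    # The job refuses to start if testout already exists, so remove it
    # first; ignore the error when there is nothing to remove.
    run "$HADOOP_BIN/hadoop" dfs -rmr testout || true
    run "$HADOOP_BIN/hadoop" jar "$HADOOP_BIN/../hadoop-0.20.2-examples.jar" \
        wordcount testin testout || return 1
    run "$HADOOP_BIN/hadoop" dfs -cat testout/part-r-00000
}
```

For example, after loading these functions into the Cygwin shell, DRY_RUN=1 followed by wordcount /test/*.java lists the five hadoop commands without executing them. Note that the mkdir step fails if testin is left over from an earlier run, so delete it first in that case.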
================================
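Open issues
1. Many blogs say that hadoop 0.20.2 runs into a lot of problems: "when configuring a Hadoop environment on Windows with Cygwin, be sure to pick version 0.19.2". I have not hit this so far. For anyone who needs it, here is the 0.19.2 download link: http://archive.apache.org/dist/hadoop/core/hadoop-0.19.2/ I have also uploaded it to CSDN, or you can leave your email address and I will send it to you.
2. WordCount runs fine under Cygwin, but under Eclipse it keeps failing with the same file-not-found problem I hit at the start. This still needs more work.
Note. Reference document: http://wildrain.iteye.com/blog/1164608
---Keep your head down to pull the cart, but look up to watch the road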