Installing, configuring, and running Mahout + Hadoop
1. Preparation
Download Maven 2.x.
Configure the Maven environment variables:
export MAVEN_HOME=xxxx
export PATH=${MAVEN_HOME}/bin:${PATH}
Run mvn -v to check that it works.
Download and install the JDK.
Configure the JDK environment variables:
export JAVA_HOME=xxxx
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
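The two toolchain checks above can be wrapped in a small sketch; `check_tool` is a hypothetical helper written here for convenience, not part of Maven or the JDK:

```shell
#!/bin/sh
# check_tool: print whether a command is available on PATH
# (hypothetical helper for sanity-checking the toolchain)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: OK"
  else
    echo "$1: MISSING"
  fi
}

check_tool java
check_tool mvn
```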
2. Install Mahout
Download the Mahout source.
cd xxxx/mahout
sudo mvn install  ## add -DskipTests to skip the lengthy test suite
Run mahout --help  ## check that Mahout installed correctly
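For `mahout --help` to work from any directory, the launcher script must be on PATH. A minimal sketch, assuming the source was checked out under $HOME/mahout (adjust the path to your actual checkout):

```shell
#!/bin/sh
# Assumed checkout location; adjust to where you built Mahout.
export MAHOUT_HOME=$HOME/mahout
export PATH=$MAHOUT_HOME/bin:$PATH
```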
3. Install Hadoop
Unpack the tarball.
Configure the environment variables:
export HADOOP_HOME=xxxx/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
Edit the hadoop-env.sh file:
export JAVA_HOME=/root/jdk1.6.0_24
Configure pseudo-distributed mode.
Edit core-site.xml:
# vi core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>
Edit hdfs-site.xml:
# vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit mapred-site.xml:
# vi mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>127.0.0.1:9001</value>
  </property>
</configuration>
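A dropped tag in any of these three files (for example a missing `<property>`) makes Hadoop fail at startup with an XML parse error. A crude pure-shell balance check, sketched here against a throwaway file; in practice point `$f` at core-site.xml, hdfs-site.xml, or mapred-site.xml:

```shell
#!/bin/sh
# Write a throwaway config to demonstrate the check.
f=$(mktemp)
cat > "$f" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>
EOF

# Count opening vs. closing <property> tags; they must match.
open=$(grep -c '<property>' "$f")
close=$(grep -c '</property>' "$f")
if [ "$open" -eq "$close" ]; then
  echo "property tags balanced: $open"
else
  echo "MISMATCH: $open opening vs $close closing"
fi
rm -f "$f"
```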
Set up passwordless ssh login (from localhost).
Many Hadoop cluster deployment operations rely on passwordless login.
Key-based login:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
The known-hosts check (the yes/no prompt) usually interrupts the login process; disable it with:
echo "StrictHostKeyChecking no" >> ~/.ssh/config
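Appending the bare option to ~/.ssh/config disables host-key checking for every host. A safer variant scopes it to localhost only, using a standard ssh_config Host block:

```
Host localhost
    StrictHostKeyChecking no
```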
4. Format the namenode and start the daemons
bin/hadoop namenode -format
Start everything:
bin/start-all.sh  ## five daemons should come up: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker
Then check the web monitoring ports:
HDFS NameNode: http://localhost:50070
JobTracker: http://localhost:50030
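The quickest daemon check is `jps`, which ships with the JDK. A sketch of what a healthy pseudo-distributed node's listing looks like, using a canned sample (the PIDs are invented); on the real machine, just run `jps`:

```shell
#!/bin/sh
# A canned sample of the expected jps listing (PIDs are invented):
sample='12001 NameNode
12102 DataNode
12203 SecondaryNameNode
12304 JobTracker
12405 TaskTracker
12506 Jps'

# Count the five Hadoop daemons in the listing
# (SecondaryNameNode also matches the NameNode$ pattern, once per line).
count=$(echo "$sample" | grep -cE 'NameNode$|DataNode$|JobTracker$|TaskTracker$')
echo "daemons running: $count"
```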
5. Prepare input data (to test the Hadoop installation)
bin/hadoop fs -put ./conf/ input
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
bin/hadoop fs -cat output/*
6. View the output files:
Copy the output files from the distributed filesystem to the local filesystem and view them there:
$ bin/hadoop fs -get output output
$ cat output/*
or
view the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*
7. Run the Mahout k-means example
Download the dataset synthetic_control.data:
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
Create a test directory testdata and import the data into it (the directory name must be testdata, because the Mahout example job automatically looks for that directory in HDFS):
$HADOOP_HOME/bin/hadoop fs -mkdir testdata
$HADOOP_HOME/bin/hadoop fs -put /home/test/synthetic_control.data testdata
Run:
hadoop jar mahout-examples-0.4-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
This can take quite a while; be patient.
8. View the results. Run the following commands in order:
bin/hadoop fs -lsr output
bin/hadoop fs -get output $MAHOUT_HOME/result
$ cd $MAHOUT_HOME/result
$ ls
Seeing clusteredPoints clusters-0 clusters-1 clusters-2 ... clusters-10 data means the algorithm ran successfully.
Other references:
http://wenku.baidu.com/view/dbd15bd276a20029bd642d55.html
http://blog.csdn.net/chjshan55/article/details/5923646
http://bbs.hadoopor.com/thread-983-1-1.html
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
September 29, 2021, 17:58