Mahout + Hadoop: installation, configuration, and running

流夜未央 posted @ 2011-11-21 10:46 in Uncategorized, 8530 views

1. Preparation

Download Maven 2.x.

Set the Maven environment variables:

export MAVEN_HOME=xxxx

export PATH=${MAVEN_HOME}/bin:${PATH}

Run mvn -v to check that Maven is installed correctly.

Download and install the JDK.

Set the JDK environment variables:

export JAVA_HOME=xxxx

export CLASSPATH=$JAVA_HOME/lib
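Before moving on, it can help to confirm that the variables above are actually set in the current shell. The following is a minimal sketch; check_env is a hypothetical helper name, not something from Maven or the JDK.

```shell
#!/bin/sh
# Hypothetical helper (not part of Maven or the JDK): print the names of
# required variables that are unset or empty, and return non-zero if any are.
check_env() {
    missing=""
    for name in "$@"; do
        # Indirect lookup of the variable called $name, in portable sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

For example, check_env MAVEN_HOME JAVA_HOME prints ok when both variables are set, and lists the missing names otherwise.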

2. Install Mahout

Download the Mahout source code.

cd xxxx/mahout

sudo mvn install

Run mahout --help to check that Mahout is installed correctly.

3. Install Hadoop

Unpack the Hadoop archive.

Set the environment variables:

export HADOOP_HOME=xxxx/hadoop

export PATH=$HADOOP_HOME/bin:$PATH

Edit hadoop-env.sh:

export JAVA_HOME=/root/jdk1.6.0_24

Configure pseudo-distributed mode.

Edit core-site.xml:

# vi core-site.xml

    <configuration>  
         <property>  
             <name>fs.default.name</name>  
             <value>hdfs://127.0.0.1:9000</value>  
         </property>  
    </configuration>

Edit hdfs-site.xml:

# vi hdfs-site.xml

    <configuration>  
         <property>  
             <name>dfs.name.dir</name>  
             <value>/usr/local/hadoop/hdfs/name</value>  
         </property>  
         <property>  
             <name>dfs.data.dir</name>  
             <value>/usr/local/hadoop/hdfs/data</value>  
         </property>  
         <property>  
             <name>dfs.replication</name>  
             <value>1</value>  
         </property>  
    </configuration>

Edit mapred-site.xml:

# vi mapred-site.xml

    <configuration>  
         <property>  
             <name>mapred.job.tracker</name>  
             <value>127.0.0.1:9001</value>  
         </property>  
    </configuration>
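Instead of editing the three files by hand, they could be generated from a script. The sketch below writes the same property names and values shown above into a directory of your choice; write_site and write_pseudo_conf are hypothetical helper names, and the target directory (for example $HADOOP_HOME/conf) is an assumption about where your Hadoop version keeps its configuration.

```shell
#!/bin/sh
# Hypothetical helper: write one Hadoop site file from NAME VALUE pairs.
# usage: write_site FILE NAME VALUE [NAME VALUE ...]
write_site() {
    file=$1
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        while [ $# -ge 2 ]; do
            echo '  <property>'
            echo "    <name>$1</name>"
            echo "    <value>$2</value>"
            echo '  </property>'
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}

# Hypothetical helper: write the three pseudo-distributed config files
# into the directory given as $1.
write_pseudo_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir"
    write_site "$conf_dir/core-site.xml" fs.default.name hdfs://127.0.0.1:9000
    write_site "$conf_dir/hdfs-site.xml" \
        dfs.name.dir /usr/local/hadoop/hdfs/name \
        dfs.data.dir /usr/local/hadoop/hdfs/data \
        dfs.replication 1
    write_site "$conf_dir/mapred-site.xml" mapred.job.tracker 127.0.0.1:9001
}
```

Calling write_pseudo_conf "$HADOOP_HOME/conf" would overwrite any existing site files there, so back them up first.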

Set up passwordless ssh (to localhost)

Many Hadoop cluster deployment operations depend on passwordless login.

Key-based login:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

The known_hosts host key check usually interrupts the login with a yes/no prompt. To turn it off:

echo "StrictHostKeyChecking no" >> ~/.ssh/config
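Appending to ~/.ssh/config with >> adds a duplicate line every time the step is repeated. A minimal sketch of an idempotent version follows; add_ssh_option is a hypothetical helper name, and the chmod reflects the fact that ssh expects its per-user files not to be writable by others.

```shell
#!/bin/sh
# Hypothetical helper: append an option line to an ssh config file only if
# it is not already there, then restrict the file's permissions.
# usage: add_ssh_option CONFIG_FILE "Option value"
add_ssh_option() {
    config=$1
    option=$2
    mkdir -p "$(dirname "$config")"
    touch "$config"
    # -x matches whole lines only, -F treats the option as a fixed string.
    if ! grep -qxF "$option" "$config"; then
        echo "$option" >> "$config"
    fi
    chmod 600 "$config"
}
```

A typical call would be add_ssh_option ~/.ssh/config "StrictHostKeyChecking no". If ssh localhost still asks for a password afterwards, running chmod 600 ~/.ssh/authorized_keys is worth trying.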

4. Start Hadoop

Format the namenode:

bin/hadoop namenode -format

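Start all daemons:

bin/start-all.sh, which should start five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker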
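One way to confirm that all five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) came up is to inspect the output of jps, the JDK tool that lists running Java processes. The sketch below reads that output on stdin; missing_daemons is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and print the expected
# Hadoop daemons that are not listed in it.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <class name>"; compare the name as a whole field
        # so that "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Running jps | missing_daemons prints nothing when every daemon is present, and one name per line otherwise.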

Then check the web monitoring pages:

HDFS Namenode: http://localhost:50070

Job Tracker: http://localhost:50030

5. Prepare input data (to test that Hadoop works)

bin/hadoop fs -put ./conf/ input

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

bin/hadoop fs -cat output/*
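The grep example counts how often each string matching the regular expression occurs in the input. To sanity-check the job's output, roughly the same counts can be computed locally on the conf directory. This is only an approximation of the MapReduce job, not its implementation; local_grep_count is a hypothetical helper name, and the -r and -o options assume GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: count regex matches in all files under a directory
# and print "count<TAB>match", most frequent first.
# usage: local_grep_count DIR REGEX
local_grep_count() {
    # -r recurse, -h omit file names, -o print only the matched text,
    # -E use extended regular expressions.
    grep -rhoE "$2" "$1" | sort | uniq -c | sort -rn |
        awk '{ print $1 "\t" $2 }'
}
```

For example, local_grep_count ./conf 'dfs[a-z.]+' can be compared by eye with the output of bin/hadoop fs -cat output/*.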

6. View the output files:

Copy the output files from the distributed file system to the local file system and view them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files directly on the distributed file system:
$ bin/hadoop fs -cat output/*

7. Run the Mahout k-means example

Download the dataset synthetic_control.data: wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
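A truncated download is easy to miss, so it may be worth checking the file's shape before uploading it. The sketch below verifies that a whitespace-separated file has a given number of rows and columns; check_matrix is a hypothetical helper name, and the expected shape of 600 rows by 60 values is taken from the UCI description of this dataset, so treat it as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: verify that a whitespace-separated file has exactly
# ROWS lines with COLS fields each; report problems and return non-zero.
# usage: check_matrix FILE ROWS COLS
check_matrix() {
    awk -v rows="$2" -v cols="$3" '
        NF != cols { printf "line %d has %d fields\n", NR, NF; bad = 1 }
        END {
            if (NR != rows) {
                printf "expected %d rows, found %d\n", rows, NR
                bad = 1
            }
            exit bad + 0
        }' "$1"
}
```

With the assumed shape, the call would be check_matrix synthetic_control.data 600 60.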

Create the test directory testdata and load the data into it (the directory must be named testdata, because Mahout looks for a directory with that name in HDFS automatically)

$HADOOP_HOME/bin/hadoop fs -mkdir testdata
$HADOOP_HOME/bin/hadoop fs -put /home/test/synthetic_control.data testdata

Run: hadoop jar mahout-examples-0.4-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

This may take a long time to run, so be patient.

8. View the results. Run the following commands in order:

bin/hadoop fs -lsr output

bin/hadoop fs -get output $MAHOUT_HOME/result

$ cd $MAHOUT_HOME/result

$ ls

You should see the following: clusteredPoints clusters-0 clusters-1 clusters-2...... clusters-10 data, which means the algorithm ran successfully
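The same check can be scripted against the local copy of the results. This is a minimal sketch, assuming the entry names listed above; check_result is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: check that a local result directory contains
# clusteredPoints, data, and at least one clusters-N entry.
# usage: check_result RESULT_DIR
check_result() {
    status=0
    for name in clusteredPoints data; do
        if [ ! -e "$1/$name" ]; then
            echo "missing $name"
            status=1
        fi
    done
    found=0
    for entry in "$1"/clusters-*; do
        # If nothing matches, the pattern stays literal and -e fails.
        if [ -e "$entry" ]; then
            found=1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "missing clusters-N"
        status=1
    fi
    return $status
}
```

For example, check_result "$MAHOUT_HOME/result" prints nothing and returns 0 when everything is in place.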

Other references:

http://wenku.baidu.com/view/dbd15bd276a20029bd642d55.html

http://blog.csdn.net/chjshan55/article/details/5923646

http://bbs.hadoopor.com/thread-983-1-1.html

https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
