Hadoop 2.6 HA Deployment

因?yàn)樾枰渴餾park環(huán)境,特意重新安裝了一個(gè)測(cè)試的hadoop集群,現(xiàn)將相關(guān)步驟記錄如下:


Hardware environment: four virtual machines, hadoop1 through hadoop4, each with 3 GB of RAM, a 60 GB disk, and 2 CPU cores

Software environment: CentOS 6.5, hadoop-2.6.0-cdh5.8.2, JDK 1.7

Deployment plan:

hadoop1 (192.168.0.3): namenode (active), resourcemanager

hadoop2 (192.168.0.4): namenode (standby), journalnode, datanode, nodemanager, historyserver

hadoop3 (192.168.0.5): journalnode, datanode, nodemanager

hadoop4 (192.168.0.6): journalnode, datanode, nodemanager

HDFS HA uses the QJM (Quorum Journal Manager, i.e. journalnode) approach.


I. System Preparation

1. Disable SELinux on every node

#vi /etc/selinux/config

SELINUX=disabled
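
The config file change only takes effect after a reboot; to also turn SELinux off in the running session, you can additionally run:

#setenforce 0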

2. Stop the firewall on every node (be sure it is off; otherwise formatting HDFS fails with errors about being unable to connect to the journalnodes)

#chkconfig iptables off

#service iptables stop

3. Install JDK 1.7 on every node

#cd /software

#tar -zxf jdk-7u65-linux-x64.gz -C /opt/

#cd /opt

#ln -s jdk1.7.0_65 java

#vi /etc/profile

export JAVA_HOME=/opt/java

export PATH=$PATH:$JAVA_HOME/bin
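
Reload the profile and verify that the JDK is picked up:

#source /etc/profile

#java -version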

4. Create the Hadoop user on every node, and set up passwordless SSH trust between the nodes

#useradd grid

#passwd grid

(A minimal sketch of the SSH trust setup is shown below.)
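
A minimal sketch of the trust setup, assuming the grid user and default key locations; generate a key on each node, then copy it to every node (including itself):

$ssh-keygen -t rsa

$ssh-copy-id grid@hadoop1

$ssh-copy-id grid@hadoop2

$ssh-copy-id grid@hadoop3

$ssh-copy-id grid@hadoop4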

5. Create the required directories on every node

#mkdir -p /hadoop_data/hdfs/name

#mkdir -p /hadoop_data/hdfs/data

#mkdir -p /hadoop_data/hdfs/journal

#mkdir -p /hadoop_data/yarn/local

#chown -R grid:grid /hadoop_data

II. Hadoop Deployment

Configuring HDFS HA mainly means defining a nameservice (without HDFS federation there is only one nameservice ID) and, under that nameservice ID, the two namenodes and their addresses. Here the nameservice is named hadoop-spark.
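
Once the cluster is running, clients reach HDFS through this nameservice name instead of a specific namenode; for example, run from /opt/hadoop after startup:

$bin/hdfs dfs -ls hdfs://hadoop-spark/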

1. Unpack the Hadoop tarball on every node

#cd /software

#tar -zxf hadoop-2.6.0-cdh5.8.2.tar.gz -C /opt/

#cd /opt

#chown -R grid:grid hadoop-2.6.0-cdh5.8.2

#ln -s hadoop-2.6.0-cdh5.8.2 hadoop

2. Switch to the grid user for the remaining steps

#su - grid

$cd /opt/hadoop/etc/hadoop

3. Configure hadoop-env.sh (only JAVA_HOME actually needs to be set)

$vi hadoop-env.sh

# The java implementation to use.

export JAVA_HOME=/opt/java

4. Configure hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop-spark</value>
<description>
Comma-separated list of nameservices.
</description>
</property>
<property>
<name>dfs.ha.namenodes.hadoop-spark</name>
<value>nn1,nn2</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-spark.nn1</name>
<value>hadoop1:8020</value>
<description>
RPC address for namenode1 of hadoop-spark
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-spark.nn2</name>
<value>hadoop2:8020</value>
<description>
RPC address for namenode2 of hadoop-spark
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-spark.nn1</name>
<value>hadoop1:50070</value>
<description>
The address and the base port where the dfs namenode1 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-spark.nn2</name>
<value>hadoop2:50070</value>
<description>
The address and the base port where the dfs namenode2 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop_data/hdfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table (fsimage).  If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop2:8485;hadoop3:8485;hadoop4:8485/hadoop-spark</value>
<description>A directory on shared storage between the multiple namenodes
in an HA cluster. This directory will be written by the active and read
by the standby in order to keep the namespaces synchronized. This directory
does not need to be listed in dfs.namenode.edits.dir above. It should be
left empty in a non-HA cluster.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop_data/hdfs/data</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks.  If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<!-- Without this setting, HDFS cannot be accessed through the nameservice name; clients would have to address the active namenode directly -->
<property> 
  <name>dfs.client.failover.proxy.provider.hadoop-spark</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>false</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hadoop_data/hdfs/journal</value>
</property>
</configuration>

5. Configure core-site.xml (set fs.defaultFS to the HA nameservice name)

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-spark</value>
<description>The name of the default file system.  A URI whose
scheme and authority determine the FileSystem implementation.  The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class.  The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

6. Configure mapred-site.xml

<configuration>
<!-- MR YARN Application properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
</configuration>

7. Configure yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Resource Manager Configs -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<description>fair-scheduler conf location</description>
<name>yarn.scheduler.fair.allocation.file</name>
<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop_data/yarn/local</value>
</property>
<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<description>Number of CPU cores that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

8. Configure slaves

hadoop2

hadoop3

hadoop4

9. Configure fairscheduler.xml

<?xml version="1.0"?>
<allocations>
<queue name="common">
<minResources>0mb, 0 vcores </minResources>
<maxResources>6144 mb, 6 vcores </maxResources>
<maxRunningApps>50</maxRunningApps>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
<aclSubmitApps>grid</aclSubmitApps>
</queue>
</allocations>
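
Once the cluster is up, a job can be submitted to this queue explicitly to verify the ACL and limits; for example, using the stock examples jar shipped in the tarball (the jar filename varies by version), run from /opt/hadoop:

$bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi -Dmapreduce.job.queuename=common 2 10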

10. Sync the configuration files to all the other nodes

$cd /opt/hadoop/etc

$scp -r hadoop hadoop2:/opt/hadoop/etc/

$scp -r hadoop hadoop3:/opt/hadoop/etc/

$scp -r hadoop hadoop4:/opt/hadoop/etc/

III. Starting the Cluster (formatting the filesystem)

1. Set up environment variables

$vi ~/.bash_profile

export HADOOP_HOME=/opt/hadoop

export HADOOP_YARN_HOME=/opt/hadoop

export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

export YARN_CONF_DIR=/opt/hadoop/etc/hadoop
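
Reload the profile so the variables take effect in the current session:

$source ~/.bash_profile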

2. Start HDFS

Start the journalnodes first, on hadoop2 through hadoop4:

$cd /opt/hadoop/

$sbin/hadoop-daemon.sh start journalnode

Format HDFS, then start the namenode. On hadoop1:

$bin/hdfs namenode -format

$sbin/hadoop-daemon.sh start namenode

Bootstrap the second namenode from the first, then start it. On hadoop2:

$bin/hdfs namenode -bootstrapStandby

$sbin/hadoop-daemon.sh start namenode

At this point both namenodes are in standby state. Switch hadoop1 to active (hadoop1 corresponds to nn1 in hdfs-site.xml):

$bin/hdfs haadmin -transitionToActive nn1

Start the datanodes. On hadoop1 (the active namenode):

$sbin/hadoop-daemons.sh start datanode

Note: for subsequent startups, sbin/start-dfs.sh is all that is needed. However, since ZooKeeper-based failover is not configured, HA can only be switched manually, so every time HDFS starts you must run $bin/hdfs haadmin -transitionToActive nn1 to make the namenode on hadoop1 active.
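
The state of each namenode can be checked at any time with the standard haadmin subcommand:

$bin/hdfs haadmin -getServiceState nn1

$bin/hdfs haadmin -getServiceState nn2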

3. Start YARN

On hadoop1 (the resourcemanager):

$sbin/start-yarn.sh
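
The deployment plan also places the MapReduce history server on hadoop2. It is not started by start-yarn.sh, so bring it up there separately with the stock daemon script:

$sbin/mr-jobhistory-daemon.sh start historyserver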

————————————————————————————————————————————

The HDFS HA configured above does not fail over automatically. To enable automatic failover for HDFS, the following steps are needed (stop the cluster first):

1. Deploy ZooKeeper (steps omitted) on hadoop2, hadoop3, and hadoop4, and start it

2. In hdfs-site.xml, change dfs.ha.automatic-failover.enabled (set to false above) to true and add the fencing properties:

<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/grid/.ssh/id_rsa</value>
</property>

See the official documentation for details. This configuration sets the fencing method to SSH into the previously active node and kill the process listening on its service port; the prerequisite is that the two namenodes can SSH to each other without a password.

There is also an alternative configuration:

<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
</property>

This configuration uses a shell script to fence off the port and process. If no actual fencing action is wanted, dfs.ha.fencing.methods can be set to shell(/bin/true).

3. Add to core-site.xml:

<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
</property>

4. Initialize the ZKFC state in ZooKeeper (run on one of the namenodes)

$bin/hdfs zkfc -formatZK

5. Start the cluster
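
With automatic failover enabled, sbin/start-dfs.sh also starts a zkfc daemon on each namenode, so the manual -transitionToActive step is no longer needed. If necessary, a zkfc can also be started by hand:

$sbin/hadoop-daemon.sh start zkfc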

___________________________________________________________________________________________________

zkfc: runs on every namenode; a ZooKeeper client responsible for automatic failover

zk: an odd number of nodes; maintains the coordination lock and elects the active node

journalnode: an odd number of nodes, used to synchronize data between the active and standby namenodes; the active node writes edits to them and the standby reads them back

————————————————————————————————————————————

Converting to resourcemanager HA:

hadoop2 is chosen as the second RM node.

1. Set up SSH trust from hadoop2 to the other nodes

2. Edit yarn-site.xml (a sketch of the HA-related properties follows this list) and sync it to the other machines

3. Copy fairscheduler.xml to hadoop2

4. Start the RM on hadoop1

5. Start the second RM (see below)
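
A minimal sketch of the RM HA properties for yarn-site.xml, assuming hadoop1/hadoop2 as rm1/rm2 and the ZooKeeper ensemble deployed above (the cluster-id value yarn-spark is an arbitrary example, not from the original setup):

<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<!-- arbitrary example id; any stable string works -->
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-spark</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
</property>

Note that sbin/start-yarn.sh only starts the resourcemanager on the node where it is run, so the second RM has to be started by hand on hadoop2:

$sbin/yarn-daemon.sh start resourcemanager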
