Hadoop HA Cluster Installation

xiaoming728

2023-12-11

Cluster plan

| No. | IP              | Hostname  | Processes                                                          |
|-----|-----------------|-----------|--------------------------------------------------------------------|
| 00  | 192.168.206.180 | server-00 | nn1 (NameNode), zkfc (DFSZKFailoverController)                     |
| 01  | 192.168.206.181 | server-01 | dn (DataNode), jn (JournalNode), zk (QuorumPeerMain), NodeManager  |
| 02  | 192.168.206.182 | server-02 | dn (DataNode), jn (JournalNode), zk (QuorumPeerMain), NodeManager  |
| 03  | 192.168.206.183 | server-03 | dn (DataNode), jn (JournalNode), zk (QuorumPeerMain), NodeManager  |
| 04  | 192.168.206.184 | server-04 | ResourceManager                                                    |
| 05  | 192.168.206.185 | server-05 | nn2 (NameNode), zkfc (DFSZKFailoverController), ResourceManager    |

 

Introduction to the individual Hadoop components

 

* Modify hosts and hostname

127.0.0.1 localhost

192.168.206.180 server-00

192.168.206.181 server-01

192.168.206.182 server-02

192.168.206.183 server-03

192.168.206.184 server-04

192.168.206.185 server-05

* Set up passwordless SSH login

Machines 00 and 05 are the two NameNodes here, so both must be able to log in to all of the other machines without a password.

Generate a key pair:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Use scp to copy machine 00's public key to the other machines:

scp id_rsa.pub root@server-00:~/.ssh/id_rsa_00.pub

scp id_rsa.pub root@server-01:~/.ssh/id_rsa_00.pub

scp id_rsa.pub root@server-02:~/.ssh/id_rsa_00.pub

scp id_rsa.pub root@server-03:~/.ssh/id_rsa_00.pub

scp id_rsa.pub root@server-04:~/.ssh/id_rsa_00.pub

scp id_rsa.pub root@server-05:~/.ssh/id_rsa_00.pub

Use scp to copy machine 05's public key to the other machines:

scp id_rsa.pub root@server-00:~/.ssh/id_rsa_05.pub

scp id_rsa.pub root@server-01:~/.ssh/id_rsa_05.pub

scp id_rsa.pub root@server-02:~/.ssh/id_rsa_05.pub

scp id_rsa.pub root@server-03:~/.ssh/id_rsa_05.pub

scp id_rsa.pub root@server-04:~/.ssh/id_rsa_05.pub

scp id_rsa.pub root@server-05:~/.ssh/id_rsa_05.pub

On every machine, append the copied public keys to the authorized_keys file with cat:

cat id_rsa_00.pub >> authorized_keys

cat id_rsa_05.pub >> authorized_keys

(If server-04 also needs passwordless access to the workers, for example to start YARN from there, distribute its public key the same way and append id_rsa_04.pub as well.)

The authorized_keys file then needs its permissions set to 644 (wrong permissions are a common cause of passwordless SSH failures):

chmod 644 authorized_keys
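
To verify, from server-00 (and again from server-05) you should now be able to run a remote command on each node without a password prompt, for example:

ssh root@server-01 hostname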

* Disable the firewall

* Disable SELinux (example commands for both steps are shown below)
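
A minimal sketch for these two steps, assuming CentOS 7 with firewalld; run on every node:

# stop the firewall now and keep it disabled across reboots
systemctl stop firewalld
systemctl disable firewalld

# put SELinux into permissive mode immediately, then disable it permanently
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config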

Install ZooKeeper

ZooKeeper runs on machines 01, 02 and 03 here.

tar -xzvf apache-zookeeper-3.6.0-bin.tar.gz

mv apache-zookeeper-3.6.0-bin zookeeper-3.6.0

Add the ZooKeeper environment variables to /etc/profile and re-source the file:

vim /etc/profile

export ZK_HOME=/home/zookeeper-3.6.0

export PATH=$PATH:$ZK_HOME/bin

source /etc/profile

Copy conf/zoo_sample.cfg to zoo.cfg in the same directory; the configuration file is identical on all three machines:

cp zoo_sample.cfg zoo.cfg

vi zoo.cfg

Change the dataDir setting:

dataDir=/home/tmp/zookeeper

Add the cluster members:

server.1=server-01:2888:3888

server.2=server-02:2888:3888

server.3=server-03:2888:3888

On each of the three machines, create a myid file inside the dataDir configured above (/home/tmp/zookeeper) containing 1, 2 and 3 respectively (see the example below).
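
For example, assuming the dataDir above, run on each node with its own number:

mkdir -p /home/tmp/zookeeper

echo 1 > /home/tmp/zookeeper/myid   # use 2 on server-02 and 3 on server-03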

Install Hadoop and configure the nameservice

tar -xzvf hadoop-2.10.0.tar.gz

[hadoop-env.sh]

export JAVA_HOME=/usr/local/src/hoox/jdk1.8.0_91

[core-site.xml]

<configuration>
  <!-- The HDFS nameservice is archivescenter -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://archivescenter/</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data01/tmp/hadoop</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>server-01:2181,server-02:2181,server-03:2181</value>
  </property>
</configuration>

Should core-site.xml set fs.default.name or fs.defaultFS? fs.default.name is the legacy, deprecated name of the property and fs.defaultFS is its current replacement. With NameNode HA enabled, use fs.defaultFS and point it at the nameservice ID (hdfs://archivescenter/ here) rather than at a single NameNode address; getting this value wrong is a common reason for the NameNode failing to start with an error such as:

ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
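
For comparison, a minimal sketch of the same property in a non-HA, single-NameNode deployment, where fs.defaultFS points directly at that NameNode's RPC address (server-00:9000 is used here purely as an illustration):

<!-- non-HA: fs.defaultFS names the single NameNode directly -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server-00:9000</value>
</property>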

[hdfs-site.xml]

For an explanation of the parameters, see: https://blog.csdn.net/w13770269691/article/details/24457241

<configuration>
  <!-- The nameservice is archivescenter; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>archivescenter</value>
  </property>
  <!-- archivescenter has two NameNodes: center00 and center05 -->
  <property>
    <name>dfs.ha.namenodes.archivescenter</name>
    <value>center00,center05</value>
  </property>
  <!-- RPC address of center00 -->
  <property>
    <name>dfs.namenode.rpc-address.archivescenter.center00</name>
    <value>server-00:9000</value>
  </property>
  <!-- HTTP address of center00 -->
  <property>
    <name>dfs.namenode.http-address.archivescenter.center00</name>
    <value>server-00:50070</value>
  </property>
  <!-- RPC address of center05 -->
  <property>
    <name>dfs.namenode.rpc-address.archivescenter.center05</name>
    <value>server-05:9000</value>
  </property>
  <!-- HTTP address of center05 -->
  <property>
    <name>dfs.namenode.http-address.archivescenter.center05</name>
    <value>server-05:50070</value>
  </property>
  <!-- Where the NameNode edit log is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://server-01:8485;server-02:8485;server-03:8485/archivescenter</value>
  </property>
  <!-- Where each JournalNode keeps its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data01/tmp/hadoop/journaldata</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover proxy provider used by clients -->
  <property>
    <name>dfs.client.failover.proxy.provider.archivescenter</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
sshfence
shell(/bin/true)
    </value>
  </property>
  <!-- sshfence requires passwordless SSH; point it at the private key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>~/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connect timeout in milliseconds -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>

[mapred-site.xml]

<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

[yarn-site.xml]

In Hadoop 1.x, the JobTracker handled both resource management and task scheduling. When several computing frameworks (Spark, for example) run side by side and each brings its own resource management module, they compete for resources and become hard to manage. A shared resource management layer is needed, and that is what YARN provides.

For background on YARN, see: https://www.jianshu.com/p/f50e85bdb9ce

    https://blog.csdn.net/amandalm/article/details/81630702

<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id of the RM pair -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn_cluster</value>
  </property>
  <!-- Logical ids of the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>server04,server05</value>
  </property>
  <!-- Addresses of each ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname.server04</name>
    <value>server-04</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.server04</name>
    <value>server-04:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.server05</name>
    <value>server-05</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.server05</name>
    <value>server-05:8088</value>
  </property>
  <!-- ZooKeeper quorum used by the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>server-01:2181,server-02:2181,server-03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Edit the slaves file

The slaves file lists the worker nodes. HDFS is started from server-00 and YARN from server-04, so the slaves file on server-00 specifies where the DataNodes run, and the slaves file on server-04 specifies where the NodeManagers run.

Edit the slaves file on both server-00 and server-04 and add the worker nodes (shown below).
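
Per the planning table, the DataNodes and the NodeManagers run on the same three machines, so both slaves files contain the same hosts:

server-01
server-02
server-03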

Hadoop HA cluster startup order

* Start ZooKeeper; run on server-01, server-02 and server-03:

zkServer.sh start
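
Optionally confirm that the quorum has formed; one of the three nodes should report itself as leader and the other two as follower:

zkServer.sh status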

* Start the JournalNodes; run on server-01, server-02 and server-03:

sbin/hadoop-daemon.sh start journalnode

* Format HDFS; run on the active NameNode (server-00 here):

hdfs namenode -format

Formatting generates the NameNode metadata under the hadoop.tmp.dir directory configured in core-site.xml, which is /data01/tmp/hadoop here.

* Copy that metadata to the standby NameNode (server-05 here):

scp -r /data01/tmp/hadoop root@server-05:/data01/tmp/hadoop

* Start the active NameNode (server-00 here):

hadoop-daemon.sh start namenode

* On the standby NameNode (server-05 here), run:

hdfs namenode -bootstrapStandby

When prompted with "Re-format filesystem in Storage Directory /home/tmp/hadoop/dfs/name ? (Y or N)", answer N.

* On the active NameNode (server-00 here), stop the NameNode:

hadoop-daemon.sh stop namenode

* On the standby NameNode (server-05 here), run:

hdfs namenode -initializeSharedEdits

When prompted with "Re-format filesystem in QJM to [192.168.206.181:8485, 192.168.206.182:8485, 192.168.206.183:8485] ? (Y or N)", answer N.

* Stop the JournalNodes; run on all three JournalNode machines:

hadoop-daemon.sh stop journalnode

* Format ZKFC; run on server-00 only, and only before the first startup:

hdfs zkfc -formatZK

* Start HDFS; run on server-00:

sbin/start-dfs.sh
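
After start-dfs.sh, each node should be running the processes listed in the planning table: NameNode and DFSZKFailoverController on server-00 and server-05, and DataNode, JournalNode and QuorumPeerMain on server-01 through server-03. A quick way to check:

# run on every node and compare against the planning table
jps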

* Start YARN; run on server-04 (the node whose slaves file lists the NodeManagers):

start-yarn.sh

The NameNode and the ResourceManager are placed on separate machines because both consume a lot of resources; since they live on different machines, they also have to be started separately on each.

* Manually start the second ResourceManager, on server-05:

yarn-daemon.sh start resourcemanager

This completes the Hadoop HA cluster setup.
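
As a final check, using the NameNode IDs (center00, center05) and ResourceManager IDs (server04, server05) configured above, query the HA state of each role; one member of each pair should report active and the other standby:

hdfs haadmin -getServiceState center00

hdfs haadmin -getServiceState center05

yarn rmadmin -getServiceState server04

yarn rmadmin -getServiceState server05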