[General] Apache Hadoop 2.2.0 Cluster Setup (2) [Translation]

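NodeManager node health monitoring

Hadoop provides a mechanism for checking a node's health: administrators can configure the NodeManager to run an administrator-supplied script periodically.

Administrators can perform any checks they choose in the script to decide whether the node is healthy. If the script finds the node unhealthy, it must print a line to standard output beginning with the string ERROR. The NodeManager runs the script periodically and checks its output. If the output contains the string ERROR as described, the node is reported as unhealthy and the ResourceManager blacklists it, so no further tasks are assigned to it. The NodeManager keeps running the script, however, so once the node becomes healthy again it is removed from the ResourceManager's blacklist automatically. The node's health, together with the script's output when it is unhealthy, is shown in the ResourceManager web interface.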

The following parameters in conf/yarn-site.xml control the node health script:

Parameter | Value | Notes
yarn.nodemanager.health-checker.script.path | Node health script | Script to check for node's health status.
yarn.nodemanager.health-checker.script.opts | Node health script options | Options for script to check for node's health status.
yarn.nodemanager.health-checker.script.interval-ms | Node health script interval | Time interval for running health script.
yarn.nodemanager.health-checker.script.timeout-ms | Node health script timeout interval | Timeout for health script execution.
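
A minimal sketch of such a health script, assuming a hypothetical check on the disk usage of a directory passed as the first option. The function names and the 90% limit are illustrative assumptions; the only contract taken from the text above is that an unhealthy node is signalled by a line on standard output beginning with ERROR.

```shell
#!/bin/sh
# Hypothetical node health script. Function names and the 90% default limit
# are illustrative assumptions, not part of Hadoop.

# Print an ERROR line when usage (an integer percentage) exceeds the limit.
report_disk_usage() {
  used_pct="$1"
  limit_pct="${2:-90}"
  if [ "$used_pct" -gt "$limit_pct" ]; then
    echo "ERROR disk usage ${used_pct}% exceeds ${limit_pct}%"
  else
    echo "OK disk usage ${used_pct}%"
  fi
}

# Usage percentage of the filesystem holding a directory, read from POSIX df.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# When a directory is passed as an argument, check it.
if [ "$#" -gt 0 ]; then
  report_disk_usage "$(disk_used_pct "$1")"
fi
```

Saved as an executable file, the script could be referenced from yarn.nodemanager.health-checker.script.path, with the directory to check supplied through yarn.nodemanager.health-checker.script.opts.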

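The health script is not expected to report ERROR merely because some of the local disks have gone bad. The NodeManager can check local disk health periodically on its own (specifically nodemanager-local-dirs and nodemanager-log-dirs). Once the number of bad directories reaches the threshold derived from yarn.nodemanager.disk-health-checker.min-healthy-disks, the whole node is marked unhealthy and this is also reported to the ResourceManager. The boot disk is either RAIDed or a failure in it is detected by the health script.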

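Slaves file

Typically you choose one machine as the NameNode and one as the ResourceManager. The remaining machines act as both DataNode and NodeManager and are referred to as slaves.

List the hostname or IP address of every slave in conf/slaves, one per line.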

Logging

Hadoop uses Apache log4j, via the Apache Commons Logging framework, for logging. Edit conf/log4j.properties to customize the log output of the Hadoop daemons.

Operating the Hadoop cluster

Once all the configuration is complete, copy the files to the HADOOP_CONF_DIR directory on every machine.
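
One way to distribute the configuration could be sketched as below. This is a sketch under assumptions: the helper names are hypothetical, the hosts are read from the slaves file inside the configuration directory, and passwordless SSH plus rsync are assumed on every node. The helper only prints the commands so they can be reviewed before anything is copied.

```shell
# Hypothetical helpers; assumes passwordless SSH and rsync on every node.

# List hosts from a slaves file, skipping blank lines and '#' comments.
list_slaves() {
  sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Print (do not run) one rsync command per host listed in <conf_dir>/slaves.
plan_conf_sync() {
  conf_dir="$1"
  list_slaves "$conf_dir/slaves" | while read -r host; do
    echo "rsync -a $conf_dir/ $host:$conf_dir/"
  done
}
```

After checking the printed plan, something like `plan_conf_sync "$HADOOP_CONF_DIR" | sh` would run it.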

Hadoop startup

To start a Hadoop cluster you need to start both HDFS and YARN.

Format a new distributed filesystem:

$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>

Start HDFS by running the following command on the NameNode:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode

Run the following command on all slaves to start the DataNodes:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

Start YARN by running the following command on the ResourceManager:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

Run the following command on all slaves to start the NodeManagers:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

Start a standalone WebAppProxy server. If several servers are used for load balancing, run the command on each of them:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver

Start the MapReduce JobHistory Server by running the following command on the machine chosen to host it:

$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
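
The commands above follow a regular pattern, which the sketch below captures. The function name `daemon_cmd` is hypothetical, and the function only prints the command for a given action and daemon; it places --config before the action for every daemon, which is an assumption about the ordering the daemon scripts expect.

```shell
# Hypothetical wrapper: prints, but does not run, the command for one daemon.
daemon_cmd() {
  action="$1"
  daemon="$2"
  case "$daemon" in
    namenode|datanode)
      echo "\$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config \$HADOOP_CONF_DIR --script hdfs $action $daemon" ;;
    resourcemanager|nodemanager|proxyserver)
      echo "\$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config \$HADOOP_CONF_DIR $action $daemon" ;;
    historyserver)
      echo "\$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config \$HADOOP_CONF_DIR $action $daemon" ;;
    *)
      echo "unknown daemon: $daemon" >&2
      return 1 ;;
  esac
}
```

For example, `daemon_cmd start nodemanager` prints the NodeManager start command shown above, which can then be run on each slave.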

Hadoop shutdown

Stop the NameNode by running the following command on the NameNode:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode

Run the following command on all slaves to stop the DataNodes:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode

Stop the ResourceManager by running the following command on the ResourceManager:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager

Run the following command on all slaves to stop the NodeManagers:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager

Stop the WebAppProxy server by running the following command on the node that hosts it:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver

Stop the MapReduce JobHistory Server by running the following command on the node that hosts it:

$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver

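Running Hadoop in secure mode

This section covers the important parameters for running Hadoop in secure mode, which relies on strong, Kerberos-based authentication.

User accounts for Hadoop daemons

Make sure the HDFS and YARN daemons run as different Unix users, for example hdfs and yarn, and that the MapReduce JobHistory Server runs as user mapred.

It is recommended that they all share one Unix group, for example hadoop:

User:Group | Daemons
hdfs:hadoop | NameNode, Secondary NameNode, Checkpoint Node, Backup Node, DataNode
yarn:hadoop | ResourceManager, NodeManager
mapred:hadoop | MapReduce JobHistory Server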

Permissions for HDFS and local filesystem paths:

The following table lists the paths on HDFS and on the local filesystem, with their recommended ownership and permissions:

Filesystem | Path | User:Group | Permissions
local | dfs.namenode.name.dir | hdfs:hadoop | drwx------
local | dfs.datanode.data.dir | hdfs:hadoop | drwx------
local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x
local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x
local | container-executor | root:hadoop | --Sr-s---
local | conf/container-executor.cfg | root:hadoop | r--------
hdfs | / | hdfs:hadoop | drwxr-xr-x
hdfs | /tmp | hdfs:hadoop | drwxrwxrwt
hdfs | /user | hdfs:hadoop | drwxr-xr-x
hdfs | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwt
hdfs | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwt
hdfs | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x---
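
Applying a row of the table comes down to one chown and one chmod, on the local filesystem or through `hdfs dfs`. The sketch below uses a hypothetical helper, `perm_cmds`, that only prints the two commands. The octal modes are the equivalents of the symbolic ones in the table (for example 700 for drwx------, 755 for drwxr-xr-x, 1777 for a world-writable directory with the sticky bit); the actual paths depend on your configuration and are not given here.

```shell
# Hypothetical helper: print the chown/chmod pair for one table row (dry run).
#   fs    : "local" or "hdfs"
#   path  : the directory the configured property points to
#   owner : user:group from the table
#   mode  : octal mode equivalent to the symbolic permissions
perm_cmds() {
  fs="$1"
  path="$2"
  owner="$3"
  mode="$4"
  case "$fs" in
    local) prefix="" ;;
    hdfs)  prefix="hdfs dfs -" ;;
    *)     echo "unknown filesystem: $fs" >&2; return 1 ;;
  esac
  echo "${prefix}chown $owner $path"
  echo "${prefix}chmod $mode $path"
}
```

For instance, `perm_cmds hdfs /tmp hdfs:hadoop 1777` prints the two `hdfs dfs` commands for the /tmp row; local commands would typically be run as root and HDFS commands as the hdfs user.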

Kerberos keytab files:

HDFS:

The NameNode keytab file, on the NameNode host, should look like the following:

$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nn.service.keytab
Keytab name: FILE:/etc/security/keytab/nn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The Secondary NameNode keytab file, on that host, should look like the following:

$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/sn.service.keytab
Keytab name: FILE:/etc/security/keytab/sn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The DataNode keytab file, on each DataNode host, should look like the following:

$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/dn.service.keytab
Keytab name: FILE:/etc/security/keytab/dn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

YARN:

The ResourceManager keytab file, on the ResourceManager host, should look like the following:

$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/rm.service.keytab
Keytab name: FILE:/etc/security/keytab/rm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The NodeManager keytab file, on each NodeManager host, should look like the following:

$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nm.service.keytab
Keytab name: FILE:/etc/security/keytab/nm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

MapReduce JobHistory Server:

The MapReduce JobHistory Server keytab file, on that host, should look like the following:

$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/jhs.service.keytab
Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
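
To check quickly that a keytab holds the principals expected for its daemon, the klist listings above can be reduced to a list of distinct principals. This is a minimal sketch; the function name `keytab_principals` is hypothetical, and it assumes the output layout shown above, where entry lines start with a numeric KVNO and the principal is the fourth field.

```shell
# List the distinct principals in `klist -e -k -t` output read from stdin.
# Assumes entry lines begin with a numeric KVNO, followed by date, time,
# and the principal, as in the listings above.
keytab_principals() {
  awk '$1 ~ /^[0-9]+$/ { print $4 }' | sort -u
}
```

Usage would look like `/usr/kerberos/bin/klist -e -k -t /etc/security/keytab/jhs.service.keytab | keytab_principals`, which for the listing above would leave one jhs/ principal and one host/ principal.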

 

Configuration in secure mode:

conf/core-site.xml:

Parameter | Value | Notes
hadoop.security.authentication | kerberos | simple is non-secure.
hadoop.security.authorization | true | Enable RPC service-level authorization.

conf/hdfs-site.xml:

NameNode configuration:

Parameter | Value | Notes
dfs.block.access.token.enable | true | Enable HDFS block access tokens for secure operations.
dfs.https.enable | true |
dfs.namenode.https-address | nn_host_fqdn:50470 |
dfs.https.port | 50470 |
dfs.namenode.keytab.file | /etc/security/keytab/nn.service.keytab | Kerberos keytab file for the NameNode.
dfs.namenode.kerberos.principal | nn/_HOST@REALM.TLD | Kerberos principal name for the NameNode.
dfs.namenode.kerberos.https.principal | host/_HOST@REALM.TLD | HTTPS Kerberos principal name for the NameNode.

Secondary NameNode configuration:

Parameter | Value | Notes
dfs.namenode.secondary.http-address | c_nn_host_fqdn:50090 |
dfs.namenode.secondary.https-port | 50470 |
dfs.namenode.secondary.keytab.file | /etc/security/keytab/sn.service.keytab | Kerberos keytab file for the Secondary NameNode.
dfs.namenode.secondary.kerberos.principal | sn/_HOST@REALM.TLD | Kerberos principal name for the Secondary NameNode.
dfs.namenode.secondary.kerberos.https.principal | host/_HOST@REALM.TLD | HTTPS Kerberos principal name for the Secondary NameNode.

DataNode configuration:

Parameter | Value | Notes
dfs.datanode.data.dir.perm | 700 |
dfs.datanode.address | 0.0.0.0:2003 |
dfs.datanode.https.address | 0.0.0.0:2005 |
dfs.datanode.keytab.file | /etc/security/keytab/dn.service.keytab | Kerberos keytab file for the DataNode.
dfs.datanode.kerberos.principal | dn/_HOST@REALM.TLD | Kerberos principal name for the DataNode.
dfs.datanode.kerberos.https.principal | host/_HOST@REALM.TLD | HTTPS Kerberos principal name for the DataNode.

conf/yarn-site.xml:

WebAppProxy:

The WebAppProxy acts as a proxy between the web applications exported by an application and the end user. When security is enabled, it warns users before they access a potentially unsafe web application. Authentication and authorization through the proxy are handled as for any other privileged web application.

Parameter | Value | Notes
yarn.web-proxy.address | WebAppProxy host:port for proxy to AM web apps. | host:port. If this is the same as yarn.resourcemanager.webapp.address or is not defined, the ResourceManager runs the proxy; otherwise a standalone proxy server must be launched.
yarn.web-proxy.keytab | /etc/security/keytab/web-app.service.keytab | Kerberos keytab file for the WebAppProxy.
yarn.web-proxy.principal | wap/_HOST@REALM.TLD | Kerberos principal name for the WebAppProxy.

LinuxContainerExecutor:

A ContainerExecutor, used by the YARN framework, defines how a container is launched and controlled.

The following executors are available in Hadoop YARN:

ContainerExecutor | Description
DefaultContainerExecutor | The default executor which YARN uses to manage container execution. The container process has the same Unix user as the NodeManager.
LinuxContainerExecutor | Supported only on GNU/Linux, this executor runs the containers as the user who submitted the application. It requires all user accounts to be created on the cluster nodes where the containers are launched. It uses a setuid executable that is included in the Hadoop distribution. The NodeManager uses this executable to launch and kill containers. The setuid executable switches to the user who has submitted the application and launches or kills the containers. For maximum security, this executor sets up restricted permissions and user/group ownership of local files and directories used by the containers such as the shared objects, jars, intermediate files, log files etc. Particularly note that, because of this, except the application owner and NodeManager, no other user can access any of the local files/directories including those localized as part of the distributed cache.

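To build the LinuxContainerExecutor executable, run:

$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/

The path passed in -Dcontainer-executor.conf.dir must be a local path on the cluster nodes where the configuration file for the setuid executable is located. The executable must be installed in $HADOOP_YARN_HOME/bin and must have permissions 6050 or --Sr-s---. It must be group-owned by a special group (for example hadoop) of which the NodeManager's Unix user is a member and no ordinary application user is; if any application user belongs to this group, security is compromised. The group name must be set in the yarn.nodemanager.linux-container-executor.group property in both conf/yarn-site.xml and conf/container-executor.cfg.

For example, suppose the NodeManager runs as user yarn, which belongs to the groups users and hadoop. Suppose also that the group users contains both yarn and another user, alice (an application submitter), and that alice does not belong to hadoop. Following the description above, the setuid/setgid executable must be set to 6050 or --Sr-s---, with user yarn and group hadoop, so that alice cannot execute it.

The LinuxTaskController requires the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs to have 755 permissions.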
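
A quick way to verify these requirements is to inspect the `ls -l` listing of the executable. The sketch below is illustrative only: `check_executor_listing` is a hypothetical helper that reads one `ls -l` line from standard input and succeeds when the mode is ---Sr-s--- (the `ls` rendering of 6050, including the leading file-type character) and the group matches the expected one.

```shell
# Hypothetical check: succeed only if the `ls -l` line on stdin shows mode
# ---Sr-s--- (6050) and the expected group in the fourth field.
check_executor_listing() {
  expected_group="$1"
  awk -v grp="$expected_group" '
    { ok = ($1 ~ /^---Sr-s---/ && $4 == grp) }
    END { exit (ok ? 0 : 1) }'
}
```

It could be used as `ls -l "$HADOOP_YARN_HOME/bin/container-executor" | check_executor_listing hadoop`. Setting the mode itself would be done as root with `chmod 6050` on the same file after `chown` has assigned the group.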

conf/container-executor.cfg:

The executable requires a configuration file named container-executor.cfg in the configuration directory passed to the mvn command above. The file must be owned by the user running the NodeManager (yarn in the example above), may be group-owned by any group, and must have permissions 0400 or r--------.

The executable requires the following items in conf/container-executor.cfg, written as simple key=value pairs, one per line:

Parameter | Value | Notes
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager. The group owner of the container-executor binary should be this group. Should be same as the value with which the NodeManager is configured. This configuration is required for validating the secure access of the container-executor binary.
banned.users | hdfs,yarn,mapred,bin | Banned users.
allowed.system.users | foo,bar | Allowed system users.
min.user.id | 1000 | Prevent other super-users.
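
As an illustration of the file layout, the sketch below writes a sample container-executor.cfg using the values from the table (with the banned HDFS user spelled hdfs) and reads a key back. Both function names are hypothetical, and the key=value, one-per-line layout is the only format assumption.

```shell
# Write a sample container-executor.cfg to the given path (values follow the table).
write_sample_cfg() {
  cat > "$1" <<'EOF'
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=foo,bar
min.user.id=1000
EOF
}

# Look up one key in a key=value file; prints the value, or nothing if absent.
cfg_get() {
  awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2) }' "$2"
}
```

For example, after `write_sample_cfg /tmp/container-executor.cfg`, the call `cfg_get min.user.id /tmp/container-executor.cfg` prints the configured minimum user id. The real file would then need the ownership and 0400 permissions described above.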

The local filesystem permissions required for the paths related to the LinuxContainerExecutor are:

Filesystem | Path | User:Group | Permissions
local | container-executor | root:hadoop | --Sr-s---
local | conf/container-executor.cfg | root:hadoop | r--------
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x

ResourceManager configuration:

Parameter | Value | Notes
yarn.resourcemanager.keytab | /etc/security/keytab/rm.service.keytab | Kerberos keytab file for the ResourceManager.
yarn.resourcemanager.principal | rm/_HOST@REALM.TLD | Kerberos principal name for the ResourceManager.

NodeManager configuration:

Parameter | Value | Notes
yarn.nodemanager.keytab | /etc/security/keytab/nm.service.keytab | Kerberos keytab file for the NodeManager.
yarn.nodemanager.principal | nm/_HOST@REALM.TLD | Kerberos principal name for the NodeManager.
yarn.nodemanager.container-executor.class | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor | Use LinuxContainerExecutor.
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager.

conf/mapred-site.xml:

MapReduce JobHistory Server configuration:

Parameter | Value | Notes
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020.
mapreduce.jobhistory.keytab | /etc/security/keytab/jhs.service.keytab | Kerberos keytab file for the MapReduce JobHistory Server.
mapreduce.jobhistory.principal | jhs/_HOST@REALM.TLD | Kerberos principal name for the MapReduce JobHistory Server.

Operating the Hadoop cluster

Once all the configuration is complete, copy the files in HADOOP_CONF_DIR to every other node.

This section also describes which Unix user starts each Hadoop service, using the Unix accounts and groups introduced earlier.

Hadoop startup

To start a Hadoop cluster you need to start both HDFS and YARN.

Format a new distributed filesystem as user hdfs:

[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>

Start HDFS on the NameNode as user hdfs:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode

Start the DataNodes on all slaves as root, with the environment variable HADOOP_SECURE_DN_USER set to hdfs:

[root]$ HADOOP_SECURE_DN_USER=hdfs $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

Start YARN on the ResourceManager as user yarn:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

Start the NodeManagers on all slaves as user yarn:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

Start a standalone WebAppProxy server as user yarn. If several servers are used for load balancing, start each of them the same way:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver

Start the MapReduce JobHistory Server as user mapred:

[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver

Hadoop shutdown

Stop the NameNode as user hdfs:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode

Stop the DataNodes on all slaves as root:

[root]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode

Stop the ResourceManager on the ResourceManager node as user yarn:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager

Stop the NodeManagers on all slaves as user yarn:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager

Stop the WebAppProxy server on its node as user yarn. If there are several, stop each in turn:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver

Stop the MapReduce JobHistory Server as user mapred:

[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver

Web interfaces

Once the cluster is running, the daemons can be monitored through their web UIs:

Daemon | Web Interface | Notes
NameNode | http://nn_host:port/ | Default HTTP port is 50070.
ResourceManager | http://rm_host:port/ | Default HTTP port is 8088.
MapReduce JobHistory Server | http://jhs_host:port/ | Default HTTP port is 19888.
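
For a scripted check, the URLs can be built from the default ports in the table. This is a minimal sketch: the helper name `ui_url` and the daemon keywords are hypothetical, and the ports only apply if the defaults have not been changed.

```shell
# Build the web UI URL for a daemon on a given host, using the default ports.
ui_url() {
  daemon="$1"
  host="$2"
  case "$daemon" in
    namenode)        port=50070 ;;
    resourcemanager) port=8088 ;;
    historyserver)   port=19888 ;;
    *) echo "unknown daemon: $daemon" >&2; return 1 ;;
  esac
  echo "http://$host:$port/"
}
```

A reachability probe could then be written as `curl -s -f -o /dev/null "$(ui_url resourcemanager rm_host)" && echo up`, with rm_host replaced by the real hostname.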

          

 

 
