Hadoop Startup Scripts, Fully Explained
Reposted from: http://blog.csdn.net/mrtitan/article/details/9115621
When working with Hadoop, you often need to tweak configuration parameters, and all sorts of problems come up: a config change suddenly keeps the namenode from starting, you add a jar and wonder how to get Hadoop's JVM to load it, you want to set the log directory, and so on. Each time, you have to comb carefully through the startup scripts to find the cause, which costs time and effort, so I put together this summary for future reference.
Cloudera's Hadoop startup scripts are unusually complex and scattered: shell scripts are strewn across every corner of the system, which is rather exasperating. Below, the namenode startup process serves as an example of how the startup scripts call one another and what each script does.
The entry point is /etc/init.d/hadoop-hdfs-namenode. Let's follow the namenode startup sequence through Hadoop's startup call chain.
/etc/init.d/hadoop-hdfs-namenode:
#1. Source /etc/default/hadoop and /etc/default/hadoop-hdfs-namenode
#2. Run /usr/lib/hadoop/sbin/hadoop-daemon.sh to start the namenode
Cloudera starts the namenode as the hdfs user; the default configuration directory is /etc/hadoop/conf.
start() {
  [ -x $EXEC_PATH ] || exit $ERROR_PROGRAM_NOT_INSTALLED
  [ -d $CONF_DIR ] || exit $ERROR_PROGRAM_NOT_CONFIGURED
  log_success_msg "Starting ${DESC}: "
  su -s /bin/bash $SVC_USER -c "$EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"
  # Some processes are slow to start
  sleep $SLEEP_TIME
  checkstatusofproc
  RETVAL=$?
  [ $RETVAL -eq $RETVAL_SUCCESS ] && touch $LOCKFILE
  return $RETVAL
}
/etc/default/hadoop and /etc/default/hadoop-hdfs-namenode:
#1. Configure the log dir, pid dir, and service user
/usr/lib/hadoop/sbin/hadoop-daemon.sh
#1. Source /usr/lib/hadoop/libexec/hadoop-config.sh
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
#2. Source hadoop-env.sh
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
#3. Set the log directory
# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"
fi
#4. Fill in the full log file names and the log4j logger settings
export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,RFA"}
export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"INFO,RFAS"}
export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"INFO,NullAppender"}
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
HADOOP_STOP_TIMEOUT=${HADOOP_STOP_TIMEOUT:-5}
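To make the naming concrete, here is a small sketch that evaluates the log and pid patterns under assumed example values (CDH-style directories, service user hdfs, host nn1 — all illustrative, not taken from this article):

```shell
# Reproduces the log/pid naming above with assumed example values.
HADOOP_LOG_DIR=/var/log/hadoop-hdfs   # assumed; normally set via /etc/default/hadoop
HADOOP_PID_DIR=/var/run/hadoop-hdfs   # assumed
HADOOP_IDENT_STRING=hdfs              # the service user
command=namenode
HOSTNAME=nn1                          # assumed hostname

log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid

echo "$log"   # /var/log/hadoop-hdfs/hadoop-hdfs-namenode-nn1.out
echo "$pid"   # /var/run/hadoop-hdfs/hadoop-hdfs-namenode.pid
```

This is where to look first when a daemon dies silently at startup: the .out file named above, not the .log file.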
#5. Call /usr/lib/hadoop-hdfs/bin/hdfs
hadoop_rotate_log $log
echo starting $command, logging to $log
cd "$HADOOP_PREFIX"
case $command in
  namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)
    if [ -z "$HADOOP_HDFS_HOME" ]; then
      hdfsScript="$HADOOP_PREFIX"/bin/hdfs
    else
      hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs
    fi
    nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
    ;;
  (*)
    nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
    ;;
esac
echo $! > $pid
sleep 1; head "$log"
sleep 3
if ! ps -p $! > /dev/null; then
  exit 1
fi
As you can see, the namenode's stdout goes to $log, i.e. log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
/usr/lib/hadoop/libexec/hadoop-config.sh
#1. Source /usr/lib/hadoop/libexec/hadoop-layout.sh
hadoop-layout.sh mainly describes the directory layout of Hadoop's libraries. Its main contents are:
HADOOP_COMMON_DIR="./"
HADOOP_COMMON_LIB_JARS_DIR="lib"
HADOOP_COMMON_LIB_NATIVE_DIR="lib/native"
HDFS_DIR="./"
HDFS_LIB_JARS_DIR="lib"
YARN_DIR="./"
YARN_LIB_JARS_DIR="lib"
MAPRED_DIR="./"
MAPRED_LIB_JARS_DIR="lib"
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-"/usr/lib/hadoop/libexec"}
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"}
HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/usr/lib/hadoop"}
HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/usr/lib/hadoop-hdfs"}
HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-"/usr/lib/hadoop-0.20-mapreduce"}
YARN_HOME=${YARN_HOME:-"/usr/lib/hadoop-yarn"}
#2. Set the HDFS and YARN library directories
HADOOP_COMMON_DIR=${HADOOP_COMMON_DIR:-"share/hadoop/common"}
HADOOP_COMMON_LIB_JARS_DIR=${HADOOP_COMMON_LIB_JARS_DIR:-"share/hadoop/common/lib"}
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_COMMON_LIB_NATIVE_DIR:-"lib/native"}
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
HDFS_LIB_JARS_DIR=${HDFS_LIB_JARS_DIR:-"share/hadoop/hdfs/lib"}
YARN_DIR=${YARN_DIR:-"share/hadoop/yarn"}
YARN_LIB_JARS_DIR=${YARN_LIB_JARS_DIR:-"share/hadoop/yarn/lib"}
MAPRED_DIR=${MAPRED_DIR:-"share/hadoop/mapreduce"}
MAPRED_LIB_JARS_DIR=${MAPRED_LIB_JARS_DIR:-"share/hadoop/mapreduce/lib"}
# the root of the Hadoop installation
# See HADOOP-6255 for directory structure layout
HADOOP_DEFAULT_PREFIX=$(cd -P -- "$common_bin"/.. && pwd -P)
HADOOP_PREFIX=${HADOOP_PREFIX:-$HADOOP_DEFAULT_PREFIX}
export HADOOP_PREFIX
#3. Check the slaves file. CDH's Hadoop, however, does not rely on a slaves file to start the cluster; users are expected to write their own cluster startup scripts (perhaps to push them toward Cloudera Manager...).
#4. Source the env file once more
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
#5. Set JAVA_HOME
# Attempt to set JAVA_HOME if it is not set
if [[ -z $JAVA_HOME ]]; then
  # On OSX use java_home (or /Library for older versions)
  if [ "Darwin" == "$(uname -s)" ]; then
    if [ -x /usr/libexec/java_home ]; then
      export JAVA_HOME=($(/usr/libexec/java_home))
    else
      export JAVA_HOME=(/Library/Java/Home)
    fi
  fi
  # Bail if we did not detect it
  if [[ -z $JAVA_HOME ]]; then
    echo "Error: JAVA_HOME is not set and could not be found." 1>&2
    exit 1
  fi
fi
#6. Set the JVM heap size. If the user sets HADOOP_HEAPSIZE in hadoop-env.sh, it overrides the default of 1000m.
# some Java parameters
JAVA_HEAP_MAX=-Xmx1000m
# check envvars which might override default args
if [ "$HADOOP_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $HADOOP_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi
#7. Build the program's classpath. It is a long stretch of code, but it boils down to:
HADOOP_CONF_DIR + HADOOP_CLASSPATH + HADOOP_COMMON_DIR + HADOOP_COMMON_LIB_JARS_DIR +
HADOOP_COMMON_LIB_NATIVE_DIR + HDFS_DIR + HDFS_LIB_JARS_DIR +
YARN_DIR + YARN_LIB_JARS_DIR + MAPRED_DIR + MAPRED_LIB_JARS_DIR
One thing to note: Hadoop thoughtfully provides the HADOOP_USER_CLASSPATH_FIRST property. If it is set, HADOOP_CLASSPATH (the user-defined classpath) is loaded ahead of Hadoop's own jars, for cases where users want their own jars loaded first.
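The ordering decision can be sketched as follows. This is an illustration of the behavior, not the literal code from hadoop-config.sh (which builds CLASSPATH piecewise from the directories listed above), and the function and variable names here are made up:

```shell
# Sketch of the HADOOP_USER_CLASSPATH_FIRST ordering logic.
build_classpath() {
  local user_cp="$1" hadoop_cp="$2"
  if [ -n "$HADOOP_USER_CLASSPATH_FIRST" ]; then
    echo "${user_cp}:${hadoop_cp}"    # user jars load first
  else
    echo "${hadoop_cp}:${user_cp}"    # default: Hadoop's own jars first
  fi
}

unset HADOOP_USER_CLASSPATH_FIRST
build_classpath "/opt/myjars/*" "/usr/lib/hadoop/*"   # Hadoop jars come first

HADOOP_USER_CLASSPATH_FIRST=true
build_classpath "/opt/myjars/*" "/usr/lib/hadoop/*"   # user jars come first
```

Because the JVM resolves classes from the first matching classpath entry, this ordering determines whose copy of a duplicated class wins.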
#8. Assemble HADOOP_OPTS. Parameters like -Dhadoop.log.dir are consumed by the log4j configuration under conf.
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.file=$HADOOP_LOGFILE"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.home.dir=$HADOOP_PREFIX"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH
fi
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.policy.file=$HADOOP_POLICYFILE"
# Disable ipv6 as it can cause issues
HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
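For reference, the log4j.properties under conf picks these -D system properties up via ${...} placeholders. A typical rolling-file-appender stanza looks roughly like this (based on the usual Hadoop defaults; your conf may differ):

```properties
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
```

This is why changing HADOOP_LOG_DIR or HADOOP_LOGFILE moves the daemon's .log output without touching log4j.properties itself.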
/usr/lib/hadoop-hdfs/bin/hdfs
#1. Source /usr/lib/hadoop/libexec/hdfs-config.sh, though it does not appear to do much
#2. Choose the Java main class based on the startup argument:
if [ "$COMMAND" = "namenode" ]; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
fi
#3. Launch the Java process:
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
Finally, a few small configuration examples.
1. How to set Hadoop's log directory:
From the startup scripts, the configuration precedence is hadoop-env.sh > hadoop-config.sh > /etc/default/hadoop, so to set Hadoop's log directory you only need to add one line to hadoop-env.sh:
export HADOOP_LOG_DIR=xxxxx
2. How to add your own jars so that Hadoop's namenode and datanode load them:
export HADOOP_CLASSPATH=xxxxx
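For example, in hadoop-env.sh (the jar directory here is hypothetical; a trailing /* pulls in every jar in the directory):

```shell
# hadoop-env.sh -- /opt/myjars is an illustrative path
export HADOOP_CLASSPATH="/opt/myjars/*:$HADOOP_CLASSPATH"
# If your jar must shadow a jar bundled with Hadoop, also set:
export HADOOP_USER_CLASSPATH_FIRST=true
```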
3. How to set the namenode's Java heap size on its own.
Say you want 10 GB for the namenode but 1 GB for the datanode; this gets interesting. If you set HADOOP_HEAPSIZE directly, it applies to both the namenode and the datanode, while setting the size only in the namenode-specific options has a small quirk of its own, though it basically works.
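One common approach (sizes here are illustrative) uses the per-daemon option variables that /usr/lib/hadoop-hdfs/bin/hdfs appends to HADOOP_OPTS. Since the final exec line places $HADOOP_OPTS after $JAVA_HEAP_MAX, the -Xmx below appears later on the java command line than the HADOOP_HEAPSIZE-derived one, and the JVM honors the last -Xmx it sees. That duplicated flag is the small quirk mentioned above:

```shell
# hadoop-env.sh -- illustrative heap sizes
export HADOOP_NAMENODE_OPTS="-Xmx10g $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"
```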
In short, Hadoop's startup scripts are numerous and fragmented, and the HBase and Hive startup scripts follow a similar structure, so adding or changing configuration can trigger all sorts of puzzling problems. You will get a feel for it as you use them.