Building Hadoop Clusters On Linux In EC2

Backup node setup

Currently the NameNode is a single point of failure. While Hadoop can't run a multi-master environment, a backup NameNode can be set up as a hot standby. This second node exports an NFS share to the NameNode, and the NameNode writes a copy of the HDFS metadata into that share. If the NameNode were to fail, Hadoop could be restarted pointing at this backup node as the new master NameNode. While the regular NameNode server is running, the backup node doesn't run any Hadoop daemons. Start one last instance. This time don't add the host to the slaves file, but do perform the other setup steps.

Set up NFS mount

On the new instance we need to set up the NFS export. The directory /mnt/hadoop/dfs/name needs to be exported and a few services need to be started.

/mnt/hadoop/dfs/name    *(rw)

This is a very permissive, insecure export, but it will export the directory correctly. Place this line in the /etc/exports file.
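
The wildcard allows any host on the network to mount the share read-write. A tighter alternative, assuming you know the master NameNode's internal IP (shown here as a placeholder), restricts the export to that single address:

/mnt/hadoop/dfs/name    10.250.x.x(rw,sync)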

# mkdir -p /mnt/hadoop/dfs/name
# chown -R hadoop:hadoop /mnt/hadoop
# /etc/init.d/rpcbind start
# /etc/init.d/nfs start
# /etc/init.d/nfslock start

First create the directory to export and chown it to the hadoop user. Then start the rpcbind, NFS, and locking services. Start these three services on the master NameNode as well.
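
Before moving on it is worth confirming the directory is actually being exported. These are standard NFS utility commands, not part of the original steps:

# exportfs -ra
# showmount -e localhost

exportfs -ra re-reads /etc/exports, and showmount -e lists what the server is offering.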

If you wish to have the NFS mount attached at boot, insert the following into the NameNode's /etc/fstab.

10.250.58.85:/mnt/hadoop/dfs/name               /mnt/node-backup                       nfs    defaults,rw 0 0

Now create the /mnt/node-backup mount point on the master NameNode and mount it.

# mkdir -p /mnt/node-backup
# mount /mnt/node-backup
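
As a quick sanity check that the share is mounted and writable (the test filename is arbitrary):

# df -h /mnt/node-backup
# touch /mnt/node-backup/test && rm /mnt/node-backup/test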

Once everything has been properly mounted, stop Hadoop and make the following change to the hadoop-site.xml configuration file on the master NameNode.

<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name,/mnt/node-backup</value>
  <final>true</final>
</property>
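
With the change in place, start Hadoop again so the NameNode begins writing its metadata to both directories. Assuming the install path used elsewhere in this tutorial:

# /usr/local/hadoop-0.19.0/bin/start-all.sh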

Failover process

Now if the primary NameNode dies, the IP address attached to this backup NameNode server can be changed to the IP address of the primary NameNode. This is why we set up an Amazon EC2 Elastic IP address and attached it to the primary NameNode. Remember, even though the IPs listed in this tutorial are internal addresses, failover requires using the public Elastic IP on the NameNode.
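
With the EC2 API tools of this era the reassociation would look roughly like the following; the Elastic IP and instance ID are placeholders, not values from this tutorial:

# ec2-disassociate-address <elastic-ip>
# ec2-associate-address -i <backup-instance-id> <elastic-ip>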

Removing Nodes

To replace hardware or perform upgrades on a host, it should first be removed from the Hadoop cluster. Add the following to the hadoop-site.xml configuration file. The JobTracker will prevent access from hosts listed in the exclude file.

  <property>
    <name>mapred.hosts.exclude</name>
    <value>/usr/local/hadoop-0.19.0/conf/excludes</value>
    <final>true</final>
  </property>

Now create the file /usr/local/hadoop-0.19.0/conf/excludes and add the IP address of one of the slaves.

10.251.26.69
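
The exclude file is read when the trackers restart. If you also configure the HDFS-side exclude list (the dfs.hosts.exclude option, not shown in this tutorial), the NameNode can be told to begin decommissioning the host without a restart:

# /usr/local/hadoop-0.19.0/bin/hadoop dfsadmin -refreshNodes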

Setting up the secondary-NameNode

Another Hadoop process that currently runs on the NameNode is the secondary-NameNode service. The secondary-NameNode is not a backup NameNode service; it takes periodic snapshots of the NameNode metadata. This process can be offloaded from the NameNode onto the backup node we set up the NFS share on. To do this, change the masters file to contain the IP or hostname of that node.

10.250.58.85
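
The masters file is only read at startup, so restart the DFS daemons for the secondary-NameNode to move to the new host. Assuming the install path used earlier:

# /usr/local/hadoop-0.19.0/bin/stop-dfs.sh
# /usr/local/hadoop-0.19.0/bin/start-dfs.sh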

TaskTracker Memory Management Configuration

As the Hadoop introduction article touched on, TaskTracker has a memory management feature. This allows jobs and hosts to be configured so that jobs exceeding memory thresholds are killed. The TaskTracker process monitors virtual memory (VMEM) usage, and its thresholds are based on VMEM usage. Setting job-based limits allows TaskTracker to kill abusive jobs that exceed their specified VMEM limit. Additionally, host-based VMEM limits can be set that tell TaskTracker to kill one or more jobs to bring memory consumption back within the host threshold.

To prevent jobs from causing an out of memory (OOM) condition, set mapred.tasktracker.vmem.reserved in hadoop-site.xml. The mapred.tasktracker.vmem.reserved option reserves a buffer of VMEM, in bytes, that TaskTracker and its children (jobs) can NOT consume. On a per-job basis there are two settings: mapred.task.default.maxvmem sets the default amount of VMEM a job can use, which a user can override at submission time, and mapred.task.limit.maxvmem sets a hard limit on the VMEM a job is allowed to consume. Jobs should not request more than this amount, and users can't override it.
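
As a sketch, this is how those three options might look in hadoop-site.xml. The byte values are arbitrary illustrations, not tuned recommendations:

<property>
  <name>mapred.tasktracker.vmem.reserved</name>
  <!-- example only: reserve 1 GB of VMEM for the OS and TaskTracker itself -->
  <value>1073741824</value>
</property>
<property>
  <name>mapred.task.default.maxvmem</name>
  <!-- example only: 512 MB default per-job VMEM, overridable at submission -->
  <value>536870912</value>
</property>
<property>
  <name>mapred.task.limit.maxvmem</name>
  <!-- example only: 1 GB hard per-job VMEM cap that users can't override -->
  <value>1073741824</value>
</property>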
