Compiling and installing the Hadoop-LZO compression support module

If you want to benefit from splittable LZO compression in Hadoop, you have to build it yourself. For licensing reasons (LZO is GPL-licensed), the module isn't shipped with Apache Hadoop or with Cloudera's distribution.

LZO compression is significantly faster than GZip and comparable in speed to Snappy, the other common codecs in Hadoop. Unlike them, LZO files can be split across map tasks once they have been indexed.

The original project is located at http://code.google.com/p/hadoop-gpl-compression/; however, there are two improved forks that are kept in sync with each other. We will use Todd Lipcon's fork: he works at Cloudera, so his fork is the closest to the CDH4 stack.

I am writing this short article because every hadoop-lzo installation guide I found was longer than it needed to be. So here we go:

yum install lzo-devel ant java-1.7.0-openjdk-devel gcc
cd /usr/local/src
git clone https://github.com/toddlipcon/hadoop-lzo.git
cd hadoop-lzo

The following step is only required if you are using OpenJDK 1.7.0 (which you should be):

In build.xml, search for <javah destdir="${build.native}/src/com/hadoop/compression/lzo" and, just before the matching </javah>, insert a line that reads <classpath refid="classpath"/>
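If you'd rather not edit build.xml by hand, the change can be scripted. This is a minimal sketch, assuming GNU sed (for in-place editing and \n in the replacement) and that the closing </javah> of the lzo task sits on its own line. The function name patch_javah_classpath is hypothetical, not part of hadoop-lzo. It only touches lines between the lzo destdir attribute and the next </javah>, so any other javah task is left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert <classpath refid="classpath"/> right before the
# </javah> that closes the javah task writing to .../compression/lzo.
# Assumes GNU sed and a multi-line javah element. Run it only once:
# a second run would insert a second classpath line.
patch_javah_classpath() {
  local build_xml="${1:-build.xml}"
  # Address range: from the line with the lzo destdir up to the next </javah>.
  # Inside that range, replace the indented </javah> with the classpath line
  # plus </javah>, keeping the original indentation (\1).
  sed -i '/destdir="${build.native}\/src\/com\/hadoop\/compression\/lzo"/,/<\/javah>/ s|^\([[:space:]]*\)</javah>|\1  <classpath refid="classpath"/>\n\1</javah>|' "$build_xml"
}
```

Usage, from inside /usr/local/src/hadoop-lzo: patch_javah_classpath build.xml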

Then build the package and copy the jar and native libraries into place:

ant clean compile-native tar
cp /usr/local/src/hadoop-lzo/build/hadoop-lzo-0.4.15.tar.gz /usr/local/src
cd /usr/local/src && tar -xzf hadoop-lzo-0.4.15.tar.gz
cp /usr/local/src/hadoop-lzo-0.4.15/hadoop-lzo-0.4.15.jar /usr/lib/hadoop/lib
cp /usr/local/src/hadoop-lzo-0.4.15/lib/native/Linux-amd64-64/* /usr/lib/hadoop/lib/native/
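Before moving on to the configuration, it doesn't hurt to confirm that the copy steps worked. A small sketch: check_lzo_install is a hypothetical helper, it defaults to the CDH4 path /usr/lib/hadoop/lib used above (override with HADOOP_LIB), and it assumes the native library is named libgplcompression, which is the name hadoop-lzo's native build uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the hadoop-lzo jar and native library
# are present. Returns non-zero and prints what is missing otherwise.
check_lzo_install() {
  local lib="${HADOOP_LIB:-/usr/lib/hadoop/lib}"
  local status=0
  # ls fails if the glob matches nothing (the pattern is passed literally).
  if ! ls "$lib"/hadoop-lzo-*.jar >/dev/null 2>&1; then
    echo "missing: $lib/hadoop-lzo-*.jar" >&2
    status=1
  fi
  if ! ls "$lib"/native/libgplcompression.so* >/dev/null 2>&1; then
    echo "missing: $lib/native/libgplcompression.so*" >&2
    status=1
  fi
  return "$status"
}
```

Usage: check_lzo_install && echo "hadoop-lzo files in place"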

In core-site.xml, add:

<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
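A typo in that long comma-separated codec list is easy to miss. The sketch below checks that a given codec class is listed in io.compression.codecs. It is not a real XML parser: it assumes the <name> and <value> lines are adjacent, as in the snippet above, and it uses GNU sed. The helper name codec_listed is hypothetical; on CDH4 the file is typically /etc/hadoop/conf/core-site.xml, but that path is an assumption about your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $2 (a codec class) appears in the
# io.compression.codecs value of the config file $1.
# Assumes <value> is on the line right after <name>.
codec_listed() {
  local conf="$1" codec="$2" codecs
  # On the <name> line, read the next line (n) and print what is inside <value>.
  codecs=$(sed -n '/<name>io\.compression\.codecs<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$conf")
  # Surround with commas so only whole class names match.
  case ",$codecs," in
    *",$codec,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: codec_listed /etc/hadoop/conf/core-site.xml com.hadoop.compression.lzo.LzopCodec && echo ok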

In mapred-site.xml, add:

<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

That should be it! Test it with:

sudo -u hdfs hbase org.apache.hadoop.hbase.util.CompressionTest ./test.txt lzo
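If you want to script this check (for example, to repeat it on every node later), one option is to pipe the test's output through a small filter. In this sketch, lzo_test_passed is a hypothetical name. It treats the run as passed when the "Got brand-new compressor [.lzo_deflate]" line appears and no line mentions Exception or Error. Those strings are taken from this article's sample output, so adjust them if your HBase version logs differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read CompressionTest output on stdin and decide
# pass/fail with simple string matching (a heuristic, not a parser).
lzo_test_passed() {
  local out
  out=$(cat)
  # The LZO compressor must have been instantiated...
  printf '%s\n' "$out" | grep -qF 'Got brand-new compressor [.lzo_deflate]' || return 1
  # ...and nothing should look like a failure (e.g. UnsatisfiedLinkError).
  if printf '%s\n' "$out" | grep -qE 'Exception|Error|ERROR'; then
    return 1
  fi
  return 0
}
```

Usage (log4j output goes to stderr, hence 2>&1):
sudo -u hdfs hbase org.apache.hadoop.hbase.util.CompressionTest ./test.txt lzo 2>&1 | lzo_test_passed && echo "LZO OK"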

If you see a line like

13/06/14 02:26:04 INFO compress.CodecPool: Got brand-new compressor [.lzo_deflate]

and get no related errors, then you are set. Copy the hadoop-lzo-0.4.15.jar, the native libraries and the configs to all cluster nodes and you are done!
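For more than a handful of nodes, a loop saves some typing. This sketch relies on assumptions the article doesn't state: passwordless SSH to each node, rsync installed everywhere, write access to the target directories on the remote side, and /etc/hadoop/conf as the config directory (the usual CDH4 location). distribute_lzo is a hypothetical helper; with DRY_RUN=1 it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the hadoop-lzo jar, its native libraries and the
# edited configs to each host given as an argument.
# Set DRY_RUN=1 to print the rsync commands instead of executing them.
distribute_lzo() {
  local run="" host
  if [ -n "${DRY_RUN:-}" ]; then
    run="echo"
  fi
  for host in "$@"; do
    $run rsync -a /usr/lib/hadoop/lib/hadoop-lzo-0.4.15.jar "$host:/usr/lib/hadoop/lib/"
    # Only the hadoop-lzo native files, not the whole native directory.
    $run rsync -a /usr/lib/hadoop/lib/native/libgplcompression* "$host:/usr/lib/hadoop/lib/native/"
    $run rsync -a /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/mapred-site.xml "$host:/etc/hadoop/conf/"
  done
}
```

Try it first with DRY_RUN=1 distribute_lzo node1 node2, then run it again without DRY_RUN. Remember to restart the Hadoop services on each node so they pick up the new configs.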


  1. June 23rd, 2013