Compiling and installing Hadoop-LZO compression support module
If you want to benefit from splittable LZO compression in Hadoop, you have to build it yourself. For licensing reasons, the module isn't shipped with Apache's Hadoop or Cloudera's distribution.
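LZO compresses and decompresses significantly faster than GZip or BZip2, at the cost of a lower compression ratio. Its speed is in the same range as Snappy, but unlike Snappy-compressed text files, LZO files can be made splittable by indexing them.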
The original work is located at http://code.google.com/p/hadoop-gpl-compression/; however, there are two improved forks that are kept in sync with each other. We will be using Todd Lipcon's fork. He is employed at Cloudera, so his fork is the closest to the CDH4 stack.
The reason I am writing this short article is that all the installation articles for hadoop-lzo I found were not as short as they could be. So here we go:
cd /usr/local/src
git clone git://github.com/toddlipcon/hadoop-lzo.git
cd hadoop-lzo
The following step is only required if you are using OpenJDK 1.7.0, which you should:
in build.xml, search for <javah destdir="${build.native}/src/com/hadoop/compression/lzo" and insert a line before </javah> that reads <classpath refid="classpath"/>
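The build.xml edit can also be scripted. The sketch below assumes that the closing </javah> tag sits on its own line and appears only once in the file. The helper name add_javah_classpath is hypothetical, not part of hadoop-lzo.

```shell
# Hypothetical helper: insert <classpath refid="classpath"/> on a new line
# before the first line that contains </javah> in the given build.xml.
add_javah_classpath() {
  build_file=$1
  tmp_file=$(mktemp) || return 1
  awk '
    /<\/javah>/ && !inserted {
      print "      <classpath refid=\"classpath\"/>"
      inserted = 1
    }
    { print }
  ' "$build_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
  # Write back through cat so build.xml keeps its owner and permissions.
  cat "$tmp_file" > "$build_file"
  rm -f "$tmp_file"
}

# Usage, from inside the checkout (run it only once, it is not idempotent):
#   add_javah_classpath build.xml
```

Check the result with a quick look at the javah task in build.xml before building; if the file is laid out differently, make the edit by hand instead.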
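Then build the native library and the release tarball with ant, and copy the results into your Hadoop installation: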
cp /usr/local/src/hadoop-lzo/build/hadoop-lzo-0.4.15.tar.gz /usr/local/src
cd /usr/local/src && tar -xzf hadoop-lzo-0.4.15.tar.gz
cp /usr/local/src/hadoop-lzo-0.4.15/hadoop-lzo-0.4.15.jar /usr/lib/hadoop/lib
cp /usr/local/src/hadoop-lzo-0.4.15/lib/native/Linux-amd64-64/* /usr/lib/hadoop/lib/native/
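The copy commands above assume that build/hadoop-lzo-0.4.15.tar.gz already exists. Here is a minimal sketch of producing it and checking the installed files. It assumes that the checkout's build.xml offers the compile-native and tar targets, that ant, a C compiler, a JDK and the LZO development headers are installed, and that the native library files are named libgplcompression.*. Both function names are hypothetical.

```shell
# Assumption: build.xml provides the "compile-native" and "tar" targets.
build_hadoop_lzo() {
  src_dir=$1
  # Refuse to run outside a checkout.
  if [ ! -f "$src_dir/build.xml" ]; then
    echo "no build.xml in $src_dir" >&2
    return 1
  fi
  (cd "$src_dir" && ant compile-native tar)
}

# Succeed only if the jar and at least one native library file are in place.
# Assumption: the native files are named libgplcompression.*
verify_lzo_install() {
  hadoop_lib=$1
  ls "$hadoop_lib"/hadoop-lzo-*.jar >/dev/null 2>&1 || return 1
  ls "$hadoop_lib"/native/libgplcompression.* >/dev/null 2>&1 || return 1
}

# Usage:
#   build_hadoop_lzo /usr/local/src/hadoop-lzo     # before the cp commands
#   verify_lzo_install /usr/lib/hadoop/lib         # after the cp commands
```

If the build fails on missing lzo headers, install your distribution's LZO development package and rerun it.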
In core-site.xml add:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
In mapred-site.xml add:
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
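Hand-edited Hadoop config files break easily when a <property> tag goes missing, so a quick sanity check after editing can save a failed daemon restart. The sketch below assumes the config files live in /etc/hadoop/conf and that grep supports the -o option. The function names are hypothetical.

```shell
# Succeed if the file has as many <property> openings as </property> closings.
property_tags_balanced() {
  opens=$(grep -o '<property>' "$1" | wc -l | tr -d ' ')
  closes=$(grep -o '</property>' "$1" | wc -l | tr -d ' ')
  [ "$opens" -eq "$closes" ]
}

# Succeed if the file mentions the LZO codec class at all.
mentions_lzo_codec() {
  grep -q -F 'com.hadoop.compression.lzo.LzoCodec' "$1"
}

# Usage (the config directory is an assumption):
#   for f in /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/mapred-site.xml; do
#     property_tags_balanced "$f" && mentions_lzo_codec "$f" || echo "check $f"
#   done
```

On an MRv1 setup, the older property names mapred.compress.map.output and mapred.map.output.compression.codec correspond to the two mapred-site.xml settings above.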
That should be it! Test it by running a job that uses the LZO codec. If the output shows the native LZO library being loaded and you get no related errors, then you are set. Copy the hadoop-lzo-0.4.15.jar, the native libraries, and the configs to all cluster nodes and you are done!
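One way to run such a test, as a sketch: run the LzoIndexer class from the jar against an LZO file in HDFS, capture the output, and look for the native library message. This assumes that hadoop-lzo logs a line containing "Loaded native gpl library" when the native code loads. The sample file path, the log path and the function name are hypothetical.

```shell
# Succeed if the captured output reports that the native library loaded.
# Assumption: hadoop-lzo logs a line containing "Loaded native gpl library".
native_lzo_loaded() {
  grep -q -F 'Loaded native gpl library' "$1"
}

# Usage sketch (the .lzo file and the log path are hypothetical):
#   hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.15.jar \
#     com.hadoop.compression.lzo.LzoIndexer /tmp/sample.lzo 2>&1 | tee /tmp/lzo-test.log
#   native_lzo_loaded /tmp/lzo-test.log && echo "native LZO loaded"
```

If the message is missing, check that the native files were copied to /usr/lib/hadoop/lib/native/ on the node where the command ran.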
For details on how to use LZO compression in Hadoop, see:
https://knpcode.com/hadoop/hadoop-io/how-to-use-lzo-compression-in-hadoop/