Apache Carbondata QAT Codec

QAT Codec for Apache Carbondata

License:
Categories: Data
GroupId: com.intel.qat
ArtifactId: carbondata_qat_wrapper
Last Version: 2.0.0
Release Date:
Type: jar
Description: Apache Carbondata QAT Codec - QAT Codec for Apache Carbondata
Project URL: https://github.com/intel-hadoop/IntelQATCodec
Source Code Management: https://github.com/intel-hadoop/IntelQATCodec

Download carbondata_qat_wrapper

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.intel.qat/carbondata_qat_wrapper/ -->
<dependency>
    <groupId>com.intel.qat</groupId>
    <artifactId>carbondata_qat_wrapper</artifactId>
    <version>2.0.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.intel.qat/carbondata_qat_wrapper/
implementation 'com.intel.qat:carbondata_qat_wrapper:2.0.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.intel.qat/carbondata_qat_wrapper/
implementation("com.intel.qat:carbondata_qat_wrapper:2.0.0")

Buildr:

'com.intel.qat:carbondata_qat_wrapper:jar:2.0.0'

Ivy:

<dependency org="com.intel.qat" name="carbondata_qat_wrapper" rev="2.0.0">
  <artifact name="carbondata_qat_wrapper" type="jar" />
</dependency>

Grape:

@Grapes(
  @Grab(group='com.intel.qat', module='carbondata_qat_wrapper', version='2.0.0')
)

SBT:

libraryDependencies += "com.intel.qat" % "carbondata_qat_wrapper" % "2.0.0"

Leiningen:

[com.intel.qat/carbondata_qat_wrapper "2.0.0"]

Dependencies

test (1)

Group / Artifact    Type    Version
junit : junit       jar     RELEASE

Project Modules

There are no modules declared in this project.

QAT Codec

The QAT Codec project provides a compression and decompression library for Apache Hadoop that makes use of Intel® QuickAssist Technology for compression/decompression.

Big data analytics are commonly performed on large data sets that are moved within a Hadoop cluster containing high-volume, industry-standard servers. A significant amount of time and network bandwidth can be saved when the data is compressed before it is passed between servers, as long as the compression/decompression operations are efficient and require negligible CPU cycles. This is possible with the hardware-based compression delivered by Intel® QuickAssist Technology, which is easy to integrate into existing systems and networks using the available Intel drivers and patches.
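
As a rough illustration of how such a codec plugs into Hadoop's compression layer, the sketch below round-trips a small buffer through Hadoop's generic CompressionCodec interface. The codec class name used here is only an assumption for illustration; substitute the codec class shipped in the jar built from this project, and make sure the native library built below is visible to the JVM.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class QatCodecRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical codec class name, used only for illustration;
        // replace it with the QAT codec class from the built qatcodec.jar.
        Class<?> codecClass = Class.forName("com.intel.qat.codec.io.QatCodec");
        CompressionCodec codec =
                (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

        byte[] input = "hello QuickAssist".getBytes(StandardCharsets.UTF_8);

        // Compress: the codec delegates to the native library (libqatcodec.so / QATzip).
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        CompressionOutputStream out = codec.createOutputStream(compressed);
        out.write(input);
        out.close();

        // Decompress and verify the round trip.
        CompressionInputStream in =
                codec.createInputStream(new ByteArrayInputStream(compressed.toByteArray()));
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        IOUtils.copyBytes(in, restored, 4096);
        System.out.println(new String(restored.toByteArray(), StandardCharsets.UTF_8));
    }
}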

Online Documentation

http://www.intel.com/content/www/us/en/embedded/technology/quickassist/overview.html

Building QAT Codec

1. Building with Maven

This option assumes that Maven is installed on your build machine, and that Java is installed with JAVA_HOME set.

Run the following command to build qatcodec.jar and libqatcodec.so:

mvn clean install -Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE_PATH

Here:

 qatzip.libs - The path where the QATzip libraries are placed. This is needed because QATCodec depends on the QATzip libraries.
 qatzip.src  - The path where the QATzip source code is placed. This is needed because QATCodec needs the QATzip header files for building.

Native code building is skipped on Windows machines, as the QATCodec native code cannot be built on Windows.

When you run the above command on a Linux OS, the native code is built automatically.

If you want the native build to be skipped explicitly on Linux, pass -DskipNative.

ex: mvn clean install -Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE_PATH -DskipNative

By default, the above commands also run the test cases. To skip running the test cases, use the following command:

mvn clean install -DskipTests -Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE_PATH

To run specific test cases:

mvn clean test -Dtest=TestQatCompressorDecompressor -Dqatzip.libs=QATZIP_LIBRARIES_PATH -Dqatzip.src=QATZIP_SOURCE_CODE_PATH

2. Building with Makefile

1. Building qatcodec.jar

Set the following environment variables:

JAVA_HOME - Java home

HADOOPJARS - path to the Cloudera Hadoop jars

After exporting the above variables, execute the following commands:

cd QATCodec/build/

make

2. Building libqatcodec.so

Set the following environment variables:

JAVA_HOME - Java home

QATZIPSRC - QATzip source code path

LD_LIBRARY_PATH - make sure LD_LIBRARY_PATH includes the QATzip libraries

After exporting the above variables, execute the following commands:

cd QATCodec/build/native/

make
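
As a quick sanity check that the freshly built libqatcodec.so and its QATzip dependencies resolve correctly, a minimal JVM load test such as the following can be used (the class name is just an example; run it with -Djava.library.path, or LD_LIBRARY_PATH, pointing at the build output):

public class NativeLoadCheck {
    public static void main(String[] args) {
        // System.loadLibrary("qatcodec") looks for libqatcodec.so on java.library.path;
        // an UnsatisfiedLinkError here usually means the library or its QATzip
        // dependencies are not visible to the JVM.
        System.loadLibrary("qatcodec");
        System.out.println("libqatcodec.so loaded successfully");
    }
}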

Build for CDP DC 7.0

Build the Hive module for QAT

1. Run the scripts

$ cd columnar_format_qat_wrapper
$ ./apply_hive_jars.sh 7.0.0 $PATH/TO/IntelQATCodec

After this, the folder columnar_format_qat_wrapper/target contains four parts: (1) parquet-format (2) parquet-mr (3) orc (4) hive.

2. Building the Parquet-format

  1. Go to the folder target/parquet-format
  2. Please refer to the documentation at building for detailed prerequisites and guidance on building parquet-format.

3. Building the Parquet-mr

  1. Install Protobuf 3.5.1
  2. Install Thrift 0.9.3
  3. Go to the folder target/parquet-mr
  4. Please refer to the documentation at building for detailed prerequisites and guidance on building parquet-mr.

4. Building the ORC

  1. Install Java 1.7 or higher
  2. Install Maven 3 or higher
  3. Install CMake
  4. Go to the folder target/orc
  5. Please refer to the documentation at building for detailed prerequisites and guidance on building ORC.

5. Building the Hive

  1. Go to the folder target/hive
  2. Please refer to the documentation at getting-started for detailed prerequisites and guidance on building Hive.

6. Copy the jars to the CDP

Please copy the following jars obtained in the previous steps to the appropriate location in CDP.

parquet-format-2.4.0.jar
parquet-common-1.10.0.jar
parquet-hadoop-1.10.0.jar
orc-core-1.5.1.jar
orc-shims-1.5.1.jar
hive-exec-3.1.0.jar

How to use QATCodec for the Spark SQL Parquet Data Source

1. Copy the jars to the Spark

Please copy the following jars obtained in the previous steps to the appropriate location in Spark.

parquet-format-2.4.0.jar
parquet-common-1.10.1.jar
parquet-hadoop-1.10.1.jar

2. Configuration to enable QATCodec

Add the configurations below to $SPARK_HOME/conf/spark-defaults.conf, or pass them via spark-shell --conf:

spark.sql.parquet.compression.codec gzip
spark.hadoop.io.compression.codec.qat.enable true
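
For reference, the same settings can also be applied programmatically when building a SparkSession. The sketch below (application name and output path are arbitrary) writes a gzip-compressed Parquet file; with the patched Parquet jars from the previous step on the classpath, the gzip compression is offloaded to QAT transparently.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class QatParquetWriteExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("qat-parquet-example")
                // Same settings as in spark-defaults.conf above.
                .config("spark.sql.parquet.compression.codec", "gzip")
                .config("spark.hadoop.io.compression.codec.qat.enable", "true")
                .getOrCreate();

        // Any DataFrame will do; the Parquet output is written with the gzip codec,
        // which the QAT-enabled Parquet jars accelerate in hardware.
        Dataset<Row> df = spark.range(0, 1000).toDF("id");
        df.write().mode("overwrite").parquet("/tmp/qat_parquet_example");

        spark.stop();
    }
}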

For any security concerns, please visit https://01.org/security.

com.intel.qat

Intel Hadoop

Versions

2.0.0
1.0.0