Hive Execution Engines

Overview

As of CDAP 3.4.0, CDAP Explore supports two additional execution engines: Apache Spark and Apache Tez.

Hive on Spark (Experimental)

With this feature, CDAP Explore can use Apache Spark as its execution engine. To enable it, configure Hive to use Spark in the hive-site.xml file used by CDAP by setting these properties:

Parameter               Value
hive.execution.engine   spark
spark.eventLog.enabled  true
spark.eventLog.dir      Path to the Spark event log directory (must exist)
spark.executor.memory   512m
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.driver.memory     1g
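Hive reads hive-site.xml as Hadoop-style configuration XML: a <configuration> root with one <property> element (holding a <name> and a <value>) per setting. The minimal Python sketch below renders the properties from the table in that format so they can be merged into the hive-site.xml used by CDAP. It is an illustration, not a CDAP tool: the names HIVE_ON_SPARK_PROPS and render_hadoop_conf and the event log path hdfs:///tmp/spark-events are hypothetical, and the sketch does not read, merge with, or preserve any settings already in your hive-site.xml.

```python
import xml.etree.ElementTree as ET

# Properties from the table above. The event log directory is a
# hypothetical placeholder; it must point at a directory that already exists.
HIVE_ON_SPARK_PROPS = {
    "hive.execution.engine": "spark",
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "hdfs:///tmp/spark-events",  # hypothetical path
    "spark.executor.memory": "512m",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.memory": "1g",
}


def render_hadoop_conf(props):
    """Render a name -> value dict as Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hadoop_conf(HIVE_ON_SPARK_PROPS))
```

The printed <property> elements can be copied inside the existing <configuration> element of hive-site.xml; generating them from a dict simply avoids typos in the long property names.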

This feature is currently experimental in CDAP due to these limitations:

  • It requires Spark to be installed on all cluster nodes.
  • CDAP Explore launches a new Spark job for every query, which can add significant overhead.
  • Users cannot dynamically adjust the memory of the containers created for a query.

Hive on Tez

With this feature, CDAP Explore can use Apache Tez as its execution engine. To enable it, configure Hive to use Tez in the hive-site.xml file used by CDAP by setting hive.execution.engine to tez.

In addition, you should set these environment variables in cdap-env.sh:

Environment Variable   Value
TEZ_HOME               Path to the home directory of the Apache Tez installation
TEZ_CONF_DIR           Path to the directory containing tez-site.xml
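Before restarting CDAP, a quick preflight check can catch common misconfigurations of Hive on Tez. The Python sketch below is one possible check, assuming it runs in a shell where cdap-env.sh has been sourced (so TEZ_HOME and TEZ_CONF_DIR are in the environment) and that it is given the path to the hive-site.xml used by CDAP. The function names check_hive_on_tez and read_hadoop_conf are hypothetical. The sketch only checks that both variables point to directories, that tez-site.xml is present in TEZ_CONF_DIR, and that hive.execution.engine is tez; it does not verify that Tez itself runs.

```python
import os
import xml.etree.ElementTree as ET


def read_hadoop_conf(path):
    """Parse a Hadoop-style XML file (such as hive-site.xml) into a name -> value dict."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def check_hive_on_tez(hive_site_path, env=None):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    # Both variables must be set and must name existing directories.
    for var in ("TEZ_HOME", "TEZ_CONF_DIR"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{var} does not point to a directory: {value}")

    # tez-site.xml must live in TEZ_CONF_DIR (checked only if that directory exists).
    conf_dir = env.get("TEZ_CONF_DIR")
    if conf_dir and os.path.isdir(conf_dir):
        if not os.path.isfile(os.path.join(conf_dir, "tez-site.xml")):
            problems.append("tez-site.xml not found in TEZ_CONF_DIR")

    # Hive must be told to use Tez.
    engine = read_hadoop_conf(hive_site_path).get("hive.execution.engine")
    if engine != "tez":
        problems.append(f"hive.execution.engine is {engine!r}, expected 'tez'")

    return problems
```

Calling check_hive_on_tez with the path to your hive-site.xml returns an empty list when the checks pass. Passing env explicitly, rather than always reading os.environ, keeps the function easy to test against a sample environment.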