System Requirements

This section lists the hardware, memory, core, and network requirements, along with the software prerequisites, that must be met before you install the CDAP components.

Hardware Requirements

Systems hosting the CDAP components must meet these hardware specifications, in addition to having CPUs with a minimum speed of 2 GHz:

CDAP Component              Package        Hardware Component  Specifications
--------------------------  -------------  ------------------  ------------------------------
CDAP Master                 cdap-master    RAM                 2 GB minimum, 4 GB recommended
CDAP Router                 cdap-gateway   RAM                 2 GB minimum, 4 GB recommended
CDAP UI                     cdap-ui        RAM                 1 GB minimum, 2 GB recommended
CDAP Kafka                  cdap-kafka     RAM                 1 GB minimum, 2 GB recommended
                                           Disk Space          CDAP Kafka maintains a data cache in a
                                                               configurable data directory. Required space
                                                               depends on the number of CDAP applications
                                                               deployed and running in CDAP and the quantity
                                                               of logs and metrics that they generate.
CDAP Authentication Server  cdap-security  RAM                 1 GB minimum, 2 GB recommended
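
As a rough way to check a host against the RAM figures above, the following sketch compares the total memory reported by Linux with a minimum given in GB. The `meets_ram_minimum` helper is hypothetical and not part of CDAP, and reading `/proc/meminfo` assumes a Linux host.

```shell
# Hypothetical helper: succeed if total_kb is at least min_gb gigabytes.
meets_ram_minimum() {
  local total_kb=$1
  local min_gb=$2
  [ "$total_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# On Linux, the MemTotal line of /proc/meminfo reports usable RAM in kB.
if [ -r /proc/meminfo ]; then
  host_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  if meets_ram_minimum "$host_kb" 2; then
    echo "RAM check passed: ${host_kb} kB"
  else
    echo "RAM check failed: ${host_kb} kB is below 2 GB"
  fi
fi
```

MemTotal is usually slightly lower than the installed RAM because the kernel reserves some memory, so a host with exactly 2 GB installed may report a near miss.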

Memory and Core Requirements

Memory and core requirements are governed by two sources: CDAP and YARN.

The default settings for CDAP are in the cdap-defaults.xml file, and any of them can be overridden in the cdap-site.xml file. The defaults vary by service, ranging from 512 to 1024 MB of memory and from one to two cores.

The YARN settings take precedence over the CDAP settings. For instance, the minimum YARN container size is determined by yarn.scheduler.minimum-allocation-mb. The default in Hadoop is 1024 MB, so each container is allocated at least 1024 MB, even if the CDAP setting asks for 512 MB.
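
To make the effect of the YARN minimum concrete, here is a minimal sketch in shell. The `yarn_container_mb` helper is hypothetical, and rounding every request up to a multiple of the minimum is an assumption about the scheduler in use; some schedulers use a separate increment setting instead.

```shell
# Hypothetical helper: estimate the memory YARN grants for a request,
# assuming requests are rounded up to a multiple of the minimum allocation.
yarn_container_mb() {
  local requested_mb=$1
  local min_alloc_mb=${2:-1024}   # Hadoop default for yarn.scheduler.minimum-allocation-mb
  echo $(( (requested_mb + min_alloc_mb - 1) / min_alloc_mb * min_alloc_mb ))
}

yarn_container_mb 512        # prints 1024
yarn_container_mb 1024       # prints 1024
yarn_container_mb 512 256    # prints 512
```

Under this assumption, lowering the YARN minimum is what lets a 512 MB CDAP setting actually result in a 512 MB container.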

With these default YARN and CDAP memory settings, just starting CDAP can require 14 to 16 CPU cores (and a total of 14 to 16 GB of memory) to be available to YARN.

Network Requirements

CDAP components communicate over your network with HBase, HDFS, and YARN. For the best performance, CDAP components should be located on the same LAN, ideally running at 1 Gbps or faster. A good rule of thumb is to treat CDAP components as you would Hadoop datanodes.

Software Prerequisites

Before installing, make sure the following are in place:

  • A Java runtime on each CDAP node and Hadoop datanode.
  • A Hadoop, HBase, Hive (and optionally Spark) environment to run against.
  • To use the ad-hoc querying capabilities of CDAP, ensure the cluster has a compatible version of Hive installed. See the section on Hadoop Compatibility.
  • If Hive will not be installed, you will need to disable the CDAP Explore Service, which is enabled by default. The installation instructions describe how to configure this.
  • CDAP nodes require Hadoop and HBase client installation and configuration. Note: no Hadoop services actually need to be running on these nodes.
  • We recommend installing an NTP (Network Time Protocol) daemon on all nodes of the cluster, including those with CDAP components.
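
A quick way to confirm that the client commands are available on a CDAP node is to look for them on the PATH. This is only a sketch: the `missing_tools` helper is hypothetical, and it assumes the clients are exposed under their usual command names (`java`, `hadoop`, `hbase`). Add `hive` to the list if you plan to use the Explore Service.

```shell
# Hypothetical helper: print each named command that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools java hadoop hbase)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All required client commands are present."
fi
```

Finding a command on the PATH only shows that the client is installed; it does not verify that the client is configured for your cluster.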

Java Runtime

The latest JDK or JRE version 1.7.xx or 1.8.xx for Linux, Windows, or Mac OS X must be installed in your environment; we recommend the Oracle JDK.

To check the Java version installed, run the command:

$ java -version

CDAP is tested with the Oracle JDKs; it may work with other JDKs such as OpenJDK, but it has not been tested with them.
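
If you want to script the version check, the sketch below extracts the quoted version string from the `java -version` output and tests whether it falls in the 1.7 or 1.8 series. The `is_supported_java` helper is hypothetical, and the parsing assumes the usual output format in which the version appears in double quotes, for example `java version "1.8.0_181"`.

```shell
# Hypothetical helper: succeed if a version string is in the 1.7 or 1.8 series.
is_supported_java() {
  case "$1" in
    1.7.*|1.8.*) return 0 ;;
    *)           return 1 ;;
  esac
}

# java -version writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if is_supported_java "$version"; then
    echo "Java $version is in a supported series"
  else
    echo "Java $version is outside the 1.7/1.8 series"
  fi
fi
```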

Once you have installed the JDK, you’ll need to set the JAVA_HOME environment variable.
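
One way to derive a value for JAVA_HOME is from the location of the `java` binary itself, as sketched below. The `java_home_from_binary` helper is hypothetical, and `readlink -f` assumes GNU coreutils (as found on most Linux distributions); your installation path may differ, so check the result before relying on it.

```shell
# Hypothetical helper: strip the trailing /bin/java from a resolved java path.
java_home_from_binary() {
  echo "${1%/bin/java}"
}

if command -v java >/dev/null 2>&1; then
  # readlink -f follows symlinks such as /usr/bin/java to the real binary.
  JAVA_HOME=$(java_home_from_binary "$(readlink -f "$(command -v java)")")
  export JAVA_HOME
  echo "JAVA_HOME=$JAVA_HOME"
fi
```

To make the setting persistent, the export line would typically go in a shell profile for the user that runs CDAP.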

NTP (Network Time Protocol)

Installing NTP on RPM-based Systems using Yum

  1. Install the NTP service and dependencies:

    $ sudo yum install ntp ntpdate ntp-doc
    
  2. Set the service to start at reboot:

    $ sudo chkconfig ntpd on
    
  3. Start the NTP server. This will continuously adjust the system time from an upstream NTP server:

    $ sudo /etc/init.d/ntpd start
    
  4. Synchronize the system clock with the pool.ntp.org server. Run this command only once:

    $ sudo ntpdate -u pool.ntp.org
    
  5. Unless you are on a virtual server, synchronize the hardware clock to prevent synchronization problems:

    $ sudo hwclock --systohc
    

Installing NTP on Debian using APT

  1. Install the NTP service and dependencies:

    $ sudo apt-get install ntp
    
  2. Start the NTP server. This will continuously adjust the system time from an upstream NTP server:

    $ sudo service ntp start
    
  3. Synchronize the system clock with the pool.ntp.org server. Run this command only once:

    $ sudo ntpdate -u pool.ntp.org
    
  4. Unless you are on a virtual server, synchronize the hardware clock to prevent synchronization problems:

    $ sudo hwclock --systohc
    

NTP Troubleshooting and Configuration

  • To check the synchronization:

    $ ntpq -p
    
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    +173.44.32.10    18.26.4.105      2 u    5   64    1   78.786   -0.157   1.966
    *66.241.101.63   132.163.4.103    2 u    7   64    1   43.085    2.872   0.409
    +services.quadra 198.60.22.240    2 u    6   64    1   21.805    3.040   1.033
    -hydrogen.consta 200.98.196.212   2 u    7   64    1  114.250   16.011   0.873
    
  • If you need to adjust the configuration (add or delete servers, use servers closer to you, etc.):

    $ sudo vi /etc/ntp.conf
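
The synchronization check above can also be scripted. In `ntpq -p` output, the peer currently selected as the synchronization source is marked with a leading `*`. The sketch below looks for such a line; the `ntp_has_sync_peer` helper is hypothetical, and a freshly started daemon may need a few minutes before any peer is selected.

```shell
# Hypothetical helper: read ntpq -p output on stdin and succeed if any peer
# line starts with '*', the marker for the selected synchronization source.
ntp_has_sync_peer() {
  grep -q '^[[:space:]]*\*'
}

if command -v ntpq >/dev/null 2>&1; then
  if ntpq -p 2>/dev/null | ntp_has_sync_peer; then
    echo "NTP has selected a synchronization peer"
  else
    echo "No peer selected yet; check /etc/ntp.conf or wait and retry"
  fi
fi
```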