Installation using Apache Ambari


Notes

  • Apache Ambari can only be used to add CDAP to an existing Hadoop cluster, one that already has the required services (Hadoop: HDFS, YARN, HBase, ZooKeeper, and—optionally—Hive and Spark) installed.
  • Ambari is for setting up HDP (Hortonworks Data Platform) on bare clusters; it can’t be used for clusters with HDP already installed, where the original installation was not with Ambari.
  • These features are currently not included in the CDAP Apache Ambari Service (though they may in the future):
  • A number of features are currently planned to be added, including:

Preparing the Cluster

Hadoop Configuration

  1. ZooKeeper’s maxClientCnxns must be raised from its default. We suggest setting it to zero (0: unlimited connections). As each YARN container launched by CDAP makes a connection to ZooKeeper, the number of connections required is a function of usage.

  2. Ensure that YARN has sufficient memory capacity by lowering the default minimum container size (controlled by the property yarn.scheduler.minimum-allocation-mb). Lack of YARN memory capacity is the leading cause of apparent failures that we see reported. We recommend starting with these settings:

    • yarn.nodemanager.delete.debug-delay-sec: 43200
    • yarn.scheduler.minimum-allocation-mb: 512 (the value is in MB)

    Please ensure that your yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb settings are high enough to run CDAP, as described in the CDAP Memory and Core Requirements.

You can make these changes during the configuration of your cluster using Ambari.
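
Once the settings have been applied, you may want to confirm on a ZooKeeper host that the new maxClientCnxns value is in the configuration file. The following is a minimal sketch, not part of CDAP or Ambari: the helper name cfg_value is hypothetical, and the zoo.cfg path shown in the comment is an assumption that may differ on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a properties-style
# file such as zoo.cfg. Prints nothing if the key is not set. If the key
# appears more than once, the last occurrence is printed.
# Note: the key is used as a regular expression, so keep it to plain names.
cfg_value() {
  key="$1"
  file="$2"
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Example (the path is an assumption; adjust it for your cluster):
#   cfg_value maxClientCnxns /etc/zookeeper/conf/zoo.cfg
```

A printed value of 0 corresponds to the unlimited-connections setting suggested above.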

HDFS Permissions

Ensure YARN is configured properly to run MapReduce programs. Often, this includes ensuring that the HDFS /user/yarn directory exists with proper permissions:

# su hdfs
$ hdfs dfs -mkdir -p /user/yarn && hdfs dfs -chown yarn:yarn /user/yarn
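
To check the result, you can list the directory itself and read the owner and group columns. The sketch below assumes the usual column layout of hdfs dfs -ls output (permissions, replication, owner, group, ...); the helper name hdfs_owner_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read one line of `hdfs dfs -ls -d` output on stdin
# and print "owner:group" (the third and fourth columns).
hdfs_owner_group() {
  awk '{ print $3 ":" $4 }'
}

# Example, run on a node with the HDFS client installed:
#   sudo -u hdfs hdfs dfs -ls -d /user/yarn | hdfs_owner_group
```

If the commands above succeeded, the helper should print yarn:yarn.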

Downloading and Distributing Packages

Downloading CDAP Ambari Service

To install CDAP on a cluster managed by Ambari, packages are available for RHEL-compatible and Ubuntu systems, which you can install onto your Ambari management server. These packages add CDAP to the list of available services which Ambari can install.

To install the cdap-ambari-service package, first add the appropriate CDAP repository to your system’s package manager by following the steps below. These steps will install a Cask repository on your Ambari server.

The repository version (shown in the commands below as "cdap/3.4") must match the CDAP series which you’d like installed on your cluster. To install the latest version of the CDAP 3.0 series, you would install the CDAP 3.0 repository. The default (in the commands below) is to use cdap/3.4, which has the widest compatibility with the Ambari-supported Hadoop distributions.

In the commands that follow on this page, replace all references to "cdap/3.4" with the CDAP Repository from the list below that you would like to use:

Supported Hortonworks Data Platform (HDP) Distributions
CDAP Series    CDAP Repository    Hadoop Distributions
CDAP 3.4.x     cdap/3.4           HDP 2.0 through HDP 2.4
CDAP 3.3.x     cdap/3.3           HDP 2.0 through HDP 2.3
CDAP 3.2.x     cdap/3.2           HDP 2.0 through HDP 2.3
CDAP 3.1.x     cdap/3.1           HDP 2.0 through HDP 2.2
CDAP 3.0.x     cdap/3.0           HDP 2.0 and HDP 2.1

Note: The CDAP Ambari service has been tested on Ambari Server 2.0 and 2.1, as supplied from Hortonworks.
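
If you script the repository setup, the series substitution can be done in one place. This is a minimal sketch that only follows the URL pattern of the commands on this page; the function name cask_repo_base is hypothetical, and it does not check that a given series actually exists on the repository server.

```shell
#!/bin/sh
# Hypothetical helper: build the Cask repository base URL for a CDAP series.
#   $1: platform, either "yum" or "apt"
#   $2: CDAP series, such as 3.4
cask_repo_base() {
  case "$1" in
    yum) printf 'http://repository.cask.co/centos/6/x86_64/cdap/%s\n' "$2" ;;
    apt) printf 'http://repository.cask.co/ubuntu/precise/amd64/cdap/%s\n' "$2" ;;
    *)   echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Example: fetch the Yum repo definition for the CDAP 3.3 series.
#   sudo curl -o /etc/yum.repos.d/cask.repo "$(cask_repo_base yum 3.3)/cask.repo"
```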

On RPM using Yum

Download the Cask Yum repo definition file:

$ sudo curl -o /etc/yum.repos.d/cask.repo http://repository.cask.co/centos/6/x86_64/cdap/3.4/cask.repo

This will create the file /etc/yum.repos.d/cask.repo with:

[cask]
name=Cask Packages
baseurl=http://repository.cask.co/centos/6/x86_64/cdap/3.4
enabled=1
gpgcheck=1

Add the Cask Public GPG Key to your repository:

$ sudo rpm --import http://repository.cask.co/centos/6/x86_64/cdap/3.4/pubkey.gpg

Update your Yum cache:

$ sudo yum makecache

On Debian using APT

Download the Cask APT repo definition file:

$ sudo curl -o /etc/apt/sources.list.d/cask.list http://repository.cask.co/ubuntu/precise/amd64/cdap/3.4/cask.list

This will create the file /etc/apt/sources.list.d/cask.list with:

deb [ arch=amd64 ] http://repository.cask.co/ubuntu/precise/amd64/cdap/3.4 precise cdap

Add the Cask Public GPG Key to your repository:

$ curl -s http://repository.cask.co/ubuntu/precise/amd64/cdap/3.4/pubkey.gpg | sudo apt-key add -

Update your APT cache:

$ sudo apt-get update

Installing CDAP Ambari Service

Now, install the cdap-ambari-service package from the repo you specified above:

Installing the CDAP Service via Yum

$ sudo yum install -y cdap-ambari-service
$ sudo ambari-server restart

Installing the CDAP Service via APT

$ sudo apt-get install -y cdap-ambari-service
$ sudo ambari-server restart
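
Before opening the Ambari UI, you may want to confirm that the package is present on the Ambari server. The following is a sketch using the standard rpm -q and dpkg -s queries; the function name pkg_installed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named package is known to the local
# package database, using whichever of rpm or dpkg is available.
# Caveat: dpkg -s also succeeds for a package that was removed but still
# has configuration files recorded.
pkg_installed() {
  if command -v rpm >/dev/null 2>&1 && rpm -q "$1" >/dev/null 2>&1; then
    return 0
  fi
  if command -v dpkg >/dev/null 2>&1 && dpkg -s "$1" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}

# Example:
#   pkg_installed cdap-ambari-service && echo "cdap-ambari-service is installed"
```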

Installing CDAP Services

You can now install CDAP using the Ambari Service Wizard.

Start the Ambari Service Wizard

  1. In the Ambari UI (the Ambari Dashboard), start the Add Service Wizard.

    ../_images/ss01-add-service.png

    Ambari Dashboard: Starting the Add Service Wizard

  2. Select CDAP from the list and click Next. If there are core dependencies which are not currently installed on the cluster, Ambari will prompt you to install them.

    ../_images/ss02-select-cdap.png

    Ambari Dashboard: Selecting CDAP

Assign CDAP Services to Hosts

  1. Next, assign CDAP services to hosts.

    CDAP consists of 4 daemons:

    1. Master: Coordinator service which launches CDAP system services into YARN
    2. Router: Serves HTTP endpoints for CDAP applications and REST API
    3. Kafka Server: For transporting CDAP metrics and CDAP system service log data
    4. UI: Web interface to CDAP and Cask Hydrator (for CDAP 3.2.x and later installations)

    ../_images/ss03-assign-masters.png

    Ambari Dashboard: Assigning Masters

    We recommend that you install all CDAP services onto an edge node (or the NameNode, for smaller clusters) such as in our example above. After assigning the master hosts, click Next.

  2. Select hosts for the CDAP CLI client. This should be installed on every edge node on the cluster or, for smaller clusters, on the same node as the CDAP services.

    ../_images/ss04-choose-clients.png

    Ambari Dashboard: Selecting hosts for CDAP

  3. Click Next to customize the CDAP installation.

Customize CDAP

  1. On the Customize Services screen, click the Advanced tab to bring up the CDAP configuration. Under Advanced cdap-env, you can configure environment settings such as heap sizes and the directories used to store logs and pids for the CDAP services which run on the edge nodes.

    Including Spark: If you are including Spark, the Advanced cdap-env needs to contain the location of the Spark libraries, typically as SPARK_HOME=/usr/hdp/<version>/spark, where “<version>” matches the HDP version of the cluster, including its build iteration, such as “2.3.4.0-3485”.

    ../_images/ss05-config-cdap-env.png

    Ambari Dashboard: Customizing Services 1

  2. Under Advanced cdap-site, you can configure all options for the operation and running of CDAP and CDAP applications.

    ../_images/ss06-config-cdap-site.png

    Ambari Dashboard: Customizing Services 2

  3. To use the CDAP Explore service (to use SQL to query CDAP data), you must have Hive installed on the cluster, have the Hive client libraries installed on the same host as the CDAP services, and have the Advanced cdap-site explore.enabled option set to true (the default). If you do not have Hive installed or available, this option must be set to false.

    Router Bind Port, Router Server Port: These two ports should match; Router Server Port is used by the CDAP UI to connect to the CDAP Router service.

    ../_images/ss07-config-enable-explore.png

    Ambari Dashboard: Enabling CDAP Explore

    Additional environment variables can be set, as required, using Ambari’s “Configs > Advanced > Advanced cdap-env”.

    Additional CDAP configuration properties, not shown in the web interface, can be added using Ambari’s advanced custom properties at the end of the page. Documentation of the available CDAP properties is in the Appendix: cdap-site.xml and cdap-default.xml.

    For a complete explanation of these options, refer to the CDAP documentation of cdap-site.xml. When finished with configuration changes, click Next.
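
For the Spark setting described in step 1, the SPARK_HOME line can be derived from the HDP version string. This is a minimal sketch under the path layout given above; the function name spark_home_line is hypothetical, and the version in the example is the one quoted in step 1, not necessarily the one on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the SPARK_HOME line for Advanced cdap-env,
# given the full HDP version including its build iteration.
spark_home_line() {
  printf 'SPARK_HOME=/usr/hdp/%s/spark\n' "$1"
}

# Example:
#   ls /usr/hdp                      # shows the installed version directories
#   spark_home_line 2.3.4.0-3485     # prints the line to add to cdap-env
```

It is worth checking that the resulting directory exists on the host running the CDAP services before saving the configuration.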

Starting CDAP Services

Deploying CDAP

  1. Review the desired service layout and click Deploy to begin the actual deployment of CDAP.

    ../_images/ss08-review-deploy.png

    Ambari Dashboard: Summary of Services

  2. Ambari will install CDAP and start the services.

    ../_images/ss09-install-start-test.png

    Ambari Dashboard: Install, Start, and Test

  3. After the services are installed and started, click Next to get to the Summary screen.

  4. This screen shows a summary of the changes that were made to the cluster. No services should need to be restarted following this operation.

    ../_images/ss10-post-install-summary.png

    Ambari Dashboard: Summary

  5. Click Complete to complete the CDAP installation.

CDAP Started

  1. You should now see CDAP listed on the main summary screen for your cluster.

    ../_images/ss11-main-screen.png

    Ambari Dashboard: Selecting CDAP

Verification

Service Checks in Apache Ambari

  1. Selecting CDAP from the left sidebar, or choosing it from the Services drop-down menu, will take you to the CDAP service screen.

    ../_images/ss12-cdap-screen.png

    Ambari Dashboard: CDAP Service Screen

CDAP is now running on your cluster, managed by Ambari. You can log in to the CDAP UI at the address of the node running the CDAP UI service at port 9999.
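
A quick reachability check from the command line can be sketched as follows. The function name cdap_ui_url and the host name in the example are hypothetical; the sketch assumes the UI is served over plain HTTP on port 9999, as described above.

```shell
#!/bin/sh
# Hypothetical helper: build the CDAP UI address for a given host.
cdap_ui_url() {
  printf 'http://%s:9999\n' "$1"
}

# Example: print only the HTTP status code returned by the UI.
#   curl -s -o /dev/null -w '%{http_code}\n' "$(cdap_ui_url ui-host.example.com)"
```

A connection failure or error status shortly after installation may only mean that the services are still starting.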

CDAP Smoke Test

The CDAP UI may initially show errors while all of the CDAP YARN containers are starting up. Allow up to a few minutes for this. The Services link in the upper right of the CDAP UI shows the status of the CDAP services.

../_images/console_01_overview.png

CDAP UI: Showing the started-up state, before data or applications are deployed.

Further instructions for verifying your installation are contained in Verification.

Advanced Topics

Enabling Perimeter Security

CDAP Security is not currently supported when using Apache Ambari. The CDAP Apache Ambari Service is not integrated with the CDAP Authentication Server. As a consequence, any settings made to support CDAP Security will be erased by Ambari.

Enabling Kerberos

Ambari-managed Kerberos-enabled clusters are currently not supported in CDAP.

CDAP HA Setup

CDAP component high-availability is not supported.