Installation on Amazon EMR using Bootstrap Actions

Introduction

This section describes installing CDAP on Amazon EMR clusters using the Amazon EMR "Run If" Bootstrap Action to:

  • Install necessary EMR components;
  • Restrict CDAP installation to the EMR master node;
  • Download, install, and automatically configure CDAP for EMR; and
  • Run all services as the 'cdap' user.
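The "Run If" bootstrap action is what restricts the installation to the master node. It evaluates a condition such as instance.isMaster=true on every node and runs the command only where the condition holds. The Python sketch below illustrates that check. It assumes the node's metadata is a JSON file with an isMaster field; EMR nodes keep such a file at /mnt/var/lib/info/instance.json, but treat that path as an assumption. The helpers condition_holds and run_if are hypothetical illustrations, not part of EMR or CDAP.

```python
import json
import subprocess
from typing import Any, Mapping

# Assumed location of per-node metadata on EMR; treat as an assumption.
INSTANCE_INFO = "/mnt/var/lib/info/instance.json"


def condition_holds(condition: str, info: Mapping[str, Any]) -> bool:
    """Evaluate a Run If style condition such as 'instance.isMaster=true'.

    Hypothetical helper: only the 'instance.<key>=<value>' form is handled.
    """
    lhs, sep, expected = condition.partition("=")
    prefix, dot, key = lhs.partition(".")
    if not sep or not dot or prefix != "instance":
        raise ValueError(f"unsupported condition: {condition!r}")
    actual = info.get(key)
    # JSON booleans load as Python bools, so compare lower-cased strings.
    return str(actual).lower() == expected.strip().lower()


def run_if(condition: str, command: str, info_path: str = INSTANCE_INFO) -> bool:
    """Run `command` in a shell only when `condition` holds for this node."""
    with open(info_path) as f:
        info = json.load(f)
    if not condition_holds(condition, info):
        return False  # e.g. a core or task node: skip the install
    subprocess.run(command, shell=True, check=True)
    return True
```

On a core or task node, run_if("instance.isMaster=true", ...) returns False without running anything. Only the master node would execute the install command.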

Information on Amazon EMR is available online.

CDAP 4.2 is compatible with Amazon EMR 4.6.0 through 4.8.2.

Using the Create Cluster Wizard

Note: For any settings not listed or specified below, we recommend using the default settings.

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Choose "Create cluster."

  3. Choose the Advanced Options and, in Step 1: Software and Steps, set:

    • Vendor: Amazon
    • Release: emr-4.6.0 through emr-4.8.2
    • Software: Hadoop, HBase, Hive, Spark
    • No auto-terminate

    EMR Create Cluster Wizard: Step 1: Software and Steps

  4. In Step 2: Hardware, set:

    • Network: use defaults
    • EC2 Subnet: use defaults
    • Master
      • EC2 Instance type: m3.xlarge
      • Instance count: 1
    • Core
      • EC2 Instance type: m3.xlarge
      • Instance count: 4 (as a minimum)
    • Task
      • Instance count: 0 (not required)

    EMR Create Cluster Wizard: Step 2: Hardware

  5. In Step 3: General Cluster Settings, enable:

    • Logging
    • Debugging
    • Termination protection (no auto-terminate)

    EMR Create Cluster Wizard: Step 3: General Cluster Settings

  6. In Step 3: General Cluster Settings, add a Bootstrap Action:

    • Type: Run If

    • Optional arguments:

      instance.isMaster=true "curl https://downloads.cask.co/emr/install-4.2.0.sh | sudo bash -s"

    EMR Create Cluster Wizard: Add Bootstrap Action

  7. In Step 4: Security, keep the default settings, and then add a security group (next step).

    EMR Create Cluster Wizard: Step 4: Security

  8. In Step 4: Security, assign additional EC2 security groups to the master node. To reach CDAP on the master node, use one of the following:

    • Master (one of the following):
      • A security group with ports 11011 (CDAP UI) and 11015 (CDAP Router) open; or
      • An SSH tunnel

    EMR Create Cluster Wizard: Assigning an additional security group to the master node
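For repeatable deployments, the wizard settings from steps 1 through 6 can also be expressed as a single request to the EMR API. The Python sketch below builds keyword arguments for the boto3 EMR client's run_job_flow call. It makes the following assumptions, and has not been tested against this CDAP release:

  • The default EMR roles (EMR_DefaultRole, EMR_EC2_DefaultRole).
  • The predefined Run If script at s3://elasticmapreduce/bootstrap-actions/run-if.
  • A hypothetical log bucket and EC2 key pair that you would replace.

It omits the debugging option and the security groups from step 8.

```python
from typing import Any, Dict

INSTALL_CMD = "curl https://downloads.cask.co/emr/install-4.2.0.sh | sudo bash -s"


def build_cluster_request(
    release: str = "emr-4.8.2",
    core_count: int = 4,
    log_uri: str = "s3://my-log-bucket/emr-logs/",  # hypothetical bucket
    key_name: str = "my-key-pair",  # hypothetical EC2 key pair
) -> Dict[str, Any]:
    """Return run_job_flow() keyword arguments mirroring the wizard settings."""
    if core_count < 4:
        raise ValueError("use at least 4 core nodes")
    return {
        "Name": "cdap-cluster",
        "ReleaseLabel": release,  # emr-4.6.0 through emr-4.8.2
        "LogUri": log_uri,  # Step 3: Logging
        "Applications": [{"Name": n} for n in ("Hadoop", "HBase", "Hive", "Spark")],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m3.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m3.xlarge", "InstanceCount": core_count},
            ],
            "Ec2KeyName": key_name,
            "KeepJobFlowAliveWhenNoSteps": True,  # no auto-terminate
            "TerminationProtected": True,  # termination protection
        },
        "BootstrapActions": [{
            "Name": "Install CDAP on master",
            "ScriptBootstrapAction": {
                # Predefined "Run If" action (assumed path)
                "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
                "Args": ["instance.isMaster=true", INSTALL_CMD],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def create_cluster(**kwargs: Any) -> str:
    """Submit the request; requires boto3 and configured AWS credentials."""
    import boto3  # imported here so building the request needs no AWS setup

    emr = boto3.client("emr")
    response = emr.run_job_flow(**build_cluster_request(**kwargs))
    return response["JobFlowId"]
```

Building the request with build_cluster_request() needs no AWS access, which makes it easy to review. With working credentials, create_cluster() submits the request and returns the new cluster ID.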

Once the cluster is created, the CDAP services start automatically. This takes about 10 minutes after the cluster reaches the Waiting state.
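Because startup takes several minutes, it can help to poll the master node until the CDAP ports accept connections. The minimal Python sketch below uses only the standard library. The 15-minute deadline and 15-second interval are assumptions, and port_open and wait_for_ports are hypothetical helpers. An open port only shows that something is listening, not that CDAP is healthy. If you use an SSH tunnel instead of a security group, poll the local end of the tunnel (for example, localhost).

```python
import socket
import time
from typing import Iterable


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def wait_for_ports(host: str, ports: Iterable[int] = (11011, 11015),
                   deadline_s: float = 900.0, interval_s: float = 15.0) -> bool:
    """Poll until every port accepts connections or the deadline passes.

    The 15-minute default leaves some margin over the roughly 10 minutes
    CDAP needs to start; adjust as needed.
    """
    pending = set(ports)
    stop = time.monotonic() + deadline_s
    while True:
        # Keep only the ports that still refuse connections.
        pending = {p for p in pending if not port_open(host, p)}
        if not pending:
            return True
        if time.monotonic() >= stop:
            return False
        time.sleep(interval_s)
```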

Verification

CDAP Smoke Test

The CDAP UI may show errors at first while the CDAP YARN containers are starting up. Allow up to a few minutes for them to finish.

The Administration page of the CDAP UI shows the status of the CDAP services. To reach it, open http://<cdap-host>:11011/cdap/administration, replacing <cdap-host> with the host name or IP address of the CDAP server (on EMR, the master node):

CDAP UI: Administration page after all services have started.
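To check the same page from a script, you can request the Administration page URL and treat an HTTP 200 response as a sign that the UI is up. The Python sketch below uses only the standard library and builds the URL from a host name you supply. The helper names are hypothetical. Treating a 200 status as sufficient is a simplification: the page is rendered in the browser, so this does not confirm that each service shows as running.

```python
import urllib.error
import urllib.request


def admin_url(cdap_host: str, port: int = 11011) -> str:
    """Build the Administration page URL for a CDAP host."""
    return f"http://{cdap_host}:{port}/cdap/administration"


def ui_responding(cdap_host: str, port: int = 11011, timeout: float = 5.0) -> bool:
    """Return True if the Administration page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(admin_url(cdap_host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (HTTPError is a URLError)
        return False
```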

Further instructions for verifying your installation are contained in Verification.