🔗Docker Image

Docker is one of the easiest ways to start working with CDAP without having to manually configure anything. A Docker image with the CDAP SDK pre-installed is available on the Docker Hub for download.

To use the Docker image, you can either start the container from a command line or use Docker's Kitematic (on Mac OS X and Windows), a graphical user interface for running Docker containers.

🔗Docker from a Command Line

Docker environments are available for a variety of platforms. Download and install Docker for your platform by following the platform-specific installation instructions from Docker.com, and verify that the Docker environment is working and has started correctly.
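
One way to verify that the Docker environment is working is sketched below. It is a minimal check, not an official CDAP or Docker script: the function name docker_ready is hypothetical, and the check relies only on docker info exiting with a non-zero status when the daemon cannot be reached.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the docker client is installed
# and can reach a running Docker daemon.
docker_ready() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker client not found on PATH" >&2
    return 1
  fi
  if ! docker info >/dev/null 2>&1; then
    echo "docker client found, but the daemon is not reachable" >&2
    return 2
  fi
  echo "Docker is working"
}

# Example usage:
#   docker_ready && docker pull caskdata/cdap-standalone:4.0.1
```

On Mac OS X or Windows, a return status of 2 usually means that the Docker VM has not been started or that the shell has not yet been configured for it.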

  1. If you are not running on Linux, you will need to create and start a Docker Virtual Machine (VM) before you can use containers. For example:

    $ docker-machine create --driver virtualbox cdap
    $ docker-machine env cdap
    
    > docker-machine create --driver virtualbox cdap
    > docker-machine env cdap
    

    This will create a new Docker virtual machine named cdap, using VirtualBox, and print out its environment.

  2. When you run docker-machine env cdap, it will print a message on the screen such as:

    export DOCKER_TLS_VERIFY="1"
    export DOCKER_HOST="tcp://192.168.99.100:2376"
    export DOCKER_CERT_PATH="/Users/<username>/.docker/machine/machines/cdap"
    export DOCKER_MACHINE_NAME="cdap"
    # Run this command to configure your shell:
    # eval $(docker-machine env cdap)
    

    It is essential to run these export commands (or the single eval command). Otherwise, subsequent Docker commands will fail because they won't be able to connect to the correct Docker VM.

  3. If you are running Docker on either Mac OS X or Microsoft Windows, Docker is running a virtual Linux machine on top of your host OS. You will need to use the address shown above (such as 192.168.99.100) as the host name when either connecting to the CDAP UI or making an HTTP request.

  4. Once Docker has started, pull down the CDAP Docker Image from the Docker Hub using:

    $ docker pull caskdata/cdap-standalone:4.0.1
    
    > docker pull caskdata/cdap-standalone:4.0.1
    
  5. Start the CDAP Standalone Docker container with:

    $ docker run -d --name cdap-standalone -p 11011:11011 -p 11015:11015 caskdata/cdap-standalone:4.0.1
    
    > docker run -d --name cdap-standalone -p 11011:11011 -p 11015:11015 caskdata/cdap-standalone:4.0.1
    

    This will start the container, name it cdap-standalone, and set up the proxying of ports.

  6. CDAP will start automatically once the container starts. CDAP’s software directory is under /opt/cdap/sdk.

  7. Once CDAP starts, it will instruct you to connect to the CDAP UI with a web browser at http://localhost:11011/.

  8. If you are running Docker on either Mac OS X or Microsoft Windows, replace localhost with the Docker VM's IP address (such as 192.168.99.100) that you obtained earlier. Start a browser and enter the address to access the CDAP UI from outside Docker.

  9. To control the CDAP instance, use this command, substituting one of start, restart, status, or stop for <command>:

    $ docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk <command>
    
    > docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk <command>
    
  10. When you are finished, stop CDAP and then shut down Docker:

    $ docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk stop
    $ docker-machine stop cdap
    
    > docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk stop
    > docker-machine stop cdap
    
  11. For a full list of Docker Commands, see the Docker Command Line Documentation.
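
The host-name substitution and the control command described in the steps above can be sketched as two small shell helpers. Both function names (cdap_ui_url and cdap_sdk) are hypothetical. The first assumes that DOCKER_HOST, when set, has the tcp://<ip>:<port> form printed by docker-machine env. The second runs docker exec without -d, which is a deliberate difference from the steps above: with -d the command runs detached, so output such as that of status is not shown in the terminal.

```shell
#!/bin/sh
# Hypothetical helper: print the CDAP UI URL for the current Docker environment.
# Assumes DOCKER_HOST, if set by docker-machine, looks like tcp://192.168.99.100:2376.
cdap_ui_url() {
  case "${DOCKER_HOST:-}" in
    tcp://*)
      host="${DOCKER_HOST#tcp://}"   # strip the scheme
      host="${host%%:*}"             # strip the Docker daemon port
      ;;
    *)
      # No Docker VM in use: ports are published on localhost
      host="localhost"
      ;;
  esac
  echo "http://${host}:11011/"
}

# Hypothetical wrapper around the control command; accepts only the
# four commands listed in the steps above.
cdap_sdk() {
  case "${1:-}" in
    start|restart|status|stop) ;;
    *)
      echo "usage: cdap_sdk start|restart|status|stop" >&2
      return 2
      ;;
  esac
  # Runs attached (no -d) so that the command's output is visible.
  docker exec cdap-standalone /opt/cdap/sdk/bin/cdap sdk "$1"
}

# Example usage, after running: eval $(docker-machine env cdap)
#   cdap_ui_url
#   cdap_sdk status
```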

🔗Docker using Kitematic

Docker Kitematic is available as part of the Docker Toolbox for either Mac OS X or Microsoft Windows. It is a graphical user interface for running Docker containers. Follow these steps to install Kitematic and then download, start, and connect to a CDAP container.

  1. Download and install the Docker Toolbox for either Mac OS X or Microsoft Windows.

  2. Start Kitematic. On Mac OS X, it will be installed in /Applications/Docker/Kitematic; on Windows, in Start Menu > Docker > Kitematic.

  3. Once Kitematic has started, search for the CDAP image by entering caskdata:cdap-standalone in the search box at the top of the window. Then click on the repository menu, circled in red here:

    ../../_images/kitematic-1-searching.png
  4. Click on the tags button:

    ../../_images/kitematic-2-tags.png
  5. Select the desired version. Note that the tag latest is the last version that was put up at Docker Hub, which is not necessarily the desired version (4.0.1):

    ../../_images/kitematic-3-select-tag.png
  6. Close the menu by pressing the X in the circle. Press "Create" to download and start the CDAP image. When it has started up, you will see in the logs a message that the CDAP UI is listening on port 11011:

    ../../_images/kitematic-4-cdap-started.png
  7. To connect a web browser to the CDAP UI, you'll need to find the external IP addresses and ports that the Docker host is exposing. The easiest way to do that is to click on the Settings tab, and then the Ports tab:

    ../../_images/kitematic-5-links.png
  8. This shows that the CDAP container is listening on the internal port 11011 within the Docker host, while the Docker host proxies that port on the virtual machine IP address and port (192.168.99.100:32769). Enter that address and port into your system web browser to connect to the CDAP UI:

    ../../_images/kitematic-6-cdap-ui.png
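
The same port mapping can also be read from a command line with docker port, which prints the host address and port that a container port is published on. The sketch below extracts the host port from such a mapping. The function name host_port is hypothetical, and the container name cdap-standalone in the usage comment is an assumption; Kitematic may assign a different name.

```shell
#!/bin/sh
# Hypothetical helper: extract the host port from a "docker port" mapping
# such as 0.0.0.0:32769 by keeping what follows the last colon.
host_port() {
  echo "${1##*:}"
}

# Example usage (container name and VM address are assumptions):
#   mapping=$(docker port cdap-standalone 11011)
#   echo "http://192.168.99.100:$(host_port "$mapping")/"
```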

🔗Docker and CDAP Applications

🔗Development Environment Setup

🔗Creating an Application

When writing a CDAP application, it's best to use an integrated development environment (IDE) that understands the application interface and provides code completion when writing interface methods.

The best way to start developing a CDAP application is by using the Maven archetype:

$ mvn archetype:generate \
    -DarchetypeGroupId=co.cask.cdap \
    -DarchetypeArtifactId=cdap-app-archetype \
    -DarchetypeVersion=4.0.1 \
    -DartifactId=myExampleApp \
    -DgroupId=org.example.app

> mvn archetype:generate ^
    -DarchetypeGroupId=co.cask.cdap ^
    -DarchetypeArtifactId=cdap-app-archetype ^
    -DarchetypeVersion=4.0.1 ^
    -DartifactId=myExampleApp ^
    -DgroupId=org.example.app

This creates a Maven project with all required dependencies, Maven plugins, and a simple application template for the development of your application (myExampleApp). You can import this Maven project into your preferred IDE—such as IntelliJ or Eclipse—and start developing your first CDAP application.

For an application that contains a MapReduce program, set the archetypeArtifactId to cdap-mapreduce-archetype; for Spark, use either cdap-spark-java-archetype or cdap-spark-scala-archetype.

Note: Replace the artifactId (myExampleApp) and groupId (org.example.app) parameters with your own application name and organization. Do not use co.cask.cdap as the groupId.

Complete examples for each archetype:

$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-app-archetype -DarchetypeVersion=4.0.1
$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-mapreduce-archetype -DarchetypeVersion=4.0.1
$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-spark-java-archetype -DarchetypeVersion=4.0.1
$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-spark-scala-archetype -DarchetypeVersion=4.0.1

When prompted, enter values for the groupId and artifactId parameters. For the groupId, enter your own organization; do not use co.cask.cdap. (You can either specify the version and package parameters or accept the Maven defaults.)

Maven supplies a guide to the naming convention used above at https://maven.apache.org/guides/mini/guide-naming-conventions.html.
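
The archetype choice and the groupId restriction described above can be sketched as a small wrapper that generates a project without prompting. The function names (archetype_for and new_cdap_project) and the program-type keywords (app, mapreduce, spark-java, spark-scala) are hypothetical. The sketch relies on the archetype plugin's -DinteractiveMode=false option, so the version and package parameters take the Maven defaults.

```shell
#!/bin/sh
# Hypothetical helper: map a program type to its CDAP archetype artifact ID.
archetype_for() {
  case "${1:-}" in
    app)         echo "cdap-app-archetype" ;;
    mapreduce)   echo "cdap-mapreduce-archetype" ;;
    spark-java)  echo "cdap-spark-java-archetype" ;;
    spark-scala) echo "cdap-spark-scala-archetype" ;;
    *)
      echo "unknown program type: ${1:-}" >&2
      return 1
      ;;
  esac
}

# Hypothetical wrapper: generate a project non-interactively.
# Arguments: <type> <groupId> <artifactId>
new_cdap_project() {
  # The groupId must be your own organization, never co.cask.cdap.
  if [ "${2:-}" = "co.cask.cdap" ]; then
    echo "groupId must not be co.cask.cdap" >&2
    return 1
  fi
  archetype=$(archetype_for "${1:-}") || return 1
  mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeGroupId=co.cask.cdap \
    -DarchetypeArtifactId="$archetype" \
    -DarchetypeVersion=4.0.1 \
    -DgroupId="${2:-}" \
    -DartifactId="${3:-}"
}

# Example usage:
#   new_cdap_project mapreduce org.example.app myExampleApp
```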

🔗Using IntelliJ

  1. Open IntelliJ and import the Maven project by:
    • If at the starting IntelliJ dialog, click on Import Project; or
    • If an existing project is open, go to the menu item File -> Open...
  2. Navigate to and select the pom.xml in the Maven project's directory.
  3. In the Import Project from Maven dialog, select the Import Maven projects automatically and Automatically download: Sources, Documentation boxes.
  4. Click Next, complete the remaining dialogs, and the new CDAP project will be created and opened.

🔗Using Eclipse

  1. In your Eclipse installation, make sure you have the m2eclipse plugin installed.
  2. Go to the menu item File -> Import.
  3. Enter maven in the Select an import source dialog to filter for Maven options.
  4. Select Existing Maven Projects as the import source.
  5. Browse for the Maven project's directory.
  6. Click Finish, and the new CDAP project will be imported, created and opened.

🔗Running CDAP from within an IDE

As CDAP is an open source project, you can download the source, import it into an IDE, then modify, build, and run CDAP.

To do so, follow these steps:

  1. Install all the prerequisite system requirements for CDAP development.
  2. Either clone the CDAP repo or download a ZIP of the source:
    • Clone the CDAP repository using $ git clone -b v4.0.1 https://github.com/caskdata/cdap.git
    • Download the source as a ZIP from GitHub and unpack the ZIP in a suitable location
  3. In your IDE, install the Scala plugin (for IntelliJ or Eclipse) as there is Scala code in the project.
  4. Open the CDAP project in the IDE as an existing project by finding and opening the cdap/pom.xml.
  5. Resolve dependencies: this can take quite a while, as there are numerous downloads required.
  6. Before starting CDAP, disable audit logs by changing the audit.enabled setting in cdap-default.xml to false. Otherwise, due to CDAP-5864, Kafka errors will appear in the logs.
  7. In the case of IntelliJ, you can create a run configuration to run CDAP Standalone:
    1. Select Run > Edit Configurations...
    2. Add a new "Application" run configuration.
    3. Set "Main class" to be co.cask.cdap.StandaloneMain.
    4. Set "VM options" to -Xmx1024m -XX:MaxPermSize=128m (for in-memory MapReduce jobs).
    5. Click "OK".
    6. You can now use this run configuration to start an instance of CDAP Standalone.

This will allow you to start CDAP and access it either from the command line interface (CLI) or through the HTTP RESTful API. To start the CLI, you can either start it from a shell using the cdap script or run the CLIMain class from the IDE.

If you want to run and develop the UI, you will need to follow additional instructions in the CDAP UI README.
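
The audit-log step above can also be done from a shell rather than by hand. The following is a minimal sketch, assuming that cdap-default.xml uses the Hadoop-style layout in which a <name> element is followed by its <value> element. The function name disable_audit is hypothetical, and the location of cdap-default.xml within the source tree is not assumed here; pass the path as an argument.

```shell
#!/bin/sh
# Hypothetical helper: set audit.enabled to false in the given cdap-default.xml.
# Assumes each property is written as <name>...</name> followed by <value>...</value>.
disable_audit() {
  awk '
    # Remember that the audit.enabled property has been reached
    /<name>audit\.enabled<\/name>/ { in_prop = 1 }
    # Rewrite only the first <value> that follows it
    in_prop && /<value>/ {
      sub(/<value>[^<]*<\/value>/, "<value>false</value>")
      in_prop = 0
    }
    { print }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example usage (replace the placeholder with the real location):
#   git clone -b v4.0.1 https://github.com/caskdata/cdap.git
#   disable_audit <path-to>/cdap-default.xml
```

Other properties in the file are left unchanged, because only the value that follows the audit.enabled name is rewritten.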

🔗Building and Running CDAP Applications

See Building and Running CDAP Applications for information on accessing the CDAP CLI and CDAP SDK bin utilities, building examples, starting CDAP, and deploying, starting, and stopping applications.