Docker is one of the easiest ways to start working with CDAP without having to manually configure anything. A Docker image with the CDAP SDK pre-installed is available on the Docker Hub for download.
🔗Docker from a Command Line
Docker environments are available for a variety of platforms. Download and install Docker for your platform by following the platform-specific installation instructions from Docker.com, and verify that the Docker environment is working and has started correctly.
If you are not running on Linux, you will need to create and start a Docker Virtual Machine (VM) before you can use containers. For example:
$ docker-machine create --driver virtualbox cdap
$ docker-machine env cdap
> docker-machine create --driver virtualbox cdap
> docker-machine env cdap
This will create a new Docker virtual machine named cdap using VirtualBox and print out its environment.
When you run
docker-machine env cdap, it will print a message on the screen such as:
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.100:2376"
export DOCKER_CERT_PATH="/Users/<username>/.docker/machine/machines/cdap"
export DOCKER_MACHINE_NAME="cdap"
# Run this command to configure your shell:
# eval $(docker-machine env cdap)
It is essential to run these export commands (or the single eval command). Otherwise, subsequent Docker commands will fail because they won't be able to connect to the correct Docker VM.
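Because every later command depends on these variables, a script can check for them before calling Docker. The following is a minimal sketch for the docker-machine setup described above (on Linux without a Docker VM these variables are normally not needed). The function name docker_env_ready is hypothetical and is not part of Docker or CDAP.

```shell
# Hypothetical helper: succeed only if the variables printed by
# "docker-machine env cdap" are present in the current shell.
docker_env_ready() {
  if [ -z "${DOCKER_HOST:-}" ] || [ -z "${DOCKER_CERT_PATH:-}" ] || [ -z "${DOCKER_MACHINE_NAME:-}" ]; then
    # Single quotes keep the hint literal; nothing is executed here.
    echo 'Docker environment not set; run: eval $(docker-machine env cdap)' >&2
    return 1
  fi
}
```

A script could call `docker_env_ready || exit 1` before its first docker command.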
If you are running Docker on either Mac OS X or Microsoft Windows, Docker is running a virtual Linux machine on top of your host OS. You will need to use the address shown above (such as
192.168.99.100) as the host name when either connecting to the CDAP UI or making an HTTP request.
Once Docker has started, pull down the CDAP Docker Image from the Docker Hub using:
$ docker pull caskdata/cdap-standalone:4.0.0
> docker pull caskdata/cdap-standalone:4.0.0
Start the CDAP Standalone Docker container with:
$ docker run -d --name cdap-standalone -p 11011:11011 -p 11015:11015 caskdata/cdap-standalone:4.0.0
> docker run -d --name cdap-standalone -p 11011:11011 -p 11015:11015 caskdata/cdap-standalone:4.0.0
This will start the container, name it cdap-standalone, and set up the proxying of ports.
CDAP will start automatically once the container starts. CDAP’s software directory is under
Once CDAP starts, it will instruct you to connect to the CDAP UI with a web browser at http://localhost:11011/.
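CDAP takes some time to come up after the container starts, so a script may want to wait before opening the UI. The following is a minimal sketch that polls an address with curl until it answers. It assumes curl is installed and that the UI responds to plain HTTP requests at the address you pass in; the function name wait_for_cdap_ui and its default of 30 attempts, 2 seconds apart, are illustrative choices and not part of CDAP.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_cdap_ui <url> [attempts] [delay-seconds]
wait_for_cdap_ui() {
  _url="$1"
  _tries="${2:-30}"
  _delay="${3:-2}"
  while [ "$_tries" -gt 0 ]; do
    # -s: quiet, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$_url"; then
      return 0
    fi
    _tries=$((_tries - 1))
    sleep "$_delay"
  done
  return 1
}
```

For example, `wait_for_cdap_ui http://localhost:11011/ && echo "CDAP UI is up"`, passing whichever address applies to your platform.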
If you are running Docker on either Mac OS X or Microsoft Windows, replace localhost with the Docker VM's IP address (such as 192.168.99.100) that you obtained earlier. Start a browser and enter the address to access the CDAP UI from outside Docker.
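The address substitution can also be derived in the shell. The following is a minimal sketch, assuming DOCKER_HOST (when set) has the tcp://<ip>:<port> form printed by docker-machine env cdap, and that the UI port 11011 is mapped unchanged as in the docker run command above. The function name cdap_ui_url is hypothetical.

```shell
# Hypothetical helper: print the CDAP UI address for the current shell.
# Falls back to localhost when no Docker VM address is configured.
cdap_ui_url() {
  _host="localhost"
  if [ -n "${DOCKER_HOST:-}" ]; then
    _host="${DOCKER_HOST#tcp://}"   # drop the scheme
    _host="${_host%%:*}"            # drop the port
  fi
  printf 'http://%s:11011/\n' "$_host"
}
```

With the example environment shown earlier, `cdap_ui_url` prints http://192.168.99.100:11011/; with DOCKER_HOST unset it prints http://localhost:11011/.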
To control the CDAP instance, use this command, substituting one of
$ docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk <command>
> docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk <command>
When you are finished, stop CDAP and then shut down Docker:
$ docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk stop
$ docker-machine stop cdap
> docker exec -d cdap-standalone /opt/cdap/sdk/bin/cdap sdk stop
> docker-machine stop cdap
For a full list of Docker Commands, see the Docker Command Line Documentation.
🔗Docker using Kitematic
Docker Kitematic is available as part of the Docker Toolbox for either Mac OS X or Microsoft Windows. It is a graphical user interface for running Docker containers. Follow these steps to install Kitematic and then download, start, and connect to a CDAP container.
Download and install the Docker Toolbox for either Mac OS X or Microsoft Windows.
Start Kitematic. On Mac OS X, it will be installed in
/Applications/Docker/Kitematic; on Windows, in
Start Menu > Docker > Kitematic.
Once Kitematic has started, search for the CDAP image by entering caskdata:cdap-standalone in the search box at the top of the window. Then click on the repository menu, circled in red here:
Click on the tags button:
Select the desired version. Note that the tag latest is the last version that was put up at Docker Hub, which is not necessarily the desired version, which is
Close the menu by pressing the X in the circle. Press "Create" to download and start the CDAP image. When it has started up, you will see in the logs a message that the CDAP UI is listening on port 11011:
To connect a web browser to the CDAP UI, you'll need to find the external IP address and port that the Docker host is exposing. The easiest way to do that is to click on the Settings tab, and then the Ports tab:
This shows that the CDAP container is listening on the internal port 11011 within the Docker host, while the Docker host proxies that port on the virtual machine IP address and port (192.168.99.100:32769). Enter that address and port into your system web browser to connect to the CDAP UI:
🔗Docker and CDAP Applications
- In order to begin building CDAP applications, have our recommended software and tools installed in your environment.
🔗Development Environment Setup
🔗Creating an Application
When writing a CDAP application, it's best to use an integrated development environment (IDE) that understands the application interface and provides code-completion in writing interface methods.
The best way to start developing a CDAP application is by using the Maven archetype:
$ mvn archetype:generate \
    -DarchetypeGroupId=co.cask.cdap \
    -DarchetypeArtifactId=cdap-app-archetype \
    -DarchetypeVersion=4.0.0 \
    -DgroupId=org.example.app
> mvn archetype:generate ^
    -DarchetypeGroupId=co.cask.cdap ^
    -DarchetypeArtifactId=cdap-app-archetype ^
    -DarchetypeVersion=4.0.0 ^
    -DgroupId=org.example.app
This creates a Maven project with all required dependencies, Maven plugins, and a simple application template for the development of your application. You can import this Maven project into your preferred IDE—such as IntelliJ or Eclipse—and start developing your first CDAP application.
For an application that contains a MapReduce program, set the archetypeArtifactId to cdap-mapreduce-archetype; for Spark, use either cdap-spark-java-archetype or cdap-spark-scala-archetype.
Note: Replace the groupId parameter (
org.example.app) with your own organization, but it must not be replaced with
Complete examples for each archetype:
$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-app-archetype -DarchetypeVersion=4.0.0 -DgroupId=org.example.app
$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-mapreduce-archetype -DarchetypeVersion=4.0.0 -DgroupId=org.example.app
$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-spark-java-archetype -DarchetypeVersion=4.0.0 -DgroupId=org.example.app
$ mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-spark-scala-archetype -DarchetypeVersion=4.0.0 -DgroupId=org.example.app
Maven supplies a guide to the naming convention used above at https://maven.apache.org/guides/mini/guide-naming-conventions.html.
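Since the four commands differ only in the archetype name, they can be produced from one template. The following is a minimal sketch that prints (but does not run) the command for a given program type; the function name cdap_archetype_cmd and its short type names (app, mapreduce, spark-java, spark-scala) are hypothetical conveniences derived from the artifact IDs listed above.

```shell
# Hypothetical helper: print the mvn command for one of the listed archetypes.
# Usage: cdap_archetype_cmd <app|mapreduce|spark-java|spark-scala> <groupId>
cdap_archetype_cmd() {
  case "$1" in
    app|mapreduce|spark-java|spark-scala) ;;
    *) echo "unknown archetype: $1" >&2; return 1 ;;
  esac
  # Only the artifact ID and the groupId vary between the documented commands.
  printf 'mvn archetype:generate -DarchetypeGroupId=co.cask.cdap -DarchetypeArtifactId=cdap-%s-archetype -DarchetypeVersion=4.0.0 -DgroupId=%s\n' "$1" "$2"
}
```

For example, `cdap_archetype_cmd mapreduce org.example.app` prints the second command in the list above, without the leading prompt.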
- Open IntelliJ and import the Maven project.
- Go to menu File -> Import Project...
- Select the pom.xml in the Maven project's directory.
- Select the Import Maven projects automatically and Automatically download: Sources, Documentation boxes in the Import Project from Maven dialog.
- Click Next, complete the remaining dialogs, and the new CDAP project will be created and opened.
- In your Eclipse installation, make sure you have the m2eclipse plugin installed.
- Go to menu File -> Import
- Enter maven in the Select an import source dialog to filter for Maven options.
- Select Existing Maven Projects as the import source.
- Browse for the Maven project's directory.
- Click Finish, and the new CDAP project will be imported, created and opened.
🔗Running CDAP from within an IDE
As CDAP is an open source project, you can download the source, import it into an IDE, then modify, build, and run CDAP.
To do so, follow these steps:
- Install all the prerequisite system requirements for CDAP development.
- Either clone the CDAP repo or download a ZIP of the source:
- Clone the CDAP repository using
$ git clone -b v4.0.0 https://github.com/caskdata/cdap.git
- Download the source as a ZIP from GitHub and unpack the ZIP in a suitable location
- In your IDE, install the Scala plugin (for IntelliJ or Eclipse) as there is Scala code in the project.
- Open the CDAP project in the IDE as an existing project by finding and opening the
- Resolve dependencies: this can take quite a while, as there are numerous downloads required.
- Before starting CDAP, disable audit logs by changing the
false. Otherwise, due to CDAP-5864, Kafka errors will appear in the logs.
- In the case of IntelliJ, you can create a run configuration to run CDAP Standalone:
Run > Edit Configurations...
- Add a new "Application" run configuration.
- Set "Main class" to be
- Set "VM options" to -Xmx1024m -XX:MaxPermSize=128m (for in-memory MapReduce jobs).
- Click "OK".
- You can now use this run configuration to start an instance of CDAP Standalone.
This will allow you to start CDAP and access it from either the command line (CLI)
or through the HTTP RESTful API. To start the CLI, you can either start
it from a shell using the
cdap script or run the
CLIMain class from the IDE.
If you want to run and develop the UI, you will need to follow additional instructions in the CDAP UI README.