CDAP Developers’ Manual

  • Getting Started Developing: A quick, hands-on introduction to developing with CDAP, which guides you through installing the CDAP SDK, setting up your development environment, starting and stopping CDAP, and building and running example applications.
  • Overview: Covers the overall architecture and technology behind CDAP, including the abstraction of Data and Applications, CDAP modes and components, and the anatomy of a Big Data application.
  • Building Blocks: This section covers the two core abstractions in the Cask Data Application Platform: Data and Applications. Data abstractions include streams, datasets, and views. Application abstractions include flows and flowlets, MapReduce, Spark, workers, workflows, schedules, and services. The section also details how to work with these abstractions to build Big Data applications.
  • Security: CDAP supports securing clusters using perimeter security. Configuration and client authentication are covered in this section.
  • Testing and Debugging: CDAP includes a test framework that developers can use with their applications, along with tools and practices for debugging an application prior to deployment.
  • Ingesting Data: CDAP comes with a number of tools to make a developer’s life easier. These tools help with ingesting data into CDAP using Java, Python, and Ruby APIs, and include an Apache Flume Sink implementation.
  • Data Exploration: Data in CDAP can be explored without writing any code by using ad-hoc SQL-like queries. This section covers the exploration of streams and datasets, along with integration with business intelligence tools.
  • Advanced Topics: Covers advanced topics of interest to developers who want a deeper dive into CDAP, including suggested best practices for CDAP development, class loading in CDAP, and adding a custom logback to a CDAP application.