Examples

For a comprehensive initial introduction to CDAP and its capabilities, follow the Introduction to CDAP tutorial, which covers everything from installing a CDAP Sandbox through creating a real-world application.

For developers intent on building Java-based CDAP applications, see the Quick Start/Web Log Analytics example in Getting Started.

The CDAP Sandbox includes these examples in the download:

Example Name Description
Hello World A simple HelloWorld application written using CDAP. It introduces how streams, flows, datasets, and services are used in a CDAP application.
Clicks and Views An application that demonstrates a reduce-side join across two streams using a MapReduce program.
Count Random An application that demonstrates the @Tick feature of flows. It uses a tick method to generate random numbers which are then counted by downstream flowlets.
Data Cleansing An application that demonstrates incremental consumption of the partitions of a partitioned fileset using MapReduce.
Decision Tree Regression An application demonstrating machine-learning model training using a Spark2 program. It trains decision tree regression models from labeled data uploaded through a Service.
FileSet Example A variation of the WordCount example that operates on files. It demonstrates the usage of the FileSet dataset, including a service to upload and download files, and a MapReduce that operates over these files.
Log Analysis An example demonstrating Spark and MapReduce running in parallel inside a workflow, showing the use of forks within workflows.
Purchase An application that demonstrates the use of many of the CDAP components (streams, flows, flowlets, datasets, queries, MapReduce programs, workflows, and services) in a single application. A flow receives events from a stream, each event describing a purchase ("John bought 5 apples for $2"); the flow processes the events and stores them in a dataset. A MapReduce program reads the dataset, compiles the purchases for each customer into a purchase history, and stores the histories in a second dataset. The purchase histories can then be queried either through a service or an ad-hoc SQL query.

Spam Classifier An application that demonstrates a Spark Streaming application that classifies Kafka messages as either "spam" or "ham" (not "spam") based on a trained Spark MLlib NaiveBayes model.
Spark K-Means An application that demonstrates streaming text analysis using a Spark program. It calculates the centers of points from an input stream using the K-Means clustering method.
Spark Page Rank An application that demonstrates text analysis using Spark and MapReduce programs. It computes the page rank of URLs from an input stream.
Sport Results An application that illustrates the use of partitioned File sets. It loads game results into a File set partitioned by league and season, and processes them with MapReduce.
Stream Conversion An application that demonstrates the use of time-partitioned File sets. It periodically converts a stream into partitions of a File set, which can be read by SQL queries.
User Profiles An application that demonstrates column-level conflict detection, using updates to user profiles in a dataset as the example.
Web Analytics An application to generate statistics and to provide insights about web usage through the analysis of web traffic.
Wikipedia Pipeline An application that performs analysis on Wikipedia data using MapReduce and Spark programs running within a CDAP workflow: WikipediaPipelineWorkflow.
Word Count A simple application that counts words, and tracks word associations and unique words seen on the stream. It demonstrates the power of using datasets and how they can be employed to simplify storing complex data. It uses a configuration class to configure the application at deployment time.
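
The data flow described for the Purchase example (parse each event, store it, then compile a per-customer purchase history) can be illustrated with a small standalone sketch. This is plain Python, not CDAP code: it uses no CDAP API, and the names Purchase, parse_event, and build_histories, as well as the exact event pattern, are assumptions modeled on the sample event "John bought 5 apples for $2". In the actual example, a flow and a MapReduce program play these roles and the results are stored in datasets.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed event format, modeled on the sample sentence in the Purchase description.
EVENT_PATTERN = re.compile(r"^(\w+) bought (\d+) (\w+) for \$(\d+)$")


@dataclass(frozen=True)
class Purchase:
    """Hypothetical record type for one parsed purchase event."""
    customer: str
    quantity: int
    product: str
    price: int  # whole dollars, as in the sample event


def parse_event(line: str) -> Purchase:
    """Parse one event line; raise ValueError if it does not match the assumed format."""
    match = EVENT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized purchase event: {line!r}")
    customer, quantity, product, price = match.groups()
    return Purchase(customer, int(quantity), product, int(price))


def build_histories(purchases):
    """Group purchases by customer, keeping them in arrival order."""
    histories = defaultdict(list)
    for purchase in purchases:
        histories[purchase.customer].append(purchase)
    return dict(histories)


# Stand-in for events arriving on a stream.
events = [
    "John bought 5 apples for $2",
    "Mary bought 1 pear for $1",
    "John bought 2 plums for $3",
]
purchases = [parse_event(event) for event in events]
histories = build_histories(purchases)
```

Here histories maps each customer name to that customer's list of purchases, which corresponds loosely to the second dataset that the service or an SQL query would read.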

What's Next

For more about developing data applications using CDAP:

  • Look at our How-To Guides and Tutorials, a collection of quick how-to guides and longer tutorials covering a complete range of Big Data application topics.
  • Explore the source code of a web analytics application. It includes test cases that show how to unit-test an application.
  • For a detailed understanding of what CDAP is capable of, read our Overview and Building Blocks sections.