Cask Data Application Platform Release Notes

Release 3.2.2

Improvements

  • CDAP-3644 - Made the metadata system datasets upgradable.
  • CDAP-4093 - Revised the installation documentation for Cloudera Manager and Apache Ambari.
  • CDAP-4133 - Added ability to get the live-info for the CDAP AppFabric system service.

Bug Fixes

  • CDAP-4067 - Fixed an issue where socket connections to the TransactionManager were not being closed.
  • CDAP-4092 - Fixed an issue that caused worker threads to go into infinite recursion when exceptions were thrown in channel handlers.
  • CDAP-4119 - Fixed a problem where any programs that were running were marked as failed when the CDAP Master switched from active to standby.
  • CDAP-4278 - Fixed a problem that caused a Workflow to fail if datasets were used in custom actions.
  • CDAP-4373 - Fixed a problem that prevented MapReduce jobs from being run when the Resource Manager switched from active to standby in a Kerberos-enabled HA cluster.
  • CDAP-4384 - Fixed a problem that prevented streams from being read in HA HDFS mode.

Known Issues

  • CDAP-4393 - Metadata search for tags does not work if the search key is not the first tag in the list.
  • See also the Known Issues of version 3.2.0.

Release 3.2.1

New Features

  • CDAP-3951 - Added the ability for S3 batch sources and sinks to set additional file system properties.

Improvements

  • CDAP-3870 - Added logging and metrics support for Script, ScriptFilter, and Validator transforms.
  • CDAP-3939 - Improved artifact and application deployment failure handling.

Bug Fixes

  • CDAP-3342 - Fixed a problem with the CDAP SDK being unable to start on certain Windows machines by updating the Hadoop native library in CDAP with a version that does not have a dependency on a debug version of the Microsoft msvcr100.dll.
  • CDAP-3815 - Fixed an issue where the regex filter for S3 batch sources wasn’t being applied correctly.
  • CDAP-3829 - Fixed snapshot sinks so that the data is explorable as a PartitionedFileSet.
  • CDAP-3833 - Fixed snapshot sinks so that they can be read safely.
  • CDAP-3859 - Fixed a compilation error in the Maven application archetype.
  • CDAP-3860 - Fixed a bug where plugins, packaged in the same artifact as an application class, could not be used by that application class.
  • CDAP-3891 - Updated the documentation to remove references to application templates and adapters that were removed as of CDAP 3.2.0.
  • CDAP-3949 - Fixed a problem with running certain examples on Linux systems by increasing the maximum Java heap size of the Standalone SDK on Linux systems to 2048m.
  • CDAP-3961 - Fixed a missing dependency on cdap-hbase-compat-1.1 package in the CDAP Master package.

Release 3.2.0

New Features

  • CDAP-2556 - Added support for HBase 1.1.
  • CDAP-2666 - Added a new API for creating an application from an artifact.
  • CDAP-2756 - Added the ability to write to multiple outputs from a MapReduce job.
  • CDAP-2757 - Added the ability to dynamically write to multiple partitions of a PartitionedFileSet dataset as the output of a MapReduce job.
  • CDAP-3253 - Added a Stream and Dataset Widget to the CDAP-UI.
  • CDAP-3390 - Added stream views, enabling reading from a single stream using various formats and schemas.
  • CDAP-3476 - Added a Validator Transform that can be used to validate records based on a set of available validators and configured to write invalid records to an error dataset.
  • CDAP-3516 - Added a service to manage the metadata of CDAP entities.
  • CDAP-3518 - Added the publishing of metadata change notifications to Apache Kafka.
  • CDAP-3519 - Added the ability to compute lineage of a CDAP dataset or stream in a given time window.
  • CDAP-3520 - Added RESTful APIs for adding/retrieving/deleting of metadata for apps/programs/datasets/streams.
  • CDAP-3521 - Added the ability to record a dataset or stream access by a CDAP program.
  • CDAP-3522 - Added the capability to search CDAP entities based on their metadata.
  • CDAP-3523 - Added RESTful APIs for searching CDAP entities based on business metadata.
  • CDAP-3527 - Added a data store to manage business metadata of CDAP entities.
  • CDAP-3549 - Added SSH port forwarding to the CDAP virtual machine.
  • CDAP-3556 - Added a data store for recording data accesses by CDAP programs and computing lineage.
  • CDAP-3590 - Added the ability to write to multiple sinks in ETL real-time and batch applications.
  • CDAP-3591 - Added the ability for real-time ETL pipelines to write to multiple sinks.
  • CDAP-3592 - Added the ability for batch ETL pipelines to write to multiple sinks.
  • CDAP-3626 - For the CSV and TSV stream formats, a “mapping” setting can now be specified, mapping stream event columns to schema columns.
  • CDAP-3693 - Added support for CDAP to work with HDP 2.3.

Improvements

  • CDAP-1914 - Added documentation of the RESTful endpoint to retrieve the properties of a stream.
  • CDAP-2514 - Added an interface to load a file into a stream from the CDAP-UI.
  • CDAP-2809 - The CDAP-UI “Errors” pop-up in the main screen now displays the time and date for each error.
  • CDAP-2872 - Updated the Cloudera Manager CSD to support logback.
  • CDAP-2950 - Cleaned up the messages shown in the errors dropdown in the CDAP-UI.
  • CDAP-3147 - Added a CDAP-CLI command to stop a workflow.
  • CDAP-3179 - Added support for upgrading the Hadoop distribution or the HBase version that CDAP is running on.
  • CDAP-3257 - Revised the documentation of the file cdap-default.xml, removed properties no longer in use, and corrected discrepancies between the documentation and the shipped XML file.
  • CDAP-3270 - Improved the help provided in the CDAP-CLI for the setting of stream formats.
  • CDAP-3275 - Upgraded netty-http version to 0.12.0.
  • CDAP-3282 - Added an HTTP RESTful API to update the application configuration and artifact version.
  • CDAP-3332 - Added a “clear” button in the CDAP-UI for cases where a user decides not to use a pre-populated schema.
  • CDAP-3351 - Defined a directory structure to be used for predefined applications.
  • CDAP-3357 - Added documentation in the source code on adding new commands and completers to the CDAP-CLI.
  • CDAP-3393 - In the CDAP-UI, added visualization for Workflow tokens in Workflows.
  • CDAP-3419 - HBaseQueueDebugger now shows the minimum queue event transaction write pointer both for each queue and for all queues.
  • CDAP-3443 - Added an example cdap-env.sh to the shipped packages.
  • CDAP-3464 - Added an example in the documentation explaining how to prune invalid transactions from the transaction manager.
  • CDAP-3490 - Modified the CDAP upgrade tool to delete all adapters and the ETLBatch and ETLRealtime ApplicationTemplates.
  • CDAP-3495 - Added the ability to persist the runtime arguments with which a program was run.
  • CDAP-3550 - Added support for writing to Amazon S3 in Avro and Parquet formats from batch ETL applications.
  • CDAP-3564 - Updated CDAP to use Tephra 0.6.2.
  • CDAP-3610 - Updated the transaction debugger client to print checkpoint information.

Bug Fixes

  • CDAP-1697 - Fixed an issue where failed dataset operations via Explore queries did not invalidate the associated transaction.
  • CDAP-1864 - Fixed a problem where users got an incorrect message while creating a dataset in a non-existent namespace.
  • CDAP-1892 - Fixed a problem with services returning the same message for all failures.
  • CDAP-1984 - Fixed a problem where a dataset could be created in a non-existent namespace in standalone mode.
  • CDAP-2428 - Fixed a problem with the CDAP-CLI creating file logs.
  • CDAP-2521 - Fixed a problem with the CDAP-CLI not auto-completing when setting a stream format.
  • CDAP-2785 - Fixed a problem with the CDAP-UI of buttons staying ‘in focus’ after clicking.
  • CDAP-2809 - The CDAP-UI “Errors” pop-up in the main screen now displays the time and date for each error.
  • CDAP-2892 - Fixed a problem with schedules not being deployed in suspended mode.
  • CDAP-3014 - Fixed a problem where failure of a Spark node would cause a workflow to restart indefinitely.
  • CDAP-3073 - Fixed an issue with the CDAP standalone process periodically crashing with Out-of-Memory errors when writing to an Oracle table.
  • CDAP-3101 - Fixed a problem with workflow runs not getting scheduled due to Quartz exceptions.
  • CDAP-3121 - Fixed a problem with discrepancies between the documentation and the defaults actually used by CDAP.
  • CDAP-3200 - Fixed a problem in the CDAP-UI with the clone button in an incorrect position when using Firefox.
  • CDAP-3201 - Fixed a problem in the CDAP-UI with an incorrect tabbing order when using Firefox.
  • CDAP-3219 - Fixed a problem when specifying the HBase version using the HBASE_VERSION environment variable.
  • CDAP-3233 - Fixed a problem in the CDAP-UI with error pop-ups not having a default focus on a button.
  • CDAP-3243 - Fixed a problem in the CDAP-UI with the default schema shown for streams.
  • CDAP-3260 - Fixed a problem in the CDAP-UI with scrolling on the namespaces dropdown on certain pages.
  • CDAP-3261 - Fixed a problem on CDAP distributed mode with the serializing of the metadata artifact causing a stack overflow.
  • CDAP-3305 - Fixed a problem with the CDAP-UI not warning users if they exit or close their browser without saving.
  • CDAP-3313 - Fixed a problem in the CDAP-UI with refreshing always returning to the overview page.
  • CDAP-3326 - Fixed a problem with the table batch source requiring a row key to be set.
  • CDAP-3343 - Fixed a problem with the application deployment for apps that contain Spark.
  • CDAP-3349 - Fixed a problem with the display of ETL application metrics in the CDAP-UI.
  • CDAP-3355 - Fixed a problem in the CDAP examples with the use of a runtime argument, min.pages.threshold.
  • CDAP-3362 - Fixed a problem with the logback-container.xml not being copied into master services.
  • CDAP-3374 - Fixed a problem with warning messages in the logs indicating that programs were running that actually were not running.
  • CDAP-3376 - Fixed a problem with being unable to deploy the SparkPageRank example application on a cluster.
  • CDAP-3386 - Fixed a problem with the Spark classes not being found when running a Spark program through a Workflow in CDAP Distributed mode on HDP 2.2.
  • CDAP-3394 - Fixed a problem with the deployment of applications through the CDAP-UI.
  • CDAP-3399 - Fixed a problem with the SparkPageRankApp example spawning multiple containers in distributed mode due to its number of services.
  • CDAP-3400 - Fixed an issue with warning messages about the notification system every time the CDAP Standalone is restarted.
  • CDAP-3408 - Fixed a problem with running the CDAP Explore Service on CDH 5.[2,3].
  • CDAP-3432 - Fixed a bug where connecting with a certain namespace from the CLI would not immediately display that namespace in the CLI prompt.
  • CDAP-3435 - Fixed an issue where the program status was shown as running even after it was stopped.
  • CDAP-3442 - Fixed a problem that caused application creation to fail if a config setting was given to an application that does not use a config.
  • CDAP-3449 - Fixed a problem with the readless increment co-processor not handling multiple readless increment columns in the same row.
  • CDAP-3452 - Fixed a problem that prevented the Explore service from working on clusters with secure Hive 0.14.
  • CDAP-3458 - Fixed a problem where stream events that had already been processed were re-processed in flows.
  • CDAP-3470 - Fixed an issue with error messages being logged during a master process restart.
  • CDAP-3472 - Fixed the error message returned when trying to stop a program started by a workflow.
  • CDAP-3473 - Fixed a problem with a workflow failure not updating a run record for the inner program.
  • CDAP-3530 - Fixed a problem with the CDAP-UI performance when rendering flow diagrams with a large number of nodes.
  • CDAP-3563 - Removed faulty and unused metrics around CDAP file resource usage.
  • CDAP-3574 - Fixed an issue with Explore not working on HDP Hive 0.12.
  • CDAP-3603 - Fixed an issue with configuration properties for ETL Transforms being validated at runtime instead of when an application is created.
  • CDAP-3618 - Fixed a problem where suspended schedules were lost when CDAP master was restarted.
  • CDAP-3660 - Fixed an issue where the Hadoop filesystem object was getting instantiated before the Kerberos keytab login was completed, leading to CDAP processes failing after the initial ticket expired.
  • CDAP-3700 - Fixed an issue with the log saver having numerous open connections to HBase, causing it to go Out-of-Memory.
  • CDAP-3711 - Fixed an issue that prevented the downloading of Explore results on a secure cluster.
  • CDAP-3713 - Fixed an issue where certain RESTful APIs were not returning appropriate error messages for internal server errors.
  • CDAP-3716 - Fixed a possible deadlock when CDAP master is restarted with an existing app running on a cluster.

API Changes

  • CDAP-2763 - Added RESTful APIs for managing artifacts.
  • CDAP-2956 - Deprecated the existing API for configuring a workflow action, replacing it with a simpler API.
  • CDAP-3063 - Added CLI commands for managing artifacts.
  • CDAP-3064 - Added an ArtifactClient to interact with Artifact HTTP RESTful APIs.
  • CDAP-3283 - Added artifact information to Application RESTful APIs and the means to filter applications by artifact name and version.
  • CDAP-3324 - Added a RESTful API for creating an application from an artifact.
  • CDAP-3367 - Added the ability to delete an artifact.
  • CDAP-3488 - Changed the ETLBatchTemplate from an ApplicationTemplate to an Application.
  • CDAP-3535 - Added an API for programs to retrieve their application specification at runtime.
  • CDAP-3554 - Changed the plugin types from ‘source’ to either ‘batchsource’ or ‘realtimesource’, and from ‘sink’ to either ‘batchsink’ or ‘realtimesink’ to reflect that the plugins implement different interfaces.
  • CDAP-1554 - Moved constants for default and system namespaces from Common to Id.
  • CDAP-3388 - Added interfaces to cdap-spi that abstract StreamEventRecordFormat (and dependent interfaces) so users can extend the cdap-spi interfaces.
  • CDAP-3583 - Added a RESTful API for retrieving the metadata associated with a particular run of a CDAP program.
  • CDAP-3632 - Added a RESTful API for computing lineage of a CDAP dataset or stream.
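
The plugin type rename described in CDAP-3554 can be pictured as a small lookup. The following Python sketch is illustrative only and is not part of CDAP: the names PLUGIN_TYPE_RENAMES and migrate_plugin_type, and the "batch"/"realtime" pipeline labels, are hypothetical.

```python
# Old plugin type plus pipeline kind -> new plugin type, per the CDAP-3554 note.
# The ("type", "kind") key layout is an assumption made for this sketch.
PLUGIN_TYPE_RENAMES = {
    ("source", "batch"): "batchsource",
    ("source", "realtime"): "realtimesource",
    ("sink", "batch"): "batchsink",
    ("sink", "realtime"): "realtimesink",
}


def migrate_plugin_type(old_type, pipeline_kind):
    """Return the renamed plugin type, or the input type if it was not renamed."""
    return PLUGIN_TYPE_RENAMES.get((old_type, pipeline_kind), old_type)


if __name__ == "__main__":
    for key in sorted(PLUGIN_TYPE_RENAMES):
        print(key, "->", migrate_plugin_type(*key))
```

Types that the note does not mention (for example, a transform) pass through unchanged in this sketch.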

Deprecated and Removed Features

  • See the CDAP 3.2.0 Javadocs for a list of deprecated and removed APIs.
  • CDAP-2667 - Removed application templates and adapters RESTful APIs, as these templates and adapters have been replaced with applications that can be controlled with the Lifecycle HTTP RESTful API.
  • CDAP-2951 - Removed deprecated methods in cdap-client.
  • CDAP-3596 - Replaced the ETL ApplicationTemplates with the new ETL Applications.

Known Issues

  • CDAP-3492 - In CDAP-CLI, executing ‘select *’ from a dataset with many fields generates an error.
  • CDAP-3641 - WorkflowStatsSLAHttpHandler hangs if units are not provided.
  • CDAP-3262 - There is a problem when using the CDAP Standalone scripts under Microsoft Windows if JAVA_HOME is defined as a path with spaces in it. A workaround is to use a definition of JAVA_HOME that does not include spaces.
  • CDAP-3697 - CDAP Explore is broken on secure CDH 5.1.
  • CDAP-3698 - CDAP Explore is unable to get a delegation token while fetching next results on HDP 2.0.
  • CDAP-3749 - The DBSource plugin does not allow a username with an empty password.
  • CDAP-3750 - If a table schema contains a field name that is a reserved word in Hive DDL, ‘enable explore’ fails.
  • CDAP-3819 - The Cassandra source does not handle spaces properly in column fields which require a comma-separated list.
  • See also the Known Issues of version 3.1.0.

Release 3.1.0

New Features

MapR 4.1 Support, HDP 2.2 Support, CDH 5.4 Support

  • CDAP-1614 - Added HBase 1.0 support.
  • CDAP-2318 - Made CDAP work on the HDP 2.2 distribution.
  • CDAP-2786 - Added support to CDAP 3.1.0 for the MapR 4.1 distro.
  • CDAP-2798 - Added Hive 0.14 support.
  • CDAP-2801 - Added CDH 5.4 Hive 1.1 support.
  • CDAP-2836 - Added support for restart of specific CDAP System Services Instances.
  • CDAP-2853 - Completed certification process for MapR on CDAP.
  • CDAP-2879 - Added Hive 1.0 in Standalone.
  • CDAP-2881 - Added support for HDP 2.2.x.
  • CDAP-2891 - Documented cdap-env.sh and settings OPTS for HDP 2.2.
  • CDAP-2898 - Added Hive 1.1 in Standalone.
  • CDAP-2953 - Added HiveServer2 support in a secure cluster.

Spark

  • CDAP-344 - Users can now run Spark in distributed mode.
  • CDAP-1993 - Added ability to manipulate the SparkConf.
  • CDAP-2700 - Added the ability for Spark programs to discover CDAP services in distributed mode.
  • CDAP-2701 - Spark programs are able to collect Metrics in distributed mode.
  • CDAP-2703 - Users are able to collect/view logs from Spark programs in distributed mode.
  • CDAP-2705 - Added examples, guides, and documentation for Spark in distributed mode, including a LogAnalysis application that demonstrates parallel execution of Spark and MapReduce programs using Workflows.
  • CDAP-2923 - Added support for the WorkflowToken in the Spark programs.
  • CDAP-2936 - Spark programs can now specify resource usage for the driver and executor processes in distributed mode.

Workflows

  • CDAP-1983 - Added example application for processing and analyzing Wikipedia data using Workflows.
  • CDAP-2709 - Added ability to add generic keys to the WorkflowToken.
  • CDAP-2712 - Added ability to update the WorkflowToken in MapReduce and Spark programs.
  • CDAP-2713 - Added ability to persist the WorkflowToken per run of the Workflow.
  • CDAP-2714 - Added ability to query the WorkflowToken for the past as well as currently running Workflow runs.
  • CDAP-2752 - Added ability for custom actions to access the CDAP datasets and services.
  • CDAP-2894 - Added an API to retrieve the system properties (e.g. MapReduce counters in case of MapReduce program) from the WorkflowToken.
  • CDAP-2923 - Added support for the WorkflowToken in the Spark programs.
  • CDAP-2982 - Added verification that the Workflow contains all programs/custom actions with a unique name.

Datasets

  • CDAP-347 - Users can now use datasets in beforeSubmit and afterFinish.

  • CDAP-585 - Changed the Spark program runner to support file datasets. Spark programs can now use file-based datasets.

  • CDAP-2734 - Added PartitionedFileSet support for setting and getting properties at the partition level.

  • CDAP-2746 - PartitionedFileSets now record the creation time of each partition in the metadata.

  • CDAP-2747 - PartitionedFileSets now index the creation time of partitions to allow selection of partitions that were created after a given time. Introduced BatchPartitionConsumer as a way to incrementally consume new data in a PartitionedFileSet.

  • CDAP-2752 - Added ability for custom actions to access the CDAP datasets and services.

  • CDAP-2758 - FileSets now support existing HDFS locations.

    Base paths that start with “/” are now treated as absolute in the file system. Previously, an absolute base path for a (Partitioned)FileSet was interpreted as relative to the namespace’s data directory. Newly created FileSets interpret absolute base paths as absolute in the file system.

    Introduced a new property for (Partitioned)FileSets named “data.external”. If true, the base path of the FileSet is assumed to be managed by some external process. That is, the FileSet will not attempt to create the directory, it will not delete any files when the FileSet is dropped or truncated, and it will not allow adding or deleting files or partitions. In other words, the FileSet is read-only.

  • CDAP-2784 - Added support to write to PartitionedFileSet Partition metadata from MapReduce.

  • CDAP-2822 - IndexedTable now supports scans on the indexed field.
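
The base path and “data.external” rules described under CDAP-2758 can be modeled roughly as follows. This Python sketch only mirrors the behavior stated in the note and is not CDAP code: the function names, the example namespace data directory, and the string form of the property value are assumptions.

```python
import posixpath


def resolve_base_path(base_path, namespace_data_dir):
    """Resolve a base path the way the note describes for newly created FileSets.

    Hypothetical helper: a path starting with "/" is absolute in the file
    system; any other path is relative to the namespace's data directory.
    """
    if base_path.startswith("/"):
        return posixpath.normpath(base_path)
    return posixpath.normpath(posixpath.join(namespace_data_dir, base_path))


def is_external(properties):
    """Return True if "data.external" is set to true (assumed to be a string)."""
    return str(properties.get("data.external", "false")).lower() == "true"


def may_modify_files(properties):
    """An external FileSet is read-only: no adding or deleting files or partitions."""
    return not is_external(properties)


if __name__ == "__main__":
    data_dir = "/cdap/namespaces/default/data"  # illustrative path only
    print(resolve_base_path("/existing/hdfs/logs", data_dir))
    print(resolve_base_path("logs", data_dir))
    print(may_modify_files({"data.external": "true"}))
```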

Metrics

  • CDAP-2975 - Added pre-split FactTables.
  • CDAP-2326 - Added better unit-test coverage for Cube dataset.
  • CDAP-1853 - Metrics processor scaling no longer needs a master services restart.
  • CDAP-2844 - MapReduce metrics collection no longer uses counters, and instead reports directly to Kafka.
  • CDAP-2701 - Spark programs are able to collect Metrics in distributed mode.
  • CDAP-2466 - Added CLI for metrics search and query.
  • CDAP-2236 - New CDAP UI switched over to using newer search/query APIs.
  • CDAP-1998 - Removed deprecated Context - Query param in Metrics v3 API.

Miscellaneous New Features

  • CDAP-332 - Added a RESTful endpoint for deleting Streams.
  • CDAP-1483 - QueueAdmin now uses Id.Namespace instead of simply String.
  • CDAP-1584 - CDAP CLI now shows the username in the CLI prompt.
  • CDAP-2139 - Removed a duplicate Table of Contents on the Documentation Search page.
  • CDAP-2515 - Added a metrics client for search and query by tags.
  • CDAP-2582 - Documented the licenses of the shipped CDAP-UI components.
  • CDAP-2595 - Added data modelling of flows.
  • CDAP-2596 - Added data modelling of MapReduce.
  • CDAP-2617 - Added the capability to get logs for a given time range from CLI.
  • CDAP-2618 - Simplified the Cube sink configurations.
  • CDAP-2670 - Added Parquet sink with time partitioned file dataset.
  • CDAP-2739 - Added S3 batch source for ETLBatch.
  • CDAP-2802 - Stopped using HiveConf.ConfVars.defaultValue, to support Hive >0.13.
  • CDAP-2847 - Added ability to add custom filters to FileBatchSource.
  • CDAP-2893 - Custom Transform now parses log formats for ETL.
  • CDAP-2913 - Provided installation method for EMR.
  • CDAP-2915 - Added an SQS real-time plugin for ETL.
  • CDAP-3022 - Added CloudFront format option to LogParserTransform.
  • CDAP-3032 - Documented TestConfiguration class usage in unit-test framework.

Improvements

  • CDAP-593 - Spark no longer determines the mode through MRConfig.FRAMEWORK_NAME.
  • CDAP-595 - Refactored SparkRuntimeService and SparkProgramWrapper.
  • CDAP-665 - Documentation received a product-specific 404 Page.
  • CDAP-683 - Changed all README files from markdown to rst format.
  • CDAP-1132 - Improved the CDAP Doc Search Result Sorting.
  • CDAP-1416 - Added links to upper level pages on Docs.
  • CDAP-1572 - Standardized Id classes.
  • CDAP-1583 - Refactored InMemoryWorkerRunner and ServiceProgramRunner after ServiceWorkers were removed.
  • CDAP-1918 - Switched to using the Spark 1.3.0 release.
  • CDAP-1926 - Streams endpoint now accepts “now”, “now-30s”, etc., for time ranges.
  • CDAP-2007 - CLI output for “call service” is rendered in a copy-pastable manner.
  • CDAP-2310 - Kafka Source is now able to apply a Schema to the Payload received.
  • CDAP-2388 - Added Java 8 support to CDAP.
  • CDAP-2422 - Removed redundant catch blocks in AdapterHttpHandler.
  • CDAP-2455 - Version in CDAP-UI footer is dynamic.
  • CDAP-2482 - Reduced excessive capitalisation in documentation.
  • CDAP-2531 - Adapter details made available through CDAP-UI.
  • CDAP-2539 - Added a build identifier (branch, commit) in header of Documentation HTML pages.
  • CDAP-2552 - Documentation Build script now flags errors.
  • CDAP-2554 - Documented that streams can now be deleted.
  • CDAP-2557 - Non-handler logic moved out of DatasetInstanceHandler.
  • CDAP-2570 - CLI prompt changes to ‘DISCONNECTED’ after CDAP is stopped.
  • CDAP-2578 - Ability to look at configs of created adapters.
  • CDAP-2585 - Use Id in cdap-client rather than Id.Namespace + String.
  • CDAP-2588 - Improvements to the MetricsClient APIs.
  • CDAP-2590 - Switching namespaces when in CDAP-UI Operations screens.
  • CDAP-2620 - CDAP clients now use Id classes from cdap proto, instead of plain strings.
  • CDAP-2628 - CDAP-UI: Breadcrumbs in Workflow/MapReduce work as expected.
  • CDAP-2644 - In cdap-clients, no longer need to retrieve runtime arguments before starting a program.
  • CDAP-2651 - CDAP-UI: the Namespace is made more prominent.
  • CDAP-2681 - CDAP-UI: scrolling no longer enlarges the workflow diagram instead of scrolling through.
  • CDAP-2683 - CDAP-UI: added remove icons for Fork and Join.
  • CDAP-2684 - CDAP-UI: workflow diagrams are directed graphs.
  • CDAP-2688 - CDAP-UI: added search & pagination for lists of apps and datasets.
  • CDAP-2689 - CDAP-UI: shows which application is a part of which dataset.
  • CDAP-2691 - CDAP-UI: added ability to delete streams.
  • CDAP-2692 - CDAP-UI: added pagination for logs.
  • CDAP-2694 - CDAP-UI: added a loading icon/UI element when creating an adapter.
  • CDAP-2695 - CDAP-UI: long names of adapters are replaced by a short version ending in an ellipsis.
  • CDAP-2697 - CDAP-UI: added tab names during adapter creation.
  • CDAP-2716 - CDAP-UI: when creating an adapter, the tabbing order shows correctly.
  • CDAP-2733 - Implemented a TimePartitionedFileSet source.
  • CDAP-2811 - Improved Hive version detection.
  • CDAP-2921 - Removed backward-compatibility for pre-2.8 TPFS.
  • CDAP-2938 - Implemented new ETL application template creation.
  • CDAP-2983 - Spark program runner now calls onFailure() of the DatasetOutputCommitter.
  • CDAP-2986 - Spark programs are now able to specify runtime arguments when reading or writing a dataset.
  • CDAP-2987 - Added an example for Spark using datasets directly.
  • CDAP-2989 - Added an example for Spark using FileSets.
  • CDAP-3018 - Updated workflow guides for workflow token.
  • CDAP-3028 - Improved the system service restart endpoint to handle illegal instance IDs and “service not available”.
  • CDAP-3053 - Added schema javadocs that explain how to write the schema to JSON.
  • CDAP-3077 - Added the ability in TableSink to find schema.row.field case-insensitively.
  • CDAP-3144 - Changed CLI command descriptions to use consistent element case.
  • CDAP-3152 - Refactored ETLBatch sources and sinks.
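
The relative time expressions mentioned in CDAP-1926 (“now”, “now-30s”) can be read as offsets from the current time. The Python sketch below shows one plausible interpretation and is not the CDAP parser. These notes do not give the exact grammar, so the supported units (s, m, h, d), the optional “+” sign, and the name parse_time are all assumptions.

```python
import re
import time

# Assumed unit suffixes; the release note only shows "s" explicitly.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_RELATIVE = re.compile(r"^now(?:([+-])(\d+)([smhd]))?$")


def parse_time(expr, now=None):
    """Return epoch seconds for "now" or "now-<n><unit>" style expressions."""
    if now is None:
        now = int(time.time())
    match = _RELATIVE.match(expr.strip())
    if match is None:
        raise ValueError("unsupported time expression: %r" % expr)
    sign, amount, unit = match.groups()
    if sign is None:
        # Plain "now" with no offset.
        return now
    offset = int(amount) * _UNIT_SECONDS[unit]
    return now - offset if sign == "-" else now + offset


if __name__ == "__main__":
    start = parse_time("now-30s")
    end = parse_time("now")
    print("time range:", start, "to", end)
```

Passing an explicit value for now keeps the function deterministic, which is convenient when computing both ends of a range from the same instant.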

Bug Fixes

  • CDAP-23 - Fixed a problem with the DatasetFramework not loading a given dataset with the same classloader across calls.
  • CDAP-68 - Made sure all network services in Singlenode only bind to localhost.
  • CDAP-376 - Fixed a problem with HBaseOrderedTable never calling HTable.close().
  • CDAP-550 - Consolidated Examples, Guides, and Tutorials styles.
  • CDAP-598 - Fixed problems with the CDAP ClassLoading model.
  • CDAP-674 - Fixed problems with CDAP code examples and versioning.
  • CDAP-814 - Resolved issues in the documentation about element versus program.
  • CDAP-1042 - Fixed a problem with specifying dataset selection as input for Spark job.
  • CDAP-1145 - Fixed the PurchaseAppTest.
  • CDAP-1184 - Fixed a problem with the DELETE call not clearing queue metrics.
  • CDAP-1273 - Fixed a problem with the ProgramClassLoader getResource.
  • CDAP-1457 - Fixed a memory leak of user class after running Spark program.
  • CDAP-1552 - Fixed a problem with MapReduce progress metrics not being interpolated.
  • CDAP-1868 - Fixed a problem with Java Client and CLI not setting dataset properties on existing datasets.
  • CDAP-1873 - Fixed a problem with warnings and errors when CDAP-Master starts up.
  • CDAP-1967 - Fixed a problem with CDAP-Master failing to start up due to conflicting dependencies.
  • CDAP-1976 - Fixed a problem with examples not following the same pattern.
  • CDAP-1988 - Fixed a problem with creating a Dataset through REST API failing if no properties are provided.
  • CDAP-2081 - Fixed a problem with StreamSizeSchedulerTest failing randomly.
  • CDAP-2140 - Fixed a problem with the CDAP-UI not showing system service status when system services are down.
  • CDAP-2177 - Enabled and fixed the LogSaverPluginTest.
  • CDAP-2208 - Fixed a problem with CDAP-Explore service failing on wrapped indexedTable with Avro (specific record) contents.
  • CDAP-2228 - Fixed a problem with MapReduce not working in Hadoop 2.2.
  • CDAP-2254 - Fixed a problem with an incorrect error message returned by HTTP RESTful Handler.
  • CDAP-2258 - Fixed a problem with an internal error when attempting to start a non-existing program.
  • CDAP-2279 - Fixed a problem with namespace and gear widgets disappearing when the browser window is too narrow.
  • CDAP-2280 - Fixed a problem where the GUI did not fully refresh the page when a flow was started from the GUI.
  • CDAP-2341 - Fixed a problem where a MapReduce program that failed to start could no longer be started or stopped.
  • CDAP-2343 - Fixed a problem in the CDAP-UI where MapReduce logs were mixed in with system logs.
  • CDAP-2344 - Fixed a problem with the formatting of logs in the CDAP-UI.
  • CDAP-2355 - Fixed a problem with an Adapter CLI help error.
  • CDAP-2356 - Fixed a problem with CLI autocompletion results not sorted in alphabetical order.
  • CDAP-2365 - Fixed a problem where the CDAP-UI oscillated between being up and down when CDAP-Master was restarted.
  • CDAP-2376 - Fixed a problem with logs from mapper and reducer not being collected.
  • CDAP-2444 - Fixed problems in the Cloudera configuration documentation.
  • CDAP-2446 - Fixed a problem with the examples needing to be updated for the new CDAP-UI.
  • CDAP-2454 - Fixed a problem with Proto class RunRecord containing the Twill RunId when serialized in REST API response.
  • CDAP-2459 - Fixed a problem with the CDAP-UI going into a loop when the Router returns 200 and App Fabric is not up.
  • CDAP-2474 - Fixed a problem with changing the format of the name for the connectionfactory in JMS source plugin.
  • CDAP-2475 - Fixed a problem with JMS source accepting the type and name of the JMS provider plugin.
  • CDAP-2480 - Fixed a problem with the Workflow current run info endpoint missing a /runs/ in the path.
  • CDAP-2489 - Fixed a problem where, in distributed mode, the status of a running program was always returned as STOPPED after the CDAP master was restarted.
  • CDAP-2490 - Fixed a problem with checking if invalid Run Records for Spark and MapReduce are part of run from Workflow child programs.
  • CDAP-2491 - Fixed a problem with the MapReduce program in standalone mode not always using LocalJobRunnerWithFix.
  • CDAP-2493 - After the fix for CDAP-2474 (ConnectionFactory in JMS source), the JSON file requires updating for the change to reflect in CDAP-UI.
  • CDAP-2496 - Fixed a problem with CDAP using its own transaction snapshot codec.
  • CDAP-2498 - Fixed a problem with validation while creating adapters only by types and not also by values.
  • CDAP-2517 - Fixed a problem with Explore docs not mentioning partitioned file sets.
  • CDAP-2520 - Fixed a problem with StreamSource not liking values of ‘0m’.
  • CDAP-2522 - Fixed a problem with TransactionStateCache needing to reference Tephra SnapshotCodecV3.
  • CDAP-2529 - Fixed a problem with CLI not printing an error if it can’t connect to CDAP.
  • CDAP-2530 - Fixed a problem with Custom RecordScannable<StructuredRecord> datasets not being explorable.
  • CDAP-2535 - Fixed a problem with IntegrationTestManager deployApplication not being namespaced.
  • CDAP-2538 - Fixed a problem with handling extra whitespace in CLI input.
  • CDAP-2540 - Fixed a problem with the Preferences Namespace CLI help having errors.
  • CDAP-2541 - Added the ability to stop the particular run of a program. Allows concurrent runs of the MapReduce and Workflow programs and the ability to stop programs at a per-run level.
  • CDAP-2547 - Fixed a problem with the Kafka Source not using the persisted offset when the Adapter is restarted.
  • CDAP-2549 - Fixed a problem with a suspended workflow run record not being removed upon app/namespace delete.
  • CDAP-2562 - Fixed a problem with the automated Doc Build failing in develop.
  • CDAP-2564 - Improved the management of dataset resources.
  • CDAP-2565 - Fixed a problem with the transaction latency metric being of incorrect type.
  • CDAP-2569 - Fixed a problem with the master process not being resilient to ZooKeeper exceptions.
  • CDAP-2571 - Fixed a problem with the RunRecord thread not being resilient to errors.
  • CDAP-2587 - Fixed a problem with being unable to create default namespaces on starting up SDK.
  • CDAP-2635 - Fixed a problem with Namespace Create ignoring the properties’ config field.
  • CDAP-2636 - Fixed a problem with “out of perm gen” space in CDAP Explore service.
  • CDAP-2654 - Fixed a problem with False values showing up as ‘false null’ in the CDAP Explore UI.
  • CDAP-2685 - Fixed a problem with the CDAP-UI: no empty box for transforms.
  • CDAP-2729 - Fixed a problem with CDAP-UI not handling downstream system services gracefully.
  • CDAP-2740 - Fixed a problem with CDAP-UI not gracefully handling when the nodejs server goes down.
  • CDAP-2748 - Fixed a problem with the currently running and completed status of Spark programs in a workflow not highlighted in the UI.
  • CDAP-2765 - Fixed a problem with security warnings when CLI starts up.
  • CDAP-2766 - Fixed a problem with CLI asking for the user/password twice.
  • CDAP-2767 - Fixed a problem with incorrect error messages for namespace deletion.
  • CDAP-2768 - Fixed a problem with CLI and UI listing system.queue as a dataset.
  • CDAP-2769 - Fixed a problem with InMemoryServiceProgramRunner using org.apache.twill.internal.RunIds instead of co.cask.cdap.common.app.RunIds.
  • CDAP-2787 - Fixed a problem with the number of MapReduce task metrics going over the limit and causing MapReduce to fail.
  • CDAP-2796 - Fixed a problem with emitting duplicate metrics for dataset ops.
  • CDAP-2803 - Fixed a problem with scan operations not reflecting in dataset ops metrics.
  • CDAP-2804 - Fixed a problem with DataSetRecordReader incorrectly reporting dataset ops metrics.
  • CDAP-2810 - Fixed a problem with IncrementAndGet, CompareAndSwap, and Delete ops on Table incorrectly reporting two writes each.
  • CDAP-2821 - Fixed a problem with a Spark native library linkage error causing CDAP standalone to stop.
  • CDAP-2823 - Fixed a problem with the conversion from Avro and to Avro not taking into account nested records.
  • CDAP-2830 - Fixed a problem with CDAP-UI dying when CDAP Master is killed.
  • CDAP-2832 - Fixed a problem where suspending a schedule takes a long time and the CDAP-UI does not provide any indication.
  • CDAP-2838 - Fixed a problem with a poor error message when there is a mistake in the security configuration.
  • CDAP-2839 - Fixed a problem with the CDAP start script needing updating for the correct Node.js version.
  • CDAP-2848 - Fixed a problem with the Preferences Client test.
  • CDAP-2849 - Fixed a problem with the FileBatchSource reading files in twice if it takes longer than one workflow cycle to complete the job.
  • CDAP-2851 - Fixed a problem with RPM and DEB release artifacts being uploaded to incorrect staging directory.
  • CDAP-2854 - Fixed a problem with the instructions for using Docker.
  • CDAP-2855 - Fixed a problem with the example builds in VM failing with a Maven dependency error.
  • CDAP-2860 - Fixed a problem with the documentation for updating dataset properties.
  • CDAP-2861 - Fixed a problem with CDAP-UI not mentioning required fields in all entry forms.
  • CDAP-2862 - Fixed a problem with CDAP-UI creating multiple namespaces with the same name.
  • CDAP-2866 - Fixed a problem with FileBatchSource not reattempting to read in files if there is a failure.
  • CDAP-2870 - Fixed a problem with Workflow Diagrams.
  • CDAP-2871 - Fixed a problem with the Cloudera Manager HBase Gateway dependency.
  • CDAP-2895 - Fixed a problem with a put operation on the WorkflowToken not throwing an exception.
  • CDAP-2899 - Fixed a problem with MapReduce local dirs not getting cleaned up.
  • CDAP-2900 - Fixed a problem with exposing app.template.dir as a config option.
  • CDAP-2904 - Fixed a problem with “Make Request” button overlapping with paths when a path is long.
  • CDAP-2912 - Fixed a problem with HBaseQueueDebugger not sorting queue barriers correctly.
  • CDAP-2922 - Fixed a problem with datasets created through DynamicDatasetContext not having metrics context. Datasets in MapReduce and Spark programs, and workers, were not emitting metrics.
  • CDAP-2925 - Fixed a problem with the documentation on how to create datasets with properties.
  • CDAP-2932 - Fixed a problem with the AdapterClient getRuns method constructing a malformed URL.
  • CDAP-2935 - Fixed a problem with the logs endpoint to retrieve the latest entry not working correctly.
  • CDAP-2940 - Fixed a problem with the test case ArtifactStoreTest#testConcurrentSnapshotWrite.
  • CDAP-2941 - Fixed a problem with the ScriptTransform failing to initialize.
  • CDAP-2942 - Fixed a problem with the CDAP-UI namespace dropdown failing on standalone restart.
  • CDAP-2948 - Fixed a problem with creating Adapters.
  • CDAP-2952 - Fixed a problem with the plugin avro library not being accessible to MapReduce.
  • CDAP-2955 - Fixed a problem with a NoSuchMethodException when trying to explore Datasets/Stream.
  • CDAP-2971 - Fixed a problem with the dataset registration not registering datasets for applications upon deploy.
  • CDAP-2972 - Fixed a problem with being unable to instantiate dataset in ETLWorker initialization.
  • CDAP-2981 - Fixed a problem with undoing a FileSets upgrade in favor of versioning and backward-compatibility.
  • CDAP-2991 - Fixed a problem with Explore not working when it launches a MapReduce job.
  • CDAP-2992 - Fixed a problem with CLI broken for secure CDAP.
  • CDAP-2996 - Fixed a problem with CDAP-UI: Stop Run and Suspend Run buttons needed styling updates.
  • CDAP-2997 - Fixed a problem with SparkProgramRunnerTest failing randomly.
  • CDAP-2999 - Fixed a problem with MapReduce jobs showing the duration for tasks as 17 days before the mapper starts.
  • CDAP-3001 - Fixed a problem with truncating a custom dataset failing with internal server error.
  • CDAP-3002 - Fixed a problem with tick initialDelay not working properly.
  • CDAP-3003 - Fixed a problem with user metrics emitted from flowlets not being queryable using the flow’s tags.
  • CDAP-3006 - Fixed a problem with updating cdap-spark-* archetypes.
  • CDAP-3007 - Fixed a problem with testing all Spark apps/guides to work with 3.1 (in dist mode).
  • CDAP-3009 - Fixed a problem with the stream conversion upgrade being in the upgrade tool in 3.1.
  • CDAP-3010 - Fixed a problem with a Bower Dependency Error.
  • CDAP-3011 - Fixed a problem with the IncrementSummingScannerTest failing intermittently.
  • CDAP-3012 - Fixed a problem with the DistributedWorkflowProgramRunner localizing the spark-assembly.jar even if the workflow does not contain a Spark program.
  • CDAP-3013 - Fixed a problem with excluding a Spark assembly jar when building a MapReduce job jar.
  • CDAP-3019 - Fixed a problem with the PartitionedFileSet dropPartition not deleting files under the partition.
  • CDAP-3021 - Fixed a problem with allowing Cloudfront data to use BatchFileFilter.
  • CDAP-3023 - Fixed a problem with flowlet instance count defaulting to 1.
  • CDAP-3024 - Fixed a problem with surfacing more logs in CDAP-UI for System Services.
  • CDAP-3026 - Fixed a problem with updating SparkPageRank example docs.
  • CDAP-3027 - Fixed a problem with the DFSStreamHeartbeatsTest failing on clusters.
  • CDAP-3030 - Fixed a problem with the loading of custom datasets being broken after upgrading.
  • CDAP-3031 - Fixed a problem with deploying an app with a dataset with an invalid base path returning an “internal error”.
  • CDAP-3037 - Fixed a problem with not being able to use a PartitionedFileSet in a custom dataset. If a custom dataset embedded a Table and a PartitionedFileSet, loading the dataset at runtime would fail.
  • CDAP-3038 - Fixed a problem with logs not showing up in UI when using Spark.
  • CDAP-3039 - Fixed a problem with worker not stopping at the end of a run method in standalone.
  • CDAP-3040 - Fixed a problem with flowlet and stream metrics not being available in distributed mode.
  • CDAP-3042 - Fixed a problem with the BufferingTable not merging buffered writes with multi-get results.
  • CDAP-3043 - Fixed a problem with the Javadocs being broken.
  • CDAP-3044 - Fixed a problem with the user service ‘methods’ field in service specifications being inaccurate.
  • CDAP-3058 - Fixed a problem with the NamespacedLocationFactory not appending correctly.
  • CDAP-3066 - Fixed a problem with FileBatchSource not failing properly.
  • CDAP-3067 - Fixed a problem with the UpgradeTool throwing a NullPointerException during UsageRegistry.upgrade().
  • CDAP-3070 - Fixed a problem on Ubuntu 14.10 where removing JSON files from templates/plugins/ETLBatch breaks adapters.
  • CDAP-3072 - Fixed a problem with a documentation Javascript bug.
  • CDAP-3073 - Fixed a problem with out-of-memory perm gen space.
  • CDAP-3085 - Fixed a problem with adding integration tests for datasets.
  • CDAP-3086 - Fixed a problem with the CDAP-UI current adapter UI.
  • CDAP-3087 - Fixed a problem with CDAP-UI: a session timeout on secure mode.
  • CDAP-3088 - Fixed a problem with CDAP-UI: application types need to be updated.
  • CDAP-3092 - Fixed a problem with reading multiple files with one mapper in FileBatchSource.
  • CDAP-3096 - Fixed a problem with running MapReduce on HDP 2.2.
  • CDAP-3098 - Fixed problems with the CDAP-UI Adapter UI.
  • CDAP-3099 - Fixed a problem with CDAP-UI settings icons shifting 2px when clicked.
  • CDAP-3104 - Fixed a problem with CDAP Explore throwing an exception if a Table dataset does not set schema.
  • CDAP-3105 - Fixed a problem with LogParserTransform needing to emit HTTP status code info.
  • CDAP-3106 - Fixed a problem with Hive query - local MapReduce task failure on CDH-5.4.
  • CDAP-3125 - Fixed a problem with the WorkerProgramRunnerTest failing intermittently.
  • CDAP-3127 - Fixed a problem with the Kafka guide not working with CDAP 3.1.0.
  • CDAP-3132 - Fixed a problem with the ProgramLifecycleHttpHandlerTest failing intermittently.
  • CDAP-3145 - Fixed a problem with the Metrics processor not processing metrics.
  • CDAP-3146 - Fixed a problem with the CDAP VM build failing to install the Eclipse plugin.
  • CDAP-3148 - Fixed a problem with CDAP Explore MapReduce queries failing due to MR-framework being localized in the mapper container.
  • CDAP-3149 - Fixed a problem with cycles in an adapter create page causing the browser to freeze.
  • CDAP-3151 - Fixed a problem with CDAP examples shipped with SDK using JDK 1.6.
  • CDAP-3161 - Fixed a problem with MapReduce no longer working with default Cloudera manager settings.
  • CDAP-3173 - Fixed a problem with upgrading to 3.1.0 crashing the HBase co-processor.
  • CDAP-3174 - Fixed a problem with the ETL source/transform/sinks descriptions and documentation.
  • CDAP-3175 - Fixed a problem with the AbstractFlowlet constructors being deprecated when they should not be.
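
The BufferingTable fix above (CDAP-3042) concerns reads that must see a transaction's own uncommitted writes. The following is a minimal sketch of that merge, not CDAP's implementation: the function name, the dictionary-based store, and the use of None to mark a buffered delete are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]  # column name -> value


def multi_get(keys: List[str],
              persisted: Dict[str, Row],
              buffered: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, Row]:
    """Read several rows, overlaying uncommitted buffered writes on stored data.

    Hypothetical helper: `persisted` stands in for the backing table and
    `buffered` for the in-memory write buffer.
    """
    results: Dict[str, Row] = {}
    for key in keys:
        # Copy the stored row so the backing store is never mutated by a read.
        row = dict(persisted.get(key, {}))
        for column, value in buffered.get(key, {}).items():
            if value is None:
                # Assumption: None marks a buffered delete of this column.
                row.pop(column, None)
            else:
                row[column] = value
        results[key] = row
    return results
```

Without the overlay step, a multi-get would return only the persisted values and miss writes made earlier in the same transaction.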

Deprecated and Removed Features

Known Issues

  • CDAP-2878 - There is a problem with confusing semantics for TTL. The Table TTL property is interpreted as milliseconds in some contexts: DatasetDefinition.configure() and getAdmin().
  • CDAP-2945 - There is a problem in the PartitionedFileSet causing MapReduce to fail if the input partition filter does not match any partitions.
  • CDAP-3000 - There is a problem with the Workflow token being in inconsistent state for nodes in a fork while the fork is still running. It becomes consistent after the join.
  • CDAP-3101 - There is a problem with workflow runs not being scheduled due to Quartz exceptions. The issue arises because there cannot be more than 30 concurrent runs of a workflow.
  • CDAP-3179 - If you are using CDH 5.3 (CDAP 3.0.0) and are upgrading to CDH 5.4 (CDAP 3.1.0), you must first upgrade the underlying HBase before you upgrade CDAP. That is, perform the CDH upgrade before upgrading CDAP.
  • CDAP-3189 - Large MapReduce jobs can cause excessive logging in the CDAP logs.
  • CDAP-3221 - There is a problem when running in Standalone mode: if an adapter is configured incorrectly such that it causes a MapReduce job to fail repeatedly, then the SDK hits an OOM exception due to perm gen. The Standalone needs restarting at this point.
  • See also the Known Issues of version 3.0.1.
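
The TTL ambiguity in CDAP-2878 matters because the same number means very different retention periods depending on the unit. The sketch below only illustrates the size of that gap; it assumes the other interpretation is seconds (the note does not say so), and both function names are hypothetical.

```python
from datetime import timedelta


def ttl_as_seconds(value: int) -> timedelta:
    """Interpret a TTL value as seconds (assumed alternative unit)."""
    return timedelta(seconds=value)


def ttl_as_milliseconds(value: int) -> timedelta:
    """Interpret a TTL value as milliseconds, as some contexts reportedly do."""
    return timedelta(milliseconds=value)


ttl = 86400  # intended by the author of the config as one day, in seconds
print("as seconds:     ", ttl_as_seconds(ttl))
print("as milliseconds:", ttl_as_milliseconds(ttl))
```

A value meant as one day becomes under a minute and a half when read as milliseconds, so it is worth checking which unit applies in each context where the property is read.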

Release 3.0.3

Bug Fix

  • Fixed a Bower dependency error in the CDAP UI (CDAP-3010).

Release 3.0.2

Bug Fixes

Release 3.0.1

New Features

  • In the CDAP UI, mandatory parameters for Application Template creation are marked with asterisks, and if a user tries to create a template without one of those parameters, the missing parameter is highlighted (CDAP-2499).

Improvements

Tools

CDAP UI

CDAP SDK VM

  • Added the Apache Flume agent flume-ng to the CDAP SDK VM (CDAP-2612).
  • Added the ability to copy and paste to the CDAP SDK VM (CDAP-2611).
  • Pre-downloaded the example dependencies into the CDAP SDK VM to speed building of the CDAP examples (CDAP-2613).

Bug Fixes

General

  • Fixed a problem with the HBase store and flows with multiple queues, where one queue name is a prefix of another queue name (CDAP-1996).
  • Fixed a problem with namespaces with underscores in the name crashing the Hadoop HBase region servers (CDAP-2110).
  • Removed the requirement to specify the JDBC driver class property twice in the adapter configuration for Database Sources and Sinks (CDAP-2453).
  • Fixed a problem in CDAP Distributed where the status of running program always returns as “STOPPED” when the CDAP Master is restarted (CDAP-2489).
  • Fixed a problem with invalid RunRecords for Spark and MapReduce programs that are run as part of a Workflow (CDAP-2490).
  • Fixed a problem with the CDAP Master not being HA (highly available) when a leadership change happens (CDAP-2495).
  • Fixed a problem with upgrading of queues with the UpgradeTool (CDAP-2502).
  • Fixed a problem with ObjectMappedTables not deleting missing fields when updating a row (CDAP-2523, CDAP-2524).
  • Fixed a problem with a stream not being created properly when deploying an application after the default namespace was deleted (CDAP-2537).
  • Fixed a problem with the Application Template Kafka Source not using the persisted offset when the Adapter is restarted (CDAP-2547).
  • A problem with CDAP using its own transaction snapshot codec, leading to huge snapshot files and OutOfMemory exceptions, and transaction snapshots that can’t be read using Tephra’s tools, has been resolved by replacing the codec with Tephra’s SnapshotCodecV3 (CDAP-2563, CDAP-2946, TEPHRA-101).
  • Fixed a problem with CDAP Master not being resilient in the handling of ZooKeeper exceptions (CDAP-2569).
  • Fixed a problem with RunRecords not being cleaned up correctly after certain exceptions (CDAP-2584).
  • Fixed a problem with the CDAP Maven archetype having an incorrect CDAP version in it (CDAP-2634).
  • Fixed a problem with the description of the TwitterSource not describing its output (CDAP-2648).
  • Fixed a problem with the Twitter Source not handling missing fields correctly and as a consequence producing tweets (with errors) that were then not stored on disk (CDAP-2653).
  • Fixed a problem with the TwitterSource not calculating the time of tweet correctly (CDAP-2656).
  • Fixed a problem with the JMS Real-time Source failing to load required plugin sources (CDAP-2661).
  • Fixed a problem with executing Hive queries on a distributed CDAP due to a failure to load Grok classes (CDAP-2678).
  • Fixed a problem with CDAP Program jars not being cleaned up from the temporary directory (CDAP-2698).
  • Fixed a problem with ProjectionTransforms not handling input data fields with null values correctly (CDAP-2719).
  • Fixed a problem with the CDAP SDK running out of memory when MapReduce jobs are run repeatedly (CDAP-2743).
  • Fixed a problem with not using CDAP RunIDs in the in-memory version of the CDAP SDK (CDAP-2769).
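
The queue-name prefix issue (CDAP-1996) belongs to a general class of bugs in key-prefix scans. The sketch below shows one common way such a collision can arise and one common remedy; it is an illustration under assumed key layouts, not a description of how CDAP stores queues or how the fix was made. The key format, the separator, and both function names are hypothetical.

```python
from typing import List


def make_key(queue: str, entry_id: int, sep: str = "") -> str:
    """Build a row key from a queue name and an entry id (hypothetical layout)."""
    return f"{queue}{sep}{entry_id:04d}"


def scan(keys: List[str], queue: str, sep: str = "") -> List[str]:
    """Return all keys that start with the queue's prefix."""
    prefix = queue + sep
    return [k for k in keys if k.startswith(prefix)]


# Undelimited keys: scanning "events" also picks up "events-retry" entries.
plain = [make_key("events", 1), make_key("events-retry", 1)]
print(scan(plain, "events"))

# Delimited keys: a separator that cannot occur in names ends the prefix.
delimited = [make_key("events", 1, "\x00"), make_key("events-retry", 1, "\x00")]
print(scan(delimited, "events", "\x00"))
```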

CDAP CLI

  • Fixed a problem with the CDAP CLI not printing an error if it is unable to connect to a CDAP instance (CDAP-2529).
  • Fixed a problem with extra whitespace in commands entered into the CDAP CLI causing errors (CDAP-2538).

CDAP SDK Standalone

  • Updated the messages displayed when starting the CDAP Standalone SDK as to components and the JVM required (CDAP-2445).
  • Fixed a problem with the creation of the default namespace upon starting the CDAP SDK (CDAP-2587).

CDAP SDK VM

  • Fixed a problem with using the default namespace on the CDAP SDK Virtual Machine Image (CDAP-2500).
  • Fixed a problem with the VirtualBox VM retaining a MAC address obtained from the build host (CDAP-2640).

CDAP UI

  • Fixed a problem with incorrect flow metrics showing in the CDAP UI (CDAP-2494).
  • Fixed a problem in the CDAP UI with the properties in the Projection Transform being displayed inconsistently (CDAP-2525).
  • Fixed a problem in the CDAP UI not automatically updating the number of flowlet instances (CDAP-2534).
  • Fixed a problem in the CDAP UI with a window resize preventing clicking of the Adapter Template drop down menu (CDAP-2573).
  • Fixed a problem with the CDAP UI not performing validation of mandatory parameters before the creation of an adapter (CDAP-2575).
  • Fixed a problem with an incorrect version of CDAP being shown in the CDAP UI (CDAP-2586).
  • Reduced the number of clicks required to navigate and perform actions within the CDAP UI (CDAP-2622, CDAP-2625).
  • Fixed a problem with an additional forward-slash character in the URL causing a “page not found error” in the CDAP UI (CDAP-2624).
  • Fixed a problem with the error dropdown of the CDAP UI not scrolling when it has a large number of errors (CDAP-2633).
  • Fixed a problem in the CDAP UI with the Twitter Source’s consumer key secret not being treated as a password field (CDAP-2649).
  • Fixed a problem with the CDAP UI attempting to create an adapter without a name (CDAP-2652).
  • Fixed a problem with the CDAP UI not being able to find the ETL plugin templates on distributed CDAP (CDAP-2655).
  • Fixed a problem with the CDAP UI’s System Dashboard chart having a y-axis starting at “-200” (CDAP-2699).
  • Fixed a problem with the rendering of stack trace logs in the CDAP UI (CDAP-2745).
  • Fixed a problem with the CDAP UI not working with secure CDAP instances, either clusters or standalone (CDAP-2770).
  • Fixed a problem with the coloring of completed runs of Workflow DAGs in the CDAP UI (CDAP-2781).

Documentation

  • Fixed errors with the documentation examples of the ETL Plugins (CDAP-2503).
  • Documented the licenses of all shipped CDAP UI components (CDAP-2582).
  • Corrected issues with the building of Javadocs used on the website and removed Javadocs previously included in the SDK (CDAP-2730).
  • Added a recommended version (v.12.0) of Node.js to the documentation (CDAP-2762).

API Changes

Deprecated and Removed Features

Known Issues

  • The application in the cdap-kafka-ingest-guide does not run on Ubuntu 14.x and CDAP 3.0.x (CDAP-2632, CDAP-2749).
  • Metrics for TimePartitionedFileSets can show zero values even if there is data present (CDAP-2721).
  • In the CDAP UI: many buttons will remain in focus after being clicked, even if they should not retain focus (CDAP-2785).
  • When the CDAP-Master dies, the CDAP UI does not respond appropriately, and instead of waiting for routing to the secondary master to begin, it loses its connection (CDAP-2830).
  • A workflow that is scheduled by time will not be run between the failure of the primary master and the time that the secondary takes over. This scheduled run will not be triggered at all. There are no warnings or messages about the missed run of the workflow. (CDAP-2831)
  • See also the Known Issues of version 3.0.0.

Release 3.0.0

New Features

New User Interface

  • Introduced a new UI, organized around namespaces and users.
  • Users can switch between namespaces.
  • Uses web sockets to retrieve updates from the backend.
  • Development Section
    • Introduces a UI for programs based on run-ids.
    • Users can view the logs and, in certain cases (flows), the flowlets of a program based on run-ids.
    • Shows a list of datasets and streams used by a program, and shows programs using a specific dataset and stream.
    • Shows the history of a program (list of runs).
    • Datasets or streams are explorable on a dataset/stream level or on a global level.
    • Shows program-level metrics under each program.
  • Operations section
    • Introduces an operations section to explore metrics.
    • Allows users to create custom dashboards with custom metrics.
    • Users can add different types of charts (line, bar, area, pie, donut, scatter, spline, area-spline, area-spline-stacked, area-stacked, step, table).
    • Users can add multiple metrics on a single dashboard, or on a single widget on a single dashboard.
    • Users can organize the widgets in a two-, three-, or four-column layout.
    • Users can toggle the frequency at which data is polled for a metric.
    • Users can toggle the resolution of data displayed in a graph.
  • Admin Section
    • Users can manage different objects of CDAP (applications, programs, datasets, and streams).
    • Users can create namespaces.
    • Through the Admin view, users can configure their preferences at the CDAP level, namespace level, or application level.
    • Users can manage the system services, applications, and streams through the Admin view.
  • Adapters
    • Users can create ETLBatch and ETLRealtime adapters from within the UI.
    • Users can choose from a list of plugins that comes by default with CDAP when creating an adapter.
    • Users can save an adapter as a draft, to be created at a later point-in-time.
    • Users can configure plugin properties with appropriate editors from within the UI when creating an adapter.
  • The Old CDAP Console has been deprecated.

Improvement

Bug Fixes

  • The CDAP Authentication server now reports the port correctly when the port is set to 0 (CDAP-614).
  • History of the programs running under workflow (Spark and MapReduce) is now updated correctly (CDAP-1293).
  • Programs running under a workflow now receive a unique run-id (CDAP-2025).
  • RunRecords are now updated with the RuntimeService to account for node failures (CDAP-2202).
  • MapReduce metrics are now available on a secure cluster (CDAP-64).

API Changes

  • The endpoint (POST '<base-url>/metrics/search?target=childContext[&context=<context>]') that searched for the available contexts of metrics has been deprecated, pending removal in a later version of CDAP (CDAP-1998). A replacement endpoint is available.
  • The endpoint (POST '<base-url>/metrics/search?target=metric&context=<context>') that searched for metrics in a specified context has been deprecated, pending removal in a later version of CDAP (CDAP-1998). A replacement endpoint is available.
  • The endpoint (POST '<base-url>/metrics/query?context=<context>[&groupBy=<tags>]&metric=<metric>&<time-range>') that queried for a metric has been deprecated, pending removal in a later version of CDAP (CDAP-1998). A replacement endpoint is available.
  • Metrics: The tag name for service handlers in previous releases was wrongly "runnable", and internally represented as "srn". These metrics are now tagged as "handler" ("hnd"), and metrics queries will only account for this tag name. If you need to query historic metrics that were emitted with the old tag "runnable", use "srn" to query them (instead of either "runnable" or "handler").
  • The CDAP CLI startup options have been changed to accommodate a new option of executing a file containing a series of CLI commands, line-by-line.
  • The metrics system APIs have been improved (CDAP-1596).
  • The rules for resolving resolution when using resolution=auto in metrics query have been changed (CDAP-1922).
  • Backward-incompatible changes were made to InputFormatProvider and OutputFormatProvider. They do not affect user code that uses FileSet or PartitionedFileSet; they affect only classes that implement InputFormatProvider or OutputFormatProvider:
    • InputFormatProvider.getInputFormatClass() is removed and replaced with InputFormatProvider.getInputFormatClassName();
    • OutputFormatProvider.getOutputFormatClass() is removed and replaced with OutputFormatProvider.getOutputFormatClassName().
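
When auditing client code for the deprecated metrics endpoints listed above, a small matcher over request method and URL can help. The following is a minimal sketch that follows only the three patterns quoted in these notes; the function name and the example base URL are hypothetical, and it makes no claim about the shape of the replacement endpoints.

```python
from urllib.parse import parse_qs, urlsplit


def is_deprecated_metrics_call(method: str, url: str) -> bool:
    """Return True if the request matches one of the deprecated patterns.

    Patterns (from the release notes):
      POST <base-url>/metrics/search?target=childContext[&context=...]
      POST <base-url>/metrics/search?target=metric&context=...
      POST <base-url>/metrics/query?context=...&metric=...
    """
    if method.upper() != "POST":
        return False
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    query = parse_qs(parts.query)
    if path.endswith("/metrics/query"):
        # The deprecated query form addresses metrics by 'context'.
        return "context" in query
    if path.endswith("/metrics/search"):
        target = query.get("target", [""])[0]
        if target == "childContext":
            return True
        return target == "metric" and "context" in query
    return False


# Hypothetical base URL, for illustration only.
print(is_deprecated_metrics_call(
    "POST", "http://example.com/base/metrics/search?target=childContext"))
```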

Deprecated and Removed Features

  • The File DropZone and File Tailer are no longer supported as of Release 3.0.
  • Support for procedures has been removed. After upgrading, an application that contained a procedure must be redeployed.
  • Support for service workers has been removed. After upgrading, an application that contained a service worker must be redeployed.
  • The old CDAP Console has been deprecated.
  • Support for JDK/JRE 1.6 (Java 6) has ended; JDK/JRE 1.7 (Java 7) is now required for CDAP or the CDAP SDK.

Known Issues

  • CDAP has been tested on and supports CDH 4.2.x through CDH 5.3.x, HDP 2.0 through 2.1, and Apache Bigtop 0.8.0. It has not been tested on more recent versions of CDH. See our Hadoop/HBase Environment configurations.
  • After upgrading CDAP from a pre-3.0 version, any unprocessed metrics data in Kafka will be lost, and WARN log messages will report the inability to process data in the old format.
  • See the above section (API Changes) for alterations that can affect existing installations.
  • See also the Known Issues of version 2.8.0.

Release 2.8.0

General

New Features

  • Command Line Interface (CLI)
    • CLI can now directly connect to a CDAP instance of your choice at startup by using cdap-cli.sh --uri <uri>.
    • Support for runtime arguments, which can be listed by running "cdap-cli.sh --help".
    • Table rendering can be configured using "cli render as <alt|csv>". The option "alt" is the default, with "csv" available for copy & pasting.
    • Stream statistics can be computed using "get stream-stats <stream-id>".
  • Datasets
    • Added an ObjectMappedTable dataset that maps object fields to table columns and that is also explorable.
    • Added a PartitionedFileSet dataset that allows addressing files by metadata and that is also explorable.
    • Table datasets now support a multi-get operation for batched reads.
    • Allow an unchecked dataset upgrade upon application deployment (CDAP-1574).
  • Metrics
    • Added new APIs for exploring available metrics, including drilling down into the context of emitted metrics
    • Added the ability to explore (search) all metrics; previously, this was restricted to custom user metrics
    • There are new APIs for querying metrics
    • New capability to break down a metrics time series using the values of one or more tags in its context
  • Namespaces
    • Applications and programs are now managed within namespaces.
    • Application logs are available within namespaces.
    • Metrics are now collected and queried within namespaces.
    • Datasets can now be created and managed within namespaces.
    • Streams are now namespaced for ingestion, fetching, and consuming by programs.
    • Explore operations are now namespaced.
  • Preferences
    • Users can store preferences (a property map) at the instance, namespace, application, or program level.
  • Spark
    • Spark now uses a configurer-style API for specifying (CDAP-382).
  • Workflows
    • Users can schedule a workflow based on increments of data being ingested into a stream.
    • Workflows can be stopped.
    • The execution of a workflow can be forked into parallelized branches.
    • The runtime arguments for a workflow can be scoped.
  • Workers
    • Added Worker, a new program type that can be added to CDAP applications, used to run background processes and (beta feature) can write to streams through the WorkerContext.
  • Upgrade and Data Migration Tool
    • Added an automated upgrade tool which supports upgrading from 2.6.x to 2.8.0. (Note: Apps need to be both recompiled and re-deployed.) Upgrade from 2.7.x to 2.8.0 is not currently supported. If you have a use case for it, please reach out to us at cdap-user@googlegroups.com.
    • Added a metric migration tool which migrates old metrics to the new 2.8 format.
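
The Preferences feature above stores property maps at four levels. One natural way to combine them is for the more specific level to override the more general one; that precedence is an assumption here (these notes do not state it), and the function and level names below are hypothetical.

```python
from typing import Dict

# Ordered from most general to most specific (levels named in the notes).
LEVELS = ("instance", "namespace", "application", "program")


def resolve_preferences(prefs_by_level: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge per-level property maps; later (more specific) levels win.

    Assumption: specific levels override general ones.
    """
    resolved: Dict[str, str] = {}
    for level in LEVELS:
        resolved.update(prefs_by_level.get(level, {}))
    return resolved


prefs = {
    "instance": {"log.level": "INFO", "region": "us"},
    "application": {"log.level": "DEBUG"},
}
print(resolve_preferences(prefs))
```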

Improvement

  • Improved flow performance and scalability with a new distributed queue implementation.

API Changes

  • The endpoint (GET <base-url>/data/explore/datasets/<dataset-name>/schema) that retrieved the schema of a dataset’s underlying Hive table has been removed (CDAP-1603).
  • Endpoints have been added to retrieve the CDAP version and the current configurations of CDAP and HBase (Configuration HTTP RESTful API).

Known Issues

  • See also the Known Issues of version 2.7.1.

  • If the Hive Metastore is restarted while the CDAP Explore Service is running, the Explore Service remains alive, but becomes unusable. To correct, restart the CDAP Master, which will restart all services (CDAP-1007).

  • User datasets with names starting with "system" can potentially cause conflicts (CDAP-1587).

  • Scaling the number of metrics processor instances doesn’t automatically distribute the processing load to the newer instances of the metrics processor. The CDAP Master needs to be restarted to effectively distribute the processing across all metrics processor instances (CDAP-1853).

  • Creating a dataset in a non-existent namespace manifests in the RESTful API with an incorrect error message (CDAP-1864).

  • Retrieving multiple metrics—by issuing an HTTP POST request with a JSON list as the request body that enumerates the name and attributes for each metric—is currently not supported in the Metrics HTTP RESTful API v3. Instead, use the v2 API. It will be supported in a future release.

  • Typically, datasets are bundled as part of applications. When an application is upgraded and redeployed, any changes in datasets will not be redeployed. This is because datasets can be shared across applications, and an incompatible schema change can break other applications that are using the dataset. A workaround (CDAP-1253) is to allow unchecked dataset upgrades. Upgrades cause the dataset metadata, i.e. its specification including properties, to be updated. The dataset runtime code is also updated. To prevent data loss the existing data and the underlying HBase tables remain as-is.

    You can allow unchecked dataset upgrades by setting the configuration property dataset.unchecked.upgrade to true in cdap-site.xml. This will ensure that datasets are upgraded when the application is redeployed. When this configuration is set, the recommended process to deploy an upgraded dataset is to first stop all applications that are using the dataset before deploying the new version of the application. This lets all containers (flows, services, etc.) pick up the new dataset changes. When datasets are upgraded using dataset.unchecked.upgrade, no schema compatibility checks are performed by the system. Hence it is very important that the developer verifies backward-compatibility and makes sure that other applications that are using the dataset can work with the new changes.
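
As a rough illustration of the setting described above, the following sketch generates the property entry for dataset.unchecked.upgrade. It assumes cdap-site.xml uses the Hadoop-style property layout (a property element with name and value children); confirm this against your own cdap-site.xml before copying the output. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def property_element(name: str, value: str) -> ET.Element:
    """Build a <property> element (assumed Hadoop-style configuration layout)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


fragment = ET.tostring(
    property_element("dataset.unchecked.upgrade", "true"),
    encoding="unicode",
)
print(fragment)
```

The printed fragment would be placed inside the top-level element of cdap-site.xml, again assuming that layout.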

Release 2.7.1

API Changes

  • The property security.auth.server.address has been deprecated and replaced with security.auth.server.bind.address (CDAP-639, CDAP-1078).
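
For installations that keep their settings in a key-value form, the rename above can be applied mechanically. This is a minimal sketch: only the two property names come from the notes; the function name, the dictionary representation, and the example address value are hypothetical.

```python
from typing import Dict

OLD_KEY = "security.auth.server.address"
NEW_KEY = "security.auth.server.bind.address"


def migrate_auth_address(props: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of props with the deprecated key moved to its replacement.

    If both keys are present, the value already set under the new key is kept.
    """
    migrated = dict(props)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


# Hypothetical value, for illustration only.
print(migrate_auth_address({OLD_KEY: "0.0.0.0"}))
```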

New Features

  • Spark
    • Spark now uses a configurer-style API for specifying (CDAP-382).
    • Spark can now run as a part of a workflow (CDAP-465).
  • Security
    • CDAP Master now obtains and refreshes Kerberos tickets programmatically (CDAP-1134).
  • Datasets
    • A new, experimental dataset type to support time-partitioned File sets has been added.
    • Time-partitioned File sets can be queried with Impala on CDH distributions (CDAP-926).
    • Streams can be made queryable with Impala by deploying an adapter that periodically converts it into partitions of a time-partitioned File set (CDAP-1129).
    • Support for different levels of conflict detection: ROW, COLUMN, or NONE (CDAP-1016).
    • Removed support for @DisableTransaction (CDAP-1279).
    • Support for annotating a stream with a schema (CDAP-606).
    • A new API for uploading entire files to a stream has been added (CDAP-411).
  • Workflow
    • Workflows are now specified using a configurer-style API (CDAP-1207).
    • Multiple instances of a workflow can be run concurrently (CDAP-513).
    • Programs are no longer part of a workflow; instead, they are added in the application and are referenced by a workflow using their names (CDAP-1116).
    • Schedules are now at the application level and properties can be specified for Schedules; these properties will be passed to the scheduled program as runtime arguments (CDAP-1148).
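
The conflict detection levels listed under Datasets (ROW, COLUMN, or NONE) can be illustrated with a small toy model. This is only a sketch of the general idea, not CDAP code: the class ConflictLevels and its methods are hypothetical names, and the model assumes that each write contributes keys to a transaction's change set and that two concurrent transactions conflict when their change sets overlap.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Toy model (not CDAP code): how the detection level shapes a transaction's change set. */
final class ConflictLevels {

    enum Level { ROW, COLUMN, NONE }

    /** Returns the keys that one write contributes to the change set at the given level. */
    static Set<String> changeKeys(Level level, String row, String... columns) {
        Set<String> keys = new HashSet<>();
        switch (level) {
            case ROW:
                keys.add(row); // one key per row touched
                break;
            case COLUMN:
                for (String column : columns) {
                    keys.add(row + ":" + column); // one key per cell touched
                }
                break;
            case NONE:
                break; // nothing is recorded, so nothing can conflict
        }
        return keys;
    }

    /** Two concurrent transactions conflict when their change sets overlap. */
    static boolean conflicts(Set<String> first, Set<String> second) {
        return !Collections.disjoint(first, second);
    }

    public static void main(String[] args) {
        // Two transactions write different columns of the same row.
        for (Level level : Level.values()) {
            Set<String> tx1 = changeKeys(level, "user42", "name");
            Set<String> tx2 = changeKeys(level, "user42", "email");
            System.out.println(level + ": conflict = " + conflicts(tx1, tx2));
        }
    }
}
```

In this model, writes to different columns of the same row conflict only at the ROW level, which is the trade-off between stricter isolation and fewer aborted transactions.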

Known Issues

  • See also the Known Issues of version 2.6.1.
  • When upgrading an existing CDAP installation to 2.7.1, all metrics are reset.

Release 2.6.1

CDAP Bug Fixes

  • Allow an unchecked dataset upgrade upon application deployment (CDAP-1253).
  • Update the Hive dataset table when a dataset is updated (CDAP-71).
  • Use Hadoop configuration files bundled with the Explore Service (CDAP-1250).

Known Issues

  • See also the Known Issues of version 2.6.0.

  • Typically, datasets are bundled as part of applications. When an application is upgraded and redeployed, changes to its datasets are not redeployed. This is because datasets can be shared across applications, and an incompatible schema change can break other applications that use the dataset. A workaround (CDAP-1253) is to allow unchecked dataset upgrades. An upgrade updates the dataset metadata (its specification, including properties) and the dataset runtime code. To prevent data loss, the existing data and the underlying HBase tables remain as-is.

    You can allow unchecked dataset upgrades by setting the configuration property dataset.unchecked.upgrade to true in cdap-site.xml. This ensures that datasets are upgraded when the application is redeployed. When this property is set, the recommended process for deploying an upgraded dataset is to first stop all applications that use the dataset, and then deploy the new version of the application. This lets all containers (flows, services, etc.) pick up the dataset changes. When datasets are upgraded using dataset.unchecked.upgrade, the system performs no schema compatibility checks. It is therefore very important that the developer verifies backward compatibility and makes sure that other applications using the dataset can work with the changes.

Release 2.6.0

API Changes

  • The API for specifying services and MapReduce programs now uses a “configurer” style; because the interfaces have changed, user classes implementing either MapReduce or service must be modified (CDAP-335).

New Features

  • General
    • Health checks are now available for CDAP system services (CDAP-663).
  • Applications
    • Jar deployment now uses a chunked request and writes to a local temp file (CDAP-91).
  • MapReduce
    • MapReduce programs can now read binary stream data (CDAP-331).
  • Datasets
    • Added FileSet, a new core dataset type for working with sets of files (CDAP-1).
  • Spark
    • Spark programs now emit system and custom user metrics (CDAP-346).
    • Services can be called from Spark programs and their worker nodes (CDAP-348).
    • Spark programs can now read from streams (CDAP-403).
    • Added Spark support to the CDAP CLI (Command-line Interface) (CDAP-425).
    • Improved speed of Spark unit tests (CDAP-600).
    • Spark programs now display system metrics in the CDAP Console (CDAP-652).
  • Procedures
    • Procedures have been deprecated in favor of services (CDAP-413).
  • Services
    • Added an HTTP endpoint that returns the endpoints a particular service exposes (CDAP-412).
    • Added an HTTP endpoint that lists all services (CDAP-469).
    • Default metrics for services have been added to the CDAP Console (CDAP-512).
    • The annotations @QueryParam and @DefaultValue are now supported in custom service handlers (CDAP-664).
  • Metrics
    • System and user metrics now support gauge metrics (CDAP-484).
    • Metrics can be queried using a program’s run-ID (CDAP-620).
  • Documentation

CDAP Bug Fixes

  • Fixed a problem with readless increments not being used when they were enabled in a dataset (CDAP-383).
  • Fixed a problem where applications whose Spark or Scala user classes did not extend either JavaSparkProgram or ScalaSparkProgram failed with a class loading error (CDAP-599).
  • Fixed a problem with the CDAP upgrade tool not preserving, during an upgrade, the coprocessor configuration of tables with readless increments enabled (CDAP-1044).
  • Fixed a problem with the readless increment implementation dropping increment cells when a region flush or compaction occurred (CDAP-1062).

Known Issues

  • When running secure Hadoop clusters, metrics and debug logs from MapReduce programs are not available (CDAP-64 and CDAP-797).

  • When upgrading a cluster from an earlier version of CDAP, warning messages may appear in the master log indicating that in-transit (emitted, but not yet processed) metrics system messages could not be decoded (Failed to decode message to MetricsRecord). This is because of a change in the format of emitted metrics, and can result in a small number of metrics data points being lost (CDAP-745).

  • Writing to datasets through Hive is not supported in CDH4.x (CDAP-988).

  • A race condition resulting in a deadlock can occur when a TwillRunnable container shuts down while it still has ZooKeeper events to process. This occasionally surfaces when running with OpenJDK or JDK7, though not with Oracle JDK6. It is caused by a change in the ThreadPoolExecutor implementation between Oracle JDK6 and OpenJDK/JDK7. Until Twill is updated in a future version of CDAP, a workaround is to kill the errant process. The YARN command to list all running applications and their app-ids is:

    yarn application -list -appStates RUNNING
    

    The command to kill a process is:

    yarn application -kill <app-id>
    

    All versions of CDAP running Twill version 0.4.0 with this configuration can exhibit this problem (TWILL-110).

Release 2.5.2

CDAP Bug Fixes

  • Fixed a problem with a Coopr-provisioned secure cluster failing to start due to a classpath issue (CDAP-478).
  • Fixed a problem with the WISE app zip distribution not being packaged correctly; a new version (0.2.1) has been released (CDAP-533).
  • Fixed a problem with the examples and tests incorrectly using the ByteBuffer.array method when reading a stream event (CDAP-549).
  • Fixed a problem with the Authentication Server so that it can now communicate with an LDAP instance over SSL (CDAP-556).
  • Fixed a problem with the program class loader to allow applications to use a different version of a library than the one that the CDAP platform uses; for example, a different Kafka library (CDAP-559).
  • Fixed a problem with CDAP master not obtaining new delegation tokens after running for hbase.auth.key.update.interval milliseconds (CDAP-562).
  • Fixed a problem with the transaction not being rolled back when a user service handler throws an exception (CDAP-607).
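
The ByteBuffer.array fix (CDAP-549) concerns a general Java pitfall: array() returns the entire backing array and ignores the buffer's position, limit, and array offset, so it can expose bytes that are not part of the event body. The sketch below shows one way to read only the body bytes. It assumes the event body arrives as a java.nio.ByteBuffer; the class StreamBodyReader and its methods are hypothetical helpers, not part of the CDAP API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch: reading a stream event body without relying on ByteBuffer.array(). */
final class StreamBodyReader {

    /** Decodes only the bytes between position and limit, leaving the caller's buffer untouched. */
    static String bodyAsString(ByteBuffer body) {
        // duplicate() shares the content but has its own position and limit,
        // so decoding does not consume the original buffer.
        return StandardCharsets.UTF_8.decode(body.duplicate()).toString();
    }

    /** Copies the remaining bytes into a new array of exactly the right size. */
    static byte[] bodyAsBytes(ByteBuffer body) {
        ByteBuffer view = body.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] backing = "[header]payload[trailer]".getBytes(StandardCharsets.UTF_8);
        // A buffer that exposes only "payload" out of a larger backing array.
        ByteBuffer body = ByteBuffer.wrap(backing, 8, 7).slice();

        // array() hands back the whole backing array, including header and trailer.
        System.out.println(new String(body.array(), StandardCharsets.UTF_8));
        // The helper decodes only the bytes that belong to the body.
        System.out.println(bodyAsString(body));
    }
}
```

Working on a duplicate keeps the caller's position and limit intact, so the same body can be read more than once.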

Other Changes

  • Improved the CDAP documentation:
    • Re-organized the documentation into three manuals—Developers’ Manual, Administration Manual, Reference Manual—and a set of examples, how-to guides and tutorials;
    • Documents are now in smaller chapters, with numerous updates and revisions;
    • Added a link for downloading an archive of the documentation for offline use;
    • Added links to examples relevant to a particular component;
    • Added suggested deployment architectures for Distributed CDAP installations;
    • Added a glossary;
    • Added navigation aids at the bottom of each page; and
    • Tested and updated the Standalone CDAP examples and their documentation.

Known Issues

  • Currently, applications that use Spark or Scala classes in user classes that do not extend either JavaSparkProgram or ScalaSparkProgram (depending upon the language) fail with a class loading error. Spark or Scala classes should not be used outside of the Spark program (CDAP-599).
  • See also the Known Issues of version 2.5.0.
  • See also the TWILL-110 Known Issue of version 2.6.0.

Release 2.5.1

CDAP Bug Fixes

  • Improved the documentation of the CDAP authentication and stream clients, both Java and Python APIs.
  • Fixed problems with the CDAP Command Line Interface (CLI):
    • Did not work in non-interactive mode;
    • Printed excessive debug log messages;
    • Relative paths did not work as expected; and
    • Failed to execute SQL queries.
  • Removed dependencies on SNAPSHOT artifacts for netty-http and auth-clients.
  • Corrected an error in the message printed by the startup script cdap.sh.
  • Resolved a problem where the CDAP Flume Client of the CDAP Ingest library read the properties file without first checking whether authentication was enabled.

Other Changes

  • The scripts send-query.sh, access-token.sh and access-token.bat have been replaced by the CDAP Command Line Interface, cdap-cli.sh.
  • The CDAP Command Line Interface now uses and saves access tokens when connecting to a secure CDAP instance.
  • The CDAP Java Stream Client now allows empty String events to be sent.
  • The CDAP Python Authentication Client’s configure() method now takes a dictionary rather than a filepath.

Known Issues

Release 2.5.0

New Features

Ad-hoc querying

  • Capability to write to datasets using SQL
  • Added a CDAP JDBC driver allowing connections from Java applications and third-party business intelligence tools
  • Ability to perform ad-hoc queries from the CDAP Console:
    • Execute a SQL query from the Console
    • View list of active, completed queries
    • Download query results

Datasets

  • Datasets can be tested with TestBase outside of the context of an application
  • CDAP now checks datasets for compatibility in a verification stage
  • The Transaction engine uses server-side filtering for efficient transactional reads
  • Dataset specifications can now be dynamically reconfigured through the use of RESTful endpoints
  • The Bundle jar format is now used for dataset libs
  • Increments on datasets are now read-less
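
These notes do not describe how read-less increments are stored, so the following is only a toy model of the general idea: an increment is recorded as a delta without reading the current value, and the deltas are summed when the value is read or merged. The class ReadlessCounter and its methods are hypothetical and are not CDAP code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not CDAP code) of a read-less increment on a single counter cell. */
final class ReadlessCounter {
    private long base; // last fully merged value
    private final List<Long> deltas = new ArrayList<>(); // increments not yet merged

    /** Records the increment as a delta; the current value is never read. */
    void increment(long amount) {
        deltas.add(amount);
    }

    /** Resolves the value at read time: the merged base plus all outstanding deltas. */
    long read() {
        long total = base;
        for (long delta : deltas) {
            total += delta;
        }
        return total;
    }

    /** Merges outstanding deltas into the base, without changing the visible value. */
    void compact() {
        base = read();
        deltas.clear();
    }

    /** Number of deltas that have not been merged yet. */
    int pendingDeltas() {
        return deltas.size();
    }

    public static void main(String[] args) {
        ReadlessCounter counter = new ReadlessCounter();
        counter.increment(5);
        counter.increment(3);
        System.out.println("before compaction: " + counter.read());
        counter.compact();
        counter.increment(2);
        System.out.println("after compaction and one more increment: " + counter.read());
    }
}
```

The design moves work from the write path to the read path: writes avoid a read-modify-write round trip, while a merge step must preserve every outstanding delta.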

Services

  • Added simplified APIs for using services from other programs such as MapReduce, flows and Procedures
  • Added an API for creating services and handlers that can use datasets transactionally
  • Added a RESTful API to make requests to a service via the Router

Security

  • Added authorization logging
  • Added Kerberos authentication to ZooKeeper secret keys
  • Added support for SSL

Spark Integration

  • Supports running Spark programs as a part of CDAP applications in Standalone mode
  • Supports running Spark programs written with Spark versions 1.0.1 or 1.1.0
  • Supports Spark’s MLlib and GraphX modules
  • Includes three examples demonstrating CDAP Spark programs
  • Adds display of Spark program logs and history in the CDAP Console

Streams

  • Added a collection of applications, tools and APIs specifically for the ETL (Extract, Transform and Load) of data
  • Added support for asynchronously writing to streams

Clients

  • Added a Command Line Interface
  • Added a Java Client Interface

Major CDAP Bug Fixes

  • Fixed a problem with a HADOOP_HOME exception stacktrace when unit-testing an application
  • Fixed an issue with Hive creating directories in /tmp in the Standalone and unit-test frameworks
  • Fixed a problem with type inconsistency of service API calls, where numbers were showing up as strings
  • Fixed an issue with the premature expiration of long-term Authentication Tokens
  • Fixed an issue with the dataset size metric showing data operations size instead of resource usage

Known Issues

  • Metrics for MapReduce programs aren’t populated on secure Hadoop clusters
  • The metric for the number of cores shown in the Resources view of the CDAP Console will be zero unless YARN has been configured to enable virtual cores
  • See also the TWILL-110 Known Issue of version 2.6.0.