Appendix: cdap-site.xml, cdap-default.xml

The cdap-site.xml file is the configuration file for a CDAP installation. Its properties and values determine the settings used by CDAP when starting and operating.

Any property not defined in an installation's cdap-site.xml takes the default value defined in the file cdap-default.xml. That file is located in the CDAP JARs and should not be altered.

Any of the default values (with the exception of those marked [Final]) can be overridden by defining a new value in the cdap-site.xml file, located (by default) either in <CDAP-SDK-HOME>/conf/cdap-site.xml (Standalone mode) or /etc/cdap/conf/cdap-site.xml (Distributed mode).
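
As a minimal sketch of such an override, assuming the Hadoop-style configuration/property/name/value layout of CDAP site files, a cdap-site.xml that changes two of the defaults listed in this appendix might look like the following. The chosen values are illustrative only, not recommendations.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Start the Explore service on the first query (default: false). -->
  <property>
    <name>explore.start.on.demand</name>
    <value>true</value>
  </property>

  <!-- Allow programs more time to start (default: 300 seconds). -->
  <property>
    <name>app.program.max.start.seconds</name>
    <value>600</value>
  </property>
</configuration>
```

Properties that are left out of the file keep the defaults from cdap-default.xml.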

The sections below list the parameters that can be defined in the cdap-site.xml file, their default values (obtained from cdap-default.xml), and their descriptions.

Notes

  • [Final]: The value of a property marked as [Final] cannot be changed, even with a setting in the cdap-site.xml.
  • Kafka Server: All properties that begin with kafka.server. are passed to the CDAP Kafka service when it is started up.
  • Security: For information on configuring the cdap-site.xml file, its security section, and CDAP for security, see the Security section of the documentation.

General

Parameter Name Default Value Description
cluster.name   A cluster-based name for CDAP. It is used for scope resolution of preferences and runtime arguments. For example: the preference key "cluster.[cluster.name].my.key" would be resolved to "my.key" at runtime; a program can then retrieve the preference value by using just "my.key". The administrator can use this property to set different preferences for each cluster.
hdfs.lib.dir ${hdfs.namespace}/lib Common directory in HDFS for, among others, JAR files for coprocessors
hdfs.namespace /${root.namespace} Root directory for HDFS files written by CDAP
hdfs.user yarn User name for accessing HDFS
instance.name ${root.namespace} Determines a unique identifier for a CDAP instance. It is used for providing authorization to a particular CDAP instance. Must be alphanumeric, and should not be changed after CDAP has been started. If it is changed, there is a risk of losing data (for example, authorization policies).
local.data.dir data Data directory for Standalone CDAP and the CDAP Master process in Distributed CDAP
mapreduce.include.custom.format.classes true Indicates whether to include custom input/output format classes in the job.jar or not; if set to true, custom format classes will be added to the job.jar and available as part of the MapReduce system classpath
mapreduce.jobclient.connect.max.retries 2 Indicates the maximum number of retries the JobClient will make to establish a service connection when retrieving job status and history
master.manage.hbase.coprocessors true Whether CDAP Master should manage HBase coprocessors. This should only be set to false if you are managing coprocessors yourself in order to support rolling HBase upgrades.
master.startup.checks.classes   Comma-separated list of classnames for checks that will be run before the CDAP Master starts up. If any of the checks fails, the CDAP Master will not start up. Checks will only be run if ${master.startup.checks.enabled} is set to true.
master.startup.checks.enabled true Whether checks should be run before startup to determine if the CDAP Master can be run correctly. Which checks are run is determined by the ${master.startup.checks.packages} and ${master.startup.checks.classes} settings. If any checks fail, the CDAP Master will fail to start instead of waiting for the problem to be fixed. This setting only affects Distributed CDAP. It does not apply to Standalone CDAP.
master.startup.checks.packages co.cask.cdap.master.startup,co.cask.cdap.data.startup Comma-separated list of packages containing checks that will be run before the CDAP Master starts up. If any of the checks fails, the CDAP Master will not start up. Checks will only be run if ${master.startup.checks.enabled} is set to true.
namespaces.dir namespaces The sub-directory of ${hdfs.namespace} in which namespaces are stored
root.namespace cdap Root for this CDAP instance; used as the parent (or root) node for ZooKeeper, as the directory under which all CDAP data and metadata is stored in HDFS, and as the prefix for all HBase tables created by CDAP; must be composed of alphanumeric characters
thrift.max.read.buffer 16777216 Specifies the maximum read buffer size in bytes used by the Thrift service; value should be set to greater than the maximum frame sent on the RPC channel
twill.java.heap.memory.ratio 0.7 The minimum ratio of heap to non-heap memory for Apache Twill container
twill.java.reserved.memory.mb 250 Reserved non-heap memory in megabytes for Apache Twill container
twill.jvm.gc.opts -verbose:gc -Xloggc:&lt;LOG_DIR&gt;/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M Java garbage collection options for all Apache Twill containers; "&lt;LOG_DIR&gt;" is the location of the log directory in the container; note that the special characters are replaced with entity equivalents so they can be included in the XML
twill.location.cache.dir .cache The relative directory name on the distributed file system for Apache Twill to cache generated files, to speed up launching applications. This directory is relative to ${root.namespace}/twill on the file system.
twill.no.container.timeout 120000 Duration in milliseconds to wait for at least one container for Apache Twill runnable
twill.yarn.am.memory.mb 512 The memory size in megabytes of the Apache Twill application master container
twill.yarn.am.reserved.memory.mb ${twill.java.reserved.memory.mb} Reserved non-heap memory in megabytes for Apache Twill application master container
twill.zookeeper.namespace /twill ZooKeeper namespace prefix for Apache Twill
upgrade.thread.pool.size 10 Number of threads to be used running operations concurrently during a CDAP upgrade
zookeeper.client.startup.timeout.millis 60000 Duration in milliseconds to wait for a successful connection to a server in the ZooKeeper quorum
zookeeper.quorum 127.0.0.1:2181/${root.namespace} ZooKeeper quorum string; specifies the ZooKeeper host:port; substitute the quorum (FQDN1:2181,FQDN2:2181,...) for the components shown here
zookeeper.session.timeout.millis 40000 ZooKeeper session timeout in milliseconds
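
For a Distributed CDAP installation, the fragment below sketches two entries from this group: the ZooKeeper quorum, and a JVM option value that contains angle brackets. FQDN1 and FQDN2 are the placeholder host names used in the zookeeper.quorum description, not real hosts, and the garbage collection options repeat only a subset of the defaults to show the entity escaping.

```xml
<configuration>
  <property>
    <name>zookeeper.quorum</name>
    <!-- Placeholder hosts; replace with the FQDNs of your ZooKeeper servers. -->
    <value>FQDN1:2181,FQDN2:2181/${root.namespace}</value>
  </property>

  <property>
    <name>twill.jvm.gc.opts</name>
    <!-- The angle brackets around LOG_DIR are written as entities
         so that the file remains well-formed XML. -->
    <value>-verbose:gc -Xloggc:&lt;LOG_DIR&gt;/gc.log -XX:+PrintGCDetails</value>
  </property>
</configuration>
```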

Global

Parameter Name Default Value Description
dataset.unchecked.upgrade false If false, any changes made to existing datasets are not deployed when an app is redeployed; setting this value to true allows the dataset changes to be deployed upon app redeployment
enable.unrecoverable.reset false Determines if resetting CDAP should be enabled. WARNING: Enabling this option makes it possible to delete all applications and data; NO RECOVERY IS POSSIBLE!

Applications

Parameter Name Default Value Description
app.artifact.dir /opt/cdap/master/artifacts Semicolon-separated list of local directories scanned for system artifacts to add to the artifact repository
app.bind.port 0 App Fabric service bind port; if 0, binds to a random port
app.meta.upgrade.timeout.secs 60 Timeout value in seconds while upgrading application versions
app.output.dir /programs Directory where all archives are stored
app.program.extra.classpath   Additional Java classpath for CDAP programs. These extra classpaths must be present on all nodes in the cluster. Supports wildcard suffix "*" to include all JAR files under a directory.
app.program.jvm.opts -XX:MaxPermSize=128M ${twill.jvm.gc.opts} Java options for all program containers
app.program.max.start.seconds 300 Maximum number of seconds to wait for a program to start before killing it
app.program.max.stop.seconds 300 Maximum number of seconds to wait for a program to stop before killing it
app.program.runid.corrector.interval 180 Interval in seconds of how often the run id corrector thread will run; this value should be greater than 0
app.program.runtime.extensions.dir /opt/cdap/master/ext/runtimes Semicolon-separated list of local directories that are scanned for program runtime extensions
app.program.spark.yarn.client.rewrite.enabled true Specifies whether to rewrite the YARN 'Client.scala' class in Spark to work around issue SPARK-13441 in CDH clusters
app.ssl.bind.port 30443 App Fabric service bind port for HTTPS
app.temp.dir /tmp Temp directory
apps.scheduler.queue   Scheduler queue for CDAP programs and CDAP Explore queries
app.deploy.update.schedules true If true, redeploying an application will modify any schedules that currently exist for the application; if false, redeploying an application does not create any new schedules and existing schedules are neither deleted nor updated. This property only affects the redeployment of an application; all related actions or endpoints are unaffected.
master.services.bind.address 0.0.0.0 Bind address for app fabric service and dataset service
program.container.dist.jars   Additional JAR files to be localized to every program container and added to the classpaths of CDAP programs. They can be local file paths on the CDAP Master or URIs of remote files. Multiple JAR files are comma-separated.
scheduler.max.thread.pool.size 100 Size of the scheduler thread pool
scheduler.misfire.threshold.ms 60000 The number of milliseconds by which a schedule execution can miss its next-fire-time and still run
workflow.token.max.size.mb 30 Maximum allowed size in megabytes of a workflow token; if the workflow token exceeds this size, no further updates are allowed

Audit

Parameter Name Default Value Description
audit.enabled true Determines whether to publish audit messages
audit.publish.timeout.ms 2000 Audit message publishing timeout in milliseconds
audit.topic audit Topic name used to publish audit messages in the messaging system

Datasets

Parameter Name Default Value Description
data.local.storage ${local.data.dir}/ldb Database directory for LevelDB, used for data fabric in Standalone CDAP
data.local.storage.blocksize 1024 Block size in bytes for data fabric when in Standalone CDAP
data.local.storage.cachesize 104857600 Cache size in bytes for data fabric when in Standalone CDAP
data.tx.bind.address 0.0.0.0 Transaction service bind address
data.tx.bind.port 0 Transaction service bind port; if 0, binds to a random port
data.tx.client.count 50 The number of pooled instances of the transaction client; increase this to increase transaction concurrency
data.tx.client.provider pool Provider strategy for transaction clients; valid values are "pool" and "thread-local"
data.tx.discovery.service.name transaction Name in discovery service for the transaction service
data.tx.hdfs.user ${hdfs.user} User name for accessing HDFS (if not running in secure HDFS)
data.tx.janitor.enable true Determines if the TransactionDataJanitor coprocessor is enabled on tables; normally should be true
data.tx.max.instances ${master.service.max.instances} Maximum number of transaction service instances
data.tx.max.timeout 600 The limit for the allowed transaction timeout, in seconds. Attempts to start a transaction with a longer timeout will fail.
data.tx.memory.mb ${master.service.memory.mb} Memory in megabytes for each transaction service instance
data.tx.num.cores ${master.service.num.cores} Number of virtual cores for the transaction service
data.tx.num.instances 1 Requested number of transaction service instances
data.tx.prune.enable false Enable invalid transaction list pruning
data.tx.prune.plugins data.tx.pruning.plugin List of transaction pruning plugins; for CDAP HBase tables that use transaction functionality to skip or clean invalid data
data.tx.prune.state.table ${dataset.table.prefix}_system:tephra.state Table used to store intermediate state when invalid transaction list pruning is enabled
data.tx.pruning.plugin.class co.cask.data2.txprune.DefaultHBaseTransactionPruningPlugin Class name for the default transaction pruning plugin
data.tx.server.io.threads 2 Number of IO threads for the transaction service
data.tx.server.threads 25 Number of threads for the transaction service
data.tx.snapshot.codecs org.apache.tephra.snapshot.SnapshotCodecV3,org.apache.tephra.snapshot.SnapshotCodecV4 Specifies the class names of all supported transaction state codecs
data.tx.snapshot.dir ${hdfs.namespace}/tx.snapshot Directory in HDFS used to store snapshots and logs of transaction state
data.tx.snapshot.interval 60 Frequency of transaction snapshots in seconds
data.tx.snapshot.local.dir ${local.data.dir}/tx.snapshot Storage directory on the local filesystem of snapshot and logs of transaction state when in Standalone CDAP
data.tx.snapshot.retain 10 Number of transaction snapshot files to retain as backups
data.tx.thrift.max.read.buffer ${thrift.max.read.buffer} Maximum read buffer size in bytes used by the transaction service; the value should be set to something greater than the maximum frame sent on the RPC channel
data.tx.timeout 30 Timeout value in seconds for a transaction; if the transaction is not finished in that time, it is marked invalid
dataset.data.dir data Base directory for user data on the filesystem
dataset.executor.bind.port 0 Dataset executor bind port; if 0, binds to a random port
dataset.executor.container.instances 1 Number of dataset executor instances
dataset.executor.container.memory.mb ${master.service.memory.mb} Memory in megabytes for each dataset executor instance
dataset.executor.container.num.cores 1 Number of virtual cores for each dataset executor instance
dataset.executor.max.instances ${master.service.max.instances} Maximum number of dataset executor instances
dataset.extensions.dir /opt/cdap/ext/lib Directory where all dataset extensions are stored
dataset.service.bind.port 0 Dataset service bind port; if 0, binds to a random port
dataset.service.output.dir /datasets Directory where all dataset modules archives are stored
dataset.table.prefix ${root.namespace} Prefix for dataset table name

Explore Service

Parameter Name Default Value Description
explore.active.operation.timeout.secs 86400 Timeout value in seconds for an SQL operation whose result was not fetched completely
explore.cleanup.job.schedule.secs 60 Interval in seconds to schedule the clean-up of timed-out operations
explore.container.yarn.app.classpath.first false Determines if the YARN application classpath precedes the query engine classpath
explore.enabled true Determines if the CDAP Explore Service (ad-hoc SQL queries) is enabled
explore.executor.container.memory.mb 2048 Memory in megabytes for each CDAP Explore executor instance. This is explicitly set differently than ${master.service.memory.mb} as Explore requires more memory to run than the CDAP Master service.
explore.executor.container.num.cores 1 Number of virtual cores for each CDAP Explore executor instance
explore.http.timeout 20 The timeout in seconds for HTTP requests to the CDAP Explore service. Because requests may happen within a transaction, it is recommended to keep this timeout noticeably shorter than the default transaction timeout, ${data.tx.timeout}.
explore.inactive.operation.timeout.secs 3600 Timeout value in seconds for an SQL operation which does not have any more results to be fetched
explore.local.data.dir ${local.data.dir}/explore Data directory for the CDAP Explore service when in Standalone CDAP
explore.service.bind.port 0 CDAP Explore service bind port; if 0, binds to a random port
explore.start.on.demand false Determines the start-up of the CDAP Explore service (ad-hoc SQL queries); if false, the Explore service starts up when CDAP is started; if true, the Explore service will start upon the first query it receives
explore.writes.enabled true Determines if writing to a table through the CDAP Explore service (ad-hoc SQL queries) is enabled

Gateway

Parameter Name Default Value Description
app.boss.threads 1 Number of Netty service boss threads
app.connection.backlog 20000 Maximum connection backlog of CDAP Master
app.exec.threads 20 Number of Netty service executor threads
app.worker.threads 10 Number of Netty service worker threads

Kafka Server

Parameter Name Default Value Description
kafka.seed.brokers 127.0.0.1:9092 Comma-separated list of CDAP Kafka service brokers; for Distributed CDAP, replace with list of FQDN:port brokers
kafka.server.default.replication.factor 1 CDAP Kafka service replication factor; used to replicate Kafka messages across multiple machines to prevent data loss in the event of a hardware failure. The recommended setting is to run at least two CDAP Kafka servers. If you are running two CDAP Kafka servers, set this value to 2; otherwise, set it to the maximum number of tolerated machine failures plus one (assuming you have that number of machines).
kafka.server.host.name 0.0.0.0 CDAP Kafka service bind address
kafka.server.log.dirs /tmp/kafka-logs Comma-separated list of CDAP Kafka service log storage directories
kafka.server.log.flush.interval.messages 10000 The interval length (in number of messages in the CDAP Kafka service) at which to force an fsync of data written to the log
kafka.server.log.retention.hours 24 The number of hours to keep a log file before deleting it; this is the time-to-live in the CDAP Kafka service, while a log is in-flight between the container and the CDAP log saver
kafka.server.num.partitions 10 Default number of partitions for a topic in the CDAP Kafka service
kafka.server.port 9092 CDAP Kafka service bind port
kafka.server.zookeeper.connection.timeout.ms 1000000 Maximum time in milliseconds that the CDAP Kafka service will wait to establish a connection to ZooKeeper
kafka.zookeeper.namespace kafka CDAP Kafka service ZooKeeper namespace
kafka.zookeeper.quorum   CDAP Kafka service ZooKeeper quorum and namespace. If set, this will override the ZooKeeper quorum (set by ${zookeeper.quorum}) and the ZooKeeper namespace (set by ${kafka.zookeeper.namespace}) when setting up a connection to the Kafka service used by CDAP. If the same Kafka service ZooKeeper quorum and namespace are shared by multiple CDAP instances, each CDAP instance needs to distinguish its Kafka topics from those of other CDAP instances with unique values for ${log.kafka.topic} and ${metrics.topic.prefix}.
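
As a sketch of the two-server setup described under kafka.server.default.replication.factor, the fragment below lists two brokers and raises the replication factor to 2. The host names KAFKA1.example.com and KAFKA2.example.com are hypothetical stand-ins for the FQDNs of your own CDAP Kafka servers.

```xml
<configuration>
  <property>
    <name>kafka.seed.brokers</name>
    <!-- Hypothetical FQDN:port pairs for two CDAP Kafka servers. -->
    <value>KAFKA1.example.com:9092,KAFKA2.example.com:9092</value>
  </property>

  <property>
    <name>kafka.server.default.replication.factor</name>
    <!-- Two servers are running, so each message is kept on both. -->
    <value>2</value>
  </property>
</configuration>
```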

Logging

Parameter Name Default Value Description
log.base.dir /logs/avro In Distributed CDAP, the HDFS directory under which the system log pipeline saves log files
log.collection.root ${local.data.dir}/logs In Standalone CDAP, the local directory under which the system log pipeline saves log files
log.kafka.topic logs.user-v2 Kafka topic name used to publish logs
log.pipeline.cdap.dir.permissions 700 Permissions used by the system log pipeline when creating directories
log.pipeline.cdap.file.cleanup.interval.mins 1440 Time in minutes between runs of the log cleanup thread
log.pipeline.cdap.file.cleanup.transaction.timeout 60 Transaction timeout in seconds used by the log cleanup thread. This should not be greater than ${data.tx.max.timeout}.
log.pipeline.cdap.file.max.lifetime.ms 21600000 Maximum time span in milliseconds of a log file created by the system log pipeline
log.pipeline.cdap.file.max.size.bytes 104857600 Maximum size in bytes of a log file created by the system log pipeline
log.pipeline.cdap.file.permissions 600 Permissions used by the system log pipeline when creating files
log.pipeline.cdap.file.retention.duration.days 7 Time in days a log file is retained
log.process.pipeline.checkpoint.interval.ms 10000 The time between log processing pipeline checkpoints in milliseconds
log.process.pipeline.config.dir /opt/cdap/master/ext/logging/config A local directory on the CDAP Master that is scanned for log processing pipeline configurations. Each pipeline is defined by a file in the logback XML format, with ".xml" as the file name extension.
log.process.pipeline.event.delay.ms 2000 The time a log event stays in the log processing pipeline buffer before writing out to log appenders in milliseconds. A longer delay will result in better time ordering of log events before presenting to log appenders but will consume more memory.
log.process.pipeline.kafka.fetch.size 1048576 The buffer size in bytes, per topic partition, for fetching log events from Kafka
log.process.pipeline.lib.dir /opt/cdap/master/ext/logging/lib Comma-separated list of local directories on the CDAP Master scanned for additional library JAR files to be included for log processing
log.publish.num.partitions 10 Number of CDAP Kafka service partitions to publish the logs to
log.publish.partition.key program Publish logs from an application or a program to the same partition. Valid values are "application" or "program". If set to "application", logs from all the programs of an application go to the same partition. If set to "program", logs from the same program go to the same partition. Changes to this property require restarting all CDAP applications.
log.saver.container.memory.mb 1024 Memory in megabytes for each log saver instance to run in YARN. This is explicitly set differently than ${master.service.memory.mb} as the log saver requires more memory to run than the CDAP Master service.
log.saver.container.num.cores 2 Number of virtual cores for each log saver instance in YARN
log.saver.max.instances ${master.service.max.instances} Maximum number of log saver instances to run in YARN
log.saver.num.instances 1 Number of log saver instances to run in YARN
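
The fragment below is a sketch of two logging overrides taken from this table: grouping logs by application rather than by program, and keeping log files longer than the default 7 days. The retention value of 14 days is an arbitrary illustration.

```xml
<configuration>
  <property>
    <name>log.publish.partition.key</name>
    <!-- Valid values are "application" and "program" (the default). -->
    <value>application</value>
  </property>

  <property>
    <name>log.pipeline.cdap.file.retention.duration.days</name>
    <!-- Illustrative value; the default is 7 days. -->
    <value>14</value>
  </property>
</configuration>
```

As the description of log.publish.partition.key notes, a change to that property takes effect only after all CDAP applications are restarted.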

Market

Parameter Name Default Value Description
market.base.url http://market.cask.co/v2 The base URL of the Cask Market used by the CDAP UI for the Cask Market RESTful API. The default value shown is that of the public Cask Market.

Master

Parameter Name Default Value Description
hbase.client.retries.number 2 Maximum number of retries while performing HBase operations from master services
hbase.rpc.timeout 15000 RPC timeout in milliseconds for HBase operations performed from master services
http.service.boss.threads 1 Number of Netty service boss threads for master HTTP services
http.service.connection.backlog 20000 Maximum connection backlog of master HTTP service
http.service.exec.threads 20 Number of Netty service executor threads for master HTTP services
http.service.worker.threads 10 Number of Netty service worker threads for master HTTP services
master.collect.app.containers.log.level ERROR The log level of application container logs that are streamed back to the CDAP Master process log. The levels supported are "ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", and "OFF".
master.collect.containers.log true Determines if master service container logs are streamed back to the CDAP Master process log
master.service.max.instances 5 Maximum number of master service instances
master.service.memory.mb 512 Memory in megabytes for each master service instance
master.service.num.cores 2 Number of virtual cores for each master service instance
master.services.scheduler.queue   Scheduler queue for CDAP Master services
master.startup.service.timeout.seconds 600 Timeout in seconds for master services to wait for their dependent services to be available. For example, the dataset executor master service requires the transaction service, and will wait for the transaction service to become available while it is starting up. If the timeout is hit, the service will fail to start and the master service will shut itself down. If set to 0 or below, master services will not wait for their dependent services to start before starting themselves.

Messaging System

Parameter Name Default Value Description
messaging.container.instances [Final] 1 Number of instances for the messaging service
messaging.container.memory.mb ${master.service.memory.mb} Memory in megabytes for each messaging service instance
messaging.container.num.cores ${master.service.num.cores} Number of virtual cores for each messaging service instance
messaging.coprocessor.dir tms Directory to store messaging service table coprocessors. This path is relative to ${hdfs.namespace}.
messaging.coprocessor.metadata.cache.expiration.seconds 120 Number of seconds after which the metadata cache in HBase data table coprocessors will expire
messaging.hbase.max.scan.threads 96 Maximum number of threads used for scanning HBase tables
messaging.hbase.scan.cache.rows 1000 Number of rows for caching that will be passed to HBase scanners. Higher caching values will enable faster scanning but will use more memory.
messaging.http.server.consume.chunk.size 60000 Approximate size in bytes of each chunk streamed back to a consumer
messaging.http.server.executor.threads 0 Number of executor threads for the HTTP server in the messaging system. If set to 0, no executor threads will be used and requests will be handled directly in the IO thread.
messaging.http.server.max.request.size.mb 10 Maximum request content size in megabytes for each request to the HTTP server in the messaging system
messaging.http.server.worker.threads 30 Number of IO threads used by the HTTP server in the messaging system
messaging.local.data.cleanup.frequency.secs 3600 Scheduling frequency of time-to-live cleanup thread in seconds (only used in Standalone CDAP)
messaging.local.data.dir ${local.data.dir}/messaging Local storage directory for the messaging system (used only in Standalone CDAP)
messaging.max.instances [Final] 1 Maximum number of instances for the messaging service
messaging.message.table.hbase.splits 16 Number of splits to use for the message table in HBase upon table creation
messaging.message.table.name tms.message Name of the message table of the messaging system
messaging.metadata.table.name tms.meta Name of the metadata table of the messaging system
messaging.payload.table.hbase.splits 16 Number of splits to use for the payload table in HBase upon table creation
messaging.payload.table.name tms.payload Name of the payload table of the messaging system
messaging.system.topics ${audit.topic},${metrics.topic.prefix}:${metrics.messaging.topic.num},${notification.topic} A comma-separated list of topics that are always available in the system namespace. Multiple topics sharing the same prefix and distinguished by different numerical suffixes can be specified with the syntax <common.prefix>:<total.topic.number>, where the <total.topic.number> is the total number of topics sharing the <common.prefix>, and the numerical suffixes will range from 0 to (<total.topic.number> - 1).
messaging.table.expiration.seconds 300 Number of seconds after which the messaging table cache will expire
messaging.topic.default.ttl.seconds 604800 The default time-to-live in seconds for messages in a topic
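
The prefix syntax of messaging.system.topics is easier to follow with a worked case. The sketch below keeps the default entries and appends one extra family, mytopic:3. The name mytopic is hypothetical and is there only to illustrate the syntax. The comment shows how the value resolves using the default values of audit.topic, metrics.topic.prefix, metrics.messaging.topic.num, and notification.topic given in this appendix.

```xml
<configuration>
  <property>
    <name>messaging.system.topics</name>
    <!--
      With the defaults, the value below resolves to:
        audit,metrics:10,notifications,mytopic:3
      which describes these topics:
        audit
        metrics0 through metrics9   (10 topics, suffixes 0 to 9)
        notifications
        mytopic0, mytopic1, mytopic2   (3 topics, suffixes 0 to 2)
    -->
    <value>${audit.topic},${metrics.topic.prefix}:${metrics.messaging.topic.num},${notification.topic},mytopic:3</value>
  </property>
</configuration>
```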

Metadata

Parameter Name Default Value Description
metadata.max.allowed.chars 50 Maximum number of characters for metadata keys, values, and tags
metadata.service.bind.address 0.0.0.0 Metadata HTTP service bind address
metadata.service.bind.port 0 Metadata HTTP service bind port; if 0, binds to a random port
metadata.service.exec.threads ${http.service.exec.threads} Number of Netty service executor threads for metadata HTTP service
metadata.service.worker.threads ${http.service.worker.threads} Number of Netty service IO worker threads for metadata HTTP service

Metrics

Parameter Name Default Value Description
metrics.boss.threads ${http.service.boss.threads} Number of Netty service boss threads for metrics HTTP services
metrics.connection.backlog ${http.service.connection.backlog} Maximum connection backlog of metrics HTTP service
metrics.data.table.retention.resolution.1.seconds 7200 Retention period in seconds for the 1-second resolution table; the default is 2 hours
metrics.data.table.retention.resolution.3600.seconds 2592000 Retention period in seconds for the 1-hour resolution table; the default is 30 days
metrics.data.table.retention.resolution.60.seconds 2592000 Retention period in seconds for the 1-minute resolution table; the default is 30 days
metrics.data.table.ts.rollTime.3600 24 Number of columns in a 1-hour resolution timeseries table
metrics.data.table.ts.rollTime.60 60 Number of columns in a 1-minute resolution timeseries table
metrics.dataset.hbase.stats.report.interval 60 Report interval in seconds for HBase stats
metrics.dataset.leveldb.stats.report.interval 60 Report interval in seconds for LevelDB stats
metrics.exec.threads ${http.service.exec.threads} Number of Netty service executor threads for metrics HTTP services
metrics.kafka.meta.table metrics.kafka.meta Name of the Kafka metrics meta table
metrics.kafka.partition.size 10 Number of partitions for the Kafka metrics topic
metrics.kafka.topic.prefix metrics Topic prefix used to publish metrics in Kafka
metrics.max.instances ${master.service.max.instances} Maximum number of instances for the metrics service
metrics.memory.mb ${master.service.memory.mb} Memory in megabytes for each metrics service instance
metrics.messaging.fetcher.limit 200 Maximum number of metrics messages to be fetched from the messaging fetcher at a time. It is also the maximum number of metric values to be persisted in the metrics store after the number of fetched messages reaches the limit.
metrics.messaging.meta.table metrics.messaging.meta Name of the messaging metrics meta table
metrics.messaging.topic.num 10 Number of topics for metrics messages. This property also sets the number of threads used to fetch and process metrics in parallel from the messaging service. For a value of N, topics will be created for metrics with names beginning at ${metrics.topic.prefix}0, ${metrics.topic.prefix}1, up to ${metrics.topic.prefix}(N-1).
metrics.num.cores ${master.service.num.cores} Number of virtual cores for the metrics service
metrics.num.instances 1 Number of instances for the metrics service
metrics.processor.max.instances ${master.service.max.instances} Maximum number of instances for metrics processor service Apache Twill runnable
metrics.processor.memory.mb ${master.service.memory.mb} Memory in megabytes for each metrics processor service Apache Twill runnable instance
metrics.processor.num.cores 1 Number of virtual cores for metrics processor service Apache Twill runnable
metrics.processor.num.instances 1 Number of instances for metrics processor service Apache Twill runnable
metrics.processor.status.bind.address 0.0.0.0 Metrics processor HTTP service bind address
metrics.query.bind.port 45005 Metrics query service bind port
metrics.topic.prefix metrics Topic prefix used to publish metrics in messaging
metrics.worker.threads ${http.service.worker.threads} Number of Netty service worker threads for metrics HTTP services
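
The retention settings in this table are plain second counts, so a new value has to be converted by hand. As a sketch, the fragment below shortens retention for the 1-minute resolution table from the default 30 days to 7 days. The choice of 7 days is illustrative only.

```xml
<configuration>
  <property>
    <name>metrics.data.table.retention.resolution.60.seconds</name>
    <!-- 7 days * 24 hours * 3600 seconds = 604800 seconds -->
    <value>604800</value>
  </property>
</configuration>
```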

Monitor Handler

Parameter Name Default Value Description
monitor.handler.service.discovery.timeout.seconds 1 Timeout in seconds for service discovery used in monitor handler service status check

Notification System

Parameter Name Default Value Description
notification.topic notifications Topic name used to publish notifications in the messaging system

Operational Statistics

Parameter Name Default Value Description
operational.stats.extensions.dir /opt/cdap/master/ext/operations Semicolon-separated list of local directories on the CDAP Master that are scanned for operational statistics extensions
operational.stats.refresh.interval.secs 60 Number of seconds after which operational statistics should be refreshed

Queue

Parameter Name Default Value Description
data.queue.config.update.interval 5 Frequency in seconds of updates to the queue consumer configuration used in evicting queue entries on flush and compaction
data.queue.dequeue.tx.percent 30 Percentage of transaction time allowed to spend in dequeue; it should be an integer between 1-100
data.queue.table.presplits 16 Number of splits in the queue table

Remote System Operation

Parameter Name Default Value Description
remote.system.op.exec.threads ${http.service.exec.threads} Number of Netty service executor threads for the remote system operation HTTP service
remote.system.op.service.bind.address 0.0.0.0 Remote system operation HTTP service bind address
remote.system.op.worker.threads ${http.service.worker.threads} Number of Netty service IO worker threads for the remote system operation HTTP service

Retry Policies

Parameter Name Default Value Description
custom.action.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
custom.action.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
custom.action.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
custom.action.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
custom.action.retry.policy.type exponential.backoff The type of retry policy for custom actions. Allowed options: "none", "fixed.delay", or "exponential.backoff".
flow.retry.policy.base.delay.ms 100 The base delay between retries in milliseconds
flow.retry.policy.max.delay.ms 1000 The maximum delay between retries in milliseconds
flow.retry.policy.max.retries 3 The maximum number of retries to attempt before aborting
flow.retry.policy.max.time.secs 10 The maximum elapsed time in seconds before retries are aborted
flow.retry.policy.type none The type of retry policy for flows. Allowed options: "none", "fixed.delay", or "exponential.backoff".
mapreduce.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
mapreduce.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
mapreduce.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
mapreduce.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
mapreduce.retry.policy.type exponential.backoff The type of retry policy for MapReduce programs. Allowed options: "none", "fixed.delay", or "exponential.backoff".
service.retry.policy.base.delay.ms 100 The base delay between retries in milliseconds
service.retry.policy.max.delay.ms 1000 The maximum delay between retries in milliseconds
service.retry.policy.max.retries 3 The maximum number of retries to attempt before aborting
service.retry.policy.max.time.secs 10 The maximum elapsed time in seconds before retries are aborted
service.retry.policy.type none The type of retry policy for services. Allowed options: "none", "fixed.delay", or "exponential.backoff".
spark.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
spark.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
spark.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
spark.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
spark.retry.policy.type exponential.backoff The type of retry policy for Spark programs. Allowed options: "none", "fixed.delay", or "exponential.backoff".
system.log.process.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
system.log.process.retry.policy.max.retries 1500 The maximum number of retries to attempt before aborting
system.log.process.retry.policy.max.time.secs 1500 The maximum elapsed time in seconds before retries are aborted
system.log.process.retry.policy.type fixed.delay The type of retry policy for log processing. Allowed options: "none", "fixed.delay", or "exponential.backoff".
system.metrics.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
system.metrics.retry.policy.max.retries 600 The maximum number of retries to attempt before aborting
system.metrics.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
system.metrics.retry.policy.type fixed.delay The type of retry policy for metrics publishing. Allowed options: "none", "fixed.delay", or "exponential.backoff".
worker.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
worker.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
worker.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
worker.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
worker.retry.policy.type exponential.backoff The type of retry policy for workers. Allowed options: "none", "fixed.delay", or "exponential.backoff".
workflow.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
workflow.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
workflow.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
workflow.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
workflow.retry.policy.type exponential.backoff The type of retry policy for workflows. Allowed options: "none", "fixed.delay", or "exponential.backoff".
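
As an illustration of how the retry properties above might be overridden, the following cdap-site.xml fragment is a minimal sketch that switches workflows from the default exponential backoff to a fixed delay. The specific values (a 5-second delay, 10 retries, a 120-second limit) are assumptions chosen for the example, not recommendations, and this table does not describe how each policy type combines the delay and limit values.

```xml
<configuration>
  <!-- Hypothetical override: use a fixed delay for workflows
       instead of the default "exponential.backoff" -->
  <property>
    <name>workflow.retry.policy.type</name>
    <value>fixed.delay</value>
  </property>
  <!-- Assumed example value: 5000 ms between retries (default is 1000) -->
  <property>
    <name>workflow.retry.policy.base.delay.ms</name>
    <value>5000</value>
  </property>
  <!-- Assumed example value: stop after 10 retries (default is 1000) -->
  <property>
    <name>workflow.retry.policy.max.retries</name>
    <value>10</value>
  </property>
  <!-- Assumed example value: stop after 120 seconds (default is 600) -->
  <property>
    <name>workflow.retry.policy.max.time.secs</name>
    <value>120</value>
  </property>
</configuration>
```

Properties that are not listed in the fragment, such as workflow.retry.policy.max.delay.ms, keep their default values from cdap-default.xml.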

Router

Parameter Name Default Value Description
router.audit.path.check.enabled true Determines whether to check the number of paths for audit logging
router.bind.address 0.0.0.0 CDAP Router service bind address
router.bind.port 11015 CDAP Router service bind port
router.client.boss.threads 1 The number of boss threads in the CDAP Router service client
router.client.worker.threads 10 The number of worker threads in the CDAP Router service client
router.connection.backlog 20000 The connection backlog in the CDAP Router service
router.connection.idle.timeout.secs 15 Time in seconds after an HTTP request completes that idle router connections are closed
router.server.address 127.0.0.1 CDAP Router service address to which CDAP UI connects
router.server.boss.threads 1 The number of boss threads in the CDAP Router service
router.server.port ${router.bind.port} CDAP Router service port, for clients and webapps
router.server.worker.threads 10 The number of worker threads in the CDAP Router service
router.ssl.bind.port 10443 CDAP Router service bind port for HTTPS
router.ssl.server.port ${router.ssl.bind.port} CDAP Router service port for HTTPS, for clients and webapps
router.userservice.fallback.strategy random If a RouteConfig is not found for a particular user service, this property is used to set the fallback routing strategy. Allowed options: "random", "smallest", "largest", or "drop". A string comparison of the versions of the user service is used for "smallest" or "largest". The "drop" option will not route the request to any service and will return "service not found".
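
A possible Router override in cdap-site.xml is sketched below. The port number 12015 is a made-up value for illustration; "drop" is one of the allowed fallback strategies listed above.

```xml
<configuration>
  <!-- Hypothetical port: bind the Router to 12015 instead of the default 11015 -->
  <property>
    <name>router.bind.port</name>
    <value>12015</value>
  </property>
  <!-- Return "service not found" when no RouteConfig exists for a user service,
       instead of routing to a randomly chosen version -->
  <property>
    <name>router.userservice.fallback.strategy</name>
    <value>drop</value>
  </property>
</configuration>
```

Because router.server.port defaults to ${router.bind.port}, it should not need a separate entry in this sketch unless clients are meant to reach the Router on a different port.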

Security

Parameter Name Default Value Description
cdap.master.kerberos.keytab   The full path to the Kerberos keytab file containing the CDAP Master service's credentials
cdap.master.kerberos.principal   Example: "CDAP_PRINCIPAL/_HOST@EXAMPLE.COM". The Kerberos primary user that should be used to log in to the CDAP Master service. Substitute the Kerberos primary (user) for CDAP_PRINCIPAL, and your domain for EXAMPLE.COM. The string "_HOST" will be substituted with the local hostname.
cdap.ugi.cache.expiration.ms 3600000 UserGroupInformation cache entry expiration time in milliseconds. It is only used when impersonation is enabled.
kerberos.auth.enabled ${security.enabled} Determines if Kerberos authentication is enabled
kerberos.auth.relogin.interval.seconds 300 Re-login interval in seconds for Kerberos keytab
security.auth.server.announce.urls   Comma-separated list of CDAP Authentication service announce URLs. Each URL is in the format protocol://host:port. These are the URLs that clients should use to communicate with the Authentication Server. Leave empty to use the default value generated by the Authentication Server.
security.auth.server.bind.address 0.0.0.0 CDAP Authentication service bind address
security.auth.server.bind.port 10009 CDAP Authentication service bind port
security.auth.server.ssl.bind.port 10010 CDAP Authentication service bind port for HTTPS
security.authentication.basic.realmfile   Username and password file to use when basic authentication is configured
security.authentication.handlerClassName   Name of the authentication implementation to use to validate user credentials
security.authentication.loginmodule.className   JAAS LoginModule implementation to use when co.cask.security.server.JAASAuthenticationHandler is configured for ${security.authentication.handlerClassName}
security.authorization.admin.users   A comma-separated list of users for whom admin privileges are to be granted on the CDAP instance during CDAP startup, so that these users can create namespaces. These users are also granted admin privileges on the 'default' namespace, so that they can manage privileges on the 'default' namespace. This provides a method to bootstrap CDAP on an authorization-enabled cluster. The default value is empty, in which case no users have access to creating namespaces or to managing privileges on the 'default' namespace. In that scenario, authorization in CDAP has to be bootstrapped externally using interfaces provided by the configured authorization extension.
security.authorization.cache.enabled true Determines if authorization policies can be cached by programs; defaults to true
security.authorization.cache.refresh.interval.secs 50 The interval in seconds after which a background thread will refresh cached authorization policies. It is recommended to keep this value slightly lower than ${security.authorization.cache.ttl.secs}, so that the cached authorization policies do not expire unless they have been explicitly revoked. This setting only takes effect if ${security.authorization.cache.enabled} is set to true.
security.authorization.cache.ttl.secs 60 The time-to-live in seconds for entries in the authorization cache used by programs. This setting only takes effect if ${security.authorization.cache.enabled} is set to true.
security.authorization.enabled false When set to true, all operations in CDAP are authorized using the authorizer implementation found at the property ${security.authorization.extension.jar.path}
security.authorization.extension.jar.path   If an external authorization system is used for authorizing operations on CDAP entities, this property sets the path to the bundled JAR file containing the extension code. This JAR is only used when authorization is enabled by setting ${security.authorization.enabled} to true.
security.data.keyfile.path ${local.data.dir}/security/keyfile Path to the secret key file (only used in Standalone CDAP)
security.enabled false Determines if authentication is enabled for CDAP; if true, all requests to CDAP must provide a valid access token
security.keytab.path   The location of Kerberos keytabs used for impersonation. The location can contain ${name}, which will be replaced by the short user name of the principal being impersonated.
security.realm cdap Authentication realm used for scoping security; this value should be unique for each installation of CDAP
security.server.extended.token.expiration.ms 604800000 Admin tool access token expiration time in milliseconds; defaults to 1 week (internal)
security.server.maxthreads 100 Maximum number of threads that the CDAP Authentication service should use for handling HTTP requests
security.server.token.expiration.ms 86400000 Access token expiration time in milliseconds; defaults to 24 hours
security.store.file.name securestore Name of the secure store file
security.store.file.path ${local.data.dir}/store Location of the encrypted file which holds the secure store entries
security.store.provider none Backend provider for the secure store; use 'none' if no secure store is used
security.token.digest.algorithm HmacSHA256 Algorithm used for generating MAC of access tokens
security.token.digest.key.expiration.ms 3600000 Duration in milliseconds after which an active secret key used for signing tokens should be retired
security.token.digest.keylength 128 Key length used in generating the secret keys for generating MAC of access tokens
security.token.distributed.parent.znode /${root.namespace}/security/auth Parent node in ZooKeeper used for secret key distribution in Distributed CDAP
ssl.external.enabled false Enable SSL for external services
ssl.internal.enabled false Enable SSL between Router and App Fabric
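
The fragment below sketches how the authorization properties above relate to one another in cdap-site.xml. It is not a complete security setup: authentication handlers, Kerberos, and SSL are omitted, and the JAR path, user names, and cache timings are hypothetical values chosen only for illustration. The refresh interval is kept slightly below the TTL, as the description of security.authorization.cache.refresh.interval.secs recommends.

```xml
<configuration>
  <!-- Require a valid access token for all requests -->
  <property>
    <name>security.enabled</name>
    <value>true</value>
  </property>
  <!-- Authorize operations using the extension JAR named below -->
  <property>
    <name>security.authorization.enabled</name>
    <value>true</value>
  </property>
  <!-- Hypothetical path to an authorization extension JAR -->
  <property>
    <name>security.authorization.extension.jar.path</name>
    <value>/opt/cdap/master/ext/security/example-authorizer.jar</value>
  </property>
  <!-- Hypothetical users granted admin privileges at startup -->
  <property>
    <name>security.authorization.admin.users</name>
    <value>alice,bob</value>
  </property>
  <!-- Assumed example timings: cache entries live 120 s, refreshed every 100 s -->
  <property>
    <name>security.authorization.cache.ttl.secs</name>
    <value>120</value>
  </property>
  <property>
    <name>security.authorization.cache.refresh.interval.secs</name>
    <value>100</value>
  </property>
</configuration>
```

For the remaining settings needed on a secured cluster, see the Security section of the documentation referenced in the notes at the top of this appendix.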

Stream

Parameter Name Default Value Description
stream.async.queue.size 100 Queue size per async worker thread for queuing up async write requests
stream.async.worker.threads ${stream.worker.threads} Number of async worker threads for handling async write requests
stream.base.dir /streams The directory root for all stream files, relative to the HDFS namespace
stream.batch.buffer.threshold 1048576 Bytes retained in-memory before writing to a new stream file
stream.bind.address 0.0.0.0 Stream HTTP service bind address
stream.bind.port 0 Stream HTTP service bind port; if 0, binds to a random port
stream.consumer.table.presplits 16 Number of splits for the stream consumer table
stream.container.instance.id 0 Instance ID for the stream service container; the actual value will be set at runtime by the system automatically
stream.container.instances 1 Number of YARN container instances for the stream handler; in Standalone CDAP, it is always one
stream.container.memory.mb ${master.service.memory.mb} Memory in megabytes for each YARN container that runs the stream handler
stream.container.num.cores 2 Number of virtual cores for the YARN container that runs the stream handler
stream.event.ttl 9223372036854775807 Default time-to-live in milliseconds for stream events; the default value is Long.MAX_VALUE
stream.file.cleanup.period 300000 Interval in milliseconds for running the stream file cleanup process
stream.file.prefix file Prefix of file name for stream file
stream.index.interval 10000 Interval in milliseconds for emitting new index entry in stream file
stream.instance.file.prefix [Final] ${stream.file.prefix}.${stream.container.instance.id} Prefix of file name for stream file per writer instance
stream.notification.threshold 1024 Size of data in megabytes to be ingested by a stream before a notification is published
stream.partition.duration 3600000 Duration in milliseconds of a stream partition
stream.size.schedule.polling.delay 600 Delay in seconds to poll a stream in a StreamSizeSchedule if no notification is received
stream.worker.threads ${http.service.worker.threads} Default number of IO worker threads for the stream HTTP service
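
As a sketch of a stream-related override in cdap-site.xml, the fragment below sets a one-week default time-to-live for stream events and runs two stream handler containers. Both values are assumptions for illustration; 604800000 is one week expressed in milliseconds.

```xml
<configuration>
  <!-- Assumed example value: expire stream events after one week
       (7 * 24 * 60 * 60 * 1000 = 604800000 ms) instead of Long.MAX_VALUE -->
  <property>
    <name>stream.event.ttl</name>
    <value>604800000</value>
  </property>
  <!-- Assumed example value: two YARN containers for the stream handler
       (Distributed mode only; Standalone CDAP always uses one) -->
  <property>
    <name>stream.container.instances</name>
    <value>2</value>
  </property>
</configuration>
```

Note that stream.instance.file.prefix is marked [Final], so an entry for it in cdap-site.xml would have no effect.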

UI

Parameter Name Default Value Description
dashboard.bind.address 0.0.0.0 CDAP UI bind address
dashboard.bind.port 11011 CDAP UI bind port
dashboard.router.check.timeout.secs 0 Interval in seconds that the CDAP UI waits before exiting when unable to connect to the CDAP Router service on startup; use a timeout of 0 to wait indefinitely
dashboard.ssl.bind.port 9443 CDAP UI bind port for HTTPS
dashboard.ssl.disable.cert.check false True to disable SSL certificate check from the CDAP UI
http.client.connection.timeout.ms 60000 Connection timeout in milliseconds for internal HTTP requests
http.client.read.timeout.ms 60000 Read timeout in milliseconds for internal HTTP requests

Deprecated Properties

These properties are deprecated as of CDAP 4.1.1 and should not be used. They will be removed in a future release. Where a replacement property exists, it is noted in the description.

Parameter Name Default Value Description
app.bind.address 0.0.0.0 App Fabric service bind address (deprecated; use ${master.services.bind.address} instead)
audit.kafka.topic audit Apache Kafka topic name to which audit messages are published
dataset.service.bind.address 0.0.0.0 Dataset service bind address (deprecated; use ${master.services.bind.address} instead)
explore.executor.container.instances 1 Number of explore executor instances (deprecated; instance count is always set to 1)
explore.executor.max.instances 1 Maximum number of explore executor instances (deprecated; instance count is always set to 1)
kafka.bind.address ${kafka.server.host.name} CDAP Kafka service bind address (deprecated; use ${kafka.server.host.name} instead)
kafka.bind.port ${kafka.server.port} CDAP Kafka service bind port (deprecated; use ${kafka.server.port} instead)
kafka.default.replication.factor ${kafka.server.default.replication.factor} CDAP Kafka service replication factor (deprecated; use ${kafka.server.default.replication.factor} instead)
kafka.log.dir ${kafka.server.log.dirs} CDAP Kafka service log storage directory (deprecated; use ${kafka.server.log.dirs} instead)
kafka.log.retention.hours ${kafka.server.log.retention.hours} The number of hours to keep a log file before deleting it (deprecated; use ${kafka.server.log.retention.hours} instead)
kafka.num.partitions ${kafka.server.num.partitions} Default number of partitions for a topic (deprecated; use ${kafka.server.num.partitions} instead)
kafka.zookeeper.connection.timeout.ms ${kafka.server.zookeeper.connection.timeout.ms} The maximum time (in milliseconds) that the client will wait to establish a connection to ZooKeeper (deprecated; use ${kafka.server.zookeeper.connection.timeout.ms} instead)
log.cleanup.max.num.files 1000 Maximum number of files scanned in one iteration
log.cleanup.run.interval.mins 1440 Log cleanup interval in minutes
log.retention.duration.days 7 Time-to-live in days for log files saved in HDFS
log.saver.checkpoint.interval.ms 60000 The time between log saver checkpoints in milliseconds (deprecated; use ${log.process.pipeline.checkpoint.interval.ms} instead)
log.saver.run.memory.megs 1024 Memory in megabytes allocated for log saver instances to run in YARN (deprecated; use ${log.saver.container.memory.mb} instead)
log.saver.run.num.cores 2 Number of cores for each log saver instance in YARN (deprecated; use ${log.saver.container.num.cores} instead)
notification.kafka.topic notifications Kafka topic name used to publish notifications
notification.transport.system kafka Transport system used by the notification system; can be either 'kafka' or 'stream'
security.auth.server.address 0.0.0.0 CDAP Authentication service bind address (deprecated; use ${security.auth.server.bind.address} instead)
ssl.enabled false Determines if SSL is enabled (deprecated; use ${ssl.external.enabled} instead)
security.auth.server.announce.address   CDAP Authentication service announce address. This is the address in the URL that clients should use to communicate with the Authentication Server. Leave empty to use the default value generated by the Authentication Server. (deprecated; use ${security.auth.server.announce.urls} instead)
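
When migrating an existing cdap-site.xml away from the deprecated properties above, each deprecated name can be swapped for the replacement given in its description. The fragment below is a minimal sketch of the result for two of them; the port value 9092 is an assumed example, not a documented default.

```xml
<configuration>
  <!-- Replaces the deprecated kafka.bind.port; 9092 is an assumed example value -->
  <property>
    <name>kafka.server.port</name>
    <value>9092</value>
  </property>
  <!-- Replaces the deprecated ssl.enabled -->
  <property>
    <name>ssl.external.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

Removing the old entries after adding the replacements avoids leaving two settings for the same behavior in one file.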