Appendix: cdap-site.xml, cdap-default.xml

The cdap-site.xml file is the configuration file for a CDAP installation. Its properties and values determine the settings used by CDAP when starting and operating.

Any property not defined in an installation's cdap-site.xml uses the default value defined in the file cdap-default.xml. That file is packaged in the CDAP JARs and should not be altered.

Any of the default values (with the exception of those marked [Final]) can be overridden by defining a new value in the cdap-site.xml file, located (by default) either in <CDAP-HOME>/conf/cdap-site.xml (CDAP Sandbox) or /etc/cdap/conf/cdap-site.xml (Distributed CDAP).
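
As a minimal sketch of such an override, assuming the Hadoop-style layout in which a configuration element holds property elements with name and value children, a cdap-site.xml that changes two defaults might look like the following. The values are illustrative only, not recommendations; any property left out continues to use its default from cdap-default.xml.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default is true; illustrative override that turns off audit publishing -->
  <property>
    <name>audit.enabled</name>
    <value>false</value>
  </property>

  <!-- Default is 300 seconds; illustrative override -->
  <property>
    <name>app.program.max.start.seconds</name>
    <value>600</value>
  </property>
</configuration>
```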

The sections below list the parameters that can be defined in the cdap-site.xml file, their default values (obtained from cdap-default.xml), and their descriptions.

Notes

  • [Final]: The value of a property marked as [Final] cannot be changed, even with a setting in the cdap-site.xml file.
  • Kafka Server: All properties that begin with kafka.server. are passed to the CDAP Kafka service when it is started up.
  • Security: For information on configuring the cdap-site.xml file, its security section, and CDAP for security, see the Security section of the documentation.

General

Parameter Name Default Value Description
cluster.name   A cluster-based name for CDAP. It is used for scope resolution of preferences and runtime arguments. For example: the preference key "cluster.[cluster.name].my.key" would be resolved to "my.key" at runtime; a program can then retrieve the preference value by using just "my.key". The administrator can use this property to set different preferences for each cluster.
hdfs.lib.dir ${hdfs.namespace}/lib Common directory in HDFS for shared files, including the JAR files for coprocessors
hdfs.namespace /${root.namespace} Root directory for HDFS files written by CDAP
hdfs.user yarn User name for accessing HDFS
instance.name ${root.namespace} Determines a unique identifier for a CDAP instance. It is used for providing authorization to a particular CDAP instance. Must be alphanumeric, and should not be changed after CDAP has been started. If it is changed, there is a risk of losing data (for example, authorization policies).
local.data.dir data Data directory for CDAP Local Sandbox and the CDAP Master process in Distributed CDAP
mapreduce.include.custom.format.classes true Indicates whether to include custom input/output format classes in the job.jar or not; if set to true, custom format classes will be added to the job.jar and available as part of the MapReduce system classpath
mapreduce.jobclient.connect.max.retries 2 Indicates the maximum number of retries the JobClient will make to establish a service connection when retrieving job status and history
mapreduce.status.report.interval.seconds 60 Time in seconds between reporting the status, including retrieval of the job's task report, while a MapReduce program is running.
master.manage.hbase.coprocessors true Whether CDAP Master should manage HBase coprocessors. This should only be set to false if you are managing coprocessors yourself in order to support rolling HBase upgrades.
master.startup.checks.classes   Comma-separated list of classnames for checks that will be run before the CDAP Master starts up. If any of the checks fails, the CDAP Master will not start up. Checks will only be run if ${master.startup.checks.enabled} is set to true.
master.startup.checks.enabled true Whether checks should be run before startup to determine if the CDAP Master can be run correctly. Which checks are run is determined by the ${master.startup.checks.packages} and ${master.startup.checks.classes} settings. If any checks fail, the CDAP Master will fail to start instead of waiting for the problem to be fixed. This setting only affects Distributed CDAP. It does not apply to CDAP Local Sandbox.
master.startup.checks.packages co.cask.cdap.master.startup,co.cask.cdap.data.startup Comma-separated list of packages containing checks that will be run before the CDAP Master starts up. If any of the checks fails, the CDAP Master will not start up. Checks will only be run if ${master.startup.checks.enabled} is set to true.
namespaces.dir namespaces The sub-directory of ${hdfs.namespace} in which namespaces are stored
root.namespace cdap Root for this CDAP instance; used as the parent (or root) node for ZooKeeper, as the directory under which all CDAP data and metadata is stored in HDFS, and as the prefix for all HBase tables created by CDAP; must be composed of alphanumeric characters
thrift.max.read.buffer 16777216 Specifies the maximum read buffer size in bytes used by the Thrift service; value should be set to greater than the maximum frame sent on the RPC channel
twill.java.heap.memory.ratio 0.6 The minimum ratio of heap to non-heap memory for all launched Apache Twill containers. Container-specific settings also exist for CDAP system containers.
twill.java.reserved.memory.mb 768 Desired reserved non-heap memory in megabytes for all launched Apache Twill containers. The actual value is bounded by the ${twill.java.heap.memory.ratio} setting of the container memory size. Container-specific settings also exist for CDAP system containers.
twill.jvm.gc.opts -XX:+UseG1GC -verbose:gc -Xloggc:&lt;LOG_DIR&gt;/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M Java garbage collection options for all Apache Twill containers; "&lt;LOG_DIR&gt;" is the location of the log directory in the container; note that the special characters are replaced with entity equivalents so they can be included in the XML
twill.location.cache.dir .cache The relative directory name on the distributed file system for Apache Twill to cache generated files, to speed up launching applications. This directory is relative to ${root.namespace}/twill on the file system.
twill.no.container.timeout 120000 Duration in milliseconds to wait for at least one container for an Apache Twill runnable
twill.yarn.am.memory.mb 512 The memory size in megabytes of the Apache Twill application master container
twill.yarn.am.reserved.memory.mb ${twill.java.reserved.memory.mb} Desired reserved non-heap memory in megabytes for the Apache Twill application master container. The actual value is bounded by the ${twill.java.heap.memory.ratio} of the ${twill.yarn.am.memory.mb} setting.
twill.zookeeper.namespace /twill ZooKeeper namespace prefix for Apache Twill
upgrade.thread.pool.size 1 Number of threads to be used running operations concurrently during a CDAP upgrade
zookeeper.client.startup.timeout.millis 60000 Duration in milliseconds to wait for a successful connection to a server in the ZooKeeper quorum
zookeeper.quorum 127.0.0.1:2181/${root.namespace} ZooKeeper quorum string; specifies the ZooKeeper host:port; substitute the quorum (FQDN1:2181,FQDN2:2181,...) for the components shown here
zookeeper.session.timeout.millis 40000 ZooKeeper session timeout in milliseconds
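
As an illustration of two of the General settings above, the fragment below sketches a Distributed CDAP override of zookeeper.quorum and a customized twill.jvm.gc.opts. The host names zk1.example.com, zk2.example.com, and zk3.example.com are hypothetical placeholders, and the shortened GC option list is only an example. Note that the angle brackets around LOG_DIR are written as entities so that the value remains valid XML.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical three-node quorum; the root.namespace suffix is kept from the default -->
  <property>
    <name>zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/${root.namespace}</value>
  </property>

  <!-- Illustrative subset of the default GC options; the LOG_DIR placeholder is entity-escaped -->
  <property>
    <name>twill.jvm.gc.opts</name>
    <value>-XX:+UseG1GC -verbose:gc -Xloggc:&lt;LOG_DIR&gt;/gc.log -XX:+PrintGCDetails</value>
  </property>
</configuration>
```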

Global

Parameter Name Default Value Description
dataset.unchecked.upgrade false If false, any changes made to existing datasets are not deployed when an app is redeployed; setting this value to true allows the dataset changes to be deployed upon app redeployment
enable.unrecoverable.reset false Determines if resetting CDAP should be enabled. WARNING: Enabling this option makes it possible to delete all applications and data; NO RECOVERY IS POSSIBLE!

Applications

Parameter Name Default Value Description
app.artifact.dir /opt/cdap/master/artifacts Semicolon-separated list of local directories scanned for system artifacts to add to the artifact repository
app.bind.port 0 App Fabric service bind port; if 0, binds to a random port
app.meta.upgrade.timeout.secs 60 Timeout value in seconds while upgrading application versions
app.output.dir /programs Directory where all archives are stored
app.program.extra.classpath   Additional Java classpath for CDAP programs. These extra classpaths must be present on all nodes in the cluster. Supports wildcard suffix "*" to include all JAR files under a directory.
app.program.jvm.opts ${twill.jvm.gc.opts} Java options for all Apache Twill containers
app.program.max.start.seconds 300 Maximum number of seconds to wait for a program to start before killing it
app.program.max.stop.seconds 300 Maximum number of seconds to wait for a program to stop before killing it
app.program.runid.corrector.interval 180 Interval in seconds of how often the run id corrector thread will run; this value should be greater than 0
app.program.runid.corrector.tx.batch.size 1000 Number of run records fetched per transaction when checking whether they need correction. This value is directly proportional to the ${data.tx.timeout} setting.
app.program.local.dataset.deleter.initial.delay 300 Initial delay in seconds before the local dataset deletion thread first runs; this value should be greater than 0
app.program.local.dataset.deleter.interval 3600 Interval in seconds between runs of the local dataset deletion thread; this value should be greater than 0
app.program.runtime.extensions.dir /opt/cdap/master/ext/runtimes Semicolon-separated list of local directories that are scanned for program runtime extensions
app.program.spark.yarn.client.rewrite.enabled true Specify whether to rewrite the YARN 'Client.scala' class in Spark to work around issue SPARK-13441 in CDH clusters
app.program.status.event.fetch.size 100 Maximum number of events to fetch from the messaging system in each processing cycle for program status update events
app.program.status.event.poll.delay.millis 2000 The delay in milliseconds before checking again for new program status events after detecting that there were none
app.program.yarn.attempt.failures.validity.interval 60000 The interval in milliseconds for the time window used by the YARN Resource Manager to check for application max failure attempts. By default, this is only used for long-running programs, but it can be overridden through the runtime argument system.yarn.attempt.failures.validity.interval
app.program.transaction.control implicit Defines how transactions are controlled for program methods invocation; "implicit" means that the platform encloses method execution into a transaction, whereas "explicit" means that the method itself is in control of executing transactions.
app.ssl.bind.port 30443 App Fabric service bind port for HTTPS
app.temp.dir /tmp Temp directory
apps.scheduler.queue   Scheduler queue for CDAP programs and CDAP Explore queries
app.deploy.update.schedules true If true, redeploying an application will modify any schedules that currently exist for the application; if false, redeploying an application does not create any new schedules and existing schedules are neither deleted nor updated. This property only affects the redeployment of an application; all related actions or endpoints are unaffected.
master.services.bind.address 0.0.0.0 Bind address for app fabric service and dataset service
program.container.dist.jars   Additional JARs to be localized to every program container and to be added to classpaths of CDAP programs. They can be local file paths on the CDAP Master or URIs of remote files. Multiple JAR files are comma-separated.
scheduler.max.thread.pool.size 100 Size of the scheduler thread pool
scheduler.misfire.threshold.ms 60000 The number of milliseconds by which a schedule execution can miss its next-fire-time and still run
scheduler.event.poll.delay.millis 2000 The delay in milliseconds before the scheduler checks again for new events after it detects that there were none
scheduler.time.event.fetch.size 100 Maximum number of events to fetch from the messaging system in each processing cycle for time schedule events
scheduler.stream.size.event.fetch.size 100 Maximum number of events to fetch from the messaging system in each processing cycle for stream size schedule events
scheduler.data.event.fetch.size 100 Maximum number of events to fetch from the messaging system in each processing cycle for data schedule events
scheduler.program.status.event.fetch.size 100 Maximum number of events to fetch from the messaging system in each processing cycle for program status schedule events
time.event.topic timeevent Topic name for publishing time events from time scheduler to the messaging system
program.status.event.topic programstatusevent Topic name for publishing status transitioning events of program runs to the messaging system
program.status.record.event.topic programstatusrecordevent Topic name for publishing program status recording events to the messaging system
workflow.token.max.size.mb 30 Maximum allowed size in megabytes of a workflow token; if the workflow token exceeds this size, no further updates are allowed
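
The path-valued settings in this table use different separators, which is easy to get wrong. The fragment below is a sketch using hypothetical directories under /opt/example: app.artifact.dir takes a semicolon-separated list, program.container.dist.jars takes a comma-separated list, and app.program.extra.classpath accepts a trailing "*" wildcard.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Semicolon-separated; the second directory is a hypothetical addition -->
  <property>
    <name>app.artifact.dir</name>
    <value>/opt/cdap/master/artifacts;/opt/example/artifacts</value>
  </property>

  <!-- Wildcard suffix includes all JAR files under the (hypothetical) directory;
       the directory must exist on all nodes in the cluster -->
  <property>
    <name>app.program.extra.classpath</name>
    <value>/opt/example/lib/*</value>
  </property>

  <!-- Comma-separated; both paths are hypothetical local files on the CDAP Master -->
  <property>
    <name>program.container.dist.jars</name>
    <value>/opt/example/jars/first.jar,/opt/example/jars/second.jar</value>
  </property>
</configuration>
```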

Audit configuration

Parameter Name Default Value Description
audit.enabled true Determines whether to publish audit messages
audit.publish.timeout.ms 2000 Audit message publishing timeout in milliseconds
audit.topic audit Topic name used to publish audit messages in the messaging system

Datasets

Parameter Name Default Value Description
data.local.storage ${local.data.dir}/ldb Database directory for LevelDB, used for data fabric in CDAP Local Sandbox
data.local.storage.blocksize 1024 Block size in bytes for data fabric when in CDAP Local Sandbox
data.local.storage.cachesize 104857600 Cache size in bytes for data fabric when in CDAP Local Sandbox
data.event.topic dataevent Topic name for publishing data events to the messaging system
data.tx.bind.address 0.0.0.0 Transaction service bind address
data.tx.bind.port 0 Transaction service bind port; if 0, binds to a random port
data.tx.changeset.count.limit 2147483647 Hard limit for the number of entries in a transaction's change set; if exceeded, the transaction fails. By default, this is unlimited (that is, Integer.MAX_VALUE).
data.tx.changeset.count.warn.threshold 50000 Soft limit for the number of entries in a transaction's change set; if exceeded, a warning is logged.
data.tx.changeset.size.limit 9223372036854775807 Hard limit for the aggregate size in bytes of a transaction's change set; if exceeded, the transaction fails. By default, this is unlimited (that is, Long.MAX_VALUE).
data.tx.changeset.size.warn.threshold 5000000 Soft limit for the aggregate size in bytes of a transaction's change set; if exceeded, a warning is logged.
data.tx.client.count 50 The number of pooled instances of the transaction client; increase this to increase transaction concurrency
data.tx.client.provider pool Provider strategy for transaction clients; valid values are "pool" and "thread-local"
data.tx.discovery.service.name transaction Name in discovery service for the transaction service
data.tx.hdfs.user ${hdfs.user} User name for accessing HDFS (if not running in secure HDFS)
data.tx.janitor.enable true Determines if the TransactionDataJanitor coprocessor is enabled on tables; normally should be true
data.tx.max.instances ${master.service.max.instances} Maximum number of transaction service instances. Increasing the number of transaction service instances improves availability but not scalability.
data.tx.max.timeout 600 The limit for the allowed transaction timeout, in seconds. Attempts to start a transaction with a longer timeout will fail.
data.tx.memory.mb ${master.service.memory.mb} Memory in megabytes for each transaction service instance
data.tx.num.cores ${master.service.num.cores} Number of virtual cores for the transaction service
data.tx.num.instances 1 Requested number of transaction service instances
data.tx.prune.enable false Enable invalid transaction list pruning
data.tx.prune.plugins data.tx.pruning.plugin List of transaction pruning plugins; for CDAP HBase tables that use transaction functionality to skip or clean invalid data
data.tx.prune.state.table ${dataset.table.prefix}_system:tephra.state Table used to store intermediate state when invalid transaction list pruning is enabled
data.tx.pruning.plugin.class co.cask.data2.txprune.DefaultHBaseTransactionPruningPlugin Class name for the default transaction pruning plugin
data.tx.retain.client.id committed Whether and how long to retain the client id of a transaction. Valid values are: "off" to disable retention of the client id; "active" to retain the client id until a transaction is committed; or "committed" to retain the client id as long as its change set participates in conflict detection. Retaining the client id slightly increases the memory footprint of the transaction service. Client ids are never retained past a restart or fail-over of the transaction manager.
data.tx.server.io.threads 2 Number of IO threads for the transaction service
data.tx.server.threads 25 Number of threads for the transaction service
data.tx.snapshot.codecs org.apache.tephra.snapshot.SnapshotCodecV3,org.apache.tephra.snapshot.SnapshotCodecV4 Specifies the class names of all supported transaction state codecs
data.tx.snapshot.dir ${hdfs.namespace}/tx.snapshot Directory in HDFS used to store snapshots and logs of transaction state
data.tx.snapshot.interval 60 Frequency of transaction snapshots in seconds
data.tx.snapshot.local.dir ${local.data.dir}/tx.snapshot Storage directory on the local filesystem for snapshots and logs of transaction state when in CDAP Local Sandbox
data.tx.snapshot.retain 10 Number of transaction snapshot files to retain as backups
data.tx.thrift.max.read.buffer ${thrift.max.read.buffer} Maximum read buffer size in bytes used by the transaction service; the value should be set to something greater than the maximum frame sent on the RPC channel
data.tx.timeout 30 Timeout value in seconds for a transaction; if the transaction is not finished in that time, it is marked invalid
dataset.data.dir data Base directory for user data on the filesystem
dataset.executor.bind.port 0 Dataset executor bind port; if 0, binds to a random port
dataset.executor.container.instances 1 Number of dataset executor instances
dataset.executor.container.memory.mb ${master.service.memory.mb} Memory in megabytes for each dataset executor instance
dataset.executor.container.num.cores 1 Number of virtual cores for each dataset executor instance
dataset.executor.max.instances ${master.service.max.instances} Maximum number of dataset executor instances
dataset.extensions.dir /opt/cdap/ext/lib Directory where all dataset extensions are stored
dataset.service.bind.port 0 Dataset service bind port; if 0, binds to a random port
dataset.service.boss.threads 1 Number of Netty service boss threads for the dataset service
dataset.service.connection.backlog 20000 Maximum connection backlog of the dataset service
dataset.service.exec.threads 30 Number of Netty service executor threads for the dataset service
dataset.service.output.dir /datasets Directory where all dataset modules archives are stored
dataset.service.worker.threads 10 Number of Netty service worker threads for the dataset service
dataset.table.prefix ${root.namespace} Prefix for dataset table name
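
The transaction settings above interact: data.tx.timeout is the default timeout, while data.tx.max.timeout is the ceiling beyond which starting a transaction fails. The fragment below is a sketch with illustrative values that raises the default timeout while staying under the ceiling, and lowers the soft change set threshold so that warnings are logged earlier.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default is 30 seconds; kept below data.tx.max.timeout (illustrative value) -->
  <property>
    <name>data.tx.timeout</name>
    <value>120</value>
  </property>

  <!-- Ceiling for any requested transaction timeout; shown here at its default -->
  <property>
    <name>data.tx.max.timeout</name>
    <value>600</value>
  </property>

  <!-- Soft limit only: exceeding it logs a warning, the transaction still proceeds -->
  <property>
    <name>data.tx.changeset.count.warn.threshold</name>
    <value>20000</value>
  </property>
</configuration>
```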

Explore Service

Parameter Name Default Value Description
explore.active.operation.timeout.secs 82800 Timeout value in seconds for an SQL operation whose result was not fetched completely
explore.cleanup.job.schedule.secs 60 Interval in seconds to schedule the clean-up of timed-out operations
explore.container.yarn.app.classpath.first false Determines if the YARN application classpath precedes the query engine classpath
explore.enabled true Determines if the CDAP Explore Service (ad-hoc SQL queries) is enabled
explore.executor.container.memory.mb 2048 Memory in megabytes for each CDAP Explore executor instance. This is explicitly set differently than ${master.service.memory.mb} as Explore requires more memory to run than the CDAP Master service.
explore.executor.container.num.cores 1 Number of virtual cores for each CDAP Explore executor instance
explore.http.timeout 20 The timeout in seconds for HTTP requests to the CDAP Explore service. Because requests may happen within a transaction, it is recommended to keep this timeout noticeably shorter than the default transaction timeout, ${data.tx.timeout}.
explore.inactive.operation.timeout.secs 3600 Timeout value in seconds for an SQL operation which does not have any more results to be fetched
explore.local.data.dir ${local.data.dir}/explore Data directory for the CDAP Explore service when in CDAP Local Sandbox
explore.service.bind.port 0 CDAP Explore service bind port; if 0, binds to a random port
explore.start.on.demand false Determines the start-up of the CDAP Explore service (ad-hoc SQL queries); if false, the Explore service starts up when CDAP is started; if true, the Explore service will start upon the first query it receives
explore.writes.enabled true Determines if writing to a table through the CDAP Explore service (ad-hoc SQL queries) is enabled
hive.version.resolution.strategy auto.strict Determines how to behave when the Hive version on a cluster is an unsupported value. The default value of "auto.strict" will require that the Hive version matches a supported value; if Explore is enabled, CDAP Master will then not start if the Hive version is unsupported. Set to "auto.latest" to use the latest Hive version of CDAP modules available on the cluster with an unsupported Hive version. This property is ignored for supported versions of Hive or if Explore has been disabled by setting ${explore.enabled} to false.
hive.server2.jdbc.url   The JDBC URL for the HiveServer2 in the cluster. This is needed if user programs need HiveServer2 delegation token to interact with HiveServer2. It can remain empty if the delegation token is not required for the user programs.
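
As a sketch of how the Explore settings above might be combined on a cluster whose Hive version is not on the supported list, the fragment below relaxes the version check and defers Explore start-up until the first query. Whether this is appropriate depends on the cluster; the values simply use the options described in the table.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Use the latest available CDAP Hive modules instead of refusing to start -->
  <property>
    <name>hive.version.resolution.strategy</name>
    <value>auto.latest</value>
  </property>

  <!-- Start the Explore service on the first query rather than with CDAP -->
  <property>
    <name>explore.start.on.demand</name>
    <value>true</value>
  </property>

  <!-- Kept noticeably shorter than the default data.tx.timeout of 30 seconds -->
  <property>
    <name>explore.http.timeout</name>
    <value>20</value>
  </property>
</configuration>
```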

Gateway

Parameter Name Default Value Description
app.boss.threads 1 Number of Netty service boss threads
app.connection.backlog 20000 Maximum connection backlog of CDAP Master
app.exec.threads 20 Number of Netty service executor threads
app.worker.threads 10 Number of Netty service worker threads

Kafka Server

Parameter Name Default Value Description
kafka.seed.brokers 127.0.0.1:9092 Comma-separated list of CDAP Kafka service brokers; for Distributed CDAP, replace with list of FQDN:port brokers
kafka.server.default.replication.factor 1 CDAP Kafka service replication factor; used to replicate Kafka messages across multiple machines to prevent data loss in the event of a hardware failure. The recommended setting is to run at least two CDAP Kafka servers. If you are running two CDAP Kafka servers, set this value to 2; otherwise, set it to the maximum number of tolerated machine failures plus one (assuming you have that number of machines).
kafka.server.host.name 0.0.0.0 CDAP Kafka service bind address
kafka.server.log.dirs /tmp/kafka-logs Comma-separated list of CDAP Kafka service log storage directories
kafka.server.log.flush.interval.messages 10000 The interval length (in number of messages in the CDAP Kafka service) at which to force an fsync of data written to the log
kafka.server.log.retention.hours 24 The number of hours to keep a log file before deleting it; this is the time-to-live in the CDAP Kafka service, while a log is in-flight between the container and the CDAP log saver
kafka.server.num.partitions 10 Default number of partitions for a topic in the CDAP Kafka service
kafka.server.port 9092 CDAP Kafka service bind port
kafka.server.zookeeper.connection.timeout.ms 1000000 Maximum time in milliseconds that the CDAP Kafka service will wait to establish a connection to ZooKeeper
kafka.zookeeper.namespace kafka CDAP Kafka service ZooKeeper namespace
kafka.zookeeper.quorum   CDAP Kafka service ZooKeeper quorum and namespace. If set, this will override the ZooKeeper quorum (set by ${zookeeper.quorum}) and the ZooKeeper namespace (set by ${kafka.zookeeper.namespace}) when setting up a connection to the Kafka service used by CDAP. If the same Kafka service ZooKeeper quorum and namespace are shared by multiple CDAP instances, each CDAP instance needs to distinguish its Kafka topics from those of other CDAP instances with unique values for ${log.kafka.topic} and ${metrics.topic.prefix}.
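
The replication guidance above can be sketched for a two-broker Distributed CDAP setup as follows. The broker host names and the log directory are hypothetical placeholders; the replication factor of 2 follows the recommendation in the table for two CDAP Kafka servers.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical FQDN:port list replacing the 127.0.0.1:9092 default -->
  <property>
    <name>kafka.seed.brokers</name>
    <value>kafka1.example.com:9092,kafka2.example.com:9092</value>
  </property>

  <!-- Two CDAP Kafka servers, so the replication factor is set to 2 -->
  <property>
    <name>kafka.server.default.replication.factor</name>
    <value>2</value>
  </property>

  <!-- Hypothetical directory outside /tmp for Kafka log storage -->
  <property>
    <name>kafka.server.log.dirs</name>
    <value>/data/cdap/kafka-logs</value>
  </property>
</configuration>
```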

Logging

Parameter Name Default Value Description
log.base.dir /logs/avro In Distributed CDAP, the HDFS directory under which the system log pipeline saves log files
log.collection.root ${local.data.dir}/logs In CDAP Local Sandbox, the local directory under which the system log pipeline saves log files
log.kafka.topic logs.user-v2 Kafka topic name used to publish logs
log.tms.topic.prefix logs TMS topic prefix used to publish logs
log.tms.queue.size 512 The buffer size used in TMS Log Appender
log.pipeline.cdap.dir.permissions 700 Permissions used by the system log pipeline when creating directories
log.pipeline.cdap.file.cleanup.interval.mins 1440 Time in minutes between runs of the log cleanup thread
log.pipeline.cdap.file.cleanup.transaction.timeout 60 Transaction timeout in seconds used by the log cleanup thread. This should not be greater than ${data.tx.max.timeout}.
log.pipeline.cdap.file.max.lifetime.ms 21600000 Maximum time span in milliseconds of a log file created by the system log pipeline
log.pipeline.cdap.file.max.size.bytes 104857600 Maximum size in bytes of a log file created by the system log pipeline
log.pipeline.cdap.file.permissions 600 Permissions used by the system log pipeline when creating files
log.pipeline.cdap.file.retention.duration.days 7 Time in days a log file is retained
log.process.pipeline.checkpoint.interval.ms 10000 Time in milliseconds between log processing pipeline checkpoints
log.process.pipeline.config.dir /opt/cdap/master/ext/logging/config A local directory on the CDAP Master that is scanned for log processing pipeline configurations. Each pipeline is defined by a file in the logback XML format, with ".xml" as the file name extension.
log.process.pipeline.event.delay.ms 2000 The time in milliseconds a log event stays in the log processing pipeline buffer before being written out to log appenders. A longer delay results in better time ordering of log events presented to log appenders, but consumes more memory.
log.process.pipeline.kafka.fetch.size 1048576 The buffer size in bytes, per topic partition, for fetching log events from Kafka
log.process.pipeline.lib.dir /opt/cdap/master/ext/logging/lib Comma-separated list of local directories on the CDAP Master scanned for additional library JAR files to be included for log processing
log.publish.num.partitions 10 Number of CDAP Kafka service partitions to publish the logs to
log.publish.partition.key program Publish logs from an application or a program to the same partition. Valid values are "application" or "program". If set to "application", logs from all the programs of an application go to the same partition. If set to "program", logs from the same program go to the same partition. Changing this property requires restarting all CDAP applications.
log.saver.container.memory.mb ${master.service.memory.mb} Memory in megabytes for each log saver instance to run in YARN.
log.saver.container.num.cores 2 Number of virtual cores for each log saver instance in YARN
log.saver.max.instances ${master.service.max.instances} Maximum number of log saver instances to run in YARN
log.saver.num.instances 1 Number of log saver instances to run in YARN
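
A sketch of a logging override, with illustrative values: it groups logs by application rather than by program, doubles the retention period, and keeps the cleanup transaction timeout below the default data.tx.max.timeout of 600 seconds, as the table advises.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Valid values are "application" and "program"; changing this requires
       restarting all CDAP applications -->
  <property>
    <name>log.publish.partition.key</name>
    <value>application</value>
  </property>

  <!-- Default is 7 days (illustrative value) -->
  <property>
    <name>log.pipeline.cdap.file.retention.duration.days</name>
    <value>14</value>
  </property>

  <!-- Should not be greater than data.tx.max.timeout -->
  <property>
    <name>log.pipeline.cdap.file.cleanup.transaction.timeout</name>
    <value>120</value>
  </property>
</configuration>
```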

Market

Parameter Name Default Value Description
market.base.url http://market.cask.co/v2 The base URL of the Cask Market used by the CDAP UI for the Cask Market RESTful API. The default value shown is that of the public Cask Market.

Master

Parameter Name Default Value Description
hbase.client.retries.number 2 Maximum number of retries while performing HBase operations from master services
hbase.rpc.timeout 15000 RPC timeout in milliseconds for HBase operations performed from master services
hbase.version.resolution.strategy auto.strict Determines how to behave when the HBase version on a cluster is an unsupported value. The default value of "auto.strict" will require that the HBase version match a supported value, and CDAP Master will not start if the HBase version is unsupported. Set to "auto.latest" to use the latest HBase version available on the cluster with an unsupported HBase version.
http.service.boss.threads 1 Number of Netty service boss threads for master HTTP services
http.service.connection.backlog 20000 Maximum connection backlog of master HTTP service
http.service.exec.threads 20 Number of Netty service executor threads for master HTTP services
http.service.worker.threads 10 Number of Netty service worker threads for master HTTP services
master.collect.app.containers.log.level ERROR The log level of application container logs that are streamed back to the CDAP Master process log. The levels supported are "ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", and "OFF".
master.collect.containers.log true Determines if master service container logs are streamed back to the CDAP Master process log
master.service.max.instances 5 Maximum number of master service instances
master.service.memory.mb 1024 Macro property to set the default memory in megabytes for each CDAP system container
master.service.num.cores 2 Number of virtual cores for each master service instance
master.services.scheduler.queue   Scheduler queue for CDAP Master services
master.startup.service.timeout.seconds 600 Timeout in seconds for master services to wait for their dependent services to be available. For example, the dataset executor master service requires the transaction service, and will wait for the transaction service to become available while it is starting up. If the timeout is hit, the service will fail to start and the master service will shut itself down. If set to 0 or below, master services will not wait for their dependent services to start before starting themselves.
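
Because master.service.memory.mb is a macro property, several defaults listed in other tables (for example data.tx.memory.mb, dataset.executor.container.memory.mb, log.saver.container.memory.mb, and messaging.container.memory.mb) are defined as ${master.service.memory.mb} and follow it. The sketch below, with illustrative values, raises the shared default and then overrides one service individually.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Raises the default for every setting defined as ${master.service.memory.mb} -->
  <property>
    <name>master.service.memory.mb</name>
    <value>2048</value>
  </property>

  <!-- An explicit value takes the place of the macro for this one service -->
  <property>
    <name>log.saver.container.memory.mb</name>
    <value>3072</value>
  </property>
</configuration>
```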

Messaging System

Parameter Name Default Value Description
messaging.cache.size.mb 30 Memory in megabytes for the cache size used by the messaging service for caching recently-published messages. Currently, only topics listed in the ${messaging.system.topics} configuration have caching enabled. Set it to 0 to disable caching.
messaging.container.instances 1 Number of instances for the messaging service
messaging.container.memory.mb ${master.service.memory.mb} Memory in megabytes for each messaging service instance
messaging.container.num.cores ${master.service.num.cores} Number of virtual cores for each messaging service instance
messaging.coprocessor.metadata.cache.expiration.seconds 120 Number of seconds after which the metadata cache in HBase data table coprocessors will expire
messaging.ha.fencing.delay.seconds 5 Number of seconds to wait before the leader process starts serving requests
messaging.hbase.max.scan.threads 96 Maximum number of threads used for scanning HBase tables
messaging.hbase.scan.cache.rows 1000 Number of rows for caching that will be passed to HBase scanners. Higher caching values will enable faster scanning but will use more memory.
messaging.http.server.consume.chunk.size 60000 Approximate size in bytes of each chunk streamed back to a consumer
messaging.http.server.executor.threads 0 Number of executor threads for the HTTP server in the messaging system. If set to 0, no executor threads will be used and requests will be handled directly in the IO thread.
messaging.http.server.max.request.size.mb 10 Maximum request content size in megabytes for each request to the HTTP server in the messaging system
messaging.http.server.worker.threads 30 Number of IO threads used by the HTTP server in the messaging system
messaging.local.data.cleanup.frequency.secs 3600 Scheduling frequency of time-to-live cleanup thread in seconds (only used in CDAP Local Sandbox)
messaging.local.data.dir ${local.data.dir}/messaging Local storage directory for the messaging system (used only in CDAP Local Sandbox)
messaging.max.instances [Final] ${master.service.max.instances} Maximum number of instances for the messaging service. Increasing the number of messaging service instances improves availability but not scalability.
messaging.message.table.hbase.splits 16 Number of splits to use for the message table in HBase upon table creation
messaging.message.table.name tms.message Name of the message table of the messaging system
messaging.metadata.table.name tms.meta Name of the metadata table of the messaging system
messaging.payload.table.hbase.splits 16 Number of splits to use for the payload table in HBase upon table creation
messaging.payload.table.name tms.payload Name of the payload table of the messaging system
messaging.system.topics [Final] ${audit.topic},${metadata.messaging.topic},${data.event.topic},${metrics.topic.prefix}:${metrics.messaging.topic.num},${notification.topic},${time.event.topic},${program.status.event.topic},${program.status.record.event.topic},${log.tms.topic.prefix}:${log.publish.num.partitions} A comma-separated list of topics that are always available in the system namespace. Multiple topics sharing the same prefix and distinguished by different numerical suffixes can be specified with the syntax <common.prefix>:<total.topic.number>, where the <total.topic.number> is the total number of topics sharing the <common.prefix>, and the numerical suffixes will range from 0 to (<total.topic.number> - 1).
messaging.table.expiration.seconds 300 Number of seconds after which the messaging table cache will expire
messaging.table.hbase.split.policy org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy The class name that controls the HBase table region split policy. Ideally, auto-splitting should be disabled for HBase tables used by the messaging system.
messaging.topic.default.ttl.seconds 604800 The default time-to-live in seconds for messages in a topic
messaging.twill.java.heap.memory.ratio 0.6 The minimum ratio of heap to non-heap memory for the messaging service container
messaging.twill.java.reserved.memory.mb 512 Desired reserved non-heap memory in megabytes for the messaging service container. The actual value is bounded by the ${twill.java.heap.memory.ratio} setting of the container memory size.
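
The <common.prefix>:<total.topic.number> syntax described for messaging.system.topics can be illustrated with a short Python sketch. The function name expand_topics is hypothetical and is not part of CDAP. It assumes only what the description states: suffixes are appended directly to the prefix and run from 0 to (<total.topic.number> - 1). The topic names in the usage note are example values, not resolved defaults.

```python
def expand_topics(spec):
    """Expand a comma-separated topic list with optional 'prefix:count' entries.

    Hypothetical helper, not a CDAP API. An entry such as 'metrics:3'
    becomes metrics0, metrics1, metrics2; plain entries pass through as-is.
    """
    topics = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty items such as a trailing comma
        if ":" in entry:
            prefix, count = entry.rsplit(":", 1)
            # Numerical suffixes range from 0 to count - 1.
            topics.extend(f"{prefix}{i}" for i in range(int(count)))
        else:
            topics.append(entry)
    return topics
```

For example, expand_topics("audit,metrics:3") yields the plain topic audit followed by three topics that share the metrics prefix.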

Metadata

Parameter Name Default Value Description
metadata.max.allowed.chars 50 Maximum number of characters for metadata keys, values, and tags
metadata.service.bind.address 0.0.0.0 Metadata HTTP service bind address
metadata.service.bind.port 0 Metadata HTTP service bind port; if 0, binds to a random port
metadata.service.exec.threads ${http.service.exec.threads} Number of Netty service executor threads for metadata HTTP service
metadata.service.worker.threads ${http.service.worker.threads} Number of Netty service IO worker threads for metadata HTTP service
metadata.messaging.topic metadata Topic name used to publish metadata messages in the messaging system
metadata.messaging.fetch.size 100 Number of messages to fetch from messaging system for each batch
metadata.messaging.poll.delay.millis 2000 Delay in milliseconds before the lineage processor checks again for new events after it finds that there are none
metadata.upgrade.migration.batch.size 1000 Number of metadata value or history rows to be migrated to v2 metadata table in a batch

Metrics

Parameter Name Default Value Description
metrics.boss.threads ${http.service.boss.threads} Number of Netty service boss threads for metrics HTTP services
metrics.connection.backlog ${http.service.connection.backlog} Maximum connection backlog of metrics HTTP service
metrics.data.table.retention.resolution.1.seconds 7200 Retention resolution in seconds of the 1-second resolution table; default retention period is 2 hours
metrics.data.table.retention.resolution.3600.seconds 2592000 Retention resolution in seconds of the 1-hour resolution table; default retention period is 30 days
metrics.data.table.retention.resolution.60.seconds 2592000 Retention resolution in seconds for the 1-minute resolution table; default retention period is 30 days
metrics.data.table.ts.rollTime.3600 24 Number of columns in a 1-hour resolution timeseries table
metrics.data.table.ts.rollTime.60 60 Number of columns in a 1-minute resolution timeseries table
metrics.dataset.hbase.stats.report.interval 60 Report interval in seconds for HBase stats
metrics.dataset.leveldb.stats.report.interval 60 Report interval in seconds for LevelDB stats
metrics.exec.threads ${http.service.exec.threads} Number of Netty service executor threads for metrics HTTP services
metrics.kafka.meta.table metrics.kafka.meta Name of the Kafka metrics meta table
metrics.kafka.partition.size 10 Number of partitions for the Kafka metrics topic
metrics.kafka.topic.prefix metrics Topic prefix used to publish metrics in Kafka
metrics.max.instances ${master.service.max.instances} Maximum number of instances for the metrics service
metrics.memory.mb ${master.service.memory.mb} Memory in megabytes for each metrics service instance
metrics.messaging.meta.table metrics.messaging.meta Name of the messaging metrics meta table
metrics.messaging.topic.num 10 Number of topics for metrics messages. This property also sets the number of threads used to fetch and process metrics in parallel from the messaging service. For a value of N, topics will be created for metrics with names beginning at ${metrics.topic.prefix}0, ${metrics.topic.prefix}1, up to ${metrics.topic.prefix}(N-1).
metrics.num.cores ${master.service.num.cores} Number of virtual cores for the metrics service
metrics.num.instances 1 Number of instances for the metrics service
metrics.processor.max.instances ${master.service.max.instances} Maximum number of instances for metrics processor service Apache Twill runnable
metrics.processor.memory.mb ${master.service.memory.mb} Memory in megabytes for each metrics processor service Apache Twill runnable instance
metrics.processor.num.cores 1 Number of virtual cores for metrics processor service Apache Twill runnable
metrics.processor.num.instances 1 Number of instances for metrics processor service Apache Twill runnable
metrics.processor.status.bind.address 0.0.0.0 Metrics processor HTTP service bind address
metrics.topic.prefix metrics Topic prefix used to publish metrics in messaging
metrics.worker.threads ${http.service.worker.threads} Number of Netty service worker threads for metrics HTTP services
metrics.processor.queue.size 20000 Maximum size of a queue where the metrics processor temporarily stores newly-fetched metrics in-memory before persisting them
metrics.processor.max.delay.ms 3000 Maximum delay in milliseconds allowed between the latest metrics timestamp and the time when it is processed
metrics.table.migration.sleep.millis 10 Delay in milliseconds after migration of each row from pre-4.3 to post-4.3 metrics tables. This is used to throttle the metrics migration and reduce the load on the metrics system.
app.program.metrics.enabled true Enables or disables emitting metrics from user application programs; set to true by default.
metrics.hbase.max.scan.threads 96 Maximum number of threads used for scanning HBase tables
metrics.table.splits 16 Number of splits for all metrics tables. This property can only be changed before CDAP starts up for the first time and creates the metrics tables. Once all metrics tables are created, changing this property has no effect.
metrics.table.hbase.split.policy org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy The class name that controls the HBase table region split policy. Ideally, auto-splitting should be disabled for HBase tables used by the metrics system.

Monitor Handler

Parameter Name Default Value Description
monitor.handler.service.discovery.timeout.seconds 1 Timeout in seconds for service discovery used in monitor handler service status check

Runtime Monitor

Parameter Name Default Value Description
app.program.runtime.monitor.polltime.ms 2000 Polling time in milliseconds to poll updates from a runtime
app.program.runtime.monitor.batch.size 1000 Number of events to fetch from a runtime in each poll call
app.program.runtime.monitor.topics.configs [Final] audit.topic,data.event.topic,metadata.messaging.topic,metrics.topic.prefix:${metrics.messaging.topic.num},program.status.event.topic,log.tms.topic.prefix:${log.publish.num.partitions} A comma-separated list of topic configurations to be monitored by the runtime monitor
app.program.runtime.monitor.graceful.shutdown.ms 5000 Number of milliseconds to wait for the runtime to shut down after a program execution completes in that runtime.
app.program.runtime.monitor.initialize.batch.size 100 Number of runtime states to fetch from dataset in each batch during runtime monitor initialization.
app.program.runtime.monitor.server.port 443 CDAP Runtime monitor server port
app.program.runtime.monitor.server.consume.chunk.size 60000 Approximate size in bytes of each chunk streamed back to Runtime Monitor client
app.program.runtime.monitor.threads 20 Maximum number of threads used by the runtime monitor
system.runtime.monitor.retry.policy.base.delay.ms 100 The base delay between retries in milliseconds
system.runtime.monitor.retry.policy.max.delay.ms 1000 The maximum delay between retries in milliseconds
system.runtime.monitor.retry.policy.max.retries 2147483647 The maximum number of retries to attempt before aborting
system.runtime.monitor.retry.policy.max.time.secs 2147483647 The maximum elapsed time in seconds before retries are aborted
system.runtime.monitor.retry.policy.type exponential.backoff The type of retry policy for the runtime monitor. Allowed options: "none", "fixed.delay", or "exponential.backoff".

Notification System

Parameter Name Default Value Description
notification.topic notifications Topic name used to publish notifications in the messaging system

Operational Statistics

Parameter Name Default Value Description
operational.stats.extensions.dir /opt/cdap/master/ext/operations Semicolon-separated list of local directories on the CDAP Master that are scanned for operational statistics extensions
operational.stats.refresh.interval.secs 60 Number of seconds after which operational statistics should be refreshed

Runtime

Parameter Name Default Value Description
runtime.extensions.dir /opt/cdap/master/ext/runtimeproviders Semicolon-separated list of local directories on the CDAP Master that are scanned for program runtime extensions

Queue

Parameter Name Default Value Description
data.queue.config.update.interval 5 Frequency in seconds of updates to the queue consumer configuration used in evicting queue entries on flush and compaction
data.queue.dequeue.tx.percent 30 Percentage of transaction time that may be spent in dequeue; it should be an integer between 1 and 100
data.queue.table.presplits 16 Number of splits in the queue table

Remote System Operation

Parameter Name Default Value Description
remote.system.op.exec.threads ${http.service.exec.threads} Number of Netty service executor threads for the remote system operation HTTP service
remote.system.op.service.bind.address 0.0.0.0 Remote system operation HTTP service bind address
remote.system.op.worker.threads ${http.service.worker.threads} Number of Netty service IO worker threads for the remote system operation HTTP service

Retry Policies

Parameter Name Default Value Description
custom.action.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
custom.action.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
custom.action.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
custom.action.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
custom.action.retry.policy.type exponential.backoff The type of retry policy for custom actions. Allowed options: "none", "fixed.delay", or "exponential.backoff".
flow.retry.policy.base.delay.ms 100 The base delay between retries in milliseconds
flow.retry.policy.max.delay.ms 1000 The maximum delay between retries in milliseconds
flow.retry.policy.max.retries 3 The maximum number of retries to attempt before aborting
flow.retry.policy.max.time.secs 10 The maximum elapsed time in seconds before retries are aborted
flow.retry.policy.type none The type of retry policy for flows. Allowed options: "none", "fixed.delay", or "exponential.backoff".
mapreduce.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
mapreduce.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
mapreduce.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
mapreduce.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
mapreduce.retry.policy.type exponential.backoff The type of retry policy for MapReduce programs. Allowed options: "none", "fixed.delay", or "exponential.backoff".
service.retry.policy.base.delay.ms 100 The base delay between retries in milliseconds
service.retry.policy.max.delay.ms 1000 The maximum delay between retries in milliseconds
service.retry.policy.max.retries 3 The maximum number of retries to attempt before aborting
service.retry.policy.max.time.secs 10 The maximum elapsed time in seconds before retries are aborted
service.retry.policy.type none The type of retry policy for services. Allowed options: "none", "fixed.delay", or "exponential.backoff".
spark.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
spark.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
spark.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
spark.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
spark.retry.policy.type exponential.backoff The type of retry policy for Spark programs. Allowed options: "none", "fixed.delay", or "exponential.backoff".
system.log.process.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
system.log.process.retry.policy.max.retries 1500 The maximum number of retries to attempt before aborting
system.log.process.retry.policy.max.time.secs 1500 The maximum elapsed time in seconds before retries are aborted
system.log.process.retry.policy.type fixed.delay The type of retry policy for log processing. Allowed options: "none", "fixed.delay", or "exponential.backoff".
system.metadata.retry.policy.base.delay.ms 100 The base delay between retries in milliseconds
system.metadata.retry.policy.max.delay.ms 2000 The maximum delay between retries in milliseconds
system.metadata.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
system.metadata.retry.policy.max.time.secs 2147483647 The maximum elapsed time in seconds before retries are aborted
system.metadata.retry.policy.type exponential.backoff The type of retry policy for metadata operations. Allowed options: "none", "fixed.delay", or "exponential.backoff".
system.metrics.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
system.metrics.retry.policy.max.retries 600 The maximum number of retries to attempt before aborting
system.metrics.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
system.metrics.retry.policy.type fixed.delay The type of retry policy for metrics publishing. Allowed options: "none", "fixed.delay", or "exponential.backoff".
system.notification.retry.policy.base.delay.ms 100 The base delay between retries in milliseconds
system.notification.retry.policy.max.delay.ms 5000 The maximum delay between retries in milliseconds
system.notification.retry.policy.max.retries 5000 The maximum number of retries to attempt before aborting
system.notification.retry.policy.max.time.secs 7200 The maximum elapsed time in seconds before retries are aborted
system.notification.retry.policy.type exponential.backoff The type of retry policy for notification publish or subscription. Allowed options: "none", "fixed.delay", or "exponential.backoff".
system.program.state.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
system.program.state.retry.policy.max.delay.ms 3000 The maximum delay between retries in milliseconds
system.program.state.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
system.program.state.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
system.program.state.retry.policy.type fixed.delay The type of retry policy for programs. Allowed options: "none", "fixed.delay", or "exponential.backoff".
worker.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
worker.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
worker.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
worker.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
worker.retry.policy.type exponential.backoff The type of retry policy for workers. Allowed options: "none", "fixed.delay", or "exponential.backoff".
workflow.retry.policy.base.delay.ms 1000 The base delay between retries in milliseconds
workflow.retry.policy.max.delay.ms 30000 The maximum delay between retries in milliseconds
workflow.retry.policy.max.retries 1000 The maximum number of retries to attempt before aborting
workflow.retry.policy.max.time.secs 600 The maximum elapsed time in seconds before retries are aborted
workflow.retry.policy.type exponential.backoff The type of retry policy for workflows. Allowed options: "none", "fixed.delay", or "exponential.backoff".
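
How the four numeric settings and the policy type might interact can be sketched in Python. This is an illustration under stated assumptions, not CDAP's implementation: the table does not define the backoff formula, so the sketch assumes that "exponential.backoff" doubles the delay after every attempt and caps it at the maximum delay, that "fixed.delay" always waits the base delay, and that retries stop when either the retry count or the elapsed-time limit is reached. The function name retry_delays_ms is hypothetical.

```python
def retry_delays_ms(policy_type, base_delay_ms, max_delay_ms, max_retries, max_time_secs):
    """Yield the delay in milliseconds before each retry.

    Assumed behavior (not confirmed by the table above):
      - "none": never retry
      - "fixed.delay": always wait base_delay_ms
      - "exponential.backoff": double the delay each time, capped at max_delay_ms
    Retries stop after max_retries attempts or once the summed delays
    would exceed max_time_secs.
    """
    if policy_type == "none":
        return
    if policy_type not in ("fixed.delay", "exponential.backoff"):
        raise ValueError(f"unknown retry policy type: {policy_type}")
    elapsed_ms = 0
    delay = base_delay_ms
    for _ in range(max_retries):
        if elapsed_ms + delay > max_time_secs * 1000:
            return  # the elapsed-time limit would be exceeded
        elapsed_ms += delay
        yield delay
        if policy_type == "exponential.backoff":
            delay = min(delay * 2, max_delay_ms)
```

Under these assumptions, the workflow defaults (base delay 1000, maximum delay 30000) would produce delays that double from one second until they reach the 30-second cap, and would stop once ten minutes of delay had accumulated.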

Router

Parameter Name Default Value Description
router.audit.log.enabled ${security.enabled} Determines if the access audit log is enabled
router.audit.path.check.enabled true Determines whether to check the number of paths for audit logging
router.bind.address 0.0.0.0 CDAP Router service bind address
router.bind.port 11015 CDAP Router service bind port
router.connection.backlog 20000 The connection backlog in the CDAP Router service
router.connection.idle.timeout.secs 15 Time in seconds after an HTTP request completes that idle router connections are closed
router.server.address 127.0.0.1 CDAP Router service address to which CDAP UI connects
router.server.boss.threads 1 The number of boss threads in the CDAP Router service
router.server.port ${router.bind.port} CDAP Router service port
router.server.worker.threads 10 The number of worker threads in the CDAP Router service
router.ssl.bind.port 10443 CDAP Router service bind port for HTTPS
router.ssl.server.port ${router.ssl.bind.port} CDAP Router service port for HTTPS
router.userservice.fallback.strategy random If a RouteConfig is not found for a particular user service, this property is used to set the fallback routing strategy. Allowed options: "random", "smallest", "largest", or "drop". A string comparison of the versions of the user service is used for "smallest" or "largest". The "drop" option will not route the request to any service and will return "service not found".
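
The note that router.userservice.fallback.strategy uses a string comparison of versions has a practical consequence that a small Python sketch makes visible. The helper pick_fallback_version is hypothetical and not a CDAP API, and it assumes the comparison is plain lexicographic ordering, which is one reading of "string comparison".

```python
import random


def pick_fallback_version(versions, strategy):
    """Pick a user service version with the named fallback strategy.

    Hypothetical helper. "smallest" and "largest" compare versions as
    plain strings (an assumption), "random" picks any version, and
    "drop" routes to nothing, represented here by None.
    """
    if strategy == "drop" or not versions:
        return None
    if strategy == "random":
        return random.choice(versions)
    if strategy == "smallest":
        return min(versions)
    if strategy == "largest":
        return max(versions)
    raise ValueError(f"unknown fallback strategy: {strategy}")
```

With versions "9.0.0" and "10.0.0", a lexicographic comparison treats "9.0.0" as the largest, because the character "9" sorts after "1". Zero-padded or otherwise uniformly formatted version strings avoid this surprise.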

Security

Parameter Name Default Value Description
cdap.master.kerberos.keytab   The full path to the Kerberos keytab file containing the CDAP Master service's credentials
cdap.master.kerberos.principal   Example: "CDAP_PRINCIPAL/_HOST@EXAMPLE.COM". The Kerberos primary user that should be used to log in to the CDAP Master service. Substitute the Kerberos primary (user) for CDAP_PRINCIPAL, and your domain for EXAMPLE.COM. The string "_HOST" will be substituted with the local hostname.
cdap.ugi.cache.expiration.ms 3600000 UserGroupInformation cache entry expiration time in milliseconds. It is only used when impersonation is enabled.
kerberos.auth.enabled ${security.enabled} Determines if Kerberos authentication is enabled
kerberos.auth.relogin.interval.seconds 300 Re-login interval in seconds for Kerberos keytab
security.auth.server.announce.urls   Comma-separated list of CDAP Authentication service announce URLs. Each URL is in the format protocol://host:port. These are the URLs that clients should use to communicate with the Authentication Server. Leave empty to use the default value generated by the Authentication Server.
security.auth.server.bind.address 0.0.0.0 CDAP Authentication service bind address
security.auth.server.bind.port 10009 CDAP Authentication service bind port
security.auth.server.ssl.bind.port 10010 CDAP Authentication service bind port for HTTPS
security.authentication.basic.realmfile   Username and password file to use when basic authentication is configured
security.authentication.handlerClassName   Name of the authentication implementation to use to validate user credentials
security.authentication.loginmodule.className   JAAS LoginModule implementation to use when co.cask.security.server.JAASAuthenticationHandler is configured for ${security.authentication.handlerClassName}
security.authorization.cache.max.entries 100000 Number of entries to hold in the container authorization cache. If set to 0, no caching will be performed.
security.authorization.extension.config.cache.max.entries ${security.authorization.cache.max.entries} Number of entries to hold in the container authorization cache. If set to 0, no caching will be performed.
security.authorization.cache.ttl.secs 300 The time-to-live in seconds for entries in the authorization cache used by programs and system services outside of CDAP Master.
security.authorization.extension.config.cache.ttl.secs ${security.authorization.cache.ttl.secs} The time-to-live in seconds for entries in the authorization cache used by programs and system services outside of CDAP Master.
security.authorization.enabled false When set to true, all operations in CDAP are authorized using the authorizer implementation found at the property ${security.authorization.extension.jar.path}
security.authorization.extension.jar.path   If an external authorization system is used for authorizing operations on CDAP entities, this property sets the path to the bundled JAR file containing the extension code. This jar is only used when authorization is enabled by setting ${security.authorization.enabled} to true.
security.authorization.extension.operation.time.warn.threshold.ms 5000 Time taken by an authorization extension to perform an enforce operation is recorded and logged at TRACE level. This property sets the upper limit for the time taken by the extension in milliseconds after which it is logged at WARN level rather than TRACE.
security.data.keyfile.path ${local.data.dir}/security/keyfile Path to the secret key file (only used in CDAP Local Sandbox)
security.enabled false Determines if authentication is enabled for CDAP; if true, all requests to CDAP must provide a valid access token
security.keytab.path   The location of Kerberos keytabs used for impersonation. The location can contain ${name}, which will be replaced by the short user name of the principal being impersonated.
security.realm cdap Authentication realm used for scoping security; this value should be unique for each installation of CDAP
security.server.extended.token.expiration.ms 604800000 Admin tool access token expiration time in milliseconds; defaults to 1 week (internal)
security.server.maxthreads 100 Maximum number of threads that the CDAP Authentication service should use for handling HTTP requests
security.server.token.expiration.ms 86400000 Access token expiration time in milliseconds; defaults to 24 hours
security.store.file.name securestore Name of the secure store file
security.store.file.path ${local.data.dir}/store Location of the encrypted file which holds the secure store entries
security.store.provider none Backend provider for the secure store; use 'none' if no secure store
security.token.digest.algorithm HmacSHA256 Algorithm used for generating MAC of access tokens
security.token.digest.key.expiration.ms 3600000 Duration in milliseconds after which an active secret key used for signing tokens should be retired
security.token.digest.keylength 128 Key length used in generating the secret keys for generating MAC of access tokens
security.token.distributed.parent.znode /${root.namespace}/security/auth Parent node in ZooKeeper used for secret key distribution in Distributed CDAP
ssl.external.enabled false Enable SSL for external services
ssl.internal.enabled false Enable SSL between Router and App Fabric
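
Two of the properties above describe placeholder substitution: "_HOST" in cdap.master.kerberos.principal and ${name} in security.keytab.path. The following Python sketch shows the idea. Both function names are hypothetical, and the way the short user name is derived (the text before the first "/" or "@") is a simplifying assumption; real deployments may map principals to short names differently. The keytab directory in the usage note is an invented example path.

```python
import socket


def expand_principal(principal, hostname=None):
    """Replace the literal "_HOST" in a Kerberos principal with a hostname.

    If no hostname is given, the local fully qualified name is used.
    """
    if hostname is None:
        hostname = socket.getfqdn()
    return principal.replace("_HOST", hostname)


def keytab_for(path_template, principal):
    """Replace "${name}" in a keytab path with the principal's short name.

    Assumption: the short name is the part before the first "/" or "@".
    """
    short_name = principal.split("@", 1)[0].split("/", 1)[0]
    return path_template.replace("${name}", short_name)
```

For instance, keytab_for("/etc/security/keytabs/${name}.keytab", "alice@EXAMPLE.COM") resolves the template for a user whose short name is alice.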

Stream

Parameter Name Default Value Description
stream.async.queue.size 100 Queue size per async worker thread for queuing up async write requests
stream.async.worker.threads ${stream.worker.threads} Number of async worker threads for handling async write requests
stream.base.dir /streams The directory root for all stream files, relative to the HDFS namespace
stream.batch.buffer.threshold 1048576 Bytes retained in-memory before writing to a new stream file
stream.bind.address 0.0.0.0 Stream HTTP service bind address
stream.bind.port 0 Stream HTTP service bind port; if 0, binds to a random port
stream.consumer.table.presplits 16 Number of splits for the stream consumer table
stream.container.instance.id 0 Instance ID for the stream service container; the actual value will be set at runtime by the system automatically
stream.container.instances 1 Number of YARN container instances for the stream handler; in CDAP Local Sandbox, it is always one
stream.container.memory.mb ${master.service.memory.mb} Memory in megabytes for each YARN container that runs the stream handler
stream.container.num.cores 2 Number of virtual cores for the YARN container that runs the stream handler
stream.event.ttl 9223372036854775807 Default time-to-live in milliseconds (Long.MAX_VALUE) for stream events
stream.file.cleanup.period 300000 Interval in milliseconds for running the stream file cleanup process
stream.file.prefix file Prefix of file name for stream file
stream.index.interval 10000 Interval in milliseconds for emitting new index entry in stream file
stream.instance.file.prefix [Final] ${stream.file.prefix}.${stream.container.instance.id} Prefix of file name for stream file per writer instance
stream.notification.threshold 1024 Size of data in megabytes to be ingested by a stream before a notification is published
stream.partition.duration 3600000 Duration in milliseconds of a stream partition
stream.size.schedule.polling.delay 600 Delay in seconds to poll a stream in a StreamSizeSchedule if no notification is received
stream.worker.threads ${http.service.worker.threads} Default number of IO worker threads for the stream HTTP service
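
Many default values in these tables, such as the one for stream.instance.file.prefix, are built from ${...} references to other properties. The Python sketch below shows how such references could be expanded. It is a simplified stand-in for whatever substitution CDAP performs internally; the resolve function and its max_depth limit are hypothetical, and unknown references are simply left untouched.

```python
import re

# Matches ${property.name} references inside a value.
_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(value, props, max_depth=20):
    """Expand ${name} references in value using the props mapping.

    References to unknown properties are left as-is. Expansion repeats
    until nothing changes, so nested references are also resolved.
    """
    for _ in range(max_depth):
        expanded = _REF.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return expanded
        value = expanded
    raise ValueError("too many nested references (possible cycle)")
```

With stream.file.prefix set to file and stream.container.instance.id set to 0, the default of stream.instance.file.prefix resolves to file.0.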

UI

Parameter Name Default Value Description
dashboard.bind.address 0.0.0.0 CDAP UI bind address
dashboard.bind.port 11011 CDAP UI bind port
dashboard.router.check.timeout.secs 0 Interval in seconds that the CDAP UI waits before exiting when unable to connect to the CDAP Router service on startup; use a timeout of 0 to wait indefinitely
dashboard.ssl.bind.port 9443 CDAP UI bind port for HTTPS
dashboard.ssl.disable.cert.check false True to disable SSL certificate check from the CDAP UI
http.client.connection.timeout.ms 60000 Connection timeout in milliseconds for internal HTTP requests
http.client.read.timeout.ms 60000 Read timeout in milliseconds for internal HTTP requests
program.heartbeat.interval.seconds 1800 Interval in seconds between heartbeats sent from a program while it is running; defaults to 30 minutes
program.heartbeat.table.ttl.seconds 2592000 Time-to-live in seconds for the program heartbeat table; defaults to 30 days

Deprecated Properties

These properties are deprecated as of CDAP 5.0.0 and should not be used. They will be removed in a future release. Where a replacement property exists, it is noted in the description.

Parameter Name Default Value Description
app.bind.address 0.0.0.0 App Fabric service bind address (deprecated; use ${master.services.bind.address} instead)
audit.kafka.topic audit Apache Kafka topic name to which audit messages are published
dataset.service.bind.address 0.0.0.0 Dataset service bind address (deprecated; use ${master.services.bind.address} instead)
explore.executor.container.instances 1 Number of explore executor instances (deprecated; instance count is always set to 1)
explore.executor.max.instances 1 Maximum number of explore executor instances (deprecated; instance count is always set to 1)
kafka.bind.address ${kafka.server.host.name} CDAP Kafka service bind address (deprecated; use ${kafka.server.host.name} instead)
kafka.bind.port ${kafka.server.port} CDAP Kafka service bind port (deprecated; use ${kafka.server.port} instead)
kafka.default.replication.factor ${kafka.server.default.replication.factor} CDAP Kafka service replication factor (deprecated; use ${kafka.server.default.replication.factor} instead)
kafka.log.dir ${kafka.server.log.dirs} CDAP Kafka service log storage directory (deprecated; use ${kafka.server.log.dirs} instead)
kafka.log.retention.hours ${kafka.server.log.retention.hours} The number of hours to keep a log file before deleting it (deprecated; use ${kafka.server.log.retention.hours} instead)
kafka.num.partitions ${kafka.server.num.partitions} Default number of partitions for a topic (deprecated; use ${kafka.server.num.partitions} instead)
kafka.zookeeper.connection.timeout.ms ${kafka.server.zookeeper.connection.timeout.ms} The maximum time (in milliseconds) that the client will wait to establish a connection to ZooKeeper (deprecated; use ${kafka.server.zookeeper.connection.timeout.ms} instead)
log.cleanup.max.num.files 1000 Maximum number of files scanned in one iteration
log.cleanup.run.interval.mins 1440 Log cleanup interval in minutes
log.retention.duration.days 7 Time-to-live in days for log files saved in HDFS
log.saver.checkpoint.interval.ms 60000 The time between log saver checkpoints in milliseconds (deprecated; use ${log.process.pipeline.checkpoint.interval.ms} instead)
log.saver.run.memory.megs 1024 Memory in megabytes allocated for log saver instances to run in YARN (deprecated; use ${log.saver.container.memory.mb} instead)
log.saver.run.num.cores 2 Number of cores for each log saver instance in YARN (deprecated; use ${log.saver.container.num.cores} instead)
metrics.messaging.fetcher.limit 2000 Maximum number of metrics messages to be fetched from the messaging fetcher at a time. It is also the maximum number of metric values to be persisted in the metrics store after the number of fetched messages reaches the limit.
notification.kafka.topic notifications Kafka topic name used to publish notifications
notification.transport.system kafka Transport system used by the notification system; can be either 'kafka' or 'stream'
router.client.boss.threads 1 The number of boss threads in the CDAP Router service client
router.client.worker.threads 10 The number of worker threads in the CDAP Router service client
security.auth.server.address 0.0.0.0 CDAP Authentication service bind address (deprecated; use ${security.auth.server.bind.address} instead)
ssl.enabled false Determines if SSL is enabled (deprecated; use ${ssl.external.enabled} instead)
security.auth.server.announce.address   CDAP Authentication service announce address. This is the address in the URL that clients should use to communicate with the Authentication Server. Leave empty to use the default value generated by the Authentication Server. (deprecated; use ${security.auth.server.announce.urls} instead)
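
When reviewing an existing cdap-site.xml, it can help to flag deprecated properties and their replacements. The Python sketch below does this for a set of properties that has already been loaded into a dictionary. The replacement mapping is copied from the table above and covers only rows that name a replacement; the function name find_deprecated is hypothetical, and loading the XML file is left out.

```python
# Deprecated property -> replacement, as listed in the table above.
REPLACEMENTS = {
    "app.bind.address": "master.services.bind.address",
    "dataset.service.bind.address": "master.services.bind.address",
    "kafka.bind.address": "kafka.server.host.name",
    "kafka.bind.port": "kafka.server.port",
    "kafka.default.replication.factor": "kafka.server.default.replication.factor",
    "kafka.log.dir": "kafka.server.log.dirs",
    "kafka.log.retention.hours": "kafka.server.log.retention.hours",
    "kafka.num.partitions": "kafka.server.num.partitions",
    "kafka.zookeeper.connection.timeout.ms": "kafka.server.zookeeper.connection.timeout.ms",
    "log.saver.checkpoint.interval.ms": "log.process.pipeline.checkpoint.interval.ms",
    "log.saver.run.memory.megs": "log.saver.container.memory.mb",
    "log.saver.run.num.cores": "log.saver.container.num.cores",
    "security.auth.server.address": "security.auth.server.bind.address",
    "security.auth.server.announce.address": "security.auth.server.announce.urls",
    "ssl.enabled": "ssl.external.enabled",
}


def find_deprecated(site_props):
    """Return sorted (deprecated_name, replacement) pairs found in site_props."""
    return sorted(
        (name, REPLACEMENTS[name]) for name in site_props if name in REPLACEMENTS
    )
```

Calling find_deprecated with a dictionary that contains ssl.enabled, for example, reports ssl.external.enabled as the property to use instead.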