🔗Logging and Monitoring

CDAP collects logs and metrics for all of its internal services and user applications. Being able to view these details can be very helpful both in debugging CDAP applications and in analyzing their performance. CDAP gives access to its logs, metrics, and other monitoring information through RESTful APIs, the CDAP UI, and a Java Client.

In Hadoop clusters, programs run inside YARN containers and generate individual log files within those containers. As an application can consist of multiple programs distributed across the nodes of a cluster, the complete logs for the application may similarly be distributed. As these files generally do not persist beyond the life of the container, they are volatile and not very useful for post-mortem diagnostics and troubleshooting, or for analyzing performance.

To address these issues, the CDAP log framework was designed:

  • to centralize the location of logs, so that the logs of the individual containers of a program can be merged into one;
  • to make logs both persistent—available for later use and analysis—and available while the program is still running;
  • to be extensible using custom log pipelines; and
  • to allow fine-tuning of the logging behavior at the level of an individual application as well as the entire cluster.

🔗Logging Example

This diagram shows the steps CDAP follows when logging a program of an application:

  • Logs are collected from an individual program running in a YARN container.
  • YARN writes the log messages emitted by containers to files inside the container.
  • In addition, CDAP programs publish these messages to Kafka.
  • The CDAP Log Saver Service is configured to read log messages from Kafka. Log saver reads the messages from Kafka, groups them by program or application, buffers and sorts them in memory, and finally persists them to files in HDFS. Each of these files corresponds to one program or application, depending on how the grouping is configured. (This is set by the property log.publish.partition.key, described below.)
  • In addition to persisting logs to files, the log saver also emits metrics about the number of log messages emitted by each program. These metrics can be retrieved by querying the CDAP metrics system.
  • For security, the files written out to persistent storage in HDFS have permissions set such that they are accessible only by the cdap user.
../_images/logging-framework-simple.png

CDAP Logging Example: From a YARN container, through Kafka and the CDAP Log Saver Service, to HDFS

Logging uses the standard SLF4J (Simple Logging Facade for Java) APIs and Logback. Logging is configured using Logback configuration ("logback") files, consisting of log pipelines with log appenders (a minimal skeleton is sketched after this list; a full, annotated example appears later in this section):

  • A log pipeline is a process that consumes log events from Kafka, buffers them, groups them by application or program, sorts them, and then invokes the log appenders defined in its configuration.
  • A log appender (or appender) is a Java class, responsible for consuming and processing messages; typically, this includes persisting the log events. It can also, for example, collect metrics, maintain metadata about the storage, or emit alerts when it finds certain messages.
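
A minimal skeleton of such a "logback" configuration, showing a single appender wired to the root logger, might look like the following. This is a sketch only, using standard Logback elements; a complete, annotated example is shown below under Example "logback" File for a Custom Log Pipeline:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- An appender: consumes and processes the log events routed to it -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n</pattern>
    </encoder>
  </appender>

  <!-- The root logger determines which appenders are invoked -->
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>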

🔗User Application Program Logs

🔗Emitting Log Messages from a Program

CDAP supports application logging through the standard SLF4J (Simple Logging Facade for Java) APIs.

For instance, in a flowlet you can write:

import co.cask.cdap.api.annotation.ProcessInput;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
...
private static final Logger LOG = LoggerFactory.getLogger(WordCounter.class);
...
@ProcessInput
public void process(String line) {
  LOG.debug("{}: Received line {}", this.getContext().getTransactionAwareName(), line);
  ... // processing
  LOG.debug("{}: Emitting count {}", this.getContext().getTransactionAwareName(), wordCount);
  output.emit(wordCount);
}

This will emit "debug" level messages from the flowlet when it is processing an event.

🔗Retrieving Log Messages from a Program

The log messages emitted by your application code can be retrieved by:

  • Using the CDAP HTTP RESTful API v3: the Logging HTTP RESTful API details the available contexts that can be called to retrieve different messages.

  • Log messages of a program can be viewed in the CDAP UI. In the Overview page, select the application and program that you are interested in, and click its Logs icon (circled in red here):

    ../_images/logging-cdap-ui-purchase.png

    CDAP UI: Log icon, enabled if there are logs available for viewing

    The logs will be displayed in the log viewer:

    ../_images/logging-cdap-ui-purchase-flow-log.png

    CDAP UI: Log Viewer, showing PurchaseFlow log events, INFO level

🔗Program Log File Locations

Program logs are stored in locations specified by properties in the cdap-site.xml file depending on the mode of CDAP (Standalone or Distributed):

  • For Standalone CDAP: the property log.collection.root (default ${local.data.dir}/logs) is the root location for collecting program logs.
  • For Distributed CDAP: the property hdfs.namespace (default /cdap) is the base directory in HDFS; program logs are stored in ${hdfs.namespace}/logs (by default, /cdap/logs).

🔗Configuring Program Logs and Log Levels

The logging of an application's programs is configured by the logback-container.xml file, packaged with the CDAP distribution. This "logback" rotates logs once a day at midnight and expires logs older than 14 days. Changes can be made to logback-container.xml; they only take effect for programs started after the change, so existing running applications or programs must be restarted to pick up the modified file.

  • For Standalone CDAP: As the entire Standalone CDAP runs in a single JVM, the logback.xml file, located in <cdap-sdk-home>/conf, configures both "container" and CDAP system service logging.
  • For Distributed CDAP: the logback-container.xml file is located in /etc/cdap/conf.

You can also use a custom "logback" file with your application, as described in the Developers' Manual section Application Logback.

🔗Changing Program Log Levels

The CDAP Logging HTTP RESTful API can be used to set the log levels of a program, either before a particular run or—in the case of a flow, service, or worker—while it is running. Once changed, they can be reset back to what they were originally by using the reset endpoint.

Only the log levels of flows, services, and workers can be changed dynamically; other program types are currently not supported, and their log levels can only be changed using preferences before the program starts.

  • To configure the log level before an application starts, you can add the logger name as the key and log level as the value in the preferences, using the CDAP UI, CDAP CLI, or other command line tools. The logger name should be prefixed with system.log.level.

    For example, if you want to set the log level of the HelloWorld class of the Hello World example to DEBUG, you would use system.log.level.co.cask.cdap.examples.helloworld.HelloWorld as the key and DEBUG as the value. This can be applied to any package or class. If the logger name is omitted, the log level of ROOT is changed. (A sketch of such preference entries follows this list.)

  • To configure the log level of a program dynamically—such as a flow, service, or worker which is currently running—see the Logging HTTP RESTful API.
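
For illustration, here is a sketch (plain Java; the variable name is hypothetical) of preference entries that follow the convention above. How they are submitted, through the CDAP UI, CDAP CLI, or RESTful APIs, is unchanged:

import java.util.HashMap;
import java.util.Map;

// Sketch: preference keys follow the pattern "system.log.level.<logger-name>"
Map<String, String> logLevelPreferences = new HashMap<>();

// Set a single class to DEBUG
logLevelPreferences.put("system.log.level.co.cask.cdap.examples.helloworld.HelloWorld", "DEBUG");

// Set an entire package to WARN
logLevelPreferences.put("system.log.level.co.cask.cdap.examples.helloworld", "WARN");

// Omitting the logger name changes the log level of ROOT
logLevelPreferences.put("system.log.level", "ERROR");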

Note: The Logging HTTP RESTful API for changing program log levels can only be used with programs that are run under Distributed CDAP. For changing the log levels of programs run under Standalone CDAP, you either modify the logback.xml file, or you provide a "logback.xml" with your application before it is deployed.

🔗CDAP System Services Logs

As CDAP system services run either on cluster edge nodes or in YARN containers, their logging and its configuration depends on the service and where it is located.

🔗Retrieving Log Messages from a System Service

The log messages emitted by CDAP system services can be retrieved by:

  • Using the CDAP HTTP RESTful API v3: the Logging HTTP RESTful API details downloading the logs emitted by a system service.

  • Log messages of system services can be viewed in the CDAP UI. In the Administration page (accessible through the CDAP menu on the far right), click a Logs icon (such as the HBase log icon circled in red, in the Component Overview section):

    ../_images/logging-cdap-ui-administration.png

    CDAP UI: HBase Component, with its Log icon circled in red

    Alternatively, you can click a CDAP system service and a window will appear displaying a log icon (in this case, after clicking App Fabric), shown here circled in red:

    ../_images/logging-cdap-ui-administration-app-fabric.png

    CDAP UI: CDAP App Fabric popup window, with its Log icon circled in red

🔗System Service Log File Locations

The location of CDAP system service logs depends on the mode of CDAP (Standalone or Distributed) and the Hadoop distribution:

  • For Standalone CDAP: system logs are located in <CDAP-SDK-HOME>/logs.
  • For Distributed CDAP: system logs are located in /var/log/cdap (with the exception of Cloudera Manager-based clusters). With Cloudera Manager installations, system log files are located in directories under /var/run/cloudera-scm-agent/process.

🔗Configuring System Service Logs

  • CDAP system services that run in YARN containers, such as the Metrics Service, are configured by the same logback-container.xml that configures user application program logging.
  • CDAP system services that run on cluster edge nodes, such as CDAP Master or Router, are configured by the logback.xml file. Changes can be made to logback.xml; afterwards, the affected service(s) need to be restarted for the modified "logback" file to take effect.
    • For Standalone CDAP: the logback.xml file, located in <cdap-sdk-home>/conf, configures both "container" and CDAP system service logging, as the entire Standalone CDAP runs in a single JVM.
    • For Distributed CDAP: the logback.xml file is located in /etc/cdap/conf.

🔗Configuring the Log Saver Service

The Log Saver Service is the CDAP service that reads log messages from Kafka, processes them in log pipelines, persists them to HDFS, and sends metrics on logging to the Metrics Service.

In addition to the default CDAP Log Pipeline, you can specify custom log pipelines that are run by the log saver service and perform custom actions.

The cdap-site.xml file has properties that control the writing of logs to Kafka, the log saver service, the CDAP log pipeline, and any custom log pipelines that have been defined.

🔗Writing Logs to Kafka

These properties control the writing of logs to Kafka:

  • log.kafka.topic (default: logs.user-v2): Kafka topic name used to publish logs
  • log.publish.num.partitions (default: 10): Number of CDAP Kafka service partitions to publish the logs to
  • log.publish.partition.key (default: program): Publishes logs from an application or a program to the same partition. Valid values are "application" or "program": if set to "application", logs from all the programs of an application go to the same partition; if set to "program", logs from the same program go to the same partition. Changes to this property require restarting all CDAP applications.

Notes:

  • If an external Kafka service is used (instead of the CDAP Kafka service), the number of partitions used for log.publish.num.partitions must match the number set in the external service for the topic being used to publish logs (log.kafka.topic).
  • By default, log.publish.partition.key is set to program, which means that all logs for the same program go to the same partition. Set this to application if you want all logs from an application to go to the same instance of the Log Saver Service, as shown in the sketch below.
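
For example, to group logs by application, you could set the following in cdap-site.xml (a sketch; cdap-site.xml uses the standard Hadoop-style configuration format):

<property>
  <name>log.publish.partition.key</name>
  <value>application</value>
  <description>Group log messages by application rather than by program</description>
</property>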

🔗Log Saver Service

These properties control the Log Saver Service:

  • log.saver.max.instances (default: ${master.service.max.instances}): Maximum number of log saver instances to run in YARN
  • log.saver.num.instances (default: 1): Number of log saver instances to run in YARN
  • log.saver.container.memory.mb (default: 1024): Memory in megabytes for each log saver instance to run in YARN. This is explicitly set differently than ${master.service.memory.mb}, as the log saver requires more memory to run than the CDAP Master service.
  • log.saver.container.num.cores (default: 2): Number of virtual cores for each log saver instance in YARN

Log saver instances should be from a minimum of one to a maximum of ten. The maximum is set by the number of Kafka partitions (log.publish.num.partitions), which by default is 10.

🔗Log Pipeline Configuration

Configuration properties for logging and custom log pipelines are described in the logging properties section of the cdap-site.xml documentation.

The CDAP log pipeline is configured by settings in the cdap-site.xml file.

Custom log pipelines are configured by a combination of the settings in the cdap-site.xml file and a "logback" file used to specify the custom pipeline. The XML file is placed in the log.process.pipeline.config.dir, a local directory on the CDAP Master node that is scanned for log processing pipeline configurations. Each pipeline is defined by a file in the Logback XML format, with .xml as the file name extension.

These properties control the CDAP log pipeline:

  • log.pipeline.cdap.dir.permissions (default: 700): Permissions used by the system log pipeline when creating directories
  • log.pipeline.cdap.file.cleanup.interval.mins (default: 1440): Time in minutes between runs of the log cleanup thread
  • log.pipeline.cdap.file.cleanup.transaction.timeout (default: 60): Transaction timeout in seconds used by the log cleanup thread. This should not be greater than ${data.tx.max.timeout}.
  • log.pipeline.cdap.file.max.lifetime.ms (default: 21600000): Maximum time span in milliseconds of a log file created by the system log pipeline
  • log.pipeline.cdap.file.max.size.bytes (default: 104857600): Maximum size in bytes of a log file created by the system log pipeline
  • log.pipeline.cdap.file.permissions (default: 600): Permissions used by the system log pipeline when creating files
  • log.pipeline.cdap.file.retention.duration.days (default: 7): Time in days a log file is retained

These properties control both the CDAP log pipeline and custom log pipelines:

  • log.process.pipeline.checkpoint.interval.ms (default: 10000): The time between log processing pipeline checkpoints in milliseconds
  • log.process.pipeline.config.dir (default: /opt/cdap/master/ext/logging/config): A local directory on the CDAP Master that is scanned for log processing pipeline configurations. Each pipeline is defined by a file in the logback XML format, with ".xml" as the file name extension.
  • log.process.pipeline.event.delay.ms (default: 2000): The time in milliseconds a log event stays in the log processing pipeline buffer before being written out to log appenders. A longer delay results in better time ordering of log events before they are presented to log appenders, but consumes more memory.
  • log.process.pipeline.kafka.fetch.size (default: 1048576): The buffer size in bytes, per topic partition, for fetching log events from Kafka
  • log.process.pipeline.lib.dir (default: /opt/cdap/master/ext/logging/lib): Comma-separated list of local directories on the CDAP Master scanned for additional library JAR files to be included for log processing

The log.process.pipeline.* properties can be overridden at the level of an individual custom pipeline by providing a value for any of these properties in that pipeline's "logback" file, as sketched below.
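
For example, a pipeline's "logback" file might raise the buffer delay for that pipeline only. This is a sketch; it assumes the override is supplied as a logback <property> element, in the same way properties are declared in the example pipeline file shown later in this section:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Sketch: override a log.process.pipeline.* setting for this pipeline only;
       here, the event delay is raised from the default of 2000 ms -->
  <property name="log.process.pipeline.event.delay.ms" value="5000"/>

  <!-- appenders, loggers, and the root logger for this pipeline follow -->
</configuration>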

🔗Logging Framework

This diagram shows in greater detail the components and steps CDAP follows when logging programs of an application and system services with the logging framework:

../_images/logging-framework.png

CDAP Logging Framework: From YARN containers, through Kafka and the Log Saver Service, to HDFS

  • Logs are collected from individual programs running in YARN containers.

  • YARN writes the log messages emitted by containers to files inside the container.

  • In addition, CDAP programs publish these messages to Kafka.

  • CDAP System Services run (depending on the service) either on cluster edge nodes or in YARN containers. Where they run determines the file that configures that service's logging.

  • The Log Saver Service (log.saver) is configured to read log messages from the logs.user-v2 Kafka topic (set by the property log.kafka.topic). The number of log saver instances can be scaled to process the Kafka partitions in parallel, if needed.

    Log saver, by default, runs only the CDAP Log Pipeline: it reads the messages from Kafka, groups them by program or application, buffers and sorts them in memory, and finally persists them to files in HDFS. Each of these files corresponds to one program or application, depending on how the grouping is configured. (This is set by the property log.publish.partition.key, described above under Writing Logs to Kafka.)

    Note: These files are configured to rotate based on time and size; these settings can be changed using the properties log.pipeline.cdap.file.max.size.bytes and log.pipeline.cdap.file.max.lifetime.ms in the cdap-site.xml file, as described in Log Pipeline Configuration.

  • For security, the files written out to persistent storage in HDFS have permissions set such that they are accessible only by the cdap user.

  • In addition, custom log pipelines can be configured by adding an XML file in a prescribed location. Each pipeline buffers log messages in memory and sorts them based on their timestamp.

  • In addition to persisting logs to files, the log saver also emits metrics about the number of log messages emitted by each program. These metrics can be retrieved by querying the CDAP metrics system.

    These tables list the metrics from the section Available System Metrics of the Metrics HTTP RESTful API. See that section for further information.

    • Application logging metric system.app.log.{error, info, warn}: Number of error, info, or warn log messages logged by an application or applications
    • System services logging metric system.services.log.{error, info, warn}: Number of error, info, or warn log messages logged by a system service or system services

🔗Custom Log Pipelines

For a custom log pipeline, create and configure a "logback" file, configuring loggers, appenders, and properties based on your requirements, and place the file in the directory specified by the property log.process.pipeline.config.dir in the cdap-site.xml file.

Each custom pipeline requires a unique name. Properties controlling the pipeline (the log.process.pipeline.* properties) are described above.

For every XML file in the log.process.pipeline.config.dir directory, a separate log pipeline is created. As they are separate Kafka consumers and processes, the pipelines are isolated and independent of each other; the performance of one pipeline does not affect the performance of another. Though CDAP has been tested with multiple log pipelines and appenders, specifying fewer of each will provide better performance.

🔗Configuring Custom Log Pipelines

CDAP looks for "logback" files located in a directory as set by the property log.process.pipeline.config.dir in the cdap-site.xml file. In the default configuration, this is:

  • For Standalone CDAP: <cdap-sdk-home>/ext/logging/config
  • For Distributed CDAP: /opt/cdap/master/ext/logging/config

🔗Example "logback" File for a Custom Log Pipeline

Here is an example "logback" file, using two appenders (STDOUT and rollingAppender). This file must be placed in the directory noted above and have a file extension of .xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n</pattern>
    </encoder>
  </appender>

  <property name="cdap.log.saver.instance.id" value="instanceId"/>

  <appender name="rollingAppender" class="co.cask.cdap.logging.plugins.RollingLocationLogAppender">

    <!-- log file path will be created by the appender as: <basePath>/<namespace-id>/<application-id>/<filePath> -->
    <basePath>plugins/applogs</basePath>
    <filePath>securityLogs/logFile-${cdap.log.saver.instance.id}.log</filePath>

    <!-- cdap is the owner of the log files directory, so cdap will get read/write/execute permissions.
    Log files will be read-only for others. -->
    <dirPermissions>744</dirPermissions>

    <!-- cdap is the owner of the log files, so cdap will get read/write permissions.
    Log files will be read-only for others -->
    <filePermissions>644</filePermissions>

    <!-- This is an optional parameter, which takes a number of milliseconds.
    The appender will close a file if it has not been modified for the
    fileMaxInactiveTimeMs period of time. Here it is set to thirty minutes. -->
    <fileMaxInactiveTimeMs>1800000</fileMaxInactiveTimeMs>

    <rollingPolicy class="co.cask.cdap.logging.plugins.FixedWindowRollingPolicy">
      <!-- Only specify the file name without a directory, as the appender will use the
      appropriate directory specified in filePath -->
      <fileNamePattern>logFile-${cdap.log.saver.instance.id}.log.%i</fileNamePattern>
      <minIndex>1</minIndex>
      <maxIndex>9</maxIndex>
    </rollingPolicy>

    <triggeringPolicy class="co.cask.cdap.logging.plugins.SizeBasedTriggeringPolicy">
      <!-- Set the maximum file size appropriately to avoid a large number of small files -->
      <maxFileSize>100MB</maxFileSize>
    </triggeringPolicy>

    <encoder>
      <pattern>%-4relative [%thread] %-5level %logger{35} - %msg%n</pattern>
      <!-- Do not flush on every event -->
      <immediateFlush>false</immediateFlush>
    </encoder>
  </appender>

  <logger name="co.cask.cdap.logging.plugins.RollingLocationLogAppenderTest" level="INFO">
    <appender-ref ref="rollingAppender"/>
  </logger>

  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>

</configuration>

🔗Custom Log Appender

You can use any existing logback appender. The RollingLocationLogAppender—an extension of the Logback FileAppender—lets you use HDFS locations in your log pipelines.

If you need an appender beyond what is available through Logback or CDAP, you can write and implement your own custom appender. See the Logback documentation for information on this.

As the CDAP log framework uses Logback's Appender API, your custom appender needs to implement the same Appender interface. Access to CDAP's system components (such as datasets, metrics, and the LocationFactory) is made available through the AppenderContext, an extension of Logback's LoggerContext:

public class CustomLogAppender extends FileAppender<ILoggingEvent> implements Flushable, Syncable {
  . . .
  private LocationManager locationManager;

  @Override
  public void start() {
    if (context instanceof AppenderContext) {
      AppenderContext context = (AppenderContext) this.context;
      locationManager = new LocationManager(context.getLocationFactory() . . .);
      . . .
    }
  }

  @Override
  public void doAppend(ILoggingEvent eventObject) throws LogbackException {
    try {
      . . .
      OutputStream locationOutputStream = locationManager.getLocationOutputStream . . .
      setOutputStream(locationOutputStream);
      writeOut(eventObject);
      . . .
    } catch (Exception e) {
      . . . // handle the failure, for example by rethrowing it as a LogbackException
    }
  }
}

Adding a dependency on the cdap-watchdog API will allow you to access the AppenderContext class (https://github.com/caskdata/cdap/blob/release/4.1/cdap-watchdog-api/src/main/java/co/cask/cdap/api/logging/AppenderContext.java) in your appender.
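
With Maven, for example, the dependency might look like this (a sketch; the version and scope are assumptions and should be adjusted to match your CDAP installation and build):

<dependency>
  <groupId>co.cask.cdap</groupId>
  <artifactId>cdap-watchdog-api</artifactId>
  <!-- use the version that matches your CDAP installation -->
  <version>4.1.0</version>
  <scope>provided</scope>
</dependency>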

🔗Monitoring Utilities

CDAP can be monitored using external systems such as Nagios; a Nagios-style plugin is available for checking the status of CDAP applications, programs, and the CDAP instance itself.

🔗Additional References

For additional information beyond here, see the Logging, Metrics, and Monitoring HTTP RESTful APIs, the Java Client, and the Application Logback.