CDAP Replication

This document lists the detailed steps required for setting up CDAP replication, where one CDAP cluster (a master) is replicated to one or more additional CDAP slave clusters.

Note: As described below, CDAP must have invalid transaction list pruning disabled, as this cannot be used with replication.

These steps should be reviewed (and the Cluster Setup completed) prior to starting CDAP.

Cluster Setup

CDAP replication relies on the cluster administrator setting up replication on HBase, HDFS, Hive, and Kafka.

  • It is assumed that CDAP is only running on the master cluster.
  • It is assumed that you have not started CDAP before any of these steps.

HBase

  • Install the relevant cdap-hbase-compat package on all HBase nodes of your cluster in order to use the replication status coprocessors. Note that due to HBase limitations, these coprocessors cannot be used on HBase 0.96 or 0.98.

    Available "compat" packages are:

    • cdap-hbase-compat-1.0
    • cdap-hbase-compat-1.0-cdh
    • cdap-hbase-compat-1.0-cdh5.5.0
    • cdap-hbase-compat-1.1
    • cdap-hbase-compat-1.2-cdh5.7.0

    Note: With Cloudera Manager, all of these packages are installed in your "Parcel Directory"; as described below, you add the appropriate one to your HBASE_CLASSPATH.

  • Modify hbase-site.xml on all HBase nodes to enable HBase replication, and to use the CDAP replication status coprocessors:

    <property>
      <name>hbase.replication</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.coprocessor.regionserver.classes</name>
      <value>co.cask.cdap.data2.replication.LastReplicateTimeObserver</value>
    </property>
    <property>
      <name>hbase.coprocessor.wal.classes</name>
      <value>co.cask.cdap.data2.replication.LastWriteTimeObserver</value>
    </property>
    
  • Modify hbase-env.sh on all HBase nodes to include the HBase coprocessor in the classpath:

    export HBASE_CLASSPATH="$HBASE_CLASSPATH:/<cdap-home>/<hbase-compat-version>/coprocessor/*"
    
    # <cdap-home> will vary depending on your distribution and installation
    # For Cloudera Manager/CDH: it is "${PARCEL_ROOT}/CDAP" where ${PARCEL_ROOT} is your configured "Parcel Directory"
    # Ambari/HDP, MapR, packages: it is "/opt/cdap"
    #
    # <hbase-compat-version> is the HBase package compatible with the distribution
    
    # For example, if you are on a Cloudera Manager cluster with CDH 5.5.x:
    export HBASE_CLASSPATH="$HBASE_CLASSPATH:/opt/cloudera/parcels/CDAP/hbase-compat-1.0-cdh5.5.0/coprocessor/*"
    
  • Restart HBase master and regionservers.

  • Enable replication from master to slave:

    master_hbase_shell> add_peer '[slave-name]', '[slave-zookeeper-quorum]:/[slave-zk-node]'
    
    # For example:
    master_hbase_shell> add_peer 'slave', 'slave.example.com:2181:/hbase'
    
  • Enable replication from slave to master:

    slave_hbase_shell> add_peer '[master-name]', '[master-zookeeper-quorum]:/[master-zk-node]'
    
    # For example:
    slave_hbase_shell> add_peer 'master', 'master.example.com:2181:/hbase'
    
  • Confirm that HBase replication is working:

    master_hbase_shell> create 'repltest', 'f'
    
    slave_hbase_shell> create 'repltest', 'f'
    
    master_hbase_shell> enable_table_replication 'repltest'
    
    slave_hbase_shell> alter 'repltest', { 'NAME' => 'f', 'REPLICATION_SCOPE' => 1 }
    
    master_hbase_shell> put 'repltest', 'masterrow', 'f:v1', 'v1'
    
    slave_hbase_shell> put 'repltest', 'slaverow', 'f:v1', 'v1'
    
    # Once replication has caught up, each scan should list both 'masterrow' and 'slaverow'
    master_hbase_shell> scan 'repltest'
    
    slave_hbase_shell> scan 'repltest'
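
The final check above passes when the scan on each cluster lists both test rows. As a minimal sketch of automating that comparison, the helper below reads scan output as plain text and looks for both row keys. The function name has_both_rows is hypothetical, and the sample output is made up for illustration; it is not taken from a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads the text output of `scan 'repltest'` on stdin and
# succeeds only if both test rows written in the steps above are present.
has_both_rows() {
  output=$(cat)
  printf '%s\n' "$output" | grep -q 'masterrow' &&
    printf '%s\n' "$output" | grep -q 'slaverow'
}

# Example with illustrative output (the timestamps are invented):
sample=' masterrow  column=f:v1, timestamp=1, value=v1
 slaverow   column=f:v1, timestamp=2, value=v1'

if printf '%s\n' "$sample" | has_both_rows; then
  echo "replication OK"
else
  echo "rows missing"
fi
```

On a real cluster, the scan output could be piped in from a non-interactive HBase shell, for example `echo "scan 'repltest'" | hbase shell -n | has_both_rows`. This assumes your HBase version supports the -n (non-interactive) option. Because replication is asynchronous, allow a few seconds after the put commands before checking.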
    

HDFS

Set up HDFS replication using the solution provided by your distribution. HDFS has no built-in cross-cluster replication; it is usually approximated by scheduling regular distcp jobs.
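
If your distribution provides no replication tool, a scheduled distcp job might look like the following sketch. The function only prints the command so that it can be reviewed first. The NameNode addresses are placeholders, and /cdap is an assumption about where CDAP stores its HDFS data; check your own configuration before using it.

```shell
#!/bin/sh
# Prints (does not run) a distcp command that copies one HDFS directory from
# the master cluster to the same path on the slave cluster.
# All three arguments are placeholders you must supply for your clusters.
distcp_command() {
  master_nn=$1   # master NameNode host:port (hypothetical example below)
  slave_nn=$2    # slave NameNode host:port (hypothetical example below)
  path=$3        # HDFS directory to copy; /cdap is an assumed CDAP root
  printf 'hadoop distcp -update hdfs://%s%s hdfs://%s%s\n' \
    "$master_nn" "$path" "$slave_nn" "$path"
}

distcp_command master-nn.example.com:8020 slave-nn.example.com:8020 /cdap
```

The -update option copies only files that are missing or different on the target, which keeps repeated runs cheap. The printed command could be run from a scheduler such as cron at whatever interval matches the data loss you can tolerate.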

Hive

Set up replication for the database backing your Hive Metastore. Note that this will simply replicate the Hive metadata—which tables exist, table metadata, etc.—but not the data itself. It is assumed you will not be running Hive queries on the slave until after a manual failover occurs.

For example, to set up MySQL 5.7 replication, follow the steps described at Setting Up Binary Log File Position Based Replication.

Kafka

Set up replication for the Kafka brokers you are using. Kafka MirrorMaker is the most common solution. See Mirroring data between clusters and Kafka mirroring (MirrorMaker) for additional information.
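
As a rough starting point, a MirrorMaker invocation generally takes a consumer configuration for the source cluster, a producer configuration for the target cluster, and a topic filter. The sketch below only prints such a command. The properties file names are hypothetical, their contents depend on your Kafka version, and the script name may differ in your distribution.

```shell
#!/bin/sh
# Prints (does not run) a MirrorMaker command line.
mirror_maker_command() {
  consumer_cfg=$1  # properties file pointing at the master (source) Kafka cluster
  producer_cfg=$2  # properties file pointing at the slave (target) Kafka cluster
  topics=$3        # regular expression selecting the topics to mirror
  printf "kafka-mirror-maker.sh --consumer.config %s --producer.config %s --whitelist '%s'\n" \
    "$consumer_cfg" "$producer_cfg" "$topics"
}

mirror_maker_command master-consumer.properties slave-producer.properties '.*'
```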

CDAP Setup

CDAP requires that you provide an extension that will perform HBase-related DDL operations on both clusters instead of only on a single cluster. To create the extension, you must implement the HBaseDDLExecutor class. Details on implementing this class, a sample implementation, and example files are available in the Appendix: HBaseDDLExecutor.

CDAP must have invalid transaction list pruning disabled, as this cannot be used with replication.

To deploy your extension (once compiled and packaged as a JAR file, such as my-extension.jar), run these steps on both your master and slave clusters. These steps assume <cdap-home> is /opt/cdap:

  1. Create an extension directory, such as:

    $ mkdir -p /opt/cdap/master/ext/hbase/repl
    
  2. Copy your JAR to the directory:

    $ cp my-extension.jar /opt/cdap/master/ext/hbase/repl/
    
  3. Modify cdap-site.xml to use your implementation of HBaseDDLExecutor:

    <property>
      <name>hbase.ddlexecutor.extension.dir</name>
      <value>/opt/cdap/master/ext/hbase</value>
    </property>
    
  4. Modify cdap-site.xml with any properties required by your executor. Any property prefixed with cdap.hbase.spi.hbase. will be available through the HBaseDDLExecutorContext object passed into your executor's initialize method:

    <property>
      <name>cdap.hbase.spi.hbase.zookeeper.quorum</name>
      <value>slave.example.com:2181/cdap</value>
    </property>
    <property>
      <name>cdap.hbase.spi.hbase.zookeeper.session.timeout</name>
      <value>60000</value>
    </property>
    <property>
      <name>cdap.hbase.spi.hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>cdap.hbase.spi.hbase.bulkload.staging.dir</name>
      <value>/tmp/hbase-staging</value>
    </property>
    <property>
      <name>cdap.hbase.spi.hbase.replication</name>
      <value>true</value>
    </property>
    
  5. Modify cdap-site.xml to disable invalid transaction list pruning, as it cannot be used with replication:

    <property>
      <name>data.tx.prune.enable</name>
      <value>false</value>
      <description>
        Enable invalid transaction list pruning
      </description>
    </property>
    
  6. Before starting CDAP on the master cluster, run a command on the slave cluster to load the HBase coprocessors required by CDAP onto the slave's HDFS:

    [slave] $ cdap setup coprocessors
    
  7. Start CDAP on the master cluster:

    [master] $ cdap master start
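
Because CDAP must not be started with invalid transaction list pruning enabled, it can help to check cdap-site.xml before the final step. The following is a minimal sketch of such a pre-flight check. The function name prune_setting is hypothetical, and it assumes that <name> and <value> are on separate lines, as in the snippets above. It uses plain text matching rather than a real XML parser.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print the value of data.tx.prune.enable
# found in the cdap-site.xml file given as the first argument.
prune_setting() {
  awk '
    /<name>[ \t]*data\.tx\.prune\.enable[ \t]*<\/name>/ { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, "")      # drop everything up to the value
      sub(/[ \t]*<\/value>.*/, "")    # drop the closing tag and the rest
      print
      exit
    }
  ' "$1"
}

# Demonstration against a throwaway file; use your real cdap-site.xml instead.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<configuration>
  <property>
    <name>data.tx.prune.enable</name>
    <value>false</value>
  </property>
</configuration>
EOF

if [ "$(prune_setting "$demo")" = "false" ]; then
  echo "pruning disabled: OK for replication"
else
  echo "pruning is not disabled: fix cdap-site.xml before starting CDAP"
fi
rm -f "$demo"
```

If the property is missing from the file, the function prints nothing and the check reports a problem, which is the safer outcome here.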
    

Manual Failover Procedure

To manually fail over from the master to a slave cluster, follow these steps:

  1. Stop all CDAP programs on the master cluster

  2. Stop CDAP on the master cluster

  3. Copy any HDFS files that have not yet been copied using either your distro's solution or distcp

  4. Run the CDAP replication status tool to retrieve the cluster state:

    [master] $ cdap run co.cask.cdap.data.tools.ReplicationStatusTool -m -o /tmp/master_state
    
  5. Copy the master state onto your slave cluster:

    [master] $ scp /tmp/master_state <slave>:/tmp/master_state
    
  6. Verify that replication has copied the required data onto the slave:

    [slave] $ cdap run co.cask.cdap.data.tools.ReplicationStatusTool -i /tmp/master_state
    ...
    Master and Slave Checksums match. HDFS Replication is complete.
    HBase Replication is complete.
    
  7. Run Hive's metatool to update the locations for the Hive tables:

    [slave] $ hive --service metatool -updateLocation hdfs://[slave-namenode-host]:[slave-namenode-port] \
                 hdfs://[master-namenode-host]:[master-namenode-port] \
                 -tablePropKey avro.schema.url -serdePropKey avro.schema.url
    
  8. Start CDAP on the slave:

    [slave] $ cdap master start
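
Starting CDAP on the slave before replication has finished risks losing data, so the verification step is worth scripting. As a minimal sketch, the helper below reads the tool's output and succeeds only if both completion messages shown in the verification step appear. The function name replication_complete is hypothetical, and the matching relies on the message text quoted above staying the same in your CDAP version.

```shell
#!/bin/sh
# Hypothetical helper: reads ReplicationStatusTool output on stdin and succeeds
# only if both the HDFS and the HBase completion messages are present.
replication_complete() {
  output=$(cat)
  printf '%s\n' "$output" | grep -q 'HDFS Replication is complete' &&
    printf '%s\n' "$output" | grep -q 'HBase Replication is complete'
}

# Example using the expected output quoted in the verification step:
sample='Master and Slave Checksums match. HDFS Replication is complete.
HBase Replication is complete.'

if printf '%s\n' "$sample" | replication_complete; then
  echo "safe to continue with failover"
else
  echo "replication incomplete: do not start CDAP on the slave yet"
fi
```

On the slave, the tool's output could be piped straight into the helper, for example `cdap run co.cask.cdap.data.tools.ReplicationStatusTool -i /tmp/master_state | replication_complete`.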
    

Upgrading Replicated Clusters

Consider a scenario where CDAP is running on the master cluster, with its data being replicated to the slave cluster. To upgrade the replicated clusters, follow these steps:

  1. Update the CDAP repository definition on the slave cluster using either of these methods:
  • On RPM using Yum:

    $ sudo curl -o /etc/yum.repos.d/cask.repo http://repository.cask.co/centos/6/x86_64/cdap/5.1/cask.repo
    

    This will create the file /etc/yum.repos.d/cask.repo with:

    [cask]
    name=Cask Packages
    baseurl=https://repository.cask.co/centos/6/x86_64/cdap/5.1
    enabled=1
    gpgcheck=1
    

    Add the Cask Public GPG Key to your repository:

    $ sudo rpm --import http://repository.cask.co/centos/6/x86_64/cdap/5.1/pubkey.gpg
    

    Update your Yum cache:

    $ sudo yum makecache
    
  • On Debian using APT:

    $ sudo curl -o /etc/apt/sources.list.d/cask.list http://repository.cask.co/ubuntu/precise/amd64/cdap/5.1/cask.list
    

    This will create the file /etc/apt/sources.list.d/cask.list with:

    deb [ arch=amd64 ] http://repository.cask.co/ubuntu/precise/amd64/cdap/5.1 precise cdap
    

    Add the Cask Public GPG Key to your repository:

    $ curl -s http://repository.cask.co/ubuntu/precise/amd64/cdap/5.1/pubkey.gpg | sudo apt-key add -
    

    Update your APT cache:

    $ sudo apt-get update
    
  2. Update the CDAP packages on the slave cluster using either of these methods:
  • On RPM using Yum:

    $ sudo yum upgrade 'cdap*'
    
  • On Debian using APT:

    $ sudo apt-get install --only-upgrade '^cdap.*'
    
  3. Generate the coprocessor JAR on the slave cluster corresponding to the newly downloaded CDAP version:

    $ sudo -u <cdap-user> cdap setup coprocessors
    

    The coprocessor JAR will be stored on HDFS, and the path to the JAR will be printed on the console.

  4. Copy the coprocessor JAR to the same HDFS location on the master cluster.

  5. Stop all CDAP services on the master cluster.

  6. Make sure that all HBase and HDFS data has been replicated to the slave cluster.

  7. Run the upgrade tool on the slave cluster:

    $ sudo -u <cdap-user> /opt/cdap/master/bin/cdap run co.cask.cdap.data.tools.UpgradeTool upgrade
    

    Since replication is enabled and HBaseDDLExecutor is in place, HBase tables on the master cluster will also be upgraded.

  8. The new version of CDAP can now be started on the slave cluster.

  9. Download and install the new CDAP packages on the master cluster using the steps described above for the slave. This keeps the master ready in case of a failover from the slave. Note that there is no need to run the upgrade tool on the master, since the HBase tables on the master were already upgraded when the upgrade tool was run on the slave.
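
The step that copies the coprocessor JAR to the master cluster gives no command. One possible approach, sketched below, is a cross-cluster HDFS copy that keeps the path identical on both sides. The function only prints the command for review. The NameNode addresses and the JAR path in the example are hypothetical; use the path printed by cdap setup coprocessors.

```shell
#!/bin/sh
# Prints (does not run) a command that copies the coprocessor JAR from the
# slave's HDFS to the same path on the master's HDFS.
copy_jar_command() {
  slave_nn=$1   # slave NameNode host:port (placeholder)
  master_nn=$2  # master NameNode host:port (placeholder)
  jar_path=$3   # HDFS path printed by `cdap setup coprocessors`
  printf 'hdfs dfs -cp -f hdfs://%s%s hdfs://%s%s\n' \
    "$slave_nn" "$jar_path" "$master_nn" "$jar_path"
}

# The JAR path below is invented for illustration only.
copy_jar_command slave-nn.example.com:8020 master-nn.example.com:8020 /cdap/lib/coprocessor.jar
```

The -f option overwrites an existing file at the destination. The parent directory must already exist on the master's HDFS, and the user running the command needs read access on the slave and write access on the master.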