HBASE ROWLOG LIBRARY

WHY?

While designing and developing the Lily content repository we identified the need for two components: a Message Queue and a Write Ahead Log. We implemented both on top of a RowLog library which is described and made available below. Although the RowLog library was created to fulfill two Lily use cases, it was designed to be useful outside the context of Lily as well.

First, let's look at the MQ and WAL use cases and the need for a RowLog solution that came out of them:

Message Queue (MQ)

The Lily content repository uses a Solr index to make stored content queryable. An indexer component inspects the data of a record and creates a corresponding document in the Solr index containing the fields that should be queryable. Each time a Lily record changes (is created, updated or deleted), the indexer also needs to update the Solr index so that its content stays in line with the records in the Lily content repository. To keep the indexer informed of which changes should be made, a messaging mechanism sends a message to the indexer each time a record changes.

Such a message can be delivered asynchronously, meaning that the record update call can return to the client before the message is delivered to the indexer component. Delivery must nevertheless be guaranteed, even in case of node failures. Messages must therefore be persisted: if a message were lost, the index would become out of date and a full index rebuild would be required.

Other requirements are that messages must be delivered in order and that no two messages for the same record may be processed concurrently.

It is clear that we need some kind of Message Queue system. Existing systems like ActiveMQ, RabbitMQ, ... were considered. But we decided to build our own MQ system on top of HBase, since we wanted to avoid adding extra technologies and persistence layers (e.g. MySQL for storing messages) to Lily, which would add an extra level of complexity with respect to configuration, administration, deployment, ... Also, we want all components in Lily to have the same properties as we expect from HBase with respect to scalability (partitioning), failover and reliability (replication).

Besides the indexer, other components could make use of a message queue as well, for instance an audit logging system or an email notification system.

Write Ahead Log (WAL)

The Message Queue use case shows a need for being able to deliver asynchronous messages to other components as a result of updating a record. We also have a need for executing actions as a consequence of a record update in a synchronous way. We will call these the secondary actions. Examples of such secondary actions are pushing a message to the MQ (see above) and updating secondary indexes in HBase (cfr http://www.lilyproject.org/lily/about/playground/hbaseindexes.html).

Secondary actions need to be performed before execution of the record update can be returned to the client. In case of failing nodes however, this cannot be guaranteed. What still needs to be guaranteed though is that the secondary actions will eventually be executed, and that they will be executed before any other updates on the record are allowed or performed. 

An extra requirement is that it should be possible to execute the secondary actions related to one record update in order. By this we mean that, for instance, the secondary indexes need to be up to date before a message is put on the message queue to update the Solr index.

In general, we assume that the execution of secondary actions should succeed, thus that the subsystems (such as the message queue) that we call for these actions are highly-available, and if they would not be available, that it is okay for the repository to deny any further updates to the records.

To enforce the execution of secondary actions before a new update on a record can happen, we need to keep track of the still-to-be-executed secondary actions in an index which we will call the Write-Ahead Log (WAL). When a new update comes in, the WAL is checked for any open secondary actions, and these are executed first. In case no new updates for the record arrive soon, we would still like the secondary actions to be executed within a reasonable time of their original update. Therefore a cleanup process is needed that periodically checks the WAL and executes any open secondary actions.
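The gating behaviour described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class WalGateSketch and its methods are hypothetical names invented for this illustration and are not part of the RowLog API, and a plain map stands in for the WAL that the library keeps in HBase.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the WAL gate; not the Lily implementation. */
final class WalGateSketch {
    // Open (still-to-be-executed) secondary actions per record, in logging order.
    private final Map<String, Deque<Runnable>> openActions = new HashMap<>();

    /** Executes the open actions of one record; an action that throws stays open. */
    void drain(String recordId) {
        Deque<Runnable> queue = openActions.get(recordId);
        while (queue != null && !queue.isEmpty()) {
            queue.peekFirst().run();
            queue.pollFirst(); // only removed after it ran without an exception
        }
    }

    /** Handles a new update: old actions first, then the update and its own actions. */
    void update(String recordId, Runnable recordUpdate, List<Runnable> secondaryActions) {
        drain(recordId);
        // In this sketch logging and updating are two steps; the text further on
        // explains how the library combines them in one HBase put.
        openActions.computeIfAbsent(recordId, id -> new ArrayDeque<>()).addAll(secondaryActions);
        recordUpdate.run();
        drain(recordId);
    }

    /** What a periodic cleanup process would do for records that see no new updates. */
    void cleanup() {
        for (String recordId : new ArrayList<>(openActions.keySet())) {
            drain(recordId);
        }
    }

    public static void main(String[] args) {
        WalGateSketch wal = new WalGateSketch();
        wal.update("record-1",
                () -> System.out.println("record updated"),
                List.<Runnable>of(
                        () -> System.out.println("secondary index updated"),
                        () -> System.out.println("message put on MQ")));
    }
}
```

If a secondary action throws, it remains at the head of the record's queue, so the next update (or the cleanup) retries it before anything else happens to that record.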

When secondary actions themselves fail (rather than a repository node failure) we have to make a distinction between :

  • The execution of secondary actions left open from a previous update
    • In this case, if the execution fails, we just throw an exception to the client. We will not yet have started on the update supplied by the client.
  • The execution of the secondary actions related to the current update.
    • In this case, we will already have performed the record update of the client, it is only the execution of the secondary actions that failed.

The WAL described here does not fully match what is generally understood by a write-ahead log. For instance, we don't need a concept of transactions across multiple updates on different rows. Also, the update on the record is performed directly instead of first being written to or described in a log. There are, however, the secondary actions that need to be performed before a new update on the record is allowed. We therefore decided to stick with the name.

WHAT?

The solution that we came up with for the WAL was very similar to the solution for the MQ. We therefore created a component - the Rowlog - that could be used for both use cases.

For both the WAL and MQ we want to feed messages to either secondary actions or message consumers (indexer). These messages describe the update that happened to one record, or more generally to one row in an HBase table. For this reason we called the component the 'Rowlog'. This is a generalised component which provides the message queue-like behaviour, and where the context of the messages is an update to one HBase row. The WAL and MQ in our Lily CR are both based on this Rowlog component.

HOW?

The picture below shows the components of the Rowlog library, which will be described below.

[Figure: RowLog implementation overview]

The RowLog Implementation

Subscriptions and Listeners

A single message that has been put on the rowlog can be processed for multiple purposes, of which the indexer is one example. Each such 'purpose' is represented by a subscription. For each subscription the message will be processed once by a listener. A listener can be considered a 'worker' that accepts a message, does the work needed to process it and returns when it is finished. There can be multiple instances of a listener, but only one of them will process a given message (see rowlog processor below). The code representing the listener can run locally, meaning in the same VM as the Rowlog, or remotely in another VM on the same or another node. For remote listeners, communication is done over a Netty communication channel.

RowLog

The RowLog class is the main entry point of the rowlog component. Each node in the system has an instance of the rowlog class.

Before an update happens on a row, a message describing the update that is about to happen needs to be given to the rowlog (RowLog#putMessage()). The rowlog accepts the message and stores it for later processing and to avoid message loss in case of node failures. It also maintains the order in which messages arrive on the rowlog based on a timestamp. The actual storing of the message happens in two places: a global queue and a row-local queue which are described below.

Processing

The message that has been put on the rowlog can then be processed either synchronously or asynchronously.

Synchronous processing happens when the subscriptions need to handle the message before the actual record-update call returns to the client (cf. the WAL use case). In that case the rowlog is asked to process the message immediately after the update on the row has happened (RowLog#processMessage()).

Asynchronous processing is used when the record-update call can return to the client before the subscriptions have handled the message (cf. the MQ use case). In the background a rowlog processor is running which monitors the messages that are put on the rowlog and forwards them for handling to one listener of each subscription. This rowlog processor also monitors the rowlog for any synchronous messages that have not been processed immediately (due to node failures) and has those handled by the listeners of the subscriptions as well.

The rowlog processor also makes sure that no two listeners of the same subscription receive messages about the same record at the same time. This guarantee was introduced in the latest iteration of the code (Lily 1.0) so that code paths in the listeners no longer need to take and maintain locks to avoid conflicts between such listeners.
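One way to picture this guarantee is a set of "claims" that a dispatcher consults before handing out a message. The following is a minimal sketch, assuming a single dispatcher process; RecordClaimSketch and its methods are hypothetical names for this illustration and do not describe how the Lily rowlog processor is actually written.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: at most one in-flight message per (subscription, row) pair. */
final class RecordClaimSketch {
    // Each element is a (subscription id, row key) pair that a listener is working on.
    private final Set<List<String>> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true if a message for this row may be handed to a listener right now. */
    boolean tryClaim(String subscriptionId, String rowKey) {
        return inFlight.add(List.of(subscriptionId, rowKey));
    }

    /** Called when the listener has returned, whether it succeeded or not. */
    void release(String subscriptionId, String rowKey) {
        inFlight.remove(List.of(subscriptionId, rowKey));
    }

    public static void main(String[] args) {
        RecordClaimSketch claims = new RecordClaimSketch();
        System.out.println(claims.tryClaim("Indexer", "row1")); // first listener gets the row
        System.out.println(claims.tryClaim("Indexer", "row1")); // second one has to wait
        claims.release("Indexer", "row1");
        System.out.println(claims.tryClaim("Indexer", "row1")); // free again after release
    }
}
```

Listeners of different subscriptions can still work on the same row at the same time; only listeners of the same subscription are kept apart.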

The bookkeeping of the messages is done by so-called global and row-local queues:

Global Queue

Each message is stored in a global queue. In fact, for each subscription an entry is put in the global queue for the message. This global queue is implemented by a 'RowLogShard'. The shard is an HBase table in which a message is stored under a key made up of a subscription id, a timestamp to maintain the order in which the messages arrived, a sequence number which indicates the position of the message within the set of messages for that particular row, and the row key indicating which HBase row the message is about.
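To see why such a composite key keeps messages in arrival order per subscription, consider the sketch below. The byte layout used here (UTF-8 subscription id, a zero separator, two big-endian longs, then the row key) is an assumption made for illustration; the actual layout used by RowLogShardImpl is not described in this document. ShardKeySketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical key layout for a global-queue entry; not the real RowLogShard format. */
final class ShardKeySketch {
    /** Builds [subscription id][0x00][timestamp][sequence number][row key]. */
    static byte[] key(String subscriptionId, long timestamp, long seqNr, byte[] rowKey) {
        byte[] sub = subscriptionId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(sub.length + 1 + 2 * Long.BYTES + rowKey.length)
                .put(sub)
                .put((byte) 0)       // separator between the id and the fixed-width fields
                .putLong(timestamp)  // big-endian, so byte order equals numeric order (values >= 0)
                .putLong(seqNr)
                .put(rowKey)
                .array();
    }

    /** Compares keys as unsigned bytes, the way HBase orders row keys. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] earlier = key("Indexer", 1000L, 1L, row);
        byte[] later = key("Indexer", 2000L, 1L, row);
        System.out.println(compare(earlier, later) < 0); // prints "true"
    }
}
```

With a layout like this, a scan that starts at a subscription id returns that subscription's entries oldest first, which is what a processor needs in order to deliver messages in order.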

The rowlog processor can limit itself to monitoring this table in order to know for which records and which subscriptions a message has arrived. It does not need to monitor all regions of the record table to get this information.

When adding new messages in the order of the timestamp in which they arrive, they will always be appended to the end of the HBase table. We also expect that, in normal operation (especially in the context of the WAL use case), the number of messages in the table will always remain small as they will be processed soon after they have been put on it. For both these reasons there is a risk that the same region will always be addressed, and at a high influx. This one region could become a bottleneck.
To avoid this bottleneck, multiple rowlog shards could be used and messages could be spread over those shards. For each shard there should then be a rowlog processor monitoring it. This is however not implemented yet in the currently available code (Lily 1.0). Due to the above-mentioned optimization in the rowlog processor, those shards should be row-aware, so that different rowlog processors do not give messages about the same record to listeners of the same subscription.

Row-Local Queue

The row-local queue is a queue of messages which is maintained on the same HBase row that the messages are about. Within this queue, the order of the messages is maintained based on a sequence number.

For each message, a payload and an execution state are stored in a column family which is separate from the row's actual data. The payload describes the event the message is about and is needed by subscriptions to be able to process the message. The actual producers and consumers of the message will have to agree on the content that is stored in this payload.

Next to the payload an execution state of the message will be stored. The execution state maintains for each subscription if the message has been processed already or not.

There are several reasons why we need such a row-local queue:

  • For the WAL use case, a message that is put on the rowlog should be about an update that actually happened on the row. If a message were put on the rowlog first and the actual update performed afterwards, a node failure could leave a message on the rowlog for an update that never actually happened. Conversely, when an update has happened on a row, the message should be present on the rowlog so that it can be executed before a new update is allowed on the row. If the update on the row were performed before the message is put on the rowlog, a node failure could cause the message never to be put on the rowlog. The row-local queue makes it possible to update the data of the row and store the message information in a single HBase Put operation. (Please see the javadoc for information on how to write both with the same put operation.) So the update of the row's data and the row-local queue either both happen or both don't happen. It is possible that a message is present on the global queue that has no counterpart in the row-local queue. This message will be discarded since it is about an update that never happened.
  • The messages that are stored on the RowLogShards and sent around to the consumers can be kept small. The bulk of the message information can be stored in the payload on the row itself, and the consumers can request this information only when they are really interested in the message and intend to process it.
  • Besides the need to be able to see if there are any preceding messages, it can also be useful to get an overview of all messages that are open for a row and even see if there are any 'future' messages. A consumer could then for instance choose to optimize its work and combine the processing of all open messages into one action.
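The rule from the first point, that a global-queue entry without a row-local counterpart is discarded, can be sketched as a simple filter. This is an in-memory illustration only: OrphanCheckSketch is a hypothetical name, and entries are modelled as (row key, sequence number) pairs rather than as HBase cells.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of discarding global-queue entries for updates that never happened. */
final class OrphanCheckSketch {
    /**
     * Keeps only the global-queue entries (row key, sequence number) that also
     * exist in the row-local queue of their row; the rest are orphans.
     */
    static List<Map.Entry<String, Long>> deliverable(
            List<Map.Entry<String, Long>> globalQueue,
            Map<String, Set<Long>> rowLocalQueues) {
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : globalQueue) {
            Set<Long> rowLocal = rowLocalQueues.getOrDefault(entry.getKey(), Set.of());
            if (rowLocal.contains(entry.getValue())) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Message 2 for row1 reached the global queue, but the node failed before
        // the put on the row itself, so the row-local queue only knows message 1.
        List<Map.Entry<String, Long>> global = List.of(Map.entry("row1", 1L), Map.entry("row1", 2L));
        Map<String, Set<Long>> rowLocal = Map.of("row1", Set.of(1L));
        System.out.println(deliverable(global, rowLocal).size()); // prints "1"
    }
}
```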

The RowLog HBase schema

[Figure: The RowLog HBase schema]

Rowlog Configuration Manager

The configuration of the rowlogs, their subscriptions and listeners is managed by the rowlog configuration manager. This information is then used by the rowlog and rowlog processor to know for which subscriptions messages should be handled and where the listeners for those subscriptions can be found. It stores its information in ZooKeeper. Before a rowlog can be used, it should be registered. The properties that can be configured for a rowlog are:

  • respect order : true if the order of the subscriptions should be respected when processing messages
  • enable notify : true if the notification mechanism should be enabled informing the rowlog processor when new messages are put on the rowlog
  • notify delay : the time that should exist between two notification messages being sent in order to avoid an overload of notification messages when there are a lot of record updates 
  • minimal process delay : the minimal age a message should have before the processor picks it up for processing. For instance in the WAL use case messages will usually be processed in a synchronous way right after they have been put on the rowlog, so the processor should not pick those.
  • wakeup timeout : the time the processor uses between two polls to see if there are any messages to be processed in case it didn't receive a notification for a while
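The two delay properties can be read as simple time comparisons. The sketch below shows one possible interpretation, assuming all values are in milliseconds; ProcessorTimingSketch and its methods are hypothetical names and are not taken from the RowLog code.

```java
/** Hypothetical sketch of how the notify delay and minimal process delay could be applied. */
final class ProcessorTimingSketch {
    private final long notifyDelayMs;
    private final long minimalProcessDelayMs;
    private long lastNotifyAt;
    private boolean notifiedBefore;

    ProcessorTimingSketch(long notifyDelayMs, long minimalProcessDelayMs) {
        this.notifyDelayMs = notifyDelayMs;
        this.minimalProcessDelayMs = minimalProcessDelayMs;
    }

    /** Notify delay: send at most one notification per notifyDelayMs. */
    boolean shouldNotify(long now) {
        if (notifiedBefore && now - lastNotifyAt < notifyDelayMs) {
            return false; // a notification went out recently, skip this one
        }
        notifiedBefore = true;
        lastNotifyAt = now;
        return true;
    }

    /** Minimal process delay: the processor leaves younger messages alone. */
    boolean isOldEnough(long messageTimestamp, long now) {
        return now - messageTimestamp >= minimalProcessDelayMs;
    }

    public static void main(String[] args) {
        ProcessorTimingSketch timing = new ProcessorTimingSketch(200L, 5000L);
        long now = System.currentTimeMillis();
        System.out.println(timing.shouldNotify(now));          // first notification is sent
        System.out.println(timing.shouldNotify(now + 50));     // suppressed, too soon
        System.out.println(timing.isOldEnough(now - 100, now)); // too young for the processor
    }
}
```

With a minimal process delay of a few seconds, a message that is processed synchronously right after its put is normally gone before the processor would consider it; the processor only sees it if that synchronous processing did not happen.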

For a rowlog to be useful, at least one subscription should be registered with it. When a subscription is registered or unregistered, the rowlog configuration manager informs the rowlog processor of the change. Besides an id, a subscription has the following properties:

  • an order number to order it against the other subscriptions 
  • a type : the type of listeners the subscription expects: VM for listeners that live in the same VM, Netty for remote listeners, or WAL, which was introduced as an optimization for the WAL use case and is described in more detail below.

Finally listeners should be registered for each subscription. In case of VM listeners, there should be one listener registered with the subscription. Its id should be an id that can be used to retrieve from the 'RowLogMessageListenerMapping' an instance of the listener class that can process the message. Each time a listener is needed, such an instance will be retrieved from the listener mapping. For remote listeners there can be multiple listeners per subscription. Each remote listener should start an instance of the 'RemoteListenerHandler' which will register the listener on the configuration manager by putting a hostname and port in the listener id. On this hostname and port a Netty connection will be set up on which to receive the messages.

In Lily

As mentioned, the RowLog library is used in Lily to implement both the WAL and MQ functionality. We will describe now how the RowLog library can be used for these two use cases.

Some code snippets are included but it is best to take a look at the Lily source code for all details. The snippets come from different classes and are just intended to illustrate the explanation given.

See also the org.lilyproject.rowlog.impl.test.Example class below for an example of how to set up and use the RowLog library.

MQ

Setup

For the MQ functionality we create an instance of the RowLog and register it with the configuration manager. We provide it with information about the record-table, and the column family within that record-table that can be used to store the row-local queue information. 
On the RowLog a RowLogShard where the messages can be stored is registered. (Currently we only support one shard.)

confMgr.addRowLog("mq", new RowLogConfig(true, true, 200L, 0L, 5000L));
RowLog messageQueue = new RowLogImpl("mq", LilyHBaseSchema.getRecordTable(hbaseTableFactory), RecordCf.ROWLOG.bytes, RecordColumn.MQ_PREFIX, confMgr, null);
messageQueue.registerShard(new RowLogShardImpl("shard1", hbaseConf, messageQueue, 100, hbaseTableFactory));

The MQ currently has only one subscription: the "Indexer", which performs updates on the Solr index. This subscription needs to be registered through the RowLogConfigurationManager.

rowLogConfMgr.addSubscription("mq", subscriptionId, RowLogSubscription.Type.Netty, 1);

The listeners for this subscription run remotely (subscription type 'Netty'), meaning that they won't necessarily be executed in the same VM as the rowlog that receives the messages or the rowlog processor that picks up the messages from the global queue. Instead of registering the listener directly with the RowLogConfigurationManager, a RemoteListenerHandler must be instantiated, which handles the remote communication and automatically registers the listener and its remote location with the RowLogConfigurationManager.

RemoteListenerHandler handler = new RemoteListenerHandler(rowLog, index.getQueueSubscriptionId(), indexUpdater, rowLogConfMgr, hostname);

Execution

For messages to be processed asynchronously a RowLogProcessor needs to be started. Since only one processor is needed across all Lily nodes we use the RowLogProcessorElection utility which uses ZooKeeper to make sure only one node is running the processor and starts a new processor in case of a node-failure.

messageQueueProcessorLeader = new RowLogProcessorElection(zk, new RowLogProcessorImpl(messageQueue, confMgr), lilyInfo);
messageQueueProcessorLeader.start();        

WAL

For the WAL use case some optimizations have been provided. As described earlier, for each message put on the rowlog one entry is created on the global queue (rowlog shard) for each subscription of that rowlog. Since the subscriptions of the WAL rowlog should be treated in order, the processor cannot handle messages for one subscription before they have been handled for the preceding subscription. It is therefore sufficient to have one 'general' message on the global queue instead of one for each subscription, while still keeping the administration of the separate subscriptions in the row-local queue. This 'general' message is only removed from the global queue when the message has actually been handled for all registered subscriptions. This way several put and delete operations on HBase can be avoided, and the rowlog processor only needs to monitor messages for one subscription (WAL). To implement this optimization, several specialized classes have been introduced: WalRowLog, WalProcessor, WalSubscriptionHandler and WalListener.
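The idea of one 'general' global entry combined with per-subscription bookkeeping can be sketched in memory as follows. This is an illustration under assumptions, not the WalRowLog or WalProcessor code: OrderedWalMessageSketch and its methods are hypothetical names, and a Predicate stands in for the listeners.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: one global entry, per-subscription state kept row-locally. */
final class OrderedWalMessageSketch {
    // Subscription id -> handled flag, in subscription order (the row-local bookkeeping).
    private final Map<String, Boolean> executionState = new LinkedHashMap<>();
    // The single 'general' entry on the global queue.
    private boolean onGlobalQueue = true;

    OrderedWalMessageSketch(String... subscriptionIdsInOrder) {
        for (String id : subscriptionIdsInOrder) {
            executionState.put(id, false);
        }
    }

    /** Offers the message to each open subscription in order; stops at the first failure. */
    boolean process(Predicate<String> listener) {
        for (Map.Entry<String, Boolean> entry : executionState.entrySet()) {
            if (entry.getValue()) {
                continue; // already handled in an earlier attempt
            }
            if (!listener.test(entry.getKey())) {
                return false; // later subscriptions must wait for this one
            }
            entry.setValue(true);
        }
        onGlobalQueue = false; // handled for all subscriptions: drop the general entry
        return true;
    }

    boolean isOnGlobalQueue() {
        return onGlobalQueue;
    }

    boolean isDone(String subscriptionId) {
        return Boolean.TRUE.equals(executionState.get(subscriptionId));
    }

    public static void main(String[] args) {
        OrderedWalMessageSketch message = new OrderedWalMessageSketch("LinkIndexUpdater", "MQFeeder");
        boolean finished = message.process(id -> {
            System.out.println("handling message for " + id);
            return true;
        });
        System.out.println("finished=" + finished + ", onGlobalQueue=" + message.isOnGlobalQueue());
    }
}
```

A retry after a failure only calls the subscriptions that are still open, and the global entry disappears in one step once the last one succeeds.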

Setup

For the WAL functionality we create one instance of the RowLog, register it with the configuration manager and provide it with the information about record-table, and the column families that can be used for the row-local index on that table.
On the RowLog a RowLogShard where the messages can be stored is registered.

confMgr.addRowLog("wal", new RowLogConfig(true, false, 200L, 5000L, 5000L));
RowLog writeAheadLog = new WalRowLog("wal", LilyHBaseSchema.getRecordTable(hbaseTableFactory), RecordCf.ROWLOG.bytes,
                RecordColumn.WAL_PREFIX, confMgr, rowLocker); 
RowLogShard walShard = new RowLogShardImpl("shard1", hbaseConf, writeAheadLog, 100, hbaseTableFactory);
writeAheadLog.registerShard(walShard);

The WAL currently has two subscriptions: a "LinkIndexUpdater", which updates an HBase link index, and a "MessageQueueFeeder", which puts events onto the MQ for an indexer to pick up and perform an (asynchronous) update on the Solr index. Each subscription is registered through the RowLogConfigurationManager.

confMgr.addSubscription("wal", "LinkIndexUpdater", RowLogSubscription.Type.VM, 10);
confMgr.addSubscription("wal", "MQFeeder", RowLogSubscription.Type.VM, 20);

The listeners for these subscriptions must implement the RowLogMessageListener. To indicate that these listeners run in the same VM as the rowlog, the subscription gets the subscription type "VM" and the listeners must be explicitly registered through the RowLogConfigurationManager. (In the remote case, see the MQ section above, the framework registers these automatically.) Note that internally, the WalProcessor and WalSubscriptionHandler use the subscription type "WAL" to indicate that the message on the global queue plays the role of the 'generalized' message.

confMgr.addListener("wal", "LinkIndexUpdater", "LinkIndexUpdaterListener");
confMgr.addListener("wal", "MQFeeder", "MQFeederListener");

An instance of the listener must also be registered in the RowLogMessageListenerMapping so that the rowlog can retrieve this instance based on its id. (Note: make sure this happens either before the rowlog is used or before the listener is registered.)

RowLogMessageListenerMapping.INSTANCE.put(WalListener.ID, new WalListener(writeAheadLog, rowLocker));
RowLogMessageListenerMapping.INSTANCE.put("LinkIndexUpdater", linkIndexUpdater);
RowLogMessageListenerMapping.INSTANCE.put("MQFeeder", new MessageQueueFeeder(messageQueue));

Execution

When an update happens on a Lily record, a calculation is done of the changes that need to be performed on the record's data itself, and of the information that should be put in the payload. A request is then sent to put the message on the RowLog. The RowLog puts a message on the RowLogShard and adds the message information for the row-local queue to the given put object. It is at this point that both the record's data and the row-local message info can be put on the HBase table using one atomic put operation. In our code this is done through a rowlocker component which also manages a custom (non-HBase) rowlock for that row.

walMessage = wal.putMessage(recordId.toBytes(), null, recordEvent.toJsonBytes(), put);
rowLocker.put(put, rowLock);

At this moment, from the client's perspective, the update of the record was performed successfully.

Immediately after the update, the RowLog is requested to process the message, which will result in executing the listeners for the LinkIndexUpdater and the MQFeeder.

wal.processMessage(walMessage);

Due to a node-failure, it could be that the rowlog was not (yet) requested to process the message. If a new update comes in, the rowlog will first check for open messages and try to process them first.

Example code

Below is some example code showing how to instantiate and use the RowLog.

/*
 * Copyright 2010 Outerthought bvba
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.lilyproject.rowlog.impl.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.lilyproject.rowlog.api.*;
import org.lilyproject.rowlog.api.RowLogSubscription.Type;
import org.lilyproject.rowlog.impl.RowLogConfigurationManagerImpl;
import org.lilyproject.rowlog.impl.RowLogImpl;
import org.lilyproject.rowlog.impl.RowLogProcessorImpl;
import org.lilyproject.rowlog.impl.RowLogShardImpl;
import org.lilyproject.util.zookeeper.ZkUtil;
import org.lilyproject.util.zookeeper.ZooKeeperItf;

public class Example {
    private final static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();

    public static void main(String[] args) throws Exception {
        TEST_UTIL.startMiniCluster(1);
        Configuration configuration = TEST_UTIL.getConfiguration();
        // Create the row table
        final String ROW_TABLE = "rowTable";
        final byte[] DATA_COLUMN_FAMILY = Bytes.toBytes("DataCF");
        final byte[] ROWLOG_COLUMN_FAMILY = Bytes.toBytes("RowLogCF");
        
        HBaseAdmin admin = new HBaseAdmin(configuration);
        HTableDescriptor tableDescriptor = new HTableDescriptor(ROW_TABLE);
        tableDescriptor.addFamily(new HColumnDescriptor(DATA_COLUMN_FAMILY));
        tableDescriptor.addFamily(new HColumnDescriptor(ROWLOG_COLUMN_FAMILY));
        admin.createTable(tableDescriptor);
        HTable rowTable = new HTable(configuration, ROW_TABLE);

        // Setup a zooKeeper connection
        String zkConnectionString = configuration.get("hbase.zookeeper.quorum") + ":" + configuration.get("hbase.zookeeper.property.clientPort");
        ZooKeeperItf zooKeeper = ZkUtil.connect(zkConnectionString, 10000);

        // Create the row log configuration manager
        RowLogConfigurationManagerImpl configurationManager = new RowLogConfigurationManagerImpl(zooKeeper);

        // Create a RowLog instance
        configurationManager.addRowLog("Example", new RowLogConfig(false, true, 100L, 0L, 5000L));
        RowLog rowLog = new RowLogImpl("Example", rowTable, ROWLOG_COLUMN_FAMILY, (byte)1, configurationManager, null);
        
        // Create a shard and register it with the rowlog
        RowLogShard shard = new RowLogShardImpl("AShard", configuration, rowLog, 100);
        rowLog.registerShard(shard);
        
        // Register a listener class on the RowLogMessageListenerMapping
        RowLogMessageListenerMapping.INSTANCE.put("FooBar", new FooBarListener());
        
        // Add a subscription and listener to the configuration manager for the example Rowlog
        configurationManager.addSubscription("Example", "FooBar", Type.VM, 0);
        configurationManager.addListener("Example", "FooBar", "listener1");
        
        // The WAL use case 
        
        // Update a row with some user data
        // and put a message on the RowLog using the same put action
        byte[] row1 = Bytes.toBytes("row1");
        Put put = new Put(row1);
        put.add(DATA_COLUMN_FAMILY, Bytes.toBytes("AUserField"), Bytes.toBytes("SomeUserData"));
        RowLogMessage message = rowLog.putMessage(row1, Bytes.toBytes("SomeInfo"), Bytes.toBytes("Updated:AUserField"), put);
        rowTable.put(put);
        // Explicitly request the RowLog to process the message
        rowLog.processMessage(message, null);
        
        // The MQ use case
        
        // Create a processor and start it
        RowLogProcessor processor = new RowLogProcessorImpl(rowLog, configurationManager);
        processor.start();
        
        message  = rowLog.putMessage(row1, Bytes.toBytes("SomeMoreInfo"), Bytes.toBytes("Re-evaluate:AUserField"), null);
        
        // Give the processor some time to process the message
        Thread.sleep(10000);
        processor.stop();
        configurationManager.shutdown();
        zooKeeper.close();
        TEST_UTIL.shutdownMiniCluster();
    }
    
    private static class FooBarListener implements RowLogMessageListener {
        public boolean processMessage(RowLogMessage message) {
            System.out.println("= Received a message =");
            System.out.println(Bytes.toString(message.getRowKey()));
            System.out.println(Bytes.toString(message.getData()));
            try {
                System.out.println(Bytes.toString(message.getPayload()));
            } catch (RowLogException e) {
                // ignore
            }
            return true;
        }
    }
    
}

STATE

May 17, 2010

The code has gone through a third iteration with a focus on performance improvements in the light of the Lily 1.0 release and has been developed against the CDH3 hadoop & hbase codebase. Further optimizations would include the introduction of multiple shard support. This is currently considered to be introduced for the Lily 1.1 release.

Nov 23, 2010

The code has gone through a second iteration and should be usable for most use cases now.

Some functionality is not implemented yet since it was not yet needed for our use cases :

  • Support for multiple RowLogs managing multiple RowLogShards, with failover of shards to other RowLogs when a RowLog dies
  • Load balancing over multiple shards

Javadoc

The latest javadoc can be consulted here 

DOWNLOAD

License? Apache! What else?
Also, we welcome contributions!

Nov 23, 2010: Subversion access

The Lily source tree has moved from outerthought_lilycms to outerthought_lilyproject, and the rowlog project can now be found in the subdirectory global/rowlog of the Lily source tree.

To get & build the code, now use :

svn checkout http://dev.outerthought.org/svn_public/outerthought_lilyproject trunk
mvn -Pfast install

July 22, 2010: Subversion access

The latest RowLog code  is now available from Lily's source tree. The downloads will no longer be maintained.

The project can be found in the rowlog subdirectory of the Lily source tree, its dependencies can be found in the pom.xml.

To get & build the code, use:

svn checkout http://dev.outerthought.org/svn_public/outerthought_lilycms trunk
mvn -Pfast install

June 18, 2010 Snapshot

Note that this code is deprecated ! Please use the subversion access described above.

Download (application/x-gzip, 1.1 MB, info)

USAGE

The code was developed against HBase trunk (= 0.21).

To get started, you will need the following on the classpath :

  • Our own stuff:
    • lily-util-0.1-dev.jar
    • lily-rowlog-api-0.1-dev.jar
    • lily-rowlog-impl-0.1-dev.jar
  • Dependencies:
    • jackson-mapper-asl-1.5.0.jar
    • jackson-core-asl-1.5.0.jar
    • hbase-0.21.0-dev.jar
    • hadoop-core-0.20.2-with-200-826.jar
    • zookeeper-3.3.1.jar
    • commons-logging-1.1.1.jar
    • log4j-1.2.15.jar

The dependencies are included, except for hbase and hadoop-core, for which it is recommended to use the jars from the actual HBase/Hadoop version that you are using.

FEEDBACK

We are interested in hearing what you think of this library. You can write to the Lily mailing list.