A Node is Born

SymmetricDS is a Java-based application that acts as an agent for data synchronization between a single database instance and other SymmetricDS instances in a network.  Each of the other SymmetricDS instances acts as an agent for its own database instance.


Each SymmetricDS instance is a member of a SymmetricDS network and is called a “node.”  SymmetricDS was designed to scale out to many thousands of nodes.  For a SymmetricDS instance to work with a database instance it must be configured with a database connection string, database user, and database password in a properties file.  SymmetricDS can synchronize any table that is accessible by the connection represented by the database connection string, given that the database user has been assigned the appropriate database permissions.
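To make the three required settings concrete, here is a minimal sketch that parses a properties-style file and reports which of them are missing.  The key names (db.url, db.user, db.password) and the sample values are assumptions made for illustration; check the User’s Guide for the exact names your SymmetricDS version expects.

```python
# Hypothetical properties content. Key names and values are illustrative
# assumptions, not taken from this article.
EXAMPLE_PROPERTIES = """\
# database connection settings
db.url=jdbc:postgresql://localhost/store001
db.user=symmetric
db.password=secret
"""

REQUIRED_KEYS = ("db.url", "db.user", "db.password")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

Calling missing_keys(parse_properties(EXAMPLE_PROPERTIES)) returns an empty list when all three settings are present.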


A SymmetricDS node is also assigned an external id and a node group id.  The external id is a meaningful, user-assigned identifier that is used by SymmetricDS to understand which data is destined for a given node.  The node group id is used to identify groupings or tiers of nodes.  It defines where the node fits into the overall node network.  For example, one node group might be called “corporate” and represent an enterprise or corporate database.  Another node group might be “local_office” and represent databases located in different offices across a country.  The external id for a “local_office” might be an office number or some other identifying factor.  A node is uniquely identified in a network by a node id that is automatically generated from the external id.  If local office #1 had two office databases and two SymmetricDS nodes, they would probably have an external id of “1” and node ids of “1-1” and “1-2.”
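The relationship between external ids and node ids can be sketched as follows.  The function below is hypothetical and only mirrors the “1-1” and “1-2” example above; the scheme SymmetricDS actually uses to generate node ids may differ.

```python
def assign_node_id(external_id, existing_node_ids):
    """Return the first unused '<external_id>-<n>' id, counting n from 1.

    Hypothetical scheme that follows the article's office #1 example.
    """
    n = 1
    while f"{external_id}-{n}" in existing_node_ids:
        n += 1
    return f"{external_id}-{n}"


# Two nodes registered for local office #1 share the external id "1".
node_ids = set()
for _ in range(2):
    node_ids.add(assign_node_id("1", node_ids))
```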


SymmetricDS can be deployed in a number of ways.  The most common and straightforward way is to install it as a standalone process running as a service on your chosen server platform.  When deployed in this manner SymmetricDS can act as either a client, a multi-tenant server, or both, depending on where the SymmetricDS database fits into the overall network of databases.  Although it can run on the same server as its database, it is not required to do so.  SymmetricDS can also be deployed as a web application within the context of an external application server such as Apache Tomcat, JBoss Application Server, IBM WebSphere, or others.


SymmetricDS was designed to be a simple, approachable, non-threatening tool for IT personnel.  It can be thought of as a web application, only instead of a browser as the client, other instances of SymmetricDS are the clients.  It has all the characteristics of a web application and can be tuned using the same principles that would be used to tune a properly written web application.


Change Happens

Changes are captured at a SymmetricDS-enabled database by database triggers that are installed automatically by SymmetricDS based on configuration settings that you specify.  The triggers record data changes in a SymmetricDS-specific table called SYM_DATA.  The database triggers are designed to be as non-invasive and as lightweight as possible.  After SymmetricDS triggers are installed, changes are captured for any Data Manipulation Language (DML) statements performed by external applications.  Note that the applications that interact directly with the database need no additional libraries or code changes for their data to be synchronized by SymmetricDS.
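The idea of trigger-based capture can be tried out with an in-memory SQLite database.  This is only a sketch: the capture_log table is a simplified stand-in for SYM_DATA (its columns are assumptions, not the real schema), and the triggers are written by hand rather than generated the way SymmetricDS generates them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);

-- Simplified stand-in for SYM_DATA; not the real schema.
CREATE TABLE capture_log (
    data_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT,
    event_type TEXT,   -- 'I', 'U' or 'D'
    pk_data    TEXT,
    row_data   TEXT
);

CREATE TRIGGER item_ins AFTER INSERT ON item BEGIN
    INSERT INTO capture_log (table_name, event_type, pk_data, row_data)
    VALUES ('item', 'I', NEW.id, NEW.id || ',' || NEW.name);
END;

CREATE TRIGGER item_upd AFTER UPDATE ON item BEGIN
    INSERT INTO capture_log (table_name, event_type, pk_data, row_data)
    VALUES ('item', 'U', OLD.id, NEW.id || ',' || NEW.name);
END;

CREATE TRIGGER item_del AFTER DELETE ON item BEGIN
    INSERT INTO capture_log (table_name, event_type, pk_data, row_data)
    VALUES ('item', 'D', OLD.id, NULL);
END;
""")

# The "application" issues ordinary DML and knows nothing about capture_log.
conn.execute("INSERT INTO item (id, name) VALUES (1, 'widget')")
conn.execute("UPDATE item SET name = 'gadget' WHERE id = 1")
conn.execute("DELETE FROM item WHERE id = 1")

# One captured row per DML statement, in the order the changes happened.
captured = conn.execute(
    "SELECT event_type, pk_data, row_data FROM capture_log ORDER BY data_id"
).fetchall()
```

The application code in the last few statements never mentions the capture table, which is the point: the triggers do the recording.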


Database tables that need to be replicated are configured in a series of SymmetricDS configuration tables.  The configuration for the entire network of nodes is managed at one particular node in the network, known as the registration server node.  The registration server node is almost always the same node as the root node in a tree topology.  When configuring “leaf” nodes, one of the start-up parameters is the URL of the registration server node.  If the “leaf” node has not yet registered, it contacts the registration server and requests to join the network.  Upon acceptance, the node downloads its configuration.  After a node is registered, SymmetricDS can also provide an initial load of data before synchronization starts.


SymmetricDS will install or update its database triggers at start-up time and on a regular basis when a scheduled job runs (by default, each night at midnight).  This update allows changes to your database structure to become known to SymmetricDS automatically.  Optionally, SymmetricDS can be configured to generate a DDL script that can be run by a DBA.


After data is inserted by a trigger into the SYM_DATA table, it is routed and batched by a background job.   To route data means to choose other nodes in the SymmetricDS network to which the data should be sent.  Data is routed to other nodes based on the node group and, optionally, the external id of remote nodes.  To batch data means to group the data change with other changes that should be loaded together at the target node in a single database transaction.  Batches are recorded in a table called SYM_OUTGOING_BATCH.  They are node specific.  SYM_DATA and SYM_OUTGOING_BATCH are linked by an association table called SYM_DATA_EVENT.  The delivery status of a batch is maintained in SYM_OUTGOING_BATCH.  After the data has been delivered to a remote node the batch status is changed to ‘OK.’
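Routing and batching can be sketched in a few lines.  Everything here is hypothetical and simplified: the Node and Batch classes, the "NEW" status value, and the rule that a change may optionally name a single external id are assumptions made for illustration, not the SymmetricDS implementation.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    node_id: str
    group_id: str
    external_id: str


@dataclass
class Batch:
    batch_id: int
    node_id: str
    status: str = "NEW"  # hypothetical marker; becomes "OK" once delivered
    data_ids: list = field(default_factory=list)


def route(change, nodes, target_group):
    """Pick target nodes by node group and, if the change names one, external id."""
    wanted = change.get("external_id")
    return [
        node for node in nodes
        if node.group_id == target_group
        and (wanted is None or node.external_id == wanted)
    ]


def route_and_batch(changes, nodes, target_group):
    """Group routed changes into one node-specific batch per target node."""
    batch_ids = count(1)
    batches = {}
    for change in changes:
        for node in route(change, nodes, target_group):
            batch = batches.get(node.node_id)
            if batch is None:
                batch = batches[node.node_id] = Batch(next(batch_ids), node.node_id)
            batch.data_ids.append(change["data_id"])
    return batches
```

A change without an external id goes to every node in the target group, while a change that names external id “2” lands only in the batch for that office's node.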


The Delivery

Data is delivered to remote nodes over HTTP or HTTPS.  It can be delivered in one of two ways depending on the type of transport link that is configured between node groups.  A node group can be configured to push changes to other nodes in a group or pull changes from other nodes in a group.  Pushing happens from a configurable scheduled job running inside the SymmetricDS process at the source node.  If there are batches that are waiting to be transported, the pushing node will reserve a connection to each target node using an HTTP HEAD request.  If the reservation request is accepted, then the source node will fully extract the data for the batch.  Data is extracted to a memory buffer in CSV format until a configurable threshold is reached.  If the threshold is reached, the data is flushed to a file and the extraction of data continues to that file.  After the batch has been extracted, it is transported using an HTTP PUT to the target node.  The next batch is then extracted and sent.  This is repeated until the maximum number of batches has been sent for each channel or there are no more batches available to send.  After all the batches have been sent for one push, the target returns a list of the batch statuses.
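The extract-to-memory-then-spill-to-file behavior described above can be sketched like this.  The SpillBuffer class and its threshold_bytes parameter are hypothetical names chosen for the example; the real extractor and its configuration setting are not shown in this article.

```python
import csv
import io
import tempfile


def csv_line(values):
    """Format one row as a single CSV line."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(values)
    return out.getvalue()


class SpillBuffer:
    """Hold CSV in memory until threshold_bytes is exceeded, then use a file."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.size = 0
        self.on_disk = False
        self._target = io.StringIO()

    def write_row(self, values):
        line = csv_line(values)
        self.size += len(line.encode("utf-8"))
        if not self.on_disk and self.size > self.threshold_bytes:
            # Flush what was buffered so far to a temporary file and
            # keep extracting into that file from now on.
            spill = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="")
            spill.write(self._target.getvalue())
            self._target = spill
            self.on_disk = True
        self._target.write(line)

    def read_all(self):
        """Return everything extracted so far, wherever it is stored."""
        self._target.seek(0)
        return self._target.read()
```

Small batches never touch the disk, while a large batch costs only a bounded amount of memory.  The same pattern applies on the receiving side while a batch is being downloaded.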


Pull requests happen from a configurable scheduled job running inside the SymmetricDS process at the target node.  Whether data is being extracted for a push to a target or is being extracted because of a pull request, the same extraction process described above occurs.  After data has been extracted and transported, the data is loaded at the target node.  Similar to the extract process, while data is being received the data loader will cache the CSV in a memory buffer until a threshold is reached.   If the threshold is reached the data is flushed to a file and the receiving of data continues.  After all data is available locally, a database connection is retrieved from the connection pool and the events that had occurred at the source database are played back against the target database.


Channeling Data

Data is always delivered to a remote node in the order it was recorded for a specific channel.  A channel is a user-defined grouping of tables that are dependent on each other.  Data that is captured for tables belonging to a channel is always synchronized together.  Each trigger must be assigned a channel id as part of the trigger definition process.  The channel id is recorded on SYM_DATA and SYM_OUTGOING_BATCH.  If a batch fails to load, then no more data is sent for that channel until the failure has been addressed.  Data on other channels will continue to be synchronized, however.
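As a rough illustration of how channels isolate failures, the sketch below selects which pending batches to send next.  The function name, the (batch id, channel id) tuples, and the max_per_channel parameter are assumptions made for the example.

```python
def batches_to_send(pending, failed_channels, max_per_channel):
    """Select batches in batch-id order, skipping channels with a failed batch.

    pending: iterable of (batch_id, channel_id) tuples.
    failed_channels: channel ids whose last batch failed to load.
    max_per_channel: cap on batches sent per channel in one pass.
    """
    sent_per_channel = {}
    selected = []
    for batch_id, channel_id in sorted(pending):
        if channel_id in failed_channels:
            continue  # hold this channel until its failure is addressed
        sent = sent_per_channel.get(channel_id, 0)
        if sent >= max_per_channel:
            continue  # this channel has reached its limit for the pass
        sent_per_channel[channel_id] = sent + 1
        selected.append((batch_id, channel_id))
    return selected
```

With a failed batch on an “inventory” channel, batches on a “sales” channel are still selected in order, which matches the behavior described above.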


If a remote node is offline, the data remains recorded at the source database until the node comes back online.  Optionally, a timeout can be set after which an offline node is removed from the network.  Change data is purged from the data capture tables by SymmetricDS after it has been sent and a configurable purge retention period has been reached.  Unsent change data for a disabled node is also purged.


The default behavior of SymmetricDS in the case of data integrity errors is to attempt to repair the data.  If an insert statement is run and there is already a row that exists, SymmetricDS will fall back and try to update the existing row.  Likewise, if an update that was successful on a source node is run and no rows are found to update on the destination, then SymmetricDS will fall back to an insert on the destination.  If a delete is run and no rows were deleted, the condition is simply logged.
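The fallback rules can be sketched against a SQLite table.  The apply_change function, the item table, and the single-letter event codes are hypothetical; they illustrate the repair logic rather than reproduce the SymmetricDS data loader.

```python
import sqlite3


def apply_change(conn, event, row_id, name, log):
    """Apply one change to the item table, repairing integrity mismatches."""
    if event == "I":
        try:
            conn.execute("INSERT INTO item (id, name) VALUES (?, ?)", (row_id, name))
        except sqlite3.IntegrityError:
            # The row already exists at the target: fall back to an update.
            conn.execute("UPDATE item SET name = ? WHERE id = ?", (name, row_id))
            log.append(f"insert of {row_id} fell back to update")
    elif event == "U":
        cur = conn.execute("UPDATE item SET name = ? WHERE id = ?", (name, row_id))
        if cur.rowcount == 0:
            # Nothing to update at the target: fall back to an insert.
            conn.execute("INSERT INTO item (id, name) VALUES (?, ?)", (row_id, name))
            log.append(f"update of {row_id} fell back to insert")
    elif event == "D":
        cur = conn.execute("DELETE FROM item WHERE id = ?", (row_id,))
        if cur.rowcount == 0:
            # A delete that matches no rows is only logged.
            log.append(f"delete of {row_id} matched no rows")
    else:
        raise ValueError(f"unknown event type: {event!r}")
```

In each case the target ends up with the same row state as the source, and the log records where a repair was needed.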


Data can be manipulated prior to being loaded in a node by using an extension point called a data loader filter.  The data loader filter is a good place to implement conflict resolution logic, siphon data off to other data sinks, or transform and enhance data.  There are several out-of-the-box filters that can be configured via XML that do things like publish an XML representation of data to a messaging provider or make simple data transformations.
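Conceptually, a filter sits between the incoming data and the database write.  The sketch below is a Python analogy with hypothetical names; the actual extension point is a Java interface whose signature is not shown in this article.

```python
def load_rows(rows, filters, write):
    """Pass each row through every filter before writing it.

    A filter returns a (possibly modified) row to keep it,
    or None to drop the row before it reaches the database.
    """
    for row in rows:
        for row_filter in filters:
            row = row_filter(row)
            if row is None:
                break  # a filter dropped this row
        else:
            write(row)  # every filter kept the row


def upper_name(row):
    """Example transform: normalize the name column to upper case."""
    return {**row, "name": row["name"].upper()}


def skip_test_rows(row):
    """Example filter: drop rows whose name marks them as test data."""
    return None if row["name"].startswith("TEST") else row
```

Filters run in order, so a transform placed before a dropping filter affects what that later filter sees.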


Wrapping Up

SymmetricDS was designed to use standard web technologies so it can be scaled to many clients across different types of databases.  It can synchronize data to and from as many client nodes as the deployed database and web infrastructure will support.  When a two-tiered database and web infrastructure is maxed out, a SymmetricDS network can be designed to use N-tiers to allow for even greater scalability.


At this point we have covered what SymmetricDS is and how it does its job of replicating data to many databases using standard, well understood technologies.


Hopefully, this article gave you an idea of how the core of SymmetricDS works.  For more information please review the SymmetricDS User’s Guide or inquire about SymmetricDS at the community forums.  Also feel free to contact JumpMind, the company behind SymmetricDS, directly.



Author: Chris Henson

Chris, the original founder of JumpMind, has been a software developer since the mid-1990s and has developed and architected systems for the defense, aviation, and retail industries. He is a productive consumer, active participant, and dedicated producer of open source solutions. Chris has also led SymmetricDS and POS implementations at both the national and international level.