Chapter 12.  Berkeley DB Replication

Table of Contents

Replication introduction
Replication environment IDs
Replication environment priorities
Building replicated applications
  Replication Manager methods
  Base API methods
  Building the communications infrastructure
  Connecting to a new site
Managing Replication Manager group membership
  Adding sites to a replication group
  Removing sites from a replication group
  Primordial startups
  Upgrading groups
Replication views
Managing replication directories and files
  Replication database directory considerations
  Managing replication internal files
Running Replication Manager in multiple processes
  One replication process and multiple subordinate processes
  Persistence of local site network address configuration
  Programming considerations
  Handling failure
  Other miscellaneous rules
Running replication using the db_replicate utility
  One replication process and multiple subordinate processes
  Common use case
  Avoiding rollback
  When to consider an integrated replication application
Choosing a Replication Manager acknowledgement policy
Elections
Synchronizing with a master
  Delaying client synchronization
  Client-to-client synchronization
  Blocked client operations
  Clients too far out-of-date to synchronize
Initializing a new site
Bulk transfer
Transactional guarantees
Master leases
  Changing group size
Read your writes consistency
  Getting a token
  Token handling
  Using a token to check or wait for a transaction
Clock skew
Communicating between Replication Manager Sites
  Configuring for Write Forwarding
  Using Replication Manager message channels
Special considerations for two-site replication groups
  Two-site strict configuration
  Preferred master mode
  Other alternatives
Network partitions
Replication FAQ
Ex_rep: a replication example
Ex_rep_base: a TCP/IP based communication infrastructure
Ex_rep_base: putting it all together
Ex_rep_chan: a Replication Manager channel example

Replication introduction

Berkeley DB includes support for building highly available applications based on replication. Berkeley DB replication groups consist of some number of independently configured database environments. There is a single master database environment and one or more client database environments. Master environments support both database reads and writes; client environments support only database reads. If the master environment fails, applications may upgrade a client to be the new master. The database environments might be on separate computers, on separate hardware partitions in a non-uniform memory access (NUMA) system, or on separate disks in a single server. As always with Berkeley DB environments, any number of concurrent processes or threads may access a database environment. In the case of a master environment, any number of threads of control may read and write the environment, and in the case of a client environment, any number of threads of control may read the environment.
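To make "upgrading a client to be the new master" more concrete, here is a simplified, hypothetical C model of how a group might choose which client to promote. It assumes the rule Berkeley DB elections follow (the client with the most recent log records wins, priority breaks ties, and a priority of 0 marks a site that may never become master), but it deliberately ignores vote counting, quorum, and the message exchange a real election involves. The names struct candidate, log_cmp() and pick_new_master() are illustrative only and are not part of the Berkeley DB API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one client in an election.
 * This is NOT a Berkeley DB type. */
struct candidate {
    int eid;                  /* application-assigned environment ID */
    unsigned long log_file;   /* newest log record, as (file, offset) */
    unsigned long log_offset;
    int priority;             /* 0 = may never become master */
};

/* Compare log positions: <0 if a is older than b, 0 if equal, >0 if newer. */
static int
log_cmp(const struct candidate *a, const struct candidate *b)
{
    if (a->log_file != b->log_file)
        return a->log_file < b->log_file ? -1 : 1;
    if (a->log_offset != b->log_offset)
        return a->log_offset < b->log_offset ? -1 : 1;
    return 0;
}

/*
 * Return the index of the client to promote, or -1 if no client is
 * eligible. The most up-to-date log wins; priority breaks ties.
 */
static int
pick_new_master(const struct candidate *c, size_t n)
{
    int best = -1;
    size_t i;

    assert(c != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int cmp;

        if (c[i].priority == 0)
            continue;             /* zero-priority sites are never promoted */
        if (best < 0) {
            best = (int)i;
            continue;
        }
        cmp = log_cmp(&c[i], &c[best]);
        if (cmp > 0 || (cmp == 0 && c[i].priority > c[best].priority))
            best = (int)i;
    }
    return best;
}
```

For example, if one client holds the newest log records but has priority 0, it is skipped; among the remaining clients, two tied on log position are separated by priority.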

Applications may be written to provide various degrees of consistency between the master and clients. The system can be run synchronously such that replicas are guaranteed to be up-to-date with all committed transactions, but doing so may incur a significant performance penalty. Higher performance solutions sacrifice total consistency, allowing the clients to be out of date for an application-controlled amount of time.
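One way an application might enforce "an application-controlled amount of time" is to bound how stale a client may be before its reads are redirected to the master. The C sketch below assumes the application records, for each client, when that client last applied updates from the master; struct client_state, route_read() and max_lag_secs are hypothetical names, and a wall-clock check like this is only as trustworthy as the clocks involved. Berkeley DB's own consistency mechanisms, such as read-your-writes tokens, work differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client bookkeeping kept by the application. */
struct client_state {
    time_t last_applied;  /* when this client last applied master updates */
};

enum read_target { READ_FROM_CLIENT, READ_FROM_MASTER };

/* True if the client is within the application's staleness bound. */
static bool
client_fresh_enough(const struct client_state *cs, time_t now,
    double max_lag_secs)
{
    assert(max_lag_secs >= 0.0);
    return difftime(now, cs->last_applied) <= max_lag_secs;
}

/*
 * Serve the read locally when the client is fresh enough; otherwise
 * fall back to the master, which has every committed update.
 */
static enum read_target
route_read(const struct client_state *cs, time_t now, double max_lag_secs)
{
    return client_fresh_enough(cs, now, max_lag_secs) ?
        READ_FROM_CLIENT : READ_FROM_MASTER;
}
```

Raising max_lag_secs moves more reads onto clients at the cost of staler answers, which is the trade-off between consistency and performance described above.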

There are two ways to build replicated applications. The simpler way is to use the Berkeley DB Replication Manager. The Replication Manager provides a standard communications infrastructure, and it creates and manages the background threads needed for processing replication messages.

The Replication Manager implementation is based on TCP/IP sockets and uses POSIX 1003.1 style networking and thread support. (On Windows systems, it uses standard Windows thread support.) As a result, it is not as portable as the rest of the Berkeley DB library.

For applications with simple data and transaction models, Replication Manager provides automatic write forwarding as a configurable option. Use of this option enables some write operations to be performed on a client environment. For more information, see Communicating between Replication Manager Sites.

If for some reason using Replication Manager or write forwarding does not meet your application's technical requirements, you will have to use the lower-level replication "Base APIs". This approach affords more flexibility, but requires the application to provide some critical components:

  1. A communication infrastructure. Applications may use whatever wire protocol is appropriate (for example, RPC, TCP/IP, UDP, VI or message-passing over the backplane).
  2. The application is responsible for naming. Berkeley DB refers to the members of a replication group using an application-provided ID, and applications must map that ID to a particular database environment or communication channel.
  3. The application is responsible for monitoring the status of the master and clients, and identifying any unavailable database environments.
  4. The application must provide whatever security policies are needed. For example, the application may choose to encrypt data, use a secure socket layer, or do nothing at all. The level of security is left to the sole discretion of the application.

(Note that Replication Manager does not provide wire security for replication messages.)
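Items 2 and 3 in the numbered list above are largely bookkeeping. The sketch below shows one hypothetical way a Base API application might keep it: a table that maps the IDs it hands to Berkeley DB onto host/port pairs, plus a "last heard from" timestamp used to treat silent sites as unavailable. In a real application, the send function registered with DB_ENV->rep_set_transport receives a destination environment ID (possibly DB_EID_BROADCAST) and could use a lookup like resolve_targets() to decide where a message goes. Everything in the code (struct site_table, site_add(), the BROADCAST_EID value, the timeout policy) is illustrative and is not Berkeley DB API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define MAX_SITES 8
#define BROADCAST_EID (-1)  /* hypothetical stand-in for "all sites" */

/* One remote member of the replication group (hypothetical layout). */
struct site {
    int eid;                /* ID the application hands to Berkeley DB */
    char host[64];
    unsigned short port;
    time_t last_heard;      /* last time a message arrived from this site */
};

struct site_table {
    struct site sites[MAX_SITES];
    size_t count;
    int last_eid;           /* IDs are assigned as 1, 2, 3, ... */
};

/* Register a remote site; returns its new ID, or -1 if the table is full. */
static int
site_add(struct site_table *t, const char *host, unsigned short port,
    time_t now)
{
    struct site *s;

    assert(t != NULL && host != NULL);
    if (t->count == MAX_SITES)
        return -1;
    s = &t->sites[t->count++];
    s->eid = ++t->last_eid;
    strncpy(s->host, host, sizeof(s->host) - 1);
    s->host[sizeof(s->host) - 1] = '\0';
    s->port = port;
    s->last_heard = now;
    return s->eid;
}

/* Naming: map an ID back to its site, or NULL if the ID is unknown. */
static struct site *
site_lookup(struct site_table *t, int eid)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (t->sites[i].eid == eid)
            return &t->sites[i];
    return NULL;
}

/* Monitoring: call whenever any message arrives from a site. */
static void
site_touch(struct site *s, time_t now)
{
    s->last_heard = now;
}

/* A site that has been silent longer than the timeout is treated as down. */
static int
site_is_up(const struct site *s, time_t now, double timeout_secs)
{
    return difftime(now, s->last_heard) <= timeout_secs;
}

/*
 * Resolve a destination ID into the live sites that should receive a
 * message; BROADCAST_EID selects every live site. Returns the count.
 */
static size_t
resolve_targets(struct site_table *t, int eid, time_t now,
    double timeout_secs, struct site **out, size_t max_out)
{
    size_t i, n = 0;

    for (i = 0; i < t->count && n < max_out; i++) {
        struct site *s = &t->sites[i];

        if ((eid == BROADCAST_EID || s->eid == eid) &&
            site_is_up(s, now, timeout_secs))
            out[n++] = s;
    }
    return n;
}
```

The table holds only remote peers; the local environment does not appear in its own table. A production version would also need locking if several threads send and receive concurrently.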

The following pages present various programming considerations, many of which are directly relevant only for Base API applications. However, even when using Replication Manager it is important to understand the concepts.

Finally, the Berkeley DB replication implementation has one additional feature that increases application reliability. Replication in Berkeley DB performs database updates on clients using a different code path than the standard (non-replicated) update path. This means that a software bug that crashes the replication master will not necessarily crash the replication clients as well.

For more information on the Replication Manager operations, see the Replication and Related Methods section in the Berkeley DB C API Reference Guide.