
Release: 2.6.2
Publish Date: July, 2008



Creating a Terracotta Server Cluster


Introduction

For high availability, the Terracotta server can be clustered to run in ACTIVE-PASSIVE mode. In this mode, one server runs in ACTIVE mode, servicing requests from Terracotta clients, while one or more servers run in PASSIVE mode as hot standbys that can take over if the ACTIVE server fails.

There are two ways to configure the Terracotta server to run in ACTIVE-PASSIVE mode:

  • ACTIVE-PASSIVE using shared disk
  • ACTIVE-PASSIVE over network

ACTIVE-PASSIVE over network

In this configuration, cluster state is replicated between the ACTIVE and PASSIVE Terracotta servers over the network. A shared disk is NOT needed.

Prerequisites

  • For a data-intense cluster, the ACTIVE and the PASSIVE servers should be connected to each other over a low-latency, high-bandwidth network.

Diagram

The following diagram depicts a typical Terracotta deployment using a network.

Configuration - Network Based

  • Two or more servers should be defined in the <servers> section of the Terracotta configuration file, tc-config.xml.
  • <l2-group-port> is the port each Terracotta server uses to communicate with the other Terracotta servers.
  • The <ha> section should set <mode> to networked-active-passive.
  • The <networked-active-passive> subsection has a configurable parameter, <election-time>, whose value is given in seconds. <election-time> sets how long an election to choose the ACTIVE server lasts; choose a value that accounts for network latency and server load. The default value is 5 seconds.
  • A reconnection mechanism can be enabled to restore lost connections between ACTIVE and PASSIVE Terracotta servers. See Automatic Reconnect below for more information.

For more information on configuration, see the Configuration Guide and Reference documentation.

Sample Configuration:

<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.terracotta.org/config http://www.terracotta.org/schema/terracotta-4.xsd">
  <servers>
    <server name="Server 1">
      <data>/opt/terracotta/server1-data</data>
      <l2-group-port>9530</l2-group-port>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <server name="Server 2">
      <data>/opt/terracotta/server2-data</data>
      <l2-group-port>9530</l2-group-port>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <ha>
      <mode>networked-active-passive</mode>
      <networked-active-passive>
        <election-time>5</election-time>
      </networked-active-passive>
    </ha>
  </servers>
  ...
</tc:tc-config>

Working Details

When multiple Terracotta servers are running in ACTIVE-PASSIVE mode over the network, an election is run to choose an ACTIVE server. Once an ACTIVE server is elected and all the other servers agree on it, it gains control of the cluster. The remaining servers become PASSIVE-STANDBY and can take over only when the ACTIVE server fails.

When an ACTIVE server fails, one of the available servers in PASSIVE-STANDBY is chosen to be ACTIVE after an election, and clients seamlessly connect to the new ACTIVE server and resume work.
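The election can be pictured with a toy model. This is a minimal sketch, not Terracotta's actual election protocol: it assumes each server announces a hypothetical `weight`, that the highest weight wins, and that a server only counts announcements arriving within `election_time_s` (standing in for <election-time>). It shows why an election window that is too short for the network latency can leave more than one server believing it is ACTIVE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    weight: int  # hypothetical election weight; not Terracotta's real scheme


def elect(me, peers, latency_s, election_time_s):
    """Return the name `me` believes won one election window.

    Toy model: a peer's announcement counts only if its one-way latency
    (latency_s[peer.name]) is shorter than the election window.
    Highest weight wins; ties break on name.
    """
    heard = [me] + [p for p in peers if latency_s[p.name] < election_time_s]
    return max(heard, key=lambda c: (c.weight, c.name)).name


def self_declared_actives(servers, latency_s, election_time_s):
    """Names of servers that end up believing they are ACTIVE."""
    return {
        s.name
        for s in servers
        if elect(s, [p for p in servers if p != s], latency_s, election_time_s) == s.name
    }
```

With `Candidate("Server 1", 10)` and `Candidate("Server 2", 20)`, a latency of 0.2 s and a 5 s window yield only `{"Server 2"}`, while a latency of 7 s makes each server hear only itself, so both declare themselves ACTIVE.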

When a PASSIVE server is started while an ACTIVE server is present, the PASSIVE server must first sync its state from the ACTIVE server before becoming PASSIVE-STANDBY. While syncing, the PASSIVE server is in the PASSIVE-UNINITIALIZED state and cannot take over as the ACTIVE server if a failure occurs before its state is fully synced.
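The PASSIVE states described above can be modeled as a small state machine. This is an illustrative sketch only: `units_to_sync` is a hypothetical stand-in for the amount of state held by the ACTIVE server, and the sketch models a single PASSIVE server, leaving out the election that picks one standby when several are available.

```python
from enum import Enum


class State(Enum):
    PASSIVE_UNINITIALIZED = "PASSIVE-UNINITIALIZED"
    PASSIVE_STANDBY = "PASSIVE-STANDBY"
    ACTIVE = "ACTIVE"


class PassiveServer:
    """Toy model of a PASSIVE server syncing state from the ACTIVE server."""

    def __init__(self, name, units_to_sync):
        self.name = name
        self.units_to_sync = units_to_sync  # hypothetical size of the ACTIVE's state
        self.synced = 0
        self.state = State.PASSIVE_UNINITIALIZED
        self._finish_sync_if_done()

    def _finish_sync_if_done(self):
        if self.synced >= self.units_to_sync:
            self.state = State.PASSIVE_STANDBY

    def receive_sync(self, units):
        """Apply a batch of state sent by the ACTIVE server."""
        if self.state is State.PASSIVE_UNINITIALIZED:
            self.synced = min(self.units_to_sync, self.synced + units)
            self._finish_sync_if_done()

    def on_active_failure(self):
        """Only a fully synced PASSIVE-STANDBY server may become ACTIVE."""
        if self.state is State.PASSIVE_STANDBY:
            self.state = State.ACTIVE
        return self.state
```

A server that is still PASSIVE-UNINITIALIZED when the ACTIVE fails stays in that state; after receiving all of its state, it becomes PASSIVE-STANDBY and can be promoted.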

The ACTIVE server carries the load of sending its state to the PASSIVE server during the sync process. The time taken to sync depends on the amount of data to be synced and the current load on the cluster. For better throughput, run the ACTIVE and PASSIVE servers on similarly configured machines, and start them together to avoid unnecessary syncs.

In ACTIVE-PASSIVE mode, the Terracotta servers can run in either persistent or non-persistent mode. If an ACTIVE server running in persistent mode goes down and a PASSIVE server takes over, the crashed server's data directory must be cleared before the crashed server is brought back. This is necessary because the cluster state may have changed since the crash; when the crashed server comes back up, it takes the new state from the current ACTIVE server. The same applies to a crashed PASSIVE server running in persistent mode. If the data directory is not cleared, the server fails to start and reports an error describing this condition.
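Clearing the data directory before a restart can be scripted. The helper below is a hypothetical sketch, not a Terracotta tool: it deletes everything inside the given <data> directory while keeping the directory itself, and it assumes the server that owns the directory is stopped.

```python
import shutil
from pathlib import Path


def clear_data_directory(data_dir):
    """Delete the contents of a crashed server's <data> directory.

    Hypothetical helper: run it only while that Terracotta server is stopped.
    Returns the number of top-level entries removed.
    """
    path = Path(data_dir)
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a directory")
    removed = 0
    for entry in path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove nested directories recursively
        else:
            entry.unlink()  # plain files and symlinks
        removed += 1
    return removed
```

For the sample configuration above, `clear_data_directory("/opt/terracotta/server1-data")` would be run before restarting Server 1 after a failover.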

Automatic Reconnect

You can configure an automatic reconnect mechanism to prevent short network disruptions from forcing you to restart any Terracotta servers in an ACTIVE-PASSIVE cluster. See l2.nha.tcgroupcomm.reconnect.enabled and l2.nha.tcgroupcomm.reconnect.timeout in the tc.properties page for more information on how to configure the reconnect feature.

If you enable this feature, the time to failover increases by the timeout value set for the automatic reconnect mechanism.
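The trade-off can be made concrete with a back-of-the-envelope sketch. The two property names come from this page; the assumption that the timeout is expressed in milliseconds, the example values, and `base_failover_s` (whatever failover takes without reconnect) are illustrative only, so confirm units and defaults on the tc.properties page.

```python
def reconnect_properties(enabled, timeout_ms):
    """Render tc.properties lines for the automatic reconnect feature.

    Assumption: the timeout is in milliseconds.
    """
    return (
        f"l2.nha.tcgroupcomm.reconnect.enabled = {'true' if enabled else 'false'}\n"
        f"l2.nha.tcgroupcomm.reconnect.timeout = {timeout_ms}\n"
    )


def failover_time_s(base_failover_s, enabled, timeout_ms):
    """Failover time grows by the reconnect timeout when reconnect is enabled."""
    return base_failover_s + (timeout_ms / 1000.0 if enabled else 0.0)
```

Under these assumptions, a 15000 ms reconnect timeout turns a 5 s failover into roughly 20 s, which is the price paid for riding out short network disruptions.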

Advantages of running in network mode

  • A shared disk is not needed in this mode.
  • Terracotta servers need not run in persistent mode.

Disadvantages of running in network mode

  • Having multiple PASSIVE servers to protect against multiple failures adds processing overhead, though after the initial sync this overhead becomes minimal.
  • The cluster could end up with a "split brain" problem if the network topology allows a network failure to sever the cluster into two or more disconnected subnetworks.
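The split-brain failure mode can be illustrated with a toy partition model. This sketch is an assumption for illustration, not Terracotta's election logic: each group of servers that can no longer reach the old ACTIVE promotes one of its own members (here, simply the alphabetically first name), so a network split can leave the cluster with two ACTIVE servers.

```python
def partition_actives(partitions, active):
    """Return the servers acting as ACTIVE after a network split.

    partitions: list of sets of server names that can still reach each other.
    active: name of the ACTIVE server before the split.
    Toy rule: a partition without the old ACTIVE promotes its first member.
    """
    actives = set()
    for group in partitions:
        if active in group:
            actives.add(active)  # old ACTIVE keeps running in its partition
        elif group:
            actives.add(min(group))  # isolated standbys elect their own ACTIVE
    return actives
```

With no split, `partition_actives([{"Server 1", "Server 2", "Server 3"}], "Server 1")` yields one ACTIVE; splitting into `{"Server 1"}` and `{"Server 2", "Server 3"}` yields two.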

Troubleshooting

  1. Multiple Terracotta Servers start up as ACTIVE servers.
    Adjust the election-time in the config to meet your network latency and load.
  2. When a crashed Terracotta server is restarted, it fails to come up.
    Clear the data directory of the crashed server if it is running in persistent mode.

ACTIVE-PASSIVE using shared disk

In this configuration, the Terracotta servers use a shared disk (SAN, SMB) between the ACTIVE and PASSIVE servers to replicate state. Note that the Terracotta servers must run in persistent mode in this configuration.

Prerequisites

  • A disk shared between the ACTIVE and PASSIVE servers that supports file locking.

Diagram

The following diagram depicts a typical Terracotta deployment using a shared disk.

Configuration - Disk Based

  • Two or more servers should be defined in the <servers> section of the Terracotta configuration file.
  • The <data> element of every server must point to the same directory on the shared disk.
  • The <persistence> section should set <mode> to permanent-store.
  • For more information on configuration, see the Configuration Guide and Reference documentation.
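The two configuration rules above can be checked mechanically before starting the servers. This is a hypothetical helper, not part of Terracotta: it takes the parsed root element of a tc-config file (for example `ET.parse("tc-config.xml").getroot()`), assumes the unprefixed layout used in the samples on this page, and compares <data> paths as normalized strings only, so it cannot tell whether two different paths reach the same directory through different mounts.

```python
import os
import xml.etree.ElementTree as ET


def check_shared_disk_config(root):
    """Return a list of problems in a disk-based ACTIVE-PASSIVE tc-config."""
    servers = root.findall("./servers/server")
    problems = []
    if len(servers) < 2:
        problems.append("fewer than two <server> elements")
    data_dirs = set()
    for server in servers:
        name = server.get("name", "?")
        data = server.findtext("data")
        if data is None:
            problems.append(f"{name}: no <data> element")
        else:
            data_dirs.add(os.path.normpath(data.strip()))  # ignore trailing slashes
        mode = server.findtext("dso/persistence/mode")
        if mode is None or mode.strip() != "permanent-store":
            problems.append(f"{name}: persistence mode is not permanent-store")
    if len(data_dirs) > 1:
        problems.append(
            "servers use different <data> directories: " + ", ".join(sorted(data_dirs))
        )
    return problems
```

An empty list means both rules hold; a "different <data> directories" entry corresponds to the second troubleshooting item for shared-disk mode.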

Sample Configuration:

<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.terracotta.org/config http://www.terracotta.org/schema/terracotta-4.xsd">
  <servers>
    <server name="Server 1">
      <!-- THIS DIRECTORY IS SHARED BETWEEN Server 1 AND Server 2 -->
      <data>/opt/terracotta/server-data</data>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <server name="Server 2">
      <!-- THIS DIRECTORY IS SHARED BETWEEN Server 1 AND Server 2 -->
      <data>/opt/terracotta/server-data</data>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>         
  </servers>
  ...
</tc:tc-config>

Working Details

When multiple Terracotta servers are running in ACTIVE-PASSIVE mode using a shared disk, they all try to acquire a lock on the data directory. The one that succeeds becomes the ACTIVE server and gains control of the cluster. The rest become PASSIVE-STANDBY and take over only when the ACTIVE server fails. When one of the PASSIVE servers becomes ACTIVE, the clients seamlessly connect to the new ACTIVE server and resume work.
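The lock-based arbitration can be sketched with an exclusive, non-blocking file lock. This is an illustration under stated assumptions, not how the Java-based Terracotta server is implemented: it uses Unix `fcntl.flock`, and the lock file name `tc-active.lock` is hypothetical. Whoever holds the lock acts as ACTIVE; the lock is released when the holder's descriptor is closed or its process dies, which is when a PASSIVE server can win it.

```python
import fcntl
import os

LOCK_FILE = "tc-active.lock"  # hypothetical name, not Terracotta's real lock file


def try_become_active(data_dir):
    """Try to take the exclusive lock in the shared data directory.

    Returns an open file descriptor if this server won the lock (ACTIVE),
    or None if another holder has it (stay PASSIVE-STANDBY and retry later).
    Keep the descriptor open for as long as this server is ACTIVE.
    """
    fd = os.open(os.path.join(data_dir, LOCK_FILE), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fail fast instead of waiting
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

This only works if the shared disk honors file locks, which is exactly the prerequisite stated above; on storage without working locks, every caller could "win" and the cluster would end up with multiple ACTIVE servers.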

A consistent view of the cluster state is maintained on the shared disk, so the new ACTIVE server can resume work where the old one left off.

Advantages of running in disk share mode

  • Having multiple PASSIVE servers to protect against multiple failures does not add extra processing or load to the cluster.
  • The split-brain problem is avoided by having a central authority (here, the file server) arbitrate control of the cluster.

Disadvantages of running in disk share mode

  • A common disk share with working file locks is required in this mode.
  • Terracotta servers must run in persistent mode. The performance of the Terracotta server is more directly affected by shared-disk performance than in network ACTIVE-PASSIVE mode.

Troubleshooting

  1. Terracotta servers fail to come up.
    Check whether locking is enabled on your shared disk. Some services, such as SMB, require a separate lock daemon to be running to provide locking. The Terracotta logs usually contain clear messages about these errors.
  2. Multiple Terracotta Servers start up as ACTIVE servers.
    Check your config to make sure that all the data directories for all the servers point to the same logical directory in the shared disk.

HealthChecker

HealthChecker is a connection monitor similar to TCP keepalive. HealthChecker functions between Terracotta servers (in high-availability environments) and between Terracotta servers and clients. Using HealthChecker, Terracotta nodes can determine whether peer nodes are reachable, up, or in a GC operation. If a peer node is unreachable or down, a Terracotta node using HealthChecker can take corrective action.

See the HealthChecker section in the tc.properties page for more information on how to configure HealthChecker.
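The keepalive idea behind HealthChecker can be sketched as a simple classifier. This is a simplified model, not Terracotta's implementation: the three parameters are hypothetical and are not tc.properties keys. A peer is probed after a period of silence; if all probes go unanswered, a fresh TCP connection attempt distinguishes a process that is alive but stalled (for example in a long GC pause) from one that is down or unreachable.

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    idle_time_s: float  # silence tolerated before probing starts (hypothetical)
    interval_s: float   # gap between probes (hypothetical)
    max_probes: int     # unanswered probes before giving up (hypothetical)

    def status(self, silence_s, socket_connect_ok):
        """Classify a peer from the time since it last sent anything.

        Any reply to a probe counts as traffic, so it resets silence_s.
        """
        if silence_s < self.idle_time_s:
            return "UP"
        if silence_s < self.idle_time_s + self.max_probes * self.interval_s:
            return "PROBING"
        # Probes went unanswered. If a new TCP connect still succeeds, the
        # peer process is alive but not responding, e.g. stuck in a long GC.
        return "STALLED" if socket_connect_ok else "DEAD"
```

Only the "DEAD" outcome would justify corrective action such as failover; treating a stalled peer as dead is what leads to unnecessary failovers during long GC pauses.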

Deployment

The typical steps are listed in the Deployment Guide.
