Zimbra High Availability Setup with GlusterFS

WARNING: Before you embark on this, please read this disclaimer:

Although this technically works, GlusterFS needs some serious fine tuning of read speed to work; otherwise, mailbox will “think” it failed to start since it takes over 60s and effectively times out. This, in turn, causes the init.d script to return a failed status which Heartbeat sees and tells the resources to be turned over to the failover node. Problems abound. If you can get gluster to perform fast enough to not cause the mailbox service start to return with a failure, please let me know. Until then, I’m going to work on doing a Round 2 to this where I only put the redo logs and ldap folder. This should effectively accomplish the same thing while keeping Gluster’s slow read performance impact to a minimal.

Credits go to:

Gaurav Kohli’s Blog Post on setting up GlusterFS with Heartbeat

Philip Lawlor’s Post on setting up Zimbra for High Availability

Overview of Setup

zm1a.hlmn.co – 192.168.2.23

zm1b.hlmn.co – 192.168.2.24

zm1.hlmn.co – 192.168.2.50

Edit Hosts Files

On zm1a:

127.0.0.1 localhost.hlmn.co localhost
127.0.1.1 zm1.hlmn.co zm1a
192.168.2.23 zm1a zm1.hlmn.co
192.168.2.24 zm1b
192.168.2.50 zm1.hlmn.co

On zm2a:

127.0.0.1       zm1.hlmn.co localhost.hlmn.co localhost
192.168.1.23    zm1a 
192.168.1.24    zm1b zm1.hlmn.co

Update Hostname of both:

nano /etc/hostname

zm1a

 

Setup Heartbeat

  1. Install heartbeat:
    apt-get install heartbeat
  2. On both servers, add this config:
    nano /etc/heartbeat/ha.cf
    logfacility local0
    logfile /var/log/ha-log
    keepalive 2
    deadtime 20 # timeout before the other server takes over
    bcast eth0
    node zm1a
    node zm1b 
    auto_failback on # very important or auto failover won't happen
  3. edit /etc/heartbeat/haresources for Server1:
    zm1a IPaddr::192.168.2.50/24 zimbra
  4. edit /etc/heartbeat/haresources for Server2:
    zm1a IPaddr::192.168.2.50/24 zimbra
  5. Notice that both point to zm1a. That sets zm1a as the primary. Failure to do that will result in them trying to take each over, which just becomes a huge mess.
  6. Create /etc/heartbeat/authkeys on both servers
    auth 3
    3 md5 yourrandommd5string

    Protect the permissions of authkeys file on both servers:

    chmod 600 /etc/heartbeat/authkeys

Disable Upstart for Zimbra Services

On both machines, issue the below command to remove the startup services since Heartbeat will be handling them:

# update-rc.d -f zimbra remove

Final Comments:

Again, Heartbeat thinks Zimbra failed to start since the service takes so long to read from the GlusterFS. If you can figure a way to improve that, the above proof of concept should work well.