GlusterFS – Ryan Hallman

WARNING: Before you embark on this, please read this disclaimer:

Although this technically works, GlusterFS needs some serious fine-tuning of read speed to be usable; otherwise, mailbox will “think” it failed to start, since the start takes over 60s and effectively times out. This, in turn, causes the init.d script to return a failed status, which Heartbeat sees and responds to by handing the resources over to the failover node. Problems abound. If you can get Gluster to perform fast enough that the mailbox service start does not return a failure, please let me know. Until then, I’m going to work on a Round 2 of this where I only put the redo logs and ldap folder on Gluster. This should effectively accomplish the same thing while keeping the impact of Gluster’s slow read performance to a minimum.

Credits go to:

Gaurav Kohli’s Blog Post on setting up GlusterFS with Heartbeat

Philip Lawlor’s Post on setting up Zimbra for High Availability

Overview of Setup

zm1a.hlmn.co – 192.168.2.23

zm1b.hlmn.co – 192.168.2.24

zm1.hlmn.co – 192.168.2.50

Edit Hosts Files

On zm1a:

127.0.0.1 localhost.hlmn.co localhost
127.0.1.1 zm1.hlmn.co zm1a
192.168.2.23 zm1a zm1.hlmn.co
192.168.2.24 zm1b
192.168.2.50 zm1.hlmn.co

On zm1b:

127.0.0.1       zm1.hlmn.co localhost.hlmn.co localhost
192.168.2.23    zm1a 
192.168.2.24    zm1b zm1.hlmn.co
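
Before moving on, it can help to confirm that each name resolves the way the hosts files intend. The following is a minimal sketch, assuming `getent` is available (it ships with glibc on Ubuntu); the helper name `check_hosts` is hypothetical and not part of any package.

```shell
#!/bin/sh
# check_hosts NAME...: print the address each name resolves to and
# return non-zero if any of them fails to resolve.
# (Hypothetical helper; the name is only for illustration.)
check_hosts() {
    rc=0
    for name in "$@"; do
        if addr=$(getent hosts "$name"); then
            # getent prints "ADDRESS  NAME..."; keep only the address
            printf '%s -> %s\n' "$name" "${addr%% *}"
        else
            printf '%s does not resolve\n' "$name" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example (run on each node):
# check_hosts zm1a zm1b zm1.hlmn.co
```

If a name resolves to an unexpected address, the hosts file on that node is the first place to look.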

Update the hostname on both (zm1a shown here; use zm1b on the second node):

nano /etc/hostname

zm1a

Setup Heartbeat

Install heartbeat:
```
apt-get install heartbeat
```

On both servers, add this config:

nano /etc/heartbeat/ha.cf

logfacility local0
logfile /var/log/ha-log
keepalive 2
deadtime 20 # timeout before the other server takes over
bcast eth0
node zm1a
node zm1b 
auto_failback on # very important or auto failover won't happen

Edit /etc/heartbeat/haresources for Server1:
```
zm1a IPaddr::192.168.2.50/24 zimbra
```
Edit /etc/heartbeat/haresources for Server2:
```
zm1a IPaddr::192.168.2.50/24 zimbra
```
Notice that both point to zm1a. That sets zm1a as the primary. If the two files name different nodes, each server will try to take over from the other, which just becomes a huge mess.

Create /etc/heartbeat/authkeys on both servers:
```
auth 3
3 md5 yourrandommd5string
```
Restrict the permissions of the authkeys file on both servers:
```
chmod 600 /etc/heartbeat/authkeys
```
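
The `yourrandommd5string` placeholder in authkeys is the shared secret; `md5` names the signing method, so any hard-to-guess string works as long as both servers use the same one. One way to produce such a string is sketched below, assuming `md5sum` from coreutils; the helper name `gen_authkey` is hypothetical.

```shell
#!/bin/sh
# gen_authkey: print a random 32-character hex string by hashing
# 64 bytes from /dev/urandom. (Hypothetical helper name.)
gen_authkey() {
    head -c 64 /dev/urandom | md5sum | cut -d ' ' -f 1
}

# Example: print both authkeys lines with a fresh secret. Generate it once
# and copy the same file to the other node, since the key must match.
# printf 'auth 3\n3 md5 %s\n' "$(gen_authkey)"
```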

Disable Upstart for Zimbra Services

On both machines, issue the below command to remove the startup services since Heartbeat will be handling them:

# update-rc.d -f zimbra remove

Final Comments:

Again, Heartbeat thinks Zimbra failed to start because the service takes so long to read from GlusterFS. If you can figure out a way to improve that, the above proof of concept should work well.
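
To see how far the start-up is from the 60s limit while tuning, the start can be timed by hand. This is a rough sketch, assuming the init script lives at /etc/init.d/zimbra; the helper name `time_cmd` is hypothetical.

```shell
#!/bin/sh
# time_cmd LIMIT CMD...: run CMD, report its exit status and elapsed whole
# seconds, and return non-zero if it ran longer than LIMIT seconds.
# (Hypothetical helper name.)
time_cmd() {
    limit=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?
    elapsed=$(( $(date +%s) - start ))
    echo "exit=$status elapsed=${elapsed}s limit=${limit}s"
    [ "$elapsed" -le "$limit" ]
}

# Example (as root, with Heartbeat stopped so it does not react mid-test):
# time_cmd 60 /etc/init.d/zimbra start
```

Running it a few times before and after a Gluster tuning change gives a simple before/after comparison.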

Overview of Setup

Primary Gluster Server

Hostname: gf1.hlmn.co

IP Address: 192.168.2.26

OS: Ubuntu 14.04

Memory: 1GB

Secondary Gluster Server

Hostname: gf2.hlmn.co

IP Address: 192.168.2.27

OS: Ubuntu 14.04

Memory: 1GB

Prepare the Virtual Machines

Create a new clean, base Ubuntu 14.04 install.
Name it gf1 and set up the hosts file and hostname file to match, including the domain information.
Add a raw VirtIO disk to be used by Gluster as the brick. We’ll call this gf1_brick1.img.
Repeat for the second machine, naming it gf2.
Once they’re set up, make sure they’re both updated:
```
sudo apt-get update && sudo apt-get upgrade
```

Install Gluster on Both Nodes

Install python-software-properties:

$ sudo apt-get install python-software-properties

Add the PPA:

$ sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.5
$ sudo apt-get update

Then install Gluster packages:
```
$ sudo apt-get install glusterfs-server
```
Add both hosts to your DNS server so that they can resolve each other by hostname.

Configure GlusterFS

We’ll set up gf1 as the primary server. Many of the Gluster commands take effect on both (or all) servers, so they only need to be run on one node.

Drop into the root user.
Configure the Trusted Pool on gf1:
```
gluster peer probe gf2.hlmn.co
```

Check to make sure it works by typing this on gf2 as root user:

# gluster peer status

The output should be:

Number of Peers: 1

Hostname: 192.168.2.26
Uuid: 8aadbadf-8498-4674-8b42-a561d63b2e3d
State: Peer in Cluster (Connected)
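
If you want to check this from a script rather than by eye, the connected peers can be counted from the same output. A minimal sketch, assuming the output format shown above; the helper name `count_connected_peers` is hypothetical.

```shell
#!/bin/sh
# count_connected_peers: read `gluster peer status` output on stdin and
# print how many peers report the connected state.
# (Hypothetical helper name.)
count_connected_peers() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -cF 'State: Peer in Cluster (Connected)' || true
}

# Example (as root on either node):
# gluster peer status | count_connected_peers
```

With two servers in the pool, each node should report one connected peer.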

It’s time to set up the disks to be used as bricks. If you’re using KVM and you set up the second disk as a raw VirtIO device, it should be listed as /dev/vd[a-z]. Mine is vdb.

We can double check to make sure it’s the right disk by issuing:

# fdisk -l /dev/vdb

And we should get something like this:

Disk /dev/vdb: 21.0 GB, 20971520000 bytes
16 heads, 63 sectors/track, 40634 cylinders, total 40960000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/vdb doesn't contain a valid partition table

Once we ID the disk, issue:

# fdisk /dev/vdb
Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-40959999, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-40959999, default 40959999): 
Using default value 40959999

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks

Install xfs:
```
apt-get install xfsprogs
```
Format the partition:
```
 mkfs.xfs -i size=512 /dev/vdb1
```

Mount the partition as a Gluster Brick:

mkdir -p /export/vdb1 && mount /dev/vdb1 /export/vdb1 && mkdir -p /export/vdb1/brick

Add entry into fstab:

 echo "/dev/vdb1 /export/vdb1 xfs defaults 0 0"  >> /etc/fstab
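
Running the `echo ... >> /etc/fstab` line twice leaves a duplicate entry. A small sketch of an append that is safe to repeat follows; the helper name `add_line_once` is hypothetical.

```shell
#!/bin/sh
# add_line_once FILE LINE: append LINE to FILE only if an identical line
# is not already present. (Hypothetical helper name.)
add_line_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root):
# add_line_once /etc/fstab "/dev/vdb1 /export/vdb1 xfs defaults 0 0"
# mount -a    # surfaces fstab mistakes now rather than at the next boot
```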

Repeat the disk steps above (from identifying the disk through the fstab entry) on gf2.

Now it’s time to set up a replicated volume. On gf1:

gluster volume create gv0 replica 2 gf1.hlmn.co:/export/vdb1/brick gf2.hlmn.co:/export/vdb1/brick

An explanation of the above, from Gluster documentation:

Breaking this down into pieces, the first part says to create a gluster volume named gv0 (the name is arbitrary, gv0 was chosen simply because it’s less typing than gluster_volume_0). Next, we tell it to make the volume a replica volume, and to keep a copy of the data on at least 2 bricks at any given time. Since we only have two bricks total, this means each server will house a copy of the data. Lastly, we specify which nodes to use, and which bricks on those nodes. The order here is important when you have more bricks…it is possible (as of the most current release as of this writing, Gluster 3.3) to specify the bricks in such a way that you would make both copies of the data reside on a single node. This would make for an embarrassing explanation to your boss when your bulletproof, completely redundant, always on super cluster comes to a grinding halt when a single point of failure occurs.
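
To make the ordering point concrete: with `replica 2`, Gluster pairs bricks in the order they are listed, so each consecutive pair should sit on different nodes. The sketch below builds such an ordering. The second brick path (/export/vdc1/brick), the volume name gv1, and the helper name `interleave_bricks` are all hypothetical and only there for illustration.

```shell
#!/bin/sh
# interleave_bricks NODE_A NODE_B PATH...: print brick arguments so that
# every consecutive pair (one replica set under "replica 2") spans both
# nodes. (Hypothetical helper name.)
interleave_bricks() {
    node_a=$1
    node_b=$2
    shift 2
    out=
    for path in "$@"; do
        out="$out $node_a:$path $node_b:$path"
    done
    echo "${out# }"
}

# Example with a hypothetical second brick on each node:
# gluster volume create gv1 replica 2 \
#     $(interleave_bricks gf1.hlmn.co gf2.hlmn.co /export/vdb1/brick /export/vdc1/brick)
```

Listing both gf1 bricks first and both gf2 bricks second would instead pair each node with itself, which is the single point of failure the quote warns about.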

The above should output:

volume create: gv0: success: please start the volume to access data

Now, to make sure everything is set up correctly, issue this on both gf1 and gf2; the output should be the same on both servers:

gluster volume info

Expected Output:

Volume Name: gv0
Type: Replicate
Volume ID: 064499be-56db-4e66-84c7-2b6712b10fa6
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gf1.hlmn.co:/export/vdb1/brick
Brick2: gf2.hlmn.co:/export/vdb1/brick

The status above shows “Created”, which means the volume hasn’t been started yet. Trying to mount the volume at this point would fail, so we have to start it first by issuing this on gf1:
```
gluster volume start gv0
```
You should see this:
```
volume start: gv0: success
```
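
After starting, `gluster volume info` should report the status as Started instead of Created. A minimal sketch for pulling that field out in a script, assuming the output format shown earlier; the helper name `volume_status` is hypothetical.

```shell
#!/bin/sh
# volume_status: read `gluster volume info <name>` output on stdin and
# print the value of each "Status:" line. (Hypothetical helper name.)
volume_status() {
    awk -F': ' '$1 == "Status" { print $2 }'
}

# Example (as root on either node):
# gluster volume info gv0 | volume_status
```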

Mount Your Gluster Volume on the Host Machine

Now that you have your Gluster volume set up, you can access it using the glusterfs-client on another host.

Source: GlusterHacker

Install the GlusterFS client on a remote host:
```
apt-get install glusterfs-client
```
Create a config location for gluster:
```
mkdir /etc/glusterfs
```
Create a volume config file:
```
nano /etc/glusterfs/gfvolume1.vol
```

Fill in the following:

volume gv0-client-0
 type protocol/client
 option transport-type tcp
 option remote-subvolume /export/vdb1/brick
 option remote-host gf1.hlmn.co
end-volume

volume gv0-client-1
 type protocol/client
 option transport-type tcp
 option remote-subvolume /export/vdb1/brick
 option remote-host gf2.hlmn.co
end-volume

volume gv0-replicate
 type cluster/replicate
 subvolumes gv0-client-0 gv0-client-1
end-volume

volume writebehind
 type performance/write-behind
 option window-size 1MB
 subvolumes gv0-replicate
end-volume

volume cache
 type performance/io-cache
 option cache-size 512MB
 subvolumes writebehind
end-volume

Gluster reads the above starting at the bottom of the file and working its way up. So it first creates the cache volume, then adds a layer for writebehind and replication, and finally the remote volumes.

Mount it through fstab: open /etc/fstab (nano /etc/fstab) and add the following:
```
/etc/glusterfs/gfvolume1.vol /mnt/gfvolume1 glusterfs rw,allow_other,default_permissions,_netdev 0 0
```
Because the volume file that fstab points to lists both bricks, the client can keep working through the other brick if one goes down.
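
The fstab line assumes the mount point /mnt/gfvolume1 already exists, so it has to be created before the first mount. A short sketch of creating it, mounting, and confirming the result on Linux follows; the helper name `is_mounted` is hypothetical.

```shell
#!/bin/sh
# is_mounted DIR: succeed if DIR is listed as a mount point in /proc/mounts.
# (Hypothetical helper name; Linux only.)
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' /proc/mounts
}

# Example (as root on the client):
# mkdir -p /mnt/gfvolume1      # the mount point from the fstab line must exist
# mount /mnt/gfvolume1         # mount looks up the remaining fields in /etc/fstab
# is_mounted /mnt/gfvolume1 && echo "gluster volume is mounted"
```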

That’s pretty much all it takes to at least get it working.

The performance of it, on the other hand, will need a lot more looking into since I’m getting 50mb/s writes on Gluster where the host can do 250mb/s. Small file performance is also abysmal.
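
For anyone wanting to reproduce that kind of comparison, a rough sequential-write test with `dd` is sketched below. It assumes GNU dd (for `conv=fdatasync`) and is not necessarily how the numbers above were measured; the helper name `write_test` is hypothetical.

```shell
#!/bin/sh
# write_test DIR [MB]: write MB mebibytes of zeros into DIR with dd,
# flushing to disk before dd reports its throughput, then remove the
# test file. (Hypothetical helper name.)
write_test() {
    dir=$1
    mb=${2:-1024}
    file="$dir/ddtest.$$"
    rc=0
    dd if=/dev/zero of="$file" bs=1M count="$mb" conv=fdatasync || rc=$?
    rm -f "$file"
    return "$rc"
}

# Example: compare the Gluster mount with a local directory on the same host.
# write_test /mnt/gfvolume1
# write_test /tmp
```

This only exercises large sequential writes; it says nothing about the small-file case, which needs a separate test with many small files.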

