Linux Cluster - Debian Squeeze, Pacemaker, DRBD, LVM, Apache

Posted by admin 27/03/2011 at 01h15

A step-by-step tutorial on how to set up a fail-over cluster using Debian 6 (Squeeze), Pacemaker and DRBD (disk replication over the network). On top of this highly available infrastructure we will run the Apache web server. First the resources used are defined, and after that each part is installed and configured, with examples.

1. Infrastructure

2 VPS - Debian 6 (linux-image-2.6.32-5-686 kernel) - minimal installation - single disk allocated per node
3 network cards
2 ips (eth0 same network) for accessing individual nodes - 192.168.10.41/42
2 ips (eth1 same network, or you can use bonding to increase bandwidth) for drbd - 192.168.5.41/42
2 ips (eth2 same network) for heartbeat - 192.168.6.41/42
1 service ip, will be set as an alias for eth0 card - 192.168.10.50

2. Network configuration

node1 network settings

  root@node1:~# cat /etc/network/interfaces
  # This file describes the network interfaces available on your system
  # and how to activate them. For more information, see interfaces(5).

  # The loopback network interface
  auto lo
  iface lo inet loopback

  # The primary network interface
  # allow-hotplug eth0
  auto eth0
  iface eth0 inet static
  address 192.168.10.41
  netmask 255.255.255.0
  gateway 192.168.10.254

  # drbd interface
  auto eth1
  iface eth1 inet static
  address 192.168.5.41
  netmask 255.255.255.0

  # heartbeat interface
  auto eth2
  iface eth2 inet static
  address 192.168.6.41
  netmask 255.255.255.0

node2 network settings

  root@node2:~# cat /etc/network/interfaces
  # This file describes the network interfaces available on your system
  # and how to activate them. For more information, see interfaces(5).

  # The loopback network interface
  auto lo
  iface lo inet loopback

  # The primary network interface
  # allow-hotplug eth0
  auto eth0
  iface eth0 inet static
  address 192.168.10.42
  netmask 255.255.255.0
  gateway 192.168.10.254

  # drbd interface
  auto eth1
  iface eth1 inet static
  address 192.168.5.42
  netmask 255.255.255.0

  # heartbeat interface
  auto eth2
  iface eth2 inet static
  address 192.168.6.42
  netmask 255.255.255.0

check that all interfaces on both nodes are reachable

  root@node1:~# ping 192.168.5.42
  root@node1:~# ping 192.168.6.42
  root@node1:~# ping 192.168.10.42
  root@node2:~# ping 192.168.5.41
  root@node2:~# ping 192.168.6.41
  root@node2:~# ping 192.168.10.41

add all hostnames involved to /etc/hosts (identically on all nodes)
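
As an illustration only (the -drbd and -hb aliases below are hypothetical and not referenced later in this tutorial; only node1 and node2 are), /etc/hosts could look like this:

  root@node1:~# cat /etc/hosts
  127.0.0.1       localhost
  192.168.10.41   node1
  192.168.10.42   node2
  192.168.5.41    node1-drbd
  192.168.5.42    node2-drbd
  192.168.6.41    node1-hb
  192.168.6.42    node2-hb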

3. DRBD Setup

We will add two new disks, one to each node. The DRBD kernel module ships with the standard Debian Squeeze kernel, so we only need to install the userland utilities.

root@node1:~# aptitude install drbd8-utils
root@node2:~# aptitude install drbd8-utils

or download the sources and compile them yourself :), but since we are on Debian I will stick with the standard package.
/etc/drbd.d/global_common.conf and /etc/drbd.d/r1.res must be identical on both nodes.
The next snippet defines a DRBD resource on top of our /dev/sdb disk:

root@node1:~# cat /etc/drbd.d/r1.res
resource r1 {
        protocol C;
        device /dev/drbd1 minor 1;
        disk /dev/sdb;
        meta-disk internal;

# the following two definitions are equivalent: the 'on' sections or the commented 'floating' lines
        on node1 {
                address 192.168.5.41:7801;
                disk /dev/sdb;
        }
        on node2 {
                address 192.168.5.42:7801;
                disk /dev/sdb;
        }

#       floating 192.168.5.41:7801;
#       floating 192.168.5.42:7801;
        net {
                after-sb-0pri discard-younger-primary; #discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri call-pri-lost-after-sb;
        }
}
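
The /etc/drbd.d/global_common.conf shipped with the package can normally be left as it is; as a rough sketch only (the contents below are an assumption, check the file installed on your system; the syncer rate is just an example value), it looks something like this:

global {
        usage-count yes;
}
common {
        protocol C;
        syncer {
                rate 10M;
        }
}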

initialize the meta-data and start DRBD (the init script runs the attach and connect commands for the defined resource):

root@node1:~# drbdadm create-md r1
root@node1:~# /etc/init.d/drbd start
root@node1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757

 1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:524236

root@node2:~# drbdadm create-md r1
root@node2:~# /etc/init.d/drbd start
root@node2:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757

 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:524236

As our DRBD resource will be managed by the cluster, we need to disable the DRBD startup script:

root@node1:~# update-rc.d -f drbd remove
root@node2:~# update-rc.d -f drbd remove

initial synchronization

root@node1:~# drbdadm -- --overwrite-data-of-peer primary r1
root@node1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757

 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:142564 nr:0 dw:0 dr:143048 al:0 bm:8 lo:16 pe:6 ua:25 ap:0 ep:1 wo:b oos:381836
        [====>...............] sync'ed: 27.4% (381836/524236)K
        finish: 0:00:34 speed: 11,016 (10,168) K/sec
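
You can follow the progress of the synchronization with watch:

root@node1:~# watch -n1 cat /proc/drbd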

fully synchronized

root@node1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:524236 nr:0 dw:0 dr:524436 al:0 bm:32 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

4. LVM Setup

we'll create a logical volume to keep our site files

root@node1:~# aptitude install lvm2 psmisc
root@node2:~# aptitude install lvm2 psmisc

root@node1:~# pvcreate /dev/drbd1
  Physical volume "/dev/drbd1" successfully created
root@node1:~# pvdisplay
  "/dev/drbd1" is a new physical volume of "511.95 MiB"
  --- NEW Physical volume ---
  PV Name               /dev/drbd1
  VG Name
  PV Size               511.95 MiB
  Allocatable           NO
  PE Size               0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               TCxIQu-cf1c-SLit-f2pf-i2tP-DiEv-2wkX37

The PV UUID should be the same on both nodes.

On both nodes, edit /etc/lvm/lvm.conf and replace the filter line with:

filter = [ "a|drbd.*|", "r|.*|" ]

also, disable lvm cache:

write_cache_state = 0

we can also delete existing lvm cache:

rm -rf /etc/lvm/cache/.cache
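
Both settings belong to the devices section of /etc/lvm/lvm.conf; only the two changed lines are shown below, the file contains many other settings that can stay at their defaults:

devices {
    filter = [ "a|drbd.*|", "r|.*|" ]
    write_cache_state = 0
}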

Let's create volume group and a logical volume on the primary drbd node:

root@node1:~# drbdadm primary r1
root@node1:~# vgcreate datavg /dev/drbd1
  Volume group "datavg" successfully created
root@node1:~# lvcreate -L 100M -n htdocs datavg
  Logical volume "htdocs" created
root@node1:~# mkfs.ext3 /dev/datavg/htdocs

To manually mount the new filesystem on one node we'll have to execute:

root@node1:~# vgchange -aey datavg
  1 logical volume(s) in volume group "datavg" now active
root@node1:~# mount /dev/datavg/htdocs /var/www/

To move the htdocs filesystem to node2:

root@node1:~# umount /var/www/
root@node1:~# vgchange -aen datavg
  0 logical volume(s) in volume group "datavg" now active
root@node1:~# drbdadm secondary r1
root@node2:~# vgchange -aey datavg
  1 logical volume(s) in volume group "datavg" now active
root@node2:~# mount /dev/datavg/htdocs /var/www/

The above commands, used to migrate the filesystem from one node to the other, will be executed by the cluster software (Pacemaker), but first we need to set it up :)

5. Pacemaker/Corosync Setup

Pacemaker is just the cluster resource manager; it runs on top of a communication layer (Corosync or Heartbeat) responsible for message exchange between the cluster nodes.
Let's install the software we need:

root@node1:~# aptitude install apache2 pacemaker
root@node2:~# aptitude install apache2 pacemaker
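
Since apache2 will be started and stopped by the cluster (via the lsb:apache2 resource defined later), it is also a good idea to disable its init script, just as we did for drbd:

root@node1:~# update-rc.d -f apache2 remove
root@node2:~# update-rc.d -f apache2 remove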

The first step in cluster configuration is to connect the nodes to each other, therefore we need to install a common authentication key on each node and define a private network for cluster communication.

root@node1:~# corosync-keygen
Writing corosync key to /etc/corosync/authkey.

If the above command takes too long (corosync-keygen blocks while waiting for entropy), run a disk-intensive command such as the following in parallel to generate some:

root@node1:~# find / -name sss

Now, we can copy the authentication key to the second node:

root@node1:~# scp /etc/corosync/authkey root@node2:/etc/corosync/authkey

To configure the private network for cluster communication:

root@node1:~# vi /etc/corosync/corosync.conf
set bindnetaddr to 192.168.6.0
root@node2:~# vi /etc/corosync/corosync.conf
set bindnetaddr to 192.168.6.0
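
bindnetaddr lives in the interface sub-section of the totem section; the relevant part of /etc/corosync/corosync.conf should end up looking roughly like this (the multicast values shown are the Debian defaults, keep whatever your file already contains):

totem {
        # other options left at their packaged defaults
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.6.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}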

Now, we need to enable the cluster service to start at boot time:

root@node1:~# vi /etc/default/corosync
START=yes
root@node1:~# /etc/init.d/corosync start
root@node2:~# vi /etc/default/corosync
START=yes
root@node2:~# /etc/init.d/corosync start

Let's check the status of our cluster:

root@node1:~# crm_mon --one-shot -V
crm_mon[1929]: 2011/02/21_00:14:30 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_mon[1929]: 2011/02/21_00:14:30 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_mon[1929]: 2011/02/21_00:14:30 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
============
Last updated: Mon Feb 21 00:14:30 2011
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1 node2 ]

We now have a functional bare cluster (0 resources configured), so let's add a few resources. The main tool used to configure cluster resources is the CRM shell, a command-line interface (of course there are a few other tools for the same purpose - cibadmin, crm_resource, even some graphical ones like hb_gui - but we will use the stock one). The cluster configuration is stored as XML, so the CRM tool takes human-readable commands and generates the XML entries.
First, we need to disable STONITH (node fencing, i.e. a device which can remotely power off a node) and quorum (as we only have 2 nodes, quorum makes no sense here):

root@node1:~# crm
crm(live)# cib new conf20110221
INFO: conf20110221 shadow CIB created
crm(conf20110221)# configure
crm(conf20110221)configure# property stonith-enabled=false
crm(conf20110221)configure# property no-quorum-policy=ignore

Next, let's define our cluster resources (the DRBD resource, volume group, filesystem, service IP and the application). Cluster resources are managed by so-called resource agents, which are scripts used to start/stop/monitor individual resources. Resources are defined in the CRM shell with the "primitive" command:

crm(conf20110221)configure# primitive drbd ocf:linbit:drbd params drbd_resource="r1" op start interval="0" timeout="240" op stop interval="0" timeout="100" op monitor interval="59s" role="Master" timeout="30s" op monitor interval="60s" role="Slave" timeout="30s"
crm(conf20110221)configure# primitive datavg ocf:heartbeat:LVM params volgrpname="datavg" exclusive="true" op start interval="0" timeout="30" op stop interval="0" timeout="30" 
crm(conf20110221)configure# primitive fs_apache ocf:heartbeat:Filesystem params device="/dev/mapper/datavg-htdocs" directory="/var/www" fstype="ext3" op start interval="0" timeout="60" op stop interval="0" timeout="120"
crm(conf20110221)configure# primitive app_ip ocf:heartbeat:IPaddr params ip="192.168.10.50" op monitor interval="30s"
crm(conf20110221)configure# primitive app_apache2 lsb:apache2 op monitor interval="15s"

Above we defined five cluster resources: drbd, datavg, fs_apache, app_ip and app_apache2. To ensure our application keeps working after a fail-over, these resources need to be started and stopped by the cluster in a specific order (we cannot start Apache if the filesystem is not mounted or the IP is not up, and we cannot mount the filesystem if the volume group is not active, etc.). To force the cluster to start the resources in a specific order (and to stop them in reverse order), we will create resource groups. The order given when a group is created is the start order of its resources:

crm(conf20110221)configure# group apache_grp app_ip app_apache2
crm(conf20110221)configure# group lvm datavg fs_apache

We will impose a few other constraints on our resources to ensure they are started on the same node and in the right order:

crm(conf20110221)configure# ms ms_drbd drbd meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Master"
crm(conf20110221)configure# location drbd_on_node1 ms_drbd rule role="master" 100: #uname eq node1
crm(conf20110221)configure# colocation apache-deps inf: ms_drbd:Master lvm apache_grp
crm(conf20110221)configure# order app_on_drbd inf: ms_drbd:promote lvm:start apache_grp:start

The first rule specifies that we can have only one DRBD master node (out of a total of two), the second one expresses our preference to run DRBD as master on node1, and the third requires the lvm and apache_grp resource groups to run on the node where the DRBD resource has the master role (in other words, we prefer to have all resources running on node1). The last rule specifies the order in which the resource groups are started (lvm first, followed by apache_grp).
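
Before committing, you can review and validate the shadow configuration from the same configure prompt:

crm(conf20110221)configure# show
crm(conf20110221)configure# verify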

As our configuration is now complete, we can save it and instruct CRM to load it:

crm(conf20110221)configure# commit
crm(conf20110221)configure# end
crm(conf20110221)# cib use live
crm(live)# cib commit conf20110221
INFO: commited 'conf20110221' shadow CIB to the cluster
crm(live)# quit
bye

Now let's check the cluster's status:

root@node1:~# crm_mon -1f
============
Last updated: Sat Mar  5 23:49:49 2011
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ node1 node2 ]

 Master/Slave Set: ms_drbd
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Resource Group: lvm
     datavg     (ocf::heartbeat:LVM):   Started node1
     fs_apache  (ocf::heartbeat:Filesystem):    Started node1
 Resource Group: apache_grp
     app_ip     (ocf::heartbeat:IPaddr):        Started node1
     app_apache2        (lsb:apache2):  Started node1

Migration summary:
* Node node2:
* Node node1:
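
Everything is running on node1, as requested by our location constraint. To test a fail-over by hand, you can put node1 into standby, check that the resources migrate to node2, then bring it back online:

root@node1:~# crm node standby node1
root@node1:~# crm_mon -1f
root@node1:~# crm node online node1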

You can find more info on the following pages:
Pacemaker - http://www.clusterlabs.org
DRBD - http://www.drbd.org
CRM - http://www.clusterlabs.org/doc/crm_cli.html
Resource agents - http://www.linux-ha.org/wiki/Resource_Agents

Comments

  1. neocoretech
    25/05/2011 at 14h36

    very nice tutorial... thx...

    is it possible to do all of this with only one network card? We have root servers with only one NIC, and it's not possible to add additional...

  2. admin
    25/05/2011 at 17h54

    I think so, but I see no point in that. If the link between the nodes goes down, you'll end up with a split-brain (both nodes becoming masters). Also, performance will suffer a lot.
