Linux Cluster - Debian Squeeze, Pacemaker, DRBD, LVM, Apache
A step-by-step tutorial on how to set up a fail-over cluster using Debian 6 (Squeeze), Pacemaker and DRBD (disk replication over the network). On top of this highly available infrastructure, we will configure the Apache web server. First we define the resources used, then we walk through the installation and configuration of each part, with samples.
1. Infrastructure
2 VPS - Debian 6 (linux-image-2.6.32-5-686 kernel) - minimal installation - single disk allocated per node
3 network cards per node
2 IPs (eth0, same network) for accessing individual nodes - 192.168.10.41/42
2 IPs (eth1, same network, or you can use bonding to increase bandwidth) for DRBD - 192.168.5.41/42
2 IPs (eth2, same network) for heartbeat - 192.168.6.41/42
1 service IP, will be set as an alias on the eth0 card - 192.168.10.50
2. Network configuration
node1 network settings
root@node1:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
# allow-hotplug eth0
auto eth0
iface eth0 inet static
address 192.168.10.41
netmask 255.255.255.0
gateway 192.168.10.254
# drbd interface
auto eth1
iface eth1 inet static
address 192.168.5.41
netmask 255.255.255.0
# heartbeat interface
auto eth2
iface eth2 inet static
address 192.168.6.41
netmask 255.255.255.0
node2 network settings
root@node2:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
# allow-hotplug eth0
auto eth0
iface eth0 inet static
address 192.168.10.42
netmask 255.255.255.0
gateway 192.168.10.254
# drbd interface
auto eth1
iface eth1 inet static
address 192.168.5.42
netmask 255.255.255.0
# heartbeat interface
auto eth2
iface eth2 inet static
address 192.168.6.42
netmask 255.255.255.0
check that all interfaces on both nodes are reachable
root@node1:~# ping 192.168.5.42
root@node1:~# ping 192.168.6.42
root@node1:~# ping 192.168.10.42
root@node2:~# ping 192.168.5.41
root@node2:~# ping 192.168.6.41
root@node2:~# ping 192.168.10.41
add all hostnames involved to /etc/hosts (identically on all nodes)
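As a minimal sketch of what those entries could look like, the snippet below prints one /etc/hosts line per address from the infrastructure list. Mapping the plain names node1 and node2 to the eth0 addresses, and the node1-drbd / node1-hb style aliases, are assumptions made for illustration; only the IP addresses come from this tutorial.

```shell
#!/bin/sh
# Print /etc/hosts entries for both nodes (sketch; nothing is written to disk).
# Assumption: nodeN resolves to its eth0 address; the -drbd and -hb
# aliases are hypothetical names, not something the cluster requires.
hosts_entries() {
    for n in 1 2; do
        printf '192.168.10.4%s node%s\n' "$n" "$n"
        printf '192.168.5.4%s node%s-drbd\n' "$n" "$n"
        printf '192.168.6.4%s node%s-hb\n' "$n" "$n"
    done
}

hosts_entries
```

After reviewing the output, it could be appended on each node with hosts_entries >> /etc/hosts.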
3. DRBD Setup
We will add 2 new disks, one to each node. The DRBD kernel module ships with the standard Debian Squeeze kernel, so we only need to install the utilities.
root@node1:~# aptitude install drbd-utils
root@node2:~# aptitude install drbd-utils
or download the sources and compile :), but since this is Debian I will stick with the standard package.
/etc/drbd.d/global_common.conf and /etc/drbd.d/r1.res must be identical on both nodes.
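One way to verify this is to fetch the peer's copy and compare it byte for byte. The sketch below assumes the peer's files were first copied to a scratch location (the /tmp/node2 path in the usage note is hypothetical); the helper names configs_match and report are also just illustrative.

```shell
#!/bin/sh
# configs_match FILE_A FILE_B
# Succeeds (exit status 0) only if both files exist and are byte-identical.
configs_match() {
    cmp -s "$1" "$2"
}

# report LOCAL PEER_COPY: print a one-line verdict for a pair of files.
report() {
    if configs_match "$1" "$2"; then
        echo "OK: $1 matches $2"
    else
        echo "DIFFERENT: $1 vs $2"
    fi
}
```

For example, after copying node2's /etc/drbd.d/r1.res to /tmp/node2/r1.res on node1, running report /etc/drbd.d/r1.res /tmp/node2/r1.res shows whether the two copies agree.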
Next snippet will define a drbd resource on top of our /dev/sdb disk:
root@node1:~# cat /etc/drbd.d/r1.res
resource r1 {
protocol C;
device /dev/drbd1 minor 1;
disk /dev/sdb;
meta-disk internal;
# the following 2 definitions are equivalent
on node1 {
address 192.168.5.41:7801;
disk /dev/sdb;
}
on node2 {
address 192.168.5.42:7801;
disk /dev/sdb;
}
# floating 192.168.5.41:7801;
# floating 192.168.5.42:7801;
net {
after-sb-0pri discard-younger-primary; #discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri call-pri-lost-after-sb;
}
}
meta-data initialization
init script runs attach and connect commands for defined resource
root@node1:~# drbdadm create-md r1
root@node1:~# /etc/init.d/drbd start
root@node1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:524236
root@node2:~# drbdadm create-md r1
root@node2:~# /etc/init.d/drbd start
root@node2:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:524236
As our DRBD resource will be managed by the cluster, we need to disable the startup script for drbd:
root@node1:~# update-rc.d -f drbd remove
root@node2:~# update-rc.d -f drbd remove
initial synchronization
root@node1:~# drbdadm -- --overwrite-data-of-peer primary r1
root@node1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
ns:142564 nr:0 dw:0 dr:143048 al:0 bm:8 lo:16 pe:6 ua:25 ap:0 ep:1 wo:b oos:381836
[====>...............] sync'ed: 27.4% (381836/524236)K
finish: 0:00:34 speed: 11,016 (10,168) K/sec
fully synchronized
root@node1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:524236 nr:0 dw:0 dr:524436 al:0 bm:32 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
4. LVM Setup
We'll create a logical volume to keep our site files:
root@node1:~# aptitude install lvm2 psmisc
root@node2:~# aptitude install lvm2 psmisc
root@node1:~# pvcreate /dev/drbd1
Physical volume "/dev/drbd1" successfully created
root@node1:~# pvdisplay
"/dev/drbd1" is a new physical volume of "511.95 MiB"
--- NEW Physical volume ---
PV Name /dev/drbd1
VG Name
PV Size 511.95 MiB
Allocatable NO
PE Size 0
Total PE 0
Free PE 0
Allocated PE 0
PV UUID TCxIQu-cf1c-SLit-f2pf-i2tP-DiEv-2wkX37
The PV UUID should be the same on both nodes.
Edit /etc/lvm/lvm.conf and replace the filter line with:
filter = [ "a|drbd.*|", "r|.*|" ]
Also, disable the LVM cache:
write_cache_state = 0
We can also delete the existing LVM cache:
rm -rf /etc/lvm/cache/.cache
Let's create a volume group and a logical volume on the primary DRBD node:
root@node1:~# drbdadm primary r1
root@node1:~# vgcreate datavg /dev/drbd1
Volume group "datavg" successfully created
root@node1:~# lvcreate -L 100M -n htdocs datavg
Logical volume "htdocs" created
root@node1:~# mkfs.ext3 /dev/datavg/htdocs
To manually mount the new filesystem on one node, we have to execute:
root@node1:~# vgchange -aey datavg
1 logical volume(s) in volume group "datavg" now active
root@node1:~# mount /dev/datavg/htdocs /var/www/
To move the htdocs filesystem to node2:
root@node1:~# umount /var/www/
root@node1:~# vgchange -aen datavg
0 logical volume(s) in volume group "datavg" now active
root@node1:~# drbdadm secondary r1
root@node2:~# drbdadm primary r1
root@node2:~# vgchange -aey datavg
1 logical volume(s) in volume group "datavg" now active
root@node2:~# mount /dev/datavg/htdocs /var/www/
The commands used above to migrate the filesystem from one node to another will later be executed by the cluster software (Pacemaker), but first we need to set it up :)
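The manual hand-over can be summarised as two small shell functions, one per side. This is only a dry-run sketch: the function names and the RUN variable are made up for illustration, and by default every command is printed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch of the manual hand-over of the htdocs filesystem.
# RUN defaults to "echo", so commands are only printed. Setting RUN to an
# empty string before sourcing this file makes the functions execute them.
RUN=${RUN-echo}

# Run on the node that currently holds the data.
release_node() {
    $RUN umount /var/www/
    $RUN vgchange -aen datavg
    $RUN drbdadm secondary r1
}

# Run on the node that takes over, after the other side has released.
takeover_node() {
    $RUN drbdadm primary r1
    $RUN vgchange -aey datavg
    $RUN mount /dev/datavg/htdocs /var/www/
}

release_node
takeover_node
```

The ordering is the point of the sketch: the release side works from the top of the stack down (filesystem, volume group, DRBD role), and the takeover side repeats the same steps in reverse.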
5. Pacemaker/Corosync Setup
Pacemaker is only the cluster's resource manager; it runs on top of a messaging layer (Corosync or Heartbeat) that is responsible for message exchange between the cluster's nodes.
Let's install software we need:
root@node1:~# aptitude install apache2 pacemaker
root@node2:~# aptitude install apache2 pacemaker
The first step in cluster configuration is to connect the nodes to each other; therefore we need to install a common key on each node and define a private network for cluster communication.
root@node1:~# corosync-keygen
Writing corosync key to /etc/corosync/authkey.
If the above command takes too long, run the following command in parallel to increase the entropy:
root@node1:~# find / -name sss
Now we can copy the authentication key to the second node:
root@node1:~# scp /etc/corosync/authkey root@node2:/etc/corosync/authkey
To configure the private network for cluster communication:
root@node1:~# vi /etc/corosync/corosync.conf
set bindnetaddr to 192.168.6.0
root@node2:~# vi /etc/corosync/corosync.conf
set bindnetaddr to 192.168.6.0
Now we need to enable the cluster service to start at boot time:
root@node1:~# vi /etc/default/corosync
START=yes
root@node1:~# /etc/init.d/corosync start
root@node2:~# vi /etc/default/corosync
START=yes
root@node2:~# /etc/init.d/corosync start
Let's check the status of our cluster:
root@node1:~# crm_mon --one-shot -V
crm_mon[1929]: 2011/02/21_00:14:30 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_mon[1929]: 2011/02/21_00:14:30 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_mon[1929]: 2011/02/21_00:14:30 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
============
Last updated: Mon Feb 21 00:14:30 2011
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1 node2 ]
We now have a functional bare cluster (0 Resources configured), so let's add a few resources. The main tool used to configure cluster resources is CRM, a command line interface (of course there are a few other tools for the same purpose, such as cibadmin and crm_resource, and even some graphical ones like hb_gui, but we will use the stock one). The cluster configuration is stored in XML format, so the CRM tool takes human-readable commands and generates the XML entries.
First, we need to disable STONITH (a device that can remotely stop a node) and tell the cluster to ignore loss of quorum (as we only have 2 nodes, quorum makes no sense here):
root@node1:~# crm
crm(live)# cib new conf20110221
INFO: conf20110221 shadow CIB created
crm(conf20110221)# configure
crm(conf20110221)configure# property stonith-enabled=false
crm(conf20110221)configure# property no-quorum-policy=ignore
Next, let's define our cluster resources (DRBD resource, volume group, filesystem, service IP and application). The cluster's resources are managed by so-called resource agents, which are scripts used to start, stop and monitor individual resources. Resources are defined in the CRM shell with the "primitive" command:
crm(conf20110221)configure# primitive drbd ocf:linbit:drbd params drbd_resource="r1" op start interval="0" timeout="240" op stop interval="0" timeout="100" op monitor interval="59s" role="Master" timeout="30s" op monitor interval="60s" role="Slave" timeout="30s"
crm(conf20110221)configure# primitive datavg ocf:heartbeat:LVM params volgrpname="datavg" exclusive="true" op start interval="0" timeout="30" op stop interval="0" timeout="30"
crm(conf20110221)configure# primitive fs_apache ocf:heartbeat:Filesystem params device="/dev/mapper/datavg-htdocs" directory="/var/www" fstype="ext3" op start interval="0" timeout="60" op stop interval="0" timeout="120"
crm(conf20110221)configure# primitive app_ip ocf:heartbeat:IPaddr params ip="192.168.10.50" op monitor interval="30s"
crm(conf20110221)configure# primitive app_apache2 lsb:apache2 op monitor interval="15s"
Above we defined five cluster resources: drbd, datavg, fs_apache, app_ip and app_apache2. To ensure our application keeps working after a fail-over, these resources need to be started and stopped by the cluster in a specific order (we cannot start Apache if the filesystem is not mounted or if the IP is not up, and we cannot mount the filesystem if the volume group is not active, etc.). To force the cluster to start the resources in a specific order (and to stop them in reverse order), we will create groups of resources. The order specified when a resource group is created is the start order of its resources:
crm(conf20110221)configure# group apache_grp app_ip app_apache2
crm(conf20110221)configure# group lvm datavg fs_apache
We will impose a few more constraints on our resources to ensure they are started on the same node and in the right order:
crm(conf20110221)configure# ms ms_drbd drbd meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Master"
crm(conf20110221)configure# location drbd_on_node1 ms_drbd rule role="master" 100: #uname eq node1
crm(conf20110221)configure# colocation apache-deps inf: ms_drbd:Master lvm apache_grp
crm(conf20110221)configure# order app_on_drbd inf: ms_drbd:promote lvm:start apache_grp:start
As our config is now complete, we can save it and instruct CRM to load the new config:
crm(conf20110221)configure# commit
crm(conf20110221)configure# end
crm(conf20110221)# cib use live
crm(live)# cib commit conf20110221
INFO: commited 'conf20110221' shadow CIB to the cluster
crm(live)# quit
bye
Let's now check the cluster's status:
root@node1:~# crm_mon -1f
============
Last updated: Sat Mar 5 23:49:49 2011
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
3 Resources configured.
============
Online: [ node1 node2 ]
Master/Slave Set: ms_drbd
Masters: [ node1 ]
Slaves: [ node2 ]
Resource Group: lvm
datavg (ocf::heartbeat:LVM): Started node1
fs_apache (ocf::heartbeat:Filesystem): Started node1
fs_db2 (ocf::heartbeat:Filesystem): Started node1
Resource Group: apache_grp
app_ip (ocf::heartbeat:IPaddr): Started node1
app_apache2 (lsb:apache2): Started node1
Migration summary:
* Node node2:
* Node node1:
You can find more info on the following pages:
Pacemaker - http://www.clusterlabs.org
DRBD - http://www.drbd.org
CRM - http://www.clusterlabs.org/doc/crm_cli.html
Resource agents - http://www.linux-ha.org/wiki/Resource_Agents
25/05/2011 at 14h36
very nice tutorial... thx...
Is it possible to do all of this with only one network card? We have root servers with only one NIC, and it's not possible to add additional...
25/05/2011 at 17h54
I think so, but I see no point in that. If the link between the nodes is interrupted, you'll have a split-brain (both nodes being masters). Also, performance will suffer considerably.