Heartbeat2 Xen cluster with drbd8 and OCFS2

Posted by AtulAthavale on Tue 29 Jan 2008 at 13:04

The idea behind this set-up is to get a high-availability two-node cluster with redundant data. The two identical servers are installed with the Xen hypervisor and an almost identical configuration as cluster nodes. The configuration and image files of the Xen virtual machines are stored on a DRBD device for redundancy. DRBD8 and OCFS2 allow simultaneous mounting on both nodes, which is required for live migration of Xen virtual machines.

This article describes a Heartbeat2 Xen cluster using Ubuntu 7.10, drbd8 and the OCFS2 (version 1.39) file system. Although Ubuntu is used here, the same set-up can be built in almost the same way with Debian.

Setup

OS Installation

Install two computers with a standard minimal Ubuntu Server 7.10 OS. After the standard installation is done, we go ahead and install the required packages.

Disc Partition

On both computers we partition the disk into three partitions and use them as follows:

/dev/sda1 as / (root)
/dev/sda2 as swap
/dev/sda3 for drbd8 (just leave it untouched at installation time)

Network Configuration

Node     Hostname    IP-Address
Node1    node1       192.168.0.128
Node2    node2       192.168.0.129
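Heartbeat, DRBD and OCFS2 all refer to the nodes by name, so it helps to make sure both hostnames resolve on both machines, for example via /etc/hosts. A minimal sketch using the addresses above:

#/etc/hosts
192.168.0.128   node1
192.168.0.129   node2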

Xen system

http://en.wikipedia.org/wiki/Xen

We start by installing the Xen hypervisor and booting with the Xen kernel.

sudo apt-get install ubuntu-xen-server

Answer yes when asked about additional software, then reboot the system into the Xen hypervisor.

OCFS2

http://oss.oracle.com/projects/ocfs2/

OCFS2 is a cluster file system which allows simultaneous access from many nodes. We will put it on our DRBD device so it can be accessed from both nodes at the same time. While configuring OCFS2 we provide information about the nodes which will access the file system later. Every node that has an OCFS2 file system mounted must regularly write into the file system's metadata, letting the other nodes know that it is still alive.

Installation

sudo apt-get install ocfs2-tools ocfs2console

Configuration

Edit /etc/ocfs2/cluster.conf on both nodes as follows. Note that the parameter lines must be indented with a single tab (not spaces), otherwise o2cb will not start.

#/etc/ocfs2/cluster.conf
node:
	ip_port = 7777
	ip_address = 192.168.0.128
	number = 0
	name = node1
	cluster = ocfs2
node:
	ip_port = 7777
	ip_address = 192.168.0.129
	number = 1
	name = node2
	cluster = ocfs2
cluster:
	node_count = 2
	name = ocfs2

Reconfigure OCFS2 with the following commands, accepting the default values:

sudo dpkg-reconfigure o2cb
sudo /etc/init.d/o2cb restart
sudo /etc/init.d/ocfs2 restart
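For reference, the values written by dpkg-reconfigure end up in /etc/default/o2cb. A minimal sketch of what it typically looks like (variable names and defaults may differ slightly between ocfs2-tools versions):

#/etc/default/o2cb
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
O2CB_HEARTBEAT_THRESHOLD=31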

drbd8

http://en.wikipedia.org/wiki/Drbd

Installation

The advantage of drbd8 over drbd7 is that it allows the DRBD resource to be primary ("master") on both nodes, so it can be mounted read-write on both. We will build the drbd8 module and load it into the kernel. For that we need the packages "build-essential" and "linux-headers-xen".

sudo apt-get install drbd8-utils drbd8-module-source build-essential linux-headers-xen
sudo m-a a-i drbd8-module-source
sudo update-modules
sudo modprobe drbd

This builds the DRBD module kernel/drivers/block/drbd.ko against the currently running kernel. A default configuration file is installed as /etc/drbd.conf.
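To confirm that the module was built for the running Xen kernel and is loaded, a quick check (the exact version line will differ):

lsmod | grep drbd
cat /proc/drbd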

Configuration

Edit the /etc/drbd.conf

#/etc/drbd.conf
global {
    usage-count yes;
}
common {
    syncer { rate 10M; }
}
resource r0 {
    protocol C;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/sbin/drbd-peer-outdater";
    }
    startup {
    }
    disk {
        on-io-error detach;
    }
    net {
        allow-two-primaries;
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 10M;
        al-extents 257;
    }
    on node1 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 192.168.0.128:7788;
        flexible-meta-disk internal;
    }
    on node2 {
        device /dev/drbd0;
        disk /dev/sda3;
        address 192.168.0.129:7788;
        meta-disk internal;
    }
}

The "allow-two-primaries" option in the net section of drbd.conf allows the resource to be "master" on both nodes. Copy /etc/drbd.conf to node2 and restart drbd on both nodes with the following command.
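If /dev/sda3 has never been used by DRBD before, you may also need to initialise the DRBD metadata on both nodes before the restart below. A hedged sketch; whether this step is required depends on your drbd8 version:

sudo drbdadm create-md r0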

sudo /etc/init.d/drbd restart

If you check the status, it looks like this:

sudo /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.0.3 (api:86/proto:86)
SVN Revision: 2881 build by root@node1, 2008-01-20 12:48:36
0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
ns:143004 nr:0 dw:0 dr:143004 al:0 bm:43 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:8916 misses:22 starving:0 dirty:0 changed:22
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

Change the resource to "master" with the following command on both nodes. (On the very first start the disk state will typically show ds:Inconsistent/Inconsistent and refuse to become primary; in that case run "sudo drbdadm -- --overwrite-data-of-peer primary all" once on one node to start the initial sync, as noted in the comments below.)

 sudo drbdadm primary r0

and check the status again

sudo /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.0.3 (api:86/proto:86)
SVN Revision: 2881 build by root@node1, 2008-01-20 12:48:36
0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
ns:143004 nr:0 dw:0 dr:143004 al:0 bm:43 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:8916 misses:22 starving:0 dirty:0 changed:22
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

As you can see, the resource is "master" on both nodes. The DRBD device is now accessible under /dev/drbd0.

File system

We can now create a file system on /dev/drbd0 with the following command:

sudo mkfs.ocfs2 /dev/drbd0

This can be mounted on both nodes simultaneously with

sudo mkdir /drbd0
sudo mount.ocfs2 /dev/drbd0 /drbd0

Now we have common storage which is kept synchronized by DRBD on both nodes.
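A quick way to convince yourself that the storage really is shared (the file name here is purely illustrative):

sudo touch /drbd0/test-from-node1    # run on node1
ls -l /drbd0/test-from-node1         # run on node2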

Init script

We have to make sure that after a reboot the system sets the DRBD resource back to "master" and mounts it on /drbd0 before starting Heartbeat and the Xen machines.

Edit /etc/init.d/mountdrbd.sh

#!/bin/sh
#/etc/init.d/mountdrbd.sh
drbdadm primary r0
mount.ocfs2 /dev/drbd0 /drbd0

Make it executable and add a symbolic link to it under /etc/rc2.d/S99mountdrbd.sh (Ubuntu boots into runlevel 2):

sudo chmod +x /etc/init.d/mountdrbd.sh
sudo ln -s /etc/init.d/mountdrbd.sh /etc/rc2.d/S99mountdrbd.sh

This step could also be integrated into Heartbeat by adding appropriate resources to the configuration, but for the time being we do it with a script.
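As one of the comments below points out, promoting the resource right after boot can fail if DRBD is still resynchronising. A hedged sketch of an extended script that waits until both disks are UpToDate before promoting, assuming the r0 resource and /drbd0 mount point from above and that r0 is the only DRBD resource:

#!/bin/sh
#/etc/init.d/mountdrbd.sh (extended sketch)
# Wait until local and peer disk are both UpToDate (as shown in /proc/drbd)
# before promoting, so that a returning node does not try to become
# primary on inconsistent data.
while ! grep -q 'ds:UpToDate/UpToDate' /proc/drbd; do
        sleep 5
done
drbdadm primary r0
mount.ocfs2 /dev/drbd0 /drbd0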

Heartbeat2

http://www.linux-ha.org/Heartbeat

Installation

Now we can install and set up Heartbeat 2:

sudo apt-get install heartbeat-2 heartbeat-2-gui

Edit /etc/ha.d/ha.cf

#/etc/ha.d/ha.cf
crm on
bcast eth0
node node1 node2

and restart heartbeat2 with

sudo /etc/init.d/heartbeat restart
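If heartbeat refuses to start, check whether your package also expects an /etc/ha.d/authkeys file. A minimal sketch, with the shared secret being your own choice and the file kept at mode 600:

#/etc/ha.d/authkeys
auth 1
1 sha1 SomeSharedSecret

sudo chmod 600 /etc/ha.d/authkeys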

Configuration

In Heartbeat2 the configuration and status information of the resources is stored in XML format in the file /var/lib/heartbeat/crm/cib.xml. The syntax for this is very well explained by Alan Robertson in his tutorial at linux.conf.au 2007, which can be found at http://linux-ha.org/HeartbeatTutorials.

This file can either be edited directly as a whole or manipulated in pieces using the "cibadmin" tool. We will use this tool as it makes managing the cluster much easier. The required components will be saved as XML files under /root/cluster.
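cibadmin can also query what is currently loaded, which is handy for checking the result of the steps below, for example:

sudo cibadmin -Q -o crm_config
sudo cibadmin -Q -o resources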

Initialization

Edit file /root/cluster/bootstrap.xml

#/root/cluster/bootstrap.xml
<cluster_property_set id="bootstrap">
  <attributes>
    <nvpair id="bootstrap01" name="transition-idle-timeout" value="60"/>
    <nvpair id="bootstrap02" name="default-resource-stickiness" value="INFINITY"/>
    <nvpair id="bootstrap03" name="default-resource-failure-stickiness" value="-500"/>
    <nvpair id="bootstrap04" name="stonith-enabled" value="true"/>
    <nvpair id="bootstrap05" name="stonith-action" value="reboot"/>
    <nvpair id="bootstrap06" name="symmetric-cluster" value="true"/>
    <nvpair id="bootstrap07" name="no-quorum-policy" value="stop"/>
    <nvpair id="bootstrap08" name="stop-orphan-resources" value="true"/>
    <nvpair id="bootstrap09" name="stop-orphan-actions" value="true"/>
    <nvpair id="bootstrap10" name="is-managed-default" value="true"/>
  </attributes>
</cluster_property_set>

Load this file with the following command:

 sudo cibadmin -C -o crm_config -x /root/cluster/bootstrap.xml

This will initialize the cluster with the values set in the XML file. (If these properties have already been set, you can use "sudo cibadmin -M -o crm_config -x /root/cluster/bootstrap.xml" to modify them with the new values.)
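You can also let heartbeat sanity-check the live configuration after loading it (the -V flag just adds verbosity):

sudo crm_verify -L -V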

Setting up STONITH device

STONITH prevents a "split-brain" situation (i.e. a resource unintentionally running on both nodes at the same time) by fencing the other node. Details can be found at http://www.linux-ha.org/STONITH. We will use stonith over ssh to reboot the faulty machine.

 sudo apt-get install stonith

Follow http://sial.org/howto/openssh/publickey-auth/ to set up public-key authentication. In short, just do the following on both nodes:

sudo ssh-keygen
--> save the key under /root/.ssh/*
--> don't give any passphrase
scp /root/.ssh/id_rsa.pub node2:/root/.ssh/authorized_keys

Now check that you can log in from node1 to node2 via ssh without being asked for a password, and vice versa. Then check that stonith will be able to reach the other node:

sudo ssh -q -x -n -l root "node2" "ls -la"

You should get a file listing from node2. Now we configure the stonith device as a cluster resource. It will be a special "clone" resource which runs simultaneously on all nodes. Edit /root/cluster/stonith.xml:

#/root/cluster/stonith.xml
<clone id="stonithcloneGroup" globally_unique="false">
  <instance_attributes id="stonithcloneGroup">
    <attributes>
      <nvpair id="stonithclone01" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive id="stonithclone" class="stonith" type="external/ssh" provider="heartbeat">
    <operations>
      <op name="monitor" interval="5s" timeout="20s" prereq="nothing" id="stonithclone-op01"/>
      <op name="start" timeout="20s" prereq="nothing" id="stonithclone-op02"/>
    </operations>
    <instance_attributes id="stonithclone">
      <attributes>
        <nvpair id="stonithclone01" name="hostlist" value="node1,node2"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>

Load this file with the following command:

sudo cibadmin -C -o resources -x /root/cluster/stonith.xml
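A one-shot run of crm_mon should now show the stonith clone started on both nodes (a fuller example of the output appears further down):

sudo crm_mon -1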

Xen as cluster resource

Now we can add a Xen virtual machine as a cluster resource. Let's say we have a Xen paravirtualized machine called vm01. We keep the configuration and image files of vm01 under /drbd0/xen/vm01/ as vm01.cfg and vm01-disk0.img respectively.
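For reference, a minimal sketch of what /drbd0/xen/vm01/vm01.cfg could look like for a paravirtualized guest. The kernel and ramdisk paths, memory size and bridge name are assumptions, adjust them to your own installation:

#/drbd0/xen/vm01/vm01.cfg
name    = 'vm01'
# kernel/ramdisk paths are examples, point them at your dom0's Xen kernel
kernel  = '/boot/vmlinuz-2.6.22-14-xen'
ramdisk = '/boot/initrd.img-2.6.22-14-xen'
memory  = 256
disk    = ['file:/drbd0/xen/vm01/vm01-disk0.img,xvda,w']
root    = '/dev/xvda ro'
vif     = ['bridge=xenbr0']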

Edit /root/cluster/vm01.xml

#/root/cluster/vm01.xml
<resources>
  <primitive id="vm01" class="ocf" type="Xen" provider="heartbeat">
    <operations>
      <op id="vm01-op01" name="monitor" interval="10s" timeout="60s" prereq="nothing"/>
      <op id="vm01-op02" name="start" timeout="60s" start_delay="0"/>
      <op id="vm01-op03" name="stop" timeout="300s"/>
    </operations>
    <instance_attributes id="vm01">
      <attributes>
        <nvpair id="vm01-attr01" name="xmfile" value="/drbd0/xen/vm01/vm01.cfg"/>
        <nvpair id="vm01-attr02" name="target_role" value="started"/>
      </attributes>
    </instance_attributes>
    <meta_attributes id="vm01-meta01">
      <attributes>
        <nvpair id="vm01-meta-attr01" name="allow_migrate" value="true"/>
      </attributes>
    </meta_attributes>
  </primitive>
</resources>

Load this file with the following command:

sudo cibadmin -C -o resources -x /root/cluster/vm01.xml

Monitoring Tool

With the command "crm_mon" you can monitor the cluster, including its nodes and resources:

sudo crm_mon
Refresh in 14s...
============
Last updated: Fri Jan 25 17:26:10 2008
Current DC: node2 (83972cf7-0b56-4299-8e42-69b3411377a7)
2 Nodes configured.
6 Resources configured.
============
Node: node2 (83972cf7-0b56-4299-8e42-69b3411377a7): online
Node: node1 (6bfd2aa7-b132-4104-913c-c34ef03a4dba): online
Clone Set: stonithclone
stonithclone:0 (stonith:external/ssh): Started node1
stonithclone:1 (stonith:external/ssh): Started node2
vm01 (heartbeat::ocf:Xen): Started node2

There is also a GUI available. To use it, just set a password for the user "hacluster" with the following command and call "hb_gui":

sudo passwd hacluster
(type the new password twice)
sudo hb_gui &

Managing Tool

The cluster resources can be managed either with the GUI or with the crm_* commands. Please refer to the man pages for details.

List of crm_* commands: crm_attribute, crm_failcount, crm_mon, crm_sh, crm_uuid, crm_diff, crm_master, crm_resource, crm_standby, crm_verify
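As an example of day-to-day management, moving vm01 to a specific node and clearing that constraint again could look roughly like this (option letters vary between heartbeat2 releases, check man crm_resource):

sudo crm_resource -M -r vm01 -H node1
sudo crm_resource -U -r vm01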

I hope you find some fun trying it out.

Hello folks, meanwhile there is also on-site commercial support (Europe and India) available for this kind of cluster. Just drop an email for further details.

Gruß, atul.athavale [at] gmail [dot] com .


 

 


Posted by Anonymous (88.97.xx.xx) on Tue 29 Jan 2008 at 17:08
Nice notes. Thanks.

You might be interested in this - http://code.google.com/p/ganeti/


Posted by Anonymous (130.83.xx.xx) on Tue 29 Jan 2008 at 17:25
Hey buddy,

thnx for reply!!!

ganeti rocks too .. the only advantage of this installation is "live migration" in Xen. This makes it transparent for the user .. as he sees absolutely no change when a resource moves to another node

Gruss,

Atul Athavale.


Posted by Anonymous (80.206.xx.xx) on Wed 30 Jan 2008 at 15:43
Has this site become ubuntu-administration? O_o...


Posted by Anonymous (80.69.xx.xx) on Thu 31 Jan 2008 at 07:35
http://packages.debian.org/drbd8 shows also packages for lenny and sid.

I think this howto is great! :)

/thorsten


Posted by Anonymous (137.56.xx.xx) on Mon 4 Feb 2008 at 17:02
Yeah, but lenny and sid lack amd64 xen kernels and sources. So I am not so sure this will work on debian without hacks.

Please let me know if it works, and how :)


Posted by Anonymous (141.35.xx.xx) on Thu 31 Jan 2008 at 11:50
Excellent article.

Maybe you want to fix the link to the wikipedia article for drdb.
1. make it clickable
2. move it above "Installation" as you did in the others sections

I'm fine to read Ubuntu related articles which can be easily applied to Debian -- don't let the grumblers bother you.

Best regards!


Posted by Anonymous (62.49.xx.xx) on Mon 11 Feb 2008 at 14:44
Great article. I'm about halfway through so far, have just got my ocfs2 filesystem up and running.

Here's a few notes for anyone else doing this:

Don't use the 64-bit version of Ubuntu server, its xen kernel is older and doesn't have ocfs2 support.

You don't need to install ocfs2console. It pulls in a lot of GUI components that you just don't need.

The /etc/ocfs2/cluster.conf file is very picky about syntax. If you copy and paste it from above it won't work, the lines have to be indented with tabs and not spaces, otherwise o2cb won't start.

There is no drbd8-source package, so just exclude that from the command line.

The first time you look at drbd status it will probably say ds:Inconsistent/Inconsistent and won't let you set your nodes to primary.
To fix this you have to run the following on one of your nodes:
sudo drbdadm -- --overwrite-data-of-peer primary all

After that status will be ds:UpToDate/Inconsistent and it will start synchronising. You can use the disk at this stage without waiting for the sync to complete.

The first line of /etc/init.d/mountdrbd.sh should be a shebang:
#!/bin/sh

Also I think it should be symlinked into /etc/rc2.d/S99mountdrbd.sh because the system boots into runlevel 2 and not 3. (I've not rebooted yet so haven't tested this part)

I'll be back with any more notes after I've installed heartbeat and got an xen machine up and running.

Tomun.


Posted by AtulAthavale (130.83.xx.xx) on Mon 11 Feb 2008 at 16:06
Hey Tomun, Thnx a lot for the corrections. Actually while reading even I noticed "Hey! even I had to do this ;) " but problem is .. I did it 1st and while writing I forgot those things. Again sorry for sloppy description at times.


Posted by Anonymous (217.21.xx.xx) on Mon 18 Feb 2008 at 11:56
I can't understand only one thing: why do you need to use OCFS2 on top of drbd???


Posted by barbacha (62.23.xx.xx) on Mon 18 Feb 2008 at 12:22
Because DRBD does not include a DLM (distributed lock manager). Therefore you can't implement an active/active configuration with DRBD only.

You need to move the DLM into userspace, by using a tool like, for example, the one provided by OCFS.

Cheers,

Nicolas BOUTHORS (DRBD user since 0.6.1)
http://www.nbi.fr/

PS : Great article !


Posted by Anonymous (217.21.xx.xx) on Mon 18 Feb 2008 at 13:52
OK, in this case why can't I use only ocfs2?


Posted by barbacha (62.23.xx.xx) on Mon 18 Feb 2008 at 14:37
Because OCFS on its own does not replicate the data.

OCFS is a "high end" product that assumes you have a SCSI array with dual Fibre Channel access, so that two machines can mount the same media at the same time.

When you don't own such a beast, you have to replicate the underlying data yourself. One example of doing so is using DRBD.

Cheers,

Nicolas BOUTHORS
http://www.nbi.fr/


Posted by Anonymous (88.2.xx.xx) on Fri 7 Mar 2008 at 13:03
"OCFS is an "high end" product that suppose that you have a SCSI array with dual fiber channel access"

No need to go "high end" though: you can use Coraid's ATAoE to mount on both nodes a third party's remote block device over an ethernet network (either a Coraid appliance or a properly configured Linux box - Etch can do the trick out of the box) and then use OCFS or GFS to provide a filesystem capable of concurrent access on top of it.

This way you can get DRBD out of the equation (I didn't test version 8 but I have awful memories of v7, on Debian Sarge as well as other Linux distros).


Posted by Anonymous (202.7.xx.xx) on Tue 19 Feb 2008 at 05:25
Nice post.

Its great to see these advancements in HA for linux coming along.

I work as an AIX admin with HACMP. IBM's answer to HA which has been around for years and I have always wanted to see if I could do it on linux, seeing concurrent filesystems is very cool as this was one thing I thought was not around yet for linux.

What happens here in a failover situation?


Posted by Anonymous (130.83.xx.xx) on Tue 19 Feb 2008 at 09:01
Hello,

If one node goes down, the cluster will move the Xen resource to the healthy node. All the required data, i.e. the Xen config files and Xen disk images, is available over drbd+ocfs2 on that particular node.
When the faulty node comes back, drbd synchronizes the data to it and the cluster is back to two nodes.


Posted by horeizo (192.33.xx.xx) on Tue 19 Feb 2008 at 07:08
Hi Atul, thanks for your post. I'm working on a similar setup atm and would like to comment on a few things:
  • it could be a bit confusing that you create a ha.cf file which basically forces heartbeat2 to run in heartbeat1 mode, but then use the XML files that are characteristic for heartbeat2. There's a handy converter at /usr/lib/heartbeat/haresources2cib.py which generates XML from the much more human-friendly ha.cf
  • you might add a warning that mixing OCFS2's own disk heartbeat with heartbeat2 is pretty dangerous (see this post eg) in active/active configurations. The safe way would be to use OCFS2 usermode heartbeat that plugs into heartbeat2.
Unfortunately, these patches to OCFS2 have not yet made it into the kernel. I integrated Novell's patches into the Debian sid 2.6.24 kernel, but due to some other problems (missing Xen targets..) we decided to give SLES a try for the production system. cheers, -Christian


Posted by Anonymous (130.83.xx.xx) on Tue 19 Feb 2008 at 08:55
Hello Christian,

Thnx for the comments

1.) I guess ha.cf is mandatory for ha2 too. It defines the interfaces to watch, the node joining methods and the broadcast type. The rest of the configuration is under cib.xml. haresources2cib.py is used to convert ha1 to ha2; as I am setting up a new cluster directly with ha2, I opt for writing my own XMLs. In my opinion they are cleaner ;)

2.) You are absolutely right about the OCFS2 issues and the missing user-space patches for Debian. But in this HowTo we work around those dangers by mounting ocfs2 manually via the init script and not using HA2 for that.

Gruss,

Atul Athavale.

PS: SLES is also a good choice. They have integrated most of HA2 and XEN very well in 10sp1. I have built two SLES clusters too. It was forced due to some policy decisions ... never the less my OS of choice still remains debian or Ubuntu ;)


Posted by horeizo (192.33.xx.xx) on Tue 19 Feb 2008 at 09:35
hey Atul,

1) you're absolutely right, my bad - I mixed it up with haresources which you don't use. Sorry

2) I'm not sure you're getting around the problem by init.d mounting OCFS2. The problem might arise if heartbeat2 and OCFS2 have conflicting opinions on whether a split-brain situation is about to occur or not. Fortunately I haven't experienced this situation yet, but uncoordinated heartbeating/fencing in active/active clusters has a rather bad reputation.

You're right about Debian, I'd love to get the whole stack up and running on etch, but for the time being it involves way too much patching for a production setting. Jeff (Novell) who prepared the OCFS2 patches for 2.6.24 indicated that OCFS2 user mode heartbeat is meant to be rewritten for 2.6.26 and I think I might give it another go then.

Gruss too,
-Christian


Posted by rtsf-msu (35.10.xx.xx) on Wed 12 Mar 2008 at 13:01
Fantastic Article. Thank you for your time on this.

I've set up a test system like in your example and I am very impressed with how the failover works. I have a question as I haven't been able to find an answer to a problem I've been having:

Regarding the init file mountdrbd.sh, I have found out that if I kill (poweroff) the primary node, (number 0 in /etc/ocfs2/cluster.conf) the xen instance will move seamlessly to the secondary and heartbeat will change that node to the primary. When I bring the downed box back up it will error while running mountdrbd.sh as it tries to set 'drbdadm primary r0' saying that there cannot be two primaries while the /dev/drbd0 is out of sync.
After that error stonith kicks in and the box reboots and / or kernel panics. This still leaves the other node up and xen domU instance will be running fine.

Essentially I have a hard time bringing the cluster back to the point it was at before it lost a node. I'm not sure if it is an order of operations problem or if drbd needs time to sync changes before bringing both nodes back to primary.

Thanks for your time.


Posted by Anonymous (213.157.xx.xx) on Wed 12 Mar 2008 at 13:29
Hello,

Thnx for trying it out and your comments!

In our configuration both nodes are "primary" independent of where the DomU is really running. This "making both drbd0s primary" is controlled with mountdrbd.sh

You have got the problem exactly right: "drbd needs time to sync changes before bringing both nodes back to primary". So my mountdrbd.sh script needs an extension which checks the sync status and waits until both nodes are in sync again before making the returning node primary again.

I will try to fix it soon!!!

Gruss,
Atul Athavale.


Posted by rtsf-msu (35.10.xx.xx) on Wed 12 Mar 2008 at 13:36
Thanks Atul for your very quick reply.

Again, this article is great.

I'm glad I wasn't as far off base as I had previously thought.

I really appreciate your response.


Drew


Posted by Gwayne (165.21.xx.xx) on Thu 13 Mar 2008 at 03:55
sudo cibadmin -C crm_config -x /root/cluster/bootstrap.xml

should be

sudo cibadmin -C -o crm_config -x /root/cluster/bootstrap.xml

According to me ;)

Thanks for the very helpful doc.


Posted by AtulAthavale (130.83.xx.xx) on Thu 13 Mar 2008 at 06:49
Hello, Yup!!! Its writing mistake. Thnx for pointing it out!! now corrected in Article


Posted by Anonymous (77.192.xx.xx) on Thu 20 Mar 2008 at 10:55
Hi,

Why did you use OCFS2 and not GFS ?

GFS seems a bit more aged and advanced but i don't find any real comparison of the two .

Thanks for this great article :)

Mathieu


Posted by Anonymous (212.254.xx.xx) on Fri 28 Mar 2008 at 09:23
On Debian, you might get this error message:

ocfs2_hb_ctl: I/O error on channel while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"

After many hours of research, I can tell you it is not because of drbd or ocfs2. Upgrade the kernel to version 2.6.22 or newer. The ocfs2 driver in kernel 2.6.18 is buggy when used with drbd.


Posted by Anonymous (213.30.xx.xx) on Mon 21 Jul 2008 at 14:35
I've got this error message. Spent many hours of research too. Trying to upgrade with a newer kernel, (linux-modules-2.6.25-2-xen-686 backport), I've found that this one can't run in Dom0. More details on how you cope with that would be great.


Posted by Anonymous (82.215.xx.xx) on Fri 20 Jun 2008 at 10:18
The example and guide cover only one guest resource. Try it with two or more guests and see how Heartbeat + Xen fails. HB tries to migrate all the guests simultaneously, resulting in a freeze of the guest VMs.


Posted by Anonymous (130.83.xx.xx) on Sat 21 Jun 2008 at 15:19
Hallo,

First of all: thank you for trying this guide out.
The point you made about more than one guest freezing the cluster is not entirely right. It depends on the configuration. If one takes care to put the Xen VM hard disks on different drbd devices (/dev/drbd0, /dev/drbd1 ... and so on) with the corresponding configuration, it works.


Posted by Anonymous (80.98.xx.xx) on Sat 27 Dec 2008 at 02:15
Hi,
thanks for the cool article!
I'm looking for a HA configuration that solves redundancy both for hw and sw issues.
How can this setup help in decreasing downtime, caused by sw upgrades and other maintenance? (eg. avoid outages upon kernel or software upgrade like: detache a node, upgrade, attach node back, 'resync(?)')
I need this solution for webhosting, where security upgrades are highly critical but cause unwanted downtime.
Thanks,
Zsigmond


Posted by Anonymous (80.98.xx.xx) on Sat 27 Dec 2008 at 03:17
"Rolling upgrades" (node by node), that's the word I was looking for

"In this scenario each node is removed from the cluster, upgraded and then brought back online until all nodes are running the newest version."

Zs.


Posted by Anonymous (80.85.xx.xx) on Thu 22 Oct 2009 at 21:05
I tried this manual for a Xen cluster, but I have a problem with heartbeat: when I run "cibadmin -C -o crm_config -x /root/cluster/bootstrap.xml" I get the error message "Signon to CIB failed: connection failed. Init failed, could not perform requested operation." I am using openSUSE 11.1 32bit. Can anybody write the heartbeat installation/configuration step by step?


Posted by Anonymous (203.35.xx.xx) on Tue 3 Aug 2010 at 03:09
Aren't you inevitably going to get split-brain doing this, unless one has at least 3 nodes? Suppose one has 2 nodes, N1 and N2, each running DRBD8+OCFS2+Xen+heartbeat2. If the other node appears to go down, neither node can tell whether it has really gone down or it is just a comms problem. So the only safe approach is to fence itself, and now the whole cluster is fenced and everything is down.

Whereas, with at least 3 nodes, split-brain can be avoided -- if N1&N2 can see each other but not N3, then N3 will fence itself, and N1+N2 will keep running. Whereas, if both N1+N2 die, N3 must fence itself and then the whole cluster is down.

So 2-node cluster can't survive single node failure, whereas 3-node cluster can survive single node failure, 4-node cluster can survive single node failure (if 2 nodes fail, we can't distinguish that from 2 equal subclusters), 5-node cluster can survive two node failure, 6-node can survive 2-node, 7-node can survive 3-node, etc...

