Debian Lenny HowTo

From ClusterLabs


This page will guide you through installing a Corosync + Pacemaker two node cluster, which is later extended and worked with. The aim is to provide you with a working example of such a cluster.

Once you get up to speed using this HowTo you can dive into the more advanced configuration and documentation.


Introduction

In this example we will use the following names and IP addresses:

  • node1 - ip 10.0.0.11 - first node
  • node2 - ip 10.0.0.12 - second node
  • virt1 - ip 10.0.0.21 - virtual IP address

Disclaimer: We assume that you are already comfortable working with Debian GNU/Linux and that you understand the security implications of working as root, and so on.

If you get stuck using this HowTo, you might try your luck in the #linux-ha IRC channel on freenode.net.

Installation

First of all, please install two servers (node1 and node2) with Debian GNU/Linux 5.0 (alias Lenny) and set them both up the way you want them -- finish any non-cluster related changes before fiddling with Pacemaker.

Backports.org and the Madkiss-repository

As of Jul 8, 2010, packages for the whole Linux-HA cluster stack (corosync, openais, heartbeat, cluster-glue, cluster-agents, pacemaker) are available from the official backports repository for Debian GNU/Linux 5.0. They are derived from the official packages in the current "testing" branch of Debian GNU/Linux, currently codenamed Squeeze. As a result, the APT repository formerly known as the "Madkiss" repo has largely lost its original purpose; it no longer includes packages for the whole cluster stack.

It will, however, continue to exist in order to provide Lenny packages for cases where updated packages have been uploaded to the Debian development branch codenamed "Sid" (also referred to as "Unstable"). Under the Backports.org policy, a package may only enter the Backports.org repository if the same version is present in the testing branch (currently Squeeze). Thus, there may be situations where a more up-to-date version of the Linux-HA cluster stack exists in Unstable but has not yet migrated to testing and accordingly cannot be made available in the Backports.org repository. Packages for Lenny might then be available from the Madkiss repo.

So, in order to use Pacemaker on Debian GNU/Linux 5.0 ("Lenny"), please add the Backports.org repository to your APT configuration as described in the How-To on that site. This has to be done on all nodes in your cluster.

How to enable the repository

If you already have a backports stanza in your APT sources lists, you should be fine. Otherwise, create a new file /etc/apt/sources.list.d/pacemaker.list containing:

# Only if you want the Madkiss repo, which may sometimes not include the full stack:
# deb http://people.debian.org/~madkiss/ha lenny main

# usually you should be ok just using backports:
deb http://backports.debian.org/debian-backports lenny-backports main

If you use the Madkiss repo, you will want to add the Madkiss key to your package system:

apt-key adv --keyserver pgp.mit.edu --recv-key 1CFA3E8CD7145E30

If you omit this step you will get this error:

W: GPG error: http://people.debian.org lenny Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 1CFA3E8CD7145E30

Update the package list

aptitude update

Install the packages

Installing the package pacemaker will install Pacemaker together with Corosync. If you need OpenAIS later on, you can install it as a plugin for Corosync. OpenAIS is needed, for example, for DLM or cLVM, but that is beyond the scope of this HowTo.

aptitude install pacemaker

If you want to run pacemaker on top of Heartbeat 3 instead of Corosync, please use the following command:

aptitude install pacemaker heartbeat

Please note that Corosync will still be installed as dependency; however, if you set up Heartbeat properly, Corosync can remain unused.

Initial Configuration

Create authkey

To create an authkey for corosync communication between your two nodes do this on the first node:

node1~: sudo corosync-keygen

This creates a key in /etc/corosync/authkey

You need to copy this file to the second node and put it in the /etc/corosync directory with the right permissions. So on the first node:

node1~: scp /etc/corosync/authkey node2:

And on the second node:

node2~: sudo mv ~/authkey /etc/corosync/authkey
node2~: sudo chown root:root /etc/corosync/authkey
node2~: sudo chmod 400 /etc/corosync/authkey
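
To double-check that the key arrived with the correct owner and permissions, you can inspect it on the second node:

```shell
# The key must be owned by root and readable only by root (mode 400).
ls -l /etc/corosync/authkey
```

The listing should start with -r-------- and show root as both owner and group.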

Edit configfile

Most of the options in the /etc/corosync/corosync.conf file are fine to start with. You must, however, make sure that the nodes can communicate, so adjust this section:

        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 192.168.2.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }

Change bindnetaddr to your local subnet: since we configured the IP 10.0.0.11 for the first node and 10.0.0.12 for the second node, adjust bindnetaddr to 10.0.0.0.
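
If you are unsure of your network address, it can be computed by ANDing the node's IP address with its netmask. A minimal POSIX shell sketch, assuming the 10.0.0.11 address and a 255.255.255.0 netmask from this example:

```shell
#!/bin/sh
# Compute the network address (bindnetaddr) from an IP and its netmask.
IP=10.0.0.11
NETMASK=255.255.255.0

# Split both dotted quads into their four octets.
IFS=. read -r i1 i2 i3 i4 <<EOF
$IP
EOF
IFS=. read -r m1 m2 m3 m4 <<EOF
$NETMASK
EOF

# Bitwise AND each IP octet with the matching netmask octet.
echo "$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
# prints: 10.0.0.0
```

Use the printed value as bindnetaddr on both nodes.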

Enabling corosync

Corosync is disabled by default, and starting it with the init script will not work until you enable it. To enable Corosync, replace START=no with START=yes in /etc/default/corosync.
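
This edit can also be made non-interactively, for example with sed (run as root, on both nodes):

```shell
# Flip START=no to START=yes in the corosync defaults file.
sed -i 's/^START=no/START=yes/' /etc/default/corosync
```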

Deal with firewall

Make sure you have opened the multicast port for UDP traffic in your firewall. For example, when using Shorewall, add this rule to the /etc/shorewall/rules file on both nodes:

# Multicast for pacemaker
ACCEPT   net                                    fw      udp     5405
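
If you use plain iptables instead of Shorewall, an equivalent rule might look like the following sketch; adapt the interface and any source restrictions to your environment:

```shell
# Allow incoming corosync traffic on UDP port 5405.
iptables -A INPUT -p udp --dport 5405 -j ACCEPT
```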

Running corosync

Now that you have configured both nodes you can start the cluster on both sides:

node1~: sudo /etc/init.d/corosync start
Starting corosync daemon: corosync.
node2~: sudo /etc/init.d/corosync start
Starting corosync daemon: corosync.

Check the status

To check corosync status you can look at /var/log/daemon.log

If you take a look at the processlist using 'ps auxf' you should get something like this:

root     29980  0.0  0.8  44304  3808 ?        Ssl  20:55   0:00 /usr/sbin/corosync
root     29986  0.0  2.4  10812 10812 ?        SLs  20:55   0:00  \_ /usr/lib/heartbeat/stonithd
102      29987  0.0  0.8  13012  3804 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/cib 
root     29988  0.0  0.4   5444  1800 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/lrmd
102      29989  0.0  0.5  12364  2368 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/attrd
102      29990  0.0  0.5   8604  2304 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/pengine
102      29991  0.0  0.6  12648  3080 ?        S    20:55   0:00  \_ /usr/lib/heartbeat/crmd

And you can use the crm_mon tool to get information about the current status of the cluster. We use -V for extra information.

node1~: sudo crm_mon --one-shot -V
crm_mon[7363]: 2009/07/26_22:05:40 ERROR: unpack_resources: No STONITH resources have been defined
crm_mon[7363]: 2009/07/26_22:05:40 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_mon[7363]: 2009/07/26_22:05:40 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity


============
Last updated: Fri Nov  6 21:03:51 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1 node2 ]

As you can see, the setup is complaining about STONITH, but that is because we have not yet configured that part of the cluster.

Configure an IP resource

We are now going to configure the Cluster Information Base, or CIB, using the crm command line tool, the shell of the Cluster Resource Manager.

First we start the crm commandline tool:

node1~: sudo crm
crm(live)#

Then we create a copy of the current configuration to edit in, we will commit this copy when we are done editing:

crm(live)# cib new config20090726
INFO: config20090726 shadow CIB created
crm(config20090726)#

Then we go into configuration mode and we show the current config:

crm(config20090726)# configure
crm(config20090726)configure# show
node host132.procolix.com
node host133.procolix.com
property $id="cib-bootstrap-options" \
        dc-version="1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2"

We now turn off STONITH since we don't need it in this example configuration:

crm(config20090726)configure# property stonith-enabled=false

Now we add our failover IP to the configuration:

crm(config20090726)configure# primitive failover-ip ocf:heartbeat:IPaddr params ip=10.0.0.21 op monitor interval=10s

And lastly we check if our configuration is valid and then commit it to the cluster and quit the configuration tool:

crm(config20090726)configure# verify
crm(config20090726)configure# end
There are changes pending. Do you want to commit them? y
crm(config20090726)#
crm(config20090726)# cib use live
crm(live)# cib commit config20090726
INFO: commited 'config20090726' shadow CIB to the cluster
crm(live)# quit
bye

When we now do a one-shot crm_mon we get:

node1~: sudo crm_mon --one-shot


============
Last updated: Fri Nov  6 21:05:51 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

failover-ip     (ocf::heartbeat:IPaddr):        Started node1
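
You can also verify on node1 that the virtual IP address has actually been added to an interface:

```shell
# The IPaddr agent adds the address as an alias, visible in the address list.
ip addr show | grep 10.0.0.21
```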

Resource operations

There are quite a few things you can do with a resource; here are some examples:

Put a node in standby and back online again

Put node1 in standby

When you want to do maintenance on node1 you can put that node in standby mode. That works like this:

node1~: sudo crm
crm(live)# node
crm(live)node# standby
crm(live)node# quit
bye
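
The crm shell can also be invoked non-interactively, so the same action fits on a single line:

```shell
# Put node1 into standby without entering the interactive shell.
crm node standby node1
```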

You can see with crm_mon that the resource actually failed over to the other node:

node1~: sudo crm_mon --one-shot


============
Last updated: Fri Nov  6 21:04:31 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Node node1: standby
Online: [ node2 ]

failover-ip     (ocf::heartbeat:IPaddr):        Started node2

Put node1 online again

When maintenance is over you can start node1 again like this:

node1~: sudo crm
crm(live)# node
crm(live)node# online
crm(live)node# bye
bye

Now you can see that the resource has failed back again to node1:

node1~: sudo crm_mon --one-shot


============
Last updated: Fri Nov  6 21:08:22 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

failover-ip     (ocf::heartbeat:IPaddr):        Started node1

Migrate the resource to the other node

You might want the resource to run on a different node than the one it is currently running on; this is done with the migrate command.

We are now telling our cluster to run the IP resource on node2 instead of node1:

node1~: sudo crm
crm(live)# resource
crm(live)resource# list
failover-ip     (ocf::heartbeat:IPaddr) Started
crm(live)resource# migrate failover-ip node2
crm(live)resource# bye
bye
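
Note that migrate works by adding a location constraint to the configuration, which keeps the resource pinned to node2. Once the move is done, you will usually want to remove that constraint again:

```shell
# Remove the location constraint created by the migrate command.
crm resource unmigrate failover-ip
```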

You can now see that it is running on the other node using crm_mon:

node1~:  sudo crm_mon --one-shot


============
Last updated: Fri Nov  6 21:09:45 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

failover-ip     (ocf::heartbeat:IPaddr):        Started node2

Stop the resource

You might want to stop your resource or, in other words, make your resource unavailable. That can be done like this:

node1~: sudo crm
crm(live)# resource
crm(live)resource# stop failover-ip
crm(live)resource# bye
bye
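
To make the resource available again later, start it the same way, either interactively or as a one-liner:

```shell
# Start the stopped resource again.
crm resource start failover-ip
```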

Using crm_mon that will look like this:

node1~:  sudo crm_mon --one-shot


============
Last updated: Fri Nov  6 21:11:56 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

Note that no resource is listed here, even though the header still shows one configured resource; a stopped resource is simply not displayed.

Add another node

Now we have a two node cluster, but you might want to extend your setup by adding a third node.

We will call this node:

  • node3 - ip 10.0.0.13 - third node

First, install an extra node as described above under 'Installation'. Then add it to the cluster: copy over the authkey and the corosync configuration, adjust the firewall if necessary, and start Corosync.
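
Those join steps can be sketched as follows, run as root from node1. This is a sketch assuming the paths and node name used in this HowTo, and that root can ssh to node3; otherwise copy to a home directory and move the files as shown earlier for node2:

```shell
# Copy the shared authkey and cluster configuration to the new node.
scp /etc/corosync/authkey /etc/corosync/corosync.conf node3:/etc/corosync/

# On node3: fix permissions, enable corosync, and start it.
ssh node3 'chown root:root /etc/corosync/authkey &&
           chmod 400 /etc/corosync/authkey &&
           sed -i "s/^START=no/START=yes/" /etc/default/corosync &&
           /etc/init.d/corosync start'
```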

Then check if it all worked:

node1~: crm_mon --one-shot


============
Last updated: Fri Nov  6 21:18:14 2009
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
3 Nodes configured, 3 expected votes
1 Resources configured.
============

Online: [ node1 node2 node3 ]

failover-ip     (ocf::heartbeat:IPaddr):        Started node1