This file describes the installation and use of Ka on a Linux cluster or a network of computers.
Requisites
Ka allows the cloning of a "source node" to a "destination node" - or several destination
nodes - using information from a "server node" (nfs/tftp/dhcp).
The source and the destination nodes should have the same hardware.
Ka is made of several components:
- a 'master' program that will be run on the source node
- a 'client' program that fetches the data from the 'master' and writes it to the destination nodes. This client is located
on an nfsroot system image (which the destination nodes mount later), and therefore must be installed on an NFS server.
- a set of scripts and files used to manage the boot of the destination nodes, which must be installed on a TFTP server.
Installation and use
Cloning a cluster with Ka is a four-step process:
- Getting the RPM files
- Installing the server node
- Preparing the source node
- Cloning
Getting the files
Download the RPMs from the download page on
SourceForge.
The ka-nfsroot and ka-server-host RPMs must be installed on the server
node.
ka-deploy-cluster-node must be installed on the source node.
ka-nfsroot is only available as a binary package; it was built from the ttylinux mini distribution (http://www.informatik.uni-bremen.de/~pharao90/ttylinux/).
Installing the server node
Installing Ka
Install the ka-nfsroot and ka-server-host RPMs.
To easily configure ka, run the configuration scripts configure_server.sh and configure_nfsroot.sh (in /usr/share/Ka).
Ka requires the following on your server:
- an NFS server that exports /tftpboot/ka/nfsroot
- a TFTP server (compatible with pxelinux, for instance tftp-hpa) that serves files from the directory /tftpboot
You can test your installation by running the test_services.sh script (in /usr/share/Ka).
As a super-user, type test_services.sh ip-of-your-tftp-server
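If you want a quick manual look at the NFS side as well, something like the following could be used. This is only a sketch and is not what test_services.sh does; has_export is a hypothetical helper, and it assumes your exports are declared in a file in the usual /etc/exports format.

```shell
#!/bin/sh
# Sketch: check that an exports file lists the directory Ka expects.
# has_export is a hypothetical helper, not part of Ka.
has_export() {
    exports_file=$1
    export_dir=$2
    # Look at non-comment lines and compare their first field with the
    # directory, ignoring a trailing slash on either side.
    awk -v dir="${export_dir%/}" '
        /^[[:space:]]*#/ { next }
        { path = $1; sub(/\/$/, "", path) }
        path == dir { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$exports_file"
}

# Possible usage on the server node:
# has_export /etc/exports /tftpboot/ka/nfsroot && echo "nfsroot is exported"
```

This only checks that the entry is declared; it does not verify that the NFS server is actually running.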
DHCP
Ka needs a working DHCP server on your network; it is required to boot properly with pxelinux. Static leases are recommended, as dynamic leases are probably not a good idea here.
Your dhcpd.conf must include a "next-server" statement giving the IP of your TFTP server.
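The "filename" option in your dhcpd.conf must point to pxelinux.0 in /tftpboot/ka, for example "/tftpboot/ka/pxelinux.0" (or "ka/pxelinux.0" if your TFTP server serves paths relative to /tftpboot).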
Preparing the source node
Install the ka-deploy-source-node rpm. There is no configuration required for this package.
You can again test your installation by running the test_services.sh script (in /usr/share/Ka).
As a super-user, type test_services.sh ip-of-your-tftp-server
Cloning
On the server
The destination nodes boot using PXE and PXELINUX, driven by a config file on the TFTP server.
You must therefore tell the server which destination nodes you want to install, so that their next reboot is in install mode. To do so:
$ cd /tftpboot/ka/
$ ./ka_pxe_step -s -t install -m machine1 -m machine2 -m machine3
./ka_pxe_step -h will give you a list of available options.
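With many machines, typing one -m option per node gets tedious. The sketch below builds the same argument list as the example above from a list of hostnames. pxe_step_args and nodes.txt are hypothetical names used for illustration; only the -s, -t install and -m options shown above are used.

```shell
#!/bin/sh
# Sketch: build the ka_pxe_step argument list from a node list.
# pxe_step_args and nodes.txt are hypothetical names, not part of Ka.
pxe_step_args() {
    # Reads hostnames on stdin, one per line.
    # Blank lines and lines starting with '#' are skipped.
    args="-s -t install"
    while read -r host rest || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        args="$args -m $host"
    done
    printf '%s\n' "$args"
}

# Possible usage on the server node (word splitting of the result is intended):
# cd /tftpboot/ka/ && ./ka_pxe_step $(pxe_step_args < nodes.txt)
```

The function only prints the arguments, so you can inspect them before running ka_pxe_step.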
On the source node
Run (as super-user) the ka-d.sh command on the source node:
ka-d.sh -n xx
where xx is the number of destination nodes.
Run ka-d.sh -h to obtain the full list of options.
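The value given to -n has to match the number of nodes you put in install mode. As a small sketch, it can be derived from a node list file instead of counted by hand. count_nodes and nodes.txt are hypothetical names; the file is assumed to hold one hostname per line.

```shell
#!/bin/sh
# Sketch: count the destination nodes listed in a file.
# count_nodes and nodes.txt are hypothetical names, not part of Ka.
count_nodes() {
    # Counts the lines on stdin that are neither blank nor '#' comments.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' | wc -l | tr -d ' '
}

# Possible usage on the source node, as super-user:
# ka-d.sh -n "$(count_nodes < nodes.txt)"
```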
Then boot all the destination nodes. Once they have all started, cloning begins.
- On the client, tar prints:
tar : could not create directory : no such file or directory
-> This is normal, see BUGS.
- On the client, tar prints lots and lots of error messages:
-> You probably have the wrong version of tar.
- During cloning, the client does not find the server and fails on 'udp-find-server()':
-> Your netmask may be wrong. Check your dhcpd.conf: you must have specified
the subnet. If you still have trouble with UDP, try removing the -s
option from ka-d-client (edit /tftpboot/ka/nfsroot/install) and use the -h option instead
(see Ka-deploy.txt for help).
- The default kernel in the /tftpboot directory on the server node supports
3Com and Intel Ethernet cards. If you have another network card, you must build a Linux kernel
image for your machines that supports (without using modules):
  - your network interface card(s)
  - IP autoconfiguration (bootp/dhcp)
  - root filesystem on NFS
- Default cloning uses the partitioning scheme of the source node. This means that you will probably run into errors if your nodes do not all have the same hard disk.
To change this, use a 'description file' with the -p option of ka-d.sh.
- You must check that the destination nodes can boot from their LAN card using the PXE protocol.
- You also need some software on the cluster nodes:
  - GNU tar, but NOT version 1.13.19 (issues with hard links). Version 1.13.17 is OK; other
versions have not been tested, and older versions probably
are NOT OK.
  - tftp: tftp 0.16-2 is OK, tftp 0.17-4 is not (buggy tftp put).
  - a properly configured lilo on the source node
If you have trouble finding correct versions of tar and tftp, just take the
ones from the nfsroot RPM.
- IP configuration on the cloned nodes: the installation process tries to modify the configuration files of the newly installed systems to set the correct IP
address and DNS name. This should work at least on Mandrake systems. If you have problems, edit /tftpboot/ka/nfsroot/ka/setup-network.sh.
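Two of the requirements above, the built-in kernel features and the tar version, can be checked roughly before cloning. The sketch below is one possible way to do it, not something shipped with Ka. The helper names are hypothetical, and the list of kernel options (CONFIG_IP_PNP, CONFIG_IP_PNP_DHCP, CONFIG_NFS_FS, CONFIG_ROOT_NFS) is an assumption about how these features map to kernel configuration symbols. The network card driver depends on your hardware and is not checked.

```shell
#!/bin/sh
# Sketch of a pre-cloning check. Helper names are hypothetical, not part of Ka.

# Print each required option that is not built in (=y) in a kernel .config.
# An option built as a module (=m) is reported as missing, since the
# requirements must be met without modules.
missing_kernel_opts() {
    config_file=$1
    for opt in CONFIG_IP_PNP CONFIG_IP_PNP_DHCP CONFIG_NFS_FS CONFIG_ROOT_NFS; do
        grep -q "^${opt}=y\$" "$config_file" || printf '%s\n' "$opt"
    done
}

# Classify a tar version string using only what this document states:
# 1.13.17 works, 1.13.19 does not, anything else is untested.
tar_verdict() {
    case $1 in
        1.13.17) echo ok ;;
        1.13.19) echo bad ;;
        *) echo unknown ;;
    esac
}

# Possible usage:
# missing_kernel_opts /usr/src/linux/.config
# tar_verdict "$(tar --version | sed -n '1s/.* //p')"
```

An empty output from missing_kernel_opts means all four options are built in; an "unknown" verdict for tar means you have to test that version yourself.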
During the cloning phase, you can check what is happening on the server
with tail -f on
/var/log/messages. Information about the other nodes is printed
on their screens. You can also see what is on the screen of an installing node by connecting to it with telnet on port 233.
A typical static lease would look like this:
host icluster1
{
    hardware ethernet 00:01:02:03:4c:d8;
    fixed-address 129.88.96.1;
    filename "ka/pxelinux.0";
    next-server 129.88.96.252;
}
With, for example, a dhcpd.conf beginning like this:
option subnet-mask 255.255.255.0;
option routers 129.88.96.254;
subnet 129.88.96.0 netmask 255.255.255.0
{
    group {
        default-lease-time -1;
        (add leases here)
    }
}
If you want to create static host configurations for many machines, you need their MAC addresses. You can collect them by hand. You can also boot the machines with a dynamic host configuration, and
then take the addresses from the leases file or from the log files. A Perl script in the scripts directory does this (recup_addresses_mac.pl), but you must be careful to switch the machines on in the right order.
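Once you have the MAC addresses, writing the host entries can be automated too. The sketch below is not recup_addresses_mac.pl (whose behaviour is not described here); it is a hypothetical helper, make_host_entries, that reads lines of the form "name mac ip" and prints one entry per machine in the same shape as the static lease shown earlier. The input format and the helper name are assumptions.

```shell
#!/bin/sh
# Sketch: generate dhcpd.conf host entries from a table of machines.
# make_host_entries and hosts.txt are hypothetical names, not part of Ka.
make_host_entries() {
    # $1: value for "filename", $2: IP of the TFTP server ("next-server").
    # stdin: one machine per line, as "name mac ip".
    # Blank lines and lines starting with '#' are skipped.
    boot_file=$1
    tftp_ip=$2
    while read -r name mac ip rest || [ -n "$name" ]; do
        case $name in
            ''|'#'*) continue ;;
        esac
        printf 'host %s\n{\n' "$name"
        printf '    hardware ethernet %s;\n' "$mac"
        printf '    fixed-address %s;\n' "$ip"
        printf '    filename "%s";\n' "$boot_file"
        printf '    next-server %s;\n' "$tftp_ip"
        printf '}\n'
    done
}

# Possible usage:
# make_host_entries ka/pxelinux.0 129.88.96.252 < hosts.txt
```

The output is meant to be reviewed and pasted where the leases go in dhcpd.conf; the function does not validate MAC or IP addresses.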