You can set up a working LVS from the LVS-mini-HOWTO. Presumably it will be more up to date, clearer and simpler than this section.
Abbreviations/conventions for setup/testing/configuring
client:       client's IP          = CIP
gateway:      gateway/router's IP  = DGW (router will be the client in most test setups)
director:     director's IP        = DIP on director eth0
              virtual IP           = VIP on director eth0:x (eg eth0:1)
real-server:  real-server IP       = RIP on real-server eth0
              virtual IP           = VIP on real-server eth0:x/lo:0/tunl0/dummy0
              gateway              = SGW
(with suggestions from John Cronin jsc3@havoc.gtf.org)
Get a fresh kernel from ftp.kernel.org and the matching ipvs patch from the software page on www.linuxvirtualserver.org. Apply the patch to the 2.2.x kernel source using the instructions in the tarball. You'll do something like
director:/usr/src/linux# patch -p1 <../ipvs-0.9.8-2.2.14/ipvs-0.9.8-2.2.14.patch
(Note: kernels from RedHat are not the standard kernels off ftp.kernel.org. They are pre-patched with ipvs. If you are going to use RedHat files, follow RedHat's instructions, not ours. If you follow our instructions and try to patch a RH kernel you'll get
> Hunk #1 succeeded at 121 with fuzz 2 (offset 18 lines).
> Hunk #2 FAILED at 153.
> Hunk #3 FAILED at 163.
> Hunk #4 FAILED at 177.
> 3 out of 4 hunks FAILED -- saving rejects to include/linux/ip_masq.h.rej
>
> patching file `include/net/ip_masq.h'
> Reversed (or previously applied) patch detected! Assume -R? [n]
The pre-applied patch on the RedHat kernel will likely be older than the one on the LVS website.)
Compile the 2.2.x kernel and reboot.
The actual kernel compile instructions will vary with the kernel patch number. Here's what I used for ipvs-0.9.9 on kernel 2.2.15pre9 in the Networking options. The relevant options are marked. Some of the options are not explicitly required for LVS to work, but you'll need them anyhow, e.g. ip aliasing if you need to construct a director with only one NIC, or tunneling if you are going to run VS-Tun. Until you know what you're doing, activate all of the options with an '*' at the start of the line.
You need to turn on "Prompt for developmental code... " or whatever under the "Code Maturity" section. In the "Networking" section, you have to turn on IP:masquerading before you get the ipvs option.
Do all the other kernel stuff: make modules, install the modules, copy the new kernel into / or /boot, edit lilo.conf and run lilo. Make sure you leave the old kernel entry in lilo.conf too, so you can recover if all does not go well. When loading the new kernel, make sure the ip_vs* modules get loaded. The README in the kernel source tree has all the necessary info.
[*] Kernel/User netlink socket
[*] Routing messages
< > Netlink device emulation
* [*] Network firewalls
[*] Socket Filtering
<*> Unix domain sockets
* [*] TCP/IP networking
[ ] IP: multicasting
[*] IP: advanced router
[ ] IP: policy routing
[ ] IP: equal cost multipath
[ ] IP: use TOS value as routing key
[ ] IP: verbose route monitoring
[ ] IP: large routing tables
[ ] IP: kernel level autoconfiguration
* [*] IP: firewalling
[ ] IP: firewall packet netlink device
* [*] IP: transparent proxy support
* [*] IP: masquerading
--- Protocol-specific masquerading support will be built as modules.
* [*] IP: ICMP masquerading
--- Protocol-specific masquerading support will be built as modules.
* [*] IP: masquerading special modules support
* <M> IP: ipautofw masq support (EXPERIMENTAL)
* <M> IP: ipportfw masq support (EXPERIMENTAL)
* <M> IP: ip fwmark masq-forwarding support (EXPERIMENTAL)
* [*] IP: masquerading virtual server support (EXPERIMENTAL)
* (12) IP masquerading VS table size (the Nth power of 2)
* <M> IPVS: round-robin scheduling
* <M> IPVS: weighted round-robin scheduling
* <M> IPVS: least-connection scheduling
* <M> IPVS: weighted least-connection scheduling
* [*] IP: optimize as router not host
* <M> IP: tunneling
<M> IP: GRE tunnels over IP
[*] IP: broadcast GRE over IP
[ ] IP: multicast routing
[*] IP: PIM-SM version 1 support
[*] IP: PIM-SM version 2 support
* [*] IP: aliasing support
[ ] IP: ARP daemon support (EXPERIMENTAL)
* [*] IP: TCP syncookie support (not enabled per default)
--- (it is safe to leave these untouched)
< > IP: Reverse ARP
[*] IP: Allow large windows (not recommended if <16Mb of memory)
< > The IPv6 protocol (EXPERIMENTAL)
The default LVS hash table size (2^12 entries) originally meant 2^12 simultaneous connections. If you are editing the .config by hand, look for CONFIG_IP_MASQUERADE_VS_TAB_BITS. Each entry (for a connection to a client) takes 128 bytes, so 2^12 entries require 512kbytes. If you have 128M spare memory you can have 10^6 entries if you set the table size to 2^20. (Note: not all connections are active - some are waiting to timeout).
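The memory arithmetic above can be checked with a few lines of shell. This is only a sketch: it assumes the 128 bytes per entry quoted above and the older meaning of the table size (one connection per entry), and the helper name ipvs_table_bytes is made up for illustration.

```shell
#!/bin/sh
# Assumption taken from the text: each connection entry takes 128 bytes.
ENTRY_BYTES=128

# ipvs_table_bytes BITS -> bytes needed for 2^BITS entries (hypothetical helper)
ipvs_table_bytes() {
    echo $(( (1 << $1) * ENTRY_BYTES ))
}

# Print the footprint for a few table sizes.
for bits in 12 16 20; do
    echo "TAB_BITS=$bits: $(( 1 << bits )) entries, $(( $(ipvs_table_bytes "$bits") / 1024 )) kbytes"
done
```

With 12 bits this gives 512 kbytes, and with 20 bits 131072 kbytes (128M), matching the figures in the text.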
Early versions of ipvs would crash your machine if you allotted too much memory to this table. This problem has been fixed in 0.9.9. (Note "top" reports memory allocated, not memory you are using. No matter how much memory you have, Linux will eventually allocate all of it as you continue to run the machine and load programs.)
As of ipvs-0.9.9 the hash table is different.
From: Julian Anastasov uli@linux.tu-varna.acad.bg
With CONFIG_IP_MASQUERADE_VS_TAB_BITS we specify not the max number of entries (connections in your case) but the number of rows in a hash table. The columns of this table are unlimited. You can set your table to 256 rows and have 1,800,000 connections in 7000 columns on average, but the lookup is slower. The lookup function chooses one row using a hash function and then searches all of these 7000 entries for a match. So by increasing the number of rows we speed up the lookup. There is _no_ connection limit. It depends on the free memory. Try to tune the number of rows so that the columns will not exceed 16 (average), for example. It is not fatal if there are more columns (on average), and if your CPU is fast enough it is not a problem.
All entries are included in a table with (1 << IP_VS_TAB_BITS) rows and unlimited number of columns. 2^16 rows is enough. Currently, LVS 0.9.7 can eat all your memory for entries (using any number of rows). The memory checks are planned in the next LVS versions (are in 0.9.9?).
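Julian's rule of thumb (keep the average number of columns per row at about 16 or fewer) can be turned into a small calculation. The sketch below is an illustration only; ipvs_tab_bits is a hypothetical helper, and the target of 16 is just the example figure from the message above.

```shell
#!/bin/sh
# ipvs_tab_bits CONNECTIONS [MAX_AVG]
# Smallest number of bits such that CONNECTIONS / 2^bits <= MAX_AVG.
# Hypothetical helper; MAX_AVG defaults to 16, the example value from the text.
ipvs_tab_bits() {
    conns=$1
    max_avg=${2:-16}
    bits=0
    while [ $(( (1 << bits) * max_avg )) -lt "$conns" ]; do
        bits=$(( bits + 1 ))
    done
    echo "$bits"
}

# Julian's example: 1,800,000 connections in a 256-row (2^8) table.
echo "256 rows: about $(( 1800000 / 256 )) entries per row"
echo "bits needed for an average of 16 or fewer: $(ipvs_tab_bits 1800000)"
```

The result is the value you would give for the table size option, on the assumption that connections spread evenly across rows.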
> How long are the connection entries held for ? (Column 8 of
> /proc/net/ip_masquerade ?)
(Julian Anastasov uli@linux.tu-varna.acad.bg)
The default timeout value for a TCP session is 15 minutes, for a TCP session after
receiving FIN it is 2 minutes, and for a UDP session 5 minutes. You can use
"ipchains -M -S tcp tcpfin udp" to set your own time values.
> If we assume a clunky set of web servers being
> balanced that take 3s to serve an object, then if the connection entries
> are dropped immediately then we can balance about 20 million web requests
> per minute with 128M RAM. If however the connection entries are kept for a
> longer time period this puts a limit on the balancer.
Yeah, it is true.
> Eg (assuming column 8 is the thing I'm after!)
Actually, column 8 is the delta value in sequence numbers. The timeout value is in column 10.
> [zathras@consus /]$ head -n 1000 /proc/net/ip_masquerade |sed
> -e "s/ */ /g"|cut -d" " -f8|sort -nr|tail -n500|head -n1
> 8398
>
> ie Held for about 2.3 hours, which would limit a 128Mb machine to balance
> about 10.4 million requests per day. (Which is definitely on the low side
> knowing our throughput...)
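The reasoning in this exchange is simple arithmetic: the memory fixes the number of entries, and the time each entry is held fixes how fast entries can be recycled. The sketch below reproduces it under the assumptions in the text (128 bytes per entry, all of the 128M used for entries). The helper name lvs_max_requests is hypothetical, and the 8398 second figure is the questioner's reading of column 8, which Julian points out is not the timeout, so it is used here only to show the calculation.

```shell
#!/bin/sh
# Assumption from the text: 128 bytes per connection entry.
ENTRY_BYTES=128

# lvs_max_requests RAM_BYTES HOLD_SECONDS PERIOD_SECONDS
# Upper bound on requests per PERIOD if every entry is held for HOLD seconds.
# Hypothetical helper for illustration only.
lvs_max_requests() {
    ram_bytes=$1
    hold=$2
    period=$3
    echo $(( ram_bytes / ENTRY_BYTES * period / hold ))
}

RAM=$(( 128 * 1024 * 1024 ))
echo "3 s hold:    $(lvs_max_requests "$RAM" 3 60) requests/minute"
echo "8398 s hold: $(lvs_max_requests "$RAM" 8398 86400) requests/day"
```

The first line comes out at about 20 million per minute and the second at a little over 10 million per day, in the same range as the figures quoted in the question.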
The patches to the early versions of the 2.4.x kernels were configured and installed separately to the "make menuconfig" for the kernel. This required moving files into the /lib/modules directories and loading the modules by hand.
To avoid this, you should start with 0.2.7-2.4.2 (there were two versions, the hand install and the version which is configured by "make configure"; get the 2nd one).
Here's the networking config
<*> Packet socket
[ ] Packet socket: mmapped IO
[*] Kernel/User netlink socket
[*] Routing messages
<*> Netlink device emulation
[*] Network packet filtering (replaces ipchains)
[*] Network packet filtering debugging
[*] Socket Filtering
<*> Unix domain sockets
[*] TCP/IP networking
[ ] IP: multicasting
[*] IP: advanced router
[*] IP: policy routing
[*] IP: use netfilter MARK value as routing key
[*] IP: fast network address translation
[*] IP: equal cost multipath
[*] IP: use TOS value as routing key
[*] IP: verbose route monitoring
[*] IP: large routing tables
[*] IP: kernel level autoconfiguration
[ ] IP: BOOTP support
[ ] IP: RARP support
<M> IP: tunneling
< > IP: GRE tunnels over IP
[ ] IP: multicast routing
[ ] IP: ARP daemon support (EXPERIMENTAL)
[ ] IP: TCP Explicit Congestion Notification support
[ ] IP: TCP syncookie support (disabled per default)
IP: Netfilter Configuration --->
IP: Virtual Server Configuration --->
< > The IPv6 protocol (EXPERIMENTAL)
< > Kernel httpd acceleration (EXPERIMENTAL)
[ ] Asynchronous Transfer Mode (ATM) (EXPERIMENTAL)
---
Here's my config for the IP: Virtual Server configuration (turn it all on)
<M> virtual server support (EXPERIMENTAL)
[*] IP virtual server debugging (NEW)
(12) IPVS connection table size (the Nth power of 2) (NEW)
--- IPVS scheduler
<M> round-robin scheduling (NEW)
<M> weighted round-robin scheduling (NEW)
<M> least-connection scheduling (NEW)
<M> weighted least-connection scheduling (NEW)
<M> locality-based least-connection scheduling (NEW)
<M> locality-based least-connection with replication scheduling (NEW)
<M> destination hashing scheduling (NEW)
<M> source hashing scheduling (NEW)
--- IPVS application helper
<M> FTP protocol helper (NEW)
Here is my config for the netfilter section
<M> Connection tracking (required for masq/NAT)
<M> FTP protocol support
<M> Userspace queueing via NETLINK (EXPERIMENTAL)
<M> IP tables support (required for filtering/masq/NAT)
<M> limit match support
<M> MAC address match support
<M> netfilter MARK match support
<M> Multiple port match support
<M> TOS match support
<M> Connection state match support
<M> Unclean match support (EXPERIMENTAL)
<M> Owner match support (EXPERIMENTAL)
<M> Packet filtering
<M> REJECT target support
<M> MIRROR target support (EXPERIMENTAL)
<M> Full NAT
<M> MASQUERADE target support
<M> REDIRECT target support
<M> Packet mangling
<M> TOS target support
<M> MARK target support
<M> LOG target support
< > ipchains (2.2-style) support
< > ipfwadm (2.0-style) support
Note I have removed the ipchains option here. This was <M> in the last version of the HOWTO. However this raised problems, as some people didn't understand the ipchains compatibility problems.
Joe Feb 2001: With sufficient number of connections, a director could start to swap out its tables (is this true?)
In this case, throughput could slow to a crawl. I presume the kernel would have to retrieve parts of the table to find the real-server associated with incoming packets. I would think in this case it would be better to drop connect requests than to accept them.
In earlier versions of LVS, you could set the amount of memory for the tables (in bytes). Now you allocate a number of hashes, whose size could (in the worst case) grow without limit.
Julian
> IMO, this is not true. LVS uses GFP_ATOMIC kind of allocations
> and as I know such allocations can't be swapped out.
If it's possible for LVS to start the director to swap, is there some way to stop this?
> You can try with testlvs whether the LVS uses swap.
> Start the kernel with LILO option mem=8M and with large swap area.
> Then check whether more than 8MB swap is used.
In general, nothing specific is done for the real-servers. You can have any OS running on them (except with VS-Tun, which runs only on Linux real-servers). You plug them into the network, start up the services on the VIP (for VS-DR, VS-Tun) or the RIP (VS-NAT), set up the default gw (the router and the director respectively in the usual setups) and you're ready to go.
Except: You have to handle the arp problem with VS-DR (and possibly VS-Tun). Unfortunately this turns what would be a trivial installation into one that requires clear thinking. If you don't want to deal with the arp problem for your first installation, then set up a VS-NAT LVS.
Eventually you'll want the higher throughput and lower latency of VS-DR, in which case you'll need to understand the arp problem. The simplest approach is to use the NOARP option of ifconfig to set up lo:0 on non-Linux unix real-servers. For Linux,
For non-unix (ie Windows) real-servers, look below for further instructions.
If you are handling the arp problem by hiding the VIP device on the real-servers, then you need to patch the real-servers if
The current version(s) of 2.2.x, e.g. 2.2.19 are already patched with the arp hiding code i.e. you don't have to patch the current 2.2.x kernels. For Linux-2.0.x real-servers, you can use the NOARP option when setting up a device for the VIP.
If you are running non-Linux unix real-servers, you can handle the arp problem by configuring the device carrying the VIP with the -arp switch.
This list of real-servers is from Ratz ratz@tac.ch
About the only thing he hasn't tried yet is Plan 9.
Solaris 2.5.1, 2.6, 2.7
Linux (of course): 2.0.36, 2.2.9, 2.2.10, 2.2.12
FreeBSD 3.1, 3.2, 3.3
NT (although Webserver would crash): 4.0 no SP
IRIX 6.5 (Indigo2)
HPUX 11
Ratz's code is now in the configure script. This part of the script has not been well tested (if you find that it doesn't set up your non-Linux unix box properly, please contact me - Joe).
Here's the information for non-Linux unices. On some Unixes you have to plumb the interface before assigning an IP. The plumb instruction is not included here.
#uname      : FreeBSD
#uname -r   : 3.2-RELEASE
#<command>  : ifconfig lo0 alias <VIP> netmask 0xffffffff -arp up
#ifconfig -a: lo0: flags=80c9<UP,LOOPBACK,RUNNING,NOARP,MULTICAST>mtu 16837
#                  inet 127.0.0.1 netmask 0xff000000
#                  inet <VIP> netmask 0xffffffff

#uname      : IRIX
#uname -r   : 6.5
#<command>  : ifconfig lo0 alias <VIP> netmask 0xffffffff -arp up
#ifconfig -a: lo0: flags=18c9<UP,LOOPBACK,RUNNING,NOARP,MULTICAST,CKSUM>
#                  inet 127.0.0.1 netmask 0xff000000
#                  inet <VIP> netmask 0xffffffff

#uname      : SunOS
#uname -r   : 5.7
#<command>  : ifconfig lo0:1 <VIP> netmask 255.255.255.255 up
#ifconfig -a: lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST>mtu 8232
#                  inet 127.0.0.1 netmask ff000000
#             lo0:1 flags=849<UP,LOOPBACK,RUNNING,MULTICAST>mtu 8232
#                  inet <VIP> netmask ffffffff

#uname      : HP-UX
#uname -r   : B.11.00
#<command>  : ifconfig lan1:1 10.10.10.10 netmask 0xffffff00 -arp up
#ifconfig -a: lan0: flags=842<BROADCAST,RUNNING,MULTICAST>
#                  inet <some IP> netmask ffffff00
#             lan0:1: flags=8c2<BROADCAST,RUNNING,NOARP,MULTICAST>
#                  inet <VIP> netmask ffffff00
#
Some unices aren't very cooperative and other methods (e.g. adding an extra NIC) should do it.
Ratz 16 Apr 2001
In most cases (when using the NOARP option) you need alias support. Some Unices have no support for aliased interfaces, or only limited support, for example QNX, Aegis or Amoeba. Others have interface flag inheritance problems, like HP-UX, where it is impossible to give an aliased interface a different flag vector from that of the underlying physical interface (as happens with Linux 2.2 and 2.4 - Joe). So for HP-UX you need a special setup, because the standard setup depicted for DR will NOT work. I've set up most Unices as real-servers and was negatively astonished by all the implementation variations among the different Unix flavours. This may have resulted from unclear statements in the RFCs.
(This is not handled by the configure script).
Here's Wensong's recipe for setting up the lo device on a NT real-server.
If you don't have the MS Loopback Adapter Driver installed on your NT boxes, enter the Network Control Panel, click the Adapter section, click to add a new adapter, and select the MS Loopback Adapter. Your NT cdrom is needed here.
Then add your VIP (Virtual IP) address on the MS Loopback Adapter; do not enter a gateway address on the Loopback Adapter. Since the netmask 255.255.255.255 is considered invalid in M$ NT, just accept the default netmask, then open an MS-DOS prompt and remove the extra routing entry.
c:route delete <VIP's network> <VIP's netmask>
This makes packets destined for this network go through the other network interface, not this MS Loopback interface.
As I remember, setting its netmask to 255.0.0.0 also works.
Alternatively (Jerome RICHARD jrichard@virtual-net.fr)
On Windows NT Server, you just have to install a network adapter called "MS Loopback" (provided on the Windows NT CDROM in the new network section) and then set up the VIP on this interface.
from o1004g o1004g@nbuster.com
ipchains in 2.2.x kernels has been replaced by iptables in 2.4.x kernels. For 2.4 kernels, ipchains is available for backwards compatibility. However ipchains and iptables can't be used at the same time.
The ip_tables module is incompatible with ipchains. If present, the ip_tables module must be unloaded for ipchains to work.
If you have ip_tables loaded, you'll get uninformative errors when you try to run ipchains commands with 2.4. Rather than saying that ipchains under 2.4 is there for compatibility, it would be more accurate to say that the ipchains commands available with 2.4 kernels will only cause you grief and it will be faster to rewrite your scripts to iptables, than to fall into all the holes you'll find using the compatibility. It won't take long before some script/program expects to run ip_tables on your 2.4 machine and as soon as that happens one or both (I don't know which) of your iptables or ipchains are hosed.
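Before running ipchains commands on a 2.4 machine it may help to check whether ip_tables is already loaded. The sketch below is one possible check, not part of the LVS tools: it reads /proc/modules, the helper name module_loaded is made up for illustration, and it will not notice support compiled directly into the kernel rather than built as a module.

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Succeeds if NAME appears as a module name in FILE (default /proc/modules).
# Hypothetical helper; the optional FILE argument is only there for testing.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded ip_tables; then
    echo "ip_tables is loaded: unload it (rmmod ip_tables) before using ipchains"
else
    echo "ip_tables does not appear in /proc/modules"
fi
```

The same helper could be pointed at ipchains instead, to see whether the compatibility module is the one occupying the machine.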