Export your block devices with AoE
Posted by rodo on Thu 11 Oct 2007 at 08:59
Imagine you have a machine with all of its disks full and another with unused gigabytes, and you don't want to move the data from one to the other. Why not use the second machine's disk on the first? You can do it with iSCSI, but you can also do it with ATA over Ethernet (AoE). It's the second method I'll explain in this article.
All of this was done with two computers running Debian Etch.
Prepare the kernel
First check whether your running kernel has AoE support; the config option is named CONFIG_ATA_OVER_ETH. Have a look at fig1: my kernel has AoE built as a module.
fig1
host:/# grep ATA_OVER /boot/config-`uname -r`
CONFIG_ATA_OVER_ETH=m
host:/#
If not, configure your kernel and enable AoE, either built in or as a module, as you prefer:
Device Drivers -->
|- Block Devices --->
|- <m> ATA over Ethernet support
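The check from fig1 can be wrapped in a small helper. This is only a sketch: the function name aoe_config_state is made up for illustration, and it assumes your distribution installs the kernel config under /boot/config-<version>, as Debian does.

```shell
# Report how AoE support is configured in a kernel config file.
# Prints "module", "builtin" or "missing".
# With no argument it looks at the config of the running kernel.
aoe_config_state() {
    config_file="${1:-/boot/config-$(uname -r)}"
    if grep -q '^CONFIG_ATA_OVER_ETH=m$' "$config_file" 2>/dev/null; then
        echo module
    elif grep -q '^CONFIG_ATA_OVER_ETH=y$' "$config_file" 2>/dev/null; then
        echo builtin
    else
        # Option commented out, absent, or config file not readable.
        echo missing
    fi
}
```

If it prints "module" you only need modprobe; if it prints "missing" you have to rebuild the kernel as described above.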
OK, now that you have a kernel with AoE, just load the aoe module:
host:/# modprobe aoe
host:/#
You can check your syslog to be sure AoE is available:
host:/# tail /var/log/syslog
Oct 10 11:54:07 host kernel: aoe: aoe_init: AoE v22 initialised.
host:/#
From now on we'll call the client 'client' and the server 'server'. Funny, isn't it?
In SAN vocabulary the client is called the 'initiator' and the server the 'target'; I prefer to keep using the simpler terms.
The server side (target)
First we need to install the vblade package:
server:/# apt-get install vblade
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
vblade
[...]
Unpacking vblade (from .../archives/vblade_11-1_i386.deb) ...
Setting up vblade (11-1) ...
server:/#
On our server we'll export the /dev/sdd5 partition, which has a size of 5GB. Exporting a block device is easy to do:
server:/# vbladed 0 1 eth0 /dev/sdd5
server:/#
Some explanation of this command: each AoE device is identified by a major/minor pair, with the major between 0 and 65535 and the minor between 0 and 255. AoE runs directly on top of Ethernet in the OSI model, so we need to indicate which Ethernet card to use.
In this example we export /dev/sdd5 with a major value of 0 and a minor of 1 on the eth0 interface.
We are ready to use our partition on the network!
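If you export several devices it is easy to mistype one of the numbers. Here is a rough sketch of a wrapper that checks the major and minor against the ranges quoted in this article before building the vbladed command line. The name aoe_export_cmd is hypothetical, and the function only prints the command so you can review it before running it as root.

```shell
# Build the vbladed command line for one export.
# Arguments: major minor interface device
# Prints the command instead of running it; returns 1 on bad input.
aoe_export_cmd() {
    major="$1"; minor="$2"; iface="$3"; device="$4"
    # Both numbers must be plain non-negative integers.
    for n in "$major" "$minor"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "major and minor must be non-negative integers" >&2
                return 1 ;;
        esac
    done
    # Upper bounds as stated in the article text.
    if [ "$major" -gt 65535 ] || [ "$minor" -gt 255 ]; then
        echo "major/minor out of range" >&2
        return 1
    fi
    echo "vbladed $major $minor $iface $device"
}
```

For the export used here, aoe_export_cmd 0 1 eth0 /dev/sdd5 prints the same command line as shown in the server example.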
Client Side (initiator)
The client needs the aoe kernel module too, so prepare your kernel as we saw above. The userland tools are in the aoetools package:
client:/# apt-get install aoetools
Now discover what we can use over our network:
client:/# aoe-discover
client:/# aoe-stat
e0.1 5.000GB eth0 up
client:/#
At this point we have a new block device available on the client box, named /dev/etherd/e0.1. If we look at the /dev tree, a new node has appeared:
client:/# ls -al /dev/etherd/
total 4
drwxr-xr-x 2 root root 140 2007-10-10 13:30 .
drwxr-xr-x 16 root root 14660 2007-10-10 13:30 ..
c-w--w---- 1 root disk 152, 3 2007-10-10 13:30 discover
brw-rw---- 1 root disk 152, 16 2007-10-10 13:30 e0.1
cr--r----- 1 root disk 152, 2 2007-10-10 13:30 err
c-w--w---- 1 root disk 152, 4 2007-10-10 13:30 interfaces
-rw-r--r-- 1 root root 5 2007-10-10 13:00 revalidate
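When several devices are exported, you may want to turn the aoe-stat listing into device paths for scripting. The filter below is a sketch that assumes the output layout shown above (device name in the first column, state in the last); aoe_up_devices is a name invented for this example.

```shell
# Read aoe-stat output on stdin and print the /dev/etherd path of every
# device whose last column (the state) is "up".
aoe_up_devices() {
    awk '$NF == "up" && $1 ~ /^e[0-9]+\.[0-9]+$/ { print "/dev/etherd/" $1 }'
}
```

Typical use would be aoe-stat | aoe_up_devices, which for the listing above should give /dev/etherd/e0.1.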
What to do with it
Simply make a filesystem on your block device, like this:
client:/# mkfs.ext3 /dev/etherd/e0.1
and use it as you would your /dev/hd* or /dev/sd* devices. The only difference is that this block device is over the network!
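A small guard can save you from running mkfs against a path that is not there yet, for example if you forgot aoe-discover. This is a minimal sketch: use_aoe_device is a made-up name, /mnt/aoe is a hypothetical mount point, and the function must be run as root.

```shell
# Make a filesystem on an AoE device and mount it, but only if the
# device node really exists as a block device.
# Arguments: device (default /dev/etherd/e0.1), mount point (default /mnt/aoe)
use_aoe_device() {
    device="${1:-/dev/etherd/e0.1}"
    mount_point="${2:-/mnt/aoe}"
    if [ ! -b "$device" ]; then
        echo "$device is not a block device; run aoe-discover first" >&2
        return 1
    fi
    mkfs.ext3 "$device" || return 1     # destroys any data on the device
    mkdir -p "$mount_point" || return 1
    mount "$device" "$mount_point"
}
```

Only call it on a device you really want to format, since mkfs.ext3 wipes whatever was there.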
- Can you create the filesystem on the server side?
- What happens if the connection goes down?
[ Parent | Reply to this comment ]
Q1: Can you create the filesystem on the server side?
A: The filesystem can be created on the server or the client. For both, the AoE target is just a disk.
Q2: What happens if the connection goes down?
A: The same thing that happens when you pull a running hard drive from the machine.
That being said, you can use multiple Ethernet cards on the client and server, which gives you multiple paths to the device.
--
best regards
Atif Ghaffar
[ Parent | Reply to this comment ]
(and it explains the current poll ;-)
I never tried AoE; I didn't even know it's shipped in Etch's default kernel!
But some questions are still open for me:
- can I mount AoE-exported block devices simultaneously on several clients? (this would be great for Xen domUs)
- can I mount an AoE-exported block device read-only?
- can I easily restrict access by IP, or even better with crypto keys?
[ Parent | Reply to this comment ]
* have a look at http://xenaoe.org/ :-)
* Indeed you mount the aoe blockdevices as you mount a partition on a local disk
* Nope, same answer as 2.
Regards
[ Parent | Reply to this comment ]
PJ
[ Parent | Reply to this comment ]
You are off the edge of the map, mate. Here there be monsters!
[ Parent | Reply to this comment ]
That said, if anyone knows of a good HOWTO on Debian iSCSI, with a Debian target host, I'd be willing to give it another try.
The disadvantage of AoE is that, AFAIK, it's Linux/*BSD only at this point, whereas iSCSI works with Windows.
[ Parent | Reply to this comment ]
That said, if you have routers then they can most likely be configured as bridges. Or you can use something like vtun to bridge two Ethernet networks.
Cheers,
Andy
[ Parent | Reply to this comment ]
AoE works on a layer beneath that (Ethernet).
So you don't have the TCP overhead with AoE that you have with iSCSI.
--
best regards
Atif Ghaffar
[ Parent | Reply to this comment ]
AoE, on the other hand, isn't routable: it's constrained to the Ethernet segment on which it lives and as such is fairly secure by default.
Also, the overhead of creating full IP packets with the required checksums can add extra load to a server; if it is already heavily loaded, this can make a difference. This has necessitated the creation of iSCSI HBAs (like iSCSI drive controllers) with integrated TCP offload engines to get around some of this.
[ Parent | Reply to this comment ]
We use Coraid boxes for AoE and are extremely happy with them.
Even when using the hardware device, it is useful to add another software layer that can split and export these devices.
One example would be splitting one disk and exporting it to many clients.
For example: you have 5 disks and you use hardware RAID to make one big RAID 5 array from them.
You may now have one large 10TB disk as /dev/sda.
You now need to give 20GB to machine 1, 10GB to machine 2 and 1TB to machine 3.
This can be done by creating 3 logical volumes on this device and exporting each volume as a disk.
This also makes it easy to grow and re-export the volumes, thus increasing the storage capacity on the clients.
Coraid also sells some ready-made boxes for this purpose, which can be stacked on top of the storage boxes.
http://www.coraid.com/products4.html
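The logical-volume split described above could be scripted roughly as follows. This is a sketch under several assumptions: a volume group already exists on /dev/sda (created with pvcreate and vgcreate), the names vg_aoe, machine1 and so on are hypothetical, and every volume is exported on shelf 0 with slots counted from 1. The function only prints the lvcreate and vbladed commands so they can be reviewed first.

```shell
# Print (not run) the commands that carve logical volumes out of one
# volume group and export each one as its own AoE device.
# Arguments: volume group name (default vg_aoe), interface (default eth0)
# Input on stdin: one "name size" pair per line, e.g. "machine1 20G".
plan_aoe_volumes() {
    vg_name="${1:-vg_aoe}"
    iface="${2:-eth0}"
    slot=1
    while read -r name size; do
        [ -n "$name" ] || continue          # skip empty lines
        echo "lvcreate -L $size -n $name $vg_name"
        echo "vbladed 0 $slot $iface /dev/$vg_name/$name"
        slot=$((slot + 1))
    done
}
```

Feeding it the three lines "machine1 20G", "machine2 10G" and "machine3 1T" prints one lvcreate and one vbladed command per machine, with slots 1 to 3.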
--
best regards
Atif Ghaffar
[ Parent | Reply to this comment ]
AoE is a SAN protocol. It replaces something like fibrechannel which is much more expensive due to the non-commodity nature of the hardware involved. It seems funny to me that people occasionally criticize AoE for being non-routable when fibrechannel has always been non-routable and worked just fine. If you do need to route AoE you can do layer 2 tunneling between sites. But you will want to make sure you encrypt it. If you don't it is like giving the general public direct read access to your IDE cable because that is what your network becomes when you run a SAN protocol such as AoE over it.
[ Parent | Reply to this comment ]
Does anyone have a summary of what's been done to really hammer on it?
[ Parent | Reply to this comment ]
I'm just wondering, and the answer may be obvious to most people, but I can't stop asking myself: why should I consider using ATA over Ethernet if I am already using NFS? Is it a performance issue? A capabilities issue? Can you give me some ideas about that? Thx.
[ Parent | Reply to this comment ]
----------
AOE and NFS have nothing to do with each other. *
If you ask this question: Should I use AOE when I am using NFS?
The answer should be : No you should not.
AOE is for SAN (Storage Area Network)
http://en.wikipedia.org/wiki/Storage_area_network
http://en.wikipedia.org/wiki/ATA_over_Ethernet
NFS is for File Sharing (Network File System) http://en.wikipedia.org/wiki/Network_file_system
AOE exports BLOCK devices (raw disk).
NFS exports Files (a filesystem, strictly speaking).
* AOE and NFS have nothing to do with each other.
--
best regards
Atif Ghaffar
[ Parent | Reply to this comment ]
With AoE (as with nbd or drbd), you have a shared device, so you mount the real physical partition on the CLIENT side. You therefore CANNOT mount it twice, unless you want to corrupt your superblock and metadata ...
The only ways for AoE to be "like" NFS are:
- when you mount your NFS export from one client machine only: AoE is a quicker solution
- if you use a distributed filesystem such as ocfs2 or gfs, you can mount the AoE drive in many places simultaneously, since the lock manager handles it for you.
[ Parent | Reply to this comment ]
So, the drive cache is on the server side when using NFS: each and every request (such as getattr) is sent to the NFS server and the reply is sent back to the client.
When using AoE, the mount is done on the client side, and so is the file cache.
So:
- if you have a low-latency network or a lot of RAM on your server, you should really choose NFS
- if you have little RAM on your server but a lot on the webserver, you should choose AoE.
Anyway, you can change at any time: just unmount the partition on the AoE client, remount it on the AoE server and reload NFS. Then remount the partition on the client side using NFS ...
---
Example: some web software issues so many NFS requests that the NFS server can be overloaded by a single website ... Here is an example from a French mass-hosting NGO:
http://mon.lautre.net/munin/lautre.net/fey.lautre.net-nfsd-year.png
On September 15th, we removed 1 website from the 1000 sites hosted on this cluster, and the getattrs went down from 800 req/s to 400 req/s (...) For this kind of website, AoE will definitely be better...
[ Parent | Reply to this comment ]
On the server side:
(root) uname -r
2.6.17.4
(root) modprobe aoe
(root) vbladed 0 1 eth0 /tmp/aoeswapfile
In /var/messages I see:
ioctl returned 0
269484032 bytes
pid 9036: e0.1, 526336 sectors
So this is OK, right?
On the client side:
(root) uname -r
2.6.23.1
(root) modprobe aoe
(root) aoe-discover
(root) aoe-stat
Nothing was printed out. No /dev/etherd/eX.Y appeared at all. Why? There are no errors anywhere. Can you help me, please? Thanks...
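As a quick sanity check on the server log quoted above, the sector count and the byte count can be compared. The sketch below assumes 512-byte sectors; sectors_to_bytes is just an illustrative name.

```shell
# Convert a sector count to bytes, assuming 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# 526336 sectors, as reported in the log above
sectors_to_bytes 526336    # prints 269484032
```

The result matches the 269484032 bytes in the same log, so the two figures reported on the export side are at least consistent with each other; that alone does not explain why the client sees nothing.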
[ Parent | Reply to this comment ]
Strangely enough, when I modprobe aoe on a "pure client", '/dev/etherd' is created and populated (even without aoetools, which is nice actually).
On the server I modprobe aoe and it gets initialised as far as I can tell, but there is no '/dev/etherd' (aoe-discover chokes because of the missing directory). If I create the directory by hand (mkdir /dev/etherd), aoe-discover creates a file "discover" which contains just a newline.
Any hints or explanations as to why that happens?
[ Parent | Reply to this comment ]
You might be interested in vblade-persist, a framework I wrote that automatically creates properly named symlinks within /dev on the exporting server, while supervising the exports to make sure that they stay up (and come back after a reboot). Hopefully it'll be in Debian soon.
[ Parent | Reply to this comment ]