A Backup Fileserver, Overview

Why bother backing up data? There are several good reasons. The first is that everything can break, including the hard drive where your data is. The second is that you might delete something important by mistake and want it back later on. The third is disaster recovery, like your computer catching on fire, or being stolen.

How do you back up data? There are many ways. There are flash drives and external drive enclosures. There are prebuilt boxes that do it. There are hardware raid controllers. There are computers dedicated to storing data. Let's narrow down the choices a bit. The first thing to know is where the data you will be backing up lives, and how much of it there is. On a Unix or Linux system, the typical answer is everything in your personal directory tree. On a windows system, the personal directory tree is typically on the OS partition, and has all kinds of junk as well as useful stuff. I recommend a separate OS partition and a separate user partition. That way, if you need to reinstall the OS, you won't lose all your personal data. Of course, computers with the OS pre-installed aren't set up that way. There are two easy solutions. One is to shrink the OS partition and use the rest of the space as the user partition. This is a great solution for notebooks that only have one physical drive. The other is to buy a separate hard drive and dedicate it to user files.
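However the drives are laid out, the actual copy can be as simple as an rsync run. A minimal sketch (the paths and host name are placeholders, and this assumes rsync is installed):

# copy a home directory to an external drive, preserving permissions and timestamps
rsync -av /home/myuser/ /mnt/backupdrive/myuser/
# or push it over the network to another machine
rsync -av /home/myuser/ backuphost:/data/myuser/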
The next level of sophistication is to separate out different types of data. Clearly some types of data are more important to back up than others. For me, all my critical data will fit on a big thumb drive. I have several other larger partitions for less critical data.
Flash drives are a great way to store data. They are quite robust, reasonably priced, and very portable. They aren't generally very fast (5-20mbytes/second write speeds are typical), nor are they very big (32 or 64 gigabytes max). If you can fit your data on one, or two if you are worried about a single point of failure, they are a great solution. I recommend name brand flash memory vendors, such as Sandisk, as they tend to use the better grade chips for themselves, and sell the lesser grade chips to their competitors. Be aware that the fat32 file system is not very robust and can get accidentally corrupted. I don't know of a universally readable file system that is more robust, though.
The next step up is an external hard drive in an enclosure. There are many interface options available, USB, firewire, e-sata, and Ethernet. Each has a potential advantage. Make sure the Ethernet is gigabit Ethernet, or you will be limited to 100mbits/second. USB is found on almost all computers. Firewire is in practice a bit faster, though not as popular. e-sata is the newest external interface. It is faster than any other one, but older computers don't have an external sata port. This can be fixed with an add-on card, but e-sata is useless on all but the newest notebooks. Ethernet's advantage is it can be attached to your network, rather than your computer, so it can be visible from all your computers rather than just the one that it is plugged into. If you have a 3.5 inch hard drive in an enclosure, in my experience it will run quite hot. Hot enough that I am concerned about the drive's reliability. Antec makes a very nice enclosure with a big quiet fan, the MX-1. It has usb and e-sata ports, and the case is designed to absorb noise.
The next step up is a bit more complex, as it involves multiple hard drives. You can combine multiple hard drives to look like one big hard drive. You can treat each hard drive separately. Or, you can combine multiple hard drives so that the system can tolerate the failure of one of them. The simplest scheme is called RAID-1 (RAID stands for Redundant Array of Inexpensive Disks). In RAID-1, you copy all data to two disks. If one disk breaks, you get the data from the other disk. The overhead is that you need double the amount of hard drive storage you would need without RAID-1. The good news is it is very easy to do. The next level of complexity is called RAID-5. With RAID-5, a group of disks can survive the failure of a single disk without losing any data. RAID-6 allows two disks in the group to fail without losing any data. Though disks are inexpensive, they aren't very inexpensive. In addition to the price of the disk, each disk takes up space inside the computer, and takes power. Based on my experience with disks, I decided that RAID-5 was the best solution for me.
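As a rough worked example, assume four 500gb drives: with no raid you get 2000gb of space, with RAID-1 (two mirrored pairs) you get 1000gb, with RAID-5 you get (4-1) x 500gb = 1500gb, and with RAID-6 you get (4-2) x 500gb = 1000gb. The more drives in the group, the smaller the fraction of space lost to redundancy.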
One thing to watch out for: RAID-5 only protects against one hard drive failing. If you bought a batch of 4 hard drives, and the batch has problems, 3 of the drives might fail within a few weeks. This happened to me. The only protection against this is to buy different brands or models of hard drives. Realize that some companies such as Seagate make drives and label them as Maxtor. So even though the drives have different names and model numbers, they might actually be the same. Just something to keep in mind.
You can buy a system to hold multiple disks and do RAID-5. They are typically called NAS (network attached storage) boxes. A prebuilt NAS is a box that has everything except the hard drives; you just add hard drives. They are less flexible than a dedicated general purpose computer, but are likely simpler and use less power and volume. Internally, they generally run Linux. I will only consider units with gigabit Ethernet, otherwise file transfer is limited to 12.5mbytes/second, which is too slow for me. I am also only going to consider systems that support RAID-5 for robustness. The minimum number of drives you need for RAID-5 is 3 (with 2 drives you might as well use RAID-1). Let's see what is out there. There is the Promise SmartStor NS4300N, which holds up to 4 disks. Write speed is around 15 mbytes/second. Cost is $400 without drives. Looking at SmallNetBuilder's NAS charts, only the Thecus 1U4500 and Thecus N5200 have write speeds above 20mbytes/second. The Thecus 1U4500 holds up to 4 drives, has a write speed of 32mbytes/second, and costs about $1000 without drives. The Thecus N5200 holds up to 5 drives, has a write speed of 25mbytes/second, and costs about $700 without drives. It looks like these systems are not for me. They are somewhat lacking in flexibility and pretty expensive, none hold more than 5 hard drives, and none have a write speed above 32mbytes/second.
The next big choice is software or hardware raid. There are basically two types of raid controllers: the type that rely on the operating system to do most of the work (known as software raid controllers) and the type that don't (known as true hardware raid controllers). If a card costs under $100, or only works with Microsoft operating systems, you can be sure it is a software raid controller. For an 8 disk true hardware raid controller, there are several choices: Adaptec has some models around $575, Atto makes a nice model for $1095, ICP has a model for $650, and Promise has some nice models starting around $400. In addition to raid controllers, you can do the raid entirely in software. This provides quite a bit of flexibility, although it can take quite a few CPU cycles. The good news is you can get a pretty decent motherboard, processor, and memory for less than a true hardware raid controller card. Sun has a new filesystem called ZFS which can be configured to have raid-like properties (they call it raid-Z). It is done entirely in software. You can get Solaris for personal use for free, and ZFS is also supported under FreeBSD. In some respects ZFS is more robust than other filesystems. If you do use ZFS, there is no reason to buy a hardware raid controller. That is the sort of flexibility you get when you do raid entirely in software. For me, a hardware controller was too expensive, so I decided to use software raid.
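As a rough sketch of how simple the pure software route can be, here is roughly what creating a raid-Z pool looks like on Solaris or FreeBSD (the pool name and disk device names are made up; the real device names depend on the OS):

# create a raid-Z pool named "tank" out of four disks
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
# carve out a filesystem for backups and check the pool's health
zfs create tank/backup
zpool status tank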
You will need to build or repurpose a computer to be the fileserver. You will need room for all the hard drives, and a way to keep them cool. The power supply must be powerful enough to spin up all the hard drives. You will need one or more gigabit Ethernet interfaces.
The motherboard should support gigabit Ethernet. If it doesn't, it is likely too old, or too cheap to use. For older Pentium III servers, either get a 64 bit PCI Ethernet card, or a 32 bit PCI card running on its own PCI bus (server motherboards often have more than one bus). You will want to use ECC memory. If the motherboard or processor doesn't support ECC, it is the wrong choice for a fileserver. The last feature is a decent amount of I/O speed. Even though my Asus NCCH-DL motherboard has two PCI-X slots (64bits and 66mhz which totals 528 mbytes/second), the southbridge-northbridge interface is only 266 mbytes/second. This is one place where a hardware raid card might be faster than software raid, as much of the I/O can happen on the card, and only some would get to the PCI-X bus. Any opteron system that I am familiar with will have plenty of I/O capability.
You will need enough I/O ports for all of your hard drives. You will also need one for the OS disk, and likely one for the optical drive. I suppose you could use a USB optical drive if you ran out of ports. If the motherboard doesn't have enough ports, you will need a card with some ports. If you have pci-x slots, supermicro has a sat2-mv8, 8 port sata II controller card that works well. If you have pci-e slots, try to find a sata II card without raid hardware or software. I don't recommend parallel ATA, unless you only have one drive per controller, as a failed disk can make the other disk on the cable inaccessible.
I recommend a 64 bit processor, as the xor's of raid-5 will be twice as fast (assuming a 64 bit OS). Any dual core processor is likely fast enough, so select a low power one. An AMD phenom is plenty fast, reasonably low power, and stupidly cheap right now.
It is important to keep the disks reasonably cool. The coolermaster 4-in-3 module is the best way I have found to do so. It has a 120mm fan in front of the hard drives. The fan doesn't have an rpm sensor. It is reasonably priced. The only downside is that changing a hard drive means disconnecting all the cables from all the drives and removing the module. If that doesn't bother you, buy one or more of these. If that does bother you, I recommend the supermicro 5 bay hotswap rack. Each hard drive is accessible from the front, and can be removed while the computer is on (assuming you have SATA II or SAS and software support). The 92mm fan in the back is quite loud, but you can get an inline 3 pin adjustable fan speed controller and slow it down so it isn't so bothersome. The rack has a fan failure alarm, an over-temperature alarm, hard drive activity lights, and fault lights. It isn't cheap, but I really like them. You will want to monitor the hard drive temperature with some software tool. For windows, speedfan works great.
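On Linux, a quick sketch using smartmontools (assuming it is installed, and that the drive of interest really is /dev/sda):

# print the drive's SMART attributes and pull out the temperature line
smartctl -A /dev/sda | grep -i temperature
# hddtemp is another option, if it is installed
hddtemp /dev/sda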
I had a bunch of fileservers, which worked well. One issue I had was they consumed a lot of power. The Asus m3a78-t uses 93 watts at idle, and the Supermicro X8DTL with one L5640 uses about 96 watts at idle. I decided to upgrade the motherboard to something newer. I now have two Supermicro X10SLM-F motherboards, one Supermicro X10SLM+-LN4F motherboard and one Supermicro X10SLL-F motherboard. They are using Xeon e3-1220v3 or Xeon e3-1270v3 processors. I have reviews of these motherboards elsewhere, but they idle around 46 watts. Most of the fileservers now have 2.5g ethernet, and I have a 2.5g switch. The good news is 2.5g NICs are cheap. The bad news is 2.5g switches are not. I bought a trendnet 8 port switch, which worked fine at 2.5g speeds, but not at 1g speeds. Fortunately it has a lifetime warranty.
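To confirm that a NIC actually negotiated 2.5g with the switch, something like this works on Linux (a sketch; the interface name eth0 is a placeholder):

# show the negotiated link speed; a working 2.5g link reports 2500Mb/s
ethtool eth0 | grep -i speed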
For the disks, I have made some upgrades. Server #1 has four 8tb western digital / hitachi helium drives. Server #2 has four 10tb western digital / hitachi drives. Server #3 has four 12tb western digital / hitachi SAS drives; these need a SAS controller. Server #4 has seven 4tb drives. The 8tb drives were shucked from external drives. The 10tb and 12tb drives I bought used. And the 4tb drives are quite old, and were purchased from Frys on black friday.
All of the servers have 80+ power supplies. Unfortunately, there are very few really efficient low-power power supplies. Most of the ones I am using are 380w 80+ 'white'. If someone knows of more efficient low-power supplies that aren't really expensive, please let me know.
Disks | Read | Write
four 8tb 5400rpm SATA | 258 mb/sec | 205 mb/sec
five 8tb 5400rpm SATA | 453 mb/sec | 383 mb/sec
seven 4tb 7200rpm SATA | 443 mb/sec | 245 mb/sec
four 10tb 7200rpm SATA | 472 mb/sec | 265 mb/sec
four 12tb 7200rpm SAS | 656 mb/sec | 535 mb/sec
I am using an older Asus M3a78-T motherboard (which used to be my main computer), with 8gb of ram and an AMD Phenom II 940 processor. I am now using a Highpoint RR2710 card with 8 SAS ports. The card supports 6 gigabit/sec drives and drives over 2tb, and is reasonably inexpensive. I have six 4tb hard drives and one 5tb hard drive for the raid array. It can check the array at roughly 1 gigabyte per second, which is pretty fast. The array can do writes at 327 mbytes/sec and reads at 682 mbytes/sec. I need to look into upgrading my network to 10 gig...
My new motherboard is an Asus M5A78L (chosen to be inexpensive), with 8gb of ram and an AMD Phenom II 720 processor (chosen because I had one lying around and it is reasonably low power). I am now using a Highpoint RR2710 card with 8 SAS ports. The card supports 6 gigabit/sec drives and drives over 2tb, and is reasonably inexpensive. I have eight 2tb hard drives for the raid array.
So I can now do 35 mbytes/second write, and 365 mbytes/second read. Not sure why the writes are so slow. I will have to look into it.
My new motherboard is a Supermicro X7DBE with 8gb of ram and 2 L5420 Xeon processors. The motherboard uses FB-DIMMs, which run quite hot (70C or so) and suck down power (about 8w at idle per DIMM). I am now using an LSI SAS 3081e 8 port SAS controller and seven 2tb hard drives.
So I can now do 225 mbytes/second write, and 345 mbytes/second read. That is plenty fast for now.
Well, my used Asus NCCH-DL motherboard finally died. I replaced it with an Asus M3N WS motherboard. I bought an AMD phenom II 710, which has three 2.6ghz cores. I also bought 4gb of ECC memory. The phenom has very nice power management, and runs quite cool at idle. The M3N WS has a PCI-X slot for my 8 port SATAII controller, as well as 6 native SATAII ports. I used the same supermicro pci-x 8 port sata II controller, and seven 750gb hard drives.
Power consumption (with 7 disk array, and three 2.6ghz processors):
So, sequential writes went from 92mbytes/second to 105mbytes/second. Per character sequential writes went from 22mbytes/second to 58mbytes/second. Sequential reads went from 191mbytes/second down to 161mbytes/second. Per character sequential reads went from 27mbytes/second to 56mbytes/second. Overall, much improved.
I then added an 8th disk, and changed the chunk size to 64k. Here are those numbers:
So I can now do 204 mbytes/second write, and 320 mbytes/second read. I upgraded my client machine to windows 7. Before I added the 8th disk, I could do sustained writes of 90 mbytes/second over ethernet. I think with the 8th disk and bigger chunk size, I should be able to saturate the gigabit ethernet. Of course, this is for large file transfers.
I refreshed my hard drives, switching to six 2tb drives, and switched to Mandriva 2010.1 (kernel 2.6.33) and ext4 with a chunk size of 64k. Here are those numbers:
So I can now do 124 mbytes/second write, and 366 mbytes/second read. I upgraded my client machine to windows 7. For some reason writing is slower. Perhaps because there are fewer disks? It isn't the filesystem. I suspect the linux kernel is the main cause. I think I will try ext4, as it has more development effort, and faster boot times.
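For reference, here is a rough sketch of creating an ext4 filesystem so its layout lines up with the raid chunk size (this assumes six drives in raid-5, a 64k chunk, 4k blocks, and that the array really is /dev/md0; the numbers need adjusting for other configurations):

# stride = chunk size / block size = 64k / 4k = 16
# stripe-width = stride x number of data disks = 16 x (6 - 1) = 80
mkfs.ext4 -b 4096 -E stride=16,stripe-width=80 /dev/md0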
I learned a lot building my first two backup fileservers. It is now clear to me that my first two were limited by relatively slow main memory speed. Using PC133 memory, the main memory speed is roughly 400 mbytes/second. Since there is no easy way to speed that up using Pentium III processors, it was time for something newer. I bought a used Asus NCCH-DL motherboard and two LV Xeon 2.8ghz processors. Though the Xeons can draw more power than the Pentium III, they draw a variable amount of power depending on load, unlike the Pentium III which always draws the same amount of power. This motherboard, like the Intel STL-2, features 2 pci-x slots (64 bit, 66mhz), as well as built-in Intel gigabit Ethernet. It has a bunch of other nifty features as well; the best are the ability to set the clock speed and cpu multiplier, and to control fan speed based on system and cpu temperatures. I added 2gb of pc3200 memory. I used the same supermicro pci-x 8 port sata II controller, and four 750gb hard drives.
Power consumption (with 4 disk array, and 1.6ghz processors (1.86ghz)):
The writing speed is 72.9mb/sec (mega bytes per second) which is faster than V2 which was 46.5mb/second and faster than V1 which was 60.7mb/sec. The reading speed is 119mb/second which is faster than V2 which was 102mb/second and faster than V1 which was 35mb/sec or 53mb/sec (benchmark was run twice). I hope the faster writes translate into faster performance using samba. With the V2 system, I could write between 20 and 25 mb/second. With the V3 system, I can write between 25 and 35 mb/second. I decided to test by copying two disks over at a time. I had tried this before with the V1 and V2 system, but the overall speed didn't increase (as it was maxed out). With the V3 system, I can write to the fileserver with an aggregate speed of between 40 and 50 mb/sec. This is a huge increase over the V2 system. Perhaps I will increase the memory speed and see what that does for performance, but this version is almost twice as fast as the V2 system, which is a big win already.
I bought two more drives for my raid array. You can 'grow' the array without destroying the data. First you say:
mdadm --add /dev/md0 drive1 drive2
to add the drives to the array. Next you say:
mdadm --grow /dev/md0 --raid-devices=6
to grow the array onto the added drives. I did this. What I didn't know was that the process takes over 100 hours when adding two 500gb drives. The system hung during this process, so I ended up building the array from scratch.
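For what it is worth, the reshape progress can be watched, and the kernel's rebuild speed limit raised, roughly like this (a sketch; the speed value is just an example, in kbytes/second):

# show reshape progress, with a percentage and an estimated finish time
cat /proc/mdstat
# raise the minimum rebuild speed so the reshape finishes sooner
echo 50000 > /proc/sys/dev/raid/speed_limit_min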
The next step is to grow the filesystem via the command:
resize_reiserfs /dev/md0
Then there will be more free disk space.
Power consumption (with 6 disk array and 2.8ghz processors):
Well, what did the extra two disks do for filesystem performance? Here are some performance numbers:
So, sequential writes went from 73mbytes/second to 92mbytes/second. Sequential reads went from 119mbytes/second to 191mbytes/second. This is close enough to saturating gigabit Ethernet from my perspective. No need to crank up the cpu or memory speed.
I learned a lot building my first backup fileserver. It was time to build another. I bought a used Intel STL-2 motherboard, two Intel Pentium III 933 processors, and 4gb of ECC memory for $130. This is a really serious server motherboard. It has onboard ultra-160 SCSI, and 2 pci-x slots (64 bit, 66mhz), which is faster than my gateway 6400 64 bit slots. I added a supermicro pci-x 8 port sata II controller, and an Intel pro 1000-mt pci-x gigabit Ethernet card. I added four sata II 500gb hard drives. I used a promise Ultra TX 100 IDE controller for the OS disk.
Power consumption:
The writing speed is slower than backup file server V1, but I only have 4 drives in the array. Hopefully it will speed up when I add more hard drives. The reading speed is about twice as fast as the old backup file server. It is possible I was saturating the 33mhz pci bus, or perhaps I was having master/slave issues (which isn't a concern with sata).
I wanted a fileserver to back up the rest of my computers to, at a low cost. I had an older, used Gateway 6400 server motherboard. It has two Pentium III 933 processors, over a gigabyte of ECC memory, and some 64 bit PCI slots (in addition to the usual 32 bit PCI slots). Since hardware RAID is generally expensive, slow, and can be difficult to repair (if the controller fails you usually need to replace it with the exact same part), I chose software RAID. The main CPU(s) are used, and if some part fails, there is no need to find an exact replacement for it. Since the motherboard (an Asus CUR-DLS) has a problem with its IDE controllers, I used two Promise Ultra TX 100 IDE controllers. These controllers each have two UDMA-133 connectors, however you cannot boot to a CDROM drive attached to them to install the OS. I could have used a SCSI drive attached to the motherboard's SCSI controller, but I decided to use a generic IDE hard drive instead. So I attached a CDROM to the motherboard's IDE controller, and put the six 250gb hard drives for the RAID array and the one IDE hard drive for the OS on the Promise controllers.
(Note it is generally accepted that it is bad to use the master and slave on a single IDE controller because if the master drive fails, it often takes the slave drive offline as well. This is why people use SCSI which doesn't have this problem, or SATA which doesn't have master and slave, but uses point to point. However, given the low failure rate of hard drives, I decided to risk it. Since this is a backup file server, its failure wouldn't jeopardize anything.)
As more hard drives became available, they would be added to the RAID array. Eventually, SATA drives would be added to the array, minimizing the risks of a master drive causing a double failure. After operating the system for a while, I realized that the 100mbit Ethernet was a bottleneck. I got a Broadcom gigabit Ethernet server card, which has a PCI-X interface. Since the motherboard has two 64 bit PCI slots, which are separate from the other PCI slots, the disk traffic wouldn't affect the Ethernet traffic. The array can sustain reading and writing at 50 mbytes/second according to the Bonnie++ benchmarks, included below. When backing up files from other computers, I can copy at about 20-30mbytes/second across the Ethernet.
Eventually a 300gb drive was added to the array, though the array could only use 250gb of it. A CoolerMaster Stacker case was chosen in order to provide good cooling for the seven hard drives.
For software, I chose Linux because it had mature RAID support. I configured Linux with ssh as well as samba, which implements the SMB protocol used by windows file servers. That way, windows machines could see the files on the backup file server, and files could be copied back and forth using the windows GUI.
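A minimal sketch of the samba share definition (the share name, path, and user are placeholders, and a real smb.conf needs a [global] section as well):

[backup]
    path = /data
    read only = no
    valid users = myuser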
Here is how I configured the RAID-5 array:
# make a device-special file
# Not needed with most 2008+ Linuxes
# mknod /dev/md0 b 9 0
# make the raid array with 6 hard drives
mdadm --create /dev/md0 --verbose --chunk=32 --level=5 --raid-devices=6 /dev/hd[abcdef]
# This will make a 232gb size set, and a chunk size of 32k
mdadm --detail --scan
# make the file system on the raid array
mkreiserfs /dev/md0
# make the mount point for the raid array
mkdir /data
# the fstab entry for the raid array
/dev/md0 /data reiserfs notail 1 2
# to create mdadm.conf, as root run
mdadm --detail --scan --verbose >> /etc/mdadm.conf
# This will make /etc/mdadm.conf, which looks like:
# DEVICE partitions
# MAILADDR foo@bar.com
# ARRAY /dev/md0 level=raid5 num-devices=5 UUID=ce5e9366:d83a232c:3da68804:f6814703 auto=yes
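After that, the array can be checked and mounted with something like this (a sketch, assuming the fstab entry above is in place):

# check that the array is assembled and healthy
cat /proc/mdstat
mdadm --detail /dev/md0
# mount it using the fstab entry and confirm the space is there
mount /data
df -h /data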
If you happen to be running Mandriva 2008 spring, fedora 9, or possibly some other 2008+ version of linux, you should read bug 40023 if your raid array doesn't mount when you reboot your computer.
Power consumption:
If you have comments or suggestions, Email me at turbo-www@weasel.com