The NAS 3-2-1 backup breakdown
How much does backup actually cost?
I interrupt our incredibly fascinating breakdown of zettelkasten system creation to talk about backups.
As always, there are no advertisements and no affiliate links in this post. All opinions are my own.
You should back up your data so you still have it if your storage medium is lost, stolen, or destroyed. I could write a lot about the principles of backup, but you probably already understand most of them. I want to dive into one particular backup strategy and how I aim to achieve it cost-effectively through the power of friendship.
3-2-1 backup
I honestly tried to track down where this idea first appeared, but I couldn't find a solid lead. Regardless, the 3-2-1 strategy is a sound way to keep your backups safe. Here's how it works.
- You should have three sets of your data (the original and two copies),
- on two different storage media,
- and one set should be stored offsite.
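To make the rule concrete, here's a minimal Python sketch that checks a backup plan against it. The Copy class and satisfies_3_2_1 function are hypothetical names for illustration, and the sketch simplifies by treating "cloud" as just another medium and "offsite" as anything not stored at home.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One set of your data: what it's stored on and where it lives."""
    name: str
    medium: str    # e.g. "ssd", "hdd", "tape", "cloud"
    location: str  # e.g. "home", "friend's house", "cloud"


def satisfies_3_2_1(copies: list[Copy], home: str = "home") -> bool:
    """True if there are at least 3 copies, on at least 2 media, with at least 1 away from home."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.location != home for c in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    Copy("laptop", "ssd", "home"),
    Copy("NAS", "hdd", "home"),
    Copy("Backblaze B2", "cloud", "cloud"),
]
print(satisfies_3_2_1(plan))      # True
print(satisfies_3_2_1(plan[:2]))  # False: only two copies, none offsite
```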
Let's break down the particular reasons for these rules and how you can follow them.
Have three sets of your data
The idea behind having three sets of your data is mostly about increased redundancy, and it's largely a consequence of the rule requiring an offsite copy. So let's start with why you want a copy of your data at all: if you don't have a copy of your data, you don't have your data when you lose a set of your data.
Moving on. We'll revisit this when talking about offsite copies.
Use two storage media
When storing data, if you have two copies that share a mode of failure, there's a possibility that they will both fail at the same time for the same reason. For example, if your hard drive decides to drive its spindle straight through its platters such that it "commits sudoku," and your backup hard drives are from the same batch, there's a high risk that your backup will fail at around the same time.
Traditionally, this meant backing up hard drives to tape or optical media. In my machines at home I exclusively use solid state drives for performance, but in my NAS I use mechanical hard drives to save on cost. This can be considered two media. And even within my NAS, I have a RAID setup with hard drives from more than one production batch and manufacturer (more on that later).
This rule may not need to be followed as strictly as it was in the past. I think the next rule has a much bigger impact.
Have one set of data offsite
This is the kicker, and it's where backup starts getting expensive. More expensive than a NAS and hard drives. But you absolutely need offsite backup. As we know, "two is one and one is none" is the rule of redundancy and you need redundancy of location in addition to redundancy of data.
A friend of mine has a grandmother in Houston. Her house was hit hard by Hurricane Harvey and flooded completely (she had evacuated, thankfully). A few days later, while the floodwaters were receding, her house burned down due to an electrical issue caused by the flooding. Did you know a hurricane can set your house on fire? I didn't until I heard about that.
Even if your data is on different storage media, some of which are resilient to particular types of damage, keeping them in the same place can result in the destruction of all of them at once. This is why you need an offsite backup, and it's also the reason for needing three copies. The first two copies, your primary and secondary, live alongside each other, and you back up from your primary to your secondary regularly. Your tertiary backup is the "oh shit" backup for when everything goes wrong. You'll be glad you have it when you need it.
How to do 3-2-1 easily
Get an external hard drive of the opposite type to the one in your computer (a mechanical drive if your computer has an SSD, or vice versa). Back up to it daily. Also back up to a cloud provider weekly.
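If you'd rather script the daily copy to an external drive than rely on a backup app, here's a minimal Python sketch of an incremental copy. The backup function is a hypothetical helper, and it assumes the drive is already mounted at a path you pass in. It only copies new or changed files (judged by size and modification time, similar in spirit to rsync's default check), and it never deletes files or keeps old versions, which a real backup tool would handle for you.

```python
import shutil
from pathlib import Path


def backup(src: Path, dest: Path) -> int:
    """Copy new or changed files from src into dest and return how many were copied.

    A file counts as unchanged if the copy has the same size and is at least as
    new. Files deleted from src are left alone in dest, and old versions are
    not kept.
    """
    copied = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dest / path.relative_to(src)
        if target.exists():
            s, t = path.stat(), target.stat()
            if s.st_size == t.st_size and s.st_mtime <= t.st_mtime:
                continue  # unchanged since the last run
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied += 1
    return copied


# Example (paths are placeholders; schedule it daily with cron, launchd, or Task Scheduler):
# backup(Path.home() / "Documents", Path("/Volumes/External/Documents"))
```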
How I do 3-2-1
Each of my computers backs up automatically to my Synology NAS daily, and my NAS backs up to the cloud (Backblaze B2 storage, specifically) daily.
Why I use a NAS
A NAS, or network-attached storage, is a really simple computer optimized to hold a ridiculous number of hard drives and offer them to your local network.
I use a NAS because it serves myriad purposes in my home. It's my storage for things I want to share within my network. I run a media server on it for things that used to be on Blu-ray discs. I run my security cameras through it to store footage. And of course I use it for backup.
I use a NAS because it makes all these things super easy.
What NAS should I get?
There are a bunch of options for a NAS, and I'll outline them here. I have personal experience with only one, a Synology DS1019+ I purchased in early 2020.
FreeNAS
First up is FreeNAS, an open-source, FreeBSD-based OS that uses ZFS for storage. This means you can build your own computer, drop a bunch of hard drives in it, and not pay a penny for software, and since it's a full Unix-like OS, you can get it to do exactly what you want. Pretty simple: it works. You can also buy FreeNAS-ready prebuilts such as the TrueNAS Mini E, which has 4 hot-swappable drive bays and costs $750.
Unraid
Unraid is a system that can provide NAS functionality but can also run virtual machines with bare-metal access to resources. This means you can have a NAS that is also your main workhorse PC and your gaming desktop. Great! If you plan on violating the rule of two storage media, that is. Yep, that's what ultimately made me decide against Unraid: it's not a backup solution. It's designed for primary storage.
But it does support a disk-independent logical RAID system (one that can mix drives of different sizes), which we'll talk about more in the Synology section.
Unraid does not sell systems, but you can presumably build one for less than a QNAP or Synology. Either way, it'll cost slightly more than FreeNAS, because Unraid requires a paid license and FreeNAS is free.
FreeNAS can also run VMs with near-bare-metal access. You should only go for Unraid if for some reason you need mixed-disk RAID. And why would you, when you're going to need a backup for it anyway?
QNAP
QNAP is a manufacturer of NAS systems for home and enterprise use. QNAP stands for "Quality Network Appliance Provider" (not "Quickly! Need (an) Acronym, Please" which I originally assumed.) When you go to their site they immediately ask for the right to give you desktop notifications, which is very impolite.
Basically, QNAP is a simple way to build a NAS that works pretty well out of the box. A system similar in performance to the TrueNAS Mini E would be the QNAP TS-451D2-4G, which has 4 drive bays and costs $475 right now on my friend Bezos' personal website. How is it cheaper than the prebuilt FreeNAS? No one knows (it's economy of scale).
Synology
Synology is a manufacturer of NAS systems for home and enterprise use. It does all the same stuff QNAP does. The software is a bit easier to use (the management interface is pretty great, I gotta say). A comparable 4-bay Synology system to what we've been looking at would be the Synology DS420+, which costs $500. It's $25 more than the QNAP, but it has one amazing thing QNAP does not:
Synology Hybrid RAID
This is the reason I went with Synology. Right now my Synology holds two 4TB disks and three 12TB disks, which I shucked from external hard drives when they went on sale (highly recommend; it's a great way to get cheap disks). That's 44 terabytes of raw storage. To survive if any one of these disks dies, data is striped across all of them with parity.
In a traditional RAID 5 array, your usable storage is the size of the smallest drive times the number of drives, minus one drive's worth for surviving a failure. That's 16TB in my case (4TB × 4), which is what I'd be limited to on a QNAP running RAID 5: two entire drives' worth of wasted space, and the largest drives at that. You'd think that means I could just not buy two of the big drives, but no: if I did that, I'd end up with only 8TB of space (I'd still need 4TB for failure, and the rest of the 12TB drive would just be wasted). Every other notable form of RAID does worse than RAID 5 in this respect, except RAID 0, which could be called RAID "I'd like to lose all my data if I lose a disk, please." It stripes data across disks with no redundancy, so it does great until you lose a disk.
In comparison, my system provides me with 32TB of storage, twice what RAID 5 provides, and it does this by giving up only one disk's worth of space (the largest disk's) for complete redundancy. The tradeoff is that my system will simply not operate without a disk when one fails. It technically could, but instead it screams at you to replace the disk, to avoid losing another disk and thus all your data. RAID 5 is in the same situation, though.
Oh, and it's faster than RAID 5.
That's why I bought a Synology. You could probably do something similar with LVM in FreeNAS, but it's not officially supported and it's not easy. I've never heard of anyone doing it successfully.
Since I can easily mix and match drives, I just snatch up cheap storage whenever I see it.
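The RAID 5 versus Synology Hybrid RAID capacity math from this section can be sketched in Python. The RAID 5 function is the standard formula; the SHR function is a simplified model of how one-disk-redundancy SHR is usually described (disks carved into layers at each distinct size, each layer protected across every disk big enough to hold it). It ignores filesystem overhead and the TB/TiB difference, so treat Synology's own RAID calculator as the authority.

```python
def raid5_capacity(disks_tb: list[float]) -> float:
    """Classic RAID 5: every disk counts as the smallest one, minus one disk for parity."""
    if len(disks_tb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return min(disks_tb) * (len(disks_tb) - 1)


def shr1_capacity(disks_tb: list[float]) -> float:
    """Simplified model of Synology Hybrid RAID with one-disk redundancy.

    Disks are sliced into layers at each distinct size. Each layer spans every
    disk large enough to hold it and gives up one disk's worth of that layer to
    redundancy (RAID 5 across 3+ disks, a mirror across 2). A layer present on
    only one disk can't be protected, so it's unusable.
    """
    sizes = sorted(disks_tb)
    usable = 0.0
    previous = 0.0
    for i, size in enumerate(sizes):
        layer = size - previous
        members = len(sizes) - i  # disks at least this large
        if layer > 0 and members >= 2:
            usable += layer * (members - 1)
        previous = size
    return usable


mine = [4, 4, 12, 12, 12]
print(sum(mine))             # 44 (TB raw)
print(raid5_capacity(mine))  # 16
print(shr1_capacity(mine))   # 32.0
```

The same functions show why skipping two of the big drives doesn't help: with [4, 4, 12], both RAID 5 and SHR give 8TB, because the extra 8TB on the lone 12TB drive has nothing to pair with.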
M.2 SSD cache
This is something that my Synology can do, but so can basically anything else on this list. I have two 128GB M.2 NVMe SSDs shoved into slots on my NAS which act as hot/warm storage for reads and writes. This makes reading and writing frequent files faster and makes up a bit for the slower speeds that would otherwise be the result of the old spinning rust on those mechanical hard drives I get for real cheap.
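To see why a small SSD cache helps, here's a toy Python model of a read cache with least-recently-used (LRU) eviction. It's an illustration under simplifying assumptions, not how Synology's cache actually decides what to keep: ReadCache and fake_hdd are hypothetical, it models reads only (not the write caching mentioned above), and "blocks" stand in for whatever unit a real cache tracks.

```python
from collections import OrderedDict
from typing import Callable


class ReadCache:
    """Toy SSD read cache in front of slow disks, using LRU eviction."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block id -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int, read_from_hdd: Callable[[int], bytes]) -> bytes:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # now the most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = read_from_hdd(block_id)  # slow path: the spinning rust
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def fake_hdd(block_id: int) -> bytes:
    return f"block {block_id}".encode()


cache = ReadCache(capacity_blocks=2)
for block in [1, 2, 1, 1, 3, 2]:
    cache.read(block, fake_hdd)
print(cache.hits, cache.misses)  # 2 4
```

The repeated reads of block 1 are served from the "SSD"; everything else falls through to the disks.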
Link aggregation
My NAS has two 1Gb ports that can be aggregated, and my Meraki MS120-8 (disclaimer: I work here, but all these opinions are my own and some of them might be satirical) can do 20Gbps on the backplane and supports LACP link aggregation. Together they let me achieve some surprising storage speeds on my local network. I'm not going to pretend it ever matches what my workstation achieves with the SSD inside it, but it's certainly more than you'd expect from HDDs, especially over a network connection.
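One caveat worth knowing about LACP: it doesn't split a single connection across both links. Each flow's addressing fields are hashed to pick one member link, so a single transfer from one client tops out at 1Gbps, while several clients or connections together can use both ports. Here's a Python sketch of that idea. pick_link is hypothetical, CRC32 stands in for whatever hash your switch or NAS actually uses, the fields that get hashed are configurable on many devices, and the IP addresses are placeholders.

```python
import zlib


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int, links: int) -> int:
    """Pick a member link for one flow by hashing its addressing fields.

    Every packet of the same flow hashes to the same link (so packets stay in
    order), which also means a single flow never exceeds one link's speed.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % links


# One big copy from a workstation is one flow, so it always uses the same 1Gb link.
print(pick_link("192.168.1.20", "192.168.1.10", 50000, 445, links=2))

# Several connections (from different clients or ports) can spread across both links.
flows = [("192.168.1.20", "192.168.1.10", port, 445) for port in range(50000, 50008)]
print(sorted({pick_link(*flow, links=2) for flow in flows}))
```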
NAS cost comparison
Right now, I store 4TB of data. Yes, with 32TB of available storage. I'm terrible (though I've got plans for the rest of the storage). Here's what it would cost in the first year to store 4TB with redundancy (using three 4TB disks).
Option | NAS price | Disk cost | First-year total |
---|---|---|---|
FreeNAS | $800 | $320 | $1120 |
QNAP | $475 | $320 | $795 |
Synology | $500 | $320 | $820 |
And the cost goes down the next year to $0!
The cost of cloud backup
Cloud backup is more expensive than you'd think. Let's look at Amazon S3 as an example. S3 has multiple tiers, and the default hot tier costs $0.023 per GB per month. Say we use rotating backups (backing up daily and keeping the last 7 days, then one backup per week beyond the most recent week and one per month beyond the most recent month), keep 20 backups in total, and 5% of the data changes between backups. Then we need 8TB of storage: the 4TB full backup plus 20 increments of 200GB each. On S3 this costs $184 a month for storage. Each daily incremental backup of 5% changes takes 390 requests, so 11.7k requests per month at $0.005 per 1,000 requests comes to around $0.06 a month. This results in a yearly cost of ~$2.2k. Retrieving a single backup to restore your data costs $0.09 per GB, so a full 4TB retrieval costs $360.
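Here's the S3 arithmetic as a small Python sketch. It uses the prices quoted above (which change over time and vary by region) and this post's assumptions: 4TB of data, 5% change per backup, 20 backups kept, 390 requests per daily backup, and 30-day months. The function name and parameters are hypothetical.

```python
GB_PER_TB = 1000  # the numbers in this post use decimal terabytes


def s3_incremental_costs(
    data_tb: float = 4,
    change_rate: float = 0.05,
    backups_kept: int = 20,
    price_per_gb_month: float = 0.023,
    requests_per_day: int = 390,
    price_per_1000_requests: float = 0.005,
    retrieval_per_gb: float = 0.09,
) -> dict[str, float]:
    """Monthly and yearly S3 cost of one full backup plus a rolling set of increments."""
    stored_tb = data_tb + backups_kept * change_rate * data_tb  # 4TB full + 20 x 200GB
    storage_month = stored_tb * GB_PER_TB * price_per_gb_month
    requests_month = requests_per_day * 30 * price_per_1000_requests / 1000
    return {
        "stored_tb": stored_tb,
        "storage_per_month": storage_month,
        "requests_per_month": requests_month,
        "per_year": 12 * (storage_month + requests_month),
        "full_restore": data_tb * GB_PER_TB * retrieval_per_gb,
    }


for name, value in s3_incremental_costs().items():
    print(f"{name}: {value:,.2f}")
# stored_tb: 8.00
# storage_per_month: 184.00
# requests_per_month: 0.06
# per_year: 2,208.70
# full_restore: 360.00
```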
You might think Glacier is a better choice, but incremental backup isn't really an option there: deleting data before the minimum storage period (3 months, to be exact) incurs an early deletion fee, so collapsing incremental backups together is a pain and a half, especially for your wallet. Storage, though, is $0.004 per GB per month, which means the same 8TB would cost $32 a month. But without a feasible way to do incremental backups, you'll have to perform full backups and keep 3 months of them around, which means your monthly cost will be $1.4k if you back up daily, and your yearly cost will be ~$17.3k.
With Glacier, if you need to restore from the cloud, you'll also pay retrieval costs: $40 for 4TB, specifically. Not bad, really. You'll get your data in a couple of hours or days.
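The Glacier numbers follow the same way. This Python sketch assumes one full 4TB backup per day, each kept for 90 days (the 3-month minimum), at $0.004 per GB per month, plus the $0.01 per GB retrieval that the $40 figure implies. As before, the function is hypothetical and the prices are the ones quoted in this post.

```python
GB_PER_TB = 1000  # decimal terabytes, as in the rest of this post


def glacier_full_backup_costs(
    data_tb: float = 4,
    retention_days: int = 90,
    price_per_gb_month: float = 0.004,
    retrieval_per_gb: float = 0.01,
) -> dict[str, float]:
    """Cost of one full backup per day, each kept for the 3-month minimum."""
    stored_tb = data_tb * retention_days  # 90 full copies of 4TB = 360TB
    per_month = stored_tb * GB_PER_TB * price_per_gb_month
    return {
        "stored_tb": stored_tb,
        "per_month": per_month,
        "per_year": 12 * per_month,
        "full_restore": data_tb * GB_PER_TB * retrieval_per_gb,
    }


for name, value in glacier_full_backup_costs().items():
    print(f"{name}: {value:,.0f}")
# stored_tb: 360
# per_month: 1,440
# per_year: 17,280
# full_restore: 40
```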
Synology has its own backup service, which I don't trust. I just don't. They're not a huge cloud provider. I'm sure they've got plenty of redundant storage, but storing my backups with the service provided by my NAS maker feels like failing to store them on two media. My gut warns me against it.
Backblaze, the company selling cloud backup to millennials, also has an S3-compatible storage service, B2. It costs less than a quarter of the price of Amazon S3 at $0.005 per GB per month, meaning its monthly cost would be $40 and its annual cost $480. Retrieving a backup costs $0.01 per GB, so $40. Less than a quarter of the price of S3 going in, the same cost as Glacier coming out, but without Glacier's wait. A pretty good deal. This is what I use right now.
Alternatives: have a friend with a NAS
With the annual cost of S3, you could just buy your friend a NAS, have them stick it in their house, and use it as your backup destination. Then your friend has a NAS. Maybe your friend already wants a NAS; then you just offer them some hard drives to offset the storage you'll use on theirs. This is much cheaper than cloud-based storage, your backups can be encrypted so your friend can't snoop on your files, and you can literally visit the NAS. You know where it is. Also, you can visit your friend. It's also a great incentive to keep that friendship strong, because you're both trusting each other to keep that data around.
Compared to Backblaze, this certainly costs more, for the first year at least. That's why I don't recommend just handing your friend a NAS as a gift. Trick them into wanting one for themselves. If they buy one because they want it, offer to trade Hyper Backup service with them, and you'll both save ~$500 a year.
Synology and QNAP both have built-in solutions for backing up to other devices from the same maker. Synology calls it Hyper Backup and QNAP calls it Hybrid Backup Sync. FreeNAS surely has something too, but I wouldn't know what it's called, and there are probably 50 ways to do it. QNAP and Synology are also capable of backing up to each other (though they won't make it easy).