“Got Your Back”

Back-ups are one of those things that everyone knows they need, but few put much time or effort into setting up and maintaining properly. My previous safety net was CrashPlan, which is exiting the consumer back-up market. That left me needing to find a cloud provider that supports large server backups from Linux at a consumer price.

I looked at Amazon Cloud Storage ($60 per TB per year), Google ($240 per TB per year) and Backblaze B2 ($60 per TB per year). I didn't consider Azure, given my Linux infrastructure. While Amazon may seem the safer bet on the surface, I found their EC2 pricing unnecessarily confusing and opaque, and potentially a “runaway” cost, as everything has a price per unit. This led me to believe their consumer cloud pricing may just be a transient offer on the way to per-byte, per-second billing of computing, storage and networking services. I needed something that had been around for a while, with simple pricing and a focused offering. Backblaze B2 fits those criteria.

Backblaze has a simple pricing model: you pay for the storage you use, calculated as a daily average, at half a cent per GB per month, with the first 10 GB free (that's GB, not GiB). There are a few extras, such as downloads at 2c per GB and transaction costs, but I discuss those later in the article, and I have largely mitigated them.
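
To make the pricing concrete, here is a small Python sketch of a monthly bill using only the figures quoted above: half a cent per GB per month for storage (averaged over daily samples), the first 10 GB free, and 2c per GB downloaded. Transaction charges are left out, and prices may have changed since writing, so treat the constants as assumptions to check against Backblaze's current price list.

```python
# Figures quoted in this article; check them against Backblaze's current price list.
FREE_STORAGE_GB = 10               # decimal GB, not GiB
STORAGE_USD_PER_GB_MONTH = 0.005   # half a cent per GB per month
DOWNLOAD_USD_PER_GB = 0.02         # 2c per GB downloaded


def average_stored_gb(daily_samples_gb: list[float]) -> float:
    """Average the amount stored across the days of the billing month."""
    if not daily_samples_gb:
        return 0.0
    return sum(daily_samples_gb) / len(daily_samples_gb)


def monthly_bill_usd(avg_stored_gb: float, downloaded_gb: float = 0.0) -> float:
    """Storage above the free allowance plus downloads; transactions are ignored."""
    billable_gb = max(0.0, avg_stored_gb - FREE_STORAGE_GB)
    return billable_gb * STORAGE_USD_PER_GB_MONTH + downloaded_gb * DOWNLOAD_USD_PER_GB


if __name__ == "__main__":
    # A month holding a steady 700 GB, with no restores: prints $3.45
    print(f"${monthly_bill_usd(average_stored_gb([700.0] * 30)):.2f}")
    # The same month with all 700 GB downloaded for a restore: prints $17.45
    print(f"${monthly_bill_usd(700.0, downloaded_gb=700.0):.2f}")
```

The second case shows why download costs matter: at these rates, pulling the whole 700 GB back once costs roughly four times the month's storage.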

They have a great pricing calculator page that gives you some confidence you're not going to be hit with a big bill at the end of the month. I tested B2 with Duplicati, a cross-platform backup tool I was evaluating, and it seemed OK for uploads under the free 10 GB limit. I read their blog, looked at their open-source Storage Pods, read the reviews, and decided this was the cloud storage for me.

I returned to evaluating backup software, as I was not completely happy with Duplicati given its dependence on the complex Mono stack on Linux and its web-only interface. I wanted something I could script. Perhaps surprisingly, Backblaze came to the rescue, as they have a page listing integrations with their service. Restic jumped out as a modern, scriptable backup engine that is cross-platform by virtue of being written in Go. Most importantly, it generates AES-encrypted backups with tamper detection.

So I uploaded 700 GB of data to B2 as a test. It took a long time. A really long time: over 4 days. I measured the backup at about 0.5 MiB/s. Not great. Worse, Restic made a lot of transaction API calls: over 350,000 calls for 120,000 files. That's not hugely expensive, around 15c, but it's disproportionate to the storage costs, given that uploads to B2 are supposed to be free. I discovered a caching switch added in the latest Restic version, but it helped only marginally, taking about 100,000 calls off the Class B transactions.
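
The 15c figure can be roughly reproduced. The sketch below assumes a Class B price of $0.004 per 10,000 calls (what Backblaze listed around the time of writing; check the current price list) and treats every call as Class B, which simplifies the real mix of call types. A free allowance, if one applies, can be passed as free_calls.

```python
# Assumption: Class B calls priced at $0.004 per 10,000, as Backblaze listed
# them when this was written; check the current price list before relying on it.
CLASS_B_USD_PER_10K_CALLS = 0.004


def class_b_cost_usd(calls: int, free_calls: int = 0) -> float:
    """Cost of `calls` Class B transactions after any free allowance."""
    billable = max(0, calls - free_calls)
    return billable / 10_000 * CLASS_B_USD_PER_10K_CALLS


if __name__ == "__main__":
    before = class_b_cost_usd(350_000)           # every call treated as Class B
    after = class_b_cost_usd(350_000 - 100_000)  # with local caching enabled
    # prints: without cache: $0.14, with cache: $0.10
    print(f"without cache: ${before:.2f}, with cache: ${after:.2f}")
```

The absolute amounts are small; the point is that transaction charges scale with the number of files and calls rather than with the data stored, which is why they can end up out of proportion to the storage bill.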

Storage integrity checks, repairs and restores were slow (12 to 24 hours) and used lots of transactions, even with local caching turned on. Time to think of an alternative. I didn't want to drop Restic, as everything else about this solution fitted my needs, but maybe there was a better way to get the data into the cloud.

Enter another B2 integration, RClone, which is rsync for the cloud. It was much faster, saturating my upload bandwidth and cutting upload times to an eighth of what they were. The downside is that it is not a backup engine by itself, and doesn't pretend to be. However, combined with a backup engine, it gives great results.

This means I use Restic to back up all my PCs and servers to a local mirror, and RClone to copy those files to the cloud. It might seem wasteful to keep a local copy when everything is in the cloud, but multiple copies are a good thing, and as long as I don't lose my local infrastructure, I can restore files from local servers without paying for downloads.
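
A minimal Python sketch of the job this describes: Restic backs up to a local repository, restic check verifies it, and rclone sync mirrors the repository to B2. The repository path, password file, bucket and the rclone remote name "b2" are hypothetical placeholders; the remote has to be created first with rclone config. By default the sketch only prints the commands.

```python
import subprocess
from pathlib import Path

# Hypothetical locations and names; substitute your own.
LOCAL_REPO = Path("/srv/backup/restic-repo")
PASSWORD_FILE = Path("/root/.config/restic/password")
# "b2" is an rclone remote created beforehand with `rclone config`;
# the bucket and path are placeholders.
CLOUD_DEST = "b2:example-backup-bucket/restic-repo"


def restic_cmd(repo: Path, password_file: Path, *args: str) -> list[str]:
    """Build a restic command line against the given repository."""
    return ["restic", "-r", str(repo), "--password-file", str(password_file), *args]


def rclone_sync_cmd(source: Path, dest: str, transfers: int = 8) -> list[str]:
    """Make the remote match the local repository, deletions included."""
    return ["rclone", "sync", str(source), dest,
            "--transfers", str(transfers), "--fast-list"]


def backup_steps(paths: list[str]) -> list[list[str]]:
    return [
        restic_cmd(LOCAL_REPO, PASSWORD_FILE, "backup", *paths),  # 1. back up locally
        restic_cmd(LOCAL_REPO, PASSWORD_FILE, "check"),           # 2. verify the local repo
        rclone_sync_cmd(LOCAL_REPO, CLOUD_DEST),                  # 3. push it to B2
    ]


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises on failure, so a failed backup or check
            # stops the job before anything is synced to the cloud.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(backup_steps(["/home", "/etc"]))
```

rclone sync makes the remote match the local repository, so files that Restic prunes locally are also removed from the remote copy; rclone copy would leave them in place at the cost of extra storage. The --fast-list flag asks rclone to list the bucket in fewer calls, which helps keep transaction counts down.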

I can check and verify my local Restic repository, then compare the SHA1 hash of each local file with the hash Backblaze calculated for the remote copy. That confirms the remote files are correct without requiring a full download.
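
A minimal sketch of that comparison in Python. It hashes every file in the local repository and compares the digests with the "HASH  PATH" listing that rclone sha1sum prints for a remote, which reports the hashes the storage provider has recorded. The remote path in the usage line is a placeholder. rclone check performs a similar hash comparison in a single command; the sketch just makes the mechanics visible.

```python
import hashlib
import subprocess
from pathlib import Path


def local_sha1s(root: Path) -> dict[str, str]:
    """SHA1 of every file under root, keyed by '/'-separated relative path."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha1()
            with path.open("rb") as f:
                # Hash in 1 MiB chunks so large pack files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def parse_sha1sum_listing(text: str) -> dict[str, str]:
    """Parse 'HASH  PATH' lines (sha1sum-style output) into {path: hash}."""
    digests = {}
    for line in text.splitlines():
        if line.strip():
            digest, _, name = line.partition("  ")
            digests[name] = digest.strip().lower()
    return digests


def remote_sha1s(remote: str) -> dict[str, str]:
    """Ask rclone for the SHA1 hashes recorded for files on the remote."""
    out = subprocess.run(["rclone", "sha1sum", remote],
                         capture_output=True, text=True, check=True).stdout
    return parse_sha1sum_listing(out)


def compare(local: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing on either side and files whose hashes differ."""
    return {
        "missing_remote": sorted(local.keys() - remote.keys()),
        "extra_remote": sorted(remote.keys() - local.keys()),
        "mismatched": sorted(k for k in local.keys() & remote.keys()
                             if local[k] != remote[k]),
    }
```

Usage, with placeholder paths: compare(local_sha1s(Path("/srv/backup/restic-repo")), remote_sha1s("b2:example-backup-bucket/restic-repo")). Empty lists in all three entries mean every local file has a matching copy in the cloud.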

While I can restore directly from Backblaze B2, the most effective approach is to use RClone to copy the files back to a local disk and restore from that local mirror. The mirror may be my current infrastructure if I'm just restoring files, or completely different systems in the event of a catastrophic failure.
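
The restore path can be sketched the same way in Python, assuming a hypothetical rclone remote named "b2", a placeholder bucket, and a recovery disk mounted at /mnt/recovery. The repository password has to be available on the recovery machine; without it the encrypted backup is unreadable.

```python
from pathlib import Path


def restore_steps(remote_repo: str, staging_repo: Path, password_file: Path,
                  target: Path, snapshot: str = "latest") -> list[list[str]]:
    """Commands to pull the repository back from B2, check it, then restore from it."""
    restic = ["restic", "-r", str(staging_repo), "--password-file", str(password_file)]
    return [
        # 1. Copy the encrypted repository from the cloud to local disk.
        ["rclone", "copy", remote_repo, str(staging_repo)],
        # 2. Make sure the downloaded repository is intact before trusting it.
        restic + ["check"],
        # 3. Restore the chosen snapshot into the target directory.
        restic + ["restore", snapshot, "--target", str(target)],
    ]


if __name__ == "__main__":
    steps = restore_steps("b2:example-backup-bucket/restic-repo",
                          Path("/mnt/recovery/restic-repo"),
                          Path("/mnt/recovery/restic-password"),
                          Path("/mnt/recovery/files"))
    for cmd in steps:
        # Print only; pass each cmd to subprocess.run(cmd, check=True) to execute.
        print(" ".join(cmd))
```

Once the repository is on local disk, further restores read from it directly, so only the single rclone copy incurs download charges.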

My back-ups are completely automated, running on a schedule, or on power-up for the household laptops. For Linux, I can use Restic directly; for Windows, I'll stick with Duplicati and GPG-encrypted volumes. I opted not to use the File History feature in Windows 10 because if “Windows” has access to the history archives through a Samba share, then the backups are vulnerable to hacking or corruption.

While this has taken me away from my CPC2 project, the loss of my project files would have been a much bigger delay.



