Ever since the release of our open-source single binary “dfuse for EOSIO”, we have received many requests to explain the foundational architecture that it encompasses. In this video, Alex explains the genesis, the features, and the power of pitreos, our open-source backup and recovery tool for nodeos. Learn why we custom-built this incredibly fast tool, and how you can take advantage of it.
What’s up everybody, welcome back to another episode of this miniseries about dfuse architectural overview. In this episode we’re gonna cover pitreos, an often overlooked piece of software that is magnificent.
pitreos stands for “Point In Time Recovery for EOS”. The genesis story of that piece of software goes back to when we were producing on EOS mainnet. In the beginning, we were looking for an efficient backup and recovery solution because we had a producing node and we wanted to make sure it was really reliable, so we looked at all the solutions out there. Name it, we’ve checked it: tar, rsync, restic. We’ve looked at 75 of those.
But we noticed two things that are very peculiar and particular about nodeos’s largest files, mainly the state and the blocks log. The first is that the state is a sparse file. On EOS we had provisioned 68 gigs for such a file, and if we look here on the terminal, we have sample data from mainnet. If we look at what’s effectively on disk, it looks like we have 8.6 gigs, but if we look at what’s inside the state, the shared memory here, it’s 63 gigs, which is surprising. This uses the sparse file feature, which most modern file systems have. It’s a feature of the file system that says: ok, you have continuous data, but from this byte to that other byte, we know it’s all zeros.
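To make the gap between the two numbers concrete, here is a minimal Python sketch that compares a file's apparent size with the space actually allocated on disk. It is not taken from pitreos; the helper names `apparent_and_disk_size` and `make_sparse_file` are made up for illustration, and it assumes a POSIX system where `st_blocks` is reported in 512-byte units.

```python
import os


def apparent_and_disk_size(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX; it does not exist on Windows,
    # in which case this sketch reports 0 allocated bytes.
    allocated = getattr(st, "st_blocks", 0) * 512
    return st.st_size, allocated


def make_sparse_file(path, size, payload=b"data"):
    """Create a file of `size` bytes holding only a small payload at the start."""
    with open(path, "wb") as f:
        f.write(payload)
        # Extending the file leaves the tail as a hole on filesystems
        # that support sparse files.
        f.truncate(size)
```

On a filesystem with sparse file support, the allocated figure for such a file is typically far smaller than the apparent one, which is the same effect as the 8.6 gigs versus 63 gigs seen in the terminal.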
So we don’t need to use actual disk space to store a huge amount of zeros. It’s just a feature: you punch a hole in that file. But if your tool isn’t aware that these files are sparse, and that this is a hole, it will stream the whole big file, with millions and millions of worthless zeros. If your tool is aware that these are sparse files, it can use dedicated system calls to figure out where the holes are. Then you can be very efficient.
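One way to find the holes, sketched here in Python, is `lseek` with `SEEK_DATA` and `SEEK_HOLE`. This is an illustration only, not necessarily how pitreos does it; the function name `data_extents` is hypothetical, and the two seek modes are only available on some Unix systems (Linux among them).

```python
import errno
import os


def data_extents(path):
    """Return a list of (start, end) byte ranges that hold data rather than holes."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte that is backed by real data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:  # only a hole remains until EOF
                    break
                raise
            # Jump to the next hole (or end of file) after that data.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

A backup tool could then read only the returned ranges. On a filesystem without hole support, the whole file comes back as a single extent, so the result is still correct, just not faster.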
And another thing, the second aspect of nodeos’s largest files, is the append-only nature of the blocks log, which is great. Other blockchains normally use LevelDB. It’s very difficult to diff, and very difficult to treat as append-only, as it doesn’t work like that. But block logs thankfully are append-only. You can make that assumption, with certain checks.
So say you have a block log and you make a backup, then you continue working and you do another backup. Well, it’s similar to this here. This is the first file, and these are chunks of 50 megs, which is the chunk size we use in the default setting. You can change that to whatever you want. 50 megs, and then another chunk, another chunk, and at some point the end of the file is a partial 50-meg chunk.
But then if you continue on, what you can do there is you don’t need to verify, you can assume this was truncated here, and then you rewrite just that last chunk. This one, plus another incomplete 50 megs, and there you go.
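The chunk arithmetic described above can be sketched in a few lines of Python. This is an illustration of the idea rather than pitreos’s actual code: the function name `chunks_to_upload` is hypothetical, and “50 megs” is taken here to mean 50 * 1024 * 1024 bytes, which is an assumption.

```python
CHUNK_SIZE = 50 * 1024 * 1024  # assumed interpretation of the 50-meg default


def chunks_to_upload(old_size, new_size, chunk_size=CHUNK_SIZE):
    """Return the chunk indexes to rewrite for an append-only file.

    old_size is the file size at the previous backup, new_size the size now.
    """
    if new_size < old_size:
        # One of the "certain checks": a shrinking file is not append-only.
        raise ValueError("file shrank; append-only assumption does not hold")
    if new_size == old_size:
        return []  # nothing was appended
    # Full chunks from the previous backup are reused as-is. The first chunk
    # to rewrite is the previous partial chunk, or the next one if the old
    # file ended exactly on a chunk boundary.
    first_dirty = old_size // chunk_size
    total_chunks = -(-new_size // chunk_size)  # ceiling division
    return list(range(first_dirty, total_chunks))
```

For example, with a chunk size of 50, a file that grew from 120 to 170 bytes keeps chunks 0 and 1 and rewrites chunks 2 and 3: the old partial chunk plus one new incomplete chunk.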
So when you do a restore using pitreos, you’re actually going to fetch your previous backup, look at the state of the previous thing, look at what you needed to back up and just assume these are the same chunks. And let’s take a look at what it would look like if you do a backup. If you go to the terminal again, I’m in the data folder here of the node and I run pitreos. And pitreos has a few commands: backup, list (for the files and for the backups themselves), version, and then restore. So let’s try backup here and see what it looks like. You can add a tag, and add some metadata that’s going to be stored with the backup, something like the block number or whatever, for programmatic use.
So this is offered as a stand-alone command-line tool, and it works very well. But you can also use it as a library, and actually this is all bolted into manageos and the mindreader, for those who listened to the previous videos. There it is exposed as a REST call, so you can do backups and recoveries. It’s going to use that method under the hood without going through the command-line tool, all as a Go library.
So if you look here, you can do data, and by default it’s gonna store that on the local filesystem at this location. Mind you, this uses the dstore abstraction, so we have support in there for Amazon, GCP, Azure Blob Storage, and local file systems, and it also supports local caching. So you’d have another directory that acts as a local cache, which avoids reaching out for files on the network if you are often doing backups. For example, you want to restore fast to the previous state? Well, you have all the chunks locally. It’ll just go and write them back. We’ll see how that works.
So if I do a backup of this directory, I would just say “dot” and it’s gonna use those default values and store that in this directory here. Let’s try that. So I’m gonna backup, and I’m gonna time that. And we’ll do some magic to shorten the time-space. BAM! So it took 15 seconds to do a backup of something I did already before. Mind you, this is on my local NVMe disk, so it’s pretty fast. There are no network round-trips. But it still went through and looked at the previous backup, and noticed that there’s nothing to do for the blocks log, as it’s a very tiny block log I have there. But it went through the whole state file. It looked at 68 gigs and found that all of those chunks were already written, so it didn’t need to upload anything. So 15 seconds to do a backup. In production on mainnet, it would probably take a minute or so.
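The reason a repeat backup has nothing to upload can be illustrated with a small content-addressed chunk store. The Python sketch below is a simplified model, not pitreos’s implementation: the function name `backup_file`, the use of SHA-256 as the chunk key, and the one-file-per-chunk directory layout are all assumptions made for the example.

```python
import hashlib
import os


def backup_file(path, store_dir, chunk_size):
    """Back up `path` into `store_dir`; return (index, chunks_written).

    The index holds one entry per chunk: a hex digest, or None for a
    chunk that is entirely zeros (treated as a hole and not stored).
    """
    os.makedirs(store_dir, exist_ok=True)
    index, written = [], 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):
                index.append(None)  # all zeros: record a hole, store nothing
                continue
            digest = hashlib.sha256(chunk).hexdigest()
            target = os.path.join(store_dir, digest)
            if not os.path.exists(target):
                # Only chunks the store has never seen are written.
                with open(target, "wb") as out:
                    out.write(chunk)
                written += 1
            index.append(digest)
    return index, written
```

Running it twice on an unchanged file returns the same index and writes zero chunks the second time, so the file still has to be read and hashed, but nothing is transferred.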
And for a restore, let’s see what that would look like. Now I’m gonna list, so `pitreos list`. I want to recover this one, so `restore`, here in this directory, and then let’s see how long this takes, and let’s do the same trick. Okay, 14 seconds to do the restoration. Previous runs took 13, 15, 16 seconds, around that. So it’s extremely fast, because it’s optimized for those two special cases that are particular to nodeos.
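A sparse-aware restore can be sketched the same way. In the Python below, the index is a list with one entry per chunk, either a SHA-256 hex digest or None for a hole, and the store is a plain dict from digest to bytes. These shapes, and the function name `restore_file`, are assumptions for illustration and not the pitreos format.

```python
import hashlib


def restore_file(index, store, out_path, chunk_size, total_size):
    """Rebuild a file from chunks; return the number of chunks written."""
    written = 0
    with open(out_path, "wb") as f:
        # Size the file up front: everything starts out as zeros, and as a
        # hole on filesystems that support sparse files.
        f.truncate(total_size)
        for i, digest in enumerate(index):
            if digest is None:
                continue  # hole: nothing to fetch, nothing to write
            chunk = store[digest]
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError("chunk %d is corrupt" % i)
            f.seek(i * chunk_size)
            f.write(chunk)
            written += 1
    return written
```

Only the chunks that hold data are fetched and written, so the work scales with the data actually stored rather than with the provisioned size of the state file.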
So you can use that in your own setup. It’s bolted in the dfuse platform as sort of an automatic backup recovery, it’s the fastest solution we have right now. It’s faster than doing snapshots of your volume. It’s a lot more cost-effective because it’s deduplicating the chunks of 50 megs everywhere, and you can more easily have backups of your full blocks log and it’s not so heavy to continue because it’s not a full-blown backup each time.
Okay. Hope this is helpful. It’s open-source, you can use it, and if you like it, then star the repo. Go to the dfuse-eosio repo too, it’s all part of the solution, and star that one as well. If you have any comments, don’t hesitate to share them in the Telegram channel, we’re more than happy to take them. We hope you appreciate it. Okay, we’ll see you next time.