Ever since the release of our open source single binary “dfuse for EOSIO“, we have received many requests to explain the foundational architecture that it encompasses. In this video, Alex reviews bstream – the block stream which is the low-level component of dfuse where all information passes through protobuf definitions to build dfuse block objects to be consumed.

 

Transcript:

What’s up everyone and welcome to another video in this series dfuse architecture overview, and today, we’re going to use the whiteboard. 

In this video, we’re going to cover actually, not just the whiteboard, but the bstream package. The bstream package is the low-level component of the dfuse platform. This one’s going to be a little deep a little more technical, I mean feel free to skip to the next video if you find this is more underpinning. But it might be useful for understanding the breadth and depth of the dfuse platform. I think it’s a very nice piece and certainly has been rewritten, three, four times. There’s a lot of wisdom and insight in there. 

Ok, so let’s get it on. I’m gonna show here, a small sample of that thing being used and we’re going to dive into eosws, if you notice dfuseeos, that’s the repository, dfuse-eosio and then in the eosws repository. This is a system that manages the WebSocket and this little piece of code we’re looking at is handling the getActionTraces call and we’re gonna slowly build the thing I’ve designed here, and see how these things get inside together. 

I want to start for just a second by showing you the interface, what does it mean to be bstream? This is the bstream repository here, on GitHub. The interface it has provided looks like this, this is the building block of the dfuse platform in a way that is similar to those middleware systems you had everywhere in every programming language. Often in web stack you had an incoming request, you can put a middleware and then it’s gonna call your handler. You can create trees, routers and all sorts of components. We’re using the same sort of thing, very similar to those who are knowledgeable about Go, the Net-HTTP library, but in this case, what is going through is blocks. Block information in the format that we’ve listed in the previous video. You can watch it up here, and they are in native Go, and they can attach, with that little thing here, you can attach information. For example, you can attach in the case of the search engine you’ll have an index of the content of that block ready to be queried. Or you can bring about other information about the block as it goes and flows through the pipe you can sort of build some object as they come through. And the block object in here is the protobuf object, protobuf scaffolded object. 

Ok, so, I want to introduce a few components that are in there, that are building blocks for the whole platform. The first one is the fileSource. The fileSource you provide one with the storage location – based on dstore which we’ve seen in a previous video – and it is going to start feeding blocks, unpacking them, and then feeding them down the pipe. Simple; turn around, give it a starting block and it’s going to start flowing. You don’t have a guarantee of which block is going to, if you ask for block 75 it’s rounded to the block 0 and you’re going to have those 75 because the file needs to be read anyway, it’s going to be piped through, maybe you want to have a gate. Which we’ll see right after, to say I don’t want to pass through blocks unless it’s that ID. Or it’s above that block number, and you can craft that. And that’s what happens in this design in the example shown. 

The lower level thing is the liveSource. The liveSource, you remember from the mindreader videos, turn around and you have the gRPC server called the blockstream interface and if you call that you’re going to get live blocks. They arrive in the same way and it’s a protobuf block and then you can start feeding that to someone who needs it. But most of the time what you’ll want is something to say start at block 10 million, and in that case, and if your live node is at a 100 million, then you’ll need to turn to fileSource and then stream everything. That’s the joiningSource. You give it a fileSource factory and a liveSource factory and it’s going to then transparently pass from one to the other and understand and try the livesource, if it’s too far it’s going to stop it, and all that. So you get one stream. Wherever is the source it’s going to join them together and feed you an insanely fast series of blocks. It can also parallelize fileSource downloading so they all arrive and unpack in parallel so they stream like crazy, it’s pretty fun. 

The other thing here I wanted to introduce is the subscriptionHub. The subscriptionHub is a simple, in-process fan-out system. Similar to the relayer we’ll see later, but it connects on one side to what we’ll call an internal source. Somehow it gets blocks from whatever and then it offers that as a liveSource locally. So if you have 75 WebSocket connections you don’t connect 75 times to the network. You’ll get one from the network and the rest is just going to be fanned out internally. It’s a local hub for internal in-go process subscription. You get a block, and then everyone is listening on that thing, they’ll get a copy of that block, not a copy it’s just actually a reference, and then can filter it according to their needs and then send it through WebSocket which is what we’re going to see right now. And I’ll show you forkable right after. 

I’m in the action here the function called getActionTraces through the WebSocket you can see in our docs. Then we’re going to do a few manglings, and we’re creating the handler here. What’s going to happen when a block flows in? That’s the handler that we’re going to keep at the end. And we’re going to get the block, going to get a native block for EOS, we’re going to filter through it, if it’s not executed we don’t want to filter it. Going through all the actions, if it’s the right receiver, if it’s the right account that you’re matching… It’s filtering and then we’re converting that to something that makes sense for you before we actually output it to the WebSocket stream. If he asked “I want db operations” we’re going to add that to the outgoing data streams and then it’s sent through the WebSocket. 

So that’s just a simple handler, this actually represents this box here where you’re actually receiving whatever, and it’s going to be configured after, and you’re going to filter according to your needs, and this is the end of the stream, you’re responding through WebSocket. It doesn’t have another handler lower than that, ok? But before that, as you’ll see, we’ll have a gate. Because maybe we want to start at a given place. See this one here. Let me show you that here. 

So you want to have a gate because the user requested to start at a given block height, so you don’t want to send anything before he asked at this point. So you put a gate, so whatever you’re receiving from a fileSource or whatever you make sure from your perspective that it’s gated and it’s configured with the user’s demand, the gate that we see here. But also before that, if you’ve asked withProgress, there’s an option in the WebSocket stream, we’ll actually decorate. We’re going to create another handler. We’re going to replace our handler, it’s actually going to insert optionally, let me show you here, optionally a box. It’s gonna unsync, *woop* like that and like that, and then this is gonna be the progress handler. 

And this one, what it does, it has a reference to the WebSocket stream and when it sends blocks flowing through, independent of filtering which happened after, it will say ‘I have seen block, x, y, z’ and it’s going to send periodically messages through the WebSocket again, so on the side. It’s decorating and creating a pipe, a configured pipe. 

So same thing here if we’re wanting an irreversible. We’ll have a handler and again replace our handler with the decorated, we’re passing our current handler preconfigured, and we’re adding a new box here, on these lines. If we ask for irreversibleOnly, then the forkable needs to be put there with a filter of irreversibleOnly. Let me give you an example of the forkable right after. And then we replace our handler there for configuring another box in between here saying ‘Ok, this is IRR only’, and then we link back to filter. So we don’t even lose time filtering things that are not irreversible and we stitch things together like that. 

And now we see if we look in the code again, we start from the subscriptionHub to get something that we configured. We get a block gate, this is target join thing, and in this one, this source will actually ask for the irreversible block at the beginning. Asking to start at this block so we can have all the data we need to have a reversible segment and navigate the forks, navigate the reorganizations of the chain. So that’s why you have those two boundaries. And then what we do, we simply kickstart the source here. So registered whatever, we’re sending a small message saying we’re listening and we kickstart the source and form that point on, the blocks are gonna flow from here to there, all through filter and pipe that to the message. It’s just the way it’s laid out, it’s fun, it’s building blocks in there, each very isolated and their purpose, they can be composed very well. So, I appreciate that. 

One other thing about the subcriptionHub which I can show you also here, the subcriptionHub itself has a quite complex handler. So this is how we launched subcriptionHub, over here where you see we’re in a bstream hub. And we have a series of subscribers down there and we have a factory for fileSource and a factory for liveSource so we know how to kickstart that. It’ll be, sort of pre-configured with the location of the block storage and the whatever, and the IFP address of the service, the gRPC services. It can always start with new parameters. 

In here, you’ll see that we’re configuring some wonky things with the source from the referenceFactory and then we’ll have a block gate and it needs to be very precise so we have fileSource factory and these are passed to the joiningSource because it starts with the factory and then we wrap all of that in an eternal source and what that means is this thing is never gonna stop. And it’s gonna have all the guarantees so that it always continues on from where it left off, even though there’s network disconnections, even though it’s waiting for files, even though there’s no livesource. It’s gonna then watch only files and retry the live, but the eternal source is always going to never let you down somehow. 

Then at this point, you get that guarantee. But all these things behind the scenes, you know the sub is always gonna continue and all the streams that are waiting, the WebSocket streams that are waiting, are going to be fed eventually when things settle up again. 

So this first part was about the first part of bstream, on the next video I will introduce the forkable – or forkdb – which is another feature of the bstream. Feel free to skip these if you want to go to the higher level stuff. Now you can imagine that all of our systems use that, they’re all composed like that, the search indexer has it’s configured thing like that. With different guarantees at every component. This builds an index and so eosws, anything that feeds blocks is built using that infrastructure. 

So in the next video we will see that forkable and the forkdb implementation that lives inside bstream and also later on we’ll see about the high availability and topology. So hopefully this is helpful, if you have comments please don’t hesitate, go in the channel, and star that repo! Hey, thank you.