BlockTrades RocksDB proposal to Steemit
Steem nodes are costly to operate and the cost is increasing
Replacing chainbase storage with RocksDB storage to slash node operating costs
BlockTrades believes the most suitable solution to this problem is to move much of the blockchain state data that is currently stored in a database called “chainbase” into a more efficiently organized form using the RocksDB database. Chainbase was an early attempt to reduce the memory requirements of running a Steem node, but it ultimately failed because its storage mechanism was poorly tuned for storing data on a hard disk. This led most node operators to keep their chainbase on a form of “RAM disk” in order to get acceptable performance. In other words, the data still had to be held in memory for the node to operate acceptably.
Unlike chainbase, RocksDB is specifically designed for efficient storage and retrieval of data using standard hard disk technology, so it will enable this state data to be moved out of memory once and for all, resulting in reduced costs to run Steem nodes.
Why BlockTrades is the natural candidate to implement RocksDB in Steem
The BlockTrades development team has long been concerned about the increasing costs of running Steem nodes. We approached Steemit over a year ago with a design architecture for using RocksDB to solve the problem, based on our prior experience with high-performance database coding. Steemit was generally receptive to the idea, and under contract with Steemit we implemented an initial proof-of-concept that stored the data from the “account history plugin”, which carried one of the highest memory costs for the API nodes operated by Steemit. This technology was successfully deployed on Steemit’s production servers and has allowed Steemit API servers to continue operating on high-end cloud servers.
Note, however, that this was just a proof-of-concept: by writing special-case code to move the account history data, it showed that RocksDB met the performance requirements for moving state data from memory to disk storage. It did not reduce the memory requirements of the “consensus” state data, which is what forces witnesses to run nodes with a large amount of memory. We therefore proposed a second-stage project in which we would develop a generic way to move Steem state data (stored in C++ data structures called multi-indexes) onto disk storage using RocksDB, then use this technology to move most if not all of the state data onto disk.
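A minimal sketch of what mapping a multi-index onto an ordered key-value store might look like. It is an illustration under stated assumptions, not the design described in this proposal: `KvStore` is a `std::map` standing in for an on-disk store such as RocksDB, and `Account`, `AccountIndex`, and the key prefixes are hypothetical names invented for the example. The primary index stores the full record under an id key, and the secondary index stores only a pointer back to the id.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Assumption: an ordered in-memory map stands in for an on-disk key-value store.
using KvStore = std::map<std::string, std::string>;

// Hypothetical state object; real Steem objects carry many more fields.
struct Account {
    std::uint64_t id;
    std::string name;
    std::uint64_t balance;
};

// Fixed-width big-endian encoding, so byte-wise key order matches numeric order.
std::string encode_u64(std::uint64_t v) {
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(v & 0xff);
        v >>= 8;
    }
    return out;
}

std::uint64_t decode_u64(const std::string& s) {
    assert(s.size() >= 8);
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | static_cast<unsigned char>(s[i]);
    return v;
}

// Two indexes (by id, by name) over one store, kept consistent on every write.
class AccountIndex {
public:
    explicit AccountIndex(KvStore& kv) : kv_(kv) {}

    void put(const Account& a) {
        // Drop the stale secondary entry if this id was stored under another name.
        if (auto old = get_by_id(a.id)) kv_.erase(name_key(old->name));
        kv_[id_key(a.id)] = encode_u64(a.balance) + a.name;  // primary: full record
        kv_[name_key(a.name)] = encode_u64(a.id);            // secondary: name -> id
    }

    std::optional<Account> get_by_id(std::uint64_t id) const {
        auto it = kv_.find(id_key(id));
        if (it == kv_.end()) return std::nullopt;
        const std::string& v = it->second;
        return Account{id, v.substr(8), decode_u64(v)};
    }

    std::optional<Account> get_by_name(const std::string& name) const {
        auto it = kv_.find(name_key(name));
        if (it == kv_.end()) return std::nullopt;
        return get_by_id(decode_u64(it->second));
    }

private:
    static std::string id_key(std::uint64_t id) { return "a/id/" + encode_u64(id); }
    static std::string name_key(const std::string& n) { return "a/name/" + n; }
    KvStore& kv_;
};
```

In this sketch a lookup by name costs two reads (name to id, then id to record), which trades some read speed for keeping only one copy of each record. A real implementation would also need serialization for arbitrary object types, range iteration, and undo support, none of which is shown here.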
An important part of this latter proposal was to also develop an extensive test framework to verify that the functional operation and performance of the Steem node were not degraded as we progressively moved more and more of the state data onto disk storage. At the time, Steemit rejected that proposal: they decided that enough memory optimization had been done for their immediate needs, and they planned to move all the non-consensus data out of the nodes into a separate database (variously SBDS and Hivemind).
If Hivemind is going to reduce memory costs, why is RocksDB needed?
Hivemind will move “non-consensus” state data out of the API nodes operated by Steemit that are used to serve data to steemit.com and other web sites and user interfaces like busy.org, steempeak, partico, appics, etc. But it won’t reduce the amount of memory required to operate the most important nodes required to keep Steem a decentralized network: witness nodes. RocksDB, however, will allow the consensus state data to be moved to disk, allowing witnesses to operate nodes at reasonable cost now and in the future as the blockchain’s activity level increases.
Why re-propose this now?
Steemit has recently expressed a renewed interest in reducing the costs of operating Steem nodes. At the same time, they’ve reduced the number of employees able to complete various Steem-related projects. We at BlockTrades believe that we can provide the expertise to complete the RocksDB work at an efficient cost and free up Steemit’s blockchain programmers to focus on other high-priority tasks like the implementation of SMTs.
What are we proposing?
Our proposal is to analyze the key data structures currently consuming memory in consensus nodes (i.e. witness nodes) and then move some of this data into RocksDB storage using a generic solution we will develop. We will then thoroughly test the resulting version of the node against existing chainbase nodes to ensure that the replacement will be seamless. The generic solution we develop can then be used to move more state data out to disk as required in the future, and the regression test system we develop will be reusable as well. We believe we can perform this work in 2-3 months with a six-person team for a fixed cost of $250K.
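The comparison between a new node and an existing chainbase node can be pictured as a differential test: replay the same operations against both and flag the first point where their observable results differ. The sketch below is only an illustration of that idea, not the regression test system described in this proposal. `Op`, `Store`, `DropsErases`, and `first_divergence` are hypothetical names, and simple in-memory maps stand in for the two node implementations.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical operation replayed against both backends.
struct Op {
    enum class Kind { Put, Erase, Get } kind;
    std::string key;
    std::string value;  // used only by Put
};

// Adapter over any map-like container. In a real comparison the two backends
// would be full nodes, not containers; this is an assumption for illustration.
template <typename Map>
class Store {
public:
    // Returns the observable result of the operation.
    std::optional<std::string> apply(const Op& op) {
        switch (op.kind) {
        case Op::Kind::Put:
            data_[op.key] = op.value;
            return op.value;
        case Op::Kind::Erase:
            data_.erase(op.key);
            return std::nullopt;
        case Op::Kind::Get: {
            auto it = data_.find(op.key);
            if (it == data_.end()) return std::nullopt;
            return it->second;
        }
        }
        return std::nullopt;
    }

private:
    Map data_;
};

// Deliberately buggy backend, present only to show the harness catching a regression.
struct DropsErases {
    std::optional<std::string> apply(const Op& op) {
        if (op.kind == Op::Kind::Erase) return std::nullopt;  // bug: erase is ignored
        return inner.apply(op);
    }
    Store<std::map<std::string, std::string>> inner;
};

// Replays ops on both backends; returns the index of the first operation whose
// results differ, or std::nullopt if the backends agree on every operation.
template <typename A, typename B>
std::optional<std::size_t> first_divergence(A& reference, B& candidate,
                                            const std::vector<Op>& ops) {
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (reference.apply(ops[i]) != candidate.apply(ops[i])) return i;
    }
    return std::nullopt;
}
```

Reporting the index of the first mismatch, rather than a plain pass or fail, makes it easier to trace a regression back to the operation that caused it. Note that the ignored erase in `DropsErases` only surfaces on a later `Get`, so a useful operation log has to read back what it wrote.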
If this is an offer to Steemit, isn’t this better done directly to Steemit?
We have approached Steemit with this offer and they are considering it, but Ned asked that I also share this information with the whole Steem community and see how the idea is received. I believe this could be a great first step toward expanding development of Steem’s blockchain code beyond Steem’s core team and will enable Steem to progress faster than ever before, so please let us know in the comments how you view this project proposal.