In my previous post, Blockchain DLT Considerations for Supply Chain, I laid out some important challenges and considerations for designing and planning a blockchain DLT based network for your supply chain consortium. Even though the concepts are very similar across many use cases, the supply chain context is especially challenging due to its complexity on both the business side and the technical side. One important source of that complexity is the scale of the supply chains that drive large international businesses. For example, a business may deal with thousands of suppliers and partners of different sizes, which makes a private and permissioned blockchain consortium a very complex network of participants and nodes. Therefore a very relevant and “scary” question is how to scale a blockchain DLT based network without falling into an operational nightmare and completely losing the benefits and business viability of forming and participating in such a network.
First, let’s talk about what scaling may mean for your DLT network from a business and technical perspective.
Supply chains are dynamic networks that may change frequently over time to address the business reality (for example a retailer may import fruits grown by different farmers every month depending on quality or price). For the purpose of better understanding the notion of DLT network scaling I will use as a context the following scenario:
A retailer, a farmers’ cooperative, and a processor have joined a DLT network and use it to track the fruits produced by the farmers on their way to the retailer’s shelves. The purpose of the DLT application is to provide real-time visibility into the movement of the fruits from harvest to the retailer’s shelf, so that if contamination is detected the exact producing farm can be identified and the contaminated fruits can be disposed of across all stores. Let’s assume the DLT network is private and permissioned, modeled as a consortium of 3 participating organizations (retailer, farmers’ cooperative, and processor). Every organization hosts and operates its own network nodes. The governance of the network is based on a majority policy that requires signatures from at least 2 organizations in order to add a new participant to the network.
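To make the tracing goal concrete, here is a minimal sketch of the trace-back query over ledger data. The `Movement` record, the `trace_batch` function, and the naming convention that store identifiers start with `store-` are all assumptions made for illustration; a real DLT application would read these records from the ledger through its own API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    """Hypothetical ledger record: one batch handed from one party to the next."""
    batch_id: str
    sender: str
    receiver: str


def trace_batch(ledger: list[Movement], batch_id: str) -> dict:
    """Return the originating farm and every store that received the batch."""
    moves = [m for m in ledger if m.batch_id == batch_id]
    receivers = {m.receiver for m in moves}
    # The origin is the only sender that never received the batch itself.
    origin = next(m.sender for m in moves if m.sender not in receivers)
    # Assumption: store identifiers are prefixed with "store-".
    stores = sorted(m.receiver for m in moves if m.receiver.startswith("store-"))
    return {"farm": origin, "stores": stores}


ledger = [
    Movement("batch-17", "farm-A", "processor"),
    Movement("batch-17", "processor", "retailer-dc"),
    Movement("batch-17", "retailer-dc", "store-1"),
    Movement("batch-17", "retailer-dc", "store-2"),
    Movement("batch-18", "farm-B", "processor"),
    Movement("batch-18", "processor", "retailer-dc"),
    Movement("batch-18", "retailer-dc", "store-3"),
]

# Contamination found in batch-17: which farm, and which stores must dispose of it?
result = trace_batch(ledger, "batch-17")
```

Only the stores that received the affected batch are returned, which is what allows the retailer to dispose of the contaminated fruits without clearing every shelf.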
One of the most important and challenging aspects of DLT network scaling is network governance, in particular the operation to add or remove participants. This is crucial for any supply chain, as business partners may change frequently when the business grows or, for example, has to downscale in a challenging economic climate. Let’s explore what it means to add a new participant to the consortium above.
New participant on the network
The fruit-tracing DLT application may work just fine and provide real-time information across your supply chain. This means that when contamination is detected you can easily pinpoint where the fruits came from and dispose of only the affected batches. This can shield you from huge financial losses and unnecessary food disposal. Very often the retailer stores are regulated (e.g. by a government or an industry association) and are required to provide transparency to the regulator whenever contamination has been detected. The retailer may send the information to the regulator whenever such a contamination event occurs, but the information has to be collected and prepared first. Alternatively, the retailer may decide to use the established DLT network and invite the regulator to join the supply chain consortium. This way the regulator will have the same information at the same time as every other participant on the network, with data consistency and trust guaranteed at the transaction level.
Adding the regulator to the supply chain network technically speaking is a process with the following steps:
- The regulator makes a formal request to join the consortium and sends it to the consortium participants. The request may contain technical artifacts that establish the identity of the regulator (e.g. certificates).
- The consortium parties inspect the request to make sure they may trust the provided artifacts and that those belong to the joining regulator.
- The participants jointly review the regulator’s request in order to make sure they all agree/approve to include it in the consortium on the network. They also may agree on the sequence of the steps to prepare the technical transaction update, sign it and deploy it on the network.
- The transaction update step is usually taken by someone from the consortium and consists of compiling a technical update transaction of the current DLT network permission model. As mentioned above (due to the policy), the majority of the participants will have to sign this transaction in order to be able to commit it to the network.
- The permission model update transaction is shared and signed by the current participants after a careful examination. This is an important step in order to make sure all parties sign the transaction they already agreed on. This may be done by a technical team (or a tool can be used to facilitate the process).
- Deployment may be done by a participant after all signatures are collected on the transaction update. This will finalize the update of the consortium and open the door to the regulator to join its own nodes to the network.
- The last step is to notify the regulator that the consortium is updated and the network is ready to accept the regulator nodes. From the consortium perspective, the regulator is free to bring its own nodes and manage its own part of the network.
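The signing and deployment steps above can be sketched as follows. This is a simplified in-memory model under the scenario’s assumptions (3 organizations, at least 2 signatures required); `Consortium` and `MembershipUpdate` are hypothetical names, and signatures are modeled as plain organization names rather than cryptographic signatures over the update transaction.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipUpdate:
    """Hypothetical permission model update that adds one organization."""
    new_member: str
    signatures: set[str] = field(default_factory=set)


class Consortium:
    def __init__(self, members: set[str], required_signatures: int):
        self.members = set(members)
        self.required_signatures = required_signatures

    def sign(self, update: MembershipUpdate, signer: str) -> None:
        # Only current participants may sign an update transaction.
        if signer not in self.members:
            raise PermissionError(f"{signer} is not a consortium member")
        update.signatures.add(signer)

    def commit(self, update: MembershipUpdate) -> bool:
        # Count only signatures that belong to current members.
        valid = update.signatures & self.members
        if len(valid) < self.required_signatures:
            return False  # policy not satisfied, nothing changes
        self.members.add(update.new_member)
        return True


consortium = Consortium({"retailer", "cooperative", "processor"}, required_signatures=2)
update = MembershipUpdate(new_member="regulator")

consortium.sign(update, "retailer")
first_attempt = consortium.commit(update)   # one signature: policy not met

consortium.sign(update, "processor")
second_attempt = consortium.commit(update)  # two signatures: regulator admitted
```

The same shape applies to removing a participant: only the content of the update changes, while the sign-then-commit flow under the policy stays the same.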
As the network is distributed, the aforementioned process is asynchronous and usually requires human interaction, as proper approvals must be collected on both business and technical levels. It is possible to automate and simplify most of the steps in order to avoid human errors and speed up the process. This is extremely important considering that you may need to scale your permissioned DLT network with a large number of new participants (e.g. carriers, customs, ports, etc.). Therefore this process must be well established, properly designed, and implemented upfront. It will determine the success and business value of your supply chain DLT network. Hence the key is to use proper tools that allow you to manage the network membership across multiple heterogeneous network infrastructures.
For example, you have to consider that the supply chain partners who join your DLT network will come with their own infrastructure, which may be hosted on different cloud providers and powered by different technologies. If you lock in the design of the DLT network to support only a particular cloud provider, you severely limit its scaling potential, which may render your consortium unusable for your supply chain partners. It is important to mention, though, that some consortia will benefit from a model where the network is managed and operated on a single vendor’s infrastructure using its integrated tools and services. For example, a few large businesses may benefit from using a single centralized cloud platform to bring up their nodes and have native tools to manage the network. However, such an approach may become a bottleneck when the network has to scale and include nodes that are running on a different platform.
Note that the steps for removing participants from your consortium are very similar to the ones above. Depending on the chosen DLT technology and tools, the technical steps may be different.
As I mentioned above, the best approach is to automate as many steps of the process as possible. This will simplify the scaling workflow of your supply chain and minimize the related operational cost. This is quite important considering that smaller participants may not have the capability to perform highly technical and complex tasks that are costly to execute and require specialized skills. User-friendly tools that can manage the process and integrate smoothly with the underlying DLT technology will bring huge value to any consortium.
DLT network governance means a lot more than just adding or removing participants. Depending on the DLT technology used, there are multiple technical network governance challenges, such as adjusting the blockchain block size or adjusting the governance administration policy. All of those additional governance operations are very similar to adding new participants. Usually, the difference is in the content of the network update transaction.
Often a business may be part of multiple consortia and different DLT networks. This is especially relevant for large enterprises (or regulators, government, etc.) that run different businesses and operate at a large scale. From a technical perspective, this means the business-operated DLT nodes have to handle a large volume of transactions.
Scaling network nodes
As every participant in the consortium hosts and manages its own nodes, scaling is essentially a task performed by each party individually. Scaling is a complex software concept that has multiple dimensions. An important dimension that will affect your overall transaction throughput and performance is the capability to scale resources. Note that in the DLT world your overall network performance is only as good as the worst-performing nodes and components of the network (considering the nodes that are endorsing and processing transactions).
Let’s focus on node scaling in the context of a single participant. Node scaling fits into the well-established concepts of vertical scaling (scale up) and horizontal scaling (scale out):
- Vertical scaling - add more computational power to your nodes (e.g. a CPU with higher single-core performance), add more memory, add more storage. This is not a simple task when we are handling large systems with multiple components. However, there are tools available that can manage the process and more or less abstract away the underlying complexity of replacing or enhancing the hardware. For example, you may be able to resize your file system volumes with Kubernetes, or add a new machine with much more CPU power and RAM to your Kubernetes cluster. It is much harder, though, to decide how much hardware to add in order to speed up your node’s transaction processing. You will likely need a test system to experiment and measure before performing those changes in production, and this requires experts to perform correctly (not an easy DIY kind of task).
- Horizontal scaling - increase the number of transactions your network can handle by adding more nodes and hardware (e.g. add more CPUs to benefit from the extra parallel processing capabilities and increase the number of transactions your nodes can process at a given time). The DLT network is distributed by nature, so horizontal scaling shouldn’t be a complex task. In fact, it is a good idea to scale your part of the network with multiple nodes on multiple infrastructure providers and geographical locations (data centers) to manage the risk of failures and data loss. Note that you may also implement a backup process for your nodes. However, downtime on your side may have a huge impact on the whole network operation. For example, if you run a single node, it may fail and a restore from a backup may take hours. Furthermore, when that node is part of the transaction processing, the whole network will cease operation while it is down. It is therefore very desirable that each partner on your DLT supply chain network runs multiple nodes in order to avoid those operational impacts.
The technicalities and details of scaling your nodes are not a simple matter. There is a lot more to consider and implement depending on the DLT software used and the underlying infrastructure.
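The benefit of running multiple nodes can be estimated with a back-of-the-envelope calculation. The sketch below assumes that node failures are independent and that one healthy node is enough for a participant to keep serving the network; both are simplifications, and the 99% figure is an illustrative number rather than a measurement.

```python
def combined_availability(node_availability: float, node_count: int) -> float:
    """Probability that at least one of node_count independent nodes is up."""
    probability_all_down = (1 - node_availability) ** node_count
    return 1 - probability_all_down


# Illustrative assumption: each node is available 99% of the time.
single = combined_availability(0.99, 1)  # about 0.99
triple = combined_availability(0.99, 3)  # about 0.999999
```

Under these assumptions, going from one node to three reduces the participant’s expected unavailability from roughly 1% to roughly 0.0001%. Nodes hosted in the same data center tend to fail together, which is why spreading them across providers and locations matters for the independence assumption.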
Transaction throughput and load
The volume of transactions your DLT network has to handle may increase over time. As a consortium, the network participants have to consider how the overall transaction throughput is affected by the network infrastructure. For example, your applications may face a large load at peak hours that may eventually cause failures due to exhausted resources. To know upfront what the network can handle, you may want to run tps/tpm (transactions per second/transactions per minute) network benchmarks. The key is that the overall network performance depends on the performance of multiple key components that are distributed and managed across participants. That means that if an endorsing participant on the network is running slow, it becomes your network performance bottleneck. Many things may impact the throughput, such as network latency or transaction endorsing times. When the network has to improve the transaction throughput at peak hours, every participant needs to take part in the process and properly scale their nodes, network, and infrastructure (vertically and/or horizontally).
Another point to consider is the overall network elasticity - the ability to dynamically resize (up or down) the resources needed to handle the load at a given point in time. This may not be an easy task, especially when there are multiple participants on the network running their own technology and infrastructure stacks. Even though one organization in the consortium may decide to implement an elastic scale-out process, the others may not be able to do so. Therefore a global implementation is hard to achieve without a centralized approach, which may defeat the distributed nature of your consortium network.
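The bottleneck effect can be illustrated with a deliberately simple model. The sketch assumes that every transaction must be endorsed by every listed organization and that an organization’s capacity is the sum of its nodes’ capacities; the `network_tps` function and all tps figures are hypothetical, and real throughput also depends on latency, ordering, and the endorsement policy of the chosen DLT.

```python
def network_tps(endorser_capacity: dict[str, list[float]]) -> tuple[str, float]:
    """Return the slowest organization and the throughput it imposes on the network."""
    # Assumption: an organization's capacity is the sum of its nodes' tps.
    per_org = {org: sum(nodes) for org, nodes in endorser_capacity.items()}
    # Assumption: every organization endorses every transaction,
    # so the slowest organization caps the whole network.
    bottleneck = min(per_org, key=per_org.get)
    return bottleneck, per_org[bottleneck]


capacity = {
    "retailer": [400.0, 400.0],
    "cooperative": [150.0],
    "processor": [300.0, 300.0],
}
bottleneck, tps = network_tps(capacity)

# The cooperative scales out with a second node of the same size.
capacity["cooperative"].append(150.0)
bottleneck_after, tps_after = network_tps(capacity)
```

In this model the retailer’s 800 tps is irrelevant while the cooperative can only endorse 150 tps; adding one node at the cooperative doubles the network figure, whereas adding nodes at the retailer would change nothing.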
It is clear that achieving high network throughput in a large consortium is a complex and difficult task. It is a real problem for many business processes (think of the volume of daily financial transactions). Fortunately, such needs may be met by the technology, assuming the right framework is used and the network is organized and segregated properly. It is common to think that a business consortium should implement its DLT applications using a single monolithic blockchain ledger. However, such an approach is not required. You can think of a consortium network as a set of multiple networks with proper interoperability established between them. The network nodes may serve different smaller networks to achieve much better control over the underlying infrastructure and the best possible throughput. I will cover further details of this concept in one of my future posts.
Transaction throughput is an important aspect considering high load scenarios. However, as in any software system, it is just one of the many dimensions to look at.
High availability is an important factor for your network resilience. It can be achieved by distributing your nodes across different infrastructure providers and geographical locations (the network might need to handle international transactions).
There are other areas of scaling to consider as well that are applicable for any system:
- functional scalability
- heterogeneous scalability
- administrative scalability
The above areas are important to incorporate into your network design from the very beginning as it might be hard to perform changes later on in an operational production network.
The scaling topics above are common concepts in almost any permissioned blockchain DLT network. However, as always, the key is in the details: the chosen technology, the design, and the exact implementation drive the final success.
In one of my next articles, I will talk about what network scaling may look like when the implementation is based on Hyperledger Fabric as one of the most popular choices of building permissioned and private blockchain DLT solutions.
Scaling your supply chain DLT based network is a complex task, and there are important DLT notions to focus on in addition to the well-established scaling techniques from the world of conventional application platforms.
There is still a lot to learn about DLT scaling in the context of private and permissioned consortia. DLT technology implementations are improving at a fast pace to address real-world needs. At the end of the day, however, the key best practices come out of operational production deployments.
One of the best ways to deal with high complexity is to start with simple solutions and build on top. There is no silver bullet to solve all DLT scaling challenges, and very often we start overthinking the whole story. I think the best approach is to reduce the complexity by segregating mission-critical DLT applications into smaller networks that can integrate and interoperate within a larger set of networks. The beauty of using blockchain ledger technology is that all transactions are traceable and verifiable. Therefore it is possible to split complex business transactions into small atomic technical transactions across different sets of networks that can easily verify their data quality and security across multiple ledgers. I prefer to think of a business consortium implementation as a set of multiple interconnected DLT networks running a verifiable sequence of simple business transactions. In simpler words, a business DLT is a network of networks rather than a single network that runs all business transactions with a single ledger to hold all data.