Golem Architecture

So, how exactly does Golem work? To get an idea, let’s walk through how a computing task is processed.

Everything begins when a requestor needs to compute a task using the Golem network. It may be a CGI artist who has just finished working on an animation and wants to render it in high quality or a data scientist who wants to train her machine learning algorithm.

If the requestor’s task belongs to a class already implemented in Golem, she may use one of the task templates from the task collection. A task template contains the full computational logic: it bundles the source code to be run and knows how to split the task into subtasks for different nodes, how to verify the results, and how to merge them into the final output. Right now, the task collection is just a set of available task templates (Blender and Luxrender rendering in Brass Golem), but later on it will transform into Golem Shop, where users will be able to add and download new use cases.
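
As a rough mental model, a task template can be pictured as an object with three responsibilities: splitting, verification and merging. The sketch below is purely illustrative; the class and method names are made up for this post and are not Golem’s actual API.

```python
from abc import ABC, abstractmethod


class TaskTemplate(ABC):
    """Simplified sketch of a task template; names are illustrative only."""

    @abstractmethod
    def split(self, task_definition, num_subtasks):
        """Divide the task into independent subtasks for different nodes."""

    @abstractmethod
    def verify(self, subtask, result):
        """Decide whether a result returned by a provider is acceptable."""

    @abstractmethod
    def merge(self, results):
        """Combine the verified subtask results into the final output."""
```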

If the user’s task is something more specific that isn’t available in the task collection, she will have to write her own code using the task definition framework. In Brass and Clay Golem, only the closed task collection will be available (effectively limiting the use cases to those preprogrammed by us). The task definition framework will be introduced in Stone Golem. It is a generic template providing a public interface which allows users to create new types of tasks by implementing their own task templates. These can be stored locally and used exclusively by their creator, or they can be submitted to Golem Shop.
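
With the task definition framework in place, adding a new use case will roughly mean implementing such an interface for your own workload. A toy example, again with made-up names and not an actual Golem task type:

```python
class WordCountTask(TaskTemplate):
    """Toy custom task: count words in a large collection of text files."""

    def split(self, task_definition, num_subtasks):
        files = task_definition["files"]
        step = max(1, len(files) // num_subtasks)
        return [{"files": files[i:i + step]} for i in range(0, len(files), step)]

    def verify(self, subtask, result):
        # A cheap sanity check; a real template could recount a sampled file.
        return isinstance(result, dict) and all(n >= 0 for n in result.values())

    def merge(self, results):
        totals = {}
        for counts in results:
            for word, n in counts.items():
                totals[word] = totals.get(word, 0) + n
        return totals
```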

Once the requestor chooses a task template and defines a new task, it is added to the task manager, which keeps track of all tasks requested from this node. Information about the new task is then broadcast to the Golem network.
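
Conceptually, the broadcast only needs to carry enough information for providers to decide whether bidding is worthwhile. A minimal announcement could look something like this (the fields are hypothetical and chosen for illustration only):

```python
from dataclasses import dataclass


@dataclass
class TaskAnnouncement:
    """Hypothetical shape of a broadcast task header."""
    task_id: str
    requestor_id: str
    task_type: str      # e.g. "blender_render"
    max_price: int      # highest price the requestor will pay, in the smallest unit
    deadline: float     # unix timestamp by which results are needed
    min_ram_gb: int
    min_cpu_cores: int
```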

The provider’s transaction system collects the task offers broadcast by requestors and chooses the best ones. It checks the reputation of each requestor in the reputation system and rejects offers from nodes with poor reputation. The provider’s Golem then connects to the requestor’s node and submits an offer with a price and information about the machine’s capabilities (performance, available cores, etc.). The requestor’s transaction system in turn checks the provider’s reputation and rejects offers from providers with poor reputation. If the task manager decides that there is a subtask fitting one of the submitted offers, a list of resources to be downloaded via IPFS is sent to the provider.
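
Both sides of this negotiation apply the same kind of reputation filter. The sketch below shows the idea; the threshold and the shape of the announcement and offer objects are assumptions made for the example:

```python
REPUTATION_THRESHOLD = 0.0  # assumed cutoff for the local rank


def acceptable_tasks(announcements, requestor_rank):
    """Provider side: keep only tasks from requestors with a good local reputation."""
    return [a for a in announcements
            if requestor_rank.get(a.requestor_id, 0.0) >= REPUTATION_THRESHOLD]


def acceptable_offers(offers, provider_rank):
    """Requestor side: keep only offers from providers with a good local reputation."""
    return [o for o in offers
            if provider_rank.get(o.provider_id, 0.0) >= REPUTATION_THRESHOLD]
```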

After all resources are pulled onto the provider’s machine, the task computer (on the provider’s machine) can finally start computing.

The task computer is responsible for running and managing the computation. It can start a Docker container in which the task’s source code is executed, and it also checks for errors and timeouts. In the future, it may run the code in a virtual machine (slow but secure) or directly on the host machine (fast but insecure). Of course, the latter option will only be used in very specific scenarios, and providers will only run code from well-checked, secure tasks from the task collection.
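
A bare-bones sketch of the Docker path, with error and timeout handling, could look like this (the image name, mount point and result format are assumptions for illustration):

```python
import subprocess


def run_subtask_in_docker(image, workdir, timeout_s):
    """Run a subtask inside a container and report success, error, or timeout."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/golem/work",  # expose the downloaded resources to the container
        image,
    ]
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        status = "ok" if proc.returncode == 0 else "error"
        return {"status": status, "logs": proc.stdout.decode() + proc.stderr.decode()}
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "logs": ""}
```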

Once the computation is finished, the results and logs are sent back to the requestor via the IPFS network. The task manager passes them to the task template for verification, which can differ for every task type. The most common approach is based on redundancy: some subtasks may be sent to more than one node (the requestor may decide how often and to how many) and their results compared. In some cases this won’t be necessary because verification is simple, for instance if the task itself is a Proof of Work of some kind. In other cases, the requestor may be able to compute a small, random part of the task and compare it with the returned result. For example, in the case of rendering it is possible to render a few pixels locally and compare their colour with those in the received image. Sometimes a combination of these methods may work best.
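
For rendering, the spot-check idea can be sketched as follows. Here `render_pixel` stands for a local renderer callback, and the sample count and colour tolerance are assumptions of the example rather than Golem’s actual verifier:

```python
import random

from PIL import Image  # Pillow


def spot_check(received_path, render_pixel, width, height, samples=50, tolerance=3):
    """Verify a rendered frame by re-rendering a few random pixels locally."""
    img = Image.open(received_path).convert("RGB")
    for _ in range(samples):
        x, y = random.randrange(width), random.randrange(height)
        expected = render_pixel(x, y)   # locally rendered (R, G, B)
        actual = img.getpixel((x, y))   # colour reported by the provider
        if any(abs(a - e) > tolerance for a, e in zip(actual, expected)):
            return False  # mismatch beyond tolerance: reject the result
    return True
```

A small tolerance is useful because renderers are not always bit-exact across different hardware.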

When the result passes the verification stage, the payment system is notified of a new payment due and sends the right amount of ether to the Ethereum contract. Check our previous post to see how it’s implemented and why we’ve decided to use Ethereum. Simultaneously, the provider’s reputation rises in the requestor’s reputation system (and falls if the result doesn’t pass verification). The reputation penalty is smaller if the provider reports errors during the computation or doesn’t send results at all (in most cases this is down to technical problems), and much larger if the result is wrong.
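
The asymmetry between an honest failure and a wrong result can be expressed as differently weighted reputation updates. The numbers below are placeholders; what matters is their relative size:

```python
# Placeholder weights; only their relative magnitudes matter here.
DELTA_GOOD_RESULT = +1.0        # result passed verification
DELTA_REPORTED_FAILURE = -0.2   # provider reported an error or sent nothing
DELTA_WRONG_RESULT = -2.0       # result failed verification


def update_provider_rank(rank, provider_id, outcome):
    """Adjust the local reputation of a provider after a subtask."""
    delta = {
        "ok": DELTA_GOOD_RESULT,
        "failed": DELTA_REPORTED_FAILURE,
        "wrong": DELTA_WRONG_RESULT,
    }[outcome]
    rank[provider_id] = rank.get(provider_id, 0.0) + delta
```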

The provider’s payment system monitors the Ethereum blockchain and the timeliness of payments. If a payment is late, the requestor’s reputation drops significantly. The reputation system keeps track of all positive and negative interactions with other nodes and uses them to compute a local rank. There are two separate rankings: one for providers and another for requestors. The system exchanges information about these ranks using a differential gossiping algorithm, causing the reputation vector to converge to the real global value. Implementing an efficient reputation system is not easy, but it is one of the most important challenges within the project. In the future, we plan to add yet another mechanism to it: a sort of synchronization with some form of Ethereum-based reputation system, depending on how such systems actually develop within the Ethereum ecosystem.
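
To give a flavour of the gossiping, here is the simplest possible exchange step: a node blends a peer’s reputation vector into its own, so that repeated exchanges drive all local vectors towards a common global estimate. The real algorithm is more involved; this is only a sketch.

```python
def gossip_step(local_rank, peer_rank, weight=0.5):
    """Blend a peer's reputation vector into the local one (one gossip round)."""
    merged = dict(local_rank)
    for node_id, peer_value in peer_rank.items():
        local_value = merged.get(node_id, 0.0)
        merged[node_id] = (1 - weight) * local_value + weight * peer_value
    return merged
```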

Of course, this is just the tip of the iceberg and a glimpse of a typical workflow within Golem. In the future, we’ll provide detailed descriptions of the other systems.