Developer Tech News #4
This week’s tech update comes to you from our annual company getaway, where we’ve been working hard on our strategy for the Brass release and beyond. Last week was a pretty productive time for our developers, so let’s get stuck in with our updates…
Command line
If you have ever used the Golem CLI, you may have noticed a load subcommand that allows you to request tasks to be computed in the Golem Network in a headless environment. Unfortunately, it wasn’t possible to generate those presets with the new GUI, and the feature was removed last week because it depended on the jsonpickle library (see last week’s news for more information).
This week we have added support for adding tasks without a GUI from a preset file that uses the same format as our RPC API.
Task computation
From today, Golem will internally store the reason why a task cannot be computed. This will make it easier for users to find out why their tasks are not being computed by the network, for example because they have set too low a price or too high a reputation threshold.
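A minimal sketch of how such reasons might be recorded. All names here (UnsupportedReason, check_offer and its parameters) are hypothetical and chosen for illustration; they are not taken from Golem's code base.

```python
from enum import Enum
from typing import List


class UnsupportedReason(Enum):
    # Hypothetical reason codes; Golem's real identifiers may differ.
    MAX_PRICE_TOO_LOW = "max_price_too_low"
    REPUTATION_TOO_LOW = "reputation_too_low"


def check_offer(max_price: float, min_reputation: float,
                provider_price: float,
                provider_reputation: float) -> List[UnsupportedReason]:
    """Return every reason a provider would skip the task (empty if none)."""
    reasons = []
    # The requestor offers less than the provider is willing to accept.
    if max_price < provider_price:
        reasons.append(UnsupportedReason.MAX_PRICE_TOO_LOW)
    # The requestor demands a higher reputation than the provider has.
    if provider_reputation < min_reputation:
        reasons.append(UnsupportedReason.REPUTATION_TOO_LOW)
    return reasons
```

Storing the full list rather than a single flag lets a user see at a glance whether the price, the reputation threshold, or both need adjusting.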
On top of this, our Lead Software Engineer, Aleksandra Skrzypczak, has refactored the benchmarks part of Golem. Previously, the Blender and Lux Render benchmarks were called directly from Golem core. That may not have been a big issue for Brass, but the refactoring will make it easier to add new types of apps in the future.
Execution logs
Another problem we have fixed concerns the way we save logs. Golem always saves its execution logs to a separate file. Unfortunately, two programs we make use of, HyperG and the Geth Ethereum client, do their own logging as well. As a result, their logs were not saved to disk, which made it really difficult for us to provide support whenever the root cause of a problem was in HyperG or Geth.
Senior Software Engineer Paweł Peregud, who worked with Golem dev Dariusz Rybi to implement these features, compares our solution to the well-known Linux program tee:
HyperG already puts the logs into its own file, but Geth does not and it’s not possible to make it so. Thus, we emulate tee from a separate Python thread and gather the logs from all the subprocesses.
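The idea of emulating tee from a Python thread can be sketched as follows. This is an illustration under assumptions, not Golem's actual implementation: the function names tee_output and run_and_log are hypothetical, and only standard library calls are used.

```python
import subprocess
import sys
import threading


def tee_output(proc: subprocess.Popen, log_path: str) -> threading.Thread:
    """Copy each line of the child's output to a log file and to our stdout."""

    def pump() -> None:
        with open(log_path, "a", encoding="utf-8") as log_file:
            # Iterating over the pipe yields lines until the child closes it.
            for line in proc.stdout:
                log_file.write(line)
                log_file.flush()
                sys.stdout.write(line)

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


def run_and_log(cmd: list, log_path: str) -> int:
    """Run a subprocess, tee its stdout and stderr, and return its exit code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    thread = tee_output(proc, log_path)
    returncode = proc.wait()
    thread.join()  # make sure the remaining output has been written
    proc.stdout.close()
    return returncode
```

Because the reader thread drains the pipe continuously, the child never blocks on a full pipe buffer, and the same pattern can be repeated for each subprocess whose logs need to be gathered.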
Hardware virtualization
There are some other fixes regarding hardware support. While most of you probably have four or eight logical cores on your machine [1], there exist manycore machines containing 64 cores per CPU. You might think that this would not create problems, and that’s right, until you leave Linux.
As always, the devil is in the detail. Golem uses Docker to achieve consistent and safe task execution across platforms. On Linux, Docker takes advantage of the kernel's container features (the same ones behind LXC, Linux Containers) so as to be as lightweight as possible. Unfortunately, Mac and Windows don’t provide such APIs, which means that Docker on these platforms in fact runs inside a virtual machine.
These operating systems put limitations on guest systems: they limit the number of cores presented to a guest OS to 32 on MS Windows and 16 on Mac OS X. Violating these constraints results in an error. Golem now takes this into account whenever virtual machines have to be handled.
As Adam Banasiak, our Software Engineer, explains:
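A minimal sketch of such a check, using the limits quoted above. The names MAX_GUEST_CORES and clamp_vm_cores are hypothetical and do not come from Golem's code; platform.system() is the standard library call that returns "Windows", "Darwin" (Mac) or "Linux".

```python
import platform
from typing import Optional

# Guest core limits quoted in the text. Linux is absent because containers
# run natively there and no virtual machine is involved.
MAX_GUEST_CORES = {"Windows": 32, "Darwin": 16}


def clamp_vm_cores(requested: int, system: Optional[str] = None) -> int:
    """Return a core count that the VM on the given host OS can accept."""
    system = system or platform.system()
    limit = MAX_GUEST_CORES.get(system)
    if limit is None:
        return max(1, requested)
    # Never ask for more cores than the platform allows, nor fewer than one.
    return max(1, min(requested, limit))
```

With this in place, a request for 47 cores would be reduced to 32 on Windows and 16 on a Mac instead of producing an error.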
There was a case when someone’s Golem crashed straight to hell because the VM was configured to have 47 cores, and VirtualBox did not support it. This prevents that.
Task verification
Recently we have done additional research on local verification of Blender work. In general, Golem renders a small part of the whole image and compares it with the received result. To make sure that both parts were generated from the same scene with the same parameters, it uses specialized metrics for image comparison (functions returning a distance between two images).
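To illustrate what "a function returning a distance between two images" means, here is a small sketch. It uses mean squared error on grayscale pixel values purely as a stand-in; it is not the metric Golem uses, and the names crop, mean_squared_error and fragment_matches, as well as the threshold parameter, are hypothetical.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # rows of grayscale pixel values


def crop(image: Image, x: int, y: int, width: int, height: int) -> List[List[float]]:
    """Cut a width x height window whose top-left corner is at (x, y)."""
    return [list(row[x:x + width]) for row in image[y:y + height]]


def mean_squared_error(a: Image, b: Image) -> float:
    """Distance between two equally sized images; 0.0 means identical."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def fragment_matches(received: Image, local_fragment: Image,
                     x: int, y: int, threshold: float) -> bool:
    """Compare a locally rendered fragment with the same region of the result."""
    height = len(local_fragment)
    width = len(local_fragment[0])
    region = crop(received, x, y, width, height)
    return mean_squared_error(region, local_fragment) <= threshold
```

A real verifier would choose both the metric and the threshold carefully, since legitimate renders of the same scene can differ slightly from machine to machine.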
As a result of this exercise, we discovered that our “small part” should be at least 1%* of the size of the whole scene. We’ve also decided to change the metrics to ones from the OpenCV library, which are more appropriate for Golem use cases.
Verification has also been moved to a separate process.
POST-EDIT REMARK
*By 1% we meant 1% of each dimension, i.e. 0.01% of the pixels in the whole scene. We would like to thank our community members on Reddit for pointing that out and enabling us to be more precise here.
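The arithmetic behind this remark can be checked with a short sketch. The function name min_fragment_size is hypothetical, and rounding up to whole pixels is an assumption made here for illustration.

```python
import math
from typing import Tuple


def min_fragment_size(width: int, height: int,
                      percent_per_dimension: float = 1.0) -> Tuple[int, int]:
    """Smallest fragment covering the given percentage of each dimension."""
    # Rounding up to whole pixels is an assumption of this sketch.
    frag_width = math.ceil(width * percent_per_dimension / 100)
    frag_height = math.ceil(height * percent_per_dimension / 100)
    return frag_width, frag_height


# Example scene of 2000 x 1000 pixels.
frag_w, frag_h = min_fragment_size(2000, 1000)
pixel_fraction = (frag_w * frag_h) / (2000 * 1000)
```

For this example the fragment is 20 x 10 pixels, that is 200 of 2,000,000 pixels, which is 0.01% of the scene: 1% of the width times 1% of the height.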
Sidenotes
[1] You may wonder what a logical core is, and whether there are computer cores that act illogically. The whole thing boils down to hyperthreading, a technology that allows a single physical core to run more than one thread of execution at a time. Since operating systems would find it difficult to handle such extensions directly, one hyperthreaded processor is simply presented to the OS as multiple processors. Hence the name logical.
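If you are curious how many logical cores your own machine reports, Python's standard library can tell you. The wrapper name logical_core_count is hypothetical; os.cpu_count() is the real call, and it counts logical rather than physical cores.

```python
import os


def logical_core_count() -> int:
    """Number of logical cores the OS reports, falling back to 1."""
    # os.cpu_count() returns None when the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Logical cores: {logical_core_count()}")
```

On a hyperthreaded machine this number is typically twice the number of physical cores.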
—
Marcin Mielniczuk contributed to this article