Resource system
Created by: Zaharid
I am going to write my ideas for a generalized resource system, so that they can be taken into account for #223. It also generalizes the belated zsvc idea.
The goal is to get us reasonably close to having reproducible results, in general, by default, and for an extended period of time. It should also help identify which results have been compromised by bugs or by updates to data or theory.
The idea is that "everything is a resource" with some capabilities:
- Has a unique ID that is completely immutable.
- Has a label marking what it is supposed to represent.
- Can be deprecated and superseded by another resource.
- It is possible to identify all users of a given resource (corresponding to a unique id) and in particular those of a deprecated resource.
Several resources can have the same label, but only the latest one is the active resource corresponding to that label, with all others being deprecated.
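The capabilities listed above could be sketched as a small in-memory registry. This is only an illustration of the idea: the names `Resource`, `Registry` and `new_resource_id` are hypothetical, and the ID format (a random 128-bit value, base64-encoded) is an assumption, not something reportengine or validphys currently provide.

```python
import base64
import uuid
from dataclasses import dataclass, field


def new_resource_id() -> str:
    """Return a random 128-bit ID, base64-encoded (assumed format)."""
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("ascii")


@dataclass(frozen=True)
class Resource:
    resource_id: str  # immutable once assigned
    label: str        # what the resource is supposed to represent


@dataclass
class Registry:
    resources: dict = field(default_factory=dict)  # id -> Resource
    by_label: dict = field(default_factory=dict)   # label -> ids, oldest first
    users: dict = field(default_factory=dict)      # id -> set of user names

    def register(self, label: str) -> Resource:
        """Add a new resource; it supersedes older ones with the same label."""
        res = Resource(new_resource_id(), label)
        self.resources[res.resource_id] = res
        self.by_label.setdefault(label, []).append(res.resource_id)
        self.users[res.resource_id] = set()
        return res

    def active(self, label: str) -> Resource:
        """Return the latest resource registered under this label."""
        return self.resources[self.by_label[label][-1]]

    def is_deprecated(self, resource_id: str) -> bool:
        """A resource is deprecated if a newer one shares its label."""
        label = self.resources[resource_id].label
        return self.by_label[label][-1] != resource_id

    def record_use(self, resource_id: str, user: str) -> None:
        self.users[resource_id].add(user)

    def users_of_deprecated(self) -> dict:
        """Map each deprecated resource ID to the sorted list of its users."""
        return {
            rid: sorted(names)
            for rid, names in self.users.items()
            if names and self.is_deprecated(rid)
        }
```

Registering a second resource under an existing label is what deprecates the first one here; a real system would probably want an explicit deprecation step as well.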
I'd like to extend the reportengine/validphys capabilities to be able to version resources, so that, similar to the primitive functionality that I added to vp for alpha_s, there is something like
config_key: config_value

template_text: |
    {@table_with_expensive_thing@}
    {@plot_with_expensive_thing@}

actions_:
    - report(main=True)

exports_:
    - action: compute_expensive_thing
      label: "Right way of doing expensive things"
where table_with_expensive_thing and plot_with_expensive_thing both depend on compute_expensive_thing (which in turn depends on config_key). The result of compute_expensive_thing also knows how to serialize and deserialize itself to a file. Uploading that report will assign a unique ID to it, something like sQs8EZjSQxOA3Uq7H7sA0w==. Then one can use the result of compute_expensive_thing like this:
ns_containing_cached_value:
    resource_: sQs8EZjSQxOA3Uq7H7sA0w==/resources/compute_expensive_thing

template_text: |
    {@with ns_containing_cached_value@}
    config_key = {@config_key@}
    {@table_with_expensive_thing@}
    {@plot_with_expensive_thing@}
    {@endwith@}

actions_:
    - report(main=True)
where the value of resource_ corresponds to a folder in the filesystem (and in the storage server) that somehow knows how to turn itself into a namespace (that we called ns_containing_cached_value) containing the previously computed value of compute_expensive_thing, but also all its dependencies (so that there is no room for inconsistencies), so that the line in template_text prints config_key = config_value. Then the plot and the table would be computed based on the result for compute_expensive_thing that was already obtained.
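How a folder could "turn itself into a namespace" is left open above. One minimal way to picture it, assuming the value and its dependencies are JSON-serializable and stored as two files in the resource folder, is sketched below. The function names `export_resource` and `load_namespace` and the file layout are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def export_resource(folder: Path, name: str, value, dependencies: dict) -> None:
    """Write a computed value and the inputs it depended on to `folder`."""
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "dependencies.json").write_text(json.dumps(dependencies))
    (folder / f"{name}.json").write_text(json.dumps(value))


def load_namespace(folder: Path, name: str) -> dict:
    """Rebuild a namespace holding the cached value and its dependencies."""
    namespace = json.loads((folder / "dependencies.json").read_text())
    namespace[name] = json.loads((folder / f"{name}.json").read_text())
    return namespace


# Round trip in a temporary directory standing in for the storage server.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "resources" / "compute_expensive_thing"
    export_resource(
        folder,
        "compute_expensive_thing",
        {"answer": 42},                   # made-up stand-in for the real result
        {"config_key": "config_value"},
    )
    ns = load_namespace(folder, "compute_expensive_thing")
    print(f"config_key = {ns['config_key']}")
```

Because the dependencies travel with the cached value, anything rendered inside the namespace sees the inputs that were actually used to compute it, not whatever happens to be in the current runcard.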
More often one would write instead
ns_containing_cached_value:
    resource_: "Right way of doing expensive things"
where we have written the label from above. This would resolve to exactly the same resource if there isn't another version, but would additionally write somewhere in the input the exact version that got used, so that running the input again (it gets saved in every validphys run) would give the same output, regardless of future updates.
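The pinning step could look roughly like the following sketch, which rewrites every resource_ entry given as a label into the exact reference it currently resolves to. The function name `pin_resources` and the shape of the label index (label mapped to a full resource path) are assumptions made for illustration.

```python
def pin_resources(config: dict, active_refs: dict) -> dict:
    """Return a copy of `config` with label references replaced by exact ones.

    `active_refs` maps each label to the reference of its currently active
    resource (assumed layout). Values that are not known labels, such as
    references that are already exact, are left untouched.
    """
    pinned = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = pin_resources(value, active_refs)
        elif key == "resource_" and value in active_refs:
            value = active_refs[value]
        pinned[key] = value
    return pinned


label_index = {
    "Right way of doing expensive things":
        "sQs8EZjSQxOA3Uq7H7sA0w==/resources/compute_expensive_thing",
}
runcard = {
    "ns_containing_cached_value": {
        "resource_": "Right way of doing expensive things",
    },
}
saved_input = pin_resources(runcard, label_index)
```

Saving `saved_input` rather than `runcard` alongside the output is what would make a later rerun independent of any new version published under the same label.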
If a resource is deprecated by an updated version, any attempt to use it would result in a warning. The warning should also be displayed on the HTML output of any result using the outdated resource, so that anybody viewing it at a later point can know that the result is affected by some deprecation.
To that end, the information on deprecations must be contained in remote indexes that are queried before each run and also accessible to the HTML output (dynamically via some JavaScript).
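The check run before each job could be as simple as the sketch below. It assumes the remote index has already been downloaded as a JSON document mapping each deprecated ID to the ID that supersedes it; that layout, the function name `check_deprecations` and the placeholder IDs are all hypothetical.

```python
import json
import warnings


def check_deprecations(used_ids, index_json: str) -> list:
    """Warn about every used resource that the index marks as superseded.

    Returns the warning messages so they can also be embedded in the report.
    """
    index = json.loads(index_json)  # assumed layout: {deprecated_id: new_id}
    messages = []
    for resource_id in used_ids:
        if resource_id in index:
            message = (
                f"Resource {resource_id} is deprecated; "
                f"superseded by {index[resource_id]}"
            )
            warnings.warn(message, UserWarning, stacklevel=2)
            messages.append(message)
    return messages


# Placeholder IDs, for illustration only.
index_text = json.dumps({"old-id": "new-id"})
notes = check_deprecations(["old-id", "current-id"], index_text)
```

The same index, served as JSON, is what the script embedded in the HTML output would fetch at viewing time, so that a report generated before a deprecation still shows the warning afterwards.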