Pineappl integration with validphys
Created by: scarlehoff
EDIT: This branch is outdated with respect to master, since it is based on a pre-thcovmat master in n3fit. #1578 is a rebase of this branch onto the 08/07/2022 master: https://github.com/NNPDF/nnpdf/commit/3098d0aa36c8d90878c35047b77a9c7fa802da84
As promised, the pineappl integration with vp is in preparation.
In this first commit the pineappl tables are already compatible with vp and can be used, for instance, in predictions (as with the commondata reader, I've put a jupyter notebook at the root of the repository).
I will do the fit part tomorrow. It is a bit more complicated (if I want to keep both the old and new theories), since I will have to reconstruct the fktable from the pandas dataframe.
I will also need to modify `FKTableData` to hold more interesting information, but nothing too worrisome.
This will also close #1541.
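To make the "reconstruct the fktable from the pandas dataframe" step more concrete, here is a minimal sketch. It assumes the old vp layout, in which the hadronic `sigma` dataframe is indexed by `(data, x1, x2)` and each column is a flattened flavour channel `a * 14 + b`. `dense_fktable`, `channel_pairs` and `convolve` are hypothetical helpers for illustration. They are not the `FKTableData` methods added in this PR, and the convolution is schematic (no x-grid weights or normalisation).

```python
import numpy as np
import pandas as pd

NFL = 14  # assumed size of the flavour basis used in the flattened column labels


def channel_pairs(sigma: pd.DataFrame, nfl: int = NFL) -> np.ndarray:
    """Split each flattened column label a * nfl + b into its (a, b) flavour pair."""
    channels = np.asarray(sigma.columns, dtype=int)
    return np.stack(np.divmod(channels, nfl), axis=1)


def dense_fktable(sigma: pd.DataFrame, ndata: int, nx: int) -> np.ndarray:
    """Scatter the sparse rows of sigma into an array of shape (ndata, nchannels, nx, nx)."""
    table = np.zeros((ndata, sigma.shape[1], nx, nx))
    data = sigma.index.get_level_values("data").to_numpy()
    x1 = sigma.index.get_level_values("x1").to_numpy()
    x2 = sigma.index.get_level_values("x2").to_numpy()
    # Advanced indices separated by a slice: each (data, x1, x2) position
    # receives the full vector of channel values from the matching dataframe row.
    table[data, :, x1, x2] = sigma.to_numpy()
    return table


def convolve(fktable: np.ndarray, pairs: np.ndarray, pdf: np.ndarray) -> np.ndarray:
    """Schematic hadronic convolution; pdf has shape (nfl, nx)."""
    first = pdf[pairs[:, 0]]   # (nchannels, nx): PDF of the first parton per channel
    second = pdf[pairs[:, 1]]  # (nchannels, nx): PDF of the second parton per channel
    return np.einsum("ncij,ci,cj->n", fktable, first, second)
```

Storing only the active channels (plus their flavour pairs) avoids carrying all 14 × 14 = 196 combinations when most of them are zero.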
This PR has ended up being a tad bigger (and broader in scope) than one would like. Introducing the pineappl tables meant it was easier to simplify some of the other stuff (such as the reader for the old fktables) than to have both versions live together. As such, the review will be more painful than I hoped. To help the reviewer (and myself in 5 years' time, when I have to remember what I did), I've written some notes on the changes.
TODO before merging
- `pineappl`, `eko` in conda-forge
- approved
- Theory whatever in the server so it can be used by everyone
- Remove jupyter notebook
- Remove the `oldmode` flag from the loader
How to install the necessary packages
Since they are still not in conda, you will need to install `pineappl` (and `eko`) in order to use this branch. Luckily, they are available in the python repos:
pip install pineappl eko
or
conda install pineappl eko
(just in case, make sure that the installed `pineappl` version is 0.5.2)
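If you are unsure which version was installed, you can check it from Python using the standard library's `importlib.metadata`. Alternatively, you can pin the version explicitly with `pip install pineappl==0.5.2`. Here is a small sketch; the `check_versions` helper is hypothetical, and only pineappl is pinned because that is the only version stated here:

```python
from importlib.metadata import PackageNotFoundError, version

# Only the pineappl version is stated in this PR; eko is not pinned here.
REQUIRED = {"pineappl": "0.5.2"}


def check_versions(required=REQUIRED):
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for package, wanted in required.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if found != wanted:
            problems[package] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    print(check_versions() or "pineappl 0.5.2 found")
```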
Where to get pineappl grids
I've put a tarball called `pineappl_ingredients.tar` in the root of the nnpdf server. It contains a `yamldb` folder (which goes into `NNPDF/data`) and a `pineappls` folder (which goes into `NNPDF/data/theory_200`):
cd ${CONDA_PREFIX}/share/NNPDF/data
scp vp.nnpdf.science:pineappl_ingredients.tar .
tar -xf pineappl_ingredients.tar yamldb/
tar -C theory_200/ -xf pineappl_ingredients.tar pineappls/
rm pineappl_ingredients.tar
cd -
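Once the archive has been copied locally (for example with the `scp` above), you can also do the unpacking from Python with the standard `tarfile` module. This is a sketch: `unpack_ingredients` is a hypothetical helper, and the `theory_200` default simply mirrors the commands above.

```python
import tarfile
from pathlib import Path


def unpack_ingredients(archive, data_dir, theory="theory_200"):
    """Extract yamldb/ into data_dir and pineappls/ into data_dir/theory."""
    data_dir = Path(data_dir)
    destinations = {"yamldb": data_dir, "pineappls": data_dir / theory}
    # Use the safer "data" extraction filter where this Python version supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        for top, destination in destinations.items():
            # Select only the members living under this top-level folder.
            members = [
                m for m in tar.getmembers()
                if m.name == top or m.name.startswith(top + "/")
            ]
            destination.mkdir(parents=True, exist_ok=True)
            tar.extractall(path=destination, members=members, **extra)
```

For example, `unpack_ingredients("pineappl_ingredients.tar", Path(os.environ["CONDA_PREFIX"]) / "share/NNPDF/data")` should reproduce the layout produced by the shell commands.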
Notes on the changes
- Jupyter notebook: it's there just so we can play with it a bit. I'll remove it once this is approved for merging.
- All `n3fit` files: vp now has a few objects that contain exactly the same information that was passed to some of the dictionaries in `n3fit`, so I've removed said dictionaries. This is the only reason these files have changed, so you might as well ignore them.
- `config.py`: separated `posdataset` and `integdataset`.
- `convolution.py`: in order to keep using `truediv` for `RATIO`, it is necessary to convert the pandas denominator into a numpy array. Keeping `truediv` is nice because it means that tensorflow, numpy, pandas or any other library knows what to do with the operation. I think the conversion is reasonable, since in this case the denominator is usually a total cross section, so its index is not supposed to match the numerator's.
- `core.py`: added `load_commondata` to `DataSetInput` and `DataGroupSpec`, so that one can get just the `CommonData` from `libNNPDF` instead of the whole `DataSet`. Modified `FKTableSpec` so that it can load both the new and the old fktables. Added an `IntegrabilitySetSpec`, which is equal to `PositivitySetSpec`.
- `coredata.py`: I've added to `FKTableData` the methods that generate the information needed to fit an fktable: `luminosity_mapping` and `get_np_fktable`. I've moved the application of the `cfactors` to `FKTableData`, so that it works like `with_cuts` (and no funny games need to be played with the dataclass outside). I also added a `_protected` flag. This is needed because `cfactors` and `cuts` were designed with the old fktables in mind and, as explained in `convolution.py`, they will find a different number of points. When apfelcomb had the repetition flag set, the flag gets saved into the yaml database and cuts or cfactors are applied accordingly. Note that `FKTableData` works the same no matter where the fktables came from (pineappl or old).
- `fkparser.py`: moved the application of `cfactors` away (see above).
- `loader.py`: I've added `check_fkyaml` to load the new `yaml` files. Eventually the yaml information will come from the new commondata format, so for now I've just hardcoded the path of the yamldb inside the data folder. I've also separated the positivity and integrability loading. This was not strictly necessary, but it made the `n3fit_data.py` changes below easier, and it had annoyed me for a long time. For testing, I've added a flag to `check_dataset`: if you use `oldmode` as a cfactor, the old fktables are used regardless. This is useful for debugging and will be removed alongside the jupyter notebook before merging (or as soon as the new theory is prepared). For now, whether pineappl is used does not depend on the theory, since the pineappl tables are not yet part of any theory.
- `n3fit_data.py`: this has been greatly simplified. Most notably, things like `mask_fk_tables` have been removed. `fitting_data_dict` is no longer a dictionary containing a list of dictionaries. Instead, it contains a list of `FittableDataSet`s that come from the outside. The most interesting consequence is that this also solves issue #1541. I did have to create a `TupleComp` class for the masks that depend on the name and the seed, but it is a small price to pay. Apart from that, most of the changes in this module just remove code.
- `n3fit_data_utils.py`: don't even look at the diff here. This module has basically been rewritten from scratch, and some lines just happen to be similar. I've created the `FittableDataSet`, which contains all the information necessary to fit a given dataset other than the central value. IMHO, once we have the new commondata format and pure python `datasets`, this should just be an extension of those. That is a bit tricky, though: in its current form it can be shared between replicas, and adding the central value would make that impossible. Anyway, that's a problem for the coming weeks.
- `pineparser.py`: contains all the logic for loading the new fktables into `coredata.FKTableData` objects, completely equivalent to what vp creates when reading the old fktables with pandas. This will probably be moved to `pineko`, since we believe building and reading the fktables should be one single thing. I'll do it once this PR is merged: https://github.com/N3PDF/pineko/pull/12.
- `results.py`: this is the reason `load_commondata` is needed in `core.py`. Until now, getting results required not only the central value of the data but also loading the `libNNPDF` fktables, even though they were not used anywhere. I could live with that while they existed (it is just a waste of memory), but this no longer works if you only have the pineappl version.
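The `convolution.py` point about `RATIO` is easiest to see with a toy example. When dividing two dataframes, pandas aligns them on index labels. A numerator carrying the (cut) data-point index and a one-row total cross section share no labels, so the result is all NaN. Dropping the denominator's index with `.to_numpy()` makes pandas broadcast the row instead. The `OPERATIONS` dict and `combine` function below are hypothetical stand-ins, not the actual `convolution.py` code.

```python
import operator

import pandas as pd

# Hypothetical mapping in the spirit of the RATIO operation.
OPERATIONS = {"RATIO": operator.truediv}

# Rows are data points (indices surviving the cuts), columns are PDF members.
numerator = pd.DataFrame([[2.0, 4.0], [6.0, 8.0]], index=[3, 7])
# A total cross section: one row whose index has nothing to do with the numerator's.
denominator = pd.DataFrame([[2.0, 4.0]], index=[0])


def combine(op, num, den):
    """Apply op with the denominator as a plain array so pandas broadcasts it over rows."""
    return OPERATIONS[op](num, den.to_numpy())


aligned = OPERATIONS["RATIO"](numerator, denominator)  # label alignment: every entry is NaN
ratio = combine("RATIO", numerator, denominator)       # row broadcast: keeps the numerator index
```

Because `operator.truediv` only dispatches to `/`, the same entry works unchanged for numpy arrays or tensorflow tensors.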
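For the `coredata.py` changes, a toy dataclass illustrates the pattern. Cuts and cfactors are applied through methods that return a new object (via `dataclasses.replace`), and a protected flag changes their behaviour. Everything here is hypothetical: `ToyFKTable` is not `FKTableData`. The protected behaviour (ignoring per-point cuts and accepting only a uniform cfactor) is an assumption of this sketch, not necessarily what the PR does.

```python
import dataclasses

import numpy as np
import pandas as pd


@dataclasses.dataclass(frozen=True)
class ToyFKTable:
    """Hypothetical stand-in for FKTableData; sigma has a 'data' index level."""

    sigma: pd.DataFrame
    protected: bool = False

    @property
    def ndata(self) -> int:
        return self.sigma.index.get_level_values("data").nunique()

    def with_cuts(self, cuts):
        """Keep only the listed data points (unless the table is protected)."""
        # Assumption: cuts built for the old tables do not apply to a protected one.
        if cuts is None or self.protected:
            return self
        return dataclasses.replace(self, sigma=self.sigma.loc[list(cuts)])

    def with_cfactors(self, cfactors):
        """Multiply every row by the cfactor of its data point."""
        cfactors = np.asarray(cfactors, dtype=float)
        if self.protected:
            # Assumption: the point counts differ, so only a uniform cfactor is safe.
            if not np.allclose(cfactors, cfactors[0]):
                raise ValueError("non-uniform cfactors on a protected table")
            return dataclasses.replace(self, sigma=self.sigma * cfactors[0])
        points = self.sigma.index.get_level_values("data").to_numpy()
        return dataclasses.replace(self, sigma=self.sigma.mul(cfactors[points], axis=0))
```

Returning new objects keeps each table immutable, so different consumers can apply different cuts to the same loaded table without modifying a shared object.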
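The `TupleComp` trick from the `n3fit_data.py` note can be sketched as follows. Numpy masks are not hashable, so the object carrying them compares and hashes only on `(name, seed)`. This lets caches keyed on their arguments recognise two identical requests. This is a minimal re-creation of the idea under that assumption, not the class from the PR. `TrainingMasks` and `make_masks` are hypothetical names, and the 75% training fraction is arbitrary.

```python
import numpy as np


class TupleComp:
    """Compare and hash objects through a tuple of simple identifying values."""

    def __init__(self, *args):
        self.comp_tuple = args

    def __eq__(self, other):
        if not isinstance(other, TupleComp):
            return NotImplemented
        return self.comp_tuple == other.comp_tuple

    def __hash__(self):
        return hash(self.comp_tuple)


class TrainingMasks(TupleComp):
    """Hypothetical container: the masks are payload, (name, seed) is the identity."""

    def __init__(self, name, seed, masks):
        self.masks = masks  # unhashable numpy arrays, deliberately left out of the key
        super().__init__(name, seed)


def make_masks(name, seed, ndata, frac=0.75):
    """Draw a reproducible training mask for one dataset from its seed."""
    rng = np.random.default_rng(seed)
    return TrainingMasks(name, seed, [rng.random(ndata) < frac])
```

Equality ignores the arrays, so caching on these objects (for instance with `functools.lru_cache`) is only correct if the masks are a deterministic function of the name and the seed. That is why the seed must be part of the key.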