New CommonData Reader (!1678)

Emanuele Roberto Nocera requested to merge final_reader_for_new_commondata_mk2 into master Feb 23, 2023

Created by: scarlehoff

This branch contains not only the final reader, but also the new commondata and all changes from master.

Some checks performed

The fitbot there acts as a check that old theories can still be used (the fitbot uses theory 200; it will soon be changed to theory 700). There are small changes in the fitbot (while the regression tests have not changed) because for some datasets the precision of the data has changed (e.g., we had 4 digits before and now have 5). It only affects a few LHCB datasets and maybe jets, so it doesn't show up in any regression.

I will submit another fitbot just before the merge to update the reference bot.

This is a comparison to the last baseline (NNPDF40_nnlo_as_01180_qcd) where I've used exactly the same runcard (i.e., I haven't changed the names of the datasets that have been translated automatically by vp) https://vp.nnpdf.science/ClK5YFI-TjCBkzTeewuFow== (since the report is also done with new data, it might be informative to compare to a report done with old data 1)

And this one is the same fit but this time the names of the runcard have also been updated: https://vp.nnpdf.science/QaBlf8XvSmSe8UWMvzIy3g==

In both cases they are <60 replica fits.

The fits have been done with this commit https://github.com/NNPDF/nnpdf/commit/1a8bf48945a04b00455c42b019d3c0ab2552c601 which corresponds to this one https://github.com/NNPDF/nnpdf/commit/7e6599a821b6f1d0fb30c705339d3c047da46c99 before the rebase on top of the last batch of comments in the other branch.

Checks for sets of data separated by experiment

https://vp.nnpdf.science/chwFM_lJR025vREJ1I9VPQ==
https://vp.nnpdf.science/Toy_r6uFRm-h1oUFEZF6hw==
https://vp.nnpdf.science/-2VyNN3CTHWnb26fUQkUJQ==
https://vp.nnpdf.science/YPAQHvMtTeyBuNq12nl6RQ==
https://vp.nnpdf.science/UHU-TYLJQCuE-8lRHe8Aog==
https://vp.nnpdf.science/oSZLrPg3Tyyf-i2mE33G-Q==
https://vp.nnpdf.science/wCFRKBNsSA2U2O_6Sa75Zw==
https://vp.nnpdf.science/E0IDIgWFRF6tvdFk3pD0mA==
https://vp.nnpdf.science/0YL4eaPTT7eR3wM9IQwj5w==

You should be able to use it with

git checkout final_reader_for_new_commondata_mk2

You can even install this in a separate, isolated virtual environment on your computer, without having to clone or check out the repository, with:

python -m venv nnpdf_venv
source nnpdf_venv/bin/activate
python -m pip install git+https://github.com/NNPDF/nnpdf.git@final_reader_for_new_commondata_mk2

Leaving the original msg here:

This branch / PR is the final reader for the new commondata files.

As you can see, this is a bit less ambitious than the previous PR. I've added a reader and only a reader. The changes to other parts of the code are minimal and well contained*. The reader is able to read the new commondata but the output is an object of the old type. This will allow me to keep this branch on track with master.
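The design described above (parse the new format, hand back an object of the old type) can be sketched as follows. This is only an illustration under stated assumptions: `LegacyCommonData`, `read_new_commondata` and the dictionary keys are hypothetical names chosen for the example, not the classes or fields used in this branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyCommonData:
    """Hypothetical stand-in for the old-type object that downstream code expects."""

    setname: str
    ndata: int
    central_values: tuple


def read_new_commondata(parsed: dict) -> LegacyCommonData:
    """Hypothetical reader: take already-parsed new-format content and
    return an old-type object, so that downstream code does not change."""
    # Normalise every entry to float, whatever type the parser produced.
    values = tuple(float(v) for v in parsed["central_values"])
    return LegacyCommonData(
        setname=parsed["setname"],
        ndata=len(values),
        central_values=values,
    )


# Usage with made-up numbers (not real data).
legacy = read_new_commondata({"setname": "DATASET", "central_values": ["1.5", 2, 3.25]})
print(legacy.ndata)
```

Because only the reader knows about the new format, the rest of the code keeps working with the old-type object, which is what lets the branch stay in sync with master.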

@enocera in principle people can branch off this branch to implement the new datasets, so that they can test them. As long as they don't change the code itself there will be no conflicts. If people find it difficult, I can test the new commondata files myself as they get implemented.

@Zaharid since this branch is only going to add a reader and thus there will be no conflicts with master, feel free to comment or even modify the parsers to your heart's content. This is even almost orthogonal to people's implementation of the new data so it should be fine.

For people implementing new commondata:

The new data should be placed alongside the old commondata, inside the commondata/new folder in validphys2/src/validphys/datafiles/. Note that the "new" part will eventually disappear; it is only there to keep the new and old commondata files apart.

So, for instance, if you implement a dataset such as CMS_TTBAR_13TEV_2L_DIF, it will live inside validphys2/src/validphys/datafiles/commondata/new/CMS_TTBAR_13TEV_2L_DIF.
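As a quick illustration of the layout described above, here is a minimal sketch that builds the expected folder for a given dataset name. The helper `new_commondata_dir` is hypothetical (it is not part of validphys), and the datafiles root is passed in by the caller rather than discovered from the installation.

```python
from pathlib import PurePosixPath


def new_commondata_dir(datafiles_root, dataset_name):
    """Hypothetical helper: return <datafiles_root>/commondata/new/<dataset_name>."""
    # The "new" component is temporary and is expected to disappear later.
    return PurePosixPath(datafiles_root) / "commondata" / "new" / dataset_name


path = new_commondata_dir("validphys2/src/validphys/datafiles", "CMS_TTBAR_13TEV_2L_DIF")
print(path)
```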

If you installed the code with pip install -e . then you can update the metadata there directly; there is no need to reinstall anything (as used to be the case with cmake).

Then you can load the new commondata in the same way you would with validphys:

from validphys.api import API
ds = API.dataset(dataset_input={"dataset": "DATASET"}, theoryid=400, use_cuts="internal")

Important: take into account that the theory id must be one of the new ones, since the theory is read from the commondata itself.

TODO

There are a few "TODO" comments within the code. These are things to be changed at the end, since they might require changes somewhere else or are useful for debugging.
One thing missing is to make sure that anywhere in vp where peek_commondata_metadata is used, the new CommonMetaData class can be used as well. It should be a trivial change, but it will wait until the end.
Remove the cases in which there are both load and load_commondata (as now they will both do the same thing).
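For the last item, one possible transitional step before removing the duplicate entry point outright is to keep it as a thin alias that warns and delegates. This is a different, softer technique than the removal proposed above, shown only as a sketch: the class `CommonDataSpecSketch` and its return value are hypothetical and do not mirror the actual validphys classes.

```python
import warnings


class CommonDataSpecSketch:
    """Hypothetical class for illustration; not the one in validphys."""

    def __init__(self, name):
        self.name = name

    def load(self):
        # Single implementation shared by both entry points.
        return {"setname": self.name}

    def load_commondata(self):
        # Transitional alias: emit a deprecation warning, then delegate to load().
        warnings.warn(
            "load_commondata is deprecated, use load instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.load()


spec = CommonDataSpecSketch("DATASET")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = spec.load_commondata()
```

Once no caller triggers the warning any more, the alias can be deleted without further changes.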

(feel free to edit this comment with other to-do items that might be important)

*Note that this is possible only now that there is no longer any need to keep C++-compatible or literal objects in the pipeline.
