NNPDF4.0 closure test runcards
Created by: wilsonmr
I'd quite like to get a second opinion on the runcard which will be used for the NNPDF4.0 closure tests. I made some "easier on the eye" summaries of the two sets in the following reports:
- Fitted data: https://vp.nnpdf.science/spjGAIj0QgCKvp33-gZ69Q== (see comments below)
- Out of sample data: https://vp.nnpdf.science/XDkEioYKTSOVf-S3o7S_5w== (see comments below)
In essence I took `NNPDF40_full_candidate.yml` from #675, put most of the datasets marked with `# N` into the out-of-sample set, and kept the rest as 3.1 datasets, with some caveats:
I decided the following datasets were updated versions of 3.1 datasets:
```yaml
- { dataset: CHORUSNUPb, frac: 0.5 } # U
- { dataset: CHORUSNBPb, frac: 0.5 } # U
- { dataset: NTVNUDMNFe, frac: 0.5, cfac: [MAS] } # U
- { dataset: NTVNBDMNFe, frac: 0.5, cfac: [MAS] } # U
- { dataset: HERACOMB_SIGMARED_C, frac: 0.5 } # U
- { dataset: HERACOMB_SIGMARED_B, frac: 0.5 } # U
- { dataset: ATLASWZRAP11CC, frac: 0.5, cfac: [QCD] } # U
- { dataset: ATLAS_1JET_8TEV_R06_DEC, frac: 0.5, cfac: [QCD] } # U
```
I think most of these aren't controversial picks, except for `ATLAS_1JET_8TEV_R06_DEC`. I thought this was in essence more like an updated dataset than a completely new one, because we had single jets in 3.1, and if I remove it we are left with no single-jet data at all, since the CMS single jets have been removed.
I tested that the fitted data and out-of-sample data don't intersect, and also checked that all 76 datasets were allocated to either the fitted or the out-of-sample set, using the following script:
<details><summary>Click to see script</summary>

```python
#!/usr/bin/env python
"""
intersect_datasets.py

Given runcards A and B, check how many datasets intersect and print the
overlap as a fraction of each individual set of datasets. This doesn't
check whether the settings for the datasets are consistent.
"""
import argparse

from reportengine.compat import yaml


def runcard_to_dataset_set(runcard_yaml):
    """Given a parsed configuration file (fit or validphys runcard), extract
    the experiments key and return a set containing the names of all the
    datasets.
    """
    experiments = runcard_yaml["experiments"]
    return {ds["dataset"] for exp in experiments for ds in exp["datasets"]}


def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("runcard_a", type=str, help="path to first runcard")
    parser.add_argument("runcard_b", type=str, help="path to second runcard")
    args = parser.parse_args()

    with open(args.runcard_a, "r") as file:
        runcard_a = yaml.safe_load(file)
    set_a = runcard_to_dataset_set(runcard_a)

    with open(args.runcard_b, "r") as file:
        runcard_b = yaml.safe_load(file)
    set_b = runcard_to_dataset_set(runcard_b)

    set_intersect = set_a.intersection(set_b)
    print(
        f"{len(set_intersect)} / {len(set_a)} datasets overlapped from "
        f"{args.runcard_a}. With {len(set_intersect)} / {len(set_b)} datasets "
        f"from {args.runcard_b}."
    )


if __name__ == "__main__":
    main()
```

</details>
But I don't know how reusable this script is, so I haven't added it to the repo.
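For the second check (that every dataset in the full candidate runcard ends up in exactly one of the two sets), something along these lines works as a companion to the script above. The function names here are illustrative, not from the repo:

```python
def dataset_names(runcard):
    """Collect dataset names from a parsed fit runcard (a dict with an
    'experiments' key, same layout the script above assumes)."""
    return {
        ds["dataset"]
        for exp in runcard["experiments"]
        for ds in exp["datasets"]
    }


def check_allocation(fitted, out_of_sample, full):
    """Return (overlap, unallocated); both should be empty sets if the
    fitted and out-of-sample runcards cleanly partition the full set."""
    overlap = fitted & out_of_sample
    unallocated = full - (fitted | out_of_sample)
    return overlap, unallocated


# Toy example with made-up dataset names:
full = {"DS1", "DS2", "DS3"}
overlap, unallocated = check_allocation({"DS1", "DS2"}, {"DS3"}, full)
print(overlap, unallocated)  # set() set() -> a clean partition
```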