Making PDFs out of fits should be easier
Created by: Zaharid
Getting a "publish quality" LHAPDF set out of a fit should be simpler than it is. There are four things that need to be done:
- We likely want to have a different name for the PDF and the corresponding fit.
- We however want to have a way to communicate what the original fit was. E.g. by putting it in the .info YAML description, although perhaps we want to have a separate key for it (and teach vp about it).
- We want to update other parts of the description by e.g. adding arxiv references.
- We want to compress the result. Care is needed because the LHAPDF set in the fit folder is a bunch of symlinks, so one has to pass the `--dereference` flag to the `tar` command (AFAICT this is not possible with `shutil.make_archive`, and I couldn't be bothered to look into `tarfile`; see https://bugs.python.org/issue37601).
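For what it's worth, the standard-library `tarfile` module accepts a `dereference` option that seems to do what `tar --dereference` does. Below is a minimal sketch, assuming a POSIX filesystem; the helper name `archive_dereferenced` is made up for illustration and this has not been tried on a real fit folder.

```python
import os
import pathlib
import tarfile
import tempfile


def archive_dereferenced(set_dir, out_path):
    """Write a gzipped tarball of set_dir, storing link targets instead of symlinks.

    Hypothetical helper, not part of validphys.
    """
    set_dir = pathlib.Path(set_dir)
    # dereference=True makes tarfile stat the link target rather than the
    # link itself, so the archive contains regular files.
    with tarfile.open(out_path, "w:gz", dereference=True) as tar:
        # arcname keeps only the set name as the top-level folder.
        tar.add(set_dir, arcname=set_dir.name)
    return pathlib.Path(out_path)


if __name__ == "__main__":
    # Small self-check with a throwaway folder containing one symlink.
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "real.dat").write_text("data")
        (root / "myset").mkdir()
        os.symlink(root / "real.dat", root / "myset" / "myset_0000.dat")
        out = archive_dereferenced(root / "myset", root / "myset.tar.gz")
        with tarfile.open(out) as tar:
            print([(m.name, m.isfile()) for m in tar.getmembers()])
```

If this holds up, it would remove the need to shell out to `tar` in a `fitrename`-like script.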
For reference this is the script I used (I created a new conda environment and downloaded all the fits first).
import pathlib
import re
import multiprocessing
import subprocess
import shutil  # only needed for the commented-out make_archive call below

import ruamel_yaml as yaml

from validphys.loader import FallbackLoader as Loader


def fixup_ref(new):
    """Copy the fit description into the .info file and set the reference."""
    l = Loader()
    p = l.check_pdf(new)
    fit = l.check_fit(new)
    desc = fit.as_input()["description"]
    infopath = pathlib.Path(p.infopath)
    y = yaml.YAML()
    with open(infopath) as f:
        res = y.load(f)
    res["SetDesc"] = desc
    res["Reference"] = "arxiv:1802.03398"
    with open(infopath, "w") as f:
        y.dump(res, f)


def rename(old, new):
    """Rename the fit with fitrename, then fix up and compress the result."""
    subprocess.run(["fitrename", "-rc", old, new], check=True)
    compress(new)


def compress(new):
    """Fix up the .info file and write res/<new>.tar.gz with symlinks resolved."""
    fixup_ref(new)
    l = Loader()
    p = l.check_pdf(new)
    dst = pathlib.Path(p.infopath).parent
    # The res/ output directory must already exist.
    subprocess.run(
        ["tar", "--dereference", "-czvf", f"res/{new}.tar.gz", "-C", str(dst.parent), new],
        check=True,
    )
    # shutil.make_archive(f"res/{new}", "gztar", root_dir=dst.parent, base_dir=new)


def main():
    p = multiprocessing.Pool()
    fits = list(pathlib.Path().glob("NNPDF*"))
    tasks = []
    for fit in fits:
        # Raw string so that \d is a regex digit class, not a string escape.
        res = re.match(r"(NNPDF.*_as_)(\d+)(_.*)", fit.name)
        if not res:
            raise Exception(fit.name)
        head, val, tail = res.group(1), res.group(2), res.group(3)
        if tail == "_ascorr_notop":
            # Name is already final: only fix up and compress.
            new = f"{head}{val}{tail}"
            tasks.append(p.apply_async(compress, (new,)))
        elif tail in {"_uncorr_s4", "_uncorr_s3"}:
            tail = "_ascorr"
            new = f"{head}{val}{tail}"
            tasks.append(p.apply_async(rename, (fit.name, new)))
        else:
            raise Exception(f"bad tail, {tail}")
    # Wait for all tasks; .get() re-raises any exception from a worker.
    for task in tasks:
        task.get()
    p.close()
    p.join()


if __name__ == "__main__":
    main()
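As a side note, the name mapping buried in `main` could be pulled out into a pure function, which would make it easy to check before touching any fit. This is only a sketch of the same rules as in the script above; the function name `published_name` and the example fit names are made up for illustration.

```python
import re

# Same pattern as in the script, written as a raw string.
NAME_RE = re.compile(r"(NNPDF.*_as_)(\d+)(_.*)")


def published_name(fit_name):
    """Return (new_name, needs_rename) for a fit name (hypothetical helper)."""
    m = NAME_RE.match(fit_name)
    if not m:
        raise ValueError(f"unrecognized fit name: {fit_name}")
    head, val, tail = m.groups()
    if tail == "_ascorr_notop":
        # Already has its final name: compress only.
        return f"{head}{val}{tail}", False
    if tail in {"_uncorr_s4", "_uncorr_s3"}:
        # Published under the _ascorr suffix: rename, then compress.
        return f"{head}{val}_ascorr", True
    raise ValueError(f"bad tail: {tail}")


if __name__ == "__main__":
    # Illustrative names only.
    print(published_name("NNPDF31_nnlo_as_0118_uncorr_s4"))
    print(published_name("NNPDF31_nnlo_as_0118_ascorr_notop"))
```

The boolean tells the caller whether to go through `fitrename` first or straight to compression.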
@siranipour do you think that we could add the relevant functionality to `fitrename` or some similar script?