Skip to content

Refactor stopping

Emanuele Roberto Nocera requested to merge refactor_stopping into master

Created by: APJansen

This simplifies the stopping.py module, reducing the number of classes and the coupling to the outside. It also rewrites the WriterWrapper.

I started looking at this because #1782 requires changes here, but the changes in this PR are purely a refactor that should do exactly the same.

Tests

I test this by running on this branch, renaming the output folder, running on master, and comparing the two. We only expect changes in the timing and versions. I've tested with:

  • n3fit Basic_runcard_modified.yml 1 -r 5
  • n3fit Basic_runcard_modified.yml 4 -r 5
  • n3fit Basic_runcard_modified.yml 3

In all cases the only differences are the versions and the timings, the rest is identical. Note I've modified the basic runcard to speed it up, increasing the learning rate so it stops quickly. I think it is still general enough for these tests, in the case with 5 replicas, the best epochs are 77, 72, 77, 72, 75, so not a degenerate case where coincidentally they're all equal or something.

Comments

  • WriterWrapper is a class with one method, that gets initialized and immediately after the method gets called, and that's all that happens with it. So it could just as well have been a function. I left it as a class though because it was easier and I'm not sure if everyone will agree.
  • Similarly in the stopping module, there were 4 classes and it was quite hard to figure out which calls which etc. I removed one, ReplicaState, and moved a lot of the functionality from FitHistory to the main Stopping class. Now FitHistory is essentially nothing more than a list of the 4th class FitState, so perhaps I should remove that entirely, keeping only Stopping and FitState.
  • Some of the timings are stored for each replica, but if multiple replicas are run these will be the overal timings. However when you run them one at a time it will be different of course. Also there is nothing better to report on a per-replica basis if run in parallel, so I don't think there is anything better to do and I just added a comment on it.
  • In WriterWrapper I needed both the pdf_model itself and the pdf_instance which is the model wrapped by N3PDF. I got the former from the latter by accessing its attribute, but that is supposed to be private so need to clean that up.

Merge request reports

Loading