Avoiding duplicated computations by having a single observable model

Emanuele Roberto Nocera requested to merge merge-observable-splits into master

Created by: APJansen

Goal

The goal of this PR is to speed up the code by a factor of 2 through a refactoring that avoids redoing the same computations. Currently there are separate training and validation models. At every training step the validation model is run from scratch on the x inputs, even though its only difference from the training model is the final masking applied just before computing the loss, so every observable is effectively computed twice per step.

This will hopefully also improve readability. From an ML point of view the current naming is very confusing: instead of a training model and a validation model, we can have a single observable model, and on top of that a training and a validation loss. (This is just about names; they may still be MetaModels.)

The same of course holds for the experimental model, except that there the performance cost is insignificant. For consistency and readability, let's treat it on the same footing anyway.

This PR branches off of trvl-mask-layers because that PR changes the masking. That one should be merged before this one.

Current implementation

Model creation

The models are constructed in ModelTrainer._model_generation, specifically in the function _pdf_injection, which is given the pdfs, a list of observables and a corresponding list of masks. For the different "models", not only the mask values but also the list of observables change, as not all models use all observables (in particular the positivity and integrability ones). This function simply calls the observables on the pdfs with the mask as argument, and each observable's call method, defined here, does two steps: 1. compute the observable, 2. apply the mask and compute the loss.
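
To make the current pattern concrete, here is a minimal sketch of a layer that does both steps in a single call. The name ObservableWithLoss and the toy FK-table convolution are illustrative only, not the actual ObservableWrapper code:

```python
import tensorflow as tf


class ObservableWithLoss(tf.keras.layers.Layer):
    """Schematic of the current pattern: one layer both computes
    the observable and turns it into a masked loss."""

    def __init__(self, fktable, data, **kwargs):
        super().__init__(**kwargs)
        self.fktable = tf.constant(fktable)  # toy stand-in for the FK table
        self.data = tf.constant(data)        # corresponding data points

    def call(self, pdf, mask=None):
        # Step 1: compute the observable (toy FK-table convolution)
        observable = tf.linalg.matvec(self.fktable, pdf)
        # Step 2: apply the mask and compute the loss, inside the same call
        diff = observable - self.data
        if mask is not None:
            diff *= tf.cast(mask, diff.dtype)
        return tf.reduce_sum(diff**2, axis=-1)
```

Because step 1 lives inside the same call, building both a training and a validation model from the same observables means running step 1 once for each of them.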

Model usage

Once they are created, the training model is, obviously, used for training here. The validation model is used to initialize the Stopping object; the only thing that happens there is that its compute_losses method is called. Similarly for the experimental model, which is called directly in the ModelTrainer (here).
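
Schematically, the per-epoch flow today looks like the sketch below; the function name and call pattern are placeholders, while compute_losses is the method named above:

```python
def epoch_step(training_model, validation_model, x, y):
    """Hypothetical per-epoch flow, as described above."""
    # One optimization step on the training model (forward + backward).
    training_model.train_on_batch(x, y)
    # The validation model then recomputes every observable from scratch
    # on the same x inputs; only the final masking differs. This is the
    # duplicated forward pass the refactoring removes.
    return validation_model.compute_losses()
```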

Changes proposed

  1. Decouple the masking and loss computation from the ObservableWrapper class: remove those parts from ObservableWrapper and perhaps create an ObservableLoss layer that does this.
  2. Apply this pure observable class to the pdfs, for all observables, to create an observables_model.
  3. Create 3 loss models that take all observables as input, select the relevant ones, apply the masks and compute the losses.
  4. For the training loss, put it on top of the observables_model, to create a model identical to the current training model.
  5. Add the output of the observables_model to the output list of this training model, so these outputs can be reused.
  6. The validation and experimental models can then be discarded; instead we have validation and experimental losses that are applied to the output of the observables_model. So e.g. we can replace self.experimental["model"].compute_losses() with experimental_loss(observables). A sketch of this layout follows below.
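
Here is a minimal sketch of the proposed layout. All names (Observable, ObservableLoss, observables_model, ...) are purely illustrative, and toy data stands in for the real FK tables and masks:

```python
import numpy as np
import tensorflow as tf

# Toy dimensions and data standing in for the real FK tables and masks.
NPDF, NDATA, NOBS = 8, 6, 3
rng = np.random.default_rng(0)
fktables = [rng.standard_normal((NDATA, NPDF)).astype("float32") for _ in range(NOBS)]
data = [rng.standard_normal(NDATA).astype("float32") for _ in range(NOBS)]
tr_masks = [rng.random(NDATA) < 0.75 for _ in range(NOBS)]
vl_masks = [~m for m in tr_masks]


class Observable(tf.keras.layers.Layer):
    """Step 1 only: the pure observable, no masking, no loss (point 1)."""

    def __init__(self, fktable, **kwargs):
        super().__init__(**kwargs)
        self.fktable = tf.constant(fktable)

    def call(self, pdf):
        return tf.linalg.matvec(self.fktable, pdf)


class ObservableLoss(tf.keras.layers.Layer):
    """Step 2 only: mask the observable and compute its loss (point 3)."""

    def __init__(self, data, mask, **kwargs):
        super().__init__(**kwargs)
        self.data = tf.constant(data)
        self.mask = tf.constant(mask.astype("float32"))

    def call(self, observable):
        diff = (observable - self.data) * self.mask
        return tf.reduce_sum(diff**2, axis=-1)


# Point 2: one observables_model computes every observable exactly once.
pdf = tf.keras.Input(shape=(NPDF,))
observables = [Observable(fk)(pdf) for fk in fktables]
observables_model = tf.keras.Model(pdf, observables)

# Points 4-5: the training model adds the training losses on top and also
# exposes the raw observables so they can be reused downstream.
tr_losses = [ObservableLoss(d, m)(o) for o, d, m in zip(observables, data, tr_masks)]
training_model = tf.keras.Model(pdf, tr_losses + observables)

# Point 6: validation (and experimental) losses are applied directly to the
# observables the training model already produced, with no second forward pass.
outputs = training_model(np.ones((1, NPDF), dtype="float32"))
obs_values = outputs[NOBS:]
val_chi2 = [ObservableLoss(d, m)(o) for o, d, m in zip(obs_values, data, vl_masks)]
```

In this sketch the validation loss reuses the observables that the training step already computed, which is where the factor-of-2 saving comes from; in practice the loss heads might still be packaged as MetaModels rather than bare layers.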
