Avoiding duplicated computations by having a single observable model
Created by: APJansen
Goal
The goal of this PR is to speed up the code by a factor of 2 through a refactoring that avoids redoing the same computations. Currently there are separate training and validation models. At every training step the validation model is run from scratch on the x inputs, even though its only difference from the training model is the final masking applied just before computing the loss.
This will hopefully also improve readability. From an ML point of view the current naming is very confusing: instead of a training model and a validation model, we can have a single observable model, with a training loss and a validation loss on top of it. (This is just about naming; they may still be `MetaModel`s.)
The same holds of course for the experimental model, except that there is no significant performance cost in that case. For consistency and readability, though, let's try to treat it on the same footing.
This PR branches off of `trvl-mask-layers` because that PR changes the masking; it should be merged before this one.
Current implementation
Models creation
The models are constructed in `ModelTrainer._model_generation`, specifically in the function `_pdf_injection`, which is given the pdfs, a list of observables, and a corresponding list of masks.
For the different "models", both the values of the masks and the list of observables change, as not all models use all observables, in particular the positivity and integrability ones.
This function just calls the observables on the pdfs with the mask as argument.
Each observable's call method, defined here, does two steps: 1. compute the observable, 2. apply the mask and compute the loss.
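To make the coupling concrete, here is a minimal numpy sketch of those two steps living in a single call. All names (`ObservableWrapper`'s constructor arguments, the convolution form) are hypothetical simplifications, not the actual implementation:

```python
import numpy as np

class ObservableWrapper:
    """Sketch of the current coupled design: one call both computes
    the observable and applies the mask + loss."""

    def __init__(self, fk_table, data, invcovmat):
        self.fk_table = fk_table    # convolution kernel (hypothetical)
        self.data = data            # experimental central values
        self.invcovmat = invcovmat  # inverse covariance matrix

    def __call__(self, pdf, mask):
        # Step 1: compute the observable (here a simple convolution).
        prediction = self.fk_table @ pdf
        # Step 2: apply the mask and compute the chi2-like loss.
        diff = (prediction - self.data)[mask]
        return diff @ self.invcovmat[np.ix_(mask, mask)] @ diff
```

Because step 1 and step 2 are fused, any model that needs a differently masked loss must recompute step 1 as well.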
Models usage
Once they are created, the training model is, obviously, used for training here.
The validation model is used to initialize the `Stopping` object; the only thing that happens there is that its `compute_losses` method is called. Similarly for the experimental model, whose `compute_losses` is called directly in the `ModelTrainer` (here).
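The redundancy this usage implies can be sketched as follows. The function names are hypothetical; the point is only that the training and validation "models" share the forward pass and differ solely in the mask:

```python
import numpy as np

def observable(pdf, fk_table):
    # The expensive part: the observable computation itself.
    return fk_table @ pdf

def masked_loss(prediction, data, mask):
    # The cheap part: masking and comparing to data.
    diff = (prediction - data)[mask]
    return float(diff @ diff)

fk_table = np.eye(3)
pdf = np.array([1.0, 2.0, 3.0])
data = np.zeros(3)
tr_mask = np.array([True, True, False])
vl_mask = ~tr_mask

# Currently the observable is computed twice per step...
tr_loss = masked_loss(observable(pdf, fk_table), data, tr_mask)
vl_loss = masked_loss(observable(pdf, fk_table), data, vl_mask)
# ...even though only the mask differs between the two models.
```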
Changes proposed
- Decouple the masking and loss computation from the `ObservableWrapper` class: remove those parts from `ObservableWrapper`, and create perhaps an `ObservableLoss` layer that does this.
- Apply this pure observable class to the pdfs, for all observables, to create an `observables_model`.
- Create 3 loss models, which take as input all observables and do both a masking/selection and a computation of losses.
- For the training one, put it on top of the `observables_model`, to create a model identical to the current training model. Add the output of the `observables_model` to the output list of this training model, so the observables can be reused.
- The validation and experimental models can be discarded; instead we have the validation and experimental losses that are applied to the output of the `observables_model`. So e.g. we can replace `self.experimental["model"].compute_losses()` with `experimental_loss(observables)`.
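The proposed structure can be sketched in a few lines of numpy. All names here (`observables_model`, `make_loss`, the single-observable loss) are hypothetical illustrations of the split, not the eventual implementation, which would presumably use Keras layers/`MetaModel`s:

```python
import numpy as np

def observables_model(pdf, fk_tables):
    # Pure observable computation: no masking, no loss.
    return [fk @ pdf for fk in fk_tables]

def make_loss(data, mask):
    # Returns a loss closure that masks and compares to data,
    # playing the role of the proposed ObservableLoss layer.
    def loss(observables):
        diff = (observables[0] - data)[mask]
        return float(diff @ diff)
    return loss

fk_tables = [np.eye(2)]
data = np.zeros(2)
training_loss = make_loss(data, np.array([True, False]))
validation_loss = make_loss(data, np.array([False, True]))

# The forward pass is done once and its output reused by every loss:
observables = observables_model(np.array([3.0, 4.0]), fk_tables)
tr = training_loss(observables)
vl = validation_loss(observables)
```

This is where the factor-2 saving comes from: one evaluation of `observables_model` serves the training, validation, and experimental losses alike.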