
Avoiding duplicated computations by having a single observable model

Open. Emanuele Roberto Nocera requested to merge merge-observable-splits into master

Created by: APJansen

Goal

The goal of this PR is to speed up the code by a factor of 2 via a refactoring that avoids redoing the same computations. Currently there are separate training and validation models. At every training step the validation model is run from scratch on the x inputs, even though its only difference from the training model is the final masking applied just before computing the loss.

This will hopefully also improve readability. From an ML point of view the current naming is very confusing. Instead of having a training model and a validation model, we can have a single observable model with a training loss and a validation loss on top of it. (This is just about naming; they may still be MetaModels.)

The same holds of course for the experimental model, except that there is no significant performance cost there. But for consistency and readability let's try to treat it on the same footing.

This PR branches off of trvl-mask-layers because that PR changes the masking. That one should be merged before this one.

Current implementation

Models creation

The models are constructed in ModelTrainer._model_generation, specifically in the function _pdf_injection, which is given the pdfs, a list of observables and a corresponding list of masks. For the different "models", not only the values of the masks but also the list of observables change, as not all models use all observables (in particular the positivity and integrability ones). This function just calls the observables on the pdfs with the mask as argument. And each observable's call method, defined here, does two steps: 1. compute the observable, 2. apply the mask and compute the loss.
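For orientation, here is a minimal toy sketch of that coupled pattern. All names (ObservableWithLoss, the FK-table convolution, the chi2 form) are hypothetical simplifications for illustration, not the real n3fit classes:

```python
import tensorflow as tf

class ObservableWithLoss(tf.keras.layers.Layer):
    """Toy version of the current pattern: one layer both computes
    the observable and applies the mask + loss in a single call."""

    def __init__(self, fktable, data, invcovmat, **kwargs):
        super().__init__(**kwargs)
        self.fktable = tf.constant(fktable, dtype=tf.float32)
        self.data = tf.constant(data, dtype=tf.float32)
        self.invcovmat = tf.constant(invcovmat, dtype=tf.float32)

    def call(self, pdf, mask):
        # step 1: compute the observable (toy FK-table convolution)
        obs = tf.linalg.matvec(self.fktable, pdf)
        # step 2: mask and compute a chi2-like loss; this is the part
        # the PR proposes to split off into its own loss layer
        diff = tf.boolean_mask(obs - self.data, mask)
        invcov = tf.boolean_mask(
            tf.boolean_mask(self.invcovmat, mask, axis=0), mask, axis=1
        )
        return tf.einsum("i,ij,j->", diff, invcov, diff)
```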

Models usage

Once they are created, the training model is, obviously, used for training here. The validation model is used to initialize the Stopping object; the only thing that happens there is that its compute_losses method is called. Similarly for the experimental model, which is called directly in the ModelTrainer (here).

Changes proposed

  1. Decouple the masking and loss computation from the ObservableWrapper class: remove those parts from ObservableWrapper and perhaps create an ObservableLoss layer that does this.
  2. Apply this pure observable class to the pdfs, for all observables, to create an observables_model.
  3. Create 3 loss models that take all observables as input, apply the masks, select the relevant observables, and compute the losses.
  4. For the training one, put it on top of the observables_model, to create a model identical to the current training model.
  5. Add the output of the observables_model to the output list of this training model, so these can be reused.
  6. The validation and experimental models can be discarded; instead we have validation and experimental losses that are applied to the output of the observables_model. So e.g. we can replace self.experimental["model"].compute_losses() with experimental_loss(observables). A sketch of this layout follows the list.
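Concretely, the wiring could look something like the following Keras-functional sketch. Everything here is hypothetical (build_models, tr_losses, etc. are made-up names), it ignores that not every loss uses every observable, and the real objects are MetaModels:

```python
import tensorflow as tf

def build_models(pdf_input, observable_layers, tr_losses, val_losses, exp_losses):
    # (2) a single pure observables model, evaluated once per step
    observables = [obs(pdf_input) for obs in observable_layers]
    observables_model = tf.keras.Model(pdf_input, observables)

    # (3) three sets of loss layers; each masks/selects its own observables
    tr_out = [loss(o) for loss, o in zip(tr_losses, observables)]
    val_out = [loss(o) for loss, o in zip(val_losses, observables)]
    exp_out = [loss(o) for loss, o in zip(exp_losses, observables)]

    # (4)+(5) the training model stacks the training losses on top of the
    # observables model and also exposes the raw observables for reuse
    training_model = tf.keras.Model(pdf_input, tr_out + observables)

    # (6) no separate validation/experimental models: their losses are
    # applied directly to the output of observables_model
    return observables_model, training_model, val_out, exp_out
```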


Activity

  • requested review from @enocera

  • Created by: APJansen

I'm looking into the first point, decoupling the computation of the observables from their masking and loss. Some questions for @goord:

    Currently in _generate_experimental_layer this happens:

    1. observables computed
    2. masks applied one by one
    3. masked observables concatenated
    4. some rotation
    5. loss computed

    What does this rotation do?

And is it possible to change this to the following (at the cost of concatenating the masks inside observable_generator)?

    1. observables computed
    2. UNmasked observables concatenated
    3. (rotation?)

and in a subsequent loss layer (see the sketch after this list):

    1. masks applied to concatenated observables
    2. loss computed
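Roughly like this (a toy sketch with hypothetical names, leaving the rotation out until the question below is settled):

```python
import tensorflow as tf

def pure_observables(pdf, observable_layers):
    # 1. observables computed, 2. concatenated UNmasked
    return tf.concat([obs(pdf) for obs in observable_layers], axis=0)

def loss_layer(concat_obs, concat_mask, data, invcovmat):
    # 1. the pre-concatenated mask applied to the concatenated observables
    diff = tf.boolean_mask(concat_obs - data, concat_mask)
    invcov = tf.boolean_mask(
        tf.boolean_mask(invcovmat, concat_mask, axis=0), concat_mask, axis=1
    )
    # 2. loss computed
    return tf.einsum("i,ij,j->", diff, invcov, diff)
```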

Also, I've probably seen this before, but I'm still confused why a mask is applied both directly to the observables in _generate_experimental_layer and also inside LossInvcovmat itself?

  • Created by: goord

Yes, these rotations are triggered by 'data_transformation_tr', which is used if you represent the experimental data in a covariance-diagonal basis, I guess. I'm not sure when this is actually used, nor whether this code path is properly tested in the trvl-mask-layers branch...

  • Created by: goord

The mask in LossInvcovmat is not used for masking training/validation, I think.

  • mentioned in issue #1803

  • Created by: APJansen

    @goord Why does the experimental output have rotation=None while the others have rotation=obsrot, around here? Is that intentional? It interferes a bit with how I thought the observable computation would decouple from the masked loss.

  • Created by: goord

> @goord Why does the experimental output have rotation=None while the others have rotation=obsrot, around here? Is that intentional? It interferes a bit with how I thought the observable computation would decouple from the masked loss.

This is a rewrite of line 289 in master. I don't know why the diagonal basis is not used for the experimental output layer; perhaps @scarlehoff or @RoyStegeman can explain it to us.

    If you look at n3fit_data.py you can see that in the diagonal basis, the training and validation covmats are being masked and then inverted, but the full covmat inverse (inv_true) is computed in the old basis.
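This distinction matters: for a correlated covmat, masking and then inverting is not the same as inverting the full covmat and then masking. A quick toy check (editorial illustration, not code from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
cov = a @ a.T + 4 * np.eye(4)          # a generic correlated covmat
mask = np.array([True, True, False, True])

inv_of_masked = np.linalg.inv(cov[np.ix_(mask, mask)])
masked_of_inv = np.linalg.inv(cov)[np.ix_(mask, mask)]
print(np.allclose(inv_of_masked, masked_of_inv))   # False in general
```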

  • Created by: scarlehoff

Because when they were separate it didn't really matter, and it is decoupled from training/validation (the idea of diagonalising is to be able to do the split while removing the correlations between training and validation within a dataset).
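In other words (an editorial toy check, not code from the PR): rotating to the eigenbasis of the covmat makes it diagonal, so a training/validation split of the rotated points carries no cross-correlations between the two subsets:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=(4, 4))
cov = a @ a.T + 4 * np.eye(4)

eigvals, eigvecs = np.linalg.eigh(cov)
rotated = eigvecs.T @ cov @ eigvecs            # diagonal up to round-off
print(np.allclose(rotated, np.diag(eigvals)))  # True

# the block coupling a tr split {0, 1} to a val split {2, 3} vanishes
print(np.allclose(rotated[np.ix_([0, 1], [2, 3])], 0.0))  # True
```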

  • Created by: APJansen

Hm, I don't fully understand, but is it OK to uniformize this? I now calculate all observables once without any mask, using the same settings, and then apply the fold masks and the tr/val split afterwards. It's passing all the tests and also giving identical results for the main runcard.

  • Created by: goord

> Hm, I don't fully understand, but is it OK to uniformize this? I now calculate all observables once without any mask, using the same settings, and then apply the fold masks and the tr/val split afterwards. It's passing all the tests and also giving identical results for the main runcard.

    You can try the diag-DIS runcard to check the observable rotation: DIS_diagonal_l2reg_example.yml.txt

  • Created by: APJansen

    Seems to work fine, and gives the same results as trvl-mask-layers.

  • Created by: scarlehoff

> I don't fully understand

The chi2 does not (should not) depend on the diagonalization; since the total covmat is only used to report the total chi2, nobody cared about diagonalising it because it was not needed.

> but is it OK to uniformize this?

Yes, because of the above.

  • Created by: APJansen

    Ok perfect, thanks :)

  • Created by: APJansen

@scarlehoff @Radonirinaunimi Positivity is included in the validation model; I remember we discussed this before, and if I remember correctly there was some disagreement on whether this was necessary, is that right? If I remove it, I get an error from this line, which can be fixed by changing fitstate.validation to fitstate._training, after which it runs normally (though I haven't done any comparisons).

Right now I'm thinking that, to remove the repeated calculation of observables, the easiest approach is to combine the training and validation models into one model that computes both of their losses, adding a "_tr" or "_val" postfix and filtering as appropriate when summing to get the final train/val losses (sketched below). The experimental one can stay separate, as the performance loss there is negligible.

    Does that sound ok?

    Of course it would be nicer to instead just have one model and 3 different losses, but that will take longer to implement.
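A sketch of that combined-model idea (hypothetical names; an editorial illustration of the "_tr"/"_val" postfix filtering, not code from the branch):

```python
import tensorflow as tf

def combined_model(pdf_input, observables, tr_losses, val_losses):
    # one model emitting both sets of losses, tagged by postfix
    outputs = {}
    for name, loss in tr_losses.items():
        outputs[name + "_tr"] = loss(observables[name])
    for name, loss in val_losses.items():
        outputs[name + "_val"] = loss(observables[name])
    return tf.keras.Model(pdf_input, outputs)

def total_loss(outputs, postfix):
    # filter on the postfix when summing to the final train/val loss
    return tf.add_n([v for k, v in outputs.items() if k.endswith(postfix)])
```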

  • Created by: scarlehoff

I don't understand what you mean. The "easiest" way looks much more complex to me, since you need to filter things out and any bug there will "break" the validation.

  • Created by: scarlehoff

    Also, I'm not completely sure you can achieve your goal here?

    You need to compute everything twice for every epoch just the same.
