Training and validation loss are 1 epoch out of sync in n3fit
Created by: wilsonmr
I was looking at the effect of not using lookback in n3fit, and the easiest quick way to do this was to change the training and validation masks so that both include all of the data.
In that case we would expect the training and validation chi2 to be identical, but I was seeing that they weren't. After adding some printing:
```
before saving state: 43.45235877403846, 43.689744591346155
[INFO]: At epoch 100/40000, total loss: 43.689744591346155
NMC: 43.690
Validation loss at this point: 43.45235877403846
before saving state: 44.35763221153846, 43.45235877403846
```
we see that the logged training chi2 is for the wrong epoch (the one before). In practice this probably makes little difference, and I guess it only really changes the chi2 log.
It's because in stopping we use the training_info, which is computed before the weights are updated, and then calculate the validation loss with the updated weights below.
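To make the point concrete, here is a toy sketch of that pattern with a plain Keras model (this is not the n3fit stopping code, just an illustration): the loss reported by the fit step refers to the weights before the gradient update, while the validation loss is then evaluated on the updated weights, so even with identical "training" and "validation" data the two numbers are one step apart.

```python
# Toy illustration of the off-by-one described above; the model and data are
# made up, this is not the n3fit code.
import numpy as np
import tensorflow as tf

x = np.random.rand(100, 1).astype("float32")
y = 3.0 * x + 0.5

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

for epoch in range(1, 4):
    # the loss reported by fit() is computed with the weights *before* the
    # gradient update of this step (one full-batch step per "epoch")
    history = model.fit(x, y, epochs=1, batch_size=len(x), verbose=0)
    training_loss = history.history["loss"][0]
    # evaluate() runs on the weights *after* the update
    validation_loss = model.evaluate(x, y, verbose=0)
    # the "training" and "validation" data are identical here, yet the numbers
    # differ because they refer to different sets of weights
    print(f"epoch {epoch}: training {training_loss:.6f}, "
          f"validation {validation_loss:.6f}")
```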
I think it should be fixed, although looking at the code it isn't instantly obvious how - I suppose _parse_training
needs to either evaluate the model at the correct epoch, or the two losses need to be computed from the same set of weights.
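For what it's worth, a minimal sketch of what "evaluating at the correct epoch" could look like in the toy setup above: recompute the training loss on the post-update weights before logging, at the cost of an extra forward pass. Again, this is only an illustration, not a proposed implementation for _parse_training.

```python
# Hypothetical sketch of the fix, reusing the toy setup: evaluate both losses
# with the same (post-update) weights before logging. Not the n3fit code.
import numpy as np
import tensorflow as tf

x = np.random.rand(100, 1).astype("float32")
y = 3.0 * x + 0.5
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

for epoch in range(1, 4):
    model.fit(x, y, epochs=1, batch_size=len(x), verbose=0)
    # extra forward pass so the logged training loss matches this epoch's weights
    training_loss = model.evaluate(x, y, verbose=0)
    validation_loss = model.evaluate(x, y, verbose=0)
    # identical numbers when the training and validation data coincide
    print(f"epoch {epoch}: training {training_loss:.6f}, "
          f"validation {validation_loss:.6f}")
```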