
Closure test L1 consistency in random noise generation

Merged Emanuele Roberto Nocera requested to merge using_L0_cv_for_MULT_generation_in_CT into master

Created by: comane

The idea of this PR is the following: currently (in master), the MULT uncertainties entering the covmat used to compute the L1 noise in a closure test are computed from the experimental central values, which do not exist within a CT. In this branch the MULT uncertainties are instead computed from the L0 central values, i.e. the theory predictions.

Merge request reports

Merged by Emanuele Roberto Nocera 1 year ago (May 17, 2023 11:21am UTC)


Activity


pseudodata.py

  • Created by: comane

    @scarlehoff, @andreab1997, @giovannidecrescenzo In the code meeting of 15.03.2023 it was discussed that the needed modifications should be:

    covmat = dataset_inputs_covmat_from_systematics(
        commondata_wc,
        level0_commondata_wc,
        dataset_input_list,
        use_weights_in_covmat=False,
        norm_threshold=None,
        _list_of_central_values=None,
        _only_additive=sep_mult,
    )

    level1_data = make_replica(
        level0_commondata_wc,
        filterseed,
        covmat,
        sep_mult=sep_mult,
        genrep=True,
    )

    and adding sep_mult as an argument of make_level1_data. However, I do not think this would be correct: when _only_additive=True, only the ADD types in the systype file are selected (see the cd.additive_errors method), which is only a subset of the actual uncertainties (see the sketch below). This is in fact also clear from running vp-setupfit: the modifications with respect to master in the central L1 values would be huge.
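
    A minimal sketch (with made-up entries, not the actual commondata/systype API) of what selecting only the ADD types amounts to:

    # Hypothetical systype-like entries: each systematic is either ADD (absolute)
    # or MULT (percentage). Keeping only the ADD entries, as _only_additive=True
    # effectively does, drops every MULT systematic from the covmat construction.
    systematics = [
        ("CORR_1", "ADD", 0.02),
        ("CORR_2", "MULT", 1.5),
        ("LUMI", "MULT", 2.0),
    ]
    additive_only = [s for s in systematics if s[1] == "ADD"]
    print(additive_only)  # [('CORR_1', 'ADD', 0.02)] -- a strict subset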

  • Created by: scarlehoff

    That's surprising. The only difference should be whether the multiplicative error is added with the covmat or afterwards, and we said in the past that it didn't make a (big) difference. @andreab1997, maybe we got the combination of true/false wrong?

    AFAIR in the fit we have the true-true option for the sampling.

  • requested review from @enocera

  • Created by: comane

    At some point I disconnected from the discussion last Wednesday, but could you explain why it has to be done this way?

    I think the point is that the experimental central values are not known within a closure test. Thus I think that the multiplicative errors should be generated from the Level 0 central values in order to be more consistent (see the sketch below).
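
    A minimal numerical sketch of why this matters (illustrative numbers only, not the validphys implementation): MULT uncertainties are quoted as percentages, so their absolute size, and hence the covmat, depends on which central values they multiply.

    import numpy as np

    mult_percent = np.array([2.0, 1.5, 3.0])        # hypothetical MULT systematics (%)
    exp_central = np.array([10.0, 20.0, 5.0])       # experimental central values
    level0_central = np.array([11.0, 18.0, 5.5])    # L0 central values G[f]

    sigma_exp = mult_percent / 100 * exp_central    # master: scaled by exp data
    sigma_l0 = mult_percent / 100 * level0_central  # this PR: scaled by L0 data
    print(sigma_exp)  # approx [0.2, 0.3, 0.15]
    print(sigma_l0)   # approx [0.22, 0.27, 0.165]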

  • Created by: RoyStegeman

    That I find unconvincing without further explanation. In a closure test we aim to reproduce a known underlying law, so as L0 data we use predictions from an input PDF. Then we do the fits based on some loss function and see if we can recover the input PDF. Fine so far.

    Then, we have a choice regarding the loss function, either we keep it as in the real case (thus multiplicative exp uncs determined using the exp central value) or we determine the uncs going into the loss function using the input pdf. I would like to keep the closure test as close to the real scenario as possible, so in principle I prefer to use the same covmat as in the real scenario.

    If there is some reason why using the experimental covmat is wrong/inconsistent, we'll have to accept that we need to deviate from the real covmat and use the input pdf as you propose. My point is that right now I don't see where we go wrong if we use the usual exp covmat.

  • Created by: comane

    Then, we have a choice regarding the loss function, either we keep it as in the real case (thus multiplicative exp uncs determined using the exp central value) or we determine the uncs going into the loss function using the input pdf. I would like to keep the closure test as close to the real scenario as possible, so in principle I prefer to use the same covmat as in the real scenario.

    If there is some reason why using the experimental covmat is wrong/inconsistent, we'll have to accept that we need to deviate from the real covmat and use the input pdf as you propose. My point is that right now I don't see where we go wrong if we use the usual exp covmat.

    I see your point. However, I think that the real (experimental) covariance matrix is never really used when fitting in a closure test. This is because (correct me if I am wrong) the covmat used when fitting, i.e. the one used to construct the chi2, as well as the one used to add noise to the Level 1 data for each data replica, is generated from the Level 1 data written to the runcard_name/filter folder (after vp-setupfit), and there, in particular, the central values are not the experimental ones but the Level 0 ones augmented by some noise. So the question is from which covariance matrix we sample the noise that is added to the Level 0 data in order to obtain the Level 1 data (schematically, see below). In any case, I think that the difference is really small.
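
    Schematically (hypothetical covmat and central values, not the actual validphys code), the L1 generation step is:

    import numpy as np

    # eta is one fixed draw of correlated noise; in the real code this draw
    # is fixed by filterseed.
    rng = np.random.default_rng(42)
    C = np.array([[0.04, 0.01],
                  [0.01, 0.09]])              # hypothetical covariance matrix
    level0 = np.array([11.0, 18.0])           # L0 central values

    eta = rng.multivariate_normal(np.zeros(2), C)
    level1 = level0 + eta                     # L1 = L0 + noise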

  • Created by: scarlehoff

    @RoyStegeman this is only for the creation of the data and should not affect the fit. This change should make the level 1 data closer to what is done in the fit when we generate a replica.

  • Created by: RoyStegeman

    This change should make the level 1 data closer to what is done in the fit when we generate a replica.

    Good point, somehow I forgot about that for a second. Nevertheless, I'm not so interested in revisiting this discussion, since you've talked about it for a long time and seem to have come to some sort of agreement, which is why I asked for the reasoning behind the choice to be documented (which might be useful, since you apparently already remember something different from Mark after two days).

  • Created by: scarlehoff

    (which might be useful since you apparently already remember something different from Mark after two days)

    I believe you two were just talking about different things (or were thinking about different things).

    In any case, the difference between both methods should be negligible. Having the separate multiplicative flag in the runcard will help check whether the difference is truly negligible.

  • Created by: alecandido

    I see your point, however, I think that the real (experimental) covariance matrix is never really used when fitting in a closure test.

    @comane: re the real covmat I agree with you, but, of course, the additive part of it is exactly that.

    Then, we have a choice regarding the loss function, either we keep it as in the real case (thus multiplicative exp uncs determined using the exp central value) or we determine the uncs going into the loss function using the input pdf. I would like to keep the closure test as close to the real scenario as possible, so in principle I prefer to use the same covmat as in the real scenario.

    @RoyStegeman about the concept itself I'd agree with @comane: the exp central values should never enter a closure test, not even in the covmat. The idea is that you want to test the methodology in a controlled scenario, i.e.:

    • I make up data for the central values of which I know the underlying law (the PDF I used for L0 generation)
    • and I try to reconstruct it as if I were doing the usual fit

    But the usual fit prescription is:

    1. take the central values
    2. take the additive and multiplicative uncertainties as given by the experimentalists (and these we decided not to fake, apart from the closure tests with inconsistent data project)
    3. construct the covmat with the central values and the exp uncertainties
    4. initialize
    5. start training, with the loss computed using the covmat

    But in a closure test step 1. is replaced by the fake central data, and the experimental input should be limited to everything but central values, consistently. Otherwise, we would be leaking information about the exp central values, to which the closure test should be completely blind (even if just through the covmat).

    However, I believe that this is the only argument in favor of not using the actual exp covmat, but it really depends on how you decide to factorize the experimental input, i.e. what you consider to be independent (in a statistical sense). If the two independent inputs were the central values and the covmat, then you would have been right, but in that case I'd say the experimentalists would quote those values. If instead the independent inputs are the central values and the add+mult uncertainties, then you should replace the central values at this level with the closure-test ones (see the sketch below).
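
    As a toy illustration of the factorization point (uncorrelated, made-up uncertainties; not the NNPDF implementation): the covmat depends on the central values only through the MULT part, so the two factorizations differ precisely in which central values enter there.

    import numpy as np

    add_unc = np.array([0.1, 0.2])       # absolute ADD uncertainties
    mult_percent = np.array([2.0, 1.5])  # MULT uncertainties, in percent

    def toy_covmat(central_values):
        """Diagonal covmat from ADD + MULT uncertainties on the given centrals."""
        mult_abs = mult_percent / 100 * central_values
        return np.diag(add_unc**2 + mult_abs**2)

    exp_covmat = toy_covmat(np.array([10.0, 20.0]))  # exp central values
    ct_covmat = toy_covmat(np.array([11.0, 18.0]))   # fake (closure-test) centrals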

  • Created by: RoyStegeman

    I don't agree that the closure test should be completely blind to the exp central values purely for the sake of it. I do agree that the point of the closure test is to have a perfectly controlled/self-consistent scenario. If the closure test needs to be completely blind to the exp central values to achieve this, then so be it, but this needs to be explained (which so far has not been done to my satisfaction, but if that's just me then please write down the argument in the docs).

    If you say we should use the L1 data for the mult uncs, e.g. because that is consistent with the L1 central data we want to fit while using the exp central values for the mult uncs is not, or because we need the same covmat for sampling and fitting, that would be a perfectly sensible reason to me. Though those reasons would also require explaining why they are more important than sampling the fluctuations from the same covmat as in a real fit.

  • Created by: alecandido

    I don't agree that the closure test should be completely blind to the exp central values purely for the sake of it.

    It's not for the sake of it; it is because the exp central values are a specific instance of possible central values, and if you keep correlating to them in all your tests you can't prove that your methodology is good enough independently, because it might still be good only for some features of that specific instance. In a sense, it depends on the goal we are attributing to the closure tests.

    So, let's put it like this: if we had the choice of picking a PDF that reproduces exactly the known central values at L0, should we use that one? Or should we pick a random one anyway?

    If you say we should use the L1 data for the mult uncs. e.g. because that is consistent with the L1 central data we want to fit while using the exp central values for the mult uncs is not, or because we need the same covmat for sampling and fitting, that would be a perfectly sensible reason to me.

    I don't see exactly how to spell out the consistency; it is a rather vague requirement. About the second example (needing the same covmat for sampling and fitting): that, in a sense, we would need, but at L2. The way the NNPDF methodology is spelled out, it only allows having the same covmat for sampling and loss (up to t0, of course). At L1 generation there is no comparison with the actual fit.

    Though also those reasons would require explaining why they are more important than sampling the fluctuations from the same covmat as in a real fit.

    That's the point, and it is the same one I mentioned above: I believe you want a test that is as independent as possible from the actual fit, to prove your methodology to be good enough in a large enough space. But maybe I'm misinterpreting what people claimed about closure tests (even though I heard many times during the hopscotch saga that the absence of overfitting was to be proven by closure tests).

  • Created by: comane

    @RoyStegeman, @scarlehoff, @AleCandido, to recap the idea of the modification introduced by this PR:

    A consistent Closure Test should be defined as:

    1. Generate L0 data: L0 = G[f] (f is the underlying law, G ~ FK table)
    2. Generate L1 data: L1 = L0 + eta, eta ~ N(0,C)
    3. Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,C)

    The above closure test is consistent, since both eta and eps_k are sampled from N(0,C), i.e. using the same covariance matrix.

    What is actually done in the code (master branch) right now:

    1. Generate L0 data: L0 = G[f]
    2. Generate L1 data: L1 = L0 + eta, eta ~ N(0,Cexp)
    3. Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,CL1)

    where CL1 is a covariance matrix whose 'additive part' is exactly the same as that of Cexp (and therefore consistent), while its 'multiplicative part' has been generated from the L1 central values. Therefore, in general, Cexp != CL1.

    What is done in this PR:

    1. Generate L0 data: L0 = G[f]
    2. Generate L1 data: L1 = L0 + eta, eta ~ N(0,CL0)
    3. Generate L2 data: L2_k = L1 + eps_k, eps_k ~ N(0,CL1)

    where CL0 is a covariance matrix whose 'additive part' is exactly the same as that of Cexp (and therefore consistent), while its 'multiplicative part' has been generated from the L0 central values. Note that in this case one still has CL0 != CL1; however, we can expect the 'multiplicative entries' of CL0 to resemble those of CL1 more closely, since those of Cexp could deviate by a significant amount from those of CL1 if there is no good data-theory agreement. The three levels are sketched below.
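
    A minimal sketch of the three levels (toy numbers, with a single hypothetical covmat C as in the 'consistent' definition; in master and in this PR the covmats of steps 2 and 3 differ in their multiplicative part):

    import numpy as np

    rng = np.random.default_rng(0)
    ndata, nreps = 3, 100
    C = np.diag([0.04, 0.09, 0.01])                  # hypothetical covmat

    level0 = np.array([11.0, 18.0, 5.5])             # 1. L0 = G[f]
    eta = rng.multivariate_normal(np.zeros(ndata), C)
    level1 = level0 + eta                            # 2. L1 = L0 + eta, eta ~ N(0,C)

    # 3. L2_k = L1 + eps_k, one eps_k ~ N(0,C) per replica k
    level2 = level1 + rng.multivariate_normal(np.zeros(ndata), C, size=nreps)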

  • Created by: comane

    please write down the argument in the docs

    Hi @RoyStegeman, thank you for your comments. By writing in the docs, do you mean adding something to https://docs.nnpdf.science or simply commenting the make_level1_data() function that has been modified in this PR?

  • Created by: RoyStegeman

    In my first comment I said "whichever is more appropriate", though since the argument contains some subtleties that cannot be summarized in one or two sentences, I would suggest adding it to the actual documentation at https://docs.nnpdf.science/. These docs are built from the docs folder in this repo.

    It may still be useful to add "what" the function does as a small comment in the code, but to me the docs seem like a more appropriate place to answer "why" it's done the way it is. There are no strict rules though, so just do whatever you think is best; people will have to review this before it gets merged anyway, so it could even be changed later.

  • Created by: alecandido

    By writing in the docs, do you mean to add something to docs.nnpdf.science or simply to comment the make_level1_data() function that has been modified in this PR?

    Just to mention it: in principle even the docstring of make_level1_data() is part of the docs on https://docs.nnpdf.science, even if it is not reachable from the TOC (it is discoverable by searching): https://docs.nnpdf.science/modules/validphys/validphys.html?highlight=all_preds#validphys.pseudodata.make_level1_data

  • Created by: comane

    Report for Rbv computed on a multiclosure test:

    https://vp.nnpdf.science/CCq46px1QAygO0h3HlPULQ==/#example-report

    Note: Rbv is not evaluated on test data but on the same data used in training

  • Created by: comane

    @RoyStegeman , @Zaharid , @scarlehoff Multiclosure report for Rbv

    https://vp.nnpdf.science/7sXVqI6BRgyFFRFzDRazuA==/

    The training and test datasets are the same as those used in the NNPDF4.0 CT; see Table 6.2 of the NNPDF4.0 paper. The results seem to be consistent with those of M.W.
