Covariance matrices
Created by: nhartland
I'll try to summarise the situation with covariance matrices here so that we can try to reach some agreement. Please edit if I've left something out. The problem is that we have two fundamentally different covariance matrices:
- The sampling (experimental) matrix.
- The fitting (t0) matrix.
Here 'experimental' simply means 'not t0'. On top of these we have various sorts of uncertainties that we want to introduce:
- C-factor Monte Carlo uncertainties.
- Fudge factors that we invent in order to make chi2s look better (ZpT).
- MHO uncertainties (in NNPDF3.1 only jets, now all datasets).
These we can wrap up under the umbrella of 'theory uncertainties'. We want to be able to add these uncertainties to either the sampling or fitting matrices at will. On top of this we also want to have a mechanism for the introduction of weights.
What must be decided is:
- Point 1: Where and how are these matrices to be used and stored?
- Point 2: What code computes the matrices?
- Point 3: How do we specify which theory errors we want to add to which matrices?
- Point 4: How should these matrices be communicated (if at all) between codes?
Point 1 : Where and how are these matrices to be used
The fit needs the full fitting covariance matrix in order to perform cuts to it (the training-validation split). It then needs to compute the sqrt fitting matrix for use in the chi-squared computation. Both of these should be stored in the Experiment class.
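One way to see why the full matrix has to be kept: the Cholesky factor of a cut matrix is in general not the corresponding sub-block of the full Cholesky factor, so the cut has to be applied to the covariance matrix itself and the sqrt recomputed. The sketch below illustrates this with NumPy on made-up numbers; `cut_covmat` and `chi2_from_sqrt` are hypothetical helper names, not functions of libnnpdf or validphys.

```python
import numpy as np

def cut_covmat(covmat, mask):
    """Keep only the rows and columns of covmat selected by a boolean mask."""
    return covmat[np.ix_(mask, mask)]

def chi2_from_sqrt(sqrt_covmat, data, theory):
    """chi2 = d^T C^{-1} d with C = L L^T, evaluated as |L^{-1} d|^2."""
    diff = data - theory
    whitened = np.linalg.solve(sqrt_covmat, diff)
    return float(whitened @ whitened)

# Toy 3-point dataset; every number here is invented for illustration.
covmat = np.array([[4.0, 1.0, 0.5],
                   [1.0, 3.0, 0.2],
                   [0.5, 0.2, 2.0]])
data = np.array([1.0, 2.0, 3.0])
theory = np.array([0.8, 2.5, 2.9])

# Training-validation split: cut the full matrix first, then take the sqrt.
training_mask = np.array([True, False, True])
training_covmat = cut_covmat(covmat, training_mask)
training_sqrt = np.linalg.cholesky(training_covmat)
training_chi2 = chi2_from_sqrt(
    training_sqrt, data[training_mask], theory[training_mask]
)
```

Solving against the lower-triangular factor avoids ever forming the inverse covariance matrix explicitly.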
The sqrt sampling matrix must be available at the start of the fit but it doesn't need to be stored thereafter (we should avoid wasting memory).
Validphys can get by with just the sqrtcovmat (the full covariance matrix can be easily reconstructed).
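The reconstruction is a single matrix product, C = L Lᵀ, assuming the stored sqrt is the lower-triangular Cholesky factor. A minimal NumPy check on a made-up matrix:

```python
import numpy as np

# Made-up 2x2 covariance matrix and its lower-triangular Cholesky factor.
covmat = np.array([[4.0, 1.0],
                   [1.0, 3.0]])
sqrt_covmat = np.linalg.cholesky(covmat)

# The full covariance matrix is recovered as L @ L.T.
reconstructed = sqrt_covmat @ sqrt_covmat.T
```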
Point 2 : What code computes the two original matrices
There should be a function in libnnpdf to compute the Experimental part of both the fitting and sampling matrices (at this point differing only by the use of t0).
This should not be called by the Experiment class. It should only be called by vp-setupfit, where it is used to construct the total fitting and sampling covariance matrices, adding in theory errors as per point 3.
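As a rough illustration of a single function serving both matrices, the sketch below assumes the usual t0 prescription, in which relative multiplicative uncertainties are scaled by fixed t0 theory predictions rather than by the data central values. The function name, its signature and all numbers are hypothetical and are not the libnnpdf interface.

```python
import numpy as np

def experimental_part(stat, add_sys, mult_sys, scale):
    """Experimental covariance matrix for N data points.

    stat:     (N,)    absolute statistical uncertainties
    add_sys:  (N, Na) absolute additive systematics
    mult_sys: (N, Nm) relative multiplicative systematics
    scale:    (N,)    central values that convert mult_sys to absolute
                      values: the data (sampling) or t0 predictions (fitting)
    """
    mult_abs = mult_sys * scale[:, None]
    return np.diag(stat**2) + add_sys @ add_sys.T + mult_abs @ mult_abs.T

# Toy inputs, invented for illustration.
data = np.array([10.0, 20.0])
t0_predictions = np.array([11.0, 19.0])
stat = np.array([0.5, 1.0])
add_sys = np.array([[0.2], [0.4]])
mult_sys = np.array([[0.05], [0.05]])  # a 5% normalisation uncertainty

# Same function, different scale: this is the only place t0 enters.
sampling_covmat = experimental_part(stat, add_sys, mult_sys, scale=data)
fitting_covmat = experimental_part(stat, add_sys, mult_sys, scale=t0_predictions)
```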
Point 3 : How do we specify which theory errors we want to add to which matrices.
We want to deprecate the existing system for fudge factors and C-factor errors (sys10) and have them included in the theory matrices by vp-setupfit. How the construction of the matrices is specified is undecided.
Whatever mechanism is chosen, it should express which theory errors (MHOU, C-factor, fudge) are applied to which matrices (fitting, sampling), and it should be driven by the fit runcard.
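Since the format is undecided, the following is only one possible shape for such a specification, not a proposal that has been agreed. The `theory_covmat` key, the contribution names and `build_total_covmats` are all hypothetical; the runcard is shown as the dictionary a YAML parser would return, and the matrices are toy numbers.

```python
import numpy as np

# Hypothetical runcard fragment: which theory errors go into which matrix.
runcard = {
    "theory_covmat": {
        "fitting": ["mhou", "cfactor", "fudge"],
        "sampling": ["cfactor"],
    }
}

def build_total_covmats(experimental, theory, spec):
    """Add the requested theory covariance matrices to each experimental one.

    experimental: dict with 'fitting' and 'sampling' matrices
    theory:       dict mapping contribution name -> covariance matrix
    spec:         dict mapping 'fitting'/'sampling' -> list of names
    """
    totals = {}
    for target in ("fitting", "sampling"):
        total = experimental[target].copy()
        for name in spec.get(target, []):
            total = total + theory[name]
        totals[target] = total
    return totals

# Toy 2-point inputs, invented for illustration.
experimental = {
    "fitting": np.array([[1.0, 0.2], [0.2, 2.0]]),
    "sampling": np.array([[1.1, 0.2], [0.2, 1.9]]),
}
theory = {
    "mhou": np.array([[0.5, 0.1], [0.1, 0.5]]),
    "cfactor": np.diag([0.01, 0.04]),
    "fudge": np.diag([0.1, 0.0]),
}

totals = build_total_covmats(experimental, theory, runcard["theory_covmat"])
```

Keeping the mapping explicit per matrix makes it possible to fit with a theory error that was deliberately left out of the sampling, or the other way round.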
Point 4 : How should these matrices be communicated (if at all) between codes.
vp-setupfit will compute the total matrices and write them to file (one for fitting, one for sampling). nnfit, particularly the Experiment class, will then read these and perform the Cholesky decomposition. It will store the fitting covariance matrix and its sqrt, but will use the sampling sqrt matrix only upon initialisation and then discard it.
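The hand-off described above can be mimicked in a few lines. This is a sketch under several assumptions: the `.npy` file names are invented, the function names are hypothetical, and the sampling matrix is assumed to be used for drawing Gaussian pseudodata around the central values, which is a simplification of what a real fit may do.

```python
import os
import tempfile
import numpy as np

def write_total_covmats(directory, fitting, sampling):
    """Setup step: one file per matrix (file names are hypothetical)."""
    np.save(os.path.join(directory, "fitting_covmat.npy"), fitting)
    np.save(os.path.join(directory, "sampling_covmat.npy"), sampling)

def initialise_fit(directory, data, rng):
    """Fit step: read both matrices, keep only what the fit needs."""
    fitting = np.load(os.path.join(directory, "fitting_covmat.npy"))
    sqrt_fitting = np.linalg.cholesky(fitting)

    # The sampling matrix is only needed here, to generate pseudodata.
    sampling = np.load(os.path.join(directory, "sampling_covmat.npy"))
    sqrt_sampling = np.linalg.cholesky(sampling)
    pseudodata = data + sqrt_sampling @ rng.standard_normal(data.size)

    # sampling and sqrt_sampling are not returned, so they can be freed.
    return fitting, sqrt_fitting, pseudodata

# Toy inputs, invented for illustration.
fitting_total = np.array([[0.6, 0.1], [0.1, 2.1]])
sampling_total = np.array([[0.5, 0.1], [0.1, 2.0]])
data = np.array([10.0, 20.0])

with tempfile.TemporaryDirectory() as tmp:
    write_total_covmats(tmp, fitting_total, sampling_total)
    fitting, sqrt_fitting, pseudodata = initialise_fit(
        tmp, data, np.random.default_rng(0)
    )
```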