Implement theory uncertainties in n3fit

Created by: wilsonmr

@RosalynLP @enocera @scarlehoff creating a new issue to discuss specifically the strategy with theory uncertainties, since I think that is slightly detached from whether we can use new runcards in n3fit or not.

I will try to summarise the current situation and then propose a plan.

In libnnpdf the experiment class has fSamplingMatrix and fCovMat. The former of those is only used during replica generation. In MakeReplica we have:

  // Compute the sampling covariance matrix with data CVs, no multiplicative error and no theory errors
  if (fSamplingMatrix.size(0) == 0)
  {
    matrix<double> SM = ComputeCovMat_basic(fNData, fNSys, fSqrtWeights, fData, fStat, fSys, false, false, false, "", {});
    fSamplingMatrix = ComputeSqrtMat(SM); // Take the sqrt of the sampling matrix
  }

which basically says that if we never explicitly set the sampling matrix, we create it with default settings: no multiplicative error, no theory error and no theory covmat. The latter two seem confusing, but we understand "theory error" to mean errors in the systematics which are labelled THEORY*, while "theory covariance" is the new thing we introduced for the theory covmat paper.
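The lazy construction can be sketched in Python. This is a toy model with illustrative names, not the actual libnnpdf API: the real ComputeCovMat_basic takes many more arguments, and here the "basic" covmat is reduced to statistical errors only, standing in for the flags that switch off multiplicative and theory contributions.

```python
import numpy as np

class Experiment:
    """Toy sketch of the lazy sampling-matrix logic (names are illustrative)."""

    def __init__(self, data, stat_errors):
        self.data = np.asarray(data, dtype=float)
        self.stat = np.asarray(stat_errors, dtype=float)
        self.sampling_matrix = None  # analogue of an empty fSamplingMatrix

    def make_replica(self, rng):
        # Only build the sampling matrix if it was never explicitly set,
        # mirroring the `if (fSamplingMatrix.size(0) == 0)` check above.
        if self.sampling_matrix is None:
            # Stand-in for ComputeCovMat_basic with multiplicative and
            # theory errors switched off: statistical errors only.
            cov = np.diag(self.stat**2)
            # ComputeSqrtMat: take the (Cholesky) square root.
            self.sampling_matrix = np.linalg.cholesky(cov)
        # Replica = central values + correlated Gaussian shift.
        return self.data + self.sampling_matrix @ rng.standard_normal(len(self.data))
```

The point of the sketch is the control flow: any matrix set on `sampling_matrix` before the first replica is generated silently replaces the default one.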

This is all well and good and we call this function from python in n3fit.io.reader.py

We note that in order to include theory uncertainties in replica generation in nnfit we call the following function:

/**
* Read in covariance matrix for replica generation from file, and generate covariance matrix and its square root
*/
void Experiment::LoadRepCovMat(string filename, bool ThUnc, std::vector<int> bmask)
{
  fSamplingMatrix = ComputeCovMat_basic(fNData, fNSys, fSqrtWeights, fT0Pred, fStat, fSys, false, false, ThUnc, filename, bmask);
  fSamplingMatrix = ComputeSqrtMat(fSamplingMatrix);
}

which means the MakeReplica function doesn't generate the sampling matrix, since it is already set, and the theory covmat uncertainties get included in the generation if ThUnc is true and a path to the covmat is provided. This requires that we have a single experiment, because otherwise the same covmat would be loaded for each experiment and cause all kinds of issues.
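In the same toy picture, LoadRepCovMat amounts to pre-setting the sampling matrix from the combined experimental-plus-theory covariance, so the later check in MakeReplica finds it already populated. This is a hypothetical sketch, not the real signature; in particular the real bmask is applied when reading the theory covmat from file, whereas here it just selects a subset of points:

```python
import numpy as np

def load_rep_covmat(exp_cov, th_cov, th_unc, bmask=None):
    """Sketch of LoadRepCovMat: build the sampling covariance from the
    experimental covmat plus (optionally) the theory covmat, then return
    its Cholesky square root."""
    cov = np.asarray(exp_cov, dtype=float)
    if th_unc:
        cov = cov + np.asarray(th_cov, dtype=float)
    if bmask is not None:
        # Keep only the unmasked data points (illustrative use of bmask).
        idx = np.flatnonzero(bmask)
        cov = cov[np.ix_(idx, idx)]
    return np.linalg.cholesky(cov)
```

With `th_unc=False` this reduces to the square root of the experimental covmat, which is why replica generation is unchanged for fits without theory uncertainties.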

Now we also have the function:

/**
* Read in covariance matrix to be used in fit from file, and generate covariance matrix and its square root
*/
void Experiment::LoadFitCovMat(string filename, bool ThUnc, std::vector<int> bmask)
{
  fCovMat = ComputeCovMat_basic(fNData, fNSys, fSqrtWeights, fT0Pred, fStat, fSys, true, true, ThUnc, filename, bmask);
  fSqrtCov = ComputeSqrtMat(fCovMat);
}

which sets the covmat used in the fit to include all uncertainties, including the theory covmat if ThUnc is true.
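The only difference between the two loaders is the flag pattern passed to the covmat builder: the sampling covmat is built with multiplicative errors off, while the fit covmat turns them on and scales them by the t0 predictions. Schematically, with a very reduced stand-in for ComputeCovMat_basic (illustrative names, one multiplicative systematic, no theory part):

```python
import numpy as np

def compute_covmat_basic(stat, mult_sys, central, use_mult):
    """Reduced stand-in for ComputeCovMat_basic: a diagonal statistical
    part plus, optionally, one fully correlated multiplicative systematic
    scaled by the central values (the t0 predictions, in the fit covmat)."""
    cov = np.diag(np.asarray(stat, dtype=float) ** 2)
    if use_mult:
        # Multiplicative uncertainties are fractions of the central value.
        delta = np.asarray(mult_sys, dtype=float) * np.asarray(central, dtype=float)
        cov = cov + np.outer(delta, delta)
    return cov

# Sampling covmat: multiplicative error off (the false flags above).
# Fit covmat: multiplicative error on, evaluated on the t0 predictions.
```

This is why the order of operations matters: the fit covmat is only meaningful once SetT0 has populated the predictions it is scaled by.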

The key is that these functions only get called at the level of nnfit:

      // read covmat from file if specified in the runcard
      if (settings.IsThUncertainties())
        {
          string ThCovMatPath = settings.GetResultsDirectory() + "/tables/datacuts_theory_theorycovmatconfig_theory_covmat.csv";

          exp->LoadRepCovMat(ThCovMatPath, settings.IsThCovSampling());
          exp->LoadFitCovMat(ThCovMatPath, settings.IsThCovFitting());
        }

It is key that they are called after SetT0 (relevant for the fit covmat, which is built from the t0 predictions).

Now, in n3fit, we are calling get_covmat, which itself calls GetCovMat (and similarly for get_sqrtcovmat):

matrix<double> const& GetCovMat()  const { return fCovMat;  } //!< Return fCovMat
matrix<double> const& GetSqrtCov() const { return fSqrtCov; } //!< Return the Cholesky decomposition of the covariance matrix

and we use the MakeReplica function. Both of these are called after spec_c.SetT0. Now, I'm pretty sure that if we had a single experiment, it would be sufficient to call LoadRepCovMat and LoadFitCovMat in common_data_reader, so I personally don't think it would take much effort to have the same theory covmat utils in n3fit as we had in nnfit. I would also remove the caching of the data loading, i.e. change spec_c = spec.load() to spec.__wrapped__.load(spec), so that we aren't mutating something that is also cached.
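The caching hazard can be illustrated with functools.lru_cache, which exposes the undecorated function via __wrapped__. This toy example is not the actual validphys code (whose attribute layout differs, hence the `spec.__wrapped__.load(spec)` spelling above), but it shows why mutating a cached load result leaks into every other consumer of the cache:

```python
import functools

class DataSpec:
    """Toy cached loader, standing in for a validphys spec (illustrative)."""

    @functools.lru_cache(maxsize=None)
    def load(self):
        # Pretend this is an expensive data load; returns a fresh object
        # standing in for the loaded libnnpdf Experiment.
        return {"covmat": "default"}

spec = DataSpec()
cached = spec.load()
cached["covmat"] = "mutated"               # mutates the cached object...
assert spec.load()["covmat"] == "mutated"  # ...so every later caller sees it

# Bypassing the cache yields a fresh object each time, so loading a theory
# covmat into it cannot leak into other users of the spec.
fresh = DataSpec.load.__wrapped__(spec)
assert fresh["covmat"] == "default"
```

In other words, if n3fit is going to call LoadRepCovMat/LoadFitCovMat on the loaded object, it should own a fresh copy rather than the shared cached one.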

Now we can talk about how there are more robust ways of doing this (I'm guessing @Zaharid hates what I am suggesting), and I totally agree, but I think that should be solved when designing the Python implementations of these functions. Having the same behaviour as before is satisfactory for now, and (famous last words) I think I could do it in a few hours; it would then need testing, which would take a little longer.

There was also another suggestion about creating new commondata for the affected datasets. I think this could be done, but if it's not trivial then I'm confident enough that we can just do the above.