In the future we want to be able to incorporate general theory covmats into fits, i.e. contributions from nuclear uncertainties, higher twist uncertainties etc.
Scale variation theory covmats are computed at the vp-setupfit level via the production rule produce_nnfit_theory_covmat in config.py. We want to be able to add arbitrary covmats from file, in other words ones which are not computed inside validphys each time.
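For concreteness, a minimal sketch of how that could look, assuming everything is kept as pandas DataFrames over a common (dataset, point) index; the function name and file layout here are hypothetical, not existing validphys code:

```python
import pandas as pd

def total_theory_covmat(scalevar_covmat, extra_covmat_paths):
    """Hypothetical: sum the scale-variation covmat produced at vp-setupfit
    with extra theory covmats (nuclear, higher twist, ...) read from file.
    Assumes all tables share the same (dataset, point) MultiIndex on both axes."""
    total = scalevar_covmat.copy()
    for path in extra_covmat_paths:
        extra = pd.read_csv(path, index_col=[0, 1], header=[0, 1])
        # align to the scale-variation covmat before summing; missing blocks count as zero
        total += extra.reindex(index=total.index, columns=total.columns).fillna(0)
    return total
```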
It would be simplest to add per-dataset contributions (i.e. block diagonal by dataset) by loading in a covmat for each dataset. However, the correct cuts would need to be applied - presumably this can be done similarly to how experiment covmats are loaded in.
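A minimal sketch of the per-dataset case, assuming the stored block is uncut and the cuts are available as the list of surviving data point indices (names are illustrative):

```python
import numpy as np

def cut_covmat_block(uncut_block, kept_indices):
    """Restrict an uncut N_data x N_data covmat block for a single dataset
    to the rows/columns of the data points that survive the cuts."""
    kept = np.asarray(kept_indices)
    return uncut_block[np.ix_(kept, kept)]
```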
However, for more general theory covmats which include correlations between experiments, things will be more difficult with the current set-up in validphys. Ideally we would want to load in the total arbitrary theory covmat and add it to the scale variation covmat at the produce_nnfit_theory_covmat stage. But this is after cuts have been applied, so we would need a way to make the correct cuts on the whole theory covmat at this late stage, and I cannot yet see a clear way to do this.
The new experiment setup will hopefully make this easier, but I think Rosalyn wants to use this with the old code. We just discussed it, and in the end we think that for the purpose of cutting a total covmat something like the following should suffice:
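(A sketch, assuming the uncut total covmat is stored as a pandas DataFrame whose index and columns are (dataset, point) pairs and that the cuts are known per dataset; the names are illustrative.)

```python
import pandas as pd

def cut_total_covmat(total_covmat, cuts):
    """``total_covmat``: uncut covmat indexed by (dataset, point) on both axes;
    ``cuts``: mapping dataset name -> iterable of the point indices kept after cuts."""
    kept = pd.MultiIndex.from_tuples(
        [(ds, pt) for ds, points in cuts.items() for pt in points],
        names=total_covmat.index.names,
    )
    # select the surviving rows and columns in one go
    return total_covmat.loc[kept, kept]
```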
I guess I am concerned that if you keep piling on top of a design that is known to be problematic, it is going to eventually collapse in a way that nobody knows how to fix...
One thing I was thinking about was the possibility of having a commondata-esque object related to theory covariance matrices
Take for example the 9-point covmat, in construction.py:
```python
def covmat_9pt(name1, name2, deltas1, deltas2):
    """Returns theory covariance sub-matrix for 9pt prescription,
    given two dataset names and collections of scale variation shifts"""
    if name1 == name2:
        s = 0.25 * sum(np.outer(d, d) for d in deltas1)
    else:
        s = (1 / 12) * (
            np.outer((deltas1[0] + deltas1[4] + deltas1[6]),
                     (deltas2[0] + deltas2[4] + deltas2[6]))
            + np.outer((deltas1[1] + deltas1[5] + deltas1[7]),
                       (deltas2[1] + deltas2[5] + deltas2[7]))
        ) + (1 / 8) * np.outer((deltas1[2] + deltas1[3]),
                               (deltas2[2] + deltas2[3]))
    return s
```
It seems to me that the deltas here could be calculated once for a given input PDF and stored like a systematics file. If every dataset had its respective deltas then the covmat block for two datasets could be constructed quite easily, and wouldn't rely on saving a constructed covmat which may or may not have cuts applied or have been made with different datasets.
I have no idea how the higher twist covmat is constructed, but @RosalynLP is there some point of the construction which looks like the following?
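Something generic like the sketch below, where the file name, format and loader are purely hypothetical, and covmat_9pt is the function quoted above:

```python
import numpy as np

def load_cut_deltas(path, kept_indices):
    """Hypothetical loader: read a stored N_data x N_shifts table of deltas for
    one dataset and keep only the rows that survive the cuts."""
    table = np.loadtxt(path)                 # assumed plain-text storage format
    table = table[np.asarray(kept_indices), :]
    return [table[:, i] for i in range(table.shape[1])]

# A block between two datasets could then be built on the fly from the stored
# deltas (file names and cut lists here are placeholders):
# deltas1 = load_cut_deltas("DATASET1_deltas.dat", cuts1)
# deltas2 = load_cut_deltas("DATASET2_deltas.dat", cuts2)
# block = covmat_9pt("DATASET1", "DATASET2", deltas1, deltas2)
```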
I am not sure how stable those things will be and how many variations and variations of variations we are going to want, so I wouldn't assimilate it to commondata. However perhaps it could be linked to theories somehow.
Well, with the scale variation covariance matrices I don't see how it would be unstable to save a copy of a table for each dataset which was N_data x N_shifts, where the columns would be like:
+0 | ++ | +- | -0 | -+ | -- | 0+ | 0-
where each label refers to a delta between that theory and theory 00. The variations would be on the PDF and PTO of the input theories (the latter we don't need to worry much about for the time being). Adhering to the covmat reg convention, if I call the table above A then the covmat construction is either
1/normalisation * A A.T
for same processes or
1/normalisation * A_tilde A_tilde.T
where A_tilde is a new table whose columns are linear combinations of the columns of A according to the point prescription (like tilde_A[:, '+X'] = sqrt(2) * (A[:, '+0'] + A[:, '++'] + A[:, '+-']) etc.). As a side note: if the table were a DataFrame with column labels like "++" then it would be far less ambiguous what construction.py was doing. I believe that if there were variations then they would be happening at the level of the construction, not the deltas, which as far as I can tell were pretty much constant throughout the theory covariance project.
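As a sketch of that construction (column labels as in the table above; the sqrt(2) combination just mirrors the example in the text, with the real coefficients and normalisation fixed by the point prescription):

```python
import numpy as np
import pandas as pd

SHIFT_LABELS = ["+0", "++", "+-", "-0", "-+", "--", "0+", "0-"]

def deltas_frame(deltas, data_index):
    """The table A: one row per data point, one labelled column per scale shift."""
    return pd.DataFrame(np.column_stack(deltas), index=data_index,
                        columns=SHIFT_LABELS)

def same_process_block(A1, A2, normalisation):
    """1/normalisation * A1 A2^T (reduces to A A^T on the diagonal)."""
    return A1.values @ A2.values.T / normalisation

def tilde_plus_column(A):
    """Example tilde column from the text:
    tilde_A[:, '+X'] = sqrt(2) * (A[:, '+0'] + A[:, '++'] + A[:, '+-'])."""
    return np.sqrt(2) * (A["+0"] + A["++"] + A["+-"])
```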
Surely this is way better than saving a cut covmat which is only valid if the datasets are loaded in the same order, with the same cuts etc.?
This would be a separate file from commondata and theory, but I suppose it would be more similar to theory in the sense that there could be multiple files associated with a single dataset which were square roots of different theory covariance matrices. So the table above would be something like <dataset_id>_pointpresc_delta.dat, but (provided it can be cast in a similar format) there could also be <dataset_id>_higher_twist.dat or whatever.
@wilsonmr I think what you are saying amounts to storing the covariance matrices as nuisance parameters rather than as covariance matrices, right? In the case of the theory covmat the nuisance parameters can be constructed from linear combinations of the deltas. Then you have some nuisance parameters \beta_n with eigenvalues \alpha_n and you construct S = \sum_n \alpha_n \beta_n \beta_n^T.

But I think what you are suggesting is really overcomplicating matters, and although in principle we can write any covariance matrix in terms of some nuisance parameters, I think this might just create trouble for us down the line by being too inflexible. At the end of the day, by far the simplest thing is to have a load of tables saved, one for each covariance matrix, with no cuts applied, and flags which say whether to load in specific ones. Then we apply the cuts. This is the most general way of doing it and so is likely to lead to the fewest complications down the line, and generally the least stress. I think it's also worth remembering that MHOUs are a special case of theory uncertainties which can be constructed entirely from given theory vectors, but in general this won't be the case and we could have some theory covariance matrix supplied externally, much like experimental covariance matrices are currently. Then we would have to deconstruct them into nuisance parameters if we were using the above framework.
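For definiteness, the nuisance-parameter construction referred to above is just (an illustrative snippet, not code from the repo):

```python
import numpy as np

def covmat_from_nuisance(alphas, betas):
    """S = sum_n alpha_n * beta_n beta_n^T, for eigenvalues ``alphas`` and
    nuisance-parameter vectors ``betas`` (each of length N_data)."""
    return sum(a * np.outer(b, b) for a, b in zip(alphas, betas))
```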
No, I'm just saying let's store the deltas: there are 9 in total and the different prescriptions amount to putting some of them together in different ways, as per construction.py.
I think we should discuss this in person. I don't think I'm suggesting anything to do with nuisance parameters, but it's really boring to try and type something I can write on a blackboard in 5 minutes.