Skip to content
Snippets Groups Projects

Group datasets by experiment PLOTTING label

Merged Emanuele Roberto Nocera requested to merge expbyplotlabel into master

Created by: wilsonmr

closes #413 (closed), closes #418 (closed)

closes #174 (closed) (the dataspecs plots already exists)

At the moment this is EXTREMELY basic, but it explains better my approximate approach to addressing the problem that the plots by experiment in vp reports are essentially redundant.

Instead of rewriting the various actions (another possibility which shouldn't be discounted/forgotten) I aimed to keep the idea of experiment as some arbitrary combination of datasets. There is a production rule which for a given fit, looks at the plotting info and then groups those datasets into experiments. Then one can do something like

{@with experiments_from_plotting@}
{@plot_experiments_chi2@}
{@endwith@}

and the plot will have bars according to the plotting experiments. I then added a provider fixup_fitthcovmat which handles the loading of the fitthcovmat and fixes the experiment headers to match those of the experiments. Then anything which used fitthcovmat now uses fixup_fitthcovmat and in DataResult I now take the cross section of fixup_fitthcovmat for either experiment or dataset.

The final step would be introducing some minimal #356 (closed) so that one can calculate total chi2s required for some tables and the summarise_fits action. All of this is currently a bit gross at the moment, can probably be done in a cleaner way, or in fact I could stop with this approach and try something completely different, it has however produced this: (results include th covmat and may or may not be correct - they look rather sensible compared to not using th covmat but that's not very rigorous)

experiments_from_plotting_plot_experiments_chi2

Merge request reports

Merged by avatar (Apr 5, 2025 12:24pm UTC)

Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • requested review from @enocera

  • requested review from @enocera

  • requested review from @enocera

  • 747 751 """
    748 752 return label
    749 753
    754 def produce_experiments_from_plotting(self, fit):
    755 """Used to produce experiments from the fit, where each experiment is a group of datasets
    756 according to the plotting info
    757 """
    758 #TODO: consider this an implimentation detail
    759 from reportengine.namespaces import NSList
    760
    761 with self.set_context(ns=self._curr_ns.new_child({'fit':fit})):
    762 _, experiments = self.parse_from_('fit', 'experiments', write=False)
  • Emanuele Roberto Nocera
  • Created by: Zaharid

    I think we should move towards having (data, theory predictions, covmat) -> chi2 instead of (results) -> chi2. Now covmat requires way too many switches (and mixing data and theory in DataSet has always been a big problem in libnnpdf).

  • Created by: Zaharid

    @scarrazza @tgiani while I am unhappy with vp getting a lot uglier due to this whole covmat thing I don't see much to do here short of rewriting the whole thing (which we should be doing by now). Please do look and test this.

  • Created by: Zaharid

    @wilsonmr Please rebase on top of master and force push to get the tests to work. Would be good to add tests with the theory covmat.

    I am also thinking that if we are going to use a theory covmat everywhere, it should be computed by vp rather than loaded somewhere with potentially incompatible data. And in the case where it is loaded, we should be able to check that e.g. the cuts match. One way to go about that is storing the corresponding namespace information on the cuts, in line with #224.

  • Created by: Zaharid

    In any case we need a plan on how this should look like medium term.

  • Created by: wilsonmr

    sure, the loading is a bit problematic. The only possible problem is if it is to be computed then basically every report from now on will require up to 9 theories to run. Like you say, at the very least the loaded one should check that it's consistent with the cuts.

    There is still more to do with this if it is the direction you want to go for now - this will break the total chi2 values everywhere because they will miss theory correlations between experiments (as defined in this branch)

  • Created by: wilsonmr

    Last Commit:

    Ok some decisions made here:

    • use_theorycovmat should only be a bool, the string option made it messy and was too prone to error
    • since experiments in the global namespace couldn't be overridden, I moved it to dataspecs specifically so it didn't interact with the production rules
    • removed the plot_pdfs_phi since it needed to be sorted anyway closes #174 (closed) (the dataspecs plots already exists) I also added a phi table, and so changed phi_data also returns Npoints
    • if theory covmat is to be used then I always calculate covmat and sqrt covmat 'externally' in the sense that I get the sqrt from numpy function, but this seems consistent with the so called internal function GetSqrtFitCovMat in fact I need to delete this function.

    Finally here is a new vp-comparefits report https://vp.nnpdf.science/T9J7ouncRqK8WJY4avGGmA==

    (compare to its old version https://vp.nnpdf.science/WfioqoYSSl2DCFyr1xgEsw==/#chi2-by-experiment the datasets are sometimes ordered differently but they seem to have same numbers, less/no NaNs and the new experiment plots look sensible)

    To do: (provided this passes tests)

    • Add covmat checks
    • Clean up some redundant providers/ change documentation more thoroughly
    • add covmat tests
    • add to vp-comparefits report to indicate what covmat is used
    • The table fits_chi2_table total is still broken, calc total in more sensible way.
  • Created by: Zaharid

    Can we add to vp-comparefits some info on what covmat is used for the various plots? This could be a piece of text after the title or some label in the plot.

  • Created by: wilsonmr

    I can, but they should be the same in this case, just in a different order since the fits_chi2_datasets_table is grouped according the plotting label on the experiment index

  • Created by: Zaharid

    Yeah, sorry just realized.

  • Created by: wilsonmr

    oh also I'm very bad at thinking of proper variable names, so probably these need to be changed <- input appreciated

  • Created by: Zaharid

    Yeah, I am not a fan of the name plot_fits_exp_by_plotting_chi2.

  • Created by: Zaharid

    Specifically I don't think the names should contain references to plotting.

  • Created by: wilsonmr

    I just wanted to have some kind of distinction between these experiments and the fitted experiment, the latter of which has some kind of meaningful bearing on the total chi2. I guess I need to change the tests to account for the new DataResult layout. I didn't quite do what you said wrt data, theory predictions, covmat -> chi2 here but vaguely moved towards it, where DataResult is a container of data and covmat and result is a container of that and the theory prediction

  • Created by: wilsonmr

    As soon as #356 (closed) becomes a thing then this becomes less confusing, perhaps it would be sufficient to call the new providers *_experiments_* whatever and then on any provider used in the report with the old meaning of experiments, put a comment and a warning that the provider needs to be updated with data when it exists?

    and delete for example plot_fits_experiments_chi2 which is either the same as plot_fits_exp_by_plotting_chi2 for old fits or is useless for newer fits...

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading