We currently typically enter data using the experiments key, but that is slowly going to be deprecated in favour of the data interface (#356 (closed)). The information of which dataset belongs to which experiment has been incorporated to the plotting files as per #412. We now would like that the various actions in validphys and particularly those in vp-comparefits group by the experiment defined in the metadata. They should probably warn if the set of experiments given as input contains something other than [BIGEXP].
@tgiani @wilsonmr Please try to get to this, since people are complaining loudly about it. It should be easy to use something like this as one index level for a dataframe:
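A minimal sketch of what is meant here, assuming the experiment label from the PLOTTING metadata is used as the outer index level; the dataset names and chi2 values are made up for illustration:

```python
import pandas as pd

# Hypothetical per-dataset results; the experiment labels and chi2 values
# are invented, standing in for the experiment field of the PLOTTING metadata.
records = [
    ("ATLAS", "ATLASWZRAP36PB", 1.10),
    ("ATLAS", "ATLAS1JET11", 0.95),
    ("CMS", "CMSZDIFF12", 1.30),
]
index = pd.MultiIndex.from_tuples(
    [(exp, ds) for exp, ds, _ in records], names=["experiment", "dataset"]
)
df = pd.DataFrame({"chi2": [c for *_, c in records]}, index=index)

# With the metadata experiment as an index level, grouping the output of an
# action by experiment becomes a groupby over that level.
per_experiment = df.groupby(level="experiment").mean()
```

Any action that currently emits per-dataset tables could then aggregate over the `experiment` level instead of over the input `experiments` list.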
I'm a bit confused about at what level this needs to happen: is it in every function which currently takes experiments as an input, or is this supposed to be at the validphys.config level?
ok so at a glance: fits_chi2_table, plot_fits_experiments_chi2, plot_phi (in fact this function could be improved anyway).
The only weird thing is that for fits with theory uncertainties, the experimental covariance matrix for the experiments (as defined by the PLOTTING information) wouldn't by default have any theory contribution.
well, the other PR expects there to be just the single (big) experiment or a single dataset, and sets the covmat and then the sqrt covmat accordingly. If I understand this correctly, we now want to avoid using the single experiment label, since it defeats the point of e.g. plot_fits_experiments_chi2, where you would get a single bar. Instead we want the experiment to come from the plotting info. That is fine, but then the covariance matrix needs to be constructed for this experiment in a similar way to the single-dataset case: a block is taken from the total theory covmat and given to this GetFitSqrtCovMat function, which gives back the sqrt of the total covmat for that set of datasets. Given the way these providers currently work, it would be easiest to define new ExperimentSpec objects from these combinations of datasets and collect over this new list of experiments. But this seems really messy; perhaps I missed something?
Indeed it is a bit complicated in that you would have to first group by experiment (label) and then get the appropriate slices of covmat and predictions.
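The first of those two steps can be sketched as follows; `PLOTTING_EXPERIMENT` here is a made-up dict standing in for what `get_info(ds).experiment` would return from the real PLOTTING metadata:

```python
from collections import defaultdict

# Stand-in for validphys metadata: in the real code the label would come from
# get_info(ds).experiment; here a plain dict fakes the PLOTTING information.
PLOTTING_EXPERIMENT = {
    "ATLASWZRAP36PB": "ATLAS",
    "ATLAS1JET11": "ATLAS",
    "CMSZDIFF12": "CMS",
}

def group_by_metadata_experiment(datasets):
    """Group dataset names by their metadata experiment label."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[PLOTTING_EXPERIMENT[ds]].append(ds)
    return dict(groups)

groups = group_by_metadata_experiment(
    ["ATLASWZRAP36PB", "CMSZDIFF12", "ATLAS1JET11"]
)
# Each group would then select the matching slice of covmat and predictions.
```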
yeah, and also fits_chi2_table currently calculates the total by summing the chi2 over the different experiments, which would also need a rethink if we were grouping by the plotting-info experiment. I think naively this would work best if the total covmat were stored as a pandas DataFrame (EDIT: with dataset name and datapoint index as a two-level MultiIndex) before the sqrt was taken. Then it's easy to get the relevant sub-matrix and take its square root, and cache these in the various DatasetSpec and ExperimentSpec objects, while also keeping the total available, for instance to calculate the total chi2 for both the chi2 tables and the fit summary tables.
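A minimal sketch of that idea, with an invented 3x3 covariance matrix over two hypothetical datasets; slicing the block for a group of datasets and taking its square root is then a couple of lines:

```python
import numpy as np
import pandas as pd

# Made-up total covariance matrix over two datasets, indexed on both axes by
# a (dataset, datapoint) two-level MultiIndex, as suggested in the comment.
idx = pd.MultiIndex.from_tuples(
    [("DS1", 0), ("DS1", 1), ("DS2", 0)], names=["dataset", "datapoint"]
)
total_covmat = pd.DataFrame(
    [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]],
    index=idx,
    columns=idx,
)

def sqrt_covmat_for(datasets):
    """Take the block for the given datasets and return its Cholesky sqrt."""
    block = total_covmat.loc[datasets, datasets]
    return np.linalg.cholesky(block.values)

group_sqrt = sqrt_covmat_for(["DS1"])                 # 2x2 block for DS1 only
total_sqrt = np.linalg.cholesky(total_covmat.values)  # total still available
```

The total stays around for the total chi2, while each group's sqrt covmat can be computed (and cached) on demand from the same DataFrame.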
I think I'm being stupid, but I can't get even the simplest version of this to work. I put a function in config.py which was something like:
def produce_fit_plotting_experiments(self, fit):
    with self.set_context(ns=self._curr_ns.new_child({'fit': fit})):
        _, experiments = self.parse_from_('fit', 'experiments', write=False)
    res = {}
    for exp in experiments:
        for ds in exp.datasets:
            metaexp = get_info(ds).experiment
            if metaexp in res:
                res[metaexp].append(ds)
            else:
                res[metaexp] = [ds]
    exps_out = []
    for exp in res:
        exps_out.append(ExperimentSpec(exp, res[exp]))
    return {'experiments': exps_out}
with the hope that one could use this production rule with current actions which require experiments, to arrange plots grouped by plotting experiment. But I don't really understand what experiments is supposed to look like: I thought it would just be a list of individual experiments, but then vp complains that the items are non-dict-like. However, if I try making the items like {experiment: ExperimentSpec()}, this breaks various actions which expect each item in experiments to be an ExperimentSpec.
Probably this is a stupid way to proceed, but I thought it might be the easiest start; if there is a more obvious way then let me know. If not, perhaps you have an idea of how exactly to properly construct experiments in this production rule?
perhaps it would be easier to just reinvent the providers used in the compare-fits report to use something like experiments (but not quite), instead of trying to patch experiments to work with the old providers
reportengine uses a trick to allow things like experiments and pdfs to be usable as lists of namespaces which provide a single experiment and pdf key per namespace respectively. The trick is to make these things (specifically the ones you mark with element_of) not a list but an NSList (as defined in reportengine.namespaces), which has a method like:
def as_namespace(self):
    return [{self.nskey: item} for item in self]
Ideally I'd like to keep that as an implementation detail, but it's fine if this is the best way.
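A self-contained stand-in for that trick (this is a sketch, not the real reportengine.namespaces implementation; the experiment names are made up):

```python
# Minimal stand-in for reportengine's NSList: the object behaves as a plain
# list, but can also present itself as a list of one-key namespaces.
class NSList(list):
    def __init__(self, items, nskey):
        super().__init__(items)
        self.nskey = nskey

    def as_namespace(self):
        # Each element becomes its own namespace with a single key,
        # e.g. {'experiment': <ExperimentSpec>}.
        return [{self.nskey: item} for item in self]

experiments = NSList(["NMC", "BCDMS"], nskey="experiment")
namespaces = experiments.as_namespace()
```

This is what lets an `experiments` input double as an iterable of `{experiment: ...}` namespaces for collect rules.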
I'd say the endgame is to use something like #356 (closed). I am not sure what the most efficient approach for this particular problem is.