We currently typically enter data using the experiments key, but that is slowly going to be deprecated in favour of the data interface (#356 (closed)). The information of which dataset belongs to which experiment has been incorporated to the plotting files as per #412. We now would like that the various actions in validphys and particularly those in vp-comparefits group by the experiment defined in the metadata. They should probably warn if the set of experiments given as input contains something other than [BIGEXP].
@tgiani @wilsonmr Please try to get to this, since people are complaining loudly about it. It should be easy to use something like this as one index level for a dataframe:
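A minimal sketch of what is meant here, assuming the experiment label from the PLOTTING metadata is used as the outer index level; the dataset names and chi2 values are made up for illustration:

```python
import pandas as pd

# Hypothetical per-dataset results; the experiment labels and chi2 values
# are invented, standing in for the experiment field of the PLOTTING metadata.
records = [
    ("ATLAS", "ATLASWZRAP36PB", 1.10),
    ("ATLAS", "ATLAS1JET11", 0.95),
    ("CMS", "CMSZDIFF12", 1.30),
]
index = pd.MultiIndex.from_tuples(
    [(exp, ds) for exp, ds, _ in records], names=["experiment", "dataset"]
)
df = pd.DataFrame({"chi2": [c for *_, c in records]}, index=index)

# With the metadata experiment as an index level, grouping the output of an
# action by experiment becomes a groupby over that level.
per_experiment = df.groupby(level="experiment").mean()
```

Any action that currently emits per-dataset tables could then aggregate over the `experiment` level instead of over the input `experiments` list.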
I'm a bit confused about at what level this needs to happen: is it in every function which currently takes experiments as an input, or is this supposed to be at the validphys.config level?
ok so at a glance: fits_chi2_table, plot_fits_experiments_chi2, plot_phi (in fact this function could be improved anyway).
The only weird thing is that for fits with theory uncertainties, the experimental covariance matrix for the experiments (as defined by the PLOTTING information) wouldn't by default have any theory contribution.
well, the other PR expects there to be just the single (big) experiment or a single dataset, and sets the covmat and then the sqrt covmat accordingly. If I understand this correctly, we now want to avoid using the single experiment label, since it defeats the point of e.g. plot_fits_experiments_chi2, where you would get a single bar. Instead we want the experiment to come from the plotting info. That is fine, but then the covariance matrix needs to be constructed for this experiment in a similar way to the single-dataset case: a block is taken from the total theory covmat and given to this GetFitSqrtCovMat function, which gives back the sqrt of the total covmat for that set of datasets. Given the way these providers currently work, it would be easiest to define new ExperimentSpec objects from these combinations of datasets and collect over this new list of experiments. But this seems really messy; perhaps I missed something?
Indeed it is a bit complicated in that you would have to first group by experiment (label) and then get the appropriate slices of covmat and predictions.
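The first of those two steps can be sketched as follows; `PLOTTING_EXPERIMENT` here is a made-up dict standing in for what `get_info(ds).experiment` would return from the real PLOTTING metadata:

```python
from collections import defaultdict

# Stand-in for validphys metadata: in the real code the label would come from
# get_info(ds).experiment; here a plain dict fakes the PLOTTING information.
PLOTTING_EXPERIMENT = {
    "ATLASWZRAP36PB": "ATLAS",
    "ATLAS1JET11": "ATLAS",
    "CMSZDIFF12": "CMS",
}

def group_by_metadata_experiment(datasets):
    """Group dataset names by their metadata experiment label."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[PLOTTING_EXPERIMENT[ds]].append(ds)
    return dict(groups)

groups = group_by_metadata_experiment(
    ["ATLASWZRAP36PB", "CMSZDIFF12", "ATLAS1JET11"]
)
# Each group would then select the matching slice of covmat and predictions.
```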
yeah, and also fits_chi2_table currently calculates the total by summing the chi2 over the different experiments, which would also need a rethink if we were grouping by the plotting-info experiment. I think naively this would work best if the total covmat were stored as a pandas DataFrame (EDIT: with dataset name and datapoint index as a two-level MultiIndex) before the sqrt was taken. Then it's easy to get the relevant sub-matrix and take its square root, and cache these in the various DatasetSpec and ExperimentSpec objects, while also keeping the total available, for instance to calculate the total chi2 for both the chi2 tables and the fit summary tables.
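A minimal sketch of that idea, with an invented 3x3 covariance matrix over two hypothetical datasets; slicing the block for a group of datasets and taking its square root is then a couple of lines:

```python
import numpy as np
import pandas as pd

# Made-up total covariance matrix over two datasets, indexed on both axes by
# a (dataset, datapoint) two-level MultiIndex, as suggested in the comment.
idx = pd.MultiIndex.from_tuples(
    [("DS1", 0), ("DS1", 1), ("DS2", 0)], names=["dataset", "datapoint"]
)
total_covmat = pd.DataFrame(
    [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]],
    index=idx,
    columns=idx,
)

def sqrt_covmat_for(datasets):
    """Take the block for the given datasets and return its Cholesky sqrt."""
    block = total_covmat.loc[datasets, datasets]
    return np.linalg.cholesky(block.values)

group_sqrt = sqrt_covmat_for(["DS1"])                 # 2x2 block for DS1 only
total_sqrt = np.linalg.cholesky(total_covmat.values)  # total still available
```

The total stays around for the total chi2, while each group's sqrt covmat can be computed (and cached) on demand from the same DataFrame.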
I think I'm being stupid, but I can't get even the simplest version of this to work. I put a function in config.py which was something like:
def produce_fit_plotting_experiments(self, fit):
    with self.set_context(ns=self._curr_ns.new_child({'fit': fit})):
        _, experiments = self.parse_from_('fit', 'experiments', write=False)
    res = {}
    for exp in experiments:
        for ds in exp.datasets:
            metaexp = get_info(ds).experiment
            if metaexp in res:
                res[metaexp].append(ds)
            else:
                res[metaexp] = [ds]
    exps_out = []
    for exp in res:
        exps_out.append(ExperimentSpec(exp, res[exp]))
    return {'experiments': exps_out}
with the hope that one could use this production rule with current actions which require experiments, to arrange plots grouped by plotting experiment. But I don't really understand what experiments is supposed to look like: I thought it would just be a list of individual experiments, but then vp complains that the items are non-dict-like. However, if I try making the items like {experiment: ExperimentSpec()}, this breaks various actions which expect each item in experiments to be an ExperimentSpec.
Probably this is a stupid way to proceed, but I thought it might be the easiest start; if there is a more obvious way then let me know. If not, perhaps you have an idea of how exactly to properly construct experiments in this production rule?
perhaps it would be easier to just reinvent the providers used in the compare-fits report to use something like experiments (but not quite), instead of trying to patch experiments to work with the old providers
reportengine uses a trick to allow things like experiments and pdfs to be usable as lists of namespaces which provide a single experiment and pdf key per namespace respectively. The trick is to make these things (specifically the ones you mark with element_of) not a list but an NSList (as defined in reportengine.namespaces), which has a method like:
def as_namespace(self):
    return [{self.nskey: item} for item in self]
Ideally I'd like to keep that as an implementation detail, but it's fine if this is the best way.
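A self-contained stand-in for that trick (this is a sketch, not the real reportengine.namespaces implementation; the experiment names are made up):

```python
# Minimal stand-in for reportengine's NSList: the object behaves as a plain
# list, but can also present itself as a list of one-key namespaces.
class NSList(list):
    def __init__(self, items, nskey):
        super().__init__(items)
        self.nskey = nskey

    def as_namespace(self):
        # Each element becomes its own namespace with a single key,
        # e.g. {'experiment': <ExperimentSpec>}.
        return [{self.nskey: item} for item in self]

experiments = NSList(["NMC", "BCDMS"], nskey="experiment")
namespaces = experiments.as_namespace()
```

This is what lets an `experiments` input double as an iterable of `{experiment: ...}` namespaces for collect rules.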
I'd say the endgame is to use something like #356 (closed). I am not sure what the most efficient approach for this particular problem is.