Automated selection of models via a `best_chi2_worse_phi2` algorithm

Created by: Cmurilochem

As a continuation of #1943, I managed to automate the selection of the best models using @juanrojochacon's hyperopt algorithm, in which the values of $1/\varphi^{2}$ are used to decide among the best $\chi^{2}$ hyperpoints. Here I am just referring to it as the `best_chi2_worse_phi2` algorithm.
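Written out as formulas, the selection computed by the snippet below looks like this. It is a sketch of my reading of that snippet. Here $k$ runs over the trials whose $\chi^{2}$ loss passes the loss threshold, $\sigma_{\chi^{2}}$ is the population standard deviation of those losses, and $n$ is the `max_phi2_points` setting.

$$
\chi^{2}_{\min} = \min_{k} \chi^{2}_{k},
\qquad
\mathcal{W} = \left\{ k \,:\, \chi^{2}_{\min} \le \chi^{2}_{k} \le \chi^{2}_{\min} + \sigma_{\chi^{2}} \right\}
$$

`best_trial` is then the set of the $n$ trials in $\mathcal{W}$ with the smallest $\langle 1/\varphi^{2} \rangle_{k}$. If $\mathcal{W}$ has fewer than $n$ trials, it is all of $\mathcal{W}$.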

To this end, I wrote a post-fit script that is primarily based on the validphys `vp_hyperoptplot.py` module, so that it should be easier to integrate into our implementation later. Just in case, I attach it here: analysis_hyperopt.zip.

The core of the idea is presented in the code snippet below:

```python
from types import SimpleNamespace

import numpy as np

# `dataframe` is assumed to be the hyperopt trials DataFrame built as in
# vp_hyperoptplot.py, with the chi2 loss in `loss` and <1/phi2> in
# `loss_reciprocal_phi2`.
args = SimpleNamespace(
    loss_target="best_chi2_worst_phi2",  # select Juan & Roy's algorithm
    max_phi2_points=10,  # keep the n lowest values of 1/phi2
    threshold=3.0,  # discard trials with a loss above this value
)

if args.loss_target == "best_chi2_worst_phi2":
    dataframe = dataframe[dataframe.loss <= args.threshold]
    best_idx = dataframe.loss.idxmin()
    minimum = dataframe.loss.loc[best_idx]
    std = np.std(dataframe.loss)
    lim_max = minimum + std
    # select rows with chi2 losses between the best point and lim_max
    selected_chi2 = dataframe[(dataframe.loss >= minimum) & (dataframe.loss <= lim_max)]
    # among the selected points, keep the n with the lowest 1/phi2
    best_trial = selected_chi2.nsmallest(args.max_phi2_points, "loss_reciprocal_phi2")
```

Here, I define an interval between the chi2 minimum and the minimum plus one standard deviation (std) of the losses. Within this interval I then look at the corresponding 1/phi2 values, take the n hyperpoints with the lowest 1/phi2, and save the selected models into best_trial. In the attached zip file I use as an example the runs with 10 replicas that I discussed on Monday, because they give many more points to test the algorithm. The final plot is shown below:

The yellow region marks the interval between the chi2 minimum (grey circle) and the minimum plus one standard deviation of the loss data. I also asked the script to return the 10 models within this region with the lowest 1/phi2 values (cyan circles).
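To check the selection logic without the hyperopt outputs, here is a dependency-free sketch that mirrors the pandas snippet on made-up toy numbers. The helper `select_best_chi2_worst_phi2` and the tuple layout `(trial_id, chi2, 1/phi2)` are hypothetical. I use `statistics.pstdev` because, like `np.std`, it computes the population standard deviation.

```python
from statistics import pstdev


def select_best_chi2_worst_phi2(trials, max_phi2_points=10, threshold=3.0):
    """Hypothetical helper; `trials` is a list of (trial_id, chi2_loss, reciprocal_phi2)."""
    # drop trials whose chi2 loss is above the threshold
    kept = [t for t in trials if t[1] <= threshold]
    losses = [loss for _, loss, _ in kept]
    minimum = min(losses)
    # window: from the best chi2 up to best chi2 + 1 population std of the kept losses
    lim_max = minimum + pstdev(losses)
    window = [t for t in kept if minimum <= t[1] <= lim_max]
    # within the window, keep the trials with the lowest 1/phi2
    window.sort(key=lambda t: t[2])
    return window[:max_phi2_points]


# toy data (made up, for illustration only): (trial_id, chi2, 1/phi2)
toy = [
    (0, 1.10, 9.0),
    (1, 1.20, 4.0),
    (2, 1.30, 6.0),
    (3, 1.60, 2.0),
    (4, 2.50, 1.0),
    (5, 4.00, 0.5),  # above the threshold, discarded
]
best = select_best_chi2_worst_phi2(toy, max_phi2_points=2)
print([t[0] for t in best])  # prints [3, 1]
```

With these toy values the window is roughly $[1.10, 1.61]$. Trial 4 is therefore left out even though it has the lowest 1/phi2 among the trials that pass the threshold, and trial 5 never enters because of the threshold.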

Questions

1. Is 1 std sufficient for our purposes? Note that in the analysis I used a loss threshold of 3, so all models with higher losses were excluded from the DataFrame and from the analysis.
2. When looking at 1/phi2 values, which option is more physically sound: (i) 1/<phi2> or (ii) <1/phi2>? Note that in the analysis I use <1/phi2>.
3. Is the idea to implement this later in validphys? I tried to run vp-hyperoptplot, but it always complains that pandoc is required, even though I have pandoc installed.
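For the second question, a tiny made-up example shows that the two options are not interchangeable. For positive values, Jensen's inequality gives $\langle 1/\varphi^{2} \rangle \ge 1/\langle \varphi^{2} \rangle$, with equality only when all replicas have the same $\varphi^{2}$. Replicas with a small $\varphi^{2}$ also pull $\langle 1/\varphi^{2} \rangle$ up strongly. The function names below are hypothetical.

```python
from statistics import fmean


def reciprocal_of_mean(phi2_per_replica):
    # option (i): 1 / <phi2>
    return 1.0 / fmean(phi2_per_replica)


def mean_of_reciprocal(phi2_per_replica):
    # option (ii): <1 / phi2>
    return fmean(1.0 / p for p in phi2_per_replica)


phi2 = [0.5, 1.0, 2.0]  # made-up per-replica phi2 values
print(reciprocal_of_mean(phi2))  # 1 / (3.5 / 3), about 0.857
print(mean_of_reciprocal(phi2))  # (2 + 1 + 0.5) / 3, about 1.167
```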

I would appreciate any comments, and ideas for improvement are always welcome.