Automated selection of models via a `best_chi2_worse_phi2` algorithm
Created by: Cmurilochem
As a continuation of #1943, I managed to automate the selection of the best models via @juanrojochacon's hyperopt algorithm, wherein data of $1/\varphi^{2}$ is used to decide on the best $\chi^{2}$ hyperpoint. Here I am just referring to it as the `best_chi2_worse_phi2` algorithm.
To this end, I made a post-fit script which is primarily based on the validphys `vp_hyperoptplot.py` module. I wrote it this way to make our own implementation easier later. Just in case, I attach it here: analysis_hyperopt.zip.
The core of the idea is presented in the code snippet below:

    import numpy as np

    args = {
        'loss_target': 'best_chi2_worst_phi2',  # select Juan & Roy's algorithm
        'max_phi2_points': 10,  # select the n lowest values of 1/phi2
        'threshold': 3.0,  # discard trials with a loss above this value
    }

    # `dataframe` holds the hyperopt trials and `best_idx` is the index
    # of the trial with the lowest chi2 loss
    if args['loss_target'] == "best_chi2_worst_phi2":
        minimum = dataframe.loss[best_idx]
        std = np.std(dataframe.loss)
        lim_max = minimum + std
        # select rows with chi2 losses between the best point and lim_max
        selected_chi2 = dataframe[(dataframe.loss >= minimum) & (dataframe.loss <= lim_max)]
        # among the selected points, select the n lowest in 1/phi2
        selected_phi2 = selected_chi2.loss_reciprocal_phi2.nsmallest(args['max_phi2_points'])
        # look these points up by index, so that rows outside the chi2
        # window with equal 1/phi2 values are not picked up
        best_trial = dataframe.loc[selected_phi2.index]

Here, I define an interval between the chi2 `minimum` and 1 standard deviation `std` above it, within which I later monitor the corresponding 1/phi2 values. Among these, I take the n hyperpoints with the lowest 1/phi2 and save the selected models into `best_trial`. In the attached zip file I take as an example the runs I discussed on Monday using 10 replicas (because they give me many more points to test the algorithm). The final plot is shown below:
The yellow region defines the interval between the chi2 `minimum` (grey circle) and 1 standard deviation `std` of the loss data. I also asked the script to give me the 10 models within this region which show the lowest 1/phi2 values (cyan circles).
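To make the selection easier to try out without the zip file, here is a minimal self-contained sketch of the same steps on a toy DataFrame. The helper name `select_best_chi2_worst_phi2`, the `n_std` parameter and the toy numbers are all hypothetical and only for illustration; the column names follow the snippet above.

```python
import numpy as np
import pandas as pd


def select_best_chi2_worst_phi2(dataframe, max_phi2_points=10, n_std=1.0):
    """Return the trials near the chi2 minimum that have the lowest 1/phi2.

    Hypothetical helper, not part of validphys.
    """
    # index and value of the lowest chi2 loss
    best_idx = dataframe["loss"].idxmin()
    minimum = dataframe.loc[best_idx, "loss"]
    # upper edge of the window: minimum + n_std standard deviations
    lim_max = minimum + n_std * np.std(dataframe["loss"].to_numpy())
    # trials whose chi2 loss lies inside the window
    in_window = dataframe[dataframe["loss"] <= lim_max]
    # among those, keep the n trials with the lowest 1/phi2
    selected = in_window["loss_reciprocal_phi2"].nsmallest(max_phi2_points)
    return dataframe.loc[selected.index]


# Toy data, made up for illustration (not taken from any fit)
toy = pd.DataFrame(
    {
        "loss": [2.0, 2.1, 2.2, 2.9, 3.5, 2.05],
        "loss_reciprocal_phi2": [0.9, 0.5, 0.7, 0.1, 0.05, 0.6],
    }
)
# apply the loss threshold of 3 before the selection
toy = toy[toy["loss"] <= 3.0]

best_trial = select_best_chi2_worst_phi2(toy, max_phi2_points=2)
print(best_trial)
```

In this toy case the trial with loss 2.9 has a very low 1/phi2 but falls outside the 1 `std` window, so it is not selected. Changing `n_std` is a quick way to see how sensitive the selection is to the width of the window.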
Questions
- Is 1 `std` sufficient for our purposes? Note that for the analysis I selected a loss threshold of 3, so all models showing higher losses were excluded from the DataFrame and the analysis.
- When looking at 1/phi2 values, which option is more physically sound: (i) $1/\langle\varphi^{2}\rangle$ or (ii) $\langle 1/\varphi^{2}\rangle$? Note that in the analysis I use $\langle 1/\varphi^{2}\rangle$.
- Is the idea to implement this later in `validphys`? I tried to run `vp-hyperoptplot` but it always complains about the need for `pandoc` (even though I have `pandoc` installed).
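On the second question, a small numerical illustration of how the two options differ may help the discussion. This is only a sketch: the phi2 values are made up, and I am assuming the average is taken over a handful of values per hyperpoint (for example one per fold).

```python
import numpy as np

# Hypothetical phi2 values for a single hyperpoint (made up)
phi2_values = np.array([0.5, 1.0, 2.0, 4.0])

# option (i): reciprocal of the average, 1/<phi2>
reciprocal_of_mean = 1.0 / phi2_values.mean()
# option (ii): average of the reciprocals, <1/phi2>
mean_of_reciprocal = (1.0 / phi2_values).mean()

print(f"1/<phi2> = {reciprocal_of_mean:.4f}")
print(f"<1/phi2> = {mean_of_reciprocal:.4f}")
```

For positive phi2 values, option (ii) is never smaller than option (i), and the two coincide only when all values are equal. Option (ii) is also dominated by the smallest phi2 values, so the two definitions can rank hyperpoints differently.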
I would appreciate any comments, and ideas for improvement are always welcome.