Changing generation of pseudodata
Created by: andreab1997
Currently, the pseudodata are regenerated from scratch whenever even one point is negative (and not ASY) before going on. From a theoretical point of view this is odd, because it is equivalent to truncating the Gaussian distribution at zero. Moreover, from a performance point of view this easily becomes a bottleneck (for example in fits with thcovmat, where the shifts are larger). We want to try something slightly different, namely taking the absolute value of each negative pseudodatum. From a theoretical point of view this makes no more sense than the current method, but at least it requires only a single iteration in every case.
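To make the comparison concrete, here is a minimal sketch of the two strategies; the function names and the reduction to a purely additive Gaussian sampling are illustrative, not the actual validphys implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_until_positive(central, covmat, max_tries=10_000):
    """Status-quo-like strategy: redraw the whole replica until every entry is non-negative."""
    chol = np.linalg.cholesky(covmat)
    for _ in range(max_tries):
        pseudodata = central + chol @ rng.normal(size=central.shape)
        if (pseudodata >= 0).all():
            return pseudodata
    raise RuntimeError("rejection sampling did not converge")

def sample_with_reflection(central, covmat):
    """Proposal in this PR's first iteration: one draw, reflect negative entries to |x|."""
    pseudodata = central + np.linalg.cholesky(covmat) @ rng.normal(size=central.shape)
    return np.abs(pseudodata)
```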
Created by: Zaharid
FWIW something that we may as well do is setting the negative pseudodata to zero, which incidentally is what we recommend other people to do
http://nnpdf.mi.infn.it/wp-content/uploads/2019/03/NNPDFfits_Positivity_PhysicalCrossSections_v2.pdf
```diff
-    shifted_pseudodata = (all_pseudodata + shifts)*mult_part
-    #positivity control
-    shifted_pseudodata[full_mask] = np.abs(shifted_pseudodata[full_mask])
+    shifted_pseudodata = (all_pseudodata + reg_shifts)*mult_part

     return shifted_pseudodata

+def delta(shifts, data, mask):
+    new_shifts = []
+    for sh, dat, ma in zip(shifts, data, mask):
+        if ma:
+            if sh < 0.:
+                sh = dat*(np.exp(sh/dat)-1.)
+        new_shifts.append(sh)
+    return np.array(new_shifts)
```

Created by: alecandido
> FWIW something that we may as well do is setting the negative pseudodata to zero, which incidentally is what we recommend other people to do
> http://nnpdf.mi.infn.it/wp-content/uploads/2019/03/NNPDFfits_Positivity_PhysicalCrossSections_v2.pdf
Hi @Zaharid, sorry for not replying before, there were holidays in the middle. You're right, that's an honest solution, but I like the ELU regularization more: 0 itself is a rather degenerate value, and the resulting distribution would have a discrete weight on a single point.
It might be all right, but I prefer to warp everything a bit and obtain the final distribution by deforming the original one (which is not a Gaussian in any case, not even with truncation).
Created by: Zaharid
FWIW the proposed solution seems to me worse than the status quo. It has the unpleasant feature that big negative deviations become big positive ones for no reason, while it is unclear what should happen with the correlations. With the truncated Gaussian one at least gets a proportional rescaling of the positive part and sensible enough correlations.
In any case I would be interested in seeing how much of a difference all of these choices make at the end of a fit.
Created by: alecandido
> FWIW the proposed solution seems to me worse than the status quo. It has the unpleasant feature that big negative deviations become big positive ones for no reason, while it is unclear what should happen with the correlations. With the truncated Gaussian one at least gets a proportional rescaling of the positive part and sensible enough correlations.
>
> In any case I would be interested in seeing how much of a difference all of these choices make at the end of a fit.
I guess you're not catching up with the updates: the reflection was only a first attempt, whose only rationale was applying a transformation instead of retrying. If you call a point "bad" when its variance is so large that the central value is almost zero (so that the Gaussian is cut approximately in half), you see that the status quo is exponentially slow in the number of bad points: since the points are extracted in bulk (generated by drawing a vector of iid normal variables and rotating it by the Cholesky factor of the covmat), the chance of getting them all positive in a single draw is `1/2^b`, with `b` the number of bad points.

The technical improvement was the only one we were seeking at the beginning, but the current solution (the ELU) also has the property you desire (not scattering the points around), the property I desire (not putting all the weight at 0), the technical improvement, and it improves a bit w.r.t. the status quo (since it keeps the median and biases the average to the right less).
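A toy check of that scaling (unit-variance, zero-centred points standing in for the "bad" ones; nothing here is production code):

```python
import numpy as np

rng = np.random.default_rng(1234)

def acceptance_rate(n_bad, n_draws=100_000):
    # each "bad" point has central value ~0 and unit variance,
    # so each one ends up positive with probability ~1/2
    draws = rng.normal(loc=0.0, scale=1.0, size=(n_draws, n_bad))
    return (draws >= 0).all(axis=1).mean()

for b in (1, 2, 5, 10):
    print(b, acceptance_rate(b), 0.5**b)  # empirical rate tracks 1/2^b
```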
The implementation is so simple that it is already done (e7fa6ad); you can see that the whole PR is a small change in the code, and @andreab1997 is already running a fit to estimate the impact.

The technical improvement is required for further studies with the theory covmat, which can blow up the variance of some points, driving many of them into this "baddish" regime.
```python
    special_mult = (
        1 + special_mult_errors * rng.normal(size=(1, special_mult_errors.shape[1])) / 100
    ).prod(axis=1)
    mult_part = np.concatenate(mult_shifts, axis=0)*special_mult
    #regularize the negative shifts
    reg_shifts = delta(shifts, all_pseudodata, full_mask)
    #Shifting pseudodata
    shifted_pseudodata = (all_pseudodata + reg_shifts)*mult_part

    return shifted_pseudodata

def delta(shifts, data, mask):
    new_shifts = shifts.copy()
    neg = shifts < 0.
    negtofix = np.logical_and(neg, mask)
    # remaining lines not shown in the excerpt; presumably the vectorized
    # equivalent of the loop version above:
    new_shifts[negtofix] = data[negtofix]*(np.exp(shifts[negtofix]/data[negtofix]) - 1.)
    return new_shifts
```

Created by: alecandido
> Also not clear to me it is the right scale to set (but maybe the uncertainty is?).
This I actually didn't understand: which scale are you talking about? However, the shape has been optimized for:

- mapping a `0` shift into `0`
- mapping `-inf` to obtain `0` for the sampled datum (so the shift is `-datum`)
- having a continuous first derivative (i.e. derivative equal to `1` for a shift equal to `0`)
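For reference, writing $s$ for the additive shift and $d > 0$ for the datum it is applied to, the map implemented by the `delta` function in the diff above (for masked points) is

$$
\delta(s) =
\begin{cases}
s, & s \ge 0,\\
d\,\bigl(e^{s/d} - 1\bigr), & s < 0,
\end{cases}
$$

which indeed gives $\delta(0) = 0$, $\delta(s) \to -d$ as $s \to -\infty$ (so the shifted datum $d + \delta(s) \to 0$), and a one-sided derivative $e^{s/d}\big|_{s=0} = 1$ at the origin, matching the unit slope of the positive branch.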
Created by: Zaharid
In principle we may want to deal not with experimental data but rather with theory predictions, which can logically (and practically) be negative. The most popular application of that is closure tests, but also the kind of thing we do in Cambridge these days (what happens if I add new physics to this pristine self-consistent environment where the fake data have been obtained from the predictions?).
Created by: alecandido
Then I'm sorry, but you are asking for something more than the status quo: at the moment the data are forced to be positive by the retrying strategy. This PR is not a generalization of that; it simply tries to be a more efficient (and possibly slightly less biasing) implementation of it.
If you want to allow for negative elements, then you should bypass the imposed positivity (which pre-existed this PR); that is a generalization, and it is very simple to add to both implementations: just bypass it. Still, it is not in the current scope, and it is no farther away because of this PR.
Created by: alecandido
Ok, you're right, but then it might be even worse than the badness cited above: you can have a rate lower than 50% even for a single point, so:

- if it is an isolated feature, you can decide whatever you wish between `max(d, 0)` and discarding the point
- if it is consistent over a set of points, then even the former was not a solution: you'd develop an incredibly high rejection rate

If you're in case 2, I would suggest dropping pseudodata positivity (and thus bypassing the update above), because it's absurd to fit positive pseudodata to actually negative central values. Pseudodata positivity is an assumption on the distribution, but it is not mandatory: if the quantity you're fitting can sensibly be negative, then use an appropriate distribution.
Created by: Zaharid
I guess I am saying that I would consider a proposal to either clip negative values at zero, or to do nothing and rely on other positivity-constraining features, to be potentially advantageous. But I think the proposed solution as it is now breaks the aforementioned existing use case.
Created by: alecandido
> But I think the proposed solution as it is now breaks the aforementioned existing use case.
I believe there is no such existing use case, because while theoretically possible, it requires exponential time, so it is only usable when it is irrelevant. The issue we attempted to face in this PR is: is it possible to have a technically light implementation for the current use case, namely positive central values? And the answer seems to be yes.
The scope was a brief technical PR; for a proper extension to negative central values (not supported by the status quo) let's open a new one. I guess we can make both behaviors available in the runcard (clipping and nothing).
Created by: andreab1997
I now have the reports of the fits done with the ELU, so we can just look at the results:

https://vp.nnpdf.science/JfvnfqGKTvWEfo7oHiFmQw==
https://vp.nnpdf.science/DiO8TovvQiG4oAgqgJB7-g==

The first is the comparison between baselines (so without thcovmat), one with the ELU and the other with the "old" method. The second is the same but for a thcovmat fit. I believe the results are really similar in both cases, so in this case using the ELU probably just gives an improvement in performance.
Created by: alecandido
Thanks, having a simpler mechanism guaranteed to terminate in a finite amount of time seems like a good idea. That said we could as well use some option that works in various corner cases as per the discussion above.
I would keep this PR as small as possible (I'd keep PRs small in general, but I'm the first one to fail at that...). I'd like this one to be the default, especially now that it is proven to work as well as the previous one (at the level of results).
But I'd support a new PR exposing the option above (actually extended) to the runcard:
```yaml
pseudodata_positivity: elu/clip/null
central_values_positivity: clip/null
```

`clip` is just `max(d, 0)`, `null` means no positivity applied, and `elu` is the one of this PR (it only makes sense with a reference, so only for pseudodata, not for central values). This should be very simple to implement, about 5 lines of code (just branching and `max`) + parsing and propagating the runcard options.
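A minimal sketch of that branching, assuming the option has already been parsed from the runcard and is passed in as a string (the name `regularize_shifts` and the exact signature are hypothetical; `shifts`, `data` and `mask` are as in the diff above):

```python
import numpy as np

def regularize_shifts(shifts, data, mask, mode="elu"):
    """Toy dispatch over the proposed pseudodata_positivity options."""
    if mode == "null":
        # no positivity imposed on the pseudodata
        return shifts
    new_shifts = shifts.copy()
    if mode == "elu":
        # ELU-like deformation of the negative shifts: the shifted datum stays positive
        neg = np.logical_and(shifts < 0.0, mask)
        new_shifts[neg] = data[neg] * (np.exp(shifts[neg] / data[neg]) - 1.0)
    elif mode == "clip":
        # clip the shifted datum at zero, i.e. max(d + shift, 0)
        below = np.logical_and(data + shifts < 0.0, mask)
        new_shifts[below] = -data[below]
    else:
        raise ValueError(f"unknown pseudodata_positivity option: {mode}")
    return new_shifts
```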
Created by: Zaharid
I'd prefer this not to be yet another configuration option that needs to be kept in sync, especially if it is unlikely to make a difference. But we likely would have to maintain the old one in order to be able to prove that it is equivalent. However, the configuration should be something to remove in the future rather than to expand on. I would think nobody is in the business of launching a big research project into sampling pseudodata that requires permanent infrastructure in the code. Even for those that do, in the past we have made the mistake of keeping too much testing code that was irrelevant to the published work (the theory covmat module is an example of that).
Even if there end up being more options, as far as I am concerned a hard requirement for the default is to work with negative central experimental values (which have come up despite the apparent silliness).
I think it is safe to assume that if the ELU doesn't make a big difference none of the alternatives will, while personally I am unpersuaded of its benefits. In particular, all things being equal, following our own public advice seems like the most sensible choice to me.
Created by: alecandido
> But we likely would have to maintain the old one in order to be able to prove that it is equivalent.
I'm not so sure: we have the old code for reproducibility, that's the purpose of the tags. So I'd say we don't have to maintain any weird bit of code written at some point in the past, especially once we have already proven it's not making much difference w.r.t. the new one.
Created by: alecandido
> However, the configuration should be something to remove in the future rather than to expand on.
>
> Even if there end up being more options, as far as I am concerned a hard requirement for the default is to work with negative central experimental values (which have come up despite the apparent silliness).
We already have plenty of silly configurations; this one is in no way worse than the multiplicative errors.
And, as said, not even the old one was working in practice with negative central values. Negative central values are not a feature of the baseline fit, so I can agree that we support them, but there is no reason to support them by default.
Created by: alecandido
> I would think nobody is in the business of launching a big research project into sampling pseudodata that requires permanent infrastructure in the code.
"Big research project" is for sure too ambitious, but there is some interest to play with pseudodata generation. Stefano F mentioned we might like to implement Poisson distributed pseudodata, for a very large x dataset that is waiting since years.
I see no reason not to keep some variations, especially if they are so simple. We can document and unit test them much better than has been done so far, and for testing within the global fit we only need the default to be sensible. The others you can consider part of the "human hyperopt" for each specific problem.
In particular, since closure tests have to reproduce the actual methodology, I'd clip level-1 pseudodata (central values) rather than allow for negative ones. However, let's discuss this at the code meeting.
Created by: alecandido
> I think it is safe to assume that if the ELU doesn't make a big difference none of the alternatives will, while personally I am unpersuaded of its benefits.
This is not true, as we already saw with the reflection: that was silly, but in principle the pseudodata should not have been so negative, so you would have expected it not to "make a big difference" either.
If you have a suggestion, let's test it before arguing over conjectures.
Created by: Zaharid
> > But we likely would have to maintain the old one in order to be able to prove that it is equivalent.
>
> I'm not so sure: we have the old code for reproducibility, that's the purpose of the tags. So I'd say we don't have to maintain any weird bit of code written at some point in the past, especially once we have already proven it's not making much difference w.r.t. the new one.
Please let's not re-litigate well-established and publicly documented policies in every thread.
Created by: alecandido
> Please let's not re-litigate well-established and publicly documented policies in every thread.
Instead of pointing me to the docs, point to the minutes:
However, we might even maintain the old bugs, but they should be flagged as deprecated, and the next major release can't be something that happens in 10 years.
Potentially, the 4.x series should last as long as possible, since we roll a major version only in case of actual major updates, which don't necessarily have to happen: the more stable we are, the better. But this should not block smaller updates forever!