evolvefit
Created by: scarrazza
This pull request is intended to remove APFEL from nnfit, as discussed in #173 (closed).
The code contains the following modifications:
- no APFEL dependency in nnfit
- evolvefit and nnfit share the same objects
I am opening this PR now, at a prototype stage, because, contrary to what we discussed in Amsterdam, the APFEL initialization is not the only bottleneck of the DGLAP evaluation.
Here is a summary of the situation (a short configuration sketch follows the list):
- if we use the external grid (as nnfit does), the initialization takes 1000 s while the evolution itself takes 30 minutes per replica (!!)
- if we use the fast evolution with internal grids, the initialization takes up to 10 s and the evolution per replica takes approximately 1 min; this solution produces spikes, as we well know
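For concreteness, a rough sketch of the two setups being compared, using the APFEL C++ interface; the grid parameters and function names below are placeholders for illustration, not the actual nnfit settings:

```cpp
// Sketch of the two APFEL configurations discussed above.  The grid
// parameters are illustrative placeholders, not what nnfit actually uses.
#include "APFEL/APFEL.h"

// Internal grids: initialization within ~10 s, but evolution can show spikes.
void ConfigureInternalGrids()
{
  APFEL::SetNumberOfGrids(3);
  APFEL::SetGridParameters(1, 80, 3, 1e-5); // (grid index, points, degree, xmin)
  APFEL::SetGridParameters(2, 50, 5, 1e-1);
  APFEL::SetGridParameters(3, 40, 5, 8e-1);
  APFEL::InitializeAPFEL();
}

// External grid (as nnfit does): the joint x-grid is passed in, and the
// initialization becomes the expensive step.
void ConfigureExternalGrid(int nx, double* xgrid)
{
  APFEL::SetNumberOfGrids(1);
  APFEL::SetExternalGrid(1, nx, 3, xgrid);
  APFEL::InitializeAPFEL();
}
```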
With these points in mind, I am not sure that removing APFEL from nnfit is still the best option, so here are some ideas:
- we can perform a tedious grid search for option 2 above until we reach a good compromise (if possible)
- we can adapt evolvefit to work like revolvenet, replica by replica on a cluster (but here I don't see any advantage)
- forget about this PR, merge the APFEL PR that fixes the leaks, including the EDI crashes (https://github.com/scarrazza/apfel/pull/10), and keep the development of a faster and more stable DGLAP algorithm for the future
Let me know what you think.
Merge request reports
Activity
requested review from @enocera
Created by: Zaharid
Let's discuss on Thursday. One advantage of this approach is that it guarantees that the nnfit results are protected from misbehaviour of APFEL. But whether that compensates for the annoyance of having to deal with more complicated job-submission scripts is somewhat questionable. There would be gains if we could pay for the APFEL initialization once and evolve the replicas concurrently, but we already have enough issues with the single-threaded version of the problem.
Created by: scarrazza
Just a short update since yesterday's meeting. The idea behind this PR is still fine; however, it is too slow for my taste. So we have decided the following:
- Nathan will take this PR and replace the direct evolution with evolution operators (à la FK tables; see the sketch after the list). This should avoid the requirement of rerunning `evolvefit` for each replica in a separate job, drastically reducing the resources/calculation time.
- After that we perform speed benchmarks: if we get something acceptable, i.e. a 30-40 min calculation, then fine; otherwise we should rethink DGLAP FK tables (or something else) as we did in 3.0.
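A minimal sketch of what "evolution operators" means in practice here, assuming the operator is precomputed once per theory and then applied to every replica (the flattened indexing and the names are illustrative only):

```cpp
// Apply a precomputed evolution operator E_{(a i),(b j)}(Q, Q0) to the
// initial-scale values f0_j(x_b, Q0) of one replica.  Computing E once and
// reusing it for all replicas is what removes the per-replica DGLAP cost.
#include <vector>

std::vector<double> ApplyEvolutionOperator(const std::vector<double>& E,   // size (nx*nfl)^2
                                           const std::vector<double>& f0,  // size nx*nfl
                                           int nx, int nfl)
{
  const int n = nx * nfl;
  std::vector<double> f(n, 0.0);
  for (int r = 0; r < n; r++)      // output index (x_a, flavour i)
    for (int c = 0; c < n; c++)    // input index (x_b, flavour j)
      f[r] += E[r * n + c] * f0[c];
  return f;
}
```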
Created by: nhartland
So @vbertone and I were looking at this, and Valerio has put together a code which can do the LHgrid generation for all replicas in ~1 hr.
This seems fine by me, considering the amount of time currently spent initialising APFEL for each replica.
The procedure would probably look something like this (a small sketch of the averaging step follows the list):
- Fit outputs an initial-scale grid of points
- Postfit computes vetoes etc. and computes the average on the initial-scale grid
- New `evolvefit` generates the LHgrid
- vp-upload etc. etc.
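As a minimal sketch of the "average on the initial-scale grid" step, under the assumption that each replica is stored as a flattened (flavour, x) vector at Q0 (the function name is just for illustration, not the actual postfit code):

```cpp
// Central member as the plain replica mean of the stored initial-scale
// values xf_j(x, Q0); each replica is a flattened (flavour, x) vector.
#include <cstddef>
#include <vector>

std::vector<double> CentralMember(const std::vector<std::vector<double>>& replicas)
{
  std::vector<double> mean(replicas.front().size(), 0.0);
  for (const auto& rep : replicas)
    for (std::size_t k = 0; k < rep.size(); k++)
      mean[k] += rep[k] / replicas.size();
  return mean;
}
```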
How does it sound?
Created by: scarrazza
Is this ~1 hr obtained with the external grid, TRN solution, etc.? Does this implementation work with external PDFs? If so, then this is much better than what we have now; however, I am still not totally convinced we need an extra ~1 hr for objects that can be precomputed and stored in the theory folder...
Created by: nhartland
Ah, no, good point, this is with the exact solution. How much longer is the TRN?
Created by: vbertone
Yes, it is true that the truncated evolution is much slower. The reason is simply that, to compute the numerical derivative, the evolution has to be computed five times. We could surely store the evolution kernels in some format in APFEL or in APFELcomb, but personally I'm not very much in favour of this solution. As you may remember, this very same procedure caused us a lot of problems in the past.
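To make the factor of five concrete: if the numerical derivative is taken with something like a five-point central-difference stencil (an assumption on my part, not necessarily APFEL's actual scheme), each derivative requires the full evolution at five nearby points:

```cpp
// Illustration only (not APFEL code): a five-point stencil yields the first
// and second derivatives of an "evolve" function at the price of five full
// evolutions.
#include <functional>

struct Derivatives { double first, second; };

Derivatives FivePointStencil(const std::function<double(double)>& evolve,
                             double a, double h)
{
  const double fm2 = evolve(a - 2*h), fm1 = evolve(a - h), f0 = evolve(a),
               fp1 = evolve(a + h),  fp2 = evolve(a + 2*h);        // 5 evolutions
  return { (-fp2 + 8*fp1 - 8*fm1 + fm2) / (12*h),                  // f'(a)
           (-fp2 + 16*fp1 - 30*f0 + 16*fm1 - fm2) / (12*h*h) };    // f''(a)
}
```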
Created by: scarrazza
I must admit that we were too hasty when we decided to abandon the FK evolution tables.
We didn't investigate the origin of the problem, which was the generation layout and the storage mechanisms we had at that time; instead we suggested an alternative approach which still doesn't guarantee bug-free results (suppose, for instance, that a new APFEL commit introduces a bug), is slow, and requires a lot of memory to satisfy the APFEL dependency.
So, I think the best we can do is to switch back to tables generated/stored by APFELcomb, but for that we have to define a suitable format for the data fields, which must include the xgrid, qgrid and alphas values (in the same file).
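As a strawman for such a format, something along these lines, where everything lives in one file so the pieces cannot go out of sync (the function name, field labels and layout are purely illustrative):

```cpp
// Hypothetical single-file layout for the evolution metadata: xgrid, qgrid
// and alpha_s(Q) on the qgrid, written together so they can never be
// overwritten independently of each other.
#include <fstream>
#include <string>
#include <vector>

void WriteEvolutionMetadata(const std::string& path,
                            const std::vector<double>& xgrid,
                            const std::vector<double>& qgrid,
                            const std::vector<double>& alphas)
{
  std::ofstream out(path);
  out << "XGRID " << xgrid.size() << "\n";
  for (double x : xgrid) out << x << " ";
  out << "\nQGRID " << qgrid.size() << "\n";
  for (double q : qgrid) out << q << " ";
  out << "\nALPHAS " << alphas.size() << "\n";
  for (double a : alphas) out << a << " ";
  out << "\n";
}
```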
Created by: Zaharid
Frankly, not having been there in the "past", the only argument I hear is that it used to be done that way and is therefore bad, but I have never quite understood why. So, with the information I have, I very much agree with @scarrazza.
Created by: scarrazza
Let me try to write down what I remember about that bug:
- FK tables were stored as individual tgz files (not bundled per theory)
- the FK tables for DGLAP were stored as single tgz files plus (and here is the problem) an extra flat txt file with the xgrid, qgrid and alphas values
- all files were stored in the same location after download (no independent theory folder), so it was quite easy to make a mess by overwriting files, in particular because no piece of code was checking or cleaning up the download location...

One day, when replacing the DGLAP table of a 3.0 set for the nf variation, something fishy happened: the txt file was not updated (overwritten) consistently with the DGLAP FK table for the different nf value, and we got bugged grids. So the bug was caused by a mixture of bad layout, auxiliary scripts and, why not, human interaction (I can imagine users downloading tables manually from the server and forgetting to update the flat txt file).
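A tiny guard of the kind that was missing at the time, just to illustrate the point (the theory-ID field is hypothetical, not an existing attribute of the tables):

```cpp
// Hypothetical check: refuse to proceed if the flat metadata file does not
// declare the same theory/nf setup as the DGLAP FK table it accompanies.
#include <stdexcept>
#include <string>

void CheckEvolutionConsistency(const std::string& fk_theory_id,
                               const std::string& metadata_theory_id)
{
  if (fk_theory_id != metadata_theory_id)
    throw std::runtime_error("Evolution metadata (" + metadata_theory_id +
                             ") does not match FK table (" + fk_theory_id + ")");
}
```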