
evolvefit

Closed Emanuele Roberto Nocera requested to merge evolvefit into master

Created by: scarrazza

This pull request is meant to remove APFEL from nnfit, as discussed in #173 (closed).

The code contains the following modifications:

  • no APFEL dependency in nnfit
  • evolvefit and nnfit share the same objects

I am opening this PR now (at a prototype stage) because, contrary to what we discussed in Amsterdam, the APFEL initialization is not the only bottleneck of the DGLAP evaluation.

Here is a summary of the situation:

  1. if we use the external grid (as nnfit does), the initialization takes ~1000 s, while the evolution itself takes 30 minutes per replica (!!)

  2. if we use the fast evolution with internal grids, the initialization takes up to 10 s and the evolution takes approx. 1 min per replica. As we well know, this solution produces spikes.

With these points in mind, I am not sure removing APFEL from nnfit is still the best option, so here are some ideas:

  • we can perform a tedious grid search for option 2 until we reach a good compromise (if possible)
  • we can adapt evolvefit to run, like revolvenet, replica by replica on a cluster (but here I don't see any advantage)
  • forget about this PR, merge APFEL's PR that fixes the leaks, including the EDI crashes (https://github.com/scarrazza/apfel/pull/10), and leave the development of a faster and more stable DGLAP algorithm for the future.

Let me know what you think.

Merge request reports

Closed (Apr 11, 2025 12:46pm UTC)

Activity

  • requested review from @enocera

  • Created by: Zaharid

    Let's discuss on Thursday. One advantage of this approach is that it guarantees that the nnfit results are protected from misbehaviour of APFEL. But whether that compensates for the annoyance of having to deal with more complicated job submission scripts is somewhat questionable. There would be gains if we could pay for the APFEL initialization once and evolve the replicas concurrently, but we already have enough issues with the single-threaded version of the problem.

  • Created by: nhartland

    That's a shame about the external grid taking so long. I thought someone mentioned at the Amsterdam meeting that it was pretty quick?

  • Created by: scarrazza

    What about moving the alpha_s values to the top of each LHAPDF replica file? Then postfit could write the info file without an APFEL dependency, though this still requires access to the theory db.
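    For illustration, a minimal Python sketch of this idea, assuming a hypothetical header convention for the replica files (the `AlphaS:` line and file layout below are made up; only the `AlphaS_Qs`/`AlphaS_Vals` keys are standard LHAPDF6 .info fields):

    ```python
    # Sketch: assume each replica .dat file starts with a hypothetical line
    # "AlphaS: q1 a1 q2 a2 ..." written by the fit; postfit then copies those
    # values into the set-level .info file, so APFEL is not needed at this stage.

    def read_alphas_header(dat_path):
        """Parse the (hypothetical) alpha_s header line of a replica file."""
        with open(dat_path) as f:
            line = f.readline()
        assert line.startswith("AlphaS:"), "replica file lacks alpha_s header"
        vals = [float(x) for x in line.split()[1:]]
        # interleaved (Q, alpha_s) pairs -> two parallel lists
        return vals[0::2], vals[1::2]

    def write_info_alphas(info_path, qs, als):
        """Append LHAPDF6 AlphaS_Qs / AlphaS_Vals entries to the .info file."""
        with open(info_path, "a") as f:
            f.write("AlphaS_Qs: [%s]\n" % ", ".join("%g" % q for q in qs))
            f.write("AlphaS_Vals: [%s]\n" % ", ".join("%g" % a for a in als))
    ```

    This would remove the APFEL call from postfit, at the price of the fit writing the alpha_s values itself.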

  • Emanuele Roberto Nocera changed title from [WIP] evolvefit to evolvefit


  • Created by: scarrazza

    @nhartland @Zaharid this PR is now ready for a first review/test pass.

  • Created by: Zaharid

    With this approach, is there any reason why we don't compute the evolution kernel in parallel to the fit and then we just multiply it?

  • Created by: scarrazza

    Not with the current structure in APFEL.

  • Created by: scarrazza

    Just a short update since yesterday's meeting. The idea of this PR is still OK; however, it is too slow for my taste. So we have decided the following:

    • Nathan will take this PR and replace the direct evolution with evolution operators (à la FK table). This should avoid the requirement of rerunning evolvefit for each replica in a separate job, drastically reducing the resources/calculation time.
    • After that we perform speed benchmarks; if we get something acceptable, i.e. a 30-40 min calculation, then fine; otherwise we should rethink DGLAP FK tables (or something else), as we did in 3.0.
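    Since DGLAP is linear in the initial-scale PDF, the evolution-operator idea reduces each replica's evolution to a single linear map. A minimal numpy sketch (function name and array shapes are hypothetical, not APFEL's actual interface):

    ```python
    import numpy as np

    def evolve_replicas(operator, replicas_q0):
        """Apply a precomputed evolution operator to all replicas at once.

        operator:    (n_xgrid_out, n_xgrid_in) kernel for a fixed Q0 -> Q pair
        replicas_q0: (n_replicas, n_xgrid_in) PDF values at the initial scale
        Returns the evolved values, shape (n_replicas, n_xgrid_out).
        """
        # DGLAP is linear, so evolving every replica is one matrix product
        return replicas_q0 @ operator.T
    ```

    The operator is paid for once; evolving N replicas then costs one matmul instead of N full DGLAP runs.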
  • Created by: nhartland

    So @vbertone and I were looking at this, and Valerio has put together a code which can do the LHgrid generation for all replicas in ~1 hr.

    This seems fine by me, considering the amount of time currently spent initialising APFEL for each replica.

    The procedure would probably look something like

    1. Fit outputs initial scale grid of points
    2. Postfit computes vetoes etc. and computes the average on the initial-scale grid
    3. New evolvefit generates LHgrid
    4. vp-upload etc etc

    How does it sound?
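    Step 2 of the sketch above (vetoes plus averaging on the initial-scale grid) is a small linear operation on the replica grids; a hedged numpy illustration, with a made-up chi2 veto standing in for postfit's actual criteria:

    ```python
    import numpy as np

    def postfit_average(replica_grids, chi2s, chi2_cut=3.0):
        """Average initial-scale grids over replicas passing a veto.

        replica_grids: (n_replicas, n_x) PDF values on the initial-scale x grid
        chi2s:         (n_replicas,) per-replica figure of merit (made-up veto)
        Returns the passing grids and their mean (the replica-0 central value).
        """
        mask = np.asarray(chi2s) < chi2_cut        # veto outlier replicas
        passing = np.asarray(replica_grids)[mask]
        return passing, passing.mean(axis=0)
    ```

    The surviving grids would then be handed to the new evolvefit (step 3) for LHgrid generation.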

  • Created by: scarrazza

    Is this ~1 hr obtained with the external grid, TRN solution, etc.? Does this implementation work with external PDFs? If so, then this is much better than what we have now; however, I am still not totally convinced we need an extra ~1 hr for objects that can be precomputed and stored in the theory folder...

  • Created by: nhartland

    Ah, no, good point, this is with the exact solution. How much longer is the TRN?


  • Created by: scarrazza

    I can imagine a factor 3 to 5 slower...

  • Created by: nhartland

    Alright then, I agree we should probably pre-compute these in e.g. APFELcomb and just do the product offline (as before). I guess there probably isn't a more practical solution?

  • Created by: scarrazza

    I am afraid not, if we must use the current evolution setup (external+TRN). On the other hand, these caching grids could be generated and maintained by APFEL itself.

  • Created by: vbertone

    Yes, it is true that the truncated evolution is much slower. The reason is simply that, to compute the numerical derivative, the evolution has to be computed 5 times. We could surely store the evolution kernels in some format in APFEL or in APFELcomb, but personally I'm not very much in favour of this solution. As you may remember, this very same procedure caused us a lot of problems in the past.
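    The cost of repeated evaluations can be illustrated with a standard five-point central stencil (a generic numerical-derivative sketch under that assumption, not APFEL's actual truncated-solution code): the derivative uses the values at x±h and x±2h, so with the central point the function is evaluated 5 times.

    ```python
    def five_point_derivative(f, x, h=1e-2):
        """Fourth-order accurate central difference for f'(x).

        Uses f at x-2h, x-h, x+h, x+2h; together with f(x) itself
        (needed anyway by the surrounding computation) that is 5 evaluations
        per derivative, hence roughly a factor ~5 in cost versus one evolution.
        """
        return (f(x - 2*h) - 8*f(x - h) + 8*f(x + h) - f(x + 2*h)) / (12*h)
    ```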

  • Created by: scarrazza

    I must admit that we were too hasty when we decided to abandon the FK evolution tables.

    We didn't investigate the origin of the problem, which was the generation layout and the storage mechanisms we had at that time; instead we adopted an alternative approach which still doesn't guarantee bug-free results (e.g. suppose a new APFEL commit introduces a bug), is slow, and requires a lot of memory to satisfy the APFEL dependency.

    So, I think the best we can do is to switch back to tables generated/stored by APFELcomb, but for that we have to define a suitable format for the data fields, which must include the xgrid, qgrid and alphas (in the same file).
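    A sketch of what such a self-contained format could look like (the field names and JSON encoding below are hypothetical, purely to illustrate bundling the xgrid, qgrid and alphas in one file so they can never go out of sync):

    ```python
    import json

    def write_evolution_table(path, xgrid, qgrid, alphas, operator):
        """Store grids, alpha_s values and the evolution operator in ONE file,
        so a partial update cannot leave the metadata inconsistent."""
        with open(path, "w") as f:
            json.dump({"xgrid": xgrid, "qgrid": qgrid,
                       "alphas": alphas, "operator": operator}, f)

    def read_evolution_table(path):
        with open(path) as f:
            table = json.load(f)
        # sanity check tying the fields together on every read
        assert len(table["alphas"]) == len(table["qgrid"]), "alphas/qgrid mismatch"
        return table
    ```

    Keeping everything in one file (rather than a tgz plus a side txt file) is precisely what would have prevented the override bug discussed below.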

  • Created by: Zaharid

    Frankly, not having been there in the "past", the only argument I hear is that it used to be done that way and therefore is bad, but I never quite understood why. So, with the information I have, I very much agree with @scarrazza.


  • Created by: scarrazza

    Let me try to write down what I remember about that bug:

    • FK tables were stored in a single tgz per file (not per theory)
    • the FK tables for DGLAP were stored in single tgz files plus (and here is the problem) an extra flat txt file with the xgrid, qgrid and alphas values.
    • all files were stored in the same location after download (no independent theory folder), so it was quite easy to make a mess by overwriting. In particular, no piece of code was checking or cleaning up the download location...

    One day, when replacing the DGLAP of a 3.0 set for the nf variation, something fishy happened: the txt file was not updated (overwritten) consistently with the FK table for DGLAP with a different nf value, and we got buggy grids. So the bug was caused by a mixture of bad layout, auxiliary scripts and, why not, human error (I can imagine users downloading tables manually from the server and forgetting to update the flat txt file).
