Skip to content

Revert the default for parallel fits to a single model per replica

Emanuele Roberto Nocera requested to merge revert_multidense_default into master

Created by: scarlehoff

This PR reverts back to a model-per-replica configuration (instead of the multidense layer) since the multidense layer is not able to fit models independently (at the moment) which is necessary for a complete fit.

For sequential fits nothing should change, for parallel fit the difference is of about a factor of 2 in time (but I'm guessing that this factor of 2 we are gaining might be why the fits are of worse quality).

I've introduced a multidense layer type if people want to play around with multidense.

Here two reports, as you can see multidense has a terrible chi2 while dense is statistically compatible with 4.0.

multidense: https://vp.nnpdf.science/UGSJ6OsPTdSR2s97OXIvDw== dense: https://vp.nnpdf.science/WbBCvsjfQV-6ncIQ3GhCVw==

I haven't looked deep into the details of the error but I don't think multidense was truly benchmarked in a real-life scenario? At least I could not find any examples.

Anyway, for reference, these were fits of 120 replicas in a RTX 2080. 80 minutes with dense, 40 minutes with multidense. It would be ideal if multidense could be fixed but I can live with 80 minutes runs... for now checks.py will fail if you use multidense to make sure it is only used as a development option for now.

Fixes, in a way, #2118

TODO:

  • Fix regression tests, note that most of the .json files have no changes (proving that, for one replica, multidense and dense were really the same)
  • Check it works with TF 2.16/2.17 (if I'm able to get a computer with both a GPU and a recent-enough version of python/tf at the same time, not trivial)

RE the last point, I cannot check but the python tests are getting TF 2.17 so the code does work.

Merge request reports

Loading