Revert the default for parallel fits to a single model per replica
Created by: scarlehoff
This PR reverts back to a model-per-replica configuration (instead of the multidense layer) since the multidense layer is not able to fit models independently (at the moment) which is necessary for a complete fit.
For sequential fits nothing should change, for parallel fit the difference is of about a factor of 2 in time (but I'm guessing that this factor of 2 we are gaining might be why the fits are of worse quality).
I've introduced a multidense
layer type if people want to play around with multidense
.
Here two reports, as you can see multidense
has a terrible chi2 while dense
is statistically compatible with 4.0.
multidense
: https://vp.nnpdf.science/UGSJ6OsPTdSR2s97OXIvDw==
dense
: https://vp.nnpdf.science/WbBCvsjfQV-6ncIQ3GhCVw==
I haven't looked deep into the details of the error but I don't think multidense
was truly benchmarked in a real-life scenario? At least I could not find any examples.
Anyway, for reference, these were fits of 120 replicas in a RTX 2080. 80 minutes with dense
, 40 minutes with multidense
.
It would be ideal if multidense
could be fixed but I can live with 80 minutes runs... for now checks.py
will fail if you use multidense
to make sure it is only used as a development option for now.
Fixes, in a way, #2118
TODO:
-
Fix regression tests, note that most of the .json
files have no changes (proving that, for one replica,multidense
anddense
were really the same) -
Check it works with TF 2.16/2.17 (if I'm able to get a computer with both a GPU and a recent-enough version of python/tf at the same time, not trivial)
RE the last point, I cannot check but the python tests are getting TF 2.17 so the code does work.