Fix multidense compatibility with tf 2.16 and keras 3
Created by: APJansen
mentioned in issue #1970 (closed)
Created by: scarlehoff
When trying it on my computer I got this error though:
/NNPDF/src/nnpdf/n3fit/src/n3fit/model_gen.py", line 789, in generate_nn
    pdfs = layer(pdfs)
           ^^^^^^^^^^^
/NNPDF/src/nnpdf/n3fit/src/n3fit/backends/keras_backend/multi_dense.py", line 128,
    output_shape = output_shape[:1] + [self.replicas] + output_shape[1:]
                   ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
TypeError: Exception encountered when calling MultiDense.call().

can only concatenate tuple (not "list") to tuple

Arguments received by MultiDense.call():
  • args=('<KerasTensor shape=(1, None, 2), dtype=float32, sparse=None, name=xgrids_processed>',)
  • kwargs=<class 'inspect._empty'>
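The concatenation fails because Keras 3 passes shapes as tuples rather than lists. A minimal sketch of the kind of cast that avoids it (only `output_shape` and the replica axis are taken from the traceback; the helper name is made up for illustration):

```python
def insert_replica_axis(output_shape, replicas):
    """Sketch: normalise the shape to a list before splicing in the replica axis,
    since Keras 3 may hand compute_output_shape/call a tuple instead of a list."""
    output_shape = list(output_shape)
    return output_shape[:1] + [replicas] + output_shape[1:]

# e.g. insert_replica_axis((1, None, 2), replicas=10) -> [1, 10, None, 2]
```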
Created by: scarlehoff
Fixing that generates further errors, I'm afraid:
ValueError: In a nested call() argument, you cannot mix tensors and non-tensors. Received invalid mixed argument: inputs={'pdf_x': <KerasTensor shape=(1, 1, None, 14), dtype=float32, sparse=False, name=keras_tensor_25>, 'pdf_xgrid_integration': <KerasTensor shape=(1, 1, None, 14), dtype=float32, sparse=False, name=keras_tensor_36>, 'xgrid_integration': <KerasTensor shape=(1, None, 1), dtype=float32, sparse=None, name=integration_grid>, 'photon_integral': array([[[0.]]])}
so keras 3 might be breaking quite a lot of stuff...
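The mixed-argument error singles out `photon_integral`, which enters the nested call as a bare numpy array while the other inputs are KerasTensors. A possible workaround, sketched here under the assumption that promoting the constant to a tensor is enough (not necessarily the fix that was actually applied), would be:

```python
import numpy as np
import tensorflow as tf

# Sketch: Keras 3 rejects numpy arrays mixed with KerasTensors inside a nested call(),
# so constants such as photon_integral can be lifted to tensors beforehand.
photon_integral = np.array([[[0.0]]])
photon_integral = tf.constant(photon_integral, dtype=tf.float32)
```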
Created by: APJansen
Yes, I was just about to say that I would expect that... We are doing many things which are not standard, particularly in MetaModel and MetaLayer, so I would expect a lot of things to break.

I think transitioning to Keras 3 with multiple supported backends would be great, especially since all those backend wrappers could then be removed (or at least moved down a level). But timing-wise perhaps this is not the best moment...
One thing you can check is what happens with the dense-per-flavour layer, but I think the error you quote above would also arise there.
Created by: scarlehoff
It is not that bad; the problem right now seems to be

NotImplementedError: numpy() is only available when eager execution is enabled

I'm not sure where that's happening, since the traceback is just the training loop, so at some point we are calling something we are not allowed to call (it should have failed earlier, during compilation of the model... so it is, at least partially, a tensorflow bug).

Edit: after enabling eager execution everything works, other than the log broken down by experiment, because they have changed the reporting, but that's minor.
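For reference, eager execution can be forced either per model at compile time or globally; this is the generic TensorFlow/Keras mechanism, not necessarily the exact change made here (`model` is a placeholder):

```python
import tensorflow as tf

# Per-model: run the training step eagerly so .numpy() is available during fit().
model.compile(optimizer="adam", loss="mse", run_eagerly=True)

# Global: disable tf.function compilation everywhere.
tf.config.run_functions_eagerly(True)
```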
Created by: scarlehoff
It seems to be inside tensorflow/keras. More specifically, it is interpreting the preprocessing factor variables as numpy arrays. This really looks like a bug in tensorflow, in that if there were anything wrong with them it should have been caught well before the training starts.
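If the preprocessing factors really are being picked up as numpy arrays, one generic way to keep them as proper variables is to register them through `add_weight`. This is only a sketch, not the project's actual implementation; the layer and attribute names are invented:

```python
import tensorflow as tf

class PreprocessingSketch(tf.keras.layers.Layer):
    """Hypothetical layer with a single trainable preprocessing exponent."""

    def build(self, input_shape):
        # add_weight keeps alpha as a tracked tf.Variable, so Keras never
        # has to interpret a bare numpy array during training.
        self.alpha = self.add_weight(
            name="alpha", shape=(1,), initializer="ones", trainable=True
        )

    def call(self, x):
        return x**self.alpha
```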
added run-fit-bot label
Created by: github-actions[bot]
Greetings from your nice fit! I have good news for you, I just finished my tasks:

- Fit Name: NNBOT-0a7d27e2f-2024-03-04
- Fit Report wrt master: https://vp.nnpdf.science/OcMBAjoWRSq2vMro_M_gzA==
- Fit Report wrt latest stable reference: https://vp.nnpdf.science/dvehhiHPRviA3_ruNnPj_Q==
- Fit Data: https://data.nnpdf.science/fits/NNBOT-0a7d27e2f-2024-03-04.tar.gz
Check the report carefully, and please buy me a ☕, or better, a GPU!

Created by: scarlehoff
@APJansen, when you have time, could you please make sure that the changes I made are sensible?
@Cmurilochem I had to change the scope of the hyperopt pickle test to check that, when starting from a pickle, the next trial does indeed give you the same parameters as if the trials had been run sequentially... however, the results after that first trial change. It's a sub-% change, but that's enough to change the following trials. I've noticed that tf (or maybe keras) has changed the behaviour of the special case seed=0 (which now basically means "get some random seed" and broke some of the multidense tests); I wonder whether this is related.

How important is it that, after a restart, the hyperopt continues on exactly the same path after the first trial? If it is important, it might be necessary to investigate where this difference is coming from before merging (or to restrict hyperopt runs to tensorflow <2.16 / python <3.12 / numpy < X.YY... not sure which library is at fault here tbh).
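If it helps to pin down the seed question, a quick check (assuming the change lives in the initializer seeding, as the multidense test failures suggest) is to compare fresh initializers built with an explicit nonzero seed against the seed=0 case:

```python
import tensorflow as tf

# Two fresh initializers with the same explicit nonzero seed should give identical draws.
a = tf.keras.initializers.GlorotUniform(seed=42)(shape=(2, 2))
b = tf.keras.initializers.GlorotUniform(seed=42)(shape=(2, 2))
print(bool(tf.reduce_all(a == b)))  # expected: True

# Repeat with seed=0 to see whether it still behaves as a fixed seed on this stack.
c = tf.keras.initializers.GlorotUniform(seed=0)(shape=(2, 2))
d = tf.keras.initializers.GlorotUniform(seed=0)(shape=(2, 2))
print(bool(tf.reduce_all(c == d)))
```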
        for i_replica in range(self.num_replicas):
            self.set_replica_weights(weights, i_replica)

    def save_weights(self, file, save_format="h5"):
        """
        Compatibility function for tf < 2.16
        """
        try:
            super().save_weights(file, save_format=save_format)
        except TypeError:
            new_file = file.with_suffix(".weights.h5")
            super().save_weights(new_file)
            shutil.move(new_file, file)

Created by: scarlehoff
I'll add it as a comment, but the reason is that for keras 3 save_format doesn't exist anymore and instead the format is read directly from the extension (see the docs). So you are forced to save to a .weights.h5 file.
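For context, a Keras 3 call then looks like the following (illustrative only; `model` is a placeholder), with the extension alone selecting the HDF5 weights format:

```python
# Keras 3: no save_format argument; the ".weights.h5" suffix selects the format.
model.save_weights("model.weights.h5")
model.load_weights("model.weights.h5")
```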
removed run-fit-bot label