
Use mkl for n3fit

Emanuele Roberto Nocera requested to merge n3fit_to_mkl into n3fit_separate_convolution

Created by: scarlehoff

These are small changes, but they seem to be important for performance. This is to be compared to the times and memory usage of #745.

DIS, 4 threads, 2 GB: 27 ± 8 min
Global, 8 cores, 4.8 GB: 3.7 ± 1.0 h
Global, 8 cores, 8 GB: 3.3 ± 0.7 h

Note that every two threads correspond to a single core. Also note that to get similar times on master you need to allocate at least 12-14 GB of memory. I never saw it go beyond 3.8 GB in my tests; I gave it one extra GB in the cluster just in case.

The interesting one is the global fit: I gave it a much smaller memory allowance (which can have side effects, such as more jobs landing on the same node, or swapping) with no speed penalty. I hope this opens the door to running several replicas in parallel. I also ran the test with 8 GB for a better comparison (the extra memory may not be useful for n3fit itself, but it helps keep other people's jobs off the same node).

Also added two new tests: one ensures that n3fit doesn't suddenly take ages to run, and the other ensures that the changes don't break the hyperoptimization.
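The runtime regression test could be sketched along these lines (a hypothetical helper, not the actual test in this MR; the budget value and all names are made up for illustration):

```python
import time

# Hypothetical wall-clock budget for a small reference fit (made-up number)
MAX_SECONDS = 60


def run_with_budget(fn, budget=MAX_SECONDS):
    """Run fn and fail if it exceeds the wall-clock budget."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    if elapsed > budget:
        raise AssertionError(f"fit took {elapsed:.1f}s, budget was {budget}s")
    return result
```

A test would then wrap a short fit in `run_with_budget` so that a performance regression shows up as a plain test failure in CI.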

With regard to the usage of MKL, the key setting for good performance seems to be KMP_BLOCKTIME, which gives the best results when set to 0. See https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference
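A minimal sketch of how these knobs can be set from Python, following the variable names in the Intel guide linked above (the affinity string is the guide's recommendation and the thread count is an assumption matching the 4-thread DIS run, not values taken from this MR):

```python
import os

# These must be set before TensorFlow is imported, so the MKL/OpenMP
# runtime picks them up at initialization time.
os.environ["KMP_BLOCKTIME"] = "0"      # threads sleep immediately after parallel regions
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads to cores
os.environ["OMP_NUM_THREADS"] = "4"    # assumption: one thread per physical core
```

Setting the same variables in the job's shell environment before launching n3fit would have the same effect.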

I wonder whether it makes sense to install directly from the Intel conda channel instead of the default one. I'll run some benchmarks and report if I notice any difference.

This is ready for review, but given that the changes might be machine dependent, I'll also run the fits on the other cluster I have access to, to make sure that everything is ok.
