Parallel hyperoptimization with MongoDB
Created by: Cmurilochem
Aim
This PR aims to implement parallel hyperoptimization using MongoDB databases and mongo workers. This will enable us to calculate several trials simultaneously.
Strategy
Similarly to `FileTrials`, the main idea is to implement a `MongoFileTrials` class that inherits from `MongoTrials`. This new `MongoFileTrials` class will then be the one we instantiate before calling hyperopt's `fmin`.
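To make the strategy concrete, here is a minimal, hypothetical sketch of what such a class and its use with `fmin` could look like (class constructor arguments and database names are illustrative assumptions, not the PR's actual API):

```python
# Illustrative sketch only: a MongoTrials subclass playing the role that
# FileTrials plays for serial runs. Constructor arguments are assumptions.
from hyperopt import fmin, hp, tpe
from hyperopt.mongoexp import MongoTrials


class MongoFileTrials(MongoTrials):
    """MongoTrials variant that n3fit could instantiate before calling fmin."""

    def __init__(self, db_host="localhost", db_port=27017, db_name="hyperopt_test", exp_key=None, **kwargs):
        url = f"mongo://{db_host}:{db_port}/{db_name}/jobs"
        super().__init__(url, exp_key=exp_key, **kwargs)


def objective(x):
    # In a real parallel run this function must live in a module that the
    # hyperopt-mongo-worker processes can import, since they unpickle it.
    return x**2


if __name__ == "__main__":
    trials = MongoFileTrials(db_name="hyperopt_test", exp_key="exp1")
    # fmin only schedules jobs and blocks until the mongo workers have
    # evaluated enough trials; the workers do the actual computation.
    best = fmin(fn=objective, space=hp.uniform("x", -1, 1),
                algo=tpe.suggest, max_evals=10, trials=trials)
    print(best)
```

With `MongoTrials`, `fmin` itself does not evaluate anything: the evaluations are carried out by `hyperopt-mongo-worker` processes connected to the same database, which is what enables parallelism.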
Tasks
- Implement `MongoFileTrials`
- Parse MongoDB option to the `n3fit` command and `HyperScanner`
- Adapt `hyper_scan_wrapper` to allow for parallel evaluation of `fmin` trials
- Add MongoDB and `pymongo` as dependencies
- Add unit/integration test
- Quantify performance improvement
- Run test on Snellius
- Add documentation
- Add restarting options
Usage
Local Machine (for simple tests only)
First, make sure that you have MongoDB installed, either via `conda` (not sure if it is available in the latest `conda` version) or via `apt-get`/`brew`. `pymongo` is also necessary, but it can be easily installed via `pip` (it has already been added as a dependency).
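As a quick, optional sanity check (purely illustrative, not part of the PR), one can verify from Python that the required pieces are reachable before launching a parallel run:

```python
# Hypothetical pre-flight check: confirm pymongo is importable and that the
# mongod and hyperopt-mongo-worker executables are on the PATH.
import shutil

import pymongo  # raises ImportError if pymongo is missing

for exe in ("mongod", "hyperopt-mongo-worker"):
    assert shutil.which(exe), f"{exe} not found on PATH"

print("pymongo and the MongoDB/hyperopt executables look available")
```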
In the latest version of the code in this PR, `n3fit` is adapted to run automatically (via internal subprocesses) both `mongod` (which creates the MongoDB databases) and `hyperopt-mongo-worker` (which launches the mongo workers).
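The snippet below is a hedged sketch of this idea only; the helper names and defaults are assumptions for illustration and not the PR's actual code:

```python
# Hedged sketch: launching mongod and N hyperopt-mongo-worker processes from
# Python as child processes of the n3fit run.
import os
import subprocess


def start_mongod(db_path="hyperopt_db", port=27017):
    """Start a MongoDB server storing its data under db_path."""
    os.makedirs(db_path, exist_ok=True)
    return subprocess.Popen(["mongod", "--dbpath", db_path, "--port", str(port)])


def start_mongo_workers(num_workers, db_name="hyperopt_test", host="localhost", port=27017):
    """Launch num_workers hyperopt-mongo-worker processes polling the same database."""
    return [
        subprocess.Popen(["hyperopt-mongo-worker", f"--mongo={host}:{port}/{db_name}"])
        for _ in range(num_workers)
    ]
```

Each worker repeatedly pulls a pending trial from the database, evaluates it, and writes the result back, which is what allows `fmin` to advance while several trials run at once.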
To run parallel hyperopts with `n3fit`, do:
```bash
n3fit hyper-quickcard.yml 1 -r N_replicas -o dir_output_name --hyperopt N_trials --parallel-hyperopt --num-mongo-workers N
```
where `N` defines the number of mongo workers you want to launch in parallel; `N` therefore also sets the number of trials calculated simultaneously. If you want to restart a job, make sure that `dir_output_name` is in your current path and do:
```bash
n3fit hyper-quickcard.yml 1 -r N_replicas -o dir_output_name --hyperopt N_trials --parallel-hyperopt --num-mongo-workers N --restart
```
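On the hyperopt side, the key point that makes restarts possible is that the optimization state lives in MongoDB. The snippet below is an illustrative sketch (not the PR's `--restart` implementation; database name, port and `exp_key` are assumptions): reconnecting with the same database and `exp_key` recovers the previously completed trials, which then count towards `max_evals`.

```python
# Illustrative only: reconnecting to an existing experiment stored in MongoDB.
from hyperopt.mongoexp import MongoTrials

trials = MongoTrials("mongo://localhost:27017/hyperopt_test/jobs", exp_key="exp1")
trials.refresh()  # pull already-completed trials from the database
print(f"Resuming with {len(trials.trials)} trials already stored")
```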
Snellius
Here is a complete Slurm script showing how we would run a hyperopt experiment in parallel on Snellius (including restarts if needed):
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition gpu
#SBATCH --gpus-per-node=4
#SBATCH --time 24:00:00
#SBATCH --output=logs/parallel_slurm-%j.out
# Print job info
echo "Job started on $(hostname) at $(date)"
# conda env
ENVNAME=py_nnpdf-master-gpu
# calc details
RUNCARD="hyper-quickcard.yml"
REPLICAS=2
TRIALS=30
DIR_OUTPUT_NAME="test_hyperopt"
RESTART=false
# number of mongo workers to launch
N_MONGOWORKERS=4
# activate conda environment
source ~/.bashrc
anaconda
conda activate $ENVNAME
# set up cudnn to run on the gpu
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
echo "CUDNN path: $CUDNN_PATH"
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH"
echo "LD_LIBRARY_PATH: $LD_LIBRARY_PATH"
# Verify GPU usage
ngpus=$(python3 -c "import tensorflow as tf; print(len(tf.config.list_physical_devices('GPU')))")
ngpus_list=$(python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))")
echo "List of physical devices '$ngpus_list'"
if [ ${ngpus} -eq 0 ]; then
echo "GPUs not being used!"
else
echo "Using GPUs!"
echo "Num GPUs Available: ${ngpus}"
fi
# Run n3fit
echo "Changing directory to $TMPDIR"
cp "runcards/$RUNCARD" $TMPDIR
if [ ${RESTART} == "true" ]; then
cp -r $DIR_OUTPUT_NAME $TMPDIR
fi
cd $TMPDIR
echo "Running n3fit..."
if [ ${RESTART} == "true" ]; then
echo "Restarting job...."
echo "n3fit '$TMPDIR/$RUNCARD' 1 -r $REPLICAS --hyperopt $TRIALS -o $DIR_OUTPUT_NAME --parallel-hyperopt --num-mongo-workers $N_MONGOWORKERS --restart"
n3fit "$TMPDIR/$RUNCARD" 1 -r $REPLICAS --hyperopt $TRIALS -o $DIR_OUTPUT_NAME --parallel-hyperopt --num-mongo-workers $N_MONGOWORKERS --restart
else
echo "n3fit '$TMPDIR/$RUNCARD' 1 -r $REPLICAS --hyperopt $TRIALS -o $DIR_OUTPUT_NAME --parallel-hyperopt --num-mongo-workers $N_MONGOWORKERS"
n3fit "$TMPDIR/$RUNCARD" 1 -r $REPLICAS --hyperopt $TRIALS -o $DIR_OUTPUT_NAME --parallel-hyperopt --num-mongo-workers $N_MONGOWORKERS
fi
echo "Copying outputs to $SLURM_SUBMIT_DIR ..."
cp -r "$TMPDIR/$DIR_OUTPUT_NAME" $SLURM_SUBMIT_DIR
echo "Returning to $SLURM_SUBMIT_DIR ..."
cd $SLURM_SUBMIT_DIR
echo "Job completed at $(date)"
This would be run with:
```bash
sbatch --exclusive minimal_parallel_hyperopt.slurm
```
Here, each of the selected mongo workers (4) sees and runs on a separate GPU, as implemented here. In this run we are therefore calculating 4 trials in parallel.
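One common way to achieve this one-worker-per-GPU mapping (a hedged sketch, not necessarily the PR's exact mechanism; the helper name and arguments are hypothetical) is to set `CUDA_VISIBLE_DEVICES` in each worker's environment before spawning it:

```python
# Hedged sketch: round-robin GPU assignment for the mongo workers via
# CUDA_VISIBLE_DEVICES.
import os
import subprocess


def launch_workers_on_gpus(num_workers, num_gpus, db="localhost:27017/hyperopt_test"):
    workers = []
    for i in range(num_workers):
        env = os.environ.copy()
        # Worker i only sees GPU (i % num_gpus); with 8 workers and 4 GPUs
        # this naturally places two workers on each GPU.
        env["CUDA_VISIBLE_DEVICES"] = str(i % num_gpus)
        workers.append(
            subprocess.Popen(["hyperopt-mongo-worker", f"--mongo={db}"], env=env)
        )
    return workers
```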
We could also set up the experiment to run 2 mongo workers on each GPU (8 trials in parallel), e.g., by using `N_MONGOWORKERS=8` in the script above; in that case each GPU would host two workers.
Performance assessment
Local Machine
I have made a quick test on my local PC to assess the possible performance improvement from parallel hyperopts. I used the `hyper-quickcard.yml` card from `n3fit/tests/regression` (with minor modifications) and ran it for 10 trials and 2 replicas, varying the number of simultaneously launched mongo workers. The results are summarised in the figure below:

The results look encouraging a priori.
Snellius
For the Snellius tests, I employed the Slurm script above as a template together with a more complete runcard.txt. I ran 10 trials with 2 replicas, varying the number of mongo workers. The final results (after several rounds of fine-tuning of the code) are plotted in the figure below:

The figure shows the variation of the total wall-clock run time of each job as a function of the number of launched mongo workers. The idea is that each mongo worker is responsible for one trial at a time, so the more mongo workers we launch, the more trials we calculate simultaneously.

I also tested launching more than 1 mongo worker per GPU; see the right (light grey) part of the figure. This is where we observe the best performance improvement: a job with 8 mongo workers (2 per GPU) is nearly 8x faster than a serial hyperopt.