GaMPEN is written in Python and relies on the PyTorch deep learning library to perform all of its tensor operations.
Training and inference for GaMPEN requires Python 3.7+. Trained GaMPEN models can be run on a CPU to perform inference, but training a model requires access to a CUDA-enabled GPU for reasonable training times.
Create a new conda environment with Python 3.7+. Extensive instructions on how to create a conda enviornment can be found here. Of course, you could use any other method of creating a virtual environment, but we will assume you are using conda for the rest of this guide.
conda create -n gampen python=3.7
Activate the new environment
conda activate gampen
Navigate to the directory where you want to install GaMPEN and then clone this repository with
git clone https://github.com/aritraghsh09/GaMPEN.git
Navigate into the root directory of GaMPEN
Install all the required dependencies with
To confirm that the installation was successful, run
It is okay if there are some warnings or some tests are skipped. The only thing you should look out for is errors produced by the
make check command.
If you get an error about specific
libcudas libraries being absent while running
make check, this has probably to do with the fact that you don’t have the appropriate CUDA and cuDNN versions installed for the PyTorch version being used by GaMPEN. See below for more details about GPU support.
GaMPEN can make use of multiple GPUs while training if you pass in the appropriate arguments to the
To check whether the GaMPEN is able to detect GPUs, type
python into the command line from the root directory and run the following command:
from ggt.utils.device_utils import discover_devices discover_devices()
The output should be
cudaif GaMPEN can detect a GPU.
If the output is
cpu then GaMPEN couldn’t find a GPU. This could be because you don’t have a GPU, or because you haven’t installed the appropriate CUDA and cuDNN versions.
If you are using an NVIDIA GPU, then you can use this link for more details about specific CUDA and cuDNN versions that are compatible with different PyTorch versions. To check the version of PyTorch you are using, type
python into the command line and then run the following code-block:
import torch print(torch.__version__)
The core parts of the GaMPEN ecosystem are :-
Placing your data in a specific directory structure
GaMPEN/ggt/data/make_splits.pyscript to generate train/devel/test splits
GaMPEN/ggt/train/train.pyscript to train a GaMPEN model
Using the MLFlow UI to monitor your model during and after training
GaMPEN/ggt/modules/inference.pyscript to perform predictions using the trained model.
GaMPEN/ggt/modules/result_aggregator.pyscript to aggregate the predictions into an easy-to-read pandas data-frame.
We highly recommend going through all our Tutorials to get an in-depth understanding of how to use GaMPEN, and an overview of all the steps above.
Here, we provide a quick-and-dirty demo to just get you started training your 1st GaMPEN model! This section is intentionally short without much explanation.
Let’s download some simulated Hyper Suprime-Cam (HSC) images from the Yale servers. To do this run from the root directory of this repository:
make demodir=./../hsc hsc_demo
This should create a directory called
hsc at the specified
demodir path with the following components
- hsc - info.csv -- file names of the trianing images with labels - cutouts/ -- 67 images to be used for this demo
Now, let’s split the data into train, devel, and test sets. To do this, run
python ./ggt/data/make_splits.py --data_dir=./../hsc/ --target_metric='bt'
This will create another folder called
hsc with the different data-splits for training, devel (validation), and testing.
Running the trainer
Run the trainer with
python ggt/train/train.py \ --experiment_name='demo' \ --data_dir='./../hsc/' \ --split_slug='balanced-dev2' \ --batch_size=16 \ --epochs=2 \ --lr=5e-7 \ --momentum=0.99 \ --crop \ --cutout_size=239 \ --target_metrics='custom_logit_bt,log_R_e,log_total_flux' \ --repeat_dims \ --no-nesterov \ --label_scaling='std' \ --dropout_rate=0.0004 \ --loss='aleatoric_cov' \ --weight_decay=0.0001 \ --parallel
To list the all possible options along with explanations, head to the Using GaMPEN page or run
python ggt/train/train.py --help
Launching the MLFlow UI
Open a separate shell and activate the virtual environment that the model is training in. Then, run
Now navigate to
http://localhost:5000/ to access the MLFlow UI which will show you the status of your model training.
MLFLow on a remote machine
First, on the server/HPC, navigate to the directory from where you initiated your GaMPEN run (you can do this on separate machine as well – only the filesystem needs to tbe same). Then execute the following command
mlflow ui --host 0.0.0.0
--host option is important to make the MLFlow server accept connections from other machines.
Now from your local machine tunnel into the
5000 port of the server where you ran the above command.
or example, let’s say you are in an HPC environment, where the machine where you ran the above command is named
server1 and the login node to your HPC is named
hpc.university.edu and you have the username
astronomer. Then to forward the port you should type the following command in your local machine
ssh -N -L 5000:server1:5000 firstname.lastname@example.org
If performing the above step without a login node (e.g., a server whhich has the IP
server1.university.edu), you should be able to do
ssh -N -L 5000:localhost:5000 email@example.com
After forwarding, if you navigate to
http://localhost:5000/ you should be able to access the MLFlow UI
Training on other datasets
First create the necessary directories with
mkdir -p (dataset-name)/cutouts
Place FITS files in
Provide a file titled
(dataset-name). This file should have (at least) a column titled
file_name(corresponding to the names of the files in
(dataset-name)/cutouts), a column titled
object_id(with a unique ID of each file in
(dataset-name)/cutouts) and one column for each of the variables that you are trying to predict. For example, if you are trying to predict the radius of a galaxy, you would have a column titled
Generate train/devel/test splits with
python ggt/data/make_splits.py --data_dir=data/(dataset-name)/
make_splits.py file splits the dataset into a variety of splits and you can choose to use any of these for your analysis. Details of the various splits are mentioned on the Using GaMPEN page.
After generating the splits, the
dataset-name directory should look like this:
- dataset_name - info.csv - cutouts/ - splits/
Follow the instructions under Running the trainer.