Containers are at the core of csquare. They let you bring your own software stack into the platform to run your experiments. csquare currently uses Docker-compatible containers.
While containers are traditionally used to deploy small workloads on a server, csquare uses them to create a secure, customizable sandbox for your experiments. Depending on your compute resource requirements, your container might be the only one running on a compute node.
On the csquare platform, containers let you run your workloads with all the tools you need. For example, you might need a specific version of PyTorch, or even a specific Linux distribution. If your model requires GPU support, we recommend using the official NVIDIA images available on NVIDIA NGC.
Suppose that you want to train a PyTorch model to recognize cars. You might have a Git repository with the following structure:
In this case, to start your training, you will run the following command:
Because this model uses GPU acceleration, you need a Docker image built for GPUs.
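Before starting a long training run, it can help to confirm that PyTorch actually sees a GPU inside the container. The sketch below assumes PyTorch is installed in your image and uses torch.cuda.is_available(). select_device is a hypothetical helper name and is not part of csquare. It falls back to the CPU instead of failing.

```python
def select_device() -> str:
    """Return "cuda" if PyTorch can see a GPU in this container, otherwise "cpu"."""
    try:
        import torch  # assumed to be provided by your image (e.g. an NGC PyTorch image)
    except ImportError:
        # PyTorch is missing, so GPU training is not possible in this environment.
        return "cpu"
    # True only when a CUDA-capable GPU and driver are visible to the process.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training will run on: {select_device()}")
```

If this prints "cpu" on a node where you requested GPUs, the image is likely not a GPU build, or the GPU was not exposed to the container.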
Your final requirements are therefore:
- Python 3
- PyTorch, from the NVIDIA NGC 21.05 release
- GPU support
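NGC PyTorch images are tagged by container release (YY.MM) rather than by PyTorch version. For example, the 21.05 release for Python 3 is published as nvcr.io/nvidia/pytorch:21.05-py3. The following sketch turns a release number into such an image reference. ngc_pytorch_image is a hypothetical helper, and you should check the NGC catalog to confirm that the tag you need exists.

```python
import re


def ngc_pytorch_image(release: str, python_tag: str = "py3") -> str:
    """Return the NGC PyTorch image reference for a YY.MM release such as "21.05"."""
    # NGC container releases are versioned by year and month (YY.MM),
    # independently of the PyTorch version bundled inside the image.
    if re.fullmatch(r"\d{2}\.(0[1-9]|1[0-2])", release) is None:
        raise ValueError(f"expected a YY.MM release, got {release!r}")
    return f"nvcr.io/nvidia/pytorch:{release}-{python_tag}"


print(ngc_pytorch_image("21.05"))  # nvcr.io/nvidia/pytorch:21.05-py3
```

Validating the YY.MM format early catches a common mistake: passing a PyTorch version such as "1.9", which is not an NGC tag.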
To pull images from NVIDIA NGC, you need to configure your registries accordingly.
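For reference, NGC images are served from the nvcr.io registry. A standard Docker client authenticates to it with the literal username $oauthtoken and an NGC API key as the password. The sketch below only shows what that credential looks like in Docker's config.json "auths" format. It assumes your registry settings accept standard Docker credentials, which may not match how csquare stores them. ngc_docker_auth is a hypothetical helper name.

```python
import base64
import json

NGC_REGISTRY = "nvcr.io"
NGC_USERNAME = "$oauthtoken"  # literal username NGC expects when logging in with an API key


def ngc_docker_auth(api_key: str) -> dict:
    """Build a Docker config.json-style "auths" entry for the NGC registry."""
    # Docker stores credentials as base64("username:password").
    token = base64.b64encode(f"{NGC_USERNAME}:{api_key}".encode()).decode()
    return {"auths": {NGC_REGISTRY: {"auth": token}}}


# Placeholder only: never commit a real API key to your repository.
print(json.dumps(ngc_docker_auth("<your-NGC-API-key>"), indent=2))
```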
Ensure that you allocate more memory than your container image size, otherwise your experiment might fail with an