vLLM is a fast and easy-to-use library for LLM inference and serving. Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has grown into one of the most active open-source AI projects, built and maintained by a diverse community spanning dozens of academic institutions and companies, with over 2,000 contributors.
This software is still in testing and is awaiting feedback from other Comet users.
The vLLM software is provisioned in a container file stored in the /nobackup/shared/containers directory and is accessible to all users of Comet. You do not need to take a copy of the container file; it should be left in its original location.
You can find the container files here:
/nobackup/shared/containers/vllm.0.19.1.sif
We normally recommend using the latest version of the container. If you require a different version of vLLM, please contact us.
Container Image Versions
We may reference a specific container file, such as vllm.0.19.1.sif, but you should always check whether this is the most recent version of the container available. Simply ls the /nobackup/shared/containers directory and you will be able to see if there are any newer versions listed.
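For example, assuming the container images follow the naming convention shown above, a quick way to see every available vLLM image is:

```shell
# List the available vLLM container images; pick the highest version number
ls -l /nobackup/shared/containers/vllm.*.sif
```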
We have provided a convenience script that automates all of the steps needed to run applications inside the container and to access your $HOME, /scratch and /nobackup directories, reducing everything to just two simple commands.
/nobackup/shared/containers/vllm.0.19.1.sh
There is a corresponding .sh script for each version of the container image we make available.
Just source this file and it will take care of loading apptainer, setting up your bind directories and calling the exec command for you. It also gives you a single command called container.run (instead of the much longer apptainer exec command line) to run anything you want inside the container. For example, to run the basic offline inference example:
```shell
$ source /nobackup/shared/containers/vllm.0.19.1.sh
$ container.run python3 /opt/vllm/examples/basic/offline_inference/basic.py
```
Note that the examples included with vLLM are all accessible under /opt/vllm/examples inside the container.
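If you want to see what other examples are bundled, you can list that directory from inside the container, for instance:

```shell
# Browse the example scripts shipped with the vLLM source tree
container.run ls /opt/vllm/examples
```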
This is an example of what an sbatch job file may look like when running vLLM:
```shell
#!/bin/bash
#SBATCH --account=comet_abcxyz
#SBATCH --partition=gpu-s_paid
#SBATCH -c 8
#SBATCH --mem=32G
#SBATCH --gres=gpu:L40:1

source /nobackup/shared/containers/vllm.0.19.1.sh
container.run python3 /opt/vllm/examples/basic/offline_inference/basic.py
```
Adjust the SBATCH parameters to match the partition you want to use and the resources you intend to allocate to vLLM.
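To submit a job file like the one above (the filename vllm_job.sh is just an illustrative name) and monitor it in the queue:

```shell
# Submit the batch script to Slurm, then check your jobs in the queue
sbatch vllm_job.sh
squeue --me
```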
As long as you use the container.run method to launch applications, you will automatically be able to read and write files in your $HOME, /scratch and /nobackup directories, and to use any NVIDIA GPUs you have been assigned via Slurm.
If you run any of the applications inside the container manually, without using the container.run helper, you will need to pass the --bind argument to apptainer to ensure that all relevant directories are exposed within the container.
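As a sketch, a manual invocation equivalent to what the container.run helper does (flags taken from the helper script shown below on this page) would look like:

```shell
# Manual equivalent of container.run: bind the shared areas and enable GPU access
module load apptainer
apptainer exec --nv \
    --bind /scratch:/scratch \
    --bind /nobackup:/nobackup \
    /nobackup/shared/containers/vllm.0.19.1.sif \
    python3 /opt/vllm/examples/basic/offline_inference/basic.py
```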
Do remember that the container filesystem itself cannot be changed, so you won't be able to write to or update /usr/local, /opt, /etc or any other internal folders. Keep output directories restricted to the three areas listed above.
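For example, redirect any output to one of the writable areas (the exact directory under /nobackup is illustrative; use whatever path you normally have access to):

```shell
# Send results to a writable location rather than inside the read-only image
container.run python3 /opt/vllm/examples/basic/offline_inference/basic.py \
    > /nobackup/my_project/vllm_output.log
```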
Important
This section is only relevant to RSE HPC staff or users wanting to understand how the container image is built. If you are intending to simply use the software you do not need to read this section - turn back now!
Build Script:
```shell
#!/bin/bash
echo "Loading modules..."
module load apptainer

echo ""
echo "Building container..."
export APPTAINER_TMPDIR=/scratch
apptainer build vllm.0.19.1.sif vllm.def 2>&1 | tee vllm.log
```
Container Definition:
```
Bootstrap: docker
From: ubuntu:noble

%post
    # Prevent interactive prompts
    export DEBIAN_FRONTEND=noninteractive

    # Update & install only necessary packages
    apt-get update

    # Base stuff everything will need
    apt-get install -y aptitude wget zip git less vim python3-pip

    # Remove any downloaded package files - so they do not remain in the built image
    apt-get clean

    mkdir -p /opt
    mkdir -p /src
    cd /src

    # Pytorch install
    pip install torch torchvision --break-system-packages

    # vLLM install
    pip install vllm --break-system-packages

    # Optional newer transformers code
    pip install git+https://github.com/huggingface/transformers.git --break-system-packages

    # Source code of vLLM - to get access to the example scripts
    cd /src
    wget https://github.com/vllm-project/vllm/archive/refs/tags/v0.19.1.tar.gz -O vllm-0.19.1.tgz
    tar -zxf vllm-0.19.1.tgz
    mv vllm-0.19.1/ /opt/vllm

    # Remove any temporary files used during build
    cd /
    rm -rf /src
    pip cache purge

%environment

%runscript
```
Runtime Helper:
You should source this file in order to use the container.run command. The script should have the current container image name set in the IMAGE_NAME variable:
```shell
#!/bin/bash
module load apptainer

IMAGE_NAME=/nobackup/shared/containers/vllm.0.19.1.sif

container.run() {
    # Run a command inside the container...
    # automatically bind the /scratch and /nobackup dirs
    # pass through any additional parameters given on the command line
    apptainer exec --nv --bind /scratch:/scratch --bind /nobackup:/nobackup "${IMAGE_NAME}" "$@"
}
```