====== vLLM ======
> vLLM is a fast and easy-to-use library for LLM inference and serving.
>
> Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has grown into one of the most active open-source AI projects, built and maintained by a diverse community spanning dozens of academic institutions and companies, with over 2,000 contributors.
This software is still in testing and is awaiting feedback from other Comet users.
* For more information about vLLM: https://vllm.ai/
----
===== Running vLLM on Comet =====
The vLLM software is provisioned in a container file stored in the ''/nobackup/shared/containers'' directory and is accessible to __all__ users of Comet. You do //not// need to take a copy of the container file; it should be left in its original location.
You can find the container files here:
* ''/nobackup/shared/containers/vllm.0.19.1.sif''
We //normally// recommend using the latest version of the container. If you require a different version of vLLM, please [[:contact:index|contact us]].
**Container Image Versions**
We //may// reference a specific container file, such as **vllm.0.19.1.sif**, but you should always check whether this is the most recent version of the container available. Simply ''ls'' the ''/nobackup/shared/containers'' directory and you will be able to see if there are any newer versions listed.
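As a sketch of how you might automate that check, a version-sort of the ''.sif'' files will pick out the newest container (the ''CONTAINER_DIR'' variable is illustrative; it defaults to the shared directory on Comet):

```shell
# Sketch: find the newest vLLM container by version-sorting the .sif files.
# CONTAINER_DIR is an illustrative variable, defaulting to the shared path.
CONTAINER_DIR=${CONTAINER_DIR:-/nobackup/shared/containers}
newest=$(ls -1 "$CONTAINER_DIR"/vllm.*.sif 2>/dev/null | sort -V | tail -n 1)
echo "Newest container: ${newest:-none found}"
```

''sort -V'' does a version-aware sort, so ''0.19.1'' correctly sorts after ''0.9.2'' where a plain lexical sort would not.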
We have provided a convenience script that automates **all** of the steps needed to run applications inside the container - and to access your ''$HOME'', ''/scratch'' and ''/nobackup'' directories - reducing the process to just two simple commands.
* ''/nobackup/shared/containers/vllm.0.19.1.sh''
There is a corresponding ''.sh'' script for //each version// of the container image we make available.
Just ''source'' this file and it will take care of loading ''apptainer'', setting up your ''bind'' directories and calling the ''exec'' command for you. It also gives you a single command called ''container.run'' (instead of the much longer //apptainer exec// command) to //run// anything you want inside the container. For example, to run the [[https://docs.vllm.ai/en/latest/examples/basic/offline_inference/|basic offline inference example]]:
<code bash>
$ source /nobackup/shared/containers/vllm.0.19.1.sh
$ container.run python3 /opt/vllm/examples/basic/offline_inference/basic.py
</code>
**Note that the examples included with vLLM are all accessible under ''/opt/vllm/examples'' inside the container.**
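For a quick interactive test before committing to a batch job, the same two commands can be run inside an ''srun'' session. This is a sketch only: the account, partition and GPU names are taken from the sbatch example in the next section, so substitute your own project account and resources.

```shell
# Sketch: interactive GPU session (account/partition/GPU values are examples).
srun --account=comet_abcxyz --partition=gpu-s_paid -c 8 --mem=32G \
     --gres=gpu:L40:1 --pty bash
# Then, inside the session:
source /nobackup/shared/containers/vllm.0.19.1.sh
container.run python3 /opt/vllm/examples/basic/offline_inference/basic.py
```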
----
===== Sample Sbatch Script =====
This is an **example** of what an ''sbatch'' job file may look like when running vLLM:
<code bash>
#!/bin/bash
#SBATCH --account=comet_abcxyz
#SBATCH --partition=gpu-s_paid
#SBATCH -c 8
#SBATCH --mem=32G
#SBATCH --gres=gpu:L40:1
source /nobackup/shared/containers/vllm.0.19.1.sh
container.run python3 /opt/vllm/examples/basic/offline_inference/basic.py
</code>
Adjust the //SBATCH// parameters to match the partition you want to use and the resources you intend to allocate to vLLM.
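Assuming the script above is saved as ''vllm_job.sh'' (a hypothetical filename; use whatever name you prefer), submission and monitoring follow the usual Slurm pattern:

```shell
sbatch vllm_job.sh      # submit the job; prints the assigned job ID
squeue -u "$USER"       # check the queue status of your jobs
# By default, output is written to slurm-<jobid>.out in the submission directory
```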
----
===== Accessing Data =====
As long as you use the ''container.run'' method to launch applications, you will automatically be able to read and write files in your ''$HOME'', ''/scratch'' and ''/nobackup'' directories, **and** to use any NVIDIA GPUs that you have been allocated via Slurm.
If you run any of the applications inside the container manually, without using the ''container.run'' helper, you will need to use the ''--bind'' argument to ''apptainer'' to ensure that all relevant directories are exposed within the container.
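As a sketch, the manual equivalent of what the helper does would look something like this (''--nv'' passes the NVIDIA GPU devices through to the container; the bind paths match the ones the helper sets up):

```shell
module load apptainer
apptainer exec --nv \
    --bind /scratch:/scratch \
    --bind /nobackup:/nobackup \
    /nobackup/shared/containers/vllm.0.19.1.sif \
    python3 /opt/vllm/examples/basic/offline_inference/basic.py
```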
Do remember that the container filesystem itself is read-only - you won't be able to write to ''/usr/local'', ''/opt'', ''/etc'' or any other internal folders - so keep your output directories restricted to the three areas listed above.
----
===== Building vLLM on Comet =====
**Important**
This section is only relevant to RSE HPC staff or users wanting to understand how the container image is built. If you are intending to simply //use// the software you **do not** need to read this section - turn back now!
**Build Script:**
<code bash>
#!/bin/bash
echo "Loading modules..."
module load apptainer
echo ""
echo "Building container..."
export APPTAINER_TMPDIR=/scratch
apptainer build vllm.0.19.1.sif vllm.def 2>&1 | tee vllm.log
</code>
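Once the build completes, a suggested sanity check (not part of the build script itself) is to inspect the image metadata and confirm that the vLLM package imports:

```shell
apptainer inspect vllm.0.19.1.sif    # show image metadata and build labels
apptainer exec vllm.0.19.1.sif python3 -c "import vllm; print(vllm.__version__)"
```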
**Container Definition:**
<code>
Bootstrap: docker
From: ubuntu:noble

%post
# Prevent interactive prompts
export DEBIAN_FRONTEND=noninteractive
# Update & install only necessary packages
apt-get update
# Base stuff everything will need
apt-get install -y aptitude wget zip git less vim python3-pip
# Remove any downloaded package files - so they do not remain in the built image
apt-get clean
mkdir -p /opt
mkdir -p /src
cd /src
# Pytorch install
pip install torch torchvision --break-system-packages
# vLLM Install
pip install vllm --break-system-packages
# Optional newer transformers code
pip install git+https://github.com/huggingface/transformers.git --break-system-packages
# Source code of vllm - to get access to the example scripts
cd /src
wget https://github.com/vllm-project/vllm/archive/refs/tags/v0.19.1.tar.gz -O vllm-0.19.1.tgz
tar -zxf vllm-0.19.1.tgz
mv vllm-0.19.1/ /opt/vllm
# Remove any temporary files used during build
cd /
rm -rf /src
pip cache purge

%environment

%runscript
</code>
**Runtime Helper:**
You should ''source'' this file in order to use the ''container.run'' command. The current container image name is set via the ''IMAGE_NAME'' variable:
<code bash>
#!/bin/bash
module load apptainer
IMAGE_NAME=/nobackup/shared/containers/vllm.0.19.1.sif
container.run() {
# Run a command inside the container...
# automatically bind the /scratch and /nobackup dirs
# pass through any additional parameters given on the command line
apptainer exec --nv --bind /scratch:/scratch --bind /nobackup:/nobackup "${IMAGE_NAME}" "$@"
}
</code>
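One detail worth checking if you adapt the helper: forwarding arguments as quoted ''"$@"'' preserves arguments that contain spaces, whereas unquoted ''$@'' re-splits them on whitespace. A minimal, standalone demonstration (the function names are illustrative stand-ins for the helper):

```shell
# Stand-ins for the helper: forward their arguments to printf, one per line.
forward_quoted()   { printf '%s\n' "$@"; }  # preserves argument boundaries
forward_unquoted() { printf '%s\n' $@; }    # re-splits on whitespace
forward_quoted "hello world" foo | wc -l    # 2 lines: boundaries kept
forward_unquoted "hello world" foo | wc -l  # 3 lines: "hello world" is split
```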
----
[[:advanced:software|Back to Software]]