blob: 4f73dbd4b45b1f38c0363f5857515aaebee0ad3f [file] [log] [blame] [view]
---
hide:
- tags
tags:
- GPU
- CUDA
icon: simple/nvidia
---
# GPU deployment using CUDA
IREE can accelerate model execution on Nvidia GPUs using
[CUDA](https://developer.nvidia.com/cuda-toolkit).
## :octicons-download-16: Prerequisites
In order to use CUDA to drive the GPU, you need to have a functional CUDA
environment. It can be verified by the following steps:
``` shell
nvidia-smi | grep CUDA
```
If `nvidia-smi` does not exist, you will need to
[install the latest CUDA Toolkit SDK](https://developer.nvidia.com/cuda-downloads).
### Get the IREE compiler
#### :octicons-download-16: Download the compiler from a release
Python packages are distributed through multiple channels. See the
[Python Bindings](../../reference/bindings/python.md) page for more details.
The core [`iree-base-compiler`](https://pypi.org/project/iree-base-compiler/)
package includes the CUDA compiler:
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-compiler-from-release.md"
#### :material-hammer-wrench: Build the compiler from source
Please make sure you have followed the
[Getting started](../../building-from-source/getting-started.md) page to build
the IREE compiler, then enable the CUDA compiler target with the
`IREE_TARGET_BACKEND_CUDA` option.
!!! tip
`iree-compile` will be built under the `iree-build/tools/` directory. You
may want to include this path in your system's `PATH` environment variable.
### Get the IREE runtime
Next you will need to get an IREE runtime that includes the CUDA HAL driver.
#### :octicons-download-16: Download the runtime from a release
Python packages are distributed through multiple channels. See the
[Python Bindings](../../reference/bindings/python.md) page for more details.
The core [`iree-base-runtime`](https://pypi.org/project/iree-base-runtime/)
package includes the CUDA HAL driver:
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-runtime-from-release.md"
#### :material-hammer-wrench: Build the runtime from source
Please make sure you have followed the
[Getting started](../../building-from-source/getting-started.md) page to build
IREE from source, then enable the CUDA HAL driver with the
`IREE_HAL_DRIVER_CUDA` option.
#### :octicons-checklist-24: Check for CUDA devices
You can check for CUDA support by looking for a matching driver and device:
```console hl_lines="8"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:2"
```
```console hl_lines="3"
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-nvidia.md"
```
## Compile and run a program
With the requirements out of the way, we can now compile a model and run it.
### :octicons-file-code-16: Compile a program
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"
Then run the following command to compile with the `cuda` target:
```shell hl_lines="2-3"
iree-compile \
--iree-hal-target-device=cuda \
--iree-cuda-target=<...> \
mobilenetv2.mlir -o mobilenet_cuda.vmfb
```
--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-optimization-options.md"
#### Choosing CUDA targets
Canonically a CUDA target (`iree-cuda-target`) matching the LLVM NVPTX
backend of the form `sm_<arch_number>` is needed to compile towards each GPU
architecture. If no architecture is specified then we will default to
`sm_60`.
Here is a table of commonly used architectures:
| CUDA GPU | Target Architecture | Architecture Code Name
| ------------------- | ------------------- | ----------------------
| NVIDIA P100 | `sm_60` | `pascal`
| NVIDIA V100 | `sm_70` | `volta`
| NVIDIA A100 | `sm_80` | `ampere`
| NVIDIA H100 | `sm_90` | `hopper`
| NVIDIA RTX20 series | `sm_75` | `turing`
| NVIDIA RTX30 series | `sm_86` | `ampere`
| NVIDIA RTX40 series | `sm_89` | `ada`
In addition to the canonical `sm_<arch_number>` scheme, `iree-cuda-target`
also supports two additonal schemes to make a better developer experience:
* Architecture code names like `volta` or `ampere`
* GPU product names like `a100` or `rtx3090`
These two schemes are translated into the canonical form under the hood.
We add support for common code/product names without aiming to be exhaustive.
If the ones you want are missing, please use the canonical form.
### :octicons-terminal-16: Run a compiled program
To run the compiled program:
``` shell hl_lines="2"
iree-run-module \
--device=cuda \
--module=mobilenet_cuda.vmfb \
--function=torch-jit-export \
--input="1x3x224x224xf32=0"
```
The above assumes the exported function in the model is named `torch-jit-export`
and it expects one 224x224 RGB image. We are feeding in an image with all 0
values here for brevity, see `iree-run-module --help` for the format to specify
concrete values.