docs/website/docs/guides/deployment-configurations/gpu-cuda.md - 3p/openxla/iree - Git at Google

 ---
 hide:
   - tags
 tags:
   - GPU
   - CUDA
 icon: simple/nvidia
 ---

 # GPU deployment using CUDA

 IREE can accelerate model execution on Nvidia GPUs using
 [CUDA](https://developer.nvidia.com/cuda-toolkit).

 ## :octicons-download-16: Prerequisites

 In order to use CUDA to drive the GPU, you need to have a functional CUDA
 environment. It can be verified by the following steps:

 ``` shell
 nvidia-smi | grep CUDA
 ```

 If `nvidia-smi` does not exist, you will need to
 [install the latest CUDA Toolkit SDK](https://developer.nvidia.com/cuda-downloads).

 ### Get the IREE compiler

 #### :octicons-download-16: Download the compiler from a release

 Python packages are distributed through multiple channels. See the
 [Python Bindings](../../reference/bindings/python.md) page for more details.
 The core [`iree-base-compiler`](https://pypi.org/project/iree-base-compiler/)
 package includes the CUDA compiler:

 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-compiler-from-release.md"

 #### :material-hammer-wrench: Build the compiler from source

 Please make sure you have followed the
 [Getting started](../../building-from-source/getting-started.md) page to build
 the IREE compiler, then enable the CUDA compiler target with the
 `IREE_TARGET_BACKEND_CUDA` option.

 !!! tip
     `iree-compile` will be built under the `iree-build/tools/` directory. You
     may want to include this path in your system's `PATH` environment variable.

 ### Get the IREE runtime

 Next you will need to get an IREE runtime that includes the CUDA HAL driver.

 #### :octicons-download-16: Download the runtime from a release

 Python packages are distributed through multiple channels. See the
 [Python Bindings](../../reference/bindings/python.md) page for more details.
 The core [`iree-base-runtime`](https://pypi.org/project/iree-base-runtime/)
 package includes the CUDA HAL driver:

 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-runtime-from-release.md"

 #### :material-hammer-wrench: Build the runtime from source

 Please make sure you have followed the
 [Getting started](../../building-from-source/getting-started.md) page to build
 IREE from source, then enable the CUDA HAL driver with the
 `IREE_HAL_DRIVER_CUDA` option.

 #### :octicons-checklist-24: Check for CUDA devices

 You can check for CUDA support by looking for a matching driver and device:

 ```console hl_lines="8"
 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:2"
 ```

 ```console hl_lines="3"
 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-nvidia.md"
 ```

 ## Compile and run a program

 With the requirements out of the way, we can now compile a model and run it.

 ### :octicons-file-code-16: Compile a program

 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"

 Then run the following command to compile with the `cuda` target:

 ```shell hl_lines="2-3"
 iree-compile \
     --iree-hal-target-device=cuda \
     --iree-cuda-target=<...> \
     mobilenetv2.mlir -o mobilenet_cuda.vmfb
 ```

 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-optimization-options.md"

 #### Choosing CUDA targets

 Canonically a CUDA target (`iree-cuda-target`) matching the LLVM NVPTX
 backend of the form `sm_<arch_number>` is needed to compile towards each GPU
 architecture. If no architecture is specified then we will default to
 `sm_60`.

 Here is a table of commonly used architectures:

 | CUDA GPU            | Target Architecture | Architecture Code Name
 | ------------------- | ------------------- | ----------------------
 | NVIDIA P100         | `sm_60`             | `pascal`
 | NVIDIA V100         | `sm_70`             | `volta`
 | NVIDIA A100         | `sm_80`             | `ampere`
 | NVIDIA H100         | `sm_90`             | `hopper`
 | NVIDIA RTX20 series | `sm_75`             | `turing`
 | NVIDIA RTX30 series | `sm_86`             | `ampere`
 | NVIDIA RTX40 series | `sm_89`             | `ada`

 In addition to the canonical `sm_<arch_number>` scheme, `iree-cuda-target`
 also supports two additonal schemes to make a better developer experience:

 * Architecture code names like `volta` or `ampere`
 * GPU product names like `a100` or `rtx3090`

 These two schemes are translated into the canonical form under the hood.
 We add support for common code/product names without aiming to be exhaustive.
 If the ones you want are missing, please use the canonical form.

 ### :octicons-terminal-16: Run a compiled program

 To run the compiled program:

 ``` shell hl_lines="2"
 iree-run-module \
     --device=cuda \
     --module=mobilenet_cuda.vmfb \
     --function=torch-jit-export \
     --input="1x3x224x224xf32=0"
 ```

 The above assumes the exported function in the model is named `torch-jit-export`
 and it expects one 224x224 RGB image. We are feeding in an image with all 0
 values here for brevity, see `iree-run-module --help` for the format to specify
 concrete values.
	---
	hide:
	- tags
	tags:
	- GPU
	- CUDA
	icon: simple/nvidia
	---

	# GPU deployment using CUDA

	IREE can accelerate model execution on Nvidia GPUs using
	[CUDA](https://developer.nvidia.com/cuda-toolkit).

	## :octicons-download-16: Prerequisites

	In order to use CUDA to drive the GPU, you need to have a functional CUDA
	environment. It can be verified by the following steps:

	``` shell
	nvidia-smi \| grep CUDA
	```

	If `nvidia-smi` does not exist, you will need to
	[install the latest CUDA Toolkit SDK](https://developer.nvidia.com/cuda-downloads).

	### Get the IREE compiler

	#### :octicons-download-16: Download the compiler from a release

	Python packages are distributed through multiple channels. See the
	[Python Bindings](../../reference/bindings/python.md) page for more details.
	The core [`iree-base-compiler`](https://pypi.org/project/iree-base-compiler/)
	package includes the CUDA compiler:

	--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-compiler-from-release.md"

	#### :material-hammer-wrench: Build the compiler from source

	Please make sure you have followed the
	[Getting started](../../building-from-source/getting-started.md) page to build
	the IREE compiler, then enable the CUDA compiler target with the
	`IREE_TARGET_BACKEND_CUDA` option.

	!!! tip
	`iree-compile` will be built under the `iree-build/tools/` directory. You
	may want to include this path in your system's `PATH` environment variable.

	### Get the IREE runtime

	Next you will need to get an IREE runtime that includes the CUDA HAL driver.

	#### :octicons-download-16: Download the runtime from a release

	Python packages are distributed through multiple channels. See the
	[Python Bindings](../../reference/bindings/python.md) page for more details.
	The core [`iree-base-runtime`](https://pypi.org/project/iree-base-runtime/)
	package includes the CUDA HAL driver:

	--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-runtime-from-release.md"

	#### :material-hammer-wrench: Build the runtime from source

	Please make sure you have followed the
	[Getting started](../../building-from-source/getting-started.md) page to build
	IREE from source, then enable the CUDA HAL driver with the
	`IREE_HAL_DRIVER_CUDA` option.

	#### :octicons-checklist-24: Check for CUDA devices

	You can check for CUDA support by looking for a matching driver and device:

	```console hl_lines="8"
	--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:2"
	```

	```console hl_lines="3"
	--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-nvidia.md"
	```

	## Compile and run a program

	With the requirements out of the way, we can now compile a model and run it.

	### :octicons-file-code-16: Compile a program

	--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"

	Then run the following command to compile with the `cuda` target:

	```shell hl_lines="2-3"
	iree-compile \
	--iree-hal-target-device=cuda \
	--iree-cuda-target=<...> \
	mobilenetv2.mlir -o mobilenet_cuda.vmfb
	```

	--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-optimization-options.md"

	#### Choosing CUDA targets

	Canonically a CUDA target (`iree-cuda-target`) matching the LLVM NVPTX
	backend of the form `sm_<arch_number>` is needed to compile towards each GPU
	architecture. If no architecture is specified then we will default to
	`sm_60`.

	Here is a table of commonly used architectures:

	\| CUDA GPU \| Target Architecture \| Architecture Code Name
	\| ------------------- \| ------------------- \| ----------------------
	\| NVIDIA P100 \| `sm_60` \| `pascal`
	\| NVIDIA V100 \| `sm_70` \| `volta`
	\| NVIDIA A100 \| `sm_80` \| `ampere`
	\| NVIDIA H100 \| `sm_90` \| `hopper`
	\| NVIDIA RTX20 series \| `sm_75` \| `turing`
	\| NVIDIA RTX30 series \| `sm_86` \| `ampere`
	\| NVIDIA RTX40 series \| `sm_89` \| `ada`

	In addition to the canonical `sm_<arch_number>` scheme, `iree-cuda-target`
	also supports two additonal schemes to make a better developer experience:

	* Architecture code names like `volta` or `ampere`
	* GPU product names like `a100` or `rtx3090`

	These two schemes are translated into the canonical form under the hood.
	We add support for common code/product names without aiming to be exhaustive.
	If the ones you want are missing, please use the canonical form.

	### :octicons-terminal-16: Run a compiled program

	To run the compiled program:

	``` shell hl_lines="2"
	iree-run-module \
	--device=cuda \
	--module=mobilenet_cuda.vmfb \
	--function=torch-jit-export \
	--input="1x3x224x224xf32=0"
	```

	The above assumes the exported function in the model is named `torch-jit-export`
	and it expects one 224x224 RGB image. We are feeding in an image with all 0
	values here for brevity, see `iree-run-module --help` for the format to specify
	concrete values.