Port iree.runtime to nanobind. (#14214)

I believe that this should be a no-op for users. There is one minor API
change: the `MappedMemory` class no longer implements the buffer
protocol. I've seen no evidence that this was actually used, since it
was a less functional way to get a host ndarray.

More adventurous use of nanobind is possible in the future (e.g. using
`ndarray` and `dlpack` interop for sharing across frameworks), but that
will most likely necessitate API changes, which I was working to avoid.

Aside from relatively mechanical differences from pybind11, the main
issue was that nanobind dropped buffer protocol and array support.
Achieving the same characteristics required some direct coding against
the C API. I think this is actually an improvement, as the pybind11
implementations of these features were neither efficient nor obvious
in what they were doing.

A build-time dependency on `nanobind` is added. When building Python
wheels, this is satisfied automatically. Otherwise, the Docker images
have been updated to pre-install the necessary Python package. In
addition, there is now a build-time dependency on the NumPy headers,
which should already be installed (pybind11 vendored stripped-down
copies of these headers in an effort to avoid this, but I opted to just
do the normal thing).
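For builds outside the wheel or Docker flows, the following is a minimal sketch of how a standalone CMake project can satisfy both build-time dependencies. It follows the pattern from nanobind's CMake documentation (asking the installed `nanobind` Python package for its CMake config directory) and uses the `NumPy` component of CMake's `FindPython` module to locate the NumPy headers. This is not IREE's actual build logic, which uses the `Python3` package name as in the diff below; the target `_example_runtime` and the source file `example_runtime.cc` are hypothetical.

```cmake
# Development.Module requires CMake 3.18+.
cmake_minimum_required(VERSION 3.18)
project(nanobind_numpy_sketch LANGUAGES CXX)

# nanobind requires Python 3.8+. The NumPy component populates
# Python_NumPy_INCLUDE_DIRS with the NumPy C header location.
find_package(Python 3.8 REQUIRED
  COMPONENTS Interpreter Development.Module NumPy)

# Ask the pip-installed nanobind package where its CMake config lives;
# find_package() then picks it up via the nanobind_ROOT variable.
execute_process(
  COMMAND "${Python_EXECUTABLE}" -m nanobind --cmake_dir
  OUTPUT_STRIP_TRAILING_WHITESPACE
  OUTPUT_VARIABLE nanobind_ROOT)
find_package(nanobind CONFIG REQUIRED)

# Hypothetical extension module that uses the NumPy C API directly.
nanobind_add_module(_example_runtime example_runtime.cc)
target_include_directories(_example_runtime PRIVATE
  ${Python_NumPy_INCLUDE_DIRS})
```

If either `nanobind` or `numpy` is missing from the selected interpreter, configuration fails at the corresponding `find_package` call rather than at compile time.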

Nanobind's performance is [quite
compelling](https://nanobind.readthedocs.io/en/latest/benchmark.html).
This comes from a combination of favoring more efficient binding styles
(adopting them in pybind11 would basically amount to a rewrite) and
exclusive use of the newer Python 3.8+ vectorcall ABI. Since the runtime
is performance-critical and the cost of Python calls already adds
visible overhead in traces, it makes sense to baseline on the most
efficient implementation. In addition, the compile-time savings seem to
be real and the build is noticeably faster (this was not a primary
consideration, just a nice bonus).
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 68e1f0f..d65eb3b 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -616,8 +616,9 @@
   # "Bootstrapping" by first looking for the optional Development component
   # seems to be robust generally.
   # See: https://reviews.llvm.org/D118148
-  find_package(Python3 COMPONENTS Interpreter Development)
-  find_package(Python3 COMPONENTS Interpreter Development.Module REQUIRED)
+  # If building Python packages, we have a hard requirement on 3.8+.
+  find_package(Python3 3.8 COMPONENTS Interpreter Development)
+  find_package(Python3 3.8 COMPONENTS Interpreter Development.Module REQUIRED)
 elseif(IREE_BUILD_COMPILER OR IREE_BUILD_TESTS)
   find_package(Python3 COMPONENTS Interpreter REQUIRED)
 endif()
@@ -879,11 +880,14 @@
 endif()
 
 if(IREE_BUILD_PYTHON_BINDINGS)
-  if(NOT TARGET pybind11::module)
-    message(STATUS "Using bundled pybind11")
-    add_subdirectory(third_party/pybind11 EXCLUDE_FROM_ALL)
-  else()
-    message(STATUS "Not including bundled pybind11 (already configured)")
+  # The compiler uses pybind11
+  if(IREE_BUILD_COMPILER)
+    if(NOT TARGET pybind11::module)
+      message(STATUS "Using bundled pybind11")
+      add_subdirectory(third_party/pybind11 EXCLUDE_FROM_ALL)
+    else()
+      message(STATUS "Not including bundled pybind11 (already configured)")
+    endif()
   endif()
 endif()