Microkernels: add arm64 bitcode. Test everywhere. (#13846)

This adds arm64 to the ukernels bitcode build. Following #13825,
`iree-compile` automatically picks up that bitcode when the target is
arm64.

This generalizes the BUILD file of the e2e matmul tests with data-tiling +
microkernels to cover relevant cases on both x86-64 and arm64.

This drops the tags on that e2e matmul test, so it is now enabled
everywhere. On architectures without dedicated fast code, it falls back to
the generic (not fast) bitcode, so the test still runs there.
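
As a rough illustration of that fallback (a conceptual sketch only, not the
actual `iree-compile` logic; the function name and file names below are
hypothetical), selection could look like:

```python
# Hypothetical mapping from target architecture to dedicated fast bitcode.
# These file names are illustrative and are not the real IREE artifacts.
ARCH_SPECIFIC_BITCODE = {
    "x86_64": "ukernel_x86_64.bc",
    "arm64": "ukernel_arm64.bc",
}

# Hypothetical name for the portable, non-optimized bitcode.
GENERIC_BITCODE = "ukernel_generic.bc"


def select_ukernel_bitcode(target_arch: str) -> str:
    """Return dedicated bitcode for target_arch, else the generic fallback."""
    return ARCH_SPECIFIC_BITCODE.get(target_arch, GENERIC_BITCODE)
```

Under these assumptions, an arm64 target gets the dedicated bitcode, while an
architecture with no entry in the mapping still gets working generic code.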

Fixes #13804.
diff --git a/build_tools/bazel/iree_bitcode_library.bzl b/build_tools/bazel/iree_bitcode_library.bzl
index 0a01299..89ad587 100644
--- a/build_tools/bazel/iree_bitcode_library.bzl
+++ b/build_tools/bazel/iree_bitcode_library.bzl
@@ -89,6 +89,9 @@
         # This must match what the runtime is built with.
         "-fno-short-wchar",
 
+        # Enable inline asm.
+        "-fasm",
+
         # Object file only in bitcode format:
         "-c",
         "-emit-llvm",