Add document explaining the RISCV toolchain building process

Change-Id: I4bcfa519254f922117da0e9aab71581a3397a05a
diff --git a/BuildRiscVToolchain.md b/BuildRiscVToolchain.md
new file mode 100644
index 0000000..fef1138
--- /dev/null
+++ b/BuildRiscVToolchain.md
@@ -0,0 +1,179 @@
+# Build RISC-V Toolchain
+
+This doc lists the common configuration settings for building the RISC-V
+toolchain. The build has two parts: GCC, which provides the headers and
+libraries, and LLVM, which provides the compiler, linker, and utility tools.
+
+## Prerequisites
+Your host machine needs to have the following packages installed, which are
+already part of the Shodan prerequisite packages:
+
+* CMake (>= 3.13.4)
+* Ninja
+* Clang
+
+The source code of the toolchain is at:
+* [riscv-gnu-toolchain](https://github.com/riscv/riscv-gnu-toolchain): Check out
+the latest release tag, and check out the submodule `riscv-binutils` on the
+`rvv-1.0.x-zfh` branch.
+* [llvm-project](https://github.com/llvm/llvm-project): Check out the latest green
+commit.
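+
+As a rough sketch, the checkout steps above could look like the following.
+`<RELEASE_TAG>` and `<GREEN_COMMIT>` are placeholders introduced here for
+illustration, not values from this doc; pick them from the respective
+repositories yourself.
+
+```
+$ git clone https://github.com/riscv/riscv-gnu-toolchain <GCC_SRC_PATH>
+$ git -C <GCC_SRC_PATH> checkout <RELEASE_TAG>
+$ git -C <GCC_SRC_PATH> submodule update --init riscv-binutils
+$ git -C <GCC_SRC_PATH>/riscv-binutils checkout rvv-1.0.x-zfh
+$ git clone https://github.com/llvm/llvm-project <LLVM_SRC_PATH>
+$ git -C <LLVM_SRC_PATH> checkout <GREEN_COMMIT>
+```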
+
+## Build RISC-V Linux toolchain (64-bit)
+
+### Build GCC:
+```
+$ mkdir -p <GCC_BUILD_PATH>
+$ cd <GCC_BUILD_PATH>
+$ <GCC_SRC_PATH>/configure \
+  --srcdir=<GCC_SRC_PATH> \
+  --prefix=<TOOLCHAIN_OUT_DIR> \
+  --with-arch=rv64gc \
+  --with-abi=lp64d \
+  --with-cmodel=medany
+$ make -C <GCC_BUILD_PATH> linux
+```
+Note that Linux requires the full general-purpose extension set, i.e.,
+rv64imafdc (rv64gc), and an ABI with hard double-precision float support. For
+32-bit Linux, build the toolchain with `--with-arch=rv32gc --with-abi=ilp32d`.
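+
+For reference, a 32-bit Linux variant of the configure step might look like the
+sketch below. Only the two flags named in the note differ; the remaining
+options are assumed to carry over unchanged from the 64-bit build.
+
+```
+$ mkdir -p <GCC_BUILD_PATH>
+$ cd <GCC_BUILD_PATH>
+$ <GCC_SRC_PATH>/configure \
+  --srcdir=<GCC_SRC_PATH> \
+  --prefix=<TOOLCHAIN_OUT_DIR> \
+  --with-arch=rv32gc \
+  --with-abi=ilp32d \
+  --with-cmodel=medany
+$ make -C <GCC_BUILD_PATH> linux
+```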
+
+### Build LLVM:
+```
+$ cmake -B <LLVM_BUILD_PATH> \
+		-DCMAKE_INSTALL_PREFIX=<TOOLCHAIN_OUT_DIR> \
+		-DCMAKE_C_COMPILER=clang  -DCMAKE_CXX_COMPILER=clang++ \
+		-DCMAKE_BUILD_TYPE=Release \
+		-DLLVM_TARGETS_TO_BUILD="RISCV" \
+		-DLLVM_ENABLE_PROJECTS="clang"  \
+		-DLLVM_DEFAULT_TARGET_TRIPLE="riscv64-unknown-linux-gnu" \
+		-DLLVM_INSTALL_TOOLCHAIN_ONLY=On \
+		-DDEFAULT_SYSROOT=../sysroot \
+		-G Ninja \
+		<LLVM_SRC_PATH>/llvm
+$ cmake --build  <LLVM_BUILD_PATH> --target install
+```
+For 32-bit, change the LLVM target triple to `riscv32-unknown-linux-gnu`.
+
+## Build RISC-V bare-metal toolchain (32-bit)
+
+### Build GCC:
+```
+$ mkdir -p <GCC_BUILD_PATH>
+$ cd <GCC_BUILD_PATH>
+$ <GCC_SRC_PATH>/configure \
+  --srcdir=<GCC_SRC_PATH> \
+  --prefix=<TOOLCHAIN_OUT_DIR> \
+  --with-arch=rv32gc \
+  --with-abi=ilp32 \
+  --with-cmodel=medany
+$ make -C <GCC_BUILD_PATH> newlib
+```
+Note that bare-metal newlib places no hard constraints on CPU features or ABI
+support. However, LLVM for bare-metal only supports soft-float modules, so the
+GCC ABI setting needs to match that (hence `--with-abi=ilp32`).
+
+### Build LLVM:
+```
+$ cmake -B <LLVM_BUILD_PATH> \
+		-DCMAKE_INSTALL_PREFIX=<TOOLCHAIN_OUT_DIR> \
+		-DCMAKE_C_COMPILER=clang  -DCMAKE_CXX_COMPILER=clang++ \
+		-DCMAKE_BUILD_TYPE=Release \
+		-DLLVM_TARGETS_TO_BUILD="RISCV" \
+		-DLLVM_ENABLE_PROJECTS="clang"  \
+		-DLLVM_DEFAULT_TARGET_TRIPLE="riscv32-unknown-elf" \
+		-DLLVM_INSTALL_TOOLCHAIN_ONLY=On \
+		-DDEFAULT_SYSROOT=../riscv32-unknown-elf \
+		-G Ninja \
+		<LLVM_SRC_PATH>/llvm
+$ cmake --build  <LLVM_BUILD_PATH> --target install
+```
+#### Build compiler-rt
+
+This should not be necessary for Shodan usage, but if the compiler-rt builtins
+are required in the project, they can be built with these additional commands:
+
+```
+$ export PATH=<TOOLCHAIN_OUT_DIR>/bin:${PATH}
+$ cmake -B <LLVM_BUILD_PATH>/compiler-rt \
+  -DCMAKE_INSTALL_PREFIX=$PREFIX \
+  -DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY \
+  -DCMAKE_AR=<TOOLCHAIN_OUT_DIR>/bin/llvm-ar \
+  -DCMAKE_NM=<TOOLCHAIN_OUT_DIR>/bin/llvm-nm \
+  -DCMAKE_RANLIB=<TOOLCHAIN_OUT_DIR>/bin/llvm-ranlib \
+  -DCMAKE_C_COMPILER=<TOOLCHAIN_OUT_DIR>/bin/clang \
+  -DCMAKE_C_COMPILER_TARGET=riscv32-unknown-elf \
+  -DCMAKE_ASM_COMPILER_TARGET=riscv32-unknown-elf \
+  -DCOMPILER_RT_OS_DIR="clang/13.0.0/lib" \
+  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" \
+  -DCOMPILER_RT_BUILD_BUILTINS=ON \
+  -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
+  -DCOMPILER_RT_BUILD_XRAY=OFF \
+  -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
+  -DCOMPILER_RT_BUILD_MEMPROF=OFF \
+  -DCOMPILER_RT_BUILD_PROFILE=OFF \
+  -DCOMPILER_RT_BAREMETAL_BUILD=ON \
+  -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON \
+  -DLLVM_CONFIG_PATH=<LLVM_BUILD_PATH>/bin/llvm-config \
+  -DCMAKE_C_FLAGS="-march=rv32gc -mno-relax" \
+  -DCMAKE_ASM_FLAGS="-march=rv32gc -mno-relax" \
+  -G "Ninja" <LLVM_SRC_PATH>/compiler-rt
+$ cmake --build <LLVM_BUILD_PATH>/compiler-rt --target install
+```
+
+### Build newlib
+
+The source code is at https://github.com/riscv/riscv-newlib
+
+***NOTE: The GCC utility tools need to be built first.***
+
+```
+$ mkdir -p <NEWLIB_BUILD_PATH>
+$ cd <NEWLIB_BUILD_PATH>
+$ <NEWLIB_SRC_PATH>/configure \
+  --target=riscv32-unknown-elf \
+  --prefix=<TOOLCHAIN_OUT_DIR> \
+  --enable-newlib-io-long-double \
+  --enable-newlib-io-long-long \
+  --enable-newlib-io-c99-formats \
+  --enable-newlib-register-fini \
+  CC_FOR_TARGET=clang \
+  CXX_FOR_TARGET=clang++ \
+  CFLAGS_FOR_TARGET="-march=rv32gc -O2 -D_POSIX_MODE -mno-relax" \
+  CXXFLAGS_FOR_TARGET="-march=rv32gc -O2 -D_POSIX_MODE -mno-relax"
+$ make -j32
+$ make install
+```
+## Test toolchain
+
+Run
+```
+<TOOLCHAIN_OUT_DIR>/bin/<arch>-<os>-<abi>-gcc -v
+```
+to see the supported ABIs, architectures, library paths, etc.
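+
+For the 32-bit bare-metal toolchain built above, this might look like the
+following sketch. The second command uses GCC's `-print-multi-lib` option,
+which lists the library variants (multilibs) the compiler was built with.
+
+```
+$ <TOOLCHAIN_OUT_DIR>/bin/riscv32-unknown-elf-gcc -v
+$ <TOOLCHAIN_OUT_DIR>/bin/riscv32-unknown-elf-gcc -print-multi-lib
+```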
+
+Try compiling a simple C program, saved as `testCCompiler.c` (copied from
+CMake's package content, e.g.,
+`/usr/share/cmake-3.18/Modules/CMakeTestCCompiler.cmake`):
+
+```
+#ifdef __cplusplus
+# error "The CMAKE_C_COMPILER is set to a C++ compiler"
+#endif
+#if defined(__CLASSIC_C__)
+int main(argc, argv)
+  int argc;
+  char* argv[];
+#else
+int main(int argc, char* argv[])
+#endif
+{ (void)argv; return argc-1;}
+```
+To build (32-bit bare-metal example):
+
+```
+$ <TOOLCHAIN_OUT_DIR>/bin/clang -c testCCompiler.c -O --target=riscv32
+$ <TOOLCHAIN_OUT_DIR>/bin/riscv32-unknown-elf-gcc testCCompiler.o -o testCCompiler -march=rv32gc -mabi=ilp32
+```
+You can also use `readelf` to inspect the object file before building the binary
+to see if the architecture and ABI match.
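+
+A minimal sketch of that inspection, assuming the `readelf` installed by the
+GCC build above (any `readelf` with RISC-V support should behave similarly):
+
+```
+$ <TOOLCHAIN_OUT_DIR>/bin/riscv32-unknown-elf-readelf -h testCCompiler.o
+$ <TOOLCHAIN_OUT_DIR>/bin/riscv32-unknown-elf-readelf -A testCCompiler.o
+```
+
+In the `-h` output, the `Class`, `Machine`, and `Flags` fields indicate the
+word size, the target ISA, and the float ABI. The `-A` output lists the
+architecture attributes recorded in the object file, if any are present.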