doc/tutorials/writing_kelvin_programs.md - hw/kelvin - Git at Google

 # Writing a Kelvin Program

 This tutorial introduces the basics of writing a Kelvin program. You will:

 1) Learn the basic structure of a Kelvin program.
 2) Write and compile a basic program.
 3) Test your program with a cocotb test bench.

 ## Prerequistes

 This tutorial assumes you have completed opensecura [getting started guide](https://opensecura.googlesource.com/docs/+/refs/heads/master/GettingStarted.md).

 ## Writing a basic Kelvin program

 Open up [`tests/cocotb/tutorial/program.cc`](../../tests/cocotb/tutorial/program.cc),
 which is a skeleton program:

 ```c++
 // TODO: Add two inputs buffers of 8 uint32_t's (input1_buffer, input2_buffer)
 // TODO: Add one output buffer of 8 uint32_t's (output_buffer)

 int main(int argc, char** argv) {
   // TODO: Add code to element wise add/subtract from input1_buffer and
   // input2_buffer and store the result to output_buffer.

   return 0;
 }
 ```

 The typical structure of a Kelvin program includes:

 1) Input buffers, to store the inputs to the computation you want to perform.
    For this tutorial, we will assume the host core will write data to Kelvin's
    DTCM before the program executes.
 2) Output buffers, for Kelvin to store the result of computation. Similar to
    the input buffers, we'll assume that Kelvin will write to a location in it's
    DTCM to be read by the host processor after it completes.
 3) The actual computation to be performed.

 ### Defining Input and Output Buffers

 For this tutorial we'll accept two input buffers and emit one output buffer,
 each consisting of 8 uint32_t. We define them outside of `main`.
 __attribute__((section(".data"))) defines buffer is stored in data section.

 ```c++
 uint32_t input1_buffer[8] __attribute__((section(".data")));
 uint32_t input2_buffer[8] __attribute__((section(".data")));
 uint32_t output_buffer[8] __attribute__((section(".data")));

 int main(int argc, char** argv) {
   // TODO: Add code to element wise add/subtract from input1_buffer and
   // input2_buffer and store the result to output_buffer.

   return 0;
 }
 ```

 For this tutorial, we do not need to define the precise locations of these
 buffers. Our linker script will allocate them in DTCM and we'll query their
 locations in our test bench.

 ### Defining Computation

 As a simple example, let's add element-wise the elements from `input1_buffer`
 to `input2_buffer`:

 ```c++
 uint32_t input1_buffer[8] __attribute__((section(".data")));
 uint32_t input2_buffer[8] __attribute__((section(".data")));
 uint32_t output_buffer[8] __attribute__((section(".data")));

 int main(int argc, char** argv) {
   for (int i = 0; i < 8; i++) {
     output_buffer[i] = input1_buffer[i] + input2_buffer[i];
   }
   return 0;
 }
 ```

 The core will halt when returning from `main`.

 ### Compiling the program

 Run `bazel build tests/cocotb/tutorial:kelvin_v2_program`
 to generate `kelvin_v2_program.elf`.

 ## Creating the test bench

 Open up [`tests/cocotb/tutorial/tutorial.py`](../../tests/cocotb/tutorial/tutorial.py)
 which contains the skeleton testbench:

 ```python
 @cocotb.test()
 async def core_mini_axi_tutorial(dut):
     """Testbench to run your Kelvin program."""
     # Test bench setup
     core_mini_axi = CoreMiniAxiInterface(dut)
     await core_mini_axi.init()
     await core_mini_axi.reset()
     cocotb.start_soon(core_mini_axi.clock.start())
 ```

 First, we need to program ITCM with your program. A `load_elf` function is
 provided to copy all loadable sections into memory. Add the following to
 `core_mini_axi_tutorial`:

 ```diff
 @cocotb.test()
 async def core_mini_axi_tutorial(dut):
     """Testbench to run your Kelvin program."""
     # Test bench setup
     core_mini_axi = CoreMiniAxiInterface(dut)
     await core_mini_axi.init()
     await core_mini_axi.reset()
     cocotb.start_soon(core_mini_axi.clock.start())

 +   r = runfiles.Create()
 +   elf_path = r.Rlocation(
 +       "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
 +   with open(elf_path, "rb") as f:
 +     entry_point = await core_mini_axi.load_elf(f)
 ```

 Before we start the program, let's also write inputs into DTCM. We can
 determine the location of a buffer using `lookup_symbol` and write to DTCM with
 `write`:


 ```diff
 @cocotb.test()
 async def core_mini_axi_tutorial(dut):
     """Testbench to run your Kelvin program."""
     # Test bench setup
     core_mini_axi = CoreMiniAxiInterface(dut)
     await core_mini_axi.init()
     await core_mini_axi.reset()
     cocotb.start_soon(core_mini_axi.clock.start())

     r = runfiles.Create()
     elf_path = r.Rlocation(
         "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
     with open(elf_path, "rb") as f:
       entry_point = await core_mini_axi.load_elf(f)
 +     inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
 +     inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

 +   input1_data = np.arange(8, dtype=np.uint32)
 +   input2_data = 8994 * np.ones(8, dtype=np.uint32)
 +   await core_mini_axi.write(inputs1_addr, input1_data)
 +   await core_mini_axi.write(inputs2_addr, input2_data)
 ```

 Now that input data has been written, let's actually run the program! Use
 `execute_from` to start the program on Kelvin. Once it's running, wait for the
 core to halt, so we know it's done work and we can read the result:

 ```diff
 @cocotb.test()
 async def core_mini_axi_tutorial(dut):
     """Testbench to run your Kelvin program."""
     # Test bench setup
     core_mini_axi = CoreMiniAxiInterface(dut)
     await core_mini_axi.init()
     await core_mini_axi.reset()
     cocotb.start_soon(core_mini_axi.clock.start())

     r = runfiles.Create()
     elf_path = r.Rlocation(
         "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
     with open(elf_path, "rb") as f:
       entry_point = await core_mini_axi.load_elf(f)
       inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
       inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

     input1_data = np.arange(8, dtype=np.uint32)
     input2_data = 8994 * np.ones(8, dtype=np.uint32)
     await core_mini_axi.write(inputs1_addr, input1_data)
     await core_mini_axi.write(inputs2_addr, input2_data)

 +   await core_mini_axi.execute_from(entry_point)
 +   await core_mini_axi.wait_for_halted()
 ```

 Finally, let's `read` and print the result:

 ```diff
 async def core_mini_axi_tutorial(dut):
     """Testbench to run your Kelvin program."""
     # Test bench setup
     core_mini_axi = CoreMiniAxiInterface(dut)
     await core_mini_axi.init()
     await core_mini_axi.reset()
     cocotb.start_soon(core_mini_axi.clock.start())

     r = runfiles.Create()
     elf_path = r.Rlocation(
         "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
     with open(elf_path, "rb") as f:
       entry_point = await core_mini_axi.load_elf(f)
       inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
       inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

     input1_data = np.arange(8, dtype=np.uint32)
     input2_data = 8994 * np.ones(8, dtype=np.uint32)
     await core_mini_axi.write(inputs1_addr, input1_data)
     await core_mini_axi.write(inputs2_addr, input2_data)
     await core_mini_axi.execute_from(entry_point)
     await core_mini_axi.wait_for_halted()

 +   rdata = (await core_mini_axi.read(outputs_addr, 4 * 8)).view(np.uint32)
 +   print(f"I got {rdata}")
 ```

 ## Running the test bench

 You can run the test bench with:

 ```bash
 bazel run //tests/cocotb/tutorial:tutorial
 ```

 You should see the following in the console output:

 ```bash
 I got [8994 8995 8996 8997 8998 8999 9000 9001]
 ```

 Congratulations on running your first program!

 ## Next steps

 Follow up tutorials will cover accelerating Kelvin with RISC-V Vector
 intrinsics.
	# Writing a Kelvin Program

	This tutorial introduces the basics of writing a Kelvin program. You will:

	1) Learn the basic structure of a Kelvin program.
	2) Write and compile a basic program.
	3) Test your program with a cocotb test bench.

	## Prerequistes

	This tutorial assumes you have completed opensecura [getting started guide](https://opensecura.googlesource.com/docs/+/refs/heads/master/GettingStarted.md).

	## Writing a basic Kelvin program

	Open up [`tests/cocotb/tutorial/program.cc`](../../tests/cocotb/tutorial/program.cc),
	which is a skeleton program:

	```c++
	// TODO: Add two inputs buffers of 8 uint32_t's (input1_buffer, input2_buffer)
	// TODO: Add one output buffer of 8 uint32_t's (output_buffer)

	int main(int argc, char** argv) {
	// TODO: Add code to element wise add/subtract from input1_buffer and
	// input2_buffer and store the result to output_buffer.

	return 0;
	}
	```

	The typical structure of a Kelvin program includes:

	1) Input buffers, to store the inputs to the computation you want to perform.
	For this tutorial, we will assume the host core will write data to Kelvin's
	DTCM before the program executes.
	2) Output buffers, for Kelvin to store the result of computation. Similar to
	the input buffers, we'll assume that Kelvin will write to a location in it's
	DTCM to be read by the host processor after it completes.
	3) The actual computation to be performed.

	### Defining Input and Output Buffers

	For this tutorial we'll accept two input buffers and emit one output buffer,
	each consisting of 8 uint32_t. We define them outside of `main`.
	__attribute__((section(".data"))) defines buffer is stored in data section.

	```c++
	uint32_t input1_buffer[8] __attribute__((section(".data")));
	uint32_t input2_buffer[8] __attribute__((section(".data")));
	uint32_t output_buffer[8] __attribute__((section(".data")));

	int main(int argc, char** argv) {
	// TODO: Add code to element wise add/subtract from input1_buffer and
	// input2_buffer and store the result to output_buffer.

	return 0;
	}
	```

	For this tutorial, we do not need to define the precise locations of these
	buffers. Our linker script will allocate them in DTCM and we'll query their
	locations in our test bench.

	### Defining Computation

	As a simple example, let's add element-wise the elements from `input1_buffer`
	to `input2_buffer`:

	```c++
	uint32_t input1_buffer[8] __attribute__((section(".data")));
	uint32_t input2_buffer[8] __attribute__((section(".data")));
	uint32_t output_buffer[8] __attribute__((section(".data")));

	int main(int argc, char** argv) {
	for (int i = 0; i < 8; i++) {
	output_buffer[i] = input1_buffer[i] + input2_buffer[i];
	}
	return 0;
	}
	```

	The core will halt when returning from `main`.

	### Compiling the program

	Run `bazel build tests/cocotb/tutorial:kelvin_v2_program`
	to generate `kelvin_v2_program.elf`.

	## Creating the test bench

	Open up [`tests/cocotb/tutorial/tutorial.py`](../../tests/cocotb/tutorial/tutorial.py)
	which contains the skeleton testbench:

	```python
	@cocotb.test()
	async def core_mini_axi_tutorial(dut):
	"""Testbench to run your Kelvin program."""
	# Test bench setup
	core_mini_axi = CoreMiniAxiInterface(dut)
	await core_mini_axi.init()
	await core_mini_axi.reset()
	cocotb.start_soon(core_mini_axi.clock.start())
	```

	First, we need to program ITCM with your program. A `load_elf` function is
	provided to copy all loadable sections into memory. Add the following to
	`core_mini_axi_tutorial`:

	```diff
	@cocotb.test()
	async def core_mini_axi_tutorial(dut):
	"""Testbench to run your Kelvin program."""
	# Test bench setup
	core_mini_axi = CoreMiniAxiInterface(dut)
	await core_mini_axi.init()
	await core_mini_axi.reset()
	cocotb.start_soon(core_mini_axi.clock.start())

	+ r = runfiles.Create()
	+ elf_path = r.Rlocation(
	+ "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
	+ with open(elf_path, "rb") as f:
	+ entry_point = await core_mini_axi.load_elf(f)
	```

	Before we start the program, let's also write inputs into DTCM. We can
	determine the location of a buffer using `lookup_symbol` and write to DTCM with
	`write`:


	```diff
	@cocotb.test()
	async def core_mini_axi_tutorial(dut):
	"""Testbench to run your Kelvin program."""
	# Test bench setup
	core_mini_axi = CoreMiniAxiInterface(dut)
	await core_mini_axi.init()
	await core_mini_axi.reset()
	cocotb.start_soon(core_mini_axi.clock.start())

	r = runfiles.Create()
	elf_path = r.Rlocation(
	"kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
	with open(elf_path, "rb") as f:
	entry_point = await core_mini_axi.load_elf(f)
	+ inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
	+ inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

	+ input1_data = np.arange(8, dtype=np.uint32)
	+ input2_data = 8994 * np.ones(8, dtype=np.uint32)
	+ await core_mini_axi.write(inputs1_addr, input1_data)
	+ await core_mini_axi.write(inputs2_addr, input2_data)
	```

	Now that input data has been written, let's actually run the program! Use
	`execute_from` to start the program on Kelvin. Once it's running, wait for the
	core to halt, so we know it's done work and we can read the result:

	```diff
	@cocotb.test()
	async def core_mini_axi_tutorial(dut):
	"""Testbench to run your Kelvin program."""
	# Test bench setup
	core_mini_axi = CoreMiniAxiInterface(dut)
	await core_mini_axi.init()
	await core_mini_axi.reset()
	cocotb.start_soon(core_mini_axi.clock.start())

	r = runfiles.Create()
	elf_path = r.Rlocation(
	"kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
	with open(elf_path, "rb") as f:
	entry_point = await core_mini_axi.load_elf(f)
	inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
	inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

	input1_data = np.arange(8, dtype=np.uint32)
	input2_data = 8994 * np.ones(8, dtype=np.uint32)
	await core_mini_axi.write(inputs1_addr, input1_data)
	await core_mini_axi.write(inputs2_addr, input2_data)

	+ await core_mini_axi.execute_from(entry_point)
	+ await core_mini_axi.wait_for_halted()
	```

	Finally, let's `read` and print the result:

	```diff
	async def core_mini_axi_tutorial(dut):
	"""Testbench to run your Kelvin program."""
	# Test bench setup
	core_mini_axi = CoreMiniAxiInterface(dut)
	await core_mini_axi.init()
	await core_mini_axi.reset()
	cocotb.start_soon(core_mini_axi.clock.start())

	r = runfiles.Create()
	elf_path = r.Rlocation(
	"kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
	with open(elf_path, "rb") as f:
	entry_point = await core_mini_axi.load_elf(f)
	inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
	inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

	input1_data = np.arange(8, dtype=np.uint32)
	input2_data = 8994 * np.ones(8, dtype=np.uint32)
	await core_mini_axi.write(inputs1_addr, input1_data)
	await core_mini_axi.write(inputs2_addr, input2_data)
	await core_mini_axi.execute_from(entry_point)
	await core_mini_axi.wait_for_halted()

	+ rdata = (await core_mini_axi.read(outputs_addr, 4 * 8)).view(np.uint32)
	+ print(f"I got {rdata}")
	```

	## Running the test bench

	You can run the test bench with:

	```bash
	bazel run //tests/cocotb/tutorial:tutorial
	```

	You should see the following in the console output:

	```bash
	I got [8994 8995 8996 8997 8998 8999 9000 9001]
	```

	Congratulations on running your first program!

	## Next steps

	Follow up tutorials will cover accelerating Kelvin with RISC-V Vector
	intrinsics.