This tutorial introduces the basics of writing a Kelvin program. You will:
This tutorial assumes you have completed the opensecura getting started guide.
Open up tests/cocotb/tutorial/program.cc, which is a skeleton program:

    // TODO: Add two inputs buffers of 8 uint32_t's (input1_buffer, input2_buffer)
    // TODO: Add one output buffer of 8 uint32_t's (output_buffer)

    int main(int argc, char** argv) {
      // TODO: Add code to element wise add/subtract from input1_buffer and
      // input2_buffer and store the result to output_buffer.
      return 0;
    }

The typical structure of a Kelvin program includes:
For this tutorial we'll accept two input buffers and emit one output buffer, each consisting of 8 uint32_t values. We define them outside of main. __attribute__((section(".data"))) specifies that each buffer is stored in the .data section.

    uint32_t input1_buffer[8] __attribute__((section(".data")));
    uint32_t input2_buffer[8] __attribute__((section(".data")));
    uint32_t output_buffer[8] __attribute__((section(".data")));

    int main(int argc, char** argv) {
      // TODO: Add code to element wise add/subtract from input1_buffer and
      // input2_buffer and store the result to output_buffer.
      return 0;
    }

For this tutorial, we do not need to define the precise locations of these buffers. Our linker script will allocate them in DTCM and we'll query their locations in our test bench.
As a simple example, let's add input1_buffer and input2_buffer element-wise and store the result in output_buffer:

    uint32_t input1_buffer[8] __attribute__((section(".data")));
    uint32_t input2_buffer[8] __attribute__((section(".data")));
    uint32_t output_buffer[8] __attribute__((section(".data")));

    int main(int argc, char** argv) {
      for (int i = 0; i < 8; i++) {
        output_buffer[i] = input1_buffer[i] + input2_buffer[i];
      }
      return 0;
    }

The core will halt when returning from main.
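
To make the loop's arithmetic concrete, here is a minimal host-side reference model in plain Python. It is only a sketch: reference_add and MASK32 are hypothetical names that are not part of the tutorial sources, and the 32-bit mask stands in for the wraparound that unsigned uint32_t addition has in C++.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper constant: keeps results within 32 bits


def reference_add(input1, input2):
    """Element-wise add of two equal-length sequences, wrapping modulo 2**32."""
    if len(input1) != len(input2):
        raise ValueError("input buffers must have the same length")
    return [(a + b) & MASK32 for a, b in zip(input1, input2)]


# Same shape as the tutorial buffers: 8 elements each.
result = reference_add(list(range(8)), [8994] * 8)
print(result)
```

A model like this is handy later on, because the test bench can compare what it reads back from the core against a value computed independently on the host.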
Run bazel build tests/cocotb/tutorial:kelvin_v2_program to generate kelvin_v2_program.elf.
Open up tests/cocotb/tutorial/tutorial.py, which contains the skeleton testbench:

    @cocotb.test()
    async def core_mini_axi_tutorial(dut):
        """Testbench to run your Kelvin program."""
        # Test bench setup
        core_mini_axi = CoreMiniAxiInterface(dut)
        await core_mini_axi.init()
        await core_mini_axi.reset()
        cocotb.start_soon(core_mini_axi.clock.start())

First, we need to program ITCM with your program. A load_elf function is provided to copy all loadable sections into memory. Add the following to core_mini_axi_tutorial:

      @cocotb.test()
      async def core_mini_axi_tutorial(dut):
          """Testbench to run your Kelvin program."""
          # Test bench setup
          core_mini_axi = CoreMiniAxiInterface(dut)
          await core_mini_axi.init()
          await core_mini_axi.reset()
          cocotb.start_soon(core_mini_axi.clock.start())

    +     r = runfiles.Create()
    +     elf_path = r.Rlocation(
    +         "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
    +     with open(elf_path, "rb") as f:
    +         entry_point = await core_mini_axi.load_elf(f)

Before we start the program, let's also write inputs into DTCM. We can determine the location of a buffer using lookup_symbol and write to DTCM with write:

      @cocotb.test()
      async def core_mini_axi_tutorial(dut):
          """Testbench to run your Kelvin program."""
          # Test bench setup
          core_mini_axi = CoreMiniAxiInterface(dut)
          await core_mini_axi.init()
          await core_mini_axi.reset()
          cocotb.start_soon(core_mini_axi.clock.start())

          r = runfiles.Create()
          elf_path = r.Rlocation(
              "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
          with open(elf_path, "rb") as f:
              entry_point = await core_mini_axi.load_elf(f)
    +         inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
    +         inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

    +     input1_data = np.arange(8, dtype=np.uint32)
    +     input2_data = 8994 * np.ones(8, dtype=np.uint32)
    +     await core_mini_axi.write(inputs1_addr, input1_data)
    +     await core_mini_axi.write(inputs2_addr, input2_data)

Now that the input data has been written, let's actually run the program! Use execute_from to start the program on Kelvin. Once it's running, wait for the core to halt, so we know it has finished its work and we can read the result:

      @cocotb.test()
      async def core_mini_axi_tutorial(dut):
          """Testbench to run your Kelvin program."""
          # Test bench setup
          core_mini_axi = CoreMiniAxiInterface(dut)
          await core_mini_axi.init()
          await core_mini_axi.reset()
          cocotb.start_soon(core_mini_axi.clock.start())

          r = runfiles.Create()
          elf_path = r.Rlocation(
              "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
          with open(elf_path, "rb") as f:
              entry_point = await core_mini_axi.load_elf(f)
              inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
              inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")

          input1_data = np.arange(8, dtype=np.uint32)
          input2_data = 8994 * np.ones(8, dtype=np.uint32)
          await core_mini_axi.write(inputs1_addr, input1_data)
          await core_mini_axi.write(inputs2_addr, input2_data)

    +     await core_mini_axi.execute_from(entry_point)
    +     await core_mini_axi.wait_for_halted()

Finally, let's look up the address of output_buffer, read it back, and print the result:

      @cocotb.test()
      async def core_mini_axi_tutorial(dut):
          """Testbench to run your Kelvin program."""
          # Test bench setup
          core_mini_axi = CoreMiniAxiInterface(dut)
          await core_mini_axi.init()
          await core_mini_axi.reset()
          cocotb.start_soon(core_mini_axi.clock.start())

          r = runfiles.Create()
          elf_path = r.Rlocation(
              "kelvin_hw/tests/cocotb/tutorial/kelvin_v2_program.elf")
          with open(elf_path, "rb") as f:
              entry_point = await core_mini_axi.load_elf(f)
              inputs1_addr = core_mini_axi.lookup_symbol(f, "input1_buffer")
              inputs2_addr = core_mini_axi.lookup_symbol(f, "input2_buffer")
    +         outputs_addr = core_mini_axi.lookup_symbol(f, "output_buffer")

          input1_data = np.arange(8, dtype=np.uint32)
          input2_data = 8994 * np.ones(8, dtype=np.uint32)
          await core_mini_axi.write(inputs1_addr, input1_data)
          await core_mini_axi.write(inputs2_addr, input2_data)

          await core_mini_axi.execute_from(entry_point)
          await core_mini_axi.wait_for_halted()

    +     rdata = (await core_mini_axi.read(outputs_addr, 4 * 8)).view(np.uint32)
    +     print(f"I got {rdata}")

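
The read call above asks for 4 * 8 = 32 bytes because output_buffer holds 8 elements of 4 bytes each, and view then reinterprets those bytes as 32-bit words. The sketch below reproduces that reinterpretation without a simulator. It assumes that read returns a flat uint8 NumPy array in little-endian byte order (RISC-V targets are normally little-endian); the raw array here is a stand-in built on the host, not data from the core.

```python
import numpy as np

# Stand-in for the bytes read back from output_buffer.
# Assumption: read() yields a flat uint8 array in little-endian byte order.
expected_words = np.arange(8, dtype="<u4") + 8994
raw = np.frombuffer(expected_words.tobytes(), dtype=np.uint8)

# 8 elements * 4 bytes per uint32_t = 32 bytes, matching read(outputs_addr, 4 * 8).
print(raw.size)

# Reinterpret each group of 4 bytes as one little-endian 32-bit word; no copy is made.
words = raw.view("<u4")
print(words)
```

On a little-endian host, "<u4" and np.uint32 describe the same layout, so this matches the view(np.uint32) call in the test bench.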
You can run the test bench with:
bazel run //tests/cocotb/tutorial:tutorial
You should see the following in the console output:
I got [8994 8995 8996 8997 8998 8999 9000 9001]
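
Rather than checking the printed values by eye, the test bench could compare the result against a value computed on the host. The following is a minimal sketch of that check, runnable outside the simulator: rdata is a hard-coded stand-in for the array read back from the core, and in the real test bench the final assert would go after the read.

```python
import numpy as np

# Same input data the test bench writes into DTCM.
input1_data = np.arange(8, dtype=np.uint32)
input2_data = 8994 * np.ones(8, dtype=np.uint32)

# Host-side expectation for the element-wise add performed by the program.
expected = input1_data + input2_data

# Stand-in for the array read back from output_buffer in the test bench.
rdata = np.array([8994, 8995, 8996, 8997, 8998, 8999, 9000, 9001],
                 dtype=np.uint32)

# A failing assert raises AssertionError, which cocotb reports as a test failure.
assert np.array_equal(rdata, expected), f"expected {expected}, got {rdata}"
```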
Congratulations on running your first program!
Follow-up tutorials will cover accelerating Kelvin programs with RISC-V Vector intrinsics.