Custom modules in Python sample

This sample illustrates how to define a custom module with the Python API, using a pure Python implementation, and how to compile an overall program that uses it.

This builds on the capabilities of the custom_module sample, which demonstrates C-based extension modules, and applies the same basics to Python. Some features are not yet implemented on the Python side, and the API is lower level than it should ultimately be. However, as demonstrated here, it can already do some non-trivial things.

Sample description

To show off some of the capabilities, this sample:

  • Demonstrates how to define a custom Python function that accepts both a buffer and a variant list. Within the implementation, the buffer is wrapped in a numpy array for use.
  • Keeps module state for the detokenizer, tracking whether we are at the start of the text or of a sentence. Real detokenizers are much more complex and would likely involve an opaque custom module type (not yet implemented in Python).
  • Uses a global in the main program to accumulate fragments via the @detokenizer.accumtokens function.
  • Uses the @detokenizer.jointokens function to format and emit the text corresponding to the accumulated tokens, respecting sentence boundaries and previous state.
  • Exports a reset function that resets the accumulated tokens and the detokenizer state.
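
The state handling described in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the sample's actual code: it does not use the runtime's module API at all, the names VOCAB, Detokenizer, accum_tokens and join_tokens are hypothetical, the vocabulary is invented, and a standard-library memoryview stands in for the numpy array that the sample wraps around the buffer.

```python
from array import array

# Hypothetical toy vocabulary, invented for this sketch.
VOCAB = ["hello", "world", "this", "is", "a", "test", "."]


def accum_tokens(ids_buffer, token_list):
    """Appends the int32 token ids stored in a raw buffer to token_list.

    The sample wraps the buffer in a numpy array; a memoryview is used
    here instead so the sketch only needs the standard library.
    """
    ids = memoryview(ids_buffer).cast("B").cast("i")
    token_list.extend(ids.tolist())


class Detokenizer:
    """Toy stand-in for the detokenizer module state."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Tracks whether the next word starts the text or a sentence.
        self.start_of_text = True
        self.start_of_sentence = True

    def join_tokens(self, token_ids):
        """Formats token ids as text, honoring state from earlier calls."""
        parts = []
        for token_id in token_ids:
            word = VOCAB[token_id]
            if word == ".":
                parts.append(".")
                self.start_of_sentence = True
                continue
            if self.start_of_sentence:
                word = word.capitalize()
            if not self.start_of_text:
                parts.append(" ")
            parts.append(word)
            self.start_of_text = False
            self.start_of_sentence = False
        return "".join(parts)


# The "main program" owns the accumulated tokens, as in the sample.
detok = Detokenizer()
tokens = []
accum_tokens(array("i", [0, 1, 6]).tobytes(), tokens)
first = detok.join_tokens(tokens)
tokens.clear()
accum_tokens(array("i", [2, 3, 4, 5, 6]).tobytes(), tokens)
second = detok.join_tokens(tokens)
print(first + second)
```

Because the detokenizer state survives between calls, the second fragment begins with a space and a capitalized word. Calling reset() returns the state to the start of the text.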

A real text model would be organized differently, but this example should suffice to show that many of these advanced integration concepts are just simple code.

A future version of this sample will embed the detokenizer vocabulary as rodata in the main module and use that to initialize the internal lookup table.