This sample illustrates how to define a custom module in the Python API, with a pure Python implementation, and compiling an overall program that can use it.
This builds on the capabilities ot the custom_module
sample, which demonstrates C-based extension modules -- applying the same basics to Python. Some features are not yet implemented on the Python side, and the API is lower level than we should ultimately have. However, as is demonstrated, it can do some not trivial things.
To show off some of the capabilities, this sample:
@detokenizer.accumtokens
function.@detokenizer.jointokens
will format and emit the text corresponding to accumulated tokens, respecting sentence boundaries and previous state.reset
function is exported which resets the accumulated tokens and the detokenizer state.A real text model would be organized differently, but this example should suffice to show that many of these advanced integration concepts are just simple code.
A future version of this sample will embed the detokenizer vocabulary as rodata in the main module and use that to initialize the internal lookup table.