blob: 847e4077f974df8d50364757158904b8d23c42a7 [file] [log] [blame] [view]
# Custom modules in Python sample
This sample illustrates how to define a custom module in the Python API,
with a pure Python implementation, and compiling an overall program that
can use it.
This builds on the capabilities ot the `custom_module` sample, which
demonstrates C-based extension modules -- applying the same basics to
Python. Some features are not yet implemented on the Python side, and
the API is lower level than we should ultimately have. However, as
is demonstrated, it can do some not trivial things.
## Sample description
To show off some of the capabilities, this sample:
* Demonstrates how to define a custom Python function which accepts both
a buffer and a variant list. Within the implementation, the buffer is
wrapped by a numpy array for use.
* Module state is kept for the detokenizer state, keeping track of whether
we are at the start of text or sentence. Real detokenizers are much
more complex and would likely involve an opaque module custom type
(not yet implemented in Python).
* A global in the main program is used to accumulate fragments by
the `@detokenizer.accumtokens` function.
* The `@detokenizer.jointokens` will format and emit the text corresponding
to accumulated tokens, respecting sentence boundaries and previous
state.
* A `reset` function is exported which resets the accumulated tokens and
the detokenizer state.
A real text model would be organized differently, but this example should
suffice to show that many of these advanced integration concepts are just
simple code.
A future version of this sample will embed the detokenizer vocabulary as
rodata in the main module and use that to initialize the internal lookup
table.