# Custom modules in Python sample

This sample illustrates how to define a custom module in the Python API,
with a pure Python implementation, and how to compile an overall program
that can use it.

This builds on the capabilities of the `custom_module` sample, which
demonstrates C-based extension modules -- applying the same basics to
Python. Some features are not yet implemented on the Python side, and
the API is lower level than we should ultimately have. However, as
is demonstrated, it can do some non-trivial things.

## Sample description

To show off some of the capabilities, this sample:

* Demonstrates how to define a custom Python function which accepts both
  a buffer and a variant list. Within the implementation, the buffer is
  wrapped by a numpy array for use.
* Keeps the detokenizer state as module state, tracking whether we are
  at the start of the text or of a sentence. Real detokenizers are much
  more complex and would likely involve an opaque module custom type
  (not yet implemented in Python).
* A global in the main program is used to accumulate fragments by
  the `@detokenizer.accumtokens` function.
* The `@detokenizer.jointokens` function formats and emits the text
  corresponding to the accumulated tokens, respecting sentence boundaries
  and previous state.
* A `reset` function is exported which resets the accumulated tokens and
  the detokenizer state.

A real text model would be organized differently, but this example should
suffice to show that many of these advanced integration concepts are just
simple code.

A future version of this sample will embed the detokenizer vocabulary as
rodata in the main module and use that to initialize the internal lookup
table.
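As a rough illustration of the behavior listed under "Sample description", the following is a minimal pure-Python sketch of a stateful detokenizer. It is not the sample's actual code and does not use the runtime API: the `Detokenizer` class, its method names, the toy vocabulary, and the capitalization and spacing rules are all assumptions made for this example. It also keeps the accumulated fragments inside the class for brevity, and uses a standard-library `memoryview` as a stand-in for the numpy array that the sample wraps around the buffer.

```python
import array


class Detokenizer:
    """Toy detokenizer; vocabulary and formatting rules are illustrative assumptions."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.tokens = []  # accumulated text fragments
        self.start_of_text = True
        self.start_of_sentence = True

    def accum_tokens(self, buffer, count):
        # View the raw bytes as native C ints without copying them.
        ids = memoryview(buffer).cast("i")[:count]
        self.tokens.extend(self.vocab[i] for i in ids)

    def join_tokens(self):
        # Format the accumulated fragments, carrying state across calls.
        parts = []
        for word in self.tokens:
            if word == ".":
                parts.append(".")
                self.start_of_sentence = True
                continue
            if self.start_of_sentence:
                word = word.capitalize()
            if not self.start_of_text:
                parts.append(" ")
            parts.append(word)
            self.start_of_text = False
            self.start_of_sentence = False
        self.tokens.clear()
        return "".join(parts)

    def reset(self):
        # Drop accumulated tokens and return to the start-of-text state.
        self.tokens.clear()
        self.start_of_text = True
        self.start_of_sentence = True


# Hypothetical vocabulary and token ids, for illustration only.
vocab = ["hello", "world", ".", "goodbye"]
detok = Detokenizer(vocab)

detok.accum_tokens(array.array("i", [0, 1, 2, 3]).tobytes(), 4)
first = detok.join_tokens()   # "Hello world. Goodbye"

detok.accum_tokens(array.array("i", [1]).tobytes(), 1)
second = detok.join_tokens()  # " world" (state says we are mid-sentence)

detok.reset()
detok.accum_tokens(array.array("i", [1]).tobytes(), 1)
third = detok.join_tokens()   # "World" (state was reset)
```

The second call returns a fragment with a leading space and no capitalization because the state from the first call persists; after `reset`, the same token is treated as the start of the text again.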