# Custom modules in Python sample

This sample illustrates how to define a custom module with a pure Python
implementation using the Python API, and how to compile an overall program
that can use it.

This builds on the capabilities of the `custom_module` sample, which
demonstrates C-based extension modules, applying the same basics to
Python. Some features are not yet implemented on the Python side, and
the API is lower level than we should ultimately have. However, as
is demonstrated, it can do some nontrivial things.

## Sample description

To show off some of the capabilities, this sample:

* Demonstrates how to define a custom Python function which accepts both
  a buffer and a variant list. Within the implementation, the buffer is
  wrapped by a numpy array for use.
* Module state tracks the detokenizer's position, i.e. whether we are at
  the start of text or of a sentence. Real detokenizers are much
  more complex and would likely involve an opaque module custom type
  (not yet implemented in Python).
* A global in the main program is used to accumulate fragments via
  the `@detokenizer.accumtokens` function.
* The `@detokenizer.jointokens` function formats and emits the text
  corresponding to accumulated tokens, respecting sentence boundaries
  and previous state.
* A `reset` function is exported which resets the accumulated tokens and
  the detokenizer state.
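
To make the buffer-handling bullet concrete, here is a minimal sketch of
wrapping a raw buffer with a numpy array. The function name, signature, and
dtype are hypothetical illustrations, not the sample's actual exports; the
only assumption is that the runtime hands the function something supporting
the Python buffer protocol.

```python
import numpy as np

def accumulate(buffer, extra_args):
    # Hypothetical signature: `buffer` is any bytes-like object supporting
    # the Python buffer protocol; `extra_args` stands in for a variant list
    # of additional values. dtype int32 is an illustrative assumption.
    tokens = np.frombuffer(buffer, dtype=np.int32)  # zero-copy view
    return tokens.tolist()

# Usage with a plain bytes object standing in for a runtime buffer:
raw = np.array([17, 4, 99], dtype=np.int32).tobytes()
print(accumulate(raw, []))  # [17, 4, 99]
```

`np.frombuffer` reinterprets the existing memory without copying, which is
why wrapping a runtime buffer this way is cheap.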

A real text model would be organized differently, but this example should
suffice to show that many of these advanced integration concepts are just
simple code.
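
The stateful accumulate/join/reset behavior described above can be sketched
in plain Python, independent of any runtime API. All names here are
illustrative placeholders for the sample's real exports, and the
capitalization/sentence-boundary rules are simplified assumptions:

```python
class Detokenizer:
    """Plain-Python sketch of the stateful detokenizer behavior."""

    def __init__(self):
        self.accumulated = []     # token fragments gathered so far
        self.start_of_text = True # are we at a text/sentence boundary?

    def accum_tokens(self, fragments):
        # Analogous role to `@detokenizer.accumtokens`: stash fragments.
        self.accumulated.extend(fragments)
        return len(self.accumulated)

    def join_tokens(self):
        # Analogous role to `@detokenizer.jointokens`: emit text for the
        # accumulated tokens, respecting the boundary state.
        text = " ".join(self.accumulated)
        if self.start_of_text:
            text = text.capitalize()
            self.start_of_text = False
        if text.endswith("."):
            # A period puts us back at a sentence boundary.
            self.start_of_text = True
        self.accumulated.clear()
        return text

    def reset(self):
        # Analogous role to the exported `reset` function.
        self.accumulated.clear()
        self.start_of_text = True

d = Detokenizer()
d.accum_tokens(["hello", "there."])
print(d.join_tokens())  # Hello there.
d.accum_tokens(["next", "sentence"])
print(d.join_tokens())  # Next sentence
```

Keeping this state in ordinary Python attributes is exactly the kind of
"just simple code" the sample aims to demonstrate.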

A future version of this sample will embed the detokenizer vocabulary as
rodata in the main module and use that to initialize the internal lookup
table.