pw_tokenizer: Tokenization domains
- Add support for tokenization domains. Strings tokenized in each domain
are stored separately and can be processed differently. Using domains
is optional; strings are tokenized in the "default" domain by default.
- Provide *_DOMAIN versions of the tokenization macros to allow
specifying the domain.
- Update docs and add tests for tokenization domain macros.
- Clean up the tokenizer macros to reduce duplication.
- Use a leading underscore for the private tokenizer functions.
- Use the pw_ prefix for the ELF sections and the linker script example.
Change-Id: I14c6e84a7a954669c9ddf50a9b6d32b8e19d6f16
diff --git a/pw_tokenizer/docs.rst b/pw_tokenizer/docs.rst
index 8efcc67..6d57d49 100644
--- a/pw_tokenizer/docs.rst
+++ b/pw_tokenizer/docs.rst
@@ -120,7 +120,7 @@
are provided. For Make or other build systems, add the files specified in
the BUILD.gn's ``pw_tokenizer`` target to the build.
2. Use the tokenization macros in your code. See `Tokenization`_.
- 3. Add the contents of ``tokenizer_linker_sections.ld`` to your project's
+ 3. Add the contents of ``pw_tokenizer_linker_sections.ld`` to your project's
linker script.
4. Compile your code to produce an ELF file.
5. Run ``database.py create`` on the ELF file to generate a CSV token
@@ -341,6 +341,36 @@
primarily in C++ may use a large value for ``PW_TOKENIZER_CFG_HASH_LENGTH``
(perhaps even ``std::numeric_limits<size_t>::max()``).
+Tokenization domains
+--------------------
+``pw_tokenizer`` supports multiple tokenization domains. Strings from
+each tokenization domain are stored in separate sections in the ELF file. This
+allows projects to keep tokens from different sources separate. Potential use
+cases include the following:
+
+* Keep large sets of tokenized strings separate to avoid collisions.
+* Create a separate database for a small number of strings that use truncated
+ tokens, for example only 10 or 16 bits instead of the full 32 bits.
+
+Strings are tokenized by default into the "default" domain. For many projects,
+a single tokenization domain is sufficient, so no additional configuration is
+required.
+
+To use additional domains, add a ``pw_tokenized.<new domain name>`` linker
+section, as described in ``pw_tokenizer_linker_sections.ld``. Strings are
+tokenized into a domain by providing the domain name as a string literal to the
+``*_DOMAIN`` versions of the tokenization macros. Domain names must consist of
+alphanumeric characters and underscores; spaces and special characters are
+not permitted.
+
+.. code-block:: cpp
+
+ // Tokenizes this string to the "default" domain.
+ PW_TOKENIZE_STRING("Hello, world!");
+
+ // Tokenizes this string to the "my_custom_domain" domain.
+ PW_TOKENIZE_STRING_DOMAIN("my_custom_domain", "Hello, world!");
+
Token databases
===============
Token databases store a mapping of tokens to the strings they represent. An ELF
@@ -750,9 +780,9 @@
Supporting detokenization of strings tokenized on 64-bit targets would be
simple. This could be done by adding an option to switch the 32-bit types to
-64-bit. The tokenizer stores the sizes of these types in the ``.tokenizer_info``
-ELF section, so the sizes of these types can be verified by checking the ELF
-file, if necessary.
+64-bit. The tokenizer stores the sizes of these types in the
+``.pw_tokenizer_info`` ELF section, so the sizes of these types can be verified
+by checking the ELF file, if necessary.
Tokenization in headers
-----------------------
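The docs above list truncated tokens (e.g. 16 bits instead of the full 32) as a motivation for a separate domain. A minimal sketch of that idea, for illustration only: the real token values come from `PW_TOKENIZER_STRING_TOKEN` at compile time, and how a database applies truncation is up to the tooling; the function name here is a hypothetical stand-in.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 16 bits of a full 32-bit token. A database built for a
// small, dedicated domain could key its entries on these truncated values.
constexpr uint16_t TruncateToken16(uint32_t token) {
  return static_cast<uint16_t>(token & 0xFFFFu);
}
```

Truncation trades collision resistance for space, which is why the docs suggest confining it to a small set of strings in its own domain.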
diff --git a/pw_tokenizer/encode_args.cc b/pw_tokenizer/encode_args.cc
index 4697dc8..27fa7f2 100644
--- a/pw_tokenizer/encode_args.cc
+++ b/pw_tokenizer/encode_args.cc
@@ -27,10 +27,10 @@
// Store metadata about this compilation's string tokenization in the ELF.
//
// The tokenizer metadata will not go into the on-device executable binary code.
-// This metadata will be present in the ELF file's .tokenizer_info section, from
-// which the host-side tooling (Python, Java, etc.) can understand how to decode
-// tokenized strings for the given binary. Only attributes that affect the
-// decoding process are recorded.
+// This metadata will be present in the ELF file's .pw_tokenizer_info section,
+// from which the host-side tooling (Python, Java, etc.) can understand how to
+// decode tokenized strings for the given binary. Only attributes that affect
+// the decoding process are recorded.
//
// Tokenizer metadata is stored in an array of key-value pairs. Each Metadata
// object is 32 bytes: a 24-byte string and an 8-byte value. Metadata structs
@@ -42,8 +42,16 @@
static_assert(sizeof(Metadata) == 32);
-// Store tokenization metadata in its own section.
-constexpr Metadata metadata[] PW_KEEP_IN_SECTION(".tokenzier_info") = {
+// Store tokenization metadata in its own section. Mach-O files are not
+// supported by pw_tokenizer, but a short, Mach-O compatible section name is
+// used on macOS so that this file can at least compile.
+#if __APPLE__
+#define PW_TOKENIZER_INFO_SECTION PW_KEEP_IN_SECTION(".pw_info")
+#else
+#define PW_TOKENIZER_INFO_SECTION PW_KEEP_IN_SECTION(".pw_tokenizer_info")
+#endif // __APPLE__
+
+constexpr Metadata metadata[] PW_TOKENIZER_INFO_SECTION = {
{"hash_length_bytes", PW_TOKENIZER_CFG_HASH_LENGTH},
{"sizeof_long", sizeof(long)}, // %l conversion specifier
{"sizeof_intmax_t", sizeof(intmax_t)}, // %j conversion specifier
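The comment in encode_args.cc describes the `.pw_tokenizer_info` layout: an array of 32-byte entries, each a 24-byte name plus an 8-byte value. A host-side sketch of reading that layout from a raw dump of the section; the struct and function here are illustrative stand-ins, not the actual pw_tokenizer or host-tooling APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the layout described above: a null-padded 24-byte name followed by
// an 8-byte value, 32 bytes per entry.
struct Metadata {
  char name[24];
  uint64_t value;
};
static_assert(sizeof(Metadata) == 32, "Each metadata entry is 32 bytes");

// Scans a raw copy of the .pw_tokenizer_info section for a key. Returns the
// entry's value, or `fallback` if the key is absent.
uint64_t ReadMetadata(const uint8_t* section,
                      std::size_t size,
                      const char* key,
                      uint64_t fallback) {
  for (std::size_t offset = 0; offset + sizeof(Metadata) <= size;
       offset += sizeof(Metadata)) {
    Metadata entry;
    std::memcpy(&entry, section + offset, sizeof(entry));
    entry.name[sizeof(entry.name) - 1] = '\0';  // Guarantee termination.
    if (std::strcmp(entry.name, key) == 0) {
      return entry.value;
    }
  }
  return fallback;
}
```

Since only attributes that affect decoding are recorded, a decoder can fall back to defaults for any key that is missing.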
diff --git a/pw_tokenizer/global_handlers_test.cc b/pw_tokenizer/global_handlers_test.cc
index 8bd7003..89885a8 100644
--- a/pw_tokenizer/global_handlers_test.cc
+++ b/pw_tokenizer/global_handlers_test.cc
@@ -94,6 +94,15 @@
EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
}
+TEST_F(TokenizeToGlobalHandler, Domain_Strings) {
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN(
+ "TEST_DOMAIN", "The answer is: %s", "5432!");
+ constexpr std::array<uint8_t, 10> expected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+ ASSERT_EQ(expected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
+}
+
TEST_F(TokenizeToGlobalHandler, C_SequentialZigZag) {
pw_TokenizeToGlobalHandlerTest_SequentialZigZag();
@@ -148,22 +157,35 @@
EXPECT_EQ(payload_, -543);
}
-TEST_F(TokenizeToGlobalHandlerWithPayload, Strings) {
- constexpr std::array<uint8_t, 10> expected =
- ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+constexpr std::array<uint8_t, 10> kExpected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+TEST_F(TokenizeToGlobalHandlerWithPayload, Strings_ZeroPayload) {
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD({}, "The answer is: %s", "5432!");
+
+ ASSERT_EQ(kExpected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(kExpected.data(), message_, kExpected.size()), 0);
+ EXPECT_EQ(payload_, 0);
+}
+
+TEST_F(TokenizeToGlobalHandlerWithPayload, Strings_NonZeroPayload) {
PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(
static_cast<pw_TokenizerPayload>(5432), "The answer is: %s", "5432!");
- ASSERT_EQ(expected.size(), message_size_bytes_);
- EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
+ ASSERT_EQ(kExpected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(kExpected.data(), message_, kExpected.size()), 0);
EXPECT_EQ(payload_, 5432);
+}
- PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD({}, "The answer is: %s", "5432!");
-
- ASSERT_EQ(expected.size(), message_size_bytes_);
- EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
- EXPECT_EQ(payload_, 0);
+TEST_F(TokenizeToGlobalHandlerWithPayload, Domain_Strings) {
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN(
+ "TEST_DOMAIN",
+ static_cast<pw_TokenizerPayload>(5432),
+ "The answer is: %s",
+ "5432!");
+ ASSERT_EQ(kExpected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(kExpected.data(), message_, kExpected.size()), 0);
+ EXPECT_EQ(payload_, 5432);
}
struct Foo {
@@ -207,5 +229,54 @@
TokenizeToGlobalHandlerWithPayload::SetPayload(payload);
}
+// Hijack the PW_TOKENIZE_STRING_DOMAIN macro to capture the tokenizer domain.
+#undef PW_TOKENIZE_STRING_DOMAIN
+#define PW_TOKENIZE_STRING_DOMAIN(domain, string) \
+ /* assigned to a variable */ PW_TOKENIZER_STRING_TOKEN(string); \
+ tokenizer_domain = domain; \
+ string_literal = string
+
+TEST_F(TokenizeToGlobalHandler, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER("404");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "404");
+}
+
+TEST_F(TokenizeToGlobalHandler, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN("www.google.com", "404");
+
+ EXPECT_STREQ(tokenizer_domain, "www.google.com");
+ EXPECT_STREQ(string_literal, "404");
+}
+
+TEST_F(TokenizeToGlobalHandlerWithPayload, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(
+ static_cast<pw_TokenizerPayload>(123), "Wow%s", "???");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "Wow%s");
+}
+
+TEST_F(TokenizeToGlobalHandlerWithPayload, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN(
+ "THEDOMAIN", static_cast<pw_TokenizerPayload>(123), "1234567890");
+
+ EXPECT_STREQ(tokenizer_domain, "THEDOMAIN");
+ EXPECT_STREQ(string_literal, "1234567890");
+}
+
} // namespace
} // namespace pw::tokenizer
diff --git a/pw_tokenizer/public/pw_tokenizer/tokenize.h b/pw_tokenizer/public/pw_tokenizer/tokenize.h
index 10596ee..f5af3c4 100644
--- a/pw_tokenizer/public/pw_tokenizer/tokenize.h
+++ b/pw_tokenizer/public/pw_tokenizer/tokenize.h
@@ -24,6 +24,14 @@
#include "pw_tokenizer/internal/argument_types.h"
#include "pw_tokenizer/internal/tokenize_string.h"
+// Strings may optionally be tokenized to a domain. Strings in different domains
+// can be processed separately by the token database tools. Each domain in use
+// must have a corresponding section declared in the linker script. See
+// pw_tokenizer_linker_sections.ld for more details.
+//
+// If no domain is specified, this default is used.
+#define PW_TOKENIZER_DEFAULT_DOMAIN "default"
+
// Tokenizes a string literal and converts it to a pw_TokenizerStringToken. This
// expression can be assigned to a local or global variable, but cannot be used
// in another expression. For example:
@@ -37,7 +45,19 @@
// }
//
#define PW_TOKENIZE_STRING(string_literal) \
- _PW_TOKENIZE_LITERAL_UNIQUE(__COUNTER__, string_literal)
+ PW_TOKENIZE_STRING_DOMAIN(PW_TOKENIZER_DEFAULT_DOMAIN, string_literal)
+
+// Same as PW_TOKENIZE_STRING, but tokenizes to the specified domain.
+#define PW_TOKENIZE_STRING_DOMAIN(domain, string_literal) \
+ /* assign to a variable */ PW_TOKENIZER_STRING_TOKEN(string_literal); \
+ \
+ /* Declare the format string as an array in the special tokenized string */ \
+ /* section, which should be excluded from the final binary. Use __LINE__ */ \
+ /* to create unique names for the section and variable, which avoids */ \
+ /* compiler warnings. */ \
+ static _PW_TOKENIZER_CONST char PW_CONCAT( \
+ _pw_tokenizer_string_literal_DO_NOT_USE_, \
+ __LINE__)[] _PW_TOKENIZER_SECTION(domain) = string_literal
// Encodes a tokenized string and arguments to the provided buffer. The size of
// the buffer is passed via a pointer to a size_t. After encoding is complete,
@@ -60,13 +80,22 @@
// MyProject_EnqueueMessageForUart(buffer, size);
//
#define PW_TOKENIZE_TO_BUFFER(buffer, buffer_size_pointer, format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToBuffer(buffer, \
- buffer_size_pointer, \
- _pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+ PW_TOKENIZE_TO_BUFFER_DOMAIN(PW_TOKENIZER_DEFAULT_DOMAIN, \
+ buffer, \
+ buffer_size_pointer, \
+ format, \
+ __VA_ARGS__)
+
+// Same as PW_TOKENIZE_TO_BUFFER, but tokenizes to the specified domain.
+#define PW_TOKENIZE_TO_BUFFER_DOMAIN( \
+ domain, buffer, buffer_size_pointer, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToBuffer(buffer, \
+ buffer_size_pointer, \
+ _pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
// Encodes a tokenized string and arguments to a buffer on the stack. The
@@ -97,30 +126,34 @@
// value);
// }
//
-#define PW_TOKENIZE_TO_CALLBACK(callback, format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToCallback(callback, \
- _pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+#define PW_TOKENIZE_TO_CALLBACK(callback, format, ...) \
+ PW_TOKENIZE_TO_CALLBACK_DOMAIN( \
+ PW_TOKENIZER_DEFAULT_DOMAIN, callback, format, __VA_ARGS__)
+
+#define PW_TOKENIZE_TO_CALLBACK_DOMAIN(domain, callback, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToCallback(callback, \
+ _pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
PW_EXTERN_C_START
// These functions encode the tokenized strings. These should not be called
// directly. Instead, use the corresponding PW_TOKENIZE_TO_* macros above.
-void pw_TokenizeToBuffer(void* buffer,
- size_t* buffer_size_bytes, // input and output arg
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToBuffer(void* buffer,
+ size_t* buffer_size_bytes, // input and output arg
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
-void pw_TokenizeToCallback(void (*callback)(const uint8_t* encoded_message,
- size_t size_bytes),
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToCallback(void (*callback)(const uint8_t* encoded_message,
+ size_t size_bytes),
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
// This empty function allows the compiler to check the format string.
inline void pw_TokenizerCheckFormatString(const char* format, ...)
@@ -134,24 +167,12 @@
// These macros implement string tokenization. They should not be used directly;
// use one of the PW_TOKENIZE_* macros above instead.
-#define _PW_TOKENIZE_LITERAL_UNIQUE(id, string_literal) \
- /* assign to a variable */ PW_TOKENIZER_STRING_TOKEN(string_literal); \
- \
- /* Declare without nested scope so this works in or out of a function. */ \
- static _PW_TOKENIZER_CONST char PW_CONCAT( \
- _pw_tokenizer_string_literal_DO_NOT_USE_THIS_VARIABLE_, \
- id)[] _PW_TOKENIZER_SECTION(id) = string_literal
-
-// This macro uses __COUNTER__ to generate an identifier to use in the main
-// tokenization macro below. The identifier is unique within a compilation unit.
-#define _PW_TOKENIZE_STRING(format, ...) \
- _PW_TOKENIZE_STRING_UNIQUE(__COUNTER__, format, __VA_ARGS__)
// This macro takes a printf-style format string and corresponding arguments. It
// checks that the arguments are correct, stores the format string in a special
// section, and calculates the string's token at compile time.
// clang-format off
-#define _PW_TOKENIZE_STRING_UNIQUE(id, format, ...) \
+#define _PW_TOKENIZE_FORMAT_STRING(domain, format, ...) \
if (0) { /* Do not execute to prevent double evaluation of the arguments. */ \
pw_TokenizerCheckFormatString(format PW_COMMA_ARGS(__VA_ARGS__)); \
} \
@@ -164,15 +185,9 @@
PW_STRINGIFY(PW_ARG_COUNT(__VA_ARGS__)) " arguments were used for " \
#format " (" #__VA_ARGS__ ")"); \
\
- /* Declare the format string as an array in the special tokenized string */ \
- /* section, which should be excluded from the final binary. Use unique */ \
- /* names for the section and variable to avoid compiler warnings. */ \
- static _PW_TOKENIZER_CONST char PW_CONCAT( \
- _pw_tokenizer_format_string_, id)[] _PW_TOKENIZER_SECTION(id) = format; \
- \
/* Tokenize the string to a pw_TokenizerStringToken at compile time. */ \
_PW_TOKENIZER_CONST pw_TokenizerStringToken _pw_tokenizer_token = \
- PW_TOKENIZER_STRING_TOKEN(format)
+ PW_TOKENIZE_STRING_DOMAIN(domain, format)
// clang-format on
@@ -182,41 +197,43 @@
#define _PW_TOKENIZER_CONST const
#endif // __cplusplus
-// _PW_TOKENIZER_SECTION places the format string in a special .tokenized.#
+// _PW_TOKENIZER_SECTION places the format string in a special .pw_tokenized.<domain>
// linker section. Host-side decoding tools read the strings from this section
// to build a database of tokenized strings.
//
// This section should be declared as type INFO so that it is excluded from the
-// final binary. To declare the section, as well as the .tokenizer_info section,
-// used for tokenizer metadata, add the following to the linker
-// script's SECTIONS command:
+// final binary. To declare the section, as well as the .pw_tokenizer_info
+// metadata section, add the following to the linker script's SECTIONS command:
//
-// .tokenized 0x00000000 (INFO) :
+// .pw_tokenizer_info 0x0 (INFO) :
// {
-// KEEP(*(.tokenized))
-// KEEP(*(.tokenized.*))
+//     KEEP(*(.pw_tokenizer_info))
// }
//
-// .tokenizer_info 0x00000000 (INFO) :
+// .pw_tokenized.default 0x0 (INFO) :
// {
-// KEEP(*(.tokenizer_info))
+// KEEP(*(.pw_tokenized.default.*))
// }
//
-// Any address could be used for this section, but it should not map to a real
-// device to avoid confusion. 0x00000000 is a reasonable default. An address
-// such as 0xFF000000 that is outside of the ARMv7m memory map could also be
-// used.
+// If custom tokenization domains are used, a section must be declared for each
+// domain:
+//
+// .pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN 0x0 (INFO) :
+// {
+// KEEP(*(.pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN.*))
+// }
//
// A linker script snippet that provides these sections is provided in the file
-// tokenizer_linker_sections.ld. This file may be directly included into
+// pw_tokenizer_linker_sections.ld. This file may be directly included into
// existing linker scripts.
//
// The tokenized string sections can also be managed without linker script
// modifications, though this is not recommended. The section can be extracted
// and removed from the ELF with objcopy:
//
-// objcopy --only-section .tokenize* <ORIGINAL_ELF> <OUTPUT_ELF>
-// objcopy --remove-section .tokenize* <ORIGINAL_ELF>
+// objcopy --only-section .pw_tokenize* <ORIGINAL_ELF> <OUTPUT_ELF>
+// objcopy --remove-section .pw_tokenize* <ORIGINAL_ELF>
//
// OUTPUT_ELF will be an ELF with only the tokenized strings, and the original
// ELF file will have the sections removed.
@@ -225,5 +242,14 @@
// option (--gc-sections) removes the tokenized string sections. To avoid
// editing the target linker script, a separate metadata ELF can be linked
// without --gc-sections to preserve the tokenized data.
-#define _PW_TOKENIZER_SECTION(unique) \
- PW_KEEP_IN_SECTION(PW_STRINGIFY(PW_CONCAT(.tokenized., unique)))
+//
+// pw_tokenizer is intended for use with ELF files only. Mach-O files (macOS
+// executables) do not support section names longer than 16 characters, so a
+// short, dummy section name is used on macOS.
+#if __APPLE__
+#define _PW_TOKENIZER_SECTION(unused_domain) \
+ PW_KEEP_IN_SECTION(".pw." PW_STRINGIFY(__LINE__))
+#else
+#define _PW_TOKENIZER_SECTION(domain) \
+ PW_KEEP_IN_SECTION(".pw_tokenized." domain "." PW_STRINGIFY(__LINE__))
+#endif // __APPLE__
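The `_PW_TOKENIZER_SECTION` macro above builds the section name by string-literal concatenation: `".pw_tokenized." domain "." __LINE__`. A standalone sketch of that expansion, using local stand-in macros rather than the real `pw_preprocessor` utilities:

```cpp
#include <cassert>
#include <cstring>

// Two-step stringification so __LINE__ is expanded before being quoted;
// these are stand-ins for PW_STRINGIFY from pw_preprocessor.
#define SECTION_STRINGIFY_(x) #x
#define SECTION_STRINGIFY(x) SECTION_STRINGIFY_(x)

// Adjacent string literals concatenate, yielding ".pw_tokenized.<domain>.<line>".
#define TOKENIZER_SECTION_NAME(domain) \
  ".pw_tokenized." domain "." SECTION_STRINGIFY(__LINE__)
```

Because the line number is baked into the name, each use site lands in its own input section, which the `KEEP(*(.pw_tokenized.<domain>.*))` wildcard in the linker script then collects.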
diff --git a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h
index 2bd2eea..ce41741 100644
--- a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h
+++ b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h
@@ -13,6 +13,9 @@
// the License.
#pragma once
+#include <stddef.h>
+#include <stdint.h>
+
#include "pw_preprocessor/util.h"
#include "pw_tokenizer/tokenize.h"
@@ -39,22 +42,31 @@
// MyProject_EnqueueMessageForUart(buffer, size_bytes);
// }
//
-#define PW_TOKENIZE_TO_GLOBAL_HANDLER(format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToGlobalHandler(_pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER(format, ...) \
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN( \
+ PW_TOKENIZER_DEFAULT_DOMAIN, format, __VA_ARGS__)
+
+// Same as PW_TOKENIZE_TO_GLOBAL_HANDLER, but tokenizes to the specified domain.
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN(domain, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToGlobalHandler(_pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
+PW_EXTERN_C_START
+
// This function must be defined by the pw_tokenizer:global_handler backend.
// This function is called with the encoded message by
// pw_TokenizeToGlobalHandler.
-PW_EXTERN_C void pw_TokenizerHandleEncodedMessage(
- const uint8_t encoded_message[], size_t size_bytes);
+void pw_TokenizerHandleEncodedMessage(const uint8_t encoded_message[],
+ size_t size_bytes);
// This function encodes the tokenized strings. Do not call it directly;
// instead, use the PW_TOKENIZE_TO_GLOBAL_HANDLER macro.
-PW_EXTERN_C void pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
+
+PW_EXTERN_C_END
diff --git a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h
index e5ee036..df1800f 100644
--- a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h
+++ b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h
@@ -13,6 +13,9 @@
// the License.
#pragma once
+#include <stddef.h>
+#include <stdint.h>
+
#include "pw_preprocessor/util.h"
#include "pw_tokenizer/tokenize.h"
@@ -20,8 +23,7 @@
// to a buffer on the stack. The macro adds a payload argument, which is passed
// through to the global handler function
// pw_TokenizerHandleEncodedMessageWithPayload, which must be defined by the
-// user of pw_tokenizer. The payload type is specified by the
-// PW_TOKENIZER_CFG_PAYLOAD_TYPE option and defaults to void*.
+// user of pw_tokenizer. The payload is a uintptr_t.
//
// For example, the following tokenizes a log string and passes the log level as
// the payload.
@@ -38,29 +40,39 @@
}
}
*/
-#define PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(payload, format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToGlobalHandlerWithPayload(payload, \
- _pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(payload, format, ...) \
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN( \
+ PW_TOKENIZER_DEFAULT_DOMAIN, payload, format, __VA_ARGS__)
+
+// Same as PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD, but tokenizes to the
+// specified domain.
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN( \
+ domain, payload, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToGlobalHandlerWithPayload(payload, \
+ _pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
+PW_EXTERN_C_START
+
typedef uintptr_t pw_TokenizerPayload;
// This function must be defined by the pw_tokenizer:global_handler_with_payload
// backend. This function is called with the encoded message by
// pw_TokenizeToGlobalHandler and a caller-provided payload argument.
-PW_EXTERN_C void pw_TokenizerHandleEncodedMessageWithPayload(
+void pw_TokenizerHandleEncodedMessageWithPayload(
pw_TokenizerPayload payload,
const uint8_t encoded_message[],
size_t size_bytes);
// This function encodes the tokenized strings. Do not call it directly;
// instead, use the PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD macro.
-PW_EXTERN_C void pw_TokenizeToGlobalHandlerWithPayload(
- pw_TokenizerPayload payload,
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToGlobalHandlerWithPayload(pw_TokenizerPayload payload,
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
+
+PW_EXTERN_C_END
diff --git a/pw_tokenizer/pw_tokenizer_linker_sections.ld b/pw_tokenizer/pw_tokenizer_linker_sections.ld
new file mode 100644
index 0000000..afaba34
--- /dev/null
+++ b/pw_tokenizer/pw_tokenizer_linker_sections.ld
@@ -0,0 +1,82 @@
+/*
+ * Copyright 2020 The Pigweed Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not
+ * use this file except in compliance with the License. You may obtain a copy of
+ * the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ *
+ * This linker script snippet declares the sections needed for string
+ * tokenization. All sections have type INFO so they are excluded from the final
+ * binary.
+ *
+ * The contents of this script can be copied into an existing linker script.
+ * Alternately, this file can be directly included in a linker script with an
+ * include directive. For example,
+ *
+ * INCLUDE path/to/modules/pw_tokenizer/pw_tokenizer_linker_sections.ld
+ *
+ * SECTIONS
+ * {
+ * (your existing linker sections)
+ * }
+ */
+
+SECTIONS
+{
+ /*
+ * This section stores metadata that may be used during tokenized string
+ * decoding. This metadata describes properties that may affect how the
+   * tokenized string is encoded or decoded -- the maximum number of string
+   * characters hashed and the sizes of certain integer types.
+ *
+ * Metadata is declared as key-value pairs. See the metadata variable in
+ * tokenize.cc for further details.
+ */
+ .pw_tokenizer_info 0x0 (INFO) :
+ {
+ KEEP(*(.pw_tokenizer_info))
+ }
+
+ /*
+ * Tokenized strings are stored in this section by default. In the compiled
+ * code, format string literals are replaced by a hash of the string contents
+ * and a compact argument list encoded in a uint32_t. The compiled code
+ * contains no references to the tokenized strings in this section.
+ *
+ * The section contents are declared with KEEP so that they are not removed
+ * from the ELF. These are never emitted in the final binary or loaded into
+ * memory.
+ */
+ .pw_tokenized.default 0x0 (INFO) :
+ {
+ KEEP(*(.pw_tokenized.default.*))
+ }
+
+/*
+ * Projects may define additional tokenization domains, if desired. Strings in
+ * different domains are stored in separate ELF sections so they can be
+ * processed separately by the token database tools.
+ *
+ * Use cases for domains include keeping large sets of strings separate to avoid
+ * collisions, or separating a small subset of strings that will use truncated
+ * tokens (e.g. 16-bit tokens instead of 32-bit tokens).
+ *
+ * Each tokenization domain in use must have a corresponding section in the
+ * linker script. As required, copy this section declaration and replace
+ * YOUR_CUSTOM_TOKENIZATION_DOMAIN with the domain name.
+
+ .pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN 0x0 (INFO) :
+ {
+ KEEP(*(.pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN.*))
+ }
+
+ */
+}
diff --git a/pw_tokenizer/tokenize.cc b/pw_tokenizer/tokenize.cc
index 9edbb12..4f29876 100644
--- a/pw_tokenizer/tokenize.cc
+++ b/pw_tokenizer/tokenize.cc
@@ -26,11 +26,11 @@
namespace pw {
namespace tokenizer {
-extern "C" void pw_TokenizeToBuffer(void* buffer,
- size_t* buffer_size_bytes,
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...) {
+extern "C" void _pw_TokenizeToBuffer(void* buffer,
+ size_t* buffer_size_bytes,
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...) {
if (*buffer_size_bytes < sizeof(token)) {
*buffer_size_bytes = 0;
return;
@@ -50,7 +50,7 @@
*buffer_size_bytes = sizeof(token) + encoded_bytes;
}
-extern "C" void pw_TokenizeToCallback(
+extern "C" void _pw_TokenizeToCallback(
void (*callback)(const uint8_t* encoded_message, size_t size_bytes),
pw_TokenizerStringToken token,
pw_TokenizerArgTypes types,
diff --git a/pw_tokenizer/tokenize_test.cc b/pw_tokenizer/tokenize_test.cc
index b61e3d9..18987fb 100644
--- a/pw_tokenizer/tokenize_test.cc
+++ b/pw_tokenizer/tokenize_test.cc
@@ -242,6 +242,18 @@
EXPECT_EQ(std::memcmp(result.data(), buffer_, result.size()), 0);
}
+TEST_F(TokenizeToBuffer, Domain_String) {
+ size_t message_size = sizeof(buffer_);
+
+ PW_TOKENIZE_TO_BUFFER_DOMAIN(
+ "TEST_DOMAIN", buffer_, &message_size, "The answer was: %s", "5432!");
+ constexpr std::array<uint8_t, 10> expected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer was: %s");
+
+ ASSERT_EQ(expected.size(), message_size);
+ EXPECT_EQ(std::memcmp(expected.data(), buffer_, expected.size()), 0);
+}
+
TEST_F(TokenizeToBuffer, TruncateArgs) {
// Args that can't fit are dropped completely
size_t message_size = 6;
@@ -381,6 +393,15 @@
EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
}
+TEST_F(TokenizeToCallback, Domain_Strings) {
+ PW_TOKENIZE_TO_CALLBACK_DOMAIN(
+ "TEST_DOMAIN", SetMessage, "The answer is: %s", "5432!");
+ constexpr std::array<uint8_t, 10> expected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+ ASSERT_EQ(expected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
+}
+
TEST_F(TokenizeToCallback, C_SequentialZigZag) {
pw_TokenizeToCallbackTest_SequentialZigZag(SetMessage);
@@ -391,5 +412,58 @@
EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
}
+// Hijack the PW_TOKENIZE_STRING_DOMAIN macro to capture the domain name.
+#undef PW_TOKENIZE_STRING_DOMAIN
+#define PW_TOKENIZE_STRING_DOMAIN(domain, string) \
+ /* assigned to a variable */ PW_TOKENIZER_STRING_TOKEN(string); \
+ tokenizer_domain = domain; \
+ string_literal = string
+
+TEST_F(TokenizeToBuffer, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ size_t message_size = sizeof(buffer_);
+
+ PW_TOKENIZE_TO_BUFFER(buffer_, &message_size, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
+TEST_F(TokenizeToBuffer, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ size_t message_size = sizeof(buffer_);
+
+ PW_TOKENIZE_TO_BUFFER_DOMAIN(
+ "._.", buffer_, &message_size, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, "._.");
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
+TEST_F(TokenizeToCallback, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_CALLBACK(SetMessage, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
+TEST_F(TokenizeToCallback, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_CALLBACK_DOMAIN(
+ "ThisIsTheDomain", SetMessage, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, "ThisIsTheDomain");
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
} // namespace
} // namespace pw::tokenizer
diff --git a/pw_tokenizer/tokenize_to_global_handler.cc b/pw_tokenizer/tokenize_to_global_handler.cc
index 685a266..ecca3b4 100644
--- a/pw_tokenizer/tokenize_to_global_handler.cc
+++ b/pw_tokenizer/tokenize_to_global_handler.cc
@@ -19,9 +19,9 @@
namespace pw {
namespace tokenizer {
-extern "C" void pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...) {
+extern "C" void _pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...) {
EncodedMessage encoded;
encoded.token = token;
diff --git a/pw_tokenizer/tokenize_to_global_handler_with_payload.cc b/pw_tokenizer/tokenize_to_global_handler_with_payload.cc
index 2a0662f..b04e549 100644
--- a/pw_tokenizer/tokenize_to_global_handler_with_payload.cc
+++ b/pw_tokenizer/tokenize_to_global_handler_with_payload.cc
@@ -19,7 +19,7 @@
namespace pw {
namespace tokenizer {
-extern "C" void pw_TokenizeToGlobalHandlerWithPayload(
+extern "C" void _pw_TokenizeToGlobalHandlerWithPayload(
const pw_TokenizerPayload payload,
pw_TokenizerStringToken token,
pw_TokenizerArgTypes types,
diff --git a/pw_tokenizer/tokenizer_linker_sections.ld b/pw_tokenizer/tokenizer_linker_sections.ld
deleted file mode 100644
index db08481..0000000
--- a/pw_tokenizer/tokenizer_linker_sections.ld
+++ /dev/null
@@ -1,64 +0,0 @@
-/*
- * Copyright 2020 The Pigweed Authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not
- * use this file except in compliance with the License. You may obtain a copy of
- * the License at
- *
- * https://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
- * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
- * License for the specific language governing permissions and limitations under
- * the License.
- *
- * This linker script snippet declares the sections needed for string
- * tokenization.
- *
- * This file may be directly included in a linker script with an include
- * directive. For example,
- *
- * INCLUDE path/to/modules/pw_tokenizer/tokenizer_linker_sections.ld
- *
- * SECTIONS
- * {
- * (your existing linker sections)
- * }
- */
-
-SECTIONS
-{
- /*
- * All tokenized strings are stored in this section. Since the section has
- * type INFO, it is excluded from the final binary.
- *
- * In the compiled code, format string literals are replaced by a hash of the
- * string contents and a compact argument list encoded in a uint32_t. The
- * compiled code contains no references to the tokenized strings in this
- * section.
- *
- * The section contents are declared with KEEP so that they are not removed
- * from the ELF. These are never emitted in the final binary or loaded into
- * memory.
- */
- .tokenized 0x00000000 (INFO) :
- {
- KEEP(*(.tokenized))
- KEEP(*(.tokenized.*))
- }
-
- /*
- * This section stores metadata that may be used during tokenized string
- * decoding. This metadata describes properties that may affect how the
- * tokenized string is encoded or decoded -- the maximum length of the hash
- * function and the sizes of certain integer types.
- *
- * Metadata is declared as key-value pairs. See the metadata variable in
- * tokenize.cc for further details.
- */
- .tokenizer_info 0x00000000 (INFO) :
- {
- KEEP(*(.tokenizer_info))
- }
-}