pw_tokenizer: Tokenization domains
- Add support for tokenization domains. Strings tokenized in each domain
are stored separately and can be processed differently. Using domains
is optional; strings are tokenized in the "default" domain by default.
- Provide *_DOMAIN versions of the tokenization macros to allow
specifying the domain.
- Update docs and add tests for tokenization domain macros.
- Clean up the tokenizer macros to reduce duplication.
- Use a leading underscore for the private tokenizer functions.
- Use the pw_ prefix for the ELF sections and the linker script example.
Change-Id: I14c6e84a7a954669c9ddf50a9b6d32b8e19d6f16
diff --git a/pw_tokenizer/docs.rst b/pw_tokenizer/docs.rst
index 8efcc67..6d57d49 100644
--- a/pw_tokenizer/docs.rst
+++ b/pw_tokenizer/docs.rst
@@ -120,7 +120,7 @@
are provided. For Make or other build systems, add the files specified in
the BUILD.gn's ``pw_tokenizer`` target to the build.
2. Use the tokenization macros in your code. See `Tokenization`_.
- 3. Add the contents of ``tokenizer_linker_sections.ld`` to your project's
+ 3. Add the contents of ``pw_tokenizer_linker_sections.ld`` to your project's
linker script.
4. Compile your code to produce an ELF file.
5. Run ``database.py create`` on the ELF file to generate a CSV token
@@ -341,6 +341,36 @@
primarily in C++ may use a large value for ``PW_TOKENIZER_CFG_HASH_LENGTH``
(perhaps even ``std::numeric_limits<size_t>::max()``).
+Tokenization domains
+--------------------
+``pw_tokenizer`` supports multiple tokenization domains. Strings from
+each tokenization domain are stored in separate sections in the ELF file. This
+allows projects to keep tokens from different sources separate. Potential use
+cases include the following:
+
+* Keep large sets of tokenized strings separate to avoid collisions.
+* Create a separate database for a small number of strings that use truncated
+ tokens, for example only 10 or 16 bits instead of the full 32 bits.
+
+Strings are tokenized by default into the "default" domain. For many projects,
+a single tokenization domain is sufficient, so no additional configuration is
+required.
+
+To use additional domains, add a ``pw_tokenized.<new domain name>`` linker
+section, as described in ``pw_tokenizer_linker_sections.ld``. Strings are
+tokenized into a domain by providing the domain name as a string literal to the
+``*_DOMAIN`` versions of the tokenization macros. Domain names must consist of
+alphanumeric characters and underscores; spaces and special characters are
+not permitted.
+
+.. code-block:: cpp
+
+ // Tokenizes this string to the "default" domain.
+ PW_TOKENIZE_STRING("Hello, world!");
+
+ // Tokenizes this string to the "my_custom_domain" domain.
+ PW_TOKENIZE_STRING_DOMAIN("my_custom_domain", "Hello, world!");
+
Token databases
===============
Token databases store a mapping of tokens to the strings they represent. An ELF
@@ -750,9 +780,9 @@
Supporting detokenization of strings tokenized on 64-bit targets would be
simple. This could be done by adding an option to switch the 32-bit types to
-64-bit. The tokenizer stores the sizes of these types in the ``.tokenizer_info``
-ELF section, so the sizes of these types can be verified by checking the ELF
-file, if necessary.
+64-bit. The tokenizer stores the sizes of these types in the
+``.pw_tokenizer_info`` ELF section, so the sizes of these types can be verified
+by checking the ELF file, if necessary.
Tokenization in headers
-----------------------
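The docs above list truncated tokens (e.g. 16 bits instead of the full 32) as a motivation for a separate domain. A minimal sketch of that idea, for illustration only: the real token values come from `PW_TOKENIZER_STRING_TOKEN` at compile time, and how a database applies truncation is up to the tooling; the function name here is a hypothetical stand-in.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 16 bits of a full 32-bit token. A database built for a
// small, dedicated domain could key its entries on these truncated values.
constexpr uint16_t TruncateToken16(uint32_t token) {
  return static_cast<uint16_t>(token & 0xFFFFu);
}
```

Truncation trades collision resistance for space, which is why the docs suggest confining it to a small set of strings in its own domain.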
diff --git a/pw_tokenizer/encode_args.cc b/pw_tokenizer/encode_args.cc
index 4697dc8..27fa7f2 100644
--- a/pw_tokenizer/encode_args.cc
+++ b/pw_tokenizer/encode_args.cc
@@ -27,10 +27,10 @@
// Store metadata about this compilation's string tokenization in the ELF.
//
// The tokenizer metadata will not go into the on-device executable binary code.
-// This metadata will be present in the ELF file's .tokenizer_info section, from
-// which the host-side tooling (Python, Java, etc.) can understand how to decode
-// tokenized strings for the given binary. Only attributes that affect the
-// decoding process are recorded.
+// This metadata will be present in the ELF file's .pw_tokenizer_info section,
+// from which the host-side tooling (Python, Java, etc.) can understand how to
+// decode tokenized strings for the given binary. Only attributes that affect
+// the decoding process are recorded.
//
// Tokenizer metadata is stored in an array of key-value pairs. Each Metadata
// object is 32 bytes: a 24-byte string and an 8-byte value. Metadata structs
@@ -42,8 +42,16 @@
static_assert(sizeof(Metadata) == 32);
-// Store tokenization metadata in its own section.
-constexpr Metadata metadata[] PW_KEEP_IN_SECTION(".tokenzier_info") = {
+// Store tokenization metadata in its own section. Mach-O files are not
+// supported by pw_tokenizer, but a short, Mach-O compatible section name is
+// used on macOS so that this file can at least compile.
+#if __APPLE__
+#define PW_TOKENIZER_INFO_SECTION PW_KEEP_IN_SECTION(".pw_info")
+#else
+#define PW_TOKENIZER_INFO_SECTION PW_KEEP_IN_SECTION(".pw_tokenizer_info")
+#endif // __APPLE__
+
+constexpr Metadata metadata[] PW_TOKENIZER_INFO_SECTION = {
{"hash_length_bytes", PW_TOKENIZER_CFG_HASH_LENGTH},
{"sizeof_long", sizeof(long)}, // %l conversion specifier
{"sizeof_intmax_t", sizeof(intmax_t)}, // %j conversion specifier
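The comment in encode_args.cc describes the `.pw_tokenizer_info` layout: an array of 32-byte entries, each a 24-byte name plus an 8-byte value. A host-side sketch of reading that layout from a raw dump of the section; the struct and function here are illustrative stand-ins, not the actual pw_tokenizer or host-tooling APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the layout described above: a null-padded 24-byte name followed by
// an 8-byte value, 32 bytes per entry.
struct Metadata {
  char name[24];
  uint64_t value;
};
static_assert(sizeof(Metadata) == 32, "Each metadata entry is 32 bytes");

// Scans a raw copy of the .pw_tokenizer_info section for a key. Returns the
// entry's value, or `fallback` if the key is absent.
uint64_t ReadMetadata(const uint8_t* section,
                      std::size_t size,
                      const char* key,
                      uint64_t fallback) {
  for (std::size_t offset = 0; offset + sizeof(Metadata) <= size;
       offset += sizeof(Metadata)) {
    Metadata entry;
    std::memcpy(&entry, section + offset, sizeof(entry));
    entry.name[sizeof(entry.name) - 1] = '\0';  // Guarantee termination.
    if (std::strcmp(entry.name, key) == 0) {
      return entry.value;
    }
  }
  return fallback;
}
```

Since only attributes that affect decoding are recorded, a decoder can fall back to defaults for any key that is missing.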
diff --git a/pw_tokenizer/global_handlers_test.cc b/pw_tokenizer/global_handlers_test.cc
index 8bd7003..89885a8 100644
--- a/pw_tokenizer/global_handlers_test.cc
+++ b/pw_tokenizer/global_handlers_test.cc
@@ -94,6 +94,15 @@
EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
}
+TEST_F(TokenizeToGlobalHandler, Domain_Strings) {
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN(
+ "TEST_DOMAIN", "The answer is: %s", "5432!");
+ constexpr std::array<uint8_t, 10> expected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+ ASSERT_EQ(expected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
+}
+
TEST_F(TokenizeToGlobalHandler, C_SequentialZigZag) {
pw_TokenizeToGlobalHandlerTest_SequentialZigZag();
@@ -148,22 +157,35 @@
EXPECT_EQ(payload_, -543);
}
-TEST_F(TokenizeToGlobalHandlerWithPayload, Strings) {
- constexpr std::array<uint8_t, 10> expected =
- ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+constexpr std::array<uint8_t, 10> kExpected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+TEST_F(TokenizeToGlobalHandlerWithPayload, Strings_ZeroPayload) {
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD({}, "The answer is: %s", "5432!");
+
+ ASSERT_EQ(kExpected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(kExpected.data(), message_, kExpected.size()), 0);
+ EXPECT_EQ(payload_, 0);
+}
+
+TEST_F(TokenizeToGlobalHandlerWithPayload, Strings_NonZeroPayload) {
PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(
static_cast<pw_TokenizerPayload>(5432), "The answer is: %s", "5432!");
- ASSERT_EQ(expected.size(), message_size_bytes_);
- EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
+ ASSERT_EQ(kExpected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(kExpected.data(), message_, kExpected.size()), 0);
EXPECT_EQ(payload_, 5432);
+}
- PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD({}, "The answer is: %s", "5432!");
-
- ASSERT_EQ(expected.size(), message_size_bytes_);
- EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
- EXPECT_EQ(payload_, 0);
+TEST_F(TokenizeToGlobalHandlerWithPayload, Domain_Strings) {
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN(
+ "TEST_DOMAIN",
+ static_cast<pw_TokenizerPayload>(5432),
+ "The answer is: %s",
+ "5432!");
+ ASSERT_EQ(kExpected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(kExpected.data(), message_, kExpected.size()), 0);
+ EXPECT_EQ(payload_, 5432);
}
struct Foo {
@@ -207,5 +229,54 @@
TokenizeToGlobalHandlerWithPayload::SetPayload(payload);
}
+// Hijack the PW_TOKENIZE_STRING_DOMAIN macro to capture the tokenizer domain.
+#undef PW_TOKENIZE_STRING_DOMAIN
+#define PW_TOKENIZE_STRING_DOMAIN(domain, string) \
+ /* assigned to a variable */ PW_TOKENIZER_STRING_TOKEN(string); \
+ tokenizer_domain = domain; \
+ string_literal = string
+
+TEST_F(TokenizeToGlobalHandler, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER("404");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "404");
+}
+
+TEST_F(TokenizeToGlobalHandler, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN("www.google.com", "404");
+
+ EXPECT_STREQ(tokenizer_domain, "www.google.com");
+ EXPECT_STREQ(string_literal, "404");
+}
+
+TEST_F(TokenizeToGlobalHandlerWithPayload, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(
+ static_cast<pw_TokenizerPayload>(123), "Wow%s", "???");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "Wow%s");
+}
+
+TEST_F(TokenizeToGlobalHandlerWithPayload, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN(
+ "THEDOMAIN", static_cast<pw_TokenizerPayload>(123), "1234567890");
+
+ EXPECT_STREQ(tokenizer_domain, "THEDOMAIN");
+ EXPECT_STREQ(string_literal, "1234567890");
+}
+
} // namespace
} // namespace pw::tokenizer
diff --git a/pw_tokenizer/public/pw_tokenizer/tokenize.h b/pw_tokenizer/public/pw_tokenizer/tokenize.h
index 10596ee..f5af3c4 100644
--- a/pw_tokenizer/public/pw_tokenizer/tokenize.h
+++ b/pw_tokenizer/public/pw_tokenizer/tokenize.h
@@ -24,6 +24,14 @@
#include "pw_tokenizer/internal/argument_types.h"
#include "pw_tokenizer/internal/tokenize_string.h"
+// Strings may optionally be tokenized to a domain. Strings in different domains
+// can be processed separately by the token database tools. Each domain in use
+// must have a corresponding section declared in the linker script. See
+// pw_tokenizer_linker_sections.ld for more details.
+//
+// If no domain is specified, this default is used.
+#define PW_TOKENIZER_DEFAULT_DOMAIN "default"
+
// Tokenizes a string literal and converts it to a pw_TokenizerStringToken. This
// expression can be assigned to a local or global variable, but cannot be used
// in another expression. For example:
@@ -37,7 +45,19 @@
// }
//
#define PW_TOKENIZE_STRING(string_literal) \
- _PW_TOKENIZE_LITERAL_UNIQUE(__COUNTER__, string_literal)
+ PW_TOKENIZE_STRING_DOMAIN(PW_TOKENIZER_DEFAULT_DOMAIN, string_literal)
+
+// Same as PW_TOKENIZE_STRING, but tokenizes to the specified domain.
+#define PW_TOKENIZE_STRING_DOMAIN(domain, string_literal) \
+ /* assign to a variable */ PW_TOKENIZER_STRING_TOKEN(string_literal); \
+ \
+ /* Declare the format string as an array in the special tokenized string */ \
+ /* section, which should be excluded from the final binary. Use __LINE__ */ \
+ /* to create unique names for the section and variable, which avoids */ \
+ /* compiler warnings. */ \
+ static _PW_TOKENIZER_CONST char PW_CONCAT( \
+ _pw_tokenizer_string_literal_DO_NOT_USE_, \
+ __LINE__)[] _PW_TOKENIZER_SECTION(domain) = string_literal
// Encodes a tokenized string and arguments to the provided buffer. The size of
// the buffer is passed via a pointer to a size_t. After encoding is complete,
@@ -60,13 +80,22 @@
// MyProject_EnqueueMessageForUart(buffer, size);
//
#define PW_TOKENIZE_TO_BUFFER(buffer, buffer_size_pointer, format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToBuffer(buffer, \
- buffer_size_pointer, \
- _pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+ PW_TOKENIZE_TO_BUFFER_DOMAIN(PW_TOKENIZER_DEFAULT_DOMAIN, \
+ buffer, \
+ buffer_size_pointer, \
+ format, \
+ __VA_ARGS__)
+
+// Same as PW_TOKENIZE_TO_BUFFER, but tokenizes to the specified domain.
+#define PW_TOKENIZE_TO_BUFFER_DOMAIN( \
+ domain, buffer, buffer_size_pointer, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToBuffer(buffer, \
+ buffer_size_pointer, \
+ _pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
// Encodes a tokenized string and arguments to a buffer on the stack. The
@@ -97,30 +126,34 @@
// value);
// }
//
-#define PW_TOKENIZE_TO_CALLBACK(callback, format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToCallback(callback, \
- _pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+#define PW_TOKENIZE_TO_CALLBACK(callback, format, ...) \
+ PW_TOKENIZE_TO_CALLBACK_DOMAIN( \
+ PW_TOKENIZER_DEFAULT_DOMAIN, callback, format, __VA_ARGS__)
+
+#define PW_TOKENIZE_TO_CALLBACK_DOMAIN(domain, callback, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToCallback(callback, \
+ _pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
PW_EXTERN_C_START
// These functions encode the tokenized strings. These should not be called
// directly. Instead, use the corresponding PW_TOKENIZE_TO_* macros above.
-void pw_TokenizeToBuffer(void* buffer,
- size_t* buffer_size_bytes, // input and output arg
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToBuffer(void* buffer,
+ size_t* buffer_size_bytes, // input and output arg
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
-void pw_TokenizeToCallback(void (*callback)(const uint8_t* encoded_message,
- size_t size_bytes),
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToCallback(void (*callback)(const uint8_t* encoded_message,
+ size_t size_bytes),
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
// This empty function allows the compiler to check the format string.
inline void pw_TokenizerCheckFormatString(const char* format, ...)
@@ -134,24 +167,12 @@
// These macros implement string tokenization. They should not be used directly;
// use one of the PW_TOKENIZE_* macros above instead.
-#define _PW_TOKENIZE_LITERAL_UNIQUE(id, string_literal) \
- /* assign to a variable */ PW_TOKENIZER_STRING_TOKEN(string_literal); \
- \
- /* Declare without nested scope so this works in or out of a function. */ \
- static _PW_TOKENIZER_CONST char PW_CONCAT( \
- _pw_tokenizer_string_literal_DO_NOT_USE_THIS_VARIABLE_, \
- id)[] _PW_TOKENIZER_SECTION(id) = string_literal
-
-// This macro uses __COUNTER__ to generate an identifier to use in the main
-// tokenization macro below. The identifier is unique within a compilation unit.
-#define _PW_TOKENIZE_STRING(format, ...) \
- _PW_TOKENIZE_STRING_UNIQUE(__COUNTER__, format, __VA_ARGS__)
// This macro takes a printf-style format string and corresponding arguments. It
// checks that the arguments are correct, stores the format string in a special
// section, and calculates the string's token at compile time.
// clang-format off
-#define _PW_TOKENIZE_STRING_UNIQUE(id, format, ...) \
+#define _PW_TOKENIZE_FORMAT_STRING(domain, format, ...) \
if (0) { /* Do not execute to prevent double evaluation of the arguments. */ \
pw_TokenizerCheckFormatString(format PW_COMMA_ARGS(__VA_ARGS__)); \
} \
@@ -164,15 +185,9 @@
PW_STRINGIFY(PW_ARG_COUNT(__VA_ARGS__)) " arguments were used for " \
#format " (" #__VA_ARGS__ ")"); \
\
- /* Declare the format string as an array in the special tokenized string */ \
- /* section, which should be excluded from the final binary. Use unique */ \
- /* names for the section and variable to avoid compiler warnings. */ \
- static _PW_TOKENIZER_CONST char PW_CONCAT( \
- _pw_tokenizer_format_string_, id)[] _PW_TOKENIZER_SECTION(id) = format; \
- \
/* Tokenize the string to a pw_TokenizerStringToken at compile time. */ \
_PW_TOKENIZER_CONST pw_TokenizerStringToken _pw_tokenizer_token = \
- PW_TOKENIZER_STRING_TOKEN(format)
+ PW_TOKENIZE_STRING_DOMAIN(domain, format)
// clang-format on
@@ -182,41 +197,43 @@
#define _PW_TOKENIZER_CONST const
#endif // __cplusplus
-// _PW_TOKENIZER_SECTION places the format string in a special .tokenized.#
+// _PW_TOKENIZER_SECTION places the format string in a special .pw_tokenized.<domain>
// linker section. Host-side decoding tools read the strings from this section
// to build a database of tokenized strings.
//
// This section should be declared as type INFO so that it is excluded from the
-// final binary. To declare the section, as well as the .tokenizer_info section,
-// used for tokenizer metadata, add the following to the linker
-// script's SECTIONS command:
+// final binary. To declare the section, as well as the .pw_tokenizer_info
+// metadata section, add the following to the linker script's SECTIONS command:
//
-// .tokenized 0x00000000 (INFO) :
+// .pw_tokenizer_info 0x0 (INFO) :
// {
-// KEEP(*(.tokenized))
-// KEEP(*(.tokenized.*))
+//     KEEP(*(.pw_tokenizer_info))
// }
//
-// .tokenizer_info 0x00000000 (INFO) :
+// .pw_tokenized.default 0x0 (INFO) :
// {
-// KEEP(*(.tokenizer_info))
+// KEEP(*(.pw_tokenized.default.*))
// }
//
-// Any address could be used for this section, but it should not map to a real
-// device to avoid confusion. 0x00000000 is a reasonable default. An address
-// such as 0xFF000000 that is outside of the ARMv7m memory map could also be
-// used.
+// If custom tokenization domains are used, a section must be declared for each
+// domain:
+//
+// .pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN 0x0 (INFO) :
+// {
+// KEEP(*(.pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN.*))
+// }
//
// A linker script snippet that provides these sections is provided in the file
-// tokenizer_linker_sections.ld. This file may be directly included into
+// pw_tokenizer_linker_sections.ld. This file may be directly included into
// existing linker scripts.
//
// The tokenized string sections can also be managed without linker script
// modifications, though this is not recommended. The section can be extracted
// and removed from the ELF with objcopy:
//
-// objcopy --only-section .tokenize* <ORIGINAL_ELF> <OUTPUT_ELF>
-// objcopy --remove-section .tokenize* <ORIGINAL_ELF>
+// objcopy --only-section .pw_tokenize* <ORIGINAL_ELF> <OUTPUT_ELF>
+// objcopy --remove-section .pw_tokenize* <ORIGINAL_ELF>
//
// OUTPUT_ELF will be an ELF with only the tokenized strings, and the original
// ELF file will have the sections removed.
@@ -225,5 +242,14 @@
// option (--gc-sections) removes the tokenized string sections. To avoid
// editing the target linker script, a separate metadata ELF can be linked
// without --gc-sections to preserve the tokenized data.
-#define _PW_TOKENIZER_SECTION(unique) \
- PW_KEEP_IN_SECTION(PW_STRINGIFY(PW_CONCAT(.tokenized., unique)))
+//
+// pw_tokenizer is intended for use with ELF files only. Mach-O files (macOS
+// executables) do not support section names longer than 16 characters, so a
+// short, dummy section name is used on macOS.
+#if __APPLE__
+#define _PW_TOKENIZER_SECTION(unused_domain) \
+ PW_KEEP_IN_SECTION(".pw." PW_STRINGIFY(__LINE__))
+#else
+#define _PW_TOKENIZER_SECTION(domain) \
+ PW_KEEP_IN_SECTION(".pw_tokenized." domain "." PW_STRINGIFY(__LINE__))
+#endif // __APPLE__
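The `_PW_TOKENIZER_SECTION` macro above builds the section name by string-literal concatenation: `".pw_tokenized." domain "." __LINE__`. A standalone sketch of that expansion, using local stand-in macros rather than the real `pw_preprocessor` utilities:

```cpp
#include <cassert>
#include <cstring>

// Two-step stringification so __LINE__ is expanded before being quoted;
// these are stand-ins for PW_STRINGIFY from pw_preprocessor.
#define SECTION_STRINGIFY_(x) #x
#define SECTION_STRINGIFY(x) SECTION_STRINGIFY_(x)

// Adjacent string literals concatenate, yielding ".pw_tokenized.<domain>.<line>".
#define TOKENIZER_SECTION_NAME(domain) \
  ".pw_tokenized." domain "." SECTION_STRINGIFY(__LINE__)
```

Because the line number is baked into the name, each use site lands in its own input section, which the `KEEP(*(.pw_tokenized.<domain>.*))` wildcard in the linker script then collects.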
diff --git a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h
index 2bd2eea..ce41741 100644
--- a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h
+++ b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler.h
@@ -13,6 +13,9 @@
// the License.
#pragma once
+#include <stddef.h>
+#include <stdint.h>
+
#include "pw_preprocessor/util.h"
#include "pw_tokenizer/tokenize.h"
@@ -39,22 +42,31 @@
// MyProject_EnqueueMessageForUart(buffer, size_bytes);
// }
//
-#define PW_TOKENIZE_TO_GLOBAL_HANDLER(format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToGlobalHandler(_pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER(format, ...) \
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN( \
+ PW_TOKENIZER_DEFAULT_DOMAIN, format, __VA_ARGS__)
+
+// Same as PW_TOKENIZE_TO_GLOBAL_HANDLER, but tokenizes to the specified domain.
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER_DOMAIN(domain, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToGlobalHandler(_pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
+PW_EXTERN_C_START
+
// This function must be defined by the pw_tokenizer:global_handler backend.
// This function is called with the encoded message by
// pw_TokenizeToGlobalHandler.
-PW_EXTERN_C void pw_TokenizerHandleEncodedMessage(
- const uint8_t encoded_message[], size_t size_bytes);
+void pw_TokenizerHandleEncodedMessage(const uint8_t encoded_message[],
+ size_t size_bytes);
// This function encodes the tokenized strings. Do not call it directly;
// instead, use the PW_TOKENIZE_TO_GLOBAL_HANDLER macro.
-PW_EXTERN_C void pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
+
+PW_EXTERN_C_END
diff --git a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h
index e5ee036..df1800f 100644
--- a/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h
+++ b/pw_tokenizer/public/pw_tokenizer/tokenize_to_global_handler_with_payload.h
@@ -13,6 +13,9 @@
// the License.
#pragma once
+#include <stddef.h>
+#include <stdint.h>
+
#include "pw_preprocessor/util.h"
#include "pw_tokenizer/tokenize.h"
@@ -20,8 +23,7 @@
// to a buffer on the stack. The macro adds a payload argument, which is passed
// through to the global handler function
// pw_TokenizerHandleEncodedMessageWithPayload, which must be defined by the
-// user of pw_tokenizer. The payload type is specified by the
-// PW_TOKENIZER_CFG_PAYLOAD_TYPE option and defaults to void*.
+// user of pw_tokenizer. The payload is a uintptr_t.
//
// For example, the following tokenizes a log string and passes the log level as
// the payload.
@@ -38,29 +40,39 @@
}
}
*/
-#define PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(payload, format, ...) \
- do { \
- _PW_TOKENIZE_STRING(format, __VA_ARGS__); \
- pw_TokenizeToGlobalHandlerWithPayload(payload, \
- _pw_tokenizer_token, \
- PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
- PW_COMMA_ARGS(__VA_ARGS__)); \
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD(payload, format, ...) \
+ PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN( \
+ PW_TOKENIZER_DEFAULT_DOMAIN, payload, format, __VA_ARGS__)
+
+// Same as PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD, but tokenizes to the
+// specified domain.
+#define PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD_DOMAIN( \
+ domain, payload, format, ...) \
+ do { \
+ _PW_TOKENIZE_FORMAT_STRING(domain, format, __VA_ARGS__); \
+ _pw_TokenizeToGlobalHandlerWithPayload(payload, \
+ _pw_tokenizer_token, \
+ PW_TOKENIZER_ARG_TYPES(__VA_ARGS__) \
+ PW_COMMA_ARGS(__VA_ARGS__)); \
} while (0)
+PW_EXTERN_C_START
+
typedef uintptr_t pw_TokenizerPayload;
// This function must be defined by the pw_tokenizer:global_handler_with_payload
// backend. This function is called with the encoded message by
// pw_TokenizeToGlobalHandler and a caller-provided payload argument.
-PW_EXTERN_C void pw_TokenizerHandleEncodedMessageWithPayload(
+void pw_TokenizerHandleEncodedMessageWithPayload(
pw_TokenizerPayload payload,
const uint8_t encoded_message[],
size_t size_bytes);
// This function encodes the tokenized strings. Do not call it directly;
// instead, use the PW_TOKENIZE_TO_GLOBAL_HANDLER_WITH_PAYLOAD macro.
-PW_EXTERN_C void pw_TokenizeToGlobalHandlerWithPayload(
- pw_TokenizerPayload payload,
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...);
+void _pw_TokenizeToGlobalHandlerWithPayload(pw_TokenizerPayload payload,
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...);
+
+PW_EXTERN_C_END
diff --git a/pw_tokenizer/pw_tokenizer_linker_sections.ld b/pw_tokenizer/pw_tokenizer_linker_sections.ld
new file mode 100644
index 0000000..afaba34
--- /dev/null
+++ b/pw_tokenizer/pw_tokenizer_linker_sections.ld
@@ -0,0 +1,82 @@
+/*
+ * Copyright 2020 The Pigweed Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may not
+ * use this file except in compliance with the License. You may obtain a copy of
+ * the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ *
+ * This linker script snippet declares the sections needed for string
+ * tokenization. All sections have type INFO so they are excluded from the final
+ * binary.
+ *
+ * The contents of this script can be copied into an existing linker script.
+ * Alternately, this file can be directly included in a linker script with an
+ * include directive. For example,
+ *
+ * INCLUDE path/to/modules/pw_tokenizer/pw_tokenizer_linker_sections.ld
+ *
+ * SECTIONS
+ * {
+ * (your existing linker sections)
+ * }
+ */
+
+SECTIONS
+{
+ /*
+ * This section stores metadata that may be used during tokenized string
+ * decoding. This metadata describes properties that may affect how the
+   * tokenized string is encoded or decoded -- the maximum number of string
+   * characters hashed and the sizes of certain integer types.
+ *
+ * Metadata is declared as key-value pairs. See the metadata variable in
+ * tokenize.cc for further details.
+ */
+ .pw_tokenizer_info 0x0 (INFO) :
+ {
+ KEEP(*(.pw_tokenizer_info))
+ }
+
+ /*
+ * Tokenized strings are stored in this section by default. In the compiled
+ * code, format string literals are replaced by a hash of the string contents
+ * and a compact argument list encoded in a uint32_t. The compiled code
+ * contains no references to the tokenized strings in this section.
+ *
+ * The section contents are declared with KEEP so that they are not removed
+ * from the ELF. These are never emitted in the final binary or loaded into
+ * memory.
+ */
+ .pw_tokenized.default 0x0 (INFO) :
+ {
+ KEEP(*(.pw_tokenized.default.*))
+ }
+
+/*
+ * Projects may define additional tokenization domains, if desired. Strings in
+ * different domains are stored in separate ELF sections so they can be
+ * processed separately by the token database tools.
+ *
+ * Use cases for domains include keeping large sets of strings separate to avoid
+ * collisions, or separating a small subset of strings that will use truncated
+ * tokens (e.g. 16-bit tokens instead of 32-bit tokens).
+ *
+ * Each tokenization domain in use must have a corresponding section in the
+ * linker script. As required, copy this section declaration and replace
+ * YOUR_CUSTOM_TOKENIZATION_DOMAIN with the domain name.
+
+ .pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN 0x0 (INFO) :
+ {
+ KEEP(*(.pw_tokenized.YOUR_CUSTOM_TOKENIZATION_DOMAIN.*))
+ }
+
+ */
+}
diff --git a/pw_tokenizer/tokenize.cc b/pw_tokenizer/tokenize.cc
index 9edbb12..4f29876 100644
--- a/pw_tokenizer/tokenize.cc
+++ b/pw_tokenizer/tokenize.cc
@@ -26,11 +26,11 @@
namespace pw {
namespace tokenizer {
-extern "C" void pw_TokenizeToBuffer(void* buffer,
- size_t* buffer_size_bytes,
- pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...) {
+extern "C" void _pw_TokenizeToBuffer(void* buffer,
+ size_t* buffer_size_bytes,
+ pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...) {
if (*buffer_size_bytes < sizeof(token)) {
*buffer_size_bytes = 0;
return;
@@ -50,7 +50,7 @@
*buffer_size_bytes = sizeof(token) + encoded_bytes;
}
-extern "C" void pw_TokenizeToCallback(
+extern "C" void _pw_TokenizeToCallback(
void (*callback)(const uint8_t* encoded_message, size_t size_bytes),
pw_TokenizerStringToken token,
pw_TokenizerArgTypes types,
diff --git a/pw_tokenizer/tokenize_test.cc b/pw_tokenizer/tokenize_test.cc
index b61e3d9..18987fb 100644
--- a/pw_tokenizer/tokenize_test.cc
+++ b/pw_tokenizer/tokenize_test.cc
@@ -242,6 +242,18 @@
EXPECT_EQ(std::memcmp(result.data(), buffer_, result.size()), 0);
}
+TEST_F(TokenizeToBuffer, Domain_String) {
+ size_t message_size = sizeof(buffer_);
+
+ PW_TOKENIZE_TO_BUFFER_DOMAIN(
+ "TEST_DOMAIN", buffer_, &message_size, "The answer was: %s", "5432!");
+ constexpr std::array<uint8_t, 10> expected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer was: %s");
+
+ ASSERT_EQ(expected.size(), message_size);
+ EXPECT_EQ(std::memcmp(expected.data(), buffer_, expected.size()), 0);
+}
+
TEST_F(TokenizeToBuffer, TruncateArgs) {
// Args that can't fit are dropped completely
size_t message_size = 6;
@@ -381,6 +393,15 @@
EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
}
+TEST_F(TokenizeToCallback, Domain_Strings) {
+ PW_TOKENIZE_TO_CALLBACK_DOMAIN(
+ "TEST_DOMAIN", SetMessage, "The answer is: %s", "5432!");
+ constexpr std::array<uint8_t, 10> expected =
+ ExpectedData<5, '5', '4', '3', '2', '!'>("The answer is: %s");
+ ASSERT_EQ(expected.size(), message_size_bytes_);
+ EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
+}
+
TEST_F(TokenizeToCallback, C_SequentialZigZag) {
pw_TokenizeToCallbackTest_SequentialZigZag(SetMessage);
@@ -391,5 +412,58 @@
EXPECT_EQ(std::memcmp(expected.data(), message_, expected.size()), 0);
}
+// Hijack the PW_TOKENIZE_STRING_DOMAIN macro to capture the domain name.
+#undef PW_TOKENIZE_STRING_DOMAIN
+#define PW_TOKENIZE_STRING_DOMAIN(domain, string) \
+ /* assigned to a variable */ PW_TOKENIZER_STRING_TOKEN(string); \
+ tokenizer_domain = domain; \
+ string_literal = string
+
+TEST_F(TokenizeToBuffer, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ size_t message_size = sizeof(buffer_);
+
+ PW_TOKENIZE_TO_BUFFER(buffer_, &message_size, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
+TEST_F(TokenizeToBuffer, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ size_t message_size = sizeof(buffer_);
+
+ PW_TOKENIZE_TO_BUFFER_DOMAIN(
+ "._.", buffer_, &message_size, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, "._.");
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
+TEST_F(TokenizeToCallback, Domain_Default) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_CALLBACK(SetMessage, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, PW_TOKENIZER_DEFAULT_DOMAIN);
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
+TEST_F(TokenizeToCallback, Domain_Specified) {
+ const char* tokenizer_domain = nullptr;
+ const char* string_literal = nullptr;
+
+ PW_TOKENIZE_TO_CALLBACK_DOMAIN(
+ "ThisIsTheDomain", SetMessage, "The answer is: %s", "5432!");
+
+ EXPECT_STREQ(tokenizer_domain, "ThisIsTheDomain");
+ EXPECT_STREQ(string_literal, "The answer is: %s");
+}
+
} // namespace
} // namespace pw::tokenizer
diff --git a/pw_tokenizer/tokenize_to_global_handler.cc b/pw_tokenizer/tokenize_to_global_handler.cc
index 685a266..ecca3b4 100644
--- a/pw_tokenizer/tokenize_to_global_handler.cc
+++ b/pw_tokenizer/tokenize_to_global_handler.cc
@@ -19,9 +19,9 @@
namespace pw {
namespace tokenizer {
-extern "C" void pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
- pw_TokenizerArgTypes types,
- ...) {
+extern "C" void _pw_TokenizeToGlobalHandler(pw_TokenizerStringToken token,
+ pw_TokenizerArgTypes types,
+ ...) {
EncodedMessage encoded;
encoded.token = token;
diff --git a/pw_tokenizer/tokenize_to_global_handler_with_payload.cc b/pw_tokenizer/tokenize_to_global_handler_with_payload.cc
index 2a0662f..b04e549 100644
--- a/pw_tokenizer/tokenize_to_global_handler_with_payload.cc
+++ b/pw_tokenizer/tokenize_to_global_handler_with_payload.cc
@@ -19,7 +19,7 @@
namespace pw {
namespace tokenizer {
-extern "C" void pw_TokenizeToGlobalHandlerWithPayload(
+extern "C" void _pw_TokenizeToGlobalHandlerWithPayload(
const pw_TokenizerPayload payload,
pw_TokenizerStringToken token,
pw_TokenizerArgTypes types,
diff --git a/pw_tokenizer/tokenizer_linker_sections.ld b/pw_tokenizer/tokenizer_linker_sections.ld
deleted file mode 100644
index db08481..0000000
--- a/pw_tokenizer/tokenizer_linker_sections.ld
+++ /dev/null
@@ -1,64 +0,0 @@
-/*
- * Copyright 2020 The Pigweed Authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License"); you may not
- * use this file except in compliance with the License. You may obtain a copy of
- * the License at
- *
- * https://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
- * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
- * License for the specific language governing permissions and limitations under
- * the License.
- *
- * This linker script snippet declares the sections needed for string
- * tokenization.
- *
- * This file may be directly included in a linker script with an include
- * directive. For example,
- *
- * INCLUDE path/to/modules/pw_tokenizer/tokenizer_linker_sections.ld
- *
- * SECTIONS
- * {
- * (your existing linker sections)
- * }
- */
-
-SECTIONS
-{
- /*
- * All tokenized strings are stored in this section. Since the section has
- * type INFO, it is excluded from the final binary.
- *
- * In the compiled code, format string literals are replaced by a hash of the
- * string contents and a compact argument list encoded in a uint32_t. The
- * compiled code contains no references to the tokenized strings in this
- * section.
- *
- * The section contents are declared with KEEP so that they are not removed
- * from the ELF. These are never emitted in the final binary or loaded into
- * memory.
- */
- .tokenized 0x00000000 (INFO) :
- {
- KEEP(*(.tokenized))
- KEEP(*(.tokenized.*))
- }
-
- /*
- * This section stores metadata that may be used during tokenized string
- * decoding. This metadata describes properties that may affect how the
- * tokenized string is encoded or decoded -- the maximum length of the hash
- * function and the sizes of certain integer types.
- *
- * Metadata is declared as key-value pairs. See the metadata variable in
- * tokenize.cc for further details.
- */
- .tokenizer_info 0x00000000 (INFO) :
- {
- KEEP(*(.tokenizer_info))
- }
-}