Design of the allocator claims model

This document describes the design of the claim model that the allocator should provide and how it should be implemented.

The problem

Consider the scenario where compartment A passes a heap buffer to compartment B. Another thread executing in compartment A may free the buffer in the middle of B's execution. This will cause B to fault. This is a concurrent operation and so sets the stage for classic TOCTOU problems. In the core compartments, we mostly avoid this by running with interrupts disabled, but this is not desirable for the entire system.

To mitigate this, we need some mechanism that allows B to prevent the deallocation of an object. This is further complicated by the fact that A and B may exist in a mutual distrust relationship and so B must not be permitted to consume A's memory allocation quota.

There are two variants of this problem, which may have different solutions. In one case, the callee wishes to use an object for the duration of the call. In another case, the callee wishes to use an object for a long period. The main difference between these is the relative costs. For objects held for the duration of a call, the cost of two calls into the allocator to hold and release the object may be prohibitive, whereas this is likely to be negligible for objects held for a long time.

Additional constraints

In addition to solving the basic problem, we have a number of additional constraints on the design space:

  • It should be possible to apply a solution at the boundary when wrapping an existing component in a compartment.
  • Most objects will not use this mechanism (they will be reachable from a single compartment) and so there should be no overhead when not in use.
  • We are targeting resource-constrained systems and so the total overhead must be low.
  • We assume compartments may be malicious and so they should not be able to trick the allocator into creating a data structure that takes a very large amount of time to walk.

Possible approach 0: Explicit copies

The first approach is simply to punt on the problem entirely. When an OS kernel interacts with userspace memory, it calls explicit helpers to copy data to and from the kernel. This is necessary because the userspace process may provide invalid pointers or update the memory map to invalidate pointers while the kernel runs.

In many situations, the same approach could work with CHERIoT. We could provide a safe memcpy as a library function that runs with interrupts disabled, checks the source and destination pointers, and then performs the copy (up to some bounded size) and reports whether the copy succeeded. This would be sufficient in a lot of cases but is not a very friendly programming model. In particular, it means that any compartment wrapping existing unmodified code would have to either (deep) copy any objects passed in or make invasive changes to the wrapped code.
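
A sketch of what such a helper might look like (the names safe_memcpy and MAX_SAFE_COPY_SIZE are illustrative, permission checks are elided, and the interrupts-disabled annotation is assumed rather than shown):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    // Upper bound on a single copy, so that the window with interrupts
    // disabled stays short.  The value is illustrative.
    #define MAX_SAFE_COPY_SIZE 1024

    // Returns true if `ptr` is a valid, unsealed capability whose bounds
    // cover `length` bytes starting at its current address.
    static bool covers_range(const void *ptr, size_t length)
    {
        if (!__builtin_cheri_tag_get(ptr) || __builtin_cheri_sealed_get(ptr))
        {
            return false;
        }
        size_t base    = __builtin_cheri_base_get(ptr);
        size_t address = __builtin_cheri_address_get(ptr);
        size_t top     = base + __builtin_cheri_length_get(ptr);
        return (address >= base) && (address <= top) && (length <= top - address);
    }

    // Copies up to MAX_SAFE_COPY_SIZE bytes and reports how many were copied
    // (0 on failure).  This function would run with interrupts disabled so
    // that the source object cannot be freed mid-copy.
    size_t safe_memcpy(void *dst, const void *src, size_t length)
    {
        if (length > MAX_SAFE_COPY_SIZE)
        {
            return 0;
        }
        if (!covers_range(dst, length) || !covers_range(src, length))
        {
            return 0;
        }
        memcpy(dst, src, length);
        return length;
    }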

Note: This approach is orthogonal to the others and so we should implement it anyway.

Possible approach 1: Hazard pointers

Hazard pointers provide the inspiration for this approach, which involves having a per-compartment list of held objects that the allocator must consult before freeing an object. The allocator would provide an API that allowed a compartment to register a table containing a (bounded) array of held objects and an allocation capability. On deallocation, the allocator would need to traverse all such lists to determine whether an object is on any of them. If the object is found on a list then the metadata should be updated to mark it as owned by the allocation capability associated with that list.

This is attractive for the fast-path case, because the compartment wishing to claim an object for the duration of a call needs only to insert a pointer to it into a pre-registered array. It is unclear whether it is possible to use this interface securely. In particular, a malicious compartment could allocate a large object and pass a pointer to a small sub-object to the callee. The callee has no mechanism (other than calling the allocator) to know whether it holds a pointer to an entire object.
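
A sketch of the shapes this would involve (none of these names exist in the current allocator API; they are purely illustrative):

    #include <stddef.h>

    struct HazardList
    {
        // Allocation capability charged if the allocator must transfer
        // ownership of a held object whose owner frees it.
        struct SObjStruct *quota;
        // Number of slots in `held`.
        size_t             length;
        // Objects that must not be deallocated while a pointer to them is
        // stored here; the allocator scans this array on every free.
        void              *held[];
    };

    // One-off registration of a compartment's hazard list with the allocator.
    int allocator_register_hazard_list(struct HazardList *list);

    // Fast path: holding an object for the duration of a call is a single
    // store into the pre-registered array, with no cross-compartment call.
    static inline void hold_for_call(struct HazardList *list, size_t slot,
                                     void              *object)
    {
        list->held[slot] = object;
    }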

Possible approach 2: Explicit claims

The second approach, taken by CheriOS, is similar to reference counting. Compartments may explicitly claim memory and objects may not be deallocated until all claims have been dropped. Importantly, the CheriOS model handles resource constraints: when you claim an object, it counts towards your quota.

The downside of this model is that it requires tracking a lot of state. Straight reference counting is not sufficient because references can be owned by mutually distrusting entities. With a simple reference counting model, the original attack is still possible:

  1. A allocates an object.
  2. A passes a pointer to the object to B.
  3. B increments the reference count.
  4. A decrements the reference count twice from another thread.
  5. B traps.

Each reference to the object is now state that must be tracked and, most likely, will require an allocation. We currently have 14 bits spare in the allocation header, slightly more if we shrink the allocator ID, which does not actually require a full 16 bits: a 16-bit allocator ID would require 2 MiB of RAM to hold all of the corresponding allocation capabilities, leaving no space for the heap. These spare bits could store a heap-start-relative (shifted for alignment) pointer to the head of a linked list of claims. In the common case, this field will be 0 (the start of the heap cannot be allocated) and so free can skip it.
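
A sketch of what this header layout might look like (field names, the exact packing, and the heap_start and MALLOC_ALIGNMENT_SHIFT constants are assumptions; only the bit budget comes from the discussion above):

    #include <stddef.h>
    #include <stdint.h>

    struct AllocationHeader
    {
        // Existing size and free-list metadata elided.
        // ...
        // ID of the allocation capability that owns this object (0 once the
        // owner has freed it while claims remain outstanding).
        uint16_t ownerID;
        // Heap-start-relative, alignment-shifted pointer to the first claim
        // record.  0 means no claims: the start of the heap can never be a
        // claim record, so free can skip the slow path with one comparison.
        uint16_t claims : 14;
    };

    // Recover a pointer to the first claim record.  heap_start is assumed to
    // be a char * capability to the start of the heap.
    struct Claim;
    static inline struct Claim *claims_head(struct AllocationHeader *header)
    {
        if (header->claims == 0)
        {
            return NULL;
        }
        return (struct Claim *)(heap_start +
                                ((size_t)header->claims << MALLOC_ALIGNMENT_SHIFT));
    }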

The proposed solution involves constructing a linked list of structures holding an allocator ID, a saturating reference count, and a next pointer. When an object is claimed, the allocator walks the list of claims to find a structure with a matching ID. If one is found, it increments the reference count. If not, then it allocates a new object (out of the caller's quota) to hold the claim and inserts it at the head of the list. The size of the object is subtracted from the caller's quota.
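
A sketch of the claim record and the claim path, building on the header sketch above (next_claim, new_claim_from_quota, and relative_offset_of are assumed helpers for the relative-pointer decoding and the quota accounting):

    #include <stdbool.h>
    #include <stdint.h>

    // One claim record per (object, allocation capability) pair.
    struct Claim
    {
        uint16_t next;    // Heap-start-relative pointer to the next claim; 0 ends the list.
        uint16_t ownerID; // ID of the allocation capability holding this claim.
        uint16_t count;   // Saturating reference count.
    };

    static bool add_claim(struct AllocationHeader *header,
                          uint16_t                 callerID,
                          struct SObjStruct       *heapCapability)
    {
        // Existing claim from this capability: bump the count, saturating at
        // the maximum value.
        for (struct Claim *c = claims_head(header); c != NULL; c = next_claim(c))
        {
            if (c->ownerID == callerID)
            {
                if (c->count < UINT16_MAX)
                {
                    c->count++;
                }
                return true;
            }
        }
        // First claim from this capability: allocate a record out of the
        // caller's quota (which is also charged for the object itself) and
        // push it onto the head of the list.
        struct Claim *c = new_claim_from_quota(heapCapability, header);
        if (c == NULL)
        {
            return false;
        }
        c->ownerID     = callerID;
        c->count       = 1;
        c->next        = header->claims;
        header->claims = relative_offset_of(c);
        return true;
    }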

When a claim is dropped (this does not require a new API: heap_free must do the same thing, which is convenient for destructors), the allocator has to handle two cases:

  • Is this an additional claim? If so, then it will have a claim entry on the list.
  • Is this the original allocator? If so, then its allocator ID will match the owner ID in the header.

These are not mutually exclusive. The owner may claim its own object to avoid having to track whether it owns an object that is passed to it on a recursive call.

The second case is easy to handle: check the caller's allocator capability ID against the header, as is done today. Instead of automatically proceeding to deallocation if this succeeds, the allocator should free the object only if the claims list is also empty. If the claims list is not empty, the owner is set to 0 (0 is an invalid value for an allocator ID).

To handle the first case, the allocator walks the claims list and decrements the reference count when it finds the corresponding entry. If the count has already saturated then the decrement is silently ignored, because the true number of claims is no longer known. If the reference count reaches zero, then the claim structure is deallocated (reclaiming both its usage and that of the object from the caller's quota). If this was the last claim and the owner has already freed the object, then the object should be deallocated.
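
A sketch of the combined free / drop-claim path, again with assumed helpers: find_claim walks the list for the caller's ID, remove_claim unlinks a record and refunds both its size and the object's size to the caller's quota, and deallocate is the normal free path. The ordering when the caller is both the owner and a claimant is one possible choice, not something mandated above:

    static int heap_free_sketch(struct AllocationHeader *header, uint16_t callerID)
    {
        struct Claim *claim = find_claim(header, callerID);
        if (claim != NULL)
        {
            // The caller holds an explicit claim: drop that first, so that an
            // owner's claim on its own object survives a recursive free.
            // Saturated counts are never decremented.
            if (claim->count < UINT16_MAX && --claim->count == 0)
            {
                remove_claim(header, claim);
            }
        }
        else if (header->ownerID == callerID)
        {
            // The caller is the original allocator: give up ownership.
            // 0 is not a valid allocator ID.
            header->ownerID = 0;
        }
        else
        {
            return -1; // The caller has no rights over this object.
        }
        // The object is reclaimed only once neither an owner nor any claims
        // remain.
        if (header->ownerID == 0 && claims_head(header) == NULL)
        {
            deallocate(header);
        }
        return 0;
    }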

Note that this approach still has potential problems with sub-objects. We have a choice of whether to allow claims with capabilities that span less than the entire object. If we do, then we still need to consume sufficient quota for the entire object, which may be surprising to the callee. If we do not, then sub-object bounds (which are good for security) become less useful.
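
If we decide to reject claims through capabilities that do not span the entire allocation, the check might look like this (allocation_bounds is an assumed helper returning the base and size that the allocator recorded for the allocation containing the capability's address):

    #include <stdbool.h>
    #include <stddef.h>

    static bool spans_whole_object(void *capability)
    {
        size_t base = __builtin_cheri_base_get(capability);
        size_t top  = base + __builtin_cheri_length_get(capability);
        size_t allocBase, allocSize;
        if (!allocation_bounds(capability, &allocBase, &allocSize))
        {
            return false; // Not a current heap allocation.
        }
        // The capability must cover the whole allocation, not a sub-object.
        return base <= allocBase && top >= allocBase + allocSize;
    }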

I propose that we implement an API with this signature:

size_t claim_object(struct SObjStruct *heapCapability, void *capability);

The return value is the number of bytes that were consumed from the quota. If it is zero, the claim failed. Any non-zero result indicates success, but a caller can also check that the result is below some expected maximum.
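
A hypothetical callee showing how this API would be used (myHeapCapability stands for the callee's allocation capability and MaxExpectedSize for the largest object it is willing to hold against its quota; neither name is part of the proposal, and the allocator's declarations are assumed to be in scope):

    #include <stddef.h>

    extern struct SObjStruct *myHeapCapability;
    #define MaxExpectedSize 256

    int process_buffer(void *buffer)
    {
        size_t charged = claim_object(myHeapCapability, buffer);
        if (charged == 0)
        {
            return -1; // Claim failed: out of quota or not a valid heap object.
        }
        if (charged > MaxExpectedSize)
        {
            // The caller passed something larger than we are willing to pay
            // for; drop the claim again and give up.
            heap_free(myHeapCapability, buffer);
            return -1;
        }
        // ... use buffer; a concurrent free by the caller can no longer
        // deallocate it out from under us ...

        // Dropping the claim reuses the existing heap_free entry point, as
        // described above.
        heap_free(myHeapCapability, buffer);
        return 0;
    }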