)]}'
{
  "commit": "27be42f696e5c6c98d31243cbcc8a169edb18a84",
  "tree": "5896d1a5cc340d0b212c7e893500003703578ff6",
  "parents": [
    "be4c5fcb20834ed055b92e94fbf0d478ff012af2",
    "0066c7a5f033d29be645845a7b58e95631bbf41e"
  ],
  "author": {
    "name": "Ben Vanik",
    "email": "ben.vanik@gmail.com",
    "time": "Mon Dec 05 19:25:32 2022 -0800"
  },
  "committer": {
    "name": "GitHub",
    "email": "noreply@github.com",
    "time": "Mon Dec 05 19:25:32 2022 -0800"
  },
  "message": "Adding collectives HAL operations and compiler support. (#11342)\n\nThis adds end-to-end support from a new `stream.async.collective` op and\ncommunication channel type down through the HAL and all the way to the\nruntime.\n\nConceptually, collective operations are commands that can be recorded\ninto command buffers, and the various HAL backends can decide how to\nimplement them. The commands are transfer-like but may be implemented\nwith dispatch logic.\n\nWe could offer a local emulated channel that lets us simulate multiple\ndevices by performing copies, but this first version just returns\nunimplemented on all backends. A skeleton of the runtime support for\nNCCL is provided in the CUDA backend. In the future we can add\nlocal/ backend support for various collective libraries if we want or\nexpose a factory mechanism on device creation to give hosting\napplications control over the communication channels and routing. A\nutility has been added to allow command buffer implementations to\naccumulate batches of collective operations for efficient submission to\nthe underlying library APIs.\n\nThere are several areas future changes will focus on, but what\u0027s here\nshould be enough for some basic hello-world programs. Major things\nmissing:\n\n* send/recv are not available at the `stream.async.*` level\n* collectives cannot currently be performed in-place (#11249 is tracking\nthe support required)\n* supported collective element types and reduction operators are\nbasically just what NCCL supports\n* emulation for unsupported element types and reduction operators is\nmissing - we should insert casts and such at higher levels\n\nThe next steps for wiring this up are to implement NCCL shared library\nloading, implement the `TODO(#9580)`s in the code for calling into NCCL,\nand add some representation of collectives at the flow level that lowers\ninto the stream ops.\n\nProgress on #9580.",
  "tree_diff": []
}
