tiny fix (#7355)
diff --git a/docs/website/docs/blog/2021-10-13-mmt4d.md b/docs/website/docs/blog/2021-10-13-mmt4d.md
index 020e67c..969a765 100644
--- a/docs/website/docs/blog/2021-10-13-mmt4d.md
+++ b/docs/website/docs/blog/2021-10-13-mmt4d.md
@@ -39,16 +39,19 @@
algebraic transformations that compose and enable further compiler
optimizations.
-At the basis of this work is the [extensible op system of the Linalg dialect](https://mlir.llvm.org/docs/Dialects/Linalg/OpDSL/) in the MLIR compiler toolkit.
-In this case, a general purpose, mixed precision mmt4d op is defined via a high level
-description directly in the compiler and is then available to both users of the compiler
-(as a `linalg.mmt4d` op) or for direct emission via Python based IR construction
-(i.e. for direct integration into high level frameworks without rebuilding the compiler).
-The ability to define such new special forms cheaply, and without any systemic framework
-level cost, is part of the extensibility and composition story that we expect will become
-increasingly important in development and deployment scenarios in the future, and in this
-case, it let us spring board off of high quality code generation which was already well
-integrated and composed well with other features of the compiler.
+At the basis of this work is the
+[extensible op system of the Linalg dialect](https://mlir.llvm.org/docs/Dialects/Linalg/OpDSL/)
+in the MLIR compiler toolkit. In this case, a general purpose, mixed precision
+mmt4d op is defined via a high level description directly in the compiler and is
+then available both to users of the compiler (as a `linalg.mmt4d` op) and for
+direct emission via Python based IR construction (i.e. for direct integration
+into high level frameworks without rebuilding the compiler). The ability to
+define such new special forms cheaply, and without any systemic framework level
+cost, is part of the extensibility and composition story that we expect will
+become increasingly important in development and deployment scenarios in the
+future, and in this case, it let us springboard off of high quality code
+generation that was already well integrated and composed well with other
+features of the compiler.
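
For intuition about what such an op computes, here is a plain-Python sketch of
tiled matmul-with-transposed-RHS semantics. The 4D layouts used below
(`[M1, K1, M0, K0]` for the LHS, `[N1, K1, N0, K0]` for the RHS, and
`[M1, N1, M0, N0]` for the accumulator) are assumptions for illustration, not a
definitive statement of the compiler's contract:

```python
def mmt4d_reference(lhs, rhs, acc):
    # Sketch only: tiled matmul where the RHS is pre-transposed, so both
    # operands are traversed along their innermost k0 dimension.
    # Assumed layouts: lhs[m1][k1][m0][k0], rhs[n1][k1][n0][k0],
    # acc[m1][n1][m0][n0]; acc is updated in place.
    M1 = len(lhs)
    K1 = len(lhs[0])
    M0 = len(lhs[0][0])
    K0 = len(lhs[0][0][0])
    N1 = len(rhs)
    N0 = len(rhs[0][0])
    for m1 in range(M1):
        for n1 in range(N1):
            for k1 in range(K1):
                for m0 in range(M0):
                    for n0 in range(N0):
                        for k0 in range(K0):
                            acc[m1][n1][m0][n0] += (
                                lhs[m1][k1][m0][k0] * rhs[n1][k1][n0][k0])
    return acc
```

With a single tile on each side, this reduces to an ordinary `A @ B.T` on the
two innermost 2D tiles.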
## Existing Matrix Multiplication Code Generation
@@ -105,9 +108,9 @@
- **Inefficient memory traversal:** For efficiency reasons, we always need
`tile_m_v>1` and `tile_n_v>1`. That is because the higher these values, the
- less memory-load instructions are needed overall; and this is also dictated by
- the SIMD instructions that we want to use. But that means that the kernel is
- accessing simultaneously multiple rows or columns of the left-hand and
+ fewer memory-load instructions are needed overall; and this is also dictated
+ by the SIMD instructions that we want to use. But that means that the kernel
+ is accessing simultaneously multiple rows or columns of the left-hand and
right-hand side matrices. And in this existing approach, they are stored in
linear layout, not in a tiled layout, so these accesses are not contiguous
in memory. This is detrimental to memory access performance, meaning the
@@ -168,8 +171,10 @@

-So we can think of the outermost two dimensions of the 4D representations as the tile position in the overall matrix, and the innermost two as the element position within one tile. Hopefully
-the following Python pseudocode makes it more concrete:
+So we can think of the outermost two dimensions of the 4D representations as the
+tile position in the overall matrix, and the innermost two as the element
+position within one tile. Hopefully the following Python pseudocode makes it
+more concrete:
```python
def pack_2d_4d(operand, parallel_size, reduction_size):