tiny fix (#7355)
diff --git a/docs/website/docs/blog/2021-10-13-mmt4d.md b/docs/website/docs/blog/2021-10-13-mmt4d.md
index 020e67c..969a765 100644
--- a/docs/website/docs/blog/2021-10-13-mmt4d.md
+++ b/docs/website/docs/blog/2021-10-13-mmt4d.md
@@ -39,16 +39,19 @@
algebraic transformations that compose and enable further compiler
optimizations.
-At the basis of this work is the [extensible op system of the Linalg dialect](https://mlir.llvm.org/docs/Dialects/Linalg/OpDSL/) in the MLIR compiler toolkit.
-In this case, a general purpose, mixed precision mmt4d op is defined via a high level
-description directly in the compiler and is then available to both users of the compiler
-(as a `linalg.mmt4d` op) or for direct emission via Python based IR construction
-(i.e. for direct integration into high level frameworks without rebuilding the compiler).
-The ability to define such new special forms cheaply, and without any systemic framework
-level cost, is part of the extensibility and composition story that we expect will become
-increasingly important in development and deployment scenarios in the future, and in this
-case, it let us spring board off of high quality code generation which was already well
-integrated and composed well with other features of the compiler.
+At the basis of this work is the
+[extensible op system of the Linalg dialect](https://mlir.llvm.org/docs/Dialects/Linalg/OpDSL/)
+in the MLIR compiler toolkit. In this case, a general purpose, mixed precision
+mmt4d op is defined via a high level description directly in the compiler and is
+then available both to users of the compiler (as a `linalg.mmt4d` op) and for
+direct emission via Python based IR construction (i.e. for direct integration
+into high level frameworks without rebuilding the compiler). The ability to
+define such new special forms cheaply, and without any systemic framework level
+cost, is part of the extensibility and composition story that we expect will
+become increasingly important in development and deployment scenarios in the
+future, and in this case, it let us springboard off of high quality code
+generation that was already well integrated and composed well with other
+features of the compiler.
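
For intuition about what such an op computes, here is a plain-Python sketch of
tiled matmul-with-transposed-RHS semantics. The 4D layouts used below
(`[M1, K1, M0, K0]` for the LHS, `[N1, K1, N0, K0]` for the RHS, and
`[M1, N1, M0, N0]` for the accumulator) are assumptions for illustration, not a
definitive statement of the compiler's contract:

```python
def mmt4d_reference(lhs, rhs, acc):
    # Sketch only: tiled matmul where the RHS is pre-transposed, so both
    # operands are traversed along their innermost k0 dimension.
    # Assumed layouts: lhs[m1][k1][m0][k0], rhs[n1][k1][n0][k0],
    # acc[m1][n1][m0][n0]; acc is updated in place.
    M1 = len(lhs)
    K1 = len(lhs[0])
    M0 = len(lhs[0][0])
    K0 = len(lhs[0][0][0])
    N1 = len(rhs)
    N0 = len(rhs[0][0])
    for m1 in range(M1):
        for n1 in range(N1):
            for k1 in range(K1):
                for m0 in range(M0):
                    for n0 in range(N0):
                        for k0 in range(K0):
                            acc[m1][n1][m0][n0] += (
                                lhs[m1][k1][m0][k0] * rhs[n1][k1][n0][k0])
    return acc
```

With a single tile on each side, this reduces to an ordinary `A @ B.T` on the
two innermost 2D tiles.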
## Existing Matrix Multiplication Code Generation
@@ -105,9 +108,9 @@
- **Inefficient memory traversal:** For efficiency reasons, we always need
`tile_m_v>1` and `tile_n_v>1`. That is because the higher these values, the
- less memory-load instructions are needed overall; and this is also dictated by
- the SIMD instructions that we want to use. But that means that the kernel is
- accessing simultaneously multiple rows or columns of the left-hand and
+ fewer memory-load instructions are needed overall; and this is also dictated
+ by the SIMD instructions that we want to use. But that means that the kernel
+ is accessing simultaneously multiple rows or columns of the left-hand and
right-hand side matrices. And in this existing approach, they are stored in
linear layout, not in a tiled layout, so these accesses are not contiguous
in memory. This is detrimental to memory access performance, meaning the
@@ -168,8 +171,10 @@

-So we can think of the outermost two dimensions of the 4D representations as the tile position in the overall matrix, and the innermost two as the element position within one tile. Hopefully
-the following Python pseudocode makes it more concrete:
+So we can think of the outermost two dimensions of the 4D representations as the
+tile position in the overall matrix, and the innermost two as the element
+position within one tile. Hopefully the following Python pseudocode makes it
+more concrete:
```python
def pack_2d_4d(operand, parallel_size, reduction_size):