[top_earlgrey] Run the Verilator sim with 4 threads

Verilator can parallelize the execution of the simulation. Previously,
we didn't make use of this functionality. With this commit, the
simulation is split into four threads by default, which works best for
systems with four or more physical cores (in one socket); this should be
a sane default for a developer workstation/laptop these days.

Users can override this setting when compiling the simulation to better
adjust to their system configuration by providing a parameter to
fusesoc. For example, the following command line compiles the simulation
with 8 threads.

```
fusesoc --cores-root=. run --target=sim --setup --build \
  --flag=fileset_top lowrisc:systems:top_earlgrey_verilator \
  --verilator_options '--threads 8'
```
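
Since the default is tuned to physical cores rather than logical CPUs,
it can help to check how many physical cores a machine has before
overriding it. The following is a minimal sketch, not part of this
commit; the helper name `physical_cores` is hypothetical, and it assumes
a Linux system where `lscpu` is available (it falls back to 1
otherwise).

```shell
# Hypothetical helper (not part of this commit): count physical cores.
physical_cores() {
  # `lscpu -p=Core,Socket` prints one "core,socket" line per logical CPU;
  # header lines start with '#'. Each unique pair is one physical core.
  n=$(lscpu -p=Core,Socket 2>/dev/null |
    awk '!/^#/ && NF && !seen[$0]++ { n++ } END { print n + 0 }')
  # Fall back to 1 if lscpu is missing or produced no usable output.
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

physical_cores
```

The printed value could then be used in place of the `8` in the command
above, for example `--verilator_options "--threads $(physical_cores)"`.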

In my testing, the speedup in simulation performance is reasonably close
to being proportional to the number of threads, as long as physical CPU
cores are available. For example, going from 1 thread to 4 threads
results in a 3.2x speedup on my 8th generation Intel Core i7 laptop CPU.

For CI with Azure Pipelines, we need to disable threading: the provided
machines have only two virtual CPUs, and enabling threading there slows
the simulation down enough that some simulations time out.
(See
https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml;
VM model "Standard_DS2_v2" as listed in
https://docs.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series#dsv2-series).

Also note that the compliance test in CI runs with four parallel jobs;
changing the number of threads per simulation will likely also need
changes to the number of compliance test jobs.
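
The trade-off between threads per simulation and parallel jobs can be
sketched as a simple core budget. This is only an illustration under
the assumption that each simulation thread should get its own physical
core; the helper name `threads_per_sim` is hypothetical and nothing like
it exists in the CI scripts.

```shell
# Hypothetical helper: split a physical-core budget across parallel jobs.
# Usage: threads_per_sim <physical_cores> <parallel_jobs>
threads_per_sim() {
  cores=$1
  jobs=$2
  # Integer division: how many cores each job can have to itself.
  t=$((cores / jobs))
  # Never go below one thread, even if jobs outnumber cores.
  if [ "$t" -lt 1 ]; then
    t=1
  fi
  echo "$t"
}

threads_per_sim 8 4   # prints 2
threads_per_sim 2 4   # prints 1
```

With four compliance test jobs on a two-vCPU runner, this budget leaves
a single thread per simulation, which matches the decision to build
without threading in CI.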

Signed-off-by: Philipp Wagner <phw@lowrisc.org>
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
index ba507b9..9a91e76 100644
--- a/azure-pipelines.yml
+++ b/azure-pipelines.yml
@@ -347,10 +347,14 @@
       mkdir -p "$BIN_DIR/hw/top_earlgrey"
 
       export PATH="$VERILATOR_PATH/bin:$PATH"
+      # Compile the simulation without threading; the runners provided by
+      # Azure provide two virtual CPUs, which seems to equal one physical
+      # CPU (at most); the use of threading slows down the simulation.
       fusesoc --cores-root=. \
         run --flag=fileset_top --target=sim --setup --build \
         --build-root="$OBJ_DIR/hw" \
-        lowrisc:systems:top_earlgrey_verilator
+        lowrisc:systems:top_earlgrey_verilator \
+        --verilator_options="--no-threads"
 
       cp "$OBJ_DIR/hw/sim-verilator/Vtop_earlgrey_verilator" \
         "$BIN_DIR/hw/top_earlgrey"