[prim_present/prince] Update PRINCE keyschedule and update docs
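
The new key schedule alternates `k0` and `k1` as round keys instead of
applying `k1` in every round. As a rough illustration only, the selection
can be modelled in C as below. This is a sketch of the selection logic, not
the RTL: the helper names `fwd_round_key` and `rev_round_key` and the
64-bit key type are assumptions made for this message, and `k` stands for
the round index used in the forward and reverse loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: round key for forward-pass round k.
// Odd rounds use k0 (or k1 when falling back to the old schedule),
// even rounds use k1.
static uint64_t fwd_round_key(int k, uint64_t k0, uint64_t k1,
                              bool use_old_key_sched) {
  uint64_t k0_new = use_old_key_sched ? k1 : k0;
  return (k & 1) ? k0_new : k1;
}

// Hypothetical helper: round key for reverse-pass round k.
// The selection is mirrored: odd rounds use k1, even rounds use k0
// (or k1 when falling back to the old schedule).
static uint64_t rev_round_key(int k, uint64_t k0, uint64_t k1,
                              bool use_old_key_sched) {
  uint64_t k0_new = use_old_key_sched ? k1 : k0;
  return (k & 1) ? k1 : k0_new;
}
```

With `use_old_key_sched` set, both helpers return `k1` for every round,
which corresponds to the original schedule.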

Signed-off-by: Michael Schaffner <msf@opentitan.org>
diff --git a/hw/ip/prim/doc/prim_present.md b/hw/ip/prim/doc/prim_present.md
index 341a822..da7856e 100644
--- a/hw/ip/prim/doc/prim_present.md
+++ b/hw/ip/prim/doc/prim_present.md
@@ -66,5 +66,5 @@
 data_o = state ^ round_keys[32];
 ```
 
-The reduced 32bit block-size variant implemented is non-standard and should only be used for scrambling purposes, since it is not secure.
+The reduced 32bit block-size variant implemented is non-standard and should only be used for scrambling purposes, since it **is not secure**.
 It leverages the same crypto primitives and key derivation functions as the 64bit variant, with the difference that the permutation layer is formulated for 32 instead of 64 elements.
diff --git a/hw/ip/prim/doc/prim_prince.md b/hw/ip/prim/doc/prim_prince.md
index 52e6abe..8019b4f 100644
--- a/hw/ip/prim/doc/prim_prince.md
+++ b/hw/ip/prim/doc/prim_prince.md
@@ -16,11 +16,12 @@
 
 ## Parameters
 
-Name         | type   | Description
--------------|--------|----------------------------------------------------------
-DataWidth    | int    | Block size, can be 32 or 64.
-KeyWidth     | int    | Key size, can be 64 for block size 32, or 128 for block size 64
-NumRounds    | int    | Half the number of the reflected PRINCE rounds. Can range from 1 to 5. The effective number of non-linear layers is 2 + 2 * NumRounds.
+Name           | type   | Description
+---------------|--------|----------------------------------------------------------
+DataWidth      | int    | Block size, can be 32 or 64.
+KeyWidth       | int    | Key size, can be 64 for block size 32, or 128 for block size 64.
+NumRoundsHalf  | int    | Half the number of the reflected PRINCE rounds. Can range from 1 to 5. The effective number of non-linear layers is 2 + 2 * NumRoundsHalf.
+UseOldKeySched | bit    | If set to 1, fall back to the original key schedule (not recommended). Defaults to 0.
 
 ## Signal Interfaces
 
@@ -34,18 +35,18 @@
 # Theory of Operations
 
 ```
-             /---------------\
-dec_i        |               |
------------->|    PRINCE     |
-key_i        |               |
-=====/======>|   DataWidth   |
- [KeyWidth]  |   KeyWidth    |
-             |   NumRounds   |
-data_i       |               |  data_o
-=====/======>|               |=====/=======>
- [DataWidth] |               |  [DataWidth]
-             |               |
-             \---------------/
+             /----------------\
+dec_i        |                |
+------------>|     PRINCE     |
+key_i        |                |
+=====/======>|    DataWidth   |
+ [KeyWidth]  |    KeyWidth    |
+             |  NumRoundsHalf |
+data_i       | UseOldKeySched |  data_o
+=====/======>|                |=====/=======>
+ [DataWidth] |                |  [DataWidth]
+             |                |
+             \----------------/
 ```
 
 The PRINCE module is fully unrolled and combinational, meaning that it does not have any clock, reset or handshaking inputs.
@@ -75,7 +76,7 @@
       state = mult_layer(state);
       state = shiftrows_layer(state);
       state ^= ROUND_CONSTANT[i]
-      state ^= k1;
+      state ^= (i & 0x1) ? k0 : k1;
 }
 
 // middle part
@@ -85,7 +86,7 @@
 
 // reverse pass
 for (int i=6; i < 11; i++) {
-      state ^= k1;
+      state ^= (i & 0x1) ? k1 : k0;
       state ^= ROUND_CONSTANT[i]
       state = shiftrows_inverse_layer(state);
       state = mult_layer(state);
@@ -97,10 +98,15 @@
 
 data_o = state ^ k0_prime;
 ```
+The multiplicative layer is an involution, meaning that it is its own inverse and it can hence be used in the reverse pass without inversion.
 
-Note that the multiplicative layer is an involution, meaning that it is its own inverse and it can hence be used in the reverse pass without inversion.
+The choice of the `ALPHA_CONSTANT` used in the key tweak can affect security, as detailed in [this paper](https://eprint.iacr.org/2015/372.pdf).
+The constant chosen by the designers of PRINCE does not have these issues, but care must be taken if this constant is modified.
+In addition, [this paper](https://eprint.iacr.org/2014/656.pdf) proposes an improved key schedule to defend against attacks on the FX structure of PRINCE (see Appendix C), and this design incorporates that improvement.
+The improvement alternates the keys `k0` and `k1` between rounds, as opposed to using the same key `k1` in every round.
 
-The reduced 32bit variant mentioned above and all reduced round variants are non-standard and must only be used for scrambling purposes, since they are not secure.
+
+The reduced 32bit variant mentioned above and all reduced round variants are non-standard and must only be used for scrambling purposes, since they **are not secure**.
 The 32bit variant leverages the same crypto primitives and key derivation functions as the 64bit variant, with the difference that the multiplication matrix is only comprised of the first two block diagonal submatrices (^M0 and ^M1 in the paper), and the shiftrows operation does not operate on nibbles but pairs of 2 bits instead.
 
 
diff --git a/hw/ip/prim/rtl/prim_prince.sv b/hw/ip/prim/rtl/prim_prince.sv
index b32f95f..355c916 100644
--- a/hw/ip/prim/rtl/prim_prince.sv
+++ b/hw/ip/prim/rtl/prim_prince.sv
@@ -17,8 +17,11 @@
 // References: - https://en.wikipedia.org/wiki/PRESENT
 //             - https://en.wikipedia.org/wiki/Prince_(cipher)
 //             - http://www.lightweightcrypto.org/present/present_ches2007.pdf
-//             - https://eprint.iacr.org/2012/529.pdf
 //             - https://csrc.nist.gov/csrc/media/events/lightweight-cryptography-workshop-2015/documents/papers/session7-maene-paper.pdf
+//             - https://eprint.iacr.org/2012/529.pdf
+//             - https://eprint.iacr.org/2015/372.pdf
+//             - https://eprint.iacr.org/2014/656.pdf
+
 
 // TODO: this module has not been verified yet, and has only been used in
 // synthesis experiments.
@@ -27,7 +30,10 @@
   parameter int DataWidth     = 64,
   parameter int KeyWidth      = 128,
   // The construction is reflective. Total number of rounds is 2*NumRoundsHalf + 2
-  parameter int NumRoundsHalf = 5
+  parameter int NumRoundsHalf = 5,
+  // This primitive uses the new key schedule proposed in https://eprint.iacr.org/2014/656.pdf
+  // Setting this parameter to 1 falls back to the original key schedule.
+  parameter bit UseOldKeySched = 1'b0
 ) (
   input        [DataWidth-1:0] data_i,
   input        [KeyWidth-1:0]  key_i,
@@ -168,7 +174,15 @@
   //////////////
 
   logic [DataWidth-1:0] data_state;
-  logic [DataWidth-1:0] k0, k0_prime, k1;
+  logic [DataWidth-1:0] k0, k0_prime, k1, k0_new;
+
+  if (UseOldKeySched) begin : gen_legacy_keyschedule
+    assign k0_new = k1;
+  end else begin : gen_new_keyschedule
+    // improved key schedule proposed by https://eprint.iacr.org/2014/656.pdf
+    assign k0_new = k0;
+  end
+
   always_comb begin : p_prince
     // key expansion
     k0       = key_i[DataWidth-1:0];
@@ -193,7 +207,8 @@
       data_state = mult_prime_layer(data_state);
       data_state = shiftrows_layer(data_state);
       data_state ^= RoundConst[k][DataWidth-1:0];
-      data_state ^= k1;
+      // improved key schedule proposed by https://eprint.iacr.org/2014/656.pdf
+      data_state ^= (1'(k) & 1'b1) ? k0_new : k1;
     end
 
     // middle part
@@ -204,7 +219,8 @@
     // reverse pass
     // the construction is reflective, hence the subtraction with NumRoundsHalf
     for (int k = 11-NumRoundsHalf; k <= 10; k++) begin
-      data_state ^= k1;
+      // improved key schedule proposed by https://eprint.iacr.org/2014/656.pdf
+      data_state ^= (1'(k) & 1'b1) ? k1 : k0_new;
       data_state ^= RoundConst[k][DataWidth-1:0];
       data_state = shiftrows_inv_layer(data_state);
       data_state = mult_prime_layer(data_state);