Update mac_5x9x8 Revised authored by gling
......@@ -32,28 +32,91 @@ In this case, if this multiply was a part of a 5x4x5 MAC unit, $\lceil\log_2(5)\
Since there are no more sign extensions required, we can rewrite the entire 5x4x5 MAC unit as a single large array addition:
```
p00[4:0] = a0[4:0] * b0[0]
p01[4:0] = a0[4:0] * b0[1]
p02[4:0] = a0[4:0] * b0[2]
p03[4:0] = a0[4:0] * b0[3]
                                      ~p00[4]   p00[3]   p00[2]   p00[1]   p00[0]
                             ~p01[4]   p01[3]   p01[2]   p01[1]   p01[0]
                    ~p02[4]   p02[3]   p02[2]   p02[1]   p02[0]
            p03[4]  ~p03[3]  ~p03[2]  ~p03[1]  ~p03[0]
+  1        0        0        0        0        1        0        0        0
-----------------------------------------------------------------------------------
   P[8]     P[7]     P[6]     P[5]     P[4]     P[3]     P[2]     P[1]     P[0]
p0_0[4:0] = a0[4:0] * b0[0]
p0_1[4:0] = a0[4:0] * b0[1]
p0_2[4:0] = a0[4:0] * b0[2]
p0_3[4:0] = a0[4:0] * b0[3]
p1_0[4:0] = a1[4:0] * b1[0]
p1_1[4:0] = a1[4:0] * b1[1]
p1_2[4:0] = a1[4:0] * b1[2]
p1_3[4:0] = a1[4:0] * b1[3]
p2_0[4:0] = a2[4:0] * b2[0]
p2_1[4:0] = a2[4:0] * b2[1]
p2_2[4:0] = a2[4:0] * b2[2]
p2_3[4:0] = a2[4:0] * b2[3]
p3_0[4:0] = a3[4:0] * b3[0]
p3_1[4:0] = a3[4:0] * b3[1]
p3_2[4:0] = a3[4:0] * b3[2]
p3_3[4:0] = a3[4:0] * b3[3]
p4_0[4:0] = a4[4:0] * b4[0]
p4_1[4:0] = a4[4:0] * b4[1]
p4_2[4:0] = a4[4:0] * b4[2]
p4_3[4:0] = a4[4:0] * b4[3]
                                                                 ~p0_0[4]  p0_0[3]  p0_0[2]  p0_0[1]  p0_0[0]
                                                                 ~p1_0[4]  p1_0[3]  p1_0[2]  p1_0[1]  p1_0[0]
                                                                 ~p2_0[4]  p2_0[3]  p2_0[2]  p2_0[1]  p2_0[0]
                                                                 ~p3_0[4]  p3_0[3]  p3_0[2]  p3_0[1]  p3_0[0]
                                                                 ~p4_0[4]  p4_0[3]  p4_0[2]  p4_0[1]  p4_0[0]
                                                        ~p0_1[4]  p0_1[3]  p0_1[2]  p0_1[1]  p0_1[0]
                                                        ~p1_1[4]  p1_1[3]  p1_1[2]  p1_1[1]  p1_1[0]
                                                        ~p2_1[4]  p2_1[3]  p2_1[2]  p2_1[1]  p2_1[0]
                                                        ~p3_1[4]  p3_1[3]  p3_1[2]  p3_1[1]  p3_1[0]
                                                        ~p4_1[4]  p4_1[3]  p4_1[2]  p4_1[1]  p4_1[0]
                                               ~p0_2[4]  p0_2[3]  p0_2[2]  p0_2[1]  p0_2[0]
                                               ~p1_2[4]  p1_2[3]  p1_2[2]  p1_2[1]  p1_2[0]
                                               ~p2_2[4]  p2_2[3]  p2_2[2]  p2_2[1]  p2_2[0]
                                               ~p3_2[4]  p3_2[3]  p3_2[2]  p3_2[1]  p3_2[0]
                                               ~p4_2[4]  p4_2[3]  p4_2[2]  p4_2[1]  p4_2[0]
                                       p0_3[4] ~p0_3[3] ~p0_3[2] ~p0_3[1] ~p0_3[0]
                                       p1_3[4] ~p1_3[3] ~p1_3[2] ~p1_3[1] ~p1_3[0]
                                       p2_3[4] ~p2_3[3] ~p2_3[2] ~p2_3[1] ~p2_3[0]
                                       p3_3[4] ~p3_3[3] ~p3_3[2] ~p3_3[1] ~p3_3[0]
                                       p4_3[4] ~p4_3[3] ~p4_3[2] ~p4_3[1] ~p4_3[0]
   1        1        1        1        0        0        0        0        1        0        0        0
   1        1        1        1        0        0        0        0        1        0        0        0
   1        1        1        1        0        0        0        0        1        0        0        0
   1        1        1        1        0        0        0        0        1        0        0        0
+  1        1        1        1        0        0        0        0        1        0        0        0
--------------------------------------------------------------------------------------------------------------
   P[11]    P[10]    P[9]     P[8]     P[7]     P[6]     P[5]     P[4]     P[3]     P[2]     P[1]     P[0]
```
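The rows of the array above can be generated mechanically from the operand bits. The following is a minimal Python sketch of that step only, not the unit's actual implementation. It assumes that each `a` is a 5-bit and each `b` a 4-bit two's-complement value, and that the row for `b[j]` is weighted by `2**j` (the usual array-multiplier alignment). The helper name `partial_product_rows` is hypothetical.

```python
A_BITS = 5  # width of each a operand, a[4:0]
B_BITS = 4  # width of each b operand, b[3:0]


def partial_product_rows(a, b):
    """Return the four array rows one multiplier contributes, as unsigned ints.

    Assumption: the row for b[j] is shifted left by j bit positions.
    """
    a_bits = a & ((1 << A_BITS) - 1)  # raw bit pattern of a
    b_bits = b & ((1 << B_BITS) - 1)  # raw bit pattern of b
    msb_mask = 1 << (A_BITS - 1)      # selects p[4]
    low_mask = msb_mask - 1           # selects p[3:0]

    rows = []
    for j in range(B_BITS):
        # p_j[4:0] = a[4:0] * b[j], i.e. a gated by one bit of b
        p = a_bits if (b_bits >> j) & 1 else 0
        if j < B_BITS - 1:
            row = p ^ msb_mask        # ~p[4]  p[3]  p[2]  p[1]  p[0]
        else:
            row = p ^ low_mask        #  p[4] ~p[3] ~p[2] ~p[1] ~p[0]
        rows.append(row << j)
    return rows


if __name__ == "__main__":
    for row in partial_product_rows(1, 1):
        print(format(row, "09b"))
```

For `a = 1, b = 1` the four rows are 17, 32, 64 and 120. Calling the helper once per operand pair gives the 20 variable rows of the 5x4x5 array; the constant rows are added separately.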
Focusing on the upper 4 bits of the 5 constant terms in the array above: if K is even, these bits always sum to zeroes plus a carry-out bit, so they have no effect on the result and can be replaced with zeroes. If K is odd, they always sum to 1's, which inverts the upper 4 bits of the output sum. In that case they can be replaced with zeroes, provided the upper 4 bits of P (`P[11:8]`) are inverted after summation.
The column of bits at `(1 << (M-1))` can be summed together prior to compilation into one constant value of `(K << (M-1))`. Therefore, the final MAC output can be shown by:
```
                                                                 ~p0_0[4]  p0_0[3]  p0_0[2]  p0_0[1]  p0_0[0]
                                                                 ~p1_0[4]  p1_0[3]  p1_0[2]  p1_0[1]  p1_0[0]
                                                                 ~p2_0[4]  p2_0[3]  p2_0[2]  p2_0[1]  p2_0[0]
                                                                 ~p3_0[4]  p3_0[3]  p3_0[2]  p3_0[1]  p3_0[0]
                                                                 ~p4_0[4]  p4_0[3]  p4_0[2]  p4_0[1]  p4_0[0]
                                                        ~p0_1[4]  p0_1[3]  p0_1[2]  p0_1[1]  p0_1[0]
                                                        ~p1_1[4]  p1_1[3]  p1_1[2]  p1_1[1]  p1_1[0]
                                                        ~p2_1[4]  p2_1[3]  p2_1[2]  p2_1[1]  p2_1[0]
                                                        ~p3_1[4]  p3_1[3]  p3_1[2]  p3_1[1]  p3_1[0]
                                                        ~p4_1[4]  p4_1[3]  p4_1[2]  p4_1[1]  p4_1[0]
                                               ~p0_2[4]  p0_2[3]  p0_2[2]  p0_2[1]  p0_2[0]
                                               ~p1_2[4]  p1_2[3]  p1_2[2]  p1_2[1]  p1_2[0]
                                               ~p2_2[4]  p2_2[3]  p2_2[2]  p2_2[1]  p2_2[0]
                                               ~p3_2[4]  p3_2[3]  p3_2[2]  p3_2[1]  p3_2[0]
                                               ~p4_2[4]  p4_2[3]  p4_2[2]  p4_2[1]  p4_2[0]
                                       p0_3[4] ~p0_3[3] ~p0_3[2] ~p0_3[1] ~p0_3[0]
                                       p1_3[4] ~p1_3[3] ~p1_3[2] ~p1_3[1] ~p1_3[0]
                                       p2_3[4] ~p2_3[3] ~p2_3[2] ~p2_3[1] ~p2_3[0]
                                       p3_3[4] ~p3_3[3] ~p3_3[2] ~p3_3[1] ~p3_3[0]
                                       p4_3[4] ~p4_3[3] ~p4_3[2] ~p4_3[1] ~p4_3[0]
+  0        0        0        0        0        0        1        0        1        0        0        0
--------------------------------------------------------------------------------------------------------------
  ~P[11]   ~P[10]   ~P[9]    ~P[8]     P[7]     P[6]     P[5]     P[4]     P[3]     P[2]     P[1]     P[0]
```
This is the final form used in the MAC unit.
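The folding of the `(1 << (M-1))` column into a single constant can be checked with a few lines of Python. This is only an illustrative sketch: it assumes `K = 5` products and `M = 4` (so the column sits at bit 3, as in the arrays above), and the function name `folded_constant` is hypothetical.

```python
def folded_constant(k, m):
    """Sum k copies of the single bit (1 << (m - 1)) into one constant."""
    return k << (m - 1)


K, M = 5, 4  # assumed values for the 5x4x5 example

# Adding the bit K times gives the same value as one shifted constant.
total = sum(1 << (M - 1) for _ in range(K))
assert total == folded_constant(K, M)

# 12-bit view of the constant row
print(format(folded_constant(K, M), "012b"))  # prints 000000101000
```

The printed pattern has ones at bits 5 and 3, matching the single constant row in the final array.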