Update mac_5x9x8 Revised authored by gling's avatar gling
We have defined a KxMxN MAC unit as a device which takes K signed M-bit numbers ($a_0, a_1, ..., a_{K-1}$) and K signed N-bit numbers ($b_0, b_1, ..., b_{K-1}$), and computes
$$a_0*b_0 + a_1*b_1 + ... + a_{K-1}*b_{K-1}$$
We have defined a KxMxN MAC unit as a device which takes K signed M-bit numbers ($a_0, a_1, ..., a_{K-1}$) and K signed N-bit numbers ($b_0, b_1, ..., b_{K-1}$), and computes $$a_0 \ast b_0 + a_1 \ast b_1 + ... + a_{K-1} \ast b_{K-1}$$
For making an KxMxN MAC unit, the number of output bits will be
$$\lceil\log_2(K * 2^M * 2^N)\rceil = \lceil\log_2 K\rceil + M + N$$
......@@ -10,84 +9,25 @@ First, to perform a signed multiplication, we use the modified Baugh-Wooley form
\url{https://web.ece.ucsb.edu/~parhami/pres_folder/f31-book-arith-pres-pt3.pdf}. A more clear representation of this method is shown at \url{https://en.wikipedia.org/wiki/Binary_multiplier#Signed_integers} with the exception of a missing 1 in the MSB of the last partial sum.
Using an example calculation with a 4-bit by 5-bit signed multiply with multiplicands `a` and `b`:
\begin{verbatim}
```
p0[4:0] = a[4:0] * b[0]
p1[4:0] = a[4:0] * b[1]
p2[4:0] = a[4:0] * b[2]
p3[4:0] = a[4:0] * b[3]
~p0[4] + p0[3] + p0[2] + p0[1] + p0[0]
~p0[4] + p0[3] + p0[2] + p0[1] + p0[0]
~p0[4] + p0[3] + p0[2] + p0[1] + p0[0]
~p0[4] + p0[3] + p0[2] + p0[1] + p0[0]
-----------------------------------------------------------------------
\end{verbatim}
We have defined a 5x9x8 MAC unit as a device which takes 5 signed 9-bit numbers (a0, ..., a4) and 5 signed 8-bit numbers (b0, ..., b4), and computes a0*b0 + a1*b1 + a2*b2 + a3*b3 + a4*b4.
~p0[4] p0[3] p0[2] p0[1] p0[0]
~p0[4] p0[3] p0[2] p0[1] p0[0]
~p0[4] p0[3] p0[2] p0[1] p0[0]
p0[4] ~p0[3] ~p0[2] ~p0[1] ~p0[0]
+ 1 0 0 0 0 1 0 0 0
-------------------------------------------------------------------------------
P[8] P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]
```
For making an KxMxN MAC unit, the number of output bits will be
ceil(log2(K * 2^M * 2^N)) = ceil(log2(K)) + M + N
For a 5x9x8 MAC unit, ceil(log2(5)) + 8 + 9 = 20 bits.
First, to perform a signed multiplication, the modified Baugh-Wooley form on slide 64 of
https://web.ece.ucsb.edu/~parhami/pres_folder/f31-book-arith-pres-pt3.pdf was used. A more clear representation of this method is shown at https://en.wikipedia.org/wiki/Binary_multiplier#Signed_integers with the exception of a missing 1 in the MSB of the last partial sum.
A corrected form for a 9-bit input `a` and an 8-bit input `b`:
```
p0[8:0] = a[8:0] * b[0]
p1[8:0] = a[8:0] * b[1]
p2[8:0] = a[8:0] * b[2]
p3[8:0] = a[8:0] * b[3]
p4[8:0] = a[8:0] * b[4]
p5[8:0] = a[8:0] * b[5]
p6[8:0] = a[8:0] * b[6]
p7[8:0] = a[8:0] * b[7]
~p0[7] p0[6] p0[5] p0[4] p0[3] p0[2] p0[1] p0[0]
~p1[7] +p1[6] +p1[5] +p1[4] +p1[3] +p1[2] +p1[1] +p1[0] 0
~p2[7] +p2[6] +p2[5] +p2[4] +p2[3] +p2[2] +p2[1] +p2[0] 0 0
~p3[7] +p3[6] +p3[5] +p3[4] +p3[3] +p3[2] +p3[1] +p3[0] 0 0 0
~p4[7] +p4[6] +p4[5] +p4[4] +p4[3] +p4[2] +p4[1] +p4[0] 0 0 0 0
~p5[7] +p5[6] +p5[5] +p5[4] +p5[3] +p5[2] +p5[1] +p5[0] 0 0 0 0 0
~p6[7] +p6[6] +p6[5] +p6[4] +p6[3] +p6[2] +p6[1] +p6[0] 0 0 0 0 0 0
+p7[7] ~p7[6] ~p7[5] ~p7[4] ~p7[3] ~p7[2] ~p7[1] ~p7[0] 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------
P[15] P[14] P[13] P[12] P[11] P[10] P[9] P[8] P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]
```
*
* For making a 5x8x9 MAC unit, the number of output bits will be
* log2(2^8 * 2^9 * 5) = 8 + 9 + log2(5) = 19.3219280949 => 20 bits
*
* This works by taking the partial sums of the 5 multiplies, adding the 5
* parital sums of each place value together first, then summing the 8 partial
* sums together last.
*
* This is derived from the Baugh-Wooley method for signed multiplication, but
* modified to support summing 5 items without finishing the multiplies. First,
* XOR every initial partial product by (1 << 8), in other words invert the sign
* bits. Invert the bits last partial product corresponding to the sign bit of
* the second multiplicand.
*
* Now, to complete the Baugh-Wooley method for a single multiply, we need to
* add (1 << 15) (one in the sign bit of the output) which effectively is
* inverting the sign bit of the final output. And add (1 << 9), a bit in the
* position one place value greater than the sign bit of the first multiplicand.
*
* Now, when summing these partial sums together, they need to be sign-extended.
* Unfortunately, due to the array adder outputting two partial sums, the sign
* bit cannot easily be determined, so it would be better if we can modify the
* Baugh-Wooley method to produce a 20-bit output directly and sum those
* instead.
*
* It ends up that simply adding a set of 4 bits in