



# Implementation Evaluation of Fixed-Point Multipliers for Complex Numbers

#### Per Larsson-Edefors and Erik Börjeson

Chalmers University of Technology Gothenburg, Sweden

perla@chalmers.se



# Methodology for "Implementation Evaluation"









Source: Real-Time Transmission over 2x55km All 7-Core Coupled-Core Multi-Core Fiber Link, 2022.

- FPGA: + real-time, coarse structure.
- ASIC: complex, + gate level.
- Compromise: ASIC netlist analysis in ASAP7.

Source: Nathan Godwin

# Fixed-Point vs Floating-Point Multipliers



Full-precision products  $\Rightarrow$  data wordlength growth: products need to be truncated (e.g., by keeping only the MSBs).

- Fixed-point circuits are substantially faster than floating point.
- At the same timing constraint, fixed-point circuits require less area.
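The wordlength growth noted above can be sketched as follows, assuming two's-complement operands: the full-precision product of two W-bit words needs 2W bits, and keeping only the W most significant bits restores the input wordlength. The function names and the 8-bit operand width are illustrative assumptions, not taken from the slides.

```python
def full_product_bits(w_a: int, w_b: int) -> int:
    """Wordlength of the full-precision product of two signed operands."""
    return w_a + w_b


def truncate_to_msbs(product: int, w_full: int, w_out: int) -> int:
    """Keep the w_out most significant bits of a w_full-bit signed product.

    Dropping LSBs with an arithmetic right shift rounds toward minus infinity.
    """
    return product >> (w_full - w_out)


W = 8                      # assumed operand wordlength
a, b = 93, -57             # both fit in 8-bit two's complement
p = a * b                  # full-precision product, fits in 16 bits
p_trunc = truncate_to_msbs(p, full_product_bits(W, W), W)
```

Truncation by shifting is the cheapest option in hardware because it simply drops wires. Rounding would need an extra addition.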

#### **Fixed-Point Circuits for Digital Fronthauls**



Source: Which ADC Architecture Is Right for Your Application?, 2005

- Fixed-point arithmetic suits applications with high clock rates and strict power budgets.
- Receivers in communication systems: DSP circuits interface to ADCs with high sampling rates (50-200 Gsamples/s).
- Additionally, these ADCs have low resolution (6-12 bits) ...

#### Lower Resolution, Shorter Wordlengths



The speed and area advantages of fixed-point circuits appear to increase as the wordlength gets shorter.

#### **Fixed-Point Custom DSP ASIC Implementation**



# **Complex Multipliers: Motivating Example**

#### DSP for a 400-Gbit/s coherent fiber-optic receiver



It turns out that ...

1) the equalizer dominates receiver area and power, and

2) complex multipliers dominate the equalizer implementation.

#### **Direct Complex Multiplication**



<u>Direct complex multiplication:</u>

$Z_r = A_r B_r - A_i B_i$

$Z_i = A_r B_i + A_i B_r$
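As a minimal sketch, the direct form maps to four real multiplications and two additions. The function name `direct_cm` and the integer test operands are illustrative assumptions. Python's built-in `complex` type serves only as a cross-check.

```python
def direct_cm(ar: int, ai: int, br: int, bi: int) -> tuple[int, int]:
    """Direct complex multiplication: four real multiplications, two additions."""
    zr = ar * br - ai * bi
    zi = ar * bi + ai * br
    return zr, zi


# Cross-check against Python's built-in complex type.
zr, zi = direct_cm(3, -2, 5, 7)
ref = complex(3, -2) * complex(5, 7)
```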

### Reducing Multiplication Count $4 \rightarrow 3$



How does a reduction in multiplication complexity affect circuit implementations?

(Noise analysis available in Wenzler et al., ISCAS'95)
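The slide figure with the exact Wenzler and Golub structures is not reproduced in this text. The sketch below therefore shows two well-known rearrangements that trade one multiplication for three extra additions, without claiming which one corresponds to which name. The function names are hypothetical. The check confirms that both forms agree with the direct form over a small operand range.

```python
import itertools


def direct_cm(ar, ai, br, bi):
    """Reference: four multiplications, two additions."""
    return ar * br - ai * bi, ar * bi + ai * br


def three_mult_sum_form(ar, ai, br, bi):
    """Three multiplications, five additions (product-of-sums form)."""
    m1 = ar * br
    m2 = ai * bi
    m3 = (ar + ai) * (br + bi)
    return m1 - m2, m3 - m1 - m2


def three_mult_shared_form(ar, ai, br, bi):
    """Three multiplications, five additions (shared-product form)."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2


# Exhaustive equivalence check over small signed operands.
values = range(-8, 8)
all_equal = all(
    direct_cm(*v) == three_mult_sum_form(*v) == three_mult_shared_form(*v)
    for v in itertools.product(values, repeat=4)
)
```

Note that the pre-additions widen some multiplier inputs by one bit, so the three multipliers are not identical in size to the four they replace.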

- HDL descriptions for direct, Wenzler, and Golub.
- HDL parameterized w.r.t. wordlengths.
- Timing-driven synthesis in Cadence Genus using the ASAP7 library $\rightarrow$ area numbers for different timing constraints.
- Note: additional runs have been done with commercial libraries to validate ASAP7.

#### Area Evaluations, 1

Compare the three CMs to the baseline: an integer (real) multiplier (RM).



#### Area Evaluations, 2



The direct CM remains faster for shorter wordlengths, but its area disadvantage is not as pronounced.

# **Energy Evaluation Flow**

- Several distinct design phases and not-so-well-integrated EDA point tools.
- Challenging to combine information on system workload with physical layout.

$P_{sw} = f \, V_{dd}^2 \sum_i (C_i \alpha_i)$

| Example input vectors |
|-----------------------|
| 00011110              |
| 00010111              |
| 00000001              |
| 10001111              |
| 10010101              |
| 01110010              |
| 00111000              |

- Simulation-based energy analysis:
  - Generate input vectors.
  - Back-annotate switching activity from Cadence Xcelium simulation onto the netlists
    → energy per operation.
- Baseline vector set for A and B: Uniformly distributed random numbers.
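The switching-power expression on this slide can be sketched numerically as below. The capacitance and activity values are made-up placeholders, not results from the evaluation, and the helper names are hypothetical. Energy per operation is taken as power divided by clock frequency, assuming one operation per clock cycle.

```python
def switching_power(f_hz, vdd, caps_f, activities):
    """P_sw = f * Vdd^2 * sum(C_i * alpha_i) over all circuit nodes."""
    if len(caps_f) != len(activities):
        raise ValueError("need one activity factor per node capacitance")
    return f_hz * vdd**2 * sum(c * a for c, a in zip(caps_f, activities))


def energy_per_operation(p_sw, f_hz):
    """Energy per operation, assuming one operation per clock cycle."""
    return p_sw / f_hz


# Placeholder numbers for illustration only (not from the evaluation).
caps = [1e-15, 2e-15, 1e-15]   # node capacitances in farads
alphas = [0.5, 0.25, 1.0]      # switching activity per node and cycle
p = switching_power(1e9, 0.5, caps, alphas)
e = energy_per_operation(p, 1e9)
```

In the actual flow, the per-node activities come from back-annotated simulation rather than from hand-picked values.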

## **Energy per Operation Evaluations**

![](_page_13_Figure_1.jpeg)

Despite having a larger area than the other types, the direct CM is the most energy efficient. As timing is relaxed, the direct CM's energy/op increases, which is counterintuitive.

### **Glitching Power Dominates Total Power**

![](_page_14_Figure_1.jpeg)

- Glitching is an issue in arithmetic circuits extensively using XOR.
- Balancing the delay of reconverging logic paths reduces the number of glitches.

For tighter timing, the symmetric arrangement of operations in the direct CM makes for balanced logic paths.

For relaxed timing, clearly the effect of increased signal switching on $P_{sw}$ dominates that of decreasing capacitance.
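To illustrate why balanced path delays matter for XOR-heavy logic, the toy model below counts output transitions of an ideal, zero-delay XOR gate whose two inputs both rise, either at the same time step or skewed. The discrete time steps and function names are assumptions for illustration. This is not a model of the synthesized netlists.

```python
def step_signal(t_switch: int, n_steps: int) -> list[int]:
    """Signal that is 0 before t_switch and 1 from t_switch onward."""
    return [1 if t >= t_switch else 0 for t in range(n_steps)]


def count_transitions(signal: list[int]) -> int:
    """Number of value changes between consecutive time steps."""
    return sum(x != y for x, y in zip(signal, signal[1:]))


def xor_output_transitions(t_a: int, t_b: int, n_steps: int = 10) -> int:
    """Transitions at the output of an ideal XOR whose inputs rise at t_a, t_b."""
    a = step_signal(t_a, n_steps)
    b = step_signal(t_b, n_steps)
    y = [x ^ z for x, z in zip(a, b)]
    return count_transitions(y)


balanced = xor_output_transitions(t_a=3, t_b=3)    # inputs arrive together
unbalanced = xor_output_transitions(t_a=3, t_b=6)  # inputs arrive skewed
```

With simultaneous arrival the output never moves. With skewed arrival it pulses high and returns low, spending two transitions on a result that ends at the same value.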

# Impact of Data Vectors on Energy

- Use two different vector sets:
  - Baseline vector set.
  - Reduced-activity vector set: Half the switching activity and 1 bit less dynamic range.

1. Assign baseline set to A, the other to B.
2. Swap inputs.

![](_page_15_Figure_6.jpeg)

| Random   | Small dynamic range | Low switching activity |
|----------|---------------------|------------------------|
| 00011110 | 00001001            | 01010100               |
| 00010111 | 00001110            | 01010100               |
| 00000001 | 00001000            | 01010100               |
| 10001111 | 11111011            | 01010100               |
| 11100010 | 00001110            | 10100111               |
| 10010101 | 11111111            | 10100111               |
| 01110010 | 11111011            | 10100111               |
| 00111000 | 00001101            | 10100111               |
| 01111111 | 11111000            | 00101010               |
| 00001101 | 00000100            | 00101010               |
| 10111111 | 11111011            | 00101010               |
| 01101011 | 0000000             | 00101010               |

Note: signal properties have been somewhat exaggerated.
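The slides do not state how the reduced-activity set was generated. The sketch below is one assumed recipe: each value is held for two cycles, which roughly halves the switching activity, and values are drawn from a range one bit narrower. All function names are hypothetical. Switching activity is measured as the Hamming distance between consecutive two's-complement bit patterns.

```python
import random


def to_bits(value: int, width: int) -> int:
    """Two's-complement bit pattern of a signed value as an unsigned int."""
    return value & ((1 << width) - 1)


def total_toggles(vectors: list[int], width: int) -> int:
    """Sum of Hamming distances between consecutive vectors."""
    patterns = [to_bits(v, width) for v in vectors]
    return sum(bin(p ^ q).count("1") for p, q in zip(patterns, patterns[1:]))


def baseline_set(n: int, width: int, rng: random.Random) -> list[int]:
    """Uniformly distributed signed random numbers."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return [rng.randint(lo, hi) for _ in range(n)]


def reduced_activity_set(n: int, width: int, rng: random.Random) -> list[int]:
    """Assumed recipe: hold each value for two cycles, drop one bit of range."""
    half = baseline_set(n // 2, width - 1, rng)
    return [v for v in half for _ in range(2)]


WIDTH, N = 8, 1000
rng = random.Random(0)
base = baseline_set(N, WIDTH, rng)
reduced = reduced_activity_set(N, WIDTH, rng)
ratio = total_toggles(reduced, WIDTH) / total_toggles(base, WIDTH)
```

Under these assumptions `ratio` comes out close to one half. Feeding `base` to one operand and `reduced` to the other, then swapping them, mirrors the two pin assignments compared on the slide.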

## **Pin Assignment to Reduce Power**

![](_page_16_Figure_1.jpeg)

The impact of pin assignment is the greatest for direct and Golub CMs. For relaxed timing, optimal pin assignment significantly reduces glitching power.

#### Conclusion

- Anecdotal evidence suggests that fixed-point complex multipliers (CMs) are commonly constructed based on the direct form.
- Two alternative approaches to CM exist: Wenzler's and Golub's schemes.
- To the best of our knowledge, a comprehensive CM evaluation is missing.
- Our evaluations show that...
  - Wenzler generally is faster and more hardware frugal than Golub.
  - Golub is more energy efficient than Wenzler, because glitches are fewer.
  - the direct CM is faster and more energy efficient than other approaches.
  - the only design situation that may call for Wenzler/Golub is that of resource-constrained implementations where short delay is not a priority.