# Real-time FPGA Implementation of Efficient Filter-Banks for Digitally Sub-banded Coherent DFT-S OFDM Receiver

A. Tolmachev, M. Orbach, M. Meltsin, R. Hilgendorf, T. Birk and M. Nazarathy<sup>\*</sup>

Electrical Engineering Department, Technion, Israel Institute of Technology, Haifa 32000 Israel

\*nazarat@ee.technion.ac.il

**Abstract:** We demonstrate a real-time FPGA realization of dual-polarization filter banks. This high-speed, low-complexity 2x25 GBd core establishes the feasibility of an energy-efficient hardware (HW) architecture (multiplier count roughly halved) for a digitally sub-banded 170 Gb/s DFT-S OFDM receiver.

**OCIS codes:** (060.1660) Coherent comm.; (060.2330) Fiber optic comm.; Filter-banks; sub-band; OFDM; DFT-spread.

### 1. Introduction

Coherent optical transmission systems aim to provide spectrally efficient, high-capacity communication, enabled by wideband optical super-channels and sophisticated signal processing. Currently, the real-time computations required at the enormous data throughput per hardware (HW) processor pose a major implementation bottleneck. To date, no real-time *receiver* (Rx) with full functionality has been demonstrated on an FPGA at tens of GBd (>100 Gb/s). Two options have been considered to "divide and conquer" the prohibitive computational load: (i) conventional time-domain HW parallelization, partitioning the sampled data stream into blocks that are processed in parallel by multiple DSP sub-processors, e.g. [1]; (ii) optical decomposition into multiple sub-bands, using a separate, slower receiver for each sub-band [2]. The main drawbacks of these methods are latency, data routing, digital HW complexity and lack of scalability. A recent third alternative has been gathering momentum [3-7]: *digital sub-banding*, namely frequency-domain (FD) digital demultiplexing, which parallelizes the processing in the FD over multiple sub-bands digitally carved out of the channel spectrum. Here we report a first real-time implementation of a low-complexity *filter-bank* (FB) pair in a single FPGA at an ultra-fast rate, supporting 2x25 GBd (170 Gb/s) 16-QAM OFDM in 25 GHz channels. The FB is realized in real time, whereas the rest of the optical transmission link is simulated offline, feeding the input/output memories of the real-time FB FPGA.

### 2. Digital sub-banding

The recent *Multi Sub-Band* (MSB) OFDM Rx concept [3-7] is reviewed in Fig. 1. The ADCs operate at 26.6 GS/s. The transmission format is *DFT-spread* (DFT-S) OFDM [8] with 960 tones ("tone" = OFDM sub-carrier), partitioned into 15 DFT-S sub-bands of 64 tones each. The large tone count provides high spectral efficiency, whereas DFT spreading reduces the *Peak-to-Average Power Ratio* (PAPR) and nonlinear (NL) impairments. Over 15 parallel paths, the filter bank digitally extracts 1.66 GHz slices out of the 25 GHz input bandwidth (BW) and down-samples each by a factor of 8, generating 15 twice-oversampled sub-bands at 3.33 GS/s, each carrying a "mini" DFT-S OFDM signal with 64 sub-carriers. The low sub-band sampling rate dramatically reduces the HW complexity of the subsequent processing, performed by an array of 15 slow sub-band OFDM Rx-s running DSP algorithms that function much better in the narrow-band spectral environment. OFDM Rx-s at relatively slow rates have already been demonstrated experimentally [1,2], and the sub-band Rx-s required here are even slower. The critical challenge remains the design of efficient digital filter-bank structures that process the full throughput of the X and Y *polarizations* (POL), partitioning it into 15 x 2 data sub-streams, one per sub-band and POL. The overhead of dividing into sub-bands must be kept low.
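The sampling-rate bookkeeping above can be cross-checked in a few lines. This sketch uses only numbers stated in the paper (the M = 16 port count appears in Secs. 3-4) and assumes a uniform filter bank whose channel spacing equals the ADC rate divided by M; the variable names are illustrative only.

```python
# Sub-band bookkeeping for the MSB front-end (numbers taken from the text)
ADC_RATE_GSPS = 26.6   # ADC sampling rate per POL (GS/s)
TONES = 960            # OFDM tones in the full channel
SUB_BANDS = 15         # DFT-S sub-bands actually carried
M = 16                 # FB output ports (BW reduction factor)
D = M // 2             # decimation factor: 8, i.e. twice oversampling

tones_per_sub_band = TONES // SUB_BANDS           # 64 tones per sub-band
slice_bw_ghz = ADC_RATE_GSPS / M                  # assumed FB channel spacing, ~1.66 GHz
sub_band_rate_gsps = ADC_RATE_GSPS / D            # per-port sample rate, ~3.33 GS/s
oversampling = sub_band_rate_gsps / slice_bw_ghz  # 2.0
occupied_bw_ghz = SUB_BANDS * slice_bw_ghz        # ~24.9 GHz, i.e. the 25 GHz channel
streams = 2 * SUB_BANDS                           # 30 sub-streams over X and Y POLs

print(f"{tones_per_sub_band} tones/sub-band, {slice_bw_ghz:.4f} GHz slices, "
      f"{sub_band_rate_gsps:.3f} GS/s per port, oversampling x{oversampling:.1f}, "
      f"{occupied_bw_ghz:.2f} GHz occupied, {streams} sub-streams")
```

Note that 15 slices of 26.6/16 GHz cover about 24.9 GHz, which is why 15 of the 16 FB ports suffice for the 25 GHz channel.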

### 3. Filter Bank basic structure

Figure 1 presents the sub-banded Rx front-end for each POL, including the efficient oversampled filter bank. To achieve twice-oversampling, we modify the classic critically sampled analysis FB structure described in most DSP textbooks. For an M=16 BW reduction, the novel 2x under-decimated FB (Fig. 2) comprises M/2=8 Single-Input Dual-Output (SIDO) filters instead of 16 Single-Input Single-Output (SISO) filters. The polyphase filter array is derived from a 48-tap low-pass prototype filter, so each SIDO polyphase FIR filter requires just 48/16 = 3 taps per output.
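The SIDO idea can be illustrated with a minimal NumPy sketch of a generic 2x-oversampled uniform DFT analysis filter bank (M = 16, decimation D = 8). This is not the authors' Fig. 2 structure, only one consistent reading of it: splitting the 48 prototype taps into M polyphase components of 3 taps each, input phase r (r = 0..7) feeds both component r and component r + 8 (the latter with one extra decimated-sample delay), so 8 single-input filters produce 16 outputs, which an inverse DFT then combines into the 16 sub-bands. The windowed-sinc prototype and all function names are hypothetical; the result is checked against a brute-force modulate, filter and decimate reference.

```python
import numpy as np

M = 16        # number of FB channels (BW reduction factor)
D = M // 2    # decimation factor 8 -> twice-oversampled outputs
L = 48        # prototype length -> L // M = 3 taps per polyphase output

def prototype(L=L, M=M):
    # Hypothetical windowed-sinc low-pass prototype (cutoff ~ pi/M); the paper's 48-tap design is not given
    n = np.arange(L) - (L - 1) / 2
    return np.sinc(n / M) * np.hamming(L) / M

def analysis_direct(x, h):
    """Reference: band k = filter with h modulated to k*fs/M, then keep every D-th sample."""
    taps = np.arange(len(h))
    bands = []
    for k in range(M):
        hk = h * np.exp(2j * np.pi * k * taps / M)
        bands.append(np.convolve(x, hk)[:len(x)][::D])
    return np.array(bands)

def delay(u, d):
    """Delay sequence u by d samples, zero-filling the start."""
    d = min(d, len(u))
    return np.concatenate([np.zeros(d, dtype=u.dtype), u[:len(u) - d]])

def analysis_sido(x, h):
    """Polyphase form: 8 single-input dual-output filters (3 taps per output) + 16-point IDFT."""
    x = np.asarray(x, dtype=complex)
    P = len(h) // M                      # taps per output (3 for L=48, M=16)
    n_out = -(-len(x) // D)              # ceil(len(x) / D), same length as the reference
    m = np.arange(n_out)
    v = np.zeros((M, n_out), dtype=complex)
    for r in range(D):
        idx = m * D - r                  # r-th input phase, decimated by D
        u = np.where(idx >= 0, x[np.maximum(idx, 0)], 0)
        for p in range(P):
            v[r] += h[p * M + r] * delay(u, 2 * p)              # first output of filter r
            v[r + D] += h[p * M + r + D] * delay(u, 2 * p + 1)  # second output of filter r
    # Combine branches: y_k[m] = sum_r v_r[m] * exp(+j*2*pi*k*r/M)
    return M * np.fft.ifft(v, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096) + 1j * rng.standard_normal(4096)   # complex IQ test input
h = prototype()
Y = analysis_sido(x, h)
print(Y.shape, np.max(np.abs(Y - analysis_direct(x, h))))
```

Per block of D input samples, the polyphase form spends 16 x 3 = 48 tap multiplications plus one 16-point IDFT, whereas the direct reference spends 16 x 48; the sketch makes no claim about the exact multiplier savings of the authors' design.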



Fig. 1: Basic MSB DFT-S OFDM receiver architecture, showing the two filter banks (one per polarization) and one of 15 sub-band receivers, fed by the corresponding sub-bands of the X and Y polarizations.



Fig. 2: Efficient DSP realization of twice-oversampled filter bank implementation, first disclosed in [4].

### 4. HW implementation of the filter bank

We aim to receive IQ samples at 26.6 GS/s over each of the two POLs. Current FPGA-based receivers are unable to process such high-speed throughput. To address this limitation, we parallelize the high-rate processing in frequency by means of the filter-bank pair (one for X-POL and one for Y-POL). Each of the 16 FB output ports is under-decimated to 26.6 GS/s / 8 = 3.33 GS/s (2x oversampled 1.66 GHz sub-bands). This clock rate is attainable in state-of-the-art ASICs, but an FPGA implementation requires additional slowdown. Table 1 evaluates the HW resources on Xilinx Virtex-6 family devices for various implementation options. The first row corresponds to the single-instance design described above; subsequent rows show design options with multiple slowed-down filter-bank instances operating in parallel. The highlighted cells describe feasible design options for the selected technology. Fig. 3 presents our FPGA HW implementation, comprising 8 block-parallel, slowed-down filter-bank instances, each running at a 416 MHz clock.

Table 1: FPGA hardware resources evaluation (48-tap prototype filter)

| Banks | Operating freq. (MHz) | DSP slices | DSP % of 240T | DSP % of 380T | RAM | RAM % of 240T | RAM % of 380T |
|-------|-----------------------|------------|---------------|---------------|-----|---------------|---------------|
| 1     | 3325                  | 24         | 3.1           | 2.8           | 32  | 3.8           | 2.1           |
| 2     | 1663                  | 48         | 6.3           | 5.5           | 64  | 7.7           | 4.2           |
| 4     | 832                   | 96         | 12.5          | 11.1          | 128 | 15.4          | 8.3           |
| 6     | 555                   | 144        | 18.8          | 16.6          | 192 | 23.1          | 12.5          |
| 8     | 416                   | 192        | 25.0          | 22.2          | 256 | 30.8          | 14.0          |
| 10    | 333                   | 240        | 31.3          | 27.8          | 320 | 38.5          | 17.5          |
| 12    | 278                   | 288        | 37.5          | 33.3          | 384 | 46.6          | 25.0          |
| 14    | 238                   | 336        | 43.8          | 38.8          | 448 | 53.8          | 29.2          |
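The clock and DSP-slice columns of Table 1 follow from simple scaling of the first row, which the sketch below reproduces. It assumes each block-parallel instance handles 1/banks of the 3325 MHz sample stream and costs the 24 DSP slices of the first row; the DSP48E1 capacities (768 for XC6VLX240T, 864 for XC6VHX380T) are our assumption about the two devices named in the table, and RAM usage is not modeled. The `fewest_banks` helper and its `fmax_mhz` argument are hypothetical, standing in for whatever clock a given FPGA design actually closes timing at.

```python
import math

BASE_RATE_MHZ = 3325   # single-instance clock: 26.6 GS/s / 8 (Table 1, first row)
DSP_PER_BANK = 24      # DSP slices per filter-bank instance (Table 1, first row)
# Assumed DSP48E1 capacities of the two Virtex-6 devices in Table 1
DSP_CAPACITY = {"240T": 768, "380T": 864}
OPTIONS = (1, 2, 4, 6, 8, 10, 12, 14)

def bank_option(banks):
    """Clock (MHz), DSP slices and DSP utilization (%) for `banks` parallel FB instances."""
    clock_mhz = math.ceil(BASE_RATE_MHZ / banks)   # each instance handles 1/banks of the samples
    dsp = DSP_PER_BANK * banks
    pct = {dev: 100 * dsp / cap for dev, cap in DSP_CAPACITY.items()}
    return clock_mhz, dsp, pct

def fewest_banks(fmax_mhz, options=OPTIONS):
    """Smallest Table 1 option whose per-instance clock fits a hypothetical fmax."""
    for b in options:
        if bank_option(b)[0] <= fmax_mhz:
            return b
    return None

for b in OPTIONS:
    clk, dsp, pct = bank_option(b)
    print(f"{b:2d} banks: {clk:4d} MHz, {dsp:3d} DSP slices, "
          f"{pct['240T']:.1f}% of 240T, {pct['380T']:.1f}% of 380T")
```

For example, with a hypothetical 450 MHz ceiling `fewest_banks(450)` selects the 8-bank, 416 MHz option used in Fig. 3.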



Fig. 3: Filter Bank implementation over the Virtex6 240T FPGA.

### 5. Real-Time Implementation

The filter-bank system design presented above was implemented and tested in real time at 2x25 GBd using the setup of Fig. 4. Filter-bank frequency-domain demultiplexing for the X and Y polarizations was performed at full speed using two Xilinx ML605 boards, each hosting a Virtex-6 240T FPGA.

Full-band OFDM Tx and optical fiber channel propagation were simulated offline. The resulting receiver input samples were transferred to the Xilinx ML605 evaluation boards over a GbE interface and stored in on-chip memory. The FPGA filter banks operated in real time at full speed (2x26.6 GS/s), but in intermittent bursts, as the input/output on-chip memories were accessed by the offline PC. The filter-bank outputs were read back over the GbE interface and fed to the PC, which performed the sub-band Rx processing. Fig. 5 compares end-to-end, fully offline reference simulations with data obtained from the hybrid real-time/offline setup of Fig. 4, comprising real-time FPGA filter banks plus offline Tx, channel and sub-band Rx-s. Specifically, Fig. 5 compares received QPSK constellations and inferred Q factors for real-time FPGA-based vs. Matlab/Simulink-simulated filter banks, and for floating- vs. fixed-point filter-bank realizations. In all cases the penalties due to the FPGA implementation with a finite number of bits are negligible (Q = 18.7 dB throughout), indicating that the 16-bit FPGA precision is amply sufficient and that performance is dominated by the quantization noise of the 5-bit-ENOB ADC assumed in the offline simulation. The system was also tested with 16-QAM in the hybrid offline + real-time FB setup, with similar results, to be detailed in the talk.
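Why 16-bit arithmetic is ample next to a 5-bit ADC can be seen from the ideal uniform-quantizer rule of thumb, SQNR ≈ 6.02b + 1.76 dB for a full-scale sine. The sketch below is a hypothetical illustration, not the paper's fixed-point model: it quantizes a near-full-scale test tone to 5 and 16 bits and reports roughly 32 dB and 98 dB, so rounding noise at 16 bits sits some 66 dB below the 5-bit noise floor. A real FPGA datapath accumulates several rounding steps, but each contributes noise at the 16-bit level.

```python
import numpy as np

def quantize(x, bits, full_scale=1.0):
    """Ideal uniform mid-rise quantizer on [-full_scale, full_scale], with clipping."""
    step = 2 * full_scale / 2**bits
    q = (np.floor(x / step) + 0.5) * step
    return np.clip(q, -full_scale + step / 2, full_scale - step / 2)

def sqnr_db(x, bits):
    """Signal-to-quantization-noise ratio of quantize(x, bits), in dB."""
    err = quantize(x, bits) - x
    return 10 * np.log10(np.mean(x**2) / np.mean(err**2))

n = np.arange(100_000)
x = 0.99 * np.sin(2 * np.pi * 0.1234567 * n)   # near-full-scale test tone

for bits in (5, 16):
    print(f"{bits:2d} bits: SQNR = {sqnr_db(x, bits):.1f} dB "
          f"(rule of thumb {6.02 * bits + 1.76:.1f} dB)")
```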

### 6. Discussion and Conclusions

We demonstrated the recent digitally sub-banded, filter-bank-based architecture [3-7] in real time on available FPGAs. The most demanding HW sub-system of the proposed MSB DFT-S OFDM receiver, namely the dual-POL filter-bank core, is thus established to be amenable to real-time FPGA realization. The FB-based HW architecture enables removal of the heavy conventional CD and PMD equalizers. Heretofore, the filter bank has been the riskiest sub-system; with its real-time demonstration here, the way is now open towards realizing a complete 16-QAM Rx at tens of GBd per polarization. What remains is to augment the FB system with an array of slow sub-band OFDM Rx-s fed from the filter-bank output ports. Here we demonstrated the critical filter-bank core: the high-speed bottleneck through which the entire 2x25 GBd Rx throughput flows before being partitioned into 2x15 slower data streams, to be processed by the 15 simple, frequency-flat sub-band Rx-s. In the current demonstration the sub-band receivers were simulated offline, whereas the filter banks operated in real time. However, the extrapolation is evident: the slow, simple and robust sub-band OFDM receivers could readily be realized at 3.33 GS/s. As an indication, generic OFDM receivers at somewhat higher rates of several GBd have already been implemented in real-time HW [1,2,9].

*Acknowledgement:* This work was supported by the Chief Scientist Office of the Israeli Ministry of Industry, Trade and Labor within the 'Tera Santa' consortium.







Fig. 5: Simulation-only vs. real-time + simulation results. (left): 5-bit ADC, floating-point simulation, Q = 18.7 dB. (mid): 5-bit ADC, fixed-point simulation, Q = 18.7 dB. (right): 5-bit ADC, FPGA-processed results, Q = 18.7 dB.

### 7. References

[1] Q. Yang et al., "Towards real-time implementation of optical OFDM transmission," in OFC/NFOEC'10, OMS6.

[2] N. Kaneda et al., "Real-Time 2.5 GS/s Coherent Optical Receiver for 53.3-Gb/s Sub-Banded OFDM," JLT 28, 494 (2010).

[3] A. Tolmachev and M. Nazarathy, "Filter-bank based efficient transmission of reduced guard interval OFDM," Opt. Express 19, B370 (2011).

[4] M. Nazarathy and A. Tolmachev, "Digital sub-banding – a signal processing architecture... improving OFDM...," invited, in SPPCOM'12.

[5] A. Tolmachev et al., "Oversampled Digital Filter Banks Simplify and Improve ... Processing in RGI OFDM Receivers," in OFC'12, OM2H.3.

[6] M. Nazarathy and A. Tolmachev, "Sub-banding DSP for flexible optical transceivers," invited, in ICTON'12.

[7] M. Nazarathy and A. Tolmachev, "Filter-bank based digital sub-banding ASIC architecture ...," invited, in SPIE OPTO (Photon. West, 2013).

[8] X. Chen et al., "Experimental demonstration of improved fiber nonlinearity tolerance ... DFT-spread OFDM systems," Opt. Express 19 (2011).

[9] R. Killey et al., "Recent Progress on Real-Time DSP for Direct Detection Optical OFDM Transceivers," in OFC 2011.