# Speeding up DSP applications using Reconfigurable Computing

Júlio C.M. Ruzicki, Eduardo V. Nicola, Luis J. Martins, Júlio C.B. Mattos

Centro de Desenvolvimento Tecnológico (CDTec), Universidade Federal de Pelotas (UFPEL), Caixa Postal 354 – 96.010-610 – Pelotas – RS – Brasil

{jcmruzicki, evnicola, lhjmartins, julius}@inf.ufpel.edu.br

Abstract- Over the last decades, the increasing density of integrated circuits has driven the evolution of processors, a trend that continued until the end of Moore's Law. Different types of processors, such as the well-known superscalar architectures, are used to exploit the parallelism available in applications. Digital Signal Processing (DSP) applications use Programmable Digital Signal Processors (PDSPs), whose features exploit this parallelism to increase performance while maintaining a good tradeoff in terms of power dissipation. However, device scaling is reaching physical limits and DSP applications are becoming more complex. This paper therefore proposes a reconfigurable system coupled to a processor in order to increase the performance of DSP applications while consuming as little power as possible. Performance results are presented for different scenarios that compare this processor coupled to a reconfigurable system with a GPP (General Purpose Processor) and commercial DSPs. Experimental results show that the proposed architecture saves power and speeds up the execution of applications.

Keywords- DSP, Reconfigurable Architectures, Dynamic Reconfiguration

#### I. INTRODUCTION

The increased density of processors has supported the use of DSP techniques in tasks involving multimedia, instrumentation and data communication. This increase in density also ensured the evolution of processors as predicted by Moore's Law. The main drawback of this approach is that it requires high voltages, which raises power consumption. Moreover, the evolution provided by this approach has reached its limit: the dissipation of the heat generated by the increased clock rate of digital devices [1].

PDSPs (Programmable Digital Signal Processors) offer the flexibility of GPPs (General Purpose Processors), covering a wide range of applications that use DSP. Furthermore, PDSPs offer the high performance, in speed and power consumption, of ASICs (Application-Specific Integrated Circuits) without their cost and design time. PDSPs are also preferred for DSP applications because their specialized architecture exploits the parallelism and data-intensive computation exhibited by these applications. In addition to the conventional units of a GPP, a PDSP offers other execution units for specific tasks, such as MAC (Multiply-Accumulate), FIR (Finite Impulse Response), FFT (Fast Fourier Transform), and memory access [2].

While the industry has not adopted new materials, the way forward is to research new methodologies that ensure the evolution of the technology. One research area that has grown is Reconfigurable Architectures. This field aims to increase processor performance without increasing the clock frequency by coupling a functional unit capable of executing portions of the program code directly in hardware, bypassing the conventional processor datapath and increasing execution speed with the lowest possible energy consumption. Many concepts and approaches have been developed to increase performance while keeping the area and power overhead as low as possible. Work on increasing software efficiency dates back two decades [3][4][5][6], and many works have been proposed to keep power consumption as low as possible while delivering good performance [5][7][8].

In this paper, we present the results obtained by applying the reconfiguration technique to a GPP and comparing it against two commercial PDSPs.

#### II. METHODOLOGY

The selected tool is VisualDSP++ because it works with many PDSPs from Analog Devices [9] and offers an offline simulator that provides the information needed to compare processor performance. The tool supports the following Analog Devices PDSP families: ADSP-21XX, Blackfin, SHARC and TigerSHARC. Two processors were chosen from these families for this work: the ADSP-BF504, from the Blackfin family, and the ADSP-21477, from the SHARC family.

The applications selected for this work are as follows: (1) Fast Fourier Transform (FFT), (2) Inverse Fast Fourier Transform (IFFT), (3) Finite Impulse Response (FIR), (4) Discrete Fourier Transform (DFT) and (5) Inverse Discrete Fourier Transform (IDFT). These applications were executed on the processors and then compared under the proposed scenarios: PDSPs vs. MIPS32 and PDSPs vs. MIPS32 with a reconfigurable unit (RU) coupled. Four models for the RU size, proposed in [10], were also used.
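To make the character of these kernels concrete, the following is a minimal Python sketch of a direct-form FIR filter and a naive DFT/IDFT pair. It is not the code used in the experiments; the function names `fir`, `dft` and `idft` are hypothetical and chosen only for illustration. The point is that each kernel reduces to a tight multiply-accumulate loop, which is the kind of regular, data-intensive code that PDSPs and reconfigurable units target.

```python
import cmath


def fir(samples, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def idft(spectrum):
    """Naive inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+2j*pi*k*n/N)."""
    n_points = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_points)
            for k in range(n_points)) / n_points
        for n in range(n_points)
    ]
```

An FFT computes the same result as `dft` with fewer operations; the naive form is shown here only because it exposes the multiply-accumulate structure most directly.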

The reconfigurable architecture was proposed in [10]. The blocks IF (instruction fetch), ID (instruction decode), EX (execution), MA (memory access) and WR (write register) represent the MIPS32 pipeline stages. The blocks BT (binary translator), RC (reconfiguration cache) and RU represent the units coupled to the processor and complete the proposed architecture.
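The text does not detail how these blocks interact. As a rough illustration only, the toy model below assumes that the BT stores a configuration in the RC the first time a sufficiently long code block runs on the pipeline, and that later executions starting at the same address are served by the RU. The class `ReconfigurationCache`, the function `run`, the `min_len` threshold and the trace format are all hypothetical and are not taken from [10].

```python
class ReconfigurationCache:
    """Toy RC: maps a block start address (PC) to a stored configuration."""

    def __init__(self):
        self._configs = {}

    def lookup(self, pc):
        return self._configs.get(pc)

    def store(self, pc, config):
        self._configs[pc] = config


def run(trace, rc, min_len=2):
    """Replay a trace of (pc, block_len) pairs in execution order.

    Returns (blocks run on the pipeline, blocks run on the RU).
    min_len is an assumed threshold below which the BT does not
    bother to generate a configuration.
    """
    on_pipeline = 0
    on_ru = 0
    for pc, block_len in trace:
        if rc.lookup(pc) is not None:
            on_ru += 1  # RC hit: the RU executes the stored configuration
        else:
            on_pipeline += 1  # RC miss: normal IF/ID/EX/MA/WR execution
            if block_len >= min_len:
                # The BT translates the block and saves it for next time.
                rc.store(pc, {"length": block_len})
    return on_pipeline, on_ru


if __name__ == "__main__":
    trace = [(0x100, 4), (0x100, 4), (0x200, 1), (0x200, 1), (0x100, 4)]
    print(run(trace, ReconfigurationCache()))
```

In this model, only code that repeats benefits from the RU, which suggests why loop-dominated DSP kernels are a natural fit for such an architecture.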

#### III. RESULTS AND DISCUSSION

The execution times obtained by simulating applications 1 to 3 are shown in Fig. 1a and those for applications 4 and 5 in Fig. 1b, all considering model 1 for the RU size. This model presented the best tradeoff among the proposed models. The execution times were obtained using the maximum clock frequency allowed by each processor: 190 MHz for the MIPS32, 400 MHz for the Blackfin and 200 MHz for the SHARC. For the MIPS32, an IPC (instructions per cycle) of 1 was assumed; for the other two processors, the values recommended in the VisualDSP++ guide were used.
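The conversion from simulated cycle counts to execution time can be sketched as follows. The clock frequencies are the ones quoted above; the instruction count in the usage example is a made-up placeholder, not a measured value, and the helper names are hypothetical.

```python
# Maximum clock frequencies quoted in the text, in Hz.
CLOCK_HZ = {"MIPS32": 190e6, "Blackfin": 400e6, "SHARC": 200e6}


def cycles_from_instructions(instructions, ipc=1.0):
    """Cycle count for a given instruction count and IPC."""
    return instructions / ipc


def execution_time_s(cycles, clock_hz):
    """Execution time in seconds: cycles divided by clock frequency."""
    return cycles / clock_hz


if __name__ == "__main__":
    # Placeholder workload; real counts come from the simulators.
    hypothetical_instructions = 1_000_000
    cycles = cycles_from_instructions(hypothetical_instructions, ipc=1.0)
    t = execution_time_s(cycles, CLOCK_HZ["MIPS32"])
    print(f"MIPS32: {t * 1e3:.3f} ms")
```

Because each processor has a different instruction set and clock, a higher clock frequency alone does not guarantee a shorter execution time; the cycle count of each application on each processor has to come from simulation.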



Fig. 1. Execution time of the applications.

We note that the MIPS32 processor is faster than both the Blackfin and the SHARC on applications 1 and 3, and slower on applications 2, 4 and 5. This behavior occurs because both PDSPs use emulation to execute double-precision computations. The SHARC processor has a floating-point unit (FPU) in its architecture, but it cannot be used to execute double-precision computations.

In these experiments, an average speedup of 1.97 was obtained, making the proposed architecture more efficient than the other processors used. Model 1 presented almost the maximum speedup among the proposed models and the smallest occupied area.
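For reference, speedup is conventionally the ratio between the execution time of the reference processor and that of the proposed architecture. The sketch below assumes the average is an arithmetic mean over the applications; the text does not state which mean was used, and the function names are hypothetical.

```python
def speedup(t_reference_s, t_proposed_s):
    """Speedup of the proposed system over a reference (values > 1 are faster)."""
    return t_reference_s / t_proposed_s


def average_speedup(time_pairs):
    """Arithmetic mean of per-application speedups.

    time_pairs is a list of (t_reference_s, t_proposed_s) tuples.
    The arithmetic mean is an assumption made for this sketch.
    """
    ratios = [speedup(ref, prop) for ref, prop in time_pairs]
    return sum(ratios) / len(ratios)
```

With execution times computed per application, `average_speedup` would be called once per scenario, for example with one pair per application for the PDSP and the MIPS32 with the RU coupled.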

#### IV. CONCLUSIONS

This work has shown that it is possible to obtain speedup and save energy in DSP applications by coupling reconfigurable computing to a GPP such as the MIPS32. The results also show that a software simulation tool for reconfigurable computing allows the user to check the features of the proposed architecture at design time, saving design time for embedded systems. The simulation tool also made it possible to verify the area occupied by the proposed models. We also show that, using a BT, it is possible to couple any processor to an RU at run time.

Future studies include evaluating this architecture running a complete benchmark suite such as MiBench and adding specialized hardware used in DSP, such as multiply-accumulate (MAC) units and circular buffers.

#### REFERENCES

- [1] N.S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J.S. Hu, M.J. Irwin, M. Kandemir, V. Narayanan. Leakage current: Moore's law meets static power. Computer 36(12), 68–75 (2003).
- [2] P. Lasley, et al. DSP Processor Fundamentals-Architectures and Features. New York: IEEE Press, 1997.
- [3] R. K. Gupta, G. D. Micheli. Hardware-software co-synthesis for digital systems. IEEE Design and Test of Computers, Santa Barbara, v. 10, n. 3, p. 29 – 41, September 1993.
- [4] D. Gajski, et al. SpecSyn: An Environment Supporting the Specify-Explore-Refine Paradigm for Hardware/Software System Design. IEEE Transactions on VLSI Systems, Princeton, v. 6, n. 1, p. 84-100, March 1998.
- [5] J. Henkel. A low power hardware/software partitioning approach for core-based embedded systems. In: Design Automation Conference, DAC, 36., 1999, Anaheim. Proceedings... New York: ACM Press, 2005. p. 122 – 127.
- [6] G. Venkataramani, et al. A Compiler Framework for Mapping Applications to a Coarse-grained Reconfigurable Computer Architecture. In: International Conference On Compilers, Architecture And Synthesis For Embedded Systems, 2001, Atlanta. Proceedings... New York: ACM Press, 2001. p. 116 – 125.
- [7] M. Wan, et al. An Energy Conscious Methodology for Early Design Space Exploration of Heterogeneous DSPs. In: Custom Integrated Circuits Conference, 1998, Santa Clara. Proceedings... Washington: IEEE Computer Society, 1998. p. 111 – 117.
- [8] G. Stitt, F. Vahid. The Energy Advantages of Microprocessor Platforms with On-Chip Configurable Logic, IEEE Design and Test of Computers, Los Alamitos, v. 19, n. 6, p. 36 – 43, November 2002.
- [9] Analog DSP processors. Available at homepage: http://www.analog.com/en/processors-dsp/products/index.html.
- [10] Beck, A.C.S.; Rutzig, M.B.; Gaydadjiev, G.; Carro, L., Transparent Reconfigurable Acceleration for Heterogeneous Embedded Applications, *Design, Automation and Test in Europe, 2008. DATE '08*, pp. 1208-1213, 10-14 March 2008.