Low Power Embedded Compilers and Architectures
The fast-growing market for embedded devices has large implications for design and technology. Shorter time-to-market requires programmable platforms with sufficient tool support. The inherent complexity of the applications demands high throughput from these platforms: realistic peak performance is expected to be around 100-1000 GOPS. Additionally, the battery-operated and portable nature of these systems imposes stringent power requirements on the platforms, and the power they consume while running these applications is expected to be on the order of 100-1000 mW. Combining the power and performance requirements, the computational efficiency has to be around 1000 MOPS/mW (for example, 100 GOPS at 100 mW), together with high peak performance.
In addition, as technology scales into the nanometer regime, deep sub-micron effects can no longer be contained at the lower levels of system abstraction; countermeasures have to be taken at higher abstraction levels, namely in the processor architecture and the compiler. As multimedia and wireless applications grow more complex (dynamic, heterogeneous, and data/memory dominated), mapping them efficiently onto processor architectures is a non-trivial task. State-of-the-art power-efficient programmable processors and compilation techniques achieve up to 50 MOPS/mW (counting operations equivalent to those of a single 32-bit arithmetic unit), which is at least a factor of 20 short of the target.
Furthermore, the non-recurring engineering (NRE) costs of compilation and architecture exploration frameworks are becoming increasingly high. The desired solutions should therefore be extensions that can be integrated into existing state-of-the-art architecture templates and compilation frameworks.
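The efficiency target follows from dividing throughput by power when the two ranges scale together. The minimal Python sketch below reproduces that arithmetic and the resulting gap to the quoted state of the art. The helper name `efficiency_mops_per_mw` is hypothetical; the 100-1000 GOPS, 100-1000 mW and 50 MOPS/mW figures are the ones given in the text.

```python
def efficiency_mops_per_mw(gops: float, power_mw: float) -> float:
    """Return computational efficiency in MOPS/mW for a throughput in GOPS and a power in mW."""
    return gops * 1000.0 / power_mw  # 1 GOPS = 1000 MOPS


# Corner points of the target ranges quoted in the text, scaled together.
target_low = efficiency_mops_per_mw(100, 100)     # 100 GOPS at 100 mW
target_high = efficiency_mops_per_mw(1000, 1000)  # 1000 GOPS at 1000 mW

# Efficiency quoted for state-of-the-art programmable processors, in MOPS/mW.
STATE_OF_THE_ART = 50.0
gap_factor = target_low / STATE_OF_THE_ART

print(target_low, target_high, gap_factor)  # 1000.0 1000.0 20.0
```

Pairing the low end of one range with the high end of the other (for example, 100 GOPS at 1000 mW) would give only 100 MOPS/mW, so the 1000 MOPS/mW figure assumes that performance and power grow in proportion.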
To approach this very challenging task, all the relevant architecture and compiler aspects are consolidated and categorized into primitives. This categorization helps manage the design complexity and eases integration into state-of-the-art solutions. Moreover, since different state-of-the-art solutions have different limitations, the primitives help identify the particular extensions that a given solution needs. The architecture and compiler primitives are:
- Data-parallel background memory: Organization of the data memories for effective data-parallel access; the related compiler optimizations, such as data layout and data locality, are in this category
- Data-parallel foreground memory: Organization of the local data storage in the data-parallel processor; the related compiler optimizations, such as register allocation, data layout and data locality, are in this category
- Subword-parallel data-path: Organization of the data-path units for effective subword support; the related compiler optimizations, such as instruction selection, scheduling and assignment, are in this category
- Data-parallel address path: Organization of the address-path units for the heavily distributed data-parallel memory units; the related compiler optimizations, such as instruction selection, scheduling and assignment, are in this category
- Distributed instruction memory: Organization of the instruction path in heavily distributed memories plus local controllers; the related compiler optimizations, such as distributed code layout, instruction compression and code selection, are in this category
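To make the subword-parallel data-path primitive concrete, the sketch below emulates in software what such a data-path does in a single operation: four independent 8-bit additions packed into one 32-bit word, with carries kept from crossing lane boundaries. This is the generic SWAR ("SIMD within a register") technique, not the specific data-path organization studied in this work; the lane width, lane count and the function names `pack`, `unpack` and `swar_add8` are illustrative assumptions.

```python
LANE_BITS = 8             # width of one subword lane (assumed)
LANES = 4                 # lanes packed into one 32-bit word (assumed)
WORD_MASK = 0xFFFFFFFF
HIGH_BITS = 0x80808080    # most significant bit of every lane
LOW_BITS = 0x7F7F7F7F     # lower 7 bits of every lane


def pack(values):
    """Pack LANES unsigned 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its LANES unsigned 8-bit values."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def swar_add8(a, b):
    """Add four 8-bit lanes at once, wrapping modulo 256 inside each lane."""
    # Add only the low 7 bits of each lane: the largest lane sum is 0x7F + 0x7F = 0xFE,
    # so no carry can spill into the neighbouring lane.
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    # Bit 7 of each lane is a7 XOR b7 XOR (carry out of bit 6); the carry out of bit 7 is dropped.
    return (partial ^ ((a ^ b) & HIGH_BITS)) & WORD_MASK


a = pack([200, 100, 255, 1])
b = pack([100, 100, 1, 2])
print(unpack(swar_add8(a, b)))  # [44, 200, 0, 3]
```

A hardware subword data-path achieves the same result in one instruction by cutting the adder's carry chain at lane boundaries; the software version needs the extra masking steps because a plain 32-bit adder propagates carries across the whole word.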
Graduate Students and Interns
As a long-term research coordinator, I have worked with the following PhD students and MS interns on this research (listed alphabetically).
- Javed Absar [PhD, KULeuven]
- Chaitanya Cherukuri [MS Intern]
- Jean Baka Domelevo [MS Intern]
- Yuki Kobayashi [PhD, Osaka University]
- John (Ioannis) Koutras [MS Intern]
- Nikolaos Kroupis [PhD, Univ Patras]
- Andy Lambrechts [PhD, KULeuven]
- Elena Perez [PhD Candidate, UMadrid]
- Vasilis Porpodas [PhD Candidate]
- Praveen Raghavan [PhD, KULeuven]
- Estela Rey Ramos [MS Intern]
- Nandhavel Sethubalasubramanian [MS Intern]
- Guillermo Talavera [PhD Candidate, UABarcelona]
- Ittetsu Taniguchi [PhD, Osaka University]
Selected Publications
- Ultra-Low Energy Domain-Specific Instruction-Set Processors.
  Francky Catthoor, Praveen Raghavan, Andy Lambrechts, Murali Jayapala, Angeliki Kritikakou and Javed Absar. Springer, 1st edition, XXI + 400 pp., hardcover, 2010.
- Playing the Trade-Off Game: Architecture Exploration Using COFFEE.
  Praveen Raghavan, Murali Jayapala, Andy Lambrechts, Javed Absar and Francky Catthoor. ACM Transactions on Design Automation of Electronic Systems (TODAES), 14(3), May 2009.
- Distributed Loop Controller for Multi-threading in Uni-threaded ILP Architectures.
  Praveen Raghavan, Andy Lambrechts, Murali Jayapala, Francky Catthoor and Diederik Verkest. IEEE Transactions on Computers, 58(3):311-321, March 2009.
- Compilation Technique for Loop Overhead Minimization.
  Nikolaos Kroupis, Praveen Raghavan, Murali Jayapala, Francky Catthoor and Dimitrios Soudris. In 12th Euromicro Conference on Digital System Design: Architectures, Tools and Methods, Aug 2009.
- Reconfigurable AGU: An Address Generation Unit Based on Address Calculation Pattern for Low Energy and High Performance Embedded Processors.
  Ittetsu Taniguchi, Praveen Raghavan, Murali Jayapala, Francky Catthoor, Yoshinori Takeuchi and Masaharu Imai. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E92(4):1161-1173, 2009.
- Interconnect-Exploration for Energy vs Performance Tradeoffs for Coarse Grained Reconfigurable Architectures.
  Andy Lambrechts, Praveen Raghavan, Murali Jayapala, Bingfeng Mei, Francky Catthoor and Diederik Verkest. IEEE Transactions on VLSI Systems, 17(1):151-155, Jan 2009.
- Locality Optimizations in a Compiler for Wireless Applications.
  Javed Absar, Praveen Raghavan, Andy Lambrechts, Min Li, Murali Jayapala and Francky Catthoor. Design Automation for Embedded Systems, April 2008.
- Address Generation Optimization for Embedded High-Performance Processors: A Survey.
  Guillermo Talavera, Murali Jayapala, Jordi Carrabina and Francky Catthoor. Journal of Signal Processing Systems for Signal, Image, and Video Technology (formerly the Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology), 53(3):271-284, December 2008.
- Methodology for operation shuffling and L0 cluster generation for low energy heterogeneous VLIW processors.
  Y. Kobayashi, M. Jayapala, P. Raghavan, F. Catthoor and M. Imai. ACM Transactions on Design Automation of Electronic Systems (TODAES), 12(4), Article 41, Sep 2007.
- Very Wide Register: An Asymmetric Register File Organization for Low Power Embedded Processors.
  P. Raghavan, A. Lambrechts, M. Jayapala, F. Catthoor, D. Verkest and H. Corporaal. In DATE '07: Proceedings of the Conference on Design, Automation and Test in Europe, 2007.
- Architectures and Circuits for Software Defined Radios: scaling and scalability for low cost and low energy.
  L. Van der Perre, B. Bougard, J. Craninckx, W. Dehaene, L. Hollevoet, M. Jayapala, P. Marchal, M. Miranda, P. Raghavan, T. Schuster, P. Wambacq, F. Catthoor and P. Vanbekbergen. In IEEE International Solid-State Circuits Conference (ISSCC), 2007.
Related Projects
- SWANS: Silicon platforms for Wireless Advanced Networks of Sensors (1 May 2005 - 13 Mar 2008), IWT
- FLEXWARE: Exploitation of flexible hardware platforms for massively parallel bio-informatics applications (1 Jan 2007 - 31 Dec 2010), IWT
- HiPEAC-1: High-Performance Embedded Architectures and Compilers (1 Sep 2004 - 31 Aug 2008), FP6
- Marie-Curie Actions: Human Resources and Mobility Activity (2004-2008), FP6
- IMEC M4 Program (2004-2007)
- IMEC Apollo Program (2007-2010)