 Successful Use of an Open Source Processor in a Commercial ASIC
 by Declan Staunton, Silicon & Software Systems, Dublin, Ireland
 
 Abstract
 
 Open source IP has been 
      slow to take off in commercial IC development for very good reasons. 
      Immaturity of designs, lack of support, licensing and warranty concerns 
      would normally ensure open source IP cores are not even considered as 
      solutions. However, there are situations where open source IP does warrant 
      consideration for certain types of application, and the LEON core is one such 
      solution. Here we describe our experience in using the LEON processor 
      in a commercial ASIC. Both benefits and drawbacks are described before 
      concluding that LEON was an excellent solution for this 
      design.
 
 Introduction
 
 Processor 
      selection is one of the key design decisions in any SoC development. For 
      this development there were fewer constraints than would normally be 
      encountered when choosing a processor. In particular there was no legacy 
      software or particular operating system that needed to be supported. 
      Furthermore the vast majority of the logic design was from scratch so 
      there were no legacy bus interfaces to support. The ASIC was intended for 
      use in a high volume embedded system. As can be seen from the block 
      diagram in Fig 1 this was a typical SoC design. The principal requirements 
      were:
 
 1) Performance: The 
      initial performance requirements were relatively low but these grew in 
      time. There were some hard real-time requirements and many firm real-time 
      requirements (firm meaning the system would not fail but the user could 
      perceive a slowdown).
 
 2) Low or no 
      royalty: As very high volumes were expected this was important 
      in  keeping the unit price low
 
 3) Supervisor & User modes:  The 
      processor would have to support the execution of third party code without 
      jeopardizing the integrity of the system.
 
 
 Power consumption was 
      not a significant concern and while a synthesizable core was preferable 
      this was primarily due to the unsuitability of the hard cores that were 
      available for the fabrication process used.
 
 A number of 16 and 
      32-bit commercial cores were considered before concluding that the LEON 
      processor offered the best overall solution.
 
 
  Figure 1: Block diagram of the LEON powered SoC
 
 
 The LEON Processor
 
 LEON is a VHDL implementation of the open standard 
      (IEEE1754) SPARC V8 architecture [1]. LEON is a highly configurable, 
      synthesizable, 32-bit core with pre-selectable cache sizes (both I & 
      D), optional floating point unit and hardware acceleration for multiply 
      and divide instructions, debug monitor, AMBA AHB [2] interface and support 
      for a co-processor. Most of the features of LEON can be configured via a 
      simple GUI which produces a VHDL file of constants that is then referenced 
      by the other source files. A screenshot of this GUI showing the 
      configuration options for the Integer Unit is shown in Fig 2.
 
 The 
      LEON3 processor is available under GPL and commercial license 
      arrangements. An LGPL version (LEON2) is also available. In fact, for most 
      of the design phase the LEON2 core was used but a late change to the LEON3 
      core was made for licensing reasons. Despite occurring late in the design 
      phase, the switchover from LEON2 to LEON3 was not difficult.
 
 A 
      full software development environment based on the GNU C/C++ compiler is 
      available for LEON. An instruction set simulator (TSIM) is also available 
      although this was only rarely used by the IC development team. The LEON 
      cores and associated IP are available from and supported by Gaisler 
      Research [3].
 
 
  Figure 2: LEON configuration GUI
 
 Using LEON
 
 Familiarisation with 
      the LEON design was quite straightforward but could have been accelerated 
      by more complete design documentation and better coding practices. The 
      code itself was written in a consistent style but the signal and variable 
      naming were often not very descriptive and comments were scant. Moreover 
      the extensive use of VHDL records caused problems with some tools and in 
      some cases a record had to be broken out into its constituent 
      signals.
 
 The first step in customizing 
      LEON for our application was to identify the component entities 
      we wished to retain and to extract these from the LEON deliverable (which 
      includes bridges, interfaces and peripherals to make it a SoC in its own 
      right). The components of interest were at the heart of the processor – 
      the Integer Unit (IU), Cache controllers and AHB interface (there were 
      some 22 VHDL files required to describe these completely). A testbench was 
      created to verify the operation of these components in isolation from the 
      rest of the LEON processor.
 
 The next step 
      was the creation of a bridge between the LEON AHB interface and the 
      proprietary bus interfaces to the on-chip DRAM and peripherals. While AMBA 
      buses were not used elsewhere on the chip their use was advantageous due 
      to the familiarity of the design team with the standard. With the bridge 
      in place the LEON CPU core could then be integrated with the remainder of 
      the ASIC (or more specifically the portions of it that existed at that 
      time). It was also necessary to select and integrate the correct memories 
      and register files for the cache data and tag rams and the IU register 
      file. At a later date it was also necessary to select the appropriate 
      hardware multiplier and divider circuits. LEON does support memories and 
      register arrays from a number of foundries (and also FPGA targets) but the 
      foundry for this ASIC was not supported so this step took some work. 
      Simple wrappers were also required for each register array / memory.
 
 Modifications
 
 In 
      order to fulfill the application requirements some modifications and 
      enhancements were required to the LEON CPU components. All of the LEON 
      related design work was confined to the CPU subsystem level of hierarchy 
      depicted in Fig 3 below and this was performed in parallel with the rest 
      of the ASIC design. First, the LEON cache controllers refilled 
      the 256-bit wide cache lines by reading 32 bits at a time, whereas the on-chip 
      DRAM produced a full 256-bit line for every read. Refilling a line in the LEON 
      caches therefore meant reading the same DRAM line 8 times, which was highly 
      inefficient. By making a few changes to both the instruction and data cache 
      controllers and cache memories it was possible to refill the entire cache 
      line with the 256 bits yielded by the DRAM read thus reducing the number 
      of DRAM reads required from 8 to 1.
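
 The saving follows directly from the two widths quoted above. The short Python 
      sketch below is illustrative only: the function and constant names are 
      assumptions, and only the 256-bit and 32-bit figures come from the text.

```python
LINE_BITS = 256   # cache line width and DRAM read width, per the text
WORD_BITS = 32    # width of one refill read in the unmodified LEON caches


def dram_reads_per_refill(bits_used_per_read):
    """Number of DRAM reads needed to fill one cache line."""
    # Ceiling division, so a partial final read still counts as a read.
    return -(-LINE_BITS // bits_used_per_read)


original = dram_reads_per_refill(WORD_BITS)   # word-at-a-time refill
modified = dram_reads_per_refill(LINE_BITS)   # whole-line refill
print(original, modified)
```

 In the unmodified case only 32 of the 256 bits returned by each DRAM read are 
      used, which is why the whole-line refill removes seven of every eight reads.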
 
 The most significant 
      enhancement was the addition of a Memory Management Unit (MMU). Code is 
      executed on the processor in either supervisor or user mode and the 
      application required strict enforcement of security rules to ensure user 
      mode code was restricted in its operation. The primary function of the MMU 
      was the protection of supervisor mode code and data from user mode 
      accesses. The MMU was simpler than conventional MMUs in that it did not 
      feature a Translation Lookaside Buffer (TLB), although it did implement 
      the memory map for the IC. It is not compatible with the SPARC Reference 
      MMU specification [4]. The MMU allowed the DRAM address space to be split 
      into up to 8 regions with each region having programmable access 
      permissions and start / stop boundaries. The programmable registers 
      controlling the MMU could of course only be accessed when executing code 
      in supervisor mode.
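
 A region check of this kind can be modelled in a few lines. The Python sketch 
      below is a behavioural illustration under stated assumptions, not the 
      hardware described here: the field names, the rule that supervisor accesses 
      are unrestricted, and the rejection of user accesses outside every region 
      are all hypothetical. Only the limit of 8 regions, the start / stop 
      boundaries and the supervisor-only programming come from the text.

```python
from dataclasses import dataclass

MAX_REGIONS = 8  # "up to 8 regions", per the text


@dataclass
class Region:
    start: int         # first DRAM address in the region (inclusive)
    stop: int          # last DRAM address in the region (inclusive)
    user_read: bool    # hypothetical permission bits
    user_write: bool


class RegionMmu:
    def __init__(self):
        self.regions = []

    def program(self, region, supervisor):
        # The control registers are only accessible in supervisor mode.
        if not supervisor:
            raise PermissionError("MMU registers require supervisor mode")
        if len(self.regions) >= MAX_REGIONS:
            raise ValueError("all regions already in use")
        self.regions.append(region)

    def allowed(self, addr, write, supervisor):
        if supervisor:
            return True    # assumption: supervisor accesses are unrestricted
        for r in self.regions:
            if r.start <= addr <= r.stop:
                return r.user_write if write else r.user_read
        return False       # assumption: user accesses outside any region fail


mmu = RegionMmu()
mmu.program(Region(0x0000, 0x3FFF, user_read=False, user_write=False), supervisor=True)
mmu.program(Region(0x4000, 0x7FFF, user_read=True, user_write=True), supervisor=True)
```

 With these two example regions, a user mode write to 0x4000 is accepted while 
      a user mode read of 0x3FFF is rejected.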
 
 Access control for the on-chip peripherals was 
      distributed, i.e. the access control signals were propagated to each 
      peripheral and each peripheral could accept or reject an access depending 
      on the permissions of the access and the peripheral's settings (this was 
      often determined on a register-by-register basis). In addition to 
      controlling access to the DRAM and peripherals the MMU also included the 
      AHB to proprietary bus bridges for the DRAM data bus and the CPU 
      peripheral bus, a write buffer and a bus timeout function to avoid 
      possible bus hangs.
 
 The purpose of the write buffer was to improve 
      write performance to minimize the impact of register window over / 
      underflows. Register windows are a feature of SPARC processors and can 
      allow fast context switching between tasks. However when a register window 
      over / underflow occurs the worst case context switch time may become 
      prohibitive for real-time applications. A small posted write buffer was 
      added which combined a number of CPU writes into a single write to the 
      wide DRAM. This was found to improve write performance significantly 
      (particularly for the sequential writes that are characteristic of window 
      over / underflow handling) at the cost of complicating the design to 
      ensure data coherency was upheld in all situations.
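
 The write-combining idea can be sketched as a small behavioural model. The 
      Python below is an assumption-laden illustration rather than the buffer 
      that was built: it holds a single 256-bit line, drains when a write 
      targets a different line, and assumes word-aligned 32-bit writes. The class 
      and method names are hypothetical. The lookup method hints at the coherency 
      cost mentioned above, since a read of an address that is still buffered 
      must not be served with stale DRAM data.

```python
LINE_BYTES = 32   # one 256-bit DRAM line
WORD_BYTES = 4    # one 32-bit CPU write


class PostedWriteBuffer:
    def __init__(self):
        self.line_addr = None   # base address of the line being gathered
        self.words = {}         # byte offset within the line -> 32-bit value
        self.dram_writes = []   # log of (line_addr, words) issued to DRAM

    def write(self, addr, value):
        line = addr & ~(LINE_BYTES - 1)
        if self.line_addr is not None and line != self.line_addr:
            self.flush()        # a write to a different line drains the buffer
        self.line_addr = line
        self.words[addr - line] = value

    def lookup(self, addr):
        """Return buffered data for addr, or None if DRAM must be read."""
        line = addr & ~(LINE_BYTES - 1)
        if line == self.line_addr:
            return self.words.get(addr - line)
        return None

    def flush(self):
        if self.words:
            self.dram_writes.append((self.line_addr, dict(self.words)))
        self.line_addr = None
        self.words = {}


buf = PostedWriteBuffer()
for i in range(16):                  # e.g. 16 sequential register saves
    buf.write(0x1000 + WORD_BYTES * i, i)
pending = buf.lookup(0x1020)         # still buffered, so it must be forwarded
buf.flush()
print(len(buf.dram_writes))          # DRAM writes issued for 16 CPU writes
```

 In this model 16 sequential word writes, such as the register saves of a 
      window overflow handler, reach the DRAM as 2 line writes instead of 16 
      separate ones.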
 
 Further 
      modifications were made to the data cache to enforce user / supervisor 
      data security. Extra tag 
      bits and logic were added to the data cache to ensure user mode code could 
      not retrieve supervisor data from the data cache (this was possible with 
      the basic LEON design) and the MMU enforced the security of supervisor 
      mode code and data outside of the caches.
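
 The effect of an extra tag bit can be illustrated with a single cache line. 
      This Python sketch is hypothetical: the text does not say how the extra bits 
      were set or what a user mode hit on a protected line returned, so the 
      supervisor_only flag and the "denied" result are assumptions made for 
      illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    supervisor_only: bool = False   # the extra tag bit (assumed meaning)
    data: int = 0


def fill(line, tag, data, supervisor_only):
    """Load a line and record whether it holds supervisor-only data."""
    line.valid = True
    line.tag = tag
    line.data = data
    line.supervisor_only = supervisor_only


def read(line, tag, supervisor_access):
    """Return (status, data) for a lookup of `tag` in `line`."""
    if not (line.valid and line.tag == tag):
        return ("miss", None)
    if line.supervisor_only and not supervisor_access:
        return ("denied", None)     # user code never receives the cached data
    return ("hit", line.data)


line = CacheLine()
fill(line, tag=0x12, data=0xCAFE, supervisor_only=True)
```

 Without the extra bit, the user mode lookup of tag 0x12 would simply hit and 
      return the supervisor data, which is the weakness noted above for the basic 
      design.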
 
 After a reset, the 
      processor starts executing code from address #00000000. In order to assist 
      with software error handling (e.g. null pointer de-referencing) all 
      accesses to the bottom four word locations (i.e. #00000000 to #0000000C) 
      were trapped by the MMU unless they were made by the reset handler. In all 
      there were six different conditions introduced that could be trapped by 
      the MMU to protect the integrity of the system.
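
 The low-address trap amounts to a simple address comparison. In the Python 
      sketch below the function name and the in_reset_handler flag are 
      hypothetical, as is the choice to trap any byte that falls within the four 
      protected words. Only the address range comes from the text.

```python
TRAP_FIRST_WORD = 0x00000000   # bottom four word locations, per the text
TRAP_LAST_WORD = 0x0000000C


def low_address_trap(addr, in_reset_handler):
    """Return True if an access to `addr` should be trapped."""
    word_addr = addr & ~0x3    # assumption: check the enclosing 32-bit word
    in_range = TRAP_FIRST_WORD <= word_addr <= TRAP_LAST_WORD
    return in_range and not in_reset_handler
```

 A dereference of a null pointer, or of a small offset from one, therefore 
      traps immediately rather than silently reading the reset code.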
 
 These 
      customisations were made while preserving all of the existing LEON 
      functionality i.e. no previous LEON functionality was compromised by the 
      enhancements. While this required a little more design and verification 
      effort it offered increased confidence in the modified 
      design.
 
 
  Figure 3: CPU subsystem block diagram
 
 Integration
 
 As previously 
      mentioned, integration of LEON with the remainder of the chip was mostly a 
      matter of choosing the correct technology-specific macros (e.g. SRAMs, 
      register arrays, multiplier etc.) and then connecting them together. Because 
      the CPU peripherals had been verified using a bus functional model of the 
      CPU peripheral bus prior to integration they all worked first time with 
      the real CPU. One issue that did require some attention during integration 
      was endianness. SPARC, and therefore LEON, is a big endian architecture 
      but the rest of the system was little endian. Thus, when data was shared 
      between the CPU and other blocks (some of which had DMAs with byte-write 
      capability) careful thought was needed to ensure that bytes were not 
      swapped around incorrectly. These scenarios were also subjected to 
      significant directed testing to ensure everything was correct. Where 
      endianness coherency could not be handled by hardware the need for byte 
      swapping in software was clearly flagged to the software 
      developers.
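
 The kind of software byte swap involved can be shown with a 32-bit value 
      shared through memory. This Python sketch is a generic illustration; the 
      function name and the example value are assumptions and are not taken from 
      the design.

```python
def swap32(word):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")


# A little endian block stores the 32-bit value 0x11223344 in shared memory.
memory = (0x11223344).to_bytes(4, "little")   # bytes 44 33 22 11

# The big endian CPU loads the same four bytes as one word.
cpu_view = int.from_bytes(memory, "big")      # 0x44332211

# A software swap recovers the value the other block intended.
restored = swap32(cpu_view)                   # 0x11223344
```

 Note that the swap depends on the width of the data item: a stream of 
      individual bytes read one byte at a time needs no swap at all, which is why 
      each sharing scenario had to be considered separately.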
 
 Verification
 
 A number of different 
      approaches were used to verify the functionality and integration of the 
      CPU within the ASIC including RTL verification, behavioural modeling in C 
      and VHDL, external certification of the processor and FPGA emulation. 
      Unfortunately a complete discussion of the verification strategy used is 
      outside the scope of this paper. The primary approach for testing the 
      functionality of the CPU (and in particular the customizations of the LEON 
      components) was at the subsystem level. This level consisted of all the 
      CPU subsystem components shown in Fig 3, the Interrupt Controller (this 
      was a new design rather than the LEON interrupt controller), ROM, RAM, 
      DRAM arbiter and a behavioural model for the DRAM. Tests were developed 
      using C and assembly language, compiled using the GNU toolkit available 
      with LEON and then post-processed into appropriately formatted memory 
      images by Perl scripts. A VHDL testbench performed the necessary stimulus 
      generation and signal monitoring.
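
 The post-processing step can be pictured as a conversion from a raw binary 
      image to a text file that a testbench can load. The Python sketch below is 
      only an illustration of the idea: the original scripts were written in 
      Perl, and the output format assumed here (one 32-bit hexadecimal word per 
      line, in big endian byte order, zero padded at the end) is hypothetical.

```python
def to_hex_words(image, word_bytes=4):
    """Split a binary image into upper-case hex strings, one per word."""
    # Zero-pad so the image is a whole number of words.
    padded = image + b"\x00" * (-len(image) % word_bytes)
    return [
        padded[i:i + word_bytes].hex().upper()
        for i in range(0, len(padded), word_bytes)
    ]


# Example: one complete word followed by a partial word.
lines = to_hex_words(bytes([0x01, 0x00, 0x00, 0x00, 0xDE, 0xAD]))
print("\n".join(lines))
```

 Because SPARC is big endian, the bytes of each word are emitted in memory 
      order, so no reordering is needed in this assumed format.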
 
 LEON modules that were 
      customized were subjected to full functional verification (i.e. not just 
      the changes were tested). As the original LEON tests that formed part of 
      the release were not considered rigorous enough for production silicon 
      extra effort was expended to ensure satisfactory verification coverage, 
      particularly of the cache controllers.
 
 Over the course of the 
      development a number of minor bugs in the LEON design were uncovered by 
      the verification which were promptly fixed by Gaisler Research. The 
      success of the verification is best demonstrated by the fact that the 
      silicon worked first time upon return from the fab without a single 
      bug.
 
 FPGA Emulation
 
 A significant software development effort was required to generate 
      the ROM image and further post-boot downloadable code. While there is an 
      instruction level simulator (TSIM) available for the LEON processor it 
      could not model the modifications made to the LEON modules or the other 
      on-chip components particular to this design. FPGA emulation was clearly 
      the best solution especially as it also provided an additional layer of 
      functional verification.
 
 An off-the-shelf third party board based 
      on a Xilinx Virtex-II 6000 FPGA was chosen for its large FPGA capacity and short 
      lead time. Retargeting the LEON modules to the FPGA was straightforward as 
      Xilinx FPGAs were already supported as a target technology in the LEON 
      code. Two additional LEON modules were implemented on the FPGA which would 
      not be present on the ASIC – the Debug Support Unit (DSU) and a UART. 
      These were required to facilitate software debug and communication with a 
      host PC.
 
 S3’s GNAT (General-purpose Native jtAg Tester) [5][6] 
      module was used as part of the FPGA development environment. This module 
      allows access to the FPGA logic (including ROM / RAM and I/Os) via its 
      JTAG port. When used in conjunction with VNC [7], full remote control of the 
      FPGA board was possible, even from other sites. This allowed the ROM 
      contents to be updated, the processor reset and onboard LEDs and internal 
      registers to be monitored all without having to go to the 
      lab.
 
 Benefits
 
 Outside 
      of the obvious cost savings one of the primary benefits of using LEON was 
      the ease with which its capabilities could be augmented as the 
      requirements grew. This was a significant benefit because, as with all 
      developments, requirements did change. Initially a cacheless Integer Unit 
      was expected to be sufficient but this evolved into a final configuration with 1 kB 
      I & D caches with the enhancements referred to earlier and hardware 
      support for the SPARC multiply, multiply and accumulate, and divide 
      instructions. As the entire source code was available for the extra LEON 
      features from the very beginning the new features could be turned on 
      easily and quickly without the need for further dialog (or negotiation) 
      with the supplier. Indeed once the simulation and synthesis environments 
      had been set up, simple ‘what-if’ analyses could be easily performed by 
      choosing different configuration options with the GUI referred to earlier 
      and executing our makefile based flow.
 
 Access to the source code 
      and the freedom to modify it proved very useful not only in performing the 
      customizations described but also during debug as it was possible to tease 
      out detailed functionality and to obtain a more complete understanding of 
      certain behaviours. Without this freedom to modify the core the same 
      degree of performance improvement would not have been possible. 
      Furthermore, had similar functionality been designed into the non-CPU logic, 
      its complexity, and the probability of an error, would have increased.
 
 LEON has been designed with direct support for a number of 
      fabrication technologies (including FPGA) and porting it to a new 
      technology was not difficult. The code synthesized cleanly and posed no 
      problems in physical design.
 
 Finally the commercial support 
      provided by Gaisler Research for the duration of the development was 
      excellent. We enjoyed a direct interface to the engineers who designed the 
      core and they were always prompt and accurate in their 
      responses.
 
 Drawbacks
 
 The coding style used 
      for LEON required some familiarisation and the lack of comments and 
      detailed design documentation hampered progress from time to time. The 
      widespread use of records also caused problems for some CAD tools 
      (although these may have been addressed by the tool vendors by now). There 
      were also a number of new releases of the LEON database which fortunately 
      had little effect on our development – this was because the modules we 
      were using in our design were only occasionally modified in these new 
      releases.
 
 Other embedded applications, especially those with 
      significant real-time requirements, may not find LEON such a good solution 
      as the use of register windows makes context switching times difficult to 
      predict and poor in the worst case. Furthermore the register file for the 
      IU is large – a 144 x 32, 3-port register array was required in our 
      implementation which used the standard configuration of eight register 
      windows.
 
 While the software support for LEON is increasing all the 
      time (a Linux port is now available) careful consideration should be given 
      to both legacy (as porting may be non-trivial) and new software 
      requirements. This was not a problem in our application.
 
 While the 
      code itself has been used and refined many times the testcases that formed 
      part of the releases used in our development were not comprehensive enough 
      for an ASIC tapeout. Supplementary testing was required in our 
      case.
 
 Conclusion
 
 Processor selection is 
      one of the most important decisions to be made in developing a SoC. When 
      faced with a clean-sheet design the LEON core is certainly worthy of 
      serious consideration. The overall quality of the LEON offering is broadly 
      equal to, and often better than, that of other commercial IP blocks.  
      While it could be used without any modifications the possibilities for 
      customization are powerful. Access to the source code, and the ability to 
      modify it, allowed us to customize the core to our requirements rather 
      than complicate the logic external to the core. This enabled us to achieve 
      better performance, better verification and a higher quality design with 
      zero defects. While LEON may not be as widely suitable as the market 
      leading processor cores, it proved to be an excellent choice for this 
      design and doubtless will prove to be so for many others.
 
 References
 
 [1] http://www.sparc.com/standards.html
 [2] http://www.arm.com/products/solutions/AMBA_Spec.html
 [3] http://www.gaisler.com
 [4] SPARC V8 Architecture Manual, Appendix H
 [5] http://www.s3group.com/system_ic/gnat/
 [6] http://www.xilinx.com/publications/xcellonline/xcell_53/xc_jtag53.htm
 [7] http://www.realvnc.com, http://www.tightvnc.com
 
 
 
 
 
 
 
        