EEMBC Journal - Autumn 2002
NEWS FROM EMBEDDED MICROPROCESSOR BENCHMARK CONSORTIUM -- www.eembc.org

In this issue:
1. From the President
2. News Kernels
3. New Members
4. First SuperH Processor Benchmark Now Available
5. A New Leader and Direction for EEMBC's Consumer Benchmarks Subcommittee
6. EEMBC Publishes Benchmark Scores for NEC's 300-MHz VR5500 Processor
7. From the Lab, by Alan Weiss
_________________

1. From the President

It's Not Normal to Normalize. A better perspective on benchmark analysis, by Markus Levy

As the database of benchmark scores on EEMBC's website continues to grow (now numbering more than 180 processor reports), there is finally enough performance data to make reasonable comparisons between devices. In our early days, when few score reports were available, system designers were forced to compare low-end 150-MHz processors with high-end 500-MHz processors. Nowadays there are quite a few directly competing processors with published scores on the EEMBC website, but this hasn't stopped people from making unreasonable comparisons.

Take, for example, the EEMBC Netmark scores for Motorola's 1-GHz MPC7455 (30.4 Netmarks) and 400-MHz MPC755 (12.7 Netmarks). Comparing these processors might be interesting from an academic perspective, but it's of minimal practical value given the differences in their performance and price. Even so, the comparison is sometimes attempted. This is done by normalizing the operating frequencies of the two devices to put the scores within a similar order of magnitude. If you normalize the MPC7455 down to 400 MHz (a normalized score of 12.2 Netmarks), the MPC755 appears to be the more efficient processor. But this specious result fails to account for the capabilities of two radically different microarchitectural implementations and manufacturing processes.

It's even more abnormal to normalize scores for processor cores.
EEMBC has adopted a rule that requires all processor core scores to be represented in units of iterations per MHz (essentially equivalent to cycle counts). In this context, the scores are closely tied to the instructions per cycle (IPC) of a processor. While I am certainly a believer in improving IPC capabilities, a single-pipeline processor will typically have an IPC of one and a dual-pipeline processor will typically have an IPC of two (not accounting for processors with multiple dispatch capability, SIMD, VLIW, etc.).

This situation can produce misleading results because normalizing benchmark scores almost completely discounts the architectural and microarchitectural features of a processor. But the purpose of designing different architectures and microarchitectures is to cost-efficiently address specific performance needs. For example, cores are designed with longer pipelines and more complex (higher latency) memory subsystems in order to reach higher clock frequencies and accomplish more work in a given time period.

Take, for example, the ARM processor family, which has gone from the ARM7, with a 3-stage pipeline, to the 6-stage ARM10, to the most recent ARM11, based on an 8-stage pipeline. Interestingly, based on the IPC, these processors have roughly the same performance. But the ARM7 is capable of operating at 200 MHz (typical), and the ARM11 will reach speeds of 500-700 MHz in a 0.13-micron process. A similar result is found if you compare the MIPS 4KEc and the MIPS 5Kc, but these are substantially different architectures.

This is not to say that normalizing is always an abnormal function. Some processors are specifically designed to deliver the most "work" per clock. An example of this is seen in recently published EEMBC Telecom scores for Improv Systems' 250-MHz VLIW processor and Tensilica's 285-MHz Xtensa-V. It makes sense, when comparing these processors, to normalize the scores because every clock counts.
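
The frequency normalization discussed in this column can be sketched in a few lines of Python. This is an illustration only, not an EEMBC tool: the function name normalize_score is hypothetical, and the linear scaling it applies is precisely the simplifying assumption the column cautions against. The figures are the Netmark scores quoted earlier for the MPC7455 and MPC755.

```python
def normalize_score(score, measured_mhz, target_mhz):
    """Rescale a benchmark score to another clock frequency.

    Assumes the score scales linearly with frequency, which ignores
    microarchitecture, memory subsystem, and process differences.
    """
    return score * target_mhz / measured_mhz


# Netmark scores quoted in this column.
mpc7455 = {"netmarks": 30.4, "mhz": 1000}
mpc755 = {"netmarks": 12.7, "mhz": 400}

# Scale the 1-GHz MPC7455 score down to the MPC755's 400 MHz.
normalized_7455 = normalize_score(
    mpc7455["netmarks"], mpc7455["mhz"], mpc755["mhz"]
)

print(f"MPC7455 normalized to 400 MHz: {normalized_7455:.1f} Netmarks")
print(f"MPC755 measured at 400 MHz:    {mpc755['netmarks']:.1f} Netmarks")
```

The normalized MPC7455 figure comes out at about 12.2 Netmarks, just below the MPC755's 12.7, which is the "specious result" described above: the arithmetic is trivial, but it says nothing about what the faster part can actually deliver at its real operating frequency.
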
Obviously, in reality, benchmark score analysis is a complex task. And in general, there's really no right or wrong way to do this. But the bottom line is to make sure that you normalize with care and understand the consequences.
_________________

2. News Kernels

Readers of the new edition of Computer Architecture: A Quantitative Approach by John L. Hennessy and David A. Patterson will find numerous references to EEMBC benchmarks throughout the text, as well as a section on the pitfalls of the Whetstone and Dhrystone synthetic benchmarks. The best-selling textbook has been expanded to include not only high-performance desktop machine design, but also the design of embedded and server systems. The book is priced at $62.50 at Bookpool Discount Technical Books (http://www.bookpool.com).

Further to a resolution passed at the August EEMBC board meeting in Monterey, work is underway to make EEMBC benchmark data sheets more useful for system designers by explaining how benchmarks translate into real-world performance. An improved layout makes data sheets easier to read. As they're revised, new data sheets will be available at http://www.eembc.org/benchmark/datasheet.

Look for the debut of a new Search Benchmark Scores mechanism on the EEMBC web site in mid-November. The new design will allow users to bring up a list of all silicon or simulated processor scores for a given application area with just one click. Searches can also be refined to bring up processor scores from selected suppliers only.
_________________

3. New Members

EEMBC is pleased to announce that AMD has reinstated its membership to support its newly acquired company, Alchemy Semiconductor. In addition, the Consortium has signed up three new members: Insignia, Samsung, and Sony. Insignia joins as an EEMBC Java subcommittee member, while AMD, Samsung, and Sony are joining as EEMBC board members. This brings our member count up to 52!
_________________

4. First SuperH Processor Benchmark Now Available

The first certified benchmark scores for a processor core from SuperH, Inc. were announced during this year's Microprocessor Forum, held October 14-18 in San Jose. The licensable SuperH SH-4 CPU core was tested using EEMBC's automotive/industrial, consumer, networking, and office automation benchmark suites in a 266-MHz simulation environment. Per-megahertz, out-of-the-box consolidated scores in each category were 0.52771 Automarks, 0.07474 Consumermarks, 0.01917 Netmarks, and 0.67887 OAmarks.

Simulated Scores
Automotive: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=334
Consumer: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=352
Networking: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=356
Office Automation: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=353

The SH-4 was tested using the SuperH GNU compiler, which is based on industry-standard, open-source GNU technologies and supports standard Linux distributions as well as proprietary kernels. "The compiler is an important feature when running large benchmarks, and it is a significant part of our strategy that we provide a GNU compiler that is open-source by definition," said Jon Frosdick, SuperH's director of software engineering.

"SuperH joined EEMBC as a board member just six months ago, so it is especially gratifying to see how quickly the company has certified and published benchmark scores on the SH-4," said Markus Levy, EEMBC president. "Besides demonstrating the out-of-the-box performance of the SuperH core, these new scores are a significant contribution to our library of EEMBC benchmark score reports based on simulation."
_________________

5. A New Leader and Direction for EEMBC's Consumer Benchmarks Subcommittee

EEMBC's Consumer Subcommittee has a new chair: Adrian Wise, CTO of Siroyan. An expert in MPEG, Wise will lead the Consortium's efforts to add to its consumer benchmark suite a series of new benchmark kernels that address the growing need for objective performance measures of processors used in MP3 players, digital cameras, PDAs, and other consumer electronic devices.

"The EEMBC Consumer Subcommittee is developing benchmarks covering a range of media processing technologies, including both still and moving image coding, audio coding, and encryption," Wise noted. "We are working to add benchmarks that are highly relevant to designers of consumer electronics equipment, and we welcome the input of engineers from both the OEM and semiconductor communities who are selecting processors and processor IP blocks for consumer electronics equipment."

EEMBC's current consumer benchmark suite includes benchmark kernels for JPEG encode and decode, color-space conversion, and image filtering. New benchmarks for MP3, MPEG-2, and MPEG-4 video encode and decode, as well as encryption kernels, are now under development, and some are ready for testing.

Wise has been involved with the MPEG standardization process since 1989, including service as the Project Editor of the MPEG-2 video standard.

"The new benchmarks being developed by the Consumer Subcommittee extend across a wide range of technologies relevant to consumer electronics equipment," said Markus Levy, EEMBC president. "The addition of the MPEG standards, moreover, will bring them into the realm of real-time computing and allow designers to gauge the performance of processors under real-time constraints. With his many years of experience in the semiconductor industry and direct involvement in the development of MPEG-2, Adrian is superbly qualified to lead this effort."
_________________

6. EEMBC Publishes Benchmark Scores for NEC's 300-MHz VR5500 Processor

New benchmark scores for a processor from NEC Electronics provide an interesting opportunity to make direct performance comparisons between 300-MHz and 400-MHz versions of the same device. Scores for NEC's 64-bit, 300-MHz VR5500 MIPS-based microprocessor were published in September, following the publication of scores for the 400-MHz VR5500 earlier this year.

The 300-MHz VR5500 microprocessor was tested using all five of EEMBC's application-based benchmark suites. In benchmark setups using 32-bit and 64-bit external buses, the 300-MHz VR5500 processor achieved the following out-of-the-box consolidated scores:

NEC VR5500 Out-of-the-Box Scores

32-bit
Automotive: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=287
Consumer: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=290
Networking: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=291
Office Automation: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=294
Telecom: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=295

64-bit
Automotive: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=288
Consumer: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=289
Networking: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=292
Office Automation: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=293
Telecom: http://www.eembc.org/benchmark/score/ScoreReportWin.asp?BenchmarkSeq=296

A close analysis of the individual benchmark scores behind the 300-MHz VR5500 processor marks reveals good system-level efficiency when the processor is combined with a 32-bit external memory bus, especially for applications with reasonably small data sets, as seen with the automotive/industrial and telecom benchmarks.
As expected, when dealing with applications with larger data sets, such as the consumer and networking benchmarks, the width of the processor's external data bus has a greater impact. Furthermore, the processor's data bus width becomes more of a performance factor at higher clock frequencies. For example, on the consumer benchmarks, a 64-bit external bus improves the performance of the 300-MHz processor by 5.7%, while it improves the performance of the 400-MHz processor by 7%.

"This comparison between the 300-MHz and 400-MHz VR5500 microprocessor scores exemplifies how designers can use EEMBC benchmarks to understand the price/performance ratio offered by a given embedded processor and memory subsystem," said Markus Levy, EEMBC president. "In making scores for both these devices available, NEC is doing a great service for its customers and demonstrating a performance range rather than concentrating only on achieving the highest scores possible. This information allows designers to make trade-offs to achieve the most practical processor and system implementation."
_________________

7. From the Lab

Dhrystones Considered Harmful: Using Dhrystones Will Damage Your Business and Your Mental Health
By Alan R. Weiss, Chairman and Chief Technical Officer, EEMBC Certification Laboratories

In Communications of the ACM, Vol. 11, No. 3, March 1968, Edsger Dijkstra wrote a seminal paper entitled "Go To Statement Considered Harmful." In the spirit of this classic of computer science, and in honor of the late Professor Dijkstra, I would like to add the following contribution to the engineering community.

Just as the use of "GOTO" in computer languages is an abomination, so too is the use, and misuse, of the Dhrystone benchmark. For a number of years, I have believed that the quality of computer processor architectures decreases in exact proportion to the regularity with which their manufacturers quote Dhrystone scores.
More recently, I reacquainted myself with exactly how awful Dhrystones are, and how it can and will damage your mental health merely to utter their name. I am convinced that Dhrystones should be abolished forever from all discussions related to performance or benchmarking.

Here is a summary of the main differences between EEMBC and Dhrystone:

Written in C language code
  EEMBC: Yes - All code is in ANSI C (except the Java benchmark suite)
  Dhrystone: Yes - but it is NOT in ANSI C

Very small size
  EEMBC: No - mixture of small and large benchmarks
  Dhrystone: Yes - two tiny .C files

Single, easy-to-report score
  EEMBC: Yes - aggregates such as AutoMark or TeleMark
  Dhrystone: Yes - DMIPS

Multiple benchmarks
  EEMBC: Yes - 35 in Version 1.0, more being added
  Dhrystone: No

Synthetic
  EEMBC: No - based on real-world applications
  Dhrystone: Yes - completely

Related to a reference platform
  EEMBC: No
  Dhrystone: Yes - based on ancient VAX 11/780

Integer only code
  EEMBC: No - combination of integer and floating point
  Dhrystone: Yes

Library-dependent performance
  EEMBC: No - profiling suggests that good libraries help, but bad libraries do not hurt as much as with Dhrystone
  Dhrystone: Yes - very sensitive to string functions

Evolution
  EEMBC: Yes - a new microcontroller benchmark suite was added in 2002; a Java benchmark suite and more complex applications are due in 2003
  Dhrystone: No - not since 1988

Third-party certification
  EEMBC: Yes - EEMBC Certification Laboratories, independent, non-biased, fair
  Dhrystone: No - plagued with inaccuracies and cheating

Source control
  EEMBC: Yes - strong. All code is available to EEMBC members, backed up by ECL using CVS and source management. A "correct version" always exists.
  Dhrystone: No

Standard run rules
  EEMBC: Yes - extremely strict, with a combination of Full Fury (optimized) and Out-of-the-Box modes
  Dhrystone: Yes - but open to wide interpretation

Disclosure of benchmark environment
  EEMBC: Yes - scores must be accompanied by full disclosure of the certified benchmark environment
  Dhrystone: No - resulting in benchmark execution ambiguity

Significant figure of merit varies
  EEMBC: No
  Dhrystone: No

Repository of official scores
  EEMBC: Yes - http://www.eembc.org
  Dhrystone: No - past Usenet repositories are years out of date

Inlining or excessive compiler optimization destroys the benchmark
  EEMBC: No
  Dhrystone: Yes

Full Fury mode
  EEMBC: Yes - allows any optimization that still results in certifiable output. Highlights the theoretical maximum performance of a device or architecture.
  Dhrystone: No - run rules state no changing the source code

In the insightful words of Dr. Reinhold Weicker, the original creator of Dhrystone, "Although the Dhrystone benchmark that I published in 1984 was useful at the time, it cannot claim to be useful for modern workloads and CPUs because it is so short, it fits in on-chip caches, and fails to stress the memory system.

"Also, because it is so short and does not read from an input file, special compiler optimizations can benefit Dhrystone performance more than normal program performance. In embedded computing, EEMBC (www.eembc.org) is collecting larger real-life embedded-computing programs as the basis for benchmarks."
_________________

If you do not wish to receive e-mail from EEMBC, you can unsubscribe by accessing the following link: http://www.eembc.org/asp/unsubscribe.asp. EEMBC sends no more than one e-mail per month to registered users at www.eembc.org. Continuing your subscription ensures you'll be notified when new scores and other important announcements are available.