# Instrumentation Set-up for Instruction Level Power Modeling\*

S. Nikolaidis, N. Kavvadias, P. Neofotistos, K. Kosmatopoulos, T. Laopoulos, and L. Bisdounis<sup>1</sup>

Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

<sup>1</sup>INTRACOM S.A., Development Programmes Department, 19.5 Km Markopoulo Ave., GR-19002 Peania, Greece

snikolaid@physics.auth.gr

**Abstract.** Energy constraints form an important part of the design specification for processors running embedded applications. Estimating energy dissipation early in the design cycle requires accurate power consumption models characterized for the processor. This paper discusses a methodology, and the corresponding instrumentation setup, for taking current measurements to create high-quality instruction-level power models. The instantaneous current drawn by the processor is monitored at each clock cycle. A high-performance instrumentation setup has been established for accurately measuring the processor current; it is based on a current-sensing circuit instead of the conventional series resistor.

## 1 Introduction

Embedded computer systems are characterized by the presence of a dedicated processor that executes application-specific software. Many embedded computing applications are power or energy critical; that is, power constraints form an important part of the design specification [1]. Early work on processor analysis focused on performance improvement without considering power-performance trade-offs. More recently, significant research has been carried out on low-power design and on power estimation and analysis. Creating high-quality models requires a method for accurately estimating the power consumption of a processor and its dependence on the data and on the architectural characteristics of the processor.

Power modeling techniques in the literature fall into two main categories: a) physical *measurement-based* and b) *simulation-based* techniques. In *simulation-based* methods [2][3], the energy consumed by software is estimated by calculating, through simulation, the energy consumption of the various components of the target processor; the simulations can be performed at different levels of abstraction. The common case is to evaluate power consumption figures from a mix of gate-level and RT-level descriptions. The main drawback of these simulation-based techniques is that they need information about the circuit-level design of the processor, which is not usually available. In addition, they provide no mechanism for relating the energy consumption of software to the executed instruction sequence.

\* This work was supported by the EASY project, IST-2000-30093, funded by the European Union.

In *measurement-based* approaches [1], [4]–[7], the energy consumption of software is characterized using data obtained from real hardware. Power metrics extracted by monitoring the execution of software on the target processor involve either current measurements or direct energy measurements. The advantage of measurement-based approaches is that the resulting energy model closely matches the actual energy behavior of the processor.

In measurement techniques, the usual concept is to associate the instructions running on the processor with their corresponding energy cost. These techniques analyze the power consumption of the processor by decomposing its workload into sequentially executed assembly-level instructions. A major advantage of measurement-based methodologies over simulation-based methods is that knowledge of the microarchitectural details of the processor under study is not necessary.

## 2 Related Work

Power analysis techniques for embedded processors that employ physical measurements were first proposed in the mid-1990s. Significant work on software optimization for minimizing power dissipation is found in [1], [8]–[10], where a technique based on physical measurements is developed. Power characterization is done by extracting cost factors from the average current drawn by the processor as it repeatedly executes short instruction sequences. The base cost of an instruction is measured by executing a loop that consists of that instruction alone. The inter-instruction effect, induced when different instructions execute adjacently, is measured by replacing the one-instruction loops used for the base costs with loops consisting of appropriate instruction pairs. The power cost of a program is taken to be the sum of the power costs of the instructions it executes, refined by the power cost of the inter-instruction effects. This method has been validated for commercial targets based on embedded core processors.
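
The composition rule described above (base costs plus inter-instruction overheads) can be sketched in Python as follows. This is a minimal illustration, not the original authors' tooling: the function name `program_energy`, the dictionary layout of the cost tables and the numbers in the example are hypothetical, pairs missing from the overhead table are assumed to contribute nothing, and other effects (stalls, cache misses) are not modeled.

```python
from typing import Dict, List, Tuple


def program_energy(
    program: List[str],
    base_cost: Dict[str, float],
    pair_overhead: Dict[Tuple[str, str], float],
) -> float:
    """Sum of per-instruction base costs plus an overhead for each adjacent pair.

    base_cost: cost measured with a loop of a single instruction type.
    pair_overhead: extra cost measured with loops of instruction pairs; a pair
    missing from the table (e.g. two identical instructions) is assumed to add 0.
    """
    total = sum(base_cost[instr] for instr in program)
    for prev, nxt in zip(program, program[1:]):
        total += pair_overhead.get((prev, nxt), 0.0)
    return total


if __name__ == "__main__":
    # Hypothetical cost tables in arbitrary units.
    base = {"ADD": 1.0, "MUL": 2.0}
    overhead = {("ADD", "MUL"): 0.25, ("MUL", "ADD"): 0.25}
    print(program_energy(["ADD", "MUL", "ADD"], base, overhead))  # 4.5
```

Keying the overhead table by ordered pairs keeps the sketch general; if the overhead is assumed symmetric, both orders can simply be stored with the same value.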

Most of the published work on measurement-based techniques takes the Tiwari method as its starting point. The Tiwari method yields only average power estimates for the modeling task, since the measurements are taken with a standard digital ammeter. A direct application of the Tiwari technique is found in [11], where an extensive study of the ARM7TDMI processor core is reported. To limit the number of instruction variations that must be characterized, the ARM instructions are organized according to addressing mode. In [5], the processor current is also measured with a precision ammeter. However, the power modeling is more sophisticated, as architectural-level model parameters are introduced and integrated into the power model. These consist of the weight of instruction fields or data words, the Hamming distance between adjacent ones, and basic costs for accessing the CPU and external memory and for activating/deactivating functional units.
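
The weight and Hamming-distance parameters mentioned for [5] can be computed from a sequence of instruction or data words as sketched below. This is an illustrative sketch only: the 32-bit word width, the helper names and the choice to compare the first word with an all-zero word are assumptions, not details taken from [5].

```python
from typing import List, Tuple

WORD_BITS = 32  # assumed word width (32-bit ARM datapath)
MASK = (1 << WORD_BITS) - 1


def hamming_weight(word: int) -> int:
    """Number of 1 bits in the low WORD_BITS bits of `word`."""
    return bin(word & MASK).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions that differ between two words."""
    return hamming_weight(a ^ b)


def switching_features(words: List[int]) -> List[Tuple[int, int]]:
    """(weight, Hamming distance to the previous word) for each word.

    The first word is compared with an all-zero word, an assumption made
    here for illustration only.
    """
    features = []
    prev = 0
    for w in words:
        features.append((hamming_weight(w), hamming_distance(prev, w)))
        prev = w
    return features


if __name__ == "__main__":
    print(switching_features([0x0F, 0xFF]))  # [(4, 4), (8, 4)]
```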

Instantaneous current was first measured in [4], where a digitizing oscilloscope is used to read the voltage difference across a precision resistor inserted between the power supply and the core supply pin of the processor. Instantaneous power is then calculated directly from the voltage waveform, from which average figures are extracted to guide instruction power modeling. A similar measurement methodology is described in [12], where a high-bandwidth differential probe is used to read, across a resistor, the instantaneous power consumed by an ARM7 processor. Resistor-based methodologies suffer from the supply voltage fluctuation caused by the drop across the measurement resistor and from noise induced in the supply current path, both of which inherently reduce the accuracy of the method.

All the above techniques acquire the current drawn by the processor during instruction execution. A complex circuit topology for cycle-accurate energy measurement, based on measuring charge transfer with switched capacitors, is proposed in [6,7]. The switches are turned on and off alternately: each capacitor of a switched pair is charged to the power supply voltage during one clock cycle and discharged during the next cycle, powering the processor. The energy consumed in a clock cycle is calculated from the change in the voltage across the capacitor, since the energy delivered by a capacitor is proportional to the difference between the squares of its initial and final voltages. However, this method cannot provide detailed information about the shape of the current waveform, which is useful in many applications, and in particular when high-quality power models that include the architectural characteristics of the processor are required. To measure the energy variations, various (ref, test) instruction pairs are formed, where ref denotes a reference instruction of choice and test the instruction to be characterized. This setup, combined with the above modeling concept, is then used to obtain an energy consumption model for the ARM7 processor. With this circuitry, the energy consumption of each pipeline stage is measured.
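
The energy relation behind such switched-capacitor schemes can be sketched as below. It is a minimal Python illustration of the textbook formula $E = \tfrac{1}{2}C\,(V_1^2 - V_2^2)$ for an ideal capacitor, ignoring switch losses; the function name and the component values are hypothetical and not taken from [6,7].

```python
def capacitor_energy(capacitance_f: float, v_start: float, v_end: float) -> float:
    """Energy (J) delivered by an ideal capacitor discharging from v_start to v_end (V).

    E = C/2 * (v_start**2 - v_end**2): the energy depends on the difference of
    the squared voltages, not on the voltage drop alone.
    """
    if v_end > v_start:
        raise ValueError("v_end must not exceed v_start for a discharging capacitor")
    return 0.5 * capacitance_f * (v_start ** 2 - v_end ** 2)


if __name__ == "__main__":
    # Hypothetical values: 1 uF capacitor discharging from 2.5 V to 2.4 V in one cycle.
    print(capacitor_energy(1e-6, 2.5, 2.4))  # about 2.45e-07 J
```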

Power analysis methods that use simulation tools to construct energy estimation models for software have also been reported. In these methodologies, the target system is first synthesized from an RTL description, and gate-level power estimators are used to construct a power cost database for all instructions or instruction pairs [2]. In a different approach, system-level components such as the processor and the cache are considered to be in either an active or an idle state, and a factor proportional to the power consumption is assigned to each state [13].
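
A state-based model of the kind attributed to [13] can be sketched as a sum of per-state factors weighted by the time spent in each state. The trace format, component and state names, and the per-cycle factors below are hypothetical; the sketch only illustrates the accounting, not the simulator of [13].

```python
from typing import Dict, List, Tuple


def state_based_energy(
    trace: List[Tuple[str, str, int]],
    factor_per_cycle: Dict[Tuple[str, str], float],
) -> float:
    """Total energy of a trace of (component, state, cycles) entries.

    Each (component, state) pair is assigned a per-cycle factor proportional to
    its power consumption; energy accumulates as cycles * factor.
    """
    return sum(cycles * factor_per_cycle[(comp, state)] for comp, state, cycles in trace)


if __name__ == "__main__":
    # Hypothetical factors in arbitrary energy units per cycle.
    factors = {("cpu", "active"): 1.0, ("cpu", "idle"): 0.1, ("cache", "active"): 0.5}
    trace = [("cpu", "active", 100), ("cpu", "idle", 50), ("cache", "active", 100)]
    print(state_based_energy(trace, factors))  # about 155
```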

## 3 Instrumentation Setup for Measuring the Instantaneous Current

As mentioned in the previous section, different techniques have been proposed for estimating the impact of software on the overall power consumption of a given processor. The common methodology derives instruction-level power models by measuring the average current drawn by each instruction executed on the processor. The average power $P$ consumed by a microprocessor while running a program is given by $P = I_{DD} V_{DD}$, where $I_{DD}$ is the average current and $V_{DD}$ is the supply voltage. The energy $E$ consumed by the program is then $E = P N \tau$, where $N$ is the number of clock cycles taken by the program and $\tau$ is the clock period. Thus, the ability to measure the current drawn by the CPU while it executes the program is essential for measuring its power/energy cost.
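
As a worked example of the two relations above, the following sketch computes program energy from an average current. The function name and the input values (40 mA at 2.5 V for 1000 cycles at 10 MHz) are hypothetical.

```python
def energy_from_average_current(i_dd: float, v_dd: float, n_cycles: int, f_clk: float) -> float:
    """E = P * N * tau with P = I_DD * V_DD and tau = 1 / f_clk (SI units)."""
    tau = 1.0 / f_clk      # clock period (s)
    power = i_dd * v_dd    # average power (W)
    return power * n_cycles * tau


if __name__ == "__main__":
    # Hypothetical example: 40 mA average at 2.5 V for 1000 cycles at 10 MHz.
    print(energy_from_average_current(40e-3, 2.5, 1000, 10e6))  # about 1e-05 J (10 uJ)
```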

The proposed method is based on measuring the instantaneous current drawn by the processor during the execution of instructions. Measuring the instantaneous current makes it possible to develop higher-quality models, since the behavior of the processor can be observed at a finer level and related to its architectural characteristics. A current measurement method that used a digitizing oscilloscope to measure instantaneous power was proposed in [4] to develop a power model for the JF and HD implementations of the i960 family. However, the current was measured as the voltage drop across a resistor placed directly in the supply path. This configuration inherently affects the actual voltage applied to the chip, introducing an offset error in the current values and consequently reducing the accuracy of the method. Using smaller resistor values reduces this effect but also reduces the resolution of the measurements.
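
The resolution/disturbance trade-off of a series sense resistor can be made concrete with the sketch below. It assumes a static (resistive) drop only, ignoring inductive effects and noise, and the peak current (100 mA), supply (2.5 V) and oscilloscope voltage step (1 mV) are hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class SenseResistorReport(NamedTuple):
    v_chip: float         # supply voltage left for the processor at peak current (V)
    drop_fraction: float  # fraction of V_DD lost across the resistor
    i_resolution: float   # smallest resolvable current step (A)


def series_resistor_tradeoff(r_sense: float, i_peak: float, v_dd: float,
                             v_lsb: float) -> SenseResistorReport:
    """Supply disturbance versus current resolution for a series sense resistor.

    v_lsb is the smallest voltage step the oscilloscope can resolve (assumed value).
    """
    drop = i_peak * r_sense
    return SenseResistorReport(v_dd - drop, drop / v_dd, v_lsb / r_sense)


if __name__ == "__main__":
    for r in (1.0, 10.0):
        print(r, series_resistor_tradeoff(r, i_peak=0.1, v_dd=2.5, v_lsb=1e-3))
```

With these numbers, a 1 Ω resistor costs only 4% of the supply at peak current but resolves 1 mA steps, while 10 Ω resolves 0.1 mA steps at the cost of a 40% supply drop.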



Fig. 1. Measurement setup

The proposed current measurement approach, based on a current-mirror configuration with Bipolar Junction Transistors (for high-frequency operation and limited power-supply voltage fluctuation), aims to overcome the shortcomings of the previous methods. The instruction-level power models can then be derived from the "instantaneous" current variations, which are monitored and measured continuously by a high-accuracy, high-speed automated measurement and data acquisition setup. The main task is performed by a high-performance current-mirror circuit that provides a precise copy of the instantaneous current drawn by the processor core. The output (copy) current is monitored by a precision Digital Storage Oscilloscope (DSO) and transferred to a PC for further calculations by the appropriate software (in the LabVIEW environment). The measurement setup is shown in Fig. 1. This measurement approach is similar to the Built-In Self Test (BIST) techniques [14] used for testing high-frequency analog circuits, where monitoring the current drawn by the circuit under test allows its different operating conditions to be evaluated. By applying proper timing and signature analysis techniques to these measurements, the power consumption of each instruction sequence used in the software can be estimated.

A simple four-transistor configuration, shown in Figure 2, is used; extensive tests have shown that it offers remarkable performance in terms of copying accuracy and time (frequency) response. The first characteristic is obviously important for accurately measuring the instantaneous current value. The second is also important here, since the current variations in each clock cycle are short, pulse-like waveforms. A key point in this measurement problem is to monitor the shape of the current variations accurately, since the shape strongly affects the energy consumption value. The energy is calculated by integrating the current over the clock period $\tau$ and multiplying by the supply voltage, $E = V_{DD}\int_{0}^{\tau} i_{DD}(t)\,dt$. In Figure 2, *Rbias* is used for biasing purposes. The current is measured on *Rmeas* (in practice, the voltage across it is measured) in the output branch, which mirrors the current through the processor (DUT). A DC offset current due to *Rbias* has to be subtracted from the measured value.
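
The per-cycle energy computation described above (voltage on *Rmeas* converted to current, *Rbias* offset removed, current integrated over one clock period and multiplied by the supply voltage) can be sketched as follows. This is a hedged illustration, not the authors' LabVIEW code: it assumes a 1:1 mirror, an offset current known from calibration, samples spanning exactly one clock period, and trapezoidal integration; the function name and parameter values are hypothetical.

```python
from typing import Sequence


def cycle_energy(v_samples: Sequence[float], dt: float, r_meas: float,
                 i_offset: float, v_dd: float) -> float:
    """Energy (J) drawn by the processor during one clock period.

    v_samples: voltages across Rmeas, sampled every dt seconds, with the first
               and last samples at the two clock edges bounding the period.
    i_offset:  DC offset current due to Rbias (A), assumed known from calibration.
    Assumes a 1:1 current mirror, so the output current equals the DUT current
    plus the offset.
    """
    if len(v_samples) < 2:
        raise ValueError("need at least two samples per clock period")
    # Convert each voltage sample to current and remove the Rbias offset.
    i_dut = [v / r_meas - i_offset for v in v_samples]
    # Trapezoidal integration of the current over the period gives the charge (C).
    charge = sum((a + b) * dt / 2.0 for a, b in zip(i_dut, i_dut[1:]))
    return v_dd * charge


if __name__ == "__main__":
    # Hypothetical: 6 MHz clock, 100 intervals per period, 10 ohm Rmeas, 1 mA offset.
    period = 1.0 / 6e6
    print(cycle_energy([0.05] * 101, period / 100, 10.0, 1e-3, 2.5))  # about 1.67e-09 J
```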



Fig. 2. Current mirror. DUT corresponds to the processor.

The experimental current-mirror circuit and the other components needed for the proper operation of the system are placed on a specially designed printed circuit board. Multiple experimental tests were then performed to ensure that the setup operates properly for this specific application. These tests include current copying accuracy, operating range (minimum and maximum current values), frequency response, phase difference measurements (between input and output current waveforms), etc. Note that the current drawn by the processor (which, during these tests, was controlled by different high-performance generators) is taken as the input current waveform, while the copy generated by the current mirror at its output is taken as the output current. The instruments used for the experimental testing of the current-sensing configuration include an HP-3325B 20 MHz Function Generator, an IFN-2025 2 GHz Sinusoidal Signal Generator, an HP-3575 Gain/Phase Meter, and power supply units from Delta Elektronika and Kikusui. The monitoring instrument is an HP54601B 100 MHz Digital Storage Oscilloscope connected to a PC via a GPIB interface. The experiments presented here were done using BC212 transistors, whose main characteristics are given in Table 1.

| Parameter | Symbol | Value |
|---|---|---|
| Collector Current – Continuous | $I_C$ | -100 mA DC |
| DC Current Gain ($I_C = -100$ mA DC) | $h_{FE}$ | 120 (typ) |
| Current Gain – Bandwidth Product | $f_T$ | 280 MHz |

Table 1. Transistor Characteristics

Figure 3 compares the supply voltage fluctuation at the processor when (a) the proposed circuit and (b) a resistor of the same resolution are used as the current-sensing circuit. With the proposed circuit, the supply voltage fluctuation at the processor is significantly smaller (by more than a factor of 7) than with a simple resistor.



Fig. 3. Supply voltage fluctuation when the current sensing circuit is (a) the proposed current mirror and (b) a resistor.



Fig. 4. Experimental results for the input-output characteristic of the current sensing configuration and for the error of output current values (current copying)

The following experimental diagrams present the performance characteristics of this instrumentation system in terms of the specifications considered above. They show typical cases from the multiple measurement tests that were repeatedly performed in the lab. As shown in Figure 4, the measurement system has an operating range of 2–100 mA, with an error below 2.5%, and below 1% over a sub-range wide enough to monitor the different current variations. If lower current values need to be monitored, a constant current can be added by changing *Rbias*, which shifts the measured current into the useful operating range.
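
The copying-error figures quoted above can be checked against a sweep of input/output current pairs with a routine like the following. It is a sketch only: the function name and the sweep values in the example are hypothetical, and the 2–100 mA limits are taken from the text.

```python
from typing import Sequence

RANGE_MIN_A, RANGE_MAX_A = 2e-3, 100e-3  # operating range reported in the text


def max_copy_error_percent(i_in: Sequence[float], i_out: Sequence[float]) -> float:
    """Worst-case relative copying error (%) over the in-range points of a sweep."""
    errors = [abs(o - i) / i * 100.0
              for i, o in zip(i_in, i_out)
              if RANGE_MIN_A <= i <= RANGE_MAX_A]
    if not errors:
        raise ValueError("no sweep points inside the operating range")
    return max(errors)


if __name__ == "__main__":
    # Hypothetical sweep; the 1 mA point lies outside the range and is ignored.
    print(max_copy_error_percent([1e-3, 10e-3, 50e-3], [1.5e-3, 10.1e-3, 49e-3]))  # about 2.0
```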



Fig. 5. Experimental results for the frequency response of the current sensing configuration

The current copying gain remains practically constant, with output current equal to input current, up to 100 MHz. As shown in Figure 5, the gain varies by less than 0.5 dB over the whole frequency range up to 100 MHz. At higher frequencies, a correction coefficient has to be applied to reproduce the actual input current value. In addition, a series of oscilloscope recordings were transferred to the PC to illustrate how accurately the proposed solution monitors and measures current waveforms. As an example, Figure 6 shows the monitoring of a 10 MHz square wave, with a detailed comparison between input and output.
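
One way to apply such a correction coefficient is sketched below: the measured mirror gain (in dB, $20\log_{10}(I_{out}/I_{in})$) is interpolated at the frequency of interest and the output amplitude is scaled by $10^{-G/20}$. The gain table, the linear interpolation and the flat extrapolation outside the table are assumptions for illustration; phase is ignored, and correcting a pulse-shaped waveform would additionally require splitting it into frequency components (e.g. with an FFT), which is not shown.

```python
import bisect
from typing import List, Tuple


def gain_db_at(freq_hz: float, table: List[Tuple[float, float]]) -> float:
    """Linearly interpolate the mirror gain (dB) from a (frequency, gain_db) table
    sorted by frequency; outside the table the end values are held constant."""
    freqs = [f for f, _ in table]
    if freq_hz <= freqs[0]:
        return table[0][1]
    if freq_hz >= freqs[-1]:
        return table[-1][1]
    k = bisect.bisect_right(freqs, freq_hz)
    (f0, g0), (f1, g1) = table[k - 1], table[k]
    return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)


def input_amplitude(output_amplitude: float, freq_hz: float,
                    table: List[Tuple[float, float]]) -> float:
    """Recover the input-current amplitude of a sinusoidal component:
    I_in = I_out * 10**(-G/20)."""
    return output_amplitude * 10 ** (-gain_db_at(freq_hz, table) / 20.0)


if __name__ == "__main__":
    # Hypothetical gain table: flat to 100 MHz, then rolling off.
    gain_table = [(1e6, 0.0), (100e6, -0.5), (200e6, -3.0)]
    print(input_amplitude(1.0, 150e6, gain_table))
```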



Fig. 6. A 10MHz test current-signal recorded by the instrumentation setup. Upper trace corresponds to the input current and lower trace to the output current of the mirroring circuit

## 4 Results

Using the proposed instrumentation setup, accurate instruction-level power models can be derived from the instantaneous current drawn during the execution of each instruction. Monitoring the current at each clock cycle gives a clear view of how power is consumed, so the factors that affect power consumption can be studied in a straightforward way. For example, different operand values can be used in the instructions and the corresponding power consumption can be measured at each clock cycle. Combined with the high resolution of the proposed setup, this allows accurate models to be created. Figure 7 shows the current drawn by the ARM7TDMI processor (supply voltage 2.5 V, clock frequency 6 MHz) when it executes the ADD instruction with operands (0,0) and (5555555,AAAAAAAA), while the remaining pipeline stages execute phases of the NOP instruction. As expected, energy is consumed in both phases of the clock. The energy is calculated by integrating the current over a clock period and multiplying by the supply voltage, after subtracting the contribution of the current through *Rbias*. For zero operands, the energy consumed in this clock cycle was estimated at 0.95 nJ, compared with 1.12 nJ for the second operand pair. The effect of the operand values on energy consumption is therefore clear: the second pair requires about 18% more energy.



**Fig. 7.** Current waveforms of the processor ARM7 executing the ADD instruction (a) with operands (0,0) and (b) with operands (5555555,AAAAAAAA)

Many measurements have been taken to estimate the energy consumed by the instructions of the ARM7 processor. Loops consisting of NOP instructions and a single test instruction were executed. The energy of the test instruction was calculated as the sum of the energy consumed in the clock cycles required to execute it, minus two times the energy of the NOP instruction (due to the pipeline structure, two NOP instructions are also executed in the clock cycles needed to execute a test instruction). Table 2 presents the energy consumption of some instructions, with all operands set to zero.
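
The subtraction rule above can be written as a one-line helper. The function name and the per-cycle energies in the example are hypothetical; the number of cycles a given instruction occupies must come from the measurements themselves.

```python
from typing import Sequence


def instruction_energy(cycle_energies: Sequence[float], e_nop: float,
                       nops_overlapped: int = 2) -> float:
    """Energy attributed to the test instruction: the sum of the per-cycle
    energies over the cycles it occupies, minus the energy of the NOPs that
    execute alongside it in the pipeline (two, per the text)."""
    return sum(cycle_energies) - nops_overlapped * e_nop


if __name__ == "__main__":
    # Hypothetical per-cycle energies (nJ) and NOP energy (nJ).
    print(instruction_energy([1.0, 0.9, 0.95], e_nop=0.9))  # about 1.05
```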

| Instruction          | E (nJ) | Instruction         | E (nJ) |
|----------------------|--------|---------------------|--------|
| ADD R2,R0,R1         | 0.910  | LDR R2,[R1,R3]      | 2.774  |
| AND R2,R0,R1         | 0.856  | STR R2, [R1,R3]     | 1.961  |
| ORR R2,R0,R1         | 0.907  | MUL R2, R0, R1, R10 | 2.768  |
| ORRS R2,R0,R1        | 0.967  | MLA R2, R0, R1, R10 | 3.748  |
| MOV R2,R1            | 0.935  | CMP R0,R1           | 0.751  |
| MOV R0,R0            | 0.903  | SWP R2,R0,[R1]      | 3.917  |
| ADD R2,R0,R1, ASR R3 | 2.137  | MRS R2, CPSR        | 0.977  |
| B label              | 3.095  | MSR CPSR_f, R2      | 1.143  |

Table 2. Instruction-level energy consumption (for zero operands)

With the proposed measurement environment, models that include the dependence of the energy consumption on the operand values and their addresses can easily be created. Although the measurements are not yet complete, we have observed that the energy depends almost linearly on the number of 1s in the values of the operands and their addresses. This is illustrated for the ADD and LDR instructions in Figure 8. The position of the 1s in the operand words, on the other hand, does not have a significant effect on energy.
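
A near-linear dependence like the one described above could be quantified with an ordinary least-squares fit of energy against the number of 1s in the operands. The text does not state how the dependence was fitted, so the least-squares choice, the helper names, the 32-bit masking and the energy values in the example are assumptions for illustration.

```python
from typing import Sequence, Tuple


def ones_in_operands(*operands: int) -> int:
    """Total number of 1 bits across the 32-bit operand values."""
    return sum(bin(op & 0xFFFFFFFF).count("1") for op in operands)


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


if __name__ == "__main__":
    pairs = [(0, 0), (0xF, 0), (0xFF, 0xF), (0xFFFF, 0xFFFF)]
    ones = [ones_in_operands(a, b) for a, b in pairs]   # [0, 4, 12, 32]
    energy_nj = [0.9 + 0.01 * n for n in ones]          # hypothetical energies
    print(fit_linear(ones, energy_nj))                  # about (0.9, 0.01)
```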



Fig. 8. Energy consumption of (a) ADD and (b) LDR as a function of the number of 1s in the operand values

## 5 Conclusions

More accurate instruction-level power models can be derived by measuring the instantaneous current drawn by the processor at each clock cycle. The instrumentation setup for monitoring the instantaneous current and for calculating the corresponding energy is presented. A current mirror is used as the current-sensing circuit to minimize the supply voltage fluctuation and to increase the resolution of the measurements. Extensive experiments were carried out to verify the suitability of the proposed circuit in terms of accuracy and frequency response. Some results from measurements of the current drawn by the ARM7TDMI processor are presented.

## References

- 1. Vivek Tiwari, Sharad Malik and Andrew Wolfe, "Power Analysis of Embedded Software: A First Step Towards Software Power Minimization," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 2, No. 4, pp. 437–445, December 1994.
- 2. Chaitali Chakrabarti and Dinesh Gaitonde, "Instruction Level Power Model of Microcontrollers," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 176–179, 1999.
- 3. Tony D. Givargis, Frank Vahid and Jorg Henkel, "Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores," in Proc. of IEEE/ACM International Symposium on System Synthesis (ISSS '00), pp. 163–169, September 2000.
- 4. J. T. Russell and M. F. Jacome, "Software Power Estimation and Optimization for High Performance, 32-bit Embedded Processors," in Proc. of the International Conference on Computer Design (ICCD '98), October 1998.
- 5. S. Steinke, M. Knauer, L. Wehmeyer and P. Marwedel, "An Accurate and Fine Grain Instruction-Level Energy Model supporting Software Optimizations," in Proc. of the International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS '01), Yverdon-les-bains, Switzerland, September 2001.
- 6. Naehyuck Chang, Kwanho Kim and Hyun Gyu Lee, "Cycle-Accurate Energy Consumption Measurement and Analysis: Case Study of ARM7TDMI," *IEEE Transactions on VLSI Systems*, Vol. 10, No. 2, pp. 146–154, April 2002.
- 7. Sheayun Lee, Andreas Ermedahl, Sang Lyul Min and Naehyuck Chang, "An Accurate Instruction-Level Energy Consumption Model for Embedded RISC Processors," in Proc. of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (LCTES '01), 2001.
- 8. Vivek Tiwari, Sharad Malik, Andrew Wolfe and Mike Tien-Chien Lee, "Instruction Level Power Analysis and Optimization of Software," Journal of VLSI Signal Processing, Vol. 13, No. 2–3, pp. 223–238, August 1996.
- 9. Mike Tien-Chien Lee, Vivek Tiwari, Sharad Malik and Masahiro Fujita, "Power Analysis and Minimization Techniques for Embedded DSP Software," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 123–135, March 1997.
- 10. V. Tiwari and T. C. Lee, "Power Analysis of a 32-bit Embedded Microcontroller," VLSI Design Journal, Vol. 7, No. 3, 1998.
- 11. SOFLOPO, Low Power Development for Embedded Applications, Esprit project, Deliverable 2.2: Physical measurements, by Thanos Stouraitis, University of Patras, December 1998.
- 12. Xavier Amela, Joan Figueras, Salvador Manich, Josep Rius, Rosa Rodriguez and Antonio Rubio, "ARM Instruction Set Energy Models and Power Simulation Tools (ARM7TDMI)," UPC Internal Report for the IST 10425 VIP (Versatile Integrated Payphone) Project, March 2001.
- 13. Tajana Simunic, Luca Benini and Giovanni De Micheli, "Cycle-Accurate Simulation of Energy Consumption in Embedded Systems," in Proc. of the Design Automation Conference (DAC '99), 1999.
- 14. A. Hatzopoulos, S. Siskos and Th. Laopoulos, "Current conveyor based test structures for mixed-signal circuits," IEE Proceedings – Circuits, Devices and Systems, Vol. 144, No. 4, 1997.