Intelligraphics device driver development

Performance Utilization and Performance Benchmarking in Embedded Systems

Handling performance issues at various stages of product development

Introduction

In an ideal world, the selection of a processor for a new product is based on the hardware architecture and performance requirements. However, in reality both the architecture and processor are selected based on a variety of reasons including cost, available tools, developer familiarity, support, and (too often) political reasons. In many embedded development environments, performance requirements are considered after-the-fact. Performance considerations are frequently neglected until performance-related problems arise. Several reasons for this neglect exist, the first being the performance requirements themselves. Performance issues may be omitted completely from the requirement specification document. Alternatively, existing performance requirements may be ignored or altered in response to time-to-market pressures.

In addition to new products, existing products may be affected by performance requirements as well. Performance perceived as "good enough" earlier in a product's lifecycle may later prove unacceptable. One such example would be deciding whether an existing product has enough bandwidth to handle an enhancement without a board upgrade. On the other hand, a product may be developed without even considering performance requirements, resulting in a partially functional application. Regardless of the reasons, ignoring performance requirements can have serious consequences.

A set of approaches is required when dealing with different performance issues. These approaches should be applicable during any stage of product development. Moreover, they should prove successful dealing with issues on both new and existing products and should provide information as a function of processor utilization.

Overview

This article presents three approaches for handling performance issues at various stages of product development. The three approaches addressed are Rate Monotonic Analysis, Duty Cycle, and Counter approaches. All three approaches are thoroughly discussed revealing each approach's advantages, disadvantages, pitfalls, and accuracy. Enhancements to several of the approaches are also mentioned for potential use for profiling. Although a Real-Time Operating System (RTOS) is assumed for each of the approaches, each approach is universal in purpose so that they may be used for any application running under an RTOS or superloop.

Rate Monotonic Analysis

Rate Monotonic Analysis (RMA) is a rigorous analytical approach for an RTOS-based application. RMA has several uses and was first presented in a paper by C. L. Liu and J. W. Layland in 1973. This approach has since been used successfully by developers for almost three decades. Typically, RMA is used to determine if hard deadlines can be met, but it may also be used to determine the processor utilization of an application. The latter function of RMA is useful during the early stages of project development for selecting a processor. Therein lies most of our interest in RMA. Before discussing the details of RMA, we need to define terms and discuss relevant assumptions.

RMA Definitions and Assumptions

An application can have up to N tasks where N is an integer number. Each of those tasks operates at a certain periodicity or time interval. A task has to perform some action or activity and needs to be completed within a certain time frame or deadline. The deadline may be either "hard" or "soft". A hard deadline is analogous to a command, while a soft deadline is analogous to a request. An example of a soft deadline is striking a key on a keyboard. Optimally, the key will be displayed within several hundred milliseconds after being struck, but it is not mandatory that the character be displayed within that time frame. In contrast, hard deadlines must be met within their specified time interval. An example of a hard deadline is a pacemaker. Pulses to stimulate the heart occur at precise intervals, and critical importance is placed upon the proper execution of those tasks within the allotted time frame.

Five additional concepts are important for performance: priority, preemption, priority inversion, deadlock, and starvation. A priority is analogous to a pecking order. Those items higher in the pecking order get service before those lower in priority. Tasks must have priorities in order for one task to preempt another. Preemption occurs when a higher priority task that needs to execute interrupts an already running lower priority task. The higher priority task executes until it is finished. At that time, the task scheduler returns control back to the lower priority task so that the task may complete its function. Priority inversion is a condition that keeps the highest priority task from executing and is caused when a higher priority task and a lower priority task share a resource. If the lower priority task has the resource locked when it is preempted by the higher priority task, the higher priority task is blocked until the lower priority task runs again and releases the resource. A related concept is deadlock. Deadlock occurs when neither task is able to execute because each holds a resource that the other needs to run. Finally, starvation occurs when a low priority task never gets to run. Starvation typically occurs when tasks are given the wrong priority.

Certain assumptions must be met in order to make use of RMA. One may assume that the application is running on a single processor and that tasks are fixed priority, periodic, preemptable, and run to completion. Additionally, one may assume that context switches are instantaneous, tasks are not synchronized, and tasks with shorter periods have higher priority. If all of these assumptions are met and total utilization stays within the bound given in the next section, the basic RMA theory states that the tasks are schedulable, no priority inversions or deadlocks exist, and the application will meet all hard deadlines.
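The assumption that tasks with shorter periods have higher priority can be sketched in C as follows. This is a minimal illustration, not part of any RTOS API: the `rm_task` structure, its field names, and the convention that 0 is the highest priority are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task descriptor; the field names are illustrative only. */
typedef struct {
    const char *name;
    unsigned period_ms;   /* P: interval at which the task must run */
    unsigned priority;    /* assumed convention: 0 is the highest priority */
} rm_task;

/* qsort comparator: the task with the shorter period sorts first. */
static int by_period(const void *a, const void *b)
{
    const rm_task *ta = a;
    const rm_task *tb = b;
    return (ta->period_ms > tb->period_ms) - (ta->period_ms < tb->period_ms);
}

/* Sort the tasks by period and hand out fixed priorities in that order. */
void rm_assign_priorities(rm_task *tasks, size_t n)
{
    assert(tasks != NULL);
    qsort(tasks, n, sizeof tasks[0], by_period);
    for (size_t i = 0; i < n; i++)
        tasks[i].priority = (unsigned)i;
}
```

Called on three tasks with periods of 200 ms, 350 ms, and 175 ms, the routine places the 175 ms task first with priority 0 and the 350 ms task last.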

Using RMA

Now that RMA conditions have been defined and discussed, it is necessary to address RMA utilization. The following equation is the basic form for using RMA to determine processor utilization:

$$U(n) = T_1/P_1 + T_2/P_2 + T_3/P_3 + \dots + T_n/P_n \le n\,(2^{1/n} - 1)$$

The function U(n) represents the fraction of processor cycles consumed by the tasks. The variable n represents the number of tasks, while T and P represent execution time and period respectively. That is, T is the time required to complete a specific activity, and P is the interval at which the task must be run. For all practical purposes, the period P and the deadline are the same, meaning that the activity must finish before the task is ready to run again.

The left-hand side of the equation sums the individual T/P ratios of the tasks to produce the projected utilization value. The right-hand side is the upper bound on processor utilization: the maximum value that utilization may reach for n tasks while all deadlines are still guaranteed. As the number of tasks grows without limit, the upper bound approaches ln 2, or approximately 69%.
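The limiting value of roughly 69% follows from a short expansion of the bound. The derivation below is a standard calculus argument rather than something taken from the article, and the values for small n are included only for orientation.

```latex
\begin{aligned}
2^{1/n} &= e^{(\ln 2)/n} = 1 + \frac{\ln 2}{n} + O\!\left(\frac{1}{n^{2}}\right) \\
n\,(2^{1/n} - 1) &= \ln 2 + O\!\left(\frac{1}{n}\right)
  \;\longrightarrow\; \ln 2 \approx 0.693 \quad (n \to \infty) \\
n = 1:&\quad 1.000 \qquad n = 2:\quad 0.828 \qquad n = 3:\quad 0.780
\end{aligned}
```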

The following example illustrates how RMA works. Assuming a three-task system, the execution times for the tasks are 60 ms, 70 ms, and 80 ms respectively. Additionally, the periods for the tasks are 200 ms, 350 ms, and 175 ms respectively. The result of the left-hand side of the equation is:

60/200 + 70/350 + 80/175 = .957

These values yield a processor utilization of about 96%. The upper bound calculation is:

$$3\,(2^{1/3} - 1) \approx 0.780$$

or roughly 78%. This result means that the deadlines are guaranteed to be met only if overall processor utilization is no more than 78%. Since processor utilization for this configuration is 96%, the test fails and the deadlines cannot be guaranteed. In this scenario, the application needs to be adjusted so that utilization falls below 78%.
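The worked example can be reproduced with a few lines of C. This is a sketch under stated assumptions: the function names are invented for illustration, and the n-th root of 2 is computed with Newton's method only so that the example does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Left-hand side of the test: sum of T_i / P_i over all n tasks. */
double rma_utilization(const double *exec_ms, const double *period_ms, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(period_ms[i] > 0.0);
        u += exec_ms[i] / period_ms[i];
    }
    return u;
}

/* x raised to a small non-negative integer power. */
static double int_pow(double x, size_t k)
{
    double r = 1.0;
    while (k-- > 0)
        r *= x;
    return r;
}

/* Right-hand side of the test: n * (2^(1/n) - 1).
 * Newton's method solves x^n = 2, starting from 1 + 1/n, which is
 * never below the root. */
double rma_bound(size_t n)
{
    assert(n > 0);
    double x = 1.0 + 1.0 / (double)n;
    for (int iter = 0; iter < 60; iter++) {
        double xn1 = int_pow(x, n - 1);
        x -= (x * xn1 - 2.0) / ((double)n * xn1);
    }
    return (double)n * (x - 1.0);
}

/* Returns 1 when utilization is within the bound, 0 otherwise. */
int rma_guaranteed(const double *exec_ms, const double *period_ms, size_t n)
{
    return rma_utilization(exec_ms, period_ms, n) <= rma_bound(n);
}
```

With execution times of 60, 70, and 80 ms and periods of 200, 350, and 175 ms, `rma_utilization` returns about 0.957, `rma_bound(3)` returns about 0.780, and `rma_guaranteed` returns 0.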

The basic RMA theory works for applications with known deadlines. Moreover, this theory only accounts for hard deadlines; soft deadlines and cycles used by Interrupt Service Routines (ISR) are not considered with this form of RMA. Even though RMA states that an infinite number of tasks will only consume a maximum of 69% of processor utilization if all assumptions are met, practice shows otherwise. In practice, processor utilization may easily reach above 90%. This difference is due to soft deadline components such as non-critical tasks and interrupts. These soft deadline components operate at a low priority and may absorb cycles not used by hard deadline tasks. Thus, heuristically, the calculated processor utilization is halved to account for these unseen components. These components, while often unseen, can account for a large amount of processor cycles. Therefore when using the basic form of RMA, it is prudent to use the heuristic rule.

Empirical Approaches

The RMA approach addresses performance from a theoretical or analytical standpoint. RMA is a good tool for analytically calculating processor utilization if the system is well defined and all system deadlines are known. With known deadlines, RMA can be used to set up and model the system before implementation of the application. However, situations exist such that using an analytical approach like RMA may not be feasible. An example of this type of situation would be a scenario where the system is not well understood, does not adhere to all of the assumptions, or is too complex to be simply modeled. If the application is too complex, then designing and implementing the application to detect the satisfaction of the performance requirements becomes a preferred option. Determining processor utilization of an implemented application is referred to as an empirical approach.

Several empirical approaches have proven to be useful in determining processor utilization of an implemented application. These approaches have specific techniques that may be used to determine utilization of a given application. Keep in mind that the two techniques discussed in this article are not the only options available. However, these techniques are presented because of their overall utility.

The two empirical approaches discussed are the duty cycle and counter approaches. The first approach uses duty cycles to determine aggregate processor utilization. A duty cycle is the ratio of the time on to the total time. The second approach uses counters to log time spent in tasks. Counts are calculated and compared to give processor utilization as a function of differential time. The counter approach is used to determine both aggregate and individual task utilization and uses ratios to determine utilization. Although both approaches are used to determine processor utilization, they do not provide utilization results in terms of cycles, but rather as a ratio of use.

Duty Cycle Approach

The use of an RTOS is required for the duty cycle approach. The selected RTOS should be multitasking, pre-emptive, and priority based. An RTOS always has at least one task running. This task is called a null task and is the lowest priority task. Consequently, the null task runs when no other task is running. Listing 1 shows the pseudo code of a typical null task. The null task is just a super loop designed to use idle processor cycles.

Listing 1.
 Pseudo code for an RTOS based null task.
   
    Null_task
    {
        loop
        {
            Typically, nop code goes here, although user specific
            code may be inserted here
        } forever   // loops until preempted by a higher priority task
    }

The duty cycle approach operates by manipulating application code so as to observe non-idle tasks. This approach first inserts code to toggle an output port on the processor. The port is a single line or pin that is turned on and off by different code modules. Moreover, the pin's value determines whether the application is idle. The toggled pin is then available for monitoring by a test instrument. The instrument is used to gather the task information. Once gathered, the information is analyzed to determine processor utilization under certain operating conditions. Changes to the code are required before this approach is useable.

The duty cycle approach requires two changes in order to enable monitoring. The first change is within the null task, and the second is within the task scheduler. Changes within the null task consist of accessing an available output port pin. Code is added to set the output pin upon entering the null task. The task scheduler, which is part of the RTOS kernel, is responsible for scheduling and swapping tasks. Code is added here to reset the output pin whenever a non-idle task is swapped in. (Note that changes to the scheduler require access to the kernel source.) The net effect of these changes is to toggle the pin to high when idle and low when non-idle (doing work). Listing 2 shows the pseudo code for the required changes of the null task and scheduler.

Listing 2.
 Pseudo code changes in a) null task, and b) task scheduler,
 for implementing both duty cycle techniques.
   
    a) null_task
       {
           loop
           {
               set output pin
           } forever
       }

    b) task_scheduler
       {
           preempt currently running task
           select highest priority task that is ready to run
           . . .
           check ready task to see if it is the idle task
           if ready task is non-idle task
           {
               reset output pin
           }
           . . .
       }
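The two changes in Listing 2 can be sketched in C as follows. Everything named here is hypothetical: `fake_port` stands in for a memory-mapped output port, and `task_t`, `NULL_TASK_PRIORITY`, and the hook names are invented for illustration rather than taken from any particular RTOS.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped output port (assumption: on real hardware
 * this would be a device register at a board-specific address). */
static volatile uint8_t fake_port;

#define IDLE_PIN_MASK       0x01u
#define NULL_TASK_PRIORITY  0       /* assumed priority of the null task */

/* Hypothetical task control block; only the field used here is shown. */
typedef struct {
    int priority;
} task_t;

static void idle_pin_set(void)   { fake_port |= IDLE_PIN_MASK; }
static void idle_pin_reset(void) { fake_port &= (uint8_t)~IDLE_PIN_MASK; }

/* One pass through the null task loop: drive the pin high (idle). */
void null_task_iteration(void)
{
    idle_pin_set();
}

/* Scheduler hook: drive the pin low when a non-idle task is dispatched. */
void scheduler_on_switch(const task_t *next)
{
    assert(next != 0);
    if (next->priority != NULL_TASK_PRIORITY)
        idle_pin_reset();
}

/* Helper for inspection: nonzero while the pin reads high (idle). */
int idle_pin_is_high(void)
{
    return (fake_port & IDLE_PIN_MASK) != 0;
}
```

In this arrangement the pin goes low as soon as a working task is selected and returns high only once the null task actually runs again, which matches the high-when-idle convention described above.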

The duty cycle approach is itself composed of two techniques that may be used to determine processor utilization. The code changes are the same for both techniques. The only difference is in how the information is gathered. These techniques are named after the test instrument that is used to gather the duty cycle information from the output pin. The first is the Volt-Ohm Meter (VOM) technique, while the second is the oscilloscope technique. Both have strengths and weaknesses, each discussed below.

Volt-Ohm Meter (VOM) Technique

The Volt-Ohm Meter (VOM) technique is general purpose and determines average processor utilization. This technique determines aggregate processor utilization for the whole application rather than for individual tasks. The VOM technique may be used during the implementation, integration, and testing stages of development. Furthermore, this technique uses a VOM as its measuring instrument.

Four advantages are associated with using the VOM technique. The VOM technique provides a rough estimate of performance, requires minimal analysis knowledge and diagnostic skills, utilizes inexpensive equipment, and is minimally code invasive.

The VOM technique gives a quick ballpark utilization figure. This technique operates by summing and averaging all waveforms present using the Root-Mean-Squared (RMS) calculation. Once calculated, the RMS value is shown on the instrument's display. The second advantage is that the technique is easy to understand. Minimal analysis and diagnostic knowledge is required for effective use. For the most part, all that is required to use this approach is the ability to read a VOM. The third advantage is its cost. The VOM is a standard piece of test equipment that is very inexpensive to acquire and use. Fourth, the technique is minimally code invasive. Only a few changes in the code are required for use.

Importantly, three disadvantages are associated with using the VOM technique: potential accuracy problems, aggregate performance, and a requirement for an external device to determine utilization.

The accuracy disadvantage exists because the VOM technique uses averaging. The VOM technique is accurate for measuring applications that are non-fluctuating and use the processor at a relatively constant rate. Bursty applications can be problematic. A bursty application is one whose processor utilization varies greatly from one time interval to the next. The VOM technique averages bursty applications, which can lead to gross inaccuracies.

The following example of this inaccuracy shows the dilemma. If the VOM is averaging a bursty application with a processor utilization value over five evenly dispersed time intervals of 5%, 76%, 34%, 17%, and 65%, the resulting performance measurement will be an average of about 39%. The error due to averaging would be significant if the actual utilization is, for example, 27%. Bursty applications may give results that are too far off to be useful for computing processor utilization. On the other hand, a non-bursty application over the same time interval with readings of 25%, 29%, 28.5%, 26.5%, and 27% yields an average that is quite close to the actual. Therefore, if greater accuracy is needed, other techniques (described later) may be optimal.
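The averaging effect in this example is easy to check numerically. The sketch below uses invented helper names and simply computes the mean and the spread (maximum minus minimum) of a set of utilization readings; a large spread is a rough sign that a single averaged figure hides bursts.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n utilization readings, in percent. */
double mean_utilization(const double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Difference between the largest and smallest reading, in percent. */
double utilization_spread(const double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    double lo = samples[0];
    double hi = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < lo)
            lo = samples[i];
        if (samples[i] > hi)
            hi = samples[i];
    }
    return hi - lo;
}
```

For the bursty readings of 5, 76, 34, 17, and 65 the mean is 39.4 with a spread of 71 points, while the non-bursty readings of 25, 29, 28.5, 26.5, and 27 give a mean of 27.2 with a spread of only 4 points.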

The second disadvantage, aggregate performance, arises because the VOM technique is only good for measuring overall processor utilization. If the measurement of individual task utilization is required, this technique is unusable.

The third disadvantage is that the VOM technique requires additional hardware to use: namely, the Volt-Ohm Meter. Although a disadvantage, this additional equipment reliance is minor since the VOM is so inexpensive.

In general, the VOM technique is useable if the application is non-bursty with a limited number of resources available and if an approximate result is satisfactory.

Oscilloscope/Logic Analyzer Technique

The oscilloscope/logic analyzer (scope) technique operates by graphically keeping track of the duty cycle in order to determine aggregate processor utilization using a logic analyzer or oscilloscope. Both logic analyzers and oscilloscopes are able to graphically display waveforms, capture a specific waveform, trigger on an event, or compare different waveforms over time. Modern oscilloscopes have much of the same functionality as logic analyzers. Therefore, for all practical purposes, logic analyzers and oscilloscopes are interchangeable for this technique.

The oscilloscope or scope technique is like the VOM technique in many ways. The scope technique is general purpose, determines aggregate processor utilization, may be used during the same phases of development, relies on a null task based RTOS, and is minimally code invasive. Although similar to the VOM technique, the scope technique has important differences that make it superior.

However, there are several disadvantages to the scope technique: namely, the amount of analytical knowledge and the expense. The scope technique requires a detailed knowledge of analysis theory, and diagnostic skills. Specifically one must understand how to operate an oscilloscope and interpret the results. This technique is more expensive than other techniques because of the expense of purchasing an oscilloscope. As a result, many more people have access to a VOM than a scope.

Several advantages make the scope technique superior to the VOM technique. The scope technique provides greater accuracy than the VOM technique due to the graphical nature of the oscilloscope. The scope technique is believed to be approximately 98-99% accurate. The scope technique is also more flexible in that it produces accurate results for both bursty and non-bursty applications. Unlike the VOM technique, the scope technique is also scaleable through enhancements.

Oscilloscope Enhancements

One enhancement that may be made would allow the scope technique to support individual task utilization. This enhancement can use a null task or a non-null task based RTOS. Supporting this enhancement requires a multi-channel scope, a sufficient number of output pins, and more extensive code changes. A multi-channel scope displays numerous output pins at once. A typical multi-channel scope may display as many as sixteen channels simultaneously. Besides a multi-channel scope, the processor must have the desired number of output pins available. Finally, this enhancement requires more extensive code changes than the basic scope technique.

Assuming that the processor has enough output pins and the scope has enough channels, the same approach could be scaled up, for example by adding code for toggling six different output pins with a separate pin per task. This approach is not suggested with the VOM technique because it would be too cumbersome. However, the setup may be implemented with a six-channel scope. In a system with six tasks, the system may be reasonably instrumented to handle all six tasks with the enhanced scope technique. Although this technique scales well, it becomes cumbersome for more than ten tasks because of the necessary management of the code changes.
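One way to picture the pin-per-task enhancement is a scheduler hook that lowers the outgoing task's pin and raises the incoming task's pin. The sketch below is hypothetical throughout: `fake_port` stands in for a 16-bit output port, task identifiers are assumed to be small integers that map directly to bit positions, and, unlike the idle pin described earlier, a high level here means the task is running.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TRACED_TASKS 16u

/* Stand-in for a 16-bit output port, one bit per scope channel
 * (assumption: real hardware would use a memory-mapped register). */
static volatile uint16_t fake_port;

/* Scheduler hook: clear the outgoing task's pin, set the incoming one. */
void trace_context_switch(unsigned prev_task, unsigned next_task)
{
    assert(prev_task < MAX_TRACED_TASKS);
    assert(next_task < MAX_TRACED_TASKS);

    uint16_t value = fake_port;
    value &= (uint16_t)~(1u << prev_task);
    value |= (uint16_t)(1u << next_task);
    fake_port = value;
}

/* Helper for inspection: current state of all traced pins. */
uint16_t trace_port_value(void)
{
    return fake_port;
}
```

Because exactly one pin is high at a time, each scope channel shows the duty cycle of a single task, and the per-task utilization can be read from that channel alone.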

The scope technique allows the capture of processor utilization with greater accuracy than with the VOM technique. Additionally, this technique may be scaled to calculate the individual task utilization by simply changing code for different output pins and connecting a multi-channel scope to monitor the results. Keep in mind that the maximum number of tasks monitored is limited by the input channels of the scope and by the number of changes that can be effectively managed at once. Although the scope technique may be modified to provide individual task utilization, the counter approach is superior to either of the duty cycle techniques in terms of flexibility, scalability, and cost.

Counter Approach

The counter approach is more flexible and general purpose than any other method discussed to this point. The counter approach provides a means of displaying processor utilization for each task in an application. Like the other empirical approach, the counter approach may be used during the implementation, integration, and testing stages of development. The counter approach is as accurate as the scope technique and does not require any expensive test instruments to determine the utilization. The only requirements are a preemptive RTOS, a real-time counter, and additional software to capture and display the counts.

The counter approach operates by keeping separate data structures or "bins" for each task. The bin holds counter information and any other required statistical information. When the scheduler performs a context switch, the bin for that specific task is updated. Updating consists of enabling the real-time clock and starting the counter for that bin. The counter is incremented while the task is running. When the scheduler is ready to run another task, it disables the clock. Moreover, once the context switch occurs, the time differential, or delta, is recorded in the bin associated with that task before a new task is scheduled. Deltas for each bin are summed to provide the complete time for the application. The collected bin information is displayed by a user written profiling utility.

The counter approach requires more code changes than the previously mentioned duty cycle techniques because each task in the system may potentially be modified. The counter approach may not require a null task based RTOS to operate if it is monitoring each task's utilization. If aggregate utilization is the goal, then a null task based RTOS is required.

Similar to the duty cycle approach, the counter approach also has two techniques that may be used to determine processor utilization. Although separate, these techniques are variations of each other. The first technique is to have two bins configured to capture aggregate processor utilization. The second technique is to have a bin for every task in the application.

First Technique: Single null task

The first counter approach technique is similar to the duty cycle techniques and requires a change to the RTOS's task scheduler in order to operate. Code in the task scheduler is modified so that, when the null task is scheduled, a counter is stopped, the delta interval is calculated, the delta value is summed, the results are stored in the bin, the counter is reset, and finally the counter is started again. When a non-idle task is scheduled, the null task is preempted. The task scheduler stops the counter, calculates the delta, and sums the delta with the existing delta in an "idle" bin. The task scheduler then resets and restarts the counter. The counter runs until the scheduler detects a context switch back to the null task. At that time, the scheduler stops the counter, sums the delta, and stores it in the "non-idle" bin. Once stored, the process is ready to begin again. Listing 3 presents the required changes to support this technique.

Listing 3.
 Pseudo code for implementing single task counter technique.
   
    task_scheduler
    {
        . . .
        if power up sequence true
        {
            clear bins
            reset timer counter
        }
        Determine if scheduled task is idle or non-idle
        Case task idle:
        {
            stop timer counter
            calculate delta from timer counter
            sum counter delta with contents of non-idle task bin
            store results in non-idle task bin
            reset timer counter
            start timer counter
        }
        Case task non-idle:
        {
            stop timer counter
            calculate delta from timer counter
            sum counter delta with contents of idle task bin
            store results in idle task bin
            reset timer counter
            start timer counter
        }
        . . .
    }
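A compact C sketch of the two-bin idea follows. It differs from Listing 3 in two labeled ways: instead of stopping, resetting, and restarting a counter, it reads a free-running counter and subtracts timestamps, and it records which kind of task was running rather than inferring it from the task being scheduled. The timer is simulated, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running hardware counter (assumption for the sketch). */
static uint32_t fake_timer_ticks;

static uint32_t read_timer(void) { return fake_timer_ticks; }
void fake_timer_advance(uint32_t ticks) { fake_timer_ticks += ticks; }

typedef struct {
    uint32_t idle_ticks;    /* the "idle" bin */
    uint32_t busy_ticks;    /* the "non-idle" bin */
    uint32_t last_stamp;    /* counter value at the previous switch */
    int      running_idle;  /* nonzero while the null task is running */
} two_bin_profile;

/* Power-up step: clear both bins and take the first timestamp. */
void profile_init(two_bin_profile *p, int starts_idle)
{
    p->idle_ticks = 0;
    p->busy_ticks = 0;
    p->last_stamp = read_timer();
    p->running_idle = starts_idle;
}

/* Scheduler hook: charge the elapsed delta to the outgoing side. */
void profile_on_switch(two_bin_profile *p, int next_is_idle)
{
    uint32_t now = read_timer();
    uint32_t delta = now - p->last_stamp;  /* unsigned math handles wrap */

    if (p->running_idle)
        p->idle_ticks += delta;
    else
        p->busy_ticks += delta;

    p->last_stamp = now;
    p->running_idle = next_is_idle;
}

/* Fraction of recorded time spent in non-idle tasks. */
double profile_utilization(const two_bin_profile *p)
{
    uint32_t total = p->idle_ticks + p->busy_ticks;
    return total ? (double)p->busy_ticks / (double)total : 0.0;
}
```

If the null task runs for 300 ticks and a working task then runs for 700 ticks, the bins hold 300 and 700 and the reported utilization is 0.7.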

The scheduler is responsible for recording the sums of the each bin's delta. The utilization number from this technique is derived by:

$$U(n) = \frac{\sum_i b_i}{\sum_i (a_i + b_i)}$$

where U(n) is the utilization for n tasks, a_i are the deltas recorded for the idle task, b_i are the deltas recorded for all the non-idle tasks, and \sum_i (a_i + b_i) is the summation of all the deltas for both the idle and non-idle tasks. Equivalently, utilization is one minus the idle fraction \sum_i a_i / \sum_i (a_i + b_i). This ratio gives the aggregate processor utilization for the single task technique.

Although similar to the duty cycle techniques, this technique is superior to both for several reasons. First, the single null task technique does not require changes to the task code because all changes occur within the task scheduler. Second, this technique does not require a great deal of analysis expertise to use because the code profiler utility translates the information. Third, no additional equipment is required to gather the utilization information. Finally, this technique is more scaleable than the previously mentioned techniques.

Second Technique: Multiple tasks

The second technique for the counter approach is a variation of the first. Specifically, the multiple tasks technique is a scaled technique to provide better granularity at the individual task level. This technique contrasts the previous one by using multiple bins to store processor utilization information for each task. The multiple task technique has the ability to determine both aggregate as well as individual task utilization information.

The multiple tasks technique operates in almost the identical manner to the first technique. The primary difference is in the increased number of bins to be managed. As a result, the logic required to manage more bins is more complex. Instead of juggling solely between an idle and non-idle bin, the logic in the task scheduler has to keep track of one bin for every task. An advantage of this added complexity is that no additional code changes are needed within any of the application tasks. However, a side effect of isolating the changes to the task scheduler is that the changes are transparent to the application. Listing 4 shows the pseudo code for managing multiple tasks.

Listing 4.
 Pseudo code implementation of generalized counter
 technique that supports multiple tasks.
   
    task_scheduler
    {
        . . .
        if power up sequence true
        {
            clear bins
            reset timer counter
        }
        Determine identity of currently executing task
        stop timer counter
        calculate delta from timer counter
        sum counter delta with contents of currently executing task bin
        store results in currently executing task bin
        reset timer counter
        start timer counter
        . . .
    }
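The generalized technique can be sketched with one bin per task. As before, this is an illustration under assumptions rather than RTOS code: the timer is simulated and free-running, task identifiers are small integers, and `IDLE_TASK` is a hypothetical identifier used only so that an aggregate figure can be derived from the per-task bins.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TASKS 4u
#define IDLE_TASK 0u   /* hypothetical id of the idle task */

/* Simulated free-running hardware counter (assumption for the sketch). */
static uint32_t fake_timer_ticks;

static uint32_t read_timer(void) { return fake_timer_ticks; }
void fake_timer_advance(uint32_t ticks) { fake_timer_ticks += ticks; }

static uint32_t bins[NUM_TASKS];     /* accumulated ticks per task */
static uint32_t last_stamp;          /* counter value at the last switch */
static unsigned current_task;        /* task that has been running */

/* Power-up step: clear every bin and take the first timestamp. */
void bins_reset(void)
{
    for (unsigned i = 0; i < NUM_TASKS; i++)
        bins[i] = 0;
    last_stamp = read_timer();
    current_task = IDLE_TASK;
}

/* Scheduler hook: charge the elapsed delta to the outgoing task's bin. */
void bins_on_switch(unsigned next_task)
{
    assert(next_task < NUM_TASKS);
    uint32_t now = read_timer();
    bins[current_task] += now - last_stamp;
    last_stamp = now;
    current_task = next_task;
}

static uint32_t bins_total(void)
{
    uint32_t total = 0;
    for (unsigned i = 0; i < NUM_TASKS; i++)
        total += bins[i];
    return total;
}

/* Fraction of recorded time spent in one task. */
double task_share(unsigned task)
{
    assert(task < NUM_TASKS);
    uint32_t total = bins_total();
    return total ? (double)bins[task] / (double)total : 0.0;
}

/* Fraction of recorded time spent in all tasks other than IDLE_TASK. */
double aggregate_utilization(void)
{
    uint32_t total = bins_total();
    return total ? (double)(total - bins[IDLE_TASK]) / (double)total : 0.0;
}
```

A user-written profiling utility would read the bins and print `task_share` for each task. With 100 idle ticks followed by 250 ticks in task 1 and 150 ticks in task 2, task 1 holds half of the recorded time and the aggregate utilization is 0.8.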

This technique is more flexible than the single task technique. The multiple task technique does not require a null task because this technique keeps track of all tasks rather than just an idle and a non-idle task. This technique may be configured to profile aggregate utilization, individual utilization, or both concurrently. The multiple task technique is usable for either a small or a large number of tasks. However, managing many code changes for a large number of tasks may be difficult.

All of the empirical techniques discussed provide processor utilization for tasks and do not include Interrupt Service Routines (ISRs). However, ISR utilization could be easily added to the scope technique by inserting code into the ISR itself. The counter approach and its techniques may also be modified to deal with ISRs, though more extensive modifications would be required. Specifically, the modifications involve the addition of logic within the ISR to handle the ISR bin and its control.

Example

The following real world example is used to illustrate how one of the empirical techniques may be used to determine processor utilization.

As stated initially, the purpose of a performance investigation is to determine if a current product has enough processing power to support an enhancement without a processor upgrade. Specifically, we seek to determine the amount to which the processor is utilized during a single channel test. The results may then be analyzed and scaled to see if enough bandwidth is available for the current product's processor to support 24 channels simultaneously.

The existing product for this example is an Integrated Services Digital Network (ISDN) test instrument that supports two 64K baud channels plus an additional 16K baud channel, a configuration often referred to as Basic Rate Interface or BRI. The application supports 41 tasks. These tasks are synchronized and have different priorities based on the external events they service. Moreover, the tasks are objects that are provided by the Real-Time eXecutive in C (RTXC) RTOS. RTXC is a preemptive multitasking RTOS that satisfies the requirements of the selected empirical technique. The application software runs on a proprietary hardware platform based on the Motorola 68360 processor.

In order to determine the worst-case processor utilization for BRI, the application must be stressed. This stressing is accomplished by using one of the most processor intensive ISDN tests, the Bit Error Rate Test (BERT). The BERT measures the quality of a channel and operates by testing a single 64K baud ISDN channel and sending a fixed sequence of 63 bits every 20 seconds. Bits are simultaneously received and compared against those sent. The results of the comparison yield the channel quality.

The oscilloscope technique was chosen for this evaluation over the other techniques for several reasons. This technique is accurate, general purpose, works for both bursty and non-bursty applications, and provides aggregate processor utilization. Furthermore, this technique is minimally invasive code-wise, and only requires modifications in two places in the code. Thus, the oscilloscope technique was chosen because it satisfied the selection criteria the best.

To use the oscilloscope technique, the application requires several code changes. Specifically, code is added in two sections. The first change is to the null task and the second is in the scheduler. Listing 5 shows the modified RTXC null task.

Listing 5.
 RTXC null task changes to support oscilloscope technique.
   
    /* C main() module */
    int main(void)
    {
        int i;
        . . .
        for (;;)  /* loop forever (null task) */
        {
            if (brkflag)
                ;  /* this could be "break;" which aborts RTXC */
            SET_P1;  /* macro that sets port pin 1 via xilinx */
        }
        . . .
        return(1);
    }

The added code is a macro, SET_P1, that sets the output port pin each time the null task loop runs. Whenever the application is idle, the output pin is set to a five volt high logic level. Listing 6 shows the relevant code within RTXC's scheduler. RTXC's scheduler routine is called postem(). The postem() routine handles ISR and task scheduling. The scheduler code resets the output pin to a zero volt low logic level when a context swap to a non-idle task occurs. These changes allow the monitoring of the output pin while it toggles on and off depending on the task. In summary, a high condition signifies that the application is idle, while a low signifies that the application is doing some work. After the code changes are made, the oscilloscope is connected to the output pin and set to capture one cycle of the test. Now the BERT is ready to run.

Listing 6.
 RTXC scheduler changes to support oscilloscope technique.
   
/* RTXC task scheduler */
static FRAME *postem(void) /* returns with interrupts disabled */
{
    . . .
    if (hipritsk->priority != NULL_TASK_PRIORITY) /* reset pin if non-idle task */
        RESET_P1; /* macro, resets port pin 1 */
    . . .
    return(hipritsk->sp); /* exit to hipritsk via tcb.sp */
}
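The listings use SET_P1 and RESET_P1 without showing their definitions. The following is a minimal sketch of how such macros might look and how the two hooks interact. Everything in it is an assumption made for illustration: the port is modeled by an ordinary variable (fake_port) rather than the real Xilinx-mapped register, and the bit mask, the NULL_TASK_PRIORITY value, and the helper function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the Xilinx-mapped output port. On real
   hardware this would be a volatile access to the port's address. */
static volatile uint8_t fake_port;

#define P1_MASK            ((uint8_t)0x02u)   /* assumed bit for port pin 1 */
#define NULL_TASK_PRIORITY 255                /* assumed value, not RTXC's */

#define SET_P1   (fake_port |= P1_MASK)              /* pin high: idle */
#define RESET_P1 (fake_port &= (uint8_t)~P1_MASK)    /* pin low: busy  */

/* Returns nonzero while the pin is at the high (idle) level. */
static int pin_is_high(void)
{
    return (fake_port & P1_MASK) != 0;
}

/* One pass through the null task loop: mark the processor idle. */
static void null_task_iteration(void)
{
    SET_P1;
}

/* Scheduler hook: drop the pin only when a non-idle task is selected. */
static void on_schedule(int priority)
{
    assert(priority >= 0);
    if (priority != NULL_TASK_PRIORITY)
        RESET_P1;
}
```

Calling null_task_iteration() leaves the pin high, on_schedule(NULL_TASK_PRIORITY) leaves it unchanged, and on_schedule() with any other priority drives it low, which is the toggle the oscilloscope observes.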

Figures 1 and 2 show two significant events of the BERT test. Figure 1 is a waveform capture of the point at which the application is busy exercising the channel. This figure also shows the area of interest over a one-second range. Figure 2 is a greater magnification of the right side of Figure 1 and shows the spikes of non-idle task time. Total non-idle task time is calculated by summing all low-level intervals of the waveform in Figure 1. Starting from the first falling edge of the large non-idle waveform, the sum comes to 670 milliseconds for all non-idle tasks. Over the one-second range, this translates to a processor utilization of 67%.
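The calculation above amounts to dividing the summed low-level time by the capture window. A minimal sketch follows; the function name and the use of microseconds are assumptions made for illustration, and the caller supplies the measured pulse widths.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the low-level (non-idle) pulse widths and expresses them as a
   percentage of the capture window. Both are in microseconds here,
   which is an arbitrary choice for this sketch. */
static double utilization_percent(const unsigned long *low_us,
                                  size_t count,
                                  unsigned long window_us)
{
    unsigned long busy_us = 0;
    size_t i;

    assert(window_us > 0);
    for (i = 0; i < count; i++)
        busy_us += low_us[i];

    return 100.0 * (double)busy_us / (double)window_us;
}
```

For example, any set of pulse widths totalling 670,000 microseconds inside a 1,000,000 microsecond window gives 67%, matching the figure read from the oscilloscope capture. The individual pulse widths are not given in the text, so only the total can be checked against it.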

Figure 1: The Application Exercising the Channel



Figure 2: Spikes of Non-idle Task Time



For comparison, the VOM reading on the same pin was 2.42 volts, which translated to 48.5%. The VOM approach does not provide good accuracy here because the test runs over 20 seconds and most of that time is spent idle. To produce an accurate result with the VOM technique, the 20-second period has to be normalized to one second. To read 3.35 volts, corresponding to the 67% found with the oscilloscope, one would have to put the VOM lead on the pin at the start of the non-idle task and remove it one second later. Such a quick manual change is clearly impractical for this setup. The oscilloscope technique, by contrast, allows us to focus on the area of interest, in this case the non-idle time, and analyze it to determine the normalized processor utilization for worst-case usage.

The empirical results show that the oscilloscope technique answered our questions regarding the system's performance. The results from the BERT test verified that a performance problem existed and that the existing hardware would not be able to scale up to handle the simultaneous testing of 24 64K baud channels. This finding agreed with the original analysis of the application based on deadlines and overall processor cycles available, which had also concluded that insufficient processor bandwidth was available to accomplish our goal. As a result, the empirical test verified the RMA analysis.
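A rough sense of the shortfall can be had from a deliberately naive estimate. The sketch below assumes that processor load scales linearly with the number of channels under test, which the article does not state; it is an illustration only, not the RMA analysis itself, and the function names are hypothetical.

```c
#include <assert.h>

/* Naive estimate: total load is the per-channel load times the number
   of channels. Linear scaling is an assumption of this sketch. */
static double scaled_utilization_percent(double per_channel_percent,
                                         unsigned channels)
{
    assert(per_channel_percent >= 0.0);
    return per_channel_percent * (double)channels;
}

/* A load above 100% cannot be scheduled on a single processor. */
static int fits_on_processor(double total_percent)
{
    return total_percent <= 100.0;
}
```

With the measured 67% for one channel, scaled_utilization_percent(67.0, 24) comes to 1608%, so fits_on_processor() returns 0. Even if the real per-channel cost were considerably lower than this linear guess, the result points the same way as the conclusion above.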

Summary

Several methods exist for monitoring processor utilization. The theoretical RMA approach is used during the early stages of development, while the practical approaches, such as the duty cycle or the counter approach, are used during the later stages of development to gather empirical information. Each has its own strengths and weaknesses, and each developer must decide which approach best suits their needs.

Most importantly, the earlier performance requirements are dealt with, the less the product costs to develop. Fixing problems late in the development cycle costs far more than applying RMA at the outset. When the RMA approach is not possible, the other approaches can also be effective and are definitely better than not addressing performance requirements at all.


Copyright © 1995-2007 Intelligraphics Inc. All Rights Reserved.