4. High-resolution timing
4.1 Delays
First of all, I should say that you cannot guarantee user-mode
processes to have exact control of timing because of the multi-tasking
nature of Linux. Your process might be scheduled out at any time for
anything from about 10 milliseconds to a few seconds (on a system with
very high load). However, for most applications using I/O ports, this
does not really matter. To minimise the effect, you may want to run
your process at a higher priority (see the nice(2) manual page) or use
real-time scheduling (see below).
If you want more precise timing than normal user-mode processes give
you, there are some provisions for user-mode `real time' support.
Linux 2.x kernels have soft real time support; see the manual page for
sched_setscheduler(2) for details. There is a special kernel that
supports hard real time; see http://luz.cs.nmt.edu/~rtlinux/ for more
information on this.
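As a rough illustration of the soft real time support mentioned above, here is a minimal sketch that asks for the SCHED_FIFO policy through sched_setscheduler(). The function name enable_soft_realtime() is made up for this example, and the call normally fails with EPERM unless the process has root privileges, so treat it as a starting point rather than a finished recipe.

```c
#include <sched.h>
#include <string.h>

/* Hypothetical helper: try to switch the calling process to the
   SCHED_FIFO soft real time policy at the given priority.
   Returns 0 on success, or -1 with errno set (typically EPERM
   when the process is not privileged). */
int enable_soft_realtime(int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &param);  /* 0 = this process */
}
```

A cautious caller might pass sched_get_priority_min(SCHED_FIFO) as the priority and carry on with normal scheduling if the call returns -1.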
Sleeping: sleep() and usleep()
Now, let me start with the easier timing calls. For delays of multiple
seconds, your best bet is probably to use sleep(). For delays of at
least tens of milliseconds (about 10 ms seems to be the minimum delay),
usleep() should work. These functions give the CPU to other processes
(``sleep''), so CPU time isn't wasted. See the manual pages sleep(3)
and usleep(3) for details.
For delays of under about 50 milliseconds (depending on the speed of
your processor and machine, and the system load), giving up the CPU
takes too much time, because the Linux scheduler (for the x86
architecture) usually takes at least about 10-30 milliseconds before
it returns control to your process. Because of this, for small delays
usleep(3) usually delays somewhat longer than the amount you specify in
its parameter, and at least about 10 ms.
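If you want to see how much longer usleep() really takes on your own machine, something like the following sketch may help. It simply brackets the call with gettimeofday(); the function name measured_usleep() is invented for this example, and the result will vary with kernel version and system load.

```c
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: sleep for `usec` microseconds with usleep() and
   return how many microseconds actually passed according to
   gettimeofday(). */
long measured_usleep(unsigned long usec)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    usleep(usec);
    gettimeofday(&end, NULL);

    return (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_usec - start.tv_usec);
}
```

Calling measured_usleep(1000) a few times and printing the results should show the overshoot described above, if your system behaves that way.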
nanosleep()
In the 2.0.x series of Linux kernels, there is a new system call,
nanosleep() (see the nanosleep(2) manual page), that allows you to
sleep or delay for short times (a few microseconds or more).
For delays <= 2 ms, if (and only if) your process is set to soft real
time scheduling (using sched_setscheduler()), nanosleep() uses a busy
loop; otherwise it sleeps, just like usleep().
The busy loop uses udelay() (an internal kernel function used by many
kernel drivers), and the length of the loop is calculated using the
BogoMips value (the speed of this kind of busy loop is one of the
things that BogoMips measures accurately). See
/usr/include/asm/delay.h for details on how it works.
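A call to nanosleep() might look roughly like the sketch below. The wrapper name delay_ns() is made up for this example; it restarts the call if a signal interrupts it, which is one common way to use the "remaining time" argument, though not the only one.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for `nsec` nanoseconds
   (0 <= nsec < 1000000000), restarting if a signal interrupts the
   call. Returns 0 on success, -1 on any other error. */
int delay_ns(long nsec)
{
    struct timespec req, rem;

    req.tv_sec = 0;
    req.tv_nsec = nsec;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;      /* e.g. EINVAL for an out-of-range value */
        req = rem;          /* continue with the time still remaining */
    }
    return 0;
}
```

For example, delay_ns(500000) asks for half a millisecond; whether that is served by a busy loop or a normal sleep depends on the scheduling policy, as described above.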
Delaying with port I/O
Another way of delaying small numbers of microseconds is port
I/O. Inputting or outputting any byte from/to port 0x80 (see above for
how to do it) should wait for almost exactly 1 microsecond independent
of your processor type and speed. You can do this multiple times to
wait a few microseconds. The port output should have no harmful side
effects on any standard machine (and some kernel drivers use it). This
is how {in|out}[bw]_p() normally do the delay (see asm/io.h).
Actually, a port I/O instruction on most ports in the 0-0x3ff range
takes almost exactly 1 microsecond, so if you're, for example, using
the parallel port directly, just do additional inb()s from that port
to delay.
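The port 0x80 trick could be wrapped up along the following lines. This is only a sketch: the name port_delay_us() is invented here, it assumes glibc's <sys/io.h> on an x86 machine, and ioperm() normally succeeds only for root. The "about 1 microsecond per access" figure is the rule of thumb from the text above, not something the code can guarantee.

```c
#include <sys/io.h>   /* ioperm() and outb(); glibc on x86 only */

/* Hypothetical helper: delay roughly `usec` microseconds by writing
   to port 0x80 that many times. Returns 0 on success, or -1 if access
   to the port could not be obtained. */
int port_delay_us(unsigned int usec)
{
    if (ioperm(0x80, 1, 1) == -1)
        return -1;

    while (usec-- > 0)
        outb(0, 0x80);      /* value first, then port number */

    ioperm(0x80, 1, 0);     /* give up access to the port again */
    return 0;
}
```

In a real program you would probably call ioperm() once at startup instead of around every delay, since the system call itself takes far longer than a microsecond.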
Delaying with assembler instructions
If you know the processor type and clock speed of the machine the program will be running on, you can hard-code shorter delays by running certain assembler instructions (but remember, your process might be scheduled out at any time, so the delays might well be longer every now and then). For the table below, the internal processor speed determines the number of clock cycles taken; e.g., for a 50 MHz processor (e.g. 486DX-50 or 486DX2-50), one clock cycle takes 1/50000000 seconds (= 20 nanoseconds).
Instruction     i386 clock cycles   i486 clock cycles
xchg %bx,%bx            3                   3
nop                     3                   1
or %ax,%ax              2                   1
mov %ax,%ax             2                   1
add $0,%ax              2                   1
Clock cycles for Pentiums should be the same as for i486, except that
on Pentium Pro/II, add $0,%ax may take only half a clock cycle. It can
sometimes be paired with another instruction (because of out-of-order
execution, this need not even be the very next instruction in the
instruction stream).
The instructions nop and xchg in the table should have no side
effects. The rest may modify the flags register, but this shouldn't
matter since gcc should detect it. xchg %bx,%bx is a safe choice for a
delay instruction.
To use these, call asm("instruction") in your program. The syntax of
the instructions is as in the table above; if you want multiple
instructions in a single asm() statement, separate them with
semicolons. For example, asm("nop ; nop ; nop ; nop") executes four
nop instructions, delaying for four clock cycles on i486 or Pentium
processors (or 12 clock cycles on an i386).
asm() is translated into inline assembler code by gcc, so there is no
function call overhead.
Shorter delays than one clock cycle are impossible in the Intel x86 architecture.
rdtsc for Pentiums
For Pentiums, you can get the number of clock cycles elapsed since the last reboot with the following C code (which executes the CPU instruction named RDTSC):
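static __inline__ unsigned long long int rdtsc(void)
{
    unsigned long long int x;
    /* .byte 0x0f, 0x31 is the RDTSC opcode; "=A" takes the 64-bit
       result from edx:eax, which is correct on 32-bit x86 only */
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}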
You can poll this value in a busy loop to delay for as many clock cycles as you want.
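Polling the counter in a busy loop could be sketched as follows. The names read_tsc() and delay_cycles() are made up for this example, and the counter is read as two 32-bit halves, which is a variation on the fragment above so that the sketch also compiles for x86-64. Note that on many newer processors the counter runs at a fixed rate that need not equal the current core clock, so treat "cycles" here as counter ticks.

```c
/* Hypothetical helper: read the time-stamp counter. The two halves are
   read separately from eax and edx, so this also works on x86-64. */
static inline unsigned long long read_tsc(void)
{
    unsigned int lo, hi;

    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

/* Hypothetical helper: busy-wait until at least `cycles` counter ticks
   have passed. Returns the number of ticks that actually elapsed, which
   can be much larger if the process was scheduled out meanwhile. */
unsigned long long delay_cycles(unsigned long long cycles)
{
    unsigned long long start = read_tsc();
    unsigned long long elapsed;

    do {
        elapsed = read_tsc() - start;
    } while (elapsed < cycles);

    return elapsed;
}
```

For instance, on a 100 MHz Pentium, delay_cycles(100) would be a delay of about one microsecond, assuming nothing else interrupts the loop.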
4.2 Measuring time
For times accurate to one second, it is probably easiest to use
time(). For more accurate times, gettimeofday() is accurate to about a
microsecond (but see above about scheduling). For Pentiums, the rdtsc
code fragment above is accurate to one clock cycle.
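To turn two gettimeofday() readings into an elapsed time, you have to combine the seconds and microseconds fields. A small helper for that might look like this; the name elapsed_us() is invented for the example.

```c
#include <sys/time.h>

/* Hypothetical helper: return end - start in microseconds, given two
   readings filled in by gettimeofday(). */
long long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (end->tv_usec - start->tv_usec);
}
```

You would call gettimeofday(&start, NULL) before the code being timed, gettimeofday(&end, NULL) after it, and pass both to elapsed_us(). The tv_usec difference may be negative on its own; adding it to the scaled seconds difference takes care of that.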
If you want your process to get a signal after some amount of time,
use setitimer() or alarm(). See the manual pages of the functions for
details.