TSC Register
The Time Stamp Counter (TSC) is a 64-bit register present in most modern x86 processors that counts the number of cycles since reset. The RDTSC instruction returns the TSC value in the EDX:EAX register pair, with the high 32 bits in EDX and the low 32 bits in EAX.
Because many modern x86 CPUs execute instructions out of order, RDTSC is not necessarily executed at the point where it appears in the program. It may run earlier or later than intended, which yields a misleading cycle count. A related instruction, RDTSCP, is often available: it waits until all preceding instructions have executed before it reads the counter. It is not a fully serializing instruction, however, so instructions that follow it may begin executing before the read is performed.
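One common way to keep the counter reads from drifting around the measured code is to combine them with LFENCE. The following is a minimal sketch, assuming GCC or Clang on x86-64 and the compiler header <x86intrin.h>; the names tsc_begin and tsc_end are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang) */

/* Hypothetical helper: read the TSC at the start of a measured region. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();               /* wait until earlier instructions have executed */
    uint64_t t = __rdtsc();
    _mm_lfence();               /* keep later instructions from starting before the read */
    return t;
}

/* Hypothetical helper: read the TSC at the end of a measured region. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;           /* receives IA32_TSC_AUX; unused in this sketch */
    uint64_t t = __rdtscp(&aux);/* waits for earlier instructions by itself */
    _mm_lfence();               /* keep later instructions from starting before the read */
    return t;
}
```

Placing the code to be measured between a call to tsc_begin() and a call to tsc_end() and subtracting the two results gives the elapsed ticks, including the overhead of the fences themselves.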
The time-stamp counter increments at a constant rate, which may be set by the maximum core-clock to bus-clock ratio of the processor or by the maximum resolved frequency at which the processor is booted. (On older CPUs the rate was not constant; it varied with the core's actual frequency.) The maximum resolved frequency may differ from the maximum qualified frequency of the processor.
Since AMD family 10h (Barcelona/Phenom) and Intel Nehalem CPUs, the TSC increases at a constant rate and is synchronized across all cores of the CPU. However, software can still change the TSC value of an individual core with the WRMSR instruction.[1]
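Whether a given processor advertises a constant-rate TSC can be queried at run time. The sketch below assumes GCC or Clang and their <cpuid.h> helper __get_cpuid; it tests the invariant TSC flag, which is bit 8 of EDX in CPUID leaf 0x80000007. The function name has_invariant_tsc is hypothetical.

```c
#include <cpuid.h>   /* __get_cpuid (GCC/Clang) */

/* Hypothetical helper: returns 1 if CPUID reports an invariant TSC, 0 otherwise. */
int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0x80000007 holds advanced power management information.
       __get_cpuid returns 0 if the CPU does not support this leaf. */
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return 0;

    return (edx >> 8) & 1;   /* EDX bit 8: invariant TSC */
}
```

A result of 0 does not say how the counter behaves; it only means the flag is not reported, so the older frequency-dependent behaviour cannot be ruled out.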
Because the compiler may also optimize and reorder code, the inline assembly statement must be declared "volatile" so that the compiler does not remove or merge the reads.
A function that reads the TSC, using GCC-style inline assembly:
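typedef unsigned long long Tsc_t;
/* static: a plain C99 "inline" emits no out-of-line definition and may fail to link */
static inline Tsc_t readTsc(void)
{
  unsigned hi, lo;
  /* RDTSC puts the low 32 bits in EAX ("=a") and the high 32 bits in EDX ("=d") */
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}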
Measuring ticks between two TSC reads:
#include <stdio.h>
typedef unsigned long long Tsc_t;
static inline Tsc_t readTsc(void)
{
  unsigned hi, lo;
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
int main(void) {
  Tsc_t t1;
  Tsc_t t2;
  t1 = readTsc();
  t2 = readTsc();
  printf("%llu\n", (t2-t1));
  return 0;
}
Measuring ticks in one second:
#include <stdio.h>
#include <unistd.h>
typedef unsigned long long Tsc_t;
static inline Tsc_t readTsc(void)
{
  unsigned hi, lo;
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
int main(void) {
  Tsc_t t1;
  Tsc_t t2;
  t1 = readTsc();
  sleep(1);
  t2 = readTsc();
  printf("%llu\n", (t2-t1));
  return 0;
}
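Because sleep(1) may return somewhat later than one second, the tick rate can be estimated more closely by also timing the interval with a monotonic clock and dividing. This is only a sketch, assuming a POSIX system that provides clock_gettime and nanosleep, and GCC or Clang for <x86intrin.h>; monotonic_ns and estimate_tsc_hz are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#endif
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>           /* __rdtsc (GCC/Clang) */

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: estimate TSC ticks per second by sleeping for
   roughly interval_ms and dividing ticks by the time actually elapsed. */
double estimate_tsc_hz(unsigned interval_ms)
{
    struct timespec req;
    req.tv_sec  = interval_ms / 1000;
    req.tv_nsec = (long)(interval_ms % 1000) * 1000000L;

    uint64_t ns0 = monotonic_ns();
    uint64_t t0  = __rdtsc();
    nanosleep(&req, NULL);       /* may sleep longer than requested */
    uint64_t t1  = __rdtsc();
    uint64_t ns1 = monotonic_ns();

    return (double)(t1 - t0) * 1e9 / (double)(ns1 - ns0);
}
```

Calling estimate_tsc_hz(100) from main and printing the result with "%.0f" gives an approximate tick rate in Hz; longer intervals reduce the relative error introduced by the clock calls.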
Notes
- [1] Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3 (3A, 3B & 3C): System Programming Guide

