OpenSolaris


Instruction Based Sampling (IBS)

Overview

Instruction Based Sampling is a performance observability feature available as of AMD family 0x10 processors (e.g. Barcelona). While many modern processors offer performance counters as a mechanism for observing counts of certain performance-relevant events, this data often lacks the specificity needed to gain an accurate understanding of performance (or the lack thereof). For example, many performance counter facilities let one count memory references, but the count doesn't show which memory is being accessed.

In many ways, AMD's Instruction Based Sampling facility bridges this gap. It works by periodically sampling instructions (or instruction ops) from an instruction stream (program execution). Detailed information about the sampled instruction/op is then collected as it makes its way through the pipeline. The information is then made available through the IBS facility.

IBS provides the performance analyst with a mechanism for effectively observing:

  • Virtual / physical memory access patterns and utilization
  • Cache / TLB utilization
  • Instruction fetch / execution latencies
  • Branch prediction effectiveness
  • ...and more.

IBS is described in Appendix G of the Software Optimization Guide for AMD Family 10h Processors. This article provides an example of how IBS can be used (using matrix multiplication as an example).

IBS Dynamic Tracing (DTrace) Provider

A prototype DTrace provider has been developed that allows one to interface with the IBS feature through DTrace. The provider exports a set of ibs DTrace probes that (when enabled) fire after IBS samples an instruction / op.

The information IBS provides about the sampled op/instruction is available both in the body of the DTrace probe, as well as the probe's predicate. DTrace allows one to easily build predicates to filter for the performance events of interest, and its data aggregation features provide a powerful mechanism for managing, analyzing, and visualizing the stream of performance data the IBS feature provides.

Status

A fairly full-featured prototype is available.

Using the provider

The purpose of the IBS DTrace provider is to provide convenient access to the IBS functionality. Currently the provider offers two kinds of probes:
  • ibs-fetch-x: For programming the fetch control. The x in the probe name indicates the time interval in terms of number of instruction fetches after which the IBS should pick up an instruction for recording the desired data. Right now it takes any value between 500 and 65535.
  • ibs-exec-x: For programming the execution control. The x in the probe name indicates the time interval in terms of number of executed micro-ops after which the IBS should pick up a micro-op for recording the desired data. Right now it takes any value between 500 and 65535.
Note: The x in the probe name is actually written to bits [4:19] of the 20-bit count of instruction fetches/micro-ops executed (with bits [0:3] being 0). The actual number of instruction fetches/micro-ops executed before IBS selects an instruction/micro-op for recording data is therefore 16 times x. For instance, x = 1000 corresponds to 16000 instruction fetches/micro-ops executed.

When the probe fires, the recorded data is returned in a data structure as args[0]. For the fetch probe the data structure is as follows:
typedef struct ibs_fetch_data {
	int cpu_id;

	int icache_miss;
	int l1tlb_miss;
	int l2tlb_miss;
	int l1tlb_psize;
	int latency;
	int phyadr_valid;

	uintptr_t linadr;
	uintptr_t phyadr;
} ibs_fetch_data_t;
For the exec probe the data structure is as follows:
typedef struct ibs_exec_data {
	int cpu_id;

	/* Op Data register */
	int comp_to_retire_count;
	int tag_to_retire_count;
	int resync_uop;
	int mispred_return_uop;
	int return_uop;
	int taken_branch_uop;
	int mispred_branch_uop;
	int retired_branch;

	/* Op Data2 register */
	int cache_hit_state;
	int dest_processor;
	int source;

	/* Op Data3 register */
	int load_op;
	int store_op;
	int l1tlb_miss;
	int l1tlb_2m;
	int l1tlb_1g;
	int l2tlb_miss;
	int l2tlb_2m;
	int l2tlb_1g;
	int dcache_miss;
	int dcache_miss_latency;
	int misaligned_access;
	int load_bank_conflict;
	int store_bank_conflict;
	int data_forwarded;
	int data_forward_cancelled;
	int mem_uncached;
	int mem_writecombining;
	int locked_operation;
	int mab_hit;
	int linadr_valid;
	int phyadr_valid;

	uintptr_t logadr;
	uintptr_t linadr;
	uintptr_t phyadr;
} ibs_exec_data_t;
The names of the fields are representative of the information they store. For more details, refer to the Software Optimization Guide for AMD Family 10h Processors (above).
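The note on the probe name earlier in this section (x goes into bits [4:19] of a 20-bit count) can be sketched in C as follows. This is only an illustration of the arithmetic, not code from the provider: the function names ibs_parse_probe_x and ibs_effective_period and the IBS_X_MIN/IBS_X_MAX macros are hypothetical, and the accepted range of 500 to 65535 is taken from the probe descriptions above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Range of x that the text above says the prototype currently accepts. */
#define IBS_X_MIN 500L
#define IBS_X_MAX 65535L

/*
 * Hypothetical helper: extract x from "ibs-fetch-<x>" or "ibs-exec-<x>".
 * Returns x, or -1 if the name does not have that form. The range of x
 * is not checked here; ibs_effective_period() does that.
 */
long
ibs_parse_probe_x(const char *name)
{
    static const char *const prefixes[] = { "ibs-fetch-", "ibs-exec-" };
    size_t i;

    if (name == NULL)
        return (-1);

    for (i = 0; i < sizeof (prefixes) / sizeof (prefixes[0]); i++) {
        size_t len = strlen(prefixes[i]);
        const char *digits;
        char *end;
        long x;

        if (strncmp(name, prefixes[i], len) != 0)
            continue;

        digits = name + len;
        /* Require a plain decimal number: no sign, no whitespace. */
        if (!isdigit((unsigned char)digits[0]))
            return (-1);

        x = strtol(digits, &end, 10);
        /* Reject trailing characters such as "ibs-exec-10x". */
        return (*end == '\0' ? x : -1);
    }
    return (-1);
}

/*
 * Hypothetical helper: number of instruction fetches / micro-ops between
 * samples. x occupies bits [4:19] and bits [0:3] are zero, so the
 * effective period is x shifted left by 4 (x * 16).
 * Returns -1 if x is outside the accepted range.
 */
long
ibs_effective_period(long x)
{
    if (x < IBS_X_MIN || x > IBS_X_MAX)
        return (-1);
    return (x << 4);
}
```

With these helpers, ibs_effective_period(ibs_parse_probe_x("ibs-exec-1000")) yields 16000, matching the example in the note.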

Sample D scripts

The following simple script sums up the dcache misses caused by different executables. Note that this number is not a precise total, since IBS records only the sampled micro-ops rather than accounting for every instruction or micro-op. Still, it gives a reasonable indication of how each executable is doing in terms of cache misses.
#!/usr/sbin/dtrace -s

#pragma D option quiet

ibs-exec-2000
{
        @exec[execname] = sum(args[0]->dcache_miss);
}

END
{
        printf("\nDcache misses per exec:\n");
        printa(@exec);
}
The following script adds more functionality and observes only an executable called "memtest":
#!/usr/sbin/dtrace -s

#pragma D option quiet

ibs:::ibs-fetch-500
/execname == "memtest"/
{
        @fetch[execname] = sum(args[0]->l2tlb_miss);
}

ibs-exec-1000
/execname == "memtest"/
{
        @exec[execname, args[0]->cpu_id] = sum(args[0]->dcache_miss);
}

ibs-exec-1000
/execname == "memtest" && args[0]->dcache_miss == 1 && args[0]->linadr_valid == 1/
{
        @linadr[args[0]->linadr] = count();
}

END
{
        printf("\nNumber of L2 TLB misses:\n");
        printa(@fetch);
        printf("\nDcache misses per core:\n");
        printa(@exec);
        trunc(@linadr, 10);
        printf("\nTop 10 VA that caused dcache misses:\n");
        printa("%16x   %@10d\n", @linadr);
}

Limitations and Known Issues

  • The IBS module depends on the dtrace and pcplusmp modules. Make sure they are loaded on the system.
  • Only one period can be programmed at a time for each of the fetch and execution probes. For instance, ibs-fetch-2000 and ibs-fetch-5000 cannot be used together. The periods for the fetch and execution probes can differ from each other, though, as shown in the sample scripts.
  • Ideally the execution probe should not be programmed with a period of less than 1000, since shorter periods cause a tremendous number of interrupts (remember that the execution unit counts micro-ops, whose count goes up much faster than that of instructions). In our experiments with the second sample script, ibs-exec-1000 slowed down "memtest" (the executable being observed) by around 40%.
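
To get a feel for the interrupt load mentioned in the last item, the following C sketch estimates how many samples per second one core would take for a given x. It assumes one interrupt per sample and uses the relation from the note in "Using the provider" (effective period = x * 16). The function name ibs_samples_per_second is hypothetical, and the micro-op rate passed in is an assumed figure supplied by the caller, not a measurement.

```c
#include <stdint.h>

/*
 * Hypothetical estimate of IBS samples per second on one core.
 *
 * ops_per_second: assumed rate of executed micro-ops (or instruction
 *                 fetches for the fetch probe); an input guess, not
 *                 something IBS reports.
 * x:              the number used in the probe name, e.g. 1000 for
 *                 ibs-exec-1000.
 *
 * The effective period is x * 16 because x is placed in bits [4:19]
 * of the 20-bit count.
 */
uint64_t
ibs_samples_per_second(uint64_t ops_per_second, uint32_t x)
{
    uint64_t period = (uint64_t)x << 4;

    if (period == 0)
        return (0);
    return (ops_per_second / period);
}
```

Under an assumed rate of 1,000,000,000 micro-ops per second, ibs-exec-1000 would give 62500 samples per second per core, and ibs-exec-500 would give 125000. Actual rates depend on the workload.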

IBS DTrace Provider Source Repository

  • ibs-gate: Anonymous pull is allowed. You must either be a leader of this project or a committer for the ibs-gate repository to push. Please read these instructions on how to use Mercurial repositories. The repository can also be browsed using the OpenSolaris Source Browser.

    Gate Status: The repository is synced against build 93.

    Closed Binaries tarballs: Build 93 closed bins tarballs (needed for nightly(1)) can be downloaded here.

    To clone from the ibs-gate repository
        $ hg clone ssh://your-login@hg.opensolaris.org/hg/amd/ibs-gate
        
    For help with using Mercurial, or the ON tools, you can:

  • To make a (debug) kernel (using ibs as an example workspace, and opensolaris.sh as the environment file)
        $ cd ibs
        $ /opt/onbld/bin/bldenv -d /opt/onbld/bin/opensolaris.sh 
        $ cd usr/src/tools
        $ dmake install
        $ cd $CODEMGR_WS/usr/src/uts
        $ dmake install
        
  • To create a kernel tarball to install (x86)...
        $ /opt/onbld/bin/Install -G my_ibs_kernel -k i86pc
        
  • To build BFU archives, you need to get (and extract) the "closed bins" tarball(s) into your workspace. See above for current pointers (you must use versions appropriate for the build of onnv against which your repo is synced).
        $ cd ibs
        $ tar xf on-closed-bins.i386.tar
        $ /opt/onbld/bin/nightly /opt/onbld/bin/opensolaris.sh
        
  • See the OpenSolaris Developer's Reference for details on how to use kernel tarballs generated by Install(1).

IBS DTrace Provider Binary Package

To ease testing of the provider, a preliminary binary package is available here (created 2008-08-13 16:00).

This package contains the IBS provider module and a special devfsadm link module to create the device link in the /dev filesystem. To add this package, extract the tarball into some directory and use pkgadd(1M) to add it:

$ cd /tmp
$ mkdir ibs
$ cd ibs
$ gzcat SUNWibs.tar.gz | tar xf - 
$ pkgadd -d .