Instruction Based Sampling (IBS)
Overview
Instruction Based Sampling is a performance observability feature available as of AMD Family 10h processors (e.g., Barcelona). Many modern processors offer performance counters for counting certain performance-relevant events. This data, however, often lacks the specificity needed to gain an accurate understanding of performance (or the lack thereof). For example, many performance counter facilities let one count memory references, but they do not show which memory is being accessed.
In many ways, AMD's Instruction Based Sampling facility bridges this gap. It works by periodically sampling instructions (or instruction ops) from an instruction stream (program execution). Detailed information about each sampled instruction/op is collected as it makes its way through the pipeline and is then made available through the IBS facility. IBS provides the performance analyst with a mechanism for effectively observing:
IBS Dynamic Tracing (DTrace) Provider
A prototype DTrace provider has been developed that lets one interface with the IBS feature through DTrace. The provider exports a set of ibs DTrace probes that, when enabled, fire after IBS samples an instruction or op.
The information IBS provides about the sampled instruction/op is available both in the body of the DTrace probe and in the probe's predicate. DTrace makes it easy to build predicates that filter for the performance events of interest. Its data aggregation features also provide a powerful mechanism for managing, analyzing, and visualizing the stream of performance data that IBS produces.
Status
A fairly full-featured prototype is available.
Using the provider
The purpose of the IBS DTrace provider is to give convenient access to the IBS functionality. Currently the provider offers two kinds of probes: fetch probes (for example ibs:::ibs-fetch-500) and exec probes (for example ibs-exec-1000). In both cases args[0] points to a structure describing the sample.
For the fetch probe the data structure is as follows:
typedef struct ibs_fetch_data {
int cpu_id;
int icache_miss;
int l1tlb_miss;
int l2tlb_miss;
int l1tlb_psize;
int latency;
int phyadr_valid;
uintptr_t linadr;
uintptr_t phyadr;
} ibs_fetch_data_t;
For the exec probe the data structure is as follows:
typedef struct ibs_exec_data {
int cpu_id;
/* Op Data register */
int comp_to_retire_count;
int tag_to_retire_count;
int resync_uop;
int mispred_return_uop;
int return_uop;
int taken_branch_uop;
int mispred_branch_uop;
int retired_branch;
/* Op Data2 register */
int cache_hit_state;
int dest_processor;
int source;
/* Op Data3 register */
int load_op;
int store_op;
int l1tlb_miss;
int l1tlb_2m;
int l1tlb_1g;
int l2tlb_miss;
int l2tlb_2m;
int l2tlb_1g;
int dcache_miss;
int dcache_miss_latency;
int misaligned_access;
int load_bank_conflict;
int store_bank_conflict;
int data_forwarded;
int data_forward_cancelled;
int mem_uncached;
int mem_writecombining;
int locked_operation;
int mab_hit;
int linadr_valid;
int phyadr_valid;
uintptr_t logadr;
uintptr_t linadr;
uintptr_t phyadr;
} ibs_exec_data_t;
The field names describe the information each field holds. For more details, refer to the Family 10h optimization guide (above).
Sample D scripts
The following simple script sums the dcache misses caused by each executable. Note that the result is not a precise total, because the accounting covers only the sampled ops rather than every instruction or micro-op. Still, it gives a reasonable indication of how each executable is doing in terms of cache misses.
#!/usr/sbin/dtrace -s
#pragma D option quiet
ibs-exec-2000
{
@exec[execname] = sum(args[0]->dcache_miss);
}
END
{
printf("\nDcache misses per exec:\n");
printa(@exec);
}
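The following script adds more functionality and observes only an executable called "memtest". It sums L2 TLB misses from fetch samples, breaks dcache misses down by CPU, and lists the 10 linear addresses with the most dcache misses: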
#!/usr/sbin/dtrace -s
#pragma D option quiet
ibs:::ibs-fetch-500
/execname == "memtest"/
{
@fetch[execname] = sum(args[0]->l2tlb_miss);
}
ibs-exec-1000
/execname == "memtest"/
{
@exec[execname, args[0]->cpu_id] = sum(args[0]->dcache_miss);
}
ibs-exec-1000
/execname == "memtest" && args[0]->dcache_miss == 1 && args[0]->linadr_valid == 1/
{
@linadr[args[0]->linadr] = count();
}
END
{
printf("\nNumber of L2 TLB misses:\n");
printa(@fetch);
printf("\nDcache misses per core:\n");
printa(@exec);
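/* keep only the 10 addresses with the highest miss counts */
trunc(@linadr, 10);
printf("\nTop 10 VA that caused dcache misses:\n");
/* @linadr has a single key (linadr), so one key conversion, then the count */
printa("%16x %@10d\n", @linadr);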
}
Limitations and Known Issues
IBS DTrace Provider Source Repository
IBS DTrace Provider Binary Package
To ease testing of the provider, a preliminary binary package is available here (created 2008-08-13 16:00). This package contains the IBS provider module and a special devfsadm link module that creates the device link in the /dev filesystem. To add this package, extract the tarball into a directory and use pkgadd(1M) to add it:
$ cd /tmp
$ mkdir ibs
$ cd ibs
$ gzcat SUNWibs.tar.gz | tar xf -
$ pkgadd -d .