GSoC Idea: Microbenchmarking extension

Microbenchmarking extension: access to CPU performance counters

Areas: C, extensions, CPU counters, portability
Good if student knows: C (required)
Priority: Medium
Difficulty: Medium
Benefits to the student: getting familiar with CPU performance counters, learning to code a Tcl extension, learning about portability issues
Benefits to Tcl: improved tools for performance estimation, both in applications and in core development
Mentor: Miguel Sofer
See also: GSoC Idea: Core Performance Analysis (larger context)

Introduction

In Tcl, the tools available for estimating the performance of selected pieces of code are the time and clock commands. Both measure elapsed wall-clock time.

On modern computers, wall-clock time depends on so many external variables (CPU load, cache effects of other processes or threads, etc.) that it does not provide a reliable basis for performance optimization, except for order-of-magnitude effects.

Modern CPUs have hardware counters that may provide more reliable performance estimates, especially with respect to caching problems, today's main performance bottleneck ([1], slide 64). Tools such as Linux's perf provide access to these counters, but they take their measurements between process start and process end. Another tool that allows estimating cache effects is Valgrind's Cachegrind, but it is extremely slow, it measures simulated cache effects, and it is hard to use for anything other than full-process measurements.

To assist performance estimation for both the Tcl core and Tcl scripts, it would be desirable to control access to the hardware counters from Tcl scripts.

Project Description

The goal of this project is to design and implement a Tcl extension with commands to interact with the CPU's hardware counters. The initial goal is an extension that works under Linux using [2].
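As a rough sketch of what the extension's Linux layer would wrap, assuming the perf_event_open(2) system call (the kernel interface that grew out of the Performance Counters for Linux work in [2]): the helpers below open one generic hardware counter for the calling thread, then reset, enable and disable it around a code region. The names counter_open, counter_start and counter_stop are hypothetical; a Tcl command could call them to bracket the evaluation of a script body.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open(2), so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open a stopped counter for one generic hardware
 * event (e.g. PERF_COUNT_HW_CACHE_MISSES) on the calling thread, any CPU.
 * Returns a file descriptor, or -1 if the kernel refuses (no PMU exposed,
 * event unsupported, perf_event_paranoid too strict, ...). */
static int counter_open(uint64_t event)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = event;
    attr.disabled = 1;       /* created stopped; enabled around the region */
    attr.exclude_kernel = 1; /* user-space only, allowed when paranoid <= 2 */
    attr.exclude_hv = 1;
    return (int) perf_event_open(&attr, 0, -1, -1, 0);
}

/* Zero the counter and start counting. */
static void counter_start(int fd)
{
    assert(fd >= 0);
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Stop counting and return the accumulated value, or -1 on read error.
 * With the default read_format, read() yields a single 64-bit count. */
static int64_t counter_stop(int fd)
{
    uint64_t count;

    assert(fd >= 0);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof count) != (ssize_t) sizeof count)
        return -1;
    return (int64_t) count;
}
```

Opening a counter fails on machines or virtual machines that do not expose a PMU, or when /proc/sys/kernel/perf_event_paranoid forbids it, so the extension would have to report that case as a Tcl error rather than return a meaningless count.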

If time permits, the student will research the possibility of porting the extension to Windows and/or OS X. This will entail finding out about interfaces analogous to [2], (possibly) redesigning parts of the extension's C code so that it can be configured to work with the three different APIs, and coding a portable extension.
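One possible way to let the C code be configured for several counter APIs (an assumption, not part of the original plan) is a table of backends behind a common set of function pointers, with native backends compiled in only where their API exists. In the sketch below, CounterBackend, backend_for, the HAVE_LINUX_PERF configuration macro and linux_perf_backend are hypothetical names. The only backend actually implemented is a fallback that offers elapsed wall-clock nanoseconds (roughly what Tcl's time command already gives), so the code runs on any system with POSIX clock_gettime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical interface that the Tcl commands would call; each OS API
 * (perf_events on Linux, its analogues on Windows/OS X) supplies one. */
typedef struct CounterBackend {
    const char *name;
    int     (*open)(const char *event); /* handle >= 0, or -1 if unsupported */
    void    (*start)(int handle);
    int64_t (*stop)(int handle);        /* count since start, or -1 on error */
} CounterBackend;

/* Fallback backend: its only "event" is elapsed wall-clock nanoseconds. */
static struct timespec wall_t0;

static int wall_open(const char *event)
{
    return strcmp(event, "wall-ns") == 0 ? 0 : -1;
}

static void wall_start(int h)
{
    (void) h;
    clock_gettime(CLOCK_MONOTONIC, &wall_t0);
}

static int64_t wall_stop(int h)
{
    struct timespec t1;

    (void) h;
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    return (int64_t) (t1.tv_sec - wall_t0.tv_sec) * 1000000000
           + (t1.tv_nsec - wall_t0.tv_nsec);
}

static const CounterBackend wall_backend = {
    "wall-clock", wall_open, wall_start, wall_stop
};

/* Backends in order of preference. A native backend such as the
 * hypothetical linux_perf_backend would be defined in its own file
 * and only listed when the build detects its API. */
static const CounterBackend *const backends[] = {
#if defined(HAVE_LINUX_PERF)
    &linux_perf_backend,
#endif
    &wall_backend,
};

/* Return the first backend able to count `event`, storing its handle,
 * or NULL if no compiled-in backend supports that event. */
static const CounterBackend *backend_for(const char *event, int *handle)
{
    size_t i;

    assert(event != NULL && handle != NULL);
    for (i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        *handle = backends[i]->open(event);
        if (*handle >= 0)
            return backends[i];
    }
    return NULL;
}
```

A Tcl command could ask backend_for for the requested event and raise an error when it returns NULL, so that a script asking for, say, cache misses on a platform without counter support fails loudly instead of silently receiving wall-clock numbers.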

References

  • [1] A Crash Course in Modern Hardware
  • [2] Performance Counters for Linux
  • [3] A JVM does that? Slides 20 and 21 of this talk suggest that Java may have interesting code to look at (a video of the talk is also available)
