Well, baseball spring training is starting. My Minnesota Twins got underway last week with pitchers and catchers reporting. I look forward to the arrival of baseball each spring. It signals the beginning of spring, of course, a welcome respite from a winter that has been rather harsh for Ohio, and it means the baseball season is just around the corner. I follow several Minnesota Twins blogs, among them Twinsdaily, Twinkietown, and Aaron Gleeman, and now that spring training is upon us, plenty of content is starting to come out. Baseball bloggers have become very keen on sabermetrics, as described in the book Moneyball. The tenets of sabermetrics remind me a bit of Fit For Purpose for computer systems.
Moneyball describes how the Oakland Athletics began the widespread use of sabermetrics to build a winning baseball team. Sabermetrics uses advanced baseball statistics to help identify undervalued players. As small-market teams, the Oakland A’s, and my Minnesota Twins, usually can’t afford to compete for big-name players against the likes of the New York Yankees, Boston Red Sox, and Los Angeles Dodgers. That means they have to find value in players that other teams might overlook. Sabermetricians look for underappreciated players whose actual impact can be lost in the media hype and traditional scouting. They use advanced statistics such as on-base percentage (OBP), slugging percentage, on-base plus slugging (OPS), fielding independent pitching (FIP), and many more. Rather than depending on hype and the eyes of scouts, they use measurements that tease real value out of a player’s actual performance.
How does this relate to Fit For Purpose? Well, as I was thinking about Moneyball, it occurred to me that sabermetricians are looking for actual performance rather than hype, and that is similar to the way a fit-for-purpose methodology looks to match a computer system’s characteristics to the workload rather than just depending on the hype of the day. In the computer world, we hear all about technologies: 22 nm and 14 nm semiconductor geometries, clock speeds, SPECint ratings, and the like. But how do those characteristics help with real, live workloads? Maybe they matter, maybe they don’t. What really matters is how well a system can perform actual workloads and how well it meets users’ non-functional requirements. To sabermetricians, it doesn’t matter that Derek Jeter has great press and a reputation as a great hitter and great fielder. The proof is in the performance. There are probably elements of truth to Jeter’s reputation, and it may have applied more when he was younger than when he was approaching the end of his career. But sabermetrics looks to find the reality of his performance, and especially his recent performance, to determine his current value. Similarly, the specs a processor has only matter a little. What matters more is how the entire architecture meets the requirements. Here are some examples of old baseball stats, each paired with a better stat:
- ERA (Earned Run Average) — FIP (Fielding Independent Pitching)
- BA (Batting Average) — OBP (On-base Percentage)
- RBI (Runs Batted In) — Slugging Percentage
- Fielding Percentage (errors/chances) — UZR (Ultimate Zone Rating)
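To make the newer stats a little more concrete, here is a minimal sketch of how a few of them are computed from ordinary counting stats. The formulas for OBP, slugging, OPS, and FIP are the standard published ones; note that the FIP constant varies by league and season, so the 3.10 used below is only a placeholder.

```python
# Illustrative sketch of a few sabermetric stats, computed from counting stats.
# The FIP constant changes by league and season; 3.10 below is just a placeholder.

def obp(hits, walks, hbp, at_bats, sac_flies):
    """On-base percentage: how often a batter reaches base."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

def slg(singles, doubles, triples, homers, at_bats):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats

def ops(obp_value, slg_value):
    """On-base plus slugging: a quick overall measure of offensive value."""
    return obp_value + slg_value

def fip(homers_allowed, walks, hbp, strikeouts, innings, fip_constant=3.10):
    """Fielding Independent Pitching: judges a pitcher only on outcomes the defense can't affect."""
    return (13 * homers_allowed + 3 * (walks + hbp) - 2 * strikeouts) / innings + fip_constant
```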
What are some of the metrics available in the computer world? Well, there are lots of benchmarks out there: the various SPEC benchmarks, the Transaction Processing Performance Council (TPC) benchmarks, the SAP SD benchmark (reported in SAPS), and others. And, of course, there are simple metrics like clock speed, cache size, memory size, and so on. All of these provide some information, but they can be hard to extrapolate to real-world workloads. TPC-C was intended to look like a real-world workload, but it was developed twenty-ish years ago, and the state of computing has improved to the point that TPC-C has become less dependable. Vendors have also become fairly adept at tuning to the benchmark, which further limits its applicability. If you want to know how a simple workload, say a computational workload with little I/O, would behave on various platforms, then the SPEC benchmarks can be useful. But most workloads don’t look like that, and the simple benchmarks pretty much mirror clock speeds. So, are there any better examples out there? IBM’s System z performance team publishes a document, the Large Systems Performance Reference (LSPR), that attempts to do a better job, but it is unique to System z; there really is nothing like it in the open systems world. IBM has also developed an idea called workload factors that tries to account for variations in processor architectures and relate them back to System z. Workload factors have been developed mostly through experience comparing the performance of various types of workloads in actual performance testing at IBM. Depending on the type of workload, the relative capacity may be higher or lower than clock speed or simple benchmarks like SPEC would suggest.
Even without the level of analysis IBM has done in the LSPR, you can see variations even with the fairly simple benchmarks. The more parallel or specialized the benchmark, the more it reflects the basic characteristics of the processor itself. SPECint measures integer arithmetic; SPECfp measures floating point. Since pretty much all processors have built-in integer and floating-point units nowadays, SPECint and SPECfp mostly mirror clock speed. SAPS measures SAP transaction rates and mostly reflects the application tier, which is the most processor-bound and most parallelizable. And TPC-C is probably the closest common measurement to a transactional workload, but its age makes it less predictive than it would have been 15 or 20 years ago.
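To illustrate the workload-factor idea, here is a rough sketch. The numbers below are entirely made up for illustration; they are not IBM’s actual workload factors or LSPR data. The point is simply that a raw benchmark ratio gets scaled up or down depending on the kind of workload you actually plan to run.

```python
# Hypothetical illustration of the workload-factor idea.
# None of these numbers are real LSPR or vendor figures; they are invented
# to show how a single benchmark ratio gets adjusted per workload type.

benchmark_ratio = 1.5  # suppose a simple benchmark says platform B is 1.5x platform A

workload_factors = {
    "parallel_compute": 1.0,    # highly parallel, CPU-bound work tends to track the benchmark
    "transactional_oltp": 0.7,  # I/O- and cache-sensitive work often realizes less of it
    "mixed_batch": 0.85,
}

def relative_capacity(ratio, workload_type):
    """Scale a raw benchmark ratio by a workload factor to estimate delivered capacity."""
    return ratio * workload_factors[workload_type]

for workload in workload_factors:
    print(f"{workload}: {relative_capacity(benchmark_ratio, workload):.2f}x")
```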
There have been some interesting presentations and papers published by the Computer Measurement Group. Among them: Roger Rogers and Joe Temple, “Relative Capacity and Fit for Purpose Platform Selection,” CMG Journal of Computer Management, no. 123, March 2009; Rick Lebsack and Joe Temple, “Fit for Purpose Platform Selection, a Workload View,” CMG Journal of Computer Management, no. 129, August 2011; and presentations by Joe Temple at CMG conferences, most recently “Common Metrics don’t Work.”