How to Cpk the SQL Way

As mentioned earlier, I’ve been involved in client’s Production Reporting application project, when the subject of Cpk came up.  After a lot of inconsistent references to the statistic and lot’s of code that approximated but didn’t exactly calculate it, I finally discovered the proper formula for  Cpk.  Here it is:
Cpk  =   min (  USL - μ / 3 * \hat{\sigma}, μ - LSL / 3 * \hat{\sigma})

Where USL is the Upper Specification Limit, LSL is the Lower Specification Limit,  μ is the arithmetic mean of the results and \hat{\sigma}, sigma hat, is the estimated standard deviation (sigma hat is going to turn out to be the kicker in this equation).

Let’s say you have a product you have to make that must be within some specifications: like, the length of a sub sandwich.  It should always be 12″ long, but it’s acceptable if the final sandwich comes out between 11″ (LSL) and 13″ (USL).  Cpk is the minimum of either the average deviation from the upper limit divided by 3 times the estimated standard deviation, or the average deviation from the lower limit divided by 3 times the standard deviation.

All this would be easy to calculate using SQL if just plain ole standard deviation were involved.  However, using standard deviation instead of estimated standard deviation (sigma vs. sigma hat), and you have the equation for Ppk, a  different statistic.   Always one to be lazy, I ask our client: “would it be okay to report just the Ppk statistic?”  Of course, the answer was no! Continue reading How to Cpk the SQL Way

Computing Cpk statistics using SQL

Part 1 – Introduction

For the past several years I have performed a myriad of projects for this favorite chemicals company client of mine.  Recently, I was called in to help design and develop a Production Reporting application for them, to meet new requirements from their new, German-based, owners.  The application reports on things like how much product is produced, what was consumed to produce it, production yields, etc.  Several factories are involved, so we’re tieing all the data into one database.

The first challenge involved converting all the English-based measurements into metric.  Yes, I did say it was one of those metric-loving European countries, right?  So, we’ve been going over different conversion equations – some of them are interesting, because they’ve involved variable factors like the energy content of natural gas and the specific gravity of  oil.

Still, though, a piece of cake – just linear equations to convert from one unit of measure to another.  Had more problems with the quality of data (often missing from several plants) than the equations themselves. So, we’re progressing fine on the application; then, one day I’m in a meeting with all the plant managers, and they start talking about CPK statistics. They talked like of course everyone knows what this statistic is.  I’m sitting there, though, going “huh?”  I have a computer science undergraduate degree, with oodles of calculus, differential equations and statistics, and a economics graduate degree with its own heavy mathematical load in correlation analysis and statistics.  Yet, I’ve never heard of Cpk (properly, that’s how the statisticians spell it, C sub pk.  I still don’t know what the initials mean).

Continue reading Computing Cpk statistics using SQL