For more see: http://www.haplr-index.com/outliers_and_misbehaving_data_in.htm
Why did LJ decide to use the “outlier” numbers that caused San Diego County to get a five star rating that appears questionable? Did this decision cost other libraries star ratings?
Why does the LJ Index “Score Calculation Algorithm” allow one measurement to swamp the score? Is this data misbehavior intended as one of the authors claims below?
In the LJ Index calculations, San Diego County’s incredibly high score (889% above the group average!) for Public Internet Use cancels relatively low scores for Circulation (48% below), Visits (29% below), and Program Attendance (20% below). In the latest LJ Index, San Diego ranked 4th and got 5 stars.
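To see how one measurement can swamp the rest, here is a minimal sketch using the four percentages quoted above. It assumes the four deviations are simply averaged with equal weight; that is an illustrative simplification, not LJ's published Score Calculation Algorithm, and the names `deviations` and `mean_deviation` are hypothetical.

```python
# Percent deviations from the peer-group average, as quoted in the text above.
deviations = {
    "public_internet_use": 889.0,
    "circulation": -48.0,
    "visits": -29.0,
    "program_attendance": -20.0,
}


def mean_deviation(devs):
    """Unweighted mean of percent deviations (an assumed, simplified combining rule)."""
    return sum(devs.values()) / len(devs)


# All four measures together: the single outlier dominates the result.
combined = mean_deviation(deviations)

# The same calculation with the outlier measure left out.
without_outlier = mean_deviation(
    {name: value for name, value in deviations.items() if name != "public_internet_use"}
)

print(f"All four measures:     {combined:+.1f}% vs. group average")
print(f"Without internet use:  {without_outlier:+.1f}% vs. group average")
```

Under this assumed equal-weight rule, three below-average measures are turned into a result far above average by the one Public Internet Use figure.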
LJ’s February edition omitted San Diego County Library because it did not report Public Internet Use sessions. The Library received five stars in the November edition, called Round Two. It reported 16.5 million “Public Internet Use” sessions. Newer data on the California State Library web site reports a more likely 1.4 million. Did San Diego County, among many others, report hits rather than sessions? Didn’t the numbers surprise LJ?
For 16.5 million sessions to be correct, every visitor would have had to use the internet terminals an average of 4.2 times per visit! That is highly unlikely. IMLS, the federal agency that publishes the data, has “edit checks” that are supposed to alert data coordinators to numbers that are out of range. Somehow, 132 libraries in 38 states were reported as having almost every reported visitor use the Public Internet Terminals at every visit. IMLS published a remarkable 8 sessions for every user visit for one library. Did the process work for the latest data?
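The kind of range check I have in mind could be as simple as the sketch below. The threshold of one session per visit is my own assumption for illustration, not IMLS's actual edit-check rule, and the visit count is back-calculated from the 4.2 ratio above rather than taken from the published data. The function names are hypothetical.

```python
def sessions_per_visit(sessions, visits):
    """Public internet sessions divided by library visits."""
    return sessions / visits


def looks_out_of_range(sessions, visits, max_ratio=1.0):
    """Flag a library whose ratio exceeds max_ratio (an assumed threshold)."""
    return sessions_per_visit(sessions, visits) > max_ratio


reported_sessions = 16_500_000  # figure used in the November edition
newer_sessions = 1_400_000      # newer California State Library figure

# Visits implied by the 4.2 sessions-per-visit ratio (derived, not a published count).
implied_visits = reported_sessions / 4.2

for label, sessions in [("reported", reported_sessions), ("newer", newer_sessions)]:
    ratio = sessions_per_visit(sessions, implied_visits)
    flag = "CHECK" if looks_out_of_range(sessions, implied_visits) else "ok"
    print(f"{label:>8}: {ratio:.2f} sessions per visit -> {flag}")
```

With the same implied visit count, the newer figure works out to roughly one internet session for every three visits, which is the sort of number such a check would let through.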
How does this affect the LJ Index Star Libraries roster? With the more reasonable 1.5 million number, wouldn’t San Diego’s score fall from 989 to 450? Rather than 5 Stars for being 4th ranked out of 36 libraries, they would fall to 22nd ranked and no stars. Isn’t that precisely what will happen with round three?
Am I wrong that this single correction changes the scores of every other library in the group? In all, 29 of the 36 libraries would change rankings with just this one outlier number corrected. Isn’t that a lot of volatility for just one data element?
Should LJ have left San Diego County Library out of the mix because of the questionable data? At the very least, should they not have acknowledged the problems? The LJ authors have certainly given me enough grief about not giving sufficient warning about the vagaries of HAPLR data over the years.
In his blog piece Ain’t Misbehavin’!, LJ Index co-author Ray Lyons says, “LJ Index scores are not well behaved. That is, why they don’t conform to neat and tidy intervals the way HAPLR scores range from about 30 to 930.” Lyons says that the LJ Index is more informative than percentile-based rankings like HAPLR. He also notes that the LJ Index has a “challenging problem” with outliers that can distort the ratings. Is that what happened here? Aren’t there other examples of this happening in other spending categories?
- Thomas J. Hennen Jr.
- Racine, Wisconsin, United States
- We (my wife and I) are celebrating the 11th Anniversary of HAPLR, and more importantly, our 38th Anniversary. The HAPLR system uses data provided by 9,000 public libraries in the United States to create comparative rankings. The comparisons are in broad population categories. HAPLR provides a comparative rating system that librarians, trustees and the public can use to improve and extend library services. I am the director of Waukesha County Federated Library System.