So I got my LSAT (Law School Admissions Test) score back the other day. I got a 170. The exact same score I got when I took the test in 2006. I would say I’m perfectly happy with the score, except for the fact that it seems to suggest a depressing thought: the sum of our intellects can be numerically represented, and this number can be replicated rather successfully. It’s a bit more comforting to think, looking at our ACT or SAT or MCAT or GMAT scores, that it’s all just a silly little game we have to play—that on a good day I could’ve done just as well as the girl with a 177 next to me (and I’m sure the guy with a 165 is thinking the same about me), that it’s just luck, that all of this testing business is rather senseless and unfair. If I were the truly spiritual sort I would even say I would’ve preferred a 167 or a 168, just as a demonstration that the sum total of me cannot be represented by a number (but I can’t honestly say that, because law school admissions has become such a crazy numbers game that two points worse—two more wrong questions—would probably doom my chances at all of the schools I want to go to).
Even more sobering: I probably took 25 timed practice tests of actual past LSATs, and if you average all my scores it comes to about 171.5. But if you average only the practice tests I took of LSATs administered since 2006, my average lands almost exactly at 170 (the test changed somewhat in the mid-2000s, in a way I found a bit more challenging). I probably took twelve post-2006 practice tests, and of those twelve scores nine were between 169 and 172. I had only three outliers—a 166, a 167, and a 176. It appears inescapable—170 LSAT is written somewhere in my soul. I’m not dumb, maybe I’m kind of smart, but I’m no genius.
But hold on a minute. Replicability (the ability to generate identical results across repeated sittings) and scalability (the ability to generate a hierarchy, which ideally follows a bell curve that tapers at the extremes) are prized attributes of all standardized tests, and particularly of the LSAT, which is supposedly entirely a measure of inherent ability. There is no actual material to learn for the LSAT—everything you see when you take the test will be new to you. It’s the equivalent of being handed a puzzle and told you have thirty-five minutes to solve it.
Fascinated with how replicable the LSAT is, I’ve done a little research on it, and on standardized tests in general. I’ve come to the conclusion that, for the LSAT at least, and probably for almost all standardized tests, replicability and scalability are actually the only prized attributes.
Here’s how I’ve come to figure it: the LSAT’s job is to make sure that I, or someone like me, take that test and get 10 questions wrong out of 100 (10 questions wrong is usually about what it takes to get a 170, depending on the curve). If I take it again in four months it wants to make sure I get 10 wrong again. That’s the replication part.
The scalability part wants to ensure that, if 1,000 people take the test, the scores fall along that bell curve. A 178 (roughly 3-4 questions wrong) is right at the edge of the 99.9th percentile, meaning that 99.9 percent of test-takers got a worse score than you—so in a group of 1,000 people, congratulations, you are the only one with a 178. Everybody else did worse. A 167 is at the 94.6th percentile, so in a group of 1,000 people you did better than 945 of them. A 150 is the average score, so it’s the fat middle of the bell. The curve slopes downward in both directions from there.
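To make the head-count arithmetic concrete, here’s a tiny sketch (my own illustration, not anything from LSAC) that converts a percentile into a number of worse scorers in a room of 1,000:

```python
# Toy percentile arithmetic: in a group of 1,000 test-takers, how many
# people did a given percentile beat? (Illustration only; real LSAT
# percentiles come from LSAC's published conversion tables.)
def people_you_beat(percentile, group_size=1000):
    """Number of people in the group who scored worse than you."""
    return round(group_size * percentile / 100)

print(people_you_beat(99.9))  # 999 -- everyone else in the room
print(people_you_beat(50.0))  # 500 -- the fat middle of the bell
```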
I’m not a statistician, but as I figure it the scalability part isn’t that hard to achieve, because you can manipulate the curve. (E.g., on the test I took, the June 2011 LSAT, you could get 11 wrong and still get a 170, while on the December 2010 LSAT you could get a whopping 15 wrong and still get the same score). The only issue that could come up is a “lumpy” curve—where, for example, a bunch of people get 10 wrong and a bunch of people get 15 wrong, but very few get 11-14 wrong. You can adjust the curve to compensate, by just making the score gap between 10 wrong and 15 wrong very small, but actually dramatic irregularities like that never happen, because of the lengths the LSAT makers go to ensure replicability.
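The curve-adjustment trick above can be sketched as a toy version of percentile-matching (my own simplification; LSAC’s actual equating procedure is far more sophisticated, and every number below is invented): line each raw score’s percentile rank up against a target bell curve and read the scaled score off that curve.

```python
# Toy "curve adjustment": map raw scores (questions wrong) to a
# 120-180 scale so scaled scores track a target bell curve.
# Invented numbers throughout -- not LSAC's actual procedure.
from statistics import NormalDist

def raw_to_scaled(raw_counts):
    """raw_counts: {questions_wrong: number_of_test_takers}."""
    total = sum(raw_counts.values())
    target = NormalDist(mu=150, sigma=10)  # assumed target distribution
    table, cum_below = {}, 0
    for wrong in sorted(raw_counts, reverse=True):  # worst scorers first
        n = raw_counts[wrong]
        pct = (cum_below + n / 2) / total  # midpoint percentile rank
        table[wrong] = max(120, min(180, round(target.inv_cdf(pct))))
        cum_below += n
    return table

# A "lumpy" raw distribution: crowds at 10 and 15 wrong, few in between.
table = raw_to_scaled({5: 50, 10: 250, 12: 30, 15: 400, 20: 270})
print(table)
```

However lumpy the raw counts get, the mapping just follows percentile ranks, so the scaled scores stay in order and spread out according to the target bell rather than the lumps.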
That’s where the handy “experimental section” comes in. These are unscored sections that test-takers unknowingly serve as guinea pigs for, and that will be used to build future tests. We’ve all done one—they include such sections in every SAT, ACT, LSAT, and GMAT. These sections are universally despised, but if you think about it, absolutely necessary. You can’t replicate actual testing conditions in any other way. One purpose of the experimental section is, of course, to gauge general difficulty—if 90% of people get a question right, it’s considered easy; if only 30% get it right, it’s very hard.
But one thing I discovered that most people probably don’t know—they gauge that difficulty not as it applies to the entire group, but rather to very specific cohorts. Test-takers are split into groups depending on their overall performance, and questions are gauged according to how specific groups respond to them. The curve of the test actually reflects your performance not according to how you did compared to the entire population of test-takers, as the scoring table suggests, but rather according to how you did compared to your own cohort. Thus, a curve might be very harsh for 170 scorers, but gentler on 160 scorers, or vice versa. (The experimental section is also interesting because supposedly it reveals bias—if female 170 scorers or African-American 170 scorers consistently get a certain question wrong more often than white male 170 scorers do, that question is tossed out. No idea if it works the other way around, too.)
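A toy version of the cohort idea looks something like this (the data is invented, and real pretesting uses far heavier machinery—item response theory, differential-item-functioning analysis): just tally how often each overall score band gets an experimental question right.

```python
# Toy cohort analysis for one experimental question. Each record is
# (test-taker's overall score band, whether they got this item right).
# All data invented for illustration.
from collections import defaultdict

responses = [
    ("150s", False), ("150s", False), ("150s", True),  ("150s", False),
    ("160s", True),  ("160s", False), ("160s", True),  ("160s", False),
    ("170s", True),  ("170s", True),  ("170s", False), ("170s", True),
]

tally = defaultdict(lambda: [0, 0])  # band -> [right, answered]
for band, right in responses:
    tally[band][0] += right
    tally[band][1] += 1

for band in sorted(tally):
    right, answered = tally[band]
    print(f"{band}: {right}/{answered} correct ({right/answered:.0%})")
```

A question that behaves normally gets easier as the band gets stronger, as here; a question that one cohort inexplicably bombs relative to its peers is the kind that gets flagged.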
Okay, so I’m kind of rambling. I find this stuff interesting; many people may not.
The big hole I see in the LSAT is that, while the test, taken over and over again, has the amazing ability to force me to trip up 10 times every time I sit down to take it, the way I trip up changes dramatically every time. There are four sections on the LSAT: two Logical Reasoning sections, made up of short passages that challenge your logical thinking; one Reading Comprehension section, almost identical to the ones we had on the SAT, with four passages and 5-8 questions following each passage; and a Logic Games section, which gives you four game scenarios with unique rules (Judy can sit next to Tom, but not to Brian, etc.).
I started out thinking I was very strong in both Logical Reasoning and Reading, but a total mess at Games. Then I came across a few tests where, of the 10 questions I was getting wrong, seven of them were on the Reading Comprehension, which seemed like it had morphed from a cuddly bear to a nightmarish beast somewhere in 2007 (right at the same time I thought I had totally mastered Games). I occasionally got all of the Logical Reasoning questions on both sections right, only to get nailed by a Logic Game I couldn’t even begin to understand. And always the same score would blink out at me from the computer: 170. 169. 171.
At one level, this replicability is impressive, and meaningful. Someone who consistently gets only five wrong, in presumably different ways, simply has fewer chinks in his or her mental armor than I do. Where I consistently misread something, he or she reads it perfectly clearly. Where I totally miss an abstract inference, he or she regularly thinks of it.
But the test has been designed so that this 175 scorer always or almost always finds their way through a moment of difficulty where I stumble, and that we both find our ways through where a 160 scorer stumbles. Actually, that’s ALL it is designed to do. It doesn’t care how we stumble, only that we stumble the appropriate number of times. The concepts tested are supposed to be critical to law school, but in manipulating the questions the only critical factor for the test-makers is that I always get 10 wrong, and Smarty-Pants over there always gets five wrong.
LSAT scores correlate better than anything else, including undergrad GPA, with first-year law school grades. Still, the relationship is modest: the correlation works out to LSAT scores explaining only about 16% of the variance in first-year grades. But this is just an accidental thing, IMO. Having an intellectual hierarchy of some sort, any sort, will allow you to find out something about intellectual capabilities. Ranking students according to chess skill or ability at Jeopardy would probably provide some kind of correlation as well.
All that said, I think standardized tests are fine. The alternative would be to just let spoiled rich kids in, after all. But many of us, Asians in particular, tend to really think the number tells us how smart a person is. It absolutely does not. (Well, don’t even get me started on the concept of “smartness.”) Being the only person out of a hundred who can solve a complicated puzzle is impressive, but is it as impressive if we realize the puzzle has been designed so that one person out of a hundred can solve it, that this is the only meaning of the puzzle’s existence? Wouldn’t it be more impressive if you were one of ten people to solve it, but the puzzle was actually based on a problem which has meaning, which you are likely to encounter in your professional journey, whose solution will bring benefits to you and everyone else?
So while I may be a little bummed out that LSAT apparently has me figured out, I think I’ll get over it. And I’m 100% sure I’m never taking the darn thing again.