Sunday, December 26, 2010

Can Grades Be Controlled in a "Fair" Way?

The New York Times has a not-bad story about one university's attempts to rationalize the grades its students receive. This is a much more subtle problem than most so-called conservative critics acknowledge. It would be easy to force a grading curve on courses above a certain minimum enrollment: just require faculty to turn in numerical grades, using whatever method they want that is monotonic (higher numbers representing better performance), and automatically curve the letter grades to achieve whatever proportions are deemed appropriate. The problem with doing this is, of course, that some courses attract better students than others. At Harvard, the students who enroll in Math 55 include a fair proportion of the very best undergraduate math students in the world; it would be unjust to give many of them Cs just on principle.
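A minimal sketch of such a forced curve, assuming Python and a made-up quota table (the function name `curve_letters` and the 30/50/20 split are illustrative, not any school's actual policy). Faculty supply any monotonic numerical scores; letters are handed out purely by rank.

```python
from typing import Dict, Sequence, Tuple


def curve_letters(scores: Dict[str, float],
                  quotas: Sequence[Tuple[str, float]]) -> Dict[str, str]:
    """Assign letter grades by rank so each letter gets about its quota.

    scores: student -> numerical score, higher meaning better performance.
    quotas: (letter, fraction) pairs ordered best to worst; the fractions
            are assumed to sum to 1.
    Tied scores are split arbitrarily in this sketch.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    letters: Dict[str, str] = {}
    start = 0
    cumulative = 0.0
    for letter, fraction in quotas:
        cumulative += fraction
        end = round(cumulative * n)  # cut point for this letter
        for student in ranked[start:end]:
            letters[student] = letter
        start = end
    # Anyone left over after rounding receives the lowest letter.
    for student in ranked[start:]:
        letters[student] = quotas[-1][0]
    return letters
```

Run on a hypothetical class of ten whose scores all fall between 91 and 100, with quotas of 30% A, 50% B, and 20% C, this still hands the two lowest scorers a C. That is the Math 55 objection in miniature: the curve sees only rank within the course, not how strong the course's students are.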

Many schools have experimented with "decorated transcripts" of various kinds, which give not only the student's grade but some information about the other grades given in the course. This makes the transcript more complicated, but certainly has transparency to recommend it. Still, it doesn't really tell the reader how good the competition was.

Harvard doesn't decorate transcripts, but for a long time it used to try to push faculty toward a norm that took the quality of students in a particular course into account. The dean would inform professors, for each course, of the difference between the average grade in that course and the average grade of that course's students in their other courses. That is, a negative number would tend to indicate that the course was grading harshly and a positive number that the course was grading generously. These numbers were always interesting to contemplate, and there were finer variants on the theme (for example, comparing the grades I give in CS courses to the grades my students were getting only in other CS courses, or only in other science courses). It was never clear, however, how faculty used the information. In particular I don't think anyone ever studied the time series: Did "harsh" professors raise their curve more or less than "soft" professors lowered theirs? Or perhaps that was studied, and that is the reason we aren't being given that information any more.
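The number the dean reported can be sketched as follows. This is my reconstruction under stated assumptions, not the registrar's actual procedure: grades are on a 4-point scale, each student's "other courses" figure is the plain mean of that student's other grades, and students with no other courses are left out. The record layout and the name `course_differentials` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (student, course, grade points)


def course_differentials(records: List[Record]) -> Dict[str, float]:
    """Course average minus the same students' average in other courses.

    Negative suggests harsh grading, positive suggests generous grading.
    """
    by_student: Dict[str, Dict[str, float]] = defaultdict(dict)
    by_course: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for student, course, points in records:
        by_student[student][course] = points
        by_course[course].append((student, points))

    result: Dict[str, float] = {}
    for course, enrolled in by_course.items():
        here: List[float] = []
        elsewhere: List[float] = []
        for student, points in enrolled:
            others = [p for c, p in by_student[student].items() if c != course]
            if others:  # skip students with nothing to compare against
                here.append(points)
                elsewhere.append(mean(others))
        if here:
            result[course] = mean(here) - mean(elsewhere)
    return result
```

The finer variants mentioned above (comparing only against other CS courses, or only other science courses) would amount to filtering `others` by department before averaging.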

If students all took the same courses, a large part of the problem would go away. Every professor could grade using any kind of distribution, and GPAs, being averages of the same components, would fairly rank students. You would still have the problem of comparing a Physics B+ to an English A-, but the overall records could rationally be compared.

When you think about that, you realize that the problem resembles that of Bowl Championship Series rankings in college football. Teams don't have enough head-to-head meetings, or even enough common opponents, to make it easy to figure out who the top teams should be. There is some sound mathematics that could be focused on the ranking problem, but as in the case of the BCS algorithms, it would have one huge disadvantage: There would, in general, be no way to answer a question of why one student outranked another except to persuade someone that the algorithm was right and to assert that the relative standings were what fell out of the algorithm when it was applied to the mountains of data about grades.
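To make the difficulty concrete, here is a minimal sketch of one standard technique for ranking from a sparse schedule: a Bradley-Terry model fitted by the usual iterative update. It is not the BCS formula and not any scheme proposed for grades; the function name, the iteration count, and the small `prior` (a virtual win and loss against a fixed reference opponent, added so undefeated or winless teams get finite ratings) are my assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bradley_terry(results: Iterable[Tuple[str, str]],
                  iterations: int = 200,
                  prior: float = 0.1) -> Dict[str, float]:
    """Estimate a strength per team from (winner, loser) pairs.

    Model: P(i beats j) = s_i / (s_i + s_j). Each team also gets `prior`
    virtual wins and `prior` virtual losses against a reference opponent
    of strength 1 (an assumption made here to keep estimates finite).
    """
    wins: Dict[str, float] = defaultdict(float)
    meetings: Dict[str, Dict[str, float]] = defaultdict(
        lambda: defaultdict(float))
    for winner, loser in results:
        wins[winner] += 1.0
        meetings[winner][loser] += 1.0
        meetings[loser][winner] += 1.0

    strength = {team: 1.0 for team in meetings}
    for _ in range(iterations):
        updated = {}
        for team, opponents in meetings.items():
            # Games against the virtual reference opponent.
            denom = 2.0 * prior / (strength[team] + 1.0)
            # Real games, weighted by current strength estimates.
            for other, count in opponents.items():
                denom += count / (strength[team] + strength[other])
            updated[team] = (wins[team] + prior) / denom
        strength = updated
    return strength
```

Given only "A beat B" and "B beat C", the fit places A above C even though the two never met. Ask why, and the honest answer is that every strength depends on every result through the iteration, which is exactly the transparency problem. The same machinery could in principle treat each grade as a result against a course-specific standard, but that mapping is a further assumption.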

I wrote a couple of chapters about all this in Excellence Without a Soul, but you can get a good sense of my position from this Morning Prayer talk I gave in the fall of 2003.


  1. The comment "there would, in general, be no way to answer a question of why one student outranked another except to persuade someone that the algorithm was right ..." is the heart of the matter. Valen Johnson proposed such an algorithm and ranking system at Duke in 1997; he wrote about it in 2002 here:

    I was the one science professor that voted against it, largely because of the transparency issue.

  2. Thanks, Prof. Astrachan. Yes, that's right; Johnson later wrote an entire book on this subject and expressed some frustration with the Duke faculty that he couldn't persuade them that his scheme was correct. A better statistician than politician, it seems!

    I think Johnson's scheme did not spring from the same competitive model I am suggesting. I was imagining something like this. For every course and every grade, there is a separate standard that it represents. A student who gets a B in my course beats the B-in-my-course standard but loses to the B+-in-my-course standard and any standard higher than that. So the set of all students' grades in all courses is like a sparse football schedule. In some cases there are fairly direct comparisons that can be drawn (for example, with another student who beats the B+ standard in my course). In most cases comparison between students involves chains of inferences about "wins" and "losses" in several courses, since most students don't even take my course. There are quite a few papers about how to infer a serial ranking of all "teams" in case of such a sparse database of comparison points. But as far as I can see, just as in the BCS rankings, the only answer to most questions of the form "Why is A ranked more highly than B?" would be "That's the output of the algorithm when we run it on all this mass of data."

  3. Forgive me if this is naive, but it seems strange to me that we would have a discussion about controlling grades without explicitly discussing what it is that they are measuring. I don't believe I'm being pedantic when I ask what it means if I get an A in your course. Am I the smartest? Am I the most tenacious? Did I get lucky and accidentally study the right material?

    A letter on a page has nothing to say about these things, and I suppose that's what motivates corrective measures like the so-called "decorated" transcripts you mentioned. But even that information is pretty paltry all things considered, and in the end I actually think that's most of the problem.

    For example, it's the problem that shows up when the public gets upset about the number of A's awarded to Harvard students, and again when the faculty have to discuss the best way to administer them. You expect the grade to describe the fact that Math 55 students are among the best math students in the world, but here they summarily fail: plain-paper grades come with no embedded comparison, and no set of assumptions or axioms. They don't fit the message Harvard wants to deliver, and probably by extension don't fit the message other schools want to deliver either.

    The other half of the problem is that many people believe things about grades that are misleading or that are harmful to students: the public does not understand why it would be permissible to assign such grades.

    So there seem to be two possible solutions: either you can add context to the grades, or you can establish some sort of consensus about what they mean. Neither seems to be satisfactory, and it makes me skeptical that it can be done at all.

  4. Thanks for the comment, Alex. FYI, here is the first paragraph of Chapter 6 of Excellence Without a Soul.

    "Amid all the furor over grade inflation, there has been little debate about why professors give grades in the first place. Critics thunder their criticisms with confidence, but after decades of thinking and talking about grading, there still is no consensus about what purpose they serve. Without agreement on goals, adherence to a consistent scale is impossible. Fixing the grading system would do nothing to improve undergraduate education, and unless professors agree that they will do their main business better if grades are held down, they will continue to give 'grade inflation' a backseat to more consequential educational problems."

  5. I respectfully disagree. This is long. Sorry. I just couldn’t help myself.

    I guess I'm a "conservative critic," but since I'm a registered Democrat who donated money to and volunteered for the Obama campaign, I'm conservative only relative to the far leftist tendencies of professors today. The issue of grade inflation is not at all subtle and not at all difficult to solve. In fact, the problem has been solved at a number of prominent institutions including Princeton, Wellesley, Boston U., and Reed.

    What happened at each of these institutions is that someone in a position of authority stated to the faculty that grades were no longer reflecting performance. The faculty then responded by adjusting their grades - particularly in the humanities and social sciences where grade inflation is rampant - so that mediocre achievement was no longer awarded the grade of A.

    It's easy to be glib and try to trivialize grades. But in fact, grades are an important part of prospective student evaluation in graduate schools and professional schools. Those schools, including Harvard's graduate and professional schools, believe that undergraduate grades are a meaningful predictor of future performance.

    Here's where we are with COFHE schools today (at least for the schools that have been generous and have provided me current data) in terms of average grading (and excluding Princeton):
    %A    %B    %C   %D   %F
    56.8  36.5  5.6  0.6  0.5

    I can guarantee that at those schools, excellence is not being produced 57 percent of the time on average. In the humanities, the grades are even higher. While there are certainly classes where students are generally motivated, bright, creative and working hard, on average that isn't the case. On average students are studying (total) about 15 hours a week give or take four hours.

    When grades are this high, students simply don't possess the motivation to work and take their studies seriously. And they don't. That's true on average at Harvard. That's true for almost all COFHE schools.

    That's where we are today. While there are still many classes where education is a serious affair, most classes lack intensity. The expectations are low. Students hardly study. They receive high grades for their wan performances.

    I fully expected that by now, grades would have reached a plateau at these schools. After all, how many A's can one professor hand out? I thought that sooner or later, probably at the 50 percent mark, we would have reached some sort of equilibrium. But the fact is that grades keep rising.

    Is grade inflation a trivial problem akin to, as you suggest, grading of eggs and maple syrup? I don't think so. When A's are handed out as freely as they are today, education suffers.

  6. fortyquestions,

    Thanks for this. I'm not actually a soft grader myself; what I object to is the obsession over the significance of grades. My perspective is distorted by the privilege I have had of teaching at Harvard, where I know that the student in the middle of my class would be a top student at a lot of other colleges. And where I know that much of the variance in grades in my courses is the result of nothing but preparation. I teach the undergrad CS theory course, which is largely math. So I have students with equal native talent and ambition spread over at least a two-year variation in math preparation--some arrive here prepared to take the course (or even skip it), others won't be ready until their junior year, and then only if they get on the right track with their first term course choices. They could all be fine by the time they graduate, but if they take the course as freshmen or sophomores, some will have As from me and some will have Cs, as a result of nothing but their socioeconomic background and what kind of precollege academic experience the luck of the draw dealt them. I give them the grades they deserve, but it drives me nuts that they get lined up at graduation as though the linear ordering of their GPAs had some larger significance.

    The computer science job marketplace seems to do a pretty good job ignoring grades and hiring people on the basis of what they can do when they graduate. Graduate schools don't usually work that way. High grades, in my experience, correlate better with success as a professor than success as an engineer, where, among my students, hard work and determination and adventuresomeness are also stronger correlates of success. So I don't like the fact that GPA at graduation is our principal metric for ranking students. But again, we are talking here about something that doesn't generalize very well--high grades coming out of a place that is very selective on the front end.

    Fundamentally what is going on here is that selective institutions haven't adjusted educationally to increased variance in the preparation level of the students they are admitting as they have diversified socioeconomically and geographically. By comparison with grade inflation, this is a more serious problem, but harder to identify because it is less easily quantified.

    I do agree that the grade compression in the humanities is worse and different. In some branches of the humanities there is no consensus about what excellence means. That is not the problem in the sciences, where the egg metaphor is not so silly--grades got higher over the years because the lower quality inputs got selected out.

    Final comments. First, I am not so sure that the leaders at the institutions you mention would all claim the success you have claimed on their behalf. And second, as long as students are competitive and there are any distinctions in grading at all, grades will be motivational. If you don't believe that, I will send you some of the students just above the middle of my class who are fighting to turn their B or B+ into an A- or an A!

  7. Talk to those leaders. I'm not claiming anything they haven't claimed themselves. In contrast, a few weeks ago I met with a former president of a COFHE school that has done nothing about grading; he called the educational experience there "warm storage."

    Solving grade inflation is easy. I think what is happening at Harvard and elsewhere is not that solving grade inflation is hard, but that the faculty and students are both happy with the status quo where everyone pretends excellence is a common occurrence. It makes both professors and students feel good to pretend that everyone is a genius. I saw this delusional thinking at Stanford and Duke. At your school, people were happy with 90 percent of students graduating with honors until they suffered public embarrassment about it.

    I think, also, that you're confusing "fighting" for grades with kvetching about grades. There's a difference. And actually, a B+ student at Harvard is probably a B+ student at a lot of other places. That wasn't true in the 1950s or 1960s, but national grading patterns already take average student quality into account and have done so for about 30 years.

    That all said, the sciences nationally do not tend to have the problems with easy A's that are present elsewhere in the academy. Workloads tend to be higher as well. So your class is probably just fine and dandy. Walk over to the sociology and English departments, though, and it's a completely different story.

  8. And how has Harvard improved since we stopped giving honors to 90% of our students? Not to say that we shouldn't have made the change. But we shouldn't have made the change and then claimed to have done something educationally significant. And education is the business we are in.

    I agree that addressing grade inflation is easy. Just switch to a numerical grading scale and have the letter grades given out centrally in accordance with some curve. It's a nice question what to take into account when calculating the course's mapping from numbers to letters, but almost any result would be more rational than what we do now. The question is the same: and then what would we have accomplished educationally?

    I don't understand your comment about the 1950s vs. now, which seems to me backwards. In contrast with today, Harvard students were very ordinary people then, except socioeconomically. The most significant selection was self-selection; a large percentage of the applicants was admitted.

  9. In the 1950s, colleges graded about the same regardless of the average quality of the student body. In the 1980s, though, a fairly robust ad hoc grading system developed nationwide where the average grade at a school was strongly dependent on average student quality.

    It's fascinating to me that colleges started to do this on their own; but they definitely did so. As a result, a B+ at Harvard, which is a lousy grade (a real lousy grade in the humanities), translates into a B+ at many other schools (where it can mean a good grade).

    One of the delusional statements I used to hear at Stanford and Duke was that "our B students would be A students anywhere else." That's simply not true.

    I agree that changing honors percentages at Harvard was a symbolic thing. It would have been better to have done something substantive. But that symbolism was an indication that something was inherently wrong. Leadership removed the symbolism. They did little else. As a result, you still have a school where on average, students don't work very hard and still receive high grades for their wan effort. That's certainly not true in every class. But it is true in most.

    It is possible that if the professoriate at Harvard took their grading seriously and reserved A's for excellent performance, nothing else would change educationally. But I don't think so. Students worked harder at Harvard and elsewhere in the past. Grades were tougher as well. Not all students worked harder, it's true. But on average they did. I don't think that the two are unrelated. Run the experiment. Then we'll know.

    Right now at Harvard and elsewhere, the most common model for "education" is warm storage. We house 18-22 year olds in a safe environment akin to a summer camp. There's a sprinkling of learning, but mostly college is about making friends and having a good time before going into the workforce. Certainly there are many students that buck that model. But most don't. We can do better. We should do better.