The New York Times has a not-bad story about one university's attempts to rationalize the grades its students receive. This is a much more subtle problem than most so-called conservative critics acknowledge. It would be easy to force a grading curve on courses above a certain minimum enrollment: just require faculty to turn in numerical grades, using whatever method they want that is monotonic (higher numbers representing better performance), and automatically curve the letter grades to achieve whatever proportions are deemed appropriate. The problem with doing this is, of course, that some courses attract better students than others. At Harvard, the students who enroll in Math 55 include a fair proportion of the very best undergraduate math students in the world; it would be unjust to give many of them Cs just on principle.
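The mechanical curve described above is easy to state precisely. Here is a minimal sketch of it; the function name, the sample scores, and the grade proportions are mine, for illustration only:

```python
def curve(scores, proportions):
    """Map numerical scores to letter grades in fixed proportions.

    scores: dict mapping student -> numerical score (higher is better,
            by whatever monotonic method the professor likes)
    proportions: list of (letter, fraction) pairs, best letter first;
                 fractions should sum to 1.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    n = len(ranked)
    grades, start = {}, 0
    for i, (letter, frac) in enumerate(proportions):
        # The last bucket absorbs rounding so every student gets a grade.
        end = n if i == len(proportions) - 1 else start + round(frac * n)
        for student in ranked[start:end]:
            grades[student] = letter
        start = end
    return grades

scores = {"alice": 93, "bob": 88, "carol": 77, "dave": 70, "eve": 55}
print(curve(scores, [("A", 0.2), ("B", 0.4), ("C", 0.4)]))
# → {'alice': 'A', 'bob': 'B', 'carol': 'B', 'dave': 'C', 'eve': 'C'}
```

Note that the procedure is indifferent to the absolute scores; only the ranking matters, which is exactly why it mistreats a course full of unusually strong students.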
Many schools have experimented with "decorated transcripts" of various kinds, which give not only the student's grade but some information about the other grades given in the course. This makes the transcript more complicated, but certainly has transparency to recommend it. Still, it doesn't really tell the reader how good the competition was.
Harvard doesn't decorate transcripts, but for a long time it tried to push faculty toward a norm that took the quality of students in a particular course into account. The dean would inform professors, for each course, of the difference between the average grade in that course and the average grade of that course's students in their other courses. That is, a negative number would tend to indicate that the course was grading harshly and a positive number that the course was grading generously. These numbers were always interesting to contemplate, and there were finer variants on the theme (for example, comparing the grades I give in CS courses to the grades my students were getting only in other CS courses, or only in other science courses). It was never clear, however, how faculty used the information. In particular I don't think anyone ever studied the time series: Did "harsh" professors raise their curve more or less than "soft" professors lowered theirs? Or perhaps that was studied, and that is the reason we aren't being given that information anymore.
If students all took the same courses, a large part of the problem would go away. Every professor could grade using any kind of distribution, and GPAs, being averages of the same components, would fairly rank students. You would still have the problem of comparing a Physics B+ to an English A-, but the overall records could rationally be compared.
When you think about that, you realize that the problem resembles that of Bowl Championship Series rankings in college football. Teams don't have enough head-to-head meetings, or even enough common opponents, to make it easy to figure out who the top teams should be. There is some sound mathematics that could be focused on the ranking problem, but as in the case of the BCS algorithms, it would have one huge disadvantage: there would, in general, be no way to explain why one student outranked another except to argue that the algorithm was sound and that the relative standings were simply what fell out of it when applied to the mountains of data about grades.
I wrote a couple of chapters about all this in Excellence Without a Soul, but you can get a good sense of my position from this Morning Prayer talk I gave in the fall of 2003.