Saturday, August 20, 2011

Bayesian truth serum, grading and student evaluations

In one of my last posts, I examined some proposals for making university grading more equitable and less prone to grade inflation. Currently, professors are motivated to inflate grades because high grades correlate with high student evaluations, and these are often the only metrics of teaching effectiveness available. Is there a way to assess professors' teaching abilities independent of the subjective views of students? Similarly, is there a way to get students to provide more objective evaluation responses?

It turns out that one technique may be able to do both. Drazen Prelec, a behavioral economist at MIT, has a very interesting proposal for motivating a person to give truthful opinions even when he knows his opinion is a minority view. In this technique, awesomely named "Bayesian truth serum"*, each respondent gives two pieces of information: first, his honest opinion on the issue at hand; and second, an estimate of how he thinks other people will answer the first question.

How can this method tell if you are giving a truthful response? The algorithm assigns more points to answers that are "surprisingly common", that is, answers that occur more often than the group collectively predicted. For example, let's say you are being asked which political candidate you support. A candidate who is chosen (in the first question) by 10% of the respondents, but predicted as being chosen (in the second question) by only 5% of the respondents, is a surprisingly common answer. This technique elicits more true opinions because people systematically believe that their own views are unusual, and hence underestimate the degree to which other people will predict their own true views.
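To make the scoring concrete, here is a minimal sketch of the "surprisingly common" test for the candidate example above. The poll data is invented, and I'm assuming Prelec's information-score formula: the log of an answer's actual endorsement frequency over the geometric mean of the predicted frequencies.

```python
import math

# Toy poll for the candidate example: 1 of 10 respondents chooses
# candidate "A", but each respondent predicts only 5% support for "A".
# All data here is made up for illustration.
answers = ["A"] + ["B"] * 9
predicted_A = [0.05] * 10  # each respondent's predicted fraction choosing "A"

n = len(answers)
actual_A = answers.count("A") / n  # 0.10: actual endorsement rate for "A"

# Prelec scores against the geometric mean of the predictions.
geo_A = math.exp(sum(math.log(p) for p in predicted_A) / n)  # 0.05

# Information score for answer "A": log(actual / predicted).
# A positive score marks the answer as "surprisingly common".
info_A = math.log(actual_A / geo_A)
```

With these numbers, info_A = log(0.10 / 0.05) > 0, so "A" earns the surprisingly-common bonus even though only 10% of respondents chose it.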

But, you might reasonably say, people also believe that they hold reasonable and popular views. They are narcissists and assume that others tend to believe what they themselves believe. It turns out that this is handled by a corollary of the Bayesian truth serum. Let's say that you are evaluating beer (as I like to do), and let's also say that you're a big fan of Coors (I don't know why you would be, but for the sake of argument....) As a lover of Coors, you believe that most people like Coors, but you also recognize that you like Coors more than most people do. You therefore adjust your estimate of Coors' popularity downward to account for your own enthusiasm, and in doing so underestimate Coors' actual popularity in the population.

This same method can also be used to identify experts: people who have more meta-knowledge are also the people who provide the most reliable, unbiased ratings. Let's go back to the beer tasting example. Some characteristics of a beer might taste very good but reveal poor brewing technique, for example, a lot of sweetness. Conversely, some properties of a beer are normal for a particular process but seem strange to a novice, such as yeast sediment. An expert knows that too much sweetness is bad and that the sediment is fine, and also knows that a novice won't know this. Hence, while the novice will believe that most people agree with his opinion, the expert will accurately predict the distribution of novice opinions.
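Meta-knowledge can be scored with the prediction side of BTS. Here is a minimal sketch, assuming the standard prediction score (a negative Kullback-Leibler divergence between the actual answer distribution and a respondent's prediction, with 0 as the best possible value); the beer-rating numbers are invented.

```python
import math

# Graders rate a beer "good" or "flawed". Suppose the actual split among
# all graders is 80% "good", 20% "flawed" (the novice majority misses the
# flaw). These numbers are invented; the point is the scoring rule.
actual = {"good": 0.8, "flawed": 0.2}

# Each grader also predicts the distribution of everyone's answers.
novice_pred = {"good": 0.95, "flawed": 0.05}  # assumes nearly everyone agrees with him
expert_pred = {"good": 0.80, "flawed": 0.20}  # accurately anticipates the novice majority

def prediction_score(pred, actual):
    """BTS prediction score: negative KL divergence from the actual
    distribution to the predicted one. A perfect prediction scores 0;
    worse predictions score increasingly negative."""
    return sum(actual[k] * math.log(pred[k] / actual[k]) for k in actual)

expert_score = prediction_score(expert_pred, actual)
novice_score = prediction_score(novice_pred, actual)
```

Because the expert's prediction matches the actual distribution, expert_score is 0 while novice_score is negative, so ranking graders by prediction score surfaces the expert.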

So, what does this all have to do with grades and grade inflation? Glad you asked. Here, I propose two independent uses of BTS to help the grading problem:

1. Student work is evaluated by multiple graders, and the grade the student gets is the "surprisingly common" answer. This motivates graders to be more objective about the piece of work. We can also find the most expert graders by sorting them by meta-knowledge. Of course, this throws more resources at grading in an already strained system.

2. When students evaluate the professor, their responses are also scored with BTS in an attempt to elicit objective evaluations.

* When I become a rock star, this will be my band name.

4 comments:

  1. I think there might be a flaw in this use of BTS. In my experience as a TA, I got the impression that it was common to skim over students' work for the correct buzzwords. However, on closer reading it was often clear that the buzzwords were used incorrectly and the student lacked basic understanding of the question. Thus a careful grader would predict correctly that other graders would give a higher grade than actually deserved. It seems to me that the higher grade would be a "surprisingly common answer" and yet be incorrect, but maybe there is something I am missing. It might also be possible to adjust in some way for time spent by each grader.

    Sorry if this is a double post, but after initial submission nothing showed up.

  2. I have had that experience as a grader, too. It seems I've been a bit unclear: there can be two ways of 'scoring' BTS to get two different types of information. The first way is looking for the "surprisingly common" answers. This will tell you whether the answer the respondent is giving you is his/her own true opinion. The second is to identify experts in cases where the ground truth can't be known. Here, people with the best meta-knowledge get the most points.

    You are indeed correct that in the case you describe, the surprisingly common answer would be incorrect. I'm proposing that you find the best graders by finding the graders who know that, although other graders will give this item full credit, the student did not demonstrate full understanding.

    I haven't convinced myself that it will work in all cases. Consider a very easy test where all students perform very well and are evaluated well by both lazy and careful TAs. Here, there would be little predictive power for determining the best graders.

  3. Student evaluations tend to be a farce. I agree with this use for evaluations. However, the one drawback is that if you are trying to detect "experts", the evaluations are no longer anonymous.

    It will never happen with grading.

    I still think BTS is a better name for an album.
