Over at Rationally Speaking, there is a proposal to move towards using odds instead of probabilities when speaking of uncertainty.
I highly recommend this lucid discussion of the brain, free will and criminal responsibility from David Eagleman in The Atlantic.
Running experiments is sometimes a pain, but I'm not sure what to think of this site where one can outsource one's experiments. It's called 'EBay for science', but I worry about how one would trust that the experiments were done well.
What does your writing say about you? Quite a bit, it turns out.
Predictive policing - using statistics to determine the times and locations of future crimes.
Sunday, August 21, 2011
Saturday, August 20, 2011
Bayesian truth serum, grading and student evaluations
In one of my last posts, I examined some proposals for making university grading more equitable and less prone to grade inflation. Currently, professors are motivated to inflate grades because high grades correlate with high student evaluations, and these are often the only metrics of teaching effectiveness available. Is there a way to assess professors' teaching abilities independent of the subjective views of students? Similarly, is there a way to get students to provide more objective evaluation responses?
It turns out that one technique may be able to do both. Drazen Prelec, a behavioral economist at MIT, has a very interesting proposal for motivating a person to give truthful opinions in the face of knowledge that his opinion is a minority view. In this technique, awesomely named "Bayesian truth serum" (BTS)*, people give two pieces of information: the first is their honest opinion on the issue at hand, and the second is an estimate of how the respondent thinks other people will answer the first question.
How can this method tell if you are giving a truthful response? The algorithm assigns more points to answers that are "surprisingly common", that is, answers that are more common than collectively predicted. For example, let's say you are being asked which political candidate you support. A candidate who is chosen (in the first question) by 10% of respondents, but predicted (in the second question) to be chosen by only 5% of respondents, is a surprisingly common answer. This technique elicits more truthful opinions because people systematically believe that their own views are unique, and hence underestimate the degree to which other people will predict their true views.
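To make the "surprisingly common" criterion concrete, here is a minimal sketch of the scoring rule as I understand it from Prelec's work. This is an illustration rather than his exact implementation; the function and parameter names are mine, and `alpha` simply weights the prediction term.

```python
import numpy as np

def bts_scores(choices, predictions, alpha=1.0, eps=1e-9):
    """
    Sketch of Bayesian-truth-serum scoring.

    choices     : (n,) int array, each respondent's chosen answer index (0..m-1)
    predictions : (n, m) array, each respondent's predicted distribution of answers
    Returns per-respondent scores and the per-answer "surprise" ratio.
    """
    n = len(choices)
    m = predictions.shape[1]

    # Actual endorsement frequencies
    x_bar = np.bincount(choices, minlength=m) / n

    # Geometric mean of the predicted frequencies
    y_bar = np.exp(np.log(predictions + eps).mean(axis=0))

    # Information score: reward choosing an answer that is more common than predicted
    info = np.log((x_bar[choices] + eps) / (y_bar[choices] + eps))

    # Prediction score: reward forecasts that are close to the actual distribution
    pred = (x_bar * np.log((predictions + eps) / (x_bar + eps))).sum(axis=1)

    surprise = (x_bar + eps) / (y_bar + eps)  # ratios > 1 mark "surprisingly common" answers
    return info + alpha * pred, surprise
```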
But, you might reasonably say, people also believe that their own views are reasonable and popular. They are narcissists of a sort, assuming that others will tend to believe what they themselves believe. It turns out that this is addressed by a corollary of the Bayesian truth serum. Let's say that you are evaluating beer (as I like to do), and let's also say that you're a big fan of Coors (I don't know why you would be, but for the sake of argument...). As a lover of Coors, you believe that most people like Coors, but you also recognize that you like Coors more than most people do. You therefore adjust your estimate of Coors' popularity downward, underestimating the popularity of Coors in the population.
This same method can also be used to identify experts: it turns out that people who have more meta-knowledge are also the people who provide the most reliable, unbiased ratings. Let's go back to the beer tasting example. Say there are certain characteristics of a beer that might taste very good but indicate poor brewing technique, such as a lot of sweetness. Conversely, some properties of a beer are normal for a particular process but seem strange to a novice, such as yeast sediment. An expert will know that too much sweetness is bad and that the sediment is fine, and will also know that a novice won't know this. Hence, while the novice will believe that most people agree with his opinion, the expert will accurately predict the novice opinion.
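If that's right, then identifying the raters with the most meta-knowledge could amount to ranking respondents by the prediction component of the score above. Again, this is only a sketch under the same assumptions, with hypothetical names:

```python
import numpy as np

def metaknowledge_ranking(choices, predictions, eps=1e-9):
    """
    Rank respondents by how closely their predictions match the actual answer
    distribution -- a rough proxy for the meta-knowledge described above.
    """
    n, m = predictions.shape
    x_bar = np.bincount(choices, minlength=m) / n
    # Higher (closer to zero) means the respondent's forecast was nearer the truth
    pred_score = (x_bar * np.log((predictions + eps) / (x_bar + eps))).sum(axis=1)
    return np.argsort(-pred_score)  # respondent indices, best forecasters first
```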
So, what does this all have to do with grades and grade inflation? Glad you asked. Here, I propose two independent uses of BTS to address the grading problem:
1. Student work is evaluated by multiple graders, and the grade the student gets is the "surprisingly common" answer (a sketch follows after this list). This motivates graders to be more objective about the piece of work. We can also find the graders who are most expert by sorting them according to meta-knowledge. Of course, this is throwing more resources at grading in an already strained system.
2. When students evaluate the professor, their responses are also scored with BTS in an attempt to elicit more objective evaluations.
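For the first proposal, picking the "surprisingly common" grade could look something like the following toy example, which reuses the same machinery as the sketches above (the grade labels, numbers, and names are all made up for illustration):

```python
import numpy as np

def consensus_grade(grades, predictions, labels=("A", "B", "C", "D", "F"), eps=1e-9):
    """
    Pick the 'surprisingly common' grade from several graders' reports.

    grades      : (n,) int array, each grader's assigned grade as an index into labels
    predictions : (n, len(labels)) array, each grader's predicted distribution of grades
    """
    n, m = predictions.shape
    x_bar = np.bincount(grades, minlength=m) / n                # actual grade frequencies
    y_bar = np.exp(np.log(predictions + eps).mean(axis=0))      # geometric mean of predictions
    return labels[int(np.argmax((x_bar + eps) / (y_bar + eps)))]

# Three graders give B, B, A, and each also predicts how the others will grade.
grades = np.array([1, 1, 0])
predictions = np.array([[0.2, 0.5, 0.2, 0.05, 0.05],
                        [0.3, 0.4, 0.2, 0.05, 0.05],
                        [0.5, 0.3, 0.1, 0.05, 0.05]])
print(consensus_grade(grades, predictions))  # "B": chosen more often than collectively predicted
```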
* When I become a rock star, this will be my band name.
Thursday, August 18, 2011
Soundbites!
Mmmmm.... useful math.
It's unfortunately easy to get people to falsely confess to crimes, reviewed in The Economist.
Retractions are up in scientific journals. Now there is a blog devoted to 'em.
Completely awesome matrix of how people on various rungs of the scientific ladder view one another.
Only 4% of the public can name a living scientist. Time for a scientific re-branding.
Monday, August 8, 2011
A solution to grade inflation?
In the somewhat limited teaching experience I've had, I have found grading to be particularly difficult. The grade a student receives in my class can determine whether he'll get or keep scholarships and will play a role in determining what kinds of opportunities he'll have after my class. This is a huge responsibility. As a psychophysicist, I worry about my grade-regrade reliability (will I grade the same paper the same way twice), order effects in my grading (if I read a particularly good paper, do all papers after it seem to not measure up?), and whether personal bias is affecting my scoring (Sally is always attentive and asks good questions in class, while Jane, if present, is pugnacious and disruptive).
Of course, the easiest thing is to give everyone generally good grades. The students won't argue that they don't deserve them, and in fact, there is evidence that they'll evaluate me better for it in the end.
And while many institutions have (implicitly or explicitly) adopted this strategy, the problem with grade inflation is that it hurts students who are performing at the top level, and removes accountability from our educational system. So, what do we do about grading?
The Chronicle of Higher Education has an interesting article showing two possible solutions. The second solution involves AI-based grading, which sounds intriguing. Unfortunately, no details were provided for how (or how well) it works, so I remain skeptical. However, the first proposed solution merits some discussion: outsource grading to adjunct professors who are independent of the course, professor and students. The article follows an online university that has enacted this strategy.
Pros of this idea:
- As the grader is not attached to either the professor or the student, bias based on personal feelings towards a student can be eliminated.
- In this instantiation, graders are required to submit detailed justifications for their grades, are provided extensive training, and are periodically calibrated for consistency. This can provide far more objective grading than what we do in the traditional classroom.
However, the idea is not perfect. Here are some cons that I see:
- The graders' grades get translated into pass or fail. A pass/fail system does not encourage excellence, original thinking, or going beyond the material given.
- Much of traditional grading is based on improvement and growth over a semester, and this is necessarily absent in this system. Honestly, I only passed the second semester of introductory chemistry in college (after failing the first test) because the professor made an agreement with me that if I improved on subsequent tests, she would drop the first grade.
- Similarly, the relationship between professor and student is made personal through individualized feedback on assignments. Outsourcing grading means that there cannot be a deep, intellectual relationship between parties, which I believe is essential to learning and personal growth.
While not perfect, this is an interesting idea. What are your ideas for improving on it (or grading in general)?
Sunday, August 7, 2011
What is it about music?
I can't get enough of this song. I saw Gillian perform this in 2005 or 2006, and I'm so happy that she's finally put it on an album. Why have I played this song over 20 times in one day? Of course, there are technical aspects of it that are neat (I particularly like the frequent dissonances that resolve in Rawlings' guitar line), and the lyrics remind me of the time in my life when I first heard the song, but are these alone enough to produce such strong emotional reactions? Why does music give us chills? Why does it freak us out?
Indeed, music seems to activate the neural reward system, and certainly there is no lack of hand-wavy evolutionary psychology theories on music's emotional pull. But why does music make us feel things?
We have several touch/feeling metaphors for music: a person's voice can be "rough" or "velvet", musical passages may be "light" or "heavy", and pitches can be "rising" or "falling". Given this mapping, can we find cross-modal effects of music and feeling? One type of cross-modal effect is synaesthesia, where two senses are correlated in the same person. To a synaesthete, letters can have color, tastes can have shape, etc. The most common form of synaesthesia related to music is "colored music". Can we find evidence for "touched music"?
This paper is the reason I would love to have been a psychologist in the 1950s. Here, the goal was to see which combinations of senses could occur in synaesthesia, either naturally occurring or (I can't make this up) mescaline-induced. Here is the summary matrix of their results:
(An "N" in a cell represents a naturally occurring synaesthesia, and an "E" in a cell represents an "experimentally induced" synaestesia through mescaline).
So, who were the participants in this study?? The two authors and their two friends, one of whom was a natural music-tactile synaesthete. (See, I told you psychology was fun in the 1950s!)
This synaesthete described her experiences: "A trumpet sound is the feel of some sort of plastics; like touching certain sorts of stiffish plastic cloth - smooth and shiny - I felt it slipping." That sounds interesting, but it does not sound like the "chilling", emotional experience we're after.
It turns out that musical chills, while having a strong physiological basis, are not automatic but rather require attention. This implies that music and somatosensory (touch) systems are not necessarily linked.
So, no real answers in this post, but there's one other cool piece of data that I'll throw into the mix. If you ask people to assign colors to both emotions and music, people (normal, non-synaesthetes) are ridiculously similar in the colors chosen. Of course, this was presented at a conference and is not in final, peer-reviewed form, but it is certainly interesting.
Saturday, August 6, 2011
Saturday soundbites: 6 RBI edition
Is Google the closest thing we have to mind reading?
Mind Hacks points to a great article on blind mathematicians. It turns out that they tend to be talented in geometry.
This week in neuro-nonsense, dieting causes your brain to eat itself.
Neuroskeptic describes a cool new proposed antidepressant that selectively modifies gene expression.
Google and Microsoft are launching new bibliometric tools, according to Nature.
Over at the Frontal Cortex, Jonah Lehrer reviews a study telling us that the Flynn effect is about more than just bringing up the lower part of the IQ distribution through nutrition, de-leading, etc. The top 5% are also getting smarter.
How getting tenure actually does change your life.
Can we predict which soldiers are going to develop PTSD? Should we?
Friday, August 5, 2011
Proposed changes to IRBs
Institutional review boards (IRBs) are committees formed within universities and research organizations. Their job is to review proposed research that uses human subjects, evaluating it for ethical treatment of the human participants. It's an important job given the rather spotty history we have with ethical research (see here, here and here among others).
However, there is a wide range of activities that count as human subjects research, ranging from experimental vaccine trials to personality tests, from political opinions to tests of color vision. Currently, all of this research is broken up into two groups: "regular" human subjects research, which is subject to a full review process, and "minimal risk" research, which is subject to a faster review process. Research is defined as minimal risk when it poses no more potential for physical or psychological harm than any other activity in daily life.
My research falls into the minimal risk category. My experiments have been described by several subjects as being "like the world's most boring video game". Outside of being boring, they are not physically harmful, and there is no exposure of deep psychological secrets either. No matter. Each year, researchers like me fill out extensive protocols describing the types of experiments they propose to do, detailing all possible risks, outlining how subject confidentiality will be maintained, etc. And each participant in a study (each time s/he participates) receives a 3-4 page legal document explaining all of the risks and benefits of the research, which the subject signs to give his consent.
This does seem to be overkill for research which really doesn't pose any sort of physical or psychological threat to participants, and I applaud new efforts to modernize and streamline this process. (Read here for a great summary of the details. Researchers: the Department of Health and Human Services is soliciting opinions on a number of points, and you can comment until the end of September.)
Among the changes are moving minimal risk research from expedited review to no review, and eliminating the need for physical consent forms (a verbal "is this OK with you?" will suffice). These are both good things that would improve my life substantially. However, I believe that standardizing IRB policies across the country would do the most good.
I am currently at my 4th institution and have seen as many IRBs. Two of them have been entirely reasonable, requiring the minimal amount of paperwork and approving minimal risk research across the board. The other two, however, have been less helpful. As Tal Yarkoni points out, "IRB analysts have an incentive to be pedantic (since they rarely lose their jobs if they ask for too much detail, but could be liable if they give too much leeway and something bad happens)". However, I think it goes beyond this. In some sense, IRBs feel productive when they can show that they have stopped or delayed some proportion of the research that crosses their desks.
I have had an IRB reject my protocol because they didn't like my margin size, didn't like my font size, and didn't like the cute cartoon I put on my recruitment posters (apparently cartoons are coercive). I've had an IRB send an electrician into the lab with a volt meter to make sure my computer monitor wouldn't electrocute anyone. My last institution did not approve an experiment that was a cornerstone of my fellowship proposal because it required data to be gathered online (this is very common in my field) and I couldn't guarantee that no one outside of my approved age range (18-50) was doing my experiment. Under the current rules, I couldn't just use my collaborator's IRB approval, as every participating institution needs to approve a protocol. However, another of the proposed changes will require only one approval.
I'm very optimistic about these proposed changes... let's hope they happen!
Thursday, August 4, 2011
Fun graphics
Hasn't it been a long week? Instead of serious science, how about some pretty pictures?
We must all be getting way smarter than even the Flynn effect would predict - check out grade inflation at universities over the last 90 years.
Yes, this is really how science is done.