Wednesday, September 29, 2010

Where does bad fMRI science writing come from?

A perennial favorite topic on science blogs is the examination of badly designed or badly interpreted fMRI studies. However, little time is spent on why there is so much of this material to blog about! Here, I’m listing a few reasons why mistakes in experimental design and study interpretation are so common.

Problems in experimental design and analysis
These are problems in the design, execution and analysis of fMRI papers.

Reason 1: Statistical reasoning is not intuitive
I have already mentioned the non-independence error in the context of clinical trial design. In fMRI, researchers often ask questions about activity in certain brain areas. Sometimes these areas are anatomically defined (such as early visual areas), but more often they are functionally defined, meaning they cannot be distinguished from surrounding areas by physical appearance, but are instead defined by responding more to one type of stimulation than another. One of the most famous functionally defined areas is the Fusiform Face Area (FFA), which responds more to faces than to objects. Non-independence error often arises with these functionally defined areas. It is completely kosher to run a localizer scan containing examples of stimuli known to drive your area and stimuli known to contrast with it (faces and objects, in the case of the FFA), and then run a separate experimental block containing whatever experimental stimuli you want. When you analyze your data, you test your experimental hypothesis using the voxels (“volumetric pixels”) defined by your localizer scan. What is not acceptable is to run one long block in which the data that define your region of interest also serve as the experimental data.
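To see why this matters, here is a toy simulation (pure noise, no real fMRI data; every number is made up) contrasting a circular ROI analysis with a properly independent one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulation: 10,000 "voxels" of pure noise, measured in two
# independent runs (a localizer run and an experimental run).
n_voxels = 10_000
localizer = rng.normal(size=n_voxels)
experiment = rng.normal(size=n_voxels)

# Circular (non-independent) analysis: select the voxels that respond
# most strongly in one run, then report their mean response in the SAME run.
roi = localizer > 2.0
circular_effect = localizer[roi].mean()

# Independent analysis: select the ROI from the localizer run, but test
# it in the separate experimental run.
independent_effect = experiment[roi].mean()

print(f"circular analysis 'effect':  {circular_effect:.2f}")    # large, despite pure noise
print(f"independent analysis effect: {independent_effect:.2f}")  # near zero, correctly
```

Even though there is no signal anywhere, the circular analysis produces an impressive-looking “effect” purely because the voxels were hand-picked for it; the independent test correctly finds nothing.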

A separate, but frequent, error in fMRI data analysis is the failure to correct for multiple comparisons. There are hundreds of thousands of voxels in the brain, so high activation in any particular voxel can easily arise by random chance. Craig Bennett and colleagues made this point in a memorable way when they found a 3-voxel-sized area in the brain of a dead salmon that “responded” to photographs of emotional situations. Of course, the deceased fish was not thinking about highly complex human emotions; the activation was due to chance.
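A quick back-of-the-envelope simulation (hypothetical voxel counts, no real data) shows why uncorrected thresholds guarantee dead-salmon moments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy example: 100,000 null "voxels" -- no real signal anywhere --
# each tested at the conventional |z| > 1.96 (p < .05) threshold.
n_voxels = 100_000
z = rng.normal(size=n_voxels)

uncorrected = int(np.sum(np.abs(z) > 1.96))        # around 5,000 false positives
bonferroni_z = 5.03                                # approx. cutoff for two-tailed p < .05 / 100,000
corrected = int(np.sum(np.abs(z) > bonferroni_z))  # almost always zero

print(f"significant voxels, uncorrected: {uncorrected}")
print(f"significant voxels, Bonferroni:  {corrected}")
```

At an uncorrected threshold, roughly 5% of pure-noise voxels come out “significant”; after a simple Bonferroni correction, essentially none do.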

Now, it is all too easy to read about these problems and feel smug about the retrospectively obvious. But these researchers are not misleading or dumb. The non-independence problem, stated another way, is “we found voxels in the brain that responded to X, and then correlated this activation with Y”. Part of the controversy over “voodoo correlations” is the fact that, intuitively, there doesn’t seem to be much difference between correct and incorrect data analysis. Another important factor in the persistence of incorrect analysis is that statistical reasoning is not intuitive, and our intuitions have systematic biases.

Reason 2: It is both too easy and too hard to analyze fMRI data
There are many steps to fMRI data analysis, and several software packages, both free and commercial, are available to do it. Data fresh out of the scanner need to be pre-processed before any analysis takes place: pre-processing corrects for small movements made by subjects, smooths the data to reduce noise, and often warps each individual’s brain to a standard template. For brevity, I will refer the reader to this excellent synopsis of fMRI data analysis. The problem is that it is altogether too easy to “go through the motions” of data analysis without understanding how decisions about various parameters affect the result. And although there is wide consensus about the statistical parameters used by analysis packages, this paper shows that differences in the statistical decisions made by software developers have big effects on the overall results of a study. It is, in other words, too easy to go through analysis motions that are too hard to understand.

Problems in stretching the conclusions of studies
In contrast, these are problems in the translation of a scientific study to the general public.

Reason 3: Academics are under pressure to publish sexy work
As I discussed earlier, academia is very competitive, and it is widely believed that fMRI publications carry more weight in hiring and tenure decisions than behavioral studies do. (Note: I have not found evidence of this, but it seems like something someone should have computed.) Sexy fMRI work makes great sound-bites for university donors (see “neuro-babble and brain-porn”). Here, slight exaggerations of the conclusions can creep in, and noisy peer review does not always catch them.

Reason 4: Journals compete with one another for the sexiest papers
Science and Nature each have manuscript acceptance rates below 10%. If we assume that more than 10% of the papers submitted to these journals are of sufficient quality to be accepted, then some other selection criterion, such as novelty, must be applied during the editorial process. It is also worth noting that these journals have word limits under 2,000 words, making it impossible to fully describe experimental techniques. Collectively, these conditions make it possible for charismatically written papers with dubious methods to be accepted.

Reason 5: Pressure on the press to over-state scientific findings
Even for well-designed, well-analyzed and well-written papers, things can get tricky in the translation to the press. Part of the problem is that many scientists are not effective communicators. But equally problematic is the pressure placed on journalists to cast every study as a revolution or breakthrough. The truth is that almost no published paper will turn out to be revolutionary in the fullness of time; science works very slowly. Journalists perceive that their audience would rather hear about the newly discovered “neural hate circuit” or the “100% accurate Alzheimer’s disease test” than about the moderate-strength statistical association found between a particular brain area and a behavioral measure.

Reason 6: Brain porn and Neuro-babble
(I would briefly like to thank Chris Chabris and Dan Simons for putting these terms into the lexicon.) “Brain porn” refers to the colored-blob-on-a-brain style photographs that are nearly ubiquitous in popular science writing. “Neuro-babble” is often a consequence of brain porn: when viewing such a science-y picture, one’s threshold for accepting crap explanations is dramatically lowered. There have been two laboratory demonstrations of this general effect. In one study, a bad scientific explanation was presented to participants either by itself or with one of two graphics: a bar graph or a blobby brain-scan image. Participants who viewed the brain image, but not those who viewed the bar graph or no image, were more likely to say that the explanation made sense. In the other, a bad scientific explanation was given to subjects either alone or with the preface “brain scans show that….”. Both non-scientist participants and neuroscience graduate students rated the prefaced bad explanations as better, even though the logic was equally uncompelling in both cases. These should serve as cautionary tales for the thinking public.

Reason 7: Truthiness
When we are presented with what we want to believe, it is much harder to keep our skeptical glasses on.

Monday, September 27, 2010

Cognitive enhancers: leveling the playing field?

A hot topic in neuroethics surrounds the use of drugs to enhance human intelligence.  Outside of the two most obvious issues (‘do they work?’ and ‘are they safe?’), ethicists have expressed concern over the possible unequal distribution of enhancers. Consider this statement from the President’s Council for Bioethics’ report on human enhancement (“scare quotes” are original):
“The issue of distributive justice is more important than the issue of unfairness in competitive activities, especially if there are systemic disparities between those who will and those who won't have access to the powers of biotechnical "improvement." Should these capabilities arrive, we may face severe aggravations of existing "unfairnesses" in the "game of life," especially if people who need certain agents to treat serious illness cannot get them while other people can enjoy them for less urgent or even dubious purposes. If, as is now often the case with expensive medical care, only the wealthy and privileged will be able to gain easy access to costly enhancing technologies, we might expect to see an ever-widening gap between "the best and the brightest" and the rest. The emergence of a biotechnologically improved "aristocracy"-augmenting the already cognitively stratified structure of American society-is indeed a worrisome possibility, and there is nothing in our current way of doing business that works against it.”
Is this a realistic worry? One counter to this argument, as pointed out by Anjan Chatterjee, is that distributive justice is only a problem where there are clear winners and losers (i.e., in zero-sum situations). Performance-enhancing drugs are a big deal in sports because there is one winner and one loser. In other cases, enhancement of some can benefit more than just the enhanced individual, as with subtle anti-theft devices like LoJack: fewer cars are stolen in neighborhoods where LoJack use is higher, because would-be thieves cannot tell which cars have the system and will look for victims in LoJack-free neighborhoods.

Beyond the question of whether our current economic environment spawns competitive, zero-sum situations, we need to ask whether putative cognitive enhancing drugs provide the same effect size in all those who try them. Interestingly, a study by Mehta and colleagues in 2000 demonstrated that the degree of cognitive improvement with Ritalin in a spatial working memory task was negatively correlated with baseline working memory capacity, at r = -0.78! This means that over 60% of the variance in Ritalin’s enhancing effect could be explained by subjects’ working memory capacities: those with lower capacities got more benefit than those with higher ones. Similar results have been found in other studies using dextroamphetamine and bromocriptine (a drug used to treat pituitary tumors and Parkinson’s disease).
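For the curious, here is the arithmetic behind that variance-explained claim, plus a toy simulation of a baseline-dependent drug effect (only r = -0.78 comes from the Mehta study; everything else is invented for illustration):

```python
import numpy as np

# The variance-explained arithmetic: r = -0.78 is the reported correlation;
# r-squared is the fraction of variance explained.
r = -0.78
print(f"variance explained: {r**2:.0%}")  # 61%

# Toy simulation of a baseline-dependent drug effect (all numbers here
# are hypothetical).
rng = np.random.default_rng(2)
baseline = rng.normal(100, 15, size=500)                         # baseline working memory
improvement = -0.5 * (baseline - 100) + rng.normal(0, 5, size=500)

# Lower-baseline subjects improve more, so the correlation is negative.
print(f"simulated r: {np.corrcoef(baseline, improvement)[0, 1]:.2f}")
```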

Why is this effect not being mentioned in the ethical debate over cognitive enhancement? Partly, it is because many experiments are designed in ways that cannot reveal the effect. The “lab rat” for most academic research is the undergraduate student. Although researchers assume that samples from this population are representative of all people, we know that they are young, educated, and have self-selected to be in universities with people of similar interests and backgrounds. Given this, it is perhaps not surprising that some studies of cognitive enhancers have not found this relationship: in a mostly homogeneous population, the degree of enhancement would be similar.

The ethical debate we should be having in light of these data is about how we would deal with a more intellectually level playing field. Of course, there would be a great number of benefits to society, as more people might do more important work and develop new solutions at a faster rate. But we should also ask how we would evaluate people for a job or academic position if all had the same intellect. Would we then put more emphasis on other traits, such as creativity or cooperativeness? How would we deal with nepotism, sexism and racism if there were no plausible arguments against a candidate’s intellect?

How popular science reporting works

This is brilliant.

Sunday, September 26, 2010

Double-dipped sundae with a picked cherry on top

Let’s say that your hypothesis is that pitching in the 2010 baseball season is much stronger than 2009 pitching. In support of your hypothesis, you take a sample of excellent starting pitchers, say the ten pitchers with the most complete games. The average ERA for this group (at writing) is 3.17, and when you compare this to the 2009 MLB average ERA of 4.45, you say “see, I told you that we’re entering a new era of pitcher-dominated baseball!”.

Not so fast. Pitching a complete game is correlated with a low ERA (if batters were hitting you, you’d be taken out for a relief pitcher). This logic is circular: you are taking the best pitchers to prove that pitchers are great. These best pitchers are not representative of all pitchers.
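Here is the circularity in code (a toy model; all numbers are invented except the 4.45 league ERA quoted above):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model: every pitcher's ERA is drawn from the same league-wide
# distribution (mean 4.45, the 2009 figure; the spread is made up),
# so nothing has actually changed between seasons.
league = rng.normal(4.45, 0.8, size=300)

# Complete games correlate with low ERA, so ranking by complete games is
# (roughly) ranking by ERA itself. Take the "top ten" and compare.
top_ten = np.sort(league)[:10]

print(f"league average ERA: {league.mean():.2f}")   # ~4.45
print(f"cherry-picked ERA:  {top_ten.mean():.2f}")  # far lower, by construction
```

The hand-picked sample looks dominant even though, by construction, the league hasn’t changed at all.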

Unfortunately, this statistical mistake is not uncommon in science, and a couple of recent papers have addressed this “voodoo” or “double dipping” problem.

The Neuroskeptic just pointed out a particularly egregious case: a paper advocating double dipping as a way of getting better results from clinical drug trials. Briefly, the method is to run clinical trials at many centers and then discount the centers that show a strong placebo effect. As the effect of any drug is measured by the benefit participants in the drug condition show over those in the placebo condition, centers with a strong placebo effect necessarily show a weaker drug effect; discarding them after the fact uses the outcome to select the data, which inflates the apparent drug effect.

Not all placebos are created equal, and not all types of patients respond to placebos in the same way. For example, severely depressed people show very little placebo response in antidepressant trials, so antidepressants show a clear drug effect only in this population.

There have been many recent, hard-hitting criticisms of big pharma’s practices, including its known tendency to cherry-pick studies for publication. Although only about 50% of government-funded clinical drug trials find that a particular drug works, over 85% of industry-funded studies do.

Tuesday, September 21, 2010

Against simple determinants of complex behavior: the case of oxytocin

Earlier, I argued against the view that complex behaviors and psychiatric diagnoses were caused by single genes. (In other words, that we will not find a gene for altruism, schizophrenia, or the propensity to drive slowly in the left lane).

A shy reader sent me the following private message:
“However, I have been amazed by the more I dig into the literature the more you find ‘one protein’ being responsible for very complex traits, one example is bonding and vasopressin and oxytocin.”

Oxytocin is a hormone produced in the hypothalamus and released into the bloodstream by the posterior pituitary gland. It is released in quantity during childbirth, triggering uterine contractions, and during nursing, where it triggers milk let-down. Smaller amounts are released in both males and females following orgasm.

In sheep, the oxytocin released at birth seems necessary for bonding between a ewe and her lamb. In the absence of oxytocin release, the ewe is repelled by the odor of amniotic fluid, and will reject lambs approaching her.  However, the mechanism behind this hormonal influence on behavior is unknown.

The most famous case study of the striking behavioral effects of oxytocin comes from the prairie vole. Prairie voles form strong, long-lasting bonds with single partners, typically after the first mating. If a female vole is given oxytocin and placed in the presence of a male she has not mated with, she will bond with him as if she had. Conversely, if oxytocin is blocked in the female after mating, she will not bond with the male. A similar pattern holds in the male for the hormone vasopressin.

Various interests have capitalized on these data, calling oxytocin the “cuddle drug” and selling it as an aid to romantic courtship or a cure for social phobias; the military has even considered it for enhancing interrogations.

But does oxytocin have such a clear role in changing the behavior of humans? It is neither feasible nor ethical to manipulate the bonding between human parents and partners, so there is still much we don’t know. In a recent study, either oxytocin or placebo was given to participants playing an economic game requiring trust in an unseen other player. Participants who received oxytocin demonstrated more trust in their partner, but only when the partner was behaving in a trustworthy way. In other words, oxytocin does facilitate trust, but it does not override common sense.
It seems most likely that human behaviors are too complex to be completely modulated by oxytocin. However, I should also note in closing that even in animal models, the association between oxytocin and behavior is not 1:1. In the prairie vole, a strong pair bond can occur in the absence of mating (and hence in the absence of an oxytocin surge). And interestingly, genetically mutated mice lacking an oxytocin receptor gene have no trouble giving birth or lactating.

So, even in “textbook” cases of a single neurotransmitter causing a complex set of behaviors, we can have the following take-home messages:
  1. Even in animal models, the neurotransmitter may be neither necessary nor sufficient (e.g. voles can still form pair bonds in the absence of mating).
  2. Cases of behavioral modification, in both humans and animals, cannot be divorced from context. Context can be biological (e.g. birth, mating) or social (the humans who were not gullible with oxytocin).

Monday, September 20, 2010

Should we crowd-source peer review?

Peer review has been the gold standard for judging the quality of scientific work since World War II. However, it is a time-consuming and error-prone process. Now, both lay and academic writers are questioning whether the peer review system should be ditched in favor of a crowd-sourced model.

Currently, a typical situation from an author’s perspective is to send out a paper and receive three reviews about three months later. Typically, the reviewers will not completely agree with one another, and it is up to the editor to decide what to do with, for example, two mostly-positive reviews and one scathingly negative one. How can the objective merit of a piece of work be accurately judged on such limited, noisy data? Were all of the reviewers close experts in the field? Were they rushed into doing a sloppy job? Did they feel the need for revenge against an author who unfairly judged one of their papers? Did they feel they were in competition with the authors of the paper? Did they feel irrationally positive or negative towards the authors’ institution or gender?

And from the reviewer’s point of view, reviewing is a thankless and time-consuming job. It is often a full day’s work to read, think about, and write a full and fair review of a paper. It requires accurate judgment on all matters from grammar and statistics to a determination of future importance to the field. And the larger the problems the paper has, the more time is spent in the description of and prescription for these problems. So, at the end of the day, you send your review and feel 30 seconds of gratitude that it’s over and you can go on to the rest of your to-do list. In a couple of months, you’ll be copied on the editor’s decision, but you almost never get any feedback about the quality of the review from an editor, and very little professional recognition of your efforts.

The peer review process is indeed noisy. A study of reviewer agreement on conference presentations found that the rate of agreement was no different from chance. In a study described here, women’s publications in law reviews were shown to receive more citations than men’s. One interpretation of this result is that women are treated more harshly in the peer review process and, as a consequence, publish (when they can publish) higher-quality articles than men, who do not face the same level of scrutiny.

In peer review, one must also worry about competition and jealousy. In fact, a perfectly "rational" (Machiavellian) reviewer might reject all work that is better than his own in order to advance his career. In a simple computational model of the peer review process, the fraction of either "rational" or random reviewers needed to be kept below 30% for the system to beat chance. The model also suggests that refereeing works best when only the best papers are published. One can easily see how the “publish or perish” system hurts science.
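To make the idea concrete, here is my own stripped-down sketch of such a referee model (this is not the published model; the thresholds and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Papers and reviewers both have quality scores in [0, 1]. An honest
# reviewer accepts any paper above threshold; a "rational" reviewer
# additionally rejects any paper better than his own work.
def decision_accuracy(frac_rational, n_papers=2000, threshold=0.6):
    papers = rng.uniform(size=n_papers)
    reviewers = rng.uniform(size=n_papers)           # one reviewer per paper
    rational = rng.uniform(size=n_papers) < frac_rational
    accept = papers > threshold
    accept &= ~(rational & (papers > reviewers))     # Machiavellian rejections
    return np.mean(accept == (papers > threshold))   # fraction of correct verdicts

for frac in (0.0, 0.3, 0.6):
    print(f"{frac:.0%} rational reviewers -> {decision_accuracy(frac):.2f} correct")
```

As the fraction of self-interested reviewers grows, it is precisely the best papers that get rejected, so the system degrades fastest at the top.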

It is a statistical fact that averaging over many noisy measurements gives a more accurate answer than any single measurement, provided the errors are unbiased. Francis Galton discovered this when asking individuals in a crowd to estimate the weight of an ox. Pooling noisy estimates works whether you ask one measurement of many people or ask the same person to estimate multiple times. A salient modern example of the power of crowd-sourcing is, of course, Wikipedia.
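A toy version of Galton's ox makes the point (the true weight was reportedly 1,198 lbs; the noisy-but-unbiased guessers here are simulated):

```python
import numpy as np

rng = np.random.default_rng(5)

# 800 simulated guessers, each noisy but unbiased around the true weight.
true_weight = 1198
guesses = true_weight + rng.normal(0, 100, size=800)

individual_error = np.abs(guesses - true_weight).mean()  # roughly 80 lbs
crowd_error = abs(guesses.mean() - true_weight)          # a few lbs at most

print(f"typical individual error: {individual_error:.0f} lbs")
print(f"crowd-average error:      {crowd_error:.1f} lbs")
```

No single guesser is reliable, but their average is strikingly close to the truth.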

In a completely crowd-sourced model of publication, everything that is submitted gets published, and everyone who wants to can read and comment. Academic publishing would be quite similar to the blogosphere, in other words. The merits of a paper could then be determined by the citations, track backs, page views, etc.

On one hand, there are highly selective journals such as Nature, which rejects more than half of submitted papers before they even reach peer review and ultimately publishes 7% of submissions. In this system, too many good papers get rejected. On the other hand, in a completely crowd-sourced model there are too many papers for any scientist in the field to keep up with, and too many good papers won’t be read because it isn’t worth anyone’s time to find the diamonds in the rough. Furthermore, although the academy is far from settled on how to rate professors for hiring and tenure decisions, it is even less clear what a “good” paper would be in this system, as more controversial topics would simply get more attention.

The one real issue I see is that, without editors seeking out reviewers to do the job, the only people reviewing a given paper may be the friends, colleagues and enemies of the authors, and this could make publication a popularity contest. Some data bear out this worry. In 2006, Nature conducted an experiment adding open comments to the normal peer review process. Of the 71 papers that took part, just under half received no comments at all, and half of the total comments went to only eight papers!

So, at the end of the day, I do believe that, with good editorial control over comments, a more open peer-review system would be of tremendous benefit to authors, reviewers and science.

Sunday, September 19, 2010

New York Times' retraction of Alzheimer's test

Last month, an article in the New York Times proudly proclaimed a new, 100% accurate test for predicting future Alzheimer’s disease. (This is the part where you should scroll down and read the retraction.)

Confused about the issue? Don’t feel bad – many doctors also fail at this type of reasoning.

Let’s say for the sake of argument that I tell you that I have a brand-new medical test. It’s called the “sleep gives you cancer” test, and it is one question long: Do you sleep? If you answered “yes”, then you will get cancer! As everyone who has ever had cancer has slept, this test is (according to the logic of the New York Times) 100% accurate. But, as a smart reader you will now tell me that my test isn’t so great because there are plenty of people in the world who sleep all the time and have never had cancer.

A test must do two things in order to be accurate: it must predict which people will get the disease (called sensitivity in the medical literature, and hit rate in psychology), and it must also predict which people won’t (specificity in medical-speak, correct rejections in psych-speak). The test in question had 100% sensitivity (everyone in the sample who later developed Alzheimer’s tested positive), but 36% of people in the sample who didn’t develop Alzheimer’s also tested positive.

So, how good is this test really? Fortunately, some useful math exists to help us figure this out. Let’s say we have 1,000 55-year-olds, and we know that 10% of them will develop Alzheimer’s by age 60. We give all 1,000 people the test and wait 5 years. Looking at our sample, we’ll find that all 100 future Alzheimer’s patients tested positive, as did 324 (36%) of the 900 in the non-Alzheimer’s group. Therefore, if a participant tests positive, there is only a 100/424 (about 24%) chance that s/he will develop AD.
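The base-rate arithmetic from that paragraph, written out:

```python
# Positive predictive value from the numbers given above.
population = 1000
base_rate = 0.10            # 10% will develop Alzheimer's by age 60
sensitivity = 1.00          # every future patient tests positive
false_positive_rate = 0.36  # 36% of healthy people also test positive

sick = population * base_rate                                # 100 people
true_positives = sick * sensitivity                          # 100
false_positives = (population - sick) * false_positive_rate  # 324

ppv = true_positives / (true_positives + false_positives)
print(f"chance that a positive test means future AD: {ppv:.1%}")  # 23.6%
```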

We also need to ask how useful an Alzheimer’s prediction test would be, because, as of this writing, there isn’t a whole lot that can be done for AD. As pointed out here, the test described in the New York Times is based on a painful and invasive spinal tap, which makes the cost-benefit ratio quite large. Several less invasive predictive tests for AD based on neuroimaging do exist. Still, the high degree of uncertainty in these tests, coupled with the lack of meaningful therapeutic options, spells years of needless anxiety for patients and families, in the opinion of this writer.

Thursday, September 16, 2010

Why scientists aren’t going to find my neurotic gene any time soon

There has been some talk lately about a recent study that found no robust statistical relationship between anything in the whole genome and scores on a standard personality test.

Seem surprising? Not so much.

So, a gene codes for a protein. What does the protein do? So many things that it’s difficult even to list them: a protein can become a structural element of a cell (such as the actin or myosin that make up muscle tissue), or a neurotransmitter or other messenger in a complicated cascade of signaling events. For example, the PubMed description of the protein neuregulin 1 (statistically associated with schizophrenia, and with high creativity) starts with “The protein encoded by this gene was originally identified as a 44-kD glycoprotein that interacts with the NEU/ERBB2 receptor tyrosine kinase to increase its phosphorylation on tyrosine residues.” Technical language aside, the very first description of this protein is the relationship it has to another protein, in a specific biochemical context (phosphorylation).

Alright. So a gene codes for a protein, and this protein is a widget that works in concert with other such widgets in a particular biochemical and environmental context. Does it even make sense to say that there is a gene for a complex behavioral phenomenon such as schizophrenia, depression, or a neurotic personality? Not really. At least not in the sense of “I have this computer for writing this blog post”. Kenneth Kendler highlights the weakness of the causal link with the following analogy:

“A jumbo jet contains about as many parts as there are genes in the human genome. If someone went into the fuselage and removed a 2-foot length of hydraulic cable connecting the cockpit to the wing flaps, the plane could not take off. Is this piece of equipment then a cable for flying?”

While most people would answer no, the cable does not directly cause the jet to fly, this is exactly the logic we use when we try to find a gene for X.

The issue is that we expect genes to have lawful 1:1 correspondences with specific traits, because in school we learned about Mendel’s pea plants, or cystic fibrosis, or Huntington’s disease, which do show such a relationship. But this type of inheritance is the exception rather than the rule. There is a wide distribution of association strengths between a single gene and a particular outcome. Scientists express this strength using a statistic called the odds ratio: briefly, the odds that someone with gene A will have disease X, divided by the odds that someone without gene A will have disease X. For a completely Mendelian disease (one like cystic fibrosis that cannot be acquired through the environment), the odds ratio is infinite, because if you have the gene you will always have the disease, and if you don’t, you never will. Statistical associations that we perceive as strong (such as the link between heavy smoking and lung cancer) have an odds ratio of about 20. Psychiatric associations, on the other hand, have odds ratios of 1-2. In other words, don’t rush out to get genetically tested for depression. It won’t do you much good.
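For concreteness, here is the odds-ratio arithmetic on a made-up 2x2 table (the counts are hypothetical, chosen so the result lands near the ~20 quoted for heavy smoking):

```python
# Odds-ratio arithmetic on a hypothetical 2x2 table of counts.
with_gene_sick, with_gene_well = 100, 50  # odds of disease = 100/50 = 2.0
without_sick, without_well = 10, 100      # odds of disease = 10/100 = 0.1

odds_ratio = (with_gene_sick / with_gene_well) / (without_sick / without_well)
print(f"odds ratio: {odds_ratio:.0f}")  # 20

# For a fully Mendelian disease, "without_sick" would be 0, making the
# odds ratio infinite; psychiatric genetics typically sees ratios of 1-2.
```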

Partially, this lack of association is due to complex interactions between genes and environments. For example, people with a particular variant of a serotonin transporter are more likely to experience depression, but only in the context of having experienced a stressful life event.

A possible exception to the single-genes-don’t-change-behavior-in-isolation rule might be the COMT gene. This gene makes the enzyme that breaks down several neurotransmitters in the brain, including dopamine. Like many genes, it comes in different variants (or alleles). However, unlike the alleles that change an individual’s hair color, different alleles of the COMT gene have been associated with striking differences in cognitive function. Incredibly, these differences arise from a single amino acid difference in the enzyme! Substituting valine for methionine at position 158 of the protein is associated with a host of poorer psychological outcomes. As each person inherits one copy of the gene from each parent, individuals can have two valines, two methionines, or one of each. Interestingly enough, the number of valines correlates with the degree of negative outcome. For example, in a 2008 study, people recorded the events taking place in their lives and rated how positive each was. The authors found that valine-valine individuals rated even very pleasant events as only mildly positive, while methionine-methionine individuals rated mildly pleasant events as quite positive. Given these results, it is easy to see how valine carriers might have greater difficulties with major depression and addiction.

So, genes code proteins which work together in an incredibly complex biochemical context created by other genes, the environment, and interactions of the genes, the biochemical milieu and the environment. Instead of asking ourselves why we haven’t found the gene for X, we should really be asking ourselves why we keep asking that question.

Tuesday, September 14, 2010

Dispatches from the Academy

I have a weird and wonderful job. My job is to try to figure out things that have not yet been figured out, write about them, submit said writing to journals, and then argue with similarly strange people until said words come out in print. The particulars of what I’m trying to figure out have nothing to do with making widgets, and almost nothing to do with deeply noble social causes such as curing cancer or Alzheimer’s disease. I am an academic, engaged in the production of knowledge for knowledge’s sake.

There has been much criticism of the academy lately, primarily brought about by the publication of Mark Taylor’s Crisis on Campus: A Bold Plan for Reforming Our Colleges and Universities.

While American universities are not without their faults, the tenor of this argument has reached laughably hyperbolic heights, such as "Graduate education is the Detroit of higher learning" or the claim that the current university system is a "Ponzi scheme".
A Ponzi scheme? Seriously?

The uncomfortable truth at the heart of Taylor’s argument is that there are too few tenure-track professorships for too many young Ph.D.s. This is true. When I left graduate school last year, there were about 100 graduate students for 40 faculty members. While that sounds like a conservative student-to-faculty ratio, it is far above replacement level for those 40 faculty members, each of whom trained students before us and will train more after us. And although some of my cohort knew they wanted to go into industry, the vast majority of us were bent on the tenure track. Although it’s too early to know what will happen to us, it is safe to say that we have a few years of fierce competition ahead, and that, of necessity, many of us will end up in non-academic pursuits. But in my field (and other sciences), we did not incur extra student debt in grad school, and we picked up some math and computer skills that make us somewhat employable.

Graduate students in the humanities have a harder road, often having to pay for their graduate educations, and frequently becoming part of an economic underclass of highly educated adjunct professors, earning $1000-$5000 per course without benefits. David Hiscoe described his experience as an adjunct as “five writing courses a quarter at $12,500 a year, slightly more than the average hourly wage I'd pulled down as a not-too-able carpenter's assistant during the summers when I should have been writing my dissertation.”

Taylor’s solution? Abolish tenure to kick out the lazy, old, irrelevant and expensive professors. The problem is that the economics of this argument don’t make any sense. Of course a tenured professor is going to cost more than an adjunct. But the cost of that tenured professor is chump change compared to, say, landscaping, catering, the cushy salaries of university administrators, state-of-the-art athletic facilities and the salaries of football coaches.

Tenured professors (at least in my own field; it might be different in Taylor’s department of religion) are not lazy people. I believe that the difficulty of getting tenure selects out the people who are not intrinsically motivated for high achievement. By the time one’s tenure is decided, one has gone through 4 years of undergrad, 4-10 years of grad school, 1-6 years as a postdoc and 5-7 years as a non-tenured professor. You may get through a few years with an “eyes on the prize” mentality, but not half of your working life!

Furthermore, pressures that exist before tenure exist after tenure: research can only happen with funding from competitive grant proposals, and highly selective journals will not publish work that is irrelevant.

Anticipating the counter-argument that tenure protects academic freedom, Taylor dismisses it by stating, "If you don't have the guts to speak out before, you're not gonna have it after."

Academic freedom isn’t just about saying something controversial in the classroom; it’s about being able to take scientific risks. Many of the young professors I know, under pressure to keep up a certain publication volume, publish small, incremental pieces of work. This is not to say that it’s not good work, but it is safe work, work that doesn’t radically change anyone’s world view. It’s work that, in the big picture, will be forgotten. In order to do important scientific work, one needs the ability to take some risks, to explore a set of experiments that might not work out, and to still have a job if they fail. Without a degree of job security, we will lose cutting-edge research.

But while I disagree with these major points of Taylor’s, I do see that there are major problems in American research universities. Chief among them is the low value placed on teaching. The weight given to teaching in the tenure decision varies from university to university, but ranges from indifference to disdain. I recall with sadness the anxiety that my graduate advisor had over receiving a teaching award, it being seen as a "kiss of death" for tenure.

I want to be the professor who values teaching, because it does more good in the world than research alone. At the end of a long and venerable research career, one’s life’s work will be scarcely more than a paragraph in an introductory textbook, but teaching well affects students for a lifetime.

One point that no one seems to acknowledge in these debates over the future of universities is this: the dream of becoming a tenured professor is much like the dream of becoming a rock star. Both professions have far more aspirants than openings. Both afford a lifestyle of creative freedom. And in both, you will find young people putting off creature comforts just for the opportunity to try, whether it is toiling in a wedding band or adjuncting for $3000 a semester. It’s not the safest bet, but I still can’t think of anything else I’d rather be doing.