7
Field Experiments

Editors
love field experiments. When I was a young reporter in Miami, a wise but
slightly daffy feature editor named Bill Phillips sent me out into the
streets to test the tolerance of Miamians for boorish behavior. I bumped
into hurrying businessmen, I blew smoke into the faces of sweet old ladies,
I tied up traffic at a bus stop by spilling a handful of change in a bus
doorway and insisting on picking up every last nickel before the bus could
move. The reaction of the citizenry was kind and gentle, and I dare any
Miami
Herald reporter to replicate that experiment today.1
On April Fool's Day, Phillips stationed me at the corner of Miami Avenue
and Flagler Street with a string attached to a purse lying on the sidewalk,
so that I could snatch it out of the way of any passerby foolish enough
to try to pick it up. A photographer hid in a nearby doorway to record
the number of times I got punched in the nose. That's another experiment
nobody has dared to replicate. Stunts
like these may not be worth turning into generalizable experiments, but
there are socially useful and interesting things to be done that have both
some degree of generalizability and the mad ring of 1950s Miami Herald
copy. Some examples:

Ignorance about AIDS

In
1988, Dr. Inge Corless, a faculty member in the health sciences division
of the University of North Carolina at Chapel Hill, was preparing a course
on Acquired Immune Deficiency Syndrome (AIDS). As
part of her research, she discussed with local pharmacists the disease-prevention
properties of different types of condoms and discovered a disturbing lack
of knowledge among these professionals. To see if her findings were true
of a larger group of pharmacists, my advanced reporting class designed
an experiment. A team of students sent one of its members to every pharmacy
in Chapel Hill and the neighboring town of Carrboro to buy condoms and
ask the pharmacist's advice about the best kind for preventing AIDS. A
cumulative scale was designed to rank each pharmacist on how much he or
she knew. The scale is cumulative because anyone who knows any given fact
on the list usually knows the preceding facts as well:

1. Latex is better than animal skin (pores in the latter can admit the virus).
2. Lubricated latex is better than dry latex (less danger of breaking).
3. Lubrication with a spermicide is better than plain lubrication (the spermicide can kill the AIDS virus).
4. The name of the spermicide that kills the AIDS virus in laboratory tests is Nonoxynol-9.

Only 40 percent of the pharmacists knew all of these simple facts. Worse yet,
some of them advised the student buyers to buy the lambskin condoms, which
are both the most expensive and the least effective in preventing AIDS.
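Scoring such a cumulative (Guttman) scale is mechanical, and a small program makes the logic explicit. The sketch below is a hypothetical illustration in Python, not part of the original study; the scoring rule assumes knowledge stops at the first fact a respondent misses.

# Score a pharmacist on the four-item cumulative scale.
# Each answer is True if the pharmacist knew that fact, easiest fact first.
def cumulative_score(answers):
    score = 0
    for knew_it in answers:
        if not knew_it:
            break  # cumulative scale: stop at the first unknown fact
        score += 1
    return score

# Hypothetical respondents: knows all four; misses the brand name; knows only item 1.
for answers in ([True, True, True, True],
                [True, True, True, False],
                [True, False, False, False]):
    print(cumulative_score(answers))

A respondent who knows a later fact while missing an earlier one is a "scale error"; in a good cumulative scale, such patterns are rare.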
This was a simple and easily generalizable field experiment, for several reasons.
First, no inferences about causation or correlation were necessary. The
simple rate of ignorance was newsworthy by itself. The students did test
some hypotheses about the causes of ignorance by comparing chain stores
with independent pharmacies and older pharmacists with younger ones. No
meaningful differences were found, nor were they necessary for newsworthiness. The
second boost to generalizability comes from the fact that all of the pharmacies
in the defined area were tested. Sampling was involved within the pharmacies
because the person on duty when the student condom buyer appeared represented
all of the pharmacists who worked there. Overall, those pharmacists could
be taken as representative of those on duty during hours convenient for
student shoppers. The
resulting story, published in the Durham Morning Herald, had two
useful social effects.2
It spurred local pharmacists to become better educated, and it contributed
directly to the AIDS education of the newspaper's readers.
The underage drinking crackdown

A more complicated research design was executed by an earlier class in 1982. The purpose of this one was to test the effectiveness of a Chapel Hill police crackdown on illegal beer purchases by persons under 18. This time, a causal inference was sought. The hypothesis: when the police crack down, beer sales dry up. To test it, beer sellers in neighboring jurisdictions were used as a control. A secondary hypothesis was also tested: police are more likely to watch on weekend nights, and so violations will be more frequent on week nights.

In Chapel Hill, there was no sampling. Every convenience store and tavern was visited. Controls were drawn from the yellow pages section of the phone book and accumulated in order of their proximity to Chapel Hill until a matching number was reached. The buyers were all 18 years old, so that no laws would be broken. The variable being measured was whether the sellers of beer would verify the age of these young people by asking for identification. Verification is sufficiently salient to teenagers that they have a slang term for it: "carding."

A total of 246 attempts to buy beer was made. The overall rate of carding in Chapel Hill on a Saturday night was 22 percent, a number which, standing alone, suggests that the police crackdown was not very effective. However, the carding rate outside Chapel Hill was only 6 percent, revealing a significant effect after all. The rate of carding in Chapel Hill dropped on Monday night to 7 percent, and was no longer significantly different from the rate outside Chapel Hill. Bottom line: the police crackdown did have a selective effect on weekends but none at all at other times, and there were still plenty of opportunities for illegal drinking by minors.

Executing such a field experiment is not as simple as it sounds. The field workers have to be trained so that they follow uniform behaviors that will generate quantifiable data. We had each 18-year-old accompanied by an upperclassman or graduate student who observed and recorded the outcome of each test. We also had a rule against drinking on the job by our field force. That led to some awkward social situations. "We were the only customers in the bar," one of the supervisors reported. "The waitress was very nice and served Wendy a beer and then sat across the bar to talk to us. Wendy would casually pick up her glass and then set it down again without taking a sip. I avoided looking the waitress in the eye by staring out the glass door to the parking lot." The students left abruptly, pretending to pursue a friend spotted in a passing car.3

Some journalists are uncomfortable with the ethical considerations in such deception. Participant observation is, however, a time-honored tradition in both journalism and social science. And, at least where monitoring of public service is concerned, even so stringent an ethicist as Sissela Bok gives qualified approval where the monitoring leads to higher standards of public protection.4
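Whether 22 percent against 6 percent is a real difference or sampling noise is a question for a standard two-proportion test. The sketch below shows the arithmetic in Python; the even split of the 246 attempts between jurisdictions is an assumption for illustration, since the chapter reports only the total.

import math

def two_proportion_z(hits1, n1, hits2, n2):
    # z statistic for the difference between two independent proportions,
    # using the pooled proportion under the null hypothesis of no difference
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Assumed split: 27 cardings in 123 Chapel Hill attempts (22 percent)
# against 7 cardings in 123 attempts outside (6 percent).
z = two_proportion_z(27, 123, 7, 123)
print(f"z = {z:.2f}")  # well beyond 1.96, so significant at the .05 level

With any plausible split of the 246 attempts, a gap that size is far larger than chance alone would be likely to produce.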
Field experiments are satisfying in their directness. Instead of asking about
social behavior in a survey, you get a chance to observe it straightforwardly.
When the hypothesis is clearly formulated in advance, you can design the
experiment to test the hypothesis in the most efficient way and use randomization
or test a total population so that error variance is minimized.
Rules for experimentation

The rules for experimental research have been well codified over the years, starting with John Stuart Mill's nineteenth-century proposals for scientific method.5 However, some of the essential principles are intuitively apparent. Thomas D. Cook and Donald T. Campbell cite the story of a seventeenth-century experiment by a group of Frenchmen to test Pascal's theory of atmospheric pressure. They were looking for the cause of the Torricellian vacuum. A tube is filled with mercury and turned upside down with its lower end in a dish of mercury. The column of mercury falls until it is about 30 inches high, and a vacuum remains in the space above it. What supports the column? Pascal thought it was the weight of the air pressing on the mercury in the dish.

On a fall day in 1648, seven Frenchmen took two tubes, two dishes, and lots of mercury to a mountain. At the bottom of the mountain they set up the two tubes and found the column of mercury was about 28 inches tall. Then, leaving one tube with an observer, they carried the other one 3,000 feet up the mountain and took another measurement. This time the mercury was less than 24 inches tall. They then varied the conditions on the mountaintop, taking measurements at different places and inside a shelter. All yielded the same number. On the way down, they stopped at an intermediate height and got an intermediate reading. At the bottom, their observer verified that the mercury in his tube had not changed. Then they set up the tube from the mountaintop one more time and saw that it now gave the same reading as the one that had been there at the bottom all the time.

Pascal's
theory was supported. The second column of mercury had served as a control,
showing that the different readings on the mountain were due to the elevation,
not to something that had happened generally in the atmosphere during their
climb. By taking measurements in all the different conditions they could
think of, the experimenters were checking for rival hypotheses. And by
taking a measurement halfway down, they were showing a continuous effect.
Their example, say Cook and Campbell, "is redolent with features of modern
science."6
Modern experimental design

For
the drinking-age experiment to have been as careful as the one performed
by those seventeenth-century researchers, we would have had to do it twice:
once before the police crackdown and once after. That would rule out the
possibility that there is some ongoing condition in Chapel Hill that explains
its higher Saturday-night carding rate. Experimental designs can take a
great variety of forms, and it helps to keep track of them if you make
diagrams. Here is one, adapted from Samuel Stouffer,7
whose pathbreaking study of American military men
in World War II was cited in the previous chapter:

                     Time 1    Time 2
Experimental group:    X1        X2
Control group:         Y1        Y2

The experimental condition -- police crackdown in the Chapel Hill example
-- is introduced between Time 1 and Time 2 for the experimental group only.
In theory, the Xs and Ys are equivalent to start with, but in practice
those conditions may be difficult or impossible to achieve. If you have
enough control over the situation, you can randomize assignment to group
X or group Y. But the police crackdown was not random, and it covered only
Chapel Hill. So the next best thing was to find bars and convenience stores
as much like those in Chapel Hill as possible, and the way to do that was
to find some in the same market but not the same police jurisdiction.
If
the complete design had been followed, the analysis could have taken the
following form:

D(X) = X2 - X1
D(Y) = Y2 - Y1

where
D stands for "difference." If the police crackdown is effective,
then D(X) should be significantly larger than D(Y). If both change, then
some external force in the environment is acting on the entire community,
not just that portion covered by the Chapel Hill police department.
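In code, the full-design analysis is a difference of differences. A minimal sketch, with invented rates, since the pretest measurements were never actually taken:

# Hypothetical carding rates for the complete design.
# X = Chapel Hill (experimental group), Y = neighboring jurisdictions (control).
x1, x2 = 0.05, 0.22   # invented pretest, plus the observed Saturday rate
y1, y2 = 0.05, 0.06   # invented pretest, plus the observed outside rate

d_x = x2 - x1   # D(X): change in the experimental group
d_y = y2 - y1   # D(Y): change due to everything else in the environment
print(f"D(X) = {d_x:.2f}, D(Y) = {d_y:.2f}, net effect = {d_x - d_y:.2f}")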
In fact, the design that was used was:

X2
Y2

And
the fact that carding was more frequent within the police jurisdiction
than without was taken as evidence that the crackdown was real. Another
possible abbreviated design would have been:

X1    X2

where the situation in Chapel Hill after the crackdown could have been compared with the situation before. In that case, Chapel Hill would have acted as its own control.

Notice
that in this case the experimental manipulation is not something controlled
by the researcher. This is a natural experiment in that the manipulation
would have taken place whether or not an interested researcher was around.
The researcher's job is one of measurement and analysis. Sometimes you
can introduce the experimental manipulation as well, and that gives you
greater control in the randomization of potentially confounding variables. For
example, journalism students in Chapel Hill were interested in testing
the proposition that folks in their town are more polite than people elsewhere.
The town's reputation for civility is well known, but is it myth or fact?
And, can it be objectively measured? One way to operationalize civility
is by observing driver behavior. People are less inhibited in their social
interactions when they are protected from each other by 2,000-pound shells
of steel and fabric. We designed this simple test: Students in teams of
two got into automobiles and drove to randomly chosen traffic lights, looping
around until they were first in line at a red light. When the light turned
to green, the driver held the car's position and waited for the car behind
to honk. The passenger meanwhile used a stopwatch to clock the time from
the green light to the first honk from behind. Hypothesis: pre-honking
time in Chapel Hill would be significantly longer than pre-honking time
in other cities. When spring break came, the students scattered to their
various homes and vacation spots and repeated the experiment at random
intersections there. The outcome: Chapel Hill's reputation was justified.
Its mean pre-honking time, more than eight seconds, was more than double
that for other cities. In fact, some Chapel Hill motorists never honked
at all, but waited
patiently through another traffic light cycle.
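Comparing mean pre-honking times is a job for a two-sample t test. A minimal sketch, hand-rolling Welch's version on invented timings (the class's raw data are not reproduced here):

import math

def welch_t(a, b):
    # Welch's t statistic for the difference between two sample means
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

# Invented pre-honking times in seconds, echoing the reported pattern:
chapel_hill = [8.1, 9.4, 7.6, 10.2, 8.8, 12.0]
elsewhere = [3.2, 4.1, 2.8, 4.6, 3.9, 3.5]
print(f"t = {welch_t(chapel_hill, elsewhere):.2f}")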
Another famous experiment where the manipulation was introduced involved a militant
civil rights group in southern California in the 1960s. A professor at
California State College at Los Angeles recruited five blacks, five whites,
and five Mexican-Americans to attach "Black Panther"
bumper stickers to their cars. All drove routinely to and from campus along
Los Angeles freeway routes. All had perfect driving records for the previous
twelve months. All signed statements promising not to drive in unfamiliar
parts of the city or in a manner to attract police attention. The
experiment was halted after seventeen days. The $500 set aside to pay traffic
fines had been used up. The first arrest, for an incorrect lane change,
was made two hours after the experiment began. One subject got three tickets
in three days and dropped out. In total, the fifteen previously perfect
drivers collected thirty-three citations for moving violations in the seventeen
days that they displayed the bumper stickers.8 As
journalism, the honking and bumper-sticker studies may have been sound
enough, but even journalists need to know about the hazards of such research.
When experiments are inadequately controlled, the expectations of both
the experimenter and the subjects can yield spurious effects. There is
some evidence that teacher expectancies have a lot to do with how a child
performs in school. Robert Rosenthal demonstrated the effect by giving
teachers lists of pupils whose psychological test results indicated superior
performance. The pupils did in fact perform better than their classmates,
even though Rosenthal had compiled his list by random selection. He called
it "Pygmalion effect."9 Hawthorne
effect A
better-known problem occurs when the subjects in an experiment realize
that something special is happening to them. Just the feeling of being
special can make them perform differently. Surely knowing that a Black
Panther bumper sticker is on one's car could make one feel special. This
phenomenon is called Hawthorne effect after a series of experiments at
Western Electric Company's Hawthorne Plant in Chicago in 1927. Six women
were taken from a large shop department that made telephone relays and
placed in a test room where their job conditions could be varied and their
output measured. Their task was fairly simple: assemble a
coil, armature, contact springs, and insulators by fastening them to a
fixture with four screws. It was about a minute's worth of work. Each time
a worker completed one, she dropped it into a chute where an electric tape-punching
device added it to the total for computing the hourly production rate. To
establish the base for a pretest-posttest design, the normal production
rate was measured without the assemblers being aware of the measurement.
Then the experiment was explained to them: how it was to test the effect
of different working conditions such as rest periods, lunch hours, or working
hours. They were cautioned not to make any special efforts but to work
only at a comfortable pace. What
happened next has achieved the status of myth in the separate literature
of both social science and business administration. In the former, it is
regarded as a horror story. In the latter, it is considered inspirational. The
second variable in the experiment (Time 2) was the production rate for
five weeks in the test room while the subjects got used to the new surroundings.
Time 3 changed the piece rate rules slightly. Times 4, 5, and 6 changed
the rest periods around. And so it went for eleven separate observations.
And for each observation, production went up -- not up and down as the
conditions were varied. Just up. Nonplussed,
the experimenters threw the test into reverse. They took away all the special
work breaks, piece rates, and rest periods. Production still went up. They
put back some of the special conditions. More improvement. No matter what
they did, production got better. Something
was going on. It was "testing effect." The six women knew they were in
an experiment, felt good about it, enjoyed the special attention, and were
anxious to please. They formed a separate social set within the plant,
had frequent contact with management, and took part in the decisions over
how the experimental conditions were to be manipulated. Their participation
and the sense of being special overrode the effect of the initial admonition
to make no special effort and work only at a comfortable pace. The study
never found out what combination of rest periods, lunch hours,
or payment methods has the most effect on productivity. But it was not
wasted. The company learned that production improves when management shows
concern for workers, and workers are "organized in cooperation
with management in the pursuit of a common purpose."10
American management theorists took that idea to Japan after World War II,
where it flourished, and it was eventually reintroduced to our shores in
the 1980s. Those Hawthorne plant women were the first quality circle. One
of the flaws in the Hawthorne research design was that it tried to do too
much. Expressed diagrammatically, it would look like this:

X1    X2    X3    X4    X5    X6 . . .

Following the notation system of Samuel Stouffer, we see many observations at different points in time. An experimental manipulation is inserted between each of the adjoining pairs of observations. A better design would have had a row of Y's parallel to the X's to represent a control group with a similar special room and the same amount of special attention but no changes in working conditions. Better yet, get a different group (randomly selected, of course) for each experimental condition. Make repeated measurements and insert the change somewhere in the middle, say between the third and fourth observations. In that way you can verify that the control group and the experimental group are not only alike to start with but also responding in the same way to the passage of time and to the effects of being measured.

Factors
that correlate with the passage of time are an ongoing problem with field
experiments. Your subjects get older and wiser, public policy makers change
their ways, record keeping methods change, the people making the observations,
maybe even you, too, all change. The traditional way of coping with differences
that correlate with time is with imagination. Stouffer noted that the basic
research design, with all the controls and safeguards taken away, looks
like this:

X2

One measurement of one thing at one point in time. He was complaining about 1940s social science, but what he said is still relevant to 1990s journalism. With such a research design, one phenomenon looked at once and not compared to anything, we "do not know much of anything," he said. "But we can still fill pages ... with 'brilliant analysis' if we use plausible conjecture in supplying missing cells from our imagination. Thus we may find that the adolescent today has wild ideas and conclude that society is going to the dogs." The result is a kind of pretest, posttest comparison:

X1    X2
italic cell is not an observation, but "our own yesterdays with hypothetical
data, where X1 represents us and X2 our offspring.
The tragicomic part is that most of the public, including, I fear, many
social scientists, are so acculturated that they ask for no better data." Since
Stouffer's time, social scientists have become more careful. There is a
tendency to add more control groups. The Black Panther bumper sticker experiment,
for example, could have profited from this design:

X1    X2
      Y2
The
Y2 represents a control group that drives without bumper stickers
as a test of the possibility that police are cracking down at Time 2. The
control group would be even better if both X and Y drivers had either a
Black Panther sticker or a neutral sticker applied before each trip, and
if the application were performed at random after each driver was already
in the car and had no way of knowing which sticker was being displayed.
An
even more thorough design could look like this:

X1    X2
Y1    Y2
      X'2
      Y'2

Here
the nonstickered or neutral-stickered Y group is present at Time 1 to verify
its initial comparability. X' and Y' are present
as a test of the possibility that the experiment made the original two
groups of drivers too aware of their roles as subjects and made them behave
differently, like the women in the Hawthorne experiment. Such an effect
would be indicated by a difference between X2 and X'2
as well as between Y2 and Y'2.
Donald
Campbell, with Julian Stanley in an early evaluation of designs for experimental
and quasi-experimental social research, pointed out that the above design
includes four separate tests of the hypothesis.11
If the police are really prejudiced against the Black Panthers, then there
should be the following differences in the quantity of arrests:

X2 > X1
X2 > Y2
X'2 > Y'2
X'2 > Y1

Tacking
on control groups can be a good idea in survey research when you return
to the same respondents to get a pretest, posttest measure. The Miami
Herald did that when Martin Luther King was assassinated just after
it had completed a survey of its black population. The hypothesis was that
King's nonviolent ideology was weakened by his death and the advocates
of violence had gained. Fortunately, the earlier survey had asked questions
about both kinds of behavior, and records of who had been interviewed had
been retained. At the suggestion of Thomas Pettigrew, the Herald
added a control group of fresh respondents to the second wave of interviews.
The original respondents had had time to think about their responses, might
have been changed by the interview experience, might even have read about
themselves in the Miami Herald. Any differences at Time 2 might
simply be the effect of the research process rather than any external event.
The second-wave control group provided a check against that.
As
it turned out, the control group's attitudes were indistinguishable from
those of the panel, providing evidence that the experience of being interviewed
had not altered the Herald's subjects. Knowing that was important,
because there was a big change between Time 1 and
Time 2. Miami blacks, after King's death, were more committed than ever
to his nonviolent philosophy. The proportion interested in violence did
not change.12
Once you start looking for spurious effects, it is difficult to know where to stop. Donald T. Campbell, first with Julian Stanley and later with Thomas D. Cook, has made an intensive effort to figure out how to do things right. To do that, he first had to list the things that can go wrong. At Harvard, they used to call his list of foul-ups "Campbell's demons." Here is a partial listing:
Campbell's demons

1. History.
If you measure something at two different times and get a difference, it
could be because of any number of historical events that took place in
the intervening period.

2. Maturation.
Subjects and experimenters alike get older, tired, bored, and otherwise
different in the course of an experiment.

3. Testing.
Measuring the way a person responds to a stimulus can change the way he
or she responds the next time there is a measurement. School achievement
tests are notorious because teachers learn what is in the tests and start
teaching their content. Pretty soon all the children are above average,
just like in Lake Wobegon.

4. Statistical regression.
Journalists have been easy prey to this one. A school board
announces a program to focus on its worst-performing schools and improve
them. It picks the two or three schools with the worst test scores in the
previous year and lavishes attention and new teaching methods on them.
Sure enough, the next year those schools have better test scores. The problem
is that they would have done better even if there had been no special attention
or new technique. The
reason is that there is a certain amount of random error in all tests and
rankings. The schools at the bottom of the list got there partly by chance.
Give them a new roll of the dice with next year's testing, and chance alone
will move them closer to average. The phenomenon is called regression
toward the mean, because it always moves the extreme performers,
top and bottom, closer to the mean on the second test. It is a danger any
time that you select the extremes of a distribution for treatment. Most
educators know about it, but knowing about it doesn't stop them from taking
the credit for it.

5. Selection.
If comparison groups are not chosen strictly at random, then hidden biases
can destroy their comparability. Self-selection is the worst kind. If you
were doing the Black Panther experiment, and you let students volunteer
to display the bumper stickers, you might get the risk takers and therefore
the most reckless drivers.

6. Mortality.
Not all of the subjects remain available during an experiment that lasts
over a period of time. Those who drop out or get lost may be different
in some systematic way. In the evaluation of Head Start programs for preschool
children, for example, the children with the most-motivated parents were
more likely to finish the treatment. The selective dropping out of the
less motivated took away children who had poorer family situations and
maybe other strikes against them. Their absence for the final comparisons
made Head Start look better than it really is.

7. Instrumentation.
The measuring scale may have more flexibility in the middle than at the
extremes. Audiences rating different moments of a presidential debate on
a seven-point scale can make wider swings from the midpoint than when the
comparison is made from an extremely high or low point.

8. The John Henry effect.
Members of a control group might know they are in
a control group and try harder just out of rivalry. Students in some educational
experiments have been suspected of doing this. John Henry, you may remember,
was the steel-driving man who "wouldn't let a steam drill beat him down"
in the traditional folk ballad.

9. Resentful demoralization.
Just the reverse of the John Henry effect. Control
groups see the experimental group as being more favored, and they stop
trying.

See Cook and Campbell13 for the full list of threats to experimental validity. But don't be discouraged by them. As Campbell noted many years ago, "all measures are complex and all include irrelevant components that may produce apparent effects."14 It is not necessary to become so frightened of those irrelevant components that one avoids field experiments. It is only necessary to be aware of the things that can go wrong and treat your own work with the appropriate skepticism.

Oddball measures

One way to foil many of Campbell's demons is to look for nonreactive measures of newsworthy phenomena. Such measures sometimes occur in nature. For example, you can estimate the age of viewers of a museum exhibit by locating the nose prints on the glass case and measuring their height from the floor. This example and many others come from a wonderful book by Eugene J. Webb, Donald Campbell, and others which was written under the working title "Oddball Measures." (". . . it is only a fear of librarians that has caused us to drop it," the authors reported.)15

Their nonreactive measures include the simple observation of behavior where the observer does not intrude on the scene. For example, one study that they cite examined the social distance of whites and blacks in a college setting by observing the degree of racial clustering when whites and blacks chose their seats in classrooms. The effect of ghost stories told to a seated circle of children was observed by noting the shrinking diameter of the circle.

Content analysis is a nonreactive measure. You can trace the diffusion of the switch from using the term "Negro" to "black" by counting both words and establishing their ratio in different newspapers and at different times.

Archival records normally are not changed by the act of measuring. With modern computer archives, however, one cannot be so certain. Publication of a paper evaluating a newspaper's editorial skill by measuring the rate of certain misspelled words might conceivably induce the paper's editors to correct the misspellings in the electronic database. Records kept over time are subject to changes in the efficiency of record keeping. Thus archival data can show an increase in criminal activity after a period of police reform if one of the reforms is better record keeping.
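With machine-readable copy, the "Negro"-to-"black" ratio just described reduces to a few lines of counting. A minimal sketch; the tokenizing is deliberately crude and would need refinement in practice, since "black" has many nonracial uses:

import re

def usage_ratio(text):
    # Count "Negro" (always capitalized) against all forms of "black".
    negro = len(re.findall(r"\bNegro\b", text))
    black = len(re.findall(r"\bblack\b", text, flags=re.IGNORECASE))
    return negro / black if black else float("inf")

sample = "Negro leaders met Tuesday with black student groups downtown."
print(usage_ratio(sample))  # 1.0 for this invented sentence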
Webb and colleagues also cite the case of a Chicago automobile dealer who had
his mechanics check the radio of every car brought in for service to see
what station it was tuned to. The dealership then placed its advertising
with the most popular stations. And then there is the classic story of
the measurement of attention to a football game by monitoring pressure
gauges at the city water department. The greater the audience, the greater
the pressure drop during commercials as toilets are flushed across the
viewing area.
Evaluation research

Government
agencies often use evaluation research to test the effectiveness of their
programs. Knowing about Campbell's demons can help you evaluate the evaluators. A
classic example is the great Connecticut crackdown on drunken drivers when
Abraham Ribicoff was governor in 1955. He imposed a program of intensified
law enforcement, and the annual rate of traffic deaths dropped by 12.3
percent in 1956. If you just look at the two years, you have a simple pretest,
posttest design, and the governor's program seems to have worked. But if
you check out the longer time series, you see that there was wide year-to-year
variation, and 1955 happened to be a peak year. Reforms are often instituted
when a problem is at its peak. Since chance alone may account for the peak,
a measurement in the following year, thanks to statistical regression,
will likely be closer to the long-run average. That may be all that happened
in Connecticut.
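Statistical regression is easy to demonstrate by simulation. In the sketch below, fifty hypothetical units (schools, or years of traffic deaths) share one true level plus random error; the worst performers in round one "improve" in round two with no treatment at all:

import random

random.seed(1)
TRUE_LEVEL = 100  # every unit has the same underlying performance
round1 = [TRUE_LEVEL + random.gauss(0, 10) for _ in range(50)]
round2 = [TRUE_LEVEL + random.gauss(0, 10) for _ in range(50)]

# Select the three worst performers on the first measurement ...
worst = sorted(range(50), key=lambda i: round1[i])[:3]

# ... and watch them drift back toward the mean on the second.
before = sum(round1[i] for i in worst) / 3
after = sum(round2[i] for i in worst) / 3
print(f"worst three, round 1: {before:.1f}; round 2: {after:.1f}")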
The effect of reforms is easiest to measure if they are introduced abruptly.
You then can look for their effect against the background of a longer time
trend. In
educational reforms, look for Hawthorne effect and selective retention
as factors that will make just about any school administration look good.
Schools get a lot of attention when they try a new technique, and their pretest,
posttest studies show that it works. Journalists give it big play. Then
you stop hearing about it, and after a few years the technique is forgotten.
That's because the Hawthorne effect wears off, and
the new technique proves no better than the old in the long run. However,
nobody calls a press conference to tell you that. Selective
retention (Campbell's "mortality") occurs when only the most motivated
students continue to show up for a special program. To prove that the program
works, administrators compare those who finished with those who dropped
out. Sure enough, they perform better. But chances are they would have
done so in the absence of the new program, because they were more energetic,
aggressive, and better motivated to start with. Indeed, even the beneficial
effect of a college education has been shown to disappear when family background
is held constant. Going to college identifies you as a person with above-average
earning power. It does not necessarily give you that power. Your genes,
and possibly your family connections, did that. One
of the frustrating aspects of evaluation research is that political realities
tend to work against sound research designs. A true experiment requires
randomization of the experimental treatment. That is the best way to assure
that the experimental and control groups are equal to start with. But if
there is some reason to believe that the experimental treatment really
does confer an advantage, then the most politically astute are more likely
to get it. An
experimental plan to eliminate poverty through the simple expedient of
giving poor people money was devised by the federal government in the administration
of Lyndon Johnson. It was to take place in New Jersey and use entire communities
as units of analysis. That way, there could be controls without people
living next door to each other in similar circumstances and getting different
treatment from the government. But nobody could figure out a way to decide
which communities would get the treatment and which ones would have to settle
for being controls. The
federal government's most ambitious attempt to distribute benefits and
costs equally through randomization was the draft lottery during the war
in Vietnam. In theory, the men in the armed forces should have been a representative
sample of the population. They weren't even close, because the
more affluent men of draft age were more clever at figuring out ways to
beat the system by staying in school, joining the National Guard, feigning
illness, or leaving the country. Actually, more than cleverness was involved,
because staying in school cost money and getting into the Guard sometimes
took political connections. Where
the government is not involved, randomization is easier. Clinical drug
trials use the double-blind method where treatment and control groups are
randomly assigned a test drug or a placebo, and neither the person getting
the drug nor the person administering it knows which is which. Even that
procedure sometimes is given a political justification: the new drug is
scarce, and random selection is a fair way to decide who should get it.
However, the tradition of using the marketplace to allocate scarce resources
is strong, and opportunities for true experiments in education and social
policy will probably always be rare. That makes the need for skeptical
journalistic attention to policy-related quasi experiments all the more
important.
Notes

1. Phil Meyer, "Even Boors Get Break in Miami," Miami Herald, July 27, 1958, p. 26A.
2. Melinda Stubbee, "Survey: Pharmacists Lack Knowledge of Safest Condom Types," Durham Morning Herald, December 7, 1988, p. 1-B.
3. Shawn McIntosh, "Wary journalist braves good ol' boys' hang-out for class grade," UNC Journalist, December 1982, p. 10.
4. Sissela Bok, Lying: Moral Choice in Public and Private Life (New York: Vintage Books, 1979), p. 212.
5. Ernest Nagel, ed., John Stuart Mill's Philosophy of Scientific Method (New York: Hafner, 1950).
6. Thomas D. Cook and Donald T. Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings (Boston: Houghton-Mifflin, 1979), p. 3.
7. Samuel Stouffer, "Some Observations on Study Design," American Journal of Sociology, January 1950.
8. Frances K. Heussenstamm, "Bumper Stickers and the Cops," Transaction, February 1971.
9. Cited in Cook and Campbell, Quasi-Experimentation, p. 67.
10. George Caspar Homans, "Group Factors in Worker Productivity," reprinted in Sociological Research, Matilda White Riley, ed. (New York: Harcourt, Brace & World, 1963).
11. Donald T. Campbell and Julian Stanley, Experimental and Quasi-Experimental Designs for Research (Chicago: Rand-McNally, 1966).
12. Philip Meyer, Juanita Greene, and George Kennedy, "Miami Negroes: A Study in Depth," a Miami Herald reprint, 1968.
13. Quasi-Experimentation, chapter 2.
14. Donald T. Campbell, "Reforms as Experiments," American Psychologist, April 1969.
15. Eugene J. Webb, Donald T. Campbell, Richard D. Schwartz, and Lee Sechrest, Unobtrusive Measures: Nonreactive Research in the Social Sciences (New York: Rand McNally, 1966).