[wordup] When a Test Fails the Schools, Careers and Reputations Suffer

Adam Shand larry at spack.org
Tue May 22 01:27:32 EDT 2001


Why is free software (that's free as in freedom, not free as in beer) a
good thing? Hmmm, certainly disclosure is a good thing, eh?

adam.

(Note: the nytimes URL requires free registration.)

URL: http://www.nytimes.com/2001/05/21/business/21EXAM.html?pagewanted=all
Via: http://slashdot.org/article.pl?sid=01/05/21/1842203&mode=flat

When a Test Fails the Schools, Careers and Reputations Suffer
By JACQUES STEINBERG and DIANA B. HENRIQUES
May 21, 2001

Sitting in his cramped office in Fort Wayne, Ind., with his calculator
running, John Kline became the first to suspect that a major test
publisher had erred in computing the standardized test scores of thousands
of his students.

As testing director for the local school system, Mr. Kline quickly alerted
the company, CTB/McGraw-Hill, but it did not fully investigate his
complaint at the time.

If it had, CTB would have discovered a crippling programming error in time
to prevent it from upending the lives of students, parents and educators
as it rippled across the nation over the first eight months of 1999. This
mishap, the most far-reaching in the recent history of school testing,
jolted school districts in at least six states, including New York City,
where it mistakenly sent nearly 9,000 students packing off to summer
school.

A post-mortem of how this error spread unimpeded for so long lays bare a
basic truth of standardized testing: school districts lack the ability to
uncover serious testing errors on their own, and must rely on the testing
companies to do so voluntarily.

Because the testing industry has succeeded in fending off various
proposals for federal oversight, the companies themselves decide what they
will disclose and when.

CTB's error hit hardest in New York City, the nation's largest school
system. Apart from the children, the most prominent victim may have been
the city's schools chancellor, Rudy Crew. The error showed - incorrectly -
that reading scores citywide had stagnated after rising for two years,
raising questions about Dr. Crew's leadership. Within months, he was out
of a job.

Before the mistake was discovered, Dr. Crew had been a leading advocate
for using standardized tests to hold students and educators accountable.
But now, as Congress is poised to vote on a presidential proposal that
would sharply increase the nation's reliance on standardized testing, Dr.
Crew says he has been chastened by his personal experience with the
testing industry.

"The answer is not to use test scores as the sole source of information
about a student's performance," he said. "These are human errors. They're
going to happen again."

The issue, then, is how the test companies handle mistakes once they
occur, educators say. A New York Times examination of CTB's error shows
that the company had been warned repeatedly by testing officials in
Indiana, New York City and other districts that their percentile scores
seemed wrong. While CTB told each not to worry, the company did not
mention the other complaints.

Then, after finding an error, CTB officials waited seven weeks before
passing that critical information on to New York City and other school
districts.

When told of these findings, Dr. Crew, who begins work next month at an
education foundation in San Francisco, expressed disappointment and anger.

"What CTB did was lie," he said.

CTB officials say they did their best to uncover a deeply embedded
software problem. Once the problem was located, the officials say, they
did not immediately alert any school districts because they wanted to be
absolutely sure of the damage it had caused.

"It was hard to see this," David M. Taggart, the company president, said.
"But, and I think this speaks to the integrity of our company, we didn't
stop looking."

Robert Tobias, the longtime testing director in New York City, does not
accept the company's explanation, particularly in light of the early
warnings that CTB received.

"They clearly did not check carefully enough," he said. "It's that
simple."

Dr. Crew sees a broader problem. "The largest testing companies are guilty
of what most people accuse public schools of," he said. "They've actually
got a monopoly."

In Indiana: The First Indication of a Costly Error

CTB has its headquarters in a tan fortress perched atop a hill overlooking
California's idyllic Monterey Peninsula. Founded in 1926 by a Los Angeles
public school official and his wife, CTB grew into an industry giant after
being acquired in 1965 by McGraw-Hill, a financial information and
publishing company.

CTB's biggest rival, NCS Pearson, might score more student tests - about
one in every two nationwide - but CTB is an industry giant, too, providing
test design as well as scoring. By 1998, nine million students were taking
CTB tests annually, about 40 percent of the market.

Each spring, answer sheets descend on Monterey like a steady rain, with
postmarks from as far away as American military bases in Japan. Once
scored, the results are shipped back to the schools in boxes full of
numbers that are regarded as the definitive educational measure of
children and teachers and schools.

Though CTB's work is widely praised by educators, the company did make two
errors in 1998: one resulted in wrong math scores for a number of Missouri
school districts; the other affected the math scores of a small number of
Florida students who took the company's tests.

Still, as the 1999 testing season began, CTB was the envy of the testing
industry. The company could claim nearly 20 states as customers, all under
contract for several years.

Indiana was one state that believed in CTB, hiring the company to test
about 320,000 students in grades 3, 6, 8 and 10. But when Mr. Kline, the
testing director in Fort Wayne, got his district's scores in early 1999,
he saw that they had plunged unexpectedly.

"I felt sick," he said. "How am I going to explain it to the
superintendent?" Although Indiana did not use the test to promote
students, as many states do, the scores gave politicians and educators a
yardstick to measure student progress. Bad test scores, Mr. Kline knew,
would echo through the city like a tornado warning, causing parents to
worry and teachers to wonder what they had done wrong.

Before releasing the bad news, Mr. Kline called half a dozen other testing
directors to see how they fared. To his surprise, each described nearly
identical drops in scoring. "It was almost unbelievable how similar the
patterns were," Mr. Kline recalled.

It did not make sense, Mr. Kline thought, for so many students in so many
places to fail by nearly the same margin. So he called the testing
company.

CTB officials were not particularly alarmed to hear Mr. Kline's complaint,
because they knew that when test scores drop, the first and easiest
reaction of school officials is to blame the test.

But CTB did agree to look into Indiana's scores, and within days it found
a problem. In trying to compare Indiana students with the rest of the
country, CTB had used an old formula. When the problem was fixed, most
student scores rose, some as much as 10 percentage points.

But Mr. Kline still was not satisfied. He and his colleagues told CTB that
the error did not account for other large, unexplained drops. "Our feeling
was, 'There is still more to it, there's something out there that no one's
been able to explain,'" he said.

By now, Mr. Kline had come to suspect that the scoring drop could be
traced to an arcane area of test design called equating.

This process is necessary so scores one year can be compared with those
from previous years, even if different questions are used. States ask for
new questions because they are worried the old questions will leak out.

CTB told Indiana that its sophisticated software program had ensured that
the current test was comparable, or equated, to the previous year's test.
But just to be sure, the company agreed to take another look. This time,
the company said it found nothing wrong. "Our confidence in the accuracy
of the equating was reconfirmed," CTB told Indiana in a memorandum on Jan.
18, 1999.

CTB even sent its president, Mr. Taggart, to Indiana in early March, to
personally assure educators that the test scores were solid. In a
follow-up letter, though, the company said it was developing "procedures
to improve quality control in the future."

Reluctantly, Fort Wayne distributed the results to its schools, but not
before Mr. Kline had ordered them stamped: "May contain inaccurate
scores."

Then, with no options left, Mr. Kline gave up, assuming he had heard the
last of the matter.

In New York: Unearned Tickets to Summer School

In April, about the time Mr. Kline was conceding his fight, 300,000
students in New York City's public schools were taking their reading and
math tests in grades 3, 5, 6 and 7. Those tests, too, were designed by
CTB. And though many of the multiple-choice questions were different from
Indiana's, both school systems drew some of their questions from the same
versions of the company's flagship test, Terra Nova.

But the New York City Board of Education and its chancellor, Dr. Crew, had
decided to attach a much greater value to CTB's tests than Indiana did.
For the first time that spring, students in grades 3 and 6 were required
to pass CTB's test, or attend summer school. And if they did poorly in
summer school, they would be held back.

Making such decisions based on a single test score violates the testing
industry's standards, and both CTB and city school officials agree that
the company advised the city against putting such a premium on its test.
But the board forged ahead anyway.

Dr. Crew raised the stakes not only for children but also for school
principals and superintendents of the city's 32 neighborhood school
districts. He announced that, for the first time, school officials would
be judged by how well their students did on the CTB tests. Those educators
whose students scored poorly faced the loss of their jobs.

Dr. Crew's future was also at stake. For two years, Dr. Crew had managed
to do something that had eluded his predecessor, Ramon C. Cortines: forge
a warm relationship with Mayor Rudolph W. Giuliani. But that was
changing. The issue: school vouchers.

Mr. Giuliani said he believed that taxpayer money should help finance
private-school tuition for thousands of students who were attending
failing public schools. Dr. Crew disagreed with the mayor, and he did so
publicly.

So long as test scores kept going up, Dr. Crew felt that he could defend
his position. If the scores were bad, Dr. Crew's own job would be on the
line.

When the eagerly awaited reading scores arrived from Monterey in early
May, Mr. Tobias, the New York system's testing director, was among the
first to see them.

The news was not good. As in Indiana, many of the students' scores had
dipped sharply from the previous year - so steeply and uniformly as to
appear improbable, Mr. Tobias thought. Knowing how high the stakes were
this year, Mr. Tobias directed his staff to ask CTB whether it had made a
mistake. The company's response, Mr. Tobias recalls, was as swift as it
was definitive: "We can't find anything wrong."

Mr. Tobias continued to press CTB, eventually calling the company himself
to make an argument the company had already heard: perhaps the tests from
one year to the next were not quite equal. No one told him that he was
echoing Indiana's earlier suspicions.

Still, CTB held firm. "If we were not comfortable, we would have advised
them not to release the data," said Mr. Taggart, CTB's president.

Unsure of what to do, Mr. Tobias held off releasing the results until June
8, the last possible day the scores could be used to make summer-school
assignments.

As the date approached, Mr. Tobias finally told Dr. Crew about his doubts.
Dr. Crew says he seriously considered calling the press to disavow the
results. But as a national spokesman for the movement toward standardized
assessment, Dr. Crew decided his credibility would be lost. He thought he
would be seen as a crybaby.

Mr. Tobias concurred.

"Errors of measurement are a fact of life in this business," Mr. Tobias
said in an interview. "There are times you can explain them. Other times
you just bite the bullet and accept the data as they are."

And so, Dr. Crew summoned reporters to deliver the disappointing news: two
years of progress in reading had apparently stalled.

The mayor said he was "very alarmed and concerned." And Dr. Crew knew he
had some homework to do.

In Tennessee: State Officials Seek Review of Test

Most school districts, including New York City, gauge progress by
comparing students in a particular grade with their predecessors in the
same grade a year earlier. But Tennessee has long used a more
sophisticated approach: it compares a student's test scores as a first
grader with that same student's scores as a second grader, third grader,
and so on through school.

This approach was pioneered and overseen by William Sanders, a longtime
professor at the University of Tennessee, who was curious about how class
size and teaching styles influenced student performance.

In early May 1999, when Professor Sanders received Tennessee's scores from
CTB, he knew from his own data that they could not be right, state testing
officials said. The drops were much too sharp.

Again, state officials recall the company saying not to worry - the scores
were accurate. But Tennessee had something that Indiana and New York City
did not: a treasure trove of data on the performance of actual children
going back six years or more. CTB's results broke patterns in individual
students' scores that had been uninterrupted for years.

Professor Sanders was so insistent that there was a problem that he told
the company he would call a news conference to challenge the results,
Tennessee school officials said.

Then CTB did something that it would not do in any other state: it simply
raised the comparative rankings of many Tennessee students, and lowered
some others, to conform with Professor Sanders's statistical models - even
though the company could find no error to justify those changes.

The company made this adjustment in late May or early June, just as it was
assuring New York City that its results were correct.

CTB did not tell any of its other customers what it had done for
Tennessee. CTB considers its relationship with each state or district to
be confidential, even if the products that state uses are similar to
others, said Mr. Taggart, the company president.

Moreover, Mr. Taggart said, CTB's researchers had not yet detected any
similarity in the complaints from New York City, Tennessee, Indiana and
another state, Nevada, which had contacted the company around the same
time. Finding a common thread was difficult, Mr. Taggart said, because
each had used a customized version of the same basic test.

But after certifying New York City's results as accurate, and altering
Tennessee's results, CTB began to have its own doubts, the company now
says. In June and into July, unbeknown to its customers, CTB assigned an
army of researchers to investigate its results.

The Results: School Districts Cope With Falling Scores

While CTB stepped up its inquiry, its clients were dealing with the
consequences of the test results they had been given.

In Tennessee, the adjusted results were not distributed to teachers and
principals until late summer, too late to play their customary role in
many districts' decisions on summer school or student promotion.

In Indiana, the districts' very public concerns about the accuracy of the
scores led teachers and principals to be wary about how much stock, if
any, to put in those numbers. And so, educators there grew reluctant to
use the test results to shape their lesson plans.

Nevada had voiced similar concerns to CTB. But state education officials
nonetheless moved forward, branding a handful of schools as "inadequate"
based on their poor scores. One of them was Cambeiro Elementary, in the
shadow of the Las Vegas strip, which was put under the supervision of a
state oversight panel and awarded over $100,000 for remedial programs.
School administrators felt more than a little humiliation.

"At bowling night and at church," Cenie Nelson, the school principal,
said, "teachers were asked by other teachers and friends, 'Why would you
want to be associated with a school not doing a good job?'"

But nowhere did CTB's scores have more impact than in New York City. Based
solely on their performance on the test, Dr. Crew immediately ordered
nearly 40,000 third and sixth graders to attend summer school.

"Your child must attend summer school," the superintendent in one district
wrote to parents. "We feel that your child would benefit from this
enriching experience."

Two weeks after releasing the test results, Dr. Crew took direct control
of 43 failing schools, saying he intended to fire many of their
principals. He also fired or eased out 5 of the 32 superintendents who
preside over the city's neighborhood school districts, citing their
failures as leaders as well as their students' test scores.

One of them was Robert Riccobono, then 54, who had brought rigorous
literacy programs to one of the poorest districts in the city, No. 19 in
East New York, Brooklyn. After four years as superintendent, Mr. Riccobono
says, his efforts were starting to bear fruit when Dr. Crew fired him.

"Giuliani was talking tough," Mr. Riccobono said. "Crew felt the need to
find victims."

The day after Dr. Crew announced his firing at a news conference broadcast
live on local television, Mr. Riccobono attended his son's graduation from
high school.

"I felt singled out and embarrassed," said Mr. Riccobono, who had known
teachers at the school for a decade. "I was wondering where I had gone
wrong."

The Inquiry: An Error Is Found Deep in the Software

While New York City was firing administrators and disrupting the summer
vacations of students and teachers, CTB was closing in on evidence that
would undermine those very decisions.

The company's focus was again on the equating process, which allows test
scores to be compared year over year.

As it turned out, CTB - despite its assurances to Indiana and others - had
done an incomplete job of reviewing test data. When a much larger sample
was reviewed, a programming error surfaced.

The error had wrongly made the current test appear easier than the
previous year's. To make the tests equal in difficulty, the computer had
then compensated by making it harder for some students to do as well as
they had last time. The error did not change students' right and wrong
answers, but it did affect their comparative percentile scores.

On July 20, Wendy Yen, then the vice president of research for CTB, walked
into the office of Mr. Taggart, the company president, and announced, "We
have found something."

Mr. Taggart decided not to tell schools just yet about the problem,
because, he says, he did not yet know how bad it was. "Would it be a
positive impact, a negative impact, no impact?" Mr. Taggart said.

At the time the company found the error, New York City's students were
just two weeks into a monthlong summer-school program, sweltering in a
heat wave. Even classrooms with air-conditioners routinely registered 90
degrees on indoor thermometers.

Dr. Crew would later say that had he known what CTB knew - no matter how
tentative - "we could have corrected the action midstream, and not put
families through all that torment."

A month later, on Aug. 24, after summer school had ended, Mr. Taggart
traveled to New York City to hear, in person, the city's lingering
concerns about the spring results.

"We're the largest school system in the country," Dr. Crew recalls saying.
"You have got to get this right with us."

Again, Mr. Taggart promised to look into the city's complaints. And again,
he did not tell them what he knew about the error.

Mr. Taggart had more to say when he called Mr. Tobias, the city's testing
director, on the first day of school, Sept. 9, 1999.

"We have done further analysis into your concerns about the scoring," Mr.
Tobias recalls being told. "And we have found a problem."

"It's a small problem," Mr. Tobias remembers the company president saying.
"We don't believe it's going to have a huge impact on your scores."

Mr. Tobias quickly did a few calculations of his own.

It seemed, at first, that 3,000 students who had been sent to summer
school in June had in fact scored well enough to have spent the summer as
they wished. That number eventually grew to nearly 9,000 - almost a
quarter of the mandatory summer-school roster.

So much for "a small problem," Mr. Tobias thought.

But the real shock came when school officials learned what the corrected
test scores meant for the entire city. Instead of stagnating, the
citywide average reading score had actually risen five percentage
points - a substantial jump, particularly for an urban school
district.
