[Edu-sig] Python Programming: Procedural Online Test

Laura Creighton lac at strakt.com
Tue Dec 6 10:48:04 CET 2005


Interesting.  When the Law Society of Upper Canada was working
on CAI for teaching tax law to law students, they found that
the percentage correct they set had a definite, measurable effect
on the amount of time the students were willing to drill.

At above 85%, the students believed they knew everything and
did not drill much.  Some of these students were correct in their
estimation.  But if you went below 75% correct, the students
believed that the computer system was at fault, and that doing
the drill was a waste of time.

This matched the students' own assessment of their abilities
perfectly:  we asked them ahead of time how well they thought
they ought to be doing, and they all said that between 70% and
90% was about how well 'good students should do on tests'.
And LSUC does not _have_ any poor students.

When they were given material to learn for the first time, they
tolerated significantly worse performance from themselves before
they concluded that the effort was a waste of time.  Thus
one of the things we tried to measure was the effect that 'what
percentage correct' had on the actual amount of tax law learned.

And the result was unequivocal.  It is better to give each student a
percent correct that is slightly, but not substantially, higher than
their own estimate of their abilities, and always better than 70% in
any case.  The poorer students would drill for 16 or 18 hours, and
the best ones would only take 4, but by the end of the exercise
everybody was scoring pretty well on tax questions.
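
If you wanted to bake that rule into a drill program, a toy sketch in
Python might look like the following; the 5% margin and the 95% ceiling
are invented for illustration, not numbers from the LSUC study:

    def target_percent_correct(self_estimate):
        # Aim slightly, but not substantially, above the student's own
        # estimate of how well they ought to do, and never drop below 70%.
        return max(0.70, min(self_estimate + 0.05, 0.95))

    print(target_percent_correct(0.80))   # about 0.85
    print(target_percent_correct(0.60))   # 0.70 - the floor kicks in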

Laura


In a message of Mon, 05 Dec 2005 20:19:20 CST, "damon bryant" writes:
>
>Hi Rodrigo!
>
>>If I understood correctly the proposal is to give a "hard"-A for some
>>and an "easy"-A for others, so everybody has A's (A=='good score'). Is that it?
>
>No, students are not receiving a hard A or an easy A. I make no
>classifications such as those you propose. My point is that questions are
>placed on the same scale as the ability being measured (called a theta
>scale). Grades may be mapped to the scale, but a hard A or an easy A
>will not be assigned under the conditions described.
>
>Because all questions in the item bank have been linked, two students can
>take the same computer adaptive test but have no items in common between the
>two administrations. However, scores are on the same scale. Research has
>shown that even low ability students, despite their performance, prefer
>computer adaptive tests over static fixed-length tests. It has also been
>shown to lower test anxiety while serving the same purpose as fixed-length
>linear tests, in that educators are able to extract the same level of
>information about student achievement or aptitude without banging a
>student's head up against questions that he/she may have a very low
>probability of getting correct. The high ability students, instead of being
>bored, are receiving questions on the higher end of the theta scale that are
>appropriately matched to their ability in order to challenge them.
>
>>That sounds like sweeping the dirt under the carpet. Students will know.
>>We have to prepare them to tackle failure as well as success.
>
>In fact, computer adaptive tests are designed to administer items to a person
>of a SPECIFIC ability that will yield a 50/50 chance of correctly responding.
>For example, there are two examinees: Examinee A has a true theta of -1.5,
>and Examinee B has a true theta of 1.5. The theta scale has a typical range
>of -3 to 3. Suppose a question has been mapped to the theta scale and has a
>difficulty value of 1.5; how we estimate this is beyond our discussion, but
>it is relatively easy to do with Python. The item is appropriately matched
>for Examinee B because s/he has approximately a 50% chance of getting this
>one right - not a very high chance or a very low chance of getting it
>correct, but an equi-probable opportunity of either a success or a failure.
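
The 50/50 behaviour damon describes falls straight out of the simplest IRT
model, the one-parameter logistic (Rasch) model. A minimal sketch in Python,
assuming that model - his platform may well use a different parameterization,
and the names here are only illustrative:

    import math

    def p_correct(theta, difficulty):
        # Rasch / 1PL model: the probability of a correct response depends
        # only on the gap between ability (theta) and item difficulty.
        return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

    print(p_correct(1.5, 1.5))    # Examinee B: 0.5, an even chance
    print(p_correct(-1.5, 1.5))   # Examinee A: ~0.05, very likely to fail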
>
>According to sampling theory, with multiple administrations of this item to
>a population of persons with a theta of 1.5, there will be an approximately
>equal number of successes and failures on this item, because the odds of
>getting it correct vs. incorrect are equal. However, with multiple
>administrations of this same item to a population of examinees with a theta
>of -1.5, which is substantially lower than 1.5, there will be exceedingly
>more failures than successes. Adaptive test algorithms seek to maximize
>information about examinees by estimating their ability and searching for
>questions in the item bank that match their ability levels, thus providing
>a 50/50 chance of getting it right.
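
The sampling-theory claim and the item-selection step are just as easy to
mock up. A rough sketch, again assuming the Rasch model above; a real
adaptive algorithm would re-estimate theta after every response, which is
skipped here, and the item bank is made up:

    import math, random

    def p_correct(theta, difficulty):
        return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

    def simulate(theta, difficulty, n=10000):
        # Administer the same item n times to examinees of the given ability
        # and report the observed proportion of successes.
        wins = sum(1 for _ in range(n)
                   if random.random() < p_correct(theta, difficulty))
        return wins / float(n)

    print(simulate(1.5, 1.5))     # about 0.50: successes and failures even out
    print(simulate(-1.5, 1.5))    # about 0.05: far more failures than successes

    def next_item(estimated_theta, bank):
        # Under the Rasch model an item is most informative when its
        # difficulty equals the current ability estimate, so pick the
        # closest difficulty in the bank.
        return min(bank, key=lambda b: abs(b - estimated_theta))

    print(next_item(1.5, [-2.0, -1.0, 0.0, 1.0, 1.4, 2.5]))   # -> 1.4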
>
>This is very different from administering a test where the professor seeks
>to have an average score of 50%, because low ability students will get the
>vast majority of questions wrong, which could potentially increase anxiety,
>decrease self-efficacy, and lower the chance of acquiring information in
>subsequent teaching sessions (Bandura, self-regulation). Adaptive testing is
>able to mitigate the psychological influences of testing on examinees by
>seeking to provide equal opportunities for both high and low ability students
>to experience success and failure to the same degree, by giving them items
>that are appropriately matched to their skill level. This is the aspect of
>adaptive testing that is attractive to me. It may not solve the problem, but
>it is a way of using technology to move in the right direction. I hope this
>is a better explanation than what I provided earlier.
>
>
>
>>From: Rodrigo Senra <rsenra at acm.org>
>>To: edu-sig at python.org
>>Subject: Re: [Edu-sig] Python Programming: Procedural Online Test
>>Date: Mon, 5 Dec 2005 19:53:00 -0200
>>
>>
>>On 5 Dec 2005, at 7:50 AM, damon bryant wrote:
>>
>> > One of the main reasons I decided to use an Item Response Theory (IRT)
>> > framework was that the testing platform, once fully operational, will not
>> > give students questions that are either too easy or too difficult for
>> > them, thus reducing anxiety and boredom for low and high ability students,
>> > respectively. In other words, high ability students will be challenged
>> > with more difficult questions and low ability students will receive
>> > questions that are challenging but matched to their ability.
>>
>>So far so good...
>>
>> > Each score is on the same scale, although some students will not
>> > receive the same questions. This is the beautiful thing!
>>
>>I'd like to respectfully disagree. I'm afraid that would cause more
>>harm than good. One side of student evaluation is to give feedback *for*
>>the students. That is a relative measure: his/her performance against
>>his/her peers.
>>
>>If I understood correctly the proposal is to give a "hard"-A for some
>>and an "easy"-A for others, so everybody has A's (A=='good score'). Is that it?
>>That sounds like sweeping the dirt under the carpet. Students will know.
>>We have to prepare them to tackle failure as well as success.
>>
>>I do not mean such efforts are not worthy, quite the reverse. But I
>>strongly disagree with an adaptive scale. There should be a single scale
>>for the whole spectrum of tests. If some students excel, their results
>>must show it, and if some students perform poorly, that should not be
>>hidden from them. Give them a goal and the means to pursue their goal.
>>
>>If I got your proposal all wrong, I apologize ;o)
>>
>>best regards,
>>Senra
>>
>>
>>Rodrigo Senra
>>______________
>>rsenra @ acm.org
>>http://rodrigo.senra.nom.br
>>
>>
>>
>>
>
>
>_______________________________________________
>Edu-sig mailing list
>Edu-sig at python.org
>http://mail.python.org/mailman/listinfo/edu-sig

