Re: [Edu-sig] Python Programming: Procedural Online Test
Interesting. When the Law Society of Upper Canada was working on CAI for teaching tax law to law students, they found that the percentage correct they set had a definite, measurable effect on the amount of time the students were willing to drill. Above 85%, the students believed they knew everything and did not drill much. Some of these students were correct in their estimations. But if you went below 75% correct, the students believed that the computer system was at fault and that doing the drill was a waste of time.

This matched the students' own assessments of their abilities perfectly: we asked them ahead of time how well they thought they ought to be doing, and they all thought between 70% and 90% was about how well 'good students should do on tests'. And LSUC does not _have_ any poor students. When they were given things to learn for the first time, they were able to tolerate significantly worse performance from themselves before they concluded that the effort was a waste of time.

Thus one of the things we tried to measure was the effect that 'what percentage correct' had on the actual amount of tax law learned. The result was unequivocal. It is better to give each student a percent correct that is slightly, but not substantially, higher than their own estimate of their abilities, and always better than 70% in any case. The poorer students will drill for 16 or 18 hours and the best ones take only 4, but by the end of the exercise everybody was scoring pretty well on tax questions.

Laura

In a message of Mon, 05 Dec 2005 20:19:20 CST, "damon bryant" writes:
Hi Rodrigo!
If I understood correctly, the proposal is to give a "hard"-A for some and an "easy"-A for others, so everybody has A's (A=='good score'). Is that it?
No, students are not receiving a hard A or an easy A. I make no classifications such as those you propose. My point is that questions are placed on the same scale as the ability being measured (called a theta scale). Grades may be mapped to the scale, but a hard A or an easy A will not be assigned under the conditions described above.
Because all questions in the item bank have been linked, two students can take the same computer adaptive test but have no items in common between the two administrations. However, scores are on the same scale. Research has shown that even low ability students, despite their performance, prefer computer adaptive tests over static fixed-length tests. It has also been shown to lower test anxiety while serving the same purpose as fixed-length linear tests in that educators are able to extract the same level of information about student achievement or aptitude without banging a student's head up against questions that he/she may have a very low probability of getting correct. The high ability students, instead of being bored, are receiving questions on the higher end of the theta scale that are appropriately matched to their ability to challenge them.
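To make the 'same scale' idea concrete, here is a minimal sketch in Python (a toy example, not the actual system: it assumes the one-parameter Rasch model and already-calibrated item difficulties). Two examinees answer completely different items, yet a crude maximum-likelihood search puts both ability estimates on the same theta scale:

    import math

    def p_correct(theta, b):
        # Rasch (one-parameter logistic) model: probability that an
        # examinee with ability theta answers an item of difficulty b.
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def estimate_theta(responses):
        # responses: list of (item difficulty, 1 if correct else 0).
        # Crude maximum-likelihood estimate via grid search over the
        # usual -3..3 range; real code would use Newton-Raphson or a
        # Bayesian estimator.
        def loglik(theta):
            return sum(math.log(p_correct(theta, b)) if right
                       else math.log(1.0 - p_correct(theta, b))
                       for b, right in responses)
        grid = [x / 100.0 for x in range(-300, 301)]
        return max(grid, key=loglik)

    # No items in common, but both estimates land on the same theta scale
    # because the item difficulties were linked beforehand.
    student_a = [(-2.0, 1), (-1.0, 1), (0.0, 0), (0.5, 0)]
    student_b = [(0.5, 1), (1.0, 1), (2.0, 0), (2.5, 1)]
    print(estimate_theta(student_a), estimate_theta(student_b))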
That sounds like sweeping the dirt under the carpet. Students will know. We have to prepare them to tackle failure as well as success.
In fact, computer adaptive tests are designed to administer items to a person of a SPECIFIC ability that will yield a 50/50 chance of correctly responding. For example, there are two examinees: Examinee A has a true theta of -1.5, and Examinee B has a true theta of 1.5. The theta scale has a typical range of -3 to 3. Suppose a question has been mapped to the theta scale with a difficulty value of 1.5; how we estimate this is beyond our discussion, but it is relatively easy to do with Python. The item is appropriately matched for Examinee B because s/he has approximately a 50% chance of getting this one right - not a very high chance or a very low chance of getting it correct, but an equi-probable opportunity of either a success or a failure.
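To put numbers on that example, here is a minimal sketch (assuming the one-parameter Rasch model; calibrating the difficulty values is the part left out):

    import math

    def p_correct(theta, difficulty):
        # Rasch model: P(correct) = 1 / (1 + exp(-(theta - difficulty)))
        return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

    # Examinee B (theta = 1.5) on an item of difficulty 1.5: about 50%.
    print(p_correct(1.5, 1.5))    # 0.5
    # Examinee A (theta = -1.5) on the same item: a very low chance.
    print(p_correct(-1.5, 1.5))   # ~0.05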
According to sampling theory, with multiple administrations of this item to a population of persons with a theta of 1.5, there will be an approximately equal number of successes and failures on this item, because the odds of getting it correct vs. incorrect are equal. However, with multiple administrations of this same item to a population of examinees with a theta of -1.5, which is substantially lower than 1.5, there will be exceedingly more failures than successes. Adaptive test algorithms seek to maximize information about examinees by estimating their ability and searching for questions in the item bank that match their ability levels, thus providing a 50/50 chance of getting it right.
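As a rough sketch of that selection step (again a toy example, not the production algorithm; item dictionaries with a single difficulty parameter 'b' are assumed), under the Rasch model the most informative unused item is simply the one whose difficulty is closest to the current ability estimate:

    def next_item(theta_estimate, item_bank, administered_ids):
        # Under the Rasch model, item information peaks when difficulty
        # equals ability, so pick the unused item whose difficulty 'b'
        # is closest to the current estimate of theta.
        candidates = [item for item in item_bank
                      if item['id'] not in administered_ids]
        return min(candidates, key=lambda item: abs(item['b'] - theta_estimate))

    bank = [{'id': 1, 'b': -2.0}, {'id': 2, 'b': -0.5},
            {'id': 3, 'b': 0.0}, {'id': 4, 'b': 1.4}, {'id': 5, 'b': 2.5}]
    print(next_item(1.5, bank, administered_ids=set()))  # item 4 (b = 1.4)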
This is very different from administering a test where the professor seeks to have an average score of 50%, because low ability students will get the vast majority of questions wrong, which could potentially increase anxiety, decrease self-efficacy, and lower the chance of acquiring information in subsequent teaching sessions (Bandura, self-regulation). Adaptive testing is able to mitigate the psychological influences of testing on examinees by seeking to provide equal opportunities for both high and low ability students to experience success and failure to the same degree, by giving them items that are appropriately matched to their skill level. This is the aspect of adaptive testing that is attractive to me. It may not solve the problem, but it is a way of using technology to move in the right direction. I hope this is a better explanation than what I provided earlier.
From: Rodrigo Senra <rsenra@acm.org>
To: edu-sig@python.org
Subject: Re: [Edu-sig] Python Programming: Procedural Online Test
Date: Mon, 5 Dec 2005 19:53:00 -0200
On 5 Dec 2005, at 7:50 AM, damon bryant wrote:
One of the main reasons I decided to use an Item Response Theory (IRT) framework was that the testing platform, once fully operational, will not give students questions that are either too easy or too difficult for them, thus reducing anxiety and boredom for low and high ability students, respectively. In other words, high ability students will be challenged with more difficult questions and low ability students will receive questions that are challenging but matched to their ability.
So far so good...
Each score is on the same scale, although some students will not receive the same questions. This is the beautiful thing!
I'd like to respectfully disagree. I'm afraid that would cause more harm than good. One side of student evaluation is to give feedback *for* the students. That is a relative measure, his/her performance against his/her peers.
If I understood correctly, the proposal is to give a "hard"-A for some and an "easy"-A for others, so everybody has A's (A=='good score'). Is that it? That sounds like sweeping the dirt under the carpet. Students will know. We have to prepare them to tackle failure as well as success.
I do not mean such efforts are not worthy, quite the reverse. But I strongly disagree with an adaptive scale. There should be a single scale for the whole spectrum of tests. If some students excel, their results must show this; and if some students perform poorly, that should not be hidden from them. Give them a goal and the means to pursue their goal.
If I got your proposal all wrong, I apologize ;o)
best regards, Senra
Rodrigo Senra ______________ rsenra @ acm.org http://rodrigo.senra.nom.br
_______________________________________________ Edu-sig mailing list Edu-sig@python.org http://mail.python.org/mailman/listinfo/edu-sig