Re: [Edu-sig] Python Programming: Procedural Online Test

Kirby: Thank you for your feedback! You completed the Declarative measure. I am also interested in your feedback on the Procedural test, in which Python application or procedural questions are administered. Questions on this part are coded and displayed as they appear in IDLE, with highlighted keywords and indentation for nested code. I think the longest code problem is about 20 lines.

I appreciate your comment on the ability to specify font type/size, because I'm currently working to accommodate persons with disabilities and others who may have difficulty viewing the text.

Linda Grandel and I have an experimental site that we use for research and educational purposes; we are going to trial some Python questions for a class of her colleagues, but are having some difficulty translating the testing in such a short period. This is a long-term project. Our goal is to develop a worldwide database of Python test norms in an effort to track progress on the spread and proficiency of the language in different countries. Although it is a great idea, it is too large for a dissertation research project. If you are interested in trialing it with your class, perhaps we can collaborate.

You did notice that towards the end the questions got easier for you. The test algorithm is adaptive, but the question bank from which the items are pulled is not that large. When you began the test, you were presented with the items that were most appropriate for you. As you got more items correct you got harder questions; in contrast, if you had initially gotten questions incorrect, you would have received easier ones. Because the bank is so small (I do plan to expand it when I have more time on my hands), you exhausted the bank of difficult questions and began to receive easier items. The opposite would have happened to an examinee of very low ability.

My goal is to administer a computer adaptive Python test where examinees only receive the questions that are most appropriate for them; in other words, different examinees are tested according to their ability. This goes back to Binet's idea of tailored testing, where the psychologist administering an intelligence test would choose items for an examinee based on previous responses. In the present case it is done by computer, using an artificially intelligent algorithm based on my dissertation. By expanding the question bank, I'll be able to reach that goal.
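As a rough illustration of that behavior (not the actual test algorithm), here is a small Python sketch: the difficulty target steps up after a correct answer and down after a miss, and once the hard end of a small, made-up item bank is exhausted the selector can only hand out easier items.

    import random

    # Hypothetical item bank: (prompt, difficulty) pairs, difficulty 1 (easy) to 5 (hard).
    bank = [(f"question {i}", d) for i, d in enumerate([1, 1, 2, 2, 3, 3, 4, 5])]

    def next_item(bank, target):
        """Pick the unused item whose difficulty is closest to the target level."""
        if not bank:
            return None
        item = min(bank, key=lambda q: abs(q[1] - target))
        bank.remove(item)
        return item

    target = 3                                   # start in the middle of the range
    while bank:
        prompt, difficulty = next_item(bank, target)
        correct = random.random() < 0.7          # stand-in for the examinee's response
        print(prompt, "difficulty", difficulty, "correct" if correct else "wrong")
        # Step the target up after a correct answer, down after a miss (bounded 1..5).
        target = min(5, target + 1) if correct else max(1, target - 1)
        # Once the hard items are used up, min() above can only return easier ones,
        # which is why the tail of the test felt easier.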
From: "Kirby Urner" <urnerk@qwest.net> To: "'damon bryant'" <damonbryant@msn.com>, vceder@canterburyschool.org CC: edu-sig@python.org Subject: RE: [Edu-sig] Python Programming: Procedural Online Test Date: Sat, 3 Dec 2005 07:44:32 -0800
I've tweaked it so that all other browser and OS combinations can access the computer adaptive tests. Performance may be unpredictable, though.
Damon
OK, thanks. Worked with no problems.
As an administrator, I'd be curious to get the actual text of missed problems (maybe via URL), not just a raw percentage (I got 90% i.e. 2 wrong -- probably the one about getting current working directory, not sure which other).
The problems seemed to get much easier in the last 5 or so (very basic syntax questions). The one about "James"=="james" returning -1 is no longer true on some Pythons (as now we have boolean True).
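For reference, the comparison behaves like this in a modern Python (the -1 result belonged to Python 2's cmp(), which ordered the strings; == itself has returned a bool since the bool type was added):

    # Modern Python: == is a boolean test, not an ordering.
    print("James" == "james")          # False (a bool, not 0 or -1)
    print("James" < "james")           # True: "J" (ord 74) sorts before "j" (ord 106)

    # Python 2's cmp("James", "james") returned -1; an equivalent in Python 3:
    cmp = lambda a, b: (a > b) - (a < b)
    print(cmp("James", "james"))       # -1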
The font used to pose the questions was a little distracting. I vastly prefer fixed width fonts when programming. I know that's a personal preference (some actually like variable pitch -- blech). Perhaps as a future enhancement, you could let the user customize the font?
Anyway, a useful service. I could see teachers like me wanting to use this with our classes.
Thank you for giving me this opportunity.
Kirby


Scott: I will attempt to incorporate your suggestion of keeping track of performance; I'll need to create some attributes on the examinee objects to hold past test scores created within the system.

I am, however, approaching the scoring differently. Although I do report percentage correct, I'm using Item Response Theory to (1) score each question, (2) estimate ability using a Bayesian algorithm based on maximum likelihood, (3) estimate the error in the ability estimate, and (4) select the most appropriate question to administer next. This is very similar to what is done at the Educational Testing Service in Princeton with the computer adaptive versions of the SAT and the GRE. I don't know the language used to develop their platform, but this demo is developed in Python, using numarray and multithreading to widen the bottlenecks and speed the delivery of test questions served as HTML to the client's page.

Thanks for your comments! By the way, I am looking for teachers, preferably middle and high school, who would be willing to trial the system. I have another site where they will be able to enroll students, monitor testing status, and view scores for all students. Do you know of any?
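This is not the demo's actual engine, but a minimal sketch of the four steps above using NumPy (numarray's successor) and a two-parameter logistic IRT model; the item parameters and response pattern are invented, and a grid-search MAP estimate stands in for whatever Bayesian/maximum-likelihood routine the real system uses.

    import numpy as np

    # Hypothetical item bank: discrimination (a) and difficulty (b) for a 2PL model.
    items = np.array([[1.2, -1.0], [0.8, -0.5], [1.5, 0.0], [1.0, 0.7], [1.3, 1.5]])

    def p_correct(theta, a, b):
        """2PL probability of a correct response at ability theta."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def estimate_theta(responses, administered, grid=np.linspace(-4, 4, 401)):
        """Grid-based Bayesian (MAP-style) ability estimate with a N(0,1) prior."""
        log_post = -0.5 * grid**2                      # log of the standard normal prior
        for (a, b), correct in zip(administered, responses):
            p = p_correct(grid, a, b)
            log_post += np.log(p if correct else 1.0 - p)
        return grid[np.argmax(log_post)]

    def next_item(theta, remaining):
        """Pick the remaining item with maximum Fisher information at theta."""
        a, b = remaining[:, 0], remaining[:, 1]
        p = p_correct(theta, a, b)
        info = a**2 * p * (1.0 - p)
        return int(np.argmax(info))

    # Simulated session: the answer pattern is made up; a real test uses the examinee's responses.
    theta, administered, responses = 0.0, [], []
    remaining = items.copy()
    for answered_correctly in [True, True, False, True]:
        idx = next_item(theta, remaining)
        administered.append(tuple(remaining[idx]))
        responses.append(answered_correctly)
        remaining = np.delete(remaining, idx, axis=0)
        theta = estimate_theta(responses, administered)
        print(f"estimated ability: {theta:+.2f}")

With maximum-information selection, the examinee tends to see the most discriminating items near his or her current ability estimate first, which is the "most appropriate question" behavior described above.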
From: Scott David Daniels <Scott.Daniels@Acm.Org>
To: edu-sig@python.org
Subject: Re: [Edu-sig] Python Programming: Procedural Online Test
Date: Sat, 03 Dec 2005 12:03:06 -0800
damon bryant wrote: As you got more items correct you got harder questions. In contrast, if you initially got questions incorrect, you would have received easier questions....

In the 70s there was research on such systems (keeping people at 80% correct is a great rule-of-thumb goal). See work done at Stanford's Institute for Mathematical Studies in the Social Sciences. At IMSSS we did lots of this kind of stuff. We generally broke the skills into strands (separate concepts) and kept track of the student's performance in each strand separately (try it; it helps). BIP (Basic Instructional Program) was an ONR (Office of Naval Research) sponsored system that tried to teach "programming in Basic." The BIP model (and often the "standard" IMSSS model) was to score every task in each strand and find the "best" one for the student based on his current position. For arithmetic, we actually generated problems based on the different desired strand properties; nobody was clever enough to generate software problems, so we simply consulted our DB. We taught how to do proofs in Logic and Set Theory using some of these techniques. Names to look for on papers from the 70s-80s include Patrick Suppes (head of one side of IMSSS), Richard Atkinson (head of the other side), Barbara Searle, Avron Barr, and Marian Beard. These are not the only people who worked there, but a number I recall that should help you find the research publications (try Google Scholar).
A follow-on for some of this work is: http://www-epgy.stanford.edu/
I worked there "back in the day" and was quite proud to be a part of some of that work.
--Scott David Daniels Scott.Daniels@Acm.Org
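The arithmetic-generation idea above (problems built to order from the desired strand properties) might look something like this in Python terms; the strands, ranges, and difficulty rule here are invented for illustration, not the IMSSS implementation.

    import random

    # Hypothetical strands with a crude difficulty knob: bigger operands = harder.
    def make_problem(strand, level):
        """Generate one arithmetic problem in the given strand at roughly the given level."""
        hi = 10 ** level                          # level 1 -> single digits, level 2 -> tens, ...
        x, y = random.randint(1, hi), random.randint(1, hi)
        if strand == "addition":
            return f"{x} + {y} = ?", x + y
        if strand == "subtraction":
            x, y = max(x, y), min(x, y)           # keep answers non-negative for young students
            return f"{x} - {y} = ?", x - y
        if strand == "multiplication":
            return f"{x} * {y} = ?", x * y
        raise ValueError(f"unknown strand: {strand}")

    # One problem per strand at level 2, the way a strand-based drill might interleave them.
    for strand in ("addition", "subtraction", "multiplication"):
        prompt, answer = make_problem(strand, level=2)
        print(strand, prompt, "answer:", answer)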

I wrote:
... keeping people at 80% correct is a great rule-of-thumb goal ...
To elaborate on the statement above a bit: we did drill-and-practice teaching (and had students loving it). The value of the 80% is for maximal learning. Something like 50% is the best for measurement theory (but discourages the student drastically). In graduate school I had one instructor who tried to target his tests to get 50% as the average mark. It was incredibly discouraging for most of the students (I eventually came to be OK with it, but it took half the course).

The hardest part to create is the courseware (including questions); the second-hardest effort is scoring the questions (rating the difficulty in all applicable strands). The software to deliver the questions was, in many senses, a less labor-intensive task (especially when amortized over a number of courses). I think we came up with at least a ten-to-one ratio (may have been higher, but definitely not lower) in effort compared to the new prep for a course by an instructor.

I am (and was) a programming, rather than an education, guy. I do not know the education theory behind our research well, but I know how a lot of the code worked (and know where some of our papers went). We kept an exponentially decaying model of the student's ability in each "strand" and used that to help the estimate of his score in the coming question "cloud." A simplified version of the same approach would be to have strand-specific questions, randomly pick a strand, and pick the "best" question for that student in that strand. Or, you could bias the choices between strands to give more balanced progress (increasing the probability of work where the student is weakest).

--Scott David Daniels Scott.Daniels@Acm.Org
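A small sketch of the two ideas in that last paragraph, with invented numbers: an exponentially decaying per-strand estimate of performance, plus a strand choice biased toward where the student is weakest.

    import random

    DECAY = 0.8   # weight on the old estimate; 1 - DECAY is the weight on the newest result

    # Hypothetical running estimates of proportion-correct in each strand.
    strand_scores = {"syntax": 0.9, "control flow": 0.6, "file I/O": 0.4}

    def update(strand, correct):
        """Exponentially decaying estimate: recent results count more than old ones."""
        strand_scores[strand] = DECAY * strand_scores[strand] + (1 - DECAY) * (1.0 if correct else 0.0)

    def pick_strand():
        """Randomly pick a strand, biased toward the strands where the student is weakest."""
        weights = [1.0 - score for score in strand_scores.values()]
        return random.choices(list(strand_scores), weights=weights, k=1)[0]

    strand = pick_strand()
    print("next question drawn from:", strand)
    update(strand, correct=True)
    print("updated estimates:", strand_scores)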