#H809 Issues with Student Experience Surveys

Analysing the Ardalan et al. paper, which compares students’ responses to paper-based and online course evaluation surveys, for TMA03 made me look at a paper by Mantz Yorke (Yorke, 2009) that empirically analyses the effect of some design elements in student experience surveys. The paper is worthwhile for its extensive literature overview alone, covering research findings and the underlying psychological constructs that attempt to explain them.

Schematic overview of Yorke (2009) paper

In the empirical part of the paper the author looks at four research questions:

  1. Does the directionality of the presentation of a set of response options (‘strongly agree’ to ‘strongly disagree’, and vice versa) affect the responses?
  2. When there are negatively stated items, does the type of negativity affect the outcome?
  3. Does using solely positively stated items produce a different response pattern from a mixture of positively and negatively stated items?
  4. Does having negatively stated items in the early part of a questionnaire produce a different pattern of responses than when such items are left until later in the instrument?
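
Questions 2 and 3 hinge on how negatively stated items are handled in the analysis. As a minimal sketch of what that usually involves in practice (the item wordings, responses and the 5-point scale below are invented purely for illustration), negatively stated items are typically reverse-coded so that all items point in the same direction before a scale mean is computed:

```python
# Minimal sketch: reverse-coding negatively stated Likert items before
# computing a scale mean. Item wordings and responses are invented for
# illustration; a 5-point scale (1 = strongly disagree ... 5 = strongly agree)
# is assumed.

responses = {
    "The course was well organised": 4,              # positively stated
    "The feedback was NOT returned promptly": 2,     # negatively stated
    "I would recommend this course to others": 5,    # positively stated
}

NEGATIVE_ITEMS = {"The feedback was NOT returned promptly"}
SCALE_MAX = 5

def recode(item: str, value: int) -> int:
    """Flip the scale for negatively stated items (1<->5, 2<->4, 3<->3)."""
    return SCALE_MAX + 1 - value if item in NEGATIVE_ITEMS else value

recoded = {item: recode(item, value) for item, value in responses.items()}
scale_mean = sum(recoded.values()) / len(recoded)

print(recoded)                           # negatively stated item flipped: 2 -> 4
print(f"scale mean: {scale_mean:.2f}")   # 4.33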
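```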

Despite the lack of statistically significant findings, the author writes:

‘Statistically non-significant findings seem often to be treated as if they were of no practical significance. The investigations reported in this article do, however, have a practical significance even though very little of statistical significance emerged’ (Yorke, 2009, p.734).

That practical significance lies in prompting reflection on how such surveys are designed and used. The nature of this reflection will depend on the context, such as the purpose of the survey (formative vs. summative) and the local culture (Berkvens, 2012). The author offers a rich overview of items that should be part of such a reflection and discusses explanatory frameworks from psychology. Unlike the Ardalan paper, this attempt to explain findings through psychological theory moves the paper beyond mere correlations and gives it causal and predictive value.


#H809 Validity and Reliability

Validity and reliability are two key terms in H809, originally introduced by Campbell and Stanley (1963) and often confused. Validity is itself a contested term, with a variety of category schemes devised over the years. Below is a scheme summarizing the two terms, based on references recommended in the course text.

Apart from focusing on validity, reliability and their sub-categories, the course texts suggest using a list of critical questions to evaluate research findings, such as:

  • Does the study discuss how the findings are generalisable to other contexts?
  • Does the study show correlations or causal relationships?
  • Does the study use an underlying theoretical framework to predict and explain findings?
  • How strong is the evidence? (in terms of statistical significance, triangulation of methods, sample size…)
  • Are there alternative explanations?

Scheme summarizing validity and reliability, based on Trochim (2007)
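
As a toy illustration of the ‘how strong is the evidence?’ question above, the sketch below computes a p-value and a rough effect size for two invented groups of scores (SciPy is assumed to be available); a ‘significant’ p-value on its own says little without the effect size and sample size behind it:

```python
# Toy illustration of "how strong is the evidence?": a p-value on its own
# says little without the effect size and sample size behind it.
# Scores are invented; SciPy is assumed to be available.
from statistics import mean, stdev
from scipy import stats

group_a = [62, 58, 71, 65, 60, 68, 64, 59, 66, 63]
group_b = [60, 55, 69, 61, 58, 66, 62, 57, 64, 60]

# Welch's t-test (does not assume equal variances)
t, p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Cohen's d as a rough effect-size estimate (pooled standard deviation)
pooled_sd = ((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2) ** 0.5
cohens_d = (mean(group_a) - mean(group_b)) / pooled_sd

print(f"n = {len(group_a)} per group, t = {t:.2f}, p = {p:.3f}, d = {cohens_d:.2f}")
```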

The Hawthorne effect takes its name from a series of studies in the 1920s at the Hawthorne Works manufacturing plants in the mid-western US. It’s often misinterpreted (‘mythical drift’) as a kind of scientific principle, describing the effect that the researcher has on the experiment, or the effect of the awareness by those being studied that they’re part of an experiment. In reality, the Hawthorne studies are most useful for highlighting some of the pitfalls of dealing with people, both researchers and research subjects, in research.

References

  • Anon (2009) ‘Questioning the Hawthorne effect: Light work’, The Economist, [online] Available from: http://www.economist.com/node/13788427 (Accessed 28 April 2013).
  • Olson, R., Hogan, L. and Santos, L. (2005) ‘Illuminating the History of Psychology: tips for teaching students about the Hawthorne studies’, Psychology Learning & Teaching, 5(2), p. 110.

 

Too Hard To Measure: On the Value of Experiments and the Difficulty of Measuring Lesson Quality

Interesting article in The Guardian (from some time ago, I’m a slow reader) about the overblown importance attributed to doing experiments during science lessons.

The article reminds me of my experience in Cambodia, where experiments are also frequently held up as proof of a student-centred lesson. In reality, experiments in Cambodian classrooms are often a very teacher-centred activity:

  • the teacher demonstrates and students (at best) try to observe what happens;
  • students do the experiment in large groups, adhering to a strict series of steps outlined in a worksheet;
  • students work in large groups in which usually only one or two students do the work, while the others are merely bystanders;
  • the procedure, observations and interpretation of the experiment are laid down in detail beforehand.

The article touches upon two interesting elements. First, there is the questionable educational value of many experiments in science classes. Secondly, there is the challenge of measuring lesson quality beyond ‘ticking off’ the occurrence of activities such as experiments.

The article refers to ‘The Fallacy of Induction’ by Rosalind Driver. Her book ‘Making Sense of Secondary Science’ is an excellent resource on misconceptions in science education and has been an important inspiration for me.

Driver doesn’t dismiss practical work in science, but argues that ‘many pupils do not know the purpose of practical activity, thinking that they “do experiments” in school to see if something works, rather than to reflect on how a theory can explain observations’ (Driver et al, 1993, p.7).

She raises two main arguments. First, practical activities are often presented to students as a simulation of ‘how science really works’: collecting data, making observations, drawing inferences and arriving at a conclusion that is the accepted explanation. This is simplistic, and pupils happily play along, following the ‘recipe’ in the ‘cookbook’ and checking whether they have ‘the right answer’. In reality, science rarely works this way:

“For a long time philosophers of science and scientists themselves have recognised the limitations of the inductive position and have acknowledged the important role that imagination plays in the construction of scientific theories.” (Driver, 1994, p.43)

The second argument is that pupils don’t arrive in class with a blank slate, but with a whole range of self-constructed interpretations or ‘theories’ on how natural phenomena work. These ‘preconceptions’ require more than an experiment to change, as children tend to fit observations within their own ‘theoretical framework’.

Observations are no longer seen as objective but as influenced by the theoretical perspective of the observer. ‘As Popper said, “we are prisoners caught in the framework of our theories.” This too has implications for school science, for children, too, can be imprisoned in this way by their preconceptions, observing the world through their own particular “conceptual spectacles”’ (Driver, 1994, p.44).

Misconceptions can be changed if they are made explicit, discussed and challenged with contradictory evidence. After this ‘unlearning’ phase, children may adopt a different framework. Driver concludes: ‘Experience by itself is not enough. It is the sense that students make of it that matters’ (Driver et al, 1993, p.7).

Discussion activities, in which pupils have the opportunity to make their reasoning explicit and to engage with and try out alternative viewpoints, including the ‘scientific’ one, need to be central (cognitive conflict). Practical activities can then complement these discussions, instead of the other way around, where discussion and conclusion are quickly reeled off at the end of the practicum.

 

Measuring lesson quality

The love for experiments, while neglecting the question of whether and what students are actually learning, also touches upon the difficulty of adequately measuring lesson quality. Limited time and resources result in a focus on outward and visible signs. However, these:

  • deny the complexity of teaching and learning;
  • deny the individuality of students’ learning and understanding;
  • steer teachers and programme staff towards focusing on these outward signs, as they know they will be evaluated on these criteria.

Collecting valid and reliable data on lesson quality is hard. Self-assessment instruments are notoriously prone to confirmation bias. Lesson observations don’t give a reliable everyday picture of lesson practice. They suffer from the fact that teachers pull out special lessons when visitors appear for announced (or unannounced) visits. Conversely, as Cuban describes beautifully, other teachers tremble and panic when an evaluator walks into their classroom and the lesson becomes a shambles.

Evidence-based evaluation is often touted as the way forward for development projects. Randomized trials in health have been useful for building a body of knowledge on what works and what doesn’t. In education, a randomized trial would compare a group of students whose teachers received pedagogical training with a group of students whose teachers didn’t. Comparisons can be made on test scores, student satisfaction or drop-out rates, as in the sketch below.
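
A minimal sketch of that comparison, with simulated test scores rather than real data (the assumed 3-point training effect is invented purely for illustration), using a simple permutation test to ask how often chance alone would produce a difference as large as the observed one:

```python
# Minimal sketch of the comparison described above: students whose teachers
# received pedagogical training ("treatment") vs. students whose teachers
# did not ("control"), compared on test scores. All scores are simulated;
# the assumed 3-point training effect is invented purely for illustration.
import random

random.seed(1)

control = [random.gauss(55, 10) for _ in range(100)]
treatment = [random.gauss(58, 10) for _ in range(100)]

observed = sum(treatment) / len(treatment) - sum(control) / len(control)

# Simple permutation test: how often does randomly re-labelling students
# produce a mean difference at least as large as the observed one?
pooled = control + treatment
extreme = 0
trials = 5000
for _ in range(trials):
    random.shuffle(pooled)
    fake_control, fake_treatment = pooled[:100], pooled[100:]
    diff = sum(fake_treatment) / 100 - sum(fake_control) / 100
    if diff >= observed:
        extreme += 1

print(f"observed difference: {observed:.1f} points")
print(f"permutation p-value (one-sided): {extreme / trials:.3f}")
```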


However, test scores are unsuitable, as exams are notoriously prone to cheating and questions focus on recalling factual knowledge, the opposite of what we want to achieve. A self-designed test could be a solution, but there’s the risk that programme activities will focus more on the test than on improving teaching skills. Student satisfaction scores are prone to the aforementioned confirmation bias. Drop-out rates are hard to use as they are influenced by many interrelated factors such as geography, economic growth and government policy.


Ownership of the evaluation by the direct target group is part of the solution, in my opinion, as is using a variety of data sources. In future blog posts I plan to write more on how we try to measure lesson quality.


————————

For more detail, see this study by Prof. James Dillon (pdf) on the value of practical work in science education.
Driver, R. (1994) ‘The fallacy of induction in science teaching’, in Levinson, R. (ed.), Teaching Science, London, Routledge, pp. 41–48.

Driver, R., Squires, A., Rushworth, P. and Wood-Robinson, V. (1993) Making Sense of Secondary Science, Routledge.