#H809 Issues with Student Experience Surveys

The analysis of the Ardalan et al. paper, which compares students’ responses to paper-based and online course evaluation surveys, for TMA03 made me look at a paper from Mantz Yorke (Yorke, 2009) that empirically analyses the effect of some design elements in student experience surveys.  The paper is worthwhile alone for its extensive literature overview of research findings and the underlying psychological constructs that attempt to explain those findings.

Schematic overview of Yorke (2009) paper

In the empirical part of the paper the author looks at 4 research questions:

  1. Does the directionality of the presentation of a set of response options (‘strongly agree’ to ‘strongly disagree’, and vice versa) affect the responses?
  2. When there are negatively stated items, does the type of negativity affect the outcome?
  3. Does using solely positively stated items produce a different response pattern from a mixture of positively and negatively stated items?
  4. Does having negatively stated items in the early part of a questionnaire produce a different pattern of responses than when such items are left until later in the instrument?

Despite the lack of statistically significant findings, the author writes:

‘Statistically non-significant findings seem often to be treated as if they were of no practical significance. The investigations reported in this article do, however, have a practical significance even though very little of statistical significance emerged’ (Yorke, 2009, p.734).

For Yorke, the practical significance lies in prompting survey designers to reflect on their design choices.  The nature of that reflection will depend on the context, such as the purpose (formative vs. summative) of the survey and the local culture (Berkvens, 2012).  The author offers a rich overview of items that should be part of such a reflection and discusses explanatory frameworks from psychology.  Unlike the Ardalan paper, the attempt to explain findings by referring to psychological theory moves the paper beyond mere correlations and gives it causal and predictive value.

#H809 Research on MOOCs

credit: Freedigitalphotos

Week 12 in the H809 course, and MOOCs – the official educational buzzword of 2012 – couldn’t remain absent.  The focus in this course is not so much on what MOOCs are, their history, or the different types with their various underlying pedagogies and ideologies.  I blogged on MOOCs before, as a participant in LAK11, a connectivist MOOC on learning analytics.  In H809 the focus lies on issues such as:

  • What kind of information and research is available on MOOCs?
  • What kind of MOOC research would be interesting to do?
  • What are benefits and limitations of the type of information on MOOCs that is around?
  • What is the educational impact (rather than the press impact) of MOOCs?

Much information on MOOCs consists of the so-called grey literature.  Main information sources include:

  • blogs from practitioners and academics, with an overrepresentation of academics from Athabasca University and the OU.
  • blogs from participants in MOOCs, sharing their experiences
  • articles in open academic journals such as IRRODL, EURODL, Open Praxis
  • articles in more popular education magazines such as Inside Higher Ed and The Chronicle of Higher Education.
  • articles in the general press such as The Economist and The New York Times

Some comments on these sources:

  1. The term ‘grey literature’ may sound a bit disparaging.  However, as Martin Weller writes, notions of scholarship and academic publishing are evolving.  Blogs and open journals constitute alternative forms of scholarship, with more interaction, less formality and shorter ‘turnaround’ times.
  2. Information and research on MOOCs is heavily Anglo-Saxon-centred (or perhaps better, Silicon Valley-centred?).  I could hardly find any articles on MOOCs in Dutch, although that might not be so surprising.  Although MOOCs (xMOOCs) are often touted as a ‘solution’ for developing countries, there are few perspectives from researchers in developing countries.  As Mike Trucano writes on the EduTech blog of the World Bank:

    “Public discussions around MOOCs have tended to represent viewpoints and interests of elite institutions in rich, industrialized countries (notably the United States) — with a presumption in many cases that such viewpoints and interests are shared by those in other places.”

  3. It’s interesting to see how many of the more general news sources seem to have ‘discovered’ MOOCs only after the Stanford AI course and the subsequent influx of venture capital into start-ups such as Coursera, Udacity and edX.  The ‘original’ connectivist MOOCs, which have been around since 2008, let alone open universities, are hardly mentioned in those overviews.  A welcome exception is the Open Praxis paper from Peter and Deimann, which discusses historical manifestations of openness such as the coffee houses of the 17th century.
  4. The advantage of this grey literature is that it fosters a tremendously rich discussion on the topic. Blog posts spark other blog posts and follow-up posts. Course reflections are online immediately after the course. Events such as a failing Coursera MOOC or an OU MOOC initiative get covered extensively from all angles. This kind of fertile academic discussion can hardly be imagined with the closed peer-review publication system.
  5. The flipside of this coin is that there are a lot of opinions around, a lot of thinly-disguised commercialism and a lot of plain factual mistakes (TED talks!).  MOOCs may be heading for a ‘trough of disillusionment’ in Gartner’s hype cycle.  Rigorous research would still be valuable: most research is descriptive rather than experimental and is based on ridiculously small samples collected in a short time; interrater reliability may be a problem in much MOOC research; and longitudinal studies that investigate how conversations and interactions evolve over time are absent.
  6. Sir John Daniel’s report ‘Making Sense of MOOCs’ offers a well-rounded and dispassionate overview of MOOCs up to September 2012.
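The interrater reliability worry mentioned above can be made concrete: when two researchers independently code the same MOOC forum posts, their agreement beyond chance is commonly quantified with Cohen’s kappa.  A minimal sketch (the coding categories and the two raters’ codings below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters coding the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters coded independently at their own base rates
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical coding of ten forum posts as 'social', 'cognitive' or 'off-topic'
a = ['social', 'cognitive', 'cognitive', 'social', 'off-topic',
     'cognitive', 'social', 'cognitive', 'social', 'off-topic']
b = ['social', 'cognitive', 'social', 'social', 'off-topic',
     'cognitive', 'social', 'cognitive', 'cognitive', 'off-topic']
print(round(cohens_kappa(a, b), 2))
```

Raw percentage agreement here is 80%, but kappa is lower because some of that agreement would occur by chance alone; that gap is exactly why reporting only percentage agreement flatters MOOC coding studies.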

Interesting research questions for research on MOOCs could be:

  • What constitutes success in a MOOC for various learners?
  • How do learners interact in a MOOC? Are there different stages?  Is there community or rather network formation? Do cMOOCs really operate according to connectivist principles?
  • What are the experiences of MOOC participants and the perspectives of educational stakeholders (accreditation agencies, senior officials, university leaders) in developing countries?
  • Why do people choose not to participate in a MOOC and still prefer expensive courses at brick-and-mortar institutions?
  • What factors inhibit or enhance the learning experience within a MOOC?
  • How can activities within a MOOC be designed to foster conversation without causing information overload?
  • How do MOOCs affect hosting institutions (e.g. instructor credibility and reputation) and what power relations and decision mechanisms are at play? (Plenty of scope for an activity-theoretical perspective here.)

A few comments:

  • High drop-out rates in MOOCs have caught a lot of attention.  Opinions are divided on whether this is a problem.  As MOOCs are free, the barrier to signing up is much lower.  Moreover, people may have various goals and may just be interested in a few parts of the MOOC.
  • MOOCs (at least the cMOOCs) are by their nature decentralized, stimulating participants to create artefacts using their own tools and networks rather than a central LMS.  cMOOCs remain accessible online and lack the clear start and end of traditional courses.  This complicates data collection and research.
  • Although MOOCs are frequently heralded as a solution for higher education in developing countries, it would be interesting to read accounts from learners in developing countries for whom a MOOC actually was a serious alternative to formal education.  The fact that MOOCs are not eligible for credit (at the hosting institution) plays a role, as do cultural factors, such as a prevalent teacher-centred view of education in Asian countries.


Overview of posts on MOOCs from Stephen Downes: http://www.downes.ca/mooc_posts.htm

Overview of posts on MOOCs from George Siemens: https://www.diigo.com/user/gsiemens/mooc

OpenPraxis theme issue on Openness in HE: http://www.openpraxis.org/index.php/OpenPraxis/issue/view/2/showToc

IRRODL theme issue on Connectivism, and the design and delivery of social networked learning: http://www.irrodl.org/index.php/irrodl/issue/view/44

Armstrong, L. (2012) ‘Coursera and MITx – sustaining or disruptive?’, Changing Higher Education blog.

Peter, S. and Deimann, M. (2013) ‘On the role of openness in education: A historical reconstruction’, Open Praxis, 5(1), pp. 7–14.

Daniel, J. (2012) ‘Making sense of MOOCs: Musings in a maze of myth, paradox and possibility’, Journal of Interactive Media in Education, 3, [online] Available from: http://www-jime.open.ac.uk/jime/article/viewArticle/2012-18/html

#H809 Validity and Reliability

Validity and reliability are two key terms in H809, originally introduced by Campbell and Stanley (1963) and often confused.  Validity is itself a contested term, with a variety of category schemes devised over the years.  Below is a scheme summarizing the two terms, based on references recommended in the course text.

Apart from focusing on validity, reliability and their sub-categories, the course text suggests using a list of critical questions to evaluate research findings, such as:

  • Does the study discuss how the findings are generalisable to other contexts?
  • Does the study show correlations or causal relationships?
  • Does the study use an underlying theoretical framework to predict and explain findings?
  • How strong is the evidence? (in terms of statistical significance, triangulation of methods, sample size…)
  • Are there alternative explanations?

Scheme summarizing validity and reliability, based on Trochim (2007)
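One of the reliability sub-categories in Trochim’s scheme, internal consistency, is typically estimated with Cronbach’s alpha: do the items of a survey scale ‘hang together’?  A minimal sketch (the Likert responses below are invented for illustration):

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Internal-consistency reliability for a respondents x items score matrix."""
    k = len(rows[0])  # number of items in the scale
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical Likert responses (1-5): six respondents, three items
scores = [
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [2, 2, 1],
    [4, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(scores), 2))
```

Values near 1 suggest the items measure one underlying construct; note that high internal consistency says nothing about validity, i.e. whether the construct is the one the researcher intended.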

The Hawthorne effect takes its name from a series of studies in the 1920s at the Hawthorne Works manufacturing plants in the mid-western US.  It’s often misinterpreted (‘mythical drift’) as a kind of scientific principle describing the effect the researcher has on the experiment, or the effect of the awareness of those being studied that they’re part of an experiment.  In reality, the Hawthorne studies are useful for highlighting some of the pitfalls of dealing with people (both the researchers and those being researched) in research.


  • Anon (2009) ‘Questioning the Hawthorne effect: Light work’, The Economist, [online] Available from: http://www.economist.com/node/13788427 (Accessed 28 April 2013).
  • Olson, R., Hogan, L. and Santos, L. (2005) ‘Illuminating the History of Psychology: tips for teaching students about the Hawthorne studies’, Psychology Learning & Teaching, 5(2), p. 110.


#H809 Comparing paper-based and web-based course surveys: The Ardalan (2007) paper

The second paper in week 11 of H809 looks at the effects of the medium when soliciting course feedback from students.  A switch from paper-based to web-based survey methods (2002–2003) provided a natural experiment for Ardalan and colleagues to compare the two modes on a variety of variables.  As with the Richardson paper, we were asked to look critically at the methodology and at issues such as validity and reliability.  A lively (course-wide) forum helped to collect a variety of issues.

Schematic representation of the Ardalan et al. (2007) paper


  • The study aims at presenting a ‘definitive verdict’ on some of the conflicting issues surrounding paper-based and web-based surveys.  The paper clearly favours statistically significant correlations as proof.  However, despite the large sample, the research is based on courses at one North American university (Old Dominion University, Virginia) during two consecutive academic years (2002–2003).  The context of this university and these academic years is not described in detail, limiting the applicability of the paper to other contexts.  Generalisability could be enhanced by including more institutions over a longer period of time.


  • The study succeeds in identifying some correlations, notably effects on the response rate and the nature of responses (less extreme).  However, it doesn’t offer explanations for the differences.  Changes in response rates could be due to a lack of access to computers by some students, to contextual factors (communication of the survey, available time, incentives, survey fatigue…), or to fundamental differences between the two survey modes.  We don’t know.  The study doesn’t offer an explanatory framework, sticking to what Christensen describes as the descriptive phase of educational research.


  • It’s a pity that the study wasn’t complemented by interviews with students.  This could have yielded interesting insights into perceived differences (response rates, nature) and similarities (quantity, quality).
  • I found the paper extremely well-structured, with a clear overview of the literature and research hypotheses.


  • The difference in response rate may well have had an impact on the nature of the sample.  The two samples may have been biased in terms of gender, age, location or socio-economic status (access to a web-connected computer).  Perceived differences between the modes may have been due to sample differences.
  • I’m not sure whether the research question is very relevant.  Potential cost savings for institutions from switching to web-based surveys are huge, meaning that institutions will use online surveys anyway.

Even a medium-size institution with a large number of surveys to conduct realises huge cost savings by converting its paper-based surveys to the web-based method. With the infrastructure for online registration, web-based courses and interactive media becoming ubiquitous in higher education, the marginal cost savings above the sunk costs of existing infrastructure are even more significant. (Ardalan et al., 2007, p.1087)

Lower response rates with web-based surveys can be dealt with by increasing the sample size.  Rather than comparing paper-based and web-based surveys (a deal that is done anyway), it would be more interesting to analyse whether web-based surveys manage to capture a truthful image of the quality of a course as perceived by all students, and what factors and circumstances influence this.
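The compensation argument can be made concrete with the standard sample-size formula for estimating a proportion: work out how many completed responses a given margin of error requires, then inflate the number of invitations by the expected response rate.  A minimal sketch (the margin, confidence level and response rate are illustrative assumptions, not figures from the paper):

```python
import math

def invitations_needed(margin, response_rate, z=1.96, p=0.5):
    """How many students to invite so the completed sample still supports
    the desired margin of error (worst-case proportion p = 0.5,
    z = 1.96 for 95% confidence)."""
    completed = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(completed / response_rate)

# e.g. a 5% margin at 95% confidence with a 30% online response rate
print(invitations_needed(0.05, 0.30))
```

Note that a larger sample only restores precision; it does nothing about the non-response bias discussed above, since the students who do respond may still differ systematically from those who don’t.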

#H809 Toolkits as Bridge between Learning Theory and Practice?

Conole et al. (2004) advocate the use of toolkits as a way to bridge theory and practice.  Practitioners don’t have time to wade through wads of theoretical papers.  As a result, many designs are based on ‘common sense’ rather than being theoretically informed.  The authors argue that theory-informed designs would improve quality and that toolkits are the ideal instrument to realize this.

They distinguish toolkits from wizards (which are black boxes, hiding the underlying decision process) and conceptual frameworks (which offer little practical use).

Some characteristics and key terms on toolkits in the article:

  • for non-expert users to engage with theories
  • elicit assumptions and theories
  • decision-making systems
  • reflect beliefs and assumptions of creator(s)
  • guiding framework
  • offer flexibility for local context
  • informed decisions
  • offer common language
  • provide examples (if linked database)
  • promote reflective practice

The toolkit presented in the paper is represented by a model.


Learning activities such as brainstorming or presentation of materials can be mapped with the model, prompting reflection on the overall pedagogical balance and the types of learning supported.

The paper contains a welcome synthesis of learning theories.  I’m less convinced about the practical value of the toolkit.  Publishing the paper in a closed-access journal is not likely to contribute to its adoption by practitioners.


#H809 Key Criteria for ‘Healthy’ Online Communities

Communities of Practice is one of the most widely used concepts in educational research these days.  Wenger (1998) has provided a theoretical basis for the concept, although his definition is quite fluid and difficult to grasp (Johnson, 2001).  Preece (2000) has developed an operationalisation of the concept, centred on the concepts of usability and sociability.  These relate to the duality, developed by Wenger, between design and emergence.

Jones and Preece (2006) distinguish between Communities of Interest (COI) and Communities of Practice (COP).  The latter, described by Wenger (1998), are reserved for communities in professional contexts.  Communities of Interest refer to the more organic, loosely structured communities that centre on people’s interests.  Garrison has coined the term Community of Inquiry for groups in educational settings.  There seems to be a rich body of literature on these Communities of Inquiry.

Preece (2000) uses a sociability and usability framework to analyse the success of COI and COP.  Usability relates to user-friendliness and consists of guidelines for the design of online spaces.  Criteria for sociability centre on the 3 P’s of people, purpose and policies.

Sociability framework (Preece, 2000)

1. People

  • Reciprocity
    • requires ‘nurturing’ in young communities
    • ‘lurkers’ routinely comprise at least 50% of participants
  • Empathy and trust
    • empathy: ability to understand others and react compassionately
    • trust: expectations of positive interactions
  • Clear leadership and commitment
    • Supported by research from Wenger et al. (2011): “what makes a difference is not the quantity of users, but the passion and commitment with which a subset of users provide leadership, example and high quality content”

2. Purpose

  • Common ground
    • corresponds with ‘mutual understanding’ (Wenger, 1998), sense of unity, a common vision & values
    • clarity of common purpose for participants  (* I’m not convinced all participants need to have a common purpose)
    • related to motivation
  • Incentives for collaboration (vs. competition)

3. Policies

  • Etiquette
    • can be realized through formal rules or through self-governance/ cultural norms
    • related to amount of social pressure and presence of leadership
  • Social presence
    • described as sense people have online of others being present
    • can be generated by short response time, not necessarily by many postings
    • is strongly positively related to etiquette
  • Maturity
    • COI/COP need time to form and grow, in order to develop, in Wenger’s (1998) terms, ‘mutual understanding’, ‘common language’ and ‘reified artefacts’

Comparing these criteria with Stephen Downes’ description of the characteristics of successful networks highlights some of the differences between communities and networks:

  • autonomy
    • degree to which a network and its members can act independently
    • not a criterion for a community; what matters there instead are coherence and a sense of belonging to the group (identification)
  • diversity
    • degree to which various backgrounds and opinions are represented in the network
    • communities require a mutual understanding and shared repertoire.
  • openness
    • degree to which the community is open to new members
    • * although not mentioned, I believe this was a major weakness of the COP of physics teachers in the Jones and Preece (2006) study.


Preece, J. (2000) Online Communities: Designing Usability, Supporting Sociability, John Wiley & Sons.

Jones, A. and Preece, J. (2006) ‘Online communities for teachers and lifelong learners: a framework for comparing similarities and identifying differences in communities of practice and communities of interest’, International Journal of Learning Technology, 2(2), pp. 112–137.

Wenger, E. (1998) Communities of Practice: Learning, Meaning, and Identity, Cambridge University Press.

Wenger, E., Trayner, B. and De Laat, M. (2011) Promoting and assessing value creation in communities and networks: a conceptual framework, Ruud de Moor Centrum, Open University of the Netherlands, Available online

#H809 Ethical implications of an OLPC evaluation study

In the first TMA of ‘the season’ we were asked to formulate a research question on an educational technology topic and discuss the methodologies that could be used to address it.  I focused on the evaluation of a One-Laptop-Per-Child (OLPC) programme and whether it has an effect on learning in a developing-country context.  In the first week after the TMA the focus is on ethics and audiences.  What are the ethical implications of undertaking a study such as the evaluation of an OLPC programme?

Some ethical aspects include (based on Lally et al., 2012):

  1. Informed consent
  • The proposed study works with minors, so formal consent from parents or guardians is required
  • How can the pupils’ own voices be heard and taken into account by the researchers?
  • Children and parents/guardians should be duly informed of the research and its possible consequences.  How can it be guaranteed that they are fully aware of the research proposal?
  • Options to withdraw from the study should be specified.  In a long-term study such as this, withdrawal also raises questions about what happens to data already collected.

On informed consent, Lally et al. (2012) write: “researchers should move away from the ‘granting approval’ mode of ethics, towards treating the participants as partners in research”.  A main reason for this shift is that it is increasingly difficult to list in advance all the ethical implications of a research study.  In Cambodia, cultural barriers may stand in the way of such a negotiated approach.

2. Potential for discrimination and abuse

  • Given the permanency of the written word, safeguards to prevent information from being made public should be in place
  • Confidentiality and anonymity clauses of participants in research findings should be clear
  • Extent of moral duties of researchers, for example when confronted with harmful content, should be specified
  • Care should be taken over the effect that introducing an expensive technology has on interpersonal relations (issues of power, safety) in an environment where such devices would normally not be purchased.
  • Introducing expensive devices may create inequalities and feelings of exclusion among pupils from deprived backgrounds.
  • Determining test and control groups may be difficult (in the case of an RCT), as it implies that some pupils in a class get a computer and others do not.

3. User generated content

  • It should be clear who has access to usage data and child-created content and under what conditions
  • Copyrights on user created content should be clarified
  • It should be clear what content can be used in communication on research findings and under which confidentiality/ anonymity conditions

4. Attachment

  • Users may develop an attachment to a donated computer when used for a long time.  Will they become owner after the research ends?
  • Users may act differently with a device that is not their own (e.g. not use all its features)

Lally et al. (2012) highlight that mobile, ubiquitous and immersive technologies blur the boundaries of learning (formal vs. informal, school vs. home, learner vs. consumer…).  This also complicates the ethical implications.  There are no definitive answers to the issues listed above; rather, they should be discussed with participants in a continuous process throughout the research programme.


Lally, V., Sharples, M., Tracy, F., Bertram, N. and Masters, S. (2012) ‘Researching the ethical dimensions of mobile, ubiquitous and immersive technology enhanced learning (MUITEL): a thematic review and dialogue’, Interactive Learning Environments, 20(3), pp. 217–238.


#H809 Can Technology ‘Improve’ Learning? And can we find out?

In education and learning we cannot isolate our research objects from outside influences, unlike in the positive sciences.  In a physics experiment we would carefully select the variables we want to measure (dependent variables) and the variables that we believe could influence those (independent variables).  In education this is not possible.  Even in Randomized Controlled Trials (RCTs), put forward by researchers such as Duflo and Banerjee (see my post that discusses their wonderful book ‘Poor Economics’) as a superior way to investigate policy effects, we cannot, in my opinion, fully exclude context.

This is why, according to Diana Laurillard, many studies talk about the ‘potential’ of technology in learning, as this conveniently avoids dealing with the messiness of the context.  Other studies present positive results that were obtained in favourable external circumstances.  Laurillard argues that the question of whether technology improves education is senseless, because it depends on so many factors:

There is no way past this impasse. The only sensible answer to the question is ‘it depends’, just as it would be for any X in the general form ‘do X’s improve learning?’. Try substituting teachers, books, schools, universities, examinations, ministers of education – any aspect of education whatever, in order to demonstrate the absurdity of the question. (Laurillard, 1997)

In H810 we discussed theories of institutional change and authors such as Douglass North and Ozcan Konur, who highlighted the importance of formal rules, informal constraints and enforcement characteristics in explaining policy effects in education.  Laurillard talks about ‘external layers of influence’.  A first layer surrounding student and teacher (student motivation, assessment characteristics, perceptions, available hard- and software, student prior knowledge, teacher motivation to use technology, etc.) lies within the sphere of influence of student and teacher.  Wider layers (organisational and institutional policies, the culture of education in society, perceived social mobility…) are much harder to influence directly.

That doesn’t mean she believes educational research is impossible.  She dismisses the ‘cottage industry’ model of education (see this article from Sir John Daniel on the topic), in which education is seen as an ‘art’ best left to the skills of the teacher as artist.  Rather, she argues for a change in the direction of educational research.

Laurillard dismisses much educational research as ‘replications’ rather than ‘findings’, a statement that echoes Clayton Christensen’s plea to focus on deductive, predictive research rather than descriptive, correlational studies.  He argues for focusing less on detecting correlations and more on theory formation and on categorising the circumstances in which individual learners can benefit from certain educational interventions.  A body of knowledge advances by testing hypotheses derived from theories.  To end with a quote from the great Richard Feynman (courtesy of the fantastic ‘Starts with a Bang’ blog):

“We’ve learned from experience that the truth will come out. Other experimenters will repeat your experiment and find out whether you were wrong or right. Nature’s phenomena will agree or they’ll disagree with your theory. And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven’t tried to be very careful in this kind of work.” -Richard Feynman


Konur, O. (2006) ‘Teaching disabled students in higher education’, Teaching in Higher Education, 11(3), pp. 351–363.
Laurillard, D. (1997) ‘How Can Learning Technologies Improve Learning?’, Law Technology Journal, 3(2).
North, D.C. (1994) Institutional Change: A Framework Of Analysis, Economic History, EconWPA

#H809 Can Computers Overcome Tensions between Qualitative and Quantitative Research Approaches?

The second paper in H809, from Wegerif and Mercer, uses computer-based language analysis as an opportunity to discuss qualitative and quantitative approaches in educational research.  The paper dates from 1997 and, like the Hiltz and Meinke paper, its main objective seems to be to highlight the role novel computer-based technologies can play in research.

Quantitative data analysis enables testing research hypotheses, creating evidence and making generalisations.  This helps to build a body of knowledge and to make predictions for new situations.  Qualitative approaches allow a much finer level of analysis and more attention to the particular context.

The time required for analysis and the space required for presentation mean that there is a de facto relationship between degree of abstraction useful in the data and the sample size of a study or the degree of generalisation. More concrete data such as video-recordings of events cannot be used to generalise across a range of events without abstracting and focusing on some key features from each event. (p. 276)

Increasing computer power allows much larger amounts of data to be analysed in more detail and reduces the required level of abstraction in the categorisation.  Large amounts of data can be analysed with more sensitivity to content and context.

We believe that the incorporation of computer-based methods into the study of talk offers a way of combining the strengths of quantitative and qualitative methods of discourse analysis while overcoming some of their main weaknesses. (p. 271)

I agree that computer-based discourse analysis may overcome some weaknesses of both approaches, but language may still be difficult to capture quantitatively because:

  • nonverbal language plays an important part in communication
  • meanings may be ambiguous
  • meanings may change over time or vary among persons and among contexts.

I’m not sure it’s helpful to analyse language with the same tools and rigour as positive sciences, as context is much more prevalent in language than in positive sciences.

More computer power doesn’t mean that the subjective role of the researcher can be completely discarded.  Researchers may have various motives for their research.  As a researcher you always need to make interpretative decisions.  Even with computer-based text analysis, the researcher still decides which categories to use, which hypotheses to test and which excerpts to publish.  I believe it’s best to document these decisions as fully and transparently as possible.  For example, researchers could discuss the limitations and weaknesses of their research or suggest alternative explanations (e.g. perhaps the learners knew each other better the second time).  Making the data publicly available would also help, so other researchers can scrutinize the results (although few may have the time and incentive to do this).
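The kind of computer-based text analysis Wegerif and Mercer describe can be illustrated with a keyword-in-context concordance, which lists every occurrence of a key word (such as ‘because’) together with its surrounding talk.  A minimal sketch (the transcript fragment is invented for illustration):

```python
import re

def concordance(text, keyword, width=30):
    """Keyword-in-context lines: each occurrence of the keyword shown
    with `width` characters of context on either side."""
    lines = []
    for m in re.finditer(r'\b' + re.escape(keyword) + r'\b', text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].rjust(width)
        right = text[m.end():m.end() + width].ljust(width)
        lines.append(f'{left}[{m.group(0)}]{right}')
    return lines

# Hypothetical fragment of classroom talk
talk = ("I think it goes there because it is bigger. "
        "No, because look, that one is the same shape. "
        "It must be that one because of the corners.")
for line in concordance(talk, 'because'):
    print(line)
```

The appeal of this approach is exactly the combination the authors claim: the occurrences can be counted (quantitative), while each concordance line keeps enough context for a qualitative reading of how the word is used.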


Wegerif, R. and Mercer, N. (1997) ‘Using Computer-based Text Analysis to Integrate Qualitative and Quantitative Methods in Research on Collaborative Learning’, Language and Education, 11(4), pp. 271–286.

#H809 Methodological Reflections on the Hiltz and Meinke Paper

The first paper in H809 is an oldie, a paper from Starr Roxanne Hiltz and Robert Meinke published in 1989.  The paper aims to compare learning outcomes between online and face-to-face delivery in a few courses.

Research Questions & Design
The article seeks to find out whether a virtual course implementation (VC) produces different learning outcomes than a traditional face-to-face (F2F) approach. Secondly, it looks to determine variables (student, instructor and course characteristics) associated with these outcomes.  The research uses quantitative research methods, using pre- and post-course survey data.  It complements these with evaluation reports from the course instructors.
The research aims at relating the mode of delivery to learning outcomes, measured by data such as SAT scores.  It takes a behavioural view on learning.  Alternatively, the research could have focused on the degree of understanding or the development of ‘soft skills’.
Limitations of the research:
1. Distribution of students into groups (VC vs F2F) was done through self-selection (a quasi-experimental approach).  Student characteristics may thus not be comparable: perhaps more disciplined or motivated students chose to take the VC approach.
2. The use of self-reported pre- and post-course surveys may be prone to response bias: responses may have been skewed by a desire to please the researchers. And because students were asked to compare their VC experience with previous F2F experiences, they had to rely on (potentially distorting) memory.
3. The scope of the research was limited to two institutions and a small student population.  Not surprisingly, few results were statistically significant: “In many cases, results of quantitative analysis are inconclusive in determining which was better, the VC approach or the F2F approach.  The overall answer is: It depends.”  Setting up a methodologically sound quantitative research design in a ‘real’ educational setting is challenging: there are many environmental variables that may influence the outcomes and which, in an ideal setting, would be kept constant in order to have conclusive results for the dependent variable.
4. The researchers mention implementation problems, such as resistance by faculty members.  Unfortunately, they don’t elaborate on this.
5. The same teacher, text and other printed materials were used in both modes.  This seems like an objective way to compare the two modes, but it may not be. The teacher may have been less familiar with online delivery or may have failed to adapt his or her mode of instruction.  Texts and other printed materials may be suitable for F2F delivery, but online delivery calls for different course designs (see the work of Mayer and Clark).  For example, online delivery requires short chunks of text for on-screen reading, placing a graph next to its explanation, and removing redundant information.
6. The research focuses on the comparison of delivery modes (VC vs. F2F).  However, in their discussion on collaborative learning, the authors seem to suggest that it is mainly the selection of instructional strategies that counts, in particular the inclusion of collaborative learning activities like seminar-style presentations and discussions. 
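The small-sample point in item 3 can be made concrete with a quick simulation. This is a hedged sketch with entirely made-up numbers, not the paper’s data: even when a real, moderate difference between the VC and F2F groups exists, small groups detect it at the conventional p < 0.05 threshold only a fraction of the time.

```python
# Rough power simulation (hypothetical effect size and group size,
# not taken from Hiltz & Meinke): how often does a two-sample t-test
# reach p < 0.05 when a modest true difference exists?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, effect = 15, 0.4          # 15 students per group, moderate true effect
trials = 2000
significant = 0
for _ in range(trials):
    vc = rng.normal(effect, 1, n)    # virtual-course group outcomes
    f2f = rng.normal(0, 1, n)        # face-to-face group outcomes
    _, p = stats.ttest_ind(vc, f2f)
    if p < 0.05:
        significant += 1
print(f"Share of trials reaching significance at n={n}: {significant / trials:.0%}")
```

With these assumed numbers the test reaches significance in well under half of the trials, which illustrates why small, two-institution studies so often report inconclusive results.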
The self-selection of the samples is a weakness in the study.  Random assignment would arguably provide a better basis for comparing the learning outcomes of the two delivery modes.  However, assigning students to a delivery mode which you suspect may put them at a disadvantage, and for which they have paid good money, raises ethical questions.  Providing the courses free of charge to students willing to take part in the study could be an option, although this may in turn affect the research: students may behave differently in a course for which they have paid.
The study found little evidence of statistically significant differences in learning outcomes between the two modes of delivery.  The pre- and post-test surveys did show some significant differences in subjective assessments such as interest, and these ran in both directions: for a mathematics course the online version generated higher interest, whereas for an introductory sociology course the result was the opposite.  The authors suggest that this may be related to the sociology cohort being an academically weak group, as illustrated by their SAT scores.
What counts as evidence in the paper?
The researchers look for statistically significant correlations.  I believe such a correlation gives more support for a claim by indicating its strength and reliability. However, the claim is limited to the particular circumstances in which the research took place (characteristics of students, teachers, institutions, courses…) and cannot be extended to other circumstances without insight into the nature of those circumstances and their causal relation to the learning outcomes. In what circumstances do students achieve better learning outcomes in an online course?  For what types of courses does online learning offer a better learning experience?  The authors do discuss these circumstances, but they draw mainly on their personal experiences as instructors rather than on statistical tests.
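To illustrate what a significance test adds to a reported correlation, here is a minimal sketch with hypothetical numbers (not data from the study): `scipy.stats.pearsonr` returns both the correlation coefficient, indicating strength, and a p-value, indicating how likely such a correlation would be by chance at this sample size.

```python
# Hypothetical, invented data: SAT scores vs final course grades.
from scipy import stats

sat = [480, 520, 560, 600, 640, 680]       # made-up SAT scores
grade = [2.1, 2.4, 2.3, 3.0, 3.2, 3.4]     # made-up course grades
r, p = stats.pearsonr(sat, grade)
print(f"r = {r:.2f}, p = {p:.3f}")
```

Note that the same r would carry a much larger p-value with fewer data points, which is exactly the reliability question that small educational studies run into.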
A next step in the research could be to look for anomalies in the data: students, courses and implementation strategies that contradict the hypotheses made, for example the hypothesis that online learning is beneficial for more mature learners, or the hypothesis that online learning is less suitable for broad, introductory courses that touch upon many topics.
The research could also feed into a meta-analysis, which could compare its claims with other studies and try to distil findings from a more diverse set of circumstances.
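A fixed-effect meta-analysis of the kind suggested here essentially pools effect sizes from several studies, weighting each by its precision (the inverse of its variance). A minimal sketch with invented effect sizes and variances, not results from any real studies:

```python
# Inverse-variance (fixed-effect) pooling with purely illustrative
# (effect size, variance) pairs for three hypothetical studies.
import math

studies = [(0.10, 0.04), (-0.05, 0.09), (0.20, 0.02)]
weights = [1 / v for _, v in studies]
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
se = math.sqrt(1 / sum(weights))
print(f"pooled effect = {pooled:.3f}, 95% CI half-width = {1.96 * se:.3f}")
```

The pooled estimate is dominated by the most precise study, which is why a meta-analysis can extract a usable signal from individually inconclusive studies like this one.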