A concept at the forefront this week was action analytics: the application of academic or learning analytics with a clear action component in mind. Most universities collect plenty of institutional statistics, but struggle to provide more than reports. In an earlier blog post I briefly described the Signals Project at Purdue University.
The keyword during week 3 of the LAK11 course is the Semantic Web (or, a bit less informatively, Web 3.0). A range of reading materials, from the very accessible to the more technical, was available to help us grasp these sprawling concepts.
Tim Berners-Lee, the father of the World Wide Web, recalls in his TED talk (highly recommended, btw) that he wrote his 1989 proposal for a linked information system out of frustration with his work as a software engineer at CERN. Confronted with all kinds of different data formats, information systems and isolated information, he wrote the proposal and the code for the web. Today he experiences a similar kind of frustration: the frustration of not finding what he’s looking for. Out of this frustration, he advocates the creation of a Semantic Web on top of the current web.
The number of web pages runs into the billions, and will soon reach trillions. The HTML format currently used stores data as text, which makes it unsuitable for data analysis. Search engines like Google, whose algorithms offer a diminishing advantage at this scale, struggle to return meaningful results from these quantities. The Semantic Web is meant to bring more structure to the web, effectively making it a bit more like a database. Tim Berners-Lee defines the Semantic Web as “a web of data that can be processed directly and indirectly by machines.”
The idea is that people publish their data in a more standardized format. The way to do this is by using ontologies: a fixed way of describing concepts, comparable to metadata. For example, if everyone used the same words to describe “rice”, it would be easier to connect information from different websites. Moreover, and this is another important rationale for the Semantic Web, it would make it easier for machines to access the information and run all kinds of queries on it.
This extract from Wolfgang Greller’s blog illustrates nicely the difference between a traditional search engine and a semantic web query.
What semantic web technologies can do is relatively simple to show. Take this example: You know that your friend John has a brother living in South America, but you can’t remember his name. Typing “brother of John” into a traditional search engine won’t work. All it will return is documents that contain the words ‘brother’ and ‘John’ or the exact phrase ‘brother of John’. The Semantic Web “knows” about relations, hence it would return a result saying ‘brother of John’ = ‘Kendon’. It works in exactly the same way for ‘capital of France’ = ‘Paris’; or ‘other words for red’ = ‘crimson’, ‘ruby’, etc. Semantic search engines can do this, based on a vocabulary of relations. This not only stores the words themselves, but also the way in which they relate to each other, i.e. ‘goose’ is a sub-item to ‘bird’.
Of course, if only you or I were to put our data online in a standardized format, it wouldn’t make much sense, since there would be nothing to link to. A critical mass of data published in a standardized format is necessary for scale advantages to come into play. The more connected the data become, the more powerful the Semantic Web gets. This concept is called Linked Data. In a linked data model, things are uniquely identified, can be looked up, provide useful information when looked up, and themselves link to other uniquely identified things. Because this interconnected data is structured, it allows computers to make complex connections.
A range of government-funded initiatives are being built; some of them can be found on Freebase. The astronomy database, for example, not only lets you retrieve information about galaxies, but also lets you plot the distance distribution of galaxies. Openstreetmap.org lets you edit geographical information in a standard format and contribute to an online mapping system. DBpedia aims at structuring the information on Wikipedia in a semantic web-friendly format, enabling queries and linking it to other data sources on the web.
The DBpedia knowledge base currently describes more than 3.5 million things, out of which 1.67 million are classified in a consistent Ontology, including 364,000 persons, 462,000 places, 99,000 music albums, 54,000 films, 17,000 video games, 148,000 organisations, 169,000 species and 5,200 diseases. The DBpedia data set features labels and abstracts for these 3.5 million things in up to 97 different languages. (DBpedia homepage)
The Semantic Web is built on top of the current web and uses an XML-based language called RDF (Resource Description Framework) to formally define and connect the data.
The Semantic Web is generally built on syntaxes which use Uniform Resource Identifiers (URIs) – similar to URLs – to represent data, usually in triple-based structures: i.e. many triples of URI data that can be held in databases or interchanged on the World Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called “Resource Description Framework” (RDF) syntaxes. (paraphrased from The Semantic Web: An Introduction)
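Just to make the triple idea concrete for myself, here is a tiny sketch (not from the course materials) that expresses the “brother of John” relation from Greller’s extract as RDF triples, using Python’s rdflib library. The URIs and property names are invented for illustration:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")   # hypothetical vocabulary, invented for this sketch
g = Graph()

# Each statement is a triple of URIs: subject, predicate, object.
g.add((EX.John, EX.hasBrother, EX.Kendon))
g.add((EX.Kendon, EX.livesIn, EX.SouthAmerica))

# Serialize the graph in Turtle, one of the RDF syntaxes.
print(g.serialize(format="turtle"))
```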
For learning, the Semantic Web holds the potential for retrieving more relevant information more easily, both by people and by intelligent agents and tutors. Tagging systems could evolve into ontologies, with everyone using an identical set of tags to describe websites’ content.
However, will the Semantic Web concept succeed in bringing more order and structure to the web? Will people without database expertise be convinced to spend time entering their data in a standardized format? Data descriptions may also mean different things to different people: “a small amount of molecules” can mean a few molecules to a chemist and 10,000 molecules to a biochemist (Laurence Cuffe on the course’s Moodle forum). And constructing queries on Linked Data requires proficiency in a query language such as SPARQL, a point Tanya Elias also raised on the forum.
Will the web retain its open character? Is there a single right way to categorize information, and doesn’t it change continuously? How does the semantic web relate to the social web, for example the collaborative tagging of Delicious and Diigo? I guess we’re still waiting for more powerful illustrations of the Semantic Web’s potential. In the meantime, however, search engines such as Google are already incorporating Semantic Web ideas into their search algorithms. Maybe the introduction of the Semantic Web will go largely unnoticed by most users.
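To give a flavour of the query syntax mentioned above, here is a minimal, hypothetical SPARQL query run against the toy graph from the earlier sketch, again with Python’s rdflib (all names and URIs are invented):

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")   # same invented vocabulary as before
g = Graph()
g.add((EX.John, EX.hasBrother, EX.Kendon))
g.add((EX.Kendon, EX.livesIn, EX.SouthAmerica))

# "Who is the brother of John?" expressed as a SPARQL graph pattern.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?brother WHERE { ex:John ex:hasBrother ?brother . }
"""
for row in g.query(query):
    print(row.brother)   # -> http://example.org/Kendon
```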
“Terabytes are not hard to get, they are hard not to get” (Mike Olson, CEO of Cloudera)
Some of the data mining methods that came up this week:
- Density estimation
- Relationship mining
- Association rule mining
- Sequential pattern mining
- Causal data mining
- Distillation of data for human judgment
- Discovery with models
In discovery with models, a model of a phenomenon is developed through any process that can be validated in some fashion, and this model is then used as a component in another analysis, such as prediction or relationship mining. (…) supporting sophisticated analyses such as which learning material sub-categories of students will most benefit from (Beck & Mostow, 2008), how different types of student behavior impact students’ learning in different ways (Cocea et al., 2009) and how variations in intelligent tutor design impact students’ behavior over time (Jeong & Biswas, 2008).
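As a toy illustration of discovery with models (my own sketch, with entirely invented data and variable names): a first model predicts a student behavior from log features, and its predictions then feed a second, correlational analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Step 1: build a model of a phenomenon, here a made-up "off-task" behavior,
# from invented log features such as pause length and hint use.
log_features = rng.normal(size=(200, 2))                  # [pause_length, hint_rate]
off_task_labels = (log_features[:, 0] > 0.5).astype(int)  # toy ground truth

behavior_model = LogisticRegression().fit(log_features, off_task_labels)

# Step 2: use the (validated) model as a component in another analysis,
# e.g. relating predicted off-task behavior to an invented learning outcome.
new_logs = rng.normal(size=(200, 2))
predicted_off_task = behavior_model.predict_proba(new_logs)[:, 1]
learning_gain = rng.normal(size=200)                      # placeholder outcome data

correlation = np.corrcoef(predicted_off_task, learning_gain)[0, 1]
print(f"Correlation between predicted off-task behavior and gain: {correlation:.2f}")
```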
For all its successes, though, statistical analysis continues to face tremendous skepticism and even animosity. For one thing, Ayres notes, statistics threaten the “informational monopoly” of experts in various fields. But even to many people without a vested interest, relying on cold, hard numbers rather than human instinct seems soulless.
Recommender systems, such as those used by e-commerce firms like Amazon, are regularly mentioned as one of the potential applications of learning analytics. For example, based on the sources accessed or the links in their social network, students could get recommendations about potentially interesting articles, blogs or people.
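A minimal sketch of how such a recommendation could be computed, assuming a simple item-based collaborative filtering approach over an invented student-resource access matrix (all names and data are hypothetical):

```python
import numpy as np

# Rows: students, columns: course resources; 1 = accessed, 0 = not accessed.
# Entirely invented interaction data for illustration.
access = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
])
resources = ["article_A", "blog_B", "paper_C", "video_D", "article_E"]

# Cosine similarity between resources, based on who accessed them.
norms = np.linalg.norm(access, axis=0)
similarity = (access.T @ access) / np.outer(norms, norms)

# Recommend to student 1 the unread resource most similar to what they accessed.
student = access[1]
scores = similarity @ student
scores[student == 1] = -np.inf        # do not recommend what was already accessed
print("Recommend:", resources[int(np.argmax(scores))])
```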
Learning analytics could (and should) also be student-centered. This means that students would be granted access to course data: for instance, they could see how much time they have spent on various course activities and compare it with their peers (a quick sketch of such a comparison follows below). The question of what students want generated discussion on the course forums. The idea, outlined by John Fritz in his presentation, is that students take more responsibility for their own learning and strengthen their meta-cognitive abilities. They would get access to the data, but it would be their responsibility to interpret it and act upon it. However, most institutions are still in the phase of collecting heaps of data and analyzing them, without really predicting and modeling behavior, or using the data to optimize learning.
Institutions can’t “absolve” students from “at least partial responsibility for their own education. To do so denies both the right of the individual to refuse education and the right of the institution to be selective in its judgments as to who should be further educated. More importantly, it runs counter to the essential notion that effective education requires that individuals take responsibility for their own learning” (p. 144).
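Coming back to the student-facing dashboard idea above, here is a small, hypothetical sketch (invented activity log, invented student and activity names) of comparing a student’s time on task with the class average, using pandas:

```python
import pandas as pd

# Invented activity log: minutes spent per student per course activity.
log = pd.DataFrame({
    "student":  ["ann", "ann", "bob", "bob", "cem", "cem"],
    "activity": ["forum", "quiz", "forum", "quiz", "forum", "quiz"],
    "minutes":  [120, 45, 30, 60, 75, 50],
})

# Time per student and activity, next to the class average for that activity.
per_student = log.pivot_table(index="activity", columns="student",
                              values="minutes", aggfunc="sum")
per_student["class_avg"] = per_student.mean(axis=1)
print(per_student)
```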
My first impressions of the MOOC: overwhelming, chaos and quality. The amount of e-mails and forum posts is staggering, and different discussions are taking place simultaneously. However, the purpose of a MOOC is not to participate in everything, but rather to be selective. Seen that way, a MOOC is a great way to get lectures, information and feedback from some of the leading researchers in the field. You are stimulated to read the materials and try to make sense of them at your own pace and at your own knowledge level. A next step is then to create something (like a forum post), share it and get into contact with “likeminded souls”. We’ll see how that plays out.
In 2008, Stephen Downes, together with George Siemens, taught a course on learning theory at the University of Manitoba. Rather than limiting access to the lectures to the 25 students registered for the course, they allowed the general public to attend virtually. The result was that more than 2,300 people participated in the course.
Are they successful? I’m trying it out and will keep you posted.