CoFIND- an Experiment in N-dimensional Collaborative Filtering

Originally presented at WebNet 99, Honolulu, Hawaii

Jon Dron, Richard Mitchell, Phil Siviter, Chris Boyne

Faculty of IT, University of Brighton, UK

E-mail: jon.dron@brighton.ac.uk

 

 

Abstract: This paper reports on the development of CoFIND, a web-based n-dimensional collaborative filtering system that seeks to guide learners to relevant resources based upon not only the content of the resources but the qualities exhibited by those resources that make them useful learning material. Qualities provide the n-dimensions of this collaborative filter. Qualities and resources are generated collaboratively by the users of the system. CoFIND is designed to allow evolution to occur, which is discussed in the context of Darwinian theory and includes reference to current theories relating to the development of complex systems. The paper goes on to describe the implementation of the system and the results of an early pilot experiment involving a group of 42 students. It is concluded that, despite encouraging early results, some further work is needed to develop an effective interface and to embody the kind of complex interactions needed to generate spontaneous evolution.

 

 

Introduction

As any web-aware teacher will know, there are many excellent, freely available resources for teachers and students on the World Wide Web, such as individual web pages, whole web sites, mailing lists and newsgroups. The biggest difficulty we face in using them lies in finding good resources and, having found them, identifying whether or not they are useful in a given context for a given learner. A teacher's eye view will be different from that of a student, and each student will differ from the next in content requirements and preferred learning style. Although there may be some common measures of excellence, there are few ways to measure the relevance of a given resource to a given learner. Of the available options, the methods we use can be broadly split into two main categories:

  1. Appeals to authority
  2. Collaborative filters

 

Appeals to Authority

Appeals to authority are an effective way of finding resources for learners. Recommended texts provided by teachers and references in papers can help to filter the useful from the unhelpful, as can the recommendations of friends and colleagues. The Internet extends this concept through the use of such technologies as newsgroups, email, mailing lists and IRC to provide direct recommendations. More structured variations on the theme include hierarchical directories such as Yahoo (Yahoo, 1999), which offer links vetted by human beings. Combined with a search engine to narrow down the returned results, this is a powerful way of getting hold of good quality and potentially relevant resources. Similarly, the opportunities to review books provided by sites such as Amazon (Amazon, 1999) allow us to share our experiences with others. Another worthwhile technology is the PICS (Platform for Internet Content Selection) system (Resnick, 1997), which allows documents to be labelled and described in a standard manner. If we trust the labellers this can work very well, although it relies on a high uptake of the system, because unlabelled documents are filtered out.

The disadvantage of all such systems for prospective learners lies in our diverse learning styles. There is no such thing as a perfect learning resource for all learners. Even if we can find reliable resources in the right subject area, there is no guarantee that they will help us to learn effectively. We learn in different ways.

 

Collaborative Filters

An Automated Collaborative Filter (ACF) allows us to overcome the problems of differences in viewpoint and needs by automatically identifying commonalities between users of a system and providing suggestions for resources that are closely matched to needs. Based on the assumption that if you and I like the same 100 resources then we are probably going to feel similarly about the 101st, the seminal Firefly and many other collaborative filtering systems (e.g. Amazon, PHOAKS, GroupLens) are immensely effective ways of identifying resources that we will probably like or enjoy.
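The assumption behind an ACF can be illustrated with a minimal sketch. The code below is not taken from Firefly or any of the other systems named above: the sample data, the function names and the choice of Jaccard overlap as the measure of similarity between users are all assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of liked resources (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target, likes, top_n=3):
    """Suggest resources liked by users who resemble `target`.

    `likes` maps a user name to the set of resources that user liked.
    Each resource the target has not yet seen is scored by the summed
    similarity of the users who liked it.
    """
    scores = {}
    for other, liked in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], liked)
        if sim == 0.0:
            continue  # no shared tastes, so no evidence either way
        for resource in liked - likes[target]:
            scores[resource] = scores.get(resource, 0.0) + sim
    ranked = sorted(scores, key=lambda r: (-scores[r], r))
    return ranked[:top_n]


# Hypothetical data for illustration only.
likes = {
    "ann": {"r1", "r2", "r3"},
    "bob": {"r1", "r2", "r3", "r4"},
    "cat": {"r9"},
}
print(recommend("ann", likes))
```

Here "ann" and "bob" share three liked resources, so the one resource that only "bob" has found is suggested to "ann"; "cat" shares nothing with either and contributes no suggestions.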

 

N-dimensional Filtering

An ACF is superb for such areas as book and movie recommendations, but starts to founder when we seek novel resources. By its nature, learning is to do with bringing about changes in the learner. It follows that matching our previous likes and wishes may not always provide us with the most suitable resources for our current learning needs. As my past list of preferences for learning resources includes many works relating to English Literary Culture, it is relatively unlikely that my current need for resources relating to Network Management will be adequately met. Given divergent needs, different learners’ paths may branch too dramatically for existing ACF systems to provide much more than an approximate match to their requirements.

N-dimensional collaborative filtering seeks to build on the power of ACFs by introducing further explicit dimensions of value beyond like/dislike or good/bad into our assessment of resources. These dimensions are exhibited as the qualities of resources that make them useful to us. For instance, the qualities of learning resources for English Literary Culture might include ‘comprehensive’, ‘good for beginners’, ‘amusing’, ‘formal’ or any number of other descriptive terms that would be equally suitable as ways of describing resources relating to Network Management. In essence, a quality is something that a learner values about a resource.

 

CoFIND (Collaborative Filtering In N Dimensions)

CoFIND is a web-based n-dimensional filtering system that we are developing for collaboratively creating databases of URLs pointing to learning resources. It is intended to be maintained and created by the learners who use it. Although a teacher is not necessarily excluded, the system is designed to require no teacher-intervention.

Resources that have been found to be valuable in the learning process are entered onto the system by the learners themselves. Qualities are also created by the learners, as selection of qualities is too important to be left up to the ‘experts’.

When entering resources, we encourage users to also enter qualities that they find useful in those resources. We then encourage all users of the system to vote for how well those qualities match not only these but also the other resources in the resource-base. Resources may thus be rated according to how well they match the collaborative sum of learners’ opinions of different qualities.

 

Evolving a Solution

If references to resources and qualities were allowed to grow unconstrained, the system would soon become clogged with less-than-useful detritus. Therefore, CoFIND has been designed to let appropriate qualities ‘evolve’, allowing more-used qualities to thrive at the expense of the less-used, a process which in turn affects the ‘success’ of the resources themselves. This mechanism for evolution is provided by a combination of ordering (more popular qualities are presented first, so are selected more often and thus thrive more easily) and planned retirement. After a configurable number of days unused, qualities are humanely ‘retired’. It is still possible for retired qualities to be voted for and thus to reappear on the list, but it is made significantly more difficult for this to happen than for those that are commonly selected.
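The combination of ordering and retirement described above might be sketched as follows. This is an illustrative reconstruction rather than the CoFIND implementation (which is written in Active Server Pages): the Quality class, the function names, the sample data and the 30-day default are all hypothetical, the paper stating only that the retirement period is configurable.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Quality:
    name: str
    votes: int
    last_used: date


def active_qualities(qualities, today, retire_after_days=30):
    """Return the non-retired qualities, most-voted first.

    A quality counts as retired once it has gone unused for more than
    `retire_after_days` days. The 30-day default is an assumption.
    """
    cutoff = today - timedelta(days=retire_after_days)
    live = [q for q in qualities if q.last_used >= cutoff]
    return sorted(live, key=lambda q: (-q.votes, q.name))


def vote(quality, today):
    """Record a vote; a vote also brings a retired quality back."""
    quality.votes += 1
    quality.last_used = today


# Hypothetical data for illustration only.
today = date(1999, 2, 12)
qualities = [
    Quality("useful", 40, date(1999, 2, 10)),
    Quality("interesting", 55, date(1999, 2, 11)),
    Quality("formal", 3, date(1998, 12, 1)),
]
print([q.name for q in active_qualities(qualities, today)])
```

In this sketch the long-unused quality ‘formal’ is absent from the list until someone votes for it again. The extra effort that CoFIND demands before a retired quality can be voted for is a matter of interface design and is not modelled here.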

A similar fate awaits resources that receive few votes for any quality- resources naturally drop to the bottom of the list as a result of vote-starvation. However, even popular resources do not stay consistently at the head of the list, as not all qualities will apply to every resource. By providing many qualities for learners to seek we are creating multiple dimensions, allowing us to cater for many different learning needs and styles. A variegated evolutionary landscape develops where speciation can occur and there are many evolutionary niches to be filled.

 

Evolution

CoFIND has been designed to be self-sustaining, driven by the activities of its users. It should provide an evolutionary landscape where adaptation of qualities can occur and where there is little or no need for skyhooks (Dennett, 1996) or an external agency of change. We wish it to be learner-driven, not teacher-driven.

William H. Calvin sums up the requirements for evolution to occur as "a pattern that copies with occasional variation, where populations of the variants compete for a limited workspace, biased by a multifaceted environment, and with the next round of variations preferentially done from the more successful of the current generation" (Calvin, 1997).

Within CoFIND, qualities thrive according to the numbers of votes cast. Successful qualities get votes and survive, less successful qualities fade away. Variation occurs by users identifying qualities imperfect for their needs then choosing semantically similar or (in an analogue to gross mutation) dissimilar words or phrases which, if successful, will oust their predecessors as descriptions of what is valuable in a resource. This can meaningfully be called reproduction on the grounds that ideas will spark other related/similar ideas and words will suggest other words, a way of thinking not unrelated to memetics. Even if this is hard to accept, the fact that semantically similar qualities exist at all should result in variation and competition. There is a potential for many ecological niches or fitness peaks, which may be more or less successful, but within a range of semantically similar qualities it is likely that few will survive. For instance, if the qualities 'good for beginners' and 'simple' are semantically close then the quality that is more widely used will render the other extinct. If they are sufficiently distinct and both of some use then both can survive as they will occupy different evolutionary niches. We achieve a multifaceted environment through the many shifting requirements of the learners using the system.

Each CoFIND instance is isolated from others. Different instances of the system are installed for different subject areas and purposes. As Calvin notes, "Parcellation (as when rising sea level converts the hilltops of one large island into an archipelago of small islands) typically speeds evolution" (Calvin, 1997). However, we have found it useful to be able to transfer an existing population of resources and qualities to a new context (again, reflecting what happens when islands are separated from mainlands).

Care has been taken to ensure that change is neither too slow nor too fast. Kauffman has observed that evolution happens best at the edge of chaos (Kauffman, 1995). If the evolutionary landscape is too ordered (a ‘Stalinist regime’) then a stagnant stability is reached and interesting change does not occur. If it is too chaotic, no high peaks of fitness are ever reached and the system slips into a state of flux. Kauffman calls this the ‘Red Queen regime’, where we are always having to run to stay in the same place. Several methods implemented in CoFIND help to prevent this fall into chaos. For instance, we make it easy to vote for qualities that have been selected but hard to vote for those that have not; it is deliberately difficult to add new qualities or resuscitate retired ones; even the way that resources are displayed (slowly, five at a time) is designed to limit rapid change. Together, these mechanisms create a kind of stickiness: success breeds more success and it is fairly difficult for new species of qualities to usurp the existing dominant values. However, the system remains dynamic through the constant input of new resources and qualities, so that there is little danger of too much stability or order setting in. Getting this tuned to the right level of dynamism remains the subject of ongoing research.

 

 

A Brief Description of the System

CoFIND is written using Microsoft’s Active Server Pages running under IIS on Windows NT, and uses a Microsoft Access database as its back end. To cope with a potentially large number of resources it is searchable. It uses a basic pattern matching search engine that looks through the descriptions and comments that users have posted about resources.

Upon logging in to the CoFIND system, a user is presented with a screen providing a selection of qualities (by which the returned resources will be ranked) and a simple search field. The qualities are listed in order of frequency of use- those that have been voted for most appear at the top. There is no requirement to enter any quality nor any search term, but unless one or both are entered the result is an unordered list of all resources in the resource-base.

Upon clicking a button to perform the query a user is presented with a list of resources, five at a time, ranked according to the sum of votes for preferred qualities and matching whatever search terms were entered. Only preferred qualities are shown, with a small bar-graph against each quality indicating the number of votes already cast. A hyperlink will take the user to a given resource, displayed in a separate browser window so that CoFIND remains available. Hyperlinks also provide a mechanism to vote for resource/quality combinations.
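The query just described might be sketched as follows. This is a hedged illustration, not the ASP and Access code that CoFIND actually uses: the function name, the data layout and the use of a case-insensitive substring test in place of the system's pattern matching are all assumptions.

```python
def rank_resources(resources, votes, preferred, search="", page=0, page_size=5):
    """Return one page of resource ids ranked by votes for preferred qualities.

    resources: dict of resource id -> searchable text (description, comments)
    votes:     dict of (resource id, quality) -> number of votes cast
    preferred: list of the qualities selected by the user
    """
    term = search.lower()
    # An empty search term matches every resource.
    matching = [rid for rid, text in resources.items() if term in text.lower()]

    def score(rid):
        return sum(votes.get((rid, q), 0) for q in preferred)

    if preferred:
        matching.sort(key=lambda rid: (-score(rid), rid))
    # With no qualities selected the list is left in stored order.
    start = page * page_size
    return matching[start:start + page_size]


# Hypothetical data for illustration only.
resources = {
    "a": "Firewall tutorial for beginners",
    "b": "Routing protocols overview",
    "c": "Firewall vendor white paper",
}
votes = {
    ("a", "useful"): 4,
    ("a", "accessible"): 2,
    ("b", "useful"): 9,
    ("c", "useful"): 5,
}
print(rank_resources(resources, votes, ["useful", "accessible"]))
print(rank_resources(resources, votes, ["useful", "accessible"], search="firewall"))
```

With both qualities selected, resource "b" leads on nine votes, followed by "a" on six and "c" on five; adding the search term removes "b" from the results.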

Users can quite easily add resources on a form that also provides a means to reinstate retired qualities. It is possible to add comments to existing resources, providing a mechanism for effective use of appeals to authority as well as ACF.

For a given link, a detailed page may be selected listing comments and giving a detailed breakdown of who has voted for a particular resource as well as its overall profile of votes for all qualities. The page also gives further information indicating the pattern of votes for the resource and specific information on each individual vote cast.

 

An Example Use of the System

CoFIND has been deployed in a number of environments so far. Users of the system range from an international group of teachers and trainers of Object Oriented languages to final year students of network management and a group of lecturers with a shared interest in learning technologies. Further uses of the system are planned for the near future, including application to the needs of language students.

The results presented here come mainly from a short self-contained assignment given to a group of forty-two postgraduate students who were required to design a network, a task that would require significant research to complete. The students had participated in a pilot trial of the system for one week prior to the start of the assignment. CoFIND was provided as a database of resources relating to the assignment, and students were offered up to five percent of their marks for their contributions to it, whether adding resources or commenting on existing resources. They were also asked to use the database to submit the URLs of their finished reports.

We had primed the system with three relevant resources and seven qualities (‘interesting’, ‘useful’, ‘reliable’, ‘informative’, ‘a good gateway to further resources’, ‘of broad coverage’, ‘accessible’) that had arisen in the earlier trial session. Throughout the experiment, conscious of the need to avoid behaving as a skyhook, we did not make further contributions, despite strong temptation. We see the starting state of the database as simply part of the shape of the evolutionary landscape. We are interested in the dynamics of the system: what happens next rather than how it came to be.

 

Some Results

The system was very popular- comments along the lines of ‘we should have had something like this for the entire course’ were common. The main appeal to students was that it was a searchable collaborative resource database- a shared space for storing useful URLs and associated comments. From a user perspective, CoFIND is only a little different from a traditional search engine and indeed some students added qualities that made it behave that way- for instance ‘about firewalls’.

Thirty-six out of the forty-two students discovered over seventy relevant resources over the two-week assignment, with a range and quality far surpassing the authors’ attempts, spanning several years, to build a similar list of links. All the remaining students made comments on existing resources, indicating that they had spent some time looking at them.

Although the system is designed explicitly for resources that can be specified with a URL, it proved flexible enough for one of the students to use it to recommend a book- the ensuing votes and comments showed that this was a very successful choice.

Growth of qualities was generally restrained. Only a few new qualities were added to the original seeding values until the final hand-in day, at which point a Cambrian explosion of qualities occurred, mainly along the lines of ‘Brilliant!’, ‘Top!’ and ‘Spot on’, all applied to the students’ own work. Until that point we had suspected that the students had not grasped the mechanics of adding qualities and that we had made the process too obscure, but this showed otherwise. With sufficient motivation, qualities were rapidly added.

Early questioning suggests that students may not have added new qualities because the existing ones mostly suited their needs and they were avoiding unnecessary effort. Similarly, voting was not as common as we had expected- the process was apparently too tedious and went virtually unrewarded.

Although we adjusted the interface following feedback from the pilot trial, difficulties understanding how to vote may still have influenced the results. A similar issue was that the students were at first unsure of the difference between quality-selection and inputting search terms. One student made the comment that a resource was ‘good for beginners’ without voting for the quality ‘good for beginners’. There are clearly some user interface issues to be resolved.

Some qualities were a lot more successful than others, with the most popular qualities gaining nearly three times as many votes as the least popular, when measured over an equal period. We believe that such qualities provide a clue as to what students really want out of their learning resources, but this is as yet not fully proven. Qualities that started successfully remained successful.

To simplify the voting process we have only provided a mechanism to allow users to agree that a resource exhibits a certain quality, not to disagree. Some students questioned this and in one of the other instances of CoFIND this has even led to the appearance of negative qualities- e.g. ‘not available’.

 

Evidence for Evolution

Useful Qualities

Qualities rose up the list relative to other qualities. For example, the quality ‘brilliant!’ was initially created as a quality of one group of students’ report, one day from the end of the assignment. It was immediately taken up by all the other groups as they handed their work in on the final day (clearly a successful quality when applied to students’ work), and within a day of entry was being applied to other resources and appeared half way up the list.

 

Competing Qualities

We have as yet gleaned no evidence of a discernible effect of votes for one quality affecting votes for any other. However, the relatively short duration of this experiment and relatively small number of qualities only allow us to represent the coarsest of trends. It is hoped that we might be able to determine longer-term interactions in other experiments taking place over the next few months.

 

Self-sustainability

We have to ensure that participation takes place, as without it we cannot see the growth and variation required to make the system self-sustaining. Although participation was strongly encouraged in this experiment by the allocation of marks, aspects of the system which were not marked (e.g. voting) were not as widely used as we would have hoped for, partly as a result of user interface issues already mentioned. We are still seeking ways to make voting a natural consequence of using the system, either transparently or by providing an incentive to participate. For example, an ACF such as Firefly cannot recommend films unless it can identify patterns of likes and dislikes. Thus user-input is essential to use the system in the first place, but we do not have that advantage. If nothing else, voting should be simple and fast. We had hoped to leave part of the system in a small frame whenever a URL was selected, allowing the user to vote for the resource in front of them without seriously interrupting their work. However, the security model of scripting languages in browsers made this an untenable solution.

We are investigating a further enhancement to the design to allow a more volatile shift in qualities. At the moment, a quality’s list-position is based solely on how many votes it has been given. An option that may be implemented in the next iteration of the tool is to make selection of qualities from the search screen feed back into the position of the qualities in the list. An advantage of this would be that no active participation in the voting process would be required, merely selection of qualities.
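One way in which selections might feed back into list position is sketched below. This is only a sketch of the option under consideration, not an implemented feature: the function name, the sample figures and in particular the weighting of a selection at half a vote are assumptions.

```python
def quality_order(votes, selections, selection_weight=0.5):
    """Order qualities by votes plus weighted search-screen selections.

    votes:      dict of quality -> number of votes received
    selections: dict of quality -> times chosen on the search screen
    `selection_weight` is a hypothetical tuning parameter; a weight of
    zero reproduces ordering by votes alone.
    """
    names = set(votes) | set(selections)

    def score(name):
        return votes.get(name, 0) + selection_weight * selections.get(name, 0)

    return sorted(names, key=lambda n: (-score(n), n))


# Hypothetical data for illustration only.
votes = {"useful": 10, "interesting": 12}
selections = {"useful": 8, "accessible": 30}
print(quality_order(votes, selections))
print(quality_order(votes, selections, selection_weight=0))
```

In this example ‘accessible’ has never been voted for, yet rises to the head of the list because it is so often selected, which is the more volatile behaviour sought.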

 

Other Problems

We have not given users the ability to change details after a resource has been posted, mainly because it would be too easy to cheat- for instance, a user could amass votes for a known popular site and then redirect it to their own home page. However, this makes it very difficult to deal with resources whose URLs change. No simple solution to this has yet occurred to us, with the existing mechanism requiring an email message to the administrator.

The current system for ordering resources is based on a simple count of votes for selected qualities, accompanied by a bar-graph display of those votes. This could become unwieldy as more votes are generated. We have looked into the possibilities of scaling votes, either by percentages or amalgamation, but we have yet to reach a sufficient number of votes for the issues to come to a head.
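Scaling by percentages could take the following form, in which each bar is drawn relative to the largest count rather than as a raw number of votes. The function name, the bar width of twenty characters and the sample counts are assumptions for illustration; no such scaling has yet been implemented in CoFIND.

```python
def scaled_bars(vote_counts, width=20):
    """Scale raw vote counts to bar lengths relative to the largest count."""
    top = max(vote_counts.values(), default=0)
    if top == 0:
        return {name: 0 for name in vote_counts}
    return {
        name: round(width * count / top)
        for name, count in vote_counts.items()
    }


# Hypothetical data for illustration only.
counts = {"useful": 120, "interesting": 60, "formal": 6}
for name, length in scaled_bars(counts).items():
    print(f"{name:12} {'#' * length}")
```

The longest bar never exceeds the chosen width however many votes accumulate, at the cost of hiding the absolute numbers, which would then need to be shown separately if required.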

 

 

Conclusions

This is the first working iteration of our tools. We have designed an environment to allow evolution to occur but as yet we are only seeing fairly coarse changes and little evidence of complex relationships developing. In addition, further work is clearly needed on the HCI aspects of the system, particularly as it relates to managing and motivating interactions and ensuring that a sufficient range of qualities is generated. Possible avenues include providing ‘rewards’ (such as a mention for successful resource providers on the front page), a reduction in the number of steps and pages needed to add to the system and vote, and the explicit positioning of CoFIND as a useful bookmark repository for individuals and groups. This latter mechanism may allow selfish motives to drive the growth of the system.

However, in many ways (particularly from the point of view of our students) our early experiments are a success. Not only do the learners collaboratively generate a useful database of resources, but more valuably we are coming to know what it is that those learners desire in a resource, through the evolving sets of qualities that they seek. As providers of resources, this in turn means that we can start tuning our resources more effectively to the needs of our students, and concentrating on more relevant areas. We have laid the foundations of a self-sustaining feedback system that could result in ongoing improvements to the online resources we provide as well as putting the selection of learning resources firmly into the hands of the learners.

 

References

Amazon home page <http://www.amazon.com>. (1999, February 12)

Calvin, W.H. (1997). "The Six Essentials? Minimal Requirements for the Darwinian Bootstrapping of Quality," Journal of Memetics 1

Chernenko, A. (1997) Collaborative Information Filtering and Semantic Transports, <http://www.lucifer.com/~sasha/articles/ACF.html>. (1999, February 12)

Dron, J. (1999) MscIS CoFIND instance <http://ituser.it.bton.ac.uk/staff/jnd/mscisassignment/resource/getemail.asp>. (1999, February 12)

Dennett, D. (1996). Darwin’s Dangerous Idea. Penguin

Firefly home page <http://www.firefly.net>. (1999, February 12)

Kauffman, S. (1995). At Home in the Universe: The Search for the Laws of Self-Organization and Complexity. Oxford University Press

PHOAKS home page <http://www.phoaks.com>. (1999, February 12)

Resnick, P. (1997). Filtering Information on the Internet. Scientific American. March 1997

Yahoo home page <http://www.yahoo.com>. (1999, February 12)