Communal Categorization: The Folksonomy¹

by David N. Sturtz
INFO622: Content Representation
December 16, 2004

Introduction

A recent flurry of discussion in the information architecture community has revolved around a concept being called, among other things, “social classification,” “ethnoclassification,” and “folksonomies².” While it is clearly a popular phenomenon, it is not immediately apparent what use, if any, these organizational schemes have, or what their potential benefits might be. This paper will define the term, present current instances of folksonomies in use, discuss some potential benefits of folksonomies, and suggest some directions for future research.

The first task is to define what exactly a folksonomy entails. Thomas Vander Wal, the individual credited with splicing “folk” onto “taxonomy” to create this neologism, calls it a “bottom-up social classification” (Vander Wal, 2004). Clay Shirky writes that folksonomies are “socially created, typically flat name-spaces” (Shirky, 2004). Their defining characteristics are thus their bottom-up construction, their lack of hierarchical structure, and their creation and use within a social context.

In practical terms, a folksonomy is the complete set of tags (one or two keywords each) that users of a shared content management system apply to individual pieces of content in order to group or classify those pieces for retrieval. Users can add a new term to the folksonomy the moment a single piece of content calls for it.

The public nature of these terms is an essential feature, as it allows users to see at once how others have used the same terms to categorize their own content, and to view terms others have added. Through this loop of use and examination, the community is able to shape the folksonomy, encouraging useful applications and eliminating useless ones (Shirky, 2004).

Folksonomies in Action

Unlike buried keyword metadata, hidden from view and used only to aid in searching, folksonomies exist on the surface, visible and useful. They reach beyond the folder metaphor, allowing the user to file her content in many places at once. Through co-assignment of terms, the folksonomy gains a structure based on convergence and use.
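The paper does not specify how co-assignment produces structure. One simple way to observe it, sketched here under the assumption that tagging data is available as hypothetical (user, item, tag) triples, is to count how often two tags are applied to the same item. Pairs with high counts hint at terms the community treats as related. The data and function names below are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag assignments: (user, item, tag). Not data from any real site.
ASSIGNMENTS = [
    ("ann", "photo1", "flower"),
    ("ann", "photo1", "summer"),
    ("ann", "photo2", "flower"),
    ("ann", "photo2", "colorado"),
    ("bob", "photo3", "flower"),
    ("bob", "photo3", "summer"),
]


def tags_by_item(assignments):
    """Group the distinct tags applied to each item."""
    grouped = {}
    for _user, item, tag in assignments:
        grouped.setdefault(item, set()).add(tag)
    return grouped


def co_occurrence(assignments):
    """Count how many items each unordered pair of tags was co-assigned to."""
    counts = Counter()
    for tags in tags_by_item(assignments).values():
        # Sorting makes ("flower", "summer") and ("summer", "flower") the same key.
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


for (a, b), n in co_occurrence(ASSIGNMENTS).most_common():
    print(f"{a} + {b}: {n}")
# flower + summer: 2
# colorado + flower: 1
```

Counting pairs per item rather than per user is a design choice; weighting by the number of users who made each assignment would be an equally plausible variant.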

The differences between folksonomies and other classification and metadata schemes become clearer when they are applied to an actual collection of information. The three most commonly cited folksonomies in action are the websites Flickr, Del.icio.us, and Furl. It is also worth exploring how the concept of folksonomies relates to strategies Google employs in its web search algorithm.

Flickr

Flickr.com allows people to store their digital photographs and share them with family, friends, or the general public. To assist users in organizing their dozens or even hundreds of photos, Flickr allows them to assign tags to their images as they upload the files to the site. These tags can then be used to sort through the photo collection.

Unlike folder-based, hierarchical systems, the tags allow people to place each image in multiple groups. A photo can have tags of "flower," "summer," "2004," and "Colorado." While viewing the image, clicking on the tag "flower" will display all images that individual has tagged with the term "flower." A second click shows all photographs by any user that have been associated with the word "flower."
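The two-click behavior described above amounts to one tag index queried at two scopes. The following is a minimal sketch, assuming a hypothetical in-memory store (the `PhotoTags` class and its methods are illustrative, not Flickr's API). Because each photo is indexed under every tag it carries, it effectively sits in many "folders" at once. Tags are matched exactly as typed; whether a real service normalizes case or spelling is outside this sketch.

```python
class PhotoTags:
    """Toy tag store in which one photo may carry any number of tags."""

    def __init__(self):
        # tag -> set of (owner, photo_id) pairs; a photo appears under every tag it has
        self._index = {}

    def tag(self, owner, photo_id, *tags):
        """Attach one or more tags to a photo, e.g. at upload time."""
        for t in tags:
            self._index.setdefault(t, set()).add((owner, photo_id))

    def owner_photos(self, owner, tag):
        """First click: this owner's photos that carry the tag."""
        return sorted(p for o, p in self._index.get(tag, set()) if o == owner)

    def all_photos(self, tag):
        """Second click: every user's photos that carry the tag."""
        return sorted(self._index.get(tag, set()))


store = PhotoTags()
store.tag("ann", "img1", "flower", "summer", "2004", "Colorado")
store.tag("ann", "img2", "flower")
store.tag("bob", "img9", "flower")
print(store.owner_photos("ann", "flower"))  # ['img1', 'img2']
print(store.all_photos("flower"))  # [('ann', 'img1'), ('ann', 'img2'), ('bob', 'img9')]
```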

Del.icio.us and Furl

Del.icio.us and Furl.net function in a similar way to Flickr, but deal in electronic bookmarks rather than digital images. Instead of using a hierarchical folder structure to organize the links as is done within most web browsers, these sites allow the user to assign tags to the bookmarks.

Convergence plays a greater role in the functioning of Del.icio.us and Furl than it does in Flickr. A Flickr user categorizes only his own unique content, while on the bookmark sites many users categorize the same piece of information. This second approach increases the likelihood that a given piece of content will be categorized the way a particular user would expect, but it also increases the chances that the same content will appear in other categories the user would not expect, or even desire. To minimize the effect of these fringe categorizations, the sites rank the bookmarks within each category by the number of users who categorized the link with that term. The content most likely to pertain to the folksonomy term is thus pulled to the top of the stack.
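The exact ordering rules Del.icio.us and Furl use are not described here, so the following is only a minimal sketch, assuming the rank within a tag is simply the number of distinct users who applied that tag to a link. The record format and `rank_by_agreement` function are hypothetical. A link that only one user filed under a term (a fringe categorization) sinks below links many users agree on.

```python
from collections import defaultdict

# Hypothetical bookmark records: (user, url, set of tags). Not real site data.
BOOKMARKS = [
    ("ann", "http://example.com/recipes", {"cooking", "food"}),
    ("bob", "http://example.com/recipes", {"cooking"}),
    ("cat", "http://example.com/recipes", {"cooking", "howto"}),
    ("dan", "http://example.com/knives", {"cooking", "shopping"}),
    ("dan", "http://example.com/knives", {"cooking"}),  # same user again: counted once
]


def rank_by_agreement(bookmarks, tag):
    """Return (url, user_count) pairs for `tag`, most distinct users first."""
    users_per_url = defaultdict(set)
    for user, url, tags in bookmarks:
        if tag in tags:
            users_per_url[url].add(user)
    # Sort by descending number of distinct users; break ties alphabetically by URL.
    return sorted(
        ((url, len(users)) for url, users in users_per_url.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


print(rank_by_agreement(BOOKMARKS, "cooking"))
# [('http://example.com/recipes', 3), ('http://example.com/knives', 1)]
```

Counting distinct users, rather than raw tag applications, keeps a single enthusiastic user from pushing a link up by tagging it repeatedly.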

Google

In a more abstract way, Google's search also makes use of a vast folksonomy. Alongside its PageRank link-analysis algorithm, Google matches users' queries to web pages based partly on the anchor text (the linked text) of hyperlinks on the pages it crawls (Brin & Page, 1998). Each link thus contributes its anchor text to Google's folksonomy, and Google interprets that text as a categorization of the page the link points to.

When a user queries Google, part of the logic behind the scenes involves matching the query terms against this folksonomy of words others have used in links. Just as in the other folksonomy-driven systems, the pages on which the most link authors have converged are brought to the top. The difference here is that the tags themselves are invisible to the searcher.

The phenomenon of the Google bomb³ provides an excellent illustration of how this works. To create a Google bomb, hundreds or thousands of people intentionally categorize a site (e.g., JohnKerry.com) using an arbitrary term (e.g., "waffles"). They do this by creating hyperlinks that point to the targeted site and using the selected term as the link text. When a user then queries Google for "waffles," the search engine returns the website that has most frequently been classified with that term: in this case, presidential candidate John Kerry's homepage⁴.
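To make the mechanism concrete, here is a deliberately simplified toy anchor-text index. It is an assumption-laden sketch, not Google's actual ranking, which (per Brin & Page, 1998) combines anchor text with PageRank and other factors. The `AnchorIndex` class and the example.com/example.org URLs are hypothetical. Each link's text is treated as tags on the page it points to, and pages are returned in order of how many links used the query term, so a coordinated flood of links can outrank an ordinary page.

```python
from collections import Counter, defaultdict


class AnchorIndex:
    """Toy index that treats each link's anchor text as tags on the linked page."""

    def __init__(self):
        # term -> Counter mapping target URL to the number of links using that term
        self._terms = defaultdict(Counter)

    def add_link(self, anchor_text, target_url):
        for term in anchor_text.lower().split():
            self._terms[term][target_url] += 1

    def search(self, term, limit=3):
        """Targets most often labeled with `term`, most links first."""
        counts = self._terms.get(term.lower(), Counter())
        return [url for url, _count in counts.most_common(limit)]


index = AnchorIndex()
# One ordinary link that happens to use the word.
index.add_link("Belgian waffles recipe", "http://example.org/breakfast")
# A coordinated "bomb": many authors link an arbitrary word to one target.
for _ in range(1000):
    index.add_link("waffles", "http://example.com/candidate")

print(index.search("waffles"))
# ['http://example.com/candidate', 'http://example.org/breakfast']
```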

Discussion

Looking beyond the obvious drawbacks that accompany an uncontrolled vocabulary (duplication, inconsistency, imprecision, etc.), folksonomies present a number of opportunities for exploration and application, and feature some unique benefits.

The use of folksonomies for sorting images, as in the case of Flickr, appears to be a good match. Classifying images is notoriously challenging and time-consuming (Jörgensen, 1999). Browsing is a good retrieval solution for image collections, and one that a folksonomy easily supports (Gordon, 2001). The previously mentioned co-assignment of multiple terms to a single image encourages browsing and introduces a degree of serendipity.

Giving the end user control over the organization of content can also allow entirely new domains to develop. On Flickr, groups of users have created specific tags to allow them to share images of their computer desktops⁵ or instant messaging status⁶ as a sort of personal history or discovered narrative. In this way, categories become things, and the classification becomes a shared space for communities of users to explore and develop (Lakoff, 1987). This cyclical process allows the community to define and develop these areas of focus (Shirky, 2004). Indeed, folksonomies seem particularly well suited to loosely defined, developing fields, in contrast to hierarchical schemes, which Barbara Kwasnik describes as, "excellent representations for knowledge in mature domains in which the nature of the entities, and the nature of meaningful relationships is known" (Kwasnik, 1999).

It is important to remember that classification inherently carries social, political, and economic implications (Bowker & Star, 1999). The democratic approach of a folksonomy avoids many of the ethical and political concerns of top-down, centrally imposed systems. It allows the users of the system to establish their own sense of balance within it, to use their own vernacular for indexing and retrieval, and to avoid exclusion by creating new categories as needed. However, with no guiding hand at the helm, this communal approach also has the power to shut out unpopular or misunderstood terms: the community may flood them with useless content, use them in unintended ways, or marginalize them until they essentially disappear.

Future Research

Those looking to further explore the implications and potential applications of folksonomies would do well to research how people use folksonomies to organize information, investigate ways of evaluating their usefulness, develop methods for capturing structures developed through folksonomies, and examine folksonomies as a way of understanding a community's conceptual model of a subject.

Past research has looked at how users organize physical spaces and paper-based information, and how they handle the same tasks metaphorically with electronic files and folders (Neumann, 1999; Gottlieb & Dilevko, 2001, 2003). Folksonomies, however, allow users to sort items into multiple categories simultaneously. New research should examine how users organize information in a purely digital way, without the physical-world metaphors.

Examining the quality of the indexing provided by users is essential to determining the usefulness of folksonomies for that purpose. For instance, tagged images could be checked against models of visual content (such as the one in Layne, 1994) to determine the depth and thoroughness of users' indexing. Methods for measuring recall and precision, and ways to model the scatter of terms, should also be investigated.
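As a starting point for such measurement, the standard precision and recall definitions can be adapted to compare one image's user tags with a reference set of index terms. This is a minimal sketch, assuming a hypothetical reference indexing (for example, one produced by a trained indexer) is available; the function name is illustrative, and exact matching after lowercasing is a simplifying assumption that ignores synonyms and variant spellings.

```python
def tag_precision_recall(user_tags, reference_terms):
    """Score one image's user tags against a reference set of index terms.

    precision: fraction of the user's tags that also appear in the reference set
    recall:    fraction of the reference terms that the user's tags cover
    Matching is exact after lowercasing (a simplifying assumption).
    """
    user = {t.lower() for t in user_tags}
    reference = {t.lower() for t in reference_terms}
    hits = len(user & reference)
    precision = hits / len(user) if user else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


p, r = tag_precision_recall(
    ["flower", "summer", "2004", "Colorado"],  # tags a user applied
    ["flower", "colorado", "mountains"],       # hypothetical reference indexing
)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Low precision here need not mean poor tagging: a tag such as "2004" may be useful to its owner even if a reference scheme omits it, which is one reason evaluation methods for folksonomies need care.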

A major benefit of the folksonomy is that it begins with a blank slate, on which the structure of a content space can develop through use until patterns emerge (Haverty, 2002). Once those patterns have emerged, methods are needed to formalize them into stable supports for the content they describe.

Finally, in light of the importance of the social aspects of folksonomies, it is essential to gain an understanding of how a folksonomy is formed by a community. In what subtle ways are decisions about terms made? Folksonomies offer a unique insight into the mind of a community of users. What do they say, and is it accurate?

Conclusion

The folksonomy presents a new opportunity for users to participate in the creation of classification systems. New applications of this concept will and should be developed. At the same time, it is essential to investigate the strengths and weaknesses of this new tool so that it may be used appropriately and to the greatest advantage. Of particular interest are its applications to categorizing content in developing fields and to navigating content sets by browsing. Finally, the social aspects and implications of these community-created systems are also of great significance and deserve exploration.


Notes

  1. "Communal Categorization" is a phrase used by Clay Shirky (Shirky, 2004). "Folksonomy" is a neologism attributed to Thomas Vander Wal by Gene Smith (http://atomiq.org/archives/2004/08/folksonomy_social_classification.html).
  2. Throughout this paper I will use the term "folksonomy" as it has been prevalent in the popular discussion and avoids some of the confusions and overtones to which terms such as "ethnoclassification" and "social classification" are prone.
  3. Possibly first reported by Adam Mathes at http://uber.nu/2001/04/06/
  4. http://www.wired.com/news/politics/0,1283,63557,00.html
  5. http://www.flickr.com/photos/tags/desktopshowandtell
  6. http://www.flickr.com/photos/tags/ichatstatus

References