Max Planck Digital Library

Towards integration between collaborative tagging and professional indexing

"Here piles of leaves, there trees", or can professional semantic control be integrated with the richness of expression and relationships contributed by participating users?

Traugott Koch, Max Planck Digital Library

Presentation at Berliner Bibliothekswissenschaftliches Kolloquium 2007-07-17

Contents:
1 Introduction
2 Terminology and typologies
  • 2.1 Terminological issues
  • 2.2 Approaches exploring "social" and community activities
  • 2.3 Categorizations of social tagging systems
3 Social tagging systems in present practice
  • 3.1 Problems
  • 3.2 Opportunities
4 Higher Education sector opportunities, R&D recommendations
  • 4.1 Greatest benefits to expect
  • 4.2 Research
  • 4.3 Developments and experiments
    • 4.3.1 Improve existing social tagging systems
    • 4.3.2 Build alternative tagging systems
    • 4.3.3 Integrate user tagging into existing systems so far dominated by professional indexing
5 Summary
References


1 Introduction

  • Hype
    • Web 2.0, Library 2.0, Business 2.0
    • speak of revolution, new era etc. [Kroski]
    • who is really using social tagging?
    • what phase of the hype are we in?

  • Not new
    • author keywords [Index Medicus etc.]; user created structures [DMOZ, Yahoo directories, Wikipedia categories]; naive classification [Beghtol 2003]
    • systems that invite user corrections [CiteSeer]
    • author and end user metadata (thought to be unrealistic). Now: immediate personal reward
    • how successful have they been?

  • Risks
    • commercial interests (Yahoo bought flickr, delicious etc.): improved user and group profiles for better targeting
    • big brother (US Gov subpoenas; censorship), privacy
    • will not take off, frustration of contributors
    • other knowledge organization activities may be stopped

2 Terminology and typologies

  • 2.1 Terminological issues
    • 16 different meanings of the term tag(ging) according to Wikipedia
    • folksonomies; community terminology; tagsonomy
    • fluid, culturally expressed ontologies; ethnoclassification [Merholz]
    • is it always collaborative, social? Maybe participatory. Individual vs. social tagging
    • tagging, keyword indexing, categorization, classification, faceted classification
    • serendipity kind of browsing vs. systematic browsing
    • broad vs. narrow folksonomies [Vander Wal]
    • bottom-up vs. top-down, and the connection to analytico-synthetic (faceted) vs. hierarchical-enumerative schemas

    Focus here on social tagging for topical discovery

  • 2.2 Approaches exploring "social" and community activities
  • The different approaches need to be told apart, while staying aware of the relationships between them

    • Linking (Search engines ranking)
    • Citation (CiteSeer, Google Scholar)
    • Annotation
    • Recommendation, Recommender systems (e.g. Bookmark sharing, unalog; reviews); Rating
    • Wish lists (Amazon); Reading lists; Shopping lists
    • Usage:
      • Popularity:
        • of usage (Search engines ranking);
        • of bookmarks (delicious, CiteULike, Digg.com, FeedButler, ...)

      • User behaviour, preferences, aggregate choices (Amazon; ConnectViaBooks)

    • Social tagging, collaborative web tagging, user contributed metadata
    • Collaborative filtering
    • Social searching
    • Customization, Personalization

  • 2.3 Categorizations of social tagging systems

  • by content creator and tag users: self - others [Hammond et al]
  • by audience: scholarly - general [Hammond et al]
  • by object type:
    • web pages/blogs, bookmarks (del.icio.us, Connotea, CiteULike, Technorati)
    • pictures (flickr)
    • video (youtube)
    • music (Last.fm, Listal)
    • products (Amazon product tagging; Yahoo Shoposphere)
    • news (Digg, News Meme, NewsVine)
    • goals (43 Things)
    • friends (FOAF)
    • advertising (Adzooks, theadcloud)

    Multidimensional typology [Voss 2007 based on Marlow 2006]: Tagging rights; source of resource; representation of resource; tagging feedback; tag aggregation; vocabulary control; vocabulary connectivity; resource connectivity; automatic tagging.
    This is rather a multi-dimensional description approach which could serve as analytical basis for a typology.

3 Social tagging systems in present practice

  • 3.1 Disadvantages/problems:
  • "Tagging bulldozes the cost of classification and piles it onto the price of discovery" [Ian Davis]

    "The old way creates a tree. The new rakes leaves together" [David Weinberger]

    Major:
    • Undecided/mixed purpose (classification vs indexing; personal document and task management; social networking)
    • Same approach for all object types (text, web pages, link lists, blogs, pictures/other media, multimedia etc.)
    • No controlled vocabulary; no authority forms for names (first names, last names, nicknames, organisational names) or for other terms (places, time periods etc.)
    • No rules: exhaustivity, specificity etc.
    • No structure
    • No topical context
    • Problem of highly different granularity of tags
    • Not suitable for targeted effective search or systematic topical browsing
    • No backwards reference to changing vocabulary use, versions of terms and vocabulary systems (as in scope notes, versioned vocabularies etc.)

    Detailed:

    Ex.: Stephen's tagcloud in flickr

    Ex.: Uni Muenster, Zweigbibliothek Medizin

    • Wordform (singular, plural), other morphological inconsistencies (nouns), spelling [Guy], word case, use of numbers
    • Different character sets and transliteration
    • Lack of synonym control
    • Lack of homonym control
    • Bad retrieval performance: low precision (one word terms); low recall
    • No phrases (in most systems)
    • Compound construction (special character usage, punctuation etc.)
    • Both pre- and post-coordinated approach
    • Different languages, multilingual systems
    • Uncontrolled acronyms, name forms (incl. geographic)
    • Place names: used as classification or merely as an incidentally associated place
    • Times, dates
    • Personal and emotional, time and task related contexts, tags not understandable by/meaningful to others (no interpersonal/intergroup meaning), e.g. Me, mine, (my) brother, fun, to buy
    • Context missing
    • No structure, hierarchy or relationships
    • Hierarchy/structure encoded in tags
    • Other information encoded in tags (longitude-latitude)
    • Other metadata put into tags (places, times, names, media, formats, types etc. vs. topics)
    • Different document types
    • Highly vulnerable to abuse such as manipulation, corruption, spamming etc.
    • Problems of privacy infringement, trust and authority
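
The retrieval consequences of missing synonym, homonym and wordform control can be illustrated with a small worked example. This is only a sketch: the collection, the tags and the relevance judgements are invented for illustration and do not come from any of the systems mentioned above.

```python
# Hypothetical toy collection: document id -> set of free tags.
tagged = {
    "d1": {"jaguar", "cars"},       # homonym: the car brand
    "d2": {"jaguar", "wildlife"},   # the animal
    "d3": {"jaguars", "cats"},      # plural wordform, missed by exact match
    "d4": {"panthera-onca"},        # synonym, missed by exact match
    "d5": {"python"},
}
# Documents a user looking for the animal would judge relevant (assumed).
relevant = {"d2", "d3", "d4"}


def search(tag):
    """Return ids of documents carrying exactly this tag string."""
    return {doc for doc, tags in tagged.items() if tag in tags}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = search("jaguar")
precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), precision, round(recall, 2))
```

In this toy case the homonym costs precision (one of the two hits is about cars), while the plural form and the synonym cost recall (two of the three relevant documents are never retrieved).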

  • 3.2 Advantages/opportunities:
  • as listed by social tagging supporters

    A folksonomy is "liberating, not restrictive; bottom-up, not imposed; relational, not hierarchical. It also cleverly harnesses selfish acts and directs them towards the common good. But most of all, it just seems to fit the way our brains work" [T. Hannay]

    • Represents co-active intelligence [Ashman CS Nottingham]; wisdom of the crowds [Surowiecki]
    • Derivation of meaning from the consensus arising from mass human interaction
    • Bypasses the need for the explicit representation and automatic calculation of language-use rules [Ashman]
    • Current, copes with fast changes
    • Directly reflects users' information needs, user-centered and not expert-dictated
    • Language and cultural richness, reflects user language, has no information loss
    • Inclusive to minorities and niche interests, without cultural, social or political bias
    • Democratic
    • Self-moderating via social dynamics
    • Offers discovery rather than finding
    • Supports serendipity
    • Supports learning
    • Offers insight into user behaviour
    • Forced move, unavoidable ["amateurization of cataloguing", Shirky]
    • Best trade-off (between low cost/simplicity and retrieval performance)
    • Cheap
    • Better than nothing
    • "Low-investment bridge between personal and shared classification" [McMullin]
    • Most shortcomings are in fact design features
    • Can create and support communities

Tennis [2006] carried out a systematic comparison of social tagging and subject cataloguing, based on framework analysis:

Both are types of indexing. Indexing is the interpretation and representation of significant characteristics of documents for information systems.
Important differences should not be neglected; social tagging is not just a reinvention of indexing.
Different in purpose, parts of different discourses (authority, authorship, intertextuality, language deployment), different predications (operationalization to achieve the purpose), functions and contexts.
Subject cataloguing identifies users' needs for finding and precisely collocating library objects by subject (formal, intentional, complete).
Tagging systems are built to enable sharing and managing citations, photos, web pages with idiosyncratic, time and task related tags (social, personal, accidental).
Main differences in analysis process, scope of documents, intended users and purpose. Professionals vs artists, Fordist vs post-Fordist environment.
(Real implementations and practice are changing away from the ideal models described by Tennis to multiple purposes, predications, functions and contexts.)
Social tagging addresses shortcomings in traditional indexing: insufficient representation of indexing authorship and task, lack of links to literary, user and request warrant, lack of explicit intertextuality.

4 Higher Education sector opportunities, R&D recommendations

Social tagging services as we know them today do not perform well when it comes to efficient searching, or systematic browsing and discovery.

They should not replace other indexing and Knowledge Organization efforts.

In real information system implementations, also depending on the composition and purpose of the service, social tagging and controlled vocabulary indexing are only occasionally complementary (mostly in OPACs, some subject gateways, A&I databases).
Tags may be additional to, overlapping with, or the only indexing available. The degree and details of the overlap are hard to know, which makes it difficult to design proper search and browse features and to judge losses in recall.

4.1 Greatest benefits to expect:

  • extended scope: materials and publications which are largely ignored by other services
  • focused services: smaller cooperating groups, specialised subjects, communication intensive work environments, fields where there are no vocabulary systems
  • new media types: mixed media and multimedia indexing (incl. learning objects)
  • input to established systems/services as additional data
  • combination with Knowledge Organization systems and subject access, different layer of indexing
  • creation and/or improvement of vocabularies
  • stimulation of development and research efforts

4.2 Research:

  • user behaviour related to tagging and navigation: tagging practice, influence of social environment and related/popular tags display, functional and linguistic characteristics [Kipp 2006 and 2007; Hammond 2005; Kinds of Tags project UKOLN, Tonkin et al; Wolff et al Univ. Regensburg: tag categorization; Connotea tag statistics; and many more]
  • retrieval performance
  • social benefit
  • benefit of tagging standards
  • scalability and architecture
  • mass effects/intelligence
  • structure of tagspace [Golder 2006]
  • convergence of terminology [HP Labs: observed after about 100 taggers have described the same website; Mathes 2004]
  • integration of heterogeneous tagsets, multiple "ontologies"
  • how many of the tags are covered in established systems and what is really new terminology, compare with alternative terminology creation approaches
  • comparison with author and professional indexing re. discovery improvements. Insights
  • study new developments below
  • tools and user interfaces for tagging (steve.museum research agenda; guided tagging, influence of UI, architectures, data analysis, integration into museum systems etc.)

4.3 Developments and experiments:

In principle:
  • Additional layer of indexing and classification in information systems: keep and use all layers (professional intellectual indexing, automated indexing, end-user tags, improved tags)
  • Basic indexing in systems without any other manual/intellectual indexing
  • End-user corrections to facts, named entities, topical indexing [CiteSeer; Doerr Co-reference tagging]
  • Creation and improvement/expansion of vocabularies (collecting terms/mining tags, associating terms, intellectual mapping, Information Retrieval measures such as co-occurrence and clustering, AI learning algorithms for text categorization)
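
Mining tags through co-occurrence, as named in the last point, can be sketched as follows. This is a minimal illustration under stated assumptions: the bookmarks are invented, and the function names `cooccurrence` and `related` are hypothetical helpers rather than part of any tagging service's API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bookmarks, each represented by its set of tags.
bookmarks = [
    {"folksonomy", "tagging", "web2.0"},
    {"folksonomy", "tagging", "classification"},
    {"tagging", "classification", "ddc"},
    {"folksonomy", "tagging"},
]


def cooccurrence(posts):
    """Count how often each unordered tag pair appears on the same resource."""
    pairs = Counter()
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs


def related(tag, pairs, top=3):
    """Rank the tags that co-occur with `tag`, most frequent first."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


pairs = cooccurrence(bookmarks)
print(related("folksonomy", pairs))
```

Raw counts favour very frequent tags; a real vocabulary-building effort would probably normalise them and pass the resulting candidate associations on to intellectual mapping.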

Related to types of systems:
  • 4.3.1 Improve existing social tagging systems:
  • Systems so far lacking professional intellectual indexing:
    • Wikis
    • Blogs (Blog tagging, indexed by search engines, such as Technorati and RawSugar)
    • News services, RSS feeds
    • Encyclopedias, dictionaries (e.g. wikipedia categories)
    • Search engines [DeliSearch using del.icio.us to filter searches]
    • Directories [Yahoo directory and social systems; Dmoz]
    • Collaborative (or personal), non-professional cataloguing [Connotea, BibSonomy, CiteULike, LibraryThing]

    Actions:
    • define the purposes (build classification on top of improved indexing)
    • user education, guidelines (?)
    • system support during tagging process:
      • improved and adapted user interface
      • tagging categories/purposes for selection
      • tag suggestion (beyond popularity and overlapping tags as in del.icio.us), keyword/keyphrase extraction [Google Suggest; OCLC terminology and name web services]
      • system feedback
      • allow tag definition [delicious tag descriptions March 2007; Connotea tag notes, unstructured]
      • allow indication of tag relations/hierarchy [BibSonomy]
      • dictionary lookup
      • visualization (beyond tag clouds and other popularity rankings)

    • tag improvements by the system after initial tagging
      • spelling correction
      • language specification
      • compound treatment
      • synonym linking [LibraryThing]
      • WordNet, Wikipedia disambiguation
      • create flat island/partial hierarchies or semantic networks [automatic creation of tag hierarchies, Heymann 2006]
      • create facets [fac.etio.us: Siderean's faceted search of delicious tags]

    • manage the tag set
    • mapping to different metadata elements, in addition to subject [Kinds of Tags project, Tonkin et al; FAsTA: A Folksonomy-Based Automatic Metadata Generator, Southampton]
    • search and browse improvements: tag clusters beyond flickr clusters; co-occurrence; other aggregations; filters; ranking; visualization [Chudnov/unalog using Starlight; flythrough navigation in Lust Digital Depot]
    • user interface improvements
    • combination with other Information Retrieval features: co-occurrence of tags [del.icio.us], clustering of tags [for disambiguation: Flickr]; map tags to other vocabularies; create concept maps
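
Some of the tag improvements a system could apply after initial tagging (case folding, compound treatment, wordform reduction, synonym linking) can be sketched as a small clean-up routine. Everything here is an assumption made for illustration: the synonym table is invented, and the plural handling is deliberately naive and English-only.

```python
import re

# Hypothetical synonym table; a real system would curate or derive this.
SYNONYMS = {"nyc": "new-york", "newyork": "new-york"}


def normalise(tag):
    """Apply case folding, compound treatment, naive plural stripping
    and synonym linking to one raw tag."""
    t = tag.strip().lower()
    # Compound treatment: unify "_", "+" and whitespace into one hyphen.
    t = re.sub(r"[\s_+]+", "-", t).strip("-")
    # Naive English plural stripping; wrong for words such as "news".
    if len(t) > 4 and t.endswith("ies"):
        t = t[:-3] + "y"
    elif len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return SYNONYMS.get(t, t)


def merge(raw_tags):
    """Group raw tag variants under their normalised form."""
    groups = {}
    for raw in raw_tags:
        groups.setdefault(normalise(raw), []).append(raw)
    return groups


raw = ["Libraries", "library", "New_York", "NYC", "new york", "Blogs", "blog"]
print(merge(raw))
```

The raw variants are kept alongside the normalised form, so the improved tags can serve as an additional layer without discarding what users actually typed.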

  • 4.3.2 Build alternative tagging systems:
    • optimise for discovery and retrieval
    • experiment with tagging in more homogeneous services
    • systematic use of controlled vocabularies in tagging services
    • offer both free tags, automated tags, improved tags and controlled vocabulary for navigation
    • do more with the tagging data
    • map and link tags to facets, controlled vocabularies and authorities [Milne 2006, using Wikipedia's categories and link structure]
    • hook library and discovery services into social tagging systems [Lorcan Dempsey]
    • systematic tag definition and terminology creation environments
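
Linking free tags to a controlled vocabulary can be sketched with a simple entry-term lookup. The mini vocabulary, its concept ids and the function names below are hypothetical; a real service would use an established KOS and far more robust matching.

```python
# Hypothetical mini vocabulary: concept id -> preferred label and entry terms.
VOCABULARY = {
    "c1": {"label": "Subject indexing",
           "entry_terms": {"indexing", "tagging", "subject-indexing"}},
    "c2": {"label": "Classification",
           "entry_terms": {"classification", "taxonomy", "categorisation"}},
}


def build_lookup(vocabulary):
    """Invert the vocabulary into entry term -> concept id."""
    return {term: cid
            for cid, concept in vocabulary.items()
            for term in concept["entry_terms"]}


def link_tags(tags, vocabulary):
    """Split tags into those linked to a concept and those left as free tags."""
    lookup = build_lookup(vocabulary)
    linked, free = {}, []
    for tag in tags:
        key = tag.lower()
        if key in lookup:
            linked[tag] = lookup[key]
        else:
            free.append(tag)
    return linked, free


linked, free = link_tags(["Tagging", "taxonomy", "toread"], VOCABULARY)
print(linked, free)
```

Tags that stay unlinked are not discarded: they remain available as free tags and are candidates for new entry vocabulary.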

  • 4.3.3 Integrate user tagging into existing systems/services so far dominated by professional indexing:
  • Still, most often, tags are just added to the system or used as additional search and alphabetic or frequency browsing options
    Real integration is needed
    How can we exploit the user contributions and behaviour in a better way?

    • OPACs: Koelner UniversitaetsGesamtkatalog: individual tagging and tagclouds with authority data (Normdaten), forward to BibSonomy; LMU Muenchen: Connotea tool inclusion and LMU edocs added to Connotea; Livetrix at Groningen Univ. Library: social and automatic tagging; PennPal/PennTags, topical bibliographies as "projects"; OPACi prototype from Casey Bisson, WPopac; Open WorldCat reviews
    • Subject Gateways: for resource selection and improved subject access [GeoPortal Singapore]

      "Enhanced Tagging for Discovery Project" UKOLN/JISC:
      Experiment with improvements from 4.3.1 and 4.3.2 above, data from Intute (RDN) Social Sciences. Two groups of postgrad. students will tag with a vanilla social tagging system and a hybrid system.
      Simple and hybrid search system. Evaluation: compare two systems, social tagging vs hybrid tags and DDC classes, compare tags with previous professional indexing in DDC and automatic DDC classification, compare tags with search requests, retrieval effectiveness.
      The core of the system is a set of different types of suggestions:
      • General: Improve tags re. spelling, singular/plural etc.; tag description; motivation for tagging
      • Based on previous personal or group tagging: type ahead
      • Applying DDC (incl. mapped terms) to user tags: prioritised suggestions from DDC; after selection: disambiguation and refinement suggestions
      • Applying automatic classification to the information resource: DDC or tag classification of document content; suggestions based on co-occurrence of tags with DDC; KEA++ phrase indexing
      • If DDC indexed resources available: suggest terms associated to that class; refine/expand; visualize DDC hierarchy; structure tag clouds with DDC; filter by DDC high level categories.

      Specification of tagging purpose: personal KO/bookmarking, recommendation, part of information searching and relevance assessment etc.
      Expansion of DDC entry vocabulary: used and refused DDC suggestions, free tags added. Comparison to previous professional DDC classification.
      DDC data is provided via OCLC's web services.
      Tagging encouragement with rewards
      Potential addition: tagging by a research group in an area where no suitable KOS is available

    • Directories in domain and professional systems
    • Institutional repositories: basic indexing, conceptual structures
    • Subject repositories: resource selection, new vocabulary, conceptual structures
    • Citation services
    • Digital libraries
    • Knowledge Organization System (KOS) creation and development [Milne 2006]
    • Museum online interactive exhibitions and object catalogues [Steve.museum; ED2 project Cambridge Univ. Museum of Anthropology]
    • Metadata enhancement services (indexing richness, disambiguation)
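
The type-ahead suggestions described for the "Enhanced Tagging for Discovery Project" could work roughly as sketched below. This is not the project's implementation: the ranking rule (a user's own earlier tags first, then controlled terms), the function name `suggest` and all sample data are assumptions made for illustration.

```python
from collections import Counter


def suggest(prefix, personal_tags, controlled_terms, limit=5):
    """Suggest completions for a typed prefix: the user's own earlier tags
    first (most used first), then controlled terms not already suggested."""
    p = prefix.lower()
    counts = Counter(t.lower() for t in personal_tags)
    own = sorted((t for t in counts if t.startswith(p)),
                 key=lambda t: (-counts[t], t))
    controlled = sorted(t for t in (c.lower() for c in controlled_terms)
                        if t.startswith(p) and t not in counts)
    return (own + controlled)[:limit]


# Hypothetical tagging history and controlled terms.
history = ["social", "sociology", "social", "software", "statistics"]
terms = ["Social sciences", "Sociology", "Social groups"]
print(suggest("soc", history, terms))
```

Recording which suggestions are accepted and which are refused would supply the data the project describes for expanding the entry vocabulary.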

Advanced features applicable to all three types of systems:

  • Present different discovery and retrieval views
  • Allow entry through different indexing layers: Tagging hits, controlled vocabulary hits, other
  • Co-occurrence and similarity clustering: tags with other tags, with documents, with controlled terms, with taggers and groups/communities
  • Automatic linking via tags, named entities or indexing terms/categories to external resources; mashups: resources distributed in terminology system space, geographical or time space [Perseus DL]
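
Keeping the indexing layers separate while searching across them can be sketched with a small layered index. The layer names, terms and resource ids are invented for illustration; the point is only that every hit records which layer produced it.

```python
# Hypothetical layered index: layer name -> term -> set of resource ids.
LAYERS = {
    "professional": {"classification": {"r1", "r2"}},
    "automatic": {"classification": {"r2", "r3"}},
    "user_tags": {"classification": {"r3"}, "toread": {"r4"}},
}


def search(term, layers, use=None):
    """Return resource id -> sorted list of layers in which the term matched.

    `use` optionally restricts the search to a set of layer names."""
    hits = {}
    for name, index in layers.items():
        if use is not None and name not in use:
            continue
        for rid in index.get(term, set()):
            hits.setdefault(rid, []).append(name)
    return {rid: sorted(names) for rid, names in sorted(hits.items())}


print(search("classification", LAYERS))
print(search("classification", LAYERS, use={"user_tags"}))
```

Because each hit keeps its provenance, an interface could present tagging hits and controlled vocabulary hits as separate views, or rank resources matched in several layers higher.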

5 Summary

  • Beyond considerations of weaknesses and strengths of professional and social indexing, it's about time to start more systematic experimentation with integration efforts
  • Rather than constructing a total hybrid system, stepwise extensions of existing services (existing and alternative tagging systems, systems dominated by professional indexing) seem more promising
  • The potential of participative indexing (and factual correction) is far from exhausted. Beyond wide interoperability in very broad information systems, individuals' information management and communication within ad hoc or existing groups, based on concepts, terminology and references, can be considerably enhanced
  • Indexing/tagging, searching, browsing and navigation support still need to be dramatically improved, incl. the application of automated routines
  • Systematic topical structures should be applied to tag clouds, browsing systems and mashups
  • Indexing based on different approaches needs to be separable into layers and intelligible as to its purpose, coverage and qualities
  • Professional controlled vocabularies need to be further developed based on social tagging
  • Linking and referral to versioned and uniquely identified conceptual and named entity authority sources is necessary for semantic quality, interoperability and long-term understanding of meaning
  • Related research efforts need to be intensified
  • All involved communities and disciplines need to be open to developments and approaches not invented by themselves


References

BibSonomy http://bibsonomy.org/

CiteULike http://www.citeulike.org

Connotea http://www.connotea.org

Del.icio.us http://del.icio.us/

Digg http://digg.com/

Flickr http://www.flickr.com/

Furl http://furl.net/

LibraryThing http://www.librarything.com

RawSugar http://rawsugar.com

Unalog http://unalog.com/

Technorati http://www.technorati.com/

Bearman, D. and Trant, J. (2005). Social Terminology Enhancement through Vernacular Engagement. Exploring Collaborative Annotation to Encourage Interaction with Museum Collections. In: D-Lib Magazine, 11:9, Sept. 2005
http://www.dlib.org/dlib/september05/bearman/09bearman.html

Bray, Tim. Do tags work? 2005-03-04
http://www.tbray.org/ongoing/When/200x/2005/03/04/DoTagsWork

fac.etio.us http://www.siderean.com/facetious/facetious.jsp: Not publicly available anymore July 2007.

Folksonomy. http://en.wikipedia.org/wiki/Folksonomy

Golder, S. and Huberman, B.A. (2006). The structure of collaborative tagging systems. In: Journal of Information Science 32, 198-208.

Hammond, Tony, Hannay, Timo, Lund, Ben and Scott, Joanna (2005). Social Bookmarking Tools (I): A General Review, D-Lib Magazine, 11(4), 2005.
http://www.dlib.org/dlib/april05/hammond/04hammond.html

Hannay, Timo: Introduction. August 19, 2004, new version May 2, 2005.
http://tagsonomy.com/index.php/introduction-timo-hannay/

Heller, Lambert (2006). Bibliotheken und die Sacherschließung in sozialen Netzwerken Oder: "Passen Folksonomies und traditionelle bibliothekarische Sacherschließung zusammen?" Thesenpapier.
http://docs.google.com/View?docid=a748gvz5cx_17dgp8mv

Heymann, P. and Garcia-Molina, H. (2006). Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical report, InfoLab, Stanford.

Kipp, M.E.I. and Campbell, D.G. (2006). Patterns and inconsistencies in collaborative tagging systems: An examination of tagging practices. In: Proceedings ASIST, Austin, TX.
http://eprints.rclis.org/archive/00008315/

Kipp, M.E.I. (2007). @toread and cool: Tagging for time, task and emotion. In: Proc. Information Architecture Summit, Las Vegas.
http://eprints.rclis.org/archive/00010445/

Kroski, Ellyssa. The Hive Mind: Folksonomies and User-Based Tagging
http://infotangle.blogsome.com/2005/12/07/the-hive-mind-folksonomies-and-user-based-tagging/

Lust Digital Depot. http://www.lust.nl/lust/digitaldepot/

Mathes, Adam. Folksonomies - Cooperative Classification and Communication Through Shared Metadata
http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html

Milne, D., Medelyan, O. and Witten, I.H. (2006). Mining domain-specific thesauri from wikipedia: A case study. In: Proc. Conf. on Web Intelligence
http://www.cs.waikato.ac.nz/~olena/publications/milne_wikipedia_final.pdf

The searchguys weblog, May 13, 2005: Tags, keywords, and inconsistency
http://blogs.sun.com/roller/page/searchguy/20050513#tags_keywords_and_inconsistency

Steve.museum http://www.steve.museum/

Stephens, Michael. flickr tags http://www.flickr.com/photos/michaelsphotos/tags/

Tag patterns, from 10 bookmarking sites.
http://www.tagpatterns.com/
ex.: http://www.tagpatterns.com/tags/safari_export/all

Tagging. http://en.wikipedia.org/wiki/Tagging

Tennis, J.T. (2006). Social tagging and the next steps for indexing. In: Proc. 17th SIG-CR Classification Research Workshop, 4 Nov 2006.
http://www.slais.ubc.ca/users/sigcr/sigcr-06tennis.pdf

Trant, J. (2006). Exploring the potential for social tagging and folksonomy in art museums: proof of concept. In: New Review of Hypermedia and Multimedia
http://www.steve.museum/index.php?option=com_weblinks&task=view&catid=35&id=37

Tudhope, D., Koch, T. and Heery, R. (2006). Terminology Services and Technology. JISC state of the art review. 96pp.
http://www.ukoln.ac.uk/terminology/JISC-review2006.html

Quintarelli, Emanuele: Folksonomies: Power to the People. Presented at the ISKO Italy-UniMIB meeting : Milan : June 24, 2005
http://www.iskoi.org/doc/folksonomies.htm

Voss, J. (2007). Tagging, folksonomy & Co. Renaissance of manual indexing? In: Proc. 10th Internat. Symposium for Information Science, Constance, pp.243-254 (contains bibliography)

Weinberger, David. Taxonomies and tags: from trees to piles of leaves
http://www.hyperorg.com/blogger/misc/taxonomies_and_tags.html


Traugott Koch, traugott.koch@mpdl.mpg.de
Created: 2007-07-10
Last modified: 2007-07-25
URL: http://www.mpdl.mpg.de/staff/tkoch/pres/tagging-BBK0717.html