Automatic extraction of topic hierarchies based on WordNet

Authors

Brey, Gerhard
Vieira, Miguel

Publisher

Australasian Association for Digital Humanities

Abstract

The aim of the research described here is the automatic generation of a topic hierarchy based on WordNet, for use in a faceted browser interface, with a collection of 19th-century periodical texts as the test corpus. Our research was motivated by the Castanet algorithm, which was developed for and successfully applied to short descriptions of documents. We adapt the algorithm so that it can be applied to the full text of documents. The algorithm for the automatic generation of the topic hierarchy has three main processes: data preparation, wherein the data is prepared so that the information contained within the texts is more easily accessible; target term extraction, wherein the terms considered relevant for classifying each text are selected; and topic tree generation, wherein the tree is built from the target terms. We evaluated samples of the resulting topic tree and found that over 90% of the topics are relevant, i.e. they clearly illustrate what the articles are about, and the topic hierarchy adequately relates to the content of the articles. Future work will address problems resulting from mis-OCRed words, erroneous disambiguation, and language anachronisms. Faceted browsing interfaces based on topic hierarchies are easy and intuitive to navigate and, as our results demonstrate, topic hierarchies form an appropriate basis for this type of data navigation. We are confident that our approach can be applied successfully to other corpora and should yield even better results where there are no OCR issues to contend with. Since WordNet is available in several languages, it should also be possible to apply our approach to corpora in other languages.
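The three processes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `TOY_HYPERNYMS` map is a hypothetical stand-in for WordNet, the frequency-based term selection is an assumed simplification, and the step that collapses single-child nodes is included only as one plausible way to keep the tree shallow. All function names are illustrative.

```python
from collections import Counter

# Hypothetical toy hypernym map standing in for WordNet (child -> parent).
# A real system would look these paths up in WordNet after disambiguation.
TOY_HYPERNYMS = {
    "locomotive": "vehicle",
    "carriage": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
    "poem": "literary_work",
    "novel": "literary_work",
    "literary_work": "writing",
    "writing": "communication",
    "communication": "entity",
}

STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "was", "by", "to"}


def prepare(text):
    """Data preparation: lowercase, strip punctuation, drop stopwords."""
    tokens = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def extract_target_terms(tokens, top_n=3):
    """Target term extraction (assumed heuristic): most frequent known terms."""
    counts = Counter(t for t in tokens if t in TOY_HYPERNYMS)
    return [term for term, _ in counts.most_common(top_n)]


def hypernym_path(term):
    """Return the labels from the most general ancestor down to the term."""
    path = [term]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return list(reversed(path))


def build_tree(terms):
    """Topic tree generation: merge root-to-term paths into nested dicts."""
    tree = {}
    for term in terms:
        node = tree
        for label in hypernym_path(term):
            node = node.setdefault(label, {})
    return tree


def compress(tree, keep):
    """Collapse nodes with exactly one child unless they are target terms."""
    result = {}
    for label, children in tree.items():
        children = compress(children, keep)
        if len(children) == 1 and label not in keep:
            result.update(children)  # promote the only child in place of label
        else:
            result[label] = children
    return result


if __name__ == "__main__":
    text = "The locomotive and the carriage; a poem on the locomotive."
    terms = extract_target_terms(prepare(text))
    print(compress(build_tree(terms), set(terms)))
```

Nested dictionaries keep the sketch short; a faceted browser would more likely attach document identifiers to each node so that selecting a topic retrieves the matching articles.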

Citation

Brey, G. & Vieira, M. (March 2012). Automatic extraction of topic hierarchies based on WordNet. Presentation at the Digital Humanities Australasia 2012: Building, Mapping, Connecting [Conference][aaDH2012]. Canberra, Australia: ANU

Book Title

Australasian Association for Digital Humanities Conference (1st : 2012 : The Australian National University, Canberra, ACT)

Access Statement

Open Access

Downloads

File
Description
PowerPoint presentation slides