Journal article

Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus

From

1. Concordia University

2. Elon University

3. Department of Applied Mathematics and Computer Science, Technical University of Denmark

4. Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark

5. University of Oulu

Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge base that has been continuously exploited by researchers in a wide variety of domains. This article systematically reviews the scholarly studies that have used Wikipedia as a data source, and investigates the means by which Wikipedia has been employed in three main computer science research areas: information retrieval, natural language processing, and ontology building.

We report and discuss the research trends of the identified and examined studies. We further identify and classify a list of tools that can be used to extract data from Wikipedia, and compile a list of currently available data sets extracted from Wikipedia.

Language: English
Year: 2017
Pages: 505-529
ISSN: 1873-5371 and 0306-4573
Types: Journal article
DOI: 10.1016/j.ipm.2016.07.003
ORCIDs: 0000-0001-5574-7572 and Nielsen, Finn Årup
