Journal article
Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus
Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge base that has been continuously exploited by researchers in a wide variety of domains. This article systematically reviews the scholarly studies that have used Wikipedia as a data source, and investigates the means by which Wikipedia has been employed in three main computer science research areas: information retrieval, natural language processing, and ontology building.
We report and discuss the research trends of the identified and examined studies. We further identify and classify a list of tools that can be used to extract data from Wikipedia, and compile a list of currently available data sets extracted from Wikipedia.
Language: | English |
---|---|
Year: | 2017 |
Pages: | 505-529 |
ISSN: | 1873-5371 and 0306-4573 |
Types: | Journal article |
DOI: | 10.1016/j.ipm.2016.07.003 |
ORCIDs: | 0000-0001-5574-7572 and Nielsen, Finn Årup |