DTU Findit

PhD Thesis

When Your News and Labels are Unreliable

From

Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark

Department of Applied Mathematics and Computer Science, Technical University of Denmark

This thesis presents two projects rooted in uncertainty in machine learning and online news. The first project concerns predicting the reliability and bias of American news articles, a task popularly known as fake news detection. There are three associated papers on the topic. The first paper presents a collected dataset containing more than 700,000 news articles from 194 sources, together with detailed labelling of the sources from multiple independent authorities.

Another paper analyses copying patterns between news sources, concluding that there is heavy copying and that the copying patterns reveal communities of sources publishing similar or even identical content. The final paper presents a large robustness study of a known reliability and bias detection system.

The system is tested on unseen sources, tested for performance decrease over time, and tested against three types of attacks aimed at the system.

The second project aims at simplifying common problems with labels in classification. We propose a framework called decoupling, which uses probabilistic methods to handle:

- Semi-supervised learning: only some of the training data have labels
- Positive-unlabelled learning: we only have labels on one of two classes
- Multi-positive-unlabelled learning: we have labels on all classes but one
- Noisy-label learning: labels are known to have errors

The system can also handle combinations of the above.

We derive the approximations needed for optimizing labels in the framework and show empirically that it can assist in solving the problems above. The project is currently only available in preprint, but we expect to publish the work soon. We conclude the decoupling project by presenting a new and interesting classification task, which we have not seen elsewhere and which we call degenerate classification.

We show a simple case in which decoupling can be used to encode the assumptions needed to learn 6 classes using only 4 labels.

Language: English
Publisher: Technical University of Denmark
Year: 2020
Types: PhD Thesis
