Geographic Feature Type Topic Model (GFTTM): grounding topics in the landscape
- Subject Areas
- Data Mining and Machine Learning, Data Science, Natural Language and Speech, Spatial and Geographic Information Systems
- Keywords
- Text mining, Topic modeling, Volunteered geographic information, Bayesian inference
- Copyright
- © 2015 Adams
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Geographic Feature Type Topic Model (GFTTM): grounding topics in the landscape. PeerJ PrePrints 3:e816v1 https://doi.org/10.7287/peerj.preprints.816v1
Abstract
Probabilistic topic models are a class of unsupervised machine learning models used for understanding the latent topics in a corpus of documents. This paper proposes a new method for combining geographic feature data with text from geo-referenced documents to create topic models that are grounded in the physical environment. The Geographic Feature Type Topic Model (GFTTM) models each document in a corpus as a mixture of feature type topics and abstract topics. Feature type topics are conditioned on additional observation data of the relative densities of geographic feature types co-located with the document's location referent, whereas abstract topics are trained independently of that information. The GFTTM is evaluated using geo-referenced Wikipedia articles and feature type data from volunteered geographic information sources. A technique for measuring the semantic similarity of feature types and places, based on the mixtures of topics associated with the types, is also presented. The results of the evaluation demonstrate that GFTTM finds two distinct types of topics, which can be used to disentangle how places are described in terms of their physical features from how they are described in terms of more abstract topics such as history and culture.
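To illustrate the idea of comparing feature types by their topic mixtures, the following is a minimal sketch only. The abstract does not state which similarity measure the paper uses; Jensen-Shannon divergence is swapped in here as one common way to compare probability distributions. The function names, the feature type names and the mixture values are all hypothetical and are not taken from the paper or its data.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    With base-2 logarithms the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    # Midpoint distribution between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p, q):
    """Similarity in [0, 1]: 1 for identical mixtures, 0 for disjoint ones."""
    return 1.0 - js_divergence(p, q)


# Hypothetical topic mixtures over three topics (made-up values, not results).
mixtures = {
    "stream": [0.7, 0.2, 0.1],
    "lake": [0.6, 0.3, 0.1],
    "church": [0.1, 0.1, 0.8],
}

sim_stream_lake = topic_similarity(mixtures["stream"], mixtures["lake"])
sim_stream_church = topic_similarity(mixtures["stream"], mixtures["church"])
```

Under these made-up mixtures, the two hydrographic types come out as more similar to each other than either is to the church type, which is the kind of comparison the abstract describes for feature types and places.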
Author Comment
This preprint will be a submission to PeerJ CS for review.
Supplemental Information
GFTTM plate notation
Geographic feature type topic model plate notation
Screenshot of geonames features and related Wikipedia articles
Sample of geonames.org features and georeferenced Wikipedia articles in the vicinity of Yosemite Valley. The 'W' icons refer to Wikipedia articles and the numbered icons refer to geonames.org features. The colors correspond to the 9 broad feature type categories defined by geonames.org.