Geographic Feature Type Topic Model (GFTTM): grounding topics in the landscape
Abstract
Probabilistic topic models are a class of unsupervised machine learning models used to discover the latent topics in a corpus of documents. A new method is proposed for combining geographic feature data with text from geo-referenced documents to create topic models that are grounded in the physical environment. The Geographic Feature Type Topic Model (GFTTM) models each document in a corpus as a mixture of feature type topics and abstract topics. Feature type topics are conditioned on additional observed data: the relative densities of the geographic feature types co-located with the document's location referent. Abstract topics, by contrast, are trained independently of that information. The GFTTM is evaluated using geo-referenced Wikipedia articles and feature type data from volunteered geographic information sources. A technique is also presented for measuring the semantic similarity of feature types and places based on the mixtures of topics associated with the types. The results of the evaluation demonstrate that GFTTM finds two distinct types of topics, which can be used to disentangle how places are described in terms of their physical features and in terms of more abstract topics such as history and culture.
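The abstract does not state which measure is applied to the topic mixtures when comparing feature types and places. As a minimal sketch, assuming each feature type or place is summarized by a probability vector over the model's topics, the Jensen-Shannon divergence (a common choice for comparing distributions, assumed here and not necessarily the measure used in GFTTM) can be converted into a similarity score between 0 and 1. The function names and the example mixtures are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: the abstract does not specify GFTTM's similarity measure;
# Jensen-Shannon divergence is assumed here purely for illustration.


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale non-negative topic weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, which lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("topic mixtures must cover the same topics")
    p, q = normalize(p), normalize(q)
    # Midpoint distribution: m_i > 0 wherever p_i > 0 or q_i > 0, so no division by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Similarity in [0, 1]: 1 for identical mixtures, 0 for mixtures with disjoint support."""
    return 1.0 - js_divergence(p, q)


if __name__ == "__main__":
    # Hypothetical mixtures over four topics: two feature type topics, then two abstract topics.
    cliff = [0.6, 0.2, 0.1, 0.1]
    waterfall = [0.5, 0.3, 0.1, 0.1]
    museum = [0.05, 0.05, 0.5, 0.4]
    print(topic_similarity(cliff, waterfall))  # mixtures dominated by the same topics
    print(topic_similarity(cliff, museum))     # mixtures dominated by different topics
```

Because the score is symmetric and bounded, it can be computed over the full topic mixture or, under the same assumptions, over only the feature type topics (renormalized) to compare places by their physical character alone.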
Cite this as
2015. Geographic Feature Type Topic Model (GFTTM): grounding topics in the landscape. PeerJ PrePrints 3:e816v1 https://doi.org/10.7287/peerj.preprints.816v1
Author comment
This preprint will be submitted to PeerJ CS for review.
Supplemental Information
GFTTM plate notation
Geographic feature type topic model plate notation
Screenshot of geonames features and related Wikipedia articles
Sample of geonames.org features and georeferenced Wikipedia articles in the vicinity of Yosemite Valley. The 'W' icons refer to Wikipedia articles and the numbered icons refer to geonames.org features. The colors correspond to the nine broad feature type categories defined by geonames.org.
Additional Information
Competing Interests
There are no competing interests.
Author Contributions
Benjamin Adams conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Data Deposition
The following information was supplied regarding the deposition of related data:
https://github.com/darwinzer0/gfttm
Funding
There were no funding sources.