Digitising a machine-tractable version of Kamus Dewan with TEI-P5
- Published
- Accepted
- Subject Areas
- Computational Linguistics, Natural Language and Speech
- Keywords
- Machine-tractable dictionaries, TEI, Language resources, Bahasa Malaysia
- Copyright
- © 2016 Lim et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. Digitising a machine-tractable version of Kamus Dewan with TEI-P5. PeerJ Preprints 4:e2205v1 https://doi.org/10.7287/peerj.preprints.2205v1
Abstract
Kamus Dewan is the authoritative dictionary of Bahasa Malaysia and contains a wealth of linguistic and cultural information about the language. It is currently available in print and as a searchable online dictionary. However, the online dictionary lacks advanced search capabilities that target specific fields within each headword and lemma entry. For computers to target and extract this information efficiently, the macro- and micro-structures of Kamus Dewan entries must first be annotated, or marked up, explicitly. We describe how the TEI-P5 guidelines have been applied to make Kamus Dewan more machine-tractable. We also give examples of how the resulting machine-tractable data can be used for linguistic research and analysis, and for producing other language resources.
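To illustrate what explicit markup makes possible, the following is a minimal sketch of field-targeted extraction from a single dictionary entry. It assumes an entry encoded with standard TEI-P5 dictionary elements (`entry`, `form`, `orth`, `gramGrp`, `pos`, `sense`, `def`, `cit`, `quote`) in the TEI namespace. The sample entry, its content, and the helper name `extract_fields` are invented for illustration; they are not taken from Kamus Dewan and may not match the encoding actually used for it.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the "tei" prefix is only a local alias for the queries below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical entry: the element names are standard TEI-P5 dictionary markup,
# but the content is invented and is NOT the Kamus Dewan text.
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="makan">
  <form type="lemma"><orth>makan</orth></form>
  <gramGrp><pos>kata kerja</pos></gramGrp>
  <sense n="1">
    <def>memasukkan makanan ke dalam mulut lalu menelannya</def>
    <cit type="example"><quote>Dia makan nasi.</quote></cit>
  </sense>
  <sense n="2">
    <def>menggunakan atau menghabiskan (masa, belanja)</def>
  </sense>
</entry>
"""


def extract_fields(entry_xml):
    """Return selected fields of one TEI <entry> as a plain dictionary."""
    entry = ET.fromstring(entry_xml.strip())
    return {
        # Headword: the orthographic form of the lemma.
        "lemma": entry.findtext(
            "tei:form[@type='lemma']/tei:orth", namespaces=TEI_NS
        ),
        # Part of speech from the grammatical information group.
        "pos": entry.findtext("tei:gramGrp/tei:pos", namespaces=TEI_NS),
        # One definition per <sense>, in document order.
        "definitions": [
            d.text for d in entry.findall("tei:sense/tei:def", TEI_NS)
        ],
        # Usage examples, wherever they occur inside the entry.
        "examples": [
            q.text
            for q in entry.findall(".//tei:cit[@type='example']/tei:quote", TEI_NS)
        ],
    }


if __name__ == "__main__":
    for field, value in extract_fields(SAMPLE_ENTRY).items():
        print(field, "=", value)
```

Because each field sits in its own element, a query can target definitions, parts of speech, or examples separately, which a plain full-text search over the entry cannot do.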
Author Comment
This is a preprint submission to PeerJ Preprints.