A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide an efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service which is needed to discover, query, and manage the metadata. Catalogue services in an SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard which defines common interfaces for accessing the metadata information.
A search engine is a software system capable of supporting fast and reliable search, which may use “any means necessary” to get users to the resources they need quickly and efficiently. These techniques may include features such as full text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting, recommendations, feedback mechanisms based on log mining, usage statistic gathering, and many others. In this paper we will be focusing on improving geospatial search with a search engine platform that uses Lucene, a Java-based search library, at its core.
In work funded by the National Endowment for the Humanities, the Centre for Geographic Analysis (CGA) at Harvard University is in the process of re-engineering the search component of its public domain SDI (WorldMap http://worldmap.harvard.edu ) which is based on the GeoNode platform. In the process the CGA has developed Harvard Hypermap (HHypermap), a map services registry and search platform independent from WorldMap.
The goal of HHypermap is to provide a framework for building and maintaining a comprehensive registry of web map services, and because such a registry is expected to be large, the system supports the development of clients with modern search capabilities such as spatial and temporal faceting and instant previews via an open API. Behind the scenes HHypermap scalably harvests OGC and Esri service metadata from distributed servers, organizes that information, and pushes it to a search engine. The system monitors services for reliability and uses that to improve search. End users will be able to search the SDI metadata using standard interfaces provided by the internal CSW catalogue, and will benefit from the enhanced search possibilities provided by an advanced search engine. HHypermap is built on an open source software source stack.