Technical Blog

Find closest subway station with ElasticSearch

This article shows a concrete example of spatial search in ElasticSearch. This feature allows searching records by their geographical coordinates: closest points, points within a circle or any polygon, etc. Shay Banon’s blog article is very helpful and helped me develop this complete, concrete example.

In this article, we will store in ElasticSearch all the subway stations of Paris and search for the closest ones to the Eiffel Tower (or any coordinate).

Getting the data

The Parisian transportation company has made the list of station coordinates freely available (thanks to OpenData). You can retrieve the file here.

Creation of the database

We are going to store records with two fields:

  • Name, of type string
  • Location, of type Geo Point which stores latitude and longitude.

Setting location as a geo point will allow us to perform distance calculation operations on it. We create the type station on the geo_metro index using the following mapping:

curl -XPUT http://localhost:9200/geo_metro -d '
{
    "mappings": {
        "station": {
            "properties": {
                "name": {
                    "type": "string"
                },
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }
}
'

Feeding the database

We now need to parse the CSV file and insert its records into the database. The best solution would be a script that reads the file line by line and sends bulk insertion requests through an ElasticSearch client.
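As a rough idea of what such a bulk request contains, here is a minimal Python sketch that turns CSV text into the newline-delimited body expected by the ElasticSearch `_bulk` endpoint. The column order (name, latitude, longitude), the function name `build_bulk_body` and the sample rows are assumptions made for illustration; the real file may be laid out differently.

```python
import csv
import io
import json


def build_bulk_body(csv_text, index="geo_metro", doc_type="station"):
    """Build a newline-delimited bulk body from CSV text.

    Assumes each row is: name, latitude, longitude (hypothetical layout).
    """
    lines = []
    for name, lat, lon in csv.reader(io.StringIO(csv_text)):
        # Action line: tells ElasticSearch where to index the next document.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, with an explicit lat/lon object.
        lines.append(json.dumps({
            "name": name,
            "location": {"lat": float(lat), "lon": float(lon)},
        }))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample rows, only to show the shape of the output.
sample = "Station A,48.85,2.29\nStation B,48.86,2.30\n"
body = build_bulk_body(sample)
print(body)
```

The resulting text could be saved to a file and posted to `http://localhost:9200/_bulk` with `curl --data-binary`, which sends all records in one request instead of one request per station.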

Here, to avoid installing a client, I simply wrote a Python script (here) that generates a list of curl requests, which I save in a .sh file (here):

python create_requests.py stations.csv > insert.sh
bash insert.sh

Fortunately, the coordinates in the CSV file are already in the format we need (decimal degrees). If that is not the case for your data, have a look at the excellent Geographic Coordinate Conversion article on Wikipedia.
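If your source gives coordinates in degrees, minutes and seconds, the conversion to decimal degrees is a short computation. The sketch below is only an illustration: the function name `dms_to_decimal` is hypothetical, and the input values were chosen so that the result lands near the Eiffel Tower coordinates used later in this article.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; south and west are negative.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Illustrative input: 48°51'29.6" N, 2°17'40.2" E
lat = dms_to_decimal(48, 51, 29.6, "N")
lon = dms_to_decimal(2, 17, 40.2, "E")
print(round(lat, 4), round(lon, 4))
```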

Searching the closest station

Now that we have our data stored in ElasticSearch with the correct mapping, we are able to execute searches. Our request will return the list of the five closest stations to a geographical point.
In this example, I will use the Eiffel Tower coordinates that I found in its Wikipedia article.

The request is the following:

curl -XGET 'http://localhost:9200/geo_metro/station/_search?size=5&pretty=true' -d '
{
    "sort" : [
        {
            "_geo_distance" : {
                "location" : [48.8583, 2.2945],
                "order" : "asc",
                "unit" : "km"
            }
        }
    ],
    "query" : {
        "match_all" : {}
    }
}
'

It successfully returns:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 634,
    "max_score" : null,
    "hits" : [ {
      "_index" : "geo_metro",
      "_type" : "station",
      "_id" : "91jtmucvThaNq83Y2K4rww",
      "_score" : null, "_source" : {"name": "Champ de Mars-Tour Eiffel", "location": {"lat": "2.28948345865043", "lon": "48.855203725918"}},
      "sort" : [ 0.655364333719714 ]
    }, {
      "_index" : "geo_metro",
      "_type" : "station",
      "_id" : "H-Q9HFVcRqiqWVtk9OCvbQ",
      "_score" : null, "_source" : {"name": "I\u00e9na", "location": {"lat": "2.29379995911415", "lon": "48.8644728971468"}},
      "sort" : [ 0.6902478960716185 ]
    }, {
      "_index" : "geo_metro",
      "_type" : "station",
      "_id" : "PS0GCpC4TDmgjrhTRego4g",
      "_score" : null, "_source" : {"name": "Bir-Hakeim Grenelle", "location": {"lat": "2.28878285580131", "lon": "48.8543331583289"}},
      "sort" : [ 0.7735556213326537 ]
    }, {
      "_index" : "geo_metro",
      "_type" : "station",
      "_id" : "jd4ct8APS4WSWzRdvnsQbg",
      "_score" : null, "_source" : {"name": "Dupleix", "location": {"lat": "2.29276958714394", "lon": "48.8508056365633"}},
      "sort" : [ 0.8546099046905625 ]
    }, {
      "_index" : "geo_metro",
      "_type" : "station",
      "_id" : "Ol9SpVIcRkKYw2bj99Sb7g",
      "_score" : null, "_source" : {"name": "Pont de l alma", "location": {"lat": "2.30129356979222", "lon": "48.8624292670432"}},
      "sort" : [ 0.8838145038672007 ]
    } ]
  }
}

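Note that the sort value is the distance, in kilometers (the unit requested in the query), between the Eiffel Tower and the subway station. One caveat: the lat and lon values in this output are swapped (Paris lies around latitude 48.86, longitude 2.29), and ElasticSearch reads a coordinate array as [lon, lat], so the query point is swapped in the same way. The ranking is still plausible, but the distances are computed near the equator and differ somewhat from the real ones.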
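To check a sort value by hand, a great-circle (haversine) distance can be sketched in Python. This is an independent approximation, not ElasticSearch's own code; the function name and the 6371 km mean Earth radius are assumptions. The first computation uses the first hit's values exactly as they are stored in the index (lat about 2.29, lon about 48.86) and reads the query array in ElasticSearch's [lon, lat] order. The second uses the same numbers in geographic order (latitude about 48.86, longitude about 2.29).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Numbers taken from the query and from the first hit of the response.
tower_a, tower_b = 48.8583, 2.2945
station_a, station_b = 48.855203725918, 2.28948345865043

# As indexed: the ~48.86 values act as longitudes, the ~2.29 values as latitudes.
as_indexed = haversine_km(tower_b, tower_a, station_b, station_a)

# Geographic order: the ~48.86 values are latitudes, the ~2.29 values longitudes.
geographic = haversine_km(tower_a, tower_b, station_a, station_b)

print(round(as_indexed, 3), round(geographic, 3))
```

Under these assumptions the first value lands close to the 0.655 reported for the first hit, while the second, with the coordinates in geographic order, is closer to half a kilometer.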

Conclusion

The objective of this article was to show how simple it is to perform requests based on geographical points. You can build far more complicated queries by changing match_all to a concrete query, adding filters, facets, etc., all while benefiting from the scalability of ElasticSearch, which lets you store a huge number of points!
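As one possible next step, the sketch below builds a search body that keeps only the stations within a given radius and still sorts them by distance. It assumes the `filtered` query and `geo_distance` filter syntax of ElasticSearch releases from the era of this article (later versions replaced `filtered` with a `bool` query); the function name `nearby_stations_query` and its parameters are hypothetical.

```python
import json


def nearby_stations_query(lat, lon, radius="1km", size=5):
    """Build a search body: stations within `radius`, closest first.

    lat/lon must use the same orientation as the indexed documents.
    """
    point = {"lat": lat, "lon": lon}  # explicit keys avoid array-order mistakes
    return {
        "size": size,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "geo_distance": {"distance": radius, "location": point}
                },
            }
        },
        "sort": [
            {"_geo_distance": {"location": point, "order": "asc", "unit": "km"}}
        ],
    }


search_body = nearby_stations_query(48.8583, 2.2945)
print(json.dumps(search_body, indent=4))
```

Building the body as a dictionary and serializing it with `json.dumps` also guards against the unbalanced braces that are easy to introduce when writing JSON by hand inside a curl command.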
