Technical Blog

Exact search with ElasticSearch

ElasticSearch is a powerful distributed database that supports a wide range of complex queries. This article shows how to perform an exact lookup (the equivalent of WHERE field_name = field_value in SQL).

Creation of the test database

We will use a very simple database that stores the age of a user. Here is an example object:

{
  "name" : "username",
  "age"  : 25
}

First, we create the index (the ElasticSearch equivalent of a database):

curl -XPUT http://localhost:9200/user_age

Then, we put some data in it:

curl -XPOST http://localhost:9200/user_age/user/ -d '{
   "name" : "user1",
   "user_age"  : 13
}'

curl -XPOST http://localhost:9200/user_age/user/ -d '{
   "name" : "user 2",
   "age"  : 20
}'

curl -XPOST http://localhost:9200/user_age/user/ -d '{
   "name" : "user 3",
   "age"  : 13
}'

curl -XPOST http://localhost:9200/user_age/user/ -d '{
   "name" : "USER4",
   "age"  : 20
}'

Note that we use POST requests, so ElasticSearch generates the document IDs for us and we don’t have to specify them manually.

Failing requests

The objective of this database is to retrieve the age of a user from their username. For example, the following request works:

curl -XGET 'http://localhost:9200/user_age/_search?pretty=true' -d '{
    "query" : {
        "term" : {
            "name" : "user1"
        }
    }
}'
# Returns:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "user_age",
      "_type" : "user",
      "_id" : "0lz5QLxlSRWEO7_OdvJcXg",
      "_score" : 0.30685282, "_source" : {
         "name" : "user1",
         "user_age"  : 13
       }
    } ]
  }
}

However, if we test with “user 2”, it doesn’t return anything:

curl -XGET 'http://localhost:9200/user_age/_search?pretty=true' -d '{
    "query" : {
        "term" : {
            "name" : "user 2"
        }
    }
}'
# Returns:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Searching for “USER4” fails as well and returns nothing.
Conversely, a request that shouldn’t match anything (“user”) returns some results:

curl -XGET 'http://localhost:9200/user_age/_search?pretty=true' -d '{
    "query" : {
        "term" : {
            "name" : "user"
        }
    }
}'
# Returns:
{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.37158427,
    "hits" : [ {
      "_index" : "user_age",
      "_type" : "user",
      "_id" : "PBnEBcfMRpmBBX1xZG9Czw",
      "_score" : 0.37158427, "_source" : {
   "name" : "user 2",
   "age"  : 20
}
    }, {
      "_index" : "user_age",
      "_type" : "user",
      "_id" : "-ok3LJepR1C3H7W_DuaArQ",
      "_score" : 0.37158427, "_source" : {
   "name" : "user 3",
   "age"  : 13
}
    } ]
  }
}

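The issue comes from the fact that when we inserted our records, the name field was processed by the default analyzer (the Standard Analyzer). In the ElasticSearch versions this article targets, it chains the following operations:

  • Standard Tokenizer: splits the text into tokens on whitespace and punctuation, so “user 2” becomes the tokens user and 2
  • Standard Token Filter
  • Lower Case Token Filter: lowercases every token, so “USER4” is indexed as user4
  • Stop Token Filter: removes common English stop words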

Because a term query does not analyze its input, the search value is compared as-is against the indexed tokens. This explains, for example, why USER4 doesn’t match but user4 does.
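
To make the mismatch concrete, here is a rough Python sketch of what happens to our four names. The analyze function is a simplified stand-in written for this illustration (lowercase, then split on anything that is not a letter or digit); it is not ElasticSearch's actual analyzer, and the inverted index is just a plain dictionary.

```python
import re

def analyze(text):
    """Simplified stand-in for the default analyzer (an assumption, not the
    real implementation): lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

names = ["user1", "user 2", "user 3", "USER4"]

# Inverted index: token -> set of original names containing that token.
index = {}
for name in names:
    for token in analyze(name):
        index.setdefault(token, set()).add(name)

def term_query(value):
    """A term query looks the value up as-is, without analyzing it."""
    return sorted(index.get(value, set()))

print(term_query("user1"))   # ['user1']
print(term_query("user 2"))  # []  (only the tokens "user" and "2" exist)
print(term_query("USER4"))   # []  (the indexed token is lowercase "user4")
print(term_query("user4"))   # ['USER4']
print(term_query("user"))    # ['user 2', 'user 3']
```

The last line reproduces the surprising result from the failing requests: the bare token user exists in the index only because “user 2” and “user 3” were split.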

We could switch to a text query (a match query if your ElasticSearch is at least 0.19.9), which does analyze its input. However, it doesn’t perform an exact search, so it still returns wrong results for the “user” query:


curl -XGET 'http://localhost:9200/user_age/_search?pretty=true' -d '{
    "query" : {
        "text" : {
            "name" : "user"
        }
    }
}'
# Returns:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.37158427,
    "hits" : [ {
      "_index" : "user_age",
      "_type" : "user",
      "_id" : "PBnEBcfMRpmBBX1xZG9Czw",
      "_score" : 0.37158427, "_source" : {
   "name" : "user 2",
   "age"  : 20
}
    }, {
      "_index" : "user_age",
      "_type" : "user",
      "_id" : "-ok3LJepR1C3H7W_DuaArQ",
      "_score" : 0.37158427, "_source" : {
   "name" : "user 3",
   "age"  : 13
}
    } ]
  }
}
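
The behavior of the text query can be sketched in Python as well. This is only an approximation under two assumptions: analyze is the same simplified stand-in for the default analyzer (lowercase and split), and a document matches as soon as it shares at least one token with the analyzed query, which mirrors the default "or" operator.

```python
import re

def analyze(text):
    # Simplified stand-in for the default analyzer (assumption): lowercase + split.
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

docs = ["user1", "user 2", "user 3", "USER4"]

def text_query(value):
    """Analyze the query, then keep every document that shares at least
    one token with it (OR semantics)."""
    wanted = set(analyze(value))
    return [doc for doc in docs if wanted & set(analyze(doc))]

print(text_query("USER4"))   # ['USER4']
print(text_query("user"))    # ['user 2', 'user 3']
print(text_query("user 2"))  # ['user 2', 'user 3']
```

Analyzing the query fixes the case problem for “USER4”, but “user 2” now also brings back “user 3”, so this is still not an exact lookup.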

Solution: use mappings

This default behavior works well for indexing and retrieving documents full of text, but it does not suit our problem. We need to change it, using a custom mapping.

ElasticSearch is a schema-free database, which makes it very flexible. However, declaring settings and mappings when creating an index enables more powerful operations.
They let you configure several things:

  • Index settings (number of shards, replicas, etc.)
  • Analyzers: declare custom analyzers and filters by combining and configuring existing ones, including NGram generation, stemming, etc.
  • Mappings: describe the fields of the records and their options, including which analyzer to use (see Core Types)

In our case, we want to disable the default analyzer on the name field, so that usernames are indexed unanalyzed.

The first thing to do is to delete our index:

curl -XDELETE http://localhost:9200/user_age

Then, we can declare our mapping:

curl -XPUT http://localhost:9200/user_age -d '
{
    "mappings": {
        "user": {
            "properties": {
                "name": {
                    "index": "not_analyzed",
                    "type": "string"
                },
                "age": {
                    "type": "integer"
                }
            }
        }
    }
}
'
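
If you need the same treatment for several fields, the mapping body can be generated rather than written by hand. The sketch below is one possible way to do it with the Python standard library; the helper name build_create_index_request and its parameters are made up for this example, and the request is only built, not sent.

```python
import json
import urllib.request

def build_create_index_request(base_url, index, doc_type, exact_fields,
                               other_fields=None):
    """Build (without sending) the PUT request that creates `index`, mapping
    every field in `exact_fields` as a not_analyzed string.
    Hypothetical helper, written for this illustration only."""
    properties = dict(other_fields or {})
    for field in exact_fields:
        properties[field] = {"type": "string", "index": "not_analyzed"}
    body = {"mappings": {doc_type: {"properties": properties}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )

req = build_create_index_request(
    "http://localhost:9200", "user_age", "user",
    exact_fields=["name"],
    other_fields={"age": {"type": "integer"}},
)
print(req.get_method(), req.full_url)
print(json.dumps(json.loads(req.data), indent=4))
# To actually create the index: urllib.request.urlopen(req)
```

The printed body has the same content as the mapping passed to curl above.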

Then, we can insert our records again and re-run the requests. Now, “user” returns nothing, and every exact name, including “user 2” and “USER4”, returns exactly one match.

This works because we removed analysis at both insertion time and request time:

  • "index": "not_analyzed" makes the search engine index “user 2” as a single term, instead of the two tokens ["user", "2"]
  • Using a term query means the search value is not split into “user” and “2” either.
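
The effect of the two points above can be pictured with a small Python sketch. It is only a mental model: the index is a plain dictionary keyed by the verbatim name, the documents follow the shape of the example object (name and age), and ages_for is a hypothetical helper playing the role of the term query.

```python
docs = [
    {"name": "user1", "age": 13},
    {"name": "user 2", "age": 20},
    {"name": "user 3", "age": 13},
    {"name": "USER4", "age": 20},
]

# not_analyzed: each name is indexed as one verbatim term, no splitting,
# no lowercasing.
index = {}
for doc in docs:
    index.setdefault(doc["name"], []).append(doc)

def ages_for(name):
    """Exact lookup: the value is compared as-is against the verbatim terms."""
    return [doc["age"] for doc in index.get(name, [])]

print(ages_for("user 2"))  # [20]
print(ages_for("USER4"))   # [20]
print(ages_for("user"))    # []
print(ages_for("user4"))   # []  (exact match is now case-sensitive)
```

Note the trade-off shown by the last line: with an unanalyzed field, “user4” no longer finds “USER4”.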

Note: this example is deliberately simple, and could have been handled just as easily by using the username as the record’s ID (in which case the name field isn’t even needed):

curl -XPUT http://localhost:9200/user_age/user/user%202 -d '{
   "name" : "User 2",
   "user_age"  : 13
}'

It could have been retrieved like this:

curl -XGET http://localhost:9200/user_age/user/user%202

{"_index":"user_age","_type":"user","_id":"user 2","_version":1,"exists":true, "_source" : {
   "name" : "User 2",
   "user_age"  : 13
  }
}
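
The %20 in the URLs above is simply the URL-encoded space of “user 2”. When IDs come from user input, it is safer to encode them programmatically; here is a minimal Python sketch, where document_url is a hypothetical helper name chosen for this example.

```python
from urllib.parse import quote

def document_url(base_url, index, doc_type, doc_id):
    """Build the URL of a document, percent-encoding the ID so that spaces,
    slashes and other special characters stay inside one path segment."""
    return f"{base_url}/{index}/{doc_type}/{quote(doc_id, safe='')}"

print(document_url("http://localhost:9200", "user_age", "user", "user 2"))
# http://localhost:9200/user_age/user/user%202
```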

However, if you want an exact lookup on a non-unique field, or on several fields, this solution no longer applies.
