Amit's Garage

A garage where one can get tools to solve their problems

Install and create a cluster of ES Servers

February 5, 2018 by BlakDronzer

You will find installation notes scattered across various blogs. I did too, but unfortunately none of them were complete. The procedures they describe may have worked for their authors at the time, but they did not work for me. So let me share the process I followed to install and set up a cluster of Elasticsearch (ES) servers.

Prerequisites

You may use a physical or virtual machine running Ubuntu or CentOS, or whichever distribution you are comfortable with. I recommend going with the latest release, which keeps you current on updates and security patches, and allocating a minimum of 1 GB of RAM. You are, of course, in the best position to judge the server specification that suits your workload.

Our team went with 3 virtual machines running CentOS 7, each with 8 GB of RAM.

Step 1. Installation

Before we start installing the Elasticsearch servers, we first need to install Java, which Elasticsearch requires to run. Follow the steps below to install Java (the current version of Java is 8). These commands are for Ubuntu / Debian; a yum-based equivalent for CentOS is sketched right after the list.

    1. Update the package index with the following command.
      sudo apt-get update
    2. Add the official Oracle Java repository.
      sudo add-apt-repository ppa:webupd8team/java
    3. Refresh the package list again.
      sudo apt-get update
    4. Install Java using the following command:
      sudo apt-get install oracle-java8-installer
    5. Once installed, verify the installation by checking the Java version.
      java -version

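Since our own machines ran CentOS 7, here is a hedged sketch of the equivalent using yum; it assumes the OpenJDK 8 package (java-1.8.0-openjdk) from the standard CentOS repositories rather than the Oracle installer:

# CentOS 7 equivalent (sketch): install OpenJDK 8 from the standard repos
sudo yum update -y
sudo yum install -y java-1.8.0-openjdk

# Verify the installation
java -version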

There are two convenient ways of installing the server: one via Yum / APT repositories, the other by downloading the required version from the Elastic site and installing it. We are going to download and install the desired version from the site, which I find more convenient than the repository route (a sketch of the repository route is included after the install steps, for reference).

  1. Download the latest version from the Elastic site. Go for the RPM package, as it creates all the necessary installation configuration for you. Experts may also go the manual route using the zip / tar variants. Note: at the time of writing, the latest version of ES was 6.1.3, so that is what we download and install.
    cd ~
    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.1.3.rpm
  2. Install the downloaded package.
    sudo rpm --install elasticsearch-6.1.3.rpm
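
For reference, here is a hedged sketch of the repository-based route mentioned above; the repo file below follows Elastic's documented 6.x RPM repository layout, so verify the URLs against the official documentation before relying on it:

# /etc/yum.repos.d/elasticsearch.repo (sketch)
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1

# Install directly from the repository
sudo yum install -y elasticsearch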

Step 2: Configurations

Open the configuration file:

/etc/elasticsearch/elasticsearch.yml

(This will be the path of the configuration file if you have followed the installation steps above.)

  1. Binding to a VPN IP address or interface:
    • If we do not bind Elasticsearch to any IP address in the configuration, by default it is reachable only from localhost. At that point the system is not accessible from other machines on the network or publicly.
    • If we want to access it from other machines, we have to bind it to the desired IP address. Sometimes we want to bind to the internal network IP address, sometimes to an external one; we can point it to either.
    • One of the documents I read shared a good insight: bind to interfaces instead. Say the interface is named 'tun0', and the other available interface is 'local'. Let's bind the server to these two interfaces.
      network.host: [_tun0_, _local_]
  2. Set the name of the cluster.
    cluster.name: production
  3. Set the node name. Assuming we have node1, node2, node3 as the hostnames of the three hosts:
    node.name: node1
    Note: the node name has to be set individually in each machine's configuration.
  4. Now set the discovery hosts. Here we configure the initial list of nodes that will be contacted to discover and form the cluster. This is necessary in a unicast network.
    discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3"]
  5. The node names used here must resolve to the IP address of each machine, so edit the /etc/hosts file on all machines and map each hostname to its IP address (see the consolidated sketch below).
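
To tie the above together, here is a hedged sketch of how node1's configuration might look, using the three example addresses that appear in the cluster state output later in this post (192.168.0.167-169); adjust the interfaces and addresses to your environment:

# /etc/elasticsearch/elasticsearch.yml on node1 (sketch)
cluster.name: production
node.name: node1
network.host: [_tun0_, _local_]
discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3"]

# /etc/hosts on every node (sketch)
192.168.0.167   node1
192.168.0.168   node2
192.168.0.169   node3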

Save and exit the elasticsearch.yml

Now start elasticsearch:

sudo systemctl start elasticsearch

Also configure the system to start Elastic Search on boot up:

sudo systemctl enable elasticsearch

Now there is one thing that is critical for enabling the cluster: we need to open port 9300 in the firewall. (Please note: since we started with CentOS 7, we will configure this using firewalld.)

firewall-cmd --zone=public --add-port=9200/tcp --permanent
firewall-cmd --zone=public --add-port=9300/tcp --permanent
firewall-cmd --reload

Once the firewall allows access to ports 9200 and 9300, the servers should be able to communicate with each other and form a cluster. (Please note: above we have opened the ports to anyone who can reach the machine on its public interface. For security, it is recommended to restrict access to your internal network or to specific IP addresses.)

Now you can check the state of the cluster by issuing the following command:

curl -XGET 'http://localhost:9200/_cluster/state?pretty'

This should give you a clustered environment with three nodes connected to each other. The command above should produce output like the following:

{
  "cluster_name": "production",
  "compressed_size_in_bytes": 358,
  "version": 20,
  "state_uuid": "pTJFgszvT_qz8UJGezyJSQ",
  "master_node": "kXwy2-X7SDSqjtQZgLxgjg",
  "blocks": {},
  "nodes": {
    "lY_a_D2dT5uLM5NNCLsTTA": {
      "name": "node1",
      "ephemeral_id": "3XiogkxlTHKzpYY2m5iVug",
      "transport_address": "192.168.0.167:9300",
      "attributes": {}
    },
    "o8pHnUXaQHKLAL4f1oAiFA": {
      "name": "node2",
      "ephemeral_id": "Bjv0G8EXT0KvDyoS7WL4Qg",
      "transport_address": "192.168.0.168:9300",
      "attributes": {}
    },
    "kXwy2-X7SDSqjtQZgLxgjg": {
      "name": "node3",
      "ephemeral_id": "hExDdFoWRXyBbiexF3MzpQ",
      "transport_address": "192.168.0.169:9300",
      "attributes": {}
    }
  },
  "metadata": {
    "cluster_uuid": "A1MnDXSiRB6rF_eCh6llXQ",
    "templates": {},
    "indices": {},
    "index-graveyard": {
      "tombstones": []
    }
  },
  "routing_table": {
    "indices": {}
  },
  "routing_nodes": {
    "unassigned": [],
    "nodes": {
      "o8pHnUXaQHKLAL4f1oAiFA": [],
      "kXwy2-X7SDSqjtQZgLxgjg": [],
      "lY_a_D2dT5uLM5NNCLsTTA": []
    }
  },
  "snapshots": {
    "snapshots": []
  },
  "restore": {
    "snapshots": []
  },
  "snapshot_deletions": {
    "snapshot_deletions": []
  }
}
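
For a quicker summary than the full state, the standard cluster health endpoint reports the node count and overall status; with all three nodes joined, number_of_nodes should read 3:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'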

Another very important configuration I came across for ES, for the sake of performance and stability, is to enable memory locking. The steps are as follows.

  1. Edit the ES configuration (elasticsearch.yml) and uncomment or add the following line. (On Elasticsearch 5.x and later the setting is named bootstrap.memory_lock; bootstrap.mlockall was its pre-5.0 name.)
    bootstrap.memory_lock: true

    Save and exit the file.

  2. Next, open the following file for editing: /etc/sysconfig/elasticsearch and change a few settings in it.
    • Set the heap size for ES. You can set it to roughly 50% of the available memory. (Please note: the recommended maximum heap allocation is no more than 32 GB. Also note that on Elasticsearch 5.x/6.x the heap is normally set via -Xms/-Xmx in /etc/elasticsearch/jvm.options, so check whether your version still honours ES_HEAP_SIZE.)
      ES_HEAP_SIZE=4g
    • Uncomment the following line
      MAX_LOCKED_MEMORY=unlimited
    • Save and exit
  3. Now edit the ES Systemd unit file
    • sudo vi /usr/lib/systemd/system/elasticsearch.service
    • Uncomment or add the following:
      LimitMEMLOCK=infinity
    • Save and Exit
  4. Now reload the systemd daemon and restart Elasticsearch to put the changes into place
    sudo systemctl daemon-reload
    sudo systemctl restart elasticsearch
  5. Now verify the mlockall status. Issue the following command to check it:
    curl http://localhost:9200/_nodes/process?pretty

    Each node should have a line that says “mlockall” : true, which indicates that memory locking is enabled and working
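
For reference, the relevant fragment of each node's entry in that response should look roughly like this (other fields omitted):

"process": {
  ...
  "mlockall": true
}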

That’s it. You should be good to go – play with your Elasticsearch now.

Update multiple documents with multiple values

January 9, 2018 by BlakDronzer

PROBLEM

We have already seen how to update multiple documents in Elasticsearch, and how to update data in Elasticsearch. But in both of those articles we set up multiple scripts / ingest pipelines and then ran each of them individually for its update. That means if we wanted to update 4 fields across multiple documents, we would create that many scripts / ingest pipelines and run them that many times. What if we could achieve the same in a single go?

SOLUTION

We can create a single script that runs multiple statements and updates multiple values in one pass. The script below is basic; you can extend it to perform far more involved operations. Nonetheless, here is a simple example that will simplify a developer's life.

Here is the scenario I ran into: for a newly created client, I forgot to turn on the activation flag. Now, when articles are saved, they get into the system and are mapped to a topic, but without the client / brand information. How do we set that in one simple operation? Here is the solution.

Step 1 – Create the update script


PUT _scripts/update-clientbrand
{
  "script": {
    "lang": "painless",
    "code": "ctx._source.brand_id = params.brand_id; ctx._source.brand_name = params.brand_name; ctx._source.client_id = params.client_id; ctx._source.client_name = params.client_name;"
  }
}

If you notice, we have placed multiple statements in the script, each updating a document field with a supplied value. Now let's write the request that performs the actual update.


POST articles/_update_by_query
{
  "script": {
    "id": "update-clientbrand",
    "params": {
      "brand_id": "12",
      "brand_name": "Some Brand Name",
      "client_id": "100003",
      "client_name": "Some Client Name"
    }
  },
  "query": {
    "match" : {
      "topic_name": "Some Topic Name"
    }
  }
}

This will update all the documents matched by the query, setting the brand / client details on each.

Updating multiple documents using Elasticsearch-php

December 14, 2017 by BlakDronzer

SCENARIO

I found myself in a scenario where I had missed a field in my structure: the article's location. The value can be retrieved from the article_users table, where the location field still exists. But since Elasticsearch does not provide anything like joins, I decided to add the field to my articles index and set the location to that of the user associated with each article.

PROBLEM

The approach seemed simple enough: write an update_by_query, as discussed earlier. So the plan was: retrieve all users whose articles do not yet have a location value set; for each user, fetch the location and set it on all the articles belonging to that user. For that I built the following code using an update_by_query with an inline script.


    $client = \Elasticsearch\ClientBuilder::create()->setHosts(['127.0.0.1:9200'])->build();
    # Request
    $updateRequest = [
        'index'     => 'articles',
        'type'      => 'article',
        'conflicts' => 'proceed',
        'body' => [
            'query' => [ 
                'term' => [
                    'article_user_id' => $article_user_id
                ]
            ],
            'script' => [
                    'inline' => "ctx._source['article_user_location'] = '$location'"
            ]
        ]
    ];
    # Update 
    $results = $client->updateByQuery($updateRequest);

Looks simple enough, correct? It seemed so to me, but then came an error: after 15 records were updated, it started throwing an exception. Elasticsearch ships with a limit, script.max_compilations_per_minute, set to 15 by default. One option is to raise that value (a sketch follows for reference), but the question remains: to how much? And there is a good reason not to touch it: the ES team capped dynamic script compilations at 15 per minute because, as the documentation and other experts point out, script compilation is CPU-intensive. Keeping the default protects the CPU, while the setting is still left open for users to tune themselves. So how do we solve this issue properly?
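
Purely for reference, here is a hedged sketch of raising that limit in elasticsearch.yml; note that later 6.x releases renamed the setting to script.max_compilations_rate, so check which name your version uses:

# elasticsearch.yml (sketch) - raise the dynamic script compilation limit
script.max_compilations_per_minute: 60

# On later 6.x releases the equivalent setting is a rate:
# script.max_compilations_rate: 60/1m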

SOLUTION

Just as MySQL / Oracle have stored procedures, Elasticsearch has something similar, though not with the full power of RDBMS stored procedures: stored scripts. In the last post we saw how to use an ingest pipeline to update multiple records with static values. Now we want to update the data with a dynamic value. How do we do it? We create a stored script first.


PUT _scripts/update-userlocation
{
  "script": {
    "lang": "painless",
    "code": "ctx._source.article_user_location = params.value"
  }
}

Here, if you see, we created a script that updates the field with a given value. The value itself is left undefined; we pass it dynamically from our PHP code to the script at update time. How do we consume this from PHP?


POST articles/article/_update_by_query
{
  "query": { 
    "term": {
      "article_user_id": "2228954056"
    }
  },
  "script": {
    "id": "update-userlocation",
    "params": {
      "value": "Mumbai"
    }
  }
}

Above is the way we can achieve it with curl. Now let's achieve the same with PHP code.


    $client = \Elasticsearch\ClientBuilder::create()->setHosts(['127.0.0.1:9200'])->build();
    # Request
    $updateRequest = [
        'index'     => 'articles',
        'type'      => 'article',
        'conflicts' => 'proceed',
        'body' => [
            'query' => [ 
                'term' => [
                    'article_user_id' => $article_user_id
                ]
            ],
            'script' => [
                'id'     => "update-userlocation",
                'params' => [
                    'value' => $location
                ]
            ]
        ]
    ];
    # Update 
    $results = $client->updateByQuery($updateRequest);

That’s it. Now we can write such scripts without worrying about ES throwing exceptions or hogging the CPU with dynamic script compilations.

Search records in elasticsearch with empty fields OR no values

December 13, 2017 by BlakDronzer

PROBLEM

There are situations in Elasticsearch where, while indexing, we do not set a value for a field because we do not have it yet. Say we are indexing an article but at that moment there is no sentiment value to store. The value then simply never gets set in the indexed document; if we look at the document, the key will not be there even though it is present in the mapping.

SOLUTION

Now, if we want to find the documents that are missing such keys, we can trace them using a script query.



GET articles/_search
{
	"from": 0,
	"size": 200,
	"query": {
		"bool": {
			"must": {
				"script": {
					"script": {
						"inline": "doc[\"sentiment\"].empty"
					}
				}
			}
		}
	}
}
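
As a side note, the same result can usually be obtained without scripting by wrapping a standard exists query in a bool must_not clause; a minimal sketch:

GET articles/_search
{
  "from": 0,
  "size": 200,
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "sentiment"
        }
      }
    }
  }
}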

Or the same can also be simplified with the Elasticsearch-SQL plugin:


GET _sql?sql=Select * from articles where script('doc["sentiment"].empty')

This way we can get all the records whose values have not been set yet.

Eased out solution for people using elasticsearch

December 11, 2017 by BlakDronzer

PROBLEM

For developers coming from the SQL world to NoSQL systems like Elasticsearch, building queries is a constant source of friction. Simple queries are fine to deal with, but complex queries are a real challenge for many.

SOLUTION

Good news, and a sigh of relief, for developers coming from the SQL world: there is a group of developers who continuously work on a simpler solution providing a SQL query environment over Elasticsearch. It is a plugin which can be downloaded from the following link.

DRAWBACK

Though this comes in really handy, there are limitations to its querying capabilities, and it has to be installed as a plugin inside Elasticsearch. Hence it is most useful for developers who run their own server on which Elasticsearch is deployed. PLEASE TAKE NOTE: do not build your application / site around this plugin if you are planning to consume Elasticsearch as a hosted service.

Updating data in Elasticsearch

November 28, 2017 by BlakDronzer

There are many ways in Elasticsearch to update your data.

SCENARIO 1:

Let's say you want to update a value (for example, increment a certain field's value) or set specific fields to specific values (for example, update the value of a flag). For this there is something called partial updates. An example:


//This one indicates the option for user to set direct values in the partial update
POST /website/blog/1/_update
{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}

OR


//This one indicates the option for user to update the value using script .. like incrementing the same and stuff
POST /website/blog/1/_update
{
   "script" : "ctx._source.views+=1"
}

This is a perfect solution if you are looking to update a single document.

SCENARIO 2:

Let's say you want to update multiple documents together that match a certain criterion; for that there is a feature called Update By Query. The following example shows how you can take advantage of it.


POST blog_posts/_update_by_query
{
  "script": {
    "source": "ctx._source.likes++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user": "blakdronzer"
    }
  }
}

There is another good option for partial updates: ingest nodes. Ingest nodes open up great possibilities; I will cover them in detail in another post, but let me give a scenario here. Let's say there are certain default (static) values you want to set together on a certain set of documents. Creating an ingest pipeline can be handy in a situation like this.


PUT _ingest/pipeline/set-default-values
{
  "description" : "sets dynamic values",
  "processors" : [ {
      "set" : {
        "field": "topic_id",
        "value": 1
      }
  },
  {
      "set" : {
        "field": "topic_name",
        "value": "Defualt Topic"
      }
  },
  {
      "set" : {
        "field": "client_id",
        "value": 1
      }
  },
  {
      "set" : {
        "field": "client_name",
        "value": "Default Client"
      }
  },
 {
      "set" : {
        "field": "defaults_set",
        "value": 1
      }
  }
  ]
}

Now I can use this ingest pipeline to set default values on multiple documents.


POST articles/_update_by_query?pipeline=set-default-values
{
  "query": {
    "term": {
      "defaults_set": 0
    }
  }
}
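
Before running a pipeline against real data, it can be worth sanity-checking it with the standard _simulate endpoint; a minimal sketch using a made-up document (the title field here is purely illustrative):

POST _ingest/pipeline/set-default-values/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Some article",
        "defaults_set": 0
      }
    }
  ]
}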

This becomes very handy. There are a lot of other possibilities with ingest nodes that can be taken advantage of; this is just one simple example.

Percolator, a powerful tool from elasticsearch

November 22, 2017 by BlakDronzer

Yes, I will agree with the statement that the percolator is a wonderful, powerful tool provided by Elasticsearch. Here is a simple scenario that will let you understand and absorb its power.

PROBLEM

There is a need to automatically map content pushed in by a user to a set of categories. For each category, I have a set of keyword rules defined. Traditionally one would do this the old-fashioned way: once the content is pushed by the user, pick up all the keywords, find the ones that exist in the content, and allocate the content to the categories matched through those keywords. Fair enough as logic goes, but real-life scenarios are not so easy; there is plenty of complexity on top. First of all, imagine there are 10,000+ keyword records to scan through: how fast will the processing be for each piece of content the user pushes?

Many will point to how cheaply processing power and memory are available nowadays, and that is great, but it still takes considerable time to process each piece of content. Now let's add another layer of complexity: say it is not just keywords but also a set of conditions (AND / OR / NOT) to be evaluated. For example:

Tata Power" AND ("Vijayant Ranjan" OR "Mahesh Paranjpe" OR "Hydro")"

Another Real Life Example


"National Association of Software" OR "Nasscom" OR "Ministry of Electronics and IT" OR "Indian Information Technology" OR "H1B1 Visa" OR "E governance" OR "E-governance" OR "Online visa services" OR "Passport services" OR "Sewa Kendras" OR "Commercial and employment perspective" OR "Ethical perspective" OR "Indian Software Firms" OR ("IT services" AND "IT services IT services"~10000) OR ("IT Industry" AND "IT Industry IT Industry"~10000) OR ("IT Company" AND "IT Company IT Company"~10000) OR ("IT companies" AND "IT companies IT companies"~10000) OR ("Indian IT" AND "Indian IT Indian IT"~10000) OR ("IT Sector" AND "IT Sector IT Sector"~10000) OR ("IT Sectors" AND "IT Sectors IT Sectors"~10000) OR ("IT implications" AND "IT implications IT implications"~10000) OR ("IT enterprises" AND "IT enterprises IT enterprises"~10000) OR ("IT firms" AND "IT firms IT firms"~10000) OR ("IT firm" AND "IT firm IT firm"~10000) OR ("IT organizations" AND "IT organizations IT organizations"~10000)"

And one more

"Lenovo" NOT ("Smart phones" OR "Smartphones")"

Now, if I have to handle this with regular parsing logic, just writing correct logic in the first place is going to be a nasty, time-consuming effort, and it will definitely hurt performance.

The question is: is there a simpler solution to the above?

SOLUTION

Well, good news for the folks who have been looking for a simpler solution: yes, Elasticsearch provides one. What we are performing here is essentially a reverse search: we try to map a document to categories based on the keywords found in it, whereas in a regular search we have a set of documents and search them with a set of keywords / criteria to get results.

This reverse-search mechanism is known as the percolator, a very powerful tool indeed for those who understand and use it. Now, how to use it? Let me share a set of examples for the scenario above.

Step 1. We need to create an index with a percolator mapping.


PUT /keywords
{
    "mappings": {
      "queries": {
        "properties": {
          "query": {
            "type": "percolator"
          }
        }
      },
      "keyword": {
        "properties": {
          "content": {
            "type": "text"
          }
        }
      }
    }
}

That is the definition of my percolator index, where I will add all the keyword queries I want to filter documents against. If you notice, this mapping is a bit different from a regular mapping: the additional part is the mapping of the queries type, which is what does the trick for us. The keyword type has a property, content, which mirrors the document structure that the user is expected to push in when searching across the query set.

Step 2. Now let's see how we add entries to the percolator.
Method 1.


PUT keywords/queries/4?refresh
{
    "query": {
      "query_string" : {
        "query": """National Association of Software" OR "Nasscom" OR "Ministry of Electronics and IT" OR "Indian Information Technology" OR "H1B1 Visa" OR "E governance" OR "E-governance" OR "Online visa services" OR "Passport services" OR "Sewa Kendras" OR "Commercial and employment perspective" OR "Ethical perspective" OR "Indian Software Firms" OR ("IT services" AND "IT services IT services"~10000) OR ("IT Industry" AND "IT Industry IT Industry"~10000) OR ("IT Company" AND "IT Company IT Company"~10000) OR ("IT companies" AND "IT companies IT companies"~10000) OR ("Indian IT" AND "Indian IT Indian IT"~10000) OR ("IT Sector" AND "IT Sector IT Sector"~10000) OR ("IT Sectors" AND "IT Sectors IT Sectors"~10000) OR ("IT implications" AND "IT implications IT implications"~10000) OR ("IT enterprises" AND "IT enterprises IT enterprises"~10000) OR ("IT firms" AND "IT firms IT firms"~10000) OR ("IT firm" AND "IT firm IT firm"~10000) OR ("IT organizations" AND "IT organizations IT organizations"~10000)""",
        "fields": ["content"]
      }
    }
}

Method 2.


PUT keywords/queries/4?refresh
{
    "query": {
      "term" : {
        "content": "National Association of Software",
      }
    }
}

In the above, notice the pattern we follow.

  1. Instead of the type keyword, here we are indexing into the queries type.
  2. Along with it, we also specify 4 as an id. This is very important; I will come back to why I recommend setting it explicitly.
  3. Next, you can see the method by which the incoming content will be queried. There can be many more, but I have come across these 2 methods of querying incoming content: the first suits more complex scenarios, the other is a simpler, straightforward term search.
  4. One more important thing: the method used here is PUT, not POST. Be careful with this. I have not experimented with the POST approach; you can surely go ahead and do so.

Now, if we want to find the matching keyword queries for a given piece of content, how do we go about it?


POST keywords/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document_type": "keyword",
      "document": {
        "content": "In Taj Lake Palace there is going to be the meeting for Nasscom India"
      }
    } 
  }
}

In the pattern above, notice that we use a POST to the _search endpoint, and what we query with is percolate. There we mention the document_type, the same as we defined in the mapping, and in document we pass the same structure, with content placed in it.

Elasticsearch will then evaluate the document against all the keyword queries stored in the percolator and return the list of those that match.

Now, how is this helpful to us? If you remember, I asked you to specify the document id explicitly; that is where it becomes handy. Like me, many will still keep their master data in an RDBMS such as MySQL, with keywords / categories mapped through a primary key. That is the value we set as the id here, so when we get the search result back with the matching keyword queries, we also get the list of IDs, which are none other than the category ids we want to map the content to. There, now you have it all.

NOTE

One very important thing to keep in mind is the size of the result. By default, only 10 hits are returned by the search query. If you are expecting more than 10, you need to specify the size parameter along with the query, as shown below.
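
A minimal sketch of the same percolate search with an explicit size (the value 100 is arbitrary):

POST keywords/_search
{
  "size": 100,
  "query": {
    "percolate": {
      "field": "query",
      "document_type": "keyword",
      "document": {
        "content": "In Taj Lake Palace there is going to be the meeting for Nasscom India"
      }
    }
  }
}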

USE CASES

Now, where can such a mechanism be useful? Let me share a few instances.

  1. Let's say users upload articles related to legal cases. We want to automatically identify which category of law each article fits into. Rather than the user specifying it, we can automatically scan the content and identify the category, or which court the case was fought in, or which sections of law were mentioned.
  2. Let's say there is an e-commerce system built as a SaaS application, and the system wants to automatically assign tags / search criteria / keywords to each uploaded product. The reason: if everything is tagged properly, retrieving the data becomes lightning fast, instead of having to scan through all the documents.


Why should we map an index in elasticsearch

November 13, 2017 by BlakDronzer

Elasticsearch, like other document-oriented non-RDBMS databases, supports saving documents even without creating a mapping for the index. So why is it still recommended to map your index / types?

Let's look at this scenario: when we save a document in Elasticsearch and there is no mapping set for the type in the index, Elasticsearch automatically maps string values as a text field with a keyword sub-field limited to 256 characters (values longer than 256 characters are ignored for that sub-field).

That may be fine to start with, but imagine you want to run a query like a date comparison: you can run into trouble, because Elasticsearch saved the date value as text.

That is one situation. Another: if you want to run aggregations on a certain field, that field needs to be mapped as keyword (there is another way to deal with this too; refer to the documentation). If the field was dynamically mapped as plain text, Elasticsearch will not allow you to run aggregations on it.
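
To illustrate, here is a minimal, hypothetical mapping (the articles index and its field names are made up for this example) that maps the date as a date and the field intended for aggregations as a keyword up front:

PUT articles
{
  "mappings": {
    "article": {
      "properties": {
        "published_on": { "type": "date" },
        "author":       { "type": "keyword" },
        "body":         { "type": "text" }
      }
    }
  }
}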

Change the structure of the mapping

November 13, 2017 by BlakDronzer

As you may be aware, Elasticsearch is by nature a persistent database system. During development or in production there will always come a need to change the mapping / structure. Adding a field to a mapping is not a problem, but changing a field certainly can be, as Elasticsearch does not allow us to change a field's type mapping on the fly. So how do we achieve it? Let's go with the scenario we were dealing with earlier (the topic / topic_name problem).

Original Structure


{
    "mappings": {
        "words": {
            "properties": {
                "platform": {
                    "type": "keyword"
                },
                "topic": {
                    "type": "keyword"
                },
                "topic_id": {
                    "type": "integer"
                },
                "topic_name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}

If you look at the structure above, the field topic_name is a new field that got created; we simply want it mapped as a field of type keyword. We also want to remove the field topic from the mapping.

How do we achieve this? There is a trick to doing it with minimal or no downtime. We need to follow these steps:

  • Create a new mapping with the required structure, but under another index name (topicsv2)

{
    "mappings": {
        "words": {
            "properties": {
                "platform": {
                    "type": "keyword"
                },
                "topic_name": {
                    "type": "keyword"
                },
                "topic_id": {
                    "type": "integer"
                }
            }
        }
    }
}
  • Run a _reindex from the old index to the new one

POST /_reindex
{
    "source": {
        "index": "topics",
        "type": "words",
        "_source": {
            "excludes": [
                "topic"
             ]
        }
    },
    "dest": {
        "index": "topicsv2",
        "type": "words"
    }
}

In the request above you will notice one thing: we have asked Elasticsearch to exclude the specific field that we do not want copied into the new index (topic). Now that we have created the new index and moved all the data from the previous index into it, what we still need is to keep serving requests under the previous name (topics). How do we achieve that? Elasticsearch has a powerful feature for exactly this: aliases. How do we use one here?

Go ahead and delete the current index (topics), then create an alias with the old name pointing to the new one.
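
Deleting the old index is a single call (make sure the reindex above has completed before doing this):

DELETE /topics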


POST /_aliases
{
	"actions" : [
		{ 
			"add" : { 
				"index" : "topicsv2", "alias" : "topics" 
			} 
		}
	]
}

With this in place, you will still be able to access the data using topics as the index name, as well as via topicsv2.

From the experience of arriving at this solution, what I would like to share with fellow developers is this: name your indices with a version suffix (v1 / v2) from the start, and serve them through unversioned alias names. This is invaluable during development, when the structure changes as often as requirements demand, but the same scenario can arise in production too, so it is good to have it there as well.

This will definitely help you restructure a mapping with ease.

Changing the field name in elasticsearch

November 13, 2017 by BlakDronzer

Elasticsearch is schemaless by default: you can add any field / structure, as in any document-based DB (like Mongo), but there is also the option to map an index / type in Elasticsearch explicitly.

Renaming a field is pretty easy in an RDBMS, which handles all the behind-the-scenes work for you. It is not as straightforward in Elasticsearch: the mechanisms are available, but Elasticsearch expects more of the work to be done by the end user.

A scenario: say you have an index with a field named topic, and you want to change the name to topic_name. There is a simple operation for adding the new field:


POST topics/_update_by_query
{
    "query": {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "topic" }
            }
        }
    },
    "script" : {
        "inline": "ctx._source.topic_name = ctx._source.topic;"
    }
}

What the above code does is add the field topic_name and set its value to the value of the topic field.

But then we are stuck with one issue: documents now have both the topic field and the topic_name field. New code will be written to set topic_name and not topic, so the earlier documents are left with an extra field (topic) that already existed. How do we deal with it? Elasticsearch does not provide a dedicated operation for deleting an existing column (field) across all documents, but this too can be done using scripting.


POST topics/_update_by_query
{
    "query": {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "topic" }
            }
        }
    },
    "script" : {
        "inline": "ctx._source.remove(\"topic\");"
    }
}

This way the topic field is removed and you are left with just topic_name. With this pattern you can update the index / mapping to have a field renamed as required.

But there is a catch, which I ran into myself. When we did the above, the structure technically changed. What I wanted was to remove topic from the mapping itself and add topic_name as a keyword field that I could use for searches. Instead, things started failing. Why? When we performed the update, topic was kept in the mapping (as it was set when the mapping was created), and on top of that topic_name was created as a text field, which is Elasticsearch's default behaviour for an unmapped string field. Because of this, the queries that had been fetching data via topic started failing against topic_name, since it was of type text and not keyword.

How do we resolve this? I will continue that in my other post, which deals with restructuring the mapping.
