SCENARIO
I just encountered myself with a scenario, i missed a field in my structure – articles location. The same value can be retrieved from the article_users table where the location field still persists. But the issue here was that since elasticsearch dose not provide stuff such as joins, i decided to add the field in the articles table of mine and then set the location to be the same as of the user who was associated with the article.
PROBLEM
The approach for me seemed to be simple enough where i could write update_by_query as discussed earlier. So the simpler solution for the same seemed – Retrieve all the users who don’t have yet any location value set. For each user, retrieve the location value and set the same for all the articles which belong to that user. For that planned to build the the code that used update_by_query script.
$client = \Elasticsearch\ClientBuilder::create()->setHosts(['127.0.0.1:9200'])->build();
# Request
$updateRequest = [
'index' => 'artciles',
'type' => 'article',
'conflicts' => 'proceed',
'body' => [
'query' => [
'term' => [
'article_user_id' => $article_user_id
]
],
'script' => [
'inline' => "ctx._source['article_user_location'] = '$location'"
]
]
];
# Update
$results = $client->updateByQuery($updateRequest);
Looks Simple enough …correct? Well it seemed the same to me. But then there came an error. After 15 records were updated, it started giving in error / exception. Now elasticsearch came in with a limitation of max_compilation_per_minute set to 15 by default. Now one option for the same is – update the value of the same. But the question remains – how many? And the other reason not to update the same is – there is surely an reason for the same for ES team to have blocked it to 15 dynamic script compilations per minute. The reason is simple, from the other experts opinions / documentations that it consumes CPU. So in order to keep the CPU un-busy, it recommended to restrict it at the first instance. Still, it have kept it open for user to set up the numbers by himself. So how to solve this issue now?
SOLUTION
Like MySQL / Oracle have stored procedures, not exactly with full power like RDBMS Stored Procedures, but it too have something like stored scripts. In last session, we saw how we could use ingest to update multiple records with static values. Now we want to achieve updating the data with dynamic value. How we do it – we create a stored script first.
PUT _scripts/update-userlocation
{
"script": {
"lang": "painless",
"code": "ctx._source.article_user_location = params.value"
}
}
Here – if u see, we created a script – where we want to update the field with a given value. Now the value yet is undefined. What we do in here – we pass the value dynamically from our PHP to the script for updating. How to consume the same in PHP?
POST articles/article/_update_by_query
{
"query": {
"term": {
"article_user_id": "2228954056"
}
},
"script": {
"id": "update-userlocation",
"params": {
"value": "Mumbai"
}
}
}
Above is the way we can achieve it with CURL. Now lets achieve the same with PHP code.
$client = \Elasticsearch\ClientBuilder::create()->setHosts(['127.0.0.1:9200'])->build();
# Request
$updateRequest = [
'index' => 'artciles',
'type' => 'article',
'conflicts' => 'proceed',
'body' => [
'query' => [
'term' => [
'article_user_id' => $article_user_id
]
],
'script' => [
'id' => "update-userlocation",
'params'=>[
'value'=>$location ]
]
]
];
# Update
$results = $client->updateByQuery($updateRequest);
That’s it. Now we can enjoy writing in such script without worrying bout ES throwing back exception or even hogging the CPU with dynamic script compilations.