In Elasticsearch, all data has an index and a type. For those coming from the world of relational databases, the easiest way to understand these concepts is the following table:
MySQL | Databases | Tables | Columns/Rows |
---|---|---|---|
Elasticsearch | Indexes | Types | Documents with properties (JSON objects) |
In this post I explain how to do CRUD operations in Elasticsearch; maybe you should take a look at it, it won't take you more than 5 minutes. :)
Before creating an index and a mapping, I will quickly go over those concepts.
Index
To put data into an Elasticsearch server, you need to create a structure that describes what your data looks like. This is what is called an index.
An index (like a database) is a logical structure that holds data (JSON objects).
An Elasticsearch server can store many indexes.
For the example in this post I will create an index with the following settings:
"settings": {
"index": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"engram"
]
}
},
"filter": {
"engram": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 10
}
}
}
}
}
- analysis: the process of converting text into the tokens or terms that are added to the inverted index.
- analyzer: the component used to analyze data or queries; it combines a tokenizer with zero or more token filters.
- tokenizer: breaks text into individual terms; the whitespace tokenizer used here splits on whitespace.
- filter: a token filter transforms the tokens produced by the tokenizer, for example lowercasing them.
- edgeNGram: a token filter that emits prefixes of each token, from min_gram to max_gram characters, which is what makes autocomplete possible.
Mapping
A mapping is used to define the structure of the documents in an index, and an index can have multiple types.
For the example in this post I will create a mapping called postal_code, like this:
"mappings": {
"postal_code": {
"properties": {
"cp": {
"type": "text",
"store" : "yes"
},
"colonia": {
"type": "text",
"store": "yes",
"fielddata": true
},
"ciudad": {
"type": "text",
"store": "yes"
},
"delegacion": {
"type": "text",
"store": "yes"
},
"location": {
"type": "geo_point"
}
}
}
}
As you can see, the content is a JSON object.
The fields for this example are "cp", "colonia", "ciudad", "delegacion" and "location", all nested inside the "properties" object, and each field has a field type, like a column in a relational database.
The most common types are:
- Binary
- Boolean
- Date
- Number
- String (split into text and keyword since Elasticsearch 5.x)
By default, field values are indexed to make them searchable, but they are not stored.
If you enable the store property (set it to true), the value of that field can be returned in the search results.
Now that I have defined all the minimal concepts needed for this example, I will proceed with the code.
For this example I need the following:
- Go by Example: Environment Variables
- Package exec
- The docker repository of this post
- How to create multiline strings in Golang
Step 1
Add the following environment variables to the file ~/.bashrc (this works for macOS and Linux):
```shell
export ELASTICSEARCH_HOSTS=`docker inspect --format '{{ .NetworkSettings.IPAddress }}' elasticsearch`
export ELASTICSEARCH_PORT=9200
export ELASTICSEARCH_INDEX=mx
export ELASTICSEARCH_TYPE=postal_code
export ELASTICSEARCH_ENTRYPOINT=http://$ELASTICSEARCH_HOSTS:$ELASTICSEARCH_PORT
export ELASTICSEARCH_USERNAME=elastic
export ELASTICSEARCH_PASSWORD=changeme
```
This will allow me to use the environment variables within my go program.
Then run the following command:
```shell
source ~/.bashrc
```
Step 2
Package exec allows me to execute a command like this:
```shell
cal
```
and I get this output:
```
     Agosto 2017
do lu ma mi ju vi sá
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
```
With the following code:
```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	command := "cal"
	out1, err := exec.Command("sh", "-c", command).Output()
	printError(err)
	fmt.Printf("cal: \n\n%v\n\n", string(out1))
}

func printError(err error) {
	if err != nil {
		fmt.Printf("\nError: %v \n ", err.Error())
		os.Exit(1)
	}
}
```
Finally the complete code of the example is as follows:
```go
/*
	twitter@hector_gool
*/
package main

import (
	"fmt"
	"os"
	"os/exec"
)

const (
	ELASTICSEARCH_INDEX = "mx_test"
)

func main() {
	fmt.Println("ELASTICSEARCH_ENTRYPOINT:", os.Getenv("ELASTICSEARCH_ENTRYPOINT"))
	fmt.Println("ELASTICSEARCH_USERNAME:", os.Getenv("ELASTICSEARCH_USERNAME"))
	fmt.Println("ELASTICSEARCH_PASSWORD:", os.Getenv("ELASTICSEARCH_PASSWORD"))
	fmt.Println("ELASTICSEARCH_INDEX:", os.Getenv("ELASTICSEARCH_INDEX"))
	fmt.Println("ELASTICSEARCH_TYPE:", os.Getenv("ELASTICSEARCH_TYPE"))
	fmt.Println()

	delete_index := `
curl -u $ELASTICSEARCH_USERNAME:$ELASTICSEARCH_PASSWORD -X DELETE $ELASTICSEARCH_HOSTS:$ELASTICSEARCH_PORT/` + ELASTICSEARCH_INDEX + `
`

	create_index := `
curl -u $ELASTICSEARCH_USERNAME:$ELASTICSEARCH_PASSWORD -X PUT "http://$ELASTICSEARCH_HOSTS:$ELASTICSEARCH_PORT/` + ELASTICSEARCH_INDEX + `" -d '
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "autocomplete": {
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase",
                            "engram"
                        ]
                    }
                },
                "filter": {
                    "engram": {
                        "type": "edgeNGram",
                        "min_gram": 1,
                        "max_gram": 10
                    }
                }
            }
        }
    },
    "mappings": {
        "user": {
            "_all": { "enabled": false },
            "properties": {
                "firstname": { "type": "text" },
                "lastname": { "type": "text" },
                "nickname": { "type": "text" }
            }
        },
        "postal_code": {
            "properties": {
                "cp": {
                    "type": "integer",
                    "store": true
                },
                "colonia": {
                    "type": "text",
                    "store": true,
                    "fielddata": true
                },
                "ciudad": {
                    "type": "text",
                    "store": true
                },
                "delegacion": {
                    "type": "text",
                    "store": true
                },
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }
}
'
`

	show_mapping := `
curl -u elastic:changeme -X GET $ELASTICSEARCH_HOSTS:$ELASTICSEARCH_PORT/` + ELASTICSEARCH_INDEX + `/_mapping?pretty
`

	out1, err := exec.Command("sh", "-c", delete_index).Output()
	printError(err)
	fmt.Printf("Delete index: %v\n\n", string(out1))

	out2, err := exec.Command("sh", "-c", create_index).Output()
	printError(err)
	fmt.Printf("Create index: %v\n\n", string(out2))

	out3, err := exec.Command("sh", "-c", show_mapping).Output()
	printError(err)
	fmt.Printf("Show mapping: \n%v\n\n", string(out3))
}

func printError(err error) {
	if err != nil {
		fmt.Printf("\nError: %v \n ", err.Error())
		os.Exit(1)
	}
}
```
This code first deletes the index and creates it again from scratch. The reason for this is that when I am doing tests it is very likely that I will make a mistake and have to repeat the process several times, until I get the definitive version I need for the draft.
Conclusion:
I know that this can be done more easily with a shell script, but I wanted to try the options the Go language offers. :)
Or what do you think?
References:
https://www.elastic.co/blog/what-is-an-elasticsearch-index
https://www.elastic.co/guide/en/elasticsearch/reference/5.4/analysis-edgengram-tokenizer.html