Elasticsearch is a distributed database with an HTTP API. Here are some things I’ve learned. I’ve installed Elasticsearch version 7.x on macOS via Homebrew.
Concepts
Mapping concepts from an RDBMS can be helpful.
- Index - this is like an RDBMS table
- Document - this is like an RDBMS row
- Mapping - this is like an RDBMS DDL structure, although it can be applied up front or later on. A mapping can be explicit (defined by you) or implicit (inferred by dynamic mapping when documents are indexed).
- Immutable documents - documents are immutable. When a document is updated, it is not modified in place; the old version is marked for deletion and replaced by a new version with the changes. PostgreSQL works in a similar way for row updates and deletes.
More Concepts
These concepts are specific to the architecture of Elasticsearch and scalability.
- Shard - a self-contained Lucene index. An Elasticsearch index is split into one or more shards.
- Primary shard - handles indexing requests. Each document belongs to exactly one primary shard. The number of primary shards is fixed at index creation.
- Replica shard - a copy of a primary shard. Replica shards can be added to scale search requests.
- Node - a server. Nodes serve primary or replica shards.
- Cluster - a collection of nodes
- Deployment - Elastic.co terminology that is roughly synonymous with cluster. A deployment contains an Elasticsearch cluster, as well as nodes for other services like Kibana.
- Segment merging - how Elasticsearch cleans up deleted documents: documents marked as deleted are dropped when Lucene segments are merged
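The reason the number of primary shards is fixed comes down to routing: a document's shard is derived from a hash of its routing value (the document id by default) and the shard count. Here is a simplified sketch of that idea. The function name `pick_shard` is hypothetical, and `zlib.crc32` is only a stand-in for the Murmur3 hash Elasticsearch uses internally, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Simplified routing: hash the routing key, then take it modulo the shard count."""
    # Assumption: crc32 stands in for the Murmur3 hash used by Elasticsearch.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Place 1000 document ids on 5 shards, then pretend the shard count changed to 6.
doc_ids = [str(i) for i in range(1, 1001)]
placement_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
placement_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}

# Count the documents that would no longer be found on their original shard.
moved = sum(1 for doc_id in doc_ids if placement_5[doc_id] != placement_6[doc_id])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

Because changing the shard count changes where most ids hash to, existing documents could no longer be located, which is why growing an index usually means reindexing into a new one.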
API Concepts
Elasticsearch has an HTTP API. That means HTTP verbs like POST, PUT, GET and DELETE are mapped to concepts like creating, updating, searching and deleting things.
Elasticsearch also supports a bulk API that can be used to create and delete multiple documents.
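The bulk API expects newline-delimited JSON: an action line, followed by a source line for actions that carry a document, with a trailing newline at the end of the body. A minimal sketch of building such a body is below. The helper name `build_bulk_body` and the sample documents are made up for illustration.

```python
import json


def build_bulk_body(index_name, docs_to_index, ids_to_delete):
    """Build an NDJSON body for the _bulk endpoint."""
    lines = []
    for doc_id, source in docs_to_index.items():
        # An "index" action is followed by the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    for doc_id in ids_to_delete:
        # A "delete" action has no source line.
        lines.append(json.dumps({"delete": {"_index": index_name, "_id": doc_id}}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "foo",
    {"2": {"title": "Second title"}, "3": {"title": "Third title"}},
    ["1"],
)
print(body, end="")
```

The resulting body would be sent with POST to /_bulk using the Content-Type application/x-ndjson.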
App Development Concerns
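If the application provides explicit mappings for an index, do not create the index manually. Let the application create it so that the fields get the correct mapping types. An index created by hand, or implicitly by indexing a document, gets dynamically inferred mappings instead.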
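As a rough illustration of what "create them via the application" can look like, this sketch builds a create-index request with explicit mappings using only the Python standard library. The function name, the `created_at` field, and the field types are assumptions for the example, not something taken from a real application. The request is built but not sent, so it runs without a cluster.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, properties):
    """Build a PUT request that creates an index with explicit field mappings."""
    body = {"mappings": {"properties": properties}}
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(
    "http://localhost:9200",
    "foo",
    # Hypothetical fields: a full-text title and a date.
    {"title": {"type": "text"}, "created_at": {"type": "date"}},
)

# urllib.request.urlopen(request) would send it to a running cluster.
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```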
Create Index
curl -XPUT 'http://localhost:9200/foo'
Add Documents To Index
Create a document with id 1 in the index foo with a title of “My title”.
curl -H 'Content-Type: application/json' -X POST 'localhost:9200/foo/_doc/1?pretty' -d '
{
"title": "My title"
}'
Search an Index
There are various ways of querying. This example uses the Query String format to search for the document we just put into the index.
Adding the pretty parameter onto the end of the URL formats the JSON output on multiple lines with indentation.
curl -X GET 'localhost:9200/foo/_search?q=title:title&pretty'
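The matching documents come back under hits.hits in the response, each with its _id and _source. A small sketch of pulling them out is below. The `sample_response` is hand-written and trimmed to the shape a 7.x search response has. It is not captured output, and `extract_sources` is a hypothetical helper.

```python
def extract_sources(search_response):
    """Return (id, source) pairs from a search response dict."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [(hit["_id"], hit["_source"]) for hit in hits]


# Hand-written, trimmed sample for illustration only.
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_index": "foo", "_id": "1", "_source": {"title": "My title"}},
        ],
    }
}

for doc_id, source in extract_sources(sample_response):
    print(doc_id, source["title"])
```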
Count documents
GET /index/_count
More Queries
GET /_cat/indices
GET /_cat/indices/*pattern*
# list top 10 documents
GET /index/_search
# some JSON search payload
GET /index/_search -d '{"foo":"bar"}'
GET /_cat/shards
# shards for particular index
GET /_cat/shards/index-name
Example search request with a payload via curl:
curl -H "Content-Type: application/json" "http://localhost:9200/some_index_name/_search?pretty" -d '
{
  "from": 0,
  "size": 50,
  "sort": [
  ],
  "query": {
    "bool": {
      "filter": [
      ]
    }
  }
}
'
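The payload above leaves sort and filter empty. Here is a sketch of how an application might fill them in. The field names `status` and `created_at`, the filter values, and the helper `build_search_body` are all hypothetical.

```python
import json


def build_search_body(filters, sort_fields, page=0, page_size=50):
    """Build a paginated search body with filter clauses in a bool query."""
    return {
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # number of hits per page
        "sort": sort_fields,
        "query": {"bool": {"filter": filters}},
    }


body = build_search_body(
    filters=[
        {"term": {"status": "published"}},
        {"range": {"created_at": {"gte": "now-7d/d"}}},
    ],
    sort_fields=[{"created_at": {"order": "desc"}}],
    page=1,
)
print(json.dumps(body, indent=2))
```

Clauses under filter narrow the result set without contributing to the relevance score, which suits exact-match and range conditions like these.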
Check index mapping types:
GET /index_name/_mapping
Tuning
Parameter | Default | Notes |
---|---|---|
index.refresh_interval | 1s | Increase to favor indexing speed over how quickly new documents become searchable |
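As a sketch of how this setting might be tuned around a large load, the snippet below builds the settings bodies that would be sent with PUT to /index/_settings. The 30s value is an arbitrary example, not a recommendation, and `refresh_interval_settings` is a hypothetical helper.

```python
import json


def refresh_interval_settings(interval):
    """Build the body for an index settings update that changes refresh_interval."""
    return {"index": {"refresh_interval": interval}}


# Refresh less often while loading a lot of data, then go back to 1s afterwards.
before_bulk_load = refresh_interval_settings("30s")
after_bulk_load = refresh_interval_settings("1s")

for settings in (before_bulk_load, after_bulk_load):
    print("PUT /index/_settings", json.dumps(settings))
```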
Logs
With Elasticsearch 7 installed via Homebrew on macOS, tail the log file:
tail -f /usr/local/var/log/elasticsearch/elasticsearch_brew.log
Running via Docker (Recommended method):
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.17.0
Some activity, such as index creation, will be logged.
Stats
Check the index stats, e.g. for deleted documents:
GET /index/_stats
In one performance-sensitive index of ours, over 100 GB in size, deleted documents made up as much as 32% of the total number of documents. That index also serves wildcard queries, which are already more costly.
Given how Lucene handles deleted documents, though, this percentage is within the normal range.
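A percentage like this can be computed from the docs section of the stats response. The sketch below assumes "total" means live plus deleted documents on the primary shards, which is my reading rather than an official definition. The sample numbers are made up to reproduce a 32% ratio, and `deleted_ratio` is a hypothetical helper.

```python
def deleted_ratio(stats_response):
    """Share of deleted documents among live + deleted docs on primary shards."""
    docs = stats_response["_all"]["primaries"]["docs"]
    live, deleted = docs["count"], docs["deleted"]
    total = live + deleted
    return deleted / total if total else 0.0


# Made-up numbers in the shape of a trimmed GET /index/_stats response.
sample_stats = {"_all": {"primaries": {"docs": {"count": 680, "deleted": 320}}}}
print(f"{deleted_ratio(sample_stats):.0%} deleted")
```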
Use Cases
As a primary database
Elasticsearch can be used as a primary database in a way similar to an RDBMS like PostgreSQL.
The operational concerns here are more about indexing rate and search speed than about the relevance of search results.
As a search engine
Elasticsearch has powerful capabilities for searching.
Tools
- Kibana - visualization tool, search logs, API console
- Rally - benchmarking tool
Some tooling in Ruby