Book Review: Elasticsearch in Action by Radu Gheorghe, Matthew Lee Hinman, Roy Russo (Manning)

Elasticsearch has been around for a while now, and it is used by a large number of websites. The book reviewed here aims to cover both the fundamentals of using Elasticsearch and its advanced usage, scaling and performance optimisation in production.

To do this, the book is divided into two main parts named “Core functionality” and “Advanced Functionality”, followed by six appendices at the end.

So let’s take things in order, which is also the recommended way of getting through the book, and start with the first part. Its eight chapters cover all of the core functionality of Elasticsearch. Here the reader can find out what kinds of search problems can be solved, see some typical use cases, and learn the meaning of documents, types, indices, shards and clusters. A thorough explanation of indexing, updating and deleting data follows. Another chapter is dedicated to searching the data with match and filter queries, and to choosing the best query for the job at hand. Next comes analysing the data using analyzers, tokenizers and token filters, with special sections on ngrams, edge ngrams, shingles and stemming. The following chapter describes how scoring works, the different scoring methods, and boosting. The first part finishes off with two chapters about aggregations (metrics, multi-bucket and nesting aggregations) and the different ways to describe relationships between documents (nested, parent-child, etc.).
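To give a feel for two of the analysis terms mentioned above, here is a minimal sketch in plain Python of what edge ngrams and shingles produce. It does not use Elasticsearch or reproduce the book's code; the function names `edge_ngrams`, `shingles` and `analyze` and their parameters are hypothetical, chosen only for illustration.

```python
def analyze(text):
    """Toy analyzer (an assumption for this sketch): lowercase, split on whitespace."""
    return text.lower().split()


def edge_ngrams(token, min_gram=1, max_gram=5):
    """Return the prefixes of `token` with lengths from min_gram up to max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def shingles(tokens, size=2):
    """Return word-level n-grams: every run of `size` adjacent tokens, space-joined."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = analyze("Elasticsearch in Action")
print(edge_ngrams(tokens[0], min_gram=2, max_gram=4))  # ['el', 'ela', 'elas']
print(shingles(tokens))  # ['elasticsearch in', 'in action']
```

Edge ngrams are what make search-as-you-type matching on prefixes possible, while shingles keep adjacent words together so that phrases can be matched as single terms.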

The second part of the book, as mentioned previously, is dedicated to the advanced usage, scaling and performance in production.
There are three main chapters in this part.
The first one is about scaling out, and specifically about adding nodes, discovering other nodes, removing nodes and upgrading them in your Elasticsearch cluster. There are also sections about the _cat API, which provides helpful diagnostic and debugging tools in a more human-readable way; about different scaling strategies, namely over-sharding, splitting data between indices and shards, and maximising throughput; about using aliases; and finally about routing.
The second chapter is mainly about improving performance through request grouping (bulk indexing, updating and deleting), the multisearch and multiget APIs, Lucene segment optimisation, and the best use of caches. It also includes a section about performance trade-offs in different use cases.
The last chapter of this part concerns the administration of the Elasticsearch cluster. It covers improving the defaults that come ‘out of the box’, using ‘allocation awareness’ to reduce central points of failure, and monitoring, which includes checking the cluster health, CPU and memory usage, OS caches and store throttling. Finally, there is also a section about backing up the data and restoring it.
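As a rough illustration of the request grouping mentioned above, the sketch below builds the newline-delimited JSON body that the bulk API expects: one action line per operation, followed by a source line for index and update operations. It only constructs the text and sends nothing to a cluster. The helper `build_bulk_body`, the index name `events` and the sample documents are assumptions made up for this example, not taken from the book. Older Elasticsearch versions, such as those current when the book was written, would also carry a `_type` field in the action metadata.

```python
import json


def build_bulk_body(actions):
    """Build a bulk request body from (operation, metadata, source) tuples.

    `source` is None for operations that have no second line, such as delete.
    """
    lines = []
    for op, meta, source in actions:
        lines.append(json.dumps({op: meta}))    # action line
        if source is not None:
            lines.append(json.dumps(source))    # document or partial-update line
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "events", "_id": "1"}, {"title": "Elasticsearch meetup"}),
    ("update", {"_index": "events", "_id": "1"}, {"doc": {"attendees": 42}}),
    ("delete", {"_index": "events", "_id": "2"}, None),
])
print(body)
```

Grouping operations this way means one HTTP round trip instead of three, which is where the performance gain comes from.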

The third part, made up of the appendices, is a thorough addition covering many different subjects; it should not be thought of as a superficial mention of them, as plenty of information is provided. It includes an appendix on working with geospatial data, another on a few of the most common plugins that can be used (both open source and commercial), one on results highlighting, one on monitoring plugins, an explanation of the percolator and its use case of doing ‘upside down’ searches, and finally one on using suggesters for auto-completion and did-you-mean suggestions.
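For readers unfamiliar with the ‘upside down’ search idea, the toy sketch below shows the principle in plain Python: queries are stored first, and each incoming document is then checked against them to find which queries it matches. This is not how Elasticsearch implements the percolator; the class `ToyPercolator`, its methods and the query ids are all hypothetical, and a "query" here is simply a set of terms that must all appear in the document.

```python
class ToyPercolator:
    """Stores queries up front and matches incoming documents against them."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        # A query is modelled as a set of lowercase terms that must all be present.
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        # Return the ids of all stored queries that this document satisfies.
        doc_terms = set(document_text.lower().split())
        return sorted(
            query_id
            for query_id, terms in self.queries.items()
            if terms <= doc_terms
        )


percolator = ToyPercolator()
percolator.register("es-alerts", ["elasticsearch"])
percolator.register("scaling", ["elasticsearch", "scaling"])
percolator.register("lucene", ["lucene"])
print(percolator.percolate("Scaling Elasticsearch in production"))
# ['es-alerts', 'scaling']
```

This reversal is what makes alerting-style use cases possible: a user registers an interest once and is notified whenever a new document matches it.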

It has to be noted that throughout the book there are plenty of examples that you can follow and experiment with, built around the simple ‘events’ example application that is provided. Everything that is described has its own example code that can be used.
There are also numerous graphs and diagrams in all of the chapters, which make some concepts much easier to grasp.
Many different use cases are described as well, along with suggestions on where each solution is best suited.

In conclusion, this is a very useful book both for an experienced Elasticsearch user or administrator and for someone who is just starting out. There is a lot of detail, and the examples cover all the functionality that someone needs in order to use and administer Elasticsearch successfully. Many common use cases are provided, and suggestions for each one are analysed in detail. Scaling and performance optimisation are also very well covered.
This is another indispensable book in Manning's ‘... in Action’ series, with a very good balance between the theory and practice of its subject. Highly recommended.

Disclosure of Material Connection: I received this book free from the publisher. I was not required to write a positive review. The opinions I have expressed are my own. Regardless, I only recommend products or services I use personally and believe will add value to readers.