Book Review: Elasticsearch in Action by Radu Gheorghe, Matthew Lee Hinman, Roy Russo (Manning)

Elasticsearch has been around for a while now, and it is used by quite a lot of websites nowadays. The book reviewed here aims to cover both the fundamentals of using Elasticsearch and its advanced usage, scaling and performance optimisation in production.

To do this, the book is divided into two main parts, named “Core functionality” and “Advanced Functionality”, followed by six further chapters at the end as appendices.

So let’s take things in order, which is also the recommended way of getting through the book, and start with the first part. Its eight chapters cover all of the core functionality of Elasticsearch. Here the reader can find out what kinds of search problems can be solved, see some typical use cases, and learn the meaning of documents, types, indices, shards and clusters. Following this, there is a thorough explanation of indexing, updating and deleting data. Another chapter is dedicated to searching the data with match and filter queries, and to choosing the best query for the job at hand. Next comes analysing the data using analyzers, tokenizers and token filters, with special sections on ngrams, edge ngrams, shingles and stemming. The following chapter explains how scoring works and describes the different scoring methods as well as boosting. The first part finishes off with two chapters about aggregations (metrics, multi-bucket and nested aggregations) and the different ways to describe relationships between documents (nested, parent-child, etc.).
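For readers who have not met the analysis terms mentioned above, here is a minimal plain-Python sketch of what ngrams, edge ngrams and shingles produce. It is only an illustration of the ideas and not the way Elasticsearch implements them; the function names and parameters are my own, and the real shingle token filter also emits the single words by default, which this sketch does not.

```python
def ngrams(token, n):
    """All contiguous character sequences of length n inside one token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def edge_ngrams(token, min_gram=1, max_gram=None):
    """Prefixes of one token, from min_gram up to max_gram characters."""
    max_gram = len(token) if max_gram is None else min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, max_gram + 1)]


def shingles(tokens, size=2):
    """Word-level ngrams: runs of `size` adjacent tokens joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


print(ngrams("search", 3))       # ['sea', 'ear', 'arc', 'rch']
print(edge_ngrams("search", 1, 3))  # ['s', 'se', 'sea']
print(shingles(["elasticsearch", "in", "action"]))
# ['elasticsearch in', 'in action']
```

Edge ngrams are the ones that make search-as-you-type matching possible, since every prefix of a word becomes a searchable term.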

The second part of the book, as mentioned previously, is dedicated to the advanced usage, scaling and performance in production.
There are three main chapters in this part.
The first one is about scaling out, and specifically about adding nodes, discovering other nodes, removing nodes and upgrading them in your Elasticsearch cluster. There are also sections about the _cat API, which provides helpful diagnostic and debugging information in a more human-readable form; about different scaling strategies, namely over-sharding, splitting data between indices and shards, and maximising throughput; about using aliases; and finally about routing.
The second chapter is mainly about improving performance through request grouping (bulk indexing, updating and deleting), the multisearch and multiget APIs, Lucene segment optimisation and the best use of caches. It also includes a section about performance trade-offs in different use cases.
The last chapter of this part concerns the administration of the Elasticsearch cluster. It covers improving the defaults that come ‘out of the box’, using ‘allocation awareness’ to reduce central points of failure, and monitoring, which includes checking the cluster health, CPU and memory usage, OS caches and store throttling. Finally, there is also a section about backing up the data and restoring it.
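To give a flavour of the request grouping mentioned above: the bulk API takes newline-delimited JSON, where each action line is followed by the document it applies to and the body ends with a newline. The sketch below only builds such a body with the standard library and does not talk to a cluster. The index name, ids and documents are made up for illustration, and the versions of Elasticsearch current when the book was written also expected a type in the action line.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited bulk body that indexes (doc_id, source) pairs.

    index_name and the documents are hypothetical; nothing is sent anywhere.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "my-index",
    [("1", {"title": "First event"}), ("2", {"title": "Second event"})],
)
print(body)
```

Sending many documents in one such request saves a network round trip per document, which is where the performance gain comes from.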

The third part, with the appendices, is also a thorough addition covering several further subjects, and it should not be thought of as a superficial mention of them, as plenty of information is provided. It includes a section about working with geospatial data, another about a few of the most common plugins that can be used (both open source and commercial), the functionality of results highlighting, some monitoring plugins, an explanation of the percolator and its use for ‘upside down’ searches, and finally the use of suggesters for auto-completion and did-you-mean suggestions.

It has to be noted that throughout the book there are plenty of examples that you can follow and experiment with, built around the simple ‘events’ example application that is provided. Everything that is described has its own example code that can be used.
There are also numerous graphs and diagrams in all of the chapters, which make some concepts much easier to grasp.
Many different use cases are also described, along with suggestions on where each solution is best suited.

In conclusion, this is a very useful book both for an experienced Elasticsearch user or administrator and for somebody who is just starting out. There are a lot of details and examples that cover all the functionality someone needs in order to use and administer Elasticsearch successfully. Many common use cases are provided, and suggestions for each one are analysed in detail. Scaling and performance optimisation are also very well covered.
This is another indispensable book in the ‘.. in Action’ Manning series, with a very good balance between the theory and practice of its subject. Highly recommended.

Disclosure of Material Connection: I received this book free from the publisher. I was not required to write a positive review. The opinions I have expressed are my own. Regardless, I only recommend products or services I use personally and believe will add value to readers.

PostGIS in Action by Regina O. Obe & Leo S. Hsu (Manning)

This is another book from Manning Publications in the excellent ‘.. in Action’ series, which guides the reader through some practical uses of the book’s subject. In this case the subject is PostGIS, which, for people who come across the term for the first time, is a spatial database extender for the PostgreSQL database management system. As described in the introduction, the audience of this book includes GIS Practitioners and Programmers, DB Practitioners as well as Scientists, Researchers, Educators and Engineers. The intended audience therefore covers a wide spectrum of professionals with varying degrees of experience with the subject matter.

The material is divided into three main parts: Learning PostGIS, Putting PostGIS to work, and Using PostGIS with other tools. There are also four additional appendices.

The first part, Learning PostGIS, is an introduction to GIS database concepts and practices. It introduces the geometry, geography, raster and topology types and the problems that each of them can solve. There is a thorough explanation of what PostGIS is and what you can do with a spatially enabled database that is not possible with an ordinary relational database. There are also chapters describing the spatial types that PostGIS offers and their related functions, spatial reference systems and their concepts, tools for loading spatial data and desktop tools for viewing and querying it, the use of geometry, geography and raster functions, geocoding, and finally an introduction to spatial relationships.

The second part, Putting PostGIS to work, is where all the pieces are put together, using the theoretical foundation from the previous part to solve real-world problems and answer questions like: which places are within X distance, and what are the N closest places?
The answers cover the traditional methods of finding closest neighbours as well as KNN indexes. Following that, there is a section dedicated to geotagging. Geometry and geography processing has its own chapter, which demonstrates techniques for manipulating geometries along with some of the most common problems and their solutions. Other chapters cover raster processing and topology; the latter includes creating a topology, building and working with topogeometries, and simplifying and validating them. The final two chapters of this part offer the reader practical solutions for organising spatial storage depending on the requirements, and some very useful tips about query performance tuning and optimisation. It should also be noted that throughout the book, and especially in this part, there are plenty of examples for the reader to follow that are of great practical use.
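To make the two questions above concrete, here is a brute-force sketch in plain Python. It is not PostGIS and not how the book solves them (there the work is done in SQL, with spatial indexes doing the heavy lifting); it uses the haversine great-circle formula on a spherical Earth, and the place names and coordinates are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, assuming a spherical Earth


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_distance(origin, places, max_km):
    """Names of the places that lie within max_km of origin."""
    return [name for name, pos in places.items()
            if haversine_km(origin, pos) <= max_km]


def n_closest(origin, places, n):
    """Names of the n places nearest to origin, closest first."""
    return sorted(places, key=lambda name: haversine_km(origin, places[name]))[:n]


# Hypothetical places on the equator, 1, 2 and 5 degrees east of the origin.
places = {"a": (0.0, 1.0), "b": (0.0, 2.0), "c": (0.0, 5.0)}
print(within_distance((0.0, 0.0), places, 250))
print(n_closest((0.0, 0.0), places, 2))
```

Both functions compare the origin against every place, which is exactly the cost that a spatial index is there to avoid once the table grows.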

In the Using PostGIS with other tools part we are told how PostGIS can be extended by means of add-ons like the PostgreSQL procedural languages PL/R and PL/Python, which allow us to use the wealth of statistical functions and plotting capabilities of R as well as the numerous Python packages. A variety of travelling-salesperson problems are demonstrated in this section, and pgRouting, which is used for building routing applications, is also covered. The remaining chapters cover mapping servers and client-side mapping frameworks for displaying PostGIS data on the web.

Finally, the appendices contain a very useful section with additional resources, instructions for installing PostGIS, an SQL primer, and a separate section on PostgreSQL features that includes table inheritance, roles, functions and performance tips.

To summarise, this is an extremely useful book for a variety of professionals interested in discovering PostGIS and, at the same time, PostgreSQL. It does not require any previous knowledge of geospatial databases, as there is great explanation and coverage of the theory, systems and tools needed. It would be helpful if the reader had some knowledge of SQL in order to follow the examples provided, even though there is a very good appendix that covers SQL.
A highly recommended book for starting your exploration of the world of spatial databases.

Disclosure of Material Connection: I received this book free from the publisher. I was not required to write a positive review. The opinions I have expressed are my own. Regardless, I only recommend products or services I use personally and believe will add value to readers.