Book Review: Elasticsearch in Action by Radu Gheorghe, Matthew Lee Hinman, Roy Russo (Manning)

Elasticsearch has been around for a while now and is used by a lot of websites. The book reviewed here aims to cover both the fundamentals of using elasticsearch and its advanced usage, scaling and performance optimisation in production.

To do this, the book is divided into two main parts, named “Core functionality” and “Advanced Functionality”, followed by six further chapters presented as appendices.

So let’s take things in order, which is also the recommended way of getting through the book, and start with the first part. It has eight chapters covering all of the core functionality of elasticsearch. Here the reader can find out what kinds of search problems can be solved, see some typical use cases, and learn the meaning of documents, types, indices, shards and clusters. Following this, there is a thorough explanation of indexing, updating and deleting data. Another chapter is dedicated to searching the data with match and filter queries, and to choosing the best query for the job at hand. Next comes analysing the data using analyzers, tokenizers and token filters, with special sections on ngrams, edge ngrams, shingles and stemming. The following chapter describes how scoring works, the different scoring methods, and boosting. The first part finishes off with two chapters about aggregations (metrics, multi-bucket and nesting aggregations) and the different ways to describe relationships between documents (nested, parent-child etc).
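For readers new to the analysis terms mentioned above, here is a minimal sketch in plain Python of what edge ngrams and shingles produce. It is not taken from the book and does not use elasticsearch itself; the function names `edge_ngrams` and `shingles` and their parameters are my own, chosen purely for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=5):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def shingles(tokens, size=2):
    """Return word-level n-grams built from `size` adjacent tokens."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# Edge ngrams are prefixes of a single token, useful for search-as-you-type.
print(edge_ngrams("search", min_gram=2, max_gram=4))   # ['se', 'sea', 'sear']

# Shingles are ngrams over whole words rather than characters.
print(shingles(["elasticsearch", "in", "action"]))     # ['elasticsearch in', 'in action']
```

In elasticsearch these transformations are configured as part of an analyzer rather than written by hand; the sketch only shows the shape of the tokens that end up in the index.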

The second part of the book, as mentioned previously, is dedicated to the advanced usage, scaling and performance in production.
There are three main chapters in this part.
The first one is about scaling out, and specifically about adding nodes, discovering other nodes, removing nodes and upgrading them in your elasticsearch cluster. There are also sections about the _cat API, which provides helpful diagnostic and debugging tools in a more human-readable form, about different scaling strategies (over-sharding, splitting data between indices and shards, and maximising throughput), about using aliases, and finally about routing.
The second chapter is mainly about improving performance through request grouping (bulk indexing, updating and deleting), the multisearch and multiget APIs, Lucene segment optimisation and the best use of caches, and it includes a section about performance trade-offs in different use cases.
The last chapter of this part concerns the administration of the elasticsearch cluster. It covers improving the defaults that come ‘out of the box’, using ‘allocation awareness’ to reduce central points of failure, and monitoring, which includes checking the cluster health, CPU and memory usage, OS caches and store throttling. Finally there is also a section about backing up the data and restoring it.

The third part, the appendices, is a thorough treatment of several additional subjects, and it should not be thought of as a superficial mention of them, as there is plenty of information provided. It includes a section about working with geospatial data, another about a few of the most common plugins (both open source and commercial), results highlighting, some monitoring plugins, an explanation of the percolator and its use case of doing ‘upside down’ searches, and finally the use of suggesters for auto-completion and did-you-mean suggestions.

It has to be noted that throughout the book there are plenty of examples that you can follow and experiment with, using the simple ‘events’ example application that is provided. Everything that is described has its own example code that can be used.
There are also numerous graphs and diagrams in all of the chapters, which make some concepts much easier to grasp.
Many different use cases are also described, along with suggestions on where each solution is best suited.

In conclusion, this is a very useful book both for an experienced elasticsearch user or administrator and for somebody who is just starting out. There is a lot of detail, and there are plenty of examples covering all the functionality that someone needs in order to use and administer elasticsearch successfully. Many common use cases are provided and suggestions for each one are analysed in detail. Scaling and performance optimisation are also very well covered.
This is another indispensable book in Manning’s ‘.. in Action’ series, with a very good balance between the theory and practice of its subject. Highly recommended.

Disclosure of Material Connection: I received this book free from the publisher. I was not required to write a positive review. The opinions I have expressed are my own. Regardless, I only recommend products or services I use personally and believe will add value to readers.

Book Review: PostGIS in Action by Regina O. Obe & Leo S. Hsu (Manning)

This is another book from Manning Publications in the excellent ‘.. in Action’ series, which guides the reader through some practical uses of the book’s subject. In this case the subject is PostGIS which, for those coming across the term for the first time, is a spatial database extender for the PostgreSQL database management system. As described in the introduction, the audience of this book includes GIS practitioners and programmers, DB practitioners, as well as scientists, researchers, educators and engineers. The book is therefore aimed at a wide spectrum of professionals with varying degrees of experience with the subject matter.

The material is divided into three main parts (Learning PostGIS, Putting PostGIS to work, and Using PostGIS with other tools), followed by four appendices.

The first part, Learning PostGIS, is an introduction to GIS database concepts and practices. It introduces the geometry, geography, raster and topology types and the problems that each of them can solve. There is a thorough explanation of what PostGIS is and what you can do with a spatially enabled database that is not possible with a conventional relational database. There are also chapters describing the spatial types that PostGIS offers and their related functions, spatial reference systems and their concepts, tools for loading spatial data and desktop tools for viewing and querying it, the use of geometry, geography and raster functions, geocoding, and finally an introduction to spatial relationships.

The second part, Putting PostGIS to work, is where all the pieces are put together, building on the theoretical foundation of the previous part in order to answer real-world questions such as: which places are within X distance, and what are the N closest places?
These cover the traditional methods of finding closest neighbours as well as KNN indexes. Following that there is a section dedicated to geotagging. Geometry and geography processing has its own chapter, which demonstrates techniques for manipulating geometries and covers some of the most common problems and their solutions. Other chapters cover raster processing and topology, which includes creating a topology and building and working with topogeometries, as well as simplifying and validating them. The final two chapters of this part offer practical solutions for organising spatial storage depending on the requirements, and some very useful tips about query performance tuning and optimisation. It should also be noted that there are plenty of examples to follow throughout the book, and those in this part are of especially great practical use.
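To make the two questions above concrete, here is a brute-force sketch in plain Python. It is not how PostGIS answers them (the book does this in SQL with spatial functions and indexes), and it is not taken from the book: the haversine formula on a spherical Earth, the function names and the sample coordinates are all my own assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_distance(origin, places, max_km):
    """Names of all places no further than max_km from origin."""
    return [name for name, pos in places.items()
            if haversine_km(origin, pos) <= max_km]


def n_closest(origin, places, n):
    """Names of the n places nearest to origin, closest first."""
    return sorted(places, key=lambda name: haversine_km(origin, places[name]))[:n]


# Hypothetical sample data: approximate city-centre coordinates.
LONDON = (51.5074, -0.1278)
PLACES = {
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
    "Madrid": (40.4168, -3.7038),
}

print(within_distance(LONDON, PLACES, 500))  # places within 500 km of London
print(n_closest(LONDON, PLACES, 2))          # the two closest places to London
```

This scans every place for every query, which is exactly the cost that spatial indexes are there to avoid.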

In the Using PostGIS with other tools part we are told how PostGIS can be extended by means of add-ons like the PostgreSQL procedural languages PL/R and PL/Python, which allow us to use the wealth of statistical functions and plotting capabilities of R as well as the numerous Python packages. A variety of travelling-salesperson problems are presented in this section, and pgRouting, which is used for building routing applications, is also covered. The remaining chapters cover server-side mapping servers and client-side mapping frameworks for displaying PostGIS data on the web.

Finally, the appendices have a very useful section with additional resources, instructions for installing PostGIS, an SQL primer and a separate section on PostgreSQL features that includes table inheritance, roles, functions and performance tips.

To summarise, this is an extremely useful book for a variety of professionals interested in discovering PostGIS and, at the same time, PostgreSQL. It does not require any previous knowledge of geospatial databases, as there is a great explanation and coverage of the theory, systems and tools needed. It would be helpful if the reader has some knowledge of SQL in order to follow the examples provided, even though there is a very good appendix that covers SQL.
A highly recommended book for starting your exploration in the world of spatial databases.

Disclosure of Material Connection: I received this book free from the publisher. I was not required to write a positive review. The opinions I have expressed are my own. Regardless, I only recommend products or services I use personally and believe will add value to readers.

Book Review: BDD in Action by John Ferguson Smart (Manning)

As the subtitle of the book ‘Behavior-driven development for the whole software lifecycle’ suggests, this is a book describing Behaviour Driven Development for the different phases of software development.

It starts with a very general description of BDD, the problems that it addresses, its general principles and origins, as well as the pros and cons in different organisations and team scenarios. Throughout this general description there are plenty of references and links for anyone wishing to have more details.

This is followed by a real-life project that goes through the phases of requirement analysis and the creation of features, stories and examples, explaining the distinctions between each of them. Various techniques are described, including Feature Injection for identifying business goals and supporting features, Impact Mapping for high-level requirement visualisation, the Purpose-Based Alignment Model for judging how much effort you should put into different features, and the identification of stakeholders and their roles. There is also a description of the Real Options and Deliberate Discovery principles.

There are more detailed examples of automating scenario steps, described in different languages (Java, Python, .NET, JavaScript) and different tools (JBehave, Cucumber-JVM, Behave, SpecFlow and Cucumber-JS). These are used to go from executable specifications to automated acceptance tests. There are two separate chapters following that, describing the two different ways of automating the acceptance criteria depending on whether they are for the UI layer or the non-UI requirements.
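As a rough idea of what “automating scenario steps” means, here is a toy sketch in plain Python of matching Given/When/Then lines against step functions. It is not the API of any of the tools named above and is not taken from the book; the `step` decorator, `run_scenario` and the bank account scenario are hypothetical names made up for this illustration.

```python
import re

STEPS = []  # registered (compiled pattern, function) pairs


def step(pattern):
    """Register a function as the implementation of a scenario step."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"Given an account with a balance of (\d+)")
def set_balance(ctx, amount):
    ctx["balance"] = int(amount)


@step(r"When I withdraw (\d+)")
def withdraw(ctx, amount):
    ctx["balance"] -= int(amount)


@step(r"Then the balance should be (\d+)")
def check_balance(ctx, expected):
    assert ctx["balance"] == int(expected)


def run_scenario(text):
    """Run each line of the scenario through the first step that matches it."""
    ctx = {}
    for line in text.strip().splitlines():
        line = line.strip()
        for pattern, func in STEPS:
            match = pattern.fullmatch(line)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {line!r}")
    return ctx


SCENARIO = """
Given an account with a balance of 100
When I withdraw 30
Then the balance should be 70
"""

print(run_scenario(SCENARIO))  # {'balance': 70}
```

The real tools add much more on top of this (feature files, reporting, hooks), but the core idea of binding readable scenario text to executable code is the same.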

Last but not least, there is a chapter dedicated to explaining the relationship between BDD, TDD and Unit Testing, another one describing the idea of living documentation, as well as reporting and project management, and finally the role of BDD in continuous integration (CI) and continuous delivery.

Throughout the text there are a lot of diagrams to help explain the ideas better, as well as a lot of references for more detail. Even though most of the examples are written in Java, the principles, ideas and techniques are easily applied to other languages and tools.

So, in summary, this is a very useful book for anyone wanting to learn how Behaviour Driven Development can be used in practice. There are different sections targeted at different roles, from business stakeholders to testers and developers. Hopefully when your whole team reads this book, the whole idea of BDD can be understood better, and used to build the software right as well as build the right software.

Disclaimer: This book was reviewed under the Manning reviewers program.

Book Review: Exploring Everyday Things with R and Ruby by Sau Sheong Chang (O’Reilly)

Exploring Everyday Things with R and Ruby, as the title suggests, is a book about data exploration, written in a very accessible and very unusual way that makes it hard to put down.

It begins with a short introduction to the two languages used to achieve its purpose.

It gives a short explanation of the reasons for using Ruby, followed by installation instructions and some basic Ruby information. After that there is a short introduction to Ruby’s UI toolkit, called Shoes.

The second introductory part covers R, the other language used in the examples that follow. Again there are the reasons for picking R, installation instructions and a brief introduction to the capabilities of the language, especially its data analysis and graphing abilities.

After the first two parts the book starts to get really interesting and fun, and it will certainly make you want to try out the examples.
The subjects covered are ‘Offices and Restrooms’, which is about determining the correct people-to-restrooms ratio, ‘How to Be an Armchair Economist’, about a market economy simulation, ‘Discover Yourself Through Email’, dealing with email data mining, ‘In A Heartbeat’, for measuring the heartbeat, including a homemade digital stethoscope, ‘Schooling Fish and Flocking Birds’, a simulation of the Boids algorithm in Ruby, and finally ‘Money, Sex and Evolution’, an entire artificial world populated by the roids of the previous example.

All of the examples are fascinating and include background information about their specific fields. Before reading this book I would never have imagined that it was possible to simulate things like these.

So in conclusion, it is a very enjoyable, interesting and out-of-the-ordinary book, helped greatly by the author’s unique writing style, and one that I would recommend to anyone with an interest in Ruby or R.