Tuesday, June 25, 2013

Why neography-batch is useful to me

I'm working on a Ruby on Rails project and I decided to use neo4j as one of the data stores. The software is about writing medical case reports, and all reports are internally deconstructed into their syntactic and semantic structure. During that process, words are stored as neo4j nodes (organised in sentences, chapters and reports). Words are also connected to other words through neo4j relationships (e.g. preposition, adjective-modifier, possession-modifier etc.), sentences are connected to medical concepts (e.g. Retroperitoneal hemorrhage), and concepts can be connected to other concepts.

As a consequence, saving a report means that a considerable number of nodes and relationships have to be created, updated or deleted. I tried the following approaches to access the neo4j server:

  1. Neo4j Java API via Rjb (Ruby Java Bridge): This approach was too slow since the Java API is fine-grained and each call into the Java API had to be bridged between Ruby and Java. Because I planned to deploy the software in the cloud (Heroku and the neo4j-plugin), this approach wouldn't have worked anyway.
  2. Neography: This was a better solution since Neography is a Ruby gem which accesses neo4j through its REST API. To get good performance I used Cypher queries and batches exclusively. Everything else was too fine-grained for my purposes. Additionally, I used batches to implement transactions: I aggregated all 'commands' into one batch, which was sent over the network as a whole and executed by the neo4j server in one transaction. This solution enabled me to define reports as an Aggregate and to always modify them in one batch.
However, in the course of creating and/or modifying a report it proved difficult to use Neography's native batch functionality. The results produced by different parts of my software couldn't easily be aggregated into one single batch. Therefore, I wrote a tiny extension to Neography called neography-batch. It helps to compose several batches into larger batches and to reference specific 'commands' in a batch from the same batch or from another batch. If you are interested, please see neography-batch on github or RubyGems.org.
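
To make the aggregation problem concrete, here is a minimal sketch in plain Ruby. It is not the neography-batch API: `shift_references` and `compose_batches` are hypothetical helpers written only for this illustration, and the node properties and relationship names are made up. It assumes commands in the array form that Neography's `batch` method accepts, where a string such as "{0}" refers to the result of the command at that position in the batch. As soon as two independently built batches are concatenated, those positions change, so the references in the second batch have to be shifted.

```ruby
# Hypothetical helpers for illustration only; not part of Neography or
# neography-batch.

# Shifts every top-level "{n}" reference in a single command by `offset`.
# Strings nested inside hashes are deliberately left untouched.
def shift_references(command, offset)
  command.map do |arg|
    index = arg.is_a?(String) ? arg[/\A\{(\d+)\}\z/, 1] : nil
    index ? "{#{index.to_i + offset}}" : arg
  end
end

# Concatenates several batches into one and rewrites the references of each
# batch so that they still point at the commands they originally meant.
def compose_batches(*batches)
  batches.reduce([]) do |combined, batch|
    offset = combined.length
    combined + batch.map { |command| shift_references(command, offset) }
  end
end

# Two batches built by different parts of the software (made-up data).
word_batch = [
  [:create_node, { "text" => "retroperitoneal" }],
  [:create_node, { "text" => "hemorrhage" }],
  [:create_relationship, "adjective_modifier", "{1}", "{0}"]
]

concept_batch = [
  [:create_node, { "name" => "Retroperitoneal hemorrhage" }],
  [:create_node, { "name" => "Hemorrhage" }],
  [:create_relationship, "is_a", "{0}", "{1}"]
]

report_batch = compose_batches(word_batch, concept_batch)

# The relationship from the second batch now refers to positions 3 and 4.
p report_batch.last
```

With a running neo4j server, the combined commands could then be sent in a single request, for example with `Neography::Rest.new.batch(*report_batch)`. The sketch only covers references within one batch; referencing a command that lives in another batch needs more bookkeeping than a plain offset.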


Marco said...

Jim Webber, the lead developer of Neo4J, gave a good talk at the GOTO conference 2013 with the title "A Little Graph Theory for the Busy Developer". Unfortunately, there is no video available, but the slides are online.


Jörg Jenni said...


Thanks a lot. I liked the hints in the slides about graph-theory/algorithms. I think that's a good starting point when I need to know more about theory.