Saturday, November 25, 2017

Udacity Lab: Simulated Annealing Lab Results

Introduction

I'm currently doing a course at Udacity and I'm using this post to share some results with my peers. We were asked to analyse how changing certain properties of the Simulated Annealing algorithm would change the solutions to the Traveling Salesman Problem. I produced some histograms which I found interesting enough to share. My apologies for not explaining everything in more detail.
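For context, here is a minimal sketch of the annealing loop whose knobs are varied below. It's written from memory in Ruby with made-up city coordinates and placeholder parameter values, so it only illustrates how the number of cities, the starting temperature and alpha enter the algorithm; it is not the lab's actual code.

```ruby
# Minimal simulated-annealing sketch for the TSP (illustrative values only).
CITIES = Array.new(10) { [rand(0..100), rand(0..100)] }   # number of cities

def tour_length(tour)
  tour.each_index.sum do |i|
    a, b = CITIES[tour[i]], CITIES[tour[(i + 1) % tour.size]]
    Math.hypot(a[0] - b[0], a[1] - b[1])
  end
end

def anneal(t0: 1_000.0, alpha: 0.995, t_min: 1e-3)
  tour = (0...CITIES.size).to_a.shuffle
  best = tour.dup
  t = t0                                                    # starting temperature
  while t > t_min
    candidate = tour.dup
    i, j = rand(tour.size), rand(tour.size)
    candidate[i], candidate[j] = candidate[j], candidate[i] # swap two cities
    delta = tour_length(candidate) - tour_length(tour)
    # Always accept improvements; accept worse tours with probability exp(-delta / t).
    tour = candidate if delta < 0 || rand < Math.exp(-delta / t)
    best = tour.dup if tour_length(tour) < tour_length(best)
    t *= alpha                                              # geometric cooling schedule
  end
  best
end

puts tour_length(anneal)
```

A higher starting temperature means more bad swaps get accepted early on, and an alpha closer to 1 cools more slowly, so the loop runs for more iterations and explores more. That trade-off is what the histograms below measure.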

Changing the Number of Cities

Nobs 100, Min 1426, Max 2450, Mean 1836, Var 63180, Skewness 0.622


Nobs 100, Min 4487, Max 7316, Mean 5840, Var 277449, Skewness 0.0098

By increasing the number of cities in the second graph we increase the complexity of the problem, and we see that the algorithm most of the time finds only a sub-optimal solution. The optimal solution sits at the min value on the left, and only with 10 cities do we see a distribution that is skewed to the right.

Changing Temperature

Nobs 100, Min 1426, Max 2501, Mean 1895, Var 86949, Skewness 0.439

Nobs 100, Min 1426, Max 2439, Mean 1853, Var 76739, Skewness 0.391

Nobs 100, Min 1426, Max 2427, Mean 1859, Var 74511, Skewness 0.363

Nobs 100, Min 1426, Max 2436, Mean 1849, Var 73730, Skewness 0.271

Nobs 100, Min 1426, Max 2564, Mean 2040, Var 92452, Skewness -0.450

We see that choosing a very high temperature does not have a huge effect. The first two histograms look the same even though the temperature in the first one was significantly higher. But the last two histograms show that choosing too low a temperature is not good, as the distribution starts to skew to the left.

Changing Alpha

Nobs 100, Min 1426, Max 2368, Mean 1693, Var 37448, Skewness 0.587

Nobs 100, Min 1434, Max 2465, Mean 1870, Var 63441, Skewness 0.335

Nobs 100, Min 1426, Max 2548, Mean 2007, Var 82375, Skewness -0.379


Nobs 100, Min 1426, Max 2608, Mean 2065, Var 75604, Skewness -0.545

Nobs 100, Min 1676, Max 2729, Mean 2198, Var 60915, Skewness -0.283

We clearly see that choosing a high alpha value is best, as the distribution skews to the right while the variance is still the smallest. However, a high alpha significantly increases the run time, so alpha and run time have to be carefully balanced.




Friday, April 1, 2016

Food Hero on wit.ai

The guys at wit.ai now list Food Hero on their website. That's pretty cool.

Have a look here and scroll down to Food Hero.

Friday, October 30, 2015

Food Hero is on the Apple App Store

I've been feverishly working on my first app, and finally it's there... on the world-famous Apple App Store. It has arrived in heaven, so to speak.

Food Hero is a restaurant guide that you interact with through a conversation. I've always found a conversation-based approach very interesting, since it opens up a lot of new options that traditional guides (for example Yelp, TripAdvisor, Google, etc.) don't have. It's an experiment and I have no idea where it goes next. Is it just a curiosity? Is searching for restaurants the right thing to do? Will Siri, Cortana, Google Now & co. do what I envision with Food Hero in the future?

Food Hero has been trained with what I imagined people would do with it. I know reality will be quite different. And that's my next aim: learning what people actually do with it.

Download it, try it out and please let Food Hero collect usage data since that is really what I'm after at the moment. If you are concerned about your privacy please check out Food Hero's privacy policy.

For more information please visit:

Tuesday, June 25, 2013

Why neography-batch is useful to me

I'm working on a Ruby on Rails project and I decided to use Neo4j as one of the data stores. The software is about writing medical case reports, and all reports are internally deconstructed into their syntactical and semantic structure. During that process, words are stored as Neo4j nodes (organised in sentences, chapters and reports). Words are also connected to other words through Neo4j relations (e.g. preposition, adjective modifier, possession modifier, etc.), sentences are connected to medical concepts (e.g. Retroperitoneal hemorrhage), and concepts can be connected to other concepts.

This goes to show that, when saving a report, a considerable number of nodes and relationships have to be created, updated or deleted. I tried the following approaches to access the Neo4j server:

  1. Neo4j Java API via Rjb (Ruby Java Bridge): This approach was too slow, since the Java API is fine-grained and each call into it had to be bridged between Ruby and Java. Because I planned to deploy the software in the cloud (Heroku and the Neo4j plugin), this approach wouldn't have worked anyway.
  2. Neography: This was a better solution, since Neography is a Ruby gem that accesses Neo4j through its REST API. In order to get good performance I exclusively used Cypher queries and batches; everything else was too fine-grained for my purposes. Additionally, I used batches to implement transactions: I aggregated all 'commands' into one batch, which was sent over the network as a whole and executed by the Neo4j server in one transaction. This solution enabled me to define reports as an Aggregate and to always modify them in one batch. 
However, in the course of creating and/or modifying a report it proved difficult to use Neography's native batch functionality: the results produced by different parts of my software couldn't easily be aggregated into one single batch. Therefore, I wrote a tiny extension to Neography called neography-batch. It helps compose several batches into larger batches and reference specific 'commands' in a batch from the same batch or from another batch. If you are interested, please see neography-batch on GitHub or RubyGems.org.
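To make the batch idea concrete, here is roughly what a single Neography batch looks like (written from memory, so treat the exact calls and the example nodes as an approximation rather than this project's real code):

```ruby
require 'neography'

neo = Neography::Rest.new("http://localhost:7474")

# One report saved as a single batch, i.e. one round trip and one transaction.
# Later commands reference earlier results by their position: {0}, {1}, ...
neo.batch(
  [:create_node, { "text" => "Retroperitoneal hemorrhage detected." }],   # {0} sentence
  [:create_node, { "word" => "hemorrhage" }],                             # {1} word
  [:create_node, { "concept" => "Retroperitoneal hemorrhage" }],          # {2} concept
  [:create_relationship, "contains", "{0}", "{1}", nil],
  [:create_relationship, "refers_to", "{0}", "{2}", nil]
)
```

Those positional {0}-style references are exactly what makes merging independently built command lists awkward, and smoothing that over is what neography-batch is for.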

Friday, October 21, 2011

Short abstract of HTML 5

I've recently finished reading Pro HTML5 Programming. The book includes good examples, it is easy to read, and it's a nice introduction to HTML5. Unfortunately, some examples didn't work, but I guess they were based on an older version of the HTML5 specification which my browsers no longer supported.

Following is a brief summary of the new HTML5 features, ordered by my view of their importance:
  • HTML5 WebSocket: This is a lightweight duplex communication channel between server and website, and it offers a faster way to send small messages back and forth. A WebSocket can only be opened once the HTTP communication has already been established ("upgrading" the HTTP connection to WebSocket).
  • HTML5 Web Workers: A Web Worker is set up by a script and executes another JavaScript file in parallel. Communication to and from Web Workers is preferably implemented using 'Cross Document Messaging' or messaging via Web Storage.
  • HTML5 Web Storage: Stores key/value-pairs in either a session store or a local store. The stores are isolated by their origin (e.g: www.evil.com can't access values from www.example.com). There are events published by the store which are fired when a value changes. Those events can also be used to implement messaging between websites.
  • HTML5 Canvas: A simple API to render 2D drawings from within the browser using JavaScript. It's also possible to read and modify parts of the canvas using bitmaps.
  • Communication APIs: 'Cross Document Messaging' allows sending messages and events between parts of a website even if they are not from the same origin (which hasn't been possible before due to security concerns). 'XMLHttpRequest Level 2' allows the same for communication with the server: a website originating from 'example.net' can send XMLHttpRequests to different origins like 'example.net' and 'example.com' at the same time. The important part is that both sender and receiver of 'cross messages' have to be configured accordingly, otherwise the communication is not possible due to security constraints.
  • HTML5 Geolocation: A simple API to access the browser's geographical location. How the location is determined is hidden away by the API. Depending on the browser and hardware, the accuracy of the measurement can vary greatly.
  • HTML5 Audio and Video: Embeds audio and video natively in HTML. Defines controls to play audio and video. Audio and video editing is not included.
  • HTML5 Forms API: A bunch of new HTML tags and attributes that provide better semantics and therefore enable the browser to render more advanced or different controls (e.g. a control that only accepts e-mail addresses).
I'm really looking forward to seeing where HTML5 will take us and what frameworks and paradigms will evolve around it. Will websites become more like traditional rich clients now?



Saturday, July 30, 2011

Typing effort analyzer in Ruby

I've been working on RePhraser over the course of the last 6 months. RePhraser is a piece of software which aims to help professionals write repetitive texts quickly. Unfortunately, I haven't been able to make it available to a broader audience yet, mainly because it only works reliably in Internet Explorer and some basic features are still missing. Even so, I've uploaded a short demo here.


RePhraser assists you by anticipating and displaying words while you are typing them. Imagine you are a physician writing the word "immunocytochemically". You would start to type "immu", and RePhraser would bring up all the words that start with "immu". There would be simple words like "immune" or "immunogen" and more complicated words like the aforementioned "immunocytochemically". In fact, there are a lot of words that start with "immu". RePhraser would be pretty useless if it displayed easy words like "immune" or "immunogen" at this point, because professional writers type those words much faster than RePhraser could display them. To solve this problem RePhraser uses a typing effort model. The basic idea is to display words which are harder to type first and to ignore words which are easy to type. To do so, RePhraser rates every word based on the carpalx typing effort model. The carpalx model takes into account things like weak fingers (like the pinky and ring finger), travel distance of fingers, same-finger typing (e.g. "uhm"), balanced hand use vs. right-hand priority, etc.
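To give a feel for what such a rating could look like, here is a deliberately simplified sketch. It is neither carpalx nor Teanalyzer's actual code; the key groups and weights below are made up purely for illustration:

```ruby
# Toy typing-effort score: harder-to-type words get a higher number.
WEAK_FINGER_KEYS = %w[q a z p l o].freeze      # rough pinky/ring-finger keys
HOME_ROW_KEYS    = %w[a s d f g h j k l].freeze

def typing_effort(word)
  chars  = word.downcase.chars
  effort = 0.0
  chars.each_with_index do |c, i|
    effort += 1.0                                    # base cost per keystroke
    effort += 0.5 if WEAK_FINGER_KEYS.include?(c)    # weak fingers are penalised
    effort += 0.3 unless HOME_ROW_KEYS.include?(c)   # leaving the home row costs extra
    effort += 0.7 if i > 0 && chars[i - 1] == c      # crude stand-in for same-finger typing
  end
  effort
end

%w[immune immunogen immunocytochemically].each do |w|
  puts format('%-22s %.1f', w, typing_effort(w))
end
```

With a score like this, "immunocytochemically" comes out far more expensive than "immune", which is exactly the ordering RePhraser wants for its suggestions.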


I implemented the carpalx typing effort model in Ruby and published the source code and gem on https://github.com/Enceradeira/teanalyzer. The project is named Teanalyzer, which is an abbreviation of typing effort analyzer.


What do you think about Teanalyzer? Or are you interested in RePhraser? Please drop a line or contact me!

Saturday, March 27, 2010

My days at QCon 2010, London

This was my first visit to a QCon conference and I was quite excited to see some of the 'famous' speakers in the world of software development. There were so many interesting talks on the schedule that I even missed out on the talks of Eric Evans and Martin Fowler. Maybe next year!
Generally I jumped between the different tracks, but I attended more than one talk from the tracks Software Craftsmanship, Functional Programming, and Irresponsible Architectures and Unusual Architects.

The following talks were my personal highlights (based on the speaker, importance to me, and what I learned from them):

Bad Code, Craftsmanship, Engineering, and Certification by Robert C. Martin. Nothing new, but nevertheless a very entertaining talk. The bad code video was phenomenal (especially with the depressing background music). He questioned whether bad code is written because of deadlines, laziness, boredom or even job security ('I'm the only guy who can maintain that!'). He also suggested following the Boy Scout Rule: "Always leave the campground cleaner than you found it". "The only way to go fast is to go well" was another of his wisdoms. And then he 'evangelized' agile practices like TDD, pair programming, CI, etc., which lead us to what he calls 'Pride of Workmanship'.


Sharpening the Tools by Dan North. Good speaker and motivator. He reflected on the way we learn things (the Dreyfus model) and that, in everything we learn, we are therefore a 'novice', 'advanced beginner', 'competent', 'proficient' or an 'expert'. He suggested that we always have to renew our skills due to the continuous development of new and more effective techniques in the area of software development:
  1. Practise the basics
  2. Learn from other people
  3. Understand trends
  4. Share knowledge
  5. Maintain your toolbox ('Some tools are timeless, some are disposable')
  6. Learn how to learn
Not Only SQL: Alternative Data Persistence and Neo4j by Emil Eifrem. It was the first time I attended a talk about this subject and I was very curious about it. I soon realised that it's all about 'scalability', as in: 'what kind of datastore do you need if you are building a Twitter-like application'. Relational databases are strong with well-structured, not very complex data (e.g. a salary list). NoSQL datastores are better with more dynamically defined, complex data (e.g. persons and their different relationships to each other). I learned that this is not the end of relational databases, but that there are now other techniques available when it comes to storing large amounts of complex, dynamic data. I now know that a query like 'all friends of Peter that also know Sarah' can be built much more efficiently and easily with a NoSQL datastore (especially a graph DB).
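As a rough sketch of what such a query looks like on a graph database (modern Cypher syntax written from memory, made-up labels and relationship types, and the Neography gem used purely as an example client):

```ruby
require 'neography'

neo = Neography::Rest.new("http://localhost:7474")

# 'All friends of Peter that also know Sarah' as a single graph traversal.
query = <<~CYPHER
  MATCH (peter:Person {name: 'Peter'})-[:FRIEND]->(friend)-[:KNOWS]->(sarah:Person {name: 'Sarah'})
  RETURN friend.name
CYPHER

puts neo.execute_query(query)
```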

The Joy of Testing by John Hughes. A very refreshing but slightly academic talk about TDD. The key idea is to abstract test cases into a set of test 'properties'. Think about a number of test cases that test a method. Wouldn't it be nice to reduce those test cases to one unique description that could be run by a test runner, and have that test runner find edge cases that you have never thought about? I'm not (yet) able to apply this to my daily work, but I'm still thinking about it.
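As a toy illustration of the idea (hand-rolled in plain Ruby rather than with QuickCheck, which the talk was actually about), a single property can stand in for many hand-written example tests:

```ruby
# Property: reversing a list twice gives back the original list.
def reverse_twice_property(list)
  list.reverse.reverse == list
end

# Instead of a few hand-picked examples, random inputs probe the property.
100.times do
  sample = Array.new(rand(0..20)) { rand(-1_000..1_000) }
  unless reverse_twice_property(sample)
    raise "Property failed for input: #{sample.inspect}"
  end
end
puts 'Property held for 100 random inputs'
```

Real property-based tools go further and automatically shrink a failing input to a minimal counterexample, which is where the surprising edge cases tend to show up.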

Kanban - Crossing the line, pushing the limit or rediscovering the agile vision? by Jesper Boeg. A good talk given by a convinced Kanban evangelist. It was very interesting how deeply they integrate product owners, business analysts and testers into the development process ('developers help out business analysts to write down stories if there is a temporary bottleneck'). Sounds like 'extreme scrumming' to me! (very short full development cycles). I really like the idea of helping each other out so that we can cope with overloaded or under-staffed testers, product owners and business analysts.

Command-Query Responsibility Segregation by Udi Dahan. Generally he was saying that displayed data (the query part) can be completely decoupled from the persisted data, while updating the persisted data (the command part) can be done using a much more sophisticated model (e.g. a strong domain model or an asynchronous event queue, etc.). I already knew about this style, but new to me was the following (a small sketch follows the list):
  • It's about how we build the UI and what business services we offer to a user. I got the impression we should display less raw data (e.g. fewer data grids) and let the software behave more intelligently.
  • We could deliberately display stale data (e.g. display an account list with the title 'account balance as it was on 25.4.2010 at 14:34h'). He says that this is usually no problem for a user.
  • We could try to build much more valuable commands (e.g. a reservation system that can 'book the best seats for a group of 12 persons where no person has to sit alone' instead of letting users choose the seats themselves).
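As a very small sketch of that separation (my own illustration with made-up names, not Udi Dahan's example), the write side goes through a command handler and domain logic while the read side serves a separately maintained, denormalised view:

```ruby
# Minimal event bus so the command side can notify the query side.
class EventBus
  def initialize
    @subscribers = []
  end

  def subscribe(&block)
    @subscribers << block
  end

  def publish(event)
    @subscribers.each { |s| s.call(event) }
  end
end

# Command side: a task-based command handled by domain logic.
ReserveSeats = Struct.new(:booking_id, :group_size, keyword_init: true)

class ReservationHandler
  def initialize(bus)
    @bus = bus
  end

  def handle(command)
    # Real domain logic would pick the best seats so nobody sits alone, etc.
    seats = (1..command.group_size).map { |n| "A#{n}" }
    @bus.publish(booking_id: command.booking_id, seats: seats)
  end
end

# Query side: a denormalised view kept up to date from events; the UI reads
# it directly, and it may be slightly stale, which is usually acceptable.
class BookingListView
  attr_reader :rows

  def initialize(bus)
    @rows = {}
    bus.subscribe { |event| @rows[event[:booking_id]] = event[:seats] }
  end
end

bus  = EventBus.new
view = BookingListView.new(bus)
ReservationHandler.new(bus).handle(ReserveSeats.new(booking_id: 42, group_size: 3))
p view.rows  # => {42=>["A1", "A2", "A3"]}
```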
Scenario-Driven Development by Ben Butler-Cole (unfortunately no slides available). In his talk he suggested not doing integration/acceptance testing on a 'per-story' basis. He mentioned that this could lead to brittle tests and to maintenance problems. He suggested identifying a small number of 'key scenarios' that, in their final version, span several stories. A key scenario for a banking application could be 'Poor guy wants to pay a bill', which includes the stories 'User pays bill' (which is rejected due to insufficient balance), 'User applies for an overdraft', 'Clerk manages overdraft application' and 'User pays bill' (which is no longer rejected). Scenarios are developed incrementally and extended when a story is going to be implemented. He also showed Twist, which was developed by ThoughtWorks to support Scenario-Driven Development. An interesting thing is that Twist puts several layers on top of the application layer, helping to manage reusability and maintainability of the tested UI. These layers are called the Scenario layer (the scenarios written with Twist), the Workflow layer (commands like 'go to homepage') and the Application model (abstracting the underlying technology with drivers like Selenium). The Workflow layer and the Application model have to be written in Java and therefore enjoy all the advantages of a modern programming language (refactoring, abstraction, object orientation, etc.).

I also attended the following, less interesting talks:
  • Functional Approaches To Parallelism and Concurrency
  • Demystifying monads
  • Living and working with aging software
  • The Counterintuitive Web
  • Patterns for the People
  • Transactions: Over Used or Just Misunderstood?
  • Fighting Layout Bugs
  • Test-Driven Development of Asynchronous Systems
  • Data Presentation in a Web App: The Journey of a Startup
  • Death by accidental complexity