Thursday, March 31, 2011

Recursion

I have a natural tendency to write libraries which provide unusual functionality. Libraries that I write always start out simple. I want to do X, so I write function Y.

Inevitably, this turns out to not be enough for whatever I am working on. I need function X to check for condition C, and IF(C), run X again. When working with event/state-based programming as I do, this quickly becomes a huge issue. If the inner loop free()s data still being used on the outer loop, at best I can hope for something simple going wrong, an IF(FREED_POINTER) evaluating as true or the like. At worse, crash. At absolute worst, extremely complex crash that takes hours to successfully debug and fix. Other problems appear as well, such as failures in maintenance of scheduled tasks and losing track of objects allocated in the inner loop.

This has recently become more relevant in my work to convert Zentific's backend daemons from using threaded and synchronous design with Glib and tons of mutexes to an asynchronous one using EFL. Both of the libraries which I use, Azy and Esskyuehl (esql), run recursively.

Azy uses recursivity to implement state-based rpc method calls which can be dynamically "rewound" to simulate a synchronous environment. However, in the course of the rewind action, Azy initially was unable to track the correct state of objects used in both inner and outer loops, and users were incorrectly allowed to manipulate objects which had either been freed or should not have existed to begin with. This is fixed now with the addition of more aggressive type checking and blocking free()s from the inner loop, but it has reminded me of the recursive main loops previously added to ecore for webkit compatibility. More on this in another entry.

Esskyuehl had other issues. In order to function properly, it maintains a list of queries for each database connection, sending the queries in order after the previous one has completed. Normally, only a very small amount of recursing is done: a query completes, the connection object updates its query list, and if there are more queries it runs the connection function again from within the completion function.
With a connection pool, this becomes trickier. When a query returns, the connection object updates its lists, then checks the rest of the pool to see if any connections need to have their queries rebalanced to other objects for optimal query speed. Thus, the completed query will rebalance the pool, sending any rebalanced queries on other connection objects and then sending queries for itself.
The problem here arises when a connection does not update its query accounting properly. A stale accounting list here will lead to a permanently occupied pool connection, or, if things get really out of control, callbacks/events used with the wrong query data. I have fixed more than one bug which resulted from such an error.

The common thing that I have noticed here is that most of the issues I have run into are the result of lack of recursion "glue", or an api to manage such things. This begs the question, is it possible to write an api for recursion?