Adventures in Engrish
Object Oriented jQuery Plugins Mk 2
Aug 26th
In a recent post, I outlined a method to abstract away the complexity of creating an encapsulated jQuery plugin. However, as was pointed out in the comments, there was a missing piece that didn’t allow for arguments to be passed through. More seriously, there was an issue with the binding of the facade function, which meant that only the last defined public function in the class could be called.
As an aside, if you need to bind a variable at the time that it is defined, it isn’t enough to define an anonymous function with a reference to the closure, as that can change by the time the function is called. The bug in my code was in the piece that bound the facade function to the set member functions.
for ( var i in template) {
if (typeof (template[i]) == 'function') {
result[i] = function() {
this.each(function() {
this[i]();
});
};
}
}
By the time the inner function is called, “i” has already been re-bound to the final function name in the template. The solution is to bind the inner function name to the outer function name at the time of definition, which we can do by wrapping it in (yet another!) function.
// Iterates through the set calling the specified function.
function makeIteratorFunction(f, set) {
return function() {
for ( var i = 0; i < set.length; i++) {
// Call the method with the instance itself as "this", passing any arguments through.
set[i][f].apply(set[i], arguments);
}
};
}
and then calling that function:
if (template) {
for ( var i in template) {
if (typeof (template[i]) == 'function') {
result[i] = makeIteratorFunction(i, result);
}
}
}
At this point, although “i” will continue to change as the loop continues, the function call is bound through the closure on “f” in the auxiliary function, which remains fixed. Google Chrome’s developer tools certainly make following all that a lot simpler! I’ve updated the plugin with these improvements, which also enables us to pass arguments through, finally allowing:
$('#foo').myplugin().publicMethodWithArguments('hello');
You can get the updated plugin here. Feedback welcomed!
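The binding problem can be reproduced without jQuery at all. The following is a minimal sketch, not the plugin code itself: `makeInstance`, `buildBrokenFacade`, `buildFixedFacade` and `makeForwarder` are hypothetical names used purely for illustration.

```javascript
// A stand-in for a plugin instance with two public methods.
function makeInstance() {
  var log = [];
  return {
    log: log,
    first: function () { log.push("first"); },
    second: function () { log.push("second"); }
  };
}

// Broken: every facade function shares the same loop variable "name".
function buildBrokenFacade(instance) {
  var facade = {};
  for (var name in instance) {
    if (typeof instance[name] === "function") {
      facade[name] = function () {
        // "name" is looked up when the function is called, not when it is defined.
        return instance[name].apply(instance, arguments);
      };
    }
  }
  return facade;
}

// Fixed: the helper gives each facade function its own "f".
function makeForwarder(instance, f) {
  return function () {
    return instance[f].apply(instance, arguments);
  };
}

function buildFixedFacade(instance) {
  var facade = {};
  for (var name in instance) {
    if (typeof instance[name] === "function") {
      facade[name] = makeForwarder(instance, name);
    }
  }
  return facade;
}

var a = makeInstance();
buildBrokenFacade(a).first();
console.log(a.log); // ["second"]

var b = makeInstance();
buildFixedFacade(b).first();
console.log(b.log); // ["first"]
```

In the broken version, calling `first()` actually runs `second()`, because by then the loop has finished and `name` holds the last key it visited.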
Object Oriented jQuery Plugins
Aug 22nd
I’ve recently begun the process of porting some JavaScript library code from Prototype to jQuery, and on the whole it hasn’t been too problematic. I really like the element-centric nature of jQuery, whereas Prototype is more like an excellent set of useful static methods. There were only a couple of things I found myself really missing: enumerables, and elegant plugin encapsulation. The first problem was solved with Xavier Shay’s nice enumerables plugin. The second one was more of a challenge.
The lack of encapsulation for jQuery plugins seems to be a common frustration, and there are lots of queries on the web along the lines of “how do I add public methods to a jQuery plugin”. The best solution I found came from Hector Virgen, which allows you to do something like this:
var pluginInstance = $('#foo').data('myplugin');
pluginInstance.publicMethod();
That’s pretty neat, but I didn’t really like the idea of having to go through the data object each time. A couple of other people in the comments felt the same way, but there didn’t seem to be any solutions forthcoming, so I had a crack. What follows is largely based on Hector’s code, so you should go and read that first before you go through this.
We start off with a basic plugin shell, slightly modified from Hector’s, per my taste:
(function($) {
var Celsus = Celsus || {};
Celsus.MyPlugin = function(element, options) {
// Private members
var elem = $(element);
var settings = $.extend({}, options || {});
// Private methods
function _privateMethod() {
console.log("This is a private method!");
}
return {
publicMethod: function() {
console.log(elem);
_privateMethod();
return true;
}
};
};
$.fn.myplugin = function(options) {
return this.each(function() {
var element = $(this);
if (element.data('myplugin')) {
return;
}
var myplugin = new Celsus.MyPlugin(this, options);
// Store the new plugin definition in a data object.
element.data('myplugin', myplugin);
});
};
})(jQuery);
This is a pretty good start. However, as Hector points out, the main issue is that the plugin returns a jQuery object to enable chaining. In many complex instances, chaining isn’t necessarily something you’re going to want to do, so we make a small sacrifice and forgo that convenience. Instead, we are going to return a set of plugin instance objects:
$.fn.myplugin = function(options) {
var result = [];
this.each(function() {
var element = $(this);
if (!element.data('myplugin')) {
// Initialise
var myplugin = new Celsus.MyPlugin(this, options);
// Store the new plugin instance in a data object.
element.data('myplugin', myplugin);
}
result.push(element.data('myplugin'));
});
// Return the array of plugin instances rather than the jQuery object.
return result;
};
At this point, it’s not looking too great. We’ve broken chaining because we no longer return a jQuery object, but if you try and call
$('#foo').myplugin().publicMethod();
it still doesn’t work. This is because, although each plugin instance has the publicMethod() function, they are contained inside a bare array. At this point we could actually do:
($('#foo').myplugin()[0]).publicMethod();
($('#foo').myplugin()[1]).publicMethod();
Or something similar with each(), but this is very messy. We need a bit of syntactic sugar. To achieve this, the next step is to take that array, turn it into something we can work with, and add a facade, so that every public function we’ve just mixed in is presented as an option on the plugin instance set. We do that by adding the following to the plugin definition:
result = $(result);
var template = result[0];
if (template) {
for ( var i in template) {
if (typeof (template[i]) == 'function') {
result[i] = function() {
this.each(function() {
this[i]();
});
};
}
}
}
Firstly, we convert the array to a jQuery object. Then, we look at the first instance in the set and use that as a template. It should contain all the public functions we’ve defined in our definition class, and all the instances are of the same type, so we can safely use the first entry’s template for all of them. We then enumerate through all the public functions and create a proxy or facade function on the set, which simply calls the closure of that function for each element in the set. This ensures that when you call
$('#foo').myplugin().publicMethod();
it is functionally equivalent to:
$('#foo').myplugin().each(function(index, instance) {
instance.publicMethod();
});
So that’s pretty cool. We’ve avoided namespace pollution, we can mix in any number of public methods, and private members and methods behave as you’d expect them to. It isn’t possible to access public variables in this manner of course, but that is easily remedied by using public getters and setters. We can even add in a reference back to the jQuery object, so we can get chaining back in some form:
result.$ = this;
Which lets us do:
$('#foo').myplugin().$.addClass('bar');
This is useful when you’re doing plugin initialisation, but thereafter it’s a bit redundant as the actual plugin call simply returns an object which you then ignore. The only final point is that there’s quite a lot of boilerplate going on here just to get set up. In actual fact, the actions are pretty generic, so we can extract all that code and put it in its own plugin. A plugin to generate a plugin!
(function($) {
$.fn.encapsulatedPlugin = function(plugin, definition, objects, options) {
var result = [];
objects.each(function() {
var element = $(this);
if (!element.data(plugin)) {
// Initialise
var instance = new definition(this, options);
// Store the new plugin instance in a data object.
element.data(plugin, instance);
}
result.push(element.data(plugin));
});
// We now have a set of plugin instances.
result = $(result);
// Take the public functions from the definition and make them available across the set.
var template = result[0];
if (template) {
for ( var i in template) {
if (typeof (template[i]) == 'function') {
result[i] = function() {
this.each(function() {
this[i]();
});
};
}
}
}
// Finally mix-in a convenient reference back to the objects, to allow for chaining.
result.$ = objects;
return result;
};
})(jQuery);
With this little plugin, our plugin initialisation code is a lot lighter:
$.fn.myplugin = function(options) {
return $.fn.encapsulatedPlugin('myplugin', Celsus.MyPlugin, this, options);
};
Not too bad! The actual code that specialises a plugin is tucked away in a neatly encapsulated object, we have access to all the public methods defined on it, and multiple instances can happily live side by side and be invoked separately without trampling on each other. You can grab a copy of the plugin generating plugin from here. This is still new code and there might be the odd glitch, so if you spot any, or have other ideas, be sure to leave a comment!
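The facade idea does not depend on jQuery, so it can be tried out on a plain array. This is only a sketch under a few assumptions: `Counter` and `makeSet` are hypothetical names, the forwarding goes through a helper so that each method name is bound correctly, and (unlike the plugin above) the facade collects the return values from each instance.

```javascript
// Hypothetical plugin definition using the same private/public split as above.
function Counter(label) {
  var count = 0; // private member, only reachable through the public methods
  return {
    increment: function (by) { count += by; },
    describe: function () { return label + "=" + count; }
  };
}

// Wrap an array of instances so that each public method runs on every instance.
function makeSet(instances) {
  var set = instances.slice();
  var template = set[0];

  // Helper that fixes the method name for each facade function.
  function forward(name) {
    return function () {
      var args = arguments;
      return instances.map(function (instance) {
        return instance[name].apply(instance, args);
      });
    };
  }

  for (var name in template) {
    if (typeof template[name] === "function") {
      set[name] = forward(name);
    }
  }
  return set;
}

var counters = makeSet([Counter("a"), Counter("b")]);
counters.increment(2);
console.log(counters.describe()); // ["a=2", "b=2"]

// Individual instances remain separately addressable.
counters[0].increment(1);
console.log(counters.describe()); // ["a=3", "b=2"]
```

One thing to watch with this approach is that the facade functions are attached directly to the set, so a public method named like an existing array (or jQuery) method would overwrite it.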
Subtle Behaviour of The Static Keyword in PHP 5.3
Aug 11th
Version 5.3 of PHP added a new, very useful and long-overdue feature allowing for late-static binding using the ‘static’ keyword instead of the ‘self’ keyword. The PHP manual has a good explanation of the difference. However, there is a subtle behaviour that I certainly hadn’t anticipated and might catch you out.
A typical use-case of the new syntax is to lazy-load static variables in inherited child classes, which offers you the choice of avoiding the overhead of object-instantiation to provide class differentiation. Unfortunately, it is not quite as straightforward as you might think. Consider the following piece of code:
class BaseClass {
static $_value = null;
public static function getValue() {
if (null === static::$_value) {
static::$_value = get_called_class();
}
return static::$_value;
}
}
class ChildAClass extends BaseClass {}
echo BaseClass::getValue() . "\n";
echo ChildAClass::getValue() . "\n";
What’s going on here then? When I look at that piece of code, I expect the output to be:
BaseClass
ChildAClass
Instead, it appears the call to ChildAClass::getValue() fails the conditional test and returns the static value of BaseClass, giving the following output:
BaseClass
BaseClass
Not very helpful. How can we work around this? The answer is to ensure you declare the static variable in each of your child classes as well as the parent. It’s straightforward enough, but a pain to remember every time. The following example demonstrates:
class ChildBClass extends BaseClass {
static $_value = null;
}
echo BaseClass::getValue() . "\n";
echo ChildBClass::getValue() . "\n";
BaseClass
ChildBClass
So it appears (at least for PHP 5.3.3 (OS X)), that static only really means static if you declare it yourself each time. This might be expected behaviour, but I consider it a bug, and it can certainly lead to hard to debug situations. It doesn’t seem to be documented clearly in any standard location that I could find.
Short story – if you are looking to lazily-populate static variables using the new syntax, be sure to declare the variables in all of the child classes you are going to use!
Do You Still Want Gengo?
Jun 30th
Gengo was (and remains) a project that I am very fond of. Originally developed for my own personal use to blog in Japanese as well as English, it turned into a fairly full-featured plugin that was used by quite a few people. Unfortunately, time pressures meant that I had to drop support for it a few years ago. Even so, every few weeks I get an email or comment (usually nothing to do with the post it was written against!) asking me to consider restarting development.
It might just be the alcohol talking, or end-of-financial-year madness, but with WordPress 3.0 recently out and a couple of requests already this week, I’m seriously considering starting development again. I do have a couple of projects I want to do though, so I only want to dedicate time to Gengo if it’s going to be used. My Japanese has sadly diminished to the point where I’m practically mono-lingual again, so I’m not going to be blogging in multiple languages any time soon, and it would purely be for others to use. So, if enough people email me or comment in the next week or so, I’ll give it a crack. Second time around, I’ll hopefully be able to design it more cleanly and more quickly, and WordPress’ plugin architecture hopefully hasn’t changed too much while I’ve been away.
So if you want to see a new version of Gengo, please let me know. On the other hand, I hear very good things about WPML, so if Gengo’s time has passed, feel free to say that too! I won’t have hurt feelings, promise.
Using Multiple Start and End Keys for CouchDB Views
Mar 24th
CouchDB view collation is great and only has one drawback that has caused me any real pain – the inability to handle queries that need to be parameterised by more than one dimension. These are surprisingly common, including problems such as “find me posts in Category A in March”.
This can be handled with a function that emits keys like:
["Category A", "2010", "03", "Post 1"]
["Category B", "2010", "03", "Post 2"]
["Category A", "2010", "03", "Post 3"]
and then querying with:
startkey=["Category A","2010","03"]&endkey=["Category A","2010", "03",{}]
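A map function producing keys of that shape might look something like the sketch below. The document fields (`type`, `category`, `date`, `title`) are assumptions made for illustration, and `emit` is stubbed out here so the snippet runs outside CouchDB.

```javascript
// Stand-in for the emit() that CouchDB provides to map functions.
var emittedRows = [];
function emit(key, value) {
  emittedRows.push({ key: key, value: value });
}

// Hypothetical document shape:
// { type: "post", category: "Category A", date: "2010-03-24", title: "Post 1" }
function map(doc) {
  if (doc.type !== "post") {
    return;
  }
  var parts = doc.date.split("-"); // ["2010", "03", "24"]
  emit([doc.category, parts[0], parts[1], doc.title], null);
}

map({ type: "post", category: "Category A", date: "2010-03-24", title: "Post 1" });
console.log(JSON.stringify(emittedRows[0].key)); // ["Category A","2010","03","Post 1"]
```

Because the category is the first key component, a start and end key range can only pin down the month once the category is fixed.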
However, finding its reciprocal, “All March posts regardless of category”, is problematic. You can’t do:
startkey=[*,"2010", "03"]&endkey=[*,"2010", "03",{}]
where * (or _, or nil, or pass) would represent “all”.
To handle this, there are currently only 2 options: design a new view with the key components ordered differently, such that they emit:
["2010", "03", "Post 1"]
["2010", "03", "Post 2"]
["2010", "03", "Post 3"]
or, make multiple connections to the database like
startkey=["Category A","2010","03"]&endkey=["Category A","2010", "03",{}]
startkey=["Category B","2010","03"]&endkey=["Category B","2010", "03",{}]
startkey=["Category C","2010","03"]&endkey=["Category C","2010", "03",{}]
where you have a query for each category.
Neither approach is particularly satisfactory. On a recent particular problem set, a single view would be many hundreds of gigabytes of data, and while space is cheap, it’s not that cheap. Additional views were not an option. That same data set contained around 2000 different categories (or their equivalent) and 2000 connections for a particular query seemed excessive.
Since 0.9, Couch has had a way of passing multiple keys to a query in the post body of a view request. Unfortunately, this only supported precise keys, not start-end key ranges. There has been a ticket in the issue tracker to add this additional support since October, but it’s classed as a minor priority and nothing had been done on it. So I decided to have a crack.
On the face of it, it seems like a fairly simple change, only affecting the HTTP View Erlang module. On the other hand, I’ve probably written about 100 lines of Erlang in my life and never looked at the CouchDB code before, so it’s entirely possible I’ve done something wrong. Regardless, the following is a simple solution that appears to work correctly.
The output_map_view and output_reduce_view functions already had the ability to handle start and end keys, but they were being artificially restricted to treat each supplied key as both start and end. I used Erlang’s pattern matching to make this a little richer:
case Key of
{[{<<"startkey">>,StartKey},{<<"endkey">>,EndKey}]} ->
nil;
_ ->
StartKey = Key,
EndKey = Key
end
and then passed those new variables in at the appropriate place. This seemed to work well. I presume that the Keys parameter is processed just like multiple connections, and then the results aggregated, because the results are exactly the same as a call with the same parameters in the query string.
One final change was that group_level=X is mysteriously disallowed for Multikey queries. I took a punt and removed this restriction and it all seemed to work fine. I can only guess that this restriction didn’t make sense when you had to pass precise keys.
I then query using the following as POST data:
{
"keys": [
{
"startkey": ["Category A","2010","03"],
"endkey": ["Category A","2010","03",{}]
},
{
"startkey": ["Category B","2010","03"],
"endkey": ["Category B","2010","03",{}]
}
]
}
With this solution, I’m able to query 2000 services simultaneously, group them at any level I like, and get back the results at the lightning speed I’ve become accustomed to.
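With a couple of thousand categories, nobody is going to write that POST body by hand. A small helper along these lines could generate it; `buildRangeKeys` is a hypothetical name, and the range-object form of `keys` only works with the patch described here, not with a stock CouchDB of this vintage.

```javascript
// Build the POST body for one start/end range per category.
function buildRangeKeys(categories, year, month) {
  return {
    keys: categories.map(function (category) {
      return {
        startkey: [category, year, month],
        // {} collates after strings, so it closes the range for that month.
        endkey: [category, year, month, {}]
      };
    })
  };
}

var body = buildRangeKeys(["Category A", "Category B"], "2010", "03");
console.log(JSON.stringify(body.keys[1].endkey)); // ["Category B","2010","03",{}]
```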
One small caveat. I can get back keys across non-contiguous blocks like this:
startkey=["Category A","2010","03"]&endkey=["Category A","2010", "03",{}]
startkey=["Category A","2010","06"]&endkey=["Category A","2010", "06",{}]
startkey=["Category B","2010","03"]&endkey=["Category B","2010", "03",{}]
startkey=["Category B","2010","06"]&endkey=["Category B","2010", "06",{}]
to get all posts in Category A and B in March and June. However, if I have a reduce function and group at level 1, I still end up with 4 rows, 2 for Category A and 2 for Category B. I think this is because the queries are run independently, without reference to each other. To do a full aggregation across time periods (for example to get the total number of posts by category in March and June), I’d still need to do a client aggregation on the resulting data-set. This may or may not be a big problem for you; it’s certainly something I can live with.
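That client-side aggregation is not much code. The sketch below assumes the reduce function is a simple count or sum, so that partial results can just be added together; `sumByKey` is a hypothetical helper and the row values are made up.

```javascript
// Merge rows that share the same group key by summing their values.
function sumByKey(rows) {
  var totals = {};
  var order = [];
  rows.forEach(function (row) {
    var id = JSON.stringify(row.key);
    if (!(id in totals)) {
      totals[id] = { key: row.key, value: 0 };
      order.push(id); // remember first-seen order
    }
    totals[id].value += row.value;
  });
  return order.map(function (id) { return totals[id]; });
}

// Four rows, as returned for two categories over two separate ranges.
var groupedRows = [
  { key: ["Category A"], value: 3 },
  { key: ["Category A"], value: 2 },
  { key: ["Category B"], value: 1 },
  { key: ["Category B"], value: 4 }
];

console.log(JSON.stringify(sumByKey(groupedRows)));
// [{"key":["Category A"],"value":5},{"key":["Category B"],"value":5}]
```

A reduce that is not a plain sum (an average, say) would need its own merge rule.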
The CouchDB issue lives here, and the patch to 0.10.1 lives here.
Handling JSON Objects in CouchDB Native Erlang Views
Mar 18th
I’ve been working with CouchDB a fair bit in recent weeks and am really enjoying it so far. Once I got my head around how to structure views and take advantage of view collation, I found it to be far more expressive than I first thought.
I still have a couple of gripes, the largest one of which is that you can’t use a wildcard parameter at the beginning of your view keys, so if you need to get “items by user by category” and “items by category by user”, you need two views. I’m sure there are good architectural reasons for this, but for me it’s the one place where collation lets me down. For at least one of the solutions I’m working on, multiple views are a major problem, as even one takes up 120GB (and counting).
But, to the main point. Native Erlang views are now possible and, if you can create them, potentially significantly faster than JavaScript ones. There are a couple of gotchas though, not least for me the handling of JSON objects.
We start with a document like this:
{
"_id": "36kem",
"_rev": "1-c895d5a55945a9898880bf870a3b3025",
"type": "usage",
"timestamp": [
"2010",
"02",
"28",
"23",
"10"
],
"data": [
{
"t": "E000005861",
"i": "232920",
"o": "2365730"
},
{
"t": "E000006504",
"i": "15784",
"o": "17786"
},
{
"t": "E000006505",
"i": "16661",
"o": "17786"
}
]
}
In reality there are thousands of entries in the data array, but this will do. Our aim is to emit one key-value pair for each item in the “data” field of each document of type “usage”. In JavaScript this is pretty trivial. Erlang, however, proves more of a challenge.
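For comparison, a JavaScript map function for that document might look like the sketch below. It uses the same key and value layout that the Erlang version further down ends up with, and `emit` is stubbed out so the snippet can run outside CouchDB.

```javascript
// Stand-in for the emit() that CouchDB provides to map functions.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

function map(doc) {
  if (doc.type !== "usage") {
    return;
  }
  // Year, month, day, hour and minute are the first five timestamp parts.
  var when = doc.timestamp.slice(0, 5);
  for (var n = 0; n < doc.data.length; n++) {
    var row = doc.data[n];
    emit([row.t].concat(when), [row.i, row.o]);
  }
}

map({
  type: "usage",
  timestamp: ["2010", "02", "28", "23", "10"],
  data: [
    { t: "E000005861", i: "232920", o: "2365730" },
    { t: "E000006504", i: "15784", o: "17786" }
  ]
});

console.log(JSON.stringify(emitted[0]));
// {"key":["E000005861","2010","02","28","23","10"],"value":["232920","2365730"]}
```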
Based on pointers from the CouchDB Wiki, I started with:
fun ({Doc}) ->
case proplists:get_value(<<"type">>, Doc) of
<<"usage">> ->
Emit(proplists:get_value(<<"_id">>, Doc), null);
_ ->
ok
end
end.
and was very happy to see that work. Two things to note here: don’t forget the {} around the Doc in the function definition or you’ll get strange errors; and to get the value of a field in a document, you can use the standard proplists:get_value(<<"fieldname">>, Doc) construct. So far so good.
The main issue for me came with manipulating the “data” field. I didn’t actually want to emit null, but instead the “i” and “o” parts of the data field. First off, I tried:
lists:foreach(fun(Item) -> Emit(null, [proplists:get_value(<<"i">>, Item), proplists:get_value(<<"o">>, Item)]) end, proplists:get_value(<<"data">>, Doc))
But met with some (very long) errors. (Gripe number two – they could really do with humanising the Erlang crash dump.)
It took me quite a few attempts, including stripping it right back to confirm that I had an array to iterate and that each object does in fact contain an “i” and an “o” field, before I found the problem, which is this:
Even though Documents are defined within {} braces, and JSON objects within that definition are also defined within {} braces, you cannot access them the same way in an Erlang view.
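proplists:get_value(<<"field">>, Doc) is fine for the document as a whole, because the {Doc} pattern in the function head has already unwrapped it from its enclosing tuple. A nested JSON object, on the other hand, still arrives wrapped as {[...]}, so you can’t hand it straight to proplists:get_value. Bad assumption on my part. Luckily, the answer I got to another Stack Overflow question recently pointed the way.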
To access the data we need to pattern match the components using the Erlang representation of a JSON object, like so:
{[{<<"t">>, TrackingID},{<<"i">>, In},{<<"o">>, Out}]} = Row
Ugly, hey?
Useful though, as it extracts the TrackingID, In and Out values all in one go, kind of like a list() statement on steroids.
With that in place, and a little more tidying up of the code, we arrive at:
fun({Doc}) ->
case proplists:get_value(<<"type">>, Doc) of
<<"usage">> ->
[Year, Month, Day, Hour, Minute | _] = proplists:get_value(<<"timestamp">>, Doc),
lists:foreach(fun(Row) ->
{[{<<"t">>, TrackingID},{<<"i">>, In},{<<"o">>, Out}]} = Row,
Emit([TrackingID, Year, Month, Day, Hour, Minute],[In, Out])
end, proplists:get_value(<<"data">>, Doc));
_ ->
ok
end
end.
That little beauty lets me query the usage of a service at any granularity over data from the last 7 years in a faster time than the browser can render it. Across an HTTP connection to a data source 1000km away. On development hardware.
CSS Technique: Morning Sunset
Mar 10th
After coming across an excellent article on generating full page images, I had the idea that it would be cool to blend between multiple images as a user scrolled down the page. The somewhat artistic conceit that a user could scroll a similar scene from morning to sunset ironically came to me at sunset on Saturday and was finished by morning!
The effect is pretty simple to set up, though does require a specific bit of markup. We start with 2 images and set a z-index on them, such that the first one is in front of the second. With the full page image CSS rules, the first image fills the screen and the second is completely obscured.
<img id="morning" class="bg" style="z-index: 2;" src="morning.jpg" />
<img id="sunset" class="bg" style="z-index: 1;" src="sunset.jpg" />
For the CSS, in addition to the standard full page image rules, I give the content a higher z-index, plus a little bit of transparency. This allows the content to sit on top of both of these images, and makes the effect more pronounced.
div#content {
/* This is the only important rule */
/* We need our content to show up on top of the background */
position: relative;
z-index: 10;
/* Added some opacity to demonstrate the effect better */
opacity: 0.8;
filter: alpha(opacity=80);
}
So far, so what? We’ve arranged for a user to download an image he can’t see – not so good. The magic comes with a blending function tied to the scrollbar. The idea is that the top layer becomes more and more transparent as the user scrolls through the content.
In working with the Zend Framework, I’ve been getting to grips with Dojo and its supporting classes, so I was happy to see that Dijit had tools for getting the dimensions of the viewport. With this information, I was able to calculate the scroll ratio, which gave me a number ranging from 0 at the top of the page to 1 at the bottom.
Dojo also has a great style() function, which allows you to set opacity and have it “just work” across all browsers, regardless of their non-standard filter() shenanigans. At the outset, the top layer image has an opacity of 1. Subtracting the scroll ratio from this allows it to be fully opaque at the top of the page, and fully transparent at the bottom.
dojo.subscribe("/window/scrolled", function(e){
// Calculate the scroll percentage, and adjust the opacity of the top layer, appropriately.
var vp = dijit.getViewport();
dojo.style("morning", {
"opacity": 1 - (vp.t / (document.documentElement.scrollHeight - vp.h))
});
});
There is some rate-limiting code going on to prevent the event firing continuously and slowing down the page. That came from a helpful Dojo Cookie by Peter Higgins over at Dojo Campus. I haven’t played fully with the rate limiting yet, but 50ms seemed to give a reasonable balance of subtle movement without overloading the page.
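The opacity calculation itself doesn’t depend on Dojo, so it can be pulled out into a plain function and checked in isolation. This is a sketch only: `topLayerOpacity` is a hypothetical name, and the clamping and the guard for pages too short to scroll are additions that the handler above doesn’t have.

```javascript
// scrollTop:      how far the page has been scrolled (vp.t above)
// documentHeight: full height of the document (scrollHeight above)
// viewportHeight: height of the visible area (vp.h above)
function topLayerOpacity(scrollTop, documentHeight, viewportHeight) {
  var scrollable = documentHeight - viewportHeight;
  if (scrollable <= 0) {
    return 1; // nothing to scroll, so keep the top image fully opaque
  }
  var ratio = scrollTop / scrollable;
  ratio = Math.min(1, Math.max(0, ratio)); // clamp against overscroll
  return 1 - ratio;
}

console.log(topLayerOpacity(0, 3000, 1000));    // 1
console.log(topLayerOpacity(500, 3000, 1000));  // 0.75
console.log(topLayerOpacity(2000, 3000, 1000)); // 0
```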
Degradation is variable – without JavaScript, the user just sees the top image and there’s some overhead of the second image that is never seen. Without CSS, there’s more of a problem, as the images are inlined in the page, which pushes all the content down. If there is a full page image solution using background images only, I’d love to hear about it.
There’s a working, self-describing example of the effect so you can try it out for yourself. It works best in Chrome, with its superior JavaScript handling, but works in all modern browsers to a reasonable degree.
The next step is to generalise the code, so that I can pass it an array of image URLs and have it automatically build the markup necessary to generate the effect. With more work on the blending function, I’ll be able to have it blend between multiple layers and potentially follow different rates – I might investigate Dojo curves for that.
I’m not aware of this technique being described anywhere else, but if there are other approaches to doing this, I’d be interested to see them.
Modelling Recurring Events in PHP
Feb 7th
In a previous article, I described how set operations could be modelled in PHP. With that foundation, we can begin to generate complex date criteria suitable for modelling recurring events.
There are a number of different kinds of date condition, which Martin Fowler terms “Temporal Expressions”. Typical temporal expressions include “Last Day in the Month”, “Nth Day of the Week” and “Repeats Yearly”. We can model each of these as a separate class and then combine them as necessary to provide a flexible architecture.
In order to take advantage of the set operations we defined previously, all of the classes must implement the same trivial interface:
interface Celsus_Temporal_Expression_Interface {
/**
* Determines whether the date specified is included in this temporal expression.
*
* @param string $date
*/
public function includes($date);
}
For our first example, let’s model conditions like “11th day of the month”. By using negative numbers, we can also use the same class to model “11th day from the end of the month”:
/**
* Handles scheduling rules like "11th day of the month".
*/
class Celsus_Temporal_Expression_DayOfMonth implements Celsus_Temporal_Expression_Interface {
/**
* The day of the month we are interested in. If the day is less than zero,
* it is interpreted as being from the end of the month.
*
* @var int
*/
private $_day;
public function __construct($day) {
$this->_day = $day;
}
public function includes($date) {
return $this->_day > 0 ? $this->_fromStartOfMonth($date) : $this->_fromEndOfMonth($date);
}
private function _fromStartOfMonth($date) {
return $this->_day == date('j', strtotime($date));
}
private function _fromEndOfMonth($date) {
$timestamp = strtotime($date);
return ((date('t', $timestamp) - date('j', $timestamp)) + 1) == abs($this->_day);
}
}
Usage is straightforward:
$tenth_of_the_month = new Celsus_Temporal_Expression_DayOfMonth(10);
var_dump($tenth_of_the_month->includes('2010-01-10')); // True
var_dump($tenth_of_the_month->includes('2010-01-07')); // False
// Negative days count back from the end of the month, with -1 being the last day.
$three_days_before_end_of_the_month = new Celsus_Temporal_Expression_DayOfMonth(-3);
var_dump($three_days_before_end_of_the_month->includes('2010-01-10')); // False
var_dump($three_days_before_end_of_the_month->includes('2010-01-29')); // True (third day from the end of January)
A second type of query is of the form “every 3 months”. This is slightly more involved as it needs to also use a specified date as the base from which to start counting:
/**
* Handles scheduling rules like "every 3 months"
*/
class Celsus_Temporal_Expression_MonthsFromStart implements Celsus_Temporal_Expression_Interface {
/**
* The count specifying the interval of months we are interested in.
*
* @var int
*/
private $_count;
/**
* The start date of the sequence. The date is stored in ISO
* format, i.e. 2000-12-31.
*
* @var string
*/
private $_start;
public function __construct($start, $count) {
$this->_start = $start;
$this->_count = $count;
}
public function includes($date) {
// Take the specified month, minus the start month, mod the interval. If it is
// zero then this date should be included.
return (0 == ((date('n', strtotime($date)) - date('n', strtotime($this->_start))) % $this->_count));
}
}
Again, usage is similar:
$every_two_months = new Celsus_Temporal_Expression_MonthsFromStart('2010-01-01', 2);
var_dump($every_two_months->includes('2010-03-01')); // True
var_dump($every_two_months->includes('2010-04-01')); // False
Now, making use of our set operations previously defined, we can combine them for more complex rules such as “the last day of every quarter”:
$last_day_of_month = new Celsus_Temporal_Expression_DayOfMonth(-1);
// Start counting from March so that the quarter-end months (3, 6, 9, 12) match.
$every_three_months = new Celsus_Temporal_Expression_MonthsFromStart('2010-03-01', 3);
$last_day_of_quarter = new Celsus_Set_Operation_Intersect('Celsus_Temporal_Expression_Interface');
$last_day_of_quarter->addElements(array($every_three_months, $last_day_of_month));
var_dump($last_day_of_quarter->includes('2010-01-31')); // False;
var_dump($last_day_of_quarter->includes('2010-06-30')); // True;
In general, intersections are going to be the most useful set operation for this kind of use, but are not the only possibility. If we modify this to use a Union, we can test to see if dates are either the last day of the month, or within a month every 3 months from the start:
$last_day_of_month = new Celsus_Temporal_Expression_DayOfMonth(-1);
// Start counting from March so that March, June, September and December match.
$every_three_months = new Celsus_Temporal_Expression_MonthsFromStart('2010-03-01', 3);
$last_of_month_or_every_three_months = new Celsus_Set_Operation_Union('Celsus_Temporal_Expression_Interface');
$last_of_month_or_every_three_months->addElements(array($every_three_months, $last_day_of_month));
var_dump($last_of_month_or_every_three_months->includes('2010-02-28')); // True;
var_dump($last_of_month_or_every_three_months->includes('2010-06-05')); // True;
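The same combination can be sketched in JavaScript with plain closures, which makes it easy to experiment with in a browser console. None of this is part of the PHP library: the function names are hypothetical, the quarter is anchored on March so that March, June, September and December match, and (unlike the PHP class above) the month interval is counted across year boundaries.

```javascript
// Parse "YYYY-MM-DD" by hand to avoid time zone surprises from Date parsing.
function parseIso(date) {
  var p = date.split("-");
  return { year: Number(p[0]), month: Number(p[1]), day: Number(p[2]) };
}

// Day 0 of the following month is the last day of this one (month is 1-based).
function daysInMonth(year, month) {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

// Positive days count from the start of the month, negative from the end (-1 = last day).
function dayOfMonth(day) {
  return {
    includes: function (date) {
      var d = parseIso(date);
      if (day > 0) {
        return d.day === day;
      }
      return daysInMonth(d.year, d.month) - d.day + 1 === Math.abs(day);
    }
  };
}

// Matches every "count" months, counted from the month of the start date.
function monthsFromStart(start, count) {
  var s = parseIso(start);
  return {
    includes: function (date) {
      var d = parseIso(date);
      var elapsed = (d.year - s.year) * 12 + (d.month - s.month);
      return elapsed % count === 0;
    }
  };
}

function intersection(expressions) {
  return {
    includes: function (date) {
      return expressions.every(function (e) { return e.includes(date); });
    }
  };
}

function union(expressions) {
  return {
    includes: function (date) {
      return expressions.some(function (e) { return e.includes(date); });
    }
  };
}

var lastDayOfQuarter = intersection([dayOfMonth(-1), monthsFromStart("2010-03-01", 3)]);
console.log(lastDayOfQuarter.includes("2010-06-30")); // true
console.log(lastDayOfQuarter.includes("2010-01-31")); // false

var lastOfMonthOrQuarterMonth = union([dayOfMonth(-1), monthsFromStart("2010-03-01", 3)]);
console.log(lastOfMonthOrQuarterMonth.includes("2010-06-05")); // true
```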
A handful of temporal expressions can be downloaded here. Further ones are left as an exercise.
With an appropriate database schema for persistence of rules, such as Apple’s iCal reference, this provides a comprehensive method for defining complex date ranges and testing them quickly.
Set Operations in PHP
Feb 4th
The standard set operations of Intersection, Union and Difference are well-understood and have a wide number of applications. Martin Fowler gives a neat example here, for modelling recurring events where there are multiple conditions.
“The last day of every second month” for example, can be represented by an intersection of two conditions, “last day of the month” and “every 2 months”. More generally, they can be applied to any situation where you want to do combinations of criteria using AND and OR.
To model this in PHP, we can define a number of Set Operation classes to which we can attach 2 or more elements for testing. The 3 classes, Union, Intersect and Difference will all be constructed in the same way, so constructor code can be placed in a common ancestor:
abstract class Celsus_Set_Operation_Abstract {
protected $_setInterface = null;
// The elements that have been added to this set operation.
protected $_elements = array();
public function __construct($setInterface) {
if (!is_string($setInterface)) {
throw new Celsus_Exception("Interface must be a string.");
}
if (!interface_exists($setInterface, true)) {
throw new Celsus_Exception("$setInterface is not a valid interface.");
}
$this->_setInterface = $setInterface;
}
/**
* Adds an element to the set.
*
* @param StdClass $element
*/
public function addElement($element) {
if ($element instanceof $this->_setInterface || $element instanceof Celsus_Set_Operation_Abstract) {
$this->_elements[] = $element;
} else {
throw new Celsus_Exception("Element must implement $this->_setInterface or Set");
}
}
}
In order to test the elements of the set, we need to ensure that they are all testable in the same way. We could force a particular method name, such as ‘contains’ or ‘includes’, but it turns out we don’t have to. As long as they all have the same testing function, we don’t actually care what it is called, and library code should be as transparent as possible. Therefore we construct the set operator with an interface, which all set elements must implement.
The addElement function provides a method by which we can add elements to the set. If the element doesn’t meet the required interface, it is rejected.
Next, let’s define a set of classes that can be compared for inclusion or exclusion. The only requirement is that they must all implement the same interface containing a single method, and that that method returns a boolean.
interface Celsus_Test_Set_Interface {
    public function acceptable();
}
class Celsus_Test_Set_Acceptable implements Celsus_Test_Set_Interface {
    public function acceptable() {
        return true;
    }
}
class Celsus_Test_Set_Unacceptable implements Celsus_Test_Set_Interface {
    public function acceptable() {
        return false;
    }
}
This is about as simple as you can get. The ‘Acceptable’ class represents an object for which the criterion is always passed. The ‘Unacceptable’ class represents one for which the criterion is always failed. In practice, these would be more complex. For dates, for example, you could test whether the supplied date met the “last day of the month” criterion. These simple classes will suffice for now though, and allow us to discuss the set mechanisms.
An intersection is true if every element of the set meets the criteria, and can be represented as follows:
class Celsus_Set_Operation_Intersection extends Celsus_Set_Operation_Abstract {
    // Magic methods such as __call must be declared public.
    public function __call($method, $arguments) {
        // First check that the method we are calling is defined in the specified interface.
        if (!method_exists($this->_setInterface, $method)) {
            throw new Celsus_Exception("$method is not defined in $this->_setInterface");
        }
        if (!$this->_elements) {
            // We don't have any elements.
            return false;
        }
        // Iterate through the elements, calling the supplied method name on each.
        // If any return false, this intersection is false.
        foreach ($this->_elements as $element) {
            if (!call_user_func_array(array($element, $method), $arguments)) {
                return false;
            }
        }
        return true;
    }
}
Because we don’t know what the name of the testing method is going to be, we use the magic __call method to trap any method calls. We test that method name against the interface we constructed the Intersection with, and if it exists, we can begin. If no elements have been added to the set, the intersection returns false. Finally, we iterate through each element in the set and call its testing method. As this is an intersection, if any element fails its test, the intersection fails.
We can demonstrate this using our simple test cases above:
$intersection = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$intersection->addElement(new Celsus_Test_Set_Acceptable());
$intersection->addElement(new Celsus_Test_Set_Acceptable());
var_dump($intersection->acceptable()); // True
$intersection2 = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$intersection2->addElement(new Celsus_Test_Set_Acceptable());
$intersection2->addElement(new Celsus_Test_Set_Unacceptable());
var_dump($intersection2->acceptable()); // False.
First, we construct the intersection, then we add some elements, and finally we test. Note that the method call is to ‘acceptable()’, which is the method specified in the interface used to construct the intersection.
Similar classes can be constructed for Union and Difference which work in much the same way. Set difference is a little more complex, however, as we have to define both a list of criteria to exclude, and a list of criteria to include.
But what if we want to combine these set operations, for more complex tests? As a further refinement we can allow the set operations to themselves include other set operations, which allows for complex constructions like the following:
$intersection = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$elementA = new Celsus_Test_Set_Acceptable();
$elementB = new Celsus_Test_Set_Acceptable();
$elementC = new Celsus_Test_Set_Unacceptable();
$intersection->addElement($elementA);
$intersection->addElement($elementB);
$union = new Celsus_Set_Operation_Union('Celsus_Test_Set_Interface');
$union->addElement($intersection);
$union->addElement($elementC);
var_dump($union->acceptable()); // True.
Which translates to “(elementA AND elementB) OR elementC”. The flexible structure allows any combination of these, at any depth.
A full set of code and unit tests that implement all of the above is available here. If you have any questions on their use, let me know in the comments.
From here, the next step is to define a more useful set of criteria. Using dates as an example, we can arrange for readable code that quickly determines whether a date is in a given set of recurrence conditions:
$intersection = new Celsus_Set_Operation_Intersection('Celsus_Temporal_Expression_Interface');
$intersection->addElements(array($every_month, $last_day_of_the_month));
var_dump($intersection->includes('2010-01-31')); // True
var_dump($intersection->includes('2010-06-02')); // False
I’ll go through the steps on how to achieve that in another post.
Automatically Mounting WebDAV in Ubuntu with a FuseDAV Init Script
Jan 27th
I use BingoDisk as a backup and remote storage solution and wanted to use it as the destination for my daily backup of this blog and my development environment. BingoDisk is available via WebDAV which is nice and convenient, and there are 2 main options for Ubuntu, davfs2 and fusedav.
On the face of it, davfs2 is more convenient, as it allows you to automount quite easily using fstab. However, despite a number of attempts, I couldn’t convince it to allow me to write files remotely, only folders. It seems a number of people have had the same issue.
Fusedav, on the other hand, was much more reliable, writing files and folders without errors. It also seemed noticeably faster. Unfortunately, there didn’t seem to be any convenient way to mount the share automatically, and this would be important for my automated backup routine. To solve this, I put together a quick solution using a simple mount description file and an init script that gives me better control over the process. The idea and parts of the code are built on a solution I found by müzso. This goes one step further and enables it to run automatically at boot time.
Below are the steps I took, and the script I’m using. The steps are for Ubuntu, but should be easily modifiable to any flavour of Linux.
First, install fusedav.
sudo aptitude -y install fusedav
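Next, we do a quick test to check that fusedav will indeed connect to our WebDAV share. The mount point (/mnt/bingo here) must already exist, so create it first with sudo mkdir -p /mnt/bingo if necessary.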
sudo /usr/bin/fusedav -u username -p password http://mydomain.bingodisk.com /mnt/bingo
We’re running fusedav in the foreground because it will be easier to kill that way once we’ve finished testing, so open a new terminal and check that you can connect.
cd /mnt/bingo
ls -la
If all is well, the top level of your remote share should now be visible. I don’t like that you have to specify the password on the command line, but there doesn’t seem to be too much you can do about that at the moment.
Close the second terminal, and kill the fusedav process in the first by pressing Ctrl+c. Now, we’re going to create a directory and a simple mount description file that includes the URL, username and password. I’m going to call the file bingo, and later on this means that the share will be mounted at /mnt/bingo. The directory layout I went for is similar to how Apache and Nginx manage virtual hosts.
sudo mkdir -p /etc/fusedav/mounts-available
Now edit /etc/fusedav/mounts-available/bingo as root using your favourite text editor and put the following in it. The 3 segments are the URL to the WebDAV location, username and password, separated by a space.
http://mydomain.bingodisk.com username password
Because this file contains your password in plaintext, you should secure it so that only the owner (root) can read it.
sudo chmod 600 /etc/fusedav/mounts-available/bingo
Next, the init script, which you can download from here. You need to make it executable.
cd /etc/init.d/
sudo wget http://jamietalbot.com/projects/fusedav/files/bingo
sudo chmod +x bingo
You can take a look at the file in its entirety. It’s not especially complex but the highlights are:
NAME=bingo
You should alter bingo on this line to be the same as the name of the file you put in /etc/fusedav/mounts-available. This is the only change you have to make (and you only have to make it if you called your mount something other than bingo).
MOUNT_FILE=/etc/fusedav/mounts-available/$NAME
MOUNT_POINT=/mnt/$NAME
These lines define where the mount description files are kept and where the share will be mounted.
set -- $(cat ${MOUNT_FILE})
URI=$1
USERNAME=$2
PASSWORD=$3
DAEMON_ARGS="-u $USERNAME -p $PASSWORD $URI $MOUNT_POINT"
This section splits the mount description into usable variables and sets the parameters for fusedav.
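One caveat with splitting via set -- is that the unquoted expansion is also subject to filename globbing, so a password containing a character such as * could be altered. As a rough sketch of an alternative, the same three fields could be read with read -r instead. The function name parse_mount_file is hypothetical and is not part of the downloadable script.

```shell
# Sketch only: parse_mount_file is a hypothetical helper, not part of the
# downloadable init script. It expects a file containing one line of the
# form "<url> <username> <password>".
parse_mount_file() {
    mount_file=$1
    if [ ! -r "$mount_file" ]; then
        echo "Cannot read $mount_file" >&2
        return 1
    fi
    # Clear previous values so a short file cannot leave stale data behind.
    URI=
    USERNAME=
    PASSWORD=
    # read splits on whitespace without glob expansion. It returns non-zero
    # when the last line has no trailing newline, but still fills in the
    # variables, so the result is validated below instead.
    read -r URI USERNAME PASSWORD < "$mount_file" || :
    if [ -z "$URI" ] || [ -z "$USERNAME" ] || [ -z "$PASSWORD" ]; then
        echo "Expected 'url username password' in $mount_file" >&2
        return 1
    fi
}
```

Calling parse_mount_file "$MOUNT_FILE" would then leave URI, USERNAME and PASSWORD set in the same way as the original lines, ready to be used in DAEMON_ARGS.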
The script automatically creates a PID file which should help avoid any issues if you start the daemon more than once. It also starts fusedav as a background process, so it doesn’t interfere with your terminal.
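The PID handling is not reproduced in the highlights above, so the following is only a sketch of the general technique rather than the script's actual code. It assumes a PID file containing a single process ID, and the helper name is_running is made up for illustration.

```shell
# Sketch only: is_running is a hypothetical helper showing how a PID file
# can be used to detect an already running daemon.
is_running() {
    pid_file=$1
    # No PID file means the daemon was never started (or was stopped cleanly).
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # Reject empty or non-numeric contents.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only checks that the process exists and
    # that we are allowed to signal it (always the case when run as root).
    kill -0 "$pid" 2>/dev/null
}
```

A start action could call this first and do nothing if it succeeds, which is what avoids mounting the same share twice.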
Let’s check that the script works.
su -
/etc/init.d/bingo start
cd /mnt/bingo/
ls -la
Hopefully, at this point you should see the remote share, as before.
Finally, we can add the script to the default runlevel so that it starts when the server boots up.
sudo /usr/sbin/update-rc.d -f bingo defaults
And that should be that! If you need to stop the daemon use the following.
sudo /etc/init.d/bingo stop
You can easily add further shares if necessary with the following steps as root (assuming the new share name of bongo).
cp /etc/init.d/bingo /etc/init.d/bongo
chmod +x /etc/init.d/bongo
vim /etc/init.d/bongo
:%s/bingo/bongo/g
:wq
echo 'http://bongo.remote.drive username password' > /etc/fusedav/mounts-available/bongo
chmod 600 /etc/fusedav/mounts-available/bongo
/usr/sbin/update-rc.d -f bongo defaults
/etc/init.d/bongo start
At which point, /mnt/bongo should be ready and waiting.
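If you add shares often, the copy-and-edit steps could be wrapped in a small function. This is a sketch under a few assumptions: that the bingo init script contains a line reading exactly NAME=bingo, that share names contain no slashes or other characters special to sed, and that the function name add_share and the two overridable directory variables are my own inventions for illustration.

```shell
# Sketch only. The directories can be overridden so the function can be
# tried against a scratch directory; the defaults match the layout above.
INIT_DIR=${INIT_DIR:-/etc/init.d}
MOUNTS_DIR=${MOUNTS_DIR:-/etc/fusedav/mounts-available}

# Hypothetical helper: add_share NAME URL USERNAME PASSWORD
add_share() {
    if [ $# -ne 4 ]; then
        echo "usage: add_share NAME URL USERNAME PASSWORD" >&2
        return 1
    fi
    name=$1
    url=$2
    username=$3
    password=$4
    # Copy the existing init script, changing only the NAME= line.
    sed "s/^NAME=bingo\$/NAME=$name/" "$INIT_DIR/bingo" > "$INIT_DIR/$name" || return 1
    chmod +x "$INIT_DIR/$name" || return 1
    # Write the mount description so that only its owner can read it.
    (
        umask 077 &&
        printf '%s %s %s\n' "$url" "$username" "$password" > "$MOUNTS_DIR/$name" &&
        chmod 600 "$MOUNTS_DIR/$name"
    )
}
```

Run as root, add_share bongo http://bongo.remote.drive username password would be followed by the same update-rc.d and start commands as before. Bear in mind that the password passed as an argument may end up in your shell history.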