Handling JSON Objects in CouchDB Native Erlang Views
Mar 18th
I’ve been working with CouchDB a fair bit in recent weeks and am really enjoying it so far. Once I got my head around how to structure views and take advantage of view collation, I found it to be far more expressive than I first thought.
I still have a couple of gripes, the largest of which is that you can’t use a wildcard parameter at the beginning of your view keys, so if you need to get “items by user by category” and “items by category by user”, you need two views. I’m sure there are good architectural reasons for this, but for me it’s the one place where collation lets me down. For at least one of the solutions I’m working on, multiple views are a major problem, as even one takes up 120GB (and counting).
But to the main point: native Erlang views are now possible and, if you can write them, potentially significantly faster than Javascript ones. There are a couple of gotchas though, not least (for me) the handling of JSON objects.
We start with a document like this:
{
"_id": "36kem",
"_rev": "1-c895d5a55945a9898880bf870a3b3025",
"type": "usage",
"timestamp": [
"2010",
"02",
"28",
"23",
"10"
],
"data": [
{
"t": "E000005861",
"i": "232920",
"o": "2365730"
},
{
"t": "E000006504",
"i": "15784",
"o": "17786"
},
{
"t": "E000006505",
"i": "16661",
"o": "17786"
}
]
}
In reality there are thousands of entries in the data array, but this will do. Our aim is to emit one key-value pair for each item in the “data” field of each document of type “usage”. In Javascript this is pretty trivial. Erlang, however, proves more of a challenge.
Based on pointers from the CouchDB Wiki, I started with:
fun ({Doc}) ->
case proplists:get_value(<<"type">>, Doc) of
<<"usage">> ->
Emit(proplists:get_value(<<"_id">>, Doc), null);
_ ->
ok
end
end.
and was very happy to see that work. Two things to note here: don’t forget the {} around Doc in the function definition, or you’ll get strange errors; and to get the value of a field in a document, you can use the standard proplists:get_value(<<"fieldname">>, Doc) construct. So far, so good.
The main issue for me came with manipulating the “data” field. I didn’t actually want to emit null, but instead the “i” and “o” parts of the data field. First off, I tried:
lists:foreach(fun(Item) -> Emit(null, [proplists:get_value(<<"i">>, Item), proplists:get_value(<<"o">>, Item)]) end, proplists:get_value(<<"data">>, Doc))
But I was met with some (very long) errors. (Gripe number two: the Erlang crash dump could really do with humanising.)
It took me quite a few attempts, including stripping it right back to confirm that I had an array to iterate and that each object did in fact contain an “i” and an “o” field, before I found the problem, which is this:
Even though documents are defined within {} braces, and JSON objects within that definition are also defined within {} braces, you cannot access them the same way in an Erlang view.
proplists:get_value(<<"field">>, Doc) is fine for the document as a whole, because the {Doc} pattern in the function head has already stripped the tuple wrapper off it. Each nested JSON object, however, arrives still wrapped as {[{Key, Value}, ...]}, so passing it straight to proplists:get_value doesn’t work. Bad assumption on my part. Luckily, the answer I got to another Stack Overflow question recently pointed the way.
To access the data we need to pattern match the components using the Erlang representation of a JSON object, like so:
{[{<<"t">>, TrackingID},{<<"i">>, In},{<<"o">>, Out}]} = Row
Ugly, hey?
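Useful though, as it extracts the TrackingID, In and Out values all in one go, kind of like PHP’s list() statement on steroids. The catch is that it only matches objects with exactly those three keys, in exactly that order.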
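To see why the plain lookup fails and what the pattern match relies on, here is a small Python model of the data the view receives. It assumes, as the pattern above shows, that each JSON object arrives as a single-element tuple wrapped around a list of key/value pairs, with Erlang binaries modelled as Python bytes. The helper names are hypothetical stand-ins for proplists:get_value/3 and the Erlang match, not CouchDB code.

```python
# Python model of the terms a native Erlang view receives (an illustration,
# not CouchDB code): a JSON object is a 1-tuple wrapping a list of
# (key, value) pairs, like the Erlang term {[{Key, Value}, ...]}.
# Erlang binaries such as <<"i">> are modelled as Python bytes.

def get_value(key, proplist, default=None):
    """Rough stand-in for proplists:get_value/3."""
    if not isinstance(proplist, list):
        # In Erlang no clause matches a tuple here, so the view crashes.
        raise TypeError("expected a proplist (list), got a %s" % type(proplist).__name__)
    for k, v in proplist:
        if k == key:
            return v
    return default


def unwrap(obj):
    """Equivalent of matching {Props} = Obj: strip the tuple wrapper."""
    (props,) = obj
    return props


def match_t_i_o(obj):
    """Equivalent of {[{<<"t">>, T}, {<<"i">>, I}, {<<"o">>, O}]} = Row."""
    # Positional unpacking fails if there are more or fewer than three pairs.
    [(k1, t), (k2, i), (k3, o)] = unwrap(obj)
    if (k1, k2, k3) != (b"t", b"i", b"o"):
        raise ValueError("badmatch: keys out of order or unexpected")
    return t, i, o


row = ([(b"t", b"E000005861"), (b"i", b"232920"), (b"o", b"2365730")],)
print(get_value(b"i", unwrap(row)))  # b'232920'
print(match_t_i_o(row))
```

Unwrapping first and then calling the lookup is the order-independent alternative; the pattern match is more compact, but any change in key order or an extra key makes it fail.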
With that in place, and a little more tidying up of the code, we arrive at:
fun({Doc}) ->
case proplists:get_value(<<"type">>, Doc) of
<<"usage">> ->
[Year, Month, Day, Hour, Minute | _] = proplists:get_value(<<"timestamp">>, Doc),
lists:foreach(fun(Row) ->
{[{<<"t">>, TrackingID},{<<"i">>, In},{<<"o">>, Out}]} = Row,
Emit([TrackingID, Year, Month, Day, Hour, Minute],[In, Out])
end, proplists:get_value(<<"data">>, Doc));
_ ->
ok
end
end.
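To check which rows the finished view emits, here is the same logic as a Python sketch. Documents are plain dicts rather than Erlang terms, so this models the map step only; it is not something CouchDB runs, and usage_map is a hypothetical name.

```python
# Python rendering of the map function above, for checking which rows it
# emits. Documents are plain dicts here; in CouchDB they arrive as Erlang terms.

def usage_map(doc):
    """Yield (key, value) pairs the way the Erlang view emits them."""
    if doc.get("type") != "usage":
        return
    # Like [Year, Month, Day, Hour, Minute | _]: needs at least five parts.
    year, month, day, hour, minute = doc["timestamp"][:5]
    for row in doc["data"]:
        yield [row["t"], year, month, day, hour, minute], [row["i"], row["o"]]


doc = {
    "_id": "36kem",
    "type": "usage",
    "timestamp": ["2010", "02", "28", "23", "10"],
    "data": [
        {"t": "E000005861", "i": "232920", "o": "2365730"},
        {"t": "E000006504", "i": "15784", "o": "17786"},
        {"t": "E000006505", "i": "16661", "o": "17786"},
    ],
}

for key, value in usage_map(doc):
    print(key, value)
```

Because the tracking ID leads the key and the zero-padded timestamp parts follow from year down to minute, CouchDB’s array collation keeps each tracking ID’s rows together in time order, which is what makes querying at different granularities possible. Unlike the Erlang pattern, the dict lookups here don’t care about key order.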
That little beauty lets me query the usage of a service at any granularity over data from the last 7 years faster than the browser can render the results. Across an HTTP connection to a data source 1000km away. On development hardware.
CSS Technique: Morning Sunset
Mar 10th
After coming across an excellent article on generating full page images, I had the idea that it would be cool to blend between multiple images as a user scrolled down the page. The somewhat artistic conceit that a user could scroll a similar scene from morning to sunset ironically came to me at sunset on Saturday, and was finished by morning!
The effect is pretty simple to set up, though it does require a specific bit of markup. We start with 2 images and give each a z-index, such that the first one is in front of the second. With the full page image CSS rules, the first image fills the screen and the second is completely obscured.
<img id="morning" class="bg" style="z-index: 2;" src="morning.jpg" />
<img id="sunset" class="bg" style="z-index: 1;" src="sunset.jpg" />
For the CSS, in addition to the standard full page image rules, I give the content a higher z-index, plus a little bit of transparency. This allows the content to sit on top of both of these images, and makes the effect more pronounced.
div#content {
/* This is the only important rule */
/* We need our content to show up on top of the background */
position: relative;
z-index: 10;
/* Added some opacity to demonstrate the effect better */
opacity: 0.8;
filter: alpha(opacity=80);
}
So far, so what? We’ve arranged for a user to download an image he can’t see – not so good. The magic comes with a blending function tied to the scrollbar. The idea is that the top layer becomes more and more transparent as the user scrolls through the content.
In working with the Zend Framework, I’ve been getting to grips with Dojo and its supporting classes, so I was happy to see that Dijit had tools for getting the dimensions of the viewport. With this information, I was able to calculate the scroll ratio, which gave me a number ranging from 0 at the top of the page to 1 at the bottom.
Dojo also has a great style() function, which lets you set opacity and have it “just work” across all browsers, regardless of their non-standard filter() shenanigans. At the outset, the top layer image has an opacity of 1. Subtracting the scroll ratio from this leaves it fully opaque at the top of the page and fully transparent at the bottom.
dojo.subscribe("/window/scrolled", function(e){
// Calculate the scroll percentage, and adjust the opacity of the top layer, appropriately.
var vp = dijit.getViewport();
dojo.style("morning", {
"opacity": 1 - (vp.t / (document.documentElement.scrollHeight - vp.h))
});
});
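The opacity calculation in that handler is easy to check outside the browser. A minimal Python sketch of the same arithmetic, where scroll_top, scroll_height and viewport_height stand in for vp.t, document.documentElement.scrollHeight and vp.h. Clamping the ratio and handling pages too short to scroll are my own assumptions, not part of the original handler.

```python
def top_layer_opacity(scroll_top, scroll_height, viewport_height):
    """Opacity for the top image: 1 at the top of the page, 0 at the bottom."""
    scrollable = scroll_height - viewport_height
    if scrollable <= 0:
        # Page doesn't scroll (assumption: keep the top image fully visible).
        return 1.0
    ratio = scroll_top / scrollable
    # Clamp in case of overscroll (assumption, not in the original handler).
    ratio = min(max(ratio, 0.0), 1.0)
    return 1.0 - ratio


print(top_layer_opacity(1000, 3000, 1000))  # 0.5
```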
There is some rate-limiting code going on to prevent the event firing continuously and slowing down the page. That came from a helpful Dojo Cookie by Peter Higgins over at Dojo Campus. I haven’t played fully with the rate limiting yet, but 50ms seemed to give a reasonable balance of subtle movement without overloading the page.
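I haven’t reproduced the Dojo Cookie’s code here. As a generic illustration of the idea, here is a minimal time-based throttle in Python: calls arriving within the interval (50ms by default, matching the value above) are dropped. The class name and the injectable clock are assumptions made so it can be tested.

```python
import time


class Throttle:
    """Wrap fn so it runs at most once per `interval` seconds.

    Calls arriving too soon after the last accepted one are dropped and
    return False. `clock` is injectable so the behaviour can be tested.
    """

    def __init__(self, fn, interval=0.05, clock=time.monotonic):
        self.fn = fn
        self.interval = interval
        self.clock = clock
        self._last = None

    def __call__(self, *args, **kwargs):
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False
        self._last = now
        self.fn(*args, **kwargs)
        return True
```

A throttle that only drops calls like this can miss the very last scroll event, leaving the opacity slightly behind the final position; a fuller version would schedule one trailing call.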
Degradation is variable – without Javascript, the user just sees the top image and there’s some overhead of the second image that is never seen. Without CSS, there’s more of a problem, as the images are inlined in the page, which pushes all the content down. If there is a full page image solution using background images only, I’d love to hear about it.
There’s a working, self-describing example of the effect so you can try it out for yourself. It works best in Chrome, with its superior Javascript handling, but works in all modern browsers to a reasonable degree.
The next step is to generalise the code, so that I can pass it an array of image URLs and have it automatically build the markup necessary to generate the effect. With more work on the blending function, I’ll be able to have it blend between multiple layers and potentially follow different rates – I might investigate Dojo curves for that.
I’m not aware of this technique being described anywhere else, but if there are other approaches to doing this, I’d be interested to see them.
Modelling Recurring Events in PHP
Feb 7th
In a previous article, I described how set operations could be modelled in PHP. With that foundation, we can begin to generate complex date criteria suitable for modelling recurring events.
There are a number of different kinds of date condition, which Martin Fowler terms “Temporal Expressions”. Typical temporal expressions include “Last Day in the Month”, “Nth Day of the Week” and “Repeats Yearly”. We can model each of these as a separate class and then combine them as necessary to provide a flexible architecture.
In order to take advantage of the set operations we defined previously, all of the classes must implement the same trivial interface:
interface Celsus_Temporal_Expression_Interface {
/**
* Determines whether the date specified is included in this temporal expression.
*
* @param string $date
*/
public function includes($date);
}
For our first example, let’s model conditions like “11th day of the month”. By using negative numbers, we can also use the same class to model “11th day from the end of the month”:
/**
* Handles scheduling rules like "11th day of the month".
*/
class Celsus_Temporal_Expression_DayOfMonth implements Celsus_Temporal_Expression_Interface {
/**
* The day of the month we are interested in. If the day is less than zero,
* it is interpreted as being from the end of the month.
*
* @var int
*/
private $_day;
public function __construct($day) {
$this->_day = $day;
}
public function includes($date) {
return $this->_day > 0 ? $this->_fromStartOfMonth($date) : $this->_fromEndOfMonth($date);
}
private function _fromStartOfMonth($date) {
return $this->_day == date('j', strtotime($date));
}
private function _fromEndOfMonth($date) {
$timestamp = strtotime($date);
return ((date('t', $timestamp) - date('j', $timestamp)) + 1) == abs($this->_day);
}
}
Usage is straightforward:
$tenth_of_the_month = new Celsus_Temporal_Expression_DayOfMonth(10);
var_dump($tenth_of_the_month->includes('2010-01-10')); // True
var_dump($tenth_of_the_month->includes('2010-01-07')); // False
$third_last_day_of_the_month = new Celsus_Temporal_Expression_DayOfMonth(-3);
var_dump($third_last_day_of_the_month->includes('2010-01-10')); // False
var_dump($third_last_day_of_the_month->includes('2010-01-29')); // True
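To check the end-of-month arithmetic, here is the same rule as a small Python function. It is a port for illustration rather than the Celsus code, and the function name is hypothetical. With a negative day, -1 is the last day of the month, so -3 in January 2010 is the 29th.

```python
import calendar
from datetime import date


def day_of_month_includes(day, iso_date):
    """Port of the DayOfMonth rule: positive days count from the start of
    the month, negative days from the end (-1 is the last day)."""
    d = date.fromisoformat(iso_date)
    if day > 0:
        return d.day == day
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    # Same arithmetic as _fromEndOfMonth: t - j + 1 == abs(day).
    return days_in_month - d.day + 1 == abs(day)


print(day_of_month_includes(-3, "2010-01-29"))  # True
```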
A second type of query is of the form “every 3 months”. This is slightly more involved as it needs to also use a specified date as the base from which to start counting:
/**
* Handles scheduling rules like "every 3 months"
*/
class Celsus_Temporal_Expression_MonthsFromStart implements Celsus_Temporal_Expression_Interface {
/**
* The count specifying the interval of months we are interested in.
*
* @var int
*/
private $_count;
/**
* The start date of the sequence, stored in ISO
* format, e.g. 2000-12-31.
*
* @var string
*/
private $_start;
public function __construct($start, $count) {
$this->_start = $start;
$this->_count = $count;
}
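public function includes($date) {
// Count the whole months between the start date and this date, including
// the difference in years. If that count is a multiple of the interval,
// this date should be included.
$timestamp = strtotime($date);
$start = strtotime($this->_start);
$months = (date('Y', $timestamp) - date('Y', $start)) * 12 + (date('n', $timestamp) - date('n', $start));
return (0 == $months % $this->_count);
}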
}
Again, usage is similar:
$every_two_months = new Celsus_Temporal_Expression_MonthsFromStart('2010-01-01', 2);
var_dump($every_two_months->includes('2010-03-01')); // True
var_dump($every_two_months->includes('2010-04-01')); // False
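Counting months across year boundaries matters whenever the interval doesn’t divide 12. With an interval of 5 starting in January, January of the following year is 12 months on and should be excluded, but a comparison of month numbers alone would include it. A Python sketch that counts whole months including the year (function name hypothetical):

```python
from datetime import date


def months_from_start_includes(start, count, iso_date):
    """True if iso_date falls a whole multiple of `count` months from `start`.

    Only the year and month are compared; the day of the month is ignored,
    as in the PHP class. Dates before the start are treated the same way.
    """
    s = date.fromisoformat(start)
    d = date.fromisoformat(iso_date)
    months = (d.year - s.year) * 12 + (d.month - s.month)
    # Python's % returns 0 exactly when months is divisible by count,
    # including for negative month differences.
    return months % count == 0


print(months_from_start_includes("2010-01-01", 5, "2011-01-01"))  # False
```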
Now, making use of our set operations previously defined, we can combine them for more complex rules such as “the last day of every quarter”:
$last_day_of_month = new Celsus_Temporal_Expression_DayOfMonth(-1);
// Quarters end in March, June, September and December, so count from March.
$every_three_months = new Celsus_Temporal_Expression_MonthsFromStart('2010-03-01', 3);
$last_day_of_quarter = new Celsus_Set_Operation_Intersection('Celsus_Temporal_Expression_Interface');
$last_day_of_quarter->addElements(array($every_three_months, $last_day_of_month));
var_dump($last_day_of_quarter->includes('2010-01-31')); // False
var_dump($last_day_of_quarter->includes('2010-06-30')); // True
In general, intersections are going to be the most useful set operation for this kind of use, but they are not the only possibility. If we modify this to use a Union, we can test whether dates are either the last day of a month, or fall in one of the months every 3 months from the start:
$last_day_of_month = new Celsus_Temporal_Expression_DayOfMonth(-1);
$every_three_months = new Celsus_Temporal_Expression_MonthsFromStart('2010-03-01', 3);
$last_of_month_or_every_three_months = new Celsus_Set_Operation_Union('Celsus_Temporal_Expression_Interface');
$last_of_month_or_every_three_months->addElements(array($every_three_months, $last_day_of_month));
var_dump($last_of_month_or_every_three_months->includes('2010-02-28')); // True
var_dump($last_of_month_or_every_three_months->includes('2010-06-05')); // True
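The same composition can be sketched with plain predicate functions, where an intersection is all() and a union is any(). This is a Python illustration under my own naming, not the Celsus classes; the empty intersection returns False to match the PHP behaviour.

```python
import calendar
from datetime import date


def last_day_of_month(iso_date):
    d = date.fromisoformat(iso_date)
    return d.day == calendar.monthrange(d.year, d.month)[1]


def every_n_months(start, n):
    s = date.fromisoformat(start)

    def includes(iso_date):
        d = date.fromisoformat(iso_date)
        return ((d.year - s.year) * 12 + (d.month - s.month)) % n == 0

    return includes


def intersection(*tests):
    # An empty intersection is False, matching the PHP class.
    return lambda iso_date: bool(tests) and all(t(iso_date) for t in tests)


def union(*tests):
    return lambda iso_date: any(t(iso_date) for t in tests)


last_day_of_quarter = intersection(every_n_months("2010-03-01", 3), last_day_of_month)
last_of_month_or_quarter_month = union(every_n_months("2010-03-01", 3), last_day_of_month)

print(last_day_of_quarter("2010-06-30"))  # True
```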
A handful of temporal expressions can be downloaded here. Further ones are left as an exercise.
With an appropriate database schema for persistence of rules, such as Apple’s iCal reference, this provides a comprehensive method for defining complex date ranges and testing them quickly.
Fixing Mystique for Backtype Connect
Feb 6th
I’ve been playing around with WordPress themes and plugins and one that caught my eye was Backtype Connect, which aggregates discussions from around the web and places them as comments onto the original post. Unfortunately it wasn’t playing nicely with the Mystique theme I’m currently using, with no comments showing and the sidebar not appearing on posts where there was a backtype comment.
It turned out to be a minor error in the Mystique theme, which deals with how CSS classes are assigned to the sections that display the comments. Mystique has the following lines of code at the bottom of its mystique_comment_class function:
$class = join(" ", $classes);
echo apply_filters("comment_class", $class);
Mystique is trying to do the right thing here, by allowing plugins to manipulate the classes with the ‘comment_class’ hook, just like WordPress’ own function. However, the code turns the $classes array into a $class string before passing it to the plugins. Backtype Connect (and presumably other plugins) expects the classes to be passed as an array, leading to the following error:
Fatal error: [] operator not supported for strings
Because this error occurs within an HTML tag, it isn’t rendered visibly, which makes it hard to spot. Nevertheless, it causes PHP to stop executing immediately, leaving half-rendered pages.
Luckily, the fix is simple: join the array after the filter has been called and everything works well. Future revisions of the theme will probably fix this, but in the meantime, you can use the WordPress plugin editor to replace the two lines above in mystique/lib/core.php with the following:
$classes = apply_filters("comment_class", $classes);
echo implode(" ", $classes);
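The ordering problem is easier to see in a toy model. This Python sketch is a much-simplified, hypothetical stand-in for a filter hook (not WordPress code): each registered callback receives the current value and returns a new one. A callback that appends to a list breaks when handed a joined string, and works when the join happens after filtering.

```python
# Hypothetical, much-simplified model of a filter hook: every callback
# registered for a tag receives the current value and returns the new one.
_filters = {}


def add_filter(tag, callback):
    _filters.setdefault(tag, []).append(callback)


def apply_filters(tag, value):
    for callback in _filters.get(tag, []):
        value = callback(value)
    return value


def plugin_filter(classes):
    # Like Backtype Connect, this assumes it is handed a list of classes.
    classes.append("plugin-comment")  # hypothetical class name
    return classes


add_filter("comment_class", plugin_filter)


def comment_class_broken(classes):
    # Joins first, so the plugin receives a str and fails on .append().
    return apply_filters("comment_class", " ".join(classes))


def comment_class_fixed(classes):
    # Filters the list first, then joins.
    return " ".join(apply_filters("comment_class", classes))


print(comment_class_fixed(["comment", "even"]))  # comment even plugin-comment
```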
And they should live together in harmony.
Grow Up Australia
Feb 4th
Finally, there seems to be some momentum behind the drive to create an R18+ classification for videogames, which would bring Australia in line with the rest of the developed world. Hopefully with the support of major stores like EBGames and a large grassroots campaign, this anachronism will be corrected soon.
Sign the petition and have your say. For my part, I submitted this:
The solution is simple. In a cinema, if an under-age person tries to watch a movie, they are prevented from doing so by the cinema staff. Presumably cinemas are punished if they don’t adhere to these rules. Likewise in licensed premises, and so it should be for videogame stores.
Classification labels are clear and obvious. If a parent makes a decision to purchase a game out of classification based on their knowledge of their own child, that is their own choice. If an adult is ill-informed enough to purchase an R18 game for a child for whom it isn’t suitable, the government can hardly be expected to prevent that. Personal responsibility has to start somewhere.
The Videogames industry is a mature industry with a diverse range of titles targeting a variety of age groups. It now generates more revenue than the film industry and deserves to be taken seriously.
Finally, for those who wish to get their hands on restricted games, it could hardly be easier. There are myriad international sellers only too happy to import those games, making a mockery of the current regulations and denying revenue to Australian merchants.
Hopefully, with enough voices, the government of the people can be persuaded to act.
Set Operations in PHP
Feb 4th
The standard set operations of Intersection, Union and Difference are well-understood and have a wide number of applications. Martin Fowler gives a neat example here, for modelling recurring events where there are multiple conditions.
“The last day of every second month” for example, can be represented by an intersection of two conditions, “last day of the month” and “every 2 months”. More generally, they can be applied to any situation where you want to do combinations of criteria using AND and OR.
To model this in PHP, we can define a number of Set Operation classes to which we can attach 2 or more elements for testing. The 3 classes, Union, Intersection and Difference, will all be constructed in the same way, so constructor code can be placed in a common ancestor:
abstract class Celsus_Set_Operation_Abstract {
protected $_setInterface = null;
/**
* The elements (or nested set operations) to be tested.
*
* @var array
*/
protected $_elements = array();
public function __construct($setInterface) {
if (!is_string($setInterface)) {
throw new Celsus_Exception("Interface must be a string.");
}
if (!interface_exists($setInterface, true)) {
throw new Celsus_Exception("$setInterface is not a valid interface.");
}
$this->_setInterface = $setInterface;
}
/**
* Adds an element to the set.
*
* @param object $element
*/
public function addElement($element) {
if ($element instanceof $this->_setInterface || $element instanceof Celsus_Set_Operation_Abstract) {
$this->_elements[] = $element;
} else {
throw new Celsus_Exception("Element must implement $this->_setInterface or be a Set Operation");
}
}
/**
* Adds several elements to the set at once.
*
* @param array $elements
*/
public function addElements(array $elements) {
foreach ($elements as $element) {
$this->addElement($element);
}
}
}
In order to test the elements of the set, we need to ensure that they are all testable in the same way. We could force a particular method name, such as ‘contains’ or ‘includes’, but it turns out we don’t have to. As long as they all have the same testing function, we don’t actually care what it is called, and library code should be as transparent as possible. Therefore we construct the set operator with an interface, which all set elements must implement.
The addElement function provides a method by which we can add elements to the set. If the element doesn’t meet the required interface, it is rejected.
Next, let’s define a set of classes that can be compared for inclusion or exclusion. The only requirement is that they must all implement the same interface containing a single method, and that that method returns a boolean.
interface Celsus_Test_Set_Interface {
public function acceptable();
}
class Celsus_Test_Set_Acceptable implements Celsus_Test_Set_Interface {
public function acceptable() {
return true;
}
}
class Celsus_Test_Set_Unacceptable implements Celsus_Test_Set_Interface {
public function acceptable() {
return false;
}
}
This is about as simple as you can get. The ‘Acceptable’ class represents an object that always passes the criterion, and the ‘Unacceptable’ class one that always fails it. In practice, these would be more complex. For dates, for example, you could test whether the supplied date met the “last day of the month” criterion. These simple classes will suffice for now though, and allow us to discuss the set mechanisms.
An intersection is true if every element of the set meets the criteria, and can be represented as follows:
class Celsus_Set_Operation_Intersection extends Celsus_Set_Operation_Abstract {
public function __call($method, $arguments) {
// First check that the method we are calling is defined in the specified interface.
if (!method_exists($this->_setInterface, $method)) {
throw new Celsus_Exception("$method is not defined in $this->_setInterface");
}
if (!$this->_elements) {
// We don't have any elements.
return false;
}
// Iterate through the elements, calling the supplied method name on each.
// If any return false, this intersection is false.
foreach ($this->_elements as $element) {
if (!call_user_func_array(array($element, $method), $arguments)) {
return false;
}
}
return true;
}
}
Because we don’t know what the test method is going to be called, we use the magic __call method to trap any method calls. We check that method name against the interface we constructed the intersection with, and if it exists, we can begin. If no elements have been added to the set, the intersection returns false. Finally, we iterate through each element in the set and call its test method. As this is an intersection, if any element fails its test, the whole intersection fails.
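For comparison, Python’s __getattr__ plays the same role as PHP’s __call: it is only reached when normal attribute lookup fails. A minimal Python sketch of the same design, with abc.ABC standing in for the PHP interface. The class and method names are my own, not part of the Celsus code, and Union is included so the nested “(A AND B) OR C” case can be shown.

```python
from abc import ABC, abstractmethod


class Acceptability(ABC):
    """Plays the role of Celsus_Test_Set_Interface."""

    @abstractmethod
    def acceptable(self):
        ...


class Acceptable(Acceptability):
    def acceptable(self):
        return True


class Unacceptable(Acceptability):
    def acceptable(self):
        return False


class SetOperation:
    def __init__(self, interface):
        self._interface = interface
        self._elements = []

    def add_element(self, element):
        if not isinstance(element, (self._interface, SetOperation)):
            raise TypeError("element must implement %s or be a set operation"
                            % self._interface.__name__)
        self._elements.append(element)

    def __getattr__(self, method):
        # Only reached when normal lookup fails, much like PHP's __call.
        if method.startswith("_") or not hasattr(self._interface, method):
            raise AttributeError("%s is not defined in %s"
                                 % (method, self._interface.__name__))

        def dispatch(*args, **kwargs):
            # Lazily call the named method on each element.
            results = (getattr(e, method)(*args, **kwargs) for e in self._elements)
            return self._combine(results)

        return dispatch


class Intersection(SetOperation):
    def _combine(self, results):
        # Empty set is False, as in the PHP version; all() stops at the first failure.
        return bool(self._elements) and all(results)


class Union(SetOperation):
    def _combine(self, results):
        return any(results)


a_and_b = Intersection(Acceptability)
a_and_b.add_element(Acceptable())
a_and_b.add_element(Acceptable())

either = Union(Acceptability)
either.add_element(a_and_b)
either.add_element(Unacceptable())
print(either.acceptable())  # True
```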
We can demonstrate this using our simple test cases above:
$intersection = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$intersection->addElement(new Celsus_Test_Set_Acceptable());
$intersection->addElement(new Celsus_Test_Set_Acceptable());
var_dump($intersection->acceptable()); // True
$intersection2 = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$intersection2->addElement(new Celsus_Test_Set_Acceptable());
$intersection2->addElement(new Celsus_Test_Set_Unacceptable());
var_dump($intersection2->acceptable()); // False
First, we construct the intersection, then we add some elements, and finally we test. Note that the method call is to ‘acceptable()’, which is the method specified in the interface used to construct the intersection.
Similar classes can be constructed for Union and Difference which work in much the same way. Set difference is a little more complex, however, as we have to define both a list of criteria to exclude, and a list of criteria to include.
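Difference is the one operation with two lists. One way to define it, sketched in Python: a value passes if at least one “include” test passes and no “exclude” test does. Treating each list as a union is an assumption of this sketch rather than a description of the downloadable code, and the December example is purely illustrative.

```python
from datetime import date


class Difference:
    """A minus B for test functions.

    Assumption of this sketch: a value passes if at least one 'include' test
    passes and no 'exclude' test does (each list is treated as a union).
    """

    def __init__(self, include=(), exclude=()):
        self.include = list(include)
        self.exclude = list(exclude)

    def includes(self, iso_date):
        return (any(t(iso_date) for t in self.include)
                and not any(t(iso_date) for t in self.exclude))


def in_december(iso_date):
    return date.fromisoformat(iso_date).month == 12


def christmas_day(iso_date):
    d = date.fromisoformat(iso_date)
    return (d.month, d.day) == (12, 25)


december_except_christmas = Difference(include=[in_december], exclude=[christmas_day])
print(december_except_christmas.includes("2010-12-25"))  # False
```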
But what if we want to combine these set operations, for more complex tests? As a further refinement we can allow the set operations to themselves include other set operations, which allows for complex constructions like the following:
$intersection = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$elementA = new Celsus_Test_Set_Acceptable();
$elementB = new Celsus_Test_Set_Acceptable();
$elementC = new Celsus_Test_Set_Unacceptable();
$intersection->addElement($elementA);
$intersection->addElement($elementB);
$union = new Celsus_Set_Operation_Union('Celsus_Test_Set_Interface');
$union->addElement($intersection);
$union->addElement($elementC);
var_dump($union->acceptable()); // True
Which translates to “(elementA AND elementB) OR elementC”. The flexible structure allows any combination of these, at any depth.
A full set of code and unit tests that implement all of the above is available here. If you have any questions on their use, let me know in the comments.
From here, the next step is to define a more useful set of criteria. Using dates as an example, we can arrange for readable code that quickly determines whether a date is in a given set of recurrence conditions:
$intersection = new Celsus_Set_Operation_Intersection('Celsus_Temporal_Expression_Interface');
$intersection->addElements(array($every_month, $last_day_of_the_month));
var_dump($intersection->includes('2010-01-31')); // True
var_dump($intersection->includes('2010-06-02')); // False
I’ll go through the steps on how to achieve that in another post.
Automatically Mounting WebDAV in Ubuntu with a FuseDAV Init Script
Jan 27th
I use BingoDisk as a backup and remote storage solution and wanted to use it as the destination for my daily backup of this blog and my development environment. BingoDisk is available via WebDAV which is nice and convenient, and there are 2 main options for Ubuntu, davfs2 and fusedav.
On the face of it, davfs2 is more convenient, as it allows you to automount quite easily using fstab. However, despite a number of attempts, I couldn’t convince it to allow me to write files remotely, only folders. It seems a number of people have had the same issue.
Fusedav, on the other hand, was much more reliable, writing files and folders without errors. It also seemed noticeably faster. Unfortunately, there didn’t seem to be any convenient way to mount the share automatically, and this would be important for my automated backup routine. To solve this, I put together a quick solution using a simple mount description file and an init script that gives me better control over the process. The idea and parts of the code are built on a solution I found by müzso. This goes one step further and enables it to run automatically at boot time.
Below are the steps I took, and the script I’m using. The steps are for Ubuntu, but should be easily modifiable to any flavour of Linux.
First, install fusedav.
sudo aptitude -y install fusedav
Next, we create the mount point and do a quick test to check that fusedav will indeed connect to our WebDAV share.
sudo mkdir -p /mnt/bingo
sudo /usr/bin/fusedav -u username -p password http://mydomain.bingodisk.com /mnt/bingo
We’re running fusedav in the foreground because it will be easier to kill that way once we’ve finished testing, so open a new terminal and check that you can connect.
cd /mnt/bingo
ls -la
If all is well, the top level of your remote share should now be visible. I don’t like that you have to specify the password on the command line, but there doesn’t seem to be too much you can do about that at the moment.
Close the second terminal, and kill the fusedav process in the first by pressing Ctrl+c. Now, we’re going to create a directory and a simple mount description file that includes the URL, username and password. I’m going to call the file bingo, and later on this means that the share will be mounted at /mnt/bingo. The directory layout I went for is similar to how Apache and Nginx manage virtual hosts.
sudo mkdir -p /etc/fusedav/mounts-available
Now edit /etc/fusedav/mounts-available/bingo as root using your favourite text editor and put the following in it. The 3 segments are the URL to the WebDAV location, username and password, separated by a space.
http://mydomain.bingodisk.com username password
Because this file contains your password in plaintext, you should secure it so that only the owner (root) can read it.
sudo chmod 600 /etc/fusedav/mounts-available/bingo
Next, the init script, which you can download from here. You need to make it executable.
cd /etc/init.d/
sudo wget http://jamietalbot.com/projects/fusedav/files/bingo
sudo chmod +x bingo
You can take a look at the file in its entirety. It’s not especially complex but the highlights are:
NAME=bingo
You should alter bingo on this line to be the same as the name of the file you put in /etc/fusedav/mounts-available. This is the only change you have to make (and you only have to make it if you called your mount something other than bingo).
MOUNT_FILE=/etc/fusedav/mounts-available/$NAME
MOUNT_POINT=/mnt/$NAME
These lines define where the mount description files are kept and where the share will be mounted.
set -- $(cat ${MOUNT_FILE})
URI=$1
USERNAME=$2
PASSWORD=$3
DAEMON_ARGS="-u $USERNAME -p $PASSWORD $URI $MOUNT_POINT"
This section splits the mount description into usable variables and sets the parameters for fusedav.
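The parsing step is simple enough to mirror in a few lines of Python, which makes the expected file format explicit. This is a sketch with a hypothetical function name, not part of the init script: the file is split on whitespace and the first three fields become the URL, username and password.

```python
def fusedav_args(mount_file_text, name, mount_root="/mnt"):
    """Build the fusedav argument list from a mount description file.

    Mirrors `set -- $(cat ${MOUNT_FILE})`: the contents are split on
    whitespace and the first three fields are the URL, username and password.
    """
    fields = mount_file_text.split()
    if len(fields) < 3:
        raise ValueError("expected: URL username password")
    uri, username, password = fields[:3]
    mount_point = "%s/%s" % (mount_root, name)
    return ["-u", username, "-p", password, uri, mount_point]


print(fusedav_args("http://mydomain.bingodisk.com username password\n", "bingo"))
```

Returning a list rather than one string means the arguments could be handed to subprocess.run(["/usr/bin/fusedav", *args]) without being split again; like the shell version, though, it cannot represent a password that contains whitespace.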
The script automatically creates a PID file which should help avoid any issues if you start the daemon more than once. It also starts fusedav as a background process, so it doesn’t interfere with your terminal.
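The script’s own PID handling isn’t reproduced here. As a rough illustration of why a PID file helps, here is the usual check as a Python sketch, assuming the file holds a single process ID: if it names a process that is still alive, a second start can be refused, while a missing or garbled file counts as not running. POSIX only; the function name is hypothetical.

```python
import os


def already_running(pid_file):
    """Return True if pid_file holds the PID of a live process (POSIX only)."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```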
Let’s check that the script works.
su -
/etc/init.d/bingo start
cd /mnt/bingo/
ls -la
Hopefully, at this point you should see the remote share, as before.
Finally, we can add the script to the default runlevel so that it starts when the server boots up.
sudo /usr/sbin/update-rc.d -f bingo defaults
And that should be that! If you need to stop the daemon use the following.
sudo /etc/init.d/bingo stop
You can easily add further shares if necessary with the following steps as root (assuming the new share name of bongo).
cp /etc/init.d/bingo /etc/init.d/bongo
chmod +x /etc/init.d/bongo
sed -i 's/bingo/bongo/g' /etc/init.d/bongo
echo 'http://bongo.remote.drive username password' > /etc/fusedav/mounts-available/bongo
chmod 600 /etc/fusedav/mounts-available/bongo
/usr/sbin/update-rc.d -f bongo defaults
/etc/init.d/bongo start
At which point, /mnt/bongo should be ready and waiting.