• 18.03.10
  • Using CouchDB Erlang views can be confusing when your documents contain JSON objects. Understanding how CouchDB processes JSON internally and making use of Erlang pattern matching smooths the way.

Handling JSON Objects in CouchDB Native Erlang Views

I’ve been working with CouchDB a fair bit in recent weeks and am really enjoying it so far. Once I got my head around how to structure views and take advantage of view collation, I found it to be far more expressive than I first thought.

I still have a couple of gripes. The largest is that you can’t use a wildcard parameter at the beginning of your view keys, so if you need to get “items by user by category” and “items by category by user”, you need two views. I’m sure there are good architectural reasons for this, but for me it’s the one place where collation lets me down. For at least one of the solutions I’m working on, multiple views are a major problem, as a single view already takes up 120GB (and counting).

But, to the main point. Native Erlang views are now possible and, if you can write them, potentially significantly faster than JavaScript ones. There are a couple of gotchas though, not least for me the handling of JSON objects.

We start with a document like this:

{
   "_id": "36kem",
   "_rev": "1-c895d5a55945a9898880bf870a3b3025",
   "type": "usage",
   "timestamp": [
       "2010",
       "02",
       "28",
       "23",
       "10"
   ],
   "data": [
       {
           "t": "E000005861",
           "i": "232920",
           "o": "2365730"
       },
       {
           "t": "E000006504",
           "i": "15784",
           "o": "17786"
       },
       {
           "t": "E000006505",
           "i": "16661",
           "o": "17786"
       }
   ]
}

In reality there are thousands of entries in the data array, but this will do. Our aim is to emit one key-value pair for each item in the “data” field of each document of type “usage”. In JavaScript this is pretty trivial. Erlang, however, proves more of a challenge.

Based on pointers from the CouchDB Wiki, I started with:

fun ({Doc}) ->
  case proplists:get_value(<<"type">>, Doc) of
    <<"usage">> ->
      Emit(proplists:get_value(<<"_id">>, Doc), null);
    _ ->
      ok
  end
end.

and was very happy to see that work. Two things to note here. First, don’t forget the {} around the Doc in the function definition or you’ll get strange errors. Second, to get the value of a field in a document, you can use the standard proplists:get_value(<<"fieldname">>, Doc) construct. So far so good.
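To make the role of the {} concrete, here is a rough sketch of how the sample document looks once it reaches a native view: each JSON object becomes a one-element tuple holding a property list, strings become binaries, and arrays become lists. The module and function names (ejson_demo, doc/0, type_of/1) are made up for illustration, and the document is trimmed to a single data entry.

```erlang
-module(ejson_demo).
-export([doc/0, type_of/1]).

%% The sample document, trimmed, in the Erlang form a native view receives.
%% JSON object -> {Proplist}, JSON string -> binary, JSON array -> list.
doc() ->
    {[{<<"_id">>, <<"36kem">>},
      {<<"type">>, <<"usage">>},
      {<<"timestamp">>, [<<"2010">>, <<"02">>, <<"28">>, <<"23">>, <<"10">>]},
      {<<"data">>, [{[{<<"t">>, <<"E000005861">>},
                      {<<"i">>, <<"232920">>},
                      {<<"o">>, <<"2365730">>}]}]}]}.

%% Matching {Props} in the function head strips the outer tuple,
%% leaving a plain property list that proplists:get_value/2 accepts.
type_of({Props}) ->
    proplists:get_value(<<"type">>, Props).
```

Calling ejson_demo:type_of(ejson_demo:doc()) in an Erlang shell should give back the binary <<"usage">>.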

The main issue for me came with manipulating the “data” field. I didn’t actually want to emit null, but instead the “i” and “o” parts of the data field. First off, I tried:

  lists:foreach(fun(Item) -> Emit(null, [proplists:get_value(<<"i">>, Item), proplists:get_value(<<"o">>, Item)]) end, proplists:get_value(<<"data">>, Doc))

but was met with some (very long) errors. (Gripe number two: they could really do with humanising the Erlang crash dump.)

It took me quite a few attempts, including stripping it right back to confirm that I had an array to iterate and that each object does in fact contain an “i” and an “o” field, before I found the problem, which is this:

Even though Documents are defined within {} braces, and JSON objects within that definition are also defined within {} braces, you cannot access them the same way in an Erlang view.

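proplists:get_value(<<"field">>, Doc) works for the document as a whole only because the {Doc} pattern in the function head has already stripped the outer tuple, leaving a plain property list. Nested JSON objects arrive still wrapped in a one-element tuple, {[...]}, so passing one straight to proplists:get_value fails. Bad assumption on my part. Luckily, the answer I got to another Stack Overflow question recently pointed the way.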
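The difference can be reproduced in a plain Erlang shell, without CouchDB. The sketch below is only an illustration, with hypothetical names (nested_lookup, naive/1, unwrapped/1): the first function repeats my failed attempt and catches the resulting error, the second unwraps the tuple before looking the field up.

```erlang
-module(nested_lookup).
-export([naive/1, unwrapped/1]).

%% The failed attempt: a nested object is a tuple, not a list,
%% so proplists:get_value/2 raises an error instead of returning a value.
naive(Item) ->
    try
        proplists:get_value(<<"i">>, Item)
    catch
        error:_ -> failed
    end.

%% Unwrapping the one-element tuple first yields a property list,
%% and the lookup then behaves exactly as it does for the document.
unwrapped({Props}) ->
    proplists:get_value(<<"i">>, Props).
```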

To access the data we need to pattern match the components using the Erlang representation of a JSON object, like so:

  {[{<<"t">>, TrackingID},{<<"i">>, In},{<<"o">>, Out}]} = Row

Ugly, hey? :) Useful though, as it extracts the TrackingID, In and Out values all in one go, kind of like a list() statement on steroids.
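One caveat worth spelling out: a pattern like this only matches when the object has exactly these three keys in exactly this order, and anything else raises a badmatch. If your data might contain rows of a different shape, a function with a fallback clause is one way to skip them instead of crashing. This is a sketch under that assumption, and the names row_shape and row_values/1 are invented for the example.

```erlang
-module(row_shape).
-export([row_values/1]).

%% Matches only an object with the keys t, i, o in this exact order.
row_values({[{<<"t">>, TrackingID}, {<<"i">>, In}, {<<"o">>, Out}]}) ->
    {ok, TrackingID, In, Out};
%% Anything else (reordered keys, extra or missing fields) is skipped
%% rather than raising a badmatch error.
row_values(_Other) ->
    skip.
```

In a view you would then emit only when row_values/1 returns an {ok, ...} tuple.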

With that in place, and a little more tidying up of the code, we arrive at:

fun({Doc}) ->
  case proplists:get_value(<<"type">>, Doc) of
    <<"usage">> ->
      [Year, Month, Day, Hour, Minute | _] = proplists:get_value(<<"timestamp">>, Doc),
      lists:foreach(fun(Row) ->
        {[{<<"t">>, TrackingID},{<<"i">>, In},{<<"o">>, Out}]} = Row,
        Emit([TrackingID, Year, Month, Day, Hour, Minute], [In, Out])
      end, proplists:get_value(<<"data">>, Doc));
    _ ->
      ok
  end
end.
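Given how unfriendly the crash dumps are, it can help to try the logic in a plain Erlang shell before loading it into CouchDB. The following is a minimal sketch of that idea, not part of CouchDB itself: the module usage_view and its functions are hypothetical, and Emit is passed in as an ordinary argument, whereas inside a view CouchDB supplies it for you.

```erlang
-module(usage_view).
-export([map/2, run/1]).

%% Same logic as the view above, with Emit passed in explicitly.
map({Doc}, Emit) ->
    case proplists:get_value(<<"type">>, Doc) of
        <<"usage">> ->
            [Year, Month, Day, Hour, Minute | _] =
                proplists:get_value(<<"timestamp">>, Doc),
            lists:foreach(
              fun({[{<<"t">>, TrackingID}, {<<"i">>, In}, {<<"o">>, Out}]}) ->
                      Emit([TrackingID, Year, Month, Day, Hour, Minute], [In, Out])
              end,
              proplists:get_value(<<"data">>, Doc));
        _ ->
            ok
    end.

%% Run the map function with a stand-in Emit that sends each row to the
%% calling process, then gather the rows in the order they were emitted.
run(Doc) ->
    Self = self(),
    Ref = make_ref(),
    map(Doc, fun(Key, Value) -> Self ! {Ref, Key, Value}, ok end),
    collect(Ref, []).

%% Drain every message tagged with Ref that is already in the mailbox.
collect(Ref, Acc) ->
    receive
        {Ref, Key, Value} -> collect(Ref, [{Key, Value} | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

Calling usage_view:run/1 with a document in the tuple-wrapped form returns a list of {Key, Value} pairs, one per entry in the data array, and an empty list for documents of any other type.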

That little beauty lets me query the usage of a service at any granularity over data from the last 7 years, faster than the browser can render the results. Across an HTTP connection to a data source 1000km away. On development hardware.