• 18 . 03 . 10
  • Using CouchDB Erlang views can be confusing when your documents contain JSON objects. Understanding how Couch processes JSON internally and making use of Erlang pattern matching smooths the way.


Handling JSON Objects in CouchDB Native Erlang Views

I’ve been working with CouchDB a fair bit in recent weeks and am really enjoying it so far. Once I got my head around how to structure views and take advantage of view collation, I found it to be far more expressive than I first thought.

I still have a couple of gripes, the largest of which is that you can’t use a wildcard parameter at the beginning of your view keys, so if you need to get “items by user by category” and “items by category by user”, you need two views. I’m sure there are good architectural reasons for this, but for me it’s the one place where collation lets me down. For at least one of the solutions I’m working on, multiple views are a major problem, as even one takes up 120GB (and counting).

But to the main point. Native Erlang views are now possible and, if you can write them, potentially much faster than JavaScript ones. There are a couple of gotchas though, not least (for me) the handling of JSON objects.

We start with a document like this:

[javascript]
{
  "_id": "36kem",
  "_rev": "1-c895d5a55945a9898880bf870a3b3025",
  "type": "usage",
  "timestamp": [
    "2010",
    "02",
    "28",
    "23",
    "10"
  ],
  "data": [
    {
      "t": "E000005861",
      "i": "232920",
      "o": "2365730"
    },
    {
      "t": "E000006504",
      "i": "15784",
      "o": "17786"
    },
    {
      "t": "E000006505",
      "i": "16661",
      "o": "17786"
    }
  ]
}
[/javascript]

In reality there are thousands of entries in the data array, but this will do. Our aim is to emit one key-value pair for each item in the “data” field of each document of type “usage”. In JavaScript this is pretty trivial. Erlang, however, proves more of a challenge.

Based on pointers from the CouchDB Wiki, I started with:

[erlang]
fun({Doc}) ->
    %% Only documents of type "usage" are indexed; everything else is skipped.
    case proplists:get_value(<<"type">>, Doc) of
        <<"usage">> ->
            Emit(proplists:get_value(<<"_id">>, Doc), null);
        _ ->
            ok
    end
end.
[/erlang]

and was very happy to see that work. Two things to note here. First, don’t forget the {} around the Doc in the function definition, or you’ll get strange errors. Second, to get the value of a field in a document, you can use the standard proplists:get_value(<<"fieldname">>, Doc) construct. So far so good.
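
To make those two notes concrete, here is a minimal sketch that can be compiled and run outside CouchDB. It assumes the decoded form that a native map function receives: a one-element tuple wrapping a list of {Key, Value} pairs with binary keys. The module name doc_shape and the shortened sample document are made up for illustration.

[erlang]
-module(doc_shape).
-export([demo/0]).

%% Assumed decoded form of a (shortened) document: a one-element tuple
%% wrapping a list of {BinaryKey, Value} pairs.
sample_doc() ->
    {[{<<"_id">>, <<"36kem">>},
      {<<"type">>, <<"usage">>},
      {<<"timestamp">>, [<<"2010">>, <<"02">>, <<"28">>, <<"23">>, <<"10">>]}]}.

demo() ->
    %% Matching on {Doc} strips the outer tuple, just as fun({Doc}) does.
    {Doc} = sample_doc(),
    Type = proplists:get_value(<<"type">>, Doc),
    %% A missing field comes back as the atom undefined rather than crashing.
    Missing = proplists:get_value(<<"nope">>, Doc),
    {Type, Missing}.
[/erlang]

Calling doc_shape:demo() from the Erlang shell should give back the type binary alongside undefined for the field that isn’t there.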

The main issue for me came with manipulating the “data” field. I didn’t actually want to emit null, but instead the “i” and “o” parts of the data field. First off, I tried:

[erlang]
lists:foreach(fun(Item) -> Emit(null, [proplists:get_value(<<"i">>, Item), proplists:get_value(<<"o">>, Item)]) end,
              proplists:get_value(<<"data">>, Doc))
[/erlang]

But I was met with some (very long) errors. (Gripe number two – they could really do with humanising the Erlang crash dump.)

It took me quite a few attempts, including stripping it right back to confirm that I had an array to iterate and that each object does in fact contain an “i” and an “o” field, before I found the problem, which is this:

Even though documents are defined within {} braces, and JSON objects within that definition are also defined within {} braces, you cannot hand a nested object straight to proplists:get_value in an Erlang view.

proplists:get_value(<<"field">>, Doc) is fine for the document as a whole, because the {Doc} pattern in the function head has already stripped the tuple that wraps the document’s list of fields. Each nested JSON object still arrives wrapped in its own tuple, so the same call on it fails. Bad assumption on my part. Luckily, the answer I got to another Stack Overflow question recently pointed the way.
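
The failure is easy to reproduce outside CouchDB. The sketch below assumes each element of the “data” array reaches the view as a one-element tuple wrapping a list of pairs; the module name nested_shape is hypothetical, and the crash is caught only so the result can be inspected.

[erlang]
-module(nested_shape).
-export([demo/0]).

%% One element of the "data" array, in the assumed decoded form.
sample_item() ->
    {[{<<"t">>, <<"E000005861">>},
      {<<"i">>, <<"232920">>},
      {<<"o">>, <<"2365730">>}]}.

demo() ->
    Item = sample_item(),
    %% Item is a tuple, not a list, so proplists:get_value/2 cannot walk it.
    Lookup = try proplists:get_value(<<"i">>, Item) of
                 Value -> {ok, Value}
             catch
                 error:_ -> crashed
             end,
    {is_tuple(Item), is_list(Item), Lookup}.
[/erlang]

Here nested_shape:demo() reports that the item is a tuple rather than a list, and that the lookup crashed instead of returning a value.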

To access the data we need to pattern match the components using the Erlang representation of a JSON object, like so:

[erlang]
{[{<<"t">>, TrackingID},{<<"i">>, In},{<<"o">>, Out}]} = Row
[/erlang]

Ugly, hey? 🙂 Useful though, as it extracts the TrackingID, In and Out values all in one go, kind of like a list() statement on steroids.
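
One thing worth keeping in mind: a match like this is strict. As a sketch, the hypothetical extract/1 helper below wraps the same pattern in a case expression so a failed match is reported rather than crashing. Under the assumption that rows arrive as tuple-wrapped lists of pairs, it shows that keys in a different order, or an extra key, stop the pattern from matching.

[erlang]
-module(match_order).
-export([demo/0]).

%% Hypothetical helper: the same pattern as above, inside a case expression
%% so that a failed match returns no_match instead of raising badmatch.
extract(Row) ->
    case Row of
        {[{<<"t">>, TrackingID}, {<<"i">>, In}, {<<"o">>, Out}]} ->
            {ok, TrackingID, In, Out};
        _ ->
            no_match
    end.

demo() ->
    Expected  = {[{<<"t">>, <<"E1">>}, {<<"i">>, <<"1">>}, {<<"o">>, <<"2">>}]},
    %% Same members, different order.
    Reordered = {[{<<"i">>, <<"1">>}, {<<"t">>, <<"E1">>}, {<<"o">>, <<"2">>}]},
    %% Same members plus one the pattern does not mention.
    Extra     = {[{<<"t">>, <<"E1">>}, {<<"i">>, <<"1">>}, {<<"o">>, <<"2">>},
                  {<<"x">>, <<"3">>}]},
    [extract(Expected), extract(Reordered), extract(Extra)].
[/erlang]

Only the first row matches. That is fine when every row is written with the same fields in the same order, as mine are, but it is something to watch for with less regular data.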

With that in place, and a little more tidying up of the code, we arrive at:

[erlang]
fun({Doc}) ->
    case proplists:get_value(<<"type">>, Doc) of
        <<"usage">> ->
            [Year, Month, Day, Hour, Minute | _] = proplists:get_value(<<"timestamp">>, Doc),
            lists:foreach(fun(Row) ->
                {[{<<"t">>, TrackingID}, {<<"i">>, In}, {<<"o">>, Out}]} = Row,
                Emit([TrackingID, Year, Month, Day, Hour, Minute], [In, Out])
            end, proplists:get_value(<<"data">>, Doc));
        _ ->
            ok
    end
end.
[/erlang]

That little beauty lets me query the usage of a service at any granularity over data from the last 7 years faster than the browser can render it. Across an HTTP connection to a data source 1000km away. On development hardware.
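
Given how unfriendly the crash dumps are, it can help to exercise a map function in a plain Erlang shell before handing it to CouchDB. The following is only a sketch of one way to do that. Inside CouchDB, Emit is supplied by the view server; here a stand-in Emit is passed in explicitly. The names usage_view_check and map_fun/1 are hypothetical, and the test document is a cut-down version of the one above, written in the tuple-wrapped form the view is assumed to receive.

[erlang]
-module(usage_view_check).
-export([demo/0]).

%% The view body from above, with Emit passed in so it can run outside CouchDB.
map_fun(Emit) ->
    fun({Doc}) ->
        case proplists:get_value(<<"type">>, Doc) of
            <<"usage">> ->
                [Year, Month, Day, Hour, Minute | _] =
                    proplists:get_value(<<"timestamp">>, Doc),
                lists:foreach(
                  fun(Row) ->
                      {[{<<"t">>, TrackingID}, {<<"i">>, In}, {<<"o">>, Out}]} = Row,
                      Emit([TrackingID, Year, Month, Day, Hour, Minute], [In, Out])
                  end,
                  proplists:get_value(<<"data">>, Doc));
            _ ->
                ok
        end
    end.

demo() ->
    Doc = {[{<<"type">>, <<"usage">>},
            {<<"timestamp">>, [<<"2010">>, <<"02">>, <<"28">>, <<"23">>, <<"10">>]},
            {<<"data">>, [{[{<<"t">>, <<"E000005861">>},
                            {<<"i">>, <<"232920">>},
                            {<<"o">>, <<"2365730">>}]}]}]},
    Self = self(),
    %% Stand-in Emit: send each emitted row back to the calling process.
    Emit = fun(Key, Value) -> Self ! {emitted, Key, Value} end,
    Map = map_fun(Emit),
    ok = Map(Doc),
    %% Pick up the single row this one-item document should have emitted.
    receive
        {emitted, Key, Value} -> {Key, Value}
    after 0 ->
        nothing_emitted
    end.
[/erlang]

Running usage_view_check:demo() returns the key and value emitted for the single data item, which makes it easy to check the key layout before building a 120GB index on it.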

7 responses to “Handling JSON Objects in CouchDB Native Erlang Views”

  1. majelbstoat says:

    Stoat – Where?: Handling JSON Objects in CouchDB Native Erlang Views http://bit.ly/96le6E

    This comment was originally posted on Twitter

  2. oliverdaff says:

    Handling JSON Objects in CouchDB Native Erlang Views http://bit.ly/bkk2V9

    This comment was originally posted on Twitter

  3. bogdan says:

    That is very insightful indeed. It saved me from a lot of trouble.

    Thank you,

  4. Rosswart says:

    Thank you for that post. It helped me to figure out how to emit JSON objects from my Erlang view.

    It seems that CouchDB wraps property lists in a tuple to flag them as JSON objects:
    {Proplist} = {[{<<"this">>, <<"prop">>}, {<<"list">>, <<"is a JSON object">>}]}

    So you can use proplists:get_value/2 also deep down in your JSON document.
    In my example I use Get to make the code more readable.

    fun({Doc}) ->
        Get = fun(K, L) -> proplists:get_value(K, L) end,
        case Get(<<"type">>, Doc) of
            <<"usage">> ->
                [Year, Month, Day, Hour, Minute | _] = Get(<<"timestamp">>, Doc),
                lists:foreach(
                    fun({Row}) ->
                        TrackingID = Get(<<"t">>, Row),
                        In = Get(<<"i">>, Row),
                        Out = Get(<<"o">>, Row),
                        Emit([TrackingID, Year, Month, Day, Hour, Minute], [In, Out])
                    end,
                    Get(<<"data">>, Doc));
            _ ->
                ok
        end
    end.

    If you want to emit JSON objects from an Erlang view just do for example

    Emit(<<"Example">>, {[{<<"prop">>, <<"list example">>}]})

    Rosswart

    • Rosswart says:

      You can place the unwrapping in the helper function
      to further reduce the code:

      Get = fun(K, {P}) -> proplists:get_value(K, P) end

  5. […] due to a few helpful resources on the internet, I was able to get a simple view going. A typical record looks like […]

  6. […] There isn't a lot written on or off the internet on writing CouchDB map/reduce functions in erlang. First you have to enable the native erlang view server, which requires a small change to the local.ini. This post on the CouchDB wiki also provides the first example code for doing map/reduce with Erlang. Next, should check out this echolibre blog post on writing erlang map/reduce functions. Another useful post is this one on handling JSON objects in the erlang views. […]
