<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stoat - Where? &#187; JSON</title>
	<atom:link href="http://jamietalbot.com/tag/json/feed/" rel="self" type="application/rss+xml" />
	<link>http://jamietalbot.com</link>
	<description>Adventures in Engrish</description>
	<lastBuildDate>Sat, 29 Oct 2011 05:14:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Using Multiple Start and End Keys for CouchDB Views</title>
		<link>http://jamietalbot.com/2010/03/24/using-multiple-start-and-end-keys-for-couchdb-views/</link>
		<comments>http://jamietalbot.com/2010/03/24/using-multiple-start-and-end-keys-for-couchdb-views/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 00:55:57 +0000</pubDate>
		<dc:creator>Jamie Talbot</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[LinkedIn]]></category>

		<guid isPermaLink="false">http://jamietalbot.com/?p=69</guid>
		<description><![CDATA[Describing how to patch CouchDB 0.10.1 to query views with multiple start and end key ranges in one request.]]></description>
			<content:encoded><![CDATA[<p>CouchDB view collation is great, and has only one drawback that has caused me real pain: the inability to handle queries that need to be parameterised by more than one dimension.  These are surprisingly common, and include problems such as &#8220;find me posts in Category A in March&#8221;.</p>
<p>This can be handled with a function that emits keys like:</p>
<pre class="brush: jscript; title: ; notranslate">
[&quot;Category A&quot;, &quot;2010&quot;, &quot;03&quot;, &quot;Post 1&quot;]
[&quot;Category B&quot;, &quot;2010&quot;, &quot;03&quot;, &quot;Post 2&quot;]
[&quot;Category A&quot;, &quot;2010&quot;, &quot;03&quot;, &quot;Post 3&quot;]
</pre>
<p>and then querying with:</p>
<pre class="brush: jscript; title: ; notranslate">
startkey=[&quot;Category A&quot;,&quot;2010&quot;,&quot;03&quot;]&amp;endkey=[&quot;Category A&quot;,&quot;2010&quot;, &quot;03&quot;,{}]
</pre>
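<p>To see why the trailing {} in the endkey works, here is a minimal JavaScript sketch of the range check.  The compareKeys and inRange functions are hypothetical stand-ins that only handle strings and objects; real CouchDB collation is ICU-based and covers every JSON type, so treat this as an illustration rather than the actual algorithm.</p>

```javascript
// Simplified stand-in for CouchDB view collation. It only knows about the
// two value types used in this post: strings, and objects such as {},
// which sort after strings. This is an assumption made for illustration.
function rank(value) {
  return typeof value === "string" ? 0 : 1;
}

// Compare two array keys element by element, like a dictionary ordering.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i !== shared; i += 1) {
    const byType = rank(a[i]) - rank(b[i]);
    if (byType !== 0) return byType;
    if (rank(a[i]) === 0) {
      if (a[i] !== b[i]) return a[i] > b[i] ? 1 : -1;
    }
  }
  // A shorter array sorts before any longer array that extends it.
  return a.length - b.length;
}

// True when startkey sorts at or before key, and key at or before endkey.
function inRange(key, startkey, endkey) {
  if (compareKeys(key, startkey) >= 0) {
    return compareKeys(endkey, key) >= 0;
  }
  return false;
}

const emitted = [
  ["Category A", "2010", "03", "Post 1"],
  ["Category B", "2010", "03", "Post 2"],
  ["Category A", "2010", "03", "Post 3"],
];

const matches = emitted.filter((key) =>
  inRange(key, ["Category A", "2010", "03"], ["Category A", "2010", "03", {}])
);

console.log(matches.map((key) => key[3]));
```

<p>Because {} sorts after any string, the endkey sits just past every key that begins with the same three components, so only the Category A posts for March fall inside the range.</p>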
<p>However, finding its converse, &#8220;All March posts regardless of category&#8221;, is problematic.  You can&#8217;t do:</p>
<pre class="brush: jscript; title: ; notranslate">
startkey=[*,&quot;2010&quot;, &quot;03&quot;]&amp;endkey=[*,&quot;2010&quot;, &quot;03&quot;,{}]
</pre>
<p>where * (or _,  or nil, or pass) would represent &#8220;all&#8221;.</p>
<p>To handle this, there are currently only two options: design a new view with the key components ordered differently, so that it emits:</p>
<pre class="brush: jscript; title: ; notranslate">
[&quot;2010&quot;, &quot;03&quot;, &quot;Post 1&quot;]
[&quot;2010&quot;, &quot;03&quot;, &quot;Post 2&quot;]
[&quot;2010&quot;, &quot;03&quot;, &quot;Post 3&quot;]
</pre>
<p>or make multiple connections to the database, like:</p>
<pre class="brush: jscript; title: ; notranslate">
startkey=[&quot;Category A&quot;,&quot;2010&quot;,&quot;03&quot;]&amp;endkey=[&quot;Category A&quot;,&quot;2010&quot;, &quot;03&quot;,{}]
startkey=[&quot;Category B&quot;,&quot;2010&quot;,&quot;03&quot;]&amp;endkey=[&quot;Category B&quot;,&quot;2010&quot;, &quot;03&quot;,{}]
startkey=[&quot;Category C&quot;,&quot;2010&quot;,&quot;03&quot;]&amp;endkey=[&quot;Category C&quot;,&quot;2010&quot;, &quot;03&quot;,{}]
</pre>
<p>where you have a query for each category.</p>
<p>Neither approach is particularly satisfactory.  On one recent problem set, a single view would be many hundreds of gigabytes of data, and while space is cheap, it&#8217;s not that cheap, so additional views were not an option.  That same data set contained around 2000 different categories (or their equivalent), and 2000 connections for a single query seemed excessive.</p>
<p>Since 0.9, Couch has had a way of passing multiple keys to a query in the POST body of a view request.  Unfortunately, this only supports precise keys, not start-end key ranges.  There has been <a href="https://issues.apache.org/jira/browse/COUCHDB-523">a ticket</a> in the issue tracker to add this additional support since October, but it&#8217;s classed as a minor priority and nothing had been done on it.  So I decided to have a crack.</p>
<p>On the face of it, it seems like a fairly simple change, only affecting the HTTP View Erlang module.  On the other hand, I&#8217;ve probably written about 100 lines of Erlang in my life and never looked at the CouchDB code before, so it&#8217;s entirely possible I&#8217;ve done something wrong.  Regardless, the following is a simple solution that appears to work correctly.</p>
<p>The output_map_view and output_reduce_view functions already had the ability to handle start and end keys, but they were artificially restricted to treating each supplied key as both the start and the end.  I used Erlang&#8217;s pattern matching to make this a little richer:</p>
<pre class="brush: erlang; title: ; notranslate">
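case Key of
	% A JSON object with startkey and endkey members describes a range.
	{[{&lt;&lt;&quot;startkey&quot;&gt;&gt;,StartKey},{&lt;&lt;&quot;endkey&quot;&gt;&gt;,EndKey}]} -&gt;
		nil;
	% Anything else is a precise key, used as both ends of the range.
	_ -&gt;
		StartKey = Key,
		EndKey = Key
end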
</pre>
<p>and then passed those new variables in the appropriate places.  This seemed to work well.  I presume that each entry in the Keys parameter is processed just like a separate connection, and the results then aggregated, because the results are exactly the same as those of a call with the same parameters in the query string.</p>
<p>One final change: group_level=X is mysteriously disallowed for multi-key queries.  I took a punt and removed this restriction, and it all seemed to work fine.  I can only guess that the restriction was there because grouping didn&#8217;t make sense when you had to pass precise keys.</p>
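<p>For readers who don&#8217;t speak Erlang, the same decision can be sketched in JavaScript.  The isRange and toRange names are hypothetical and are not part of the patch; the sketch only assumes what the case expression above shows, namely that an object with exactly a startkey member followed by an endkey member is a range, and anything else is a precise key.</p>

```javascript
// True only for a plain object whose members are exactly startkey then
// endkey, in that order. The Erlang pattern matches a two-element list,
// so it is order-sensitive in the same way.
function isRange(key) {
  if (key === null || typeof key !== "object" || Array.isArray(key)) {
    return false;
  }
  return Object.keys(key).join(",") === "startkey,endkey";
}

// Turn one entry of the "keys" array into an explicit start/end pair.
function toRange(key) {
  if (isRange(key)) {
    return { startkey: key.startkey, endkey: key.endkey };
  }
  // A precise key is used as both ends of the range.
  return { startkey: key, endkey: key };
}

const precise = toRange(["Category A", "2010", "03", "Post 1"]);
const ranged = toRange({
  startkey: ["Category A", "2010", "03"],
  endkey: ["Category A", "2010", "03", {}],
});

console.log(JSON.stringify(precise));
console.log(JSON.stringify(ranged));
```

<p>Either way, the rest of the view code only ever sees a start key and an end key, which is why the change could stay so small.</p>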
<p>I then query using the following as POST data:</p>
<pre class="brush: jscript; title: ; notranslate">
{
    &quot;keys&quot;: [
        {
            &quot;startkey&quot;: [&quot;Category A&quot;,&quot;2010&quot;,&quot;03&quot;],
            &quot;endkey&quot;: [&quot;Category A&quot;,&quot;2010&quot;,&quot;03&quot;,{}]
        },
        {
            &quot;startkey&quot;: [&quot;Category B&quot;,&quot;2010&quot;,&quot;03&quot;],
            &quot;endkey&quot;: [&quot;Category B&quot;,&quot;2010&quot;,&quot;03&quot;,{}]
        }
    ]
}
</pre>
<p>With this solution, I&#8217;m able to query 2000 services simultaneously, group them at any level I like, and get back the results at the lightning speed I&#8217;ve become accustomed to.</p>
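<p>Writing 2000 of those entries by hand isn&#8217;t practical, so the POST body is best generated.  Here is a minimal sketch; buildRangeKeys is a hypothetical helper name, and the resulting body is only understood by a CouchDB that has the patch described here applied.</p>

```javascript
// Build the POST body for one month across many categories.
// Each category becomes one startkey/endkey range entry.
function buildRangeKeys(categories, year, month) {
  return {
    keys: categories.map((category) => ({
      startkey: [category, year, month],
      endkey: [category, year, month, {}],
    })),
  };
}

const body = buildRangeKeys(["Category A", "Category B"], "2010", "03");

console.log(JSON.stringify(body, null, 4));
```

<p>Printing the result for the two categories gives the same structure as the hand-written body above.</p>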
<p>One small caveat.  I can get back keys across non-contiguous blocks like this:</p>
<pre class="brush: jscript; title: ; notranslate">
startkey=[&quot;Category A&quot;,&quot;2010&quot;,&quot;03&quot;]&amp;endkey=[&quot;Category A&quot;,&quot;2010&quot;, &quot;03&quot;,{}]
startkey=[&quot;Category A&quot;,&quot;2010&quot;,&quot;06&quot;]&amp;endkey=[&quot;Category A&quot;,&quot;2010&quot;, &quot;06&quot;,{}]
startkey=[&quot;Category B&quot;,&quot;2010&quot;,&quot;03&quot;]&amp;endkey=[&quot;Category B&quot;,&quot;2010&quot;, &quot;03&quot;,{}]
startkey=[&quot;Category B&quot;,&quot;2010&quot;,&quot;06&quot;]&amp;endkey=[&quot;Category B&quot;,&quot;2010&quot;, &quot;06&quot;,{}]
</pre>
<p>which returns all posts in Categories A and B in March and June.  However, if I have a reduce function and group at level 1, I still end up with 4 rows: 2 for Category A and 2 for Category B.  I think this is because the queries are run independently, without reference to one another.  To do a full aggregation across time periods (for example, to get the total number of posts by category in March and June), I&#8217;d still need to aggregate the resulting data set on the client.  This may or may not be a big problem for you; it&#8217;s certainly something I can live with.</p>
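<p>That client-side step is small.  The sketch below merges rows that share a key by adding their values, so it assumes a reduce function whose results can simply be summed, such as a count.  The mergeRows name and the row values are made up for illustration; only the key/value row shape comes from CouchDB.</p>

```javascript
// Combine reduce rows that share the same key by summing their values.
// Keys are arrays, so they are serialised to make them usable as Map keys.
function mergeRows(rows) {
  const totals = new Map();
  for (const row of rows) {
    const id = JSON.stringify(row.key);
    totals.set(id, (totals.get(id) || 0) + row.value);
  }
  return Array.from(totals, ([id, value]) => ({ key: JSON.parse(id), value }));
}

// Hypothetical result of the four range queries at group level 1.
const rows = [
  { key: ["Category A"], value: 4 },
  { key: ["Category A"], value: 2 },
  { key: ["Category B"], value: 7 },
  { key: ["Category B"], value: 1 },
];

const merged = mergeRows(rows);

console.log(JSON.stringify(merged));
```

<p>A reduce that isn&#8217;t a plain sum (an average, say) would need its own way of combining rows.</p>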
<p>The CouchDB issue lives <a href="https://issues.apache.org/jira/browse/COUCHDB-523">here</a>, and the patch to 0.10.1 lives <a href="https://issues.apache.org/jira/secure/attachment/12439618/multi_start_end_key.diff">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamietalbot.com/2010/03/24/using-multiple-start-and-end-keys-for-couchdb-views/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Handling JSON Objects in CouchDB Native Erlang Views</title>
		<link>http://jamietalbot.com/2010/03/18/handling-json-objects-in-couchdb-native-erlang-views/</link>
		<comments>http://jamietalbot.com/2010/03/18/handling-json-objects-in-couchdb-native-erlang-views/#comments</comments>
		<pubDate>Thu, 18 Mar 2010 05:41:02 +0000</pubDate>
		<dc:creator>Jamie Talbot</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[LinkedIn]]></category>

		<guid isPermaLink="false">http://jamietalbot.com/?p=63</guid>
		<description><![CDATA[Using CouchDB Erlang views can be confusing when your documents contain JSON objects.  Understanding how Couch processes JSON internally and making use of Erlang pattern matching smooths the way.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working with CouchDB a fair bit in recent weeks and am really enjoying it so far.  Once I got my head around how to structure views and take advantage of view collation, I found it to be far more expressive than I first thought.</p>
<p>I still have a couple of gripes, the largest of which is that you can&#8217;t use a wildcard parameter at the beginning of your view keys, so if you need to get &#8220;items by user by category&#8221; and &#8220;items by category by user&#8221;, you need two views.  I&#8217;m sure there are good architectural reasons for this, but for me it&#8217;s the one place where collation lets me down.  For at least one of the solutions I&#8217;m working on, multiple views are a major problem, as even one takes up 120GB (and counting).</p>
<p>But, to the main point.  Native Erlang views are now possible and, if you can create them, potentially significantly faster than JavaScript ones.  There are a couple of gotchas though, not least for me the handling of JSON objects.</p>
<p>We start with a document like this:</p>
<pre class="brush: jscript; title: ; notranslate">
{
   &quot;_id&quot;: &quot;36kem&quot;,
   &quot;_rev&quot;: &quot;1-c895d5a55945a9898880bf870a3b3025&quot;,
   &quot;type&quot;: &quot;usage&quot;,
   &quot;timestamp&quot;: [
       &quot;2010&quot;,
       &quot;02&quot;,
       &quot;28&quot;,
       &quot;23&quot;,
       &quot;10&quot;
   ],
   &quot;data&quot;: [
       {
           &quot;t&quot;: &quot;E000005861&quot;,
           &quot;i&quot;: &quot;232920&quot;,
           &quot;o&quot;: &quot;2365730&quot;
       },
       {
           &quot;t&quot;: &quot;E000006504&quot;,
           &quot;i&quot;: &quot;15784&quot;,
           &quot;o&quot;: &quot;17786&quot;
       },
       {
           &quot;t&quot;: &quot;E000006505&quot;,
           &quot;i&quot;: &quot;16661&quot;,
           &quot;o&quot;: &quot;17786&quot;
       }
   ]
}
</pre>
<p>In reality there are thousands of entries in the data array, but this will do.  Our aim is to emit one key-value pair for each item in the &#8220;data&#8221; field of each document of type &#8220;usage&#8221;.  In JavaScript this is pretty trivial.  Erlang, however, proves more of a challenge.</p>
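<p>For comparison, a JavaScript map function for this might look like the sketch below.  The key and value layout is the one the Erlang view in this post ends up using.  The emit function and the rows array defined here are stand-ins so the sketch can run outside CouchDB, where emit is supplied by the view server.</p>

```javascript
// Stand-in for the emit function that CouchDB provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Emit one row per entry in the "data" array of each "usage" document.
function map(doc) {
  if (doc.type !== "usage") return;
  const [year, month, day, hour, minute] = doc.timestamp;
  doc.data.forEach((row) => {
    emit([row.t, year, month, day, hour, minute], [row.i, row.o]);
  });
}

// A cut-down copy of the example document above.
map({
  _id: "36kem",
  type: "usage",
  timestamp: ["2010", "02", "28", "23", "10"],
  data: [
    { t: "E000005861", i: "232920", o: "2365730" },
    { t: "E000006504", i: "15784", o: "17786" },
  ],
});

console.log(JSON.stringify(rows, null, 2));
```

<p>Each object in the data array is just a property lookup away here, which is exactly the part that takes more care in Erlang.</p>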
<p>Based on pointers from the <a href="http://wiki.apache.org/couchdb/EnableErlangViews">CouchDB Wiki</a>, I started with:</p>
<pre class="brush: erlang; title: ; notranslate">
fun ({Doc}) -&gt;
  case proplists:get_value(&lt;&lt;&quot;type&quot;&gt;&gt;, Doc) of
    &lt;&lt;&quot;usage&quot;&gt;&gt; -&gt;
      Emit(proplists:get_value(&lt;&lt;&quot;_id&quot;&gt;&gt;, Doc), null);
    _ -&gt;
      ok
  end
end.
</pre>
<p>and was very happy to see that work.  Two things to note here: don&#8217;t forget the {} around the Doc in the function definition or you&#8217;ll get strange errors; and, to get the value of a field in a document, you can use the standard proplists:get_value(&lt;&lt;&quot;fieldname&quot;&gt;&gt;, Doc) construct.  So far so good.</p>
<p>The main issue for me came with manipulating the &#8220;data&#8221; field.  I didn&#8217;t actually want to emit null, but instead the &#8220;i&#8221; and &#8220;o&#8221; parts of the data field.  First off, I tried:</p>
<pre class="brush: erlang; title: ; notranslate">
  lists:foreach(fun(Item) -&gt; Emit(null, [proplists:get_value(&lt;&lt;&quot;i&quot;&gt;&gt;, Item), proplists:get_value(&lt;&lt;&quot;o&quot;&gt;&gt;, Item)]) end, proplists:get_value(&lt;&lt;&quot;data&quot;&gt;&gt;, Doc))
</pre>
<p>But I was met with some (very long) errors.  (Gripe number two &#8211; they could really do with humanising the Erlang crash dump.)</p>
<p>It took me quite a few attempts, including stripping it right back to confirm that I had an array to iterate and that each object does in fact contain an &#8220;i&#8221; and an &#8220;o&#8221; field, before I found the problem, which is this:</p>
<p><strong>Even though Documents are defined within {} braces, and JSON objects within that definition are also defined within {} braces, you cannot access them the same way in an Erlang view.</strong></p>
<p>proplists:get_value(&lt;&lt;&quot;field&quot;&gt;&gt;, Doc) is fine for the document as a whole, but you can&#8217;t access JSON objects the same way.  Bad assumption on my part.  Luckily, the answer I got to another <a href="http://stackoverflow.com/questions/2414811/emit-tuples-from-erlang-views-in-couchdb/2422631#2422631">Stack Overflow question</a> recently pointed the way.</p>
<p>To access the data we need to pattern match the components using the Erlang representation of a JSON object, like so:</p>
<pre class="brush: erlang; title: ; notranslate">
  {[{&lt;&lt;&quot;t&quot;&gt;&gt;, TrackingID},{&lt;&lt;&quot;i&quot;&gt;&gt;, In},{&lt;&lt;&quot;o&quot;&gt;&gt;, Out}]} = Row
</pre>
<p>Ugly, hey? <img src='http://jamietalbot.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   Useful though, as it extracts the TrackingID, In and Out values all in one go, kind of like a <code>list()</code> statement on steroids.</p>
<p>With that in place, and a little more tidying up of the code, we arrive at:</p>
<pre class="brush: erlang; title: ; notranslate">
fun({Doc}) -&gt;
	case proplists:get_value(&lt;&lt;&quot;type&quot;&gt;&gt;, Doc) of
		&lt;&lt;&quot;usage&quot;&gt;&gt; -&gt;
			[Year, Month, Day, Hour, Minute | _] = proplists:get_value(&lt;&lt;&quot;timestamp&quot;&gt;&gt;, Doc),
			lists:foreach(fun(Row) -&gt;
				{[{&lt;&lt;&quot;t&quot;&gt;&gt;, TrackingID},{&lt;&lt;&quot;i&quot;&gt;&gt;, In},{&lt;&lt;&quot;o&quot;&gt;&gt;, Out}]} = Row,
				Emit([TrackingID, Year, Month, Day, Hour, Minute],[In, Out])
			end, proplists:get_value(&lt;&lt;&quot;data&quot;&gt;&gt;, Doc));
		_ -&gt;
			ok
	end
end.
</pre>
<p>That little beauty lets me query the usage of a service at any granularity over data from the last 7 years in a faster time than the browser can render it.  Across an HTTP connection to a data source 1000km away.  On development hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://jamietalbot.com/2010/03/18/handling-json-objects-in-couchdb-native-erlang-views/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

