<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Graham Nott Technology Blog &#187; Solr</title>
	<atom:link href="http://www.grahamnott.com/category/software-development/solr/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.grahamnott.com</link>
	<description></description>
	<lastBuildDate>Wed, 19 Sep 2012 18:27:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Government of Canada Open Data</title>
		<link>http://www.grahamnott.com/2012/03/data_gc_ca/</link>
		<comments>http://www.grahamnott.com/2012/03/data_gc_ca/#comments</comments>
		<pubDate>Sat, 17 Mar 2012 00:06:12 +0000</pubDate>
		<dc:creator>graham</dc:creator>
				<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[canada]]></category>
		<category><![CDATA[dataset]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://www.grahamnott.com/?p=73</guid>
		<description><![CDATA[Who loves data? ... I can't hear you! Who loves data? Everyone loves data, and the Government has lots of it. In this case, I'm referring to the Government of Canada. Celebrating its one-year anniversary, the Open Data Pilot Project is, well, one year old now. I would like to thank David Eaves for [...]]]></description>
			<content:encoded><![CDATA[<p>Who loves data? ... I can't hear you! Who loves data? Everyone loves data, and the Government has lots of it. In this case, I'm referring to the Government of Canada. Celebrating its one-year anniversary, the <a title="Open Data Pilot Project" href="http://www.data.gc.ca" target="_blank">Open Data Pilot Project</a> is, well, one year old now. I would like to thank David Eaves for promoting it via his blog (<em>Sharing ideas about data.gc.ca</em>; <a title="Sharing ideas about data.gc.ca" href="http://eaves.ca/2012/03/15/sharing-ideas-about-data-gc-ca" target="_blank">http://eaves.ca/2012/03/15/sharing-ideas-about-data-gc-ca</a>).</p>
<p>This is a very large topic, too large to cover in a single blog post. Instead, I'll share what I did yesterday and today while playing with the dataset named "<a title="Data.gc.ca Portal Catalogue" href="http://www.data.gc.ca/default.asp?lang=En&amp;n=5175A6F0-1&amp;xsl=datacataloguerecord&amp;xml=5175A6F0-61E1-49FC-8E5D-0BBCDAF5969D&amp;formid=C4C5C7F1-BFA6-4FF6-B4A0-C164CB2060F7&amp;showfromadmin=1&amp;readonly=true" target="_blank">Data.gc.ca Portal Catalogue</a>".</p>
<p>I'm sure it will be fixed soon enough, but the file itself seems to have character encoding problems. It is in the "latin1" character set, but has Unicode characters crammed into it. It took me hours of grief over two days to diagnose the problem and convert the file into something that displays mostly correctly. If you are a MySQL user (or use any other SQL database), this basic UTF-8 SQL file should save you the trouble of importing and converting it yourself: <a title="data_gc_ca-all_datasets.zip" href="http://www.grahamnott.com/data_gc_ca/data/data_gc_ca-all_datasets.zip" target="_blank">data_gc_ca-all_datasets.zip</a> (830KB zip file, 11.5MB expanded)</p>
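<p>For anyone hitting a similar problem, here is a minimal Python sketch of one common repair. It assumes the damage is the classic kind where UTF-8 bytes were read as latin1, which may not be exactly what happened to this file, and it is not the procedure I used to build the SQL file above. The function name <code>repair_mojibake</code> is made up for this illustration.</p>

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as latin1' damage.

    Assumption: the garbling came from reading UTF-8 bytes as latin1.
    Text that does not fit that pattern is returned unchanged.
    """
    try:
        # Map each character back to its original byte, then decode properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character lies outside latin1, or the bytes are not
        # valid UTF-8, so this particular repair does not apply.
        return text


# "\u00e0" (a with grave accent) is two bytes in UTF-8; read as latin1
# those bytes show up as the two characters "\u00c3" and "\u00a0".
garbled = "coupe \u00c3\u00a0 blanc"
print(repair_mojibake(garbled))
```

<p>Text that is already clean passes through untouched, so the function can be run over every field of a table without checking each value first.</p>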
<p>The SQL format was only an intermediate step in my plans. As a test, I wanted to load this simple dataset into <strong>Solr</strong> for searching. You can try the result here:</p>
<div style="border: solid 1px black; padding-left: 10px;">
<h1>Government of Canada Open Data Solr Search* v0.1</h1>
<h3><a title="Government of Canada Open Data Solr Search* v0.1" href="http://www.grahamnott.com/data_gc_ca/" target="_blank">http://www.grahamnott.com/data_gc_ca/</a></h3>
</div>
<p>So far, this does little more than the current Open Data search. In increasing order of complexity (and infeasibility), the improvements could include:</p>
<ul>
<li><strong>Search filtering by category, date, etc.</strong>
<ul>
<li>It would be similar to filtering auctions on eBay, where you click links on the left side to refine your search</li>
</ul>
</li>
<li><strong>Advanced search</strong>
<ul>
<li>You could search only the French Description field, for example</li>
</ul>
</li>
<li><strong>Modifications to relevance searching (by popularity), spellchecking, etc.</strong>
<ul>
<li>Popular datasets would appear at the top of the search results, for example</li>
</ul>
</li>
<li><strong>Full text search over the entire contents of every dataset</strong>
<ul>
<li>You can search for "Silvicultural", for example, and find dataset <a href="http://www.data.gc.ca/default.asp?lang=En&amp;n=5175A6F0-1&amp;xsl=datacataloguerecord&amp;metaxsl=datacataloguerecord&amp;formid=645C42BA-5DC3-412E-AA3C-E37995354CB8" target="_blank">645C42BA-5DC3-412E-AA3C-E37995354CB8</a>, but searching for "Clearcutting" (or "<em>coupe à blanc</em>") would also find this dataset, <strong>if</strong> the .csv file for the dataset were keyword-indexed too</li>
</ul>
</li>
</ul>
<p>Filtering by category or date is not possible yet, since those fields do not exist in the dataset. It would be nice to have a "Date Added" field to indicate when the dataset first appeared on Data.gc.ca, so that newly added datasets could be noticed easily. That would be in addition to adding "Subject Area", "Creator", and many other fields.</p>
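<p>If the catalogue did gain such fields, filtering and field-specific searches would map onto standard Solr request parameters (<code>q</code>, <code>fq</code>, <code>facet</code> and <code>facet.field</code>). The Python sketch below only builds the request URL. The endpoint and the field names <code>category</code> and <code>description_fr</code> are hypothetical, chosen for illustration, and are not taken from the index behind the search page above.</p>

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to wherever your Solr server lives.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_search_url(keywords, category=None, field=None):
    """Build a Solr select URL with optional field search and category filter.

    'category' and any value passed as 'field' are assumed field names;
    they only work if the Solr schema actually defines them.
    """
    # Restrict the query to one field (advanced search) when requested.
    query = f"{field}:({keywords})" if field else keywords
    params = [
        ("q", query),
        ("wt", "json"),
        ("facet", "true"),            # ask Solr for facet counts
        ("facet.field", "category"),  # counts per category for the sidebar links
    ]
    if category:
        # A filter query narrows the results without affecting relevance scores.
        params.append(("fq", f'category:"{category}"'))
    return SOLR_SELECT + "?" + urlencode(params)


print(build_search_url("coupe", category="Forestry", field="description_fr"))
```

<p>The facet counts returned for each category are what would drive eBay-style refinement links, and clicking one would simply add the matching <code>fq</code> filter to the next request.</p>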
<p>Other issues and questions arise when working with this data, but I'll save those for another time.</p>
<p>If you try out the MySQL table or the Solr search, please leave a comment with your feedback, questions, or any errors you find.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.grahamnott.com/2012/03/data_gc_ca/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blog search using Solr</title>
		<link>http://www.grahamnott.com/2011/10/blog-search-using-solr/</link>
		<comments>http://www.grahamnott.com/2011/10/blog-search-using-solr/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 19:31:44 +0000</pubDate>
		<dc:creator>graham</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.grahamnott.com/?p=51</guid>
		<description><![CDATA[Update January 16, 2012: Thanks to those who attended the presentation at WordCamp. I have uploaded a revised set of presentation slides in PDF format that includes screen captures of the demonstration portions. If anyone has more questions, please leave comments on this post and I will try to answer them. blog-search-using-solr (PDF; 1.8 MB) [...]]]></description>
			<content:encoded><![CDATA[<p><strong><a href="http://www.grahamnott.com/wp-content/uploads/2011/10/250-speaking.png"><img class="size-full wp-image-55 alignleft" style="margin-right: 5px; display: inline; float: left;" title="250-speaking" src="http://www.grahamnott.com/wp-content/uploads/2011/10/250-speaking.png" alt="I'm speaking at WordCamp Victoria - January 14, 2012 - University of Victoria" width="250" height="250" /></a></strong></p>
<p><strong>Update January 16, 2012</strong>: Thanks to those who attended the presentation at WordCamp. I have uploaded a revised set of presentation slides in PDF format that includes screen captures of the demonstration portions. If anyone has more questions, please leave comments on this post and I will try to answer them. <a title="Blog search using Solr presentation PDF" href="http://www.grahamnott.com/wp-content/uploads/2011/10/blog-search-using-solr.pdf" target="_blank">blog-search-using-solr</a> (PDF; 1.8 MB)</p>
<p><strong>Update January 2012</strong>: I am pleased to be presenting "Blog Search Using Solr" at <a title="Wordcamp Victoria 2012" href="http://2012.victoria.wordcamp.org/" target="_blank">WordCamp Victoria 2012</a>. As of today there may still be tickets available, so hurry and get yours now if you haven't already. See you there!</p>
<p><strong>Original topic description:</strong> If you find the default search in WordPress too basic, you may benefit from installing a Solr for WordPress plugin. Harnessing the power of the <a href="http://lucene.apache.org/">Lucene</a> search engine behind the <a href="http://lucene.apache.org/solr/">Solr</a> server sounds daunting, even to me. The search technology used by Digg, Netflix and Acquia (and others: <a href="http://wiki.apache.org/solr/PublicServers">http://wiki.apache.org/solr/PublicServers</a>) could be yours. By trying it, you can judge for yourself whether the advantages outweigh the disadvantages. Features like faceted search and keyword highlighting are just the start. Once you see the plugin in action, you will understand the untapped search potential it provides.</p>
<p>This blog is running Solr for WordPress right now, though it is still a work in progress (so hopefully it still works when you try it).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.grahamnott.com/2011/10/blog-search-using-solr/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
