<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jonathan Hedley &#187; semantic analysis</title>
	<atom:link href="http://jonathanhedley.com/tag/semantic-analysis/feed" rel="self" type="application/rss+xml" />
	<link>http://jonathanhedley.com</link>
	<description>Winning at everything so that you don&#039;t have to.</description>
	<lastBuildDate>Wed, 18 Aug 2010 10:25:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Review: Programming Collective Intelligence</title>
		<link>http://jonathanhedley.com/articles/2008/05/programming-collective-intelligence</link>
		<comments>http://jonathanhedley.com/articles/2008/05/programming-collective-intelligence#comments</comments>
		<pubDate>Sun, 04 May 2008 06:44:29 +0000</pubDate>
		<dc:creator>Jonathan Hedley</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[reading list]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[semantic analysis]]></category>
		<category><![CDATA[web development]]></category>

		<guid isPermaLink="false">http://jonathanhedley.com/?p=132</guid>
		<description><![CDATA[Programming Collective Intelligence is a book about applying data mining techniques to analyse collections of data. There is submerged information in eBay prices, in Facebook profile networks, in collections of movie reviews, in news sites, in the stock market; this book by Toby Segaran shows ways to extract, visualise, understand, and predict that information.]]></description>
			<content:encoded><![CDATA[<div class="left-pull thumb"><a href="http://www.amazon.com/dp/0596529325?tag=904351-20&amp;camp=0&amp;creative=0&amp;linkCode=as1&amp;creativeASIN=0596529325&amp;adid=0D9S71JN6Q4F6ZSA3V5R&amp;"><img src="http://static.jonathanhedley.com/2008/05/programming-collective-intelligence2.jpg" border="0" alt="programming collective intelligence" width="208" height="274" /></a></div>
<p><a href="http://www.amazon.com/dp/0596529325?tag=904351-20&amp;camp=0&amp;creative=0&amp;linkCode=as1&amp;creativeASIN=0596529325&amp;adid=0D9S71JN6Q4F6ZSA3V5R&amp;">Programming Collective Intelligence</a> is a book about applying data mining techniques to analyse collections of data. There is submerged information in eBay prices, in Facebook profile networks, in collections of movie reviews, in news sites, in the stock market; this book by <span class="ts">Toby Segaran</span> shows ways to extract, visualise, understand, and predict that information.</p>
<p>Each chapter explains and explores a different data mining algorithm and builds up a working example in Python, presenting different methods and parameters for the implementation along the way. I hadn&#8217;t really worked with Python before, but I found the code easy to follow and picked up some interesting Python idioms that I hadn&#8217;t seen in other languages. Each chapter ends with a set of exercises that extend your understanding.</p>
<p>As you work through the examples, you build up a reasonably generic code base that lets you swap different implementations in and out, and reuse earlier code in new applications.</p>
<p>The examples use live data from the web, from sites like eBay, Facebook, and Yahoo Finance, which makes the book more interesting and the results more visceral than other books on the subject that rely on contrived or obscure examples. Even though the examples have a strong web (or Web 2.0) focus, the methods and the understanding they build are useful for a whole range of applications.</p>
<p>Some of the topics covered:</p>
<ul>
<li>Bayesian classifiers to detect spam, or to file news articles into site sections</li>
<li>Hierarchical and k-means clustering to discover groups of similar items in massive data sets</li>
<li>Euclidean distance, Pearson correlation coefficient, Tanimoto coefficient: ways to measure the similarity (or distance) between items</li>
<li>Neural networks to predict user behaviour and improve search result ordering</li>
<li>Optimisation methods like hill climbing, simulated annealing, and genetic algorithms</li>
<li>Non-negative matrix factorization to find independent features in data</li>
<li>Support vector machines and kernel methods to classify data where simple linear classifiers can&#8217;t</li>
</ul>
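<p>The similarity measures in the list above can be sketched in a few lines of Python, the book&#8217;s own language. This is my own minimal illustration, not code from the book: it assumes each item&#8217;s scores are stored in a dict, turns Euclidean distance into a similarity with 1 / (1 + d), and uses the set form of the Tanimoto coefficient (size of the intersection over size of the union). The function names and data are hypothetical.</p>

```python
from math import sqrt

# Hypothetical helpers: scores are dicts mapping item name to a number.

def shared_keys(a, b):
    """Return the items that both score dicts have rated."""
    return [k for k in a if k in b]

def euclidean_similarity(a, b):
    """Map Euclidean distance over shared items to a 0..1 similarity via 1 / (1 + d)."""
    keys = shared_keys(a, b)
    if not keys:
        return 0.0
    distance = sqrt(sum((a[k] - b[k]) ** 2 for k in keys))
    return 1 / (1 + distance)

def pearson(a, b):
    """Pearson correlation coefficient over shared items (-1.0 to 1.0)."""
    keys = shared_keys(a, b)
    n = len(keys)
    if n == 0:
        return 0.0
    mean_a = sum(a[k] for k in keys) / n
    mean_b = sum(b[k] for k in keys) / n
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in keys)
    var_a = sum((a[k] - mean_a) ** 2 for k in keys)
    var_b = sum((b[k] - mean_b) ** 2 for k in keys)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined when one side is constant
    return cov / sqrt(var_a * var_b)

def tanimoto(set_a, set_b):
    """Tanimoto coefficient for sets: size of intersection over size of union."""
    union = set_a.union(set_b)
    if not union:
        return 0.0
    return len(set_a.intersection(set_b)) / len(union)

# Invented example scores: b is exactly twice a, so the two are perfectly correlated.
a = {"x": 1.0, "y": 2.0, "z": 3.0}
b = {"x": 2.0, "y": 4.0, "z": 6.0}
print(pearson(a, b))                                                  # 1.0
print(round(euclidean_similarity(a, b), 3))                           # 0.211
print(tanimoto({"red", "green", "blue"}, {"green", "blue", "pink"}))  # 0.5
```

<p>The example shows why the choice of measure matters: a and b are a long way apart by Euclidean distance, yet Pearson correlation rates them as identical in shape, which is what you want when one reviewer simply scores everything higher than another.</p>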
<p>I found it exciting to read. It&#8217;s one of those books that give you a whole bunch of new ideas for things to build as you go. The presentation is very good: no background is assumed, and it doesn&#8217;t talk down to more experienced readers.</p>
<p>Recommended.</p>
<div class="rhs">
<div class="ts"><a href="http://blog.kiwitobes.com/">The author&#8217;s blog</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jonathanhedley.com/articles/2008/05/programming-collective-intelligence/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Estimating salaries by comparing job ads</title>
		<link>http://jonathanhedley.com/links/2008/04/estimating-salaries-by-comparing-job-ads</link>
		<comments>http://jonathanhedley.com/links/2008/04/estimating-salaries-by-comparing-job-ads#comments</comments>
		<pubDate>Wed, 16 Apr 2008 23:14:23 +0000</pubDate>
		<dc:creator>Jonathan Hedley</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[semantic analysis]]></category>

		<guid isPermaLink="false">http://jonathanhedley.com/?p=99</guid>
		<description><![CDATA[This is clever: job-listings aggregator Indeed estimates a job&#8217;s salary when the ad doesn&#8217;t list one, by comparing its text with ads that do include a salary.
When people search for jobs, they want to put in a salary floor. They don&#8217;t want to see jobs that don&#8217;t at least pay a certain amount. Problem [...]]]></description>
			<content:encoded><![CDATA[<p>This is clever: job-listings aggregator <a href="http://www.indeed.com/">Indeed</a> estimates a job&#8217;s salary when the ad doesn&#8217;t list one, by comparing its text with ads that do include a salary.</p>
<blockquote class="intelligent"><p>When people search for jobs, they want to put in a salary floor. They don&#8217;t want to see jobs that don&#8217;t at least pay a certain amount. Problem is most job listings on the Internet don&#8217;t include salaries.</p>
<p>What Indeed did was built a system that estimates salaries on all jobs.</p></blockquote>
<blockquote><p>We use a proprietary methodology based on an analysis of similar job listings that include salaries. We start by extracting salaries from all job listings containing this information &#8211; about a fifth of the total &#8211; and then estimate salaries for the rest.</p></blockquote>
<p>Example: <a href="http://www.indeed.com/jobs?q=CFO+%24200%2C000&amp;l=new+york%2C+ny">CFO jobs in NYC that pay more than $200k per year</a></p>
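<p>Indeed describes its methodology as proprietary, so what follows is only a guess at the general idea, not their system: a minimal k-nearest-neighbour sketch in Python. It scores each salaried ad by word overlap with an unsalaried ad (Jaccard similarity of the two word sets, my assumption) and averages the salaries of the k closest matches. The ads, salaries, and function names are invented for illustration.</p>

```python
import re

# Invented example ads as (text, annual salary); not real Indeed data.
SALARIED_ADS = [
    ("chief financial officer investment bank new york", 250000),
    ("cfo private equity firm new york", 220000),
    ("junior accountant retail chain", 45000),
    ("senior software engineer python backend", 140000),
]

def words(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(text_a, text_b):
    """Jaccard similarity of the two word sets (0.0 to 1.0)."""
    a, b = words(text_a), words(text_b)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0

def estimate_salary(ad_text, salaried_ads, k=2):
    """Hypothetical estimator: average salary of the k most similar salaried ads."""
    ranked = sorted(salaried_ads, key=lambda ad: similarity(ad_text, ad[0]), reverse=True)
    nearest = ranked[:k]
    if not nearest:
        raise ValueError("need at least one salaried ad to compare against")
    return sum(salary for _, salary in nearest) / len(nearest)

# The two CFO ads are the closest matches, so their salaries are averaged.
print(estimate_salary("cfo new york hedge fund", SALARIED_ADS))  # 235000.0
```

<p>A real system would presumably need much richer features (location and seniority, at least), but the shape is the same: learn from the roughly one in five ads that carry a salary, and estimate for the rest.</p>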
<div class="rhs">
<p class="intelligent">Fred Wilson: <a href="http://avc.blogs.com/a_vc/2008/04/adding-intellig.html">Adding intelligence to search</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jonathanhedley.com/links/2008/04/estimating-salaries-by-comparing-job-ads/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
