<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Disqus - Latest Comments for chl</title><link>http://disqus.com/by/chl/</link><description></description><atom:link href="http://disqus.com/chl/comments.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Fri, 23 Apr 2010 13:29:24 -0000</lastBuildDate><item><title>Re: live.hackr : Gmahte Wiesn</title><link>http://hackr.de/2010/04/23/gmahte-wiesn#comment-46263586</link><description>&lt;p&gt;Although at the moment this looks more like a "Club Like" than a true "Open Like" ...&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Fri, 23 Apr 2010 13:29:24 -0000</pubDate></item><item><title>Re: An Exercise in Species Barcoding</title><link>http://norvig.com/ibol.html#comment-6856925</link><description>&lt;p&gt;I experimented a bit with n-gram histograms, using, as Ryszard suggested, cosine as the similarity measure (instead of the Jensen-Shannon divergence used in the paper mentioned above). After filtering out all n-grams containing either "N" or "-" (to mirror Peter's adaptation of the Levenshtein distance), I get the following correlations (edit distance/cosine) and distinct n-gram counts:&lt;/p&gt;&lt;p&gt;&lt;code&gt;n - r ---- #   &lt;br&gt;1. -0.6722 4    &lt;br&gt;2. -0.8715 16   &lt;br&gt;3. -0.9088 64   &lt;br&gt;4. -0.9383 256  &lt;br&gt;5. -0.9649 967  &lt;br&gt;6. -0.9754 2926 &lt;br&gt;7. -0.9839 6240 &lt;br&gt;8. -0.9882 10299&lt;br&gt;9. 
-0.9900 14202&lt;br&gt;10 -0.9907 18413&lt;br&gt;11 -0.9913 22515&lt;br&gt;12 -0.9916 26555&lt;br&gt;13 -0.9917 31257&lt;br&gt;14 -0.9917 36081&lt;br&gt;15 -0.9915 40961&lt;/code&gt;&lt;/p&gt;&lt;p&gt;Using 7-grams and a cutoff value of 0.81, the neighbourhoods match in 1246 of 1248 cases; calculation of the similarity matrix takes ~11s (thanks, NumPy!).&lt;/p&gt;&lt;p&gt;Maybe it's obvious, well-known or both, but I wouldn't have thought that n-grams correlate with edit distance so strongly (at least in this particular case ;-).&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Tue, 03 Mar 2009 22:13:57 -0000</pubDate></item><item><title>Re: An Exercise in Species Barcoding</title><link>http://norvig.com/ibol.html#comment-6856001</link><description>&lt;p&gt;I think all the machinery you want is in &lt;a href="http://ibol.py" rel="nofollow noopener" target="_blank" title="ibol.py"&gt;ibol.py&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Tue, 03 Mar 2009 21:43:06 -0000</pubDate></item><item><title>Re: An Exercise in Species Barcoding</title><link>http://norvig.com/ibol.html#comment-6624935</link><description>&lt;p&gt;Comparing those procedures for measuring distance would be _very_ interesting, indeed!&lt;/p&gt;&lt;p&gt;As for n-grams, maybe this paper is of interest to you:&lt;/p&gt;&lt;p&gt;Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions&lt;br&gt;Gregory E. Sims, Se-Ran Jun, Guohong A. 
Wu, and Sung-Hou Kim&lt;br&gt;&lt;a href="http://www.pnas.org/content/early/2009/02/02/0813249106.full.pdf+html" rel="nofollow noopener" target="_blank" title="http://www.pnas.org/content/early/2009/02/02/0813249106.full.pdf+html"&gt;http://www.pnas.org/content...&lt;/a&gt;&lt;/p&gt;&lt;p&gt;From the abstract:&lt;/p&gt;&lt;p&gt;"For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison—a variation of a text or book comparison method, using word frequency profiles."&lt;/p&gt;&lt;p&gt;"[...] to illustrate the utility of the method, phylogenies are reconstructed from concatenated mammalian intronic genomes; the FFP derived intronic genome topologies for each l within the optimal range are all very similar. The topology agrees with the established mammalian phylogeny revealing that intron regions contain a similar level of phylogenic signal as do coding regions."&lt;/p&gt;&lt;p&gt;If simple n-gram-based methods turn out to produce interesting results for segments as short as those used in DNA barcoding, that'd be quite exciting (to me, at least ;-).&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Wed, 25 Feb 2009 17:51:18 -0000</pubDate></item><item><title>Re: An Exercise in Species Barcoding</title><link>http://norvig.com/ibol.html#comment-6616801</link><description>&lt;p&gt;Given that it's the "real task", a few more details on the clustering algorithm would be much appreciated.&lt;/p&gt;&lt;p&gt;&lt;i&gt;Update:&lt;/i&gt; Sorry, I didn't realize that all the clustering details I could ever ask for are actually in the Python script mentioned:&lt;/p&gt;&lt;p&gt;&lt;a href="http://norvig.com/ibol.py" rel="nofollow noopener" target="_blank" 
title="http://norvig.com/ibol.py"&gt;http://norvig.com/ibol.py&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Wed, 25 Feb 2009 13:09:45 -0000</pubDate></item><item><title>Re: Gonzo Reader</title><link>http://gonzoreader.110mb.com/credits.html#comment-2401761</link><description>&lt;p&gt;This is probably the best application on all of the internets. Massive KTHX and mega success!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Wed, 17 Sep 2008 13:51:07 -0000</pubDate></item><item><title>Re: Parallax</title><link>http://well-formed-data.net/archives/153/parallax#comment-51532055</link><description>&lt;p&gt;I'm all for entity shift! Or maybe entity pivoting? ;-)&lt;/p&gt;&lt;p&gt;Parallax is a fascinating demonstration for sure; however, a powerful exploration (and query formulation) tool like that makes it all the more obvious how Freebase (still) has a Herculean task in front of them when it comes to data quality &amp;amp; coverage.&lt;/p&gt;&lt;p&gt;Maybe sponsoring DBpedia wouldn't be a bad idea ...&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Thu, 14 Aug 2008 20:51:51 -0000</pubDate></item><item><title>Re: Checking Out Google Trends For Websites</title><link>http://avc.com/2008/06/checking-out-go/#comment-725003</link><description>&lt;p&gt;The services are reporting different things: Google Trends "Daily Unique Visitors", the other two services "Monthly Unique Visitors", it seems (if I'm not misinterpreting labels like "People Counts - Monthly").&lt;/p&gt;&lt;p&gt;The huge disparity could be explained (for example) by &lt;a href="http://indeed.com" rel="nofollow noopener" target="_blank" title="indeed.com"&gt;indeed.com&lt;/a&gt; having a large number of non-repeat visitors (day-to-day). 
The effect would be especially pronounced when comparing to sites like Twitter, where a big fraction of one day's visitors will visit again the day after.&lt;/p&gt;&lt;p&gt;Update: Whoops, I should actually read the comments before posting ...&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Sun, 22 Jun 2008 10:01:36 -0000</pubDate></item><item><title>Re: Delicious 2.0: We&amp;#039;ve Been Waiting 9 Months</title><link>http://techcrunch.com/2008/06/09/delicious-20-weve-been-waiting-9-months/#comment-71747385</link><description>&lt;p&gt;As a fairly heavy user, I'm not exactly holding my breath for 2.0. &lt;a href="http://del.icio.us" rel="nofollow noopener" target="_blank" title="del.icio.us"&gt;del.icio.us&lt;/a&gt; stumbled upon an awesome mix of functionality, and frankly I fear that any (heavy) tampering would rather make it worse.&lt;/p&gt;&lt;p&gt;There's obviously a lot of innovation potential on top of &lt;a href="http://del.icio.us" rel="nofollow noopener" target="_blank" title="del.icio.us"&gt;del.icio.us&lt;/a&gt;; but does that really have to come from Yahoo?&lt;/p&gt;&lt;p&gt;The one thing I'd really like to see is &lt;a href="http://del.icio.us" rel="nofollow noopener" target="_blank" title="del.icio.us"&gt;del.icio.us&lt;/a&gt; uncrippling its API. 
Currently, the request limits are draconian.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Mon, 09 Jun 2008 09:08:47 -0000</pubDate></item><item><title>Re: live.hackr : TTYtter</title><link>http://hackr.de/2007/10/18/ttytter#comment-934105811</link><description>&lt;p&gt;Where did it go?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Thu, 18 Oct 2007 19:56:29 -0000</pubDate></item><item><title>Re: Visual tools for the socio–semantic web</title><link>http://well-formed-data.net/archives/96/visual-tools-for-the-socio%e2%80%93semantic-web#comment-51531778</link><description>&lt;p&gt;Congratulations! Great title, great ideas, great design.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Mon, 11 Jun 2007 13:06:03 -0000</pubDate></item><item><title>Re: Wil Wheaton via Eventful Demand</title><link>http://radar.oreilly.com/2006/05/wil-wheaton-via-eventful-deman.html#comment-587143472</link><description>&lt;p&gt;I'm sure the folks at eventful&lt;b&gt;.org&lt;/b&gt; are pretty happy about all the traffic you're sending them ;-)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Sat, 20 May 2006 22:03:17 -0000</pubDate></item><item><title>Re: Oooo, I like this Idea</title><link>http://beta.searchblog.net/archives/2006/01/oooo-i-like-this-idea.php#comment-509102623</link><description>&lt;p&gt;Greg is (as usual) spot on - "Fast Multiresolution Image Querying" is the paper that guided the implementation (which is, by the way, one of my all-time favourites, and recommended reading for anyone with even a passing interest in image retrieval).&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;I first came across it when someone (I think &lt;a href="http://usefulinc.com/edd/blog" rel="nofollow noopener" target="_blank" 
title="http://usefulinc.com/edd/blog"&gt;Edd Dumbill&lt;/a&gt;) linked to &lt;a href="http://www.imgseek.net" rel="nofollow noopener" target="_blank" title="http://www.imgseek.net"&gt;imgSeek&lt;/a&gt; a couple of years back; imgSeek is a standalone image management application that incorporates the same algorithm. retrievr is a new implementation in pure &lt;a href="http://www.python.org" rel="nofollow noopener" target="_blank" title="http://www.python.org"&gt;Python&lt;/a&gt; (plus a host of great libraries: &lt;a href="http://www.pythonware.com/products/pil/" rel="nofollow noopener" target="_blank" title="http://www.pythonware.com/products/pil/"&gt;PIL&lt;/a&gt;, &lt;a href="http://effbot.org/zone/draw-agg.htm" rel="nofollow noopener" target="_blank" title="http://effbot.org/zone/draw-agg.htm"&gt;aggdraw&lt;/a&gt; and &lt;a href="http://www.stsci.edu/resources/software_hardware/numarray" rel="nofollow noopener" target="_blank" title="http://www.stsci.edu/resources/software_hardware/numarray"&gt;numarray&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;In my experience, the results are usually fairly good, sometimes even stunning - considering the artistic sophistication most of us are able to come up with (gallery forthcoming); and in the cases they're not so stellar, they are at least entertaining ;-) But clearly, the approach has its limits.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;One thing to keep in mind is that it doesn't do object/face/text recognition of any kind, so if you're drawing an outline sketch of a chair (or &lt;a href="http://www.researchbuzz.org/2005/12/searching_flickr_by_drawing_a.shtml" rel="nofollow noopener" target="_blank" title="http://www.researchbuzz.org/2005/12/searching_flickr_by_drawing_a.shtml"&gt;corporate logos&lt;/a&gt;, as Tara Calishain has tried), it almost certainly won't get you one back (unless your index only contains images of chairs). 
It helps to think of it as matching the most pronounced slabs of colors. Another thing to know is that there's currently no way to specify the aspect ratio, so you have to rescale the image in your head (things that are close to the borders of the image you're thinking of should be close to the borders of your sketches), but that's really more a missing feature of the drawing flashlet than an inherent problem. Sometimes it also helps to _remove_ detail instead of adding it. And finally, the index covers only about 85k of Flickr's &lt;a href="http://www.flickr.com/explore/interesting/" rel="nofollow noopener" target="_blank" title="http://www.flickr.com/explore/interesting/"&gt;"most interesting"&lt;/a&gt; images at the moment (I didn't want to use up even more of their resources before checking back with them; it's fantastic enough that Flickr isn't imposing any up-front limits on API usage like &lt;a href="http://www.google.com" rel="nofollow noopener" target="_blank" title="http://www.google.com"&gt;most everyone else&lt;/a&gt; is doing).&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;In a way, I see retrievr less as a "search" tool than an "exploration" tool, and it seems to work very well for that.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Mon, 02 Jan 2006 23:44:57 -0000</pubDate></item><item><title>Re: Oooo, I like this Idea</title><link>http://battellemedia.com/archives/2006/01/oooo_i_like_this_idea.php#comment-335325293</link><description>&lt;p&gt;Greg is (as usual) spot on - "Fast Multiresolution Image Querying" is the paper that guided the implementation (which is, by the way, one of my all-time favourites, and recommended reading for anyone with even a passing interest in image retrieval).&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;I first came across it when someone (I think &lt;a href="http://usefulinc.com/edd/blog" rel="nofollow noopener" target="_blank" 
title="http://usefulinc.com/edd/blog"&gt;Edd Dumbill&lt;/a&gt;) linked to &lt;a href="http://www.imgseek.net" rel="nofollow noopener" target="_blank" title="http://www.imgseek.net"&gt;imgSeek&lt;/a&gt; a couple of years back; imgSeek is a standalone image management application that incorporates the same algorithm. retrievr is a new implementation in pure &lt;a href="http://www.python.org" rel="nofollow noopener" target="_blank" title="http://www.python.org"&gt;Python&lt;/a&gt; (plus a host of great libraries: &lt;a href="http://www.pythonware.com/products/pil/" rel="nofollow noopener" target="_blank" title="http://www.pythonware.com/products/pil/"&gt;PIL&lt;/a&gt;, &lt;a href="http://effbot.org/zone/draw-agg.htm" rel="nofollow noopener" target="_blank" title="http://effbot.org/zone/draw-agg.htm"&gt;aggdraw&lt;/a&gt; and &lt;a href="http://www.stsci.edu/resources/software_hardware/numarray" rel="nofollow noopener" target="_blank" title="http://www.stsci.edu/resources/software_hardware/numarray"&gt;numarray&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;In my experience, the results are usually fairly good, sometimes even stunning - considering the artistic sophistication most of us are able to come up with (gallery forthcoming); and in the cases they're not so stellar, they are at least entertaining ;-) But clearly, the approach has its limits.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;One thing to keep in mind is that it doesn't do object/face/text recognition of any kind, so if you're drawing an outline sketch of a chair (or &lt;a href="http://www.researchbuzz.org/2005/12/searching_flickr_by_drawing_a.shtml" rel="nofollow noopener" target="_blank" title="http://www.researchbuzz.org/2005/12/searching_flickr_by_drawing_a.shtml"&gt;corporate logos&lt;/a&gt;, as Tara Calishain has tried), it almost certainly won't get you one back (unless your index only contains images of chairs). 
It helps to think of it as matching the most pronounced slabs of colors. Another thing to know is that there's currently no way to specify the aspect ratio, so you have to rescale the image in your head (things that are close to the borders of the image you're thinking of should be close to the borders of your sketches), but that's really more a missing feature of the drawing flashlet than an inherent problem. Sometimes it also helps to _remove_ detail instead of adding it. And finally, the index covers only about 85k of Flickr's &lt;a href="http://www.flickr.com/explore/interesting/" rel="nofollow noopener" target="_blank" title="http://www.flickr.com/explore/interesting/"&gt;"most interesting"&lt;/a&gt; images at the moment (I didn't want to use up even more of their resources before checking back with them; it's fantastic enough that Flickr isn't imposing any up-front limits on API usage like &lt;a href="http://www.google.com" rel="nofollow noopener" target="_blank" title="http://www.google.com"&gt;most everyone else&lt;/a&gt; is doing).&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;In a way, I see retrievr less as a "search" tool than an "exploration" tool, and it seems to work very well for that.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Mon, 02 Jan 2006 18:44:57 -0000</pubDate></item><item><title>Re: Blog category tags too cumbersome</title><link>http://www.phildawes.net/blog/2005/01/27/blog-category-tags-too-cumbersome/#comment-2752955</link><description>&lt;p&gt;Right on. I've been using a blog/tag thing for about 2 months now (internally), and it sure rocks.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chl</dc:creator><pubDate>Thu, 27 Jan 2005 14:42:15 -0000</pubDate></item></channel></rss>