<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Disqus - Latest Comments for jelsas</title><link>http://disqus.com/by/jelsas/</link><description></description><atom:link href="http://disqus.com/jelsas/comments.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Fri, 23 Apr 2010 16:11:43 -0000</lastBuildDate><item><title>Re: Yahoo LETOR Challenge upload format confusion</title><link>http://windowoffice.tumblr.com/post/543592794#comment-46297544</link><description>&lt;p&gt;we're currently operating in stealth mode.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Fri, 23 Apr 2010 16:11:43 -0000</pubDate></item><item><title>Re: Jeremy Barnes - Learning to Rank Challenge: Yahoo Misses the Point</title><link>http://www.barneso.com//2010/03/16/Yahoo_misses_the_point.html#comment-40172669</link><description>&lt;p&gt;Great post.  Excellent points.&lt;/p&gt;&lt;p&gt;I think this does highlight a real difference in the approach to research between the more traditionalist IR community and the ML community.  As an IR researcher, I'm interested in which queries my algorithm performs better or worse on and why.  What aspects of those queries or documents are different or unique, and how can I understand and make use of that algorithmically?  An ML researcher looks at this approach and just sees feature engineering.  As an ML researcher, I want a huge set of features, a labeled dataset, and an objective function.  The goal is simply to maximize the objective function on held out data, without much regard for the semantics of the features.  This competition certainly takes the latter approach.&lt;/p&gt;&lt;p&gt;One clarification - this dataset does contain graded relevance levels (0 = non-relevant... 
4 = perfect)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Wed, 17 Mar 2010 09:44:14 -0000</pubDate></item><item><title>Re: http://windowoffice.tumblr.com/post/450669754</title><link>http://windowoffice.tumblr.com/post/450669754#comment-40018593</link><description>&lt;p&gt;Jeff - I agree there should be some way to get around this by chunking the data.  But, there are a couple issues - first, how to represent links across chunks?  and second, do the elegant compression algorithms supported by the java package fall apart when you can't store the data as one contiguous array?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Tue, 16 Mar 2010 09:27:29 -0000</pubDate></item><item><title>Re: http://windowoffice.tumblr.com/post/426350580</title><link>http://windowoffice.tumblr.com/post/426350580#comment-39959358</link><description>&lt;p&gt;bingo.  Red = manually identified bad assessment, blue = a single assessor who did most of my HITs. &lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Mon, 15 Mar 2010 20:30:46 -0000</pubDate></item><item><title>Re: Relevance assessment with MTurk &amp;amp; statAP</title><link>http://windowoffice.tumblr.com/post/439466906#comment-38989451</link><description>&lt;p&gt;Doesn't that require multiple labels &amp;amp; a gold standard?  I have neither.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Thu, 11 Mar 2010 06:58:09 -0000</pubDate></item><item><title>Re: Google Buzz vs. Twitter &amp;amp; why Buzz might be a huge success.</title><link>http://windowoffice.tumblr.com/post/383828765#comment-36419538</link><description>&lt;p&gt;Buzz is really all about the API and the data.   
I'm sure many buzz apps will appear shortly.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Thu, 25 Feb 2010 07:20:00 -0000</pubDate></item><item><title>Re: window office - Got the wrong Bob?</title><link>http://windowoffice.tumblr.com/post/212449840#comment-20033935</link><description>&lt;p&gt;Thanks Vitor -- I've found that the best way to encourage commenting on my blog is to post slightly inaccurate information :)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Wed, 14 Oct 2009 07:11:58 -0000</pubDate></item><item><title>Re: Confmaster sucks</title><link>http://windowoffice.tumblr.com/post/120003393#comment-10648212</link><description>&lt;p&gt;all uploads finally went through OK.  Not all of the authors got confirmation emails.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Tue, 09 Jun 2009 08:28:04 -0000</pubDate></item><item><title>Re: Confmaster sucks</title><link>http://windowoffice.tumblr.com/post/120003393#comment-10625827</link><description>&lt;p&gt;340K &lt;br&gt;If that's too fat, then we've got real problems.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Mon, 08 Jun 2009 16:26:46 -0000</pubDate></item><item><title>Re: window office - Just received my first request for a paid link on...</title><link>http://windowoffice.tumblr.com/post/113259001#comment-9975724</link><description>&lt;p&gt;wha... is that site a joke?  
my site is worth almost 4x as much as &lt;a href="http://cmu.edu" rel="nofollow noopener" target="_blank" title="cmu.edu"&gt;cmu.edu&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;they're clearly having trouble distinguishing the subdomain from '&lt;a href="http://tumblr.com" rel="nofollow noopener" target="_blank" title="tumblr.com"&gt;tumblr.com&lt;/a&gt;'&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Tue, 26 May 2009 20:11:39 -0000</pubDate></item><item><title>Re: window office - List of accepted papers | SIGIR'09</title><link>http://windowoffice.tumblr.com/post/98922532#comment-8655904</link><description>&lt;p&gt;compared to last year's stats (showing first/any authorship):&lt;/p&gt;&lt;p&gt;MS 11/18&lt;br&gt;Y 1/4&lt;br&gt;G 1/3&lt;/p&gt;&lt;p&gt;which is 15%/29% of the papers in 2008.  It's hard to say what proportion of these numbers are due to recent grads publishing work done while in school after they've joined one of these companies.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Fri, 24 Apr 2009 11:27:05 -0000</pubDate></item><item><title>Re: window office - Twitter Cascades</title><link>http://windowoffice.tumblr.com/post/98866799#comment-8572139</link><description>&lt;p&gt;looks like there's some clients starting to support this, for example: &lt;a href="http://www.atebits.com/tweetie-mac/" rel="nofollow noopener" target="_blank" title="http://www.atebits.com/tweetie-mac/"&gt;http://www.atebits.com/twee...&lt;/a&gt;&lt;br&gt;But, as you said it's all reliant on the clients &amp;amp; users adding the right commands in their messages &amp;amp; correctly parsing those commands on the other side.  These aren't built into the messaging system, but added to the message body and are constrained to the same character limit.  What if someone wants to direct a tweet to 10 different users?  
well, already you've used up something around 80-100 characters of your messages.&lt;/p&gt;&lt;p&gt;I'm not a twitter user, but I do see some of the appeal.  This lack of structure, tho, is a real turn-off for me.  Facebook, for example, stores threaded conversations attached to almost anything.  I happen to like that model quite a bit better.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Wed, 22 Apr 2009 11:33:53 -0000</pubDate></item><item><title>Re: Evil Marketing Ploy or April Fools Joke?  (&amp;#039;cause it can&amp;#039;t be true)</title><link>http://windowoffice.tumblr.com/post/91899617#comment-7716059</link><description>&lt;p&gt;True.  I did visit the site and even blogged about it.  They certainly know how to pique your interest by claiming you've become an over-night celebrity. &lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Wed, 01 Apr 2009 11:10:24 -0000</pubDate></item><item><title>Re: window office - Amazon.com search for [girl scout cookies]</title><link>http://windowoffice.tumblr.com/post/91245970#comment-7640639</link><description>&lt;p&gt;There are items in their catalog with an exact title match on the query, for example this excellent Ted Nugent song:&lt;br&gt;&lt;a href="http://www.amazon.com/Girl-Scout-Cookies/dp/B000W0Z2II/" rel="nofollow noopener" target="_blank" title="http://www.amazon.com/Girl-Scout-Cookies/dp/B000W0Z2II/"&gt;http://www.amazon.com/Girl-...&lt;/a&gt;&lt;br&gt;but this doesn't show up until more than halfway down the second page of results.&lt;/p&gt;&lt;p&gt;It seems bizarre that any ranking algorithm would down-weight query matches in the title in favor of items that don't contain the query terms at all.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Mon, 30 Mar 2009 14:57:53 -0000</pubDate></item><item><title>Re: window office - 
Wolfram Blog : Wolfram|Alpha Is Coming!</title><link>http://windowoffice.tumblr.com/post/84883324#comment-7032379</link><description>&lt;p&gt;see the discussion @ Daniel's blog:&lt;br&gt;&lt;a href="http://thenoisychannel.com/2009/03/09/a-new-kind-of-marketing-nkm/" rel="nofollow noopener" target="_blank" title="http://thenoisychannel.com/2009/03/09/a-new-kind-of-marketing-nkm/"&gt;http://thenoisychannel.com/...&lt;/a&gt;&lt;br&gt;and the glowing anticipation elsewhere:&lt;br&gt;&lt;a href="http://www.twine.com/item/122mz8lz9-4c/wolfram-alpha-is-coming-and-it-could-be-as-important-as-google" rel="nofollow noopener" target="_blank" title="http://www.twine.com/item/122mz8lz9-4c/wolfram-alpha-is-coming-and-it-could-be-as-important-as-google"&gt;http://www.twine.com/item/1...&lt;/a&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Mon, 09 Mar 2009 11:17:30 -0000</pubDate></item><item><title>Re: window office - WSDM 2009 papers from the ACM Digital Library</title><link>http://windowoffice.tumblr.com/post/78048201#comment-6245811</link><description>&lt;p&gt;nope, haven't seen them.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Fri, 13 Feb 2009 14:44:38 -0000</pubDate></item><item><title>Re: window office - TerrierTeam: Building Terrier by Open Collaboration</title><link>http://windowoffice.tumblr.com/post/77556445#comment-6203904</link><description>&lt;p&gt;Ahhh... thanks for the correction.   
Apologies for my mis-understanding.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Thu, 12 Feb 2009 06:46:29 -0000</pubDate></item><item><title>Re: &amp;quot;public&amp;quot; email archives on Google &amp;amp; Yahoo</title><link>http://windowoffice.tumblr.com/post/76920169#comment-6118544</link><description>&lt;p&gt;This really falls into the 'scraping' category as it's not really accessing the archives through Yahoo's provided interfaces, and doesn't seem to be geared towards large-scale archiving.  I doubt Yahoo! would be too psyched if you downloaded a few larger archives en masse with a tool like this.&lt;/p&gt;&lt;p&gt;Really, what's so hard about providing gzipped mbox files directly on the yahoo group site?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Mon, 09 Feb 2009 13:15:48 -0000</pubDate></item><item><title>Re: window office - My new academic homepage (comments?).</title><link>http://windowoffice.tumblr.com/post/75881597#comment-6055444</link><description>&lt;p&gt;well, it *is* a joke.  not sure this is really going to be promoted to my real homepage, although I'm pretty tired of that one.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Fri, 06 Feb 2009 17:11:23 -0000</pubDate></item><item><title>Re: Academic IR research and queries</title><link>http://windowoffice.tumblr.com/post/75373231#comment-6055416</link><description>&lt;p&gt;ahh... what an unsatisfying answer :)&lt;/p&gt;&lt;p&gt;I've heard both this opinion AND the opposite from a few senior IR researchers.  The data is out there, and presumably there are (were) people using it for unsavory purposes.  
Even though it's been officially pulled from distribution, to ignore it for research purposes seems like throwing the baby out with the bath water.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Fri, 06 Feb 2009 17:10:14 -0000</pubDate></item><item><title>Re: window office - My new academic homepage (comments?).</title><link>http://windowoffice.tumblr.com/post/75881597#comment-5869628</link><description>&lt;p&gt;yeah -- it's sort of a joke &amp;amp; a means to procrastinate from writing my thesis proposal.   it's a clear example of how something that's readable in print really doesn't translate to the web.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Thu, 05 Feb 2009 12:16:52 -0000</pubDate></item><item><title>Re: window office - Galago</title><link>http://windowoffice.tumblr.com/post/73509632#comment-5585283</link><description>&lt;p&gt;Nice to see.  Seems quite a bit more flexible than Indri for some things.  I'll be trying it out this week.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Tue, 27 Jan 2009 12:13:16 -0000</pubDate></item><item><title>Re: argmax &amp;amp; Python performance</title><link>http://windowoffice.tumblr.com/post/65775125#comment-4873392</link><description>&lt;p&gt;it's WAY faster.  haven't run the full set of tests, but it blows the others out of the water.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Sat, 03 Jan 2009 21:06:30 -0000</pubDate></item><item><title>Re: window office - acadmics cannot address core web search on a level...</title><link>http://windowoffice.tumblr.com/post/57143341#comment-3409673</link><description>&lt;p&gt;whew... 
glad we agree on something (and my response was remarkably coherent, considering it was written upon returning home after a night of drinks with your ex-intern)&lt;/p&gt;&lt;p&gt;another thought: the query is an artifact of the dialog between user &amp;amp; system.  how can we know what queries will look like if a system doesn't exist?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Fri, 31 Oct 2008 12:07:51 -0000</pubDate></item><item><title>Re: window office - acadmics cannot address core web search on a level...</title><link>http://windowoffice.tumblr.com/post/57143341#comment-3403709</link><description>&lt;p&gt;What comes first:  The algorithm?  The interface?  The task?   How can we measure any of these without some approximation of the others?  How do we know what users will look for before there exists a means to look?&lt;/p&gt;&lt;p&gt;There is clearly *some* value in *some* artifacts of online social interaction.  I don't think all online social services produce useful artifacts, but some are incredibly fertile with real contributions from expert communities.   Social media is NOT just twitter, or LiveJournal or Facebook.  As Mark Smith put it (&lt;a href="http://ir.mathcs.emory.edu/SSM2008/papers/ssm22p-smith.pdf" rel="nofollow noopener" target="_blank" title="http://ir.mathcs.emory.edu/SSM2008/papers/ssm22p-smith.pdf"&gt;http://ir.mathcs.emory.edu/...&lt;/a&gt;), social media is "collective good produced through computer-mediated collective action".  That "collective good" can be, and often is, much more than a narcissistic MySpace profile and race to increase your friend count.&lt;/p&gt;&lt;p&gt;An example: newsgroups and mailing lists are the backbone of open source software support and development.   I recently corresponded through a public mailing list with one of the authors of 'wget', and I'm sure I'm not the only person who was confused by the documentation.  
This social interaction, now archived on several mirrors of the GNU email archives, can and should be accessed by future users of the software.  But, does an interface or service exist to support that sort of access?   Should we treat email archives just like other text on the web?  Or can we leverage what we know about the structure of email to improve access to this information?&lt;/p&gt;&lt;p&gt;This type of artifact of online social interaction offers a level of structure in the corpus that hasn't really been investigated with regard to retrieval algorithms -- authors, topics, messages, threads are all potential units of retrieval, and relations exist between these objects.  In my view, the challenges in search over "social media" corpora are really challenges of search in a world of more complex data types, with meaningful relationships between them.&lt;br&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jon Elsas</dc:creator><pubDate>Fri, 31 Oct 2008 00:24:50 -0000</pubDate></item></channel></rss>