Disqus - Latest Comments for jelsas

Re: Yahoo LETOR Challenge upload format confusion

Jon Elsas — Fri, 23 Apr 2010 16:11:43 -0000

we're currently operating in stealth mode.

Re: Jeremy Barnes - Learning to Rank Challenge: Yahoo Misses the Point

Jon Elsas — Wed, 17 Mar 2010 09:44:14 -0000

Great post. Excellent points.

I think this does highlight a real difference in the approach to research between the more traditionalist IR community and the ML community. As an IR researcher, I'm interested in which queries my algorithm performs better or worse on and why. What aspects of those queries or documents are different or unique, and how can I understand and make use of that algorithmically? An ML researcher looks at this approach and just sees feature engineering. As an ML researcher, I want a huge set of features, a labeled dataset, and an objective function. The goal is simply to maximize the objective function on held-out data, without much regard for the semantics of the features. This competition certainly takes the latter approach.

One clarification - this dataset does contain graded relevance levels (0 = non-relevant... 4 = perfect)

Re: http://windowoffice.tumblr.com/post/450669754

Jon Elsas — Tue, 16 Mar 2010 09:27:29 -0000

Jeff - I agree there should be some way to get around this by chunking the data. But there are a couple of issues: first, how to represent links across chunks? And second, do the elegant compression algorithms supported by the Java package fall apart when you can't store the data as one contiguous array?

Re: http://windowoffice.tumblr.com/post/426350580

Jon Elsas — Mon, 15 Mar 2010 20:30:46 -0000

bingo. Red = manually identified bad assessment, blue = a single assessor who did most of my HITs.

Re: Relevance assessment with MTurk & statAP

Jon Elsas — Thu, 11 Mar 2010 06:58:09 -0000

Doesn't that require multiple labels & a gold standard? I have neither.

Re: Google Buzz vs. Twitter & why Buzz might be a huge success.

Jon Elsas — Thu, 25 Feb 2010 07:20:00 -0000

Buzz is really all about the API and the data. I'm sure many Buzz apps will appear shortly.

Re: window office - Got the wrong Bob?

Jon Elsas — Wed, 14 Oct 2009 07:11:58 -0000

Thanks Vitor -- I've found that the best way to encourage commenting on my blog is to post slightly inaccurate information :)

Re: Confmaster sucks

Jon Elsas — Tue, 09 Jun 2009 08:28:04 -0000

all uploads finally went through OK. Not all of the authors got confirmation emails.

Re: Confmaster sucks

Jon Elsas — Mon, 08 Jun 2009 16:26:46 -0000

340K
If that's too fat, then we've got real problems.

Re: window office - Just received my first request for a paid link on...

Jon Elsas — Tue, 26 May 2009 20:11:39 -0000

wha... is that site a joke? my site is worth almost 4x as much as cmu.edu.

they're clearly having trouble distinguishing the subdomain from 'tumblr.com'

Re: window office - List of accepted papers | SIGIR'09

Jon Elsas — Fri, 24 Apr 2009 11:27:05 -0000

compared to last year's stats (showing first/any authorship):

MS 11/18
Y 1/4
G 1/3

which is 15%/29% of the papers in 2008. It's hard to say what proportion of these numbers are due to recent grads publishing work done while in school after they've joined one of these companies.

Re: window office - Twitter Cascades

Jon Elsas — Wed, 22 Apr 2009 11:33:53 -0000

looks like there are some clients starting to support this, for example: http://www.atebits.com/twee...
But, as you said, it's all reliant on the clients & users adding the right commands in their messages & correctly parsing those commands on the other side. These aren't built into the messaging system, but added to the message body and are constrained to the same character limit. What if someone wants to direct a tweet to 10 different users? Well, already you've used up something around 80-100 characters of your message.

I'm not a twitter user, but I do see some of the appeal. This lack of structure, tho, is a real turn-off for me. Facebook, for example, stores threaded conversations attached to almost anything. I happen to like that model quite a bit better.
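To make the character-limit arithmetic above concrete, here is a minimal Python sketch. The 140-character limit, the helper names, and the example usernames are assumptions added for illustration; none of them come from the comment itself.

```python
# Assumption: Twitter's per-message limit at the time was 140 characters.
TWEET_LIMIT = 140


def mention_overhead(usernames):
    """Characters consumed by '@name ' prefixes, one per recipient."""
    return sum(len("@" + name + " ") for name in usernames)


def remaining_chars(usernames, limit=TWEET_LIMIT):
    """Characters left for the actual message after addressing everyone."""
    return limit - mention_overhead(usernames)


# Ten hypothetical 7-character usernames: user000 ... user009
recipients = ["user%03d" % i for i in range(10)]
overhead = mention_overhead(recipients)
left = remaining_chars(recipients)
```

With ten 7-character names, each mention costs 9 characters (the "@", the name, and a space), so the addressing alone falls inside the 80-100 character range mentioned above.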

Re: Evil Marketing Ploy or April Fools Joke? ('cause it can't be true)

Jon Elsas — Wed, 01 Apr 2009 11:10:24 -0000

True. I did visit the site and even blogged about it. They certainly know how to pique your interest by claiming you've become an overnight celebrity.

Re: window office - Amazon.com search for [girl scout cookies]

Jon Elsas — Mon, 30 Mar 2009 14:57:53 -0000

There are items in their catalog with an exact title match on the query, for example this excellent Ted Nugent song:
http://www.amazon.com/Girl-...
but this doesn't show up until more than halfway down the second page of results.

It seems bizarre that any ranking algorithm would down-weight query matches in the title in favor of items that don't contain the query terms at all.

Re: window office - Wolfram Blog : Wolfram|Alpha Is Coming!

Jon Elsas — Mon, 09 Mar 2009 11:17:30 -0000

see the discussion @ Daniel's blog:
http://thenoisychannel.com/...
and the glowing anticipation elsewhere:
http://www.twine.com/item/1...

Re: window office - WSDM 2009 papers from the ACM Digital Library

Jon Elsas — Fri, 13 Feb 2009 14:44:38 -0000

nope, haven't seen them.

Re: window office - TerrierTeam: Building Terrier by Open Collaboration

Jon Elsas — Thu, 12 Feb 2009 06:46:29 -0000

Ahhh... thanks for the correction. Apologies for my misunderstanding.

Re: "public" email archives on Google & Yahoo

Jon Elsas — Mon, 09 Feb 2009 13:15:48 -0000

This really falls into the 'scraping' category as it's not really accessing the archives through Yahoo's provided interfaces, and doesn't seem to be geared towards large-scale archiving. I doubt Yahoo! would be too psyched if you downloaded a few larger archives en masse with a tool like this.

Really, what's so hard about providing gzipped mbox files directly on the yahoo group site?

Re: window office - My new academic homepage (comments?).

Jon Elsas — Fri, 06 Feb 2009 17:11:23 -0000

well, it *is* a joke. not sure this is really going to be promoted to my real homepage, although I'm pretty tired of that one.

Re: Academic IR research and queries

Jon Elsas — Fri, 06 Feb 2009 17:10:14 -0000

ahh... what an unsatisfying answer :)

I've heard both this opinion AND the opposite from a few senior IR researchers. The data is out there, and presumably there are (were) people using it for unsavory purposes. Even though it's been officially pulled from distribution, to ignore it for research purposes seems like throwing the baby out with the bathwater.

Re: window office - My new academic homepage (comments?).

Jon Elsas — Thu, 05 Feb 2009 12:16:52 -0000

yeah -- it's sort of a joke & a means to procrastinate from writing my thesis proposal. it's a clear example of how something that's readable in print really doesn't translate to the web.

Re: window office - Galago

Jon Elsas — Tue, 27 Jan 2009 12:13:16 -0000

Nice to see. Seems quite a bit more flexible than Indri for some things. I'll be trying it out this week.

Re: argmax & Python performance

Jon Elsas — Sat, 03 Jan 2009 21:06:30 -0000

it's WAY faster. haven't run the full set of tests, but it blows the others out of the water.

Re: window office - acadmics cannot address core web search on a level...

Jon Elsas — Fri, 31 Oct 2008 12:07:51 -0000

whew... glad we agree on something (and my response was remarkably coherent, considering it was written upon returning home after a night of drinks with your ex-intern)

another thought: the query is an artifact of the dialog between user & system. how can we know what queries will look like if a system doesn't exist?

Re: window office - acadmics cannot address core web search on a level...

Jon Elsas — Fri, 31 Oct 2008 00:24:50 -0000

What comes first: The algorithm? The interface? The task? How can we measure any of these without some approximation of the others? How do we know what users will look for before there exists a means to look?

There is clearly *some* value in *some* artifacts of online social interaction. I don't think all online social services produce useful artifacts, but some are incredibly fertile with real contributions from expert communities. Social media is NOT just Twitter, or LiveJournal or Facebook. As Mark Smith put it (http://ir.mathcs.emory.edu/...), social media is "collective good produced through computer-mediated collective action". That "collective good" can be, and often is, much more than a narcissistic MySpace profile and a race to increase your friend count.

An example: newsgroups and mailing lists are the backbone of open source software support and development. I recently corresponded through a public mailing list with one of the authors of 'wget', and I'm sure I'm not the only person who was confused by the documentation. This social interaction, now archived on several mirrors of the GNU email archives, can and should be accessed by future users of the software. But does an interface or service exist to support that sort of access? Should we treat email archives just like other text on the web? Or can we leverage what we know about the structure of email to improve access to this information?

This type of artifact of online social interaction offers a level of structure in the corpus that hasn't really been investigated with regard to retrieval algorithms -- authors, topics, messages, threads are all potential units of retrieval, and relations exist between these objects. In my view, the challenges in search over "social media" corpora are really challenges of search in a world of more complex data types, with meaningful relationships between them.
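As a rough illustration of the comment above, here is one way the "multiple units of retrieval" idea could be sketched in Python for an email archive. Everything here is hypothetical: the `Message` fields, the `thread_root` and `group_by_unit` helpers, and the sample data are assumptions made for the example, not a description of any existing system or of how a real archive stores its mail.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    author: str
    body: str
    in_reply_to: Optional[str] = None  # reply relation between messages


def thread_root(msg: Message, by_id: Dict[str, Message]) -> str:
    """Follow reply links up to the first known message of the thread."""
    seen = {msg.msg_id}  # guards against malformed reply cycles
    while msg.in_reply_to in by_id and msg.in_reply_to not in seen:
        msg = by_id[msg.in_reply_to]
        seen.add(msg.msg_id)
    return msg.msg_id


def group_by_unit(messages: List[Message], unit: str) -> Dict[str, List[str]]:
    """Collect message bodies under each unit: 'message', 'author', or 'thread'."""
    by_id = {m.msg_id: m for m in messages}
    groups: Dict[str, List[str]] = defaultdict(list)
    for m in messages:
        if unit == "message":
            key = m.msg_id
        elif unit == "author":
            key = m.author
        elif unit == "thread":
            key = thread_root(m, by_id)
        else:
            raise ValueError(f"unknown unit: {unit}")
        groups[key].append(m.body)
    return dict(groups)


# Made-up sample archive
archive = [
    Message("m1", "alice", "How do I mirror a site?"),
    Message("m2", "bob", "Try the recursive option.", in_reply_to="m1"),
    Message("m3", "alice", "Thanks, that worked.", in_reply_to="m2"),
    Message("m4", "carol", "Unrelated question."),
]
threads = group_by_unit(archive, "thread")
authors = group_by_unit(archive, "author")
```

The same four messages yield two thread-level "documents" or three author-level ones, which is the sense in which the choice of retrieval unit changes what a ranking algorithm gets to score.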