
Publish. There isn't any theoretical framework on "oracle properties" or the like, but I like it better than choosing a tuning parameter manually. Also, the fact that it's already implemented in SuperLearner suggests that some people may use it (even if only for prediction).

Is it too trivial...?

Looking at what crap can get published in a peer-reviewed journal:

http://www.plosone.org/arti...

So I would say your approach here is overly complex and should be published ASAP! :)

Have you compared it against published results, e.g. van 't Veer?

It seems a bit trivial; on the other hand, if you could package it in a nice short article full of practical advice, then why not?

I think you should go for the paper.

It might have been done already: it seems similar to sure screening or sure independence screening. It's been a while since I was into this literature, but you should be able to track down the details of SS and SIS via the paper "A selective overview of variable selection in high dimensional feature space" (sorry, no URL; my phone's clipboard is not cooperating). To publish in, e.g., the Annals of Statistics, you'd probably be asked to compare your method against some more recent algorithms. Zhang's (?) MCP algorithm comes to mind; it selects the right variables when p >> n under more relaxed conditions than the Lasso. The coordinate-descent implementation of MCP by Hastie and Mazumder is called SparseNet. There are packages for SIS, MCP (called "plus", IIRC), and SparseNet on CRAN.

It is trivial, but important nonetheless. If the optimized lasso and the Leekasso perform essentially equivalently, then the Leekasso still has the benefit of being a "simpler" algorithm that is easier to understand. Moreover (and more importantly), a paper describing the Leekasso with respect to the lasso gets to make the important and underappreciated point that both are fundamentally *mathematical* techniques that must be used cautiously, with experimental validation, before interpreting selected features as mechanistically causative. Seeing that the lasso and the Leekasso use different predictors (if that is in fact the case) to come up with roughly equivalent predictions helps illustrate this. (Seeing that the lasso often makes equivalently correct predictions when the selected predictors are excluded from the model makes the same point.)
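To make that comparison concrete, here is a minimal numpy sketch contrasting the two selectors on synthetic data. It assumes the Leekasso means "keep the top 10 predictors by marginal association with the outcome, then fit OLS on them" (the exact recipe may differ), and the lasso side is a bare-bones coordinate descent, not glmnet:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 300, 200, 10

# Synthetic data: only the first 5 predictors are truly active.
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0
y = X @ beta + rng.standard_normal(n)

# "Leekasso"-style selection (as described above): keep the k predictors
# most strongly marginally associated with y, then fit OLS on them.
scores = np.abs(X.T @ y)
leekasso_support = np.sort(np.argsort(scores)[-k:])
coef_leek, *_ = np.linalg.lstsq(X[:, leekasso_support], y, rcond=None)

# Lasso via plain coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1.
lam = 0.2
b = np.zeros(p)
col_ss = (X ** 2).sum(axis=0)
r = y - X @ b
for _ in range(50):
    for j in range(p):
        r = r + X[:, j] * b[j]                      # add back j's contribution
        z = X[:, j] @ r                              # partial residual correlation
        b[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_ss[j]
        r = r - X[:, j] * b[j]
lasso_support = np.flatnonzero(b)

print("Leekasso picks:", leekasso_support.tolist())
print("lasso picks:   ", lasso_support.tolist())
```

On data like this, both supports should contain the true signals, but the noise columns they admit can differ, which is exactly the interpretive caution argued for above.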

I'd see this most usefully, then, as a paper on the appropriate interpretation of regression models as much as on the Leekasso proper.

It's not too trivial; I think it could be a fine paper. Some of my favorite papers seemed trivial at the time.

But whether you want to take the effort to write it up depends on what other things you could be doing but won't in order to do this one.

Nicholas Dronen • 5 years ago

(The commenting system might have eaten the original version of this comment.)

I can offer some suggestions if you decide to try to publish this. I am reminded here of sure independence screening [1]. If I recall correctly, SIS is a two-stage algorithm, the first stage of which involves univariate filtering similar to your method; you should definitely compare your method to the first stage of SIS.
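For reference, the first stage of SIS is small enough to sketch in a few lines of numpy. This assumes the usual formulation, which I may be misremembering: keep the top d = n / log(n) predictors by absolute marginal correlation, and run the second-stage penalized fit (lasso, SCAD, etc.) on the survivors only; check against [1] before relying on it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 1000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 4.0                      # only 3 truly active predictors
y = X @ beta + rng.standard_normal(n)

# SIS first stage: rank predictors by absolute marginal correlation
# with the response and keep the top d = floor(n / log n) of them.
# A second stage (lasso, SCAD, ...) then runs on the survivors only.
d = int(n / np.log(n))
marginal = np.abs(X.T @ y)
survivors = np.sort(np.argsort(marginal)[-d:])

print("kept", d, "of", p, "predictors")
```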

Also, the literature on the Lasso has grown enormously in both depth and breadth since Tibshirani's 1996 paper. A reviewer might request that you compare your method to more recent algorithms. Zhang's MC+ [2], for example, selects the variables in the true model under less restrictive conditions than those required by the Lasso. Zhang has an implementation of MC+ on CRAN [3]; Hastie and Mazumder have a slightly modified coordinate-descent implementation of MC+, too [4].
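To illustrate why MC+ behaves differently from the Lasso, the one-dimensional (orthogonal-design) thresholding operators for the two penalties can be compared directly. The MC+ formula below is my reconstruction from memory and should be checked against [2]; the key property is that MC+ applies no shrinkage to large coefficients, which is what reduces its bias:

```python
import numpy as np

def soft_threshold(z, lam):
    # Lasso update (orthogonal design): every surviving
    # coefficient is shrunk toward zero by lam.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mcp_threshold(z, lam, gamma):
    # MC+ update, gamma > 1: "firm" shrinkage that interpolates
    # between hard thresholding (gamma -> 1) and the Lasso rule
    # (gamma -> inf); coefficients with |z| > gamma*lam are untouched.
    z = np.asarray(z, dtype=float)
    firm = soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return np.where(np.abs(z) > gamma * lam, z, firm)

z = np.array([0.5, 1.5, 10.0])
print(soft_threshold(z, 1.0))      # 0.5 -> 0, 1.5 -> 0.5, 10 -> 9
print(mcp_threshold(z, 1.0, 3.0))  # 0.5 -> 0, 1.5 -> 0.75, 10 -> 10 (no bias)
```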

All of this is from a flurry of reading I did last year, so I might have missed something. The survey from Fan and Lv [5] might be of use.

[1] http://cran.r-project.org/w...

[2] http://projecteuclid.org/eu...

[3] http://cran.r-project.org/w...

[4] http://cran.r-project.org/w...

[5] http://arxiv.org/abs/0910.1122