Thanks for the comment and the words of appreciation!
I think this evidence is (or was) disconcerting to a lot of people. Note that we aren't saying you will never see the bias-variance tradeoff. For example, if someone told me they had constructed a data distribution that leads larger and larger trained neural networks to have higher and higher variance, I would probably believe them and go check out their work. I think people are generally much less interested in these theoretical constructions than in actual datasets, though.
Hmm, so for the pure-noise example you asked about, the Bayes error (the best error you could expect from any classifier) is 1 - 1/k, where k is the number of classes. Any randomly initialized neural network will get roughly that. So I would expect that it wouldn't learn much, and you'd see roughly a horizontal line in performance as you increase width. It might be more interesting to add lower levels of label noise. From a quick search, it looks like that's what they do in this paper (see Figure 4b): https://arxiv.org/abs/1912..... They are able to find a double descent curve with up to 20% label noise (I'm not sure they tested it with higher levels of label noise).
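To make the 1 - 1/k point concrete, here's a rough sketch (mine, not from any of the papers, with arbitrary sizes and a random linear model standing in for "any classifier"): when labels are uniform over k classes and independent of the inputs, no model can expect better than 1/k accuracy, and an untrained random model already sits at roughly that level.

```python
# Sketch: pure-noise labels give Bayes error 1 - 1/k, and a random,
# untrained model already achieves roughly chance accuracy 1/k.
import numpy as np

rng = np.random.default_rng(0)
k, n, d = 10, 50_000, 32           # classes, examples, input dimension (made up)

X = rng.normal(size=(n, d))        # inputs carry no information about the labels
y = rng.integers(0, k, size=n)     # pure-noise labels, uniform over k classes

# A randomly initialized linear "network" (a stand-in for any untrained model).
W = rng.normal(size=(d, k))
preds = (X @ W).argmax(axis=1)

acc = (preds == y).mean()
print(f"accuracy ~ {acc:.3f}  (1/k = {1/k:.3f}, Bayes error = {1 - 1/k:.3f})")
```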
I'm definitely not saying we "don't need inductive bias"! :) Even infinitely large neural networks have inductive bias. See Radford Neal's initial work on this and the recent revival it has received.
I think a big part of machine learning these days is people trying to find the right inductive biases that are specific enough to be useful for learning, but general enough that they can exploit the structure shared throughout our world.
This purported evidence is a little confusing/disconcerting to me. You test your thesis on datasets curated by humans, so it's possible that improvements that "aren't subject to the tradeoff" are merely researchers/engineers learning to (indirectly) exploit the structure *common to all datasets*. Here's a way to check this: train things on "bad" data (like pure noise). Would you still expect the 'Double Descent Curve'?
If anything begins to sound like "I don't need inductive bias" (and perhaps it's debatable whether what you're saying sounds like that), I have to flag it.
I should add that I really liked this post, and am thankful for the Wikipedia edit that led me here.