
Aargh! Likewise, I'd love a follow-up on this.

EDIT: though, as mentioned below, I think all that matters is that the expectation is negative, so we can restrict attention solely to the $\|f\|_L \leq 1$ functions.

Those visualizations are beautiful! congrats on this extremely nice intro!

The great news is that, despite what the paper's bibliography seems to suggest, several ML papers have been proposed that fit between Villani and the Wasserstein GAN paper.

I am happy to plug mine, notably at NIPS 2016:

https://papers.nips.cc/pape...

https://papers.nips.cc/pape...

and NIPS 2013:

https://papers.nips.cc/pape...

but there are also many other interesting projects going on that are worth mentioning, and I am only mentioning pure ML/stats papers.

http://cbcl.mit.edu/wassers...

https://arxiv.org/abs/1507....

https://arxiv.org/pdf/1701....

https://papers.nips.cc/pape...

I think there is a small typo in the Strong Duality section, in the second display-mode equation.

In particular, the $[\mathbf{b}, -z^* + \epsilon]$ vector should be transposed:

$$\begin{bmatrix} \mathbf{A} \\ -\mathbf{c}^T \end{bmatrix}^T \begin{bmatrix} \mathbf{y} \\ \alpha \end{bmatrix} \leq \mathbf{0}, \qquad \begin{bmatrix} \mathbf{b} \\ -z^* + \epsilon \end{bmatrix}^T \begin{bmatrix} \mathbf{y} \\ \alpha \end{bmatrix} > 0$$

Also, thanks for the great article.

Thank you very much for the enlightening discussion of this discrete case!

The hardest part to follow was the part

Since $P_r$ and $P_\theta$ are nonnegative ...

this is not the case.

For one thing, it should be reworded, since it reads as if $\sum_i f_i + g_i$ were in fact the objective function! But moreover, I didn't see why the benefits would need to be for the same $j$ value. To flesh this argument out: it would only be impossible to reduce the 'gap' $f(x_i) + g(x_i)$ closer to 0 if we had $j, k$ such that

f(x_i) + g(x_j) = |i-j|

and

f(x_k) + g(x_i) = |i-k|

where j and k might not be equal! Is this what was implied, or is there a way to reduce to the j=k case? I cannot see a way.

Then from this we can deduce a contradiction since adding the two, the LHS is less than |j-k| by the D_{jk} result for g(x_j) and f(x_k), but by the triangle inequality the RHS is at least |j-k|.

Hi. Really nice tutorial. Do you think this remains valid when the range of f is R+ (the positive reals) instead of R? Thank you

Hi, looks like there is a minor typo in your second code block. In the linprog call for the dual problem, it should be A.T. This corresponds to your description and agrees with your full notebook on github: https://github.com/vincenth...
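For anyone following along, here is a minimal sketch of a primal/dual pair with `scipy.optimize.linprog` showing where the transpose comes in. The distributions and cost below are made-up illustration values, not the notebook's actual code:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 1-D example: two distributions on 4 points, cost |i - j|
n = 4
p_r = np.array([0.1, 0.2, 0.3, 0.4])
p_t = np.array([0.4, 0.3, 0.2, 0.1])
c = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).ravel().astype(float)

# Equality constraints A x = b encode both sets of marginals of the plan x
A = np.zeros((2 * n, n * n))
for i in range(n):
    for j in range(n):
        A[i, i * n + j] = 1.0        # row sums must equal p_r
        A[n + j, i * n + j] = 1.0    # column sums must equal p_t
b = np.concatenate([p_r, p_t])

# Primal: min c^T x  subject to  A x = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")

# Dual: max b^T y  subject to  A^T y <= c  (note A.T, not A!)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None), method="highs")

print(primal.fun, -dual.fun)  # both equal the Wasserstein-1 distance, 1.0
```

By strong duality the two optimal values coincide, which also makes this a handy sanity check that the transpose is in the right place.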

Hello Vincent Herrmann ,

Thanks for the illustrative post. When reading the proof of strong duality, I'm confused about why $\hat A x^* = \hat b_0$ for $\epsilon = 0$. In other words, why does $Ax^* = b$ hold for the optimal point $x^*$? The constraints say $Ax \leq b$, and [this Wikipedia article](https://en.wikipedia.org/wi... says only a subset of the inequality constraints becomes equalities.

Another question: when $\epsilon > 0$, you do not seem to prove that $y \leq 0$, which is required by the dual.

Thank you so much for the explanation in both the discrete and continuous forms; it gives nice insights. The dual problem can also be thought of as profit maximization in the optimal transport plan, where we assign buying and selling prices under the constraint that the profit must be less than the transport cost (c). This again leads to the Lipschitz continuity constraint. I understood how this method is formulated from your explanation.

Hi,

Thanks a lot for this amazingly clear introduction to the Kantorovich-Rubinstein duality. Really helped me get a better understanding of what it meant.

I believe there are small typos in the **Strong Duality** section, 3rd paragraph.

Shouldn't it be $\hat{b}_{0}^{T}\hat{y}$ in the first line? And then I believe it is "Then changing to any number greater than has to result in $\hat{b}_{\epsilon}^{T}\hat{y}$."

Thanks!

Hello!

Your minimax principle is wrong. Here's a counterexample:

A = [0, 2], B = [0, 1],

g(a, b) = min(|a + b - 1.5| - 0.5, 0)

Then

sup_b g(a, b) = 0 is a convex function,

inf_a g(a, b) = -0.5 is a concave function,

however

inf_a sup_b g(a,b) = 0 isn't equal to

sup_b inf_a g(a,b) = -0.5
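The counterexample above can be checked numerically on a grid (a small sketch, with the grid resolution chosen arbitrarily):

```python
import numpy as np

# Grids over A = [0, 2] and B = [0, 1]
a = np.linspace(0.0, 2.0, 201)
b = np.linspace(0.0, 1.0, 101)

# g(a, b) = min(|a + b - 1.5| - 0.5, 0), evaluated on the grid
g = np.minimum(np.abs(a[:, None] + b[None, :] - 1.5) - 0.5, 0.0)

inf_sup = g.max(axis=1).min()  # inf_a sup_b g(a, b)
sup_inf = g.min(axis=0).max()  # sup_b inf_a g(a, b)

print(inf_sup, sup_inf)  # approximately 0.0 and -0.5: the two values differ
```

So inf sup and sup inf really do disagree for this g, confirming the counterexample.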

The proof is otherwise very clear and accessible, I wonder if there's a way to fix it!

Hey!

I've just gone through it again, and I still think it's correct. I have added one sentence to maybe make things a little clearer. But it's been quite a while since I thought about it in depth, so I'm not absolutely sure.

Your function g(a, b) is 0 for all a and b, isn't it? For it to be -0.5, |a + b - 1.5| would have to be zero, which it can't be. Or am I missing something?

Hi, thanks for answering. No, g isn't constant zero. |a + b - 1.5| is zero for all a,b such that a + b = 1.5. It's a segment in AxB, one end is (0.5, 1), the other end is (1.5, 0).

Function g is zero in lower triangle where a + b <= 1, zero in upper triangle where a + b >= 2, and negative in a strip where 1 <= a + b <= 2.

Ah, my bad - sorry! I read it as A and B having only the two elements in the list. Too much programming:)

You're right of course, thanks for pointing it out! I don't see exactly what the problem is at the moment; I'll have to think about it. Maybe a missing regularity condition? Because the result itself is true, it's well established...

By the way, I know the error now. Of course g(a, b) has to be convex in a for any fixed b (not just at the supremum) and concave in b for any fixed a. However, it's not as easy to show this for the Wasserstein case. I will update the post as soon as possible.

Thanks for this very nice intro! Probably the best Wasserstein explanation I've seen :)

May I ask you what tools you have used other than jupyter (+ matplotlib) to create your diagrams?

Thank you, Daniel!

I used Adobe Illustrator to edit the plots (Inkscape is a great free alternative) and GeoGebra for the Farkas diagrams.

Hello Vincent Herrmann! It seems that your MathJax CDN is down, so all the formulas are a mess for me. Could you please find some time to fix it?

Hi! For me everything seems to be working fine... what browser are you using?

Hello,

I was wondering where you got the functions f and g for the dual problem?

thanks for writing this up. I have a question on the last part. How do you show that the additional term is 0 if gamma in pi and infinity otherwise? The part when you try to make gamma independent of pi (the joint distribution)

If gamma is in Pi, then $E_{(x,y) \sim \gamma}[f(x) - f(y)]$ and $E_{s \sim p_r}[f(s)] - E_{r \sim p_\theta}[f(r)]$ are the same, so their difference is always zero, no matter what f we choose. If gamma is not in Pi, then s and x, or r and y, are completely independent, and we can choose an f that makes the difference +inf. Does this help?
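Spelled out as a formula (a sketch in the post's notation, with $\Pi$ the set of joint distributions whose marginals are $p_r$ and $p_\theta$), the added term acts as an indicator of the constraint:

```latex
\sup_{f} \; \Big( \mathbb{E}_{s \sim p_r}[f(s)] - \mathbb{E}_{r \sim p_\theta}[f(r)]
  - \mathbb{E}_{(x,y) \sim \gamma}[f(x) - f(y)] \Big)
  = \begin{cases}
      0       & \text{if } \gamma \in \Pi \\
      +\infty & \text{otherwise}
    \end{cases}
```

Adding this supremum to the objective therefore enforces $\gamma \in \Pi$ without writing the constraint explicitly.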

ah yes! I misread the equation, thanks for the quick response!

https://github.com/maestroj...

Hi, I implemented an approximation of the EMD with a neural network, plus GAN and WGAN implementations, on a simple and well-known example.

Hello

I was really impressed by your article, and it really helped me understand Kantorovich duality.

I have a question about the last part of your proof (Wasserstein distance).

You go through the proof by introducing the 'sup' term into the original equation and

swapping the 'inf' and 'sup' parts using the minimax principle.

What I could not follow is why 'sup_b' g(a,b) is convex and 'inf_a' g(a,b) is concave.

Could you explain it in more detail, if possible?

Thanks in advance

Hi,

I wasn't careful enough when I blended two different proofs here. The terms convex and concave are wrong in this case. What I actually mean is that every local minimum in $a$ of $\sup_b g(a, b)$ is also a global minimum, and every local maximum in $b$ of $\inf_a g(a, b)$ is also a global maximum. There is another mistake: if we use expectations of the random variable gamma instead of integration over the ordinary variable gamma, then we won't get to $-\infty$ if $\|f\|_L > 1$, but only some value less than 0. The proof still works the same. Does this help?

I will change this part as soon as I can. Thank you for calling my attention to this issue!

Thanks for your reply.

I think the term 'convex and concave' is right, since every local minimum of a convex function is a global minimum.

Thanks, this really helps me a lot.

But I still cannot understand the 'inf' over gamma part of the last equation.

Why is it 0 when f is a 1-Lipschitz function?

I'm no expert in convexity, but since there could be many separate global minima, I don't know if we can really call it convex. But it's definitely something very similar.

The definition of the 1-Lipschitz condition is basically |f(x) - f(y)| <= |x-y|, the difference between two values of f cannot be greater than their distance. Then |x-y| - (f(x) - f(y)) >= 0, for all x and y. We get the inf over gamma for example if we choose a probability of 1 at the same single value for x and y. The expectation is then 0.
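The two facts in that explanation (the integrand is pointwise nonnegative for 1-Lipschitz f, and the diagonal coupling attains 0) can be illustrated numerically. This is just a sketch: sin stands in as an arbitrary 1-Lipschitz function, and the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# sin is 1-Lipschitz: |sin(x) - sin(y)| <= |x - y| for all x, y
f = np.sin

# Under ANY coupling gamma, the integrand |x - y| - (f(x) - f(y)) is
# pointwise >= 0; here we sample an arbitrary (independent) coupling
x = rng.normal(size=10_000)
y = rng.normal(size=10_000)
gap = np.abs(x - y) - (f(x) - f(y))
assert gap.min() > -1e-12  # nonnegative, up to floating-point error

# The "diagonal" coupling y = x makes the integrand identically 0,
# so the infimum over gamma of the expectation is exactly 0
diag_gap = np.abs(x - x) - (f(x) - f(x))
print(diag_gap.mean())  # 0.0
```

Since the expectation can never go below 0 and the diagonal coupling reaches it, the infimum over gamma is 0 for every 1-Lipschitz f.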

Thanks, now I understand.

So my understanding is that at first you want to show the 'inf''sup' term can be changed to the 'sup''inf' term.

That's why you added the sup over 'f', for which every local minimum is a global minimum, so changing the 'inf''sup' term to the 'sup''inf' term is possible.

Next, the inf over gamma term achieves 0, in that only 1-Lipschitz functions f give a feasible solution.

Am I understanding this right?

Yeah, I think that's right.

The idea is to first replace the constraint on gamma (gamma \in Pi) with an optimization over a function f, then swap the order of inf_gamma and sup_f, and lastly replace the optimization over gamma with a constraint on f (f is 1-Lipschitz continuous).

So we want to change from inf_gamma sup_f to sup_f inf_gamma. This is only possible if both sup_f is "convex" (or whatever we call it) in gamma, and inf_gamma is "concave" in f. We show that this condition is sufficient in an abstract way in the g(a,b) part. In our case the whole function (all the expectations, but without the inf and sup) is g(gamma, f), and it is relatively easy to see that the conditions are fulfilled.

Thanks for summary.

Thank you

http://nbviewer.jupyter.org...

This is how I worked it out. It is really hard to understand in the continuous case. I almost gave up, and just recommended your blog in the end, haha.

I appreciate that you stuck with it! Yeah, the bilevel optimization and convex functions are tricky, but I think it's great how quickly you can prove it this way.

Hmm, the proof of strong duality was hard to get through, but I did it. Maybe when you have more time, you could add some explanation to it.

Thanks for all the corrections! Sorry if the errors confused you...

I agree that some of it is quite difficult to grasp. There is always this temptation to make things look easy by making the explanation short. But of course, most of the time, this just means that the reader has to do everything himself. I will try to think of something to clear it up a bit.

In the Wasserstein distance equation, it should be dx dy, not dx, y.

I found one more error: $\sum_i f_i + g_i$ instead of $\sum_i f_i - g_i$.

\alpha z^*= \alpha c^Tx^* \geq y^TAx^*=y^Tb>\alpha(z^*-\epsilon)

It might be better to use this equation to show $\alpha > 0$.

Actually $\widetilde{z} > z^* - \epsilon$, not equal.

What I meant is that $\tilde{z} := z^* - \epsilon$ is possible. All we want is this one value $\tilde{z}$ that gets close to $z^{*}$; weak duality takes care of the rest.

I changed the equals sign to a definition sign, maybe this is clearer.

I got your purpose. Thanks for your kindness.

Hi. May I translate this into Chinese?

Yes, of course!

Feel free to tell me if you find some errors along the way:)

Thanks again for such a great article!

In the first paragraph of the section "Linear Programming", I think b should be an m-dimensional vector, while it's defined as an n-dimensional vector. Is that a typo, or am I getting it wrong?

Yes, you're absolutely right - that was a typo, I fixed it now.

Damn, it's hard to spot them all... thanks for being so attentive!

Thanks a lot! XD

MathJax is not rendering inline math. You might want to change the CDN... see this link: https://www.mathjax.org/cdn...

Oh, I did not know that. Thanks for telling me!

I changed it and everything seems to be working, at least on my machine (Safari & Chrome).

Great - it's working now thanks for the post!

Hey may I translate these into Korean?

Thanks a lot for your impressive introduction!

I am a little confused by the derivation between "Let’s try changing to sup inf" and "We see that the infimum is concave, as required. Because all functions ....".

Could you explain more about why $\inf_\gamma E_{x,y \sim \gamma}[|x-y| - (f(x) - f(y))]$ is negative infinity when $f$ is not 1-Lipschitz? Thank you!