
Samuel Pastva • 1 year ago

The speed difference might just be the gap between what USB4/TB3 advertises and what it actually delivers: https://macperformanceguide...

For example, even though TB3 is *technically* 40Gbit, only 32Gbit can be used for data (the rest is always reserved for DP alt-mode/etc., even if unused). Out of those, only 25Gbit is actually usable data due to the error-correcting encoding. I'm assuming additional overhead is spent on encoding the Ethernet packets into Thunderbolt packets, so the ~20Gbit maximum that you link to is probably the very peak you can do with this setup.

Finally, even if you have two ports that are advertised as 40Gbit, that does not mean they can both do 40Gbit at the same time. On most chipsets I've seen, two ports usually share the same 40Gbit bus, or more specifically, 4 PCIe 3.0 lanes. I'm not sure how the bandwidth allocation works in these scenarios, but it is possible that just *having* the two interfaces enabled splits your available bandwidth further. If each Ethernet interface presents itself as a separate PCIe device over Thunderbolt, the controller could just give each device two PCIe 3.0 lanes instead of dynamically allocating the bandwidth. That would actually align with your observed ~11Gbit rather well. But again, this is heavy speculation; I have very little knowledge of the actual drivers involved here.
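
To put rough numbers on the lane-splitting idea (a back-of-envelope sketch; the 2+2 lane split is speculation, and all figures are approximate):

```python
# Back-of-envelope budget for a "40 Gbit" Thunderbolt 3 link, using the
# approximate figures from the comment above. PCIe 3.0 runs at 8 GT/s
# per lane with 128b/130b encoding.
PCIE3_LANE_GBIT = 8.0 * 128 / 130   # ~7.88 Gbit/s payload per lane

four_lane = 4 * PCIE3_LANE_GBIT     # a port that gets all 4 lanes
two_lane = 2 * PCIE3_LANE_GBIT      # if two ports split the lanes 2+2

print(f"4 lanes: {four_lane:.1f} Gbit/s ceiling")   # ~31.5
print(f"2 lanes: {two_lane:.1f} Gbit/s ceiling")    # ~15.8

# After Ethernet-over-Thunderbolt framing and TCP overhead, a static
# 2-lane split would land in the same ballpark as the observed ~11 Gbit.
```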

Fang-Pen Lin • 1 year ago

Thanks for sharing the link and the thoughts regarding why it only hit 11Gbps. 🙌

Rhys • 1 year ago

Hi Fang-Pen,

This is a great setup, well done. How do you handle IP addressing, routing, VLANs, etc.? I've got a similar ring topology with dual 10Gbps SFP+ cards. I'm working with OpenStack, Open vSwitch, etc., as below.

https://blog.rhysgoodwin.co...

Cheers
Rhys

Fang-Pen Lin • 1 year ago

Hi Rhys,

Your cluster looks pretty awesome. I will find some time to read your article, and maybe I can learn a trick or two from your setup. Thanks for sharing 🙏

As for more details on the cluster: I'm not really an expert on networking; I only know the basics, and I'm still learning, so my approach may not be the best. But yeah, here's how I did it.

As you can see from the IPs I added to the Thunderbolt 3 interfaces, I set a peer IP on each of them (the IP on the other end). With those, I have 10.7.0.x IPs reachable on each USB port of these three machines. Those 10.7.0.x IPs are not the addresses I use for connecting at the application level (Kubernetes), though. I'm actually using Nebula as another layer of VPN mesh on top of the underlying network:
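
Outside NixOS, the equivalent plain iproute2 setup would look roughly like this (a sketch; the interface name thunderbolt0 and the exact addresses are hypothetical):

```shell
# On node A, for the cable that goes to node B: assign a local address
# with an explicit peer, making the link a point-to-point /32 pair.
ip addr add 10.7.0.1 peer 10.7.0.2/32 dev thunderbolt0
ip link set thunderbolt0 up

# On node B, the mirror image for the same cable:
ip addr add 10.7.0.2 peer 10.7.0.1/32 dev thunderbolt0
ip link set thunderbolt0 up
```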

https://github.com/slackhq/...

The Kubernetes cluster actually runs on top of this Nebula network. I configured Nebula with preferred_ranges set to 10.7.0.0/24 so that the nodes use the Thunderbolt 3 ports first to connect to each other, when available:
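
For reference, the relevant fragment of a Nebula host config would look something like this (a sketch, not a complete config; preferred_ranges is the only point here):

```yaml
# Nebula host config fragment: prefer underlay addresses in this range
# (the Thunderbolt point-to-point links) over Ethernet/Wi-Fi whenever
# a peer advertises one.
preferred_ranges:
  - "10.7.0.0/24"
```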

https://nebula.defined.net/...

This setup works great: even when my Thunderbolt 3 connection goes down for any reason, it can still fall back to Ethernet or even Wi-Fi. But there's a catch:

https://github.com/slackhq/...

When I benchmarked over the Nebula network, I could only hit somewhere around 2.x Gbps before running into CPU saturation. I guess it's because Nebula uses a TUN device, unlike WireGuard, which runs in the Linux kernel, so there's more overhead at high speeds. There are many Nebula parameters you can tune; I tried a few changes, but so far none of them brought the speed up. Since I haven't found time to do a full-scale benchmark and dig deeper into the Nebula bottleneck, my Ceph cluster on top of the Nebula network currently tops out at 2.x Gbps.

I was thinking about configuring Ceph to use the raw 10.7.0.0/24 IPs for better performance. I recall seeing a Gist with a similar setup where the author configured Ceph to use the raw IP addresses:

https://gist.github.com/scy...

I haven't read it yet, so I don't know how to do that at the moment. But regardless of whether I can make the Ceph nodes talk to each other via the Thunderbolt 3 IPs, I'd rather fix Nebula's bottleneck, because that way my cluster keeps working even if I unplug the cables, which is very cool.

Anyway, this article only covers a tiny portion of the stuff I've tried on this cluster. There's a lot of interesting new tech (new to me at least 😅) I've adopted on this project:

- NixOS https://nixos.org/
- Cilium https://cilium.io/
- Nebula https://github.com/slackhq/...
- Rook Ceph https://rook.io/
- USB4/Thunderbolt3

And I've learned a ton from doing it. Hopefully I can find some time to write more articles on those topics. Hope the above makes sense, and let me know if there's anything else you'd like to know about my setup!

Mario Giammarco • 1 year ago

Have you tried the batman, batman-adv, or Robin protocols? I am using them in a Proxmox/Ceph cluster with used 40 Gbit PCIe Ethernet cards in a ring topology like yours.
I am able to get redundancy and speeds above 10 Gbit (but I am not completely satisfied).

Fang-Pen Lin • 1 year ago

Nope, this is the first time I've heard of them, but they look pretty promising. Thank you so much for sharing; I will try them out at some point 🙏

jacob • 1 year ago

I love the 10Gb virtual network interface available in the Thunderbolt stack, and I'm glad to see someone giving some thought and well-written words to the subject. As an aside, it looks like you have a ring architecture and not a full mesh (each node would need three ports, it seems).

no-bug404 • 1 year ago

Surely with three nodes, mesh and ring are the same topology, as all devices are connected directly to all other devices.

Fang-Pen Lin • 1 year ago

Thanks! Also, thank you for pointing out the ring architecture thing. Actually, I'm not that familiar with the strict definitions of a ring architecture versus a so-called full mesh in networking terms 😅

The ring architecture (or topology) was just something I recall reading about a long time ago in a book: it's basically hosts forming a circular structure by connecting pairwise. I was just thinking maybe I could do something similar with USB4. Pardon me if my terminology isn't very accurate.

Lee Schumacher • 1 year ago

How did you configure ceph?

Fang-Pen Lin • 1 year ago

I use https://github.com/rook/rook/ for installation and didn't change much of the configurations.

Daniel15 • 1 year ago

Make sure you enable jumbo frames by changing the MTU from 1500 to 9000. Some devices support even larger frames, up to 16k (check the documentation).

If you don't have jumbo frames enabled, that'd explain some of the 'missing' performance.

Fang-Pen Lin • 1 year ago

Yeah, the MTU is currently set to 1500. I may increase it and test again when I can find the time. But I'd guess the TCP header overhead only accounts for part of the gap.
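
The header-overhead part is easy to put a number on. A quick sketch, assuming plain IPv4 + TCP with no options (40 bytes of headers per packet):

```python
# Per-packet overhead of IPv4 (20 B) + TCP (20 B) headers, ignoring
# TCP options, Ethernet framing, and ACK traffic for simplicity.
HEADERS = 40  # bytes

def payload_efficiency(mtu: int) -> float:
    """Fraction of each packet that is actual payload."""
    return (mtu - HEADERS) / mtu

print(f"MTU 1500: {payload_efficiency(1500):.1%} payload")  # ~97.3%
print(f"MTU 9000: {payload_efficiency(9000):.1%} payload")  # ~99.6%

# So raw header overhead only explains a few percent. The bigger win of
# jumbo frames is usually fewer packets per second (less per-packet
# CPU/interrupt work), which matters when the CPU is the bottleneck.
```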

CNW TECH • 1 year ago

The USB4 speed limitation may be the result of hardware, chipset firmware, the NIC drivers installed in the OS, limitations of specific USB ports (some controllers support the highest speeds only on certain ports), or the cable. Your cable says 40 Gbps, but the vendor has a comparable one rated at 20 Gbps; why is that, and did you get the 20 Gbps version? They say to contact them if there are issues with early product. It could also be speed negotiation on either side of the USB cable (e.g. auto link speed, frame size, etc.). Sometimes an external switch is used to fix the speed if you can't control it with drivers.

Did you check the hardware capabilities per port in the OS (dmesg)? What does lsusb -t show for the USB port configs? Did you search for firmware updates from the manufacturer of the USB controller (the chipset usually gives it away) and confirm with their support that no updates are needed to fully support USB4? Is there an updated OS driver for the USB4 controller from the system manufacturer?

Grant

Fang-Pen Lin • 1 year ago

For the firmware part, it's running a Linux distribution. I thought the firmware/drivers come from the Linux distribution itself; I'm not sure you can install them separately like on Windows.

CNW TECH • 1 year ago

The first place to check is the hardware platform manufacturer, specifically for USB drivers. See what they have; if nothing is available to download and the firmware and drivers are up to date per their specs, then contact their support to confirm the USB4 configuration.

CNW TECH • 1 year ago

Sometimes there is a Linux distro tool for USB firmware flashing. Other times it may be provided by the system/hardware manufacturer for their specific hardware implementation, or it may be generic and provided by the USB controller chipset manufacturer.

James • 1 year ago

Regarding the speed issue, I'm fairly certain it's the secondary port (the one on the right).

I own the Minisforum UM775 Lite, which is basically the previous generation of the 790 Pro, and despite Minisforum claiming both front ports are USB4, only one of them supports 40Gbps and DP alt mode. The second one is capped at 10Gbps and does not support power delivery either.

I would suggest testing only two of these 790 Pros at a time by connecting them using only the primary USB4 port (the one on the left). In the picture I can see there's a cross pattern for the uplinks. Try doing left-to-left (for top and middle) and right-to-right (middle and bottom).

EDIT: After reading your post I had to check Minisforum's website and saw they now have an Intel 13th gen mini PC in the 1L form factor with *two* 10G SFP+ ports. Absolutely insane.

Fang-Pen Lin • 1 year ago

I tried the left-to-left, right-to-right pattern, but I somehow kept seeing Linux USB kernel errors, not sure why. The cross pattern seems to reduce the chance of hitting the same error. I haven't managed to find time to dig into the kernel and see what's going on there.

As for Minisforum's new MS-01, yeah, that looks pretty cool, and I would like to buy one and try it out at some point.

Peter Vander Klippe • 1 year ago

This is amazing, but I think the power consumption you estimated for the Dell server was grossly overestimated. I have a T430, and it idles around 98W; it only has 500W power supplies, and I've never seen it pull more than 150W so far. Your newer boxes are definitely more efficient, but even the Dell doesn't consume 1000W 24/7.

Fang-Pen Lin • 1 year ago

Thanks for pointing it out. I saw many others pointed it out as well. I've never actually owned a 1U server, so I just picked a fairly random number under its power rating, given that the machine has tons of memory and disks plus two CPUs. But obviously I had no clue what I was talking about. I've already updated the article. Thanks again 🙌

azbest • 1 year ago

I tried to reproduce this between two GMKtec M7 6850H Pro systems.
Under Ubuntu, I got 12.5Gbps real speed.
Windows showed a 20Gbps USB4 P2P network adapter.
From reading similar blog posts: a 40 Gbps TB/USB4 link is made of 2 x 20 Gbps links, and the P2P network uses only one of them.

Eren • 1 year ago

Can someone please help? I just can't seem to get the connection to work :(
I also have a UM790 Pro, but it shows no connection between the two nodes.
I went by this manual: https://gist.github.com/scyto/67fdc9a517faefa68f730f82d7fa3570

I assume that it might have something to do with the driver path, which for me is:
https://uploads.disquscdn.c...

Here's the path I put in. I also tried the path from your manual (pci-0000:c7:00.6 and 5 respectively), but it still isn't recognized. I use TB5 cables from Cable Matters but have also tried TB3 cables:
https://uploads.disquscdn.c...

Help is greatly appreciated

OP • 1 year ago

The iperf3 in the Ubuntu APT repository is not the latest and does not support multiple threads. Please consider downloading the source code from the official repo and compiling it manually, which takes less than 5 minutes. A single CPU core is too weak to saturate a 40G Ethernet link.
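
For example (the IP and stream count here are just illustrative; recent iperf3 releases run each parallel stream in its own thread):

```shell
# Build a recent iperf3 from source (esnet/iperf is the official repo)
git clone https://github.com/esnet/iperf.git
cd iperf
./configure && make && sudo make install

# On one node:
iperf3 -s

# On the other node, with several parallel streams (-P) so the work
# can spread across CPU cores:
iperf3 -c 10.7.0.2 -P 8
```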

Little Fooch • 1 year ago

Fang-Pen
Nice work. Do you have a simple topology diagram you can share of how the machines are connected together? Regardless of the kind of network, I assume that in your USB4 network any node/machine can reach any other machine.

I've been working on something similar in Windows.
Thanks
LF

Little Fooch • 1 year ago

OK, so based on your image at the top (I'll call them A, B, and C, top to bottom): can A ping C, or vice versa? Obviously B can reach both A and C.

Fang-Pen Lin • 1 year ago

Yeah, it's a fully connected mesh between A, B, and C. So there are USB4 cables between

A to B
B to C
A to C

It's a very straightforward network structure. The network interface is simply a Thunderbolt bridge, as shown in the NixOS config sample I provided. So yes, each of them can reach any other node. This only works with 3 nodes; with more than 3, I would need to change how the nodes are connected, and the routing might be a bit more complex.
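
For what it's worth, the port math is what makes 3 the sweet spot here: a full mesh of n nodes needs n-1 ports per node and n(n-1)/2 cables, and these mini PCs have exactly two USB4 ports. A tiny sketch:

```python
# Ports and cables needed for a fully connected mesh of n nodes.
def mesh_requirements(n: int) -> tuple[int, int]:
    """Return (total_cables, ports_per_node) for a full mesh."""
    return n * (n - 1) // 2, n - 1

for n in (3, 4, 5):
    cables, ports = mesh_requirements(n)
    print(f"{n} nodes: {cables} cables, {ports} USB4 ports per node")
# With two USB4 ports per machine, n = 3 is the largest full mesh
# possible without adding routing through intermediate nodes.
```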

Little Fooch • 1 year ago

Thanks Fang-Pen. Have you seen this mesh implemented in Windows?

For a non-mesh solution, it may be of interest that back in 2014, someone interconnected 3 Windows TB hosts via TB cables and routing. I've never been able to replicate the routing tricks in current Windows versions.

Sometimes Network Bridge actually works as it's supposed to in Windows, which allows a Thunderbolt network adapter on a TB host (in the Windows Network Connections window) to be bridged to an existing Windows Ethernet adapter (10Gb in my case).

If interested, the white paper named 'Thunderbolt™ Networking Bridging and Routing Instructional White Paper' can be downloaded from any Intel Thunderbolt website.

disqus_sgbYlOplNT • 1 year ago

Hey mate, great read! I appreciate you putting in the time to share this. It inspired me to get the Minisforum stations just now to build my first K8s cluster.

Do you have anything in place for cluster loadbalancing? Or do you connect via NodePort?

And yeah, I have heard good things about NixOS and am trying to extend my knowledge of it. I love the idea that you can replicate nodes from NixOS configs, so I'm keen to read about your experience with it.

Fang-Pen Lin • 1 year ago

I am using Cilium (https://cilium.io/) for the networking of my Kubernetes cluster. It comes with many cool features enabled by eBPF, such as the load balancing you mentioned: they patch connect, sendto, and other basic socket syscalls to route traffic to a different node with essentially zero overhead. Black magic, basically. When it works, it's super awesome, but it still feels like very cutting-edge technology, and I kept running into different issues from time to time, which is kind of annoying. It's a really cool project, and I hope it matures in the near future. If you feel adventurous and want to see what eBPF can do for things such as load balancing, it's definitely worth a try. But if you don't have an appetite for debugging all the way down to the kernel with eBPF, it's probably still too unstable for most people, IMHO. I haven't tried other Kubernetes networking solutions yet, so I can't speak for the others.

Renato Aquino • 1 year ago

Nice article.
About the speed, maybe it's just your cables.
https://youtu.be/qV03FfdPHO...

Fang-Pen Lin • 1 year ago

Thanks for sharing, I will check it out.