Tuesday. Main hall. 11 a.m.
RIPE 90
Plenary session
13 May 2025
11 a.m.
CHAIR: Good morning everyone. We have got a couple of announcements to make right before we get started with our first presentation of this session.
So, the RIPE PC election nominations also close at 3:30 today, so if you have somebody to nominate, or if you want to nominate yourself, make sure you send that through before then.
The NomCom will hold office hours during the second half of all lunch breaks, from 1:15 to 2 p.m.; come by room Castania to speak to the committee in person or to arrange a separate meeting with committee members. You can also share feedback on ICP-2 from 1:30 to 2 p.m. at the meet and greet desk.
Without further ado, we have our first talk.
THOMAS WEIBLE: Thank you very much for the introduction, everybody. My name is Thomas Weible, co-founder of FLEXOPTIX. Together with our lead research engineer, we have brought a new topic to the RIPE meeting, and we will also do a little bit of a recap of RIPE 88 in Krakow, so those who attended there might see some slides, or actually one.
We specialise in optical transceivers, and in everything on the optical transmission side.
Having said that, let's start with the agenda. Our topic today is more about what we can tweak out of a transceiver, especially with the Flexbox we developed. We did some really interesting measurements on the management interface of the transceiver. At the beginning we do a recap to bring everybody up to speed, and then we look at the bit error rate. We looked at two different scenarios: one is the distance on the fibre span, and later on the temperature, which is where Gerhard will take over. He has a hot topic with him; he will do a live demo to show you how it's done, and then we will have some takeaways.
So, as promised, here is some recap from Krakow, RIPE 88. We did a presentation there about coherent transceivers, and we also introduced the signal-to-noise ratio. That's still a really critical thing these days, especially when it comes to coherent modules but also for direct detection transceivers. What is the SNR? It is the ratio between my desired signal on top and the noise level. So when, for example, all the doors are closed now, the noise level is pretty low and we have a really good SNR, and hopefully you can understand me and there is less feedback on the audio system; we have a great SNR then. Typically the SNR is more for the optical folks; when it comes to the data engineering folks, we look at the bit error rate. We will show that later on as well.
Even further back, in 2019, we introduced the digital signal processor, when the 400 gig transceivers came along to bring the signal onto the line. That's also what we cover again today, six years later: what we can get out of the digital signal processor, especially when it comes to the bit error rate measurement and the optical signal-to-noise ratio measurement. We do not cover the FEC that much today.
So, we took a Flexbox 5, this tiny device we developed; this is how it looks when we open it. We plugged in a transceiver, this one was a 100 gig transceiver, and we looped the fibres on the far end; basically, that's what we did. Then we ran a bit error rate test straight out of the transceiver. We did a couple of device tests: we started with the 100 gig transceivers, because they have a DSP inside, then we moved on to 400 gig and also the latest 400 gig developments on QSFP112, which runs four times 100 gigabit, with a similar protocol on the 8x50 gig electrical traces.
So much for the intro. Now let's look first at the distance. What did our setup look like? We are not showing everything now, because that would exceed the 25 minutes we have for the talk, and it would be very boring, so we just picked out some test results.
What you see here is the test setup we did with a 100 gigabit 40 kilometre transceiver. We used the transceiver, we used some fibre drums of 10 kilometres, etc., cascaded them to different lengths, and we needed some attenuators.
Our math at the beginning looked like this. We said okay, we have six different distances we want to do, from a back-to-back link up to 50 kilometres, so beyond the spec of the transceiver itself, because it's rated up to 40 kilometres. The interesting part is the transmit and receive power, and keep in mind that the receiver minimum is at minus 14 dBm; that's going to be important for the next slide. Why did we use attenuators? We wanted to look at edge cases, and we also wanted to protect the receiver, because we get into receiver overload at minus 3; we see the launch power is up to plus 8 dBm, so we needed to do some attenuation there.
What's the result? This is basically the combination, the summary, of all the tests we did, and it is interesting. On the Y axis you see the bit error rate. There is a demarcation line at 2.4 multiplied by 10 to the power of minus 4; for simplification I will just use the exponent from now on, so minus 4, minus 3 and minus 5. So the demarcation line is at minus 4. Above that, towards minus 3, it's a weak signal, we have too many bit errors on the link, and below it it's all fine; that's defined by IEEE, basically. This is raw data coming out of the transceiver. And it's working perfectly fine apart from the 20 kilometre one. When you look at the 20 kilometres, which we highlighted in the green section, we are close to the receiver minimum value of minus 14 dBm which I showed on the previous slide; we are now at minus 13.4 and we are just at the edge of the demarcation line which we are allowed to have for the bit error rate.
And 50 kilometres didn't work at all. You see we just went up to a really bad bit error rate of almost minus 3, and then at higher temperature it shut down. We also measured the temperature there.
There are some interesting steps in there, which you might identify with the question mark, and this is actually the control circuit inside the transceiver. The control circuit looks at the temperature of the transceiver, and when it recognises that the temperature is rising, it changes the bias current going into the laser and optimises the laser a little bit to get a better bit error rate. You see them always dropping down: at 40 kilometres, for example, at roughly 35 degrees, or at 30 kilometres it changes at roughly 50 degrees Celsius, and then it optimises the bit error rate for that setup.
Some words about the bit error rate. As I mentioned before, 2.4 multiplied by 10 to the power of minus 4 is defined by the IEEE; that is the industry body which takes care of Ethernet. What is the bit error rate? Basically it's the amount of errors divided by the total amount of bits we transferred on the link in a certain time span. But you do not want to look at only one error, because this would give you a false result; you always cluster them, you take a couple of errors. In this example it would be 100 errors on a 100 gigabit link. At a bit error rate of minus 12, for example, which is called error free, the gating time between those error clusters would be roughly 60 minutes, which is pretty good. When you get down to minus 4 and look at the gating time there, we are at 4 microseconds, so every 4 microseconds an error cluster will show up on our link, and this is going to be corrected by the forward error correction. So the weaker your bit error rate is, the harder the forward error correction has to work to actually correct errors on your link, and that's good to know.
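To make that arithmetic concrete, here is a minimal Python sketch of the calculation described above. The 100-error cluster is the figure from the talk; whether you take the full 100 Gbit/s line rate or a single 25 Gbit/s lane changes the absolute numbers, so treat the output as an order-of-magnitude estimate rather than the exact gating definition the speaker uses.

```python
# Back-of-the-envelope version of the arithmetic above: how often does a
# cluster of errors show up at a given bit error rate? The 100-error cluster
# follows the talk; the assumed rate (full line vs. one lane) changes the
# absolute numbers, so this is only an order-of-magnitude estimate.

CLUSTER_ERRORS = 100  # errors are counted in clusters, not one by one

def time_to_collect(errors: int, ber: float, rate_bps: float) -> float:
    """Seconds needed to accumulate `errors` bit errors at a given BER."""
    return errors / (ber * rate_bps)

for rate_label, rate in [("100G line", 100e9), ("25G lane", 25e9)]:
    for ber_label, ber in [("demarcation 2.4e-4", 2.4e-4), ("'error free' 1e-12", 1e-12)]:
        t = time_to_collect(CLUSTER_ERRORS, ber, rate)
        print(f"{rate_label}, BER {ber_label}: ~{t:.3g} s per cluster")

# Near the minus 4 demarcation line the clusters arrive within microseconds,
# which is why the forward error correction has to work constantly there;
# at 1e-12 they are minutes to an hour apart.
```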
So this was the distance part. Now we are coming to the hot stuff, to the temperature part. What we did there is we used the coherent 400 gig ZR transceiver, and we were not able to operate it for a long period of time. We needed to add extra cooling, a fan, and also a heat sink.
The funny story here is we took a heat sink from an old Cisco 800, so there is some history involved, which is pretty awesome: you have a Cisco 800 supporting a 400 gig coherent module, which is pretty cool. A good friend of mine is a wrecker around the Frankfurt area; he basically rips apart those components, he decommissions components, and he handed me over a box of these. If you need support in the Frankfurt area to get rid of your gear, let me know, I know someone. With that, I hand over to Gerhard.
GERHARD STEIN: I am very excited to show the live demo, but first we have to get through the theory of the coherent transceivers. There is a little bit of a difference with this specific coherent transceiver; it doesn't mean it is like this for all existing coherent transceivers, but in this case you don't need any attenuators, and in this particular case that makes it a little bit easier for us to see the correlation between temperature and fibre length. As you can see, with the temperature it's a little bit more independent on the one hand; on the other hand, you also see different bit error rates. And you might be missing something: if you paid attention to the more confusing non-coherent diagram, we had red sections there. Here the red section is missing, because it is way above the top of the chart. The bit error rate specification is in a different document, where it is agreed that the bit error rate can be higher. It is related to the nature of the coherent transceiver: the light has more properties, and the coherent transceiver can benefit from that. It is okay to have larger bit error rates, because it has more ways to recover the signal.
That's one thing. As you can see, there is some correlation: the longer the fibre, the more attenuation you have, and obviously the bit error rate gets worse.
If you increase the temperature too much, you get to see something like the 40 and 50 kilometre cases. It's a different scheme for how it corrects, but this is what happens. Other than that, as you can see, it's pretty much constant, with some noise here and there. With coherent it's not so much about the distance: you can connect a fibre of 2 metres, you can connect a fibre of 40 kilometres, but please check the specifications for the limits on transmit and receive power.
All right. In our experiment we wanted to find out, talking about hot topics, what happens if you get the transceiver too hot. So we decided to have a barbecue party, make one of our transceivers particularly hot, and see if it still works afterwards. What you can see here is a little close-up view of it. I wrapped everything in aluminium foil. The heat gun, don't use it for drying your hair, it's a different kind of gun, can give you up to 600 degrees centigrade. The only thing that was exposed, as you can see here with my little pencil pointing it out, was the transceiver, the only object I blew hot air on. And I did it in different steps, because our application code in our programmer device protects itself, limiting how hot it can get. So in step one I used normal programming to warm up the transceiver, without the heat gun, and let it go until the programme said no, that's too hot, let me cool it down. You have to be careful because we don't want to burn things up.
As you can see with the red and orange vertical lines, the high warning and high alarm thresholds are what you can read out from switches, and also from our Flexbox, where the limits are. The goal here was to see how much margin we have beyond these two lines, whether the transceiver will still work afterwards and how it will react, and where the loss of signal is.
In the second step I put it at 100 degrees so it would cross these lines; then I got a loss of signal. Then I tried this again with way more temperature, measuring up to 120 degrees, and it was the same reaction, but the graph changed a little bit. And if we zoom in on this section, because what was also interesting to see was whether, after having done this experiment, the transceiver is able to recover itself, you can see that it is possible. In step 4, I let it cool down a little bit.
Then I turned it on, this time without the heat gun, and the temperature goes up. Why? Because there is a DSP and an MCU inside doing some computing. Obviously that generates heat, but since the environment in Germany in January is a little bit cold outside, it cooled down again. And as you can see, it recovered itself to the bit error rates you would expect in correlation with the temperature.
So, that is one of the experiments we were running, just to see the limits. You can go beyond the limits: you get a loss of signal, but it is something you don't want. I mean, cooling makes sense, right? That's the reason why we have noisy fans in our data centres.
Here, I have a question for you, after having learned what we have learned so far.
Bit error rates: in this graph you can see another experiment, where I removed the heat sink, this cooler from the Cisco, after a couple of minutes and just observed what would happen. Obviously the red curve is the temperature; it goes up. But the blue line, which also goes up, is the bit error rate. It recovers itself, but it does not recover as quickly. The reason for that is that we have a rolling average here. So where you have bit error rates and you run into something like this, collecting a lot of errors, and then you put back the cooling device, you turn on your fans or whatever, it does not jump back down. It takes a lot of time, and you have this line where in the end the reading is ten times worse. But that doesn't mean it's really that much worse; the thing is, the burst is still inside the recorded window. You have to consider that when you are measuring bit error rate.
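As an illustration of why the averaged curve decays so slowly, here is a small sketch of a windowed rolling average over BER samples; the window size and sample values are made up, and this is not the transceiver's actual averaging algorithm. Once a burst of errors is inside the window, the average stays elevated until the burst has aged out, even though the instantaneous rate dropped immediately.

```python
from collections import deque

def rolling_average(samples, window=50):
    """Simple moving average over the last `window` BER samples."""
    buf, out = deque(maxlen=window), []
    for s in samples:
        buf.append(s)
        out.append(sum(buf) / len(buf))
    return out

# 100 samples at a healthy BER, a short hot burst, then healthy again.
samples = [1e-8] * 100 + [1e-4] * 20 + [1e-8] * 100
avg = rolling_average(samples)

# The instantaneous BER is back to 1e-8 right after sample 119, but the
# averaged value only decays once the burst has left the 50-sample window.
print(f"sample 121 (burst just ended): {avg[121]:.2e}")
print(f"sample 160                   : {avg[160]:.2e}")
print(f"sample 180                   : {avg[180]:.2e}")
```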
With that, let's get to our small experiment. I am going to connect the device. This is our new Flexbox 5, and I also brought a transceiver. I am going to launch an application for you and discuss a little bit about what is going on.
THOMAS WEIBLE: The setup is 2 metres of jumper cable as a loop. And what Gerhard does now is just power it up and bring the bit error rate tester into place.
GERHARD STEIN: It's still booting up a little bit. Let's wait. Here you can see what it's doing. LP mode is low power mode; we need to take it out of low power mode so the laser turns on, and it generates a PRBS. That's a generator that transfers a known pattern of bits, and with that you can measure the bit error rate, in order to measure the reliability of the cable you have on the other end. That is what you do with this kind of transceiver. It's like a system that is booting up. Here we go, this is what our Python programme was waiting for, and here we have it: we are measuring the current bit error rate with this cable, and now we can do a couple of things.
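Conceptually, the measurement works like the following toy sketch: a PRBS generator and checker written for illustration only, not the pattern length or DSP implementation the module actually uses. Both ends derive the same pseudo-random sequence, so every received bit can be compared against the expected one and every mismatch is counted as a bit error.

```python
import random

def prbs7(seed=0x7F):
    """Toy PRBS7 generator: LFSR with polynomial x^7 + x^6 + 1."""
    state = seed & 0x7F
    while True:
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        yield newbit

# Transmitter and receiver checker run the same generator from the same seed,
# so the receiver always knows which bit to expect next.
tx, reference = prbs7(), prbs7()

errors = total = 0
for _ in range(1_000_000):
    sent = next(tx)
    # A real link would corrupt bits on its own; here we inject errors at
    # roughly the IEEE demarcation rate just to exercise the checker.
    received = sent ^ (random.random() < 2.4e-4)
    if received != next(reference):
        errors += 1
    total += 1

print(f"measured BER = {errors}/{total} = {errors / total:.1e}")
```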
What you have to know about these transceivers is that, in the registers, the SNR, the optical signal-to-noise ratio, shares the same memory with the bit error rate, and you have to write into a certain register to select which one you record. But we are recording both of them in parallel, which means in the background there is some switching and it takes a little bit more time. I can disable this, like here, and you see now it's a little bit smoother, a little bit nicer to see; you get the values delivered at a faster rate. That's the reason for it. If you want to get more data from your transceiver, especially SNR and bit error rate, they are on the same register, so you need to swap it in the DSP and it takes a little bit more time. Let's disable this and show you that our experiment is not something pre-recorded; it is live. I am going to bend this cable and let's observe what is going to happen.
Am I bending it harder? Oh, yeah, very hard. And what you can also see here is what happens to the SNR. Usually one expects that the SNR gets worse, that it would go down, but it doesn't: it goes up, it goes to 30. That is something worth explaining. The thing is, you never want to go beyond 30 with your SNR, because there is another effect. If you have too much power, too good a signal quality, does that make sense, it's also bad. Why? Because there is a different effect, and we are looking at signal strength, we're not looking at bit error rate. The problem is, if it goes too high, you get an effect called quantum noise, or shot noise, and this shot noise then becomes predominant and doesn't let you recover the signal properly. That's a challenge we have to face here.
Now we are not bending the cable any more. The temperature goes up a little bit, but nice and easy: at 40 degrees we are way below our warning. So... that's that.
And that's the experiment so far. Let me stop this. And let's continue with a little bit more of the theory.
In another experiment we tried an SFP+, just to see how resistant it is to temperature. Because in the last experiment the transceiver was able to recover itself, but we wanted to go beyond that and see if there is a point where it does not recover. I added another section, I called it the weird point or something like that: when you get to 128 degrees, maybe you can guess why, all of a sudden the reading goes to negative values. You get to see large negative values, which are unrealistic. I corrected this by interpreting the last bit, which is actually the sign bit. But we also saw other effects, like the TX power went down for a little bit of time and then you got loss of signal at 120. I mean, it was 250 degrees; that's quite a temperature. And guess what? This transceiver, I was able to boot it, it still was working, but it was damaged. The readings of the RX signal were very, very poor, so it's a transceiver you don't want to use anymore. I mean, you can even see what kind of serial number it has. So, that is that.
Getting to our takeaways.
Bit error rate on direct detection transceivers depends on both temperature and fibre length, as we saw. They are more sensitive, especially to temperature. Bit error rate is always a rolling average; have a look at that, so you recognise this recovery effect when it happens.
About the high alarm and high warning thresholds: there isn't much margin beyond them, so if you have already crossed them you are in a really dangerous region. Please don't do that.
And, Thomas.
THOMAS WEIBLE: Which is not on the slide deck here, but what Gerhard also showed in his live demonstration is that there is a close relationship between the RX power, which went down to minus 40 dBm when you bend the fibre quite hard, so basically loss of signal, and the resulting bit error rate. The bit error rate got worse, and we also saw it in the jumping SNR. So there is a relationship with the RX level, and that's good to mention and to keep in mind when you operate a network and you are at the edge of the sensitivity of your receiver: this is something which can influence your link when you are really close to the edge. That's worth mentioning.
Having said that, on the coherent side it's a little bit different. Coherent transceivers make use of more properties of the light, not just the amplitude; we use the polarisation and also the phase, and due to that fact the signal is a little bit more robust. This gives us the nice benefit that, for sure, the temperature has an influence on the coherent block, but when a change happens rapidly on a coherent module, it doesn't influence the bit error rate as badly as on direct detection transceivers. So the limiting factor there is mainly the attenuation plus the optical characteristics of the fibre span, of the length.
At the end, OSNR is available now with those modules, but when you operate your data centre with your typical grey optics, it's enough to just look at the bit error rate if you have a problem to stare at; you don't have to bother much with the OSNR, because they run in parallel and they are close to each other, so there is not much to worry about. As long as you are not on the long transmission side with another system, but that's a different topic for a different talk next time. Thank you very much.
(Applause) Now we open the lines for questions.
CLARA WADE: Just remember to state your name and affiliation. No questions? Any online? No, thank you.
CLARA WADE: Next up, we have the next talk.
GRIGONLI SOLOVEV: Hello everyone. I am glad to be here today. I hope you are doing well.
I am a network architect at Yandex Infrastructure, and today I want to tell you about our tool for automating network device configuration deployment.
It's another tool amongst many others, but because we developed it for our own needs, we chose our own way of solving the problems. We believe that we got a very good product, and today it is released as open source. You can go to the repository for it.
This product has already been widely adopted in several large companies. Our main goal was multivendor support.
On the slide you can see the supported vendors.
When we decided to develop this product, we wanted to solve these issues. Multivendor support: our network, like many large ones, has many vendors with different network operating systems, including quite old devices, and all of them should be supported by our new tool. We have a lot of network devices, currently more than 10,000, and our new tool should work with that many devices easily. Some devices have complicated configurations, and our new tool should support device configuration of any complexity allowed by the vendor.
Our network constantly changes, and moving from one configuration to another should be supported by our new tool.
Very often only a small part of the configuration is automated. From the beginning, we decided to use full configuration templating for the whole thing. It can be called software-defined configuration: we generate the configuration and deploy it to the device. This gives us some advantages.
We can check the configuration on the device, and if a device's configuration doesn't match the generated one, we can fix it. For mass changes, we just need to make a change in one template and deploy the changes to all the devices which don't match the new version.
Your documentation is always up to date, because if your devices match the templates, the templates hold the actual information. And because the configuration is defined by software, we get the advantages of software development tools, like a version control system, reviews and tests.
It's a great concept and we took it as a base, but there is still the question of how to configure the devices.
For this, there are three ways to do it: CLI, NETCONF and Ansible. There are others (inaudible), but we didn't consider them. I add Ansible here because it is the first choice if you want to automate network device configuration deployment. We had these three requirements.
First of all, it should be machine readable, it should have multivendor support, and reliable delivery of course.
So why doesn't CLI hit any of them? And Ansible, to be honest: it is machine readable, but unfortunately not all vendors are supported, and the delivery is not always reliable. Some modules still use CLI under the hood anyway.
Then NETCONF, in this sense, looks pretty good, especially with OpenConfig, which gives us vendor-independent models that the network devices of some vendors actually support. And of course NETCONF is machine readable and has reliable delivery of partial or full configuration.
But after thinking some more, we added new requirements. People write the configuration, and the configuration tool should be clear and convenient for people to write in. Also, our network has many devices, including legacy devices which often don't support NETCONF and will almost never support OpenConfig.
And because of that, we decided to use CLI for working with our equipment, because every device has a CLI. It means that every device in our network will be supported, without exception.
CLI is convenient for engineers: they don't need to learn NETCONF or YANG models, and as we discovered, it's machine readable in practice. But we still needed to solve multivendor support and reliable delivery in some other way.
Okay, we made this difficult choice and created Annet based on the following principles. First of all, declarative, not imperative: we describe the full configuration of the device without worrying about how to deploy the changes; Annet handles it.
Modularity: we can work with small parts of the configuration, not only with the whole configuration.
We write templates using a common programming language, which is Python, because it is very expressive and it covers our needs.
As I said before, we can support every device, because every device has a CLI. There is no single model for device configuration: true vendor independence would require full device modelling, but we decided to use models only for some specific tasks, not for the whole configuration.
Annet has integration with inventory systems out of the box. And Annet is not client-server software; it is just a local CLI client.
Let me show you some examples of what we built. Using this command, we see the NTP configuration for device 127 D2.
Next, we can compare this desired configuration with the actual configuration of the device and look at the difference. As you can see, we need to change the time zone and add two new NTP servers.
Next, we can see the patch; the patch is a set of commands which gets us to the desired configuration.
And we deploy this patch on the device.
Let's look at how it works.
Annet has adapters to connect to different inventory systems; currently it supports YAML files and NetBox. As I mentioned before, we don't have one single model; we have several models for specific tasks.
Generators: when we get the generated configuration, we compare it with the actual configuration of the device and get the diff. The rule book is the magical part of Annet which turns the diff into the patch. And finally, Annet deploys the patch to the device.
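As a toy illustration of the diff step only, using Python's standard difflib rather than Annet's own comparison logic or output format, the idea is: render the desired configuration, fetch the running one, and the line-by-line difference is what the patch has to add or remove.

```python
import difflib

# Desired configuration rendered by the generators vs. the configuration
# currently running on the device (mirroring the NTP example above:
# two new NTP servers and a time zone change).
desired = [
    "ntp server 192.0.2.10",
    "ntp server 192.0.2.11",
    "ntp server 192.0.2.12",
    "clock timezone UTC",
]
running = [
    "ntp server 192.0.2.10",
    "clock timezone CET",
]

# The unified diff shows what a patch has to add (+) and remove (-).
for line in difflib.unified_diff(running, desired,
                                 fromfile="running", tofile="desired",
                                 lineterm=""):
    print(line)
```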
Let's look at each element in detail. Information from the inventory system is converted into Python objects, and in the generator you can easily access the attributes: for example, you can get the FQDN, check the model, or get a list of the names of all interfaces on the device.
Routing policy is quite similar between different vendors, but it can be complex. Annet has an RPL model to describe a routing policy in Python. As you can see, it's pretty simple to understand what happens here: we have a set of conditions and a set of actions. Here we can have references to other objects, like a prefix list or BGP.
Because the inventory shows all connections between devices, Annet has a mesh model which describes a device's neighbours based on templates, and these templates are applied to the connections described in the inventory system.
And we get a list of Python objects for the remote peers. In this example, there is a template for ToR-spine connections in the fabric. As you can see, we can use masks based on the names of the devices.
This template can describe all the connections between all the ToRs and spines in a well designed data centre fabric, in just one template.
Annet uses this template and the connections in NetBox and returns a list of remote peers.
You don't need to keep IP addresses or AS numbers anywhere; everything is described in just one template.
Very useful for regular networks.
How do we work with multiple vendors?
We solve that issue in the Python generators. Each generator is a Python class with methods for each vendor. In the generator, you have access to the inventory data and the models, and the generator yields configuration lines.
Here is an example of generating the NTP configuration for Juniper. As you can see, it's pretty simple, and you don't need to be a Python guru to work with this tool.
In fact, if you can write an access list, you are already qualified to work with this tool.
Each generator has a limited scope of the configuration, and the scope is described by an ACL. When we compare configurations, we only need to look at the lines managed by the generator, and which ones should be considered is defined in the ACL. This gives us modularity. This is an example of the ACL for the SNMP generator for the whole device.
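To picture the structure being described, here is a hypothetical sketch in the spirit of the talk: a class per feature, a method per vendor that yields configuration lines, and an ACL that limits the generator's scope. The class, method names and device attributes are illustrative stand-ins, not Annet's actual API; the repository and tutorial linked later show the real interface.

```python
# Hypothetical sketch of the pattern described in the talk. Names are
# illustrative only and do not reflect Annet's real base classes.

NTP_SERVERS = ["192.0.2.10", "192.0.2.11"]

class NtpGenerator:
    def acl(self, device):
        # Only configuration lines matching this scope are compared/patched.
        return "system ntp\nntp\n"

    def run_juniper(self, device):
        for server in NTP_SERVERS:
            yield f"set system ntp server {server}"

    def run_huawei(self, device):
        for server in NTP_SERVERS:
            yield f"ntp-service unicast-server {server}"

    def run(self, device):
        # Dispatch to the vendor-specific method based on inventory data.
        return getattr(self, f"run_{device.vendor}")(device)


class Device:
    """Stand-in for an inventory object (FQDN, vendor, interfaces, ...)."""
    def __init__(self, fqdn, vendor):
        self.fqdn, self.vendor = fqdn, vendor


for line in NtpGenerator().run(Device("sw1-lab.example.net", "juniper")):
    print(line)
```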
The rule book.
The rule book turns the diff into a patch. Sometimes you can't remove lines from the configuration just by adding a prefix like no, undo or delete; sometimes you need to change those lines or skip them. In this example we are trying to change the weight of a port queue on the device, and as you can see, we need to remove the middle part of the configuration line.
Annet supports very flexible logic, for almost any case. Order is also important: for example, you should remove a BGP community before removing the routing policy which uses it.
Sometimes the network operating system asks you questions, and the rule book has a database of questions and answers so it can respond.
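As a toy illustration of the diff-to-patch idea only, not Annet's rule book format: lines that disappear from the diff usually just get a vendor-specific negation prefix, while some lines need a rule of their own. The vendor keywords below are the usual CLI negations; everything else is made up for the example.

```python
# Toy version of the "diff to patch" idea. Removed lines normally get a
# vendor-specific negation prefix; special lines (comments here) get their
# own rule. Purely illustrative, not Annet's rule book syntax.

NEGATION = {"cisco": "no", "huawei": "undo", "juniper": "delete"}

def removal_command(vendor: str, line: str) -> str:
    if line.startswith("!"):
        return ""                      # comments need no command at all
    return f"{NEGATION[vendor]} {line}"

removed_from_diff = ["ntp server 192.0.2.10", "! managed section"]
patch = [cmd for cmd in (removal_command("cisco", l) for l in removed_from_diff) if cmd]
print(patch)   # ['no ntp server 192.0.2.10']
```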
And finally, Annet deploys the patch to the device. This happens using the open-source Go library gnetcli. This library is part of the Annet stack. It supports pagination, interactive questions, NETCONF, and different network operating systems. It can be used as a separate CLI tool or a gRPC server, and it also has a Python SDK.
And let's look at examples. In this one, we can see how to drain a spine in a data centre fabric: we just assign a tag to the device in NetBox and deploy the changes for the routing policy generator. In that case we add the graceful shutdown community to all BGP announcements, and the whole task is done.
And the second example is draining a link between devices: we also assign a maintenance tag, this time to the cable, and deploy the changes for the BGP generator. In that case we shut down the BGP neighbour which is configured on that link, and the task is done.
And again, a list of supported vendors. Feel free to add new ones; we are really glad to see your PRs. And how to try it: here again is a link to the Annet repository. We also have some virtual labs you can use to try Annet for yourself, and a detailed step-by-step tutorial where you can see how it works, how it should be configured and how you can use it. We also have a support Telegram chat.
I am ready for your questions.
(Applause)
AUDIENCE SPEAKER: Jen Linkova. Thank you very much, very cool. I have a particular question; I assume you intentionally didn't say anything about that. You have templates which seem to be shared for a given device role, right? So, now I want to change one. This would impact every single device of this role, right? How do you deal with incremental rollout of those changes?
GRIGONLI SOLOVEV: This is just a local CLI client. You can make changes locally and deploy them, or you can make changes, make a PR, send it to, for example, a GitLab repository; these changes will be delivered to everyone else and they can deploy them. This tool isn't used for something like deploying in different stages for different tasks or different network segments; you need to handle that with other tools. It's simpler than you might expect.
JEN LINKOVA: You basically do your configuration but you do not automatically push those changes anywhere right? You manually control ‑‑
GRIGONLI SOLOVEV: It's pretty similar to how it works with Ansible, but it's easier, faster and more useful.
JEN LINKOVA: Okay. Thank you.
AUDIENCE SPEAKER: Darren Clarke. It's not specific to your tool but I just had a quick question. How does it deal with logging into Nokia when Nokia doesn't allow non‑interactive SSH sessions?
GRIGONLI SOLOVEV: Nokia supports several kinds of CLIs.
AUDIENCE SPEAKER: It's got its classic one and the new MD-CLI.
GRIGONLI SOLOVEV: Yeah, we are working with the newer one.
AUDIENCE SPEAKER: The problem is it still doesn't allow non-interactive SSH. I wonder, do you know how it gets around that?
GRIGONLI SOLOVEV: I don't remember that we had any issues with Nokia. We are working with them; we have a number of routers from Nokia and it's working. If you have some specific questions, we can discuss it later, I think.
AUDIENCE SPEAKER: Okay. Thank you.
ANTONIO PRADO: Maybe more a comment than a question, from Gert Doering: having all these vendors supported with proper diff and deploy of new config is quite an impressive achievement.
CLARA WADE: Thank you.
(Applause)
Next up we have Maria, who is going to show us how to simulate networks on your laptop.
MARIA MATEJKA: Hello, I am Maria. I was here a year ago presenting a lightning talk about basically this, but it has moved forward a little bit and I want to say a little bit more about what has actually happened.
So, first, about myself: I am Maria, I work for CZ.NIC, which was the host of the previous RIPE meeting. I am the team leader of the BIRD Internet Routing Daemon, and I am also an IPv6 maximalist and an expert in complaining about almost everything. So if you wish to hear some more complaints from me, well, this is going to be full of complaints too, but if you wish to hear even more, just hit me up. I am also a cook at a summer camp called ProTab, which is relevant for the next slide.
ProTab is a summer camp for young programmers in Czechia. There is a website link on the next slide, but it's only in Czech because, well, the target group is in Czechia, so we don't have an English version, but Google Translate may work.
This is a picture from the camp. No AI involved, hopefully.
The camp is intended for kids who already know something about IT and something about programming. So, we say you should know what a condition is, you should know what a for loop is, and you should know what a function is. It's not about teaching the complete basics; it's more like: well, are you thinking about the IOI? Are you thinking about some university expert fast track or something like that? Yeah, come in, we are a good party, we are a group of people who are as nerdy as you are. And if you can't see anybody as nerdy as you around you throughout the year, this is going to be that group for you.
And half of the schedule is teaching. So we teach them more, so they become more nerdy, and, well, it's fun, and I am cooking there. Several years ago I was leading the camp, but I got fed up, so I left it to the others and went into the kitchen. But I am still teaching routing at ProTab. And it's a bit crazy, because I can't use physical hardware, it's just impossible to bring it there, and I can't simulate the hardware, it takes too much. Container Lab is greedy as well, considering this. And I cannot run this in some cloud, at some server farm, because if the uplink goes down, well, the lecture goes down as well.
And I am not going to set up an online Cloud, well, not at all sorry.
So, what do we need? We need a simulation of routing that runs offline. We need to run it without sudo, because we don't want to scare the young kids with "hey, you should install this and run these five scripts with sudo". It should have a quite useful user experience for somebody who does not have five or ten years of shell scripting in their backpack. It should be approachable for kids who are, admittedly, at least a little bit familiar with the command line, but I don't want them to read big bash scripts to figure out what's happening there. And I want the least resources consumed. They don't have my work laptop, which is high-end current technology; some of them come with ten-year-old laptops because their families are not rich enough, they don't have the money to get a laptop for 2,000 euros. Sorry.
So, we needed it to run on their own laptops.
So, what I tried two years ago was BIRD's NetLab, which is our test tool, written by... ten years ago. It uses the Linux kernel, it's extremely lightweight, it has almost no start-up time, it uses almost no memory and no CPU, just the minimum you need, and it runs on your laptop.
But it needs root access, and it has no other isolation, so you need to know that each of the containers is actually your own laptop, just with an isolated network space. You have to write a config file, or you have to use those big ip netns commands. It expects BIRD to run everywhere; it's expected to be used as a testing tool, as a simulation for BIRD development, not for teaching kids how to route.
And also, well, there are too many NetLabs; there are at least two NetLabs just in Czechia, in Prague, different projects, so it's just too general a name. It mixes Python and Bash, and the way it mixes Python and Bash is quite crazy. Well, nobody in our team is actually an expert in Python, so this is how it looks.
What we need for testing is actually full automation. We need it to run offline as well, because when we use it on an aeroplane... well, I know I can't use the argument of sitting on a transatlantic aeroplane, because I am not going transatlantic to the USA until further actions are done, but maybe the other way. Yes, I want it to run on the aeroplane, because when I am fixing a bug on the aeroplane, I want to test it. And I can't test it with a Container Lab in a data centre, sorry; I need to test it right now on my laptop.
I need to run multiple clusters on the same machine. I want to parallelise the scripts and, let's say, try 15 different variations to see whether it works out or not, and this actually makes the testing faster. I need repeatable tests, I need lots of control: putting links up and down, adding more, removing them in different ways, lots of things.
So, let's introduce Flock.
There are too many flocks already, actually. If you look into Linux, there is flock, so I am using a similar name; but if you find another name, something better, something more unique, feel free to suggest it. There is a small window when it can still be renamed without much of a fuss.
So what is Flock doing? I basically wanted to go for matching what Container Lab is doing, which is actually not done yet, but what I'm trying to do is go from the bottom up: do the minimum needed to make it useful and feasible. So I'm making half-containers. I don't need root, I don't need the UIDs, the GIDs; it's just using Linux network namespaces and many others. So it isolates everything it can. But what I still need is good shell access. I don't like Docker, where opening a shell needs a five or six word command and it needs sudo. I want the children to not kill their own laptop by accidentally doing something bad as root.
So let's jump into how it is used. You just create a cluster, and you have to name it. That thing, well, bastu, that's a Swedish word for sauna. You create a cluster, which you have to name; it's actually a path in your file system where everything is stored, so you can go to one directory and do this, you can go anywhere else and do this. Then you start two machines and join these two machines with a link. If you read the last command, it makes a PtP link in the cluster from machine A, where it has the name B, to machine B, where it has the name A, so it points to the other side.
You can log in to the machine; just say shell and it enters that thing.
You have to fix some common problems sometimes, like setting up IPv4 and IPv6 forwarding. Some things didn't like me, because you don't have any other user than root there. You have a root there, which is funny, but the root is actually your own user; it just fakes root. The root can do anything, like setting up IP addresses, putting interfaces up and down. But tcpdump in Debian didn't like me, so I had to build it myself. When I upgraded to trixie, they actually fixed this. Thank you, I don't know who, but thank you.
Basically, everything that drops users, that drops from root to some specific user, may have problems inside this setup. On the other hand, the root you are using is not a real root, so you sometimes have to go for the options which disable those security mechanisms. This is not ideal, but it can be well explained when teaching.
Well, this is a slightly crazier setup: let's loop 200 machines, one after another. This basically just creates 200 nodes. There is another piece of this script creating the BIRD configuration: the common part sets up the device protocol and OSPF, with a wildcard for the interface name. Then you prepare the specific configurations, which basically need just a router ID; everything else is the same. Then you run BIRD on all the machines. You could run OpenBGPD, you can run whatever you want, but this is what I am most familiar with, so I am presenting it with BIRD.
This is just five lines of a Bash script fed into the shell: setting up the forwarding, starting BIRD, adding a local interface, setting it up and adding an IP address to it, which is also generated; you'll see the printf I have there. So every single machine gets its own IP address in the global range.
Then you need to link the machines together: you create the PtP links, which is shown there; the first one is 1 to 200, the others are 1 to 2, 2 to 3 and so forth. You have to set these links up and start them.
That's it. And you have the routes; you can just show the routes.
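Here is a hedged reconstruction of the per-node configuration generation being described: every node shares the same device-plus-OSPF template and differs only in its router ID. The addressing plan and output directory are made up for illustration; the BIRD snippet itself is ordinary BIRD 2/3 syntax, not the exact file from the talk.

```python
# Sketch of the per-node configuration generation described above.
# Only the router ID differs between the 200 loop nodes; the addressing
# plan and directory layout are illustrative.
from pathlib import Path

COMMON = """
protocol device { }

protocol ospf v2 {
  ipv4 { export all; };
  area 0 {
    interface "*" { };
  };
}
"""

def node_config(n: int) -> str:
    # Each of the 200 loop nodes only needs a unique router ID.
    return f"router id 10.99.{n // 256}.{n % 256};\n" + COMMON

outdir = Path("bird-configs")
outdir.mkdir(exist_ok=True)
for n in range(1, 201):
    (outdir / f"node{n}.conf").write_text(node_config(n))

print("wrote", len(list(outdir.glob("node*.conf"))), "configs")
```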
So, when you do this, nice, you can show it to the kids, and every single child can have this crazy setup on their laptop, well, considering they have at least 8 and a half gigs of memory. Otherwise it's like five minutes to set up everything if you just copy the script: the start-up takes a minute and a little bit, the other parts also take some minutes, but on my laptop it fits into five minutes; on the children's laptops it may be ten because of the CPU speed. Well, this setup obviously has some problems: if you have 200 nodes in a loop, you can't ping the other side. It's actually quite a stupid example, but it allows me to show the limit, the TTL. This is actually something we used last year at ProTab, and the kids looked like they learned something new.
So, where is Flock today?
Flock runs on Linux; this is not written here, but it runs on Linux. It's in the development phase, so I am running it on Debian, now 13. I don't suppose it should differ very much between distributions, but I haven't tested that yet.
There is a possibility that it would actually also run in the Linux subsystem for Windows. I don't know whether BSD systems would ever be okay with this, because they don't have this containerisation, those namespace thingies; they have a specific approach and it specifically prohibits what I'm doing there.
Flock is written in C and Python. It's quite simple and easy, just several hundred lines of code. It has a CLI interface and a JSON interface, which means you can actually link your automation onto it; well, not yet, because it's not so complete, but it will move forward.
You can dynamically update the topology. You can see, if I go three slides back, this thing I was doing after setting everything up: this is actually creating the links between the machines. So I first had 200 separate machines, and only after setting up everything did I link them together.
There are some things to be fixed. I have some weird problems with overlayfs, which may prevent easy running of FRR; this is something we aim to fix quickly. The terminal sometimes behaves weirdly, because I probably didn't set the right things when starting the shell; this is something to be looked into and fixed. There is a little bit of documentation to be added, especially documentation about the internal architecture. And last but not least, we should find out how to install this in a useful way; for now it's just creating a symlink and running make somewhere, because you have to build the C part.
In development is the automation for using this as a BIRD testing tool. We are trying to make this surpass the current NetLab setup which we have, and to make the NetLab testing, from now on Flock testing, much larger and cover most of the BIRD thingies. Which, by the way: BIRD 3 has been released, so we are now flying on multiple threads. Sorry, commercial break ends.
What we are also developing there: we expect to implement a TAP link connecting to a VXLAN, so this may very easily allow students to connect between themselves. Each of them creates a TAP in their cluster, and it looks like a VXLAN outside, so they can just communicate over this. It can be used for teaching things like a local AS: I have my local AS on my laptop, and then I am connecting with my colleague who has their own AS, and we are connecting over BGP links and learning routes over these virtual connections.
We are trying to add Docker image support, basically running Docker images inside this. Who knows whether it works; it's going to be quite some experimenting, but we'll try our best because we actually need it.
We'd like to add support for some physical interfaces, adding real routers to the virtualised machinery. And last but not least, we want to have a GUI for the kids to click all those things and make it even easier.
You can try Flock; it's available, it's downloadable. Like two hours ago I released version 0.2, not yet announced. It's still under quite some development, but if you wish to contribute, if you wish to fix some of the problems I mentioned earlier, you are very welcome; I would love it. This is myself; if you want to look me up anywhere, it's going to be in the presentation.
That's it.
(Applause)
CLARA WADE: And we have some questions from the audience.
AUDIENCE SPEAKER: Hi. This is on Bray IC. Cool stuff. I might actually take a look, because I also teach computer networks and DNS now, but I wanted to point out something similar. I spoke to Casey Deccio from the DNS space; he also teaches networks in the US and they have something called CougarNet, where he describes the networks and it will do similar things. You might want to take a look at this as well, because it's probably simpler than what you do, since you also want to use yours for BIRD testing; but CougarNet is also cool stuff for teaching university students and kids networking, because it also does need Docker and stuff. But cool stuff. Thank you.
MARIA MATEJKA: Could you please send me a link? Thank you. Because I won't remember this in a minute.
AUDIENCE SPEAKER: Hello. Constanze from MTT. I was here the previous time you presented this talk. First of all, thank you for an open-source tool, that's very nice. However, in real production networks you have to simulate the vendor equipment; it is not just Linux containers connected to each other. That's the point of Container Lab. And having Container Lab, I would like to ask: if I use Container Lab with just images of FRR or BIRD router containers, how would you compare that to your setup? Because I would believe that Container Lab would be lightweight enough in this case.
MARIA MATEJKA: Okay. When I would have to teach these kids how to use sudo just to start it up, this way I am not going to teach them Container Lab. I am really trying to put the least possible burden on the user. One of my goals, which I can't let go of, is not having to have root access to the system. When I was trying to install Container Lab, maybe I was just too dumb, but if I remember correctly, I had quite some problems without sudo, and even with sudo it was not just: do this, do this, and now it runs.
AUDIENCE SPEAKER: Okay. So you are also targeting an educational tool for naive users, or beginners, let's say, and you want to minimise the security exposure of the box.
MARIA MATEJKA: Yes. I'm not aiming to precisely simulate networks. The aim is to have a tool for easy setup, the first five minutes of setting it up at most, and then go teaching, teaching, teaching.
AUDIENCE SPEAKER: Thank you. That's clear. Thank you.
AUDIENCE SPEAKER: Martin Winter. You mentioned that you are developing this more as a network automation testing tool for BIRD, and I was wondering if you have already looked into an issue we had with our testing: when we ran multiple topologies in containers, the way the Linux kernel works, all the network operations run on only one core. So you may run into potentially serious issues on regular regression testing. Probably not an issue on the laptop, because you don't have that many cores, but on large servers it is.
The other question I had is, you mentioned the connections, like PtP: are these point-to-point links or are these broadcast links?
MARIA MATEJKA: These are implemented as veth pairs, just in the kernel, so they wouldn't actually be broadcast. Well, the technical stuff underneath is not relevant for the teaching itself, and you can set it up as a PtP; basically, what you do with the veth is up to you. I think it's created as broadcast, or is it? That's a good question. I wasn't thinking much about those specifics, but I suppose we'll run into some of those issues sooner or later, so who knows? Sorry, I couldn't answer.
CLARA WADE: Thank you. Antonio, any questions? No? Thank you Maria.
(Applause)
CLARA WADE: Before you go, just repeating a couple of announcements. The RIPE PC election nominations close at 3:30 today, so if you are interested, or wondering if this would be a good fit, come to us and feel free to ask questions, or send in your nomination.
The NomCom will hold office hours during the second half of all lunch breaks, from 1:15 to 2 p.m., in room Castania, and you can share feedback on ICP-2 from 1:30 to 2 p.m. at the meet and greet desk. Thank you very much.
(Lunch break)