RIPE 90

SASHA ROMIJN: Hello everybody, and welcome to the RIPE 90 Open Source working group. We are your chairs: Sasha, Marco and Marco.

To start with some administrative matters: we have a scribe from the RIPE NCC, whom we thank. This session is happening here in the room, on Meetecho and on the live stream, and is also being recorded. You can ask your questions remotely as well and we will relay them. The RIPE community Code of Conduct of course applies here, and don't forget to rate the talks, not only in the plenary but also in the sessions here.

We posted our meeting minutes from RIPE 89 just yesterday. We have not received any comments yet, so unless there are any objections, we would like to approve those.

No one? I would also like to highlight that finally I have succeeded in my most important goal since becoming Chair of this working group, which was dragging our agenda template into this century.

Almost everyone is happy.

So I think these are our administrative matters. We have four interesting talks for you. We had many interesting submissions so this is great, it's good to see there was so much interest in speaking here.

And first I'd like to welcome Victoria to talk about money and OpenSource sustainability.

VICTORIA RISK: Hi, good morning. I have been a product manager for about 35 years, and eleven of those have been at ISC. The title is perhaps a bit of a misnomer: I am not going to talk about everything to do with money, I am going to share our experience at ISC and how we have managed to sustain and fund our OpenSource work.

ISC is a non‑profit; that means we don't need to produce extra money for venture capitalist gain or anything like that. It also means that we don't survive on, say, passive income from a foundation. Our board of directors does not want us to build up a massive amount of reserves, so basically every year we have to earn the money to fund our operations for that year.

That said, one way or another, we have been in operation since 1994 which is a long time and we have learned a few things along the way.

We have a couple of relatively significant, long‑lived OpenSource projects and, for an OpenSource project, a relatively large staff of approximately 45 people. I say approximately because some people aren't full‑time. I think we have tried every single technique that is logically possible for funding OpenSource.

I myself have tried selling t‑shirts, selling back links to shady operations on our website, many, many different things.

My boss Geoff Osborne over here has said we can be completely transparent about our finances, so I am going to share some actual numbers. On this slide, the red line is the funding that we received externally to support ISC DHCP during the past five years ending with December of 2024.

The high point on this is 250,000 dollars a year, which is not enough to support a single senior software engineer in the US with benefits, let alone a QA team, release operations and other expenses. So we have some experience in supporting ISC DHCP; we have been supporting it for decades. And I will say that during my eleven years at ISC, we have never had any funding, not a dollar, to support the client or the relay function; the only support we have had is for the server.

So we recognise that the DHCP software was kind of a labour of love. We came out with a new DHCP server called Kea; it's been around for a little while. Really the reason I am here is that in the past five years we have managed to make Kea fully self‑sustaining, so we have really come up with a solution to funding this OpenSource work.

The green line there on the top is Kea funding, and you can see it's quite a bit more significant than the funding for ISC DHCP. I just want to say that funding was not the reason we declared end of life for ISC DHCP two years ago; the software was really unsustainable, so I don't want you to draw any conclusions from that.

So I am going to talk about the things that we have tried and what has worked for us. I recognise there are lots of other models out there in OpenSource, and different things work for other organisations. In the case of Kea, we did bootstrap it internally for several years: we had funding from other operations, mostly from BIND support, and we used some of that money to help sustain Kea during the early years. This is something that isn't available to everyone; for some people, internal bootstrapping might look like keeping your day job or living with your parents. But we were able to internally bootstrap Kea for the first several years.

I am going to talk about some of the other things we did: applying for grants, getting donations, selling premium features, which I know is quite controversial, and charging for professional technical support.

We have been lucky enough to get quite a few grants. We have gotten at least 100,000 dollars each from Mozilla and Comcast back in the 2016/2017 early days of Kea, and I have to say that the grant money we have gotten was really instrumental in helping to build up the feature base that we have. There are some additional funds, probably most of you know about them, that are available specifically for European OpenSource projects. There's obviously not enough grant funding out there for all of the OpenSource projects, and there are some problems with grants. On the one hand it's a little bit like easy money if you can get it, but I have never seen a granter that made a real effort to figure out what the users wanted; the granters have their own agenda. Typically you apply for a grant and you might not get the money for a year, and in that time your priorities may change; sometimes you can renegotiate what you use the money for, but it's something to be aware of. The biggest problem with grants, though, is that typically they only fund new feature development, and it's much harder to fund ongoing maintenance.

We have been lucky enough to get some donations. In the past five years they have ranged from, I think, 1,800 dollars in 2024 to over 100,000 that we got in 2022. The biggest problem with donations is that they are kind of unpredictable; you don't really have much control over them.

We actually have two lovely, lovely individuals who give us five and ten dollars a month and have for years, maybe eight years. We love them, but this is not going to fund our operation.

We have gotten some pretty significant donations from a couple of foundations, the Fidelity Charitable foundation and the Craigslist foundation. In both cases these are real donations, they are not asking for anything from us, but it's also not something you can apply for; it's a bluebird, it happens or it doesn't happen.

The type of donation that I think is most likely to help other projects with their ongoing operations is donations from industry partners, and we have been very lucky to get some industry partners who have made ongoing donations to us, mostly for BIND support. This, I think, is promising. There are also some micro‑donation programmes you can sign up for: PayPal, Amazon Smile, things like that. Go ahead and sign up for them, but I wouldn't expect to get more than a few dollars out of that. It's not a lot of money, and again you have really no control over how much you get that way.

So this is the controversial topic. Back in 2017, Tomek, the lead developer and original author of Kea, came and talked at a RIPE meeting about the idea of trying what's known as the open core model, where the core server, the DHCPv4, DHCPv6 and dynamic DNS servers, is all OpenSource and some extra features are premium. I came back the following year and talked about this again, and it was pointed out to me that this is the OpenSource working group and that is commercial software, which was completely correct, and I apologise for that. But honestly, I think I hadn't realised until that point that we were actually talking about commercial software.

But we did have a rationale for what we thought it would be reasonable to charge for. We wanted to make sure that everything required for compatibility, performance and standards compliance, all of the protocol‑level features, was OpenSource, and everything needed for backwards compatibility with ISC DHCP we also wanted to be OpenSource. But we reasoned that service providers and really large‑scale deployments would and should contribute, so features that were only needed in those kinds of environments, convenience features for management, were things we could charge a premium for. At the time we also thought some people would upgrade and pay us for professional technical support.

So, a little primer. It's actually not hard to sell software. I just went on the internet and downloaded a EULA from somewhere; it took me 15 minutes. There was a little bit more work for development, who had to put the software in a separate repo that wasn't public; there are people who will sell you a cloud service where you can host your software in a repo. We put up a little online store; people pay with a credit card and use a secret token to get the software. We give them the source code as well as packages, and they get a right to use it for five years. It seemed easy: you click the button and you pay the money. Anyway.

So the promise was that this would be self‑service, low touch, not very much extra effort or work on our part, and it would be like OpenSource Plus. But this is not what happened. These are just a few emails from literally the day when I first made this slide deck. This is business‑to‑business software; these are not end users buying stuff on Amazon. A lot of people were not prepared to pay with a credit card. They wanted an invoice, they wanted a vendor portal, they wanted resellers, they wanted multiple quotes.

So, what we found, and this first point is harder to explain, was that we felt it was actually slowing the adoption of the software, because people couldn't tell whether or not they needed these extra plus features. We tried to set it up for auto‑renewals and people hated that; people didn't want to pay by credit card; and it was just very confusing. The bottom line is the revenue impact wasn't that significant. If you look at this chart, the green bars are all professional support services with different SLA levels, and the yellow bit at the top is the money we got from charging for premium software features. Not an insignificant amount of money, it might have been almost 200,000 dollars last year, but as you can see it's not that significant in the big picture.

So we decided to stop doing this. We have announced that we are ending the online sales of premium software features, and with our next stable release, Kea 3.0, which is coming out in June, these features are in the OpenSource; they are in the OpenSource development branch now.

I am not going to read this because I am running out of time, but we have already heard from one user who was thinking about adopting Kea, who shied away because they saw Kea was not fully OpenSource and who has now changed their mind, so this is great confirmation that this is the right choice.

Just to conclude, I want to say what we have learned about the way we are funding our OpenSource work. We have professional commissioned salespeople, and I know a lot of people think sales isn't a real job, but it is a real job. Somebody has to fill out quotes and schedule meetings; people have to be able to produce invoices and fill out all the questionnaires and surveys from these enterprise customers. It is a real job.

What we sell are annual support subscriptions with a variable SLA. The standardised agreements include a mutual NDA, so we are not going to give you a list of our customers. You need people who can produce invoices, sometimes multiple invoices, and at ISC we have a separate, dedicated professional support staff. I know lots of OpenSource projects want to use developers for support, and I understand that, but tech support is interrupt‑driven, so it's not a great fit for the developer job.

A couple of final observations. For business users, who are most of the people using your software, it's not their job to make donations; somebody else in their company might do that, but they are empowered to buy things they need to do their job.

So if you offer something they need to do their job, then you can get money. People do understand usage‑based pricing and it seems fair.

However, the sales cycle for business‑to‑business software can be pretty long; it can be a year. I will say that the reason tech support has worked so well for us is that the renewal rates can be very high. Our renewal rates are at least 85%, which means that we don't have to go out and find all new customers every year.

So in summary, in our experience: donations are not a lot of money and are kind of unpredictable, with the possible exception of business partnerships. Grants I highly recommend; they are a great way to sponsor new features, and there's a significant amount of money you can get from grants.

Our experience with premium features was that it was too confusing, it was not a lot of money, and it was really inconsistent with everything else we were doing as an OpenSource organisation.

And finally, what has really worked for us over the years is selling technical support. This way we are aligned with doing the things our users need, we know what our users need because we are offering technical support, and we have a stable revenue base over the years. If you have an OpenSource project that you are going to need to sustain for years, I recommend you look at this.

That's it.

(APPLAUSE.)

AUDIENCE SPEAKER: Thanks for the presentation. There is one funding model which I was missing that seems to work for us. What we do is something more like the premium features, but not just for paying users: if you need a special feature, you can go to the community and say "I want that" and maybe convince somebody, or you can come to us and pay us money and we will develop it for you, but everybody gets it at the same time, nothing special.

VICTORIA RISK: A sponsored feature.

AUDIENCE SPEAKER: Did you try that and what's your experience there?

VICTORIA RISK: We have done that a bit; I can think of maybe three features we did that way. In one case the sponsor insisted that we embargo the feature from the OpenSource for a while. This was during a period when we really were hurting for money, so we agreed to that, but it hasn't been a big source of revenue for us. Thanks for that; so sponsored features work well for you.

AUDIENCE SPEAKER: Thank you. The queue is closed after Maria.

AUDIENCE SPEAKER: We are a very happy user of Kea; we have been using it for four years now and have implemented and migrated almost everything from ISC DHCP to Kea.

I have two comments. One is that you changed, for example, which premium plug‑ins you get if you have a support contract. We started with a bronze support contract, and we pay a decent amount of money, but I found that we did not get every plug‑in. So my client was complaining that, for example, an extra plug‑in was not included while we were paying a lot of money; luckily you helped us. The other comment is: don't make it too complex. If a business pays a lot of money, they expect they will get all the premium features, and when I looked at the table, I had to go to silver or gold support just to get one feature, so they didn't like that.

VICTORIA RISK: Right, so really the only feature that you don't get as bronze is role‑based access control, and honestly that's not a super mature feature yet; it probably doesn't do everything people think it would do.

AUDIENCE SPEAKER: I would expect some basic thing.

VICTORIA RISK: We have actually also removed the complexity in the subscription, so anybody who subscribes gets all the same software.

AUDIENCE SPEAKER: I saw it in the 3.0 version, that's very good. Also, don't change the whole model in the future; settle on your model and don't change it a lot, because businesses don't like that. That's the feedback. Thank you.

AUDIENCE SPEAKER: Hello. I wanted to know, in terms of the support team, do you have a connection into the developers when you escalate to the second or third line?

VICTORIA RISK: Yes. We have seven professional support engineers; nearly all of them were senior operators before we hired them. They meet with the development team at least once a week for support escalation, we use Mattermost chat, the OpenSource version of it, and we are in discussion all the time, so absolutely, they can escalate something at any point.

AUDIENCE SPEAKER: Thank you.

AUDIENCE SPEAKER: I was curious: with the emerging regulatory compliance needs, particularly of your proprietary software partners, have you considered using compliance, specifically with the CRA, as a potential funding model?

VICTORIA RISK: Absolutely we have, yes. That was a good question, actually. So the question was: as OpenSource projects have to apply for and effectively receive CE marking for their software, can you use that CE marking with your consumers who are creating commercial products? Because they are going to need that CE marking, can you use that as a way to get support from them? I think it's an excellent strategy.

NIALL O'REILLY: This time speaking with my Tolerant Networks hat on. You mentioned some European sources of funding, in particular the NLnet foundation; Tolerant Networks is a partner with the NLnet foundation in some of the NGI Zero funding programmes. I spoke about it in Rotterdam, so I am not saying much now. One thing to bear in mind is that it's not just for European developers; it supports collaborative projects, and as I recall, one of the partners has to be European, but it can include people outside the boundary.

AUDIENCE SPEAKER: Maria, developer of BIRD. I must confirm what you said; basically everything works for us the same way as for you. One thing on the premium features: I have encountered premium plug‑ins several times, for example with nginx, which uses the premium features model. And nginx has the problem that there are alternative OpenSource plug‑ins which duplicate what is included in the premium versions, which effectively kills your premium features and also causes code divergence and problems with merging. This is why I have fought premium features in BIRD from the beginning, and we are still keeping the no‑premium model.

VICTORIA RISK: That's a good comment. These premium features were all developed by ISC engineers; we had hoped to see more community‑developed plug‑ins, and obviously those would not be premium. In any case, I'd like to think that the whole premium debacle is behind us, and I just wanted to share how it turned out for us in terms of funding.

MARCO SANZ: Thank you very much; that was really sincere and insightful.

(APPLAUSE.)

Then the next speaker, Ondrej, is going to talk about how they made BIND 9 faster and more scalable, transitioning it into the new century. Thank you.

ONDREJ SURY: Hello, Ondrej Sury. ISC has taken over the session. Not really.

So this talk is about BIND, but I would like to emphasise that it is not specific to BIND; I just want to share how we did this, and the approaches are usable elsewhere as well.

So, a little bit of BIND history: BIND 9 is the successor of BIND 4 and BIND 8, and its development started in 1998. This is a really, really old project.

And it can show in some places. It was originally both single‑threaded and multi‑threaded, and it had its own event‑based cooperative scheduling; there was nothing like that available in 1998. It uses pthreads locking, but we have our own RW lock, and I will show you why in a moment.

So 1998: how old is that? This old! It's very easy to forget how old that is and how fast computing has developed since then.

So, BIND 9.20 synchronisation primitives: we are using POSIX locking plus a custom RW lock. There's a paper behind it; it's not a hundred percent a one‑to‑one implementation of the paper, because part of it just blew up the memory, and it showed up in the measurements that that part is not really needed. It also needs standard atomics from C11. This morning I saw a report that BIND doesn't build with some very old compiler version; we really do try to use C11, and if every project upgraded to C11, the quality of C code out there would just go up.

So what's the effect of the custom RW lock? I showed these pictures in the DNS working group; if you don't know them: on the Y axis is time and on the X axis is the inverse of a percentage, so you want the curve to go to the left and down. You want to dispatch as many DNS answers as possible, as quickly as possible: quick is down, many is to the left. The blue one is the main branch and the orange one is the one where I just disabled the custom RW lock, and you can see there's a huge effect from just swapping the RW lock implementation on real‑world resolver performance. So don't be afraid to look into something that's not in the standard library if it helps, but you need data: use measurement all the time when you are making changes like this.

These pictures are from a really great book by Paul McKenney, "Is Parallel Programming Hard, And, If So, What Can You Do About It?". There are a couple of reasons why some of your code might be slower than you expect. Thermal throttling: you make some algorithm faster and now it's running slower, and the reason could be that you hit thermal throttling; the CPU slows down because you are running it hot. Memory barriers: the caches in the CPU need to be synchronised with memory, and if you overdo this, it will be slower in the end. Then there's pipeline flushing, cache misses, I/O completion (I/O is really slow) and memory references: all this stuff that juggles memory and forces the CPU to load new content from memory makes your program slow. And some of it, not all of it, but some of it, can be avoided.

So, what is RCU really? It's a technique that has been used in the Linux kernel, and there's a library called userspace RCU (liburcu) that is used in BIND and also in Knot DNS. If you look at the difference between an RW lock and RCU: with an RW lock, the writer needs to wait for all the readers to stop before it can write. With RCU that doesn't apply: you can do updates while readers are reading from memory, and the memory reclamation problem is resolved, as the picture on the left shows, by waiting for all pre‑existing readers to finish before you reclaim the memory that's no longer being used.

And that's it.

So all the threads can run at full speed, or almost full speed, while you are using this synchronisation technique; these are some pictures from the book. RCU is a really great help in making parallel programming really, really fast. It requires a bit of a change of mindset in how you think about things, but it's worth the effort if you are building high‑speed servers.

So this is the userspace RCU API: it has the read‑copy‑update API, a concurrent memory model and architecture abstractions. The great part is that for some of the data structures they already have implementations, so you don't have to implement everything from scratch, like the list or the hash table, and there's a great stack that's lock‑free. Some of those are wait‑free as well, and they can be used right away if you start using RCU.

So in BIND 9.20 we replaced some of the locking, not everywhere, but in places like logging, which is being hit all the time, and we use weak references in a couple of places. And we use some of these lock‑free/wait‑free data structures, for example for the callbacks table. It helps with the architecture as well; it might not help with performance, but it helps with the overall architecture. These are BIND internals, so unless you want to write BIND code it doesn't make sense to go in depth.

So the bad cache was formerly bucket‑locked lists: a linear list indexed by DNS names, and the lists were used as least‑recently‑used lists, so you need to clean them up as well. Now it's completely lock‑free; the code is better and shorter. It uses the lock‑free hash table from liburcu and a per‑thread LRU list, plus the RCU API for memory reclamation. The gist is that the code can end up better and simpler if you have the right library and know how to use it.

We also replaced the old event‑based model, and now everything runs on libuv loops; it's again simpler but more powerful. Unfortunately we have something like four types of different queues, the ISC job and so on; there are so many names, naming is hard, I do DNS. There's no locking on a loop, so it's very fast for basically all networking operations, and this path needs to be very fast. The async queue has no locking and uses a wait‑free concurrent queue, so you can move jobs between individual threads; work that needs to stay on the same thread is very fast. The ISC helper threads were added for KeyTrap, because we needed to off‑load some of the cryptographic operations, things that are slower. But the important takeaway is that you don't have to do everything yourself. Sometimes it's better to just rely on the operating system, because those are the people who know what they are doing when it comes to job scheduling and preemption. We know DNS, and we should rely on the operating system and the system libraries as much as we can.

There's a new key‑value store based on the QP‑trie; there's an article about it. It's not completely done yet, but it's transactional key‑value storage, again based on userspace RCU, so there's no locking there, although unfortunately there's still some locking higher up in the tree, because this takes time and there's only so much you can do when rewriting software.

And the things built on top of it, the cache and zone databases, still use some RW locks, and we are working on replacing some of that locking with the RCU mechanism, relying more on the QP‑trie implementation in BIND. So it will get better, but it won't happen overnight, basically.

But here are some pretty pictures. So how did it help? Again, this is very small, but the blue one is BIND 9.16, which is now end of life, the orange one is BIND 9.18, and the green one, as you'd expect, is BIND 9.20. You can see there's been great progress, at least I think it's great progress, but it's for you to judge. I believe we did a good job improving BIND; it has been here since 1998.

So how do we measure hot spots? With DTrace, which is called SystemTap on Linux and DTrace on the BSDs. It has zero‑overhead trace points: if you are not using them, they don't cost anything. You can put probes into the code that can be enabled at run time of the program, and we use them for RW locks and event‑loop jobs. We measure incoming transfers, and somebody contributed RRL drops as a user‑space probe, so they can measure it on their own system, and it doesn't cost any computing power when it's not being used.

It looks like this: you write a script that is basically executed in the kernel. This one has a probe on the lock, for when it gets acquired and when it gets released, and in the end you can measure the time between those two events. This is the output of it, and then you can do your fancy stuff to find where your hot spots are and where the biggest contention is; I inserted this into a spreadsheet and sorted it so I could take a look. For example, for the old rdlock in the cache we spent most of the time in RBTDB, and this is data‑driven optimisation; it's not just "oh, I think this needs optimisation". We now have tools as a software industry to actually measure stuff and know where your problematic spots are, and then you can optimise those.

For the future, this is what we are working on right now. My ultimate goal is to keep all the data local to the threads, so there's no moving data between threads, because that costs you a lot of time, and to share only what absolutely needs to be shared among threads, like the DNS cache: you don't want each thread to have its own cache, because that would just cause other problems. We are converting the easy and typical cases to the RCU API; not everything can or should be converted. There are some cases where locking is a better choice. It's not "oh, this is a cool thing, we need to change everything to RCU"; it doesn't work like that. You need to measure and think about whether it is suitable here, and sometimes the answer is no, you should just keep it as it is, or use something else.

And one of the things we are also working on: BIND has a problem that if you change the configuration and reload it, it comes to a full stop; all the threads need to stop, the configuration is reloaded, and the threads start again. We are working on changing that, so that if you change the configuration, the threads will just eventually pick up the new configuration.

And yeah. Next, we are removing the locking in the QPDB; we are working on that collectively as a team, and it is quite a difficult thing. Some of the things we are considering we might not end up doing exactly like this, but this is our goal. The other thing is the address database, where information about name servers is stored; these two are the use cases where shared data needs to stay shared.

And we do have some progress already. This is what we can squeeze out of our performance measurement system, which is based on GitLab CI and a Shotgun‑based measurement setup; this is roughly 300 QPS and this is with a cold cache. As you can see, we are far, far to the right, because we are really hitting it hard. This is 9.18, and the individual colours are the number of threads. You can see on the right that this is just a single thread; it doesn't manage anything if you hit it this hard, and the magenta one is 16 threads, which at least tries to do something.

The difference between 9.18 and 9.20 is quite a significant improvement. Don't forget this is logarithmic: this is 40% drops and this is only 20% drops.

So a 20% improvement is quite significant, I would say.

For memory, when I first drew this picture, it was like: what's going on? It should not look like this. It shouldn't be a crazy octopus. But it is, it is. And with 9.20 it looks much better, right? This is what we expect, not this crazy octopus thing.

So I think we are heading in the right direction.

For CPU usage, this is 9.16 on a 16‑core machine, and on the Y axis the usage is basically multiplied by a hundred, so this means the CPU is not even fully using 14 of the cores, and it's kind of fuzzy, which means there is some lock contention and such. For 9.20 we are over 14 and it looks less fuzzy. The thing that makes it less fuzzy could still be some locking, but in the end, if I go back, the performance did improve, so I think the locking is no longer the huge problem; the problem now is improving what has to be done inside. The spikes might be measurement artefacts; this runs in AWS or some other cloud, so it's hard to know exactly what happened in those spikes.

Yeah. That's it. Now I hope it helps with your...

(APPLAUSE.)


MARCO SANZ: What an amazing piece of software engineering work. Thanks for sharing the tools and the techniques, and now it's time for questions.

AUDIENCE SPEAKER: Fascinating work, thank you very much for all the work you have put in. Especially as an operator, where we max out the CPU, this really matters. Just a question: I was wondering if you have ever measured the impact of this work on power usage? Do you expect any impact on power usage?

ONDREJ SURY: We don't have that measurement anywhere, but if you use all your CPUs, it will probably affect it too; I think it goes hand in hand. If you make this more effective, you will be able to use the cores at full speed, but at the same time, if you are not so fully loaded, you will consume less CPU. Also, I was a guest lecturer in a Parallel Programming class at the university, and what I was telling the students is that not everything can be scaled by just putting more money into the cloud. It was... who wrote this, or I might be misremembering, but: if you can make it a little faster, you should, always. Because you don't know if your service will be used by ten users now but by 10 million users in a year.

And one millisecond for ten users is nothing, but one millisecond for 10 million users is a lot of money and a lot of power. I don't care about the money, I work for a non-profit, as long as I am getting paid... but the impact on the environment should always be considered when you are coding: make stuff as effective as possible, because it has real-world impact.

AUDIENCE SPEAKER: Thank you.



AUDIENCE SPEAKER: Hello, Tomas, PCH. So you talked about the probes in BIND. The question is: is that a production feature which you intend to put into live production packages?

ONDREJ SURY: It's there, I think it's in the package.

AUDIENCE SPEAKER: It's not in the Debian package, that was the question.

ONDREJ SURY: Debian is a little bit weird with the whole system; I wasn't able to run it on Debian, so I am, oh god, using Fedora.

AUDIENCE SPEAKER: On Debian, you only need to install some... packages or something like that.

ONDREJ SURY: In Fedora, it's enabled in the libc, and the probing is not enabled in the libc on Debian. I am a Debian developer, but I haven't been active in Debian; I think it needs to be fixed.

AUDIENCE SPEAKER: We will talk afterwards in the hallway so yeah.

AUDIENCE SPEAKER: Hello... awesome presentation, thank you very much. I am curious about your opinion: the C language has been kicking around for decades and it's proof of fine engineering. I am curious what your opinion is about more modern languages trying to tackle things like dangling pointers, Rust for example?

ONDREJ SURY: C is very dear to my heart. My opinion is that memory safety is a great thing, but, for example, in BIND we take great care about sanitising all the inputs. We make mistakes, but everybody makes mistakes. I have been saying for some time that not everybody can upgrade to Rust, right? But I think that the whole OpenSource and general software ecosystem could be improved just by using modern C features. You see projects on C89 or C99, and if you go just to C11, or, if you want to be fancy, C23, the language provides you with better tools and better checks, and it's easier to go from C99 to C11 than to go from C99 to Rust. So our shared responsibility might be just pushing all these projects that might not have enough resources for a complete rewrite to improve their code quality by using the newer C standards. That's my view on it. I am definitely a fan of Rust, but I am not a fan of the evangelism.

MARCO SANZ: Thanks again, Ondrej, that was very insightful. And now I welcome Maria on stage; she is going to talk to us about more topics.

MARIA MATEJKA: Hello, it's Maria here again. Once again I am almost the one between you and the lunch, but there is one more person after me, so I will try to be appetising. First of all, I want to say we are not going to migrate to Rust. We stand firmly behind what Ondrej said: it's a good idea to upgrade to newer versions of the C standard, and I will get to that in a little while.

Because, well, what does it take to maintain OpenSource software? You need to develop something, you need to test it, you need to support the customers who are paying you. You have to merge contributions. And on top of all of that, you have to do some product management and project management, and you have to track the RFCs, which are coming out like crazy.

And my first question is: where does rewriting to Rust fit in these points?

It's actually not the development; it's not a new feature. It needs a lot of testing. It needs a lot of support. And it needs an awful lot of product and project management. I would have to stop BIRD feature development for three years or more just to do the rewrite to Rust. So, no, thank you.

We were doing all of this with two people in the year 2022, and that was way too much for us. And then there is the problem with external contributions, which I regularly complain about. The contributions are basically not oriented the way we need: you can get a fix for a bug; you can get a new feature; sometimes the feature needs a complete rework: you get a piece of code and then you find out that it's going to kill performance or something like that.

And then you think about it and it takes weeks, and you think more about it and it takes months, and then you find out that you have not even replied to that contributor, so they feel like you are ghosting them.

Which brings more problems: it's stupid to tell that contributor after half a year that their code is complete crap, because I should have told them right away that I was looking into it and it smelled bad.

And also you won't get any performance improvements or any refactoring from your external contributors. If you have a project that's 25 years old, you are simply getting more and more crap into your code without actually updating your data structures to support the new features which you need, so it's hack upon hack upon hack. This is why it took me so much time to finish BIRD 3, but that's another thing.

So let's find another developer. Let's find a senior.

(APPLAUSE.)

I was trying to find some; it was quite a hard job. So, let's find a junior instead. And then they give you their first code and it's completely for nothing; you can just throw it out.

They basically aren't as good as you after you have spent ten years on the project. They don't know anything. Why have you even hired them? I could just throw them out and do their work myself in a week. I have just wasted money and time. Or not. Because this is not a short run, and a junior is not a low-paid senior. It's a person who needs support and who needs to grow. And a junior is not going to grow in three months.

They are not going to grow in a year. This is a several-year-long process, and there are many cases in the IT industry where people are hired straight after school, hop between several companies, and then leave IT completely because they have burnt out. Burnout is a serious problem, and burnout prevention is one of the things we are trying to implement quite a lot in the team.

Also there's a question. What are you going to assign to the juniors?

This is the real problem. And the only answer I approve of is the last one: they should do easy development tasks.

Everything else needs either completely different skill sets or they need to be a senior to do that.

But you should choose which development tasks are actually the right ones, because what looks like a two-line change in the filter engine can be a nightmare of studying thousands of lines of other code before finding out that those two lines are actually the ones to change.

Also fixing some bugs is not always the right way to go.

But we manage somehow. The juniors grow, and now, finally, while we are at RIPE, I can feel that the work is going on. It has not stopped. Nice. The juniors are doing their work. But we have to put the work in and lead them to be productive. The other thing which blocked us from doing sustained work is support, because if two developers do all the support and answer every customer asking how to configure BIRD, well, it's not going to fly.

So we hired some L2/L3 support staff. They also need to be taught; they need to learn a lot; they need to grow, because, well, you don't find people who are very proficient with BIRD just out there on the street.

It is, I would say, quite impossible; you have to find people who have potential. Which brings us to the question: what happens next? What should the L2/L3 staff do? That's not as bad as with the juniors: from the list of possible tasks, you just delete all the administrative and assistant things, though some of them do tend towards those, so sometimes you have to stop them from serving tea to you.

But they also have to think out of the box. You have to insist that they start at the beginning, but they have to know the customer: you would not ask DE-CIX, if they complained that their BGP session was down, whether they had plugged the cable in; that would be blasphemy.

They also have to progress with you, the developers, because you are responsible for putting the documentation in, and they just come complaining: there's no documentation, how should I reply to this customer, I have no idea. And then you find out that the L2/L3 staff is there for nothing and you have to reply yourself, because you have no documentation; that's our fault and we have to fix it.

So, now the BIRD team has seven people instead of two. Wow. Which brings me to the next problem: I have almost stopped coding. I do lots of other things. I have not stopped coding completely, but it's quite hard to find the time.

I am doing supervision. I spend time with the juniors: looking at their code, walking through it, walking them through their changes and saying, well, please don't do this, please don't do that, let's do this a different way, and so on and so forth. And then I commit this and amend the original Git commit, and the submission stays under their name, because it's their code; I have just done those simple updates.

And it helps them grow. I also have to do a lot of corporate bullshit. You just have to have somebody who speaks to the sales people, because otherwise the sales people will sell somebody BIRD with ChatGPT as its route selection engine.

And they will insist you are going to implement it in half a year, or half a week. Why? This is why corporate structures exist, and there must be somebody in the team who is at least willing to speak to the marketing people, because otherwise they are marketing something else than what you are actually doing, and that doesn't work.

Also, somebody has to do the hiring. Somebody has to sit with the candidates and find out whether they are good or bad for the job, whether they are going to fit in the team. It's not just the time you spend sitting with them; it's the time spent preparing, communicating with everybody, and then there's the risk of whether they are going to be the good ones or not. It's a lot of work.

And you have to ensure that everybody on the team is actually happy somehow. That people are coming to the job willingly, that you don't have to check whether they are working or not, because they feel it's the right thing to do and their job is not mundane. If you are looking at the same screen for a week, stuck on the same bug, then there's a problem, and this is why the seniors are in the team.

And this is why the juniors have to be able to ask the seniors: hey, what's going on here? And the seniors have to come and not say, hey, I am taking it over; they have to say, well, let me walk through it with you. And this is a real skill one doesn't automatically have just by becoming a senior.

So what happened? I am almost not coding. Sometimes some code comes out of me, but most of the code is done by someone else. I am happier. I don't have to do so many things; I have to do lots of other things, but I am not feeling as much pressure as before.

We are still not catching up with the RFCs; the IETF is just a machine for generating work. But now there is at least some hope; there was no hope years ago.

To wrap up, I would say the three years between RIPE in Berlin and RIPE here were a long run. It was crazy; there were lots of things to do, lots of things to learn, lots of things I had to find out I needed to learn but never knew I would. In the end, it's thanks to the whole team that they are actually willing to work on BIRD, and thanks to the BIRD support customers who are willing to pay for it.

(APPLAUSE.)

MARCO SANZ: So, questions?

AUDIENCE SPEAKER: Hi, it's Ondrej. I don't have a question but a recommendation: let it go. Learn to delegate. Rely on your team; it helps a lot in the role that you are in. I am sure you have a great team, and I do have a great team, and it's hard to let go, because we all know that the only person who does it right is me. But it's also a skill that not only the person in leadership but also the seniors, as you said, need to learn: to just let it go, because not everything has to be perfect. I know you have great support from the leadership, from mine and from Geoff here, but that's an important lesson to learn.

MARIA MATEJKA: Yes, thank you. Actually this is one of the hardest things to do: to let it go and to accept that not everything is going to be under my direct control.

AUDIENCE SPEAKER: Hi, Gerardo again. Thank you very much for your presentation. I try to tell people that technology is people, because most of us forget this. One question: have you tried interchanging the junior devs and the support staff? Because I find that sometimes bringing people from support into development actually brings an enriched experience which helps in both cases.

MARIA MATEJKA: This is a hard thing to do, because it's actually much easier to hire a junior than to hire an L2/L3 support person: to hire an experienced BIRD user and network administrator is much harder than to hire a junior who is proficient in C, where I can teach them the networking part.

That said, some parts of the L2/L3 support team actually overlap a little with development, and vice versa; some of the juniors tend to do a bit more of the support work. So I don't categorise it strictly; it's just who wants to do what, and then they do that.

MARCO SANZ: If there are no questions from remote, I have a personal comment, speaking as an individual. One of the tasks you listed for yourself, and even marked in bold, was to ensure that everybody is happy, and I wanted to tell you: don't put too much pressure on yourself; that is not doable, you cannot do that. The task is maybe to ensure that there is an environment in which everybody is able to become happy, but you are not responsible for everybody's happiness.

MARIA MATEJKA: Yes, thank you, I overexaggerated that a little bit. It's more that I have to create the environment and maintain it; it's still actually my responsibility that the environment is there. So I have to keep it and maintain it, but it doesn't mean I have to ensure that every person is happy all the time.

MARCO SANZ: Thanks for sharing.

MARIA MATEJKA: Thank you.

(APPLAUSE.)

And our last presenter is already here; he is going to present remotely. Hello Martin, nice to see you. Martin is going to talk about all things SDN, so the floor is yours.

MARTIN STIEMERLING: Hello, I hope you can hear me, and here we go. Thank you for having me. I am Martin Stiemerling, from... and I am talking about yet another SDN controller. I think it's a nice add-on to what Maria was saying: it's about how to educate people coming into the area of networking.

And the first question most people in the room probably have is: OK, an SDN controller, we have them all, why yet another one? We come from an academic background, and we tried to use a lot of existing SDN controllers to educate students in running SDN networks, figuring out what they could do, and getting hands-on experience extending the code and building applications on top. But we found that the existing ones, ONOS, OpenDaylight, you name them, with OpenFlow, which is by now a more than outdated protocol, are all too-big ships: they are not well documented, and if you try to add something, it will take you half a year to dig into the code and find out what it is doing. Typically you don't have six months; it's more like three or four weeks, so it's too much. Further, the operational setup is too complex. Meanwhile, the ONOS people started something called Micro ONOS, and my reading was that it was started for the same reason: the product was just too big. In short, was it usable in research, operations and teaching? Not in our experience. In the meantime some other controllers appeared, like TeraFlow, which addressed similar concerns that the existing controllers are too big to be used.

So we needed a clean and well-documented code base for research projects and for teaching, and we wanted to actually know how this works, not just think we know it. With, I don't know, 400,000 lines of code, you probably don't understand it easily, especially since most people are not only doing teaching and development; I am also a professor, so I do academic administration and other things. Time is usually limited, but that's probably true for everybody else too.

What we wanted was a stable and well-understood SDN controller for our lab operations and teaching. With that said, since we wanted to produce something new, we had no concerns about reusing things we already had: we decided to go with model-driven software engineering, with YANG models used over a ...protocol..., and we didn't go for Rust, but we did go for Go, sorry, to develop our software, starting from scratch.

OK, how does it look? The architecture itself is pretty simple, because our intention was that the controller itself should not carry too much functionality in terms of networking protocols and features. If you look at other controllers, this kind of picture fills the whole slide with 20 or 30 boxes. We decided that an SDN controller should, in the best case, only be there to talk to the network devices at the bottom of the slide and have a database to store the information about the devices. We also decided not to develop our own storage: the database people have spent so much time creating databases and thinking about performance and replication, so just leave it to them. We have an event system on the right-hand side which is used by the applications on top: an application can say, I am interested in changes on certain elements, interfaces going up and down, and the controller takes care of getting this information from the network elements, storing it in the database and notifying the applications through the event system. The applications can talk to the northbound interface to get information and decide on routing updates and the like, whatever you think the networking part should be doing there.

The key facts: when people talk about controllers, the question is how you get your hands on the data coming from the... We didn't do any manual modelling, mainly because if you look at a network element like a switch, you probably already have a hundred or a thousand different parameters to configure, and you sometimes have to multiply that by the number of ports. So we said we need a model-based approach that automatically generates the programming stubs we can use; this is done with the YANG models. For the applications we are using gRPC and HTTP..., and for the southbound protocol we have exactly one protocol, gNMI, because we said that as an academic partner we can maintain a single interface; if you start maintaining ten interfaces, it's way too much. For persistence we have MongoDB or etcd, and RabbitMQ for events, and everything we are writing, including the applications, is written in Go. We also have the counterpart of the SDN controller, which is the SDN agent.

The applications, as mentioned, sit on top of that. The core is extremely lightweight, and there is some change logic inside: if two applications write to the same part of the configuration, conflict resolution is included. The applications rely on the representation of the individual devices by YANG models. OK, wrapping up, because I am hopefully not too fast, but I am the last one before lunch, and probably coming to the more interesting part: is this used anywhere? Proposing something is usually easy. First of all, it's all BSD-licensed OpenSource; everybody can take it, try it and come back with change requests. We use this in teaching, in master-level networking courses, and over a number of five years we have had different bachelor and master theses where students come back to the question of how to educate junior people to be more proficient in some topics. For instance, somebody was looking into high availability; people could look into the code, work on it and read up on it, and so on. We are using this in research projects on quantum key networks, in a BMBF project, and... it is used in operations of particular networks. Perfect.

What are the meta goals of this? First of all: do it yourself. We want to enable students to work in network technology on well-understood pieces. When I started here 11 years ago, people were doing some vendor academy, sitting at a console and typing commands; that's also needed at some point, but first of all you should understand how the network looks internally and how the protocols work. One of the big keywords, maybe: in Europe there's a big discussion about digital sovereignty, and I would say if you are doing it yourself, you are kind of independent, because it's your network, you have some chance to understand it, and you don't have to just hang on to other people's knowledge. Then we want to develop the students, because this is a combination of science and engineering: people have to read papers, they have to learn a lot of networking protocols, and also software engineering and coding skills, and in some cases, when it comes to high-performance things, they have to understand: OK, what is my hardware?

And typically computer science students are too far away from that. We also have them working in international teams, which is typically quite new for students. Last but not least, requirements engineering, including customer requirements engineering: you can have a cool piece of software, but if the fellow student coming next year is not able to use it, it's probably not that well done. So you have to figure out what the needs of the students are in this case and how you can adapt to them.

I think the first presentation was about where you get the money from to pay the workforce. In our case, we aim at the great goal of "public money, public code"; that's the reason why we publish under OpenSource licences. We can use some of our university resources, but that's typically limited manpower. We have an infrastructure with a Git repository, so there are no limitations on our end. We have student contributions: students contribute code, and sometimes, as Maria said, some output is not usable, but now and then a student contributes a major part of one of the elements, which can be used in the long run. We live a lot on public research funding, of course. And sometimes, but that's rare, we get some industry funding to use this for prototyping in a specific setting. Commitment in using OpenSource is a big problem: we use the code for teaching and have to keep improving it, and that's the bad point sometimes. For instance, I spent this morning improving some source code I hadn't had time for in six months; when you have the pressure of using it in teaching, you need to keep the stuff up and running.

In the long run, as everybody knows, it's time-consuming and hard work, and getting the funding isn't easy, but it typically pays off. Students rarely say, I am going to look into the code to see how YANG models work or how the controller resolves configuration conflicts, but here they can gain actual, more technical insights into this.

Last but not least, the bigger question for us in Europe, and this is not meant to exclude other people: are we doing our technology on our own, and can something like this help in Europe to educate people and get people working on things? My bad example, I'm not sure if you have heard about this, is the EuroStack initiative: they have a really nice web page, but they don't do code. Typically it's about coding things, deploying them, getting feedback and actually building something.

OK, last slide, and I'm happy to take any questions. That's my contact, there's the SDN controller source code, and for most exercises we also developed our own SDN agent, which runs on Linux and can do some configurations; maybe not as many as commercial products, but it's under our full control. All right. Thank you.

(APPLAUSE.)


SASHA ROMIJN: Thank you for sharing, you were right on time, and for sharing different ideas and questions with us.

Are there any questions or comments from the room? There are a lot of questions in these slides, so I'm hoping there's something. But maybe we need some time; we can also, of course, continue this kind of discussion on the mailing list.

Do we have anything from online at this time? OK. Then thank you again, Martin, for speaking with us.

MARTIN STIEMERLING: Have a good lunch.

SASHA ROMIJN: I have a few final announcements before you go to lunch. As we said at the beginning, please rate the talks for the working group; we use that feedback for the presenters and for our planning next time. I have been asked to remind you that the NomCom that selects the next RIPE Chair will hold office hours in the second half of the lunch break in the Castanea room; drop by if you have input for them.

There is an ICP-2 feedback session at the Meet and Greet desk from 1:30 to 2:00. The PC election is still open; the PC is the RIPE committee that selects the plenary programme, amongst other meeting planning. You have until five to cast your vote for new members of the PC.

Finally, the PT NOC will have a feedback meeting in this room right after us. If you have any other feedback or comments for us, please feel free to share them on the working group mailing list or to mail us as chairs directly. Thank you, and have a nice lunch.