RIPE 90


Tuesday 13th May 2025. Main hall, 9am:

Plenary. Good morning everyone, we are about to start, please take your seats.

Hello, my name is Osama, I am part of the Programme Committee. I will be chairing together with Max today; Valerie was feeling a little bit sick today so...
Our first speaker for this morning is Savvas from the University of Twente; he is going to be talking about critical BGP prefixes.

SAVVAS KASTANAKIS: Everything works. Hello everyone. My name is Savvas and I am a postdoctoral researcher at the University of Twente, and today I am going to present one of our ongoing projects, which is funded by the Dutch research council under a European project, and this is joint work with these people. The title of the presentation is Critical BGP Prefixes: a measurement-based analysis on critical infrastructure security, and we will see where the measurement part comes in, what critical infrastructure is, and how we define security.

Now, let's set the foundation first. The internet, as you know, is a network of networks, and those networks are called autonomous systems. They are called autonomous because they independently define their routing policies without the need to globally co-ordinate with the remaining 75,000 or so autonomous systems on the internet.

Now, these autonomous systems manage their own sets of IP prefixes, which essentially are blocks of IP addresses. Using the Border Gateway Protocol, autonomous systems advertise routes towards their prefixes to their neighbours, and those neighbours, based on their routing policies, propagate these announcements further to their neighbours, and that happens continuously until all autonomous systems on the internet know at least one route to reach all prefixes.

And now the interesting part comes in: BGP was introduced at a time when there were only a few ASes constituting the network and network operators probably knew each other, so there was no actual need to build trust into the protocol.

And due to this inherent limitation, BGP still suffers from attack vectors like prefix hijacks, path hijacks and route leaks, and this can lead to all sorts of impacts like man-in-the-middle attacks, denial of service or impersonation attacks.

Now, I started my postdoctoral journey at the University of Twente in October 2024. The document you see on the left is titled Roadmap to Enhancing Internet Routing Security and it was published by the Biden/Harris administration in September 2024.

If I gave you the link to that document, you wouldn't be able to access it now, because a few weeks after the Trump administration came in, this document was taken down, but luckily for us we can still access a past snapshot of it. If we go to the table of contents, you will see there is a specific section called recommended actions, with a subsection titled baseline actions for all network operators. Of course we are not going to spend our time reading all of this, but I want to focus on the first highlighted part, which says every network operator should develop, maintain and periodically update a cyber security risk management plan or framework. And to summarise the bullets you see below, they say: know your assets, know your peers, know the criticality of your assets, and be able to estimate what potential attacks could do to your assets and to the whole ecosystem of your company.

And that's where our part comes into play. We decided to automate this risk assessment process, not just for the lazy network operators out there but also for the network operators whose organisation's business model doesn't really fall in line with such security practices.

So we decided to incentivise those network operators by bridging the gap between policy-based recommendations and actual network practice. We said: let's implement an open source BGP risk assessment toolbox which can help both network operators and policy makers devise their own strategies. But, you know, we are researchers, we aim to publish papers, so this toolbox, fingers crossed, is going to drive a series of studies which are potentially going to become published papers.

So far on paper, these are the things we claim to have implemented. And this is what our toolbox currently looks like. As you can see, we created a box of tools: the blue tools have to do with the network posture of an autonomous system and its prefixes, and the red tools have to do with security. The way we envisioned that toolbox, it's going to take a list of prefixes or a single prefix, it's going to do some magic in the back, and then it's going to output a status report on the network and security hygiene of these specific assets.

Let me say a few things about the tools that we use. We use a lot of RIPEstat API functionality, MaxMind for geolocation, public routing data to get access to routing paths, and then CAIDA, IODA and GRIP for the security analysis.

But we are not network operators, and that means that we do not own our own prefixes, so we had to do a careful input selection and evaluate our approach on some input. We decided to evaluate our approach on real-world datasets from critical infrastructure sectors. Now, if we start discussing what each one of us thinks critical infrastructure means, trust me, you are going to get a variety of responses. But we are not going to follow that path. We are going to follow the path already defined by governments and policy makers on what critical infrastructure means; you can see some examples on the left, like defence and national security, banking and finance, education, water, energy, health, and so on. Essentially these are the systems that, if disrupted, would significantly impact the public health, safety and economic stability of our society.

And many of those sectors and applications rely on heavily interconnected systems like DNS and BGP, so as you can understand, failures in such systems can propagate globally across online services and impact essential and critical operations worldwide.

Very important slide: what we do not study, and of course what we study. What we study is the blue circle; what we do not study is the yellow circle. So we study the internet-facing part of critical infrastructure, like online banking portals, hospital booking systems, tax filing systems, but we do not study the core infrastructure, like the traffic lights control network or public safety systems; this is out of the scope of this work. Now, in some cases the yellow circle relies a lot on the blue circle: an online banking system heavily relies on the internet to operate, so a problem in the internet-facing part of an online banking system would also affect the overall functionality of the bank. But in the case of a power grid, our analysis is at best an indicator of what the security practices in that company might be.

And this is the whole presentation in one slide; this is our approach. On the right you can see that we have our toolbox, which we aim to fill with prefixes and also the owners of those prefixes. But in order to get there, we start from the top left part: we essentially start from a list of critical infrastructure domain names, we resolve the domain names with DNS to the underlying IP addresses, and once we have the IP addresses in our hands, we can also get the covering prefixes and the owners of those prefixes, which we term critical BGP prefixes and critical autonomous systems. We then feed those into the toolbox and, yes, we expect the toolbox to spit out a report on the network and security hygiene of those assets.
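
As an illustration of this pipeline (not the speaker's actual toolbox code), here is a minimal Python sketch, assuming the public RIPEstat "network-info" data call; the response field names are assumptions to verify against the RIPEstat documentation, and the example domain is hypothetical:

```python
import json
import socket
import urllib.request

RIPESTAT = "https://stat.ripe.net/data/network-info/data.json?resource={ip}"

def resolve(domain):
    """Resolve a domain name to its IP addresses via DNS."""
    return sorted({info[4][0] for info in socket.getaddrinfo(domain, None)})

def covering_prefix(ip):
    """Look up the covering BGP prefix and origin AS(es) for an IP via RIPEstat."""
    with urllib.request.urlopen(RIPESTAT.format(ip=ip)) as resp:
        data = json.load(resp)["data"]
    return data.get("prefix"), data.get("asns", [])

if __name__ == "__main__":
    for domain in ["example.nl"]:            # hypothetical critical-infrastructure domain
        for ip in resolve(domain):
            prefix, asns = covering_prefix(ip)
            # prefix -> "critical BGP prefix", asns -> "critical autonomous systems"
            print(domain, ip, prefix, asns)
```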

A few more words on the input that we use. We used two specific datasets from these two websites. basisbeveiliging.nl is an initiative by the Internet Cleanup Foundation that assesses and publicly reports on the basic security of Dutch organisations; this is a Dutch-specific dataset. On the left there are some categories like municipalities, health care, water boards, safety regions, ministries, education; all these categories fall under the critical infrastructure umbrella.

From those datasets, we do not consider anything about how the tool calculated the network and security posture of those domains; we only collect the domains so that we have a starting point for our analysis. The same approach was followed with the hardenize.com dataset, which offers comprehensive assessments for multiple countries. From those countries we only selected Switzerland, Estonia, Lithuania and Sweden, so that we can focus our analysis on European critical infrastructure.

On the left you can see there are dashboards on the website, and on the right you can see some of the categories, like financial market, fintech, food, health, education, for that specific country.

And again we collect only the domain names.

I am not going to solve this Adobe error, but we have the toolbox, we have the approach, we have the data, so now it's time to show some results.

Ah. It was easy, okay.

So, first of all, a CDF was a hard plot for me to understand when I started, so I will try to put it simply: a CDF tells us how much of the data, which is the Y axis, sits at or below a certain point on the X axis. What we try to show here is the multihoming adoption across those five European countries, and this can be found on the black lines. We also drew, for comparison, the grey line, which is the multihoming adoption of all autonomous systems on the internet, and we used the CAIDA AS relationship dataset for this. I am not going to dwell on the patterns and the differences between them, but what I want to draw your attention to is that there is not a single black line which starts from one; that means that there is not a single critical AS that relies only on a single provider, which would indicate a single point of failure. So we start with some positive results.
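
A rough sketch of how such a multihoming figure can be derived from the CAIDA AS-relationship dataset mentioned above (lines of the form `asn1|asn2|rel`, where -1 means asn1 is a provider of asn2 and 0 means a peering link); the snapshot file name and the example AS set are hypothetical:

```python
from collections import defaultdict

def provider_counts(as_rel_path):
    """Count distinct providers per customer AS from a CAIDA as-rel file."""
    providers = defaultdict(set)
    with open(as_rel_path) as f:
        for line in f:
            if line.startswith("#"):
                continue                       # skip comment/header lines
            asn1, asn2, rel = line.strip().split("|")[:3]
            if rel == "-1":                    # asn1 is a provider of asn2
                providers[asn2].add(asn1)
    return {asn: len(provs) for asn, provs in providers.items()}

counts = provider_counts("20250401.as-rel.txt")        # hypothetical snapshot file
critical_ases = {"1103", "1140"}                       # hypothetical critical AS set
multihomed = [asn for asn in critical_ases if counts.get(asn, 0) >= 2]
print(f"{len(multihomed)}/{len(critical_ases)} critical ASes have two or more providers")
```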

And we continue with some positive results, because the blue line you see over here was drawn using the RIPEstat API, specifically the visibility function.

And again for comparison, we drew a random set of prefixes as the red line. Now, I want to draw your attention to the bottom right part of the figure: that blue line tells us that approximately one hundred per cent of the prefixes in all the countries that we studied demonstrate approximately 100% visibility.

And that plot tells us that constant monitoring is important, because if you measure the visibility of your assets and you start seeing negative spikes in your measurements, that could potentially indicate that your services might become unreachable in the near future.
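
A minimal sketch of that kind of monitoring, assuming the RIPEstat "routing-status" data call; the exact response field names used here are assumptions and should be checked against the RIPEstat documentation:

```python
import json
import urllib.request

URL = "https://stat.ripe.net/data/routing-status/data.json?resource={prefix}"

def visibility(prefix):
    """Fraction of RIS peers currently seeing this prefix (0.0 to 1.0)."""
    with urllib.request.urlopen(URL.format(prefix=prefix)) as resp:
        vis = json.load(resp)["data"]["visibility"]["v4"]   # assumed field names
    return vis["ris_peers_seeing"] / max(vis["total_ris_peers"], 1)

def check(prefix, previous, threshold=0.10):
    """Alert on a sudden negative spike in visibility for one of your prefixes."""
    current = visibility(prefix)
    if previous - current > threshold:
        print(f"ALERT: visibility of {prefix} dropped {previous:.2f} -> {current:.2f}")
    return current
```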

Awesome, that's one of my most interesting results at least for me.

I don't have the space to draw all of the maps for all of the countries, so I am just focusing on the Netherlands.

What you can see on the left is the European map and what you can see on the right is the US map. Before we drew those results, we made some assumptions, and we said that we expect the critical infrastructure of a country to geolocate mostly within that country. We shouldn't expect a part of the critical infrastructure of a country to geolocate somewhere else, because that would mean a jurisdictional dependency on a third country.

So what we can see for the Netherlands here in the left map is a very strong red colour in the Netherlands; that's what we expect. But we also see on the right map a strong orange colour, which says that critical BGP prefixes in the Netherlands demonstrate a strong presence not only in the country of origin but also in the United States. That heavy US concentration, which is also observed in the rest of the countries but which I am skipping for space constraints, suggests that disruptions or political and regulatory shifts in the US could propagate globally and affect the Netherlands and the rest of the countries that we studied. And for those of you who like numbers, approximately 25% of the Dutch critical BGP prefixes geolocate in the US.
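
A sketch of how such a per-country breakdown can be produced with MaxMind geolocation data, using the geoip2 library and a GeoLite2/GeoIP2 Country database; the example prefix is hypothetical, and picking one representative address per prefix is a simplification:

```python
from collections import Counter
from ipaddress import ip_network

import geoip2.database   # pip install geoip2
import geoip2.errors

def country_distribution(prefixes, mmdb_path="GeoLite2-Country.mmdb"):
    """Geolocate one representative address per prefix and count countries."""
    counts = Counter()
    with geoip2.database.Reader(mmdb_path) as reader:
        for p in prefixes:
            first_host = str(next(ip_network(p).hosts()))   # representative IP
            try:
                counts[reader.country(first_host).country.iso_code] += 1
            except geoip2.errors.AddressNotFoundError:
                counts["unknown"] += 1
    return counts

dist = country_distribution(["193.0.10.0/24"])               # hypothetical critical prefix
total = sum(dist.values())
print({cc: round(100 * n / total, 1) for cc, n in dist.items()})   # e.g. share in "US"
```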

Now, again some weird plots which I will try to explain simply. What we tried to plot on the left is the number of hijacks on the X axis against the total duration of those hijacks that the critical ASes among those five countries suffered during 2024.

So the worst case scenario is to observe an autonomous system in the top right part, because that would mean that the autonomous system suffers from multiple hijacks or outages that also last a long time.

Now let's focus on the right figure. We can see there are some ASes in the bottom right part, 6939, 20, 2120, that suffer from frequent network outages, and some other ASes, like 20773 in the top left part of the right figure, which suffer from prolonged network outages. That can help us highlight operational instability or a lack of redundancy in certain critical infrastructure networks.

But we also observe some large ASes like AT&T, Cogent and Amazon on the left figure, in the bottom right part, which experience a big number of BGP hijacks, and that tells us that those attack vectors do not only affect small networks but also hypergiants.

And even those hyper giants remain vulnerable to BGP attack vectors.

So what can we do to deal with those problems? I do not have the answer, and probably no one has a definitive answer, but I have the best answer so far, and that is RPKI. To put it simply, RPKI consists of two things: the signing of your prefixes, which is the ROA part, where you sign a prefix and claim that you are its legitimate origin; and the RoV part, which is the validation part, where others need to validate whether you are telling the truth in your announcements. Now imagine that all of the autonomous systems on the internet were ROA compliant, meaning all of them signed their prefixes, but no one on the internet validated those ROA objects; that would essentially mean we have zero RPKI security. And we do observe good ROA compliance for those five European countries: 67% for Sweden and more than 80% for the rest of the countries, which is far above the world average.

However, we also observe that more than 40% of those critical ASes fail to perform RoV, and this essentially undermines the overall RPKI security.
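
To make the ROA/RoV distinction concrete, here is a minimal sketch of the origin-validation logic described in RFC 6811, run against a set of validated ROA payloads; the VRP entries below are hypothetical examples, not real ROAs:

```python
from ipaddress import ip_network

# Validated ROA Payloads (VRPs): (prefix, max_length, origin_asn) - hypothetical examples.
VRPS = [
    ("193.0.0.0/21", 21, 3333),
    ("2001:67c:2e8::/48", 48, 3333),
]

def rov(prefix, origin_asn, vrps=VRPS):
    """Route Origin Validation: returns "valid", "invalid" or "not-found"."""
    announced = ip_network(prefix)
    covering = [
        (maxlen, asn)
        for vp, maxlen, asn in vrps
        if ip_network(vp).version == announced.version
        and announced.subnet_of(ip_network(vp))
    ]
    if not covering:
        return "not-found"      # no ROA covers this prefix at all
    for maxlen, asn in covering:
        if asn == origin_asn and announced.prefixlen <= maxlen:
            return "valid"
    return "invalid"            # covered, but origin AS or prefix length does not match

print(rov("193.0.0.0/21", 3333))     # valid
print(rov("193.0.6.0/24", 3333))     # invalid: more specific than the max length
print(rov("193.0.6.0/24", 64512))    # invalid: wrong origin AS
```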

These are some suggestions, some admittedly subjective suggestions based on our results: we suggest that network operators should prioritise signing their prefixes and assets with ROAs, but also deploy RoV, not just sign their prefixes.

And from the policy makers' perspective: I usually disagree when somebody suggests punishing those who don't adopt a technology; I would rather incentivise people. So I suggest that policy makers incentivise RPKI compliance, either through tax benefits or through grants to small and medium ISPs to implement such security practices.

Now, if I had to summarise the whole presentation in one slide, I would say that we aim to bridge the gap between policy-based recommendations and actual network practice, and to that end, we designed a BGP-based risk assessment toolbox to help us reach that goal.

Using that toolbox, we investigated the network and security postures across five European countries, and there are two important insights we get. The first is that critical BGP prefixes exhibit a heavy concentration not only in the country of origin but also in the US, and that is the same for all of the countries that we studied.

That means that disruptions in the US will affect the critical infrastructure of another country.

And also, critical ASes demonstrate high ROA compliance for their assets but low RoV enforcement, and that could undermine the overall RPKI security.

Now, network operators could use such a tool to prioritise the signing of their assets and prefixes, or they could check the routing paths in their routing tables and, for example, prefer paths that include strong RoV enforcers, preferring a path whose average RoV enforcement is higher than that of other paths.
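
A sketch of that last idea, scoring candidate AS paths by the share of on-path ASes known to drop RPKI-invalid routes; which ASes count as RoV enforcers, and how that list is obtained, is an assumed input here:

```python
# Hypothetical set of ASNs believed to enforce RoV (drop RPKI-invalid routes).
ROV_ENFORCERS = {174, 1299, 3356}

def rov_score(as_path, enforcers=ROV_ENFORCERS):
    """Average RoV enforcement along the path (1.0 = every hop filters invalids)."""
    hops = list(dict.fromkeys(as_path))          # collapse prepending duplicates
    return sum(1 for asn in hops if asn in enforcers) / len(hops)

def prefer(paths):
    """Among candidate paths to the same prefix, prefer the highest-scoring one."""
    return max(paths, key=rov_score)

print(prefer([[64500, 174, 3333], [64500, 65010, 3333]]))   # picks the path through AS174
```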

Thank you for taking the time to watch my presentation. I would be happy to take any questions, both online and offline, and please feel free to send me, by email, your suggestions and your positive or negative comments on how this could be better implemented or fixed, or what could be added on top of the toolbox. Thank you again very much.


(APPLAUSE.)


MAX STUCCHI: Good morning everyone. We have two questions; I will go with the person who stood up first.

AUDIENCE SPEAKER: Thank you for the presentation. Geoff Huston from APNIC. I have two observations, and it follows a study I did on the DNS infrastructure, looking at where and how DNS resources are authoritatively served. When you talk about critical BGP prefixes, did you take into account the fact that some of these prefixes are anycast, and some of them are well anycast, dispersed across the globe, and some of them are badly anycast, to cities right beside each other? There's no mention here of to what extent those critical BGP prefixes are anycast; you simply say they are in the US. I suspect there's more anycast than you are making out here.

SAVVAS KASTANAKIS: Approximately 4% of the critical prefixes that we studied are anycasted, and we used two datasets to measure that. The first dataset is the Anycast Census and the second one is from bgp.tools, so we did measure that in order to try and avoid such...

AUDIENCE SPEAKER: The second thing: you talk about multiple ASes, but particularly in the DNS they are ahead of you. A number of DNS hosting providers run multiple ASes inside one organisation; it looks like there are many ASes, but in fact they are all the same. And actually looking behind the ASes to find the organisational entity that is operating those ASes is part of assessing diversity. For example, and I will cite one a long way away, VNNIC in Vietnam runs ten ASes and hosts their stuff on all ten; it looks great until you scratch the surface and find it's really only one organisation across ten numbers.

SAVVAS KASTANAKIS: Okay. That's a very good observation. We try to start from the bottom, so we don't really start from the organisation level, we start from the domain level and try to reach the AS.

AUDIENCE SPEAKER: They are ahead of you; they have seen what you do and say, I will have more ASes, I have ticked the box, when in actual fact they are not diverse.

SAVVAS KASTANAKIS: I am not sure of the answer to that question but thank you for that.

AUDIENCE SPEAKER: Careful use of the registry data will disclose whether resources are held by the same organisation or not. It's all public data.

SAVVAS KASTANAKIS: Thank you. I will take that into consideration.

AUDIENCE SPEAKER: Thank you for this presentation. Andrei Robachevski. I have two questions. The first one: I understand you infer some of the prefixes from DNS data, looking at the names, but that's not the only way, right? Do you use any other datasets to infer what prefixes belong to critical infrastructure?

SAVVAS KASTANAKIS: So, we only observe the internet-facing part of those critical prefixes, we do not see the core infrastructure, and we start from pre-compiled domain datasets. The best case scenario would be to start from scratch, build our own measurement campaign and discover our own critical prefixes, probably around the world, and then do this analysis. But I tried not to focus on the critical infrastructure part; I only used it as an evaluation case for the toolbox we are trying to compile. So if we start discussing all the problems in my definition of what critical BGP prefixes are, we might find a lot, but the point is not to focus on the critical infrastructure part, rather on the functionality of the tool that we are compiling. But the answer to your question is no, we did not focus on the core infrastructure.

AUDIENCE SPEAKER: Okay, thank you. Second question: when you say BGP hijacks, what kind of data did you use?

SAVVAS KASTANAKIS: The tool we used is called GRIP; we used the GRIP API. They essentially measure events and rate them with a suspicion score, and we only collected the events with the highest suspicion scores. But I don't think there's a specific platform that can definitively tell you whether a hijack was actually a hijack or not, or at least I cannot find a public tool, but feel free to share.

AUDIENCE SPEAKER: MANRS is using GRIP as well at the moment; we are looking at other providers as well, and we also try to ensure that the rate of false positives is as low as possible. But indeed, with this approach we cannot be 100% sure, because we are inferring intent from outside observations. Thank you for your presentation again, thank you.

SAVVAS KASTANAKIS: Thank you for your comments also.

AUDIENCE SPEAKER: Randy Bush. Could you give me a clue as to how you determine whether an AS is multi-homed or not?

SAVVAS KASTANAKIS: Yes. There is a dataset provided by CAIDA, and this dataset includes the relationships between autonomous systems; apart from the connections of an autonomous system, in this dataset you can also find the type of relationship, so it's --

AUDIENCE SPEAKER: I am familiar with the CAIDA data, you are saying you used the CAIDA dataset?

SAVVAS KASTANAKIS: That's the short answer, yes.

AUDIENCE SPEAKER: Thank you for your presentation. I have a question about the attribution that you use, which I assume is the core of your work: you said you started from the internet-facing domains and attributed them to those prefixes. Did you have a method to evaluate the attribution method you used? The reason I am asking is that in the slide where you showed the map, you mentioned that some of the Dutch critical infrastructure is also located in the US, which could be due to many things, for example CDNs. Did you have a method to evaluate this?

SAVVAS KASTANAKIS: No, the answer is we didn't evaluate that approach. We followed that path because we had just started implementing the toolbox and we wanted to observe some first results and see whether we would spot some weird patterns using our toolbox.

I can think of some techniques on how we could evaluate it but this project, no, we didn't. It's only in the first steps of its life, let's say. But no, we didn't.
MAX STUCCHI: I see we have no more questions remaining, so thank you very much.

SAVVAS KASTANAKIS: Thank you.

(APPLAUSE.)


MAX STUCCHI: Next up is Leslie Daigle, telling us about the microplastics of the internet, an interesting concept.



LESLIE DAIGLE: Interesting in every sense of the word, sadly including the pollution one. Yes, so I am Leslie Daigle, I am the chief technical officer of the Global Cyber Alliance, and that organisation is responsible for this work. Since 2018, the Global Cyber Alliance has had a global honey farm. Just a quick note on what it is: we have sensors across the globe just listening for things that are trying to poke them, and being very generous in terms of what they accept as login credentials and so on, and we basically listen to capture what you are trying to do: what commands you are trying to run, which control servers you are trying to reach, what you are trying to download, that kind of activity.

So as I say, we have been collecting data since 2018. Our initial impetus was to collect data to understand what was happening to IoT devices, because this was hot on the heels of Mirai taking down Dyn, and the sad realisation we came to was that this was not just about IoT devices; this is what's hitting every single open IPv4 port on the globe. I have looked at my server logs at home and looked up some of the attacking IP addresses, and we say yep, yep, yep, yep.

So our honey farm: we updated the system at the end of last year, we are now running our own honeypot software, which is collecting data over SSH, HTTP, HTTPS and SFTP, and we are continuing to add protocols, although honestly Telnet remains the big source of attack traffic.

But I am not actually going to talk a lot more about us, I want to talk about you and your networks and what we know about you.

So just a very brief highlight of what it is that we see. This is 114 seconds in the life of a honey farm: it took 114 seconds to see 1,000 attacks on our two hundred sensors across the globe. The slides are available online, and when you peer at these in more detail, the blue lines are the scans and everything else is either an attack or the precursor to an attack on our sensors. This is not just people poking and saying there's an open port; this is real attack traffic. In fact, this is real attack traffic from a network here at RIPE this week, and you can see it gets kind of bursty, with scary big spikes. That network is also shown on this graph, but you can't see it, because a couple of other networks that are also here this week sent a lot more traffic our way, and this is from Q1 of this year.

Again, things you can note: the attack traffic tends to be bursty, and more on that in a minute; there's also a background radiation of attacks hitting us from different autonomous systems, sometimes a lot. I think network B here has something to work out in its system, because it is clearly sending an awful lot of attack traffic at us and at every other open IPv4 port in the world.

I will mention that these were ASes represented by people who are attending RIPE this week. I am deliberately not saying which ASes they are, because my point here is not to name and shame; my point is just to illustrate what's actually happening on the internet all day, every day, and that this is the problem that needs to be solved. These are the microplastics, and just like microplastics, you might not see them.

But that doesn't mean they aren't impacting everything in the ecosystem.

This is just another graph to show you a little bit better the relationship between those four ASes. This is a volume measure of what we saw in Q1 of this year from those four networks, in terms of attacks on our two hundred sensors.

So, you might say, yeah, that's nice, but it's not impacting my bandwidth, what's the problem? It is impacting the value of your IPv4 addresses, because when you get spikes like I showed you on those earlier graphs, every other network in the world is updating their block lists and, you know, refusing traffic from your IP addresses.

And you can say well, that's fine but I am here to serve my customers and it's not impacting my customers, except sometimes it is.

If you run cloud infrastructure, you've got to know that some of your customers are busy beating up others of your customers. Whether they are doing it deliberately, or whether they have infected machines that are not being addressed, is an open question, but the point is the attacks are real and they are ongoing.

Like I said, it's like microplastics, it's everywhere, it's into everything, it's clogging the ecosystem and so we are calling it pollution and we made an index.

So if you go to gcaaide.org, you will see this graph or an updated version which shows the amount of traffic we are seeing, the amount of attacks we are seeing from each country in the globe. And it's a heat map so obviously the darker the red, the more we are seeing normalised by population.

If you scroll all the way to the bottom of the page, you can see a little bit more about how we are calculating this index, this pollution index that we have got.
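
As a rough illustration of the "normalised by population" idea (the actual index formula is the one described on the site; this attacks-per-100,000-inhabitants calculation and the sample numbers are only assumptions for illustration):

```python
def pollution_index(attacks_by_country, population_by_country):
    """Attacks observed per 100,000 inhabitants, per country (illustrative only)."""
    return {
        cc: 100_000 * attacks / population_by_country[cc]
        for cc, attacks in attacks_by_country.items()
        if population_by_country.get(cc)
    }

# Hypothetical sample numbers, not GCA data.
print(pollution_index({"NL": 120_000, "VN": 300_000},
                      {"NL": 17_800_000, "VN": 98_000_000}))
```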

And our aim with this is, again, illustration: we are illustrating just how much of this there is in the world and where it's coming from. We obviously have the data to drill down to the network level, we can drill down to any level we'd like, but for public purposes we are sharing it at a country level here.

So that's still all very generic and tells you a little bit about volume and rate and how often and how it never stops but who cares about it anyway.

So, coming back to those spikes: those spikes may or may not be related to particular machines being infected; they may be related more to whether there are specific campaigns being run, because we are way past the point of script kiddies in their basement poking open things and saying, ha ha, I tickled the machine on the other side of the globe. There are specific attack campaigns being run by different groups, including state actors. So for instance the Flax Typhoon campaign, which relies on existing tools in the operating system to basically pass as legitimate software and deploy a VPN to make a persistent connection, across multiple devices and multiple machines across the globe. This one was primarily targeting Taiwan-based hardware, but the principle applies: once you've built one, you can retarget. I will say that we saw a surge in Chinese-originated attacks on our Taiwanese sensor during the time frame of the Flax Typhoon campaign, with a peak in August 2023.

More recently, there are Mirai variants; it's the gift that keeps on giving. There's a Mirai variant that is still exploiting old D-Link vulnerabilities to take over D-Link devices around the globe, and we can stand here all day and talk about how people should patch their devices, but the reality is some devices are deployed in inaccessible places or have other reasons they can't be updated, and they get owned. So we had a look at this particular attack fingerprint and saw a fair bit of it in the Telnet traffic in our data. Hitting 193 of our approximately 200 sensors, this was from one IP address in Vietnam. It eventually got shut down; somebody rang the alarm bell and either successfully blocked it globally or shut it down. We did later see another surge of these attacks coming from a different IP address somewhere else in the world.

I want to talk a bit more about the Volt Typhoon campaign from just over a year ago. The US was noting that there appeared to be a Chinese-driven campaign to take control of vulnerable routers and modems and cameras; webcams are really suspect. These are the IoT devices that everybody thought, oh, people will just place them in their home and they are not going to be vulnerable, we don't need the ability to change passwords, but they are the things that get owned and controlled most often, for that very reason.

So in terms of Volt Typhoon, we saw a fair bit of traffic hitting our sensors across the globe, some in the US, but also in Europe and parts of Asia as well.

And so we dug into it a little bit more. I will say now that at the end of the deck, which is available online, there are pointers to a variety of sources for these slides, and also a pointer to some recent work that we have published on our blog, outlining more of the detail of what we have seen. But for today, this is what the Volt Typhoon attack looked like, on a log scale.

With over two million attacks coming from a single AS. This graph is only showing where we saw more than ten attacks from a given AS, because sometimes it's one machine that's infected or one attacker that's living in your AS, and sometimes it's a plethora of them; the size and shape vary an awful lot. So this graph is only showing where we had more than ten attacks from an AS, and beyond this graph there were more than 3,400 ASes that attacked us. In this graph, where you can't possibly see them, the red lines are ASes that are here at RIPE this week.
This is not somebody else's problem. This is our problem.

And this just sort of shows a little bit of the shape and nature of the attacks coming from those ASes that are here this week.

Again, you can see spikes of activity, but you can also see a fair bit of ongoing activity and a resurgence in the case of a couple of the networks, whether that's because something got shut down and the perpetrators moved to different resources within the network, or whether it was just other people that came along and got infected.

We did a little bit of investigation of how this lines up against different types of networks, and the biggest surprise for us was that most of the Volt Typhoon attacks came from network service providers. Wait a minute, why network service providers? These are supposed to be infected end machines, so I don't have an answer. The things I would say are, one, it's pretty hard to classify networks these days, because very few networks are actually a single type of network, most are doing multiple things; but also, it's pretty common for networks to lease out their IPv4 address space to other entities, so this may be a cautionary tale of what happens when you lease out your IPv4 address space: you get it back and it's just slightly dirty.

And on a lot of people's block lists now.

Yeah, so this is breaking out the Volt Typhoon attacks across the network service providers in our list, just focusing on the top three, so you get a sense of the pulse of the different attacks and the unique IPs involved in the attack, because sometimes, as I said, it's just one perpetrator that may show up and just blast the heck out of our sensors, and sometimes it's spread throughout the network. So with over 6,000 different IP addresses coming at us from network service providers, it's a pretty widespread and well-deployed footprint of an attack.

So, I hope I have convinced you that it's real and out there and happening. So what can we do about it? Take away the notion that, yeah, this is in your network. Or maybe it's not, and if it isn't in your network, we would love to know what you are doing that networks A, B, C and D are not doing that actually addresses the source of these attacks.

Barring that, you can help us help others: we can collaborate to address the problem of microplastics by figuring out what things can be done to stop them at the source before they infect others, and I think we have success stories of that kind of approach, building industry consensus about what is a good norm for doing security in your network. Look at what MANRS and its mutually agreed norms for routing security have done for improving routing security. I am optimistic that we can find some agreed norms for how you find and stop this nonsense in your network before it attacks somebody else, before it trashes the reputation of your IP addresses and before we have a mess on the internet.

So, that's what I have to say. Thank you for your attention and I would be happy to take any questions or comments.



(APPLAUSE.)



AUDIENCE SPEAKER: Hi, Leo. Can you say how you categorised the types of networks, such that network service providers came out as the highest source of abuse?

LESLIE DAIGLE: Yes, we relied on CAIDA data to do the mapping between the AS and the network type.

AUDIENCE SPEAKER: Thank you.

OSAMA AL-DOSARY: Any other questions? There's one online, and the question is: how do you classify scans from research institutions; are researchers marked as malicious?

LESLIE DAIGLE: We classify scans as scans; whether you are a research institution or not, they are just scans. The deliberate point was more to call out when we are seeing actual attacks rather than scans, because those are the things that we think are particularly damaging. In earlier versions of talks about our data, a common observation was, oh well, all that stuff that's hitting your sensors is just scans, so it's more that we are saying: look at all this data that's not about scans.

OSAMA AL-DOSARY: Any other questions? Thank you, Leslie.

LESLIE DAIGLE: Thank you very much.



(APPLAUSE.)


OSAMA AL-DOSARY: Our next speaker is Alexander Azimov.


ALEXANDER AZIMOV: Hello everybody, my name is Alex Azimov, I work for Yandex. Today I am going to talk with you about how to collect, store and get access to BGP and BMP data. So let's start with a problem definition: why do we need routing data? This is our plan for today. Routing data is at the core of the modern internet ecosystem, and of course routing data can be used not only for routing, for advertising prefixes, but also to draw such wonderful pictures.

But of course the application of routing data is much wider. For example, you may want to have an IP lookup across multiple network devices. As a network engineer, and I hope there are network engineers in the room, you love logs; you especially love them after a routing incident, when something happened, something disrupted your operations, and you would like to understand the state of the routing tables at a specific moment in time, maybe days ago.

And still, both of these cases involve an engineer operating with the data; it becomes more challenging when you are using routing data as part of an automated pipeline, for example to automatically adjust routing policies or for capacity planning. Or maybe you are facing a problem with the size of the RIB table and you start inventing smart BGP injectors to solve it. And of course you may want to use routing data for monitoring purposes, and then you need it in real time.

So I hope this sales pitch has convinced you that we really need to collect routing data; let's discuss what options we have.

And of course the most loved option is BGP. You can use a BGP multi-hop session to retrieve a single route for each prefix available in the Loc-RIB table of your network device. It's crucial data, but it's not all the data that's available on the network devices, because this data is what remains after the filtering, after the best path selection process was done, so it is a subset of the prefixes that are available at the router. There is much more data coming from peers, and maybe you want to have the data that you are advertising to other peers, and then you may want to start using BMP, which stands for BGP Monitoring Protocol. It's a TCP-based protocol, and it provides a way to collect routes from the Adj-RIB-In, the Adj-RIB-Out and more, so it's a single tool for all the BGP state that is available in the router. So maybe we can just stop using BGP multi-hop sessions and switch to BMP.

Unfortunately, in this case we need to think about the protocol's lifetime. BMP was standardised nine years ago; in terms of how technology progresses, it's a brand new protocol.

And normally you will see support for Adj-RIB-In pre- and post-policy, sometimes you will have access to the Loc-RIB via BMP, and normally Adj-RIB-Out is not supported at all.

So we have a hard choice. On one hand, we have universal BGP with a limited view of the routes that are available in the network devices; on the other hand, we have universal BMP but with limited support. What should we choose?

And I believe the best choice is not to make a choice, at least at this point.

So we decided to utilise both BGP and BMP in our collection engine, but to make the storage, the access and the collectors look as similar as possible.

Let's discuss the field of collectors. Speaking about BGP, for many years it was quite common to use for BGP collection the same software that we use for routing: BIRD, FRR, GoBGP; sometimes you also need to use bgpdump to convert their default binary format into something readable. But the field of BMP is quite different: in the BGP world it was routing software, in the BMP world it is collectors. They are not meant for routing, so their main purpose is to collect routes.

And the number of tools that is available is expanding, it's growing fast. A few years ago I was aware of two of them, and only one of them was working out of the box. But nevertheless, as far as I know, there's only one tool that works for both BGP and BMP, and that is pmacct; that's why we decided to use it as the collector for both BGP and BMP data.

So we discussed why we need routing data, and we discussed what types of collectors we can use to interpret the BGP data coming from the network devices, but we still need to figure out how to connect our applications with the routing data.

What options do we have? The first is just to have a collector for each application. Unfortunately this approach has obvious scalability issues. It may work for BGP, but it just doesn't work for BMP, where you have hard limits on the main routing platforms for the number of BMP sessions; on some platforms you will only be able to establish a single BMP session. So we need an intermediate storage that will store the data and, after that, give the applications access to this storage.

This storage has several requirements. Of course the first is consistency: we need to guarantee the atomicity of the updates. We also need automated failover; it should not be storage on a single server. And performance is quite important. In our case Postgres was the answer to our requirements, and we decided to set up an automated pipeline for processing routes.

Let's see how it works. In our design, we utilised an active-active approach: we have multiple simultaneous writers, their operations are non-blocking, and they insert data into the table that we call the multi-view table.

But before giving the applications access, we first need to merge it, to create a single-view table.

Here we have a protocol dependency: in BMP, there is a real timestamp; it's not the time when the route was received by our software or by the collector, it's the time when the route was received by the BGP border router. Unfortunately in BGP we don't have such a luxury, and for BGP, to perform the merge, we need to implement the best path selection process. But anyway, when a record is added to the multi-view table, it uses Postgres triggers: on the insert, on the update and on the delete, we have triggers that start the ordering process for the selected prefix and try to update the single-view table. If the single-view table is updated, it also adds a record to the log table. So as you can see, there are no external scripts running, other than the collectors that are feeding in the BGP and BMP data. Unfortunately, collectors may fail, and to handle this scenario we also have a keep-alive table: if a collector is not active for a selected period of time, there is a garbage collection process that starts to remove stale records from the multi-view table. The single-view table and the log table are updated accordingly, using the same triggers that we just discussed.
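
The trigger-driven part of that pipeline can be paraphrased roughly as follows; this is a Python paraphrase of the behaviour described above, not the actual Postgres schema or triggers, and the "newest timestamp wins" merge stands in for the real BMP/BGP merge logic:

```python
import time

multi_view = {}    # (collector, peer, prefix) -> route attrs; many concurrent writers
single_view = {}   # prefix -> merged route; what applications read
change_log = []    # append-only log of single-view changes
keepalive = {}     # collector -> last time it wrote anything

def remerge(prefix):
    """Recompute the single-view entry for one prefix (the 'trigger' body)."""
    candidates = [r for (_, _, pfx), r in multi_view.items() if pfx == prefix]
    # BMP carries the border router's timestamp; for plain BGP feeds a real
    # best path selection would be needed instead of this newest-wins rule.
    best = max(candidates, key=lambda r: r["timestamp"], default=None)
    if single_view.get(prefix) != best:
        single_view[prefix] = best
        change_log.append((time.time(), prefix, best))

def write(collector, peer, prefix, route):
    """Collector insert/update (route dict) or delete (route=None)."""
    keepalive[collector] = time.time()
    if route is None:
        multi_view.pop((collector, peer, prefix), None)
    else:
        multi_view[(collector, peer, prefix)] = route
    remerge(prefix)

def garbage_collect(max_idle=300):
    """Remove stale records from collectors that stopped writing (keep-alive check)."""
    now = time.time()
    stale = {c for c, seen in keepalive.items() if now - seen > max_idle}
    for collector, peer, prefix in [k for k in multi_view if k[0] in stale]:
        multi_view.pop((collector, peer, prefix))
        remerge(prefix)
```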

So now we have a collector, we have a storage, and we need to give the applications access to the collected data. The easiest way is to give direct access to the database; unfortunately, in this case there is a high chance that you will end up with frequent random requests addressing columns where you don't have indexes. The result will be not very high performance and not very happy end users. So normally you would address this issue with a RESTful API. A RESTful API can provide a cache that will optimise performance between the end users and the database storage, but it normally doesn't work well when you need continuous updates.

For those, you will need an SDK, and once again we have a hard choice: whether to use an API or an SDK, and whether we should support multiple end users with the SDK.

We decided to do both, and in our environment the API is a client of the SDK library, and this provides flexibility for the end users and application developers, whether to use the API or the SDK for a selected language. The system has already been widely deployed in our network. We have BGP and BMP data coming from the border routers and from the CDN and route reflectors, and we have BMP coming from the data centre gateways.

So we have discussed how to build the collectors, how to store data, and how to provide access to the data; let's jump to the examples.

And I hope you still remember this slide from the beginning of the presentation. Time doesn't allow me to discuss all of the points that are present here, so I will speak about how to use BMP data to detect routing anomalies. Just as a reminder, what are routing anomalies? Normally we are speaking about hijacks and route leaks: hijacks happen when someone advertises your address space without your permission, and route leaks happen when some network decides to advertise prefixes that it received from its providers or peers to its other providers or peers. The results of these anomalies are mainly the same: connectivity issues and, in the worst scenarios, denial of service.

There is classic BGP monitoring, where you use remote vantage points to collect data, analyse data and detect anomalies.

But I believe a significant part of the external monitoring of your network can be done from within your network.

And monitoring routing incidents is not an exception.

So let's see the next example. Here we have five autonomous systems. If the arrow goes up, it means a route is advertised to a provider; if it goes down, it means that a route is advertised to a customer; if it's horizontal, it means that a route is advertised to a peer.

And on this picture, there are no route leaks, everything is fine.

But what happens if autonomous system 5 fails to deploy proper filtering towards its provider, autonomous system 2? If that provider has proper filters on the ingress, everything is still fine, because in this case there is a route leak but it is not propagated.

But if we have a double problem, two mistakes at once, there is a propagated route leak, and a significant part of your traffic may be redirected; the result will be connectivity issues and unhappy end users.

And that's where BMP becomes handy. If autonomous system 2 accepts a leaked prefix, it will send it back to autonomous system 1, the system that originated the prefix.

BGP loop detection will filter it out, so you won't see these routes in the Loc-RIB table, but they are available in the Adj-RIB-In, so they are available in your BMP data, and you only need to figure out how to detect route leaks in the Adj-RIB-In.
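
A minimal sketch of that check over BMP Adj-RIB-In data: if our own ASN already appears in the AS path of a route a neighbour is sending us, the route has travelled out and come back, which for our own prefixes points at a leak. The record structure and field names here are assumptions about how the BMP data is modelled:

```python
MY_ASN = 64500   # hypothetical local ASN

def returned_routes(adj_rib_in, my_asn=MY_ASN):
    """Routes received from neighbours whose AS path already contains our own ASN.

    BGP loop detection drops these before the Loc-RIB, but they are still visible
    in the Adj-RIB-In that BMP exports, which is exactly where we look for leaks.
    """
    return [r for r in adj_rib_in if my_asn in r["as_path"]]

# Hypothetical BMP-derived records: prefix, AS path and the neighbour they came from.
adj_rib_in = [
    {"prefix": "203.0.113.0/24", "as_path": [65010, 65020, 64500], "neighbor_as": 65010},
    {"prefix": "198.51.100.0/24", "as_path": [65030, 65040], "neighbor_as": 65030},
]
for route in returned_routes(adj_rib_in):
    print("possible leak of our prefix:", route["prefix"], "via AS", route["neighbor_as"])
```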

You have a couple of options.

The first one is OTC, which stands for the Only To Customer attribute, standardised in RFC 9234; it provides a simple rule for detecting route leaks coming from peers or customers.
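
Roughly, the ingress side of that rule from RFC 9234 looks like this; the relationship labels and the way the route record is modelled are assumptions, and route-server sessions are left out of this sketch:

```python
def otc_leak(route, neighbor_as, relationship):
    """Simplified RFC 9234 ingress check: is this received route a leak per OTC?

    relationship is our view of the neighbour: "customer", "peer" or "provider".
    route["otc"] is the Only To Customer attribute value, or None if absent.
    """
    otc = route.get("otc")
    if otc is None:
        return False     # nothing to check (rule 3 would add OTC on provider/peer sessions)
    if relationship == "customer":
        return True      # a customer must never send us OTC-marked routes
    if relationship == "peer" and otc != neighbor_as:
        return True      # a peer may only set OTC to its own ASN
    return False         # from a provider, an OTC-marked route is expected

print(otc_leak({"prefix": "203.0.113.0/24", "otc": 65010},
               neighbor_as=65020, relationship="peer"))   # True: leaked via the peer
```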

It is supported by several open source routing software packages, it's also supported by pmacct, and it is deployed by a significant number of IXPs. This is not the list of all of them that support OTC; it is where I see the attribute in my routing table, so I do see that they are active.

Unfortunately some well-known vendors were not fast enough to develop this feature, but things are changing for the better, and this summer one of the big vendors will release software that supports this RFC, so I hope the adoption level will be growing fast.

The other option is ASPA, which stands for Autonomous System Provider Authorisation. It is still going through the IETF standardisation process; it provides a way to detect route leaks coming from all directions, from providers, peers and customers, but there are no ASPA records yet. However, no one can stop you from creating ASPA records in your own YAML file, just naming a few well-known tier ones. On Thursday there will be an overview by the team at the Routing Working Group; please join us and give us feedback.
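
In the same spirit, a locally declared ASPA-style record set (for example from that YAML file) can already flag suspicious adjacencies. This is a simplified hop check, not the full ASPA verification procedure from the draft, and the record contents are hypothetical:

```python
# Locally declared ASPA-style records (hypothetical): customer ASN -> authorised providers.
ASPA = {
    3333: {1299, 3356},    # "AS3333 only buys transit from AS1299 and AS3356"
}

def unauthorised_hops(as_path, aspa=ASPA):
    """Flag hops that cannot be customer-to-provider links according to ASPA records.

    as_path is ordered from our neighbour down to the origin, so path[i+1] announced
    the route to path[i].  On a route learned from a customer or peer, every such hop
    should be customer-to-provider; a hop that is not attested hints at a route leak.
    """
    return [
        (announcer, receiver)
        for receiver, announcer in zip(as_path, as_path[1:])
        if announcer in aspa and receiver not in aspa[announcer]
    ]

print(unauthorised_hops([64500, 65010, 3333]))   # [(3333, 65010)]: AS65010 not a declared provider
```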

And so there is another question: which one to use, OTC or ASPA? And by now, yes, you can guess the answer.

This is the high-level design of our monitoring system: we have multiple collectors, we have a storage engine, we have an SDK that provides access to the data, we are using ROAs to detect hijacks, and we are using OTC and 15 ASPA records from the YAML file to detect route leaks.

And here are the numbers: the number of route leaks in the Adj-RIB-In of my network that were active when I was finishing the slides yesterday evening.

So of course it's not only about our own prefixes; it also works when someone leaks prefixes that we are advertising, and here is where practice meets theory in a good way.

In this particular example, there was a network, a peer of ours, that started leaking prefixes to its providers, and these providers were sending these prefixes back to us, and the system was able to instantly detect it and report it.

So, this is my last slide for today. I'd like to hope that I have inspired you to look at and try BMP. I'd love to see broader support for OTC in BGP, so ask your favourite vendor about its support, and help us with ASPA. Thank you for listening.


(APPLAUSE.)


MAX STUCCHI: I already see one person asking a question.

AUDIENCE SPEAKER: Hi, what have I done! Okay, Jen Linkova, someone who runs BGP. Thank you, very cool. I guess the obvious question: all this nice architecture you described, I guess it's in-house development, right?

ALEXANDER AZIMOV: You guess right.

AUDIENCE SPEAKER: Is it open source, available for other people who run BGP?

ALEXANDER AZIMOV: The tools I use in-house are obviously available to everybody: it's pmacct and Postgres. The schema for Postgres is not open source, but we'll think about it; maybe it's a good idea to integrate it with pmacct, it's something to think about.

AUDIENCE SPEAKER: I guess it would be very useful if some instructions or guidance could be provided, so people do not reinvent your beautiful wheel. Thank you.

ALEXANDER AZIMOV: Thank you for the question.
MAX STUCCHI: I don't think we have any other questions, and I didn't see one online. Okay. Well, thank you very much.

(APPLAUSE.)


So this concludes our session today, and you get a little bit more time for the coffee break. Don't forget to rate the talks, to give the Programme Committee a way to understand if you enjoyed the talks, and we'll see you at the top of the hour. See you later.


Coffee break.