Johannes Bechberger, Author at foojay

CAP in the Pocket: Developing Java Applications on your Phone

Johannes Bechberger — Mon, 07 Jul 2025 15:27:26 +0000

Table of Contents

Termux
Develop Directly in Termux
Use an Emulated Ubuntu
SAP CAP SFlight
Building and Running SAP CAP SFlight
Modifying the Application via VSCode
Android Linux Terminal
Extra: CAP-in-the-Pocket VSCode Extension
Conclusion

Smartphones are more powerful than ever, with processors rivaling old laptops. So let's try to use them like a laptop to develop web applications on the go.

Below, I'll show you how to run and develop a CAP Java Spring Boot application on your smartphone and how to run VSCode locally to develop and modify it. This, of course, works only on Android phones, as they run Linux at their core.

Termux

We first need a proper Linux environment with a package manager and more. The most popular app that facilitates this is Termux:

Termux is an Android terminal emulator and Linux environment app that works directly with no rooting or setup required. A minimal base system is installed automatically - additional packages are available using the APT package manager.
TERMUX Website

Termux allows us to run Linux applications natively on the device, e.g. to use OpenSSH to connect to another server or to interact with the Android filesystem using UNIX tools. Of course, there are differences to other Linux distributions like Debian or Ubuntu. Many of them stem from the fact that Termux wants to integrate directly into Android.

We now have three ways to go forward. We could:

Run everything directly in Termux
Use an emulated Ubuntu in Termux
Use the new Linux Terminal App

Develop Directly in Termux

We start by using pure Termux, as it's faster and more integrated into Android than the second approach. But we'll also soon see why running in Ubuntu can have its benefits.

We start our pure Termux approach by installing our basic dependencies (and yes I like ohmyzsh and don't want to miss it anywhere):

apt install git zsh wget htop
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

Now we want to install Java. Termux lacks the pthread library and a few others, so simply getting the Linux aarch64 build of my favourite JDK, SapMachine, running on my phone seemed like too much work. So I'm using OpenJDK, which is already available in the Termux packages.

apt install openjdk-21

Now we have a proper OpenJDK JVM running on our phone:

But developing code in shell tools like VIM is cumbersome, so let's install VSCode. The cool part about VSCode is that it consists of two parts, a backend and a front-end. We can run the backend in Termux and the front-end in the browser.

The official VSCode distribution doesn't support Termux. But Code-Server by Coder is a fork/variant of VSCode that has support for running directly in Termux (via dev.to). It requires a few packages from the termux-user-repository and can be installed via:

apt install tur-repo
apt update
apt upgrade
apt install build-essential python nodejs code-server

Now just start it:

nohup code-server --auth none &

This launches a local version and ignores the shell output. Code-server is password-protected by default, which is great. But we don't need authentication, as the device is on the home Wi-Fi where nobody from the outside can access it anyway. If, for whatever reason, you want to password-protect it, then remove --auth none and you'll find the auto-generated password in $PREFIX/.config/code-server/config.yaml.

Go to localhost:8080 to access it. Now we have an IDE running locally, directly on our device:

That was simple, wasn't it? The only problem: the official VSCode extensions and access to the official VSCode marketplace are not supported, as code-server is a fork of VSCode:

You can still install the Java extensions from file by downloading the latest Linux arm64 release via wget from GitHub and then use "Install from VSIX" in the front-end, but this is cumbersome.

This is why I looked for other possibilities.

Use an Emulated Ubuntu

The problem with trying to run the official VSCode directly in Termux is that the Termux environment is too different from a normal Linux. Using proot we can emulate an Ubuntu environment and use it to install VSCode (dev.to) and even a proper SapMachine on our phone.

We're especially interested in proot-distro. To quote the termux proot wiki page:

Termux provides a package proot-distro which takes care of management of the Linux distributions inside Termux.
TERMUX PROOT WIKI

This allows us to easily install Ubuntu on our phone, which looks a little bit like the Windows Subsystem for Linux:

pkg install proot-distro
proot-distro install ubuntu

Now you can login via the following to run as root and use the Termux home folder as home:

proot-distro login ubuntu --termux-home

Running in this environment is, as I told you before, slower than running directly in Termux. To quote a reddit user:

Proot is slower. It uses Linux debugging interface (ptrace) to control the process execution and hijack arguments and return values of system calls, so it can simulate a different file system layout and user/group ids. This cause a lot overhead. In my experience the biggest performance penalty can be observed when working with a lot of files (e.g. extracting tarball).
REDDIT

As before, we want to install some basic utilities and ohmyzsh:

apt install git zsh wget htop
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

Now we can install SapMachine as explained in the SapMachine Wiki:

# Add the SapMachine GPG key
wget -qO- https://dist.sapmachine.io/debian/sapmachine.key | tee /etc/apt/trusted.gpg.d/sapmachine.asc > /dev/null
# Add the SapMachine repository
echo "deb https://dist.sapmachine.io/debian/$(dpkg --print-architecture)/ ./" | tee /etc/apt/sources.list.d/sapmachine.list > /dev/null
# Install SapMachine 21
apt update
apt install sapmachine-21-jdk

Resulting in a proper JVM:

After that, we can install the official VSCode distribution:

wget 'https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-arm64' -O code.deb
apt install ./code.deb
# Install the missing packages
apt --fix-broken install
rm code.deb

Installing the missing packages took around half an hour on my Pixel 8a.

Starting the VSCode backend is as easy as before with code-server:

nohup code serve-web --port 8080 --without-connection-token &

Now we have a proper official VSCode and can view it in the browser at localhost:8080:

In this we can install all the extensions we want and can start developing applications.

I recommend using "Add to Home screen" in your browser menu to create a VSCode web app on your home screen. This app then omits the browser menu bar, giving you more space to see your code.

SAP CAP SFlight

Let's start running and developing our application. You can, of course, create any application to your heart's content and use either Termux or the emulated Ubuntu. But I'll choose the latter and, as a sample application, the SAP CAP SFlight application.

CAP is SAP's framework-agnostic application framework for writing applications in Java and Node.js, and SFlight is a simple sample application:

This is a sample app for the travel reference scenario, built with the SAP Cloud Application Programming Model (CAP) and SAP Fiori Elements.

The purpose of this sample app is to:

Demonstrate SAP Fiori annotations

Demonstrate and compare SAP Fiori features on various stacks (CAP Node.js, CAP Java SDK, ABAP)

Run UI test suites on various stacks

README of CAP-SFLIGHT

I'm neither a CAP nor an SAP Fiori expert, but CAP is really important in the SAP context and can be used to write Java web applications. The initial idea for this whole blog post came out of a discussion with the CAP Java folks (Robin de Silva Jayasinghe), which is why I'm choosing SFlight.

Building and Running SAP CAP SFlight

Let's start by building SFlight on device. First we clone it:

mkdir code # some hygiene
cd code
git clone https://github.com/SAP-samples/cap-sflight
cd cap-sflight
# install the Maven build system and npm
apt install maven npm

For those curious, this took:

# Emulated Ubuntu
3.22s user 2.05s system 81% cpu 6.457 total

# Pure Termux
2.98s user 1.19s system 99% cpu 4.184 total

# Mac M4 for reference
1.47s user 0.55s system 58% cpu 3.457 total

Now we build and run it.

npm ci # NodeJS based CAP tools
npm run build:ui
mvn spring-boot:run

But the CAP tools run into a problem:

Seems like nobody prebuilt their npm packages for the Android arm64 target. And no, building all packages yourself doesn't work either. And even if it did, mvn spring-boot:run, which uses CAP's NodeJS tools internally, crashes with a double free:

Because I still want to build SFlight on my phone, I need to get creative. The problem here is solely the CAP tools, not anything written in Java. So yes, you might call the following hacky, but this would be no problem in standard Spring Boot applications. It only means that you should be aware of the dependencies that your project needs, especially of natively compiled dependencies.

The idea is: CAP Java generates database model helper classes and the UI using a complex toolchain, so let's directly commit the generated files to GitHub from a machine that can run all tools (my MacBook) and push them all into a separate branch. Developing with this branch is fine, as long as you only modify the backend and don't change the database schema.

Is it hacky? Yes. Is this a problem for a demo? No.

We add the new branch as follows:

git remote add fork https://github.com/parttimenerd/cap-sflight
git fetch fork
git checkout cap-in-the-pocket

Now we can build and run the application again:

mvn spring-boot:run

It might complain that JAVA_HOME is not set correctly. This can be remedied by:

export JAVA_HOME=/usr/lib/jvm/sapmachine-21

You might want to add this to your .zshrc to make it permanent.


To access the most basic page of the app, visit localhost:4004

Or visit http://localhost:4004/travel_processor/dist/index.html to get a proper page with which we can easily interact. When the website wants you to sign in, choose "privileged" as the user and an empty password.

Leading you to:

Modifying the Application via VSCode

We can use the VSCode instance that we launched before to access our SFlight project (nohup code serve-web --port 8080 --without-connection-token & to start it again if you stopped yours) and work with the code:

Now let's have some fun: The SFlight admin screens allow the privileged user to apply deductions to the cost of a flight. We now introduce a small bug in the code that computes the discount in the DeductDiscountHandler class, so that it always calculates a discount ten times higher than requested. This might not make any sense, but the reverse (fixing this bug) is a nice demo of how one can use VSCode on their phone to fix a serious bug on the go.

The discount is computed in DeductDiscountHandler as follows:

BigDecimal discount = BigDecimal.valueOf(context.percent())
        .divide(BigDecimal.valueOf(100), new MathContext(3));

BigDecimal deductedBookingFee = travel.bookingFee()
        .subtract(travel.bookingFee().multiply(discount))
        .round(new MathContext(3));
BigDecimal deductedTotalPrice = travel.totalPrice()
        .subtract(travel.totalPrice().multiply(discount));

We can now introduce the bug by dividing context.percent() in the first two lines not by 100, but by 10.
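To see why this small change is so drastic, here is a minimal standalone sketch of the same arithmetic. It is not the SFlight code: the class name DiscountDemo, the method deduct and the price of 900 are made up for illustration, and the divisor is turned into a parameter so that both variants can be compared side by side.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DiscountDemo {
    // Hypothetical helper mirroring the arithmetic of the handler snippet.
    // The divisor is a parameter so the correct (100) and the buggy (10)
    // variant can be compared; the real handler hard-codes it.
    public static BigDecimal deduct(BigDecimal price, int percent, int divisor) {
        BigDecimal discount = BigDecimal.valueOf(percent)
                .divide(BigDecimal.valueOf(divisor), new MathContext(3));
        return price.subtract(price.multiply(discount));
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("900"); // made-up total price
        // Correct variant: 10 percent of 900 is deducted
        System.out.println("divide by 100: " + deduct(price, 10, 100));
        // Buggy variant: 10 / 10 = 1, so the whole price is deducted
        System.out.println("divide by 10:  " + deduct(price, 10, 10));
    }
}
```

With the divisor 10, a requested discount of 10% turns into a factor of 1, so the complete price is subtracted.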

When we rebuild and run the application via mvn spring-boot:run, deducting 10% of the travel costs:

Decreases the price to 0 USD:

Nothing prevents us from using git to commit our change to some repo. This shows how we can easily modify our CAP/Spring-Boot application locally on our phone.

Android Linux Terminal

In March Google launched the native Linux Terminal app for Android on Google Pixel devices. Luckily, the Android phone I use for all these tests is a Pixel 8a, which is one of the supported devices. The app offers essentially the same experience as the emulated Ubuntu in Termux:

The Terminal app operates by launching a Debian Linux environment within a virtual machine, powered by Android’s Virtualization Framework (AVF). Rather than exposing the underlying Android file system, it gives you an isolated Linux shell—much like what ChromeOS has offered developers and enthusiasts for years.
ikkaro.net

To enable it, you have to set up Developer Mode (by repeatedly clicking the Build Number in the settings) and activate the Linux Development Environment in the settings, as described at ikkaro.net.

But of course the app has limitations compared to Termux:

Slow startup times: Booting the Terminal can take 10-20 seconds since it loads a full virtual machine (especially on mid-range hardware).

Limited file system access: By default, you can only access your phone’s Downloads directory (via cd /mnt/shared), not the entire Android file system. For privacy and security, other folders remain off-limits. [...]

Resource constraints: The VM’s allotted disk space is adjustable (from the Terminal app settings), but heavy use—like AI models or large database systems—can quickly fill storage.

ikkaro.net

One tiny but significant limitation is that the terminal app currently doesn't support tabs, but this feature will apparently be available soon with Android 16. Overall the app feels far less stable than Termux (but supports pasting via external keyboard) and sometimes restarts the UI, clearing the terminal but killing the underlying user session. You should definitely adjust the disk space which is by default limited to 5GB which quickly fills up.

Let's install the utilities and SapMachine as before, with the main difference that we're not running as root by default:

# Install the utilities
sudo apt update
sudo apt install git zsh wget htop
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
su # change to root
# Add the SapMachine GPG key
wget -qO- https://dist.sapmachine.io/debian/sapmachine.key | tee /etc/apt/trusted.gpg.d/sapmachine.asc > /dev/null
# Add the SapMachine repository
echo "deb https://dist.sapmachine.io/debian/$(dpkg --print-architecture)/ ./" | tee /etc/apt/sources.list.d/sapmachine.list > /dev/null
# Install SapMachine 21
apt update
apt install sapmachine-21-jdk
exit # exit root

And of course, you should set JAVA_HOME as before to prevent Maven from complaining later.

We now have a SapMachine:

There are now technically two SapMachines and one OpenJDK installed on my tiny phone. Maybe I should install an OpenJ9 build just for good measure...

We install VSCode as before too:

wget 'https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-arm64' -O code.deb
sudo apt install ./code.deb
# Install the missing packages
sudo apt --fix-broken install
rm code.deb

During the installation, I got asked whether I want to add the Microsoft apt repository. Having this prompt on my phone just looks funny:

Unsurprisingly, launching VSCode via nohup code serve-web --port 8080 --without-connection-token & and then accessing the front-end via localhost:8080 works:

Now the thing you're all waiting for: How long does a git clone take and do the CAP Java tools work in building the SFlight CAP app on this system?

Though first we have to update the NodeJS version (via nodejs.org). Note that the instructions there contain "\.. source ...", which doesn't work (a bug worth reporting), so the line that sources nvm is adapted below:

# Download and install nvm:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash

# in lieu of restarting the shell
. "$HOME/.nvm/nvm.sh"

# Download and install Node.js:
nvm install 22

Now we can run the commands as before:

mkdir code # some hygiene
cd code
git clone https://github.com/SAP-samples/cap-sflight
# took: 3.58s user 1.52s system 109% cpu 4.643 total
# which sits in between pure Termux and the emulated Ubuntu
cd cap-sflight
# install the Maven build system (npm already comes with the Node.js installed via nvm)
sudo apt install maven

npm ci # NodeJS based CAP tools
npm run build:ui
mvn spring-boot:run

And well, it worked... Which I find surprising. The UI of the new Linux Terminal App might be lacking and the partition size limit might be limiting, but I can build and run the stock CAP SFlight application without any changes.

Well, it worked until I opened the browser, and then the Linux Terminal App crashed. I'm happy that I documented everything. I tried it two more times. But this means that despite its apparent benefits, it's basically unusable for this specific use case for now.

Hopefully I can revisit this in a few months and it's fixed.

Update: This terminal works much better with Android 16. It's still flaky, but it can run a proper CAP build-pipeline and VSCode without any issues.

Extra: CAP-in-the-Pocket VSCode Extension

Switching between VSCode for editing and the shell for killing the previous SFlight server instance is too cumbersome for my demo. So I created the CAP-in-the-Pocket extension for VSCode:

To install it, download the extension from GitHub and open it with the Termux app to make it available to VSCode. Then open the Extensions view in VSCode, click the "..." at the top of the Extensions view, select "Install from VSIX..." and select the downloaded file.

Please be aware that this extension is highly experimental and only created with the specific demo in mind.

The "(Re)Launch CAP App" button tries to kill the previous running instance and relaunches it:

(lsof -ti:4004 | xargs kill -9) || killall java || true
mvn spring-boot:run

Did I say this extension is highly experimental?

Below the button, you see the output of the commands and links to the CAP application. By default I show buttons that open the two main views of SFlight. But you can configure it via the settings.json file. The default configuration is equivalent to:

"cap-in-the-pocket.urlButtons": [
  {
    "label": "Travel Processor",
    "url": "http://localhost:4004/travel_processor/dist/index.html"
  },
  {
    "label": "Travel Analytics",
    "url": "http://localhost:4004/travel_analytics/dist/index.html"
  }
]

Creating this little plugin (with the help of Claude Sonnet and GitHub Copilot) allows me to have a more immersive demo.

Please be aware that you should only use the extension when you access VSCode in the normal browser, as opening links in the web-app added to the home-screen doesn't work properly.

Conclusion

In this blog post, I showed you how to install and use VSCode and Java on your Android phone and develop applications, using three different options. Of course, there are cloud offerings that allow you to develop applications from mobile devices too, but I like the simplicity of running the web application directly on your phone, with full control. All of this is possible because Android phones use Linux as their base-level operating system.

You could go even further and use the newly introduced Android Desktop Mode with an external screen, connect a keyboard and a mouse to your device, and you come close to a basic Linux computer. And if you use the phone directly, you have a laptop with really good battery life, albeit not the best performance. It probably won't become my daily Linux driver, but it is good enough for coding when I'm on the go.

Thank you for joining me on this journey to learn how to develop your web applications on your phone.

P.S.: I tried compiling the OpenJDK on my phone, but Termux killed the process and the Linux Terminal App ran out of disk space.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. It was first published on my personal blog. Thanks to Antje Luttenberger and Guilherme Dellagustin from the SAP OSPO for inspiring this work and Robin de Silva Jayasinghe from CAP Java for helping to set up SFlight and fixing bugs along the way.

The post CAP in the Pocket: Developing Java Applications on your Phone appeared first on foojay.

A Glance into JFR Class and Method Tagging

Johannes Bechberger — Mon, 17 Feb 2025 09:26:02 +0000

Table of Contents

Tags
Tagging
Conclusion

Ever wonder how the JDK Flight Recorder (JFR) keeps track of the classes and methods it has collected for stack traces and more?

In this short article, I'll explore JFR tagging and how it works in the OpenJDK.

Tagging

We can visualize the whole life cycle of a tag for a given entity:

In this example, the entity, a class, is brought into JFR by the method sampler (link) while walking another thread's stack. If the class hasn't been tagged before, this causes it to be tagged and enqueued in the internal entity queue, which makes it known to the JFR writer (source):

inline void JfrTraceIdLoadBarrier::load_barrier(const Klass* klass) {
  SET_METHOD_AND_CLASS_USED_THIS_EPOCH(klass);
  assert(METHOD_AND_CLASS_USED_THIS_EPOCH(klass), "invariant");
  enqueue(klass);
  JfrTraceIdEpoch::set_changed_tag_state();
}

inline traceid JfrTraceIdLoadBarrier::load(const Klass* klass) {
  assert(klass != nullptr, "invariant");
  if (should_tag(klass)) {
    load_barrier(klass);
  }
  assert(METHOD_AND_CLASS_USED_THIS_EPOCH(klass), "invariant");
  return TRACE_ID(klass);
}

This shows that tagging also prevents entities from being duplicated in a chunk.

Then, a chunk is written out. First, a safepoint is requested to initialize the next period (the next chunk) and to toggle the period, so that any subsequent use of an entity belongs to the new period and chunk. Then, the entity is written out, and its tag for the previous period is reset (code). This allows the aforementioned concurrency.
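The life cycle described so far can be modelled in a few lines of Java. This is only a rough sketch under strong simplifying assumptions, not how HotSpot implements it: the names EpochTagging, Entity, load and rotateChunk are made up, there are just two tag bits (one per period), and plain synchronized methods stand in for the safepoint and the lock-free structures of the real code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class EpochTagging {
    /** Hypothetical stand-in for a class or method known to the recorder. */
    public static final class Entity {
        final String name;
        int tagBits; // bit 0: used in period 0, bit 1: used in period 1

        public Entity(String name) {
            this.name = name;
        }
    }

    private int epoch = 0; // toggles between 0 and 1 on every chunk rotation
    private final List<Queue<Entity>> queues =
            List.of(new ArrayDeque<Entity>(), new ArrayDeque<Entity>());

    /** Tags the entity for the current period and enqueues it on first use only. */
    public synchronized void load(Entity entity) {
        int bit = 1 << epoch;
        if ((entity.tagBits & bit) == 0) { // plays the role of should_tag
            entity.tagBits |= bit;
            queues.get(epoch).add(entity);
        }
    }

    /** Toggles the period, then writes out and untags the previous period's entities. */
    public synchronized List<String> rotateChunk() {
        int previous = epoch;
        epoch ^= 1; // later uses now belong to the new period and chunk
        List<String> written = new ArrayList<>();
        Entity entity;
        while ((entity = queues.get(previous).poll()) != null) {
            written.add(entity.name);
            entity.tagBits &= ~(1 << previous); // reset the tag of the old period
        }
        return written;
    }

    public static void main(String[] args) {
        EpochTagging recorder = new EpochTagging();
        Entity string = new Entity("java.lang.String");
        recorder.load(string);
        recorder.load(string); // second use in the same period: not enqueued again
        System.out.println(recorder.rotateChunk()); // entities of the first chunk
        System.out.println(recorder.rotateChunk()); // nothing was used in the second chunk
    }
}
```

In this model, using an entity twice within one period enqueues it only once, and an entity has to be tagged again after a rotation before it appears in the next chunk.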

But how does it ensure that the tagged classes aren't unloaded before they are emitted? By writing out the classes when any class is unloaded. This is simple yet effective and doesn't need any change in the GC.

Conclusion

Tagging is used in JFR to properly record classes, methods, and other entities while also preventing them from accidentally being garbage collected before they are written out. This is a simple but memory-efficient solution. It works well in the context of concurrency but assumes entities are used in the event creation directly when tagging them. It is not supported to tag the entities and then push them into the queue to later create events asynchronously. This would probably require something akin to reference counting.

Thanks for coming this far in a blog post on a profiling-related topic. I chose this topic because I wanted to learn more about tagging, and I plan to do more of these short OpenJDK-specific posts.

P.S.: I gave three talks at FOSDEM, on fuzzing schedulers, sched-ext, and profiling.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. It first appeared on my personal blog mostlynerdless.de.


The slow Death of the onjcmd Debugger Feature

Johannes Bechberger — Sat, 18 Jan 2025 10:47:30 +0000

Table of Contents

JCmd triggered debugging
Summary
Problem
Solution
Conclusion

Almost to the day, one and a quarter years ago, I published my blog post called Level-up your Java Debugging Skills with on-demand Debugging. In this article, I wrote about multiple little-known and rarely used features of the Java debugging agent, including the onjcmd feature.

To quote my own article:

JCmd triggered debugging

There are often cases where the code that you want to debug is executed later in your program’s run or after a specific issue appears. So don’t waste time running the debugging session from the start of your program, but use the onjcmd=y option to tell the JDWP agent to wait with the debugging session till it is triggered via jcmd.

A similar feature long existed in the SAPJVM. In 2019 Christoph Langer from SAP decided to add it to the OpenJDK, where it was implemented in JDK 12 and has been there ever since.

The alternative to using this feature is to start the debugging session at the beginning and only connect to the JDWP agent when you want to start debugging. But this was, for a time, significantly slower than using the onjcmd feature (source):

After the feature had been merged, it was decided that it needed a CSR because it was user-facing. But the feature wasn't without its opponents, and the CSR was only accepted because the feature had already been merged:

After consultation with others including Alan Bateman and Mark Reinhold, I've concluded there is lack of technical consensus on this appropriateness of the feature in its current state to the platform.

As noted in the CSR FAQ (https://wiki.openjdk.java.net/display/csr/CSR+FAQs):

"In exceptional circumstances, the need for a CSR review may be recognized only after a push has already occurred. In such cases, a retroactive CSR review can be conducted. The results of such a retroactive review may require updates to the change, up to and including complete removal of the change."

Administratively, I'm retroactively voting to approve this CSR as it has already been pushed in JDK 12; however, given the lack of consensus, I've filed the follow-up bug JDK-8226608 to:

hide the onjcmd option from the help output

explore hiding "VM.start_java_debugging" from the "jcmd help"

This bug needs to be addressed before JDK 13 ramdown 2.
Joe Darcy in his comment to the CSR

So, it was decided to hide it with JDK-8226608, as Joe Darcy mentions in his comment, together with the CSR JDK-8227078:

Summary

Hide the onjcmd option of the jdwp agent and the corresponding VM.start_java_debugging command, without removing the functionality outright.

Problem

According to JDK-8223456 the onjcmd option and the corresponding diagnostic command should be hidden as far as possible.

Solution

The onjcmd option is not mentioned in the help output of the JDWP agent anymore. The corresponding diagnostic command VM.start_java_debugging is now registered as hidden, so it would not be included in the list of supported commands by jcmd or via the mbeans.

Apart from that the functionality is still working.

This is probably one of the major reasons nobody wrote about it: nobody outside of SAP, apart from the few people involved in its inception and in the JDWP agent, knew about it. If you search the internet for the onjcmd feature, you will likely only encounter articles from this very blog (and its various cross-posts).

So this feature was a hidden gem for a while, but as discussed in my article Is JDWP's onjcmd feature worth using?, this feature is not worth using anymore:

Between JDK 11.0.3 and JDK 21, there have been improvements to the OpenJDK, some of which drastically improved the performance of the JVM in debugging mode. Most notable is the fix for JDK-8227269 by Roman Kennke. [...]

This clearly shows the significant impact of the change. 11.0.3 came out on Apr 18, 2019, and 11.0.9 on Jul 15, 2020, so the onjcmd improved on-demand debugging for almost a year.

So, the feature has been hidden and has offered no benefits since mid-2020. It's just sitting in the OpenJDK, likely unused and unknown by most developers. The last thing to do is remove the feature. For this, I created the CSR with the help of Christoph:

Summary

Remove the onjcmd option from the jdwp agent, because it is considered obsolete and unused.

Problem

[...]

However, it is not needed anymore, as the performance issue has been fixed, and the networking/open port topic can easily be handled by infrastructure. Furthermore, the option is rarely used due to being hidden via JDK-8227078. So, we should remove the feature along with its coding to reduce complexity.

Solution

Remove the onjcmd option from the JDWP agent and eliminate the corresponding VM.start_java_debugging command in the JVM. This will clean up the agent code and remove obsolete functionality that is no longer needed or used.

For such CSRs, one also needs to state the compatibility risks. As explained before, there are possibly none outside of SAP. Together with my related PR, this will remove the feature from the OpenJDK, and JDK 24 will most probably be the first JDK since JDK 12 without the onjcmd debugger feature. RIP.

Conclusion

In this week's article, we saw the life cycle of the onjcmd feature, from its inception to its removal. As software developers, we shouldn't be too afraid to remove features we or our teams implemented. Every unused removed feature is a good feature. Large projects, like the OpenJDK, tend to collect lots of features that were great years ago but fell out of use and clog the source code. In my opinion, this also includes other JDWP agent features like onthrow. To be slightly more controversial, why not start deprecating the UI stack and moving it into a separate project like JFX?

But what do you think? Do you have a use for onjcmd and will miss it? Whatever your opinion is, I hope you liked my article. See you in my next article.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thank you to Christoph Langer and Chris Plummer for the help with the CSR and the PR. The article first appeared in October 2024 on my personal blog.

P.S.: Stuart Marks, aka Dr. Deprecator, likes the removal of unused features. I managed to meet him at Devoxx Belgium this week:


Who instruments the instrumenters?

Johannes Bechberger — Thu, 30 May 2024 16:25:45 +0000

Table of Contents

Meta-Agent
Spring and Mockito
Conclusion

Fixing bugs in Spring Boot and Mockito by instrumenting them

Have you ever wondered how libraries like Spring and Mockito modify your code at run-time to implement all their advanced features?

Wouldn't it be cool to get a peek behind the curtains?

This is the premise of my meta-agent, a Java agent to instrument instrumenters, to get these insights and what this blog post is about.

This article is a collaboration with Mikaël Francoeur, who had the idea for the meta-agent and wrote most of this post. So it's my first-ever collaborative post. I'll start with a short introduction to the agent itself before Mikaël takes over with real-world examples.

Meta-Agent

The meta-agent (GitHub) is a Java agent that instruments the Instrumentation.addTransformer methods agents use to add bytecode transformers and wrap the added transformers to capture bytecode before and after each transformation.

This allows the agent to capture what every instrumenting agent does at run-time. I covered the basics of writing your own instrumenting agent before in my blog post Instrumenting Java Code to Find and Handle Unused Classes and my related talk. So, I'll skip all the implementation details here.
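Without going into how the meta-agent really does it, the core idea of wrapping a transformer can be sketched as follows. Everything here is an illustrative assumption and not the meta-agent's code: RecordingTransformer and Change are made-up names, and only the five-argument transform method of ClassFileTransformer is wrapped, so a delegate that overrides the module-aware variant would need extra handling.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

public class RecordingTransformer implements ClassFileTransformer {
    /** One observed transformation: class name plus bytecode before and after. */
    public record Change(String className, byte[] before, byte[] after) {}

    private final ClassFileTransformer delegate; // the instrumenter's own transformer
    private final List<Change> changes = new ArrayList<>();

    public RecordingTransformer(ClassFileTransformer delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) throws IllegalClassFormatException {
        byte[] before = classfileBuffer.clone(); // copy, in case the delegate mutates the buffer
        byte[] after = delegate.transform(loader, className, classBeingRedefined,
                protectionDomain, classfileBuffer);
        if (after != null) { // null means the delegate left the class unchanged
            synchronized (changes) {
                changes.add(new Change(className, before, after.clone()));
            }
        }
        return after;
    }

    /** Snapshot of everything recorded so far. */
    public List<Change> changes() {
        synchronized (changes) {
            return List.copyOf(changes);
        }
    }
}
```

An agent following this approach would pass such a wrapper on to the original Instrumentation.addTransformer call instead of the transformer it received, and could then diff the before and after byte arrays of every recorded change.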

But how can you use it? You first have to download the agent (or build it from scratch via mvn package -DskipTests), then you can just attach it to your JVM at the start:

java -javaagent:target/meta-agent.jar -jar your-program.jar

This will then create a web server at http://localhost:7071 that allows you to inspect the bytecode modifications of each instrumenter dynamically. For the example from the README, http://localhost:7071/full-diff/class?pattern=java.lang.Iterable shows you how Mockito modifies the Iterable class upon mocking:

This uses Vineflower to decompile the bytecode, but you can, of course, also view the raw bytecode diff, as Vineflower might not always produce correct code. Just add &mode=javap to the previous URL:

Another nice feature is that the agent allows you to inspect almost all classes, even if they haven't been modified. This enables you to gain insights into code for which you don't have the source code.

Now I'm handing it over to Mikaël, who actually fixed bugs using my tool:

Spring and Mockito

Two of the tools I work the most with are Spring and Mockito, and both make liberal use of bytecode generation and modification.

Spring instruments your application classes using proxies generated with CGLIB or JDK's Proxy class, and Mockito uses the Bytebuddy library to modify classes at run-time. This lets these tools extend Java code in ways that would not otherwise be possible. In this next section, I'll discuss how we can use the meta-agent to gain more observability over these processes.

Run-time Proxies in Spring

The first example that comes to mind is proxying classes, a technique frequently used to add arbitrary behavior to an existing object and used extensively by Spring. The JDK contains java.lang.reflect.Proxy, a class that lets you implement arbitrary interfaces to generate objects at run-time. For example, here is how you would implement a JDK proxy:

interface Door {
  @Secured
  void open();
  boolean isOpen();
}

Door makeSecure(Door door) {
  return (Door) Proxy.newProxyInstance(
    getClass().getClassLoader(), // classloader
    new Class[] { Door.class }, // interfaces
    (Object proxy, Method method, Object[] args) -> {
      if (method.getAnnotation(Secured.class) != null 
          && !userIsAuthorized()) {
        throw new RuntimeException(
          "user is unauthorized to access method %s"
          .formatted(method.getName())
        );
      }
      return method.invoke(door, args);
    }
  );
}

@Test
void testSecured() {
  Door securedDoor = makeSecure(new SimpleDoor());
  setUserIsAuthorized(false);
  assertThatException().isThrownBy(securedDoor::open);
}

These few lines of code are enough to implement an annotation that can be reused on arbitrary interfaces. In fact, a lot of what Spring does can be summed up by this code snippet.

But alas, the JDK only supports this for interfaces, not for classes, so you wouldn't be able to use this if Door, or any of the types annotated with @Secured, was not an interface, even though the code would still make perfect sense from a developer's point of view. CGLIB addresses this gap.

CGLIB is a bytecode generation library that is now abandoned but forked, repackaged, and used extensively by Spring (1). It has an API that emulates java.lang.reflect.Proxy, but that also works with classes. This lets us generify our little security framework to also instrument classes:

<T> T makeSecure(T t) {
  return (T) Enhancer.create(
    t.getClass(), // we can now use classes!
    (MethodInterceptor) (o, method, objects, methodProxy) -> {
      // same intercepting behaviour
    }
  );
}

And everything you pass into makeSecure() will now have its @Secured methods intercepted. This is a powerful mechanism, but have you ever wondered what happens when you invoke Proxy.newProxyInstance, or Enhancer.create? What kind of trickery takes place? Well, meta-agent lets us see exactly what is happening. Here is part of the decompiled bytecode from the $Proxy8 class that was dynamically created by the Proxy in the first example above:

final class $Proxy8 extends Proxy implements Door {
  private static final Method m5;

  public final void open() {
    try {
      super.h.invoke(this, m5, null);
    } catch (RuntimeException | Error var2) {
      throw var2;
    } catch (Throwable var3) {
      throw new UndeclaredThrowableException(var3);
    }
  }

  static {
    ClassLoader var0 = $Proxy8.class.getClassLoader();
    m5 = Class.forName("me.bechberger.meta.MockitoTest$Door", 
                       false, var0).getMethod("open");
  }
}

This is something you could very well write in any Java program. In fact, it's a pretty standard implementation of the decorator design pattern. In a static initializer, the relevant methods are first cached in static fields so that getMethod(), a non-trivial operation, only has to happen once. Then, every method is just a simple delegation to the InvocationHandler you provided, surrounded by some error handling. The only peculiar thing about this code is that it was generated at run-time directly in bytecode.

When I first read this, the first thing that struck me was that Proxy wraps undeclared checked exceptions (2) into an unchecked UndeclaredThrowableException. This is a documented, albeit little-known, fact and can lead to surprising behavior when using proxy-based frameworks that rely on exception types. More on this later.
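
The wrapping is easy to reproduce with the JDK alone. The following sketch uses a made-up Door interface and a handler that throws an IOException, a checked exception that open() does not declare; the class and method names are only for illustration.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class UndeclaredThrowableDemo {

    interface Door {
        void open(); // declares no checked exceptions
    }

    /** Calls open() on a proxy whose handler throws a checked exception. */
    static Throwable callThroughProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            throw new IOException("lock is jammed");
        };
        Door door = (Door) Proxy.newProxyInstance(
            Door.class.getClassLoader(),
            new Class<?>[] { Door.class },
            handler);
        try {
            door.open();
            return null; // not reached, the handler always throws
        } catch (RuntimeException e) {
            return e; // what the caller of the proxy actually sees
        }
    }

    public static void main(String[] args) {
        Throwable seen = callThroughProxy();
        System.out.println(seen.getClass().getName());
        System.out.println(seen.getCause());
    }
}
```

The caller never sees the IOException directly, only as the cause of the UndeclaredThrowableException, which is why code that catches specific exception types can behave differently behind a proxy.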

Now let's look at what CGLIB does. CGLIB generates a lot of code. The proxy for the SimpleDoor class that implements our Door interface is over 300 lines long. Here is just the isOpen() method. I've renamed the variables for better readability.

public class SimpleDoor$$EnhancerByCGLIB$$b71b2e45 {
  public final boolean isOpen() {
    MethodInterceptor interceptor = this.isOpenCallback;
    if (this.isOpenCallback == null) {
      initCallbacks(this);
      interceptor = this.isOpenCallback;
    }

    if (interceptor != null) {
      Object returnValue = interceptor.intercept(
        this, isOpenMethod, zeroLengthArray, isOpenProxyMethod
      );
      return returnValue == null ? false : (Boolean) returnValue;
    } else {
      return super.isOpen();
    }
  }
}

There are two surprising things here. First, this check for isOpenCallback == null seems redundant. It turns out that CGLIB allows setting or removing callbacks after the proxy is initialized, so proxies have to do this sanity check for every method call.

This doesn't seem to be used much by the Spring Framework, so there might be room for optimization by making CGLIB proxies immutable. Second, the returnValue == null ? false means that if a user-provided callback returns null, CGLIB returns false, instead of throwing a NullPointerException like coercing null to a primitive would typically do. This mostly undocumented behavior happens for all primitives, and when I learned about it, I dug through the Spring Framework and was able to identify and fix one bug.
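
To see why the null handling is surprising, compare plain Java unboxing with the pattern from the decompiled proxy. This small sketch needs no CGLIB at all; both helper methods are hypothetical and only mirror the two behaviours.

```java
final class NullToPrimitiveDemo {

    /** Plain Java: unboxing a null Boolean throws a NullPointerException. */
    static boolean plainUnboxing(Object returnValue) {
        return (Boolean) returnValue;
    }

    /** Pattern from the decompiled proxy above: null silently becomes false. */
    static boolean cglibStyle(Object returnValue) {
        return returnValue == null ? false : (Boolean) returnValue;
    }

    public static void main(String[] args) {
        System.out.println("cglibStyle(null) = " + cglibStyle(null));
        try {
            plainUnboxing(null);
        } catch (NullPointerException e) {
            System.out.println("plainUnboxing(null) throws NullPointerException");
        }
    }
}
```

A callback that accidentally returns null is therefore silently turned into false (or the respective default for other primitives, according to the text above), which can hide bugs that plain Java would surface immediately.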

Remember what the JDK Proxy did with UndeclaredThrowableException? CGLIB proxies in Spring also used to do this, but that changed somewhere along the way. Again, when I learned about it, I opened up the Spring Framework and spotted a regression that affected transaction handling caused by this change. I opened a PR with the fix and it will be available in 6.2:

I found reading the code generated by the JDK Proxy and the CGLIB Enhancer classes to be fascinating because it's something I rely on every day as a Spring developer but never had the chance to see (3). Moreover, it allowed me to spot and contribute fixes for one bug and one regression in Spring.

Mockito

Another tool I use daily where bytecode generation is used extensively is Mockito, a mocking framework used for testing. The motivation behind mocks is to test collaborators.

For example, in a UserService that saves a User and then sends a notification based on the results from the database, how would you test that no notification is sent if saving the user fails? The lightest way to do it is using mocks, and anyone who has handwritten mocks before knows how tedious it can be (4). Mockito makes this easy:

@Test
void givenRepositoryThrowsException_whenSaveUser_thenDoesNotSendNotification() {
  UserRepository userRepository = mock(UserRepository.class);
  NotificationService notificationService = 
    mock(NotificationService.class);
  UserService userService = 
    new UserService(userRepository, notificationService);

  when(userRepository.save(any())).thenThrow(new RuntimeException("nope"));

  assertThatThrownBy(() -> userService.saveUser(new User("Mikaël")))
    .hasMessage("nope");

  verifyNoInteractions(notificationService);
}

Where Mockito shines is not only in its API (mock(), when().thenThrow(), any(), and verifyNoInteractions() in this example), but also its ability to mock even final or static methods. The proxying techniques we've seen so far used subclassing to generate new objects that implemented or extended existing interfaces or classes. But final and static methods can't be overridden. Mockito gets around this by registering a JVM agent and transforming the existing classes (5).

I originally planned on showing the decompiled code as an example. Unfortunately, the Vineflower decompiler doesn't output the most readable code; in this case, it gets confused and generates illegal Java.

I had to switch meta-agent to javap-verbose mode to confirm with the bytecode, and I fixed and edited the code for readability. Here is the UserRepository::save method from the example above, as transformed by Mockito:

class UserRepository {
  // omitting constructor

  User save(User user) {
    MockMethodDispatcher dispatcher =
      MockMethodDispatcher.get("VCcM9ivB", this);

    if (dispatcher != null && dispatcher.isMocked(this)) {
      Method method = UserRepository.class.getDeclaredMethod(
        "save", User.class
      );
      if (!dispatcher.isOverridden(method)) {
        Callable mockCall = dispatcher.handle(
          this, UserRepository.class.getDeclaredMethod(
            "save", User.class
          ), new Object[] { user }
        );
        return mockCall != null ? (User) mockCall.call() : user;
      }
    }

    return user;
  }
}

Notice that the name of the UserRepository class hasn't changed. Whereas Proxy and CGLIB generate new classes that implement or extend our existing types, Mockito transforms the existing class, and every method of the transformed class first checks with a static registry (MockMethodDispatcher) to see if the current object is a mock.

If it is, then it uses the mocked behavior; if not, it uses the object's natural behavior (in this case, just returning the user). Something that isn't so obvious unless you dig through Mockito's source code is that this transformation is done for every class up the chain of inheritance, up to Object. The meta-agent will readily show the complete list of transformed classes using the /classes endpoint.
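
If you want to know which classes are on that chain for a given type before looking at the /classes endpoint, a few lines of reflection are enough. This is only a sketch of walking the superclass chain; it does not call Mockito and says nothing about how Mockito treats interfaces.

```java
import java.util.ArrayList;
import java.util.List;

final class InheritanceChain {

    /** Returns the given class followed by all its superclasses, up to Object. */
    static List<Class<?>> superclassChain(Class<?> type) {
        List<Class<?>> chain = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            chain.add(c);
        }
        return chain;
    }

    public static void main(String[] args) {
        // ArrayList, AbstractList, AbstractCollection, Object
        superclassChain(ArrayList.class)
            .forEach(c -> System.out.println(c.getName()));
    }
}
```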

One potential inefficiency I would like to point out in the code above is that getDeclaredMethod(), a non-trivial operation, is actually called twice on every invocation of mocked methods. It can't be cached in a static field like in the Proxy or CGLIB classes, because that would require defining new fields, and most JVMs don't allow adding fields to existing classes (6).

Storing it in a local variable might result in a performance gain. Still, I wonder if some static cache, like what already exists in MockMethodDispatcher, would translate to faster tests when using Mockito. Finally, it's worth noting that the double invocation of getDeclaredMethod() is almost invisible in Mockito's source code but immediately apparent in the decompiled code.
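
As a rough illustration of what such a static cache could look like, here is a sketch that memoizes getDeclaredMethod() lookups in a ConcurrentHashMap. The MethodCache class is hypothetical and not part of Mockito; whether a cache like this would actually speed up tests is exactly the open question from above.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical lookup cache, living outside of the transformed classes. */
final class MethodCache {

    private record Key(Class<?> owner, String name, List<Class<?>> parameterTypes) {}

    private static final ConcurrentMap<Key, Method> CACHE = new ConcurrentHashMap<>();

    static Method lookup(Class<?> owner, String name, Class<?>... parameterTypes) {
        Key key = new Key(owner, name, List.of(parameterTypes));
        return CACHE.computeIfAbsent(key, k -> {
            try {
                // the expensive reflective lookup happens once per key
                return owner.getDeclaredMethod(name, parameterTypes);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(e);
            }
        });
    }

    public static void main(String[] args) {
        Method first = lookup(String.class, "length");
        Method second = lookup(String.class, "length");
        System.out.println(first.getName() + ", same instance: " + (first == second));
    }
}
```

Note that the map keeps strong references to the Class objects, so a production version would have to think about class unloading; the sketch ignores this.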

Verdict

This is decompiled code, and the Vineflower decompiler backing meta-agent struggles in some cases. For the Mockito mock and some CGLIB proxies that I tried, it generated illegal Java. There's a bit where it just gave up and left a code comment. It also doesn't generate @Override annotations consistently, which is weird.

Fortunately, you can also pass ?mode=javap or ?mode=javap-verbose to meta-agent, and it will show the actual bytecode and other low-level information like constant pools.

Conclusion

This was a fun project; there's probably more to uncover by analyzing generated code.

It would also be interesting to use the meta-agent to look at other generated bytecode from popular libraries like Hibernate, EclipseLink, AspectJ, or Lombok.

Or do you have other instrumenters that you want to explore with us?

This project shows how a question during a conference, "How can I inspect code at run-time?", turned into an interesting and eventually helpful project.

References

(1) CGLIB is now unmaintained, but Spring repackages a patched and updated version of CGLIB. The examples in this text are from that version.

(2) Undeclared checked exceptions can be thrown using Lombok's @SneakyThrows, or from a language without checked exceptions, such as Kotlin.

(3) CGLIB does have a DebuggingClassWriter that can be used to output generated classes before they are loaded, and to get the same behaviour for Proxy you can set the jdk.proxy.ProxyGenerator.saveGeneratedFiles system property to true, but it is nowhere near as convenient as using meta-agent.

(4) What Mockito calls "mocks" are called "test doubles" in the literature. See the related discussion.

(5) This is actually the behaviour of the "inline mock maker", the default mock maker as of version 5. Mockito also supports generating mocks with subclasses, similar to how the Proxy and Enhancer classes work.

(6) Rafael Winterhalter's answer to "How to add a field to an existing instance with ByteBuddy?" https://stackoverflow.com/a/58529716/7096763.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. This article first appeared on my personal blog mostlynerdless.de.

The post Who instruments the instrumenters? appeared first on foojay.

Hello eBPF: XDP-based Packet Filter (9)

Johannes Bechberger — Mon, 13 May 2024 07:13:38 +0000

Table of Contents

Network Packet
eXpress Data Path (XDP)
Writing a Packet Filter
Conclusion

Welcome back to my series on eBPF. In the last blog post, we learned how annotation processors can generate C code, simplifying writing eBPF applications.

This week, we'll use this work together with new support for XDP to create a simple packet blocker for eBPF (GitHub):

./run_bpf.sh XDPPacketFilter twitter.com

This blocks all incoming IPv4 packets from twitter.com. We'll see how it works in this blog post. First, we start with some background on networking and explain what XDP is.

Network Packet

All networking is packet-based, with multiple layers of protocol from shared medium (e.g., Ethernet) to application level (e.g., HTTP):

Ethernet is the lowest-level protocol, with all packets coming to and from network interfaces being Ethernet packets. The Ethernet header contains the "physical" MAC addresses of both the source and the destination of the packet, combined with the protocol number of the next-level protocol. We can represent it in C as follows:

struct ethhdr {
    unsigned char h_dest[6];
    unsigned char h_source[6];
    __be16 h_proto;
};
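
To get a feeling for this layout, the same three fields can be read from a raw frame in plain Java. This is a user-space sketch only; the EthernetHeader class and its sample frame are made up for illustration and are not part of hello-ebpf.

```java
import java.nio.ByteBuffer;
import java.util.HexFormat;

final class EthernetHeader {

    static final int LENGTH = 14; // 6 + 6 + 2 bytes, as in struct ethhdr

    record Parsed(String destination, String source, int etherType) {}

    static Parsed parse(byte[] frame) {
        if (frame.length < LENGTH) {
            throw new IllegalArgumentException("incomplete Ethernet header");
        }
        // ByteBuffer is big-endian by default, which is network byte order
        ByteBuffer buf = ByteBuffer.wrap(frame);
        byte[] dest = new byte[6];
        byte[] src = new byte[6];
        buf.get(dest);   // h_dest
        buf.get(src);    // h_source
        int etherType = Short.toUnsignedInt(buf.getShort()); // h_proto
        HexFormat mac = HexFormat.ofDelimiter(":");
        return new Parsed(mac.formatHex(dest), mac.formatHex(src), etherType);
    }

    /** Broadcast frame header announcing an IPv4 payload (made-up sample). */
    static byte[] sampleFrame() {
        return HexFormat.of().parseHex("ffffffffffff" + "020000000001" + "0800");
    }

    public static void main(String[] args) {
        Parsed p = parse(sampleFrame());
        System.out.printf("%s <- %s, type 0x%04x%n",
            p.destination(), p.source(), p.etherType());
    }
}
```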

Today, Ethernet is routed on switch level, but it was initially used to communicate between devices that shared the same medium, typically a cable.

Above the Ethernet protocol sit multiple protocols, but we're focusing here on the Internet Protocol (IP) with protocol type 0x0800. The IP protocol comes in two common variants, IPv4 and IPv6, and is used to communicate between devices on the whole internet. Although IPv6 has many advantages, IPv4 is still commonly used, and we're focusing on this variant in the following section to keep it simple. IP datagrams are typically fragmented into multiple smaller IP packets. An IPv4 header consists of the following parts:

struct iphdr {
    __u8 ihl: 4; // header length in 32-bit words
    __u8 version: 4; // 4 (IPv4), 6 (IPv6)
    __u8 tos; // "priority" of the packet
    __be16 tot_len; // size of packet (header + data) in bytes
    __be16 id; // id of the datagram that this fragment belongs to
    // offset of this packet fragment in the unfragmented datagram
    __be16 frag_off;
    // hop count, routers decrement it and drop the packet if ttl = 0
    __u8 ttl;
    // next level protocol
    __u8 protocol;
    // checksum, so that the one's-complement sum of all
    // 16-bit header words is 0xFFFF
    __sum16 check;
    // source and destination address
    // (might be changed in transit 
    //  due to network address translation)
    struct {
        __be32 saddr;
        __be32 daddr;
    } addrs;
};

This omits the last officially specified field, the options field, but it is, according to Wikipedia, usually not used:

The options field is not often used. Packets containing some options may be considered as dangerous by some routers and be blocked.
wikipedia
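
As a worked example for the header fields, the following user-space sketch parses a 20-byte IPv4 header and checks its checksum: the one's-complement sum of all 16-bit header words, including the checksum field itself, has to be 0xFFFF. The sample bytes describe a made-up UDP packet from 192.168.0.1 to 192.168.0.199, and the Ipv4Header class is purely illustrative.

```java
import java.nio.ByteBuffer;
import java.util.HexFormat;

final class Ipv4Header {

    record Parsed(int version, int ihl, int totalLength, int ttl,
                  int protocol, String source, String destination) {}

    /** Sample header without options: 20 bytes, checksum 0xb861. */
    static final byte[] SAMPLE = HexFormat.of().parseHex(
        "45000073" + "00004000" + "4011b861" + "c0a80001" + "c0a800c7");

    static Parsed parse(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian = network order
        int first = Byte.toUnsignedInt(buf.get(0));
        return new Parsed(
            first >> 4,                            // version (high nibble)
            first & 0x0F,                          // ihl, in 32-bit words
            Short.toUnsignedInt(buf.getShort(2)),  // tot_len
            Byte.toUnsignedInt(buf.get(8)),        // ttl
            Byte.toUnsignedInt(buf.get(9)),        // protocol
            dotted(buf.getInt(12)),                // saddr
            dotted(buf.getInt(16)));               // daddr
    }

    /** One's-complement sum over the first headerBytes bytes (must be even). */
    static int onesComplementSum(byte[] header, int headerBytes) {
        int sum = 0;
        for (int i = 0; i < headerBytes; i += 2) {
            sum += ((header[i] & 0xFF) << 8) | (header[i + 1] & 0xFF);
        }
        while ((sum >>> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >>> 16); // fold the carries back in
        }
        return sum;
    }

    static String dotted(int addr) {
        return ((addr >>> 24) & 0xFF) + "." + ((addr >>> 16) & 0xFF) + "."
            + ((addr >>> 8) & 0xFF) + "." + (addr & 0xFF);
    }

    public static void main(String[] args) {
        Parsed p = parse(SAMPLE);
        System.out.println(p);
        System.out.printf("checksum ok: %b%n",
            onesComplementSum(SAMPLE, p.ihl() * 4) == 0xFFFF);
    }
}
```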

Above the IP protocol is the TCP protocol, which essentially adds ports and acknowledged packet delivery, and on the web, the topmost layer is usually HTTP, which adds URL paths and more.

eXpress Data Path (XDP)

XDP is one of the most essential parts of the eBPF kernel land. It allows users to write firewalls, load balancers, and more, such as the packet filter of this blog post. To quote Jonathan Corbet:

The core idea behind the XDP initiative is to get the network stack out of the way as much as possible. While the network stack is highly flexible, XDP is built around a bare-bones packet transport that is as fast as it can be. When a decision needs to be made or a packet must be modified, XDP will provide a hook for a user-supplied BPF program to do the work. The result combines minimal overhead with a great deal of flexibility, at the cost of a little "some assembly required" label on the relevant man pages.
Accelerating networking with AF_XDP

The eBPF hooks attached to a specific network interface can inspect and modify the incoming packets, let them pass, drop them, or send them back. A basic eBPF program that drops all packets looks, for example, like the following:

SEC("xdp")
int xdp_drop(struct xdp_md *ctx) {
    return XDP_DROP;
}

But please don't attach this program, as it would also drop Address Resolution Protocol (ARP) packets, which other members of your local Ethernet network use to map IP addresses to MAC addresses. Dropping all ARP packets can effectively disconnect your machine from the local network.

The passed xdp_md object contains the packet content and some metadata:

struct xdp_md {
    __u32 data;
    __u32 data_end;
    __u32 data_meta;
    __u32 ingress_ifindex;
    __u32 rx_queue_index;
    __u32 egress_ifindex;
};

We're just focusing on the content that can be found between data and data_end. In fact, the header data structures I showed you in the previous section are precisely the structures that describe the content.

Armed with this knowledge, we can now create a packet filter:

Writing a Packet Filter

The basic structure of our packet filter application consists of a Java part that handles the configuration and logging and an eBPF part that uses an XDP hook that is called for every received packet. As explained above, the XDP hook decides what to do with every packet. So the structures are as follows:

We start with the definition of the eBPF maps that hold the statistics (blocked packets per IP address) and the configuration (the blocked IP addresses):

@BPFMapDefinition(maxEntries = 256 * 4096)
BPFHashMap blockedIPs;

@BPFMapDefinition(maxEntries = 256 * 4096)
BPFHashMap blockingStats;

Now we move on to the eBPF program that checks for the IPv4 addresses and drops the packet if the address is in the blockedIPs map (based on the program from a blog post of sematext):

#include 
#include 
#include 

// protocol numbers 
// copied from the linux kernel
#define ETH_P_8021Q 0x8100
#define ETH_P_8021AD 0x88A8
#define ETH_P_IP 0x0800
#define ETH_P_IPV6 0x86DD
#define ETH_P_ARP 0x0806

SEC("xdp")
int xdp_pass(struct xdp_md *ctx) {
    // start and end of the packet data
    void *end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;
    u32 ip_src;
    u64 offset;
    u16 eth_type;

    struct ethhdr *eth = data;
    offset = sizeof(*eth);

    if (data + offset > end) {
        // Ethernet packet header is incomplete
        return XDP_ABORTED;
    }
    eth_type = eth->h_proto;

    /* handle VLAN tagged packet */
    // we use bpf_htons to convert the constants
    // from host to network byte order for the check
    if (eth_type == bpf_htons(ETH_P_8021Q) || 
          eth_type == bpf_htons(ETH_P_8021AD)) {
        struct vlan_hdr *vlan_hdr;

        vlan_hdr = (void *)eth + offset;
        offset += sizeof(*vlan_hdr);
        if ((void *)eth + offset > end) {
            // VLAN header is incomplete
            return XDP_ABORTED;
        }
        eth_type = vlan_hdr->h_vlan_encapsulated_proto;
    }

    /* let's only handle IPv4 addresses */
    if (eth_type != bpf_htons(ETH_P_IP)) {
        return XDP_PASS;
    }
    
    // get the IPv4 header
    struct iphdr *iph = data + offset;
    offset += sizeof(struct iphdr);
    
    // make sure the bytes you want to read are 
    // within the packet's range before reading them
    if ((void *)(iph + 1) > end) {
        return XDP_ABORTED;
    }
    ip_src = iph->saddr;

    // find entry in block list
    void* ret = (void*)bpf_map_lookup_elem(&blockedIPs, &ip_src);
    if (!ret) {
        // IP not in blocked list
        return XDP_PASS;
    }

    // count the number of blocked packets per IP address
    s32* counter = bpf_map_lookup_elem(&blockingStats, &ip_src);
    if (counter) {
        // use atomics to prevent a race condition when a packet
        // from the same IP address is received on two
        // different cores at the same time
        // (thanks Dylan Reimerink for catching this bug)
        __sync_fetch_and_add(counter, 1);
    } else {
        s32 value = 1;
        bpf_map_update_elem(&blockingStats, &ip_src, 
                            &value, BPF_ANY);
    }

    return XDP_DROP;
}
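
The control flow of the program above is easier to follow when you can step through it in a debugger. The following Java sketch mirrors the same decisions in user space on a raw frame given as a byte array. It is only a simulation for illustration: the class, the Action enum, and the ipv4Frame helper are made up, and the block list is a plain Set instead of an eBPF map.

```java
import java.nio.ByteBuffer;
import java.util.Set;

final class PacketFilterSimulation {

    enum Action { ABORTED, PASS, DROP }

    static final int ETH_HEADER = 14, VLAN_HEADER = 4, IPV4_MIN_HEADER = 20;
    static final int ETH_P_IP = 0x0800;
    static final int ETH_P_8021Q = 0x8100, ETH_P_8021AD = 0x88A8;

    static Action decide(byte[] frame, Set<Integer> blockedSources) {
        ByteBuffer buf = ByteBuffer.wrap(frame); // big-endian = network order
        int offset = ETH_HEADER;
        if (offset > frame.length) {
            return Action.ABORTED; // Ethernet header is incomplete
        }
        int ethType = Short.toUnsignedInt(buf.getShort(12));
        if (ethType == ETH_P_8021Q || ethType == ETH_P_8021AD) {
            offset += VLAN_HEADER;
            if (offset > frame.length) {
                return Action.ABORTED; // VLAN header is incomplete
            }
            // the encapsulated protocol is the last field of the VLAN header
            ethType = Short.toUnsignedInt(buf.getShort(offset - 2));
        }
        if (ethType != ETH_P_IP) {
            return Action.PASS; // e.g. ARP or IPv6
        }
        if (offset + IPV4_MIN_HEADER > frame.length) {
            return Action.ABORTED; // IPv4 header is incomplete
        }
        int source = buf.getInt(offset + 12); // saddr
        return blockedSources.contains(source) ? Action.DROP : Action.PASS;
    }

    /** Builds a minimal Ethernet + IPv4 frame with the given source address. */
    static byte[] ipv4Frame(int source) {
        ByteBuffer buf = ByteBuffer.allocate(ETH_HEADER + IPV4_MIN_HEADER);
        buf.putShort(12, (short) ETH_P_IP);
        buf.put(ETH_HEADER, (byte) 0x45); // version 4, ihl 5
        buf.putInt(ETH_HEADER + 12, source);
        return buf.array();
    }

    public static void main(String[] args) {
        int blocked = 0xC0A80001; // 192.168.0.1
        System.out.println(decide(ipv4Frame(blocked), Set.of(blocked)));
        System.out.println(decide(ipv4Frame(0x0A000001), Set.of(blocked)));
    }
}
```

Unlike the eBPF program, the simulation compares addresses in big-endian order and does not count blocked packets; it only shows the parsing and bounds checks.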

Now we use it with some picocli-based command line handling to build our application:

@BPF(license = "GPL")
@Command(name = "XDPPacketFilter", 
         mixinStandardHelpOptions = true, 
         description = "Use XDP to block " +
                       "incoming IPv4 packets from URLs")
public abstract class XDPPacketFilter 
  extends BPFProgram implements Runnable {

    // maps, ...

    private static final String EBPF_PROGRAM = """
            // ...
            """;

    @Parameters(arity = "1..*", description = "URLs to block")
    private String[] blockedUrls;

    @Option(names = "--run-url-retrieve-loop", 
        description = "Try to retrieve the content " + 
                      "of the first URL in a loop")
    private boolean runURLRetrieveLoop;
   
    private Map<Integer, String> ipToUrlMap;
    
    void setupBlockedIPMap() {
        ipToUrlMap = Arrays.stream(blockedUrls).flatMap(url -> {
            try {
                // Resolve the URL to the related IP addresses
                return Arrays.stream(
                    InetAddress.getAllByName(url))
                               .map(addr -> 
                       // convert the IP address to numbers
                       Map.entry(XDPUtil.ipAddressToInt(addr), url));
            } catch (UnknownHostException e) {
                throw new RuntimeException(e);
            }
        }).collect(Collectors.toMap(
             Map.Entry::getKey, 
             Map.Entry::getValue));
        ipToUrlMap.keySet().forEach(ip -> {
            // put the IP addresses in the map
            blockedIPs.put(ip, true);
        });
    }
    
    // print the content of blockingStats
    void printBlockedLog() {
        out.println("Blocked packages:");
        blockingStats.forEach((ip, count) -> {
            out.println("  Blocked " + count + " packages from " +
                    XDPUtil.intToIpAddress(ip) +
                    " (" + ipToUrlMap.get(ip) + ")");
        });
    }

    @Override
    public void run() {
        setupBlockedIPMap();
        if (runURLRetrieveLoop) {
            XDPUtil.openURLInLoop(blockedUrls[0]);
        }
        xdpAttach(getProgramByName("xdp_pass"), 
            XDPUtil.getNetworkInterfaceIndex());
        // print the blocking statistics every second
        while (true) {
            printBlockedLog();
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        try (XDPPacketFilter program = 
              BPFProgram.load(XDPPacketFilter.class)) {
            var cmd = new CommandLine(program);
            cmd.parseArgs(args);
            if (cmd.isUsageHelpRequested()) {
                cmd.usage(out);
                return;
            }
            program.run();
        }
    }
}
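
The application relies on XDPUtil.ipAddressToInt and XDPUtil.intToIpAddress, whose implementations are not shown here. As a sketch of what such a conversion can look like, assuming IPv4 addresses and big-endian packing (the real helper may well use a different byte order to match how the kernel reads saddr on the host), consider:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for the conversion helpers, IPv4 only. */
final class IpConversion {

    static int ipAddressToInt(InetAddress address) {
        byte[] bytes = address.getAddress();
        if (bytes.length != 4) {
            throw new IllegalArgumentException("only IPv4 is supported");
        }
        return ByteBuffer.wrap(bytes).getInt(); // big-endian packing (assumption)
    }

    static InetAddress intToIpAddress(int value) {
        try {
            return InetAddress.getByAddress(
                ByteBuffer.allocate(4).putInt(value).array());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Parses an IP literal such as "192.168.0.1" (no DNS lookup needed). */
    static InetAddress fromLiteral(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        int asInt = ipAddressToInt(fromLiteral("192.168.0.1"));
        System.out.printf("0x%08x -> %s%n",
            asInt, intToIpAddress(asInt).getHostAddress());
    }
}
```

InetAddress.getAllByName can also return IPv6 addresses; the sketch simply rejects them, as the filter in this post only handles IPv4.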

This is all we need; now we can use it:

# block twitter.com and log in the background
> ./run_bpf.sh XDPPacketFilter twitter.com > log.txt &
# try to access twitter.com with a timeout of 5 seconds
> wget twitter.com --timeout 5
URL transformed to HTTPS due to an HSTS policy
--2024-04-22 13:28:29--  https://twitter.com/
Resolving twitter.com (twitter.com)... 104.244.42.65
Connecting to twitter.com (twitter.com)|104.244.42.65|:443... failed: Connection timed out.
Retrying.

--2024-04-22 13:28:36--  (try: 2)  https://twitter.com/
Connecting to twitter.com (twitter.com)|104.244.42.65|:443... failed: Connection timed out.
Retrying.

--2024-04-22 13:28:43--  (try: 3)  https://twitter.com/
Connecting to twitter.com (twitter.com)|104.244.42.65|:443... failed: Connection timed out.
Retrying.

# and so on

So we can't access twitter.com anymore till we stop our application.

Conclusion

Using XDP and eBPF, we can create a partial packet filter that is easily extended into a firewall and blocks incoming packets. The filtering overhead is low, as the packets are processed directly in the kernel. Using hello-ebpf, we can wrap the filter program in a neat command-line application.

But this is only the starting point; we can use a few more XDP features to create a fast load-balancer and add support for cgroups to create a proper firewall, filtering outgoing packets too. The following two blog posts will be about this, so see you in two weeks' time.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. This article first appeared on my personal blog mostlynerdless.de.

The post Hello eBPF: XDP-based Packet Filter (9) appeared first on foojay.

Hello eBPF: Generating C Code (8)

Johannes Bechberger — Wed, 24 Apr 2024 13:11:40 +0000

Table of Contents

Generating Struct Definitions
Generating Map Definitions
Ring Buffer Sample Program
Conclusion
Addendum

Welcome back to my series on eBPF. In the last article, we learned how to auto-layout struct members and auto-generate BPFStructTypes for annotated Java records. We're going to extend this work today.

This is a rather short article, but the implementation and fixing all the bugs took far more time than expected.

Generating Struct Definitions

We saw in the last article how powerful Java annotation processing is for generating Java code; this week, we'll tackle the generation of C code: In the previous article, we still had to write the C struct and map definitions ourselves, but writing

struct event {
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};

when we already specified the data type properly in Java

record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

seems to be a great place to improve our annotation processor. There are only two problems:

The annotation processor needs to know about BPFTypes, so we have to move them in there. But the BPFTypes use the Panama API which requires the --enable-preview flag in JDK 21, making it unusable in Java 21. So we have to move the whole library over to JDK 22, as this version includes Panama.
There is no C code generation library like JavaPoet for generating Java code.

Regarding the first problem: Moving to JDK 22 is quite easy; the only changes I had to make are listed in this gist. The only major problem was getting the Lima VM to use a current JDK 22. In the end, I resorted to just using sdkman; you can take a look at the install.sh script to see how I did it.

Regarding the second problem: We can reduce the problem of generating C code into two steps:

Create an Abstract Syntax Tree (AST) for C
Create a pretty printer for this AST

To create an AST I resorted to an ANSI C grammar for inspiration. Each AST node implements the following interface:

public interface CAST {

    List children();

    Statement toStatement();

    /** Generate pretty printed code */
    default String toPrettyString() {
        return toPrettyString("", "  ");
    }

    String toPrettyString(String indent, String increment);
}

We can then create a hierarchy of extending interfaces (PrimaryExpression, ...) and implementing records (ConstantExpression, ...). You can find the whole C AST on GitHub.
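
To illustrate the two steps on a very small scale, here is a sketch of an AST with just two node types and a pretty printer that reproduces the event struct from above. The Node, Member, and Struct types are simplified stand-ins invented for this example; the real C AST in hello-ebpf is much more complete.

```java
import java.util.List;
import java.util.stream.Collectors;

final class TinyCAst {

    /** Simplified stand-in for the CAST interface shown above. */
    interface Node {
        String toPrettyString(String indent, String increment);

        default String toPrettyString() {
            return toPrettyString("", "  ");
        }
    }

    /** A member such as "u32 e_pid;" or "char e_comm[TASK_COMM_LEN];". */
    record Member(String type, String name, String arrayLength) implements Node {
        @Override
        public String toPrettyString(String indent, String increment) {
            String suffix = arrayLength == null ? "" : "[" + arrayLength + "]";
            return indent + type + " " + name + suffix + ";";
        }
    }

    record Struct(String name, List<Member> members) implements Node {
        @Override
        public String toPrettyString(String indent, String increment) {
            // members are printed one indentation level deeper
            String body = members.stream()
                .map(m -> m.toPrettyString(indent + increment, increment))
                .collect(Collectors.joining("\n"));
            return indent + "struct " + name + " {\n"
                + body + "\n" + indent + "};";
        }
    }

    static Struct eventStruct() {
        return new Struct("event", List.of(
            new Member("u32", "e_pid", null),
            new Member("char", "e_filename", "FILE_NAME_LEN"),
            new Member("char", "e_comm", "TASK_COMM_LEN")));
    }

    public static void main(String[] args) {
        System.out.println(eventStruct().toPrettyString());
    }
}
```

Keeping the indentation as parameters of toPrettyString means nested nodes only have to pass indent + increment to their children.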

This leads us to an annotation processor that can automatically insert struct definitions into the C code of our eBPF program, reducing the number of hard-to-debug errors, as it is guaranteed that both the Java specification and the C representation of every type are compatible.
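
The mapping from a record to a struct can be sketched without writing a full annotation processor. The following example does it at run-time with reflection instead of at compile-time, which is a deliberate simplification. The @Size and @Unsigned annotations are local stand-ins defined in the sketch itself, and the supported types (int and String only) as well as the member names are assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;

final class RecordToStruct {

    // local stand-ins for the annotations used in the article
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface Size { int value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface Unsigned {}

    record Event(@Unsigned int pid,
                 @Size(256) String filename,
                 @Size(16) String comm) {}

    static String toCStruct(String structName, Class<? extends Record> type) {
        StringBuilder sb = new StringBuilder("struct " + structName + " {\n");
        // record components are returned in declaration order
        for (RecordComponent c : type.getRecordComponents()) {
            if (c.getType() == int.class) {
                String cType = c.isAnnotationPresent(Unsigned.class) ? "u32" : "s32";
                sb.append("  ").append(cType).append(' ')
                  .append(c.getName()).append(";\n");
            } else if (c.getType() == String.class) {
                Size size = c.getAnnotation(Size.class);
                if (size == null) {
                    throw new IllegalArgumentException(
                        "String member needs @Size: " + c.getName());
                }
                sb.append("  char ").append(c.getName())
                  .append('[').append(size.value()).append("];\n");
            } else {
                throw new IllegalArgumentException(
                    "unsupported type: " + c.getType());
            }
        }
        return sb.append("};").toString();
    }

    public static void main(String[] args) {
        System.out.println(toCStruct("event", Event.class));
    }
}
```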

But can we do more with annotation processing?

Generating Map Definitions

There is another definition that we can auto-generate: Map definitions like

 struct                                
 {                                     
   __uint (type, BPF_MAP_TYPE_RINGBUF);
   __uint (max_entries, 256 * 4096);   
 } rb SEC (".maps");

which define maps like hash maps and ring buffers that allow the communication between user- and kernel-space.

With a little bit of annotation processing, we can define the same ring buffer from above in Java:

@BPFMapDefinition(maxEntries = 256 * 4096)
BPFRingBuffer<Event> rb;

Our annotation-processor then turns this into the C definition from above and inserts code into the constructor of the Java program that properly initializes rb.

But how does the processor know what code it should generate? By parsing the BPFMapClass annotation on BPFRingBuffer (and any other class). This annotation contains the templates for both the C and the Java code:

@BPFMapClass(
        cTemplate = """
        struct {
            __uint (type, BPF_MAP_TYPE_RINGBUF);
            __uint (max_entries, $maxEntries);
        } $field SEC(".maps");
        """,
        javaTemplate = """
        new $class<>($fd, $b1)
        """)
public class BPFRingBuffer extends BPFMap {
}

Here $field is the Java field name, $maxEntries the value in the BPFMapDefinition annotation, and $class the name of the Java class. $cX, $bX, and $jX give the C type name, BPFType, and Java class name related to the X-th type parameter.
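
The substitution itself needs nothing more than a regular expression. Below is a minimal sketch of expanding the C template for the ring buffer; the MapTemplateDemo class and its expand method are hypothetical, and the actual processor may implement the replacement differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MapTemplateDemo {

    // matches placeholders such as $field, $maxEntries or $b1
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$([A-Za-z][A-Za-z0-9]*)");

    static final String C_TEMPLATE = """
        struct {
            __uint (type, BPF_MAP_TYPE_RINGBUF);
            __uint (max_entries, $maxEntries);
        } $field SEC(".maps");
        """;

    static String expand(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String value = values.get(match.group(1));
            // unknown placeholders are left untouched
            return Matcher.quoteReplacement(
                value != null ? value : match.group());
        });
    }

    static String ringBufferDefinition() {
        return expand(C_TEMPLATE,
            Map.of("maxEntries", "256 * 4096", "field", "rb"));
    }

    public static void main(String[] args) {
        System.out.print(ringBufferDefinition());
    }
}
```

Matching whole placeholder names avoids the problem of a naive String.replace, where a short name could accidentally replace the prefix of a longer one.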

Ring Buffer Sample Program

When we combine all this together we can have a much simpler ring buffer sample program (see TypeProcessingSample2 on GitHub):

@BPF(license = "GPL")
public abstract class TypeProcessingSample2 extends BPFProgram {

    private static final int FILE_NAME_LEN = 256;
    private static final int TASK_COMM_LEN = 16;

    @Type(name = "event")
    record Event(
      @Unsigned int pid, 
      @Size(FILE_NAME_LEN) String filename, 
      @Size(TASK_COMM_LEN) String comm) {}

    @BPFMapDefinition(maxEntries = 256 * 4096)
    BPFRingBuffer<Event> rb;

    static final String EBPF_PROGRAM = """
            #include "vmlinux.h"
            #include 
            #include 
            #include 
              
            // This is where the struct and map
            // definitions are inserted automatically          
                  
            SEC ("kprobe/do_sys_openat2")
            int kprobe__do_sys_openat2 (struct pt_regs *ctx)
            {
               // ... // as before
            }
            """;

    public static void main(String[] args) {
        try (TypeProcessingSample2 program = 
           BPFProgram.load(TypeProcessingSample2.class)) {
            program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));
            // we can use the rb ring buffer directly
            // but have to set the call back
            program.rb.setCallback((buffer, event) -> {
                System.out.printf(
                  "do_sys_openat2 called by:%s " + 
                  "file:%s pid:%d\n", 
                  event.comm(), event.filename(), 
                  event.pid());
            });
            while (true) {
                // consumes all registered ring buffers
                program.consumeAndThrow();
            }
        }
    }
}

There are two other things missing in the C code that are also auto-generated: constant-defining macros and the license definition. Macros are generated for all static final fields in the program class that are defined at compile time.
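
The macro generation can be approximated with a few lines of reflection. This sketch emits a #define for every static final int field of a class; it works at run-time on a made-up Sample class, whereas an annotation processor would read the constants at compile-time (for example via VariableElement.getConstantValue()), so treat it as an illustration of the idea only.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class MacroDemo {

    /** Made-up program class with two constants and one regular field. */
    static final class Sample {
        static final int FILE_NAME_LEN = 256;
        static final int TASK_COMM_LEN = 16;
        int notAConstant = 1;
    }

    static String defineMacros(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            boolean isConstant = Modifier.isStatic(modifiers)
                && Modifier.isFinal(modifiers)
                && field.getType() == int.class;
            if (!isConstant) {
                continue; // skip instance fields and non-int fields
            }
            try {
                field.setAccessible(true);
                sb.append("#define ").append(field.getName())
                  .append(' ').append(field.getInt(null)).append('\n');
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(defineMacros(Sample.class));
    }
}
```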

Conclusion

Using annotation processing allows us to reduce the amount of C code we have to write and reduces errors by generating all definitions from the Java code. This simplifies writing eBPF applications.

See you in two weeks when we tackle global variables, moving closer and closer to being able to write a small firewall with hello-ebpf's BPF support.

This will also be the topic of a talk that I submitted together with Mohammed Aboullaite to several conferences for autumn.

Addendum

The more I work on writing my own eBPF library, the more I value the effort that the developers of other libraries, like bcc or the Go and Rust eBPF libraries, put in to create usable libraries.

They do this despite the lack of proper documentation.

A simple example is the detaching of attached eBPF programs: There are multiple (undocumented) methods in libbpf that might be suitable: bpf_program__unload, bpf_link__detach, bpf_link__destroy, and bpf_prog_detach, but only bpf_link__destroy properly detached a program.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. This article first appeared on my personal blog mostlynerdless.de.

The post Hello eBPF: Generating C Code (8) appeared first on foojay.

Hello eBPF: Auto Layouting Structs (7)

Johannes Bechberger — Sun, 14 Apr 2024 16:19:50 +0000

Table of Contents

Struct Example
Alignment Rules
Struct Example with Padding
Auto-Layouting in hello-ebpf
Conclusion

Welcome back to my series on ebpf. In the last article, we learned how to use ring buffers with libbpf for efficient communication. This week, we're looking into the memory layout and alignment of structs transferred between the kernel and user-land.

Alignment is essential; it specifies how the compiler lays out structs and variables and where it puts the data in memory. Take, for example, the struct that we defined in the previous article in the RingSample:

#define FILE_NAME_LEN 256
#define TASK_COMM_LEN  16
                
// Structure to store the data that we want to pass to user
struct event {
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};

Struct Example

Using Pahole in the Compiler Explorer, we can see the memory layout on amd64:

struct event {
	unsigned int               e_pid;                /*     0     4 */
	char                       e_filename[256];      /*     4   256 */
	/* --- cacheline 4 boundary (256 bytes) was 4 bytes ago --- */
	char                       e_comm[16];           /*   260    16 */

	/* size: 276, cachelines: 5, members: 3 */
	/* last cacheline: 20 bytes */
};

This means that the compiler also knows how to transform member accesses to this struct and can adequately place the event in the allocated memory.

You've actually seen the layouting information before, as the hello-ebpf project requires you to lay out all structs by hand:

record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

// define the event records layout
private static final BPFStructType eventType =
        new BPFStructType<>("rb", List.of(
        new BPFStructMember<>("e_pid",
                BPFIntType.UINT32, 0, Event::pid),
        new BPFStructMember<>("e_filename",
                new StringType(FILE_NAME_LEN),
                4, Event::filename),
        new BPFStructMember<>("e_comm",
                new StringType(TASK_COMM_LEN),
                4 + FILE_NAME_LEN, Event::comm)
   ), new AnnotatedClass(Event.class, List.of()),
   fields -> new Event((int)fields.get(0),
       (String)fields.get(1), (String)fields.get(2)));

eBPF is agnostic regarding alignment, as the compiler on your system compiles the eBPF and the C code, so the compiler can decide how to align everything.
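To make the offsets from the layout above concrete, the following sketch decodes such an event from raw bytes using only java.nio. It is not hello-ebpf's API; the class EventDecoder and its methods are made up for illustration, and it assumes a little-endian machine (as on amd64 and typical ARM64 Linux systems) and NUL-terminated strings.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: decode struct event by hand using its offsets. */
final class EventDecoder {
    static final int FILE_NAME_LEN = 256;
    static final int TASK_COMM_LEN = 16;

    record Event(int pid, String filename, String comm) {}

    /** Reads a NUL-terminated C string of at most maxLen bytes at offset. */
    static String readCString(ByteBuffer buffer, int offset, int maxLen) {
        int len = 0;
        while (len < maxLen && buffer.get(offset + len) != 0) {
            len++;
        }
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buffer.get(offset + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Helper for the demo: writes a string as bytes starting at offset. */
    static void writeCString(ByteBuffer buffer, int offset, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(offset + i, bytes[i]);
        }
    }

    static Event decode(ByteBuffer raw) {
        ByteBuffer buffer = raw.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int pid = buffer.getInt(0);                           // e_pid at offset 0
        String filename = readCString(buffer, 4, FILE_NAME_LEN); // e_filename at 4
        String comm = readCString(buffer, 4 + FILE_NAME_LEN,  // e_comm at 260
                TASK_COMM_LEN);
        return new Event(pid, filename, comm);
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocate(4 + FILE_NAME_LEN + TASK_COMM_LEN)
                .order(ByteOrder.LITTLE_ENDIAN);
        raw.putInt(0, 4242);
        writeCString(raw, 4, "/tmp/test.txt");
        writeCString(raw, 4 + FILE_NAME_LEN, "java");
        System.out.println(decode(raw));
    }
}
```

If any offset were off by even a single byte, the decoded strings would be shifted or truncated, which is why the layout on the Java side has to match the compiler's layout exactly.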

Alignment Rules

But where do these alignment rules come from? They come from how your CPU works. Your CPU usually only allows, or is optimized for, certain types of accesses. So, for example, x86 CPUs are optimized for accessing 32-bit integers that lie at addresses in memory that are a multiple of four. The rules are defined in the Application Binary Interface (ABI). The alignment rules for x86 (64-bit) on Linux are specified in the System V ABI Specification.

The specification covers more types, but in general, scalar types are aligned by their size. Structs, unions, and arrays are, on the other hand, aligned based on their members:

Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object‘s alignment.

An array uses the same alignment as its elements, except that a local or global array variable of length at least 16 bytes or a C99 variable-length array variable always has alignment of at least 16 bytes.

Structure and union objects can require padding to meet size and alignment constraints. The contents of any padding is undefined.
System V Application Binary Interface
AMD64 Architecture Processor Supplement
Draft Version 0.99.6

ARM 64-bit has the same scalar alignments and struct alignment rules (see Procedure Call Standard for the Arm® 64-bit Architecture (AArch64)); we can therefore use the same layouting algorithm for both CPU architectures.

We can formulate the algorithm for structs as follows:

struct_alignment = 1
current_position = 0
for member in struct:
  # compute the position of the member
  # that is properly aligned
  # this introduces padding (empty space between members)
  # if there are alignment issues
  current_position = \
    math.ceil(current_position / member.alignment) * member.alignment
  member.position = current_position
  # the next position has to be after the current member
  current_position += member.size
  # the struct alignment is the maximum of all alignments
  struct_alignment = max(struct_alignment, member.alignment)
# the struct size is a multiple of the struct alignment,
# which can introduce padding after the last member
struct_size = \
  math.ceil(current_position / struct_alignment) * struct_alignment
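The algorithm above can be sketched in Java as follows. This is a minimal illustration, not the actual BPFStructType implementation; the names StructLayout, Member, Placed, and Layout are made up, and member sizes and alignments are passed in explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the struct layouting algorithm. */
final class StructLayout {
    record Member(String name, int size, int alignment) {}
    record Placed(String name, int offset, int size) {}
    record Layout(List<Placed> members, int size, int alignment) {}

    /** Rounds position up to the next multiple of alignment. */
    static int alignUp(int position, int alignment) {
        return (position + alignment - 1) / alignment * alignment;
    }

    static Layout layout(List<Member> members) {
        int structAlignment = 1;
        int position = 0;
        List<Placed> placed = new ArrayList<>();
        for (Member member : members) {
            // padding is introduced here if position is not aligned
            position = alignUp(position, member.alignment());
            placed.add(new Placed(member.name(), position, member.size()));
            position += member.size();
            structAlignment = Math.max(structAlignment, member.alignment());
        }
        // the total size is a multiple of the struct alignment
        int size = alignUp(position, structAlignment);
        return new Layout(placed, size, structAlignment);
    }

    public static void main(String[] args) {
        // struct event from above: u32, char[256], char[16]
        Layout event = layout(List.of(
                new Member("e_pid", 4, 4),
                new Member("e_filename", 256, 1),
                new Member("e_comm", 16, 1)));
        for (Placed member : event.members()) {
            System.out.printf("%-12s offset %3d size %3d%n",
                    member.name(), member.offset(), member.size());
        }
        System.out.println("size: " + event.size()); // size: 276
    }
}
```

For the event struct, this yields the offsets 0, 4, and 260 and a total size of 276 bytes, matching the Pahole output shown earlier.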

With this at hand, we can look at a slightly more complex example:

Struct Example with Padding

The compiler, at times, has to create an unused memory section between two members to satisfy the individual alignments. This can be seen in the following example:

struct padded_event {
  char c;  // single byte char, alignment of 1
  long l;  // alignment of 8
  int i;   // alignment of 4
  void* x; // alignment of 8
};

Using Pahole again in the Compiler Explorer, we see the layout that the compiler generates:

struct padded_event {
	char                       c;                    /*     0     1 */

	/* XXX 7 bytes hole, try to pack */

	long                       l;                    /*     8     8 */
	int                        i;                    /*    16     4 */

	/* XXX 4 bytes hole, try to pack */

	void *                     x;                    /*    24     8 */

	/* size: 32, cachelines: 1, members: 4 */
	/* sum members: 21, holes: 2, sum holes: 11 */
	/* last cacheline: 32 bytes */
};

Pahole tells us that the compiler had to introduce 11 bytes of padding: 7 bytes after c and 4 bytes after i.

This means that we're essentially wasting memory. I recommend reading The Lost Art of Structure Packing by Eric S. Raymond to learn more about this. If we really want to save memory, we could swap the int and the long member, thereby only needing padding after the char, leading to an object with 24 bytes and only 3 bytes of padding. This is really important when storing many of these structs in arrays, where the wasted memory accumulates.
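The effect of the member order can be checked with a few lines of Java. The sketch below applies the layouting rules to both orders of padded_event; it assumes the LP64 sizes used above (long and pointers are 8 bytes), and the class and method names are made up for this example.

```java
import java.util.List;

/** Hypothetical sketch: compare the padding of two member orders. */
final class PaddingComparison {
    record Member(String name, int size, int alignment) {}

    static int roundUp(int position, int alignment) {
        return (position + alignment - 1) / alignment * alignment;
    }

    /** Total struct size, including padding between and after members. */
    static int structSize(List<Member> members) {
        int position = 0;
        int structAlignment = 1;
        for (Member member : members) {
            position = roundUp(position, member.alignment()) + member.size();
            structAlignment = Math.max(structAlignment, member.alignment());
        }
        return roundUp(position, structAlignment);
    }

    /** Padding is the struct size minus the sum of the member sizes. */
    static int padding(List<Member> members) {
        int payload = members.stream().mapToInt(Member::size).sum();
        return structSize(members) - payload;
    }

    public static void main(String[] args) {
        Member c = new Member("c", 1, 1);
        Member l = new Member("l", 8, 8);
        Member i = new Member("i", 4, 4);
        Member x = new Member("x", 8, 8);

        List<Member> original = List.of(c, l, i, x);
        List<Member> swapped = List.of(c, i, l, x);

        // prints "original: 32 bytes, 11 bytes padding"
        System.out.printf("original: %d bytes, %d bytes padding%n",
                structSize(original), padding(original));
        // prints "swapped: 24 bytes, 3 bytes padding"
        System.out.printf("swapped: %d bytes, %d bytes padding%n",
                structSize(swapped), padding(swapped));
    }
}
```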

But what do we do with this knowledge?

Auto-Layouting in hello-ebpf

The record that we defined in Java before contains all the information to auto-generate the BPFStructType for the class; we just need a little bit of annotation processor magic:

@Type
record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

The annotation processor processes this record and generates the suitable BPFStructType.

We implemented the auto-layouting in the BPFStructType class to reduce the amount of logic in the annotation processor.

This results in a much cleaner RingSample version, named TypeProcessingSample:

@BPF
public abstract class TypeProcessingSample extends BPFProgram {

    static final String EBPF_PROGRAM = """...""";

    private static final int FILE_NAME_LEN = 256;
    private static final int TASK_COMM_LEN = 16;

    @Type
    record Event(@Unsigned int pid, 
                 @Size(FILE_NAME_LEN) String filename, 
                 @Size(TASK_COMM_LEN) String comm) {}


    public static void main(String[] args) {
        try (TypeProcessingSample program = BPFProgram.load(TypeProcessingSample.class)) {
            program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));

            // get the generated struct type
            var eventType = program.getTypeForClass(Event.class);

            var ringBuffer = program.getRingBufferByName("rb", eventType,
             (buffer, event) -> {
                System.out.printf("do_sys_openat2 called by:%s file:%s pid:%d\n", 
                                  event.comm(), event.filename(), event.pid());
            });
            while (true) {
                ringBuffer.consumeAndThrow();
            }
        }
    }
}

The annotation processor currently supports the following members in records:

Integer types (int, long, …), optionally annotated with @Unsigned if unsigned
String types, annotated with @Size to specify the size
Other @Type annotated types in the same scope
@Type.Member annotated member to specify the BPFType directly

You can find the up-to-date list in the documentation for the Type annotation.

Conclusion

We have to model in Java all C types that are shared between the eBPF program and the Java application; this includes placing the different members of structs in memory and keeping them properly aligned.

We saw that the general algorithm behind the layouting is straightforward.

This algorithm can be used in the hello-ebpf library with an annotation processor to make writing eBPF applications more concise and less error-prone.

I hope you liked this introduction to struct layouts. See you in two weeks when we start supporting more features of libbpf.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. This article first appeared on my personal blog mostlynerdless.de.

The post Hello eBPF: Auto Layouting Structs (7) appeared first on foojay.

To Brussels, Canada and Back

Johannes Bechberger — Tue, 26 Mar 2024 10:53:11 +0000

Table of Contents

FOSDEM
ConFoo
VoxxedDays Zurich
Conclusion

Last year was my first year blogging, speaking at conferences, meeting incredible people, and seeing places I've never been before.

It was at times quite arduous but at the same time energizing, as you can read in my post Looking back on one year of speaking and blogging. I didn't want it to be a one-off year, so I dutifully started a new article series on eBPF and applied for conferences...

And I got accepted at a few of them, which was really great because I started missing traveling after almost three months of being home. In this article, I'll cover my first three conferences this year: FOSDEM in Brussels, ConFoo in Montreal, and Voxxed Days Zurich; they all happened between early February and early March.

It was the most I had ever traveled, both in distance and in number of continents, by quite some margin.

FOSDEM

Every good journey starts with taking a train far too early:

I took the train at 3:30 am on the 2nd of February from Karlsruhe to Schwetzingen so one of my SapMachine colleagues, Christoph Langer, could pick me up for the first conference of the year: FOSDEM in Brussels. But before FOSDEM, which happened over the weekend, there was the JDK Committers Workshop nearby, where I chaired a session on modernizing JFR with Andrei Pangin:

Ron Pressler was also present, making the discussions quite worthwhile. This eventually led to my current work on implementing a CPU profiler in JFR with Andrei.

After the workshop day, the conference came. Last year, I was in Brussels, too, but I never really took the time to explore the city. Instead, I sat at the conference all day. My talk in the Java room was Saturday evening, so I took the opportunity to visit the main cathedral and some parks together with Alexander Wert and his wife:

My first talk at FOSDEM was titled Inner Workings of Safepoints, in which I explained what safepoints are, why they are important, and how they are implemented, all based on my article, The Inner Workings of Safepoints:

My second talk was actually my first ever Python talk at any conference: It was my talk on Python 3.12's new monitoring and debugging API (based on a blog series):

But it wasn't the last time: I'd give the same talk at PyCon Berlin and a similar one at PyConLT in Lithuania. I'm a Java developer but use Python regularly for all the plumbing and black box testing, so I was happy to give something back to the community.

After these conferences, I had two weeks without any conference, which was great because the next conference was somewhat farther away...

ConFoo

I had never been to another continent, let alone visited Canada, so I was pretty happy when I got accepted to ConFoo in Montreal. Spending 8 hours on a plane on the 19th of February and having jet lag, both for the first time, was quite tiring, but hey, at least the conference paid for it:

After arriving in Montreal, I explored the city, Mount Royal, and the underground for an evening and a whole day, meeting Theresa Mammarella and her husband for dinner in Chinatown:

The days at the conference flew by, especially because of the great food there. I was also glad to meet Sam, Jessie, and Jonatan, joining them to eat bagels from both St. Viateur (pictured) and Fairmount Bagel, as well as eating poutine at Chez Claudette:

While the traveling aspect of this journey was great, I also gave two talks: One on writing a profiler in pure Java and another on debugging, but they are sadly not available publicly (yet). I stayed in Montreal till Saturday, the 23rd.

Then, I traveled by train to Toronto for almost a week with Theresa and her husband Charles before I returned to Montreal to take the plane back to Europe. I first met Theresa at JavaZone in Oslo (see trip report) in September last year...

Where Montreal was icy cold, Toronto felt welcoming, almost warm. I explored the city, went up the CN Tower, down into the underground, to Niagara Falls, and enjoyed Charles' blueberry pancakes with dark maple syrup:

Being in Canada was great, and I'm looking forward to returning to North America for KCDC in June. I came back to Europe Saturday morning with quite some jetlag and four bagels from St. Viateur, spending my weekend with a friend close to the airport before going to Zurich in the following week:

VoxxedDays Zurich

I actually flew to and from Canada via Zurich, so I was at home in Karlsruhe for just three days before going back for the VoxxedDays, meeting new and old acquaintances and giving a talk called Instrument to Remove: Using Java agents for fun and profit (related blog post):

Photo by the great Dimitris Doutsiopoulos

I had the opportunity to reconnect with Mario Fusco (pictured), Matthias Häussler, Anja Kunkel, and many more and to meet new people like Myriam Jessier.

The day after the one-day conference, I met up with someone I had first met at ConFoo. By pure coincidence, Marcus Boerger lives close to Zurich with his family, so I joined him for lunch at his home, using the rest of the day to explore the city:

Conclusion

My conference year started with three great conferences and the opportunity to travel. I'm grateful to my team at SapMachine, which allows me to speak in so many different places, as well as to all the other people, be it the conference organizers or hosts, who make all this possible.

I'm looking forward to my next month of traveling, in April, with three conferences (including a deep-dive at JavaLand) and at least one user group.

See you next week for the next installment of my hello-ebpf series and hopefully the week after with my first co-authored article with a person I met at ConFoo.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. It first appeared on my personal blog mostlynerdless.de.

The post To Brussels, Canada and Back appeared first on foojay.

Hello eBPF: Ring buffers in libbpf (6)

Johannes Bechberger — Tue, 19 Mar 2024 17:54:22 +0000

Table of Contents

Ring buffers

bpf_ringbuf_output
bpf_ringbuf_reserve
bpf_ringbuf_submit
bpf_ringbuf_discard
bpf_ringbuf_query

Ring Buffer eBPF Example
Ring Buffer Java Example
Conclusion

Welcome back to my article series on eBPF. Some weeks ago, I got started in using libbpf instead of libbcc. This week, I show you how to use ring buffers, port the code from Ansil H's blog post eBPF for Linux Admins: Part IX from C to Java, and add tests to the underlying map implementation.

My libbpf-based implementation advances more slowly than the bcc-based one, as I thoroughly test all added functionality and develop a proper Java API, not just a clone.

But first, what are eBPF ring buffers?

Ring buffers

In Hello eBPF: Recording data in event buffers (3), I showed you how to use perf event buffers, which are the predecessor to ring buffers and allow us to communicate between kernel and user-land using events. But perf buffers have problems:

It works great in practice, but due to its per-CPU design it has two major short-comings that prove to be inconvenient in practice: inefficient use of memory and event re-ordering.

To address these issues, starting from Linux 5.8, BPF provides a new BPF data structure (BPF map): BPF ring buffer (ringbuf). It is a multi-producer, single-consumer (MPSC) queue and can be safely shared across multiple CPUs simultaneously.
BPF ring buffer by Andrii Nakryiko

Ring buffers are still circular buffers:

Their usage is similar to the perf event buffers we've seen before. The significant difference is that we implemented the perf event buffers using the libbcc-based eBPF code, which made creating a buffer easy:

BPF_PERF_OUTPUT(rb);

Libbcc compiles the C code with macros. With libbpf, we have to write all that ourselves:

// anonymous struct assigned to rb variable
struct
{
  // specify the type, eBPF specific syntax
  __uint (type, BPF_MAP_TYPE_RINGBUF);
  // specify the size of the buffer
  // has to be a multiple of the page size 
  __uint (max_entries, 256 * 4096);
} rb SEC (".maps") /* placed in maps section */;

You can find more on this syntax in the mail for the patch that introduced it and in the ebpf-docs.

On the eBPF side in the kernel, ring buffers have several important helper functions that allow their easy use:

bpf_ringbuf_output

long bpf_ringbuf_output(void *ringbuf, void *data, __u64 size, __u64 flags)

Copy the specified number of bytes of data into the ring buffer and send notifications to user-land. This function returns a negative number on error and zero on success.

bpf_ringbuf_reserve

void* bpf_ringbuf_reserve(void *ringbuf, __u64 size, __u64 flags)

Reserve a specified number of bytes in the ring buffer and return a pointer to the start, or NULL if the reservation fails. This lets us write events directly into the ring buffer's memory (source).

bpf_ringbuf_submit

void bpf_ringbuf_submit(void *data, __u64 flags)

Submit the reserved ring buffer event (reserved via bpf_ringbuf_reserve).

You might assume that you can build your own bpf_ringbuf_output with just bpf_ringbuf_reserve and bpf_ringbuf_submit and you're correct. When we look into the actual implementation of bpf_ringbuf_output, we see that it is not that much more:

BPF_CALL_4(bpf_ringbuf_output, struct bpf_map *, map, 
           void *, data, u64, size,
	   u64, flags)
{
  struct bpf_ringbuf_map *rb_map;
  void *rec;
        
  // check flags
  if (unlikely(flags & ~(BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP)))
    return -EINVAL;

  // reserve the memory
  rb_map = container_of(map, struct bpf_ringbuf_map, map);
  rec = __bpf_ringbuf_reserve(rb_map->rb, size);
  if (!rec)
    return -EAGAIN;

  // copy the data into the reserved memory
  memcpy(rec, data, size);

  // equivalent to bpf_ringbuf_submit(rec, flags)
  bpf_ringbuf_commit(rec, flags, false /* discard */);
  return 0;
}
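To make the relationship between reserve, submit, and output easier to follow, here is a strongly simplified, single-threaded Java model of a ring buffer. It is only a sketch under several assumptions: the real kernel implementation stores record headers in the buffer, handles concurrent producers, and wakes up the consumer, none of which is modeled here. All names (RingBufferModel, Record, consume) are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, single-threaded model of a ring buffer. */
final class RingBufferModel {
    /** A reserved slice; visible to the consumer once submitted. */
    static final class Record {
        final int start;
        final int size;
        boolean submitted;

        Record(int start, int size) {
            this.start = start;
            this.size = size;
        }
    }

    private final byte[] memory;
    private final Deque<Record> records = new ArrayDeque<>();
    private long producerPos = 0; // total bytes ever reserved
    private long consumerPos = 0; // total bytes ever consumed

    RingBufferModel(int capacity) {
        this.memory = new byte[capacity];
    }

    /** Models bpf_ringbuf_reserve: null if there is not enough space. */
    Record reserve(int size) {
        if (producerPos - consumerPos + size > memory.length) {
            return null;
        }
        Record record = new Record((int) (producerPos % memory.length), size);
        producerPos += size;
        records.add(record);
        return record;
    }

    /** Models bpf_ringbuf_submit: makes the record consumable. */
    void submit(Record record) {
        record.submitted = true;
    }

    /** Models bpf_ringbuf_output: reserve, copy, submit. */
    int output(byte[] data) {
        Record record = reserve(data.length);
        if (record == null) {
            return -1;
        }
        for (int i = 0; i < data.length; i++) {
            // positions wrap around at the end of the memory
            memory[(record.start + i) % memory.length] = data[i];
        }
        submit(record);
        return 0;
    }

    /** Returns the oldest submitted record's data, or null if none. */
    byte[] consume() {
        Record record = records.peek();
        if (record == null || !record.submitted) {
            return null;
        }
        records.poll();
        byte[] out = new byte[record.size];
        for (int i = 0; i < record.size; i++) {
            out[i] = memory[(record.start + i) % memory.length];
        }
        consumerPos += record.size;
        return out;
    }

    public static void main(String[] args) {
        RingBufferModel rb = new RingBufferModel(8);
        System.out.println(rb.output("abcde".getBytes(StandardCharsets.UTF_8))); // 0
        System.out.println(rb.output("wxyz".getBytes(StandardCharsets.UTF_8)));  // -1, full
        System.out.println(new String(rb.consume(), StandardCharsets.UTF_8));    // abcde
        System.out.println(rb.output("wxyz".getBytes(StandardCharsets.UTF_8)));  // 0, wraps
        System.out.println(new String(rb.consume(), StandardCharsets.UTF_8));    // wxyz
    }
}
```

The second output call fails because the five reserved bytes have not been consumed yet; after consuming them, the same call succeeds and its data wraps around the end of the memory.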

bpf_ringbuf_discard

void bpf_ringbuf_discard(void *data, __u64 flags)

Discard the reserved ring buffer event.

bpf_ringbuf_query

__u64 bpf_ringbuf_query(void *ringbuf, __u64 flags)

Query various characteristics of provided ring buffer. What exactly is queries is determined by flags:

BPF_RB_AVAIL_DATA: Amount of data not yet consumed.

BPF_RB_RING_SIZE: The size of ring buffer.

BPF_RB_CONS_POS: Consumer position (can wrap around).

BPF_RB_PROD_POS: Producer(s) position (can wrap around).

Data returned is just a momentary snapshot of actual values and could be inaccurate, so this facility should be used to power heuristics and for reporting, not to make 100% correct calculation.

Return: Requested value, or 0, if flags are not recognized.
bpf-Helpers man-Page

You can find more information in these resources:

eBPF Docs by Dylan Reimerink
official Linux eBPF documentation
bpf-helpers(7) man-page
Linux kernel source code, as you saw above, can give us insights that no documentation can provide us with

Ring Buffer eBPF Example

Now that I've shown you what ring buffers are on the eBPF side, we can look at an eBPF example that writes an event for every openat call, capturing the process id, filename, and process name. The example is adapted from Ansil H's blog post eBPF for Linux Admins: Part IX:

#include "vmlinux.h"
#include 
#include 
#include 
                
#define TARGET_NAME "sample_write"
#define MAX_ENTRIES 10
#define FILE_NAME_LEN 256
#define TASK_COMM_LEN 16
                
// Structure to store the data that we want to pass to user
struct event
{
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};
                
// eBPF map reference
struct
{
  __uint (type, BPF_MAP_TYPE_RINGBUF);
  __uint (max_entries, 256 * 4096);
} rb SEC (".maps");
                
// The ebpf auto-attach logic needs the SEC
SEC ("kprobe/do_sys_openat2")
     int kprobe__do_sys_openat2(struct pt_regs *ctx)
{
  char filename[256];
  char comm[TASK_COMM_LEN] = { };
  struct event *evt;
  const char fmt_str[] = "do_sys_openat2 called by:%s file:%s pid:%d";
                
  // Reserve the ring-buffer
  evt = bpf_ringbuf_reserve(&rb, sizeof (struct event), 0);
  if (!evt) {
      return 0;
  }
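  // Get the PID. bpf_get_current_pid_tgid() returns
  // tgid << 32 | pid; assigning it to a u32 keeps the
  // lower half, the kernel pid (the thread id)
  evt->e_pid = bpf_get_current_pid_tgid();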
                
  // Read the filename from the second argument
  // The x86 arch/ABI have first argument 
  // in di and second in si registers (man syscall)
  bpf_probe_read(evt->e_filename, sizeof(filename), 
        (char *) ctx->si);
                
  // Read the current process name
  bpf_get_current_comm(evt->e_comm, sizeof(comm));
            
  bpf_trace_printk(fmt_str, sizeof(fmt_str), evt->e_comm,
        evt->e_filename, evt->e_pid);
  // Also send the same message to the ring-buffer
  bpf_ringbuf_submit(evt, 0);
  return 0;
}
                
char _license[] SEC ("license") = "GPL";

Ring Buffer Java Example

With this in hand, we can implement the RingSample using the newly added functionality in hello-ebpf:

@BPF
public abstract class RingSample extends BPFProgram {

  static final String EBPF_PROGRAM = """
              // ...
            """;

  private static final int FILE_NAME_LEN = 256;
  private static final int TASK_COMM_LEN = 16;
  
  // event record
  record Event(@Unsigned int pid, 
               String filename, 
               @Size(TASK_COMM_LEN) String comm) {}

  // define the event records layout
  private static final BPFStructType eventType = 
          new BPFStructType<>("rb", List.of(
          new BPFStructMember<>("e_pid", 
                  BPFIntType.UINT32, 0, Event::pid),
          new BPFStructMember<>("e_filename", 
                  new StringType(FILE_NAME_LEN), 
                  4, Event::filename),
          new BPFStructMember<>("e_comm", 
                  new StringType(TASK_COMM_LEN), 
                  4 + FILE_NAME_LEN, Event::comm)
  ), new AnnotatedClass(Event.class, List.of()), 
  fields -> new Event((int)fields.get(0),
          (String)fields.get(1), (String)fields.get(2)));

  public static void main(String[] args) {
    try (RingSample program = BPFProgram.load(RingSample.class)) {
      // attach the kprobe
      program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));
      // obtain the ringbuffer
      // and write a message every time a new event is obtained
      var ringBuffer = program.getRingBufferByName("rb", eventType, 
              (buffer, event) -> {
        System.out.printf("do_sys_openat2 called by:%s file:%s pid:%d\n", 
                event.comm(), event.filename(), event.pid());
      });
      while (true) {
        // consume and throw any captured
        // Java exception from the event handler
        ringBuffer.consumeAndThrow();
      }
    }
  }
}

You can run the example via ./run_bpf.sh RingSample:

do_sys_openat2 called by:C1 CompilerThre file:/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/snap.intellij-idea-community.intellij-idea-community-a46a168b-28d0-4bb9-9e15-f3a966353efe.scope/memory.max pid:69817
do_sys_openat2 called by:C1 CompilerThre file:/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/snap.intellij-idea-community.intellij-idea-community-a46a168b-28d0-4bb9-9e15-f3a966353efe.scope/memory.max pid:69812
do_sys_openat2 called by:java file:/home/i560383/.sdkman/candidates/java/21.0.2-sapmchn/lib/libjimage.so pid:69797

Conclusion

The libbpf part of hello-ebpf keeps evolving. With this article, I added support for the first kind of eBPF maps and ring buffers, with a simplified Java API and five unit tests. I'll most likely focus on the libbpf part in the future, as it is far easier to work with than libbcc.

Thanks for joining me on this journey to create a proper Java API for eBPF. Feel free to try the examples for yourself or even write new ones and join the discussions on GitHub. See you in my next article about my journey to Canada or in two weeks for the next installment of this series.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. This article first appeared on my personal blog mostlynerdless.de.

The post Hello eBPF: Ring buffers in libbpf (6) appeared first on foojay.

Hello eBPF: First steps with libbpf (5)

Johannes Bechberger — Thu, 07 Mar 2024 13:51:35 +0000

Table of Contents

Problems with Libbcc
BPF-based Library
HelloWorld Example
Compiler Errors
Conclusion

Welcome back to my blog series on eBPF. Two weeks ago, I showed you how to write your own eBPF application using my hello-ebpf library based on libbcc. This week, I show you why using libbcc is not the best idea and start working with the newer libbpf.

With my current libbcc-based approach, we essentially embed the eBPF program as a string into our application and compile it on the fly for every run:

public class HelloWorld {
    public static void main(String[] args) {
        try (BPF b = BPF.builder("""
                int kprobe__sys_clone(void *ctx) {
                   bpf_trace_printk("Hello, World!");
                   return 0;
                }
                """).build()) {
            b.trace_print();
        }
    }
}

Problems with Libbcc

Using libbcc and porting the Python wrapper made it easy to start developing a user-land Java library and offers some syntactic sugar, but it has major disadvantages, to quote Andrii Nakryiko:

Clang/LLVM combo is a big library, resulting in big fat binaries that need to be distributed with your application.

Clang/LLVM combo is resource-heavy, so when you are compiling BPF code at start up, you'll use a significant amount of resources, potentially tipping over a carefully balanced production workload. And vice versa, on a busy host, compiling a small BPF program might take minutes in some cases.

BPF program testing and development iteration is quite painful as well, as you are going to get even most trivial compilation errors only in run-time, once you recompile and restart your user-space control application. This certainly increases friction and is not helping to iterate fast.

BPF Portability and CO-RE by Andrii Nakryiko

Additionally, the libbcc binaries in the official Ubuntu package repositories are outdated, so we're accumulating technical debt using them.

BPF-based Library

So what is the alternative? We compile the embedded C code in our application to eBPF bytecode at build time using a custom annotation processor and load the bytecode using libbpf at run-time:

This allows us to create self-contained JARs that will eventually neatly package our eBPF application.

With this new chapter of the hello-ebpf project, I am trying to create a proper Java API that

builds on top of libbpf
isn't bound to mimic the Python API, thus making it easier to understand for Java developers
is tested with a growing number of tests so that it is safe to use
prefers usability (and a small API) over speed

The annotation processor for this lives in the bpf-processor, and the central part of the library is in the bpf folder. It is in its earliest stages, but you can expect more features and tests in the following months.

HelloWorld Example

Writing programs with libbpf is not too dissimilar to using my libbcc wrapper:

@BPF // annotation to trigger the BPF annotation processor
public abstract class HelloWorld extends BPFProgram {
    
    // eBPF program code that is compiled at build
    // time using clang
    static final String EBPF_PROGRAM = """
            #include "vmlinux.h"
            #include 
            #include 
                            
            SEC ("kprobe/do_sys_openat2")
            int kprobe__do_sys_openat2(struct pt_regs *ctx){                                                             
                bpf_printk("Hello, World from BPF and more!");
                return 0;
            }
                            
            char _license[] SEC ("license") = "GPL";
            """;

    public static void main(String[] args) {
        // load an instance of the HelloWorld implementation
        try (HelloWorld program = BPFProgram.load(HelloWorld.class)) {
            // attach to the kprobe
            program.autoAttachProgram(
                program.getProgramByName("kprobe__do_sys_openat2"));
            program.tracePrintLoop(f -> 
                String.format("%d: %s: %s", (int)f.ts(), f.task(), f.msg()));
        }
    }
}

Running this class via ./run_bpf.sh HelloWorld will then print the following:

3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: irqbalance: Hello, World from BPF and more!
3385: C2 CompilerThre: Hello, World from BPF and more!

The annotation processor created an implementation of the HelloWorld class, which overrides the getByteCode method:

public final class HelloWorldImpl extends HelloWorld {
    /**
     * Base64 encoded gzipped eBPF byte-code
     */
    private static final String BYTE_CODE = "H4sIAA...n5q6hfQNFV+sgDAAA=";

    @Override
    public byte[] getByteCode() {
        return Util.decodeGzippedBase64(BYTE_CODE);
    }
}
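The source of Util.decodeGzippedBase64 is not shown here, but the idea can be sketched with the JDK alone: decode the Base64 string and then decompress the gzip stream. The following is a hypothetical re-implementation for illustration, including the matching encode direction; the class name GzippedBase64 and both method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of gzip plus Base64 encoding and decoding. */
final class GzippedBase64 {

    /** Compresses the bytes with gzip and encodes the result as Base64. */
    static String encode(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Decodes the Base64 string and decompresses the gzip data. */
    static byte[] decode(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] byteCode = "pretend this is eBPF byte-code"
                .getBytes(StandardCharsets.UTF_8);
        String encoded = encode(byteCode);
        System.out.println(encoded);
        System.out.println(new String(decode(encoded), StandardCharsets.UTF_8));
    }
}
```

Gzip data always starts with the bytes 0x1f 0x8b 0x08, which is why such Base64 strings start with "H4sI", just like the BYTE_CODE constant above.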

Compiler Errors

But what happens when you make a mistake in your eBPF program, for example, not writing a semicolon after the bpf_printk call? Then, the annotation processor throws an error at build-time and prints the following error message when calling mvn package:

Processing BPFProgram: me.bechberger.ebpf.samples.HelloWorld
Obtaining vmlinux.h header file
Could not compile eBPF program
HelloWorld.java:[19,66]  error: expected ';' after expression
    bpf_printk("Hello, World from BPF and more!")
                                                 ^
                                                 ;
1 error generated.

The annotation processor compiles the eBPF program using Clang and post-processes the error messages to show the location in the Java program. Using libbcc, we only get this error at run-time, which makes finding these issues far harder.
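Such a post-processing step could look roughly like the following sketch, which rewrites the file:line:column prefix of a Clang message into a location in the Java file. This is not the actual implementation of the annotation processor; the class ClangErrorMapper, the file name prog.c, and the parameters firstContentLine (the Java line of the first line of the embedded program) and indentation (the number of columns stripped from the text block) are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: map Clang error locations to Java locations. */
final class ClangErrorMapper {
    // matches messages like "prog.c:7:50: error: expected ';' ..."
    private static final Pattern CLANG_LOCATION =
            Pattern.compile("^[^:]+:(\\d+):(\\d+): (.*)$");

    /**
     * Rewrites the location of a single Clang message line.
     * Lines without a location are returned unchanged.
     */
    static String toJavaLocation(String clangLine, String javaFile,
                                 int firstContentLine, int indentation) {
        Matcher matcher = CLANG_LOCATION.matcher(clangLine);
        if (!matcher.matches()) {
            return clangLine;
        }
        // line 1 of the C program is firstContentLine in the Java file
        int line = Integer.parseInt(matcher.group(1)) + firstContentLine - 1;
        // the text block indentation shifts all columns to the right
        int column = Integer.parseInt(matcher.group(2)) + indentation;
        return String.format("%s:[%d,%d] %s",
                javaFile, line, column, matcher.group(3));
    }

    public static void main(String[] args) {
        String message = "prog.c:7:50: error: expected ';' after expression";
        System.out.println(
                toJavaLocation(message, "HelloWorld.java", 13, 16));
    }
}
```

The source-excerpt lines that Clang prints below the message contain no location and therefore pass through unchanged.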

Conclusion

Using libbpf instead of libbcc has many advantages: Smaller, self-contained JARs, better developer support, and a more modern library. The hello-ebpf project will evolve to focus on libbpf to become a fully functional and tested eBPF user-land library. Using an annotation processor offers so many possibilities, so stay tuned.

Thanks for joining me on this journey to create a proper Java API for eBPF. I'll see you in two weeks for the next installment in this series, and possibly before for a trip report on my current travels.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone, first published on my personal blog mostlynerdless.de. This article was written in Canada, thanks to ConFoo and Theresa Mammarella, who made this trip possible. Inspiration came from Ansil H's series on eBPF.

The post Hello eBPF: First steps with libbpf (5) appeared first on foojay.