foojay – a place for friends of OpenJDK

No Keys, No LLM: Building a Wikidata Definition API with Embabel

Vincent Vauban — Thu, 05 Mar 2026 08:00:25 +0000

Table of Contents

TL;DR

Part I — Concepts

I.1 Embabel
I.2 Spring AI (even in a “no LLM” demo)
I.3 Role of Embabel in this application
I.4 Wikidata: definition and why it’s ideal for demos

Part II — App building (code + explanations)

II.1 Maven setup (pom.xml)
II.2 Configuration (application.yml)
II.3 App launcher + Embabel enablement + NOOP LLM registration
II.4 The NOOP ChatModel (Spring AI)
II.5 Domain model (Java records)
II.6 Repository: Wikidata calls with RestClient
II.7 The Embabel agent (actions + goal)
II.8 Service: running the agent via AgentInvocation
II.9 Controller: a single endpoint

Part III — Demo

III.1 Curl request
III.2 Response
III.3 Logs: the agentic part

Part IV — Conclusion and extensions

1) Disambiguation
2) Multi-language
3) Confidence score
4) Caching and rate limiting
5) Multi-source enrichment
6) Optional LLM post-processing (when needed)

TL;DR

I built a Spring Boot 4 API that defines terms via Wikidata.
The app is fully reproducible: no API keys and no model installation needed.
Embabel orchestrates the pipeline as a sequence of actions to achieve the goal DefinitionResult.
The logs show planning, execution, and typed object binding—the most useful part for teaching agentic flows.

I wanted a demo that is simple, reproducible, and still shows agentic orchestration in a way that’s easy to explain on video.

So I built a small Spring Boot 4 app that exposes a single endpoint:

GET /api/wiki/define?term=...

It returns a compact JSON “definition” fetched from Wikidata (no authentication, no API keys).
The important part: I used Embabel to orchestrate the workflow, even though the workflow is deterministic and does not need an LLM.

Part I — Concepts

I.1 Embabel

Embabel is an agent framework for the JVM. I like to think of it as a way to model a workflow as:

Actions: steps the agent can execute
Goals: what the workflow should produce
State / facts: typed objects available at each moment
Planning: decide which actions to run and in which order to achieve the goal

In practice, that means I don’t call methods in a fixed chain. I provide an initial input (a domain object), tell Embabel what type I want as the result, and Embabel plans and runs the required actions.

I.2 Spring AI (even in a “no LLM” demo)

Spring AI provides an abstraction layer for interacting with chat models (and other AI components) using Spring-friendly APIs.

In this project, I implemented a tiny NOOP chat model. It’s not used to generate anything. It exists because the Embabel starter expects a default model entry to be configured at startup.

This kept the demo:

fully runnable without credentials,
focused on orchestration,
and easy to extend later with a real model.

I.3 Role of Embabel in this application

A reasonable question is: “What’s the point of using Embabel just to query a REST API?”

The REST call is not the point. The point is to demonstrate a workflow that:

starts from a DefinitionRequest(term)
resolves a Wikidata entity ID (Q-id)
fetches entity details
builds a typed DefinitionResult

Embabel makes these steps explicit, typed, and observable, and it can re-plan as the state evolves. That’s a much better foundation than packing everything into one big service method—especially when the demo grows.
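
To make the planning idea concrete, here is a toy sketch in plain Java. It is not Embabel's API and says nothing about how Embabel is implemented: `ToyPlanner`, `Step`, and the use of strings as stand-ins for types are hypothetical names I made up for illustration. Each step declares the types it needs and the type it produces, and the planner keeps picking a step that can run with what is already available until the goal type appears.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: NOT Embabel's API. All names here are hypothetical.
class ToyPlanner {

    // A step needs some types to be available and produces one new type.
    record Step(String name, Set<String> needs, String produces) {}

    // Greedy forward chaining: run any step whose inputs are available and whose
    // output is still missing, then re-evaluate, until the goal type exists.
    static List<String> plan(Set<String> initial, String goal, List<Step> steps) {
        Set<String> available = new LinkedHashSet<>(initial);
        List<String> plan = new ArrayList<>();
        boolean progressed = true;
        while (!available.contains(goal) && progressed) {
            progressed = false;
            for (Step step : steps) {
                boolean runnable = available.containsAll(step.needs());
                boolean useful = !available.contains(step.produces());
                if (runnable && useful) {
                    available.add(step.produces());
                    plan.add(step.name());
                    progressed = true;
                    break; // state changed: look at all steps again
                }
            }
        }
        if (!available.contains(goal)) {
            throw new IllegalStateException("No plan reaches " + goal);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Steps are deliberately listed out of order.
        List<Step> steps = List.of(
                new Step("build",
                        Set.of("DefinitionRequest", "WikidataEntityId", "WikidataEntityDetails"),
                        "DefinitionResult"),
                new Step("fetchDetails", Set.of("WikidataEntityId"), "WikidataEntityDetails"),
                new Step("findEntityId", Set.of("DefinitionRequest"), "WikidataEntityId"));
        System.out.println(plan(Set.of("DefinitionRequest"), "DefinitionResult", steps));
    }
}
```

Even with the steps declared out of order, the toy planner returns findEntityId, fetchDetails, build, because only the available types decide what can run next. A real planner also has to avoid steps that do not lead to the goal, which this sketch does not attempt.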

I.4 Wikidata: definition and why it’s ideal for demos

Wikidata is a public, open knowledge base. It’s perfect for demos because:

it’s online,
it’s free to read,
and the APIs are easy to call from a small Java project.

I used two endpoints:

wbsearchentities to search for a term and retrieve the most relevant Q-id
Special:EntityData/{QID}.json to fetch structured entity data (labels, descriptions, and Wikipedia sitelinks)

This gives a nice “definition API” in a few lines of code, with zero setup for viewers.
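
As a quick orientation, the sketch below builds the two request URIs with nothing but the JDK and sends nothing over the network. The class name `WikidataUris` is hypothetical; the host, paths, and query parameters are the ones the repository in Part II uses.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: builds the two Wikidata request URIs without sending anything.
class WikidataUris {

    static final String BASE = "https://www.wikidata.org";

    // Search endpoint: asks for at most one candidate entity for the term.
    static URI search(String term) {
        String encoded = URLEncoder.encode(term, StandardCharsets.UTF_8);
        return URI.create(BASE + "/w/api.php?action=wbsearchentities"
                + "&search=" + encoded + "&language=en&format=json&limit=1");
    }

    // Entity endpoint: labels, descriptions and sitelinks for one Q-id.
    static URI entityData(String qid) {
        return URI.create(BASE + "/wiki/Special:EntityData/" + qid + ".json");
    }

    public static void main(String[] args) {
        System.out.println(search("kafka"));
        System.out.println(entityData("Q16235208"));
    }
}
```

Pasting either printed URL into a browser is a handy way to look at the raw JSON before writing any mapping code.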

Part II — App building (code + explanations)

II.1 Maven setup (`pom.xml`)

I used Spring Boot 4.0.3 with Java 25, and Embabel 0.3.4.

Because this is Boot 4, I added spring-boot-starter-restclient so RestClient.Builder is auto-configured.

I also forced Jackson 2 compatibility (spring-boot-jackson2) and excluded spring-boot-starter-json, because the Embabel starter wiring in this setup expects Jackson2ObjectMapperBuilder.



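<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>4.0.3</version>
        <relativePath/>
    </parent>

    <groupId>com.example</groupId>
    <artifactId>wikidemo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>wikidemo</name>

    <properties>
        <java.version>25</java.version>
        <embabel-agent.version>0.3.4</embabel-agent.version>
    </properties>

    <dependencies>
        <!-- Web starter, with the default JSON starter excluded -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-starter-json</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- Auto-configures RestClient.Builder on Boot 4 -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-restclient</artifactId>
        </dependency>

        <!-- Jackson 2 compatibility (Jackson2ObjectMapperBuilder) -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-jackson2</artifactId>
        </dependency>

        <!-- Embabel agent platform -->
        <dependency>
            <groupId>com.embabel.agent</groupId>
            <artifactId>embabel-agent-starter</artifactId>
            <version>${embabel-agent.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>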

II.2 Configuration (`application.yml`)

I set the server port and configured the default Embabel model name to noop.

spring:
  application:
    name: wikidemo

server:
  port: 8080

embabel:
  models:
    default-llm: noop

II.3 App launcher + Embabel enablement + NOOP LLM registration

The application entrypoint enables agent scanning using @EnableAgents, then registers a “noop” model so the platform boots without external dependencies.

package com.vv.wikidemo;

import com.embabel.agent.config.annotation.EnableAgents;
import com.embabel.agent.spi.LlmService;
import com.embabel.agent.spi.support.springai.SpringAiLlmService;
import com.vv.wikidemo.service.NoopChatModel;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableAgents
public class WikiDemoApplication {
    public static void main( String[] args ) {
        SpringApplication.run( WikiDemoApplication.class, args );
    }

    @Bean
    public LlmService noopLlm() {
        return new SpringAiLlmService(
                "noop",          // model name (must match embabel.models.default-llm)
                "noop-provider", // provider label (any string)
                new NoopChatModel()
        );
    }
}

II.4 The NOOP ChatModel (Spring AI)

This is intentionally minimal. If Embabel ever calls it, it returns a predictable message.

package com.vv.wikidemo.service;

import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.chat.prompt.Prompt;

import java.util.List;

public class NoopChatModel implements ChatModel {

    @Override
    public ChatResponse call( Prompt prompt ) {
        var msg = new AssistantMessage(
                "NOOP LLM: no real LLM configured (this demo doesn't need one)."
        );
        return new ChatResponse( List.of( new Generation( msg ) ) );
    }
}

II.5 Domain model (Java records)

I used records for the request, intermediate agent objects, and final result.

package com.vv.wikidemo.model;
public record DefinitionRequest(String term) {
}

public record DefinitionResult(
        String term,
        String entityId,
        String label,
        String description,
        String wikidataUrl,
        String wikipediaUrl
) {
}

public record WikidataEntityDetails(
        String label,
        String description,
        String wikipediaTitle
) {}

public record WikidataEntityId(String id) {
}

The key idea is that Embabel “stores” and “reuses” these typed objects during execution. They become the agent’s working memory.
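
To picture what "working memory keyed by type" means, here is a toy blackboard in plain Java. It is only an illustration under my own assumptions, not Embabel's implementation: `ToyBlackboard`, `bind`, and `get` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Toy "working memory" keyed by type. NOT Embabel's implementation.
class ToyBlackboard {

    private final Map<Class<?>, Object> objects = new LinkedHashMap<>();

    // Bind an object under its runtime class, replacing any earlier one of that type.
    void bind(Object value) {
        objects.put(value.getClass(), value);
    }

    // Look up the object bound for a type, if any.
    <T> Optional<T> get(Class<T> type) {
        return Optional.ofNullable(type.cast(objects.get(type)));
    }

    public static void main(String[] args) {
        ToyBlackboard blackboard = new ToyBlackboard();
        blackboard.bind("kafka");   // stands in for a DefinitionRequest
        blackboard.bind(42);        // stands in for a later intermediate object
        System.out.println(blackboard.get(String.class).isPresent());
        System.out.println(blackboard.get(Double.class).isPresent());
    }
}
```

Because lookups go by type, giving each stage its own small record (rather than passing raw strings around) is what lets a later step ask for exactly the object it needs.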

II.6 Repository: Wikidata calls with RestClient

The repository is responsible for the data access logic only:

search for the best match (Q-id)
fetch details for that Q-id
build stable URLs

I kept DTO mappings minimal and resilient with @JsonIgnoreProperties(ignoreUnknown = true).

package com.vv.wikidemo.repository;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.vv.wikidemo.model.WikidataEntityDetails;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Repository;
import org.springframework.web.client.RestClient;

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;

@Repository
public class WikidataRepository {

    private final RestClient wikidata;

    public WikidataRepository( RestClient.Builder builder ) {
        this.wikidata = builder
                .baseUrl( "https://www.wikidata.org" )
                .defaultHeader( HttpHeaders.USER_AGENT, "wikidemo/0.0.1 (SpringBoot+Embabel demo)" )
                .build();
    }

    /**
     * Step 1: find the first matching Q-id for a term
     */
    public Optional<SearchItem> searchFirst( String term ) {
        SearchResponse response = wikidata.get()
                                          .uri( uriBuilder -> uriBuilder
                                                  .path( "/w/api.php" )
                                                  .queryParam( "action", "wbsearchentities" )
                                                  .queryParam( "search", term )
                                                  .queryParam( "language", "en" )
                                                  .queryParam( "format", "json" )
                                                  .queryParam( "limit", "1" )
                                                  .build() )
                                          .retrieve()
                                          .body( SearchResponse.class );

        if ( response == null || response.search == null || response.search.isEmpty() ) {
            return Optional.empty();
        }
        return Optional.ofNullable( response.search.get( 0 ) );
    }

    /**
     * Step 2: fetch label/description (and Wikipedia title if present) from Special:EntityData
     */
    public WikidataEntityDetails fetchEntityDetails( String entityId ) {
        EntityDataResponse data = wikidata.get()
                                          .uri( "/wiki/Special:EntityData/{id}.json", entityId )
                                          .retrieve()
                                          .body( EntityDataResponse.class );

        if ( data == null || data.entities == null || !data.entities.containsKey( entityId ) ) {
            return new WikidataEntityDetails( null, null, null );
        }

        Entity entity = data.entities.get( entityId );
        String label = valueOf( entity.labels, "en" );
        String desc = valueOf( entity.descriptions, "en" );
        String wikiTitle = (entity.sitelinks != null && entity.sitelinks.containsKey( "enwiki" ))
                ? entity.sitelinks.get( "enwiki" ).title
                : null;

        return new WikidataEntityDetails( label, desc, wikiTitle );
    }

    public static String wikidataUrl( String entityId ) {
        return "https://www.wikidata.org/wiki/" + entityId;
    }

    public static String wikipediaUrl( String title ) {
        if ( title == null || title.isBlank() ) {
            return null;
        }
        String normalized = title.replace( ' ', '_' );
        return "https://en.wikipedia.org/wiki/" + URLEncoder.encode( normalized, StandardCharsets.UTF_8 );
    }

    private static String valueOf( Map<String, LangValue> map, String lang ) {
        if ( map == null ) {
            return null;
        }
        LangValue lv = map.get( lang );
        return lv == null ? null : lv.value;
    }

    // --- DTOs for JSON mapping (minimal fields only) ---

    @JsonIgnoreProperties( ignoreUnknown = true )
    static class SearchResponse {
        @JsonProperty( "search" )
        public List<SearchItem> search;
    }

    @JsonIgnoreProperties( ignoreUnknown = true )
    public static class SearchItem {
        @JsonProperty( "id" )
        public String id;

        @JsonProperty( "label" )
        public String label;

        @JsonProperty( "description" )
        public String description;
    }

    @JsonIgnoreProperties( ignoreUnknown = true )
    static class EntityDataResponse {
        @JsonProperty( "entities" )
        public Map<String, Entity> entities;
    }

    @JsonIgnoreProperties( ignoreUnknown = true )
    static class Entity {
        @JsonProperty( "labels" )
        public Map<String, LangValue> labels;

        @JsonProperty( "descriptions" )
        public Map<String, LangValue> descriptions;

        @JsonProperty( "sitelinks" )
        public Map<String, Sitelink> sitelinks;
    }

    @JsonIgnoreProperties( ignoreUnknown = true )
    static class LangValue {
        @JsonProperty( "value" )
        public String value;
    }

    @JsonIgnoreProperties( ignoreUnknown = true )
    static class Sitelink {
        @JsonProperty( "title" )
        public String title;
    }
}

II.7 The Embabel agent (actions + goal)

The agent defines the workflow. Each method is a step (@Action). The final step is tagged as a goal (@AchievesGoal) because it produces the desired output type DefinitionResult.

package com.vv.wikidemo.service;

import com.embabel.agent.api.annotation.AchievesGoal;
import com.embabel.agent.api.annotation.Action;
import com.embabel.agent.api.annotation.Agent;
import com.vv.wikidemo.model.DefinitionRequest;
import com.vv.wikidemo.model.DefinitionResult;
import com.vv.wikidemo.model.WikidataEntityDetails;
import com.vv.wikidemo.model.WikidataEntityId;
import com.vv.wikidemo.repository.WikidataRepository;
import org.springframework.web.server.ResponseStatusException;

import static org.springframework.http.HttpStatus.NOT_FOUND;

@Agent( description = "Define a word using Wikidata (no LLM, no auth)" )
public class WikidataDefinitionAgent {

    private final WikidataRepository repo;

    public WikidataDefinitionAgent( WikidataRepository repo ) {
        this.repo = repo;
    }

    @Action
    public WikidataEntityId findEntityId( DefinitionRequest request ) {
        var hit = repo.searchFirst( request.term() )
                      .orElseThrow( () -> new ResponseStatusException(
                              NOT_FOUND, "No Wikidata entity found for term: " + request.term()
                      ) );
        return new WikidataEntityId( hit.id );
    }

    @Action
    public WikidataEntityDetails fetchDetails( WikidataEntityId id ) {
        return repo.fetchEntityDetails( id.id() );
    }

    @Action
    @AchievesGoal( description = "Return a Wikidata-based definition" )
    public DefinitionResult build( DefinitionRequest request,
                                   WikidataEntityId id,
                                   WikidataEntityDetails details ) {

        String wikidataUrl = WikidataRepository.wikidataUrl( id.id() );
        String wikipediaUrl = WikidataRepository.wikipediaUrl( details.wikipediaTitle() );

        // If Wikidata doesn't have an English label/description, you still get a stable entity link.
        return new DefinitionResult(
                request.term(),
                id.id(),
                details.label(),
                details.description(),
                wikidataUrl,
                wikipediaUrl
        );
    }
}

I like this structure because it stays small and readable. More importantly, it becomes easy to extend later:

add a disambiguation action,
add a caching action,
add alternative paths,
add optional post-processing.

II.8 Service: running the agent via `AgentInvocation`

The service is the bridge between the web layer and Embabel. It creates an AgentInvocation and calls it with a DefinitionRequest.

package com.vv.wikidemo.service;

import com.embabel.agent.api.invocation.AgentInvocation;
import com.embabel.agent.core.AgentPlatform;
import com.vv.wikidemo.model.DefinitionRequest;
import com.vv.wikidemo.model.DefinitionResult;
import org.springframework.stereotype.Service;

@Service
public class WikiService {

    private final AgentPlatform agentPlatform;
    private final AgentInvocation<DefinitionResult> invocation;

    public WikiService( AgentPlatform agentPlatform ) {
        this.agentPlatform = agentPlatform;
        this.invocation = AgentInvocation
                .builder( agentPlatform )
                .build( DefinitionResult.class );
    }

    public DefinitionResult define( String term ) {
        return invocation.invoke( new DefinitionRequest( term ) );
    }
}

II.9 Controller: a single endpoint

The controller stays boring on purpose. All the interesting logic is in the agent and repository.

package com.vv.wikidemo.controller;

import com.vv.wikidemo.model.DefinitionResult;
import com.vv.wikidemo.service.WikiService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping( "/api/wiki" )
public class WikiController {

    private final WikiService wikiService;

    public WikiController( WikiService wikiService ) {
        this.wikiService = wikiService;
    }

    @GetMapping( "/define" )
    public DefinitionResult define( @RequestParam( "term" ) String term ) {
        return wikiService.define( term );
    }
}

Part III — Demo

III.1 Curl request

curl --request GET --url 'http://localhost:8080/api/wiki/define?term=kafka'

III.2 Response

{
  "term": "kafka",
  "entityId": "Q16235208",
  "label": "Apache Kafka",
  "description": "open source data stream processing platform",
  "wikidataUrl": "https://www.wikidata.org/wiki/Q16235208",
  "wikipediaUrl": "https://en.wikipedia.org/wiki/Apache_Kafka"
}

This is intentionally “small JSON”: label + description + canonical links.

III.3 Logs: the agentic part

These logs are the best part to show on screen, because they reveal Embabel’s planning and execution.

21:35:05.039 [tomcat-handler-2] INFO  Embabel - [goofy_mcclintock] created
21:35:05.039 [tomcat-handler-2] INFO  Embabel - [goofy_mcclintock] object added: DefinitionRequest

21:35:05.046 [task-1] INFO  Embabel - [goofy_mcclintock] formulated plan:
  com.vv.wikidemo.service.WikidataDefinitionAgent.findEntityId ->
  com.vv.wikidemo.service.WikidataDefinitionAgent.fetchDetails ->
  com.vv.wikidemo.service.WikidataDefinitionAgent.build

21:35:05.047 [task-1] INFO  Embabel - [goofy_mcclintock] executing action ... findEntityId
21:35:05.745 [task-1] INFO  Embabel - [goofy_mcclintock] executed action ... findEntityId in PT0.686S
21:35:05.743 [task-1] INFO  Embabel - [goofy_mcclintock] object bound it:WikidataEntityId

21:35:05.749 [task-1] INFO  Embabel - [goofy_mcclintock] formulated plan:
  com.vv.wikidemo.service.WikidataDefinitionAgent.fetchDetails ->
  com.vv.wikidemo.service.WikidataDefinitionAgent.build

21:35:05.749 [task-1] INFO  Embabel - [goofy_mcclintock] executing action ... fetchDetails
21:35:06.187 [task-1] INFO  Embabel - [goofy_mcclintock] executed action ... fetchDetails in PT0.437S
21:35:06.186 [task-1] INFO  Embabel - [goofy_mcclintock] object bound it:WikidataEntityDetails

21:35:06.189 [task-1] INFO  Embabel - [goofy_mcclintock] formulated plan:
  com.vv.wikidemo.service.WikidataDefinitionAgent.build

21:35:06.190 [task-1] INFO  Embabel - [goofy_mcclintock] executing action ... build
21:35:06.191 [task-1] INFO  Embabel - [goofy_mcclintock] object bound it:DefinitionResult
21:35:06.196 [task-1] INFO  Embabel - [goofy_mcclintock] goal ... achieved in PT1.164...

What stands out:

Embabel starts with DefinitionRequest
It formulates a plan (sequence of actions)
It executes each action
It binds the produced objects (WikidataEntityId, WikidataEntityDetails, then DefinitionResult)
It declares the goal achieved

This is the “agentic” angle: Embabel is not just calling methods—it's planning against typed state.

Part IV — Conclusion and extensions

This application intentionally starts simple. It’s a demo designed to be reproduced in minutes.

However, the Embabel structure is already useful because it’s an orchestrator. Extending the system becomes a matter of adding actions and (optionally) conditions, not rewriting a monolithic service method.

Here are extensions that make the demo evolve naturally:

1) Disambiguation

Instead of limit=1, fetch the top N hits and add an action to pick the best match. For example:

exact label match
description keyword match
“instance of” filtering (person vs concept vs product)
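
A minimal sketch of the first two rules, assuming the search step returns several hits. `Disambiguator`, `Hit`, `pickBest`, and the `hint` parameter are hypothetical names, and the ids in the usage example are placeholders rather than real Q-ids. The "instance of" filter is left out because it needs extra claim data from Wikidata.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Sketch of a disambiguation step. All names here are hypothetical.
class Disambiguator {

    record Hit(String id, String label, String description) {}

    // Prefer an exact (case-insensitive) label match, then a description that
    // mentions the hint keyword, and finally fall back to the first hit.
    static Optional<Hit> pickBest(List<Hit> hits, String term, String hint) {
        String wanted = term.toLowerCase(Locale.ROOT);
        String keyword = hint == null ? "" : hint.toLowerCase(Locale.ROOT);

        Optional<Hit> exact = hits.stream()
                .filter(h -> h.label() != null
                        && h.label().toLowerCase(Locale.ROOT).equals(wanted))
                .findFirst();
        if (exact.isPresent()) {
            return exact;
        }

        Optional<Hit> byKeyword = hits.stream()
                .filter(h -> !keyword.isEmpty() && h.description() != null
                        && h.description().toLowerCase(Locale.ROOT).contains(keyword))
                .findFirst();
        if (byKeyword.isPresent()) {
            return byKeyword;
        }

        return hits.stream().findFirst();
    }

    public static void main(String[] args) {
        // Placeholder ids, not real Wikidata Q-ids.
        List<Hit> hits = List.of(
                new Hit("Q_WRITER", "Franz Kafka", "writer"),
                new Hit("Q_PLATFORM", "Apache Kafka", "stream processing platform"));
        System.out.println(pickBest(hits, "kafka", "platform").map(Hit::id).orElse("none"));
    }
}
```

In the agent, this would become its own action between the search and the detail fetch, taking the list of hits and producing the WikidataEntityId.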

2) Multi-language

Add lang to DefinitionRequest and propagate it into:

wbsearchentities&language=...
selecting labels/descriptions by language
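
One possible shape for this, as a sketch: `LocalizedRequest` is a hypothetical variant of DefinitionRequest that carries a language code, and `pick` is a hypothetical helper that falls back to English when the requested language is missing. The fallback rule is my assumption, not something Wikidata prescribes.

```java
import java.util.Map;

// Sketch of language handling. All names here are hypothetical.
class LanguageSupport {

    // Variant of the request record with a language code; defaults to "en".
    record LocalizedRequest(String term, String lang) {
        LocalizedRequest {
            if (lang == null || lang.isBlank()) {
                lang = "en"; // default keeps existing callers working
            }
        }
    }

    // Pick the value for the requested language, falling back to English.
    static String pick(Map<String, String> valuesByLang, String lang) {
        if (valuesByLang == null || lang == null) {
            return null;
        }
        String value = valuesByLang.get(lang);
        return value != null ? value : valuesByLang.get("en");
    }

    public static void main(String[] args) {
        LocalizedRequest request = new LocalizedRequest("kafka", null);
        Map<String, String> labels = Map.of("en", "hello", "fr", "bonjour");
        System.out.println(request.lang() + " -> " + pick(labels, request.lang()));
    }
}
```

The same language code would then be passed to the `language` query parameter of the search call.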

3) Confidence score

Add a ConfidenceScore record and an action that computes a score based on:

match quality
label similarity
number of aliases
presence of sitelinks

Return it to consumers to make the API safer to use.
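
A rough sketch of such a scoring action follows. The weights are arbitrary values I picked for illustration, not tuned or validated, and `Confidence`, `ConfidenceScore`, and `score` are hypothetical names. The score is an integer percentage to keep the arithmetic exact.

```java
import java.util.Locale;

// Sketch of a confidence score. Weights are arbitrary illustration values.
class Confidence {

    // Hypothetical result type: 0 (no confidence) to 100 (very confident).
    record ConfidenceScore(int percent) {}

    // Assumes term is non-null; label may be null when Wikidata has none.
    static ConfidenceScore score(String term, String label, int aliasCount, boolean hasSitelink) {
        int points = 0;
        if (label != null) {
            String l = label.toLowerCase(Locale.ROOT);
            String t = term.toLowerCase(Locale.ROOT);
            if (l.equals(t)) {
                points += 60;                         // exact label match
            } else if (l.contains(t)) {
                points += 30;                         // partial label match
            }
        }
        points += Math.min(Math.max(aliasCount, 0), 5) * 4; // up to 20 for aliases
        if (hasSitelink) {
            points += 20;                             // has a Wikipedia article
        }
        return new ConfidenceScore(Math.min(100, points));
    }

    public static void main(String[] args) {
        System.out.println(score("kafka", "Apache Kafka", 2, true));
    }
}
```

With these weights, a partial label match with two aliases and a sitelink scores 30 + 8 + 20 = 58.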

4) Caching and rate limiting

Add an action that checks a cache before querying Wikidata. This is a classic production step and it fits nicely as an independent action.
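
As a starting point, here is a small in-memory cache with a time-to-live, written with the JDK only. `TtlCache` is a hypothetical helper, not part of the demo repository. Note that the loader runs while the map entry is locked, which is acceptable for a sketch but worth revisiting before putting slow remote calls behind it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a small in-memory cache with a time-to-live. Hypothetical helper.
class TtlCache<K, V> {

    private record Entry<V>(V value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached value if it is still fresh; otherwise calls the
    // loader, stores the result with a new expiry time, and returns it.
    V get(K key, Function<K, V> loader) {
        long now = System.currentTimeMillis();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAtMillis() > now)
                        ? old
                        : new Entry<>(loader.apply(k), now + ttlMillis));
        return entry.value();
    }

    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(60_000);
        System.out.println(cache.get("kafka", term -> "loaded " + term));
        System.out.println(cache.get("kafka", term -> "loaded again " + term));
    }
}
```

In the agent, the cache lookup could wrap the repository calls inside the existing actions, or live in a separate action that produces the WikidataEntityDetails directly on a hit.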

5) Multi-source enrichment

Add an alternative source for definitions:

DBpedia
Wikipedia summary API
internal enterprise knowledge base

Embabel becomes more valuable as the number of sources increases, because orchestration becomes a first-class concept.
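
One simple way to model several sources is a common interface plus a "first answer wins" loop, sketched below. `DefinitionSource`, `Sourced`, `firstAnswer`, and the `stub` factory are hypothetical names; the stubs stand in for real clients and make no network calls.

```java
import java.util.List;
import java.util.Optional;

// Sketch of multi-source lookup with fallback. All names are hypothetical.
class MultiSource {

    // Each source may or may not know the term.
    interface DefinitionSource {
        String name();
        Optional<String> define(String term);
    }

    // A definition together with the name of the source that supplied it.
    record Sourced(String source, String definition) {}

    // Ask each source in priority order and return the first answer.
    static Optional<Sourced> firstAnswer(List<DefinitionSource> sources, String term) {
        for (DefinitionSource source : sources) {
            Optional<String> answer = source.define(term);
            if (answer.isPresent()) {
                return Optional.of(new Sourced(source.name(), answer.get()));
            }
        }
        return Optional.empty();
    }

    // Stub source for demos and tests: always returns the same answer (or none).
    static DefinitionSource stub(String name, String answerOrNull) {
        return new DefinitionSource() {
            public String name() { return name; }
            public Optional<String> define(String term) {
                return Optional.ofNullable(answerOrNull);
            }
        };
    }

    public static void main(String[] args) {
        List<DefinitionSource> sources = List.of(
                stub("primary", null),
                stub("secondary", "a placeholder definition"));
        System.out.println(firstAnswer(sources, "kafka"));
    }
}
```

With an agent framework, each source would more naturally be its own action, and the fallback order would come from planning rather than from a hand-written loop.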

6) Optional LLM post-processing (when needed)

A good, minimal LLM use case is last-mile text rewriting:

convert the Wikidata description into a more “dictionary-like” sentence
add examples
translate to French
generate a short TL;DR

This keeps the retrieval deterministic and makes the LLM optional, which is often a safer architecture.

Repo: https://github.com/vinny59200/embabel

Udemy Spring Certification Practice Course: https://www.udemy.com/course/spring-professional-certification-6-full-tests-2v0-7222-a/?referralCode=04B6ED315B27753236AC

Study Guide For Spring: https://spring-book.mystrikingly.com

The post No Keys, No LLM: Building a Wikidata Definition API with Embabel appeared first on foojay.

Shaping Jakarta Agentic AI Together – Watch the Open Conversation

Dominika Tasarz — Mon, 02 Mar 2026 11:15:46 +0000

Table of Contents

What is Jakarta Agentic AI?
What we discussed in the session
Why this matters for the Jakarta ecosystem
Watch the recording and get involved

Last week, the Eclipse Foundation and Payara hosted "Jakarta Agentic AI, An Open Conversation", an open-house Jakarta TechTalk session exploring a brand-new initiative under the Eclipse Foundation. If you could not join us live, the full recording is now available.

What is Jakarta Agentic AI?

Jakarta Agentic AI is an exploratory project looking at how AI agents could be built, deployed and run within Jakarta EE runtimes. As AI systems increasingly move from simple inference to autonomous, agent-based behaviour, the question becomes how these systems fit into enterprise Java environments that value reliability, security, and portability.

Find Jakarta Agentic AI on GitHub

What we discussed in the session

During the conversation, panel members actively involved in the project – Reza Rahman (Jakarta EE Ambassadors, Payara), Tanja Obradovic (Eclipse Foundation), Mike Redlich (Garden State JUG, InfoQ), Luis Neto (Payara) & Dominika Tasarz (Payara) – covered topics including:

What agentic AI means in the context of enterprise Java
Why Jakarta EE is a strong foundation for experimenting with agent-based systems
The early goals and design principles guiding the project
How openness, flexibility and community input are being prioritised from day one
Where feedback and contributions are most valuable right now

The discussion reflects a project at a very early stage, focused on learning, collaboration and shared exploration rather than predefined outcomes.

Why this matters for the Jakarta ecosystem

Jakarta EE has long provided a stable, open platform for enterprise Java applications. As AI-driven systems become more autonomous and more integrated into business workflows, it is important that Jakarta remains an active participant in that evolution.

Jakarta Agentic AI is one way the community can explore how emerging AI patterns align with existing enterprise concerns such as:

Portability across runtimes and vendors
Security, governance and observability
Integration with existing Jakarta EE applications and architectures

Watch the recording and get involved

If you are interested in the future of Jakarta EE, enterprise Java or agent-based AI systems, the recording is a great place to start. You will hear directly from the people shaping the project and get a clear sense of where input from the community can make a real difference.

This initiative is open, early and very much a work in progress. Your questions, ideas and concerns are not just welcome, they are essential!

Watch the Recording: DIY Technical Marketing for Java Developers

Dominika Tasarz — Thu, 26 Feb 2026 10:28:58 +0000

The software development industry is more competitive than ever. Being a strong technical expert is essential, but on its own it is often not enough to grow your career or open new opportunities.

In the short, practical talk "DIY Technical Marketing, Real World Tips For Building A Successful Developer Brand", delivered at the Jfokus 2026 conference, Payara Community Manager Dominika Tasarz-Sochacka explores why personal branding matters for developers and how even small, intentional actions can make a real difference over time.

Drawing on real world examples from years of working with Java developers, the session focuses on three core ideas:

• understanding the value of personal branding in the tech industry
• identifying what makes you unique and how to communicate it clearly
• taking simple first steps to build visibility without turning it into a full time job

This talk is designed to be approachable and realistic, especially for developers who want to focus on building great software while still investing in their long term career growth.

JC-AI Newsletter #13

Miro Wengner — Thu, 05 Feb 2026 21:12:12 +0000

Two weeks have passed, and it is time to present a new collection of readings that may shape how artificial intelligence is developed, used, and thought about in 2026.

While significant activity characterizes the AI field, many unresolved research, design, and implementation challenges continue to impact progress. Future advancement depends heavily on understanding the nature of these challenges to approach probabilistic problems from the appropriate directions. This JC-AI newsletter features insightful interviews with key figures in the field, enabling readers to ask the right questions and compare visions of an 'uncertain future' against current capabilities to maintain a grounded perspective.

article: Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)
authors: Saurav Prateek
date: 2026-01-28
desc.: This paper introduces Deep Researcher, a novel architecture that shifts the paradigm from latency-optimized parallel scaling to an accuracy-driven sequential refinement model. Within the development of Deep Research Agents (DRAs), two primary paradigms are considered, Parallel Scaling and Sequential Refinement. The Deep Researcher agent achieved an overall score of 46.21 on the Research Bench, demonstrating superior performance compared to existing agents, including Claude Researcher, Nvidia AIQ Research Assistant, Perplexity Research, Kimi Researcher, and Grok Deep Search. While these improvements are good, the field requires further research to address remaining challenges.
category: research

article: Manipulation in Prediction Markets: An Agent-based Modeling Experiment
authors: Bridget Smart, Ebba Mark, Anne Bastian, Josefina Waugh (University of Oxford)
date: 2026-01-28
desc.: The paper investigates the utilization of agentic systems in the economic field and their impact on prediction. First, the paper evaluates an agent-based model of a prediction market in which bettors with heterogeneous expertise, noisy private information, variable learning rates, and budgets observe the evolution of public opinion on a binary election outcome to inform their betting strategies in the market. The agentic system exhibits stability across experiments. The second area relates to experiments on how "whale" agents, a highly resourced minority with biased information, may distort market prices and for how long. The paper discusses interesting simulation results on how biased information may change the market from a long-term perspective.
category: research

article: Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents
authors: Qihao Wang, Yue Hu, Mingzhe Lu, Jiayue Wu, Yanbing Liu, Yuanmin Tang
date: 2026-01-28
desc.: While LLMs' ability to use external tools enables powerful real-world applications, current benchmarks focus on final accuracy rather than revealing the cognitive bottlenecks that limit their true capabilities. This paper presents a framework based on Cognitive Load Theory that aims to decompose tasks into two components: Intrinsic Load and Extraneous Load. The paper discusses performance inconsistencies as cognitive load increases, and demonstrates how the proposed framework enables the identification of capability boundaries in the examined examples.
category: research

article: Build a Prompt Learning Loop - SallyAnn DeLucia & Fuad Ali, Arize
authors: AI Engineer, SallyAnn DeLucia, Fuad Ali (Arize)
date: 2026-01-06
desc.: This talk aims to provide ideas on how it is possible to improve LLM responses by using feedback loops. It's important to view this talk through the lens of current research results regarding the LLM hallucination phenomenon and other factors. The main reason to keep current research results in mind is to avoid ending up in an infinite loop of failure/error.
category: youtube

article: Stanford CS230 | Autumn 2025 | Lecture 8: Agents, Prompts, and RAG
authors: Stanford Online
date: 2025-11-11
desc.: For more information about Stanford’s Artificial Intelligence professional and graduate programs
category: youtube, tutorial

article: Developer Experience in the Age of AI Coding Agents – Max Kanat-Alexander, Capital One
authors: AI Engineer, Max Kanat-Alexander
date: 2025-12-23
desc.: It feels like every two weeks, the world of software engineering is being turned on its head. Are there any principles we can rely on that will continue to hold true, and that can help us prepare for the future, no matter what happens? Max uses research, data, and his 20+ years working in enterprise Developer Experience teams to talk through what we can do now that will prepare us for an agentic future, no matter what that future holds.
category: youtube, opinion

article: Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
authors: Yifan Zhu, Huiqiang Rong, Haoran Luo
date: 2026-01-29
desc.: Hallucination is a recognized phenomenon in the LLM field that impacts applications such as Retrieval-Augmented Generation (RAG) and Reward Modeling (RM). This paper introduces Token-Guard, a self-checking mechanism designed to identify and control hallucinations at the token level. The experiments demonstrate improvements.
category: research

article: Reward Models Inherit Value Biases from Pretraining
authors: Brian Christian, Jessica A. F. Thompson, Elle Michelle Yang, Vincent Adam, Hannah Rose Kirk and others (University of Oxford, University Pompeu Fabra)
date: 2026-01-28
desc.: Despite their importance in LLM alignment, reward models (RMs) remain under-researched. This paper provides evidence that RMs inherit biases from their base models, suggesting that the choice of an open-source model is a reflection of values as much as performance. The paper discusses limitations of experiments and offers avenues for future research.
category: research

article: Professor Geoffrey Hinton - AI and Our Future
authors: City of Hobart, Geoffrey Hinton
date: 2026-01-08
desc.: Professor Geoffrey Hinton, known as the "Godfather of AI", will discuss artificial intelligence - how it works, the risks it poses to our society, and how we might coexist with super-intelligent AI. Ideal for business leaders, creatives, researchers, educators, students and anyone curious about the future of intelligence and society.
category: opinion

article: Your MCP Server is Bad (and you should feel bad) - Jeremiah Lowin, Prefect
authors: AI Engineer, Jeremiah Lowin
date: 2026-01-12
desc.: Too many MCP servers are simply glorified REST wrappers, regurgitating APIs that were designed for SDKs rather than agents. This leads to confused LLMs, wasted tokens, and demonstrably poor performance. If you have ever pointed an MCP generator at an OpenAPI spec and called it a day, this talk is your wake-up call.
category: youtube

article: Frontier Models & AI | Sam Altman, CEO & Co-Founder, OpenAI
authors: Cisco
date: 2026-02-04
desc.: Although Sam Altman, CEO and Co-Founder of @OpenAI, explores ideas about future possibilities and potential developments, he is asked during the interview to align his vision with the current state of research and existing technological capabilities. The interview, however, does not present clear data demonstrating how Codex outperforms alternatives or what 'better' specifically means in this context. The responses to questions may appear to be non-deterministic in nature. The interview relies heavily on thoughts about an "undefined future" that would require a deterministically defined foundation. It is interesting how the interview examined frontier AI models and their implications for economies, institutions, and global systems.
category: opinion

article: How to build secure and scalable remote MCP servers
authors: Den Delimarsky (Microsoft)
date: 2025-07-25
desc.: The tutorial provides insights into how to build a reliable Model Context Protocol (MCP) server, enabling AI agents to connect to external tools. It covers several crucial areas and provides valuable resources and ideas for tackling the challenge.
category: tutorial

The post JC-AI Newsletter #13 appeared first on foojay.

First Experiments with Java on the LattePanda IOTA: An Alternative to Raspberry Pi?

Frank Delporte — Thu, 11 Dec 2025 09:13:14 +0000

Table of Contents

Unboxing the LattePanda IOTA
Assembly
Setting Up The Board

First Boot: Windows Pre-installed
Installing Ubuntu
Setting Up Java Development

Testing Java, JavaFX, and Pi4J

HelloWorld with JBang
JavaFX Test
Pi4J Test
Performance Check

Conclusion

After years of experimenting with Raspberry Pi boards, Java, JavaFX, and Pi4J to control electronics, I wanted to explore whether my knowledge and experience could be applied to similar boards from other providers. There are many alternatives available these days, based on ARM, Intel processors, and RISC-V architectures.

I reached out to several suppliers to see if I could get evaluation copies, and I'm happy to share that I received my first box from DFRobot containing the LattePanda IOTA.

Unboxing the LattePanda IOTA

The box contained multiple smaller boxes, but the most important one was the LattePanda IOTA board itself, based on an Intel Twin Lake N150 quad-core processor (up to 3.6GHz). It has a clear warning on the packaging: "Do not operate without a heatsink". I guess this thing will definitely get hot if you ignore that warning.

The board is a bit bigger than a Raspberry Pi and appears very well-made. It has:

A GPIO header (similar to Raspberry Pi, though the pin numbering is different)
Network connection
Connections for storage options and other expansions
Three USB ports
A full-size HDMI connector (more convenient than the mini or micro HDMI on Raspberry Pi)

In the same box, I also received:

M2 expansion board: for extra storage
Active cooler: essential to prevent overheating
UPS hat: for battery backup functionality
Power over Ethernet shield: handy, will test later
4G LTE module with SIM card support

The cooling fan has a nice logo and excellent build quality. The PoE shield connects directly to a new network connector on the board, unlike Raspberry Pi expansion boards that use the Pi's existing network connection.

Assembly

Following the documentation, I applied thermal paste to the processor, attached the cooling fan, and connected the M2 expansion board.

Setting Up The Board

First Boot: Windows Pre-installed

Once I found the power button, the LattePanda logo appeared on screen, and... Windows started booting. Windows was pre-installed, though I'm not sure if this is default or just for evaluation units. Either way, I immediately noticed 100% CPU usage, which is exactly why I left Windows long ago; I never understood why this remains an ongoing problem with Windows. Memory usage was also pretty high.

This thing definitely works with Windows, but I don't use Windows myself. Time to turn this into a Linux device.

Installing Ubuntu

I put the latest Ubuntu system on a USB stick to boot from it, restarted the device, and kept pressing the delete button to enter the BIOS. The system recognized the USB drive immediately. After selecting it and choosing "Save and exit", it booted into Ubuntu installation mode. A few configuration steps later, I had a nice combination: LattePanda running Ubuntu.

Setting Up Java Development

As expected, Java isn't pre-installed in Ubuntu, but several installation options were suggested. However, there's an easier way to prepare a Linux embedded board like this or a Raspberry Pi for Java development: the Pi4J OS repository.

This repository contains scripts to set up boards for Java development, making it easy to have everything prepared and ready to start. There are two scripts available:

One for Raspberry Pi
One for non-Raspberry Pi boards

Using the second option, curl downloads and executes the script for non-Raspberry Pi boards with the following command:

curl -sL https://raw.githubusercontent.com/Pi4J/pi4j-os/main/script/prepare-for-java-non-rpi.sh | bash

This performs:

System update
Installation of extra dependencies for Java and I2C
SDKMAN installation
Java installation
Maven installation
JBang installation

I also installed Visual Studio Code, the preferred Java editor for this kind of board because it's lightweight and has excellent extensions for Java and JavaFX applications. These are the recommended extensions for Java development:

Extension Pack for Java: Installs many tools for Java development
JBang: To execute JBang code directly from VS Code

Testing Java, JavaFX, and Pi4J

I cloned the Pi4J JBang examples project and opened it in Visual Studio Code, to execute code in an easy way.

HelloWorld with JBang

The simple "Hello World" example ran perfectly. There's also an extended example using the Jackson library for JSON parsing, demonstrating how JBang can create single-file applications with dependencies, without needing a full Maven or Gradle project.

JavaFX Test

Since I installed the Java version from Azul with JavaFX included, I could also run a JavaFX demo application. It uses Pi4J to detect the board type, though this only contains methods to detect Raspberry Pi board versions at this moment, so it didn't recognize the LattePanda.

But the application ran smoothly! It showed we're running on a Linux 64-bit system with Java 25. As expected, the board wasn't recognized yet; maybe in the future we can add detection tools to the Pi4J library to show the brand or manufacturer information.
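
The kind of runtime information the demo displays (operating system, architecture, Java version) can also be read from standard JDK system properties. The following is a minimal sketch, not the demo's actual code; the class name RuntimeInfoSketch is hypothetical, and it does not use Pi4J or JavaFX at all.

```java
final class RuntimeInfoSketch {

    // Builds a one-line summary from standard JDK system properties.
    static String describe() {
        return String.format("OS: %s (%s), Java: %s",
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                System.getProperty("java.version"));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Saved as RuntimeInfoSketch.java, this should run directly with `java RuntimeInfoSketch.java`, since recent JDKs can launch single-file programs without a separate compile step.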

Without any extra work, we have a JavaFX application running very smoothly on this board!

Pi4J Test

Now for the fun part: let's see what happens when we run something Pi4J-specific. I tried a project that uses an RGB-LED and changes colors. It compiled, but gave errors about user groups not being configured correctly. This was expected: I've never tried Pi4J on a non-Raspberry Pi single-board computer before, so I wasn't expecting it to work on the first attempt.

This is something I'll dive into further, and I'll post follow-up videos about what can be achieved with the Pi4J library on boards like this.

Performance Check

With htop, I checked the CPU usage. Compared to Windows using 100% CPU, the board uses almost nothing in an idle state. There's a lot of room for applications we can run on this board. Great!!!

Conclusion

This was the first quick test, and it only took me about an hour to unbox everything, assemble it, and record this. Very promising results:

Java runs perfectly
JavaFX runs very smoothly
Pi4J not working yet, but that was expected

The next step will be to determine which configuration changes are needed, either at the system level or within Pi4J itself. I'm very happy with this first result. The LattePanda IOTA is a very good-looking board, well-made, and comes with a good fan. You don't hear it running during normal usage. It only ramps up when you start demanding applications.

Promising results! I'm looking forward to experimenting more with this and similar boards to see what's possible with Java(FX) and Pi4J on alternative hardware platforms.

Stay tuned for follow-up videos and blog posts!

The post First Experiments with Java on the LattePanda IOTA: An Alternative to Raspberry Pi? appeared first on foojay.

JC-AI Newsletter #11

Miro Wengner — Tue, 09 Dec 2025 16:12:01 +0000

Fourteen days have passed, and it is time to present a fresh collection of readings that could influence developments in the field of artificial intelligence.

This newsletter explores the evolution of agentic AI systems, provides valuable insights into the Chain-of-Thought (CoT) approach, Vibe coding, and discusses the pattern-matching capabilities of LLMs. The newsletter features an insightful interview with Stuart J. Russell, known for his significant contributions to the AI field. Even more exciting is the published paper by Apple researchers titled 'The Illusion of Thinking...' and several immediate reactions to the authors' conclusions, which allow newsletter readers to observe current research challenges and scientific community responses. This provides readers with a vital picture of the state-of-the-art in AI research.

article: AI Expert: (Warning) 2030 Might Be The Point Of No Return! We've Been Lied To About AI!
authors: The Diary Of A CEO
date: 2025-12-04
desc.: AI Expert Stuart J. Russell exposes the trillion-dollar AI race, why governments won’t regulate, how AGI could replace humans by 2030, and why only a nuclear-level AI catastrophe will wake us up. NOTE: During the interview, a crucial question arises: If you had a 'red button' that could erase all current AI-LLM development, would you press it? Hear the answer, with reasons.
category: youtube, interview

article: Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
authors: Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang and others
date: 2025-08-02, revisited 2025-08-13
desc.: The aim of the Chain-of-Thought (CoT) approach is to produce human-like reasoning steps, but this may be more superficial than it appears. This paper studies CoT using data distribution analysis to enable observation of reasoning paths. For this purpose, the DataAlchemy environment has been designed. Systematic validation reveals that CoT exhibits sharp performance degradation when detecting unknown patterns.
category: research

article: The BS-meter: A ChatGPT-Trained Instrument to Detect Sloppy Language-Games
authors: Alessandro Trevisan, Harry Giddens, Sarah Dillon, Alan F. Blackwell (Cambridge)
date: 2024-11-22, revisited 2025-06-10
desc.: Using hypothesis-testing methods, this paper demonstrates that a statistical model of sloppy language can reliably relate the artificial output of ChatGPT to the social and workplace functions of bullshit as observed in natural human language. The paper presents an empirical investigation of LLM behavior that offers insights into language use while clarifying the social and epistemological status of LLMs themselves. The results indicate with high significance that ChatGPT's outputs resemble bullshit jobs rather than precise, factual scientific writing. While this is often evident from observing its outputs, the mechanisms by which such imprecise language is produced have not been previously established.
category: research

article: Comparison of Text-Based and Image-Based Retrieval in Multimodal Retrieval Augmented Generation Large Language Model Systems
authors: Elias Lumer, Alex Cardenas, Matt Melich, Myles Mason, Sara Dieter and others (PricewaterhouseCoopers)
date: 2025-11-25
desc.: This paper discusses the capabilities of Retrieval-Augmented Generation (RAG) systems to access multimodal knowledge bases containing both text and visual information, such as charts, for information extraction. The paper reveals limitations, such as contextual loss, and presents a novel RAG analysis approach for comparing embedding creation methods. The paper analyzes the most suitable approaches for storing embeddings that incorporate both text and visual information.
category: research

article: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
authors: Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton and others (Apple)
date: 2025-07-07
desc.: This paper discusses the progress of language models in generating detailed reasoning processes (Chain-of-Thought) prior to producing answers, and their improved benchmark performance. However, the paper argues, supported by empirical evidence, that their fundamental capabilities, scaling properties, and limitations remain poorly understood. The paper systematically reveals the limitations related to task complexity and provides directions for future research.
category: research

article: Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
authors: A. Lawsen
date: 2025-07-10
desc.: This paper responds to "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity." It presents an alternative perspective, aiming to recontextualize the original findings while identifying three potential critical issues in the original paper: (1) Tower of Hanoi experiments risk exceeding model output token limits, (2) limitations of the automated evaluation framework employed, and (3) benchmark constraints. Nevertheless, the paper acknowledges that the original findings underscore the importance of rigorous experimental design when evaluating AI reasoning capabilities.
category: research

article: A Comment On The Illusion of Thinking: Reframing the Reasoning Cliff as an Agentic Gap
authors: Sheraz Khan, Subha Madhavan, Kannan Natarajan (Pfizer, Cambridge)
date: 2025-07-25
desc.: While the paper acknowledges the results provided by "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," it aims to present an alternative perspective. The paper argues that the observed failures in Chain-of-Thought reasoning do not constitute evidence of a fundamental cognitive boundary, but rather represent predictable outcomes of various system-level constraints. The paper concludes that "The Illusion of Thinking" provides a valuable contribution by developing a rigorous benchmark and demonstrating that explicit Chain-of-Thought in models such as DeepSeek-R1 and Claude 3.7 Sonnet-Thinking does not guarantee reliable execution of long plans. However, it contends that the conclusion regarding an intrinsic reasoning frontier is premature.
category: research

article: LLMs’ 'simulated reasoning' abilities are a 'brittle mirage', researchers find
authors: Kyle Orland (arstechnica)
date: 2025-08-11
desc.: Over recent months, LLMs have demonstrated capabilities in pattern matching across both structured and unstructured data. This article examines whether the responses generated by agentic systems can be considered equivalent to the logical reasoning observed in human thought processes. The presented data and cited sources raise questions about such capabilities, including concerns regarding the systems' understanding of their own generated responses. The article includes the following sections: "No One Trained Me for This," "A False Aura of Dependability," and discussions of cautionary findings about "chain-of-thought" approaches, with supporting references.
category: research

article: Detecting Perspective Shifts in Multi-agent Systems
authors: Eric Bridgeford, Hayden Helm
date: 2025-12-04
desc.: Let us model a situation where data-scrapers access the internet, databases, or other LLMs and, based on collected data, generate or serve decision proposals. This paper introduces the Temporal Data Kernel Perspective Space (TDKPS) approach, which aims to detect agent behavioral changes in black-box settings. The paper discusses limitations and future research proposals.
category: research

article: Strategic Self-Improvement for Competitive Agents in AI Labour Markets
authors: Christopher Chiu, Simpson Zhang, Mihaela van der Schaar (University of Cambridge)
date: 2025-12-04
desc.: The paper introduces a novel framework that captures the real-world simulation of economic forces that may shape agentic labor markets in comparison with traditional human labor markets. Although agentic labor markets will differ significantly from their human counterparts, this paper identifies critical economic forces and capabilities required by agentic systems: metacognition, competitive awareness, and long-horizon strategic planning. Despite reported limitations, self-improving agents have demonstrated superior performance compared to other agent types (e.g., CoT, ReAct).
category: research

article: Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
authors: Songwen Zhao, Danqing Wang and others
date: 2025-12-02
desc.: In recent months, the developer community has witnessed a rapid increase in the adoption of the "Vibe Coding" programming paradigm. "Vibe coding" practices are widely used, predominantly by beginner developers, despite unresolved concerns regarding associated risks and vulnerabilities. The paper reports that although coding agents may achieve approximately 60% solution success rates, only approximately 10% of these solutions are free from known security issues, with the possibility of introducing undocumented attack vectors remaining a significant concern.
category: research

article: Mathematical Framing for Different Agent Strategies
authors: Philip Stephens, Emmanuel Salawu (Google Cloud AI)
date: 2025-12-05
desc.: The paper introduces a probabilistic framework for comparing diverse AI agent strategies, allowing for a more detailed view of outcomes. The paper discusses the trade-offs of various architectures while highlighting the necessity of mathematical evaluation. The paper establishes that the behavior of any agentic system may be understood as a probabilistic process by framing individual agent behavior as a chain of probabilities. The paper does not question the non-deterministic nature of LLMs themselves, but rather aims to establish a "Degrees of Freedom" concept for agents that takes probability into account.
category: research

The post JC-AI Newsletter #11 appeared first on foojay.

JC-AI Newsletter #10

Miro Wengner — Wed, 26 Nov 2025 18:39:40 +0000

Fourteen days have passed, and it is time to present a fresh collection of readings that could influence developments in the field of artificial intelligence.

This newsletter focuses on how agentic AI systems improve accuracy, tutorials on agentic system architecture, and important security challenges arising from the increased adoption of AI systems, agentic and otherwise. This edition of the AI newsletter includes compelling discussions and interviews about the future of AI and approaches.

article: Introducing SWE-grep and SWE-grep-mini: RL for Multi-Turn, Fast Context Retrieval
authors: Ben Pan, Carlo Baronio, Albert Tam, Pietro Marsella and others
date: 2025-10-16
desc.: Modern coding agents face a fundamental trade-off between speed and intelligence. The article presents SWE-grep and SWE-grep-mini, trained fast agentic models specialized in highly parallel context retrieval. These models match the retrieval capabilities of frontier coding models while requiring an order of magnitude less time.
category: research

article: Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
authors: Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Ruisi Cai, Marcin Chochowski and others
date: 2025-11-20
desc.: Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive. Recent work on model compression through pruning and knowledge distillation has reduced this cost, but still requires substantial computational resources, increasing costs per compressed model. This paper presents Nemotron Elastic, the first elastic training framework for reasoning-capable LLMs. While the Nemotron Elastic framework achieves good results, it still has potential for future research. (NVIDIA)
category: research

article: Cognitive Foundations for Reasoning and Their Manifestation in LLMs
authors: Priyanka Kargupta, Shuyue Stella Li, Haocheng Wang, Jinu Lee and others
date: 2025-11-20
desc.: Large language models successfully solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. A meta-analysis of 1,598 LLM reasoning papers reveals that the research community concentrates on easily quantifiable behaviors while neglecting meta-cognitive controls. The paper documents systematic structural differences and proposes connecting cognitive science with research on model capabilities rather than pursuing various shortcuts. However, the presented results leave unclear whether the proposed guidance enables genuine deployment of latent capabilities or simply helps models retrieve cached reasoning patterns from training data.
category: research

article: Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement
authors: Jiashu Yao, Heyan Huang, Shuang Zeng, Chuwei Luo and others
date: 2025-11-20
desc.: In contrast to traditional approaches that reward reasoning processes through reinforcement learning, which can lead to issues such as over-thinking and a focus on irrelevant aspects, the paper presents a Self-Rewriting approach in which a model rewrites its own reasoning text and subsequently learns from the rewritten reasoning to improve its internal thought process quality. The results report improved accuracy of +0.6 alongside 46% shorter reasoning sequences. The article discusses the achieved results and related challenges, including trade-offs compared to standard approaches.
category: research

article: Hiding in the AI Traffic: Abusing MCP for LLM-Powered Agentic Red Teaming
authors: Strahinja Janjusevic, Anna Baron Garcia, Sohrob Kazerounian
date: 2025-11-20
desc.: Today's 'Vibe Coding' approach enables developers to generate code without fully understanding its mechanics, including the orchestration of multi-agent swarms and sophisticated detection evasion strategies. While existing frameworks may use LLMs to issue post-exploitation commands, they often rely on traditional channels. The paper proposes an innovative Command & Control (C2) architecture leveraging the Model Context Protocol (MCP) for coordinating autonomous red teams of agents while addressing stealth and evasion aspects in depth. The article discusses differences between theoretical attack vectors and enterprise environments. Although the approach shows noticeable improvements, it comes with multiple unanswered questions for future research (MIT, Anthropic).
category: research

article: JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
authors: Zhenyu Bi, Gaurav Srivastava, Yang Li, Meng Lu, Swastik Roy and others
date: 2025-11-20
desc.: Although SLMs' ability to judge answers remains underexplored, recent studies show that small language models (SLMs) can perform competitively on reasoning tasks with appropriate prompting or fine-tuning. This paper proposes JudgeBoard, an evaluation pipeline capable of injecting SLMs to improve answer comparisons. Due to the limitations of SLMs, the paper introduces the Multi-Agent Judging (MAJ) framework, which outperforms standard approaches (Chain-of-Thought, etc.) by approximately 2% in accuracy. The paper reveals a significant performance gap in judging capability between SLMs and LLMs while highlighting the importance of multi-stage judging (Amazon).
category: research

article: Multi-Agent LLM Orchestration Achieves Deterministic, High-Quality Decision Support for Incident Response
authors: Philip Drammeh
date: 2025-11-19
desc.: Through multiple trials using a reproducible framework, the paper demonstrates that multi-agent orchestration fundamentally transforms LLM-based incident response quality compared to single-agent, error-prone solutions. The multi-agent response is treated as deterministic while introducing latency; however, speed is not the primary goal, provided it remains within acceptable thresholds. Despite the strong performance of multi-agent systems, multiple challenges remain, including LLM deadlocks, fine-tuning requirements, and latency constraints.
category: research

article: Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings
authors: Xueying Ding, Xingyue Huang, Mingxuan Ju, Liam Collins and others
date: 2025-11-18
desc.: The paper proposes Hierarchical Token Prepending (HTP) to improve causal attention mechanisms by mitigating attention-level compression and introducing mean-pooling, enabling backward information flow that is critical for generating high-quality embeddings. HTP achieves consistent performance, especially in long-context settings. The article addresses future research directions.
category: research

article: Stanford AI Club: Jeff Dean on Important AI Trends
authors: Stanford AI Club
date: 2025-11-24
desc.: Jeff Dean is one of the most influential computer scientists of the modern computing era, best known as Google’s Chief Scientist and a co-founder of Google Brain. His work has shaped the foundations of large-scale distributed systems and modern machine learning—spanning breakthroughs in search infrastructure, deep learning frameworks like TensorFlow, and today’s frontier AI research. The video provides a timeline of basic technologies and approaches currently employed in the AI-LLM field.
category: youtube

article: Elon Musk Makes Shocking Future Predictions At U.S.-Saudi Arabia Forum Alongside Jensen Huang
authors: Forbes Breaking News
date: 2025-11-20
desc.: Elon Musk and Jensen Huang discuss technology at the U.S.-Saudi Arabia Investment Forum in Washington, D.C., offering an interesting perspective on the future. The interview presents a vision free from current societal constraints and structures such as money-based decisions, resource requirements, sustainability of technologies, or long-term impacts that may limit future evolution. The interview does not address crucial contemporary debates.
category: youtube, interview

article: AI Kill Switch for malicious web-based LLM agent
authors: Sechan Lee, Sangdon Park
date: 2025-09-26
desc.: While AI agents improve the ability to handle complex tasks, they simultaneously amplify the risks of malicious misuse, such as unauthorized collection of personally identifiable information (PII). The paper proposes an "AI Kill Switch" technique aimed at immediately identifying and stopping such malicious AI agent behavior. The key idea lies in identifying an effective defense prompt, which shows similarities to the "LLM as a judge" approach, and focuses on "Prompt Injection" and "Jailbreak-based prompt" forms of attacks. The paper discusses limitations such as the absence of real-world test cases and additional challenges.
category: research

article: BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents
authors: Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley and others
date: 2025-11-25
desc.: The integration of artificial intelligence (AI) agents into web browsers introduces security challenges beyond traditional web application threat models. The paper discusses identified attack vectors, such as prompt injection, and their impact within real-world environments, noting the low level of current understanding. The paper proposes a novel benchmark and multi-layer defense mechanism called BrowseSafe. Although the paper presents improvements, the complexity of prompt injection attacks remains an open investigation topic (Perplexity AI).
category: research

The post JC-AI Newsletter #10 appeared first on foojay.

Micrometer & Prometheus in Spring Boot: Kafka Burger Orders🍔📨

Vincent Vauban — Fri, 14 Nov 2025 10:13:03 +0000

Table of Contents

1) Expose a Counter with Tags (Micrometer)
2) REST Controller → Produce to Kafka
3) Kafka Consumer → Count “DukeBurger”
4) Avro Bytes → Object (utility)
References

👨‍💻 GitHub: https://github.com/vinny59200/dukeburger

🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪

TL;DR

This guide shows how to use Micrometer and Prometheus in Spring Boot to track a custom metric for a Kafka-driven Burger Orders app. You’ll post a burger order to a REST endpoint, publish it to Kafka, consume the topic, and increment a counter for all “DukeBurger” orders. Copy the snippets, run, and you’ll see your metric on /actuator/prometheus.

🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪

Why Micrometer and Prometheus?

Micrometer is a vendor-neutral metrics facade. Your code records counters, timers, and gauges once; Micrometer ships those to many backends (Prometheus, Datadog, etc.) via simple registries. Prometheus is a time-series database that pulls metrics by scraping an HTTP endpoint periodically (Spring exposes /actuator/prometheus). (Source: Micrometer Application Observability)

Key ideas:

Micrometer offers a simple API: Counter, Timer, Gauge.
Spring Boot Actuator autoconfigures Micrometer and exposes metrics endpoints, including Prometheus format.
Prometheus “scrapes,” so your app just exposes a text endpoint; no push is needed. (Source: docs.micrometer.io)
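
To make the "text endpoint" idea concrete, here is a hand-rolled sketch of what a counter looks like in the Prometheus text exposition format. It uses only the JDK and is not how Micrometer works internally; the class name PrometheusTextSketch and the metric name demo_orders_total are assumptions made for illustration, and the exact names Micrometer emits for your meters may differ.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

final class PrometheusTextSketch {

    private final String name;
    private final String help;
    private final Map<String, String> labels;
    private final LongAdder count = new LongAdder();

    PrometheusTextSketch(String name, String help, Map<String, String> labels) {
        this.name = name;
        this.help = help;
        this.labels = new TreeMap<>(labels); // sorted for stable output
    }

    void increment() {
        count.increment();
    }

    // Renders the counter the way a scrape response would show it:
    // a HELP line, a TYPE line, then one sample line with labels and value.
    String scrape() {
        String labelText = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return "# HELP " + name + " " + help + "\n"
                + "# TYPE " + name + " counter\n"
                + name + "{" + labelText + "} " + count.sum() + "\n";
    }

    public static void main(String[] args) {
        PrometheusTextSketch counter = new PrometheusTextSketch(
                "demo_orders_total",
                "Hypothetical order counter",
                Map.of("app", "burger-service", "topic", "burger.orders"));
        counter.increment();
        counter.increment();
        System.out.print(counter.scrape());
    }
}
```

The point of the sketch is the shape of the payload: Prometheus periodically fetches plain text like this over HTTP, which is why the application never needs to push anything.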

🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪

What the Burger Orders App Does

Order a burger via HTTP POST /orders?burger=DukeBurger.
Produce an Avro message to Kafka topic burger.orders.
Consume burger.orders with @KafkaListener.
Increment a Micrometer Counter named events_DukeBurger_total whenever the burger is "DukeBurger".
Expose metrics at /actuator/prometheus for Prometheus to scrape.

This pattern is common: REST → Kafka → Consumer → Metric. Spring Kafka makes producing and consuming concise; Micrometer makes metrics easy.
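
The flow can be sketched without any infrastructure. In the following illustration an in-memory queue stands in for the burger.orders topic and an AtomicLong stands in for the Micrometer counter; both substitutions are assumptions made so the example runs on a bare JDK, and the class name BurgerFlowSketch is hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class BurgerFlowSketch {

    // Stand-in for the Kafka topic "burger.orders".
    private final BlockingQueue<String> topic = new LinkedBlockingQueue<>();

    // Stand-in for the Micrometer counter.
    private final AtomicLong dukeBurgerCount = new AtomicLong();

    // Plays the role of the REST endpoint plus the Kafka producer.
    void placeOrder(String burger) {
        topic.add(burger);
    }

    // Plays the role of the consumer: reads every pending order
    // and counts only the "DukeBurger" ones.
    void consumePending() {
        String burger;
        while ((burger = topic.poll()) != null) {
            if ("DukeBurger".equals(burger)) {
                dukeBurgerCount.incrementAndGet();
            }
        }
    }

    long dukeBurgerCount() {
        return dukeBurgerCount.get();
    }

    public static void main(String[] args) {
        BurgerFlowSketch flow = new BurgerFlowSketch();
        flow.placeOrder("DukeBurger");
        flow.placeOrder("CheeseBurger");
        flow.placeOrder("DukeBurger");
        flow.consumePending();
        System.out.println(flow.dukeBurgerCount()); // prints 2
    }
}
```

Producer and consumer only share the queue, which mirrors the decoupling Kafka provides: the counting logic lives entirely on the consumer side.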

🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪

The Data Contract (Avro)

{
  "type": "record",
  "name": "BurgerOrder",
  "namespace": "com.vv.burger",
  "fields": [
    { "name": "burger", "type": "string" },
    { "name": "timestamp", "type": "string" }
  ]
}

Why: A tiny schema keeps the demo clear. Avro gives you compact messages and generated classes.
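
To get a feel for "compact", the sketch below hand-encodes one BurgerOrder using the rules from the Avro specification's binary encoding: a string is a zigzag varint length followed by UTF-8 bytes, and a record is simply its fields in schema order. This is an illustration only, assuming raw binary encoding with no container file header or schema-registry framing; a real application would use the Avro library and the generated class. The class name AvroSizeSketch and the sample timestamp are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class AvroSizeSketch {

    // Avro writes a long as a zigzag-encoded, variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro writes a string as its UTF-8 byte length (a long), then the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is its fields in schema order: no field names, no separators.
    static byte[] encodeBurgerOrder(String burger, String timestamp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, burger);
        writeString(out, timestamp);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeBurgerOrder("DukeBurger", "2025-11-14T10:13:03Z");
        System.out.println(bytes.length + " bytes"); // prints "32 bytes"
    }
}
```

The 32 bytes are the 30 characters of payload plus one length byte per field; the field names live in the schema, not in each message.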

🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪

Hot Spots: Minimal Code You Need

1) Expose a Counter with Tags (Micrometer)

package com.vv.burger.config;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Tags;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MetricsConfig {

    @Bean
    public Counter burgerOrderCounter(MeterRegistry registry) {
        // Common tags for the burger app
        Tags tags = Tags.of(
                Tag.of("app", "burger-service"),
                Tag.of("topic", "burger.orders")
                           );

        return Counter.builder("events_DukeBurger_total")
                      .description("Count of DukeBurger order events processed")
                      .baseUnit("orders")
                      .tags(tags)
                      .register(registry);
    }
}

Side note: We add consistent tags now (app, topic) so you can filter and graph later.

🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪

2) REST Controller → Produce to Kafka

package com.vv.burger.controller;

import com.vv.burger.BurgerOrder;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.nio.charset.StandardCharsets;
import java.time.OffsetDateTime;
import java.util.UUID;

@RestController
@RequestMapping( "/orders" )
public class OrderController {

    private final KafkaTemplate<String, BurgerOrder> kafkaTemplate;
    private final String                             topic;

    public OrderController( KafkaTemplate<String, BurgerOrder> kafkaTemplate,
                            @Value( "${app.kafka.topic}" ) String topic ) {
        this.kafkaTemplate = kafkaTemplate;
        this.topic = topic;
    }

    @PostMapping
    public String sendOrder( @RequestParam String burger ) {
        // 1. Build the Avro payload (BurgerOrder must be a generated Avro class)
        BurgerOrder order = BurgerOrder.newBuilder()
                                       .setBurger( burger )
                                       .setTimestamp( OffsetDateTime.now()
                                                                    .toString() )
                                       .build();

        // 2. Create CloudEvent metadata as headers
        String id = UUID.randomUUID()
                        .toString();
        OffsetDateTime now = OffsetDateTime.now();

        ProducerRecord<String, BurgerOrder> record = new ProducerRecord<>( topic, order );
        record.headers()
              .add( "ce_id", id.getBytes( StandardCharsets.UTF_8 ) );
        record.headers()
              .add( "ce_type", "BurgerOrder".getBytes( StandardCharsets.UTF_8 ) );
        record.headers()
              .add( "ce_source", "http://localhost/orders".getBytes( StandardCharsets.UTF_8 ) );
        record.headers()
              .add( "ce_specversion", "1.0".getBytes( StandardCharsets.UTF_8 ) );
        record.headers()
              .add( "ce_time", now.toString()
                                  .getBytes( StandardCharsets.UTF_8 ) );
        record.headers()
              .add( "ce_subject", "order".getBytes( StandardCharsets.UTF_8 ) );
        record.headers()
              .add( "ce_datacontenttype", "application/avro".getBytes( StandardCharsets.UTF_8 ) );

        // 3. Send the record
        kafkaTemplate.send( record );

        return "✅ Order sent to Kafka: " + burger;
    }
}

Side note: The headers mimic CloudEvents so you can plug into event tooling later. This is optional for the metric.
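The controller repeats the same headers().add(...) call seven times. One possible way to tidy that up is to build the ce_ headers in one place. The sketch below uses only the JDK; CloudEventHeadersSketch and headersFor are hypothetical names, and the header keys and fixed values are copied from the controller above.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the CloudEvents-style header set used by the controller (hypothetical helper). */
final class CloudEventHeadersSketch {

    /** Returns the ce_ headers as UTF-8 bytes, in a stable order. */
    static Map<String, byte[]> headersFor(String id, String type, String source, String time) {
        Map<String, String> text = new LinkedHashMap<>();
        text.put("ce_id", id);
        text.put("ce_type", type);
        text.put("ce_source", source);
        text.put("ce_specversion", "1.0");
        text.put("ce_time", time);
        text.put("ce_subject", "order");
        text.put("ce_datacontenttype", "application/avro");

        // Kafka header values are byte arrays, so encode every value once here.
        Map<String, byte[]> headers = new LinkedHashMap<>();
        text.forEach((key, value) -> headers.put(key, value.getBytes(StandardCharsets.UTF_8)));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = headersFor("demo-id", "BurgerOrder",
                "http://localhost/orders", "2025-01-01T00:00:00Z");
        headers.forEach((key, value) ->
                System.out.println(key + " = " + new String(value, StandardCharsets.UTF_8)));
    }
}
```

In the controller, each entry could then be copied onto the record with record.headers().add(key, value) inside a single loop.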

🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪

3) Kafka Consumer → Count “DukeBurger”

package com.vv.burger.consumer;

import com.vv.burger.BurgerOrder;
import io.cloudevents.CloudEvent;
import io.cloudevents.core.builder.CloudEventBuilder;
import io.micrometer.core.instrument.Counter;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.stereotype.Component;

import java.net.URI;
import java.time.OffsetDateTime;
import java.util.Map;

@Component
public class ConsumerApp {

    // Injects the Counter bean defined in MetricsConfig (events_DukeBurger_total)
    private final Counter burgerOrderCounter;

    public ConsumerApp( final Counter burgerOrderCounter ) {
        this.burgerOrderCounter = burgerOrderCounter;
    }

    @KafkaListener( topics = "burger.orders",
                    groupId = "group1" )
    public void receive( ConsumerRecord<String, BurgerOrder> record ) {
        BurgerOrder order = record.value();

        // Optionally reconstruct CloudEvent from headers
        CloudEvent cloudEvent = CloudEventBuilder.v1()
                                                 .withId( getHeader( record, "ce_id" ) )
                                                 .withType( getHeader( record, "ce_type" ) )
                                                 .withSource( URI.create( getHeader( record, "ce_source" ) ) )
                                                 .withSubject( getHeader( record, "ce_subject" ) )
                                                 .withTime( OffsetDateTime.parse( getHeader( record, "ce_time" ) ) )
                                                 .withDataContentType( getHeader( record, "ce_datacontenttype" ) )
                                                 .withData( "application/avro", order.toString()
                                                                                     .getBytes() ) // optional
                                                 .build();

        System.out.println( "📥 Received order: " + order.getBurger() + " at " + order.getTimestamp() );
        System.out.println( "🧾 CloudEvent type: " + cloudEvent.getType() + ", id: " + cloudEvent.getId() );

        if ( isDukeBurger( order ) ) {
            burgerOrderCounter.increment();
        }
    }

    private boolean isDukeBurger( final BurgerOrder order ) {
        return "DukeBurger".equals( order.getBurger()
                                         .toString() );
    }

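    private String getHeader( ConsumerRecord<String, BurgerOrder> record, String key ) {
        // Decode with the same charset the producer used for the header bytes
        return new String( record.headers()
                                 .lastHeader( key )
                                 .value(), java.nio.charset.StandardCharsets.UTF_8 );
    }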

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, BurgerOrder> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, BurgerOrder> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory( consumerFactory() );
        return factory;
    }

    public ConsumerFactory<String, BurgerOrder> consumerFactory() {
        Map<String, Object> props = Map.of(
                "bootstrap.servers", "kafka:9092",
                "group.id", "group1",
                "key.deserializer", StringDeserializer.class.getName(),
                "value.deserializer", io.confluent.kafka.serializers.KafkaAvroDeserializer.class.getName(),
                "schema.registry.url", "http://schema-registry:8081",
                "specific.avro.reader", true
                                          );

        return new org.springframework.kafka.core.DefaultKafkaConsumerFactory<>( props );
    }
}

Side note: @KafkaListener binds the method to the topic with minimal boilerplate. Keep consumer config small for a first run.
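One thing to watch in the listener: lastHeader(key) returns null when a header is absent, so a message produced without the ce_ headers would make getHeader throw a NullPointerException. A minimal null-safe variant could look like the sketch below. It uses a plain Map as a stand-in for Kafka's headers so it runs with the JDK alone; HeaderReadSketch and header are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Null-safe header lookup over a Map stand-in for Kafka headers (hypothetical helper). */
final class HeaderReadSketch {

    /** Returns the header decoded as UTF-8, or an empty Optional when it is missing. */
    static Optional<String> header(Map<String, byte[]> headers, String key) {
        return Optional.ofNullable(headers.get(key))
                       .map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new HashMap<>();
        headers.put("ce_type", "BurgerOrder".getBytes(StandardCharsets.UTF_8));

        // A present header is decoded; a missing one falls back to a default instead of failing.
        System.out.println(header(headers, "ce_type").orElse("unknown"));
        System.out.println(header(headers, "ce_id").orElse("unknown"));
    }
}
```

Since the CloudEvent reconstruction is optional for the metric, falling back to a default keeps the counter working even for plain messages.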

🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪

4) Avro Bytes → Object (utility)

package com.vv.burger.consumer;

import com.vv.burger.BurgerOrder;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

import java.io.ByteArrayInputStream;

public class AvroUtils {
    public static BurgerOrder fromBytes( byte[] bytes ) {
        try ( ByteArrayInputStream in = new ByteArrayInputStream( bytes ) ) {
            SpecificDatumReader<BurgerOrder> reader = new SpecificDatumReader<>( BurgerOrder.class );
            BinaryDecoder decoder = DecoderFactory.get()
                                                  .binaryDecoder( in, null );
            return reader.read( null, decoder );
        } catch ( Exception e ) {
            throw new RuntimeException( "Failed to deserialize BurgerOrder Avro event", e );
        }
    }
}

Side note: Spring Kafka + Confluent deserializer already returns BurgerOrder, so you rarely need this. It’s useful in tests or when you manually handle bytes.

🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪

Application Properties (essentials)

spring:
  application:
    name: burger-service

# Kafka
app:
  kafka:
    topic: burger.orders

# Actuator + Micrometer Prometheus
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: always

Side note: This exposes /actuator/prometheus so Prometheus can scrape.

🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪

Run & Observe

Build the image: docker build -t my-spring-boot-app:latest .
Run the app: docker-compose up -d
Create the topic: open http://localhost:8080/ui/clusters/local/all-topics/create-new-topic and create a topic named burger.orders
Send a few orders:
- curl -X POST "http://localhost:8080/orders?burger=DukeBurger"
- curl -X POST "http://localhost:8080/orders?burger=Veggie"
- curl -X POST "http://localhost:8080/orders?burger=DukeBurger"
Check metrics: open http://localhost:8080/actuator/prometheus and search for events_DukeBurger_total. You should see it increase after each “DukeBurger” order is consumed.
Check in JMC: connect JMC to your app, open the MBean Browser (left pane), expand the metrics node, and navigate to the counter events_DukeBurger_total. Click it, open the Attributes tab, and read Count. It should increase after each “DukeBurger” order is consumed.
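If you prefer to check the metric from code rather than a browser, a small JDK-only client can fetch the endpoint and keep just the relevant lines. This is a sketch under assumptions: MetricCheckSketch is a hypothetical name, the URL is the one used in the steps above, and it filters on the prefix events_DukeBurger because the exported name may carry extra suffixes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

/** Fetches the Prometheus endpoint and prints the samples of one metric (hypothetical helper). */
final class MetricCheckSketch {

    /** Keeps sample lines starting with the given name, skipping # HELP and # TYPE comments. */
    static List<String> metricLines(String body, String metricNamePrefix) {
        return body.lines()
                   .filter(line -> !line.startsWith("#"))
                   .filter(line -> line.startsWith(metricNamePrefix))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:8080/actuator/prometheus"))
                .GET()
                .build();
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        metricLines(body, "events_DukeBurger").forEach(System.out::println);
    }
}
```

Run it before and after posting a few orders to watch the sample value change.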

🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪

Takeaways

Small steps win: REST → Kafka → Consumer → Metric is a powerful, simple pipeline.
Micrometer first: Write metrics once; swap backends later (Prometheus today, Datadog tomorrow).
Tags matter: Add app and topic tags now. Your future dashboards will thank you.
Avro stays lean: A tiny schema keeps payloads small and generated classes easy to use.
CloudEvents optional: The headers help interoperability but are not required for Micrometer.

🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵

Conclusion

You just wired Micrometer and Prometheus in Spring Boot around a Kafka flow and produced a clean, tagged counter you can graph and alert on. From here, extend the metric set (timers for latency, gauges for queue depth), add dashboards, and create an alert when events_DukeBurger_total stalls or spikes.
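To give a feel for what "stalls or spikes" means for a counter, here is a deliberately simplified sketch that turns two counter readings into a per-second rate. It is not how Prometheus computes rate(), which is more elaborate; CounterRateSketch, its method names, and any thresholds are hypothetical.

```java
/** Simplified per-second rate between two counter readings (hypothetical helper). */
final class CounterRateSketch {

    /**
     * Returns the per-second increase between two readings.
     * A lower current value is treated as a counter reset (for example after a restart).
     */
    static double perSecondRate(double previous, double current, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        double increase = current >= previous ? current - previous : current;
        return increase / elapsedSeconds;
    }

    /** A stall here simply means no increase at all over the window. */
    static boolean stalled(double previous, double current, double elapsedSeconds) {
        return perSecondRate(previous, current, elapsedSeconds) == 0.0;
    }

    /** A spike here means the rate exceeds a threshold you choose. */
    static boolean spiking(double previous, double current, double elapsedSeconds, double maxPerSecond) {
        return perSecondRate(previous, current, elapsedSeconds) > maxPerSecond;
    }

    public static void main(String[] args) {
        System.out.println("rate: " + perSecondRate(10, 40, 15));
        System.out.println("stalled: " + stalled(40, 40, 15));
        System.out.println("spiking: " + spiking(10, 40, 15, 1.0));
    }
}
```

In practice you would express the same idea as an alerting rule in your monitoring stack rather than in application code.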

Recommended Courses (go further with certification)

Java OCP prep (Udemy):
https://www.udemy.com/course/ocp-oracle-certified-professional-java-developer-prep/?referralCode=54114F9AD41F127CB99A
Spring Professional – 6 full tests (Udemy):
https://www.udemy.com/course/spring-professional-certification-6-full-tests-2v0-7222-a/?referralCode=04B6ED315B27753236AC
Spring Certification Book (Leanpub/ Kindle/ Paperback):
https://spring-book.mystrikingly.com/

References

Micrometer docs and Prometheus registry overview (docs.micrometer.io).
Spring Boot Actuator metrics.
@KafkaListener reference.
CloudEvents Java SDK.

The post Micrometer & Prometheus in Spring Boot: Kafka Burger Orders🍔📨 appeared first on foojay.

JC-AI Newsletter #9

Miro Wengner — Wed, 12 Nov 2025 15:21:12 +0000

Fourteen days have passed, and it is time to present a fresh collection of readings that could influence developments in the field of artificial intelligence.

This newsletter focuses on examining how AI enhances productivity through enterprise studies, tutorials, agentic system architecture, GraphRAG, evaluating risk methodologies in agentic systems, and the security challenges arising from increased AI-LLM adoption. This edition of the AI newsletter includes a compelling discussion between six of the most influential leaders in artificial intelligence, along with additional content.

The world influenced by LLMs is changing very quickly. Let's start...

article: The Minds of Modern AI: Jensen Huang, Geoffrey Hinton, Yann LeCun & the AI Vision of the Future
authors: Financial Times Live
date: 2025-11-06
desc.: Six of the most influential figures in AI (Jensen Huang, Yoshua Bengio, Geoffrey Hinton, Fei-Fei Li, Yann LeCun, and Bill Dally) share their vision for the future of the field. Defining a clear future horizon for AI remains a challenging goal. The interviewees appear to grapple with questions regarding concrete AI contributions and the trajectory of progress, avoiding discussion of current challenges while expressing hope that future research will adequately address these issues.
category: youtube

article: GraphRAG: The Marriage of Knowledge Graphs and RAG: Emil Eifrem
authors: Emil Eifrem, AI Engineers
date: 2024-08-28
desc.: Although GraphRAG has made dramatic progress, the fundamentals are sometimes overlooked in favor of introducing additional features. As the saying goes, 'Natural language is most powerful when it can draw from a rich context.' This principle applies equally to both poetry and large language models. Knowledge graphs excel at capturing context, which raises an important question: how can combining knowledge graphs with RAG enhance this capability?
category: youtube

article: GraphRAG: Unlocking LLM discovery on narrative private data
authors: Jonathan Larson, Steven Truitt (Microsoft)
date: 2024-02-13
desc.: A remaining challenge for LLMs is extending their powerful capabilities to solve problems beyond their training data and to achieve comparable results with data the LLM has never encountered. Although the Microsoft Research work on GraphRAG is already somewhat dated given the current pace of LLM development, it remains valuable to understand the fundamentals, rationale, and purpose of GraphRAG. GraphRAG may play an important role in the development of agentic AI systems.
category: research

article: Agentic GraphRAG: Simplifying Retrieval Across Structured & Unstructured Data — Zach Blumenfeld
authors: Zach Blumenfeld, AI Engineers
date: 2025-06-27
desc.: Agentic workflows often become complex, brittle, and difficult to maintain when they need to retrieve and reason across both structured and unstructured data. This talk explores how mapping key information into a knowledge graph can simplify these workflows and improve retrieval quality. The presented example of identifying individuals with similar skills and abilities extracted from CVs provides insight into the practical application of agentic AI systems with GraphRAG.
category: youtube

article: TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems
authors: Ishan Kavathekar, Hemang Jain, Ameya Rathod, Ponnurangam Kumaraguru, Tanuja Ganu
date: 2025-11-07
desc.: Agentic AI systems are increasingly used to collaboratively solve problems. However, the safety and security of these systems remain largely under-explored. Existing benchmarks and datasets predominantly focus on single-agent settings, providing biased results and failing to capture the unique vulnerabilities of multi-agent dynamics and coordination. The paper aims to address a gap related to the safety, security, and various vulnerabilities of multi-agent LLM systems by introducing the Threats and Attacks in Multi-Agent Systems (TAMAS) benchmark. Reported findings show that multi-agent systems are highly vulnerable to adversarial attacks.
category: research

article: ORCHID: Orchestrated Retrieval-Augmented Classification with Human-in-the-Loop Intelligent Decision-Making for High-Risk Property
authors: Maria Mahbub, Vanessa Lama, Sanjay Das, Brian Starks and others
date: 2025-11-07
desc.: High-Risk Property (HRP) classification is critical at U.S. Department of Energy (DOE) sites, where inventories include sensitive and often dual-use equipment. Compliance efforts must track evolving regulations designated by various export control policies to ensure transparent and auditable decisions. Traditional expert-only workflows are time-consuming, prone to backlogs, and struggle to keep pace with shifting regulatory boundaries. The paper introduces ORCHID, a modular agentic system for HRP classification that pairs retrieval-augmented generation (RAG) with human oversight to produce policy-based outputs that can be audited. Although ORCHID enhances classification reliability, transparency, and reproducibility through evidence-based, policy-aware decision-making, it comes with several limiting factors: the precision and validity of source documents, ambiguity in decision-making processes, the requirement for qualified reviewers, and other constraints.
category: research

article: Multi-Agent Craftax: Benchmarking Open-Ended Multi-Agent Reinforcement Learning at the Hyperscale
authors: Bassel Al Omari, Michael Matthews, Alexander Rutherford, Jakob Nicolaus Foerster
date: 2025-11-07
desc.: Through analytical examination, the paper demonstrates that existing algorithms struggle with key challenges in this benchmark, including long-horizon credit assignment, exploration, and cooperation, and argues for its potential to drive long-term research in multi-agent reinforcement learning (MARL). MARL extends the reinforcement learning paradigm to the co-learning of multiple agents simultaneously. The paper introduces Craftax-MA and its extension Craftax-Coop, a multi-agent extension of the hardware-accelerated Craftax benchmark. The obtained results were limited by small agent populations, and future research directions are proposed.
category: research

article: StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering
authors: Tengjun Ni, Xin Yuan, Shenghong Li, Kai Wu, Ren Ping Liu, Wei Ni, Wenjie Zhang
date: 2025-10-03
desc.: The paper addresses challenges of commonly used approaches that rely on static or ad hoc expansions of knowledge graphs. The paper introduces the StepChain GraphRAG framework, which combines question decomposition and BFS-RF (breadth-first search reasoning flow) with dynamic graph maintenance. This pipeline dynamically inserts new evidence at each sub-question, refining the knowledge graph in real time. The result is a more transparent, debuggable process for multi-hop question answering that fully exploits both text-based retrieval and graph-structured insights.
category: research

article: RAG Meets Temporal Graphs: Time-Sensitive Modeling and Retrieval for Evolving Knowledge
authors: Jiale Han, Austin Cheung, Yubai Wei and others
date: 2025-10-15
desc.: While Retrieval-Augmented Generation (RAG) systems enrich LLMs with external knowledge, they largely ignore temporal dynamics, which raises two challenges for RAG systems. First, current RAG methods lack effective time-aware representations. The same facts at different time points are difficult to distinguish using vector embeddings or conventional knowledge graphs. Second, most RAG evaluations assume a static corpus, leaving a blind spot regarding update costs and retrieval stability as knowledge evolves. This paper introduces Temporal GraphRAG (TG-RAG), which incorporates time-aware retrieval strategies. Although TG-RAG outperforms current baselines, it comes with several challenges.
category: research

article: TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework
authors: Chao Zhang, Yuhao Wang, Derong Xu, Haoxin Zhang and others
date: 2025-11-07
desc.: Retrieval-Augmented Generation (RAG) enhances Large Language Models' reliability through external knowledge integration. While agentic RAG systems use autonomous, multi-round retrieval for improved accuracy, they generate substantial token overhead. TeaRAG addresses this efficiency challenge by compressing both retrieval content and reasoning steps, delivering a token-efficient agentic RAG framework that balances accuracy with computational economy.
category: research

article: The Learning Loop and LLMs
authors: Unmesh Joshi, Thoughtworks
date: 2025-11-04
desc.: Software development has consistently resisted the notion that it can be reduced to an assembly-line process. Even as our tools become smarter, faster, and more capable, the essential nature of the work remains unchanged: we learn by doing. We must acknowledge the fundamental role of experiential learning in this field, “there are no shortcuts to learning”.
category: opinion

article: Driving a web browser with Gemini's Computer Use model in Java
authors: Guillaume Laforge
date: 2025-11-02
desc.: This tutorial will guide you through the process of programmatically interacting with a web browser using the new Computer Use model in Gemini 2.5 Pro. The tutorial presents an example project written in Java that leverages Microsoft's powerful Playwright Java SDK to handle browser automation. Multi-agentic systems may complement classical end-to-end tests, but several challenges remain, including hallucination.
category: tutorial

The post JC-AI Newsletter #9 appeared first on foojay.

JC-AI Newsletter #8

Miro Wengner — Thu, 30 Oct 2025 06:36:12 +0000

Fourteen days have passed, and it is time to present a fresh collection of readings that could influence developments in the field of artificial intelligence.

This newsletter focuses on examining how AI enhances productivity through enterprise studies, agentic system architecture, attack vectors, Model Context Protocol (MCP) implementation, Agent-to-Agent (A2A) protocol, Java code generation within IDEs, LLM benchmarking methodologies, and the security challenges arising from increased AI-LLM adoption.

The world influenced by LLMs is changing very quickly. Let's start...

article: Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
authors: Samuel Paech, Allen Roush, Judah Goldfeder, Ravid Shwartz-Ziv
date: 2025-10-16
desc.: Widespread LLM adoption has introduced characteristic repetitive phraseology, termed "slop", which degrades output quality and makes AI-generated text immediately recognizable. This paper presents Antislop, a comprehensive framework providing tools to detect and eliminate these overused patterns.
category: research

article: Toward Understanding Security Issues in the Model Context Protocol Ecosystem
authors: Xiaofan Li, Xing Gao
date: 2025-10-18
desc.: The Model Context Protocol (MCP) is an emerging open standard that enables AI-powered applications to interact with external tools through structured metadata, yet it still lacks sufficient standardization. This paper presents the first comprehensive security analysis of MCP ecosystems and uncovers a wide range of vulnerabilities.
category: research

article: The 4 Patterns of AI Native Development
authors: Patrick Debois
date: 2025-06-04
desc.: The presentation examines AI development evolution through the lens of previously observed cloud computing patterns. It introduces four AI-native development paradigms: 1. producer-to-manager, 2. implementation-to-intent, 3. delivery-to-discovery, and 4. content-to-knowledge. While the framework appears to project development needs in a linear fashion, the presentation does not fully address the challenges associated with the nondeterministic behavior of LLMs, which affects all levels of project development.
category: youtube

article: Does AI Actually Boost Developer Productivity ? (100k Devs Study)
authors: Yegor Denisov-Blanch
date: 2025-07-23
desc.: The presentation addresses a critical question regarding the impact of AI-LLM utilization on project development. Data collected from 136 teams across 27 companies provides statistically significant findings. This dataset enables the formulation of hypotheses concerning the conditions under which AI-assisted coding delivers desired value. Stanford University research.
category: youtube

article: MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
authors: Dongsen Zhang, Zekun Li, Xu Luo, Xuannan Liu and others
date: 2025-10-14
desc.: While Model Context Protocol (MCP) unlocks broad interoperability between agents, it notably extends the attack surface of agentic systems. This paper presents the MCP Security Benchmark, which aims to provide systematic measures of agent resistance against various forms of attacks. The paper discovers that models with stronger performance are more vulnerable to attacks due to various discussed reasons. The experiments demonstrate that MCP-specific vulnerabilities are highly exploitable. The paper provides a practical baseline for future research.
category: research

article: Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
authors: Edoardo Allegrini, Ananth Shreekumar, Z. Berkay Celik
date: 2025-10-15
desc.: This paper aims to address critical questions related to the utilization of safety protocols, security, and functionality of critical systems depending on LLMs. The current ecosystem of agent communication lacks standardization, such as the Model Context Protocol (MCP) for tool access or the Agent-to-Agent (A2A) protocols. This fragmentation creates a semantic gap that prevents rigorous analysis of system properties and introduces risks such as architectural misalignment and exploitable coordination issues. The paper proposes a domain-agnostic framework for semantic analysis and discusses future research directions.
category: research

article: Generative AI and the Transformation of Software Development Practices
authors: Vivek Acharya
date: 2025-10-12
desc.: The paper examines how AI-assisted techniques are transforming software engineering practices, alongside the emerging challenges of trust and hallucination. The paper considers current key concepts of LLM utilization, including multi-agents, dynamic prompt orchestration, Model Context Protocol (MCP), and assisted coding. The paper discusses psychological aspects of skill set transformation and identifies multiple areas for future investigation.
category: research

article: Automatic Building Code Review: A Case Study
authors: Hanlong Wan, Weili Xu, Michael Rosenberg, Jian Zhang, Aysha Siddika
date: 2025-10-03
desc.: The paper presents a novel agent-driven framework for Automated Code Review (ACR) that integrates Building Information Modeling (BIM) data extraction with agent-orchestrated workflows and existing check tool engines. The paper presents a case study developed in cooperation with the US Department of Energy.
category: research

article: When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation
authors: Weibo Zhao, Jiahao Liu, Bonan Ruan, Shaofei Li, Zhenkai Liang
date: 2025-09-29
desc.: While Model Context Protocol (MCP) servers enable AI applications to connect to external systems in a plug-and-play manner, they create new attack vectors requiring consideration. The lack of standardized mechanisms increases this urgency. This paper addresses three research questions: 1. what types of attacks malicious MCP servers can launch, 2. how vulnerable MCP hosts and Large Language Models (LLMs) are to these attacks, and 3. how feasible these attacks are in practice. The paper proposes a component-based taxonomy comprising twelve attack categories.
category: research

article: Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents
authors: Shouju Wang, Fenglin Yu, Xirui Liu, Xiaoting Qin and others
date: 2025-09-22
desc.: The increasing autonomy of LLM agents in handling sensitive communications, accelerated by the Model Context Protocol (MCP) and Agent2Agent (A2A) frameworks, creates urgent privacy challenges. This paper presents PrivacyCheck, which aims to reduce privacy leakage from approximately 35% to 7%, depending on the model. The paper also proposes additional mitigation strategies to improve privacy in the emerging agentic ecosystem.
category: research

article: Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE
authors: Xinpeng Liu, Junming Liu, Peiyu Liu, Han Zheng and others
date: 2025-09-19
desc.: The utilization of agentic AI systems introduces a new critical attack surface, including development environments (IDEs). The paper proposes the Cuckoo Attack, a novel attack capable of stealthy and persistent command execution by embedding malicious payloads into configuration files. The paper shows that the impact may extend beyond compromising the individual developer environment.
category: research

article: Tractable Asymmetric Verification for Large Language Models via Deterministic Replicability
authors: Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng
date: 2025-09-14
desc.: The landscape of Large Language Models (LLMs) is shifting rapidly toward dynamic, multi-agent systems. This introduces a fundamental challenge in establishing computational trust between agents to ensure that information is not corrupted. This paper introduces a probabilistic audit approach within a defined context to ensure information integrity in multi-agent systems. The paper presents simulation achievements and proposes directions for future research.
category: research

The post JC-AI Newsletter #8 appeared first on foojay.
