Soham Dasgupta, Author at foojay

Context Is a Budget — Eight levers and three workflow patterns

Soham Dasgupta — Fri, 22 May 2026 12:52:06 +0000

Table of Contents

Where the tokens actually go
The Eight Levers

A. Context engineering — scope your asks
B. Prompt caching — order matters
C. Tool & MCP hygiene — every schema is a tax
D. Custom instructions & skills — codify it once
E. Model routing — start cheap, escalate when stuck
F. Output discipline — diffs, not novels
G. Repo hygiene — what the indexer sees
H. Observability — latency is your token meter

Three workflow patterns that compound

1. The Ralph Wiggum loop
2. Auto-compact
3. Planner → Implementer → Reviewer (agent handover)

The Monday checklist
Closing

Eight levers and three workflow patterns that pay for themselves in a week.

A team of fifty developers can quietly burn $30,000 a month on AI coding assistants without anyone noticing. Premium-request quotas vanish by the third week. The bill arrives. Nobody has a story for where it went.

The cost is the obvious pain. The other two are sneakier:

Latency. Bigger contexts take longer. The model thinks more, but you also wait more.
Context rot. This is the surprising one. Anthropic and Chroma have both shown that as the context window fills up, model recall and reasoning degrade — even well inside the advertised window. The 200K-token model is genuinely worse at the 150K mark than at the 20K mark. More context is not free; past a point, it's actively harmful.

The mental model that fixes all three: stop treating context as a free buffet. Treat it as a budget you spend on every turn.

This post is a practical guide to spending it well: where the tokens actually go, eight levers that move the needle, and three workflow patterns that compound on top of them.

Where the tokens actually go

Every request to a coding assistant is a stack of buckets. The shape varies by tool and session, but it tends to look like this:

Bucket	Typical share	Notes
System prompt / instructions	5–15%	Boilerplate that's been copy-pasted for months
Tool / function schemas	10–40%	Re-sent on every turn
Retrieved files & code chunks	20–60%	The biggest lever, almost always
Conversation history	10–30%	Grows linearly until you compact it
Model output	5–20%	Verbose prose is expensive to produce and to read

A few things to notice:

Tool schemas dominate more than people expect. Five connected MCP servers can easily contribute 5,000–10,000 tokens to every request before you've typed a word. The model doesn't have to use the tool — the schema ships either way.
Conversation history grows without bound. A 30-turn chat is paying for the first 29 turns on every new question, plus your fresh one.
Output is small in volume but expensive per token. On most direct APIs, output tokens cost three to five times input tokens. A reply that says "Sure! Let me explain what I'm about to do…" before doing it is pure tax.
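The history point is easier to feel with numbers. The sketch below is a rough worked example, assuming a uniform 500 tokens per turn and no caching or compaction (both figures are illustrative assumptions, not measurements); `cumulativeInputTokens` is a hypothetical helper name.

```javascript
// Input tokens billed across a chat when every turn re-sends all earlier turns.
// Assumes each turn adds the same number of tokens and nothing is cached or compacted.
function cumulativeInputTokens(turns, tokensPerTurn) {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += turn * tokensPerTurn; // turn N carries N turns' worth of history
  }
  return total;
}

const oneLongChat = cumulativeInputTokens(30, 500); // 232500
const thirtyFreshChats = 30 * cumulativeInputTokens(1, 500); // 15000
console.log(oneLongChat / thirtyFreshChats); // 15.5
```

Under those assumptions, history grows linearly per turn, but the total bill across the chat grows quadratically.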

Rule of thumb: profile your own traffic before optimizing. The bucket dominating your sessions is rarely the one your gut says.

In a Copilot context, you can't see token counts directly, but you can see the symptom. Open Output → "GitHub Copilot Chat" and watch the ccreq lines: each one shows the model, latency, and request type per turn. When the same question takes three times longer in chat #2 than chat #1, you've been watching your token meter the entire time.

The Eight Levers

These aren't in priority order — they're in the order you'd naturally encounter them in a session. The first three (context, caching, tools) are about the request shape. The next three (instructions, model, output) are about how you talk to the assistant. The last two (repo, observability) are the foundations that make all of the others stick.

A. Context engineering — scope your asks

The single biggest waste in most AI workflows is asking vague questions of agent-mode chat with full codebase access. The agent dutifully explores, reads ten files to find the two it needed, summarizes them all, and then answers. You pay for every step.

Compare:

Bad: "Refactor the order confirmation email to use the new template engine."
The agent opens four files under src/main/java/com/example/demo/email/, reads WelcomeEmailService.java for context it doesn't need, considers whether a templates/ resource directory should exist, and proposes a sprawling diff that renames a method on the way through.

Good: "Refactor #file:src/main/java/com/example/demo/email/OrderConfirmationService.java to call render on #file:src/main/java/com/example/demo/email/TemplateEngine.java instead of renderLegacy. Keep behaviour identical."
The agent opens two files. The diff is three semantically meaningful lines. The whole turn is roughly a tenth of the cost.

Specificity is free. Every #file: (Copilot) or explicit path (Claude Code) you provide is a chunk the agent doesn't have to find. Every "keep behaviour identical" is a sentence of guard-rails that prevents a 200-line side quest.

Do this Monday: make #file: your default. Use agent-mode-with-broad-retrieval only when you genuinely don't know what you don't know.

B. Prompt caching — order matters

Every major provider supports prompt caching now. Anthropic charges roughly 10% of the base input price for cache reads, and OpenAI discounts cached input as well, with the size of the discount depending on the model. Google's Gemini supports explicit caching. The mechanism is the same: a stable prefix at the front of your prompt is cached after the first request and read back cheaply on subsequent ones.

The cost discipline is therefore about order:

[ tool definitions ]    ← rarely change       ┐
[ system prompt ]       ← rarely changes      │ cache these
[ skills / rules ]      ← stable per repo     ┘
[ retrieved files ]     ← changes per task
[ conversation ]        ← changes every turn

Static at the top, dynamic at the bottom. The longest stable prefix you can construct is the most cacheable one.

The classic anti-pattern is innocent-looking and brutal: dynamic values or variables embedded in your instructions or custom agent files. They will most likely bust the cache on every request, so you pay full price for the same 10 KB of preamble all day. The fix is to push dynamic content down into the user message or the tail of the prompt.
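To make the ordering concrete, here is a minimal sketch. The names `buildPrompt` and `sharedPrefixLength` and all sample strings are hypothetical, and no provider's API is involved; the sketch only measures how much of the leading text two consecutive requests have in common, which is the part a prefix cache could reuse.

```javascript
// Hypothetical helper: assemble a prompt with the stable parts first.
function buildPrompt({ tools, system, rules, files, conversation }) {
  return [tools, system, rules, files, conversation].join("\n");
}

// Length of the leading run of characters two prompts have in common.
function sharedPrefixLength(a, b) {
  const max = Math.min(a.length, b.length);
  let i = 0;
  while (i < max && a[i] === b[i]) i++;
  return i;
}

const stable = {
  tools: "[tool definitions]",
  system: "You are a coding assistant for this repo.",
  rules: "Be concise. Prefer diffs over full files.",
};

// Good: the per-request value (a timestamp) lives in the conversation tail.
const goodA = buildPrompt({ ...stable, files: "A.java", conversation: "10:01 fix bug" });
const goodB = buildPrompt({ ...stable, files: "A.java", conversation: "10:02 add test" });

// Bad: the same timestamp injected into the system prompt changes the prefix every request.
const badA = buildPrompt({ ...stable, system: "Now: 10:01. " + stable.system, files: "A.java", conversation: "fix bug" });
const badB = buildPrompt({ ...stable, system: "Now: 10:02. " + stable.system, files: "A.java", conversation: "add test" });

const goodShared = sharedPrefixLength(goodA, goodB);
const badShared = sharedPrefixLength(badA, badB);
console.log({ goodShared, badShared });
```

In the "bad" pair, the shared prefix ends inside the system prompt, so everything after the tool definitions would be re-read at full price in this toy model.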

Do this Monday: audit the first 200 tokens of your system prompts. Anything that changes per-request belongs further down.

C. Tool & MCP hygiene — every schema is a tax

Each connected tool ships its full JSON schema with every request. A typical MCP server with 8–15 tools costs 400–2,500 tokens per turn. Five servers connected? You may be paying 5,000–10,000 tokens per turn for tool definitions the model never invokes.

Treat MCP servers like browser extensions: useful, but only the ones you actually need today.

// .vscode/mcp.json — keep this short
{
  "servers": {
    "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem"] }
    // disable github, playwright, brave-search, etc. when you don't need them
  }
}

The same discipline applies to the tools you build yourself. A tool that returns { id, summary } is cheap. A tool that returns a 50-field JSON object is expensive — the model re-processes all 50 fields on every turn it's referenced. Default to compact responses with optional ?expand=... for the rare caller that needs the rest.
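A minimal sketch of the compact-by-default idea follows. The `order` record, its field names, and the `toToolResponse` helper are all hypothetical; the `expand` list stands in for the optional `?expand=...` parameter mentioned above.

```javascript
// Hypothetical order record, as a backend might return it.
const order = {
  id: "ord_123",
  summary: "2 items, shipped",
  customer: { name: "A. Example", email: "a@example.com" },
  lineItems: [{ sku: "SKU-1", qty: 2 }],
  auditTrail: ["created", "paid", "shipped"],
};

// Return only id and summary unless the caller names extra fields to expand.
function toToolResponse(record, expand = []) {
  const response = { id: record.id, summary: record.summary };
  for (const field of expand) {
    if (field in record) response[field] = record[field];
  }
  return response;
}

const compact = toToolResponse(order);
const expanded = toToolResponse(order, ["lineItems"]);
console.log(JSON.stringify(compact).length, JSON.stringify(order).length);
```

The design choice is that the cheap shape is the default and the caller has to ask for the expensive one, not the other way around.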

Do this Monday: open your MCP server list and disable everything you didn't actively use this week. Re-enable on demand.

D. Custom instructions & skills — codify it once

Anything you find yourself re-typing in chats belongs in an instructions file. The exact filename varies — .github/copilot-instructions.md, CLAUDE.md, AGENTS.md, Cursor Rules — but the principle is identical: write your team conventions once, commit them, and let every chat in the repo inherit them.

A small example is worth more than a long one:

Six lines. Now no chat in this repo will propose Jest, no chat will dump a whole-file rewrite when a diff would do, and no chat will preface its answer with "Sure! Let me explain what I'm about to do…"

For stack-specific rules, use path-scoped instructions. In Copilot:

---
applyTo: "src/main/java/**/*.java"
---
# Test conventions for src/
- Use JUnit 5 via `mvn test`.
- Tests mirror the source tree under `src/test/java/...` as `<ClassName>Test.java`.

This file is loaded only when a matching file is in scope. Repo-wide rules go in the global instructions; stack-specific rules go in scoped ones. Both are committed, both are versioned, both are team artifacts — not personal preferences buried in someone's IDE settings.

Do this Monday: check what you've typed into chat windows in the last week. Anything that reappeared more than twice is a candidate for an instructions file.

E. Model routing — start cheap, escalate when stuck

Left to defaults, routine tasks often run on the most expensive model, and you pay 10× for the same answer.

A defensible default routing table:

Task	Model	Multiplier (Copilot)
Inline completions, simple chat	Cheapest available (e.g. GPT-4.1)	0×
Real coding work	Mid-tier (GPT-5 / Claude Sonnet)	1×
Long-context refactor / agent mode	Mid-tier with long context	1×
Genuinely hard reasoning	Top-tier (Claude Opus)	10×

The rule is: start cheap, escalate only when stuck. "Stuck" means you've tried the mid-tier model with good context and it's plainly missing the point — not "I want to feel sure," not "I have time to spare."

The math compounds. A team of fifty doing twenty agent runs each per day at 10× costs five times more than at 2× — for the same diffs, on most days.
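The arithmetic behind that claim can be sketched as follows, assuming consumption scales linearly with the multiplier. `dailyPremiumUnits` is a hypothetical helper, and the 2× figure is taken from the sentence above as a blended average; it is not a row in the routing table.

```javascript
// Premium-request units per day = developers × runs per developer × model multiplier.
function dailyPremiumUnits(developers, runsPerDev, multiplier) {
  return developers * runsPerDev * multiplier;
}

const atTen = dailyPremiumUnits(50, 20, 10); // 10000
const atTwo = dailyPremiumUnits(50, 20, 2);  // 2000
const atOne = dailyPremiumUnits(50, 20, 1);  // 1000
console.log(atTen / atTwo, atTen / atOne);   // 5 10
```

Against a pure 1× mid-tier default, the same workload at 10× costs ten times as much.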

Do this Monday: pin your default to the mid-tier. Make Opus a deliberate choice with a reason.

F. Output discipline — diffs, not novels

Every model has a "let me explain what I'm about to do" reflex. It's polite. It's also pure cost.

Same fix, two ways to ask:

"In templateEngine.js, the welcome template is missing an exclamation mark. Show me the updated file."
→ 30 lines back. (With a 600-line file, 600 lines.)

"In templateEngine.js, the welcome template is missing an exclamation mark. Reply with a unified diff only, no commentary."
→ 5 lines back.

Output tokens are typically three to five times the price of input tokens on direct APIs. In a per-request model like Copilot's, verbose output still hurts: it increases latency, fills the context for subsequent turns, and evicts earlier useful content sooner.

The leverage is in the system prompt. Two lines in copilot-instructions.md make every chat in the repo behave better forever:

- Be concise. No preamble.
- Prefer diffs over full files.

Do this Monday: add those two lines.

G. Repo hygiene — what the indexer sees

The indexer that powers retrieval respects .gitignore. Tighten it.

+target/
+*.class
+*.jar
+.idea/
+*.iml
 *.log

Important gotcha: if a file is already tracked in git, adding the path to .gitignore does not untrack it — the indexer still sees it. You also need:

git rm --cached target/demo-0.0.1-SNAPSHOT.jar

For secrets, fixtures, and vendored deps, use content exclusions at the org/repo level (most coding-assistant providers expose this).

The other half of repo hygiene is summary comments at the top of each module:

// TemplateEngine — central renderer. Use render(id, data) for new emails.
// renderLegacy(id, data) is deprecated and only used by OrderConfirmationService.
// Templates registered: welcome, order_confirmation_v2.

Three lines, ~50 tokens. Now "what does the template engine do?" can be answered without reading the rest of the file. A 200-token summary at the top of each module beats re-reading 5,000 tokens of code, every single time.

Do this Monday: git rm --cached whatever shouldn't be indexed; add three-line summaries to your top-of-mind modules.

H. Observability — latency is your token meter

You can't see Copilot's token counts. You don't need to. Use the proxy you already have:

Reply latency	≈ Input tokens
< 5 s	…
> 20 s	Near limit; start a new chat

When the same question takes three times longer in your fourth chat than in a fresh one, you've just watched your context bloat in real time. The fix is "new chat with a summary," not "wait it out."
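The "three times longer" rule can be sketched as a tiny check. This assumes you have per-turn latencies in milliseconds for comparable questions (hand-timed, or copied from the ccreq lines); `shouldRotate` and the default factor of 3 are hypothetical choices based on the sentence above.

```javascript
// Flag a chat for rotation when the latest turn is much slower than the chat's first turn.
function shouldRotate(latenciesMs, factor = 3) {
  if (latenciesMs.length < 2) return false; // nothing to compare against yet
  const baseline = latenciesMs[0];
  const latest = latenciesMs[latenciesMs.length - 1];
  return latest >= baseline * factor;
}

console.log(shouldRotate([4000, 5000, 13000])); // true
console.log(shouldRotate([4000, 6000]));        // false
```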

You can also lint for context bloat the same way you lint for bundle size. A short script in CI is enough to catch the most common regressions:

// fail if any .github/instructions/*.md exceeds 150 lines
import { readdir, readFile } from "node:fs/promises";
const files = (await readdir(".github/instructions")).filter(f => f.endsWith(".md"));
let failed = false;
for (const f of files) {
  const lines = (await readFile(`.github/instructions/${f}`, "utf8")).split("\n").length;
  console.log(`${lines > 150 ? "❌" : "✅"} ${f}: ${lines} lines`);
  if (lines > 150) failed = true;
}
if (failed) process.exit(1);

Wire it into CI and context bloat stops accumulating silently across PRs.

Do this Monday: put a stopwatch next to your editor for one day. Count in "Amsterdams" (not "Mississippis"). You'll know which chats to rotate.

Three workflow patterns that compound

The eight levers above shrink the cost of an individual turn. These three patterns shrink the number of expensive turns. Apply them on top.

1. The Ralph Wiggum loop

Named after the Simpsons character whose superpower is relentless dumbness. The recipe is unglamorous on purpose:

Write a TODO.md with checkbox tasks.
Open agent-mode chat with a cheap model.
Tell it: "Read TODO.md. Pick the first unchecked item. Implement only that. Run npm test. If green, check the box and commit. Pick the next. Repeat."

That's it. The agent burns through the list one item at a time.
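The state the loop keeps on disk is simple enough to sketch. The helpers below are hypothetical and assume GitHub-style `- [ ]` / `- [x]` checkboxes in TODO.md; in practice the agent does this itself, so treat it as an illustration of the state machine, not a required script.

```javascript
// Find the first unchecked "- [ ] task" line in a TODO.md string.
function nextTask(todo) {
  const lines = todo.split("\n");
  const index = lines.findIndex((line) => /^\s*- \[ \] /.test(line));
  if (index === -1) return null; // everything is checked off
  return { index, title: lines[index].replace(/^\s*- \[ \] /, "") };
}

// Return the TODO text with the task at `index` checked off.
function checkOff(todo, index) {
  const lines = todo.split("\n");
  lines[index] = lines[index].replace("- [ ]", "- [x]");
  return lines.join("\n");
}

const todo = [
  "- [x] Add welcome template",
  "- [ ] Migrate order confirmation",
  "- [ ] Remove renderLegacy",
].join("\n");

const task = nextTask(todo);
const updated = checkOff(todo, task.index);
console.log(task.title, "->", nextTask(updated).title);
```

Because the only state is the file, a fresh chat that runs `nextTask` on the current TODO.md resumes exactly where the previous one stopped.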

Why it works:

Each iteration starts with a small, fresh context. The chat history isn't growing the way it would in a free-form conversation.
State lives on disk (TODO.md and git commits), not in conversation tokens.
A cheap model is good enough, because each task is small and self-contained.
It's restartable. Kill the chat halfway, start a new one, run the prompt again — it picks up where it left off.

After it runs, git log --oneline reads like a changelog: one commit per task, message starts with the task title, easy to revert any one step. Compare with the typical "fix things" mega-commit and you'll never go back.

2. Auto-compact

Most assistants don't compact aggressively on their own. You have to drive it.

When a chat hits 60–80% of the context window (you'll know — replies start to crawl), stop and ask:

Summarize what we've discussed: the goal, files we've touched, decisions made, open questions, and the next step. Keep it under 300 words and use bullet points.

Save the output to plan.md. Open a brand new chat. Attach it:

Continue from #file:plan.md. The next step is…

The new chat's first request is dramatically smaller than the old chat's last one. The model picks up the thread without missing a beat. Roughly: a 4 KB summary keeps 95% of the signal at 3% of the cost.

The bonus pattern: that summary file becomes a stable, cacheable prefix. Every future chat that references it benefits from prompt caching on top of the compaction. Two compounding wins for one summarization.
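If you can get at the transcript text, the "time to compact" trigger can be sketched numerically. The four-characters-per-token figure is a rough rule of thumb assumed here, not an exact tokenizer, and `estimateTokens` and `shouldCompact` are hypothetical helpers; the 0.6 default mirrors the lower end of the 60–80% range above.

```javascript
// Very rough heuristic, assumed for this sketch: about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// True once the estimated transcript size reaches the chosen share of the window.
function shouldCompact(transcript, windowTokens, threshold = 0.6) {
  return estimateTokens(transcript) >= windowTokens * threshold;
}

const longChat = "x".repeat(480000);  // ~120,000 estimated tokens
const shortChat = "x".repeat(100000); // ~25,000 estimated tokens
console.log(shouldCompact(longChat, 200000), shouldCompact(shortChat, 200000)); // true false
```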

If you are interested in a sophisticated implementation of compaction, check this skill, which is used by some of the custom agents.

Rule of thumb: one task per chat. New task → new chat with summary attached.

3. Planner → Implementer → Reviewer (agent handover)

This is the one that changes how features get built. Three short, focused chats with three different model choices and one shared artifact:

Planner — expensive model, one call. Reads the feature request, produces plan.md with goal, acceptance criteria, tasks, files expected to change, out-of-scope items, and risks. No code yet.
Implementer — cheap model, agent mode, fresh chat. Sees only plan.md. Runs a Ralph loop on it: pick first unchecked task, implement, test, check the box, commit, repeat.
Reviewer — expensive model, fresh chat. Sees only plan.md and the diff. Marks each acceptance criterion PASS or FAIL, lists bugs, smells, out-of-scope edits. Ends with VERDICT: APPROVE or VERDICT: REQUEST CHANGES.
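Because the reviewer ends with a fixed verdict line, the handover can be checked mechanically. The sketch below is one hypothetical way to read that line; `parseVerdict` and the sample review text are assumptions, and only the two verdict strings come from the description above.

```javascript
// Extract the last "VERDICT: ..." line from a reviewer's notes, or null if there is none.
function parseVerdict(review) {
  const matches = [...review.matchAll(/^VERDICT: (APPROVE|REQUEST CHANGES)\s*$/gm)];
  if (matches.length === 0) return null;
  return matches[matches.length - 1][1];
}

const review = [
  "AC1 renders via TemplateEngine.render: PASS",
  "AC2 behaviour identical: FAIL",
  "VERDICT: REQUEST CHANGES",
].join("\n");

console.log(parseVerdict(review)); // REQUEST CHANGES
```

A missing verdict returns `null`, which is a useful signal that the review ran out of steam and should be rerun in a fresh chat.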

Three chats, ~5–8 premium requests total for an end-to-end feature. Compare with one mega-chat using the most expensive model the whole way: easily 30+ requests at 10× the multiplier.

The crucial discipline: the handover artifact (plan.md, the diff, the review notes) is the only thing that crosses the boundary. Never chat history. That's how you keep each agent's context small, focused, and cheap.

The Monday checklist

Pin this to your team's wiki. Take what's useful, ignore the rest.

Repo setup

[ ] Add a top-level instructions file (copilot-instructions.md, AGENTS.md, CLAUDE.md, or your tool's equivalent) with build, test, lint, conventions, and output-style rules.
[ ] Add path-scoped instruction files for stack-specific rules (e.g. test conventions under src/).
[ ] .gitignore build outputs, snapshots, and large fixtures. git rm --cached anything already tracked.
[ ] Add three-line "what does this module do" summary comments to your top 10 modules.
[ ] Add a CI lint that fails if instruction files exceed ~150 lines or prompt files exceed ~250 lines.

Per-session habits

[ ] Disable MCP servers you don't need this session. Re-enable on demand.
[ ] Default to a mid-tier model. Escalate to a top-tier model only when stuck — and only with a reason.
[ ] Use #file: (or your tool's equivalent) instead of broad-retrieval / agent mode for scoped tasks.
[ ] Ask for diffs, not full files.
[ ] Start each new task in a fresh chat.
[ ] When responses start to crawl (~60% context), summarize to a plan.md and continue in a new chat.

Workflow patterns to try this week

[ ] Run a Ralph loop on a TODO.md of small chores.
[ ] Use the planner / implementer / reviewer split for one real feature. Notice the request count.
[ ] Treat latency as your token meter. Count Amsterdams for one day.

Closing

The mindset shift is small and the wins are not.

Prompt engineering used to be about clever phrasing. Context engineering — what this post was really about — is about what's in the window and what isn't. Smaller prompts, fewer tools, scoped retrieval, summaries instead of histories, cheap models for cheap work, expensive models for the rare hard parts.

None of it is novel. None of it is hard. Most teams don't actually have a token problem; they have a discipline problem. The levers are boring. The compounding is real: a team that adopts even half of the above will see latencies fall, premium-request burn drop noticeably, and, counterintuitively, answer quality go up, because the model isn't drowning in irrelevant context.

One sticky line to take with you:

The worst tokens are the ones you're paying for and not noticing.

Watch your ccreq lines. Count Amsterdams. Spend the budget like it's yours.

The post Context Is a Budget — Eight levers and three workflow patterns appeared first on foojay.

Why is my Talk selected? Reflections from a Program Committee Reviewer

Soham Dasgupta — Wed, 14 Jan 2026 15:14:56 +0000

Table of Contents

Speaker/Talk related reasons
Organization/Program related reasons

If, like me, you get an adrenaline rush from getting up on stage and can’t wait to share what you have experienced and learned, then submitting talks to conferences and local meetups excites you.

But as proud and happy as you are when you are accepted to present, a lot of work usually happens before that: reviewers and organizers have to find the right presentations, ones that resonate with the attendees and fit the whole program.

Drawing on my experience as a reviewer for Voxxed Days Amsterdam over the past two years, as well as organizing Microsoft internal meetups and roundtables and my previous involvement with Capgemini’s Java community, I have dedicated significant time to selecting presentations and talks.

So, this is my humble effort to share a few criteria that have guided me (and others) in choosing your submissions for inclusion. This is not an exhaustive list, but it covers what we see most.

You have a story to tell

Everyone has a story to tell, but what makes your story worth listening to? This is a question I often ask myself, both when reviewing submissions and when preparing my own.

For me, this consideration carries significant weight, as it is only natural for a speaker to share their personal experiences, highlighting what has worked well and what has not. Your talk stands out because of your unique experiences, including both achievements and challenges with a specific technology, framework, or methodology. By sharing these insights, your presentation offers genuine value and learning opportunities for your audience.

It is also important to mention that, as reviewers, we actively welcome a percentage of new speakers. Ideally, I (or we) look for you to share your story: how you implemented a technology, framework, or methodology, what you learned along the way, and what your learning journey looked like.

You are a contributor to the topic you are presenting

When you are directly involved with the creation, maintenance, or significant contribution to the topic of your talk, your presence becomes invaluable to the audience. Attendees are eager to hear insights “straight from the horse’s mouth,” as your knowledge is both current and authoritative.

Presenting as a contributor means you can address highly specific questions, share in-depth understanding, and help solve real-world issues that the audience may be facing. This opportunity for direct engagement is particularly attractive for organizers, as it encourages meaningful interaction between users and experts, helping to draw larger crowds to conferences and events.

There is something for the audience to learn

When evaluating a submission, I always consider what the attendees will gain from the experience. The crucial point is the takeaway: what practical insights or knowledge will participants leave with?

It is important to highlight subtle elements involved in using a particular technology, framework, or methodology, such as lesser-known features or important considerations when integrating it with other systems. Ultimately, if someone is dedicating their time to your talk, it should provide them with genuinely useful information and value.

Your talk is fun, quirky, and inspiring

One of the reasons attendees come to a tech conference or meetup is the opportunity to be inspired and to feed their hunger for all things technical. If your presentation has this sense of fun and nerdiness, it can be a compelling factor for your selection as a speaker.

For example, sharing a project that is both technically intriguing and entertaining not only captures the audience’s imagination but may also encourage them to embark on their own creative or experimental journey. Even when there is no concrete takeaway for the participants, the enjoyment and inspiration delivered through your talk can motivate others to innovate, to build or break something, which makes your contribution valuable and memorable.

You are a good speaker and have a history of delivering quality talks

If your topic resonates with the reviewers and organizers but there are multiple similar submissions, the strength of your delivery experience and history can become a decisive factor in the selection process. Reviewers and organizers typically assess your credentials as a speaker in three main ways:

You have an established presence within the community, and members of the selection committee have seen you present one or more times before. This prior visibility gives confidence in your ability to engage and inform an audience effectively.
When you include a recording of your previous talks as part of your submission, reviewers have the opportunity to directly observe your presentation style, content delivery, and audience engagement. This tangible evidence can strongly support your application.
If your talks are easily searchable online, organizers can independently reference your past presentations. This accessibility helps them gauge your experience and the reception of your previous sessions, further informing their decision. Still, if possible, include a link to a recording of the talk you are submitting or of one of your previous talks: helping the reviewers and organizers find everything in one place can help your talk get selected as well.

Your topic is related to the conference/meetup target audience

It is straightforward, but nonetheless worth mentioning, that a .NET-focused conference is unlikely to select a talk centered around Java internals. Organizers adhere to a specific theme, as this helps attract the intended audience. However, your presentation may still be relevant to the programme if it offers comparative insights between different technologies.

For instance, at a Java conference, a session exploring which language features are absent in Java compared to .NET, or vice versa, could be of genuine interest to attendees.

Unique topic

Even if your submission does not meet every single criterion, having a unique topic can make a significant difference.

Whether your proposal focuses on technology adoption, promoting diversity, technology for good, or sharing tips and addressing barriers in career progression, such subjects often fall under the Community & Career category. However, these themes can also be relevant and valuable for other tracks within the conference or meetup programme.

Where are you located

When organizers are considering which speakers to select, your location can play a role in the decision-making process. While many conferences are able to cover travel expenses for their presenters, there are also plenty that cannot always provide such support.

In these cases, organizers often favor speakers who are based nearby or at least on the same continent. If you are a local expert with a relevant topic, you may be given preference over someone who would need to travel internationally. This is not only a matter of cost, but also convenience and reliability for the event’s planning.

Your employer needs to be represented

Occasionally, it is necessary for organizers and reviewers to ensure that a particular organization or product company is represented within the conference programme. This is not solely due to sponsorship considerations.

Instead, it may be the case that certain products or companies are closely aligned with the intended theme of the event, and their inclusion is seen as part of a comprehensive exploration of the subject matter. In these instances, representation from specific organizations helps to maintain the breadth of the programme, ensuring that key technologies, products or perspectives are adequately covered.

Conference/Meetup Programme Schedule

There are a couple of key reasons why your talk might be included in the schedule for a conference or meetup.

Sometimes, organizers are looking to feature presentations that fit a particular format or topic. This could mean talks that are time-bound, such as brief 5- or 15-minute sessions, or presentations focused on a specific subject, for example discussions around Agentic AI frameworks.

On occasion, there may not be enough submissions for certain tracks or presentation types. If your submission matches one of these underrepresented areas, it stands a higher chance of being selected for the programme.

AI was used to restructure and refine the grammar and sentences; it had no influence on the idea or the structure of this post.

Special thanks to: Marit van Dijk, Julien Lengrand-Lambert, Sander Mak, Simone de Gijt, Kaya Weers, Ko Turk and Wilco Burggraaf for helping me shape this post.


Foojay Podcast #49: JCON Report, Part 1 – JUGs, Communities, Open Source, Generative AI, LangChain4j, Machine Learning

Frank Delporte — Tue, 21 May 2024 06:05:39 +0000

Table of Contents

Video
Podcast Apps
Content

On Tuesday, May 14th, the Foojay Podcast went live at the JCON conference in Cologne, Germany, to talk with speakers and visitors about all things Java.

We had so many amazing talks that we will combine them into several podcast episodes over the coming weeks.

This is part 1 about JUGs, Communities, Open Source, Generative AI, LangChain4j, Machine Learning!

Video

Podcast Apps

You can listen and subscribe to the Foojay Podcast on:

Spotify
Apple Podcasts
And most others...

Content

00:26 Geertjan Wielenga: Founding father of Foojay.io
https://www.linkedin.com/in/geertjanwielenga/
01:18 Markus Kett: Organizer JCON and JUG Oberpfalz
https://www.linkedin.com/in/markuskett/
04:47 Richard Fichtner: Organizer JCON and JUG Oberpfalz
https://www.linkedin.com/in/richardfichtner/
07:04 Jonathan Vila: Organizing Communities, JUGs, and events + Sonar, how can tools be both available for free and still make a profit as a company
https://www.linkedin.com/in/jonathanvila/
14:55 Soham Dasgupta: Community spirit, Talks about Generative AI
https://www.linkedin.com/in/dasguptasoham/
21:29 Mary Grygleski: Volunteer at JCON, Organizing Chicago JUG, Talks about Generative AI
https://www.linkedin.com/in/mary-grygleski/
30:16 Mohammed Aboullaite: Java and Machine Learning and training models
https://www.linkedin.com/in/aboullaite/
37:16 Simon de Groot and Richelle Bussenius: Organizing NLJUG, conferences, communities, and Masters Of Java
https://www.linkedin.com/in/simon-de-groot-ab832a169
https://www.linkedin.com/in/richellebussenius


State of Open (Source?!) and Free AI – a FOSDEM recap

Soham Dasgupta — Thu, 15 Feb 2024 13:04:44 +0000

Table of Contents

FOSDEM
What is Open (Source) AI?
Why Free and Open?
What are the components of an AI system?
State of “Open”-ness in AI systems
What is AI system Specification?
TLDR;
References

Disclaimer: This article covers the things I learned and observed while spending the day in the AI and Machine Learning devroom at FOSDEM 24. Opinions and statements are mine and have nothing to do with my employer. This article might raise more questions than answers, but in my opinion, we all need more awareness of this topic and need to get familiar with the (right) questions that are to be answered.

FOSDEM

FOSDEM (Free and Open Source Software Developers’ European Meeting) is a community-organised event that is free and non-commercial. The aim is to provide a venue for free and open-source software developers and communities to:

connect with other developers and projects.
learn about the newest trends in the free software world.
learn about the newest trends in the open-source world.
listen to interesting talks and presentations on diverse topics by project leaders and committers.
encourage the development and benefits of free software and open-source solutions.

There were 35 devrooms, covering Java, Containers, Go, Rust, Network, Community, and various other topics. Although I am a huge fan of Java and the OSS ecosystem around it, I went to FOSDEM this year specifically to understand and discuss the state and direction of AI in the Free and/or Open-Source world. This article is about that.

“An AI system is a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. AI systems are designed to operate with varying levels of autonomy.” – Open-Source Initiative, AI definition

What is Open (Source) AI?

To be Open Source, an AI system needs to make its components available under licenses that individually grant the freedoms to:

Study how the system works and inspect its components.
Use the system for any purpose and without having to ask for permission.
Modify the system to change its recommendations, predictions, or decisions to adapt to your needs.
Share the system with or without modifications, for any purpose.

The Golden Rule applies "also" to AI: "If I like an AI system, I must be free to share it with other people." (Reference #4)
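The requirement above, that every component individually grants all four freedoms, can be sketched as a simple check. This is only an illustration: the `Freedom` flag, the `is_open_source` helper, and the component names are hypothetical and are not part of any Open-Source Initiative tooling.

```python
from enum import Flag, auto


class Freedom(Flag):
    """The four freedoms listed above (names are illustrative)."""
    STUDY = auto()
    USE = auto()
    MODIFY = auto()
    SHARE = auto()


ALL_FREEDOMS = Freedom.STUDY | Freedom.USE | Freedom.MODIFY | Freedom.SHARE


def is_open_source(component_freedoms: dict[str, Freedom]) -> bool:
    """Return True only if every component grants all four freedoms."""
    if not component_freedoms:
        return False  # nothing was declared, so nothing can be verified
    return all(
        (granted & ALL_FREEDOMS) == ALL_FREEDOMS
        for granted in component_freedoms.values()
    )


# Hypothetical system: the data may be studied and used, but not modified or shared.
example_system = {
    "code": ALL_FREEDOMS,
    "model": ALL_FREEDOMS,
    "data": Freedom.STUDY | Freedom.USE,
}
print(is_open_source(example_system))
```

A single restricted component is enough to fail the check, which mirrors the word "individually" in the requirement.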

Why Free and Open?

The term 'open source' refers to software made available under an open-source licence that lets anyone see the source code (the code that humans can read) and allows anyone using the code under that licence to keep and change it. They can do this by themselves or with a skilled third party they choose. Open-source licences must be approved by the Open-Source Initiative. (Reference #1, #2)

"Free software" is a different term, though. In everyday usage it means any piece of software that doesn't cost anything, which is not the same thing as open-source software. Open-source software is not only free in terms of money: "free" also refers to the freedom it gives its users by being easy to modify and more transparent. (Reference #2, #3)

The open-source community places a general emphasis on ethics and morals in how developers treat their users. While it's not a sure thing, this can help to make sure you're getting the best experience possible without being exploited for private data. And because the source code is public, knowledgeable users can readily find out whether the developers are doing something untrustworthy. (Reference #2, #3)

The supply-side value of widely used Open-Source Software (OSS) is $4.15 billion, but the demand-side value is much larger at $8.8 trillion. (Reference #5) To put this in perspective, that amount is 30% more than the total US federal budget in 2023. (Reference #6)

What are the components of an AI system?

Traditional software and the code behind it were easy to categorize. There were complications, but the definition of components in traditional software is straightforward. It becomes far more complicated when we try to define the same for an AI system.

A (current, possible) list of identified components of an AI system: (Reference #7)

Data
a. The data on which it is trained.
b. Description of it.
c. Collection methodologies.
d. Hosting options and costs.
e. Transparency of data quality.
f. Ability to opt out.
Code
a. Data cleaning/processing related.
b. Actual training code.
c. Assumptions/pre-reqs related to the implementation.
External
a. Specification of hardware on which it is trained.
b. Time spent on training.
c. Configurations.
d. Definition of correctness.
Output
a. Model it produces.
b. Binary data it comprises.
c. Tasks or results it generates.

This also implies that the definition of FREE and OPEN might be different for each component, or for a subset of a component. For example, a model that identifies early-stage cancer from X-Ray or MRI images might need to shield the data it is trained on because of privacy regulations, while the rest of its components can still be FREE and/or OPEN. Modification of such a model by the community would then have to be defined differently.
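A minimal sketch of that idea, assuming a two-level openness label per component. The `Openness` and `Component` types, the helper, and the example entries are all hypothetical and exist only to illustrate the medical-imaging example above.

```python
from dataclasses import dataclass
from enum import Enum


class Openness(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class Component:
    category: str        # "Data", "Code", "External", or "Output"
    name: str
    openness: Openness


def restricted_components(system: list[Component]) -> list[str]:
    """Names of the components that are not open."""
    return [c.name for c in system if c.openness is Openness.RESTRICTED]


# Hypothetical cancer-detection system: only the training images are shielded.
cancer_detector = [
    Component("Data", "training images", Openness.RESTRICTED),
    Component("Data", "collection methodology", Openness.OPEN),
    Component("Code", "training code", Openness.OPEN),
    Component("External", "hardware specification", Openness.OPEN),
    Component("Output", "model", Openness.OPEN),
]

print(restricted_components(cancer_detector))
```

Tracking openness per component, rather than per system, is what lets a project state exactly which parts the community can inspect and modify.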

State of “Open”-ness in AI systems

Currently there is no proper definition of open-ness for AI systems, and existing systems fall across a broad spectrum. (Reference #8)
A definitive guide is needed, mainly for ethical reasons and to clarify how to engage with the whole or with parts of an AI system.

At present, access to and usage of AI systems is mostly managed through individual or additional license restrictions.

But this creates barriers to use, difficulties in adoption and improvement, problems of control over the technology, and weak oversight and transparency.

What we need is:

Open-ness in AI.
Interoperable licenses with possibilities of making it free.
Accessibility, Reusability and Sustainability of AI systems.
Ethical compliance that falls under the purview of regulations, not software licenses.

What is AI system Specification?

Open-Source shows that when you eliminate the obstacles to learning, using, sharing and enhancing software systems, everyone benefits. These benefits come from using licenses that follow the Open-Source Definition. The benefits can be expressed as autonomy, transparency, and cooperative improvement. They are necessary for everyone in AI. We need basic freedoms to help users create and use AI systems that are trustworthy and clear.(Reference #4)

The current draft version is "The Open Source AI Definition – draft v. 0.0.5 – Open Source Initiative", and it follows the definition of an AI system adopted by the Organisation for Economic Co-operation and Development (OECD).
For each AI system (such as Pythia, Llama, BLOOM, Mistral, Phi2, Olmo, etc.) the Specification aims to define:

What do you need to give an input and get an output?
What do you need to give an input and get a different output?
What do you need to understand why given an input, you get that output?
What do you need to let others give an input and get an output?
What’s the preferred form to make modifications to an AI system?

The Open-Source Initiative's plan is to have a release candidate (RC) of this spec at the end of October '24.

The stakeholders engaged in this range from system and license creators and regulators to end users and subjects.

Ongoing and upcoming tasks on this spec for the Open-Source Initiative are:

more publicity to the process
- public discussion forum https://discuss.opensource.org
- bi-weekly townhalls
- more opportunities to volunteer.
reach out to more stakeholders.
raise funds for 2024 meetings.
set up the board for review and approval of v. 1.0.

The drafts can be found at: Drafts of the Open Source AI Definition – Open Source Initiative

TLDR;

What is Open-Source AI and why it matters: Open-Source AI is an AI system that allows anyone to study, use, modify, and share its components under licenses that follow the Open-Source Definition. Open-Source AI matters because it offers benefits such as autonomy, transparency, and cooperative improvement, and it helps to create and use AI systems that are trustworthy and clear.

What are the components of an AI system and how to define their openness: An AI system is composed of data, code, external factors, and output, which can have different levels of openness depending on the licenses and specifications that apply to them. The openness of an AI system can be defined by the freedoms that it grants to its users and the transparency that it provides about its functioning and outcomes.

What are the challenges and barriers for Open-Source AI: Open-Source AI faces challenges and barriers such as privacy, quality, interoperability, and ethical compliance of its components, especially data and output. Moreover, Open-Source AI may be hard to adopt and improve because of individual or additional license restrictions, lack of control over the technology, and weak oversight and transparency.

What is the Open-Source AI Definition and its goals: The Open-Source AI Definition is a draft specification by the Open-Source Initiative that aims to provide a clear and consistent way to assess the openness of an AI system and its components. The goals of the specification are to encourage the development and benefits of Open-Source AI, and to ensure that AI systems respect the basic freedoms of their users.

What is the Open-Source AI Specification and how to use it: The Open-Source AI Specification is a set of questions that help to evaluate the openness of an AI system and its components, based on the freedoms to study, use, modify, and share them. The specification can be used by system and license creators, regulators, end users, and subjects to understand and engage with different aspects of an AI system.

References

The post State of Open (Source?!) and Free AI – a FOSDEM recap appeared first on foojay.
