foojay – a place for friends of OpenJDK

What is Sharding in MongoDB and When Should You Use It?

Nancy Agarwal — Tue, 02 Jun 2026 22:15:00 +0000

Table of Contents

A Practical Introduction to Horizontal Scaling
1. Shards
2. Config Servers
3. Mongos Router
Large datasets
High write throughput
Rapid data growth

A Practical Introduction to Horizontal Scaling

When building applications, most developers start with a single database server.

At the beginning, everything works perfectly.

Your application might have:

A few thousand users
Manageable traffic
Datasets that easily fit on one machine

But as your application grows, something interesting starts to happen.

Queries take longer.
Write operations slow down.
The database server starts hitting CPU, RAM, or storage limits.

At this stage, many engineers ask an important question:

Should we upgrade the server or scale the database differently?

This is where horizontal scaling and sharding come into the picture.

If you're using MongoDB, sharding is the mechanism that allows your database to scale beyond the limits of a single machine.

In this article, we'll walk through:

What sharding actually is
Why horizontal scaling matters
How MongoDB implements sharding
When you should (and shouldn’t) use it

The Scaling Problem Most Databases Face

Imagine your application stores user data in a database.

Initially, the architecture looks like this:

Application

    │

Database Server

All reads and writes go to one machine.

This approach is called vertical scaling, when you keep upgrading the same server by adding:

More CPU
More RAM
Faster storage

While this works for a while, vertical scaling eventually hits limits:

Hardware upgrades become expensive
There is always a maximum server size
Downtime may be required during upgrades

Eventually, a single server becomes a bottleneck.

Instead of making one machine bigger, the better approach is to add more machines.

This approach is called horizontal scaling.

What is Horizontal Scaling?

Horizontal scaling means distributing data across multiple servers rather than relying on a single server.

Instead of storing all data on a single machine:

Server A

2 TB of data

You distribute the data:

Server A → 500 GB

Server B → 500 GB

Server C → 500 GB

Server D → 500 GB

Each server stores only part of the dataset.

This is exactly what sharding does.

What is Sharding in MongoDB?

Sharding is the process of splitting large datasets across multiple database servers.

Each server stores a portion of the data, called a shard.

For example, imagine an application storing millions of users.

Instead of keeping all users on one server:

Shard	Data
Shard 1	Users with IDs 1–1M
Shard 2	Users with IDs 1M–2M
Shard 3	Users with IDs 2M–3M

Each shard contains only a subset of the collection.

When queries come in, MongoDB determines which shard contains the relevant data.

This allows the database to handle massive datasets and high traffic efficiently.

MongoDB Sharded Cluster Architecture

A sharded cluster in MongoDB consists of three main components: shards, config servers, and MongoDB routers

1. Shards

Shards are where the actual data is stored.

Each shard is usually deployed as a replica set to ensure high availability and fault tolerance.

2. Config Servers

Config servers store metadata about the cluster.

They maintain information such as:

Which shard contains which data
How data is distributed
Shard key ranges

Without config servers, the cluster would not know where data lives.

3. Mongos Router

Applications do not connect directly to shards.

Instead, they connect to mongos, which acts as a query router.

Its responsibilities include:

Receiving application queries
Determining which shard contains the data
Forwarding the query to the correct shard

A simplified architecture looks like this:

     Application

          │

        Mongos

      /   |   \

Shard1  Shard2   Shard3

This abstraction means the application does not need to know where the data is stored.

Choosing a Shard Key

A shard key determines how data is distributed across shards.

For example:

{ userId: 1 }

MongoDB uses the shard key to decide which shard a document belongs to.

Choosing a shard key is one of the most critical decisions in a sharded architecture.

A good shard key should:

Distribute data evenly
Avoid hotspots
Support common query patterns

For example, if most queries are based on userId, using it as the shard key makes sense.

However, choosing something like country might create imbalanced shards if most users are from one region.

Creating a Sharded Collection

Let’s look at a simple example.

First, enable sharding for a database.

sh.enableSharding("companyDB")

Next, shard a collection.

sh.shardCollection(

 "companyDB.employees",

 { employeeId: 1 }

)

MongoDB will now automatically distribute documents across shards.

Querying Data in a Sharded Cluster

One of the nice things about sharding in MongoDB is that application queries remain the same.

For example:

db.employees.find(

 { department: "Engineering" },

 { name: 1, managerName: 1, departmentName: 1 }

)

The mongos router determines which shard contains the relevant documents and routes the query to that shard.From the application's perspective, it still feels like one database.

When Should You Use Sharding?

Sharding is powerful, but it should be introduced only when needed.

Here are common situations where sharding makes sense.

Large datasets

If your dataset grows into hundreds of gigabytes or terabytes, a single server may not be sufficient.

Examples include:

Analytics platforms
Log storage systems
IoT platforms

High write throughput

Applications that generate large numbers of writes can benefit from sharding because writes can be distributed across multiple nodes.

Examples include:

Event tracking systems
Gaming platforms
Social media feeds

Rapid data growth

If you expect your dataset to grow rapidly, designing the system with sharding in mind early can save major architectural changes later.

When Sharding Might Be Overkill

Despite its benefits, sharding adds operational complexity.

You probably don’t need sharding if:

Your dataset is relatively small
Your workload is moderate
Vertical scaling still works

Many applications run perfectly fine with replication and proper indexing.

Sharding should usually be considered after other scaling strategies have been exhausted.

Sharding vs Replication

Developers sometimes confuse these two concepts.

Feature	Replication	Sharding
Purpose	High availability	Horizontal scaling
Data	Same data on every node	Data split across nodes
Reads	Can scale reads	Scales read and write
Storage	Data duplicated	Data distributed

In practice, MongoDB often uses both together.

Each shard is typically configured as a replica set, ensuring both scalability and fault tolerance.

Final Thoughts

Sharding is one of the most powerful scaling mechanisms available in MongoDB.

It allows databases to handle:

Massive datasets
High query throughput
Continuously growing applications

However, like most architectural decisions, it should be introduced carefully and intentionally.

Understanding your data access patterns and choosing the right shard key are essential for a successful sharded deployment.

If you’re building applications expected to scale to millions of users or terabytes of data, sharding becomes a key tool in your database architecture.

The post What is Sharding in MongoDB and When Should You Use It? appeared first on foojay.

Exploring MongoT (Atlas Search)

Luke Thompson — Thu, 28 May 2026 21:02:49 +0000

Table of Contents

Let’s dive in!Simple Example - Text Search

Breakdown Table (for a ~9ms $search aggregation path through MongoT)

Local DebuggingSample DataInteresting Example - Faceted Text SearchLucene Indexing Strategy + Benefits over MongoD IndexesVector Search ExampleLocal Grafana MonitoringPerformance Java Code PackagesSo what can you learn from MongoT? Wrap

Let’s explore this fascinating and awesome Java project from MongoDB - MongoT!

You can check out the source code here:

git clone https://github.com/mongodb/mongot

MongoT is a wrapper around the amazing Java search engine: Lucene.

Lucene is a powerful search toolkit built around an inverted token index structure that enables advanced text search capabilities, including ranked results, autocomplete, synonyms, fuzzy matching, highlighting, and faceting — all with high performance regardless of dataset size. Unlike MongoDB's native query engine, it can efficiently search across multiple indexes simultaneously by intersecting lists of ordinal document IDs in parallel, using optimization techniques like skip-lists, ordinal compression, and document frequency ordering. It also supports indexing of various field types (integers, dates, keywords, etc.) and has expanded into vector search, enabling semantic similarity search by meaning rather than exact text matching.

Adding vector search to MongoDB was clearly a core goal for the MongoT project, as it is important to participate in the semantic search space. I think it is really worth digging into the capabilities it offers beyond vector search, too, as all the Lucene search types massively complement MongoDB database’s own incredible B-tree index-based search features.

Let’s dive in!

(Check out the live demo of this screenshot here)

Once you see the code in the MongoT project, you may be a little overwhelmed at first by the volume and complexity of it (I was!).

Never fear, though! We are going to break it down and walk through a few real-world query examples, and see exactly how it all hangs together. By the end, I want you to feel comfortable with the codebase, try forking it, and have some fun debugging, testing, and even making some changes.

If you (like me!) are a visual learner, have a play with the animated tour through the code packages along the way: https://luketn.com/mongot-app-tour/index.html

Simple Example - Text Search

Let’s start with a real example. Here’s an actual Atlas Search query:

db.image.aggregate([
  {
    $search: {
      text: {
        query: "Pizza",
        path: "caption"
      }
    }
  }
]);

[{
  caption: 'Stacks of dominos pizza boxes with a pizza.',
  url: 'http://images.cocodataset.org/train2017/000000371822.jpg',
  hasPerson: false,
  food: [
    'pizza'
  ]
},...]

The client application sends that as a MongoDB aggregate command through its driver to MongoD (the driver never connects to MongoT directly - it connects only to MongoD). When MongoD reaches the $search stage, it rewrites the public stage into an internal remote-search stage, builds a MongoT search command, and opens a remote cursor against MongoT.

Inside MongoT, the request lands on the gRPC command stream, dispatches to SearchCommand, resolves the search index, creates a cursor, builds the Lucene query, executes the initial Lucene search, materializes BSON results, and returns the first batch to mongod.

The cursor is left open (if it wasn’t exhausted) so that future getMore’s on the MongoD cursor can, in turn, fetch more results on the MongoT cursor.

Breakdown Table (for a ~9ms $search aggregation path through MongoT)

Phase	Code Path	Indicative Time Taken	Percentage of Command (excluding streaming results)	What It Means
Query context	SearchCommand.run	17 us	1.91%	Builds per-query execution context before parsing the request.
Parse BSON	SearchQuery.fromBson	211 us	23.71%	Converts the incoming MongoT search command into MongoT query model objects.
Index lookup	SearchCommand.getIndexFromCatalog	3 us	0.34%	Finds the named search index in MongoT's in-memory catalog.
Cursor setup	MongotCursorManagerImpl.newCursor, CursorFactory.createCursor, IndexCursorManagerImpl.createCursor	46 us	5.16%	Creates cursor state around the index reader and batch producer.
Build Lucene query	LuceneSearchQueryFactoryDistributor.createQuery, TextQueryFactory.createQuery	37 us	4.16%	Translates MongoT's query model into a Lucene Query. This is construction, not execution.
Lucene collect hits	MeteredLuceneSearchManager.initialSearch, LuceneOperatorSearchManager.initialSearch	92 us	10.34%	Executes the initial Lucene text search and returns the first TopDocs.
Reader orchestration	LuceneSearchIndexReader.query, LuceneSearchIndexReader.collectorQuery	107 us	12.02%	Handles reader bookkeeping, stored-source checks, branch dispatch, and locking around Lucene execution.
Advance batch	MongotCursor.getNextBatch, LuceneSearchBatchProducer.execute	12 us	1.35%	Advances the batch producer for the first batch; later, getMore can use searchAfter.
Materialize BSON	LuceneSearchBatchProducer.getSearchResultsFromIter, ProjectStage.project, MetaIdRetriever.getRootMetaId	372 us	41.80%	Converts Lucene hits into BSON response documents, including stored-source or id/score output.
Batch orchestration	MongotCursorManagerImpl.getNextBatch, IndexCursorManagerImpl.getNextBatch	16 us	1.80%	Wraps first-batch loading and cursor exhaustion checks.
Response document	SearchCommand.getBatch, MongotCursorBatch.toBson	13 us	1.46%	Builds the command response wrapper, cursor document, and metadata variables.
Encode BSON	SearchCommand.getBatch, MongotCursorBatch.toBson	1 us	0.11%	Serializes the response payload returned on the command stream.
Stream lifecycle	ServerCallHandler.onNext, ServerCallHandler.handleMessage, CommandManager	8.078 ms outside command	N/A	gRPC stream lifetime outside the initial command span, including response observer handling, client consumption, cleanup, and any later cursor work in the same stream.

Local Debugging

Next, let’s get MongoT up and running locally from source code. I’m going to use IntelliJ for the IDE in this walkthrough, but the steps should be similar in any IDE.

Follow these steps:

First up, you’ll need the JetBrains IntelliJ Bazel plugin installed in order to work with the project: https://plugins.jetbrains.com/plugin/22977-bazel
Clone the repo and open it in IntelliJ (IntelliJ will automatically recognize the Bazel project and configure an IntelliJ project mapped onto the Bazel configurations)

git clone https://github.com/mongodb/mongot
cd mongot
idea .

Enable debugging by adding the following changes to the mongot-local container in community-quick-start/docker-compose.yml file:

  mongot-local:
...
    command:
      - /mongot-community/mongot
      - --jvm-flags
      - "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"
      - --config=/mongot-community/config.default.yml

  mongot-local:
...
    ports:
...
      - 5005:5005    # Debug port

This will allow us to connect and debug the locally built mongot code over the 5005 debugger port.

Run the local built mongot code from a shell on the root of the project:

make docker.up MODE=local

Once the build is finished, and the containers are running, we can attach a debugger on port 5005.

Create a Remote JVM run config:

Run the new MongoT Container run config, and you’ll see your IntelliJ is now debugging the source code:

You can connect to MongoDB using Compass:

When configuring the connection you’ll need to configure the TLS settings to point at the ca.pem and client-combined.pem files in the community-quick-start/tls directory:

Sample Data

You can find great sample databases for Atlas published by MongoDB here: https://www.mongodb.com/docs/atlas/sample-data

When we ran the community quick start above the local instance was prepopulated with some of these sample datasets.

For the rest of the article, I’m going to use my own little example, which you can find the data and index mappings for here: https://github.com/luketn/atlas-search-coco-dataset

(It’s a faceted index on the popular COCO image dataset.)

If you want to dig into the code for that, there’s a walkthrough here: https://github.com/luketn/atlas-search-coco

https://tech-blog.luketn.com/java-faceted-full-text-search-api-using-mongodb-atlas-search

Here’s the Atlas Search Coco sample project serving up queried data from the Coco image dataset through a locally debugged MongoT:

And here is the index that was built in the local MongoT through Compass:

If you explore this app, you’ll find some fun things you can do with local LLMs, vector embeddings, and queries that I played around with while writing this article.

Create some sample databases (or real ones!), create Atlas Search indexes, and perform queries locally. Put breakpoints in the code and have fun exploring to see how it all hangs together.

Interesting Example - Faceted Text Search

Let’s go a bit deeper and perform a faceted search example.

Here’s an Atlas Search query matching an image with a caption ‘frisbee’ and an animal category of ‘dog’. We’re asking for a count of two facets on the result set, so we can see how many of our matches also contain the category ‘sports’.

Run the following from MongoDB Compass in the shell, and put a breakpoint in the code on TextQueryFactory.createQuery.

db.image.aggregate([
 {
   $search: {
     facet: {
       operator: {
         compound: {
           filter: [
             {
               text: {
                 path: "caption",
                 query: "frisbee"
               }
             },
             {
               equals: {
                 path: "animal",
                 value: "dog"
               }
             }
           ]
         }
       },
       facets: {
         animal: {
           type: "string",
           path: "animal",
           numBuckets: 10
         },
         sports: {
           type: "string",
           path: "sports",
           numBuckets: 10
         }
       }
     },
     count: {
       type: "total"
     }
   }
 },
 {
   $facet: {
     docs: [],
     meta: [
       {
         $replaceWith: "$$SEARCH_META"
       },
       {
         $limit: 1
       }
     ]
   }
 }
]);

You can step through and see it create the Lucene query object instances, e.g., TermQuery ($type:string/caption:frisbee) being built from the Atlas Search facet compound clauses.

Continue stepping through, you’ll eventually get to LuceneFacetCollectorSearchManager.initialSearch:

Here you can see the fully composed BooleanQuery, combining the string-type TermQuery on frisbee with the token-type TermQuery on dog.

This is interesting just for learning the (somewhat obtuse) Java library API for Lucene queries.

The results from Lucene include the docs matched and the facets:

You can keep digging around and see all the ways the wrapper is marshaling documents from Lucene indexes.

You’ll notice some interesting things, like the _ids in Lucene Indexes are integers. This is core to the way Lucene works, and I’ll get into why in a minute:

The MongoDB ObjectID (or whatever type you use) _id is stored as metadata and returned as part of the result set if required:
com.xgen.mongot.index.lucene.query.util.MetaIdRetriever#getRootMetaId

Eventually, MongoDB shows the results like this:

{
  docs: [
    {
      _id: 27617, // in the Atlas Search Coco sample data I am using integer IDs
      caption: 'A dog relaxes on the green grass as he holds a yellow frisbee.',
      url:'http://images.cocodataset.org/train2017/000000027617.jpg',
      hasPerson: false,
      animal: [
        'dog'
      ],
      kitchen: [
        'bowl'
      ]
    },
    ... 365 more items
  ],
  meta: [
    {
      count: {
        total: 366
      },
      facet: {
        sports: {
          buckets: [
            {
              _id: 'frisbee',
              count: 364
            },
            {
              _id: 'sports ball',
              count: 4
            },
            {
              _id: 'baseball glove',
              count: 1
            }
          ]
        },
        animal: {
          buckets: [
            {
              _id: 'dog',
              count: 366
            }
          ]
        }
      }
    }
  ]

Lucene Indexing Strategy + Benefits over MongoD Indexes

Let’s step back a moment from the detail, and ask why use Atlas Search at all?

There are three compelling reasons for me:

Advanced Text Search
Multiple-index searching with merged results
Vector Search

I have always been a massive fan of Lucene.

It’s an awesome search toolkit, and I’ve used it as an embedded Java library in my applications as well as in services like the excellent Elasticsearch (and its open-source fork OpenSearch) and Solr.

Performance is great, the query syntax is intuitive (once you get used to it), and its indexing approach is extremely efficient and flexible. With vector search now supported, we have complete text search and advanced parallel indexing, which complement MongoDB’s own search perfectly.

Current State of MongoDB’s Built-In Search

MongoDB’s native indexing and query engine is extremely quick and powerful when you have well-defined fields and query patterns.

When designing built-in indexes to be efficient and get great performance, you can:

optimize collection and document design
optimize indexes by using the ESR Rule
tune the number of indexes for good read and write performance
optimize queries using explain plans and practical experiments

However, there are a few types of search where MongoDB falls short. Lucene comes into the picture to enhance or resolve these use cases:

Advanced Text Search

MongoDB has some basic capabilities for text search, using a $text or $regex query. These are ok for simple searching on small datasets, but often (especially for $regex) extremely slow when your dataset grows larger and/or your queries become more complex.

By comparison, Lucene can perform advanced text searches in the style of Google / Amazon’s search box, with ranked results, autocomplete, synonyms, fuzzy matching, highlighting, and faceting. Not only that, it can do so with great performance, almost irrespective of your data size, thanks to its ‘inverted’ token index structure.

This is a super deep topic, and I won’t cover all of it here, but there are great references both in the Lucene documentation and MongoDB’s: https://lucene.apache.org/core

https://www.mongodb.com/atlas/search

Multiple-index searching with merged results

MongoDB can’t do multiple-index searching. MongoDB uses a single index per query, and if there is further filtering to be done, it will be done directly on the documents. A MongoDB query for multiple indexed fields looks broadly like this:

(Gross oversimplification of MongoDB’s query engine)

i.e., the query planner picks one index and uses that.

Lucene, by contrast, can perform searches over multiple indexes efficiently and return the intersection of the results.

Documents in Lucene are each assigned an ordinal id — a docid (0, 1, 2…).

In a typical text field index, Lucene analyzes text into terms. It stores those terms in a term dictionary, and each term points to a postings list: a compact, sorted list of Lucene internal document IDs for documents that contain that term.

You can also have simpler 1-1 indexes over fields that don’t need the text extracted, like integers, floating point numbers, dates, enums, and keywords.

You can think of Lucene indexes like a set of maps between Term Ids to a list of Document Ids.

Lucene is very efficient at performing index intersection, because of the basic structure and sort order of the data and ordinal indexes, and several optimization techniques like:

skip-lists: additional structures that allow skipping large chunks of ordinals
ordinal compression: storing ordinals for documents in a variety of compressed formats, which are smaller to retrieve and iterate
document frequency ordering: order terms by least frequency so that rarer terms (which would be more selective) reduce the set of ids to intersect first

MongoT creates Lucene documents to match MongoDB documents, mapping every Atlas Search indexed field into a Lucene document.

Depending on the preferences you choose for each field in the Atlas Search field mappings, MongoT will store the values of the MongoDB document differently in the Lucene document. It may store a single value using multiple Lucene document field types in order to support filtering, sorting, and faceting:

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/index/package-summary.html#field_types

For this reason, it is probably worth taking some time tuning the Atlas Search index field mappings to ensure you are only selecting the options you really need. The bigger and more complex the field mappings and types are, the worse the performance and the heavier the resource requirements.

As we saw in the Interesting Example, MongoDB document _ids (typically ObjectIds) are stored with the Lucene index and mapped to and from ordinal Document Ids. This maintains the connection to the MongoD data whilst gaining the performance advantage of the ordinal id data structures of Lucene.

Vector Search

Vector indexes in Lucene efficiently find the ‘nearest’ semantically similar match for a search term.

Atlas text search will get you a match between ‘circular’ and ‘circle’, through lexical text matching techniques.

Vector-based semantic search will get you a match from ‘circular’ to ‘round’.

It’s pretty amazing, and users have come to expect that search engines just get what you mean, not just what you wrote. So it is becoming a must-have feature.

This works using an algorithm over an array of numbers - an ‘embedding’, and computing a distance on a number of dimensions. To be honest, the math here is a bit beyond my understanding, but I can use local or API-based endpoints to take some text and produce embedding vectors.

One thing I really love about having both lexical and vector support is being able to combine the two, seeing both lexical and semantic meaning matches to your query. I think it makes for a super powerful search engine and logical + accurate results.

The big downside of vectors is the time required to compute the embedding on the search term, and the cost for producing embeddings across large corpora of text.

An exciting capability of MongoT is automatic embeddings, which is the ability to plug in a vector embedding engine (currently supports Voyage AI). When you have a vector engine enabled, vectors are computed automatically behind the scenes as you insert and update data, and you can provide query text to the $vectorSearch instead of a queryVector array, wherein MongoT will automatically perform the embedding of the query term too. This is so cool, and I think it is the way all vector solutions will work in the future (i.e., the array of numbers is an invisible abstracted implementation detail).

Vector Search Example

Let’s set up vector search!

Before you get started, you’ll need to sign up for the Voyage API and create an API key:

https://dashboard.voyageai.com/organization/api-keys

Warning - You’ll need a payment method to compute vectors, so this could cost actual $, although in my tests I was well within the free tier.

And restart MongoT.

Then you’ll need to update the MongoT config file (mongot-dev.yml) with a few new fields:

embedding:
  queryKeyFile: "/Users/luketn/code/personal/mongot/voyage-api-key"
  indexingKeyFile: "/Users/luketn/code/personal/mongot/voyage-api-key"
  providerEndpoint: "https://api.voyageai.com/v1/embeddings"
  isAutoEmbeddingViewWriter: true

And restart MongoT.

You should see in the log:

CommunityMongotBootstrapper…Initialized auto-embedding with 4 model(s)

Once you have the Voyage API enabled, you can create a vector search index like this:

db.image.createSearchIndex({
  name: "caption_auto_embed",
  type: "vectorSearch",
  definition: {
    fields: [
      {
        type: "autoEmbed",
        path: "caption",
        model: "voyage-4",
        modality: "text"
      }
    ]
  }
});

MongoT will automatically hit the API in batches, and compute embeddings for the caption field using the voyage-4 model, storing them separately in an internally managed embeddings collection, which is great since it doesn’t pollute the MongoDB document with index data!

(although I’d argue maybe a Lucene index file would have been a better abstraction)

Then you can perform searches with simple text query parameters

db.image.aggregate([{$vectorSearch: {
  index: "caption_auto_embed",
  query: "circular flying",
  path: "caption",
  numCandidates: 10,
  limit: 10
}}]);

Behind the scenes, MongoT computes the embedding for the query text ‘circular flying’, uses Voyage API to compute a semantic meaning as an array of floats, then uses Lucene to find the nearest matches semantically in the index.

You can put a breakpoint on EmbeddingServiceManager.embed() and take a look at how the query path works:

So cool. There is obviously a real financial cost to this, but it is the ultimate in convenience.

If you’ve taken the alternative path of computing embeddings yourself, maybe messing around with local models, storing the vectors in MongoDB documents, and indexing them, you’ll understand the value this path has. I did exactly that while writing the article using LM Studio locally - it’s a whole thing. I won’t cover it here, but if you’re interested, feel free to reach out with questions, or I can cover it in another article.

Having tried both the manual vector computation with LM Studio and the Voyage API, I’m recommending the Voyage API :).

With that said, diving into vector search is something to go into with eyes open to the costs - both financial and in time and resources. It’s not something that comes for free.

Local Grafana Monitoring

If you’re feeling really adventurous, you can configure MongoT to output performance traces and metrics using OpenTelemetry, collecting trace data with Jaeger, metrics with Prometheus, and visualizing with Grafana.

I won’t write a full guide here on doing this, but there is a helper script in my fork of MongoT to start Jaeger, Prometheus, and Grafana here:

https://github.com/luketn/mongot/blob/main/local-monitoring.sh

(which also writes the MongoT config to connect to them)

And a few notes here:

https://github.com/luketn/mongot/blob/main/LOCAL-RUN.md

Performance

The performance of MongoT is incredible. I added a little dashboard to Grafana:

https://github.com/luketn/mongot/blob/main/local-grafana-mongot-dashboard.json

And tweaked the MongoT code a little to output some nicer buckets for the distribution of search commands flowing through the system:

https://github.com/luketn/mongot/pull/1/changes

And then ran a K6 load test script to see what sort of performance MongoT was providing for the overall search. As you can see, MongoT *(Lucene) more than pulls its weight in the overall performance of Atlas Search queries.

Here you can see the end- to- end performance of a Java Atlas Search API.

As represented by the MongoT performance dashboard in Grafana:

And as seen by the K6 client:

k6 run -e K6_VUS=25 -e K6_DURATION=5m k6.js

█ TOTAL RESULTS
   HTTP
   http_req_duration: avg=12.8ms, min=2.4ms, med=12.3ms, max=208.4ms p(90)=17.4ms p(95)=18.9ms
   http_reqs: 574,519  1,914.977657/s
   CUSTOM
   search_docs_returned: avg=4.4, min=0, med=5, max=5, p(90)=5, p(95)=5
   EXECUTION
   vus: 25
   NETWORK
   data_received: 2.7 GB  9.1 MB/s
   data_sent: 84 MB   281 kB/s

(run on an M4 MacBook Pro)

What does this mean? Well, the full round-trip of a Java HTTP Atlas Search API call as measured by the K6 client had a median response time of 12.8ms (using a large range of data scenarios drawn from the COCO image dataset).

If I add some tracing to each request, I can see the overheads of Java vs the MongoT->MongoD end-to-end query command:

k6 run -e K6_VUS=25 -e K6_DURATION=5m k6.js
   CUSTOM
    docs_returned: avg=4.42, min=0, med=5, max=5, p(90)=5, p(95)=5
    http_time_ms: avg=25.6, min=7.4, med=25.5, max=83.238, p(90)=31.5   p(95)=33.7
    java_time_ms: avg=2.9, min=0.2, med=3.1, max=46.8, p(90)=3.6, p(95)=3.9
    mongodb_time_ms: avg=7.2, min=2.0 med=6.8, max=48.7, p(90)=9.85     p(95)=11.0
    requests: 291719  972.301245/s

I have to admit getting a bit deep down in the rabbit hole here, and spending more time further expanding the number of spans in the traces emitted by default. I created a branch of MongoT on my own fork and added a bunch of detailed tracing:

https://github.com/luketn/mongot/pull/2/changes

With these traces, you can see that the actual Lucene index query part of the whole system is a tiny fraction of the overall query time.

And over a K6 load client run:

(Note times are slowed by additional tracing)

What does that mean? Well, I think one of the most interesting things about this project is Lucene itself.

If you look at the breakdown of timings within MongoT, only a small fraction of the time is spent performing the actual Lucene Index search. The rest of the time is spent parsing to and from BSON, coordinating cursors, and other activities unrelated to the actual search.

All that said, the overall performance for Atlas Search is amazing, and as a pairing, they are a rock-solid, high-performance search engine, tightly coupled (in a good way!) between the transactional data and the search index.

I really like a visual representation of performance, and being able to pull out traces:

And then walk through the spans in a search trace:

Really helps me understand the code and how it works. Of course, the additional tracing severely impacts performance, but if you want to, check out the branch and play with the Grafana dashboards!

Java Code Packages

Here’s a list of the major packages of the MongoT project as they interact with one another:

Package	Description	Linked Major Packages
com.xgen.mongot.community	Community-edition entrypoint and top-level assembly/bootstrap wiring.	com.xgen.mongot.util, com.xgen.mongot.config, com.xgen.mongot.logging
com.xgen.mongot.index	Core search/vector engine: index definitions, ingestion, Lucene integration, query execution, result shaping, and index status/metadata.	com.xgen.mongot.util, com.xgen.mongot.featureflag, com.xgen.mongot.metrics, com.xgen.mongot.cursor, com.xgen.mongot.embedding, com.xgen.mongot.monitor, com.xgen.mongot.trace, com.xgen.mongot.server, com.xgen.proto, com.xgen.mongot.blobstore, com.xgen.mongot.config, com.xgen.mongot.logging
com.xgen.mongot.replication	MongoDB replication pipeline, including initial sync, steady-state change-stream processing, durability, and indexing work scheduling.	com.xgen.mongot.util, com.xgen.mongot.index, com.xgen.mongot.metrics, com.xgen.mongot.embedding, com.xgen.mongot.logging, com.xgen.mongot.featureflag, com.xgen.mongot.catalog, com.xgen.mongot.cursor, com.xgen.mongot.monitor
com.xgen.mongot.server	External server surface: gRPC/command handling, protocol plumbing, request routing, and streaming responses.	com.xgen.mongot.util, com.xgen.mongot.index, com.xgen.mongot.cursor, com.xgen.mongot.config, com.xgen.mongot.catalogservice, com.xgen.mongot.catalog, com.xgen.mongot.embedding, com.xgen.mongot.metrics, com.xgen.mongot.featureflag, com.xgen.mongot.trace
com.xgen.mongot.embedding	Embedding-provider integration, request context, auto-embedding helpers, and materialized-view support for vector workflows.	com.xgen.mongot.util, com.xgen.mongot.index, com.xgen.mongot.metrics, com.xgen.mongot.replication
com.xgen.mongot.config	Configuration models, validation, providers, change planning, and config-management workflow for MongoT subsystems.	com.xgen.mongot.util, com.xgen.mongot.index, com.xgen.mongot.replication, com.xgen.mongot.featureflag, com.xgen.mongot.metrics, com.xgen.mongot.catalog, com.xgen.mongot.catalogservice, com.xgen.mongot.embedding, com.xgen.mongot.server, com.xgen.mongot.monitor, com.xgen.mongot.cursor, com.xgen.mongot.lifecycle, com.xgen.mongot.logging
com.xgen.mongot.cursor	Cursor domain model, managers, batching, and serialization for paged search results / getMore flows.	com.xgen.mongot.index, com.xgen.mongot.util, com.xgen.mongot.trace, com.xgen.mongot.catalog, com.xgen.mongot.metrics
com.xgen.mongot.catalogservice	Metadata service layer for authoritative index definitions, per-server index stats, and server heartbeats stored in the internal metadata database.	com.xgen.mongot.util, com.xgen.mongot.index, com.xgen.mongot.replication
com.xgen.mongot.catalog	Local index catalog abstractions and implementations are used to resolve/search the index state.	com.xgen.mongot.index
com.xgen.mongot.blobstore	This is interesting; it seems like it is perhaps a future roadmap feature for the community edition or something intended for us to extend. Couldn’t see how to configure this to do snapshots of the index to blob storage like AWS S3.	com.xgen.mongot.util
com.xgen.mongot.featureflag	Static and dynamic feature flag definitions plus runtime flag registry/config.Some interesting ones in there, such as: ENABLE_10K_BUCKET_LIMITDon’t know about you, but I hit cases where the current 1000 bucket limit was constraining!	com.xgen.mongot.util, com.xgen.mongot.index
com.xgen.mongot.lifecycle	Startup/shutdown lifecycle coordination, especially around index lifecycle management.	com.xgen.mongot.index, com.xgen.mongot.util, com.xgen.mongot.replication, com.xgen.mongot.catalog, com.xgen.mongot.metrics, com.xgen.mongot.blobstore, com.xgen.mongot.monitor
com.xgen.mongot.logging	Structured logging helpers and JSON log-format customization.	None
com.xgen.mongot.metrics	Metrics abstractions plus Full-Time Diagnostic Data Capture (FTDC) collection/reporting infrastructure.	com.xgen.mongot.util, com.xgen.mongot.index
com.xgen.mongot.monitor	Disk and replication-state monitoring, gates, and hysteresis controls used to protect service behavior under stress.	com.xgen.mongot.util, com.xgen.mongot.config, com.xgen.mongot.metrics
com.xgen.mongot.trace	OpenTelemetry tracing helpers, exporters, sampling toggles, and trace parsing utilities.	None
com.xgen.mongot.util	Shared foundation code used across MongoT: BSON/proto conversion, concurrency helpers, collections, versioning, and general utilities.	com.xgen.proto, com.xgen.mongot.metrics, com.xgen.mongot.logging
com.xgen.proto	BSON-aware protobuf runtime plus code-generation plugin for BSON-capable protobuf messages.	None

Digging into the most important of these packages - com.xgen.mongot.index:

Package	Description	Linked Major Packages
com.xgen.mongot.index.lucene	Largest execution layer: Lucene-backed indexing, search, highlighting, result shaping, commit management, and searcher orchestration.	com.xgen.mongot.index.query, com.xgen.mongot.index.definition, com.xgen.mongot.index.analyzer, com.xgen.mongot.index.path, com.xgen.mongot.index.ingestion, com.xgen.mongot.index.version, com.xgen.mongot.index.synonym, com.xgen.mongot.index.status, com.xgen.mongot.index.blobstore
com.xgen.mongot.index.query	Query AST, operators, collectors, pagination, score shaping, and translation from request semantics into Lucene execution.	com.xgen.mongot.index.path, com.xgen.mongot.index.definition, com.xgen.mongot.index.lucene
com.xgen.mongot.index.definition	Core schema model for search, vector, and view indexes, including field definitions, options, and validation logic.	com.xgen.mongot.index.version, com.xgen.mongot.index.analyzer, com.xgen.mongot.index.query, com.xgen.mongot.index.lucene, com.xgen.mongot.index.path
com.xgen.mongot.index.ingestion	BSON document processing, field extraction, and ingestion-time transforms that feed Lucene indexing.	com.xgen.mongot.index.definition, com.xgen.mongot.index.lucene
com.xgen.mongot.index.analyzer	Analyzer builders, providers, factories, and language-specific tokenization plumbing for index definitions and query-time analysis.	com.xgen.mongot.index.definition, com.xgen.mongot.index.lucene, com.xgen.mongot.index.path, com.xgen.mongot.index.query
com.xgen.mongot.index.autoembedding	Auto-embedding and materialized-view index helpers that derive generated fields and coordinate embedding-oriented index metadata.	com.xgen.mongot.index.definition, com.xgen.mongot.index.mongodb, com.xgen.mongot.index.status, com.xgen.mongot.index.version, com.xgen.mongot.index.analyzer, com.xgen.mongot.index.query
com.xgen.mongot.index.blobstore	Snapshotting hooks for persisting and restoring index state through blob storage.	com.xgen.mongot.index.version
com.xgen.mongot.index.mongodb	Narrow MongoDB-facing helpers for materialized-view writes and index-related metrics/state propagation.	com.xgen.mongot.index.lucene, com.xgen.mongot.index.status, com.xgen.mongot.index.version
com.xgen.mongot.index.path	Shared path abstractions for dotted field-path parsing and traversal across schema and query code.	None
com.xgen.mongot.index.status	Index and synonym status enums/models used to expose lifecycle and readiness state.	None
com.xgen.mongot.index.synonym	Synonym mapping models, registries, and status tracking are integrated with Lucene query behavior.	com.xgen.mongot.index.status, com.xgen.mongot.index.definition
com.xgen.mongot.index.version	Index format/version identifiers, generation metadata, and compatibility/capability checks.	None

Ref: https://github.com/luketn/mongot/blob/main/MONGOT_PACKAGE_TOUR.md

So what can you learn from MongoT?

For me, this is an incredible example of an awesome database company, MongoDB, building a production-grade search engine companion app.

There are many aspects that are interesting to learn from:

How to perform change stream in a robust and reliable way to sync data to any external system (Lucene being a great example)
How to manage a Lucene index in Java, and perform searches on it
How to build a scalable Java service that can grow to a huge scale in production
How to do semantic search with vectors in a seamless way

We haven’t dug too deeply here in this introduction to any of these, but hopefully it gives you a quick tour to get you started and some ideas about the goodies there are to explore.

Wrap

I’ve been exploring the codebase and playing with Atlas Search (both lexical and semantic) for the last few weeks. It’s been a lot of fun, and I learned a lot too.

I hope you get a lot out of exploring and trying it yourself, too.

Happy searching!

The post Exploring MongoT (Atlas Search) appeared first on foojay.

AI-Powered Code Review Assistant: Automated Code Analysis with Spring AI and MongoDB

Farhan Hasin Chowdhury — Thu, 14 May 2026 17:09:39 +0000

Table of Contents

Prerequisites1. Project setup2. Storing and managing review patterns

Defining the pattern model
Creating the repository
Building the service layer
Exposing the REST endpoints

3. Embedding patterns with Spring AI and MongoDB Atlas Vector Search

Adding Spring AI dependencies
Generating embeddings
Seeding the pattern library
Creating the Atlas Vector Search index
Searching for similar patterns

4. Building the code review engine

Defining the submission and finding models
The review service
Prompt design
The review controller
Testing the review engine

5. Tracking review trends with aggregation pipelines6. Testing the full workflowConclusion

Code reviews catch bugs before they ship, but they take time. Most teams rely on manual review or basic linters that flag syntax issues but miss deeper problems like subtle resource leaks, poor exception handling, or security anti-patterns. Static analysis tools help, but they work with rigid rules that cannot generalize across code variations. A rule that catches catch (Exception e) {} will miss catch (Throwable t) { return null; }, even though both are the same underlying problem.

In this article, you will build a code review assistant API. Developers submit code snippets through a REST endpoint. The system embeds the submitted code with Spring AI and searches a library of known anti-patterns stored as vectors in MongoDB Atlas. It then sends the code along with matched patterns to an LLM for structured review feedback. Every submission and its findings are stored in MongoDB, and aggregation pipelines surface trends over time.

The tech stack is Java 21+, Spring Boot 3.x, Spring AI, Spring Data MongoDB, and MongoDB Atlas. By the end, you will have a working review API that accepts code, finds relevant anti-patterns using Atlas Vector Search, gets structured feedback from an LLM, and tracks findings across submissions. The complete source code is available in the companion repository on GitHub.

Prerequisites

Java 21 or later
Spring Boot 3.x (use Spring Initializr with the Spring Data MongoDB and Spring Web dependencies; you will add Spring AI manually later in the article)
A MongoDB Atlas cluster (the free tier is sufficient, and you will need it for Atlas Vector Search). You can set up one by following the MongoDB Atlas getting started guide.
An OpenAI API key (used for both the embedding model and the chat model)
Basic familiarity with Spring Boot (controllers, services, dependency injection)

1. Project setup

Go to Spring Initializr and generate a new project. I am using the following settings, feel free to use your own group name:

Group: dev.farhan
Artifact: code-review-assistant
Java version: 21
Dependencies: Spring Web, Spring Data MongoDB

You will add Spring AI dependencies manually in section 3. For now, the project only needs web and MongoDB support.

Open application.properties and configure the MongoDB connection:

spring.data.mongodb.uri=mongodb+srv://:@.mongodb.net/code-review-assistant?appName=devrel-article-java-springai-foojay

Replace the placeholders with your Atlas cluster credentials. The appName query parameter helps MongoDB track which application is connecting, which is useful for monitoring. If you are running MongoDB locally, use mongodb://localhost:27017/code-review-assistant?appName=devrel-article-java-springai-foojay instead.

The companion repository has the complete project structure. You can clone it and follow along, or build each piece from scratch as you read.

2. Storing and managing review patterns

The review assistant works by comparing submitted code against a library of known anti-patterns. Before you can do any comparison, you need a way to define what an anti-pattern looks like, store it in MongoDB, and expose endpoints for adding and listing patterns.

Defining the pattern model

Review findings will have severity levels, so start by defining those as a Java enum. An enum is a type that restricts a value to a fixed set of options, which prevents invalid severity strings from entering the system:

public enum Severity {
    CRITICAL, WARNING, INFO
}

CRITICAL is for issues that will cause bugs or security vulnerabilities. WARNING is for problems that may cause issues under certain conditions. INFO is for suggestions that improve code quality but are not urgent.

Next, define the ReviewPattern class. This is the document that represents a single anti-pattern in your library. The @Document annotation tells Spring Data MongoDB which collection this class maps to, and @Id marks the field that MongoDB will use as the document's unique identifier:

@Document(collection = "review_patterns")
public class ReviewPattern {

    @Id
    private String id;
    private String name;
    private String description;
    private String language;
    private Severity severity;
    private String category;
    private String exampleBadCode;
    private String exampleGoodCode;
    private String explanation;

    // constructors, getters, and setters omitted for brevity
}

Each pattern has a name (like "empty catch block"), a description that explains the problem in plain language, and a language field so you can filter patterns by programming language. The category field groups related issues together (for example, "security" or "error-handling"). The exampleBadCode and exampleGoodCode fields show the problem and its fix side by side, and explanation describes why the bad code is problematic.

You will add an embedding field to this class later in section 3 when you set up vector search. For now, the text fields are enough to define the pattern library.

Each pattern's id is a human-readable slug like unclosed-resources or hardcoded-credentials, set at creation time rather than auto-generated as an ObjectId. ObjectIds are useful when many writers insert records concurrently or when you want a time-ordered index, but neither is an issue with a small admin-curated pattern library. Slugs make findings easier to read in the shell and give the LLM a meaningful label to echo back in matchedPatternId.

To see what a pattern looks like as a JSON document, here are two examples. The first describes an empty catch block, a common error-handling problem:

{
  "_id": "empty-catch-block",
  "name": "Empty catch block",
  "description": "Catching an exception and doing nothing with it, silently swallowing errors",
  "language": "java",
  "severity": "CRITICAL",
  "category": "error-handling",
  "exampleBadCode": "try { connection.close(); } catch (SQLException e) { }",
  "exampleGoodCode": "try { connection.close(); } catch (SQLException e) { logger.warn(\"Failed to close: {}\", e.getMessage()); }",
  "explanation": "Empty catch blocks silently swallow errors. When something fails, there is no log entry and no way to diagnose the problem."
}
``````json
{
  "_id": "hardcoded-credentials",
  "name": "Hardcoded credentials",
  "description": "Storing passwords, API keys, or secrets as string literals in source code",
  "language": "java",
  "severity": "CRITICAL",
  "category": "security",
  "exampleBadCode": "private static final String DB_PASSWORD = \"s3cretP@ss!\";",
  "exampleGoodCode": "@Value(\"${db.password}\") private String dbPassword;",
  "explanation": "Hardcoded credentials end up in version control and build artifacts. Use environment variables or a secrets manager."
}

The second describes hardcoded credentials, a security anti-pattern:

{
  "_id": "hardcoded-credentials",
  "name": "Hardcoded credentials",
  "description": "Storing passwords, API keys, or secrets as string literals in source code",
  "language": "java",
  "severity": "CRITICAL",
  "category": "security",
  "exampleBadCode": "private static final String DB_PASSWORD = \"s3cretP@ss!\";",
  "exampleGoodCode": "@Value(\"${db.password}\") private String dbPassword;",
  "explanation": "Hardcoded credentials end up in version control and build artifacts. Use environment variables or a secrets manager."
}

Each JSON document maps directly to the fields in the ReviewPattern class. When you save one of these through the API, Spring Data MongoDB converts the Java object into a document with this same structure and stores it in the review_patterns collection.

Creating the repository

To read and write patterns from MongoDB, you need a repository interface. In Spring Data, a repository is an interface that provides database operations without requiring you to write implementation code. You declare methods with names that follow a specific naming convention, and Spring generates the query logic at runtime:

public interface ReviewPatternRepository extends MongoRepository {

    List findByLanguage(String language);

    List findByCategory(String category);

    List findByLanguageAndCategory(String language, String category);

    List findBySeverity(Severity severity);
}

By extending MongoRepository, this interface inherits standard operations like save(), findById(), findAll(), and deleteById(). The two generic parameters tell Spring that this repository manages ReviewPattern documents and that the ID field is a String.

The custom methods use Spring Data's derived query feature. findByLanguage("java") translates to a MongoDB query that filters documents where the language field equals "java". findByLanguageAndCategory combines two filters with an AND condition. You do not need to write any MongoDB query syntax here. Spring parses the method name, identifies the field names and the operator (And), and builds the query for you.

Building the service layer

The service class contains the business logic for creating and retrieving patterns. The @Service annotation marks it as a Spring-managed component, which means Spring will create a single instance of this class and make it available for injection into other components:

@Service
public class ReviewPatternService {

    private final ReviewPatternRepository patternRepository;

    public ReviewPatternService(ReviewPatternRepository patternRepository) {
        this.patternRepository = patternRepository;
    }

    public ReviewPattern createPattern(CreatePatternRequest request) {
        ReviewPattern pattern = new ReviewPattern(
                request.id(), request.name(), request.description(), request.language(),
                request.severity(), request.category(),
                request.exampleBadCode(), request.exampleGoodCode(),
                request.explanation()
        );
        return patternRepository.save(pattern);
    }

    public List listPatterns(String language, String category) {
        if (language != null && category != null) {
            return patternRepository.findByLanguageAndCategory(language, category);
        }
        if (language != null) {
            return patternRepository.findByLanguage(language);
        }
        if (category != null) {
            return patternRepository.findByCategory(category);
        }
        return patternRepository.findAll();
    }

    public Optional getPattern(String id) {
        return patternRepository.findById(id);
    }
}

The constructor takes a ReviewPatternRepository as a parameter. Spring sees this and automatically injects the repository instance it created. This pattern is called constructor injection, and it is the recommended way to wire dependencies in Spring Boot.

The createPattern method builds a ReviewPattern from the incoming request and saves it to MongoDB.

The listPatterns method handles optional filtering. When both language and category are provided as query parameters, it calls the combined query. Without that first check, the method would silently ignore the category and filter by language only. When neither filter is provided, it falls back to findAll(), which returns every pattern in the collection.

The getPattern method returns an Optional. An Optional is a container that may or may not hold a value. It forces the caller to handle the case where no pattern exists for the given ID, rather than risking a null pointer exception.

The CreatePatternRequest is a Java record that maps the incoming JSON request body. Records are a concise way to define immutable data carriers. The compiler automatically generates a constructor, getter methods, and equals/hashCode implementations from the field list:

public record CreatePatternRequest(
        String id, String name, String description, String language,
        Severity severity, String category,
        String exampleBadCode, String exampleGoodCode, String explanation
) {}

When a JSON body arrives at the endpoint, Spring deserializes it into this record by matching JSON field names to the record's component names.

Exposing the REST endpoints

The controller class maps HTTP requests to service methods. The @RestController annotation tells Spring that this class handles web requests and that every method's return value should be serialized directly as the response body (as JSON, by default). @RequestMapping("/api/patterns") sets the base URL path for all endpoints in this controller:

@RestController
@RequestMapping("/api/patterns")
public class ReviewPatternController {

    private final ReviewPatternService patternService;

    public ReviewPatternController(ReviewPatternService patternService) {
        this.patternService = patternService;
    }

    @PostMapping
    @ResponseStatus(HttpStatus.CREATED)
    public ReviewPattern createPattern(@RequestBody CreatePatternRequest request) {
        return patternService.createPattern(request);
    }

    @GetMapping
    public List listPatterns(
            @RequestParam(required = false) String language,
            @RequestParam(required = false) String category) {
        return patternService.listPatterns(language, category);
    }

    @GetMapping("/{id}")
    public ReviewPattern getPattern(@PathVariable String id) {
        return patternService.getPattern(id)
                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND));
    }
}

@PostMapping handles POST requests to /api/patterns. The @RequestBody annotation tells Spring to deserialize the JSON request body into a CreatePatternRequest record. @ResponseStatus(HttpStatus.CREATED) changes the default response code from 200 to 201, which is the standard HTTP status for "resource created."

@GetMapping without a path handles GET requests to /api/patterns. The @RequestParam(required = false) annotation binds query parameters from the URL. For example, GET /api/patterns?language=java&category=security passes "java" as language and "security" as category. Since both are marked as not required, omitting them results in null values, which the service handles by returning all patterns.

@GetMapping("/{id}") handles GET requests like /api/patterns/unclosed-resources. The @PathVariable annotation extracts the id value from the URL path. If the service returns an empty Optional, orElseThrow converts it into a 404 response.

You can test this by adding a pattern manually:

curl -X POST http://localhost:8080/api/patterns \
  -H "Content-Type: application/json" \
  -d '{
    "id": "empty-catch-block",
    "name": "Empty catch block",
    "description": "Catching an exception and doing nothing with it",
    "language": "java",
    "severity": "CRITICAL",
    "category": "error-handling",
    "exampleBadCode": "try { conn.close(); } catch (SQLException e) { }",
    "exampleGoodCode": "try { conn.close(); } catch (SQLException e) { logger.warn(\"Close failed\", e); }",
    "explanation": "Empty catch blocks silently swallow errors."
  }'

This works for adding patterns one at a time, but the system is more useful with a full library loaded. The next section adds the data seeder along with the embedding and vector search capabilities that make pattern matching work.

3. Embedding patterns with Spring AI and MongoDB Atlas Vector Search

Suppose a developer writes InputStream is = new FileInputStream(path); without a try-with-resources block. Your pattern library describes "unclosed resources in try blocks" with a different code example that uses FileReader. The underlying problem is identical, but the code looks different. Exact string matching will not connect the two. This is where embeddings help. By converting both the stored pattern and the submitted code into vectors, you can measure their semantic similarity regardless of superficial differences in syntax.

Adding Spring AI dependencies

Spring AI is managed through a Bill of Materials (BOM), which is a special dependency declaration that locks the versions of all Spring AI modules so they stay compatible with each other. Add the BOM and the OpenAI starter to your pom.xml:


    
        
            org.springframework.ai
            spring-ai-bom
            1.0.0
            pom
            import
        
    



    
    
        org.springframework.ai
        spring-ai-starter-model-openai

The spring-ai-starter-model-openai dependency does not have a tag. The BOM provides the version, so you only need to specify it in one place. The starter auto-configures both an EmbeddingModel bean (for generating vectors) and a ChatClient.Builder bean (for calling the LLM), which you will use in later sections.

Then add the OpenAI configuration to application.properties:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.embedding.options.model=text-embedding-3-small
spring.ai.openai.chat.options.model=gpt-4o-mini
spring.ai.openai.chat.options.temperature=0.2

The ${OPENAI_API_KEY} syntax reads the value from an environment variable, so you do not hardcode your key in the configuration file. The text-embedding-3-small model produces 1536-dimensional vectors, meaning each piece of text gets converted into an array of 1536 numbers that capture its semantic meaning. The low temperature setting (0.2) keeps code review output deterministic and consistent, which is what you want for a review tool that should give similar feedback for similar code. You can swap gpt-4o-mini for a different model if you want stronger results and do not mind higher API costs.

Generating embeddings

To generate an embedding for a pattern, you need to combine its most descriptive fields into a single text block and pass that to the embedding model. Add an embedding field and a helper method to the ReviewPattern class:

@Document(collection = "review_patterns")
public class ReviewPattern {

    // ... existing fields ...

    private float[] embedding;

    public float[] getEmbedding() { return embedding; }
    public void setEmbedding(float[] embedding) { this.embedding = embedding; }

    public String buildEmbeddingText() {
        return description + " " + exampleBadCode + " " + explanation;
    }
}

The embedding field stores the vector that the embedding model generates. It is a float[] because each dimension is a floating-point number.

buildEmbeddingText() concatenates the description, example bad code, and explanation into one string. This gives the embedding model enough context to understand what the pattern is about. The method lives on the model class because both the service and the data seeder need to build this text, and putting it here means the concatenation logic is defined in one place. If you later decide to include the pattern name or category in the embedding, you change this one method instead of updating it in multiple places.

Now update the ReviewPatternService to inject the EmbeddingModel and generate embeddings when creating patterns:

@Service
public class ReviewPatternService {

    private final ReviewPatternRepository patternRepository;
    private final EmbeddingModel embeddingModel;

    public ReviewPatternService(ReviewPatternRepository patternRepository,
                                EmbeddingModel embeddingModel) {
        this.patternRepository = patternRepository;
        this.embeddingModel = embeddingModel;
    }

    public ReviewPattern createPattern(CreatePatternRequest request) {
        ReviewPattern pattern = new ReviewPattern(
                request.id(), request.name(), request.description(), request.language(),
                request.severity(), request.category(),
                request.exampleBadCode(), request.exampleGoodCode(),
                request.explanation()
        );

        pattern.setEmbedding(embeddingModel.embed(pattern.buildEmbeddingText()));

        return patternRepository.save(pattern);
    }

    // listPatterns and getPattern remain unchanged
}

The EmbeddingModel is a Spring AI interface that the OpenAI starter auto-configures. Its embed() method sends the text to OpenAI's embedding API and returns a float[] with 1536 values. Each value represents one dimension of the text's meaning in the model's vector space. Two pieces of text about similar topics will produce vectors that point in similar directions, which is what makes semantic search possible.

Seeding the pattern library

The companion repository includes a DataSeeder component that loads about 20 patterns on startup. It implements CommandLineRunner, which is a Spring Boot interface with a single run method. Spring Boot automatically calls run after the application context is fully initialized, making it a convenient place for one-time setup tasks like loading seed data:

@Component
public class DataSeeder implements CommandLineRunner {

    private final ReviewPatternRepository patternRepository;
    private final EmbeddingModel embeddingModel;

    public DataSeeder(ReviewPatternRepository patternRepository,
                      EmbeddingModel embeddingModel) {
        this.patternRepository = patternRepository;
        this.embeddingModel = embeddingModel;
    }

    @Override
    public void run(String... args) {
        if (patternRepository.count() > 0) {
            return;
        }

        List patterns = createPatterns();

        for (ReviewPattern pattern : patterns) {
            pattern.setEmbedding(embeddingModel.embed(pattern.buildEmbeddingText()));
        }

        patternRepository.saveAll(patterns);
    }

    private List createPatterns() {
        List patterns = new ArrayList<>();

        patterns.add(new ReviewPattern(
                "unclosed-resources",
                "Unclosed resources",
                "Opening a resource without using try-with-resources",
                "java", Severity.CRITICAL, "maintainability",
                "FileInputStream fis = new FileInputStream(\"config.properties\");\n"
                + "Properties props = new Properties();\n"
                + "props.load(fis);\nreturn props;",
                "try (FileInputStream fis = new FileInputStream(\"config.properties\")) {\n"
                + "    Properties props = new Properties();\n"
                + "    props.load(fis);\n    return props;\n}",
                "If an exception occurs between opening and closing a resource, "
                + "the close call never runs. This leaks file handles and connections."
        ));

        // ... 19 more patterns covering error-handling, security,
        //     performance, and maintainability categories ...

        return patterns;
    }
}

The run method starts with a guard check: patternRepository.count() > 0. If the collection already has data, the method returns immediately. This prevents the seeder from re-generating embeddings or re-inserting data on application restarts.

When the collection is empty, the method builds all 20 patterns, then loops through each one to generate its embedding. The loop calls embeddingModel.embed() once per pattern, sending each pattern's text to the OpenAI API. After all embeddings are generated, patternRepository.saveAll(patterns) writes every pattern to MongoDB in a single batch operation, which is more efficient than saving them one at a time in separate round trips.

The full list of 20 patterns covers error handling (catching generic exceptions, empty catch blocks, swallowing InterruptedException), security (hardcoded credentials, SQL injection, logging sensitive data), performance (string concatenation in loops, N+1 queries, unnecessary autoboxing), and maintainability (unclosed resources, missing null checks, raw generics). The complete list is available in the companion repository.

Creating the Atlas Vector Search index

Before you can query the embeddings, you need to create a vector search index in Atlas. This index tells MongoDB how to organize and search the embedding vectors efficiently.

Go to your cluster in the Atlas UI, select the Atlas Search tab, and click Create Search Index. Choose Atlas Vector Search as the index type and select the review_patterns collection. In the index name field, enter vector_index. The code you write later references the index by this exact name, so do not leave the auto-generated default. Then paste the following definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}

The path field points to embedding, which is where you stored the vector in the ReviewPattern class. The numDimensions value must match the output of your embedding model, which is 1536 for text-embedding-3-small. If these values do not match, the search will fail.

The similarity field specifies how MongoDB measures the distance between vectors. Cosine similarity measures the angle between two vectors regardless of their magnitude, which makes it a good fit for text embeddings where the direction of the vector matters more than its length.

Searching for similar patterns

With the index in place, you can build a method that finds patterns semantically similar to a given code snippet. This method takes a query vector (the embedding of the submitted code) and runs a $vectorSearch aggregation against the patterns collection.

Aggregation pipelines in MongoDB work like an assembly line. Data flows through a sequence of stages, and each stage transforms the data before passing it to the next one. In this pipeline, the first stage finds similar vectors, the second adds a similarity score to each result, and the third removes the large embedding array from the output:

private List findSimilarPatterns(float[] queryVector, int limit) {
    List queryVectorList = new ArrayList<>();
    for (float f : queryVector) {
        queryVectorList.add((double) f);
    }

    Document vectorSearchStage = new Document("$vectorSearch",
            new Document("index", "vector_index")
                    .append("path", "embedding")
                    .append("queryVector", queryVectorList)
                    .append("numCandidates", 50)
                    .append("limit", limit));

    AggregationOperation vectorSearch = context -> vectorSearchStage;

    AggregationOperation addScore = context ->
            new Document("$addFields",
                    new Document("searchScore",
                            new Document("$meta", "vectorSearchScore")));

    AggregationOperation excludeEmbedding = context ->
            new Document("$project",
                    new Document("embedding", 0));

    Aggregation aggregation = Aggregation.newAggregation(vectorSearch, addScore, excludeEmbedding);

    AggregationResults results =
            mongoTemplate.aggregate(aggregation, "review_patterns", ReviewPattern.class);

    return results.getMappedResults();
}

The method starts by converting the float[] query vector into a List. This conversion is necessary because the MongoDB Java driver expects double-precision numbers in the $vectorSearch query vector.

The $vectorSearch stage is the core of this method. It specifies which index to use (vector_index), which field contains the vectors (embedding), and the query vector to compare against. The numCandidates parameter controls how many candidate documents MongoDB evaluates internally before selecting the final results. Setting it higher than limit gives the search algorithm more options to choose from, which improves accuracy at the cost of slightly more processing time. The limit parameter controls how many results to return.

The $addFields stage adds a searchScore field to each result. The $meta: "vectorSearchScore" expression pulls the cosine similarity score that MongoDB calculated during the vector search. This score ranges from 0 to 1, where 1 means the vectors are identical. You will pass this score to the LLM later so it knows how confident the vector search was about each match.

The $project stage with "embedding": 0 removes the embedding array from the results. Each embedding is a 1536-element array that the prompt builder does not need, so without this exclusion, every vector search would transfer several kilobytes of unused data per pattern.

Finally, mongoTemplate.aggregate() runs the pipeline against the review_patterns collection and maps each result document back into a ReviewPattern Java object.

To hold the similarity score that $addFields injects, add a searchScore field to ReviewPattern and mark it with @Transient:

@Transient
private double searchScore;

The @Transient annotation tells Spring Data MongoDB not to persist this field to the database. The searchScore only gets populated during vector search results and has no meaning outside that context. Without @Transient, saving a pattern returned by vector search would write a stale score to the database.

4. Building the code review engine

The ReviewService is where the pieces connect. It accepts a code submission, finds matching patterns via vector search, sends both to an LLM, and parses the structured response into findings. The following diagram shows the complete flow from submission to response:

Figure 1: Review flow diagram showing the steps from code submission through embedding, vector search, LLM analysis, and saving findings to MongoDB

Before building the service, you need two more document classes: one for storing the code that developers submit, and one for storing the issues that the review engine identifies.

Defining the submission and finding models

The CodeSubmission document stores each code snippet that a developer sends for review:

@Document(collection = "code_submissions")
public class CodeSubmission {

    @Id
    private String id;
    private String code;
    private String language;
    private String fileName;
    private String submittedBy;
    private Instant submittedAt;
    private List findingIds;

    // constructors, getters, and setters omitted for brevity
}

The code field holds the raw source code the developer submits. The language and fileName fields provide context about what kind of code it is. The submittedAt field uses Instant, which stores a precise UTC timestamp. The findingIds field is a list of references to the ReviewFinding documents that the review produces. Rather than embedding findings inside the submission document, storing IDs keeps the submission document small and lets you query findings independently.

The ReviewFinding document stores individual issues that the review engine identifies. Each finding references its parent submission and optionally references the pattern it matched:

@Document(collection = "review_findings")
public class ReviewFinding {

    @Id
    private String id;
    @Indexed
    private String submissionId;
    private String matchedPatternId;
    private int startLine;
    private int endLine;
    private Severity severity;
    private String category;
    private String message;
    private String suggestion;
    private double confidence;

    // constructors, getters, and setters omitted for brevity
}

The @Indexed annotation on submissionId tells Spring Data MongoDB to create a database index on that field. When you look up all findings for a given submission, MongoDB uses this index to jump directly to the matching documents instead of scanning the entire collection. Without it, every call to findBySubmissionId would get slower as the collection grows.

The startLine and endLine fields mark where in the submitted code the issue appears. The matchedPatternId field is nullable because the LLM may flag issues that do not map to any stored pattern. For example, the LLM might notice a logic error that is too specific to be a general anti-pattern. The confidence field is a score from 0.0 to 1.0 that the LLM assigns to indicate how certain it is about the finding.

The review service

Here is the flow that the review service follows for each submission:

Save the code submission to MongoDB.
Embed the submitted code and run vector search to find the top 5 matching patterns.
Build a prompt with the code and matched patterns, then call the LLM.
Parse the LLM response into ReviewFinding objects and save them.

@Service
public class ReviewService {

    private final MongoTemplate mongoTemplate;
    private final EmbeddingModel embeddingModel;
    private final ChatClient chatClient;
    private final CodeSubmissionRepository submissionRepository;
    private final ReviewFindingRepository findingRepository;

    public ReviewService(MongoTemplate mongoTemplate,
                         EmbeddingModel embeddingModel,
                         ChatClient.Builder chatClientBuilder,
                         CodeSubmissionRepository submissionRepository,
                         ReviewFindingRepository findingRepository) {
        this.mongoTemplate = mongoTemplate;
        this.embeddingModel = embeddingModel;
        this.chatClient = chatClientBuilder.build();
        this.submissionRepository = submissionRepository;
        this.findingRepository = findingRepository;
    }

    public ReviewResponse reviewCode(ReviewRequest request) {
        if (request.code() == null || request.code().isBlank()) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "Code must not be empty");
        }

        CodeSubmission submission = new CodeSubmission();
        submission.setCode(request.code());
        submission.setLanguage(request.language());
        submission.setFileName(request.fileName());
        submission.setSubmittedAt(Instant.now());

        float[] codeEmbedding = embeddingModel.embed(request.code());
        List matchedPatterns = findSimilarPatterns(codeEmbedding, 5);

        String systemPrompt = buildSystemPrompt();
        String userPrompt = buildUserPrompt(request.code(), matchedPatterns);

        List findings = chatClient.prompt()
                .system(systemPrompt)
                .user(userPrompt)
                .call()
                .entity(new ParameterizedTypeReference<>() {});

        submission = submissionRepository.save(submission);

        for (ReviewFinding finding : findings) {
            finding.setSubmissionId(submission.getId());
        }
        List savedFindings = findingRepository.saveAll(findings);
        List findingIds = savedFindings.stream()
                .map(ReviewFinding::getId)
                .toList();

        submission.setFindingIds(findingIds);
        submissionRepository.save(submission);

        return new ReviewResponse(submission, savedFindings);
    }

    // findSimilarPatterns from section 3 goes here
    // buildSystemPrompt and buildUserPrompt shown below
}

The constructor takes five dependencies. The ChatClient.Builder is a Spring AI auto-configured bean that provides a builder for creating chat clients. The service calls .build() in the constructor to create a ChatClient instance that it reuses for every review request. MongoTemplate provides lower-level MongoDB operations that the repository interfaces do not cover, which you need for the vector search aggregation pipeline.

The reviewCode method starts with a null check on the submitted code. Without it, an empty request would trigger an expensive embedding API call and LLM call before eventually failing. Returning a 400 error early is cheaper and gives the caller a clear error message.

Next, the method creates a CodeSubmission object and populates its fields from the request. It then generates an embedding for the submitted code using the same embeddingModel.embed() method used for patterns. This vector represents the semantic meaning of the code, and findSimilarPatterns uses it to search for patterns whose embeddings point in a similar direction.

The chatClient.prompt() chain builds and sends the LLM request. .system(systemPrompt) sets the system-level instructions that define how the LLM should behave. .user(userPrompt) provides the actual code and matched patterns. .call() sends the request to the OpenAI API. .entity(new ParameterizedTypeReference<>() {}) tells Spring AI to parse the LLM's JSON response directly into a List. Spring AI generates the JSON schema from the target type and instructs the LLM to return JSON in that format, so you do not need to write parsing code yourself.

After the LLM returns findings, the method saves the submission first to get its generated id, then assigns that id to each finding before saving them with findingRepository.saveAll(). Using saveAll in a single batch is more efficient than saving each finding individually, since batch saving makes one database round trip instead of one per finding. Finally, the submission is updated with the list of finding IDs and saved again.

The method returns savedFindings (the list from saveAll) rather than the original findings list. The saved list has MongoDB-generated IDs on each finding. Returning the original list would give clients findings without IDs, making it harder to reference specific findings later.

One catch with the two saves is that if the application crashes between them, the findings will be in the database with a valid submissionId, but the submission document will have an empty findingIds. The data is not lost, though. Each finding still references its parent, so findingRepository.findBySubmissionId(submission.getId()) returns them and you can rebuild the submission's findingIds afterward. If you want stricter atomicity, wrap both writes in a MongoDB multi-document transaction with Spring's @Transactional. Otherwise, treat findingIds as a lookup optimization and query by submissionId as a fallback.

Prompt design

The system prompt sets the reviewer persona and defines the exact output format. Being specific about the JSON structure is important because the entity() call on the chat client needs the response to match the ReviewFinding class:

private String buildSystemPrompt() {
    return """
        You are a senior Java code reviewer. Analyze the submitted code and identify issues.
        You will receive a code snippet and a set of known anti-patterns that matched semantically.
        For each issue you find, return a JSON array of findings. Each finding must have these fields:
        - startLine (int): the line number where the issue starts
        - endLine (int): the line number where the issue ends
        - severity (string): one of "CRITICAL", "WARNING", or "INFO"
        - category (string): one of "security", "performance", "maintainability", "error-handling"
        - message (string): a concise description of the issue
        - suggestion (string): how to fix the issue
        - confidence (double): your confidence from 0.0 to 1.0
        - matchedPatternId (string or null): the pattern ID if it matches a provided pattern

        Focus on real issues. Do not flag stylistic preferences or minor formatting.
        Return ONLY the JSON array, no additional text.
        """;
}

The last two lines are important. "Focus on real issues" prevents the LLM from flagging every minor style choice as a finding. "Return ONLY the JSON array" ensures the response is parseable by Spring AI's entity() method. Without that instruction, the LLM might wrap the JSON in markdown code fences or add explanatory text around it, which would break parsing.

The user prompt provides the code to review and the matched patterns from vector search:

private String buildUserPrompt(String code, List patterns) {
    StringBuilder prompt = new StringBuilder();
    prompt.append("## Code to review\n\n```java\n");
    prompt.append(code);
    prompt.append("\n```\n\n");
    prompt.append("## Known anti-patterns to check against\n\n");

    for (int i = 0; i < patterns.size(); i++) {
        ReviewPattern pattern = patterns.get(i);
        prompt.append(String.format("%d. **%s** (ID: %s, similarity: %.3f)\n",
                i + 1, pattern.getName(), pattern.getId(), pattern.getSearchScore()));
        prompt.append("   Description: ").append(pattern.getDescription()).append("\n");
        prompt.append("   Example: ```java\n   ").append(pattern.getExampleBadCode());
        prompt.append("\n   ```\n");
        prompt.append("   Why: ").append(pattern.getExplanation()).append("\n\n");
    }

    return prompt.toString();
}

The prompt includes each pattern's ID so the LLM can populate the matchedPatternId field in its findings. This creates a traceable link from each issue back to the stored pattern that triggered it. The similarity score from vector search is included too, which gives the LLM a signal about how confident the match is. A pattern with a 0.92 similarity score deserves more weight than one at 0.61, and the LLM can factor that into its confidence assessment.

The chatClient.prompt() call can fail if the OpenAI service is unavailable or if the response does not parse into the expected structure. In this tutorial, the exception propagates as a 500 error. In production, you would want to catch the failure and return a meaningful error response to the caller rather than an unhandled stack trace.

The review controller

The controller exposes three endpoints: one for submitting code for review, one for retrieving a past review by submission ID, and one for listing just the findings:

@RestController
@RequestMapping("/api/reviews")
public class ReviewController {

    private final ReviewService reviewService;

    public ReviewController(ReviewService reviewService) {
        this.reviewService = reviewService;
    }

    @PostMapping
    @ResponseStatus(HttpStatus.CREATED)
    public ReviewResponse submitReview(@RequestBody ReviewRequest request) {
        return reviewService.reviewCode(request);
    }

    @GetMapping("/{submissionId}")
    public ReviewResponse getReview(@PathVariable String submissionId) {
        return reviewService.getReview(submissionId);
    }

    @GetMapping("/{submissionId}/findings")
    public List getFindings(@PathVariable String submissionId) {
        return reviewService.getFindings(submissionId);
    }
}

The POST endpoint at /api/reviews accepts a JSON body with the code to review and returns the full review response including the submission and all findings. The GET endpoint at /api/reviews/{submissionId} retrieves a previous review, and /api/reviews/{submissionId}/findings returns just the findings for a given submission, which is useful when you only need the issues without the submission metadata.

Testing the review engine

Submit a Java method with a few intentional issues:

curl -X POST http://localhost:8080/api/reviews \
  -H "Content-Type: application/json" \
  -d '{
    "code": "public void processFile(String path) {\n    String content = \"\";\n    try {\n        FileInputStream fis = new FileInputStream(path);\n        byte[] data = fis.readAllBytes();\n        content = new String(data);\n    } catch (Exception e) {\n        // handle later\n    }\n    String[] lines = content.split(\"\\n\");\n    String result = \"\";\n    for (String line : lines) {\n        result += line.trim() + \"\\n\";\n    }\n    System.out.println(result);\n}",
    "language": "java"
  }'

This code has three issues: an unclosed FileInputStream (no try-with-resources), a generic catch (Exception e) with an empty body, and string concatenation with += inside a loop. The response includes a finding for each issue, with the matched pattern ID, severity, line range, and a suggestion for how to fix it. The confidence scores typically range from 0.7 to 0.95 depending on how closely the code matches the stored patterns.

5. Tracking review trends with aggregation pipelines

After enough reviews accumulate, you can use MongoDB aggregation pipelines to answer questions like "what issues keep showing up?" across all submissions. Aggregation pipelines work by passing documents through a series of stages, where each stage performs an operation like filtering, grouping, or sorting. The output of one stage becomes the input for the next.

Create an AnalyticsService with three pipelines that surface different views of your review data.

The first pipeline groups findings by category and counts how many times each category appears. This tells you where a team's code most often needs improvement:

public List getCategoryCounts() {
    Aggregation aggregation = Aggregation.newAggregation(
            Aggregation.group("category").count().as("count"),
            Aggregation.sort(Sort.Direction.DESC, "count")
    );
    return mongoTemplate.aggregate(aggregation, "review_findings", CategoryCount.class)
            .getMappedResults();
}

Aggregation.group("category") is a $group stage that collects all findings with the same category value into one group. .count().as("count") adds a field called count to each group that holds the number of documents in it. Aggregation.sort(Sort.Direction.DESC, "count") orders the groups so the most frequent category appears first. mongoTemplate.aggregate() runs the pipeline against the review_findings collection and maps each result into a CategoryCount object.

The second pipeline uses the same structure but groups by severity instead. This shows the balance of critical, warning, and informational findings across all reviews:

public List getSeverityDistribution() {
    Aggregation aggregation = Aggregation.newAggregation(
            Aggregation.group("severity").count().as("count"),
            Aggregation.sort(Sort.Direction.DESC, "count")
    );
    return mongoTemplate.aggregate(aggregation, "review_findings", SeverityCount.class)
            .getMappedResults();
}

If most findings are CRITICAL, the team may need to focus on fundamental practices. If the distribution skews toward INFO, the codebase is generally healthy.

The third pipeline is more involved. It identifies which specific patterns keep recurring across reviews by joining data from two collections. The following diagram shows how documents flow through each stage:

Aggregation pipeline diagram showing the stages from match through group, sort, limit, lookup, unwind, and project

public List getTopPatterns() {
    Aggregation aggregation = Aggregation.newAggregation(
            Aggregation.match(Criteria.where("matchedPatternId").ne(null)),
            Aggregation.group("matchedPatternId").count().as("count"),
            Aggregation.sort(Sort.Direction.DESC, "count"),
            Aggregation.limit(10),
            Aggregation.lookup("review_patterns", "_id", "_id", "pattern"),
            Aggregation.unwind("pattern"),
            Aggregation.project()
                    .and("pattern.name").as("patternName")
                    .and("count").as("count")
    );
    return mongoTemplate.aggregate(aggregation, "review_findings", PatternFrequency.class)
            .getMappedResults();
}

This pipeline has several stages, so here is what each one does:

Aggregation.match(Criteria.where("matchedPatternId").ne(null)) filters out findings that have no matched pattern. Not every finding maps to a stored pattern (the LLM can flag issues independently), so this stage removes those before counting.
Aggregation.group("matchedPatternId").count().as("count") groups the remaining findings by which pattern they matched and counts the occurrences.
Aggregation.sort(Sort.Direction.DESC, "count") orders patterns by how frequently they were matched.
Aggregation.limit(10) keeps only the top 10 results.
Aggregation.lookup("review_patterns", "_id", "_id", "pattern") performs a join with the review_patterns collection. The _id from the grouped result (which holds the matchedPatternId value) is matched against the _id in the review_patterns collection. The matching document is placed into a new array field called pattern. This is similar to a SQL JOIN, but the result is always an array because MongoDB does not assume a one-to-one relationship.
Aggregation.unwind("pattern") flattens that array. Since each grouped result matches exactly one pattern, the pattern array has one element. unwind replaces the array with the single document inside it, which makes the fields easier to access in the next stage.
Aggregation.project() selects the final output fields. .and("pattern.name").as("patternName") pulls the name field from the joined pattern document and renames it to patternName. .and("count").as("count") keeps the count from the grouping stage. Everything else is excluded from the output.

Expose these three pipelines through an AnalyticsController:

@RestController
@RequestMapping("/api/analytics")
public class AnalyticsController {

    private final AnalyticsService analyticsService;

    public AnalyticsController(AnalyticsService analyticsService) {
        this.analyticsService = analyticsService;
    }

    @GetMapping("/categories")
    public List getCategoryCounts() {
        return analyticsService.getCategoryCounts();
    }

    @GetMapping("/severity")
    public List getSeverityDistribution() {
        return analyticsService.getSeverityDistribution();
    }

    @GetMapping("/top-patterns")
    public List getTopPatterns() {
        return analyticsService.getTopPatterns();
    }
}

After running several reviews through the system, the category endpoint might return something like:

[
  { "category": "error-handling", "count": 12 },
  { "category": "maintainability", "count": 8 },
  { "category": "security", "count": 5 },
  { "category": "performance", "count": 4 }
]

This tells you that error handling is the most frequent issue category across all reviewed code. These pipelines scan the entire review_findings collection each time they run. For a tutorial with a few dozen reviews, that is fine. In production with thousands of findings, you would want indexes on category, severity, and matchedPatternId to speed up the $group stages.

6. Testing the full workflow

Here is the complete flow from start to finish:

Start the application. The DataSeeder loads about 20 patterns and generates their embeddings on first run. You should see the patterns in the review_patterns collection in Atlas.

Add a custom pattern. The library is extensible. Add a pattern that is specific to your codebase:

curl -X POST http://localhost:8080/api/patterns \
  -H "Content-Type: application/json" \
  -d '{
    "id": "logging-user-passwords",
    "name": "Logging user passwords",
    "description": "Writing user passwords to log output in authentication flows",
    "language": "java",
    "severity": "CRITICAL",
    "category": "security",
    "exampleBadCode": "logger.info(\"Login: user={}, pass={}\", username, password);",
    "exampleGoodCode": "logger.info(\"Login attempt: user={}\", username);",
    "explanation": "Passwords in logs violate security policy and compliance requirements."
  }'

Submit code with a known issue. Send a snippet with an obvious anti-pattern:

curl -X POST http://localhost:8080/api/reviews \
  -H "Content-Type: application/json" \
  -d '{
    "code": "public String readConfig() {\n    FileInputStream fis = new FileInputStream(\"app.conf\");\n    byte[] data = fis.readAllBytes();\n    return new String(data);\n}",
    "language": "java"
  }'

The response includes a finding for the unclosed FileInputStream with a matchedPatternId pointing to the "unclosed resources" pattern.

Submit code with a subtler issue. Try a snippet that does not exactly match any stored pattern's example:

curl -X POST http://localhost:8080/api/reviews \
  -H "Content-Type: application/json" \
  -d '{
    "code": "public void backup(Path source, Path dest) throws Exception {\n    BufferedReader reader = Files.newBufferedReader(source);\n    BufferedWriter writer = Files.newBufferedWriter(dest);\n    String line;\n    while ((line = reader.readLine()) != null) {\n        writer.write(line);\n        writer.newLine();\n    }\n}",
    "language": "java"
  }'

Even though this uses BufferedReader and BufferedWriter instead of FileInputStream, the vector search still finds the "unclosed resources" pattern as a top match because the semantic meaning is the same: resources opened without try-with-resources. Check the similarity score in the response to see how closely it matched.

Check analytics. After running a few reviews, hit the analytics endpoints:

curl http://localhost:8080/api/analytics/categories
curl http://localhost:8080/api/analytics/severity
curl http://localhost:8080/api/analytics/top-patterns

These show the accumulated data across all your reviews.

Conclusion

You built a code review assistant with three layers. Atlas Vector Search matches submitted code against the pattern library by semantic similarity, so it finds issues even when the code looks different from the stored examples. Spring AI sends the matched patterns and the code to an LLM, which returns structured findings with severity, line ranges, and fix suggestions. MongoDB aggregation pipelines turn the accumulated findings into trends across submissions.

From here, you could expand the pattern library with anti-patterns from your own team's code reviews. You could add support for reviewing full files or Git diffs instead of snippets, or experiment with code-specific embedding models for better similarity matching. A feedback endpoint where developers mark findings as helpful or not would let you improve pattern quality over time.

The complete source code is available in the companion repository on GitHub.

The post AI-Powered Code Review Assistant: Automated Code Analysis with Spring AI and MongoDB appeared first on foojay.

Building an AI-Powered Operations Assistant with Spring AI and MongoDB Atlas — Part 1: RAG Foundation

Matteo Rossi — Thu, 07 May 2026 19:22:11 +0000

Table of Contents

The problemWhat we are buildingWhy RAG and why MongoDB AtlasHow the Pieces Fit TogetherGetting the Project RunningThe Ingestion PipelineThe Retrieval PipelineThe Atlas Vector Search IndexTrying It OutConclusion and What’s Next

This is the first article in a three-part series. Part 2 covers short-term and long-term memory; Part 3 introduces stateful workflow checkpointing with pause/resume.

The problem

It’s 2 a.m. Suddenly, an alert pops up indicating abnormal CPU usage on the payment services. The on-call engineer opens their laptop, logs into the monitoring dashboards, and begins the hunt. One by one, he searches the runbooks on Confluence, checks the Slack chats, and opens the GitHub wikis and documents shared during the design phase. By the time he finds any useful information, ten minutes have already passed.

And what he finds is often not what he was looking for because he didn’t know which keywords to use for the search. Or perhaps what he finds isn’t up to date.

We’re talking about a problem that, in theory, has already been solved. The team managing the service has prepared and versioned the runbooks needed to resolve the incident; the knowledge is available and documented. The real problem is searching for and retrieving this knowledge: taking and extracting the right context from the ongoing incident, identifying the root cause, and correctly matching it to the part of the documentation that addresses that problem.

So, this is one of the many problems we can solve with Retrieval-Augmented Generation (RAG).

What we are building

In this series of articles, we will build an Operations Assistant: a Spring AI-based Java application that allows engineers to ask questions in plain English and receive answers that help them perform operations and solve problems, based on their operational knowledge base.

In this first article, we’ll focus on the foundation: loading documentation into a vector store and linking it to a language model so that every answer is anchored to real, company-specific content. We don’t want a generic response from an LLM. The result is already useful in itself: we will have APIs connected to a small UI, where the user can ask questions such as “What are the steps to roll back my latest deployment on Kubernetes?” and receive structured answers consistent with the company’s documentation.

In parts 2 and 3, we will add conversational memory and persistence, leveraging MongoDB as a unified database.

Why RAG and why MongoDB Atlas

An LLM is a perfect tool for generating generic responses, but it stops being effective the moment I ask it for specific information about your systems. And the problem is clear: it has never seen your runbooks, read your documentation, reviewed your postmortems, or understood the naming convention your team decided on over a post-work beer three years ago.

It is possible to fine-tune a model on this content, but it is an expensive, slow, and difficult process to keep up to date: every time someone updates a runbook, the model needs to be retrained.

Fortunately, there’s RAG. RAG allows us to store our information in an external container rather than within the model, retrieve this information when a request is made, and use it within the model’s context window alongside the query. Once the model receives the query, it reads the documentation and provides an answer. Quick win: the documentation is always up to date, and the model will always use the latest available version.

Where do I save this documentation? Well, that’s where MongoDB comes to the rescue. The same Atlas cluster that will contain our documentation will also allow us, in future articles, to host our conversation history and workflow checkpoints. A single platform serving multiple purposes: this means less management overhead and an infrastructure that’s easier to manage. One less headache for the operations team, which already has to handle other requests.

Atlas provides a native Vector Search feature that integrates directly with the MongoDBAtlasVectorStore abstraction provided by Spring AI. This means there is no separate vector database to set up and deploy, and most importantly, no ETL pipeline to synchronize.

Documents and their embeddings coexist within the same collection and can be retrieved using the same infrastructure and connection.

Another truly interesting and useful feature is metadata filtering. Every piece of documentation we save in our database includes metadata, such as the system it refers to, the environment, the associated severity, and which team is responsible. When a request is made, the retrieval advisor can pre-filter the vector search based on this metadata. In the example scenario, a request regarding the payments service in the production environment will bring to the model’s attention only the runbooks associated with this service and this environment. This is particularly efficient and accurate when the database grows.

How the Pieces Fit Together

Before diving into the code, it’s helpful to have a clear picture of what we’re about to build.

Let’s start with the engineer interacting with their assistant: when a question is asked, the request passes through a Spring AI ChatClient. This ChatClient has been configured with a chain of advisors, the most important of which, for this part of the tutorial, is the QuestionAnswerAdvisor.

The QuestionAnswerAdvisor intercepts all requests before they are submitted to the model and converts the question into a vector using, in this case, OpenAI’s text-embedding-3-small. Once a vector is obtained, a cosine similarity search is performed on the knowledge_chunks collection within Atlas. The top 5 document chunks that match the request are formatted and returned to the prompt to form the grounding context.

At this point, we have three components that are sent to the model: the system prompt that defines the role of the person submitting the request, the content retrieved from the knowledge base, and the original question from the operations expert. The prompt explicitly requires that the model cannot rely solely on its own knowledge to answer the question, but must rely on the context retrieved from the knowledge base, and when this is insufficient, the model must clearly indicate it.

From an ingestion perspective, whenever a new runbook, procedure, postmortem, or version of the documentation is produced, it must be split into overlapping chunks (to preserve context) of approximately 800 tokens by the TokenTextSplitter. Each chunk is then embedded and saved within the knowledge_chunks collection along with all associated metadata. All of this is possible with the VectorStore.add() method, which handles making the request to the embedding API and writing to MongoDB in a single operation.

The pipeline is shown below:

The true strength of Spring AI lies in its abstraction, which allows the controller and service to remain unaware of which model is being used—whether it’s GPT-5.4 or Sonnet 4.6—or whether the database is MongoDB Atlas or PostgreSQL. Changing the model or database is a matter of configuration, not a new development task. A game changer.

Getting the Project Running

Now that we’ve laid the groundwork for understanding what we’re building, let’s get started with the implementation. The project requires Java 21, Maven 3.9+, an OpenAI API key, and a MongoDB Atlas cluster.

Simply clone the repository:

git clone https://github.com/matteoroxis/operations-assistant.git

cd operations-assistant

Then set the two environment variables required to configure the database and OpenAI API key.

export MONGODB_URI="mongodb+srv://:@.mongodb.net/ops_assistant?appName=devrel-tutorial-java-agentic-workflows-foojay"

export OPENAI_API_KEY="sk-..."

Everything is ready. Let’s start the application:

mvn spring-boot:run

The configuration file specifies that the vector store should use a collection named knowledge_chunks with a search index named knowledge_vector_index. It also specifies the use of a text-embedding-3-small embedding model with 1536 dimensions and the gpt-5.4-mini model with a temperature of 0.2. What does the temperature indicate?

A temperature of 0.2 makes the output more deterministic and less likely to hallucinate steps that don't exist in the runbooks. For operational procedures, consistency is more important than variety.

All of this works thanks to Spring AI, which is integrated into the application using the following dependencies, listed in the BOM version 1.0.0

spring-ai-starter-model-openai for the chat and the embedding model
spring-ai-starter-vector-store-mongodb-atlas to use MongoDBAtlasVectorStore
spring-ai-advisors-vector-store for the QuestionAnswerAdvisor
spring-boot-starter-data-mongodb to manage interaction and connection with the Atlas cluster.

Everything else consists of standard Spring Boot dependencies.

The Ingestion Pipeline

The ingestion pipeline has a single task: taking a document, dividing it into chunks of the right size, and saving each chunk along with its embedding and corresponding metadata in the Atlas database. That’s it.

The chunking process is not as straightforward as it seems. If the chunks are too large, the embedding captures a topic that is too wide to provide the right level of detail. This causes the context window to be filled with information that may be irrelevant to the initial query. If the chunks are too small, each embedding becomes too limited and specific. As a result, the similarity score might provide inconsistent results with a lot of noise, and more importantly, there is a risk of never being able to retrieve related information scattered across multiple chunks.

For this reason, a target of 800 tokens was used with a small overlap between adjacent chunks, which is useful for retrieving context. This size, also used and recommended by OpenAI, is sufficient to preserve the logical structure of the runbook or documentation without losing precision during the search phase.

As mentioned earlier, metadata is attached to each chunk. For example, within the chunks that make up the runbook for the production payment service, information such as sourceType, system, environment, and team is associated. This metadata is used at query time to perform pre-filtering operations and improve similarity search. The MongoDB Atlas Vector Store saves each chunk as a BSON document with a content field, a metadata subdocument, and an embedding array: none of this requires custom mapping across collections or document sync operations.

The application exposes an API for manually ingesting a text document and its associated metadata as a JSON body. The application also provides sample runbooks in Markdown, covering scenarios such as detecting abnormal CPU usage, service rollbacks, disk space alerts, and network latency. A POST request to the API at /api/ops/knowledge/ingest/sample automatically loads all of them, allowing you to have a working system right away.

The Retrieval Pipeline

Spring AI truly stands out in the retrieval phase, making many of the operations performed transparent. The ChatClient is configured with the QuestionAnswerAdvisor, which allows every request coming from the chat to be wrapped in a retrieval operation. From the controller’s perspective, an incoming request translates into a series of sequential steps that produce an outgoing response, all linked together within a single method chain.

Behind the scenes:

The advisor embeds the user’s question using the same model used during knowledge base ingestion. This produces a vector compatible with all previously stored vectors.

The advisor performs a cosine-similarity search on the knowledge_chunks collection and retrieves the first 5 chunks that meet the minimum similarity threshold.
If the request includes contextual information, this is used as a pre-filter to reduce the number of potential search candidates.
The retrieved chunks are formatted and injected into the prompt as context, along with the user’s question and its characteristics.

The prompt assembled from the previous steps is passed to GPT-5mini, which formulates the response based on the provided context.

Throughout this process, the controller remains extremely lightweight, simply validating the inputs, constructing the filters to pass to the advisor, and building the response.

This separation of responsibilities will become clearer and more useful as we continue through the tutorial. In Parts 2 and 3, we will add new advisors to the chain—for conversational memory and long-term memory—all without touching the controller or the exposed interfaces.

The Atlas Vector Search Index

Before you can run any similarity search, you must have a Vector Search index on the knowledge_chunks collection. The Atlas cluster tier affects how the index is created.

Spring AI can create the index on its own. Simply specify the following in application.yml:

spring.ai.vectorstore.mongodb.initialize-schema: true

When the application starts, MongoDBAtlasVectorStore executes the createSearchIndexes command on the Atlas cluster, and the index is ready to use in just a few seconds.

If you want to create the index manually, you need to access the Atlas UI, navigate to the relevant cluster, open Atlas Search, and create a new index using the JSON Editor. Select the ops_assistant database and the knowledge_chunks collection, and use the following index definition, naming it knowledge_vector_index:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    { "type": "filter", "path": "metadata.sourceType" },
    { "type": "filter", "path": "metadata.system" },
    { "type": "filter", "path": "metadata.environment" },
    { "type": "filter", "path": "metadata.severity" },
    { "type": "filter", "path": "metadata.team" }
  ]
}

Trying It Out

We’re finally ready to test. Open your browser to http://localhost:8080 while the application is running, click “Load Sample Runbooks,” and wait a few seconds for the sample runbooks to finish loading. Now, try typing the following in the chat panel:

> My payment-service pod is at 90% CPU. What should I check first?

The result is not the same as what you would get by asking the same question to any LLM. The result includes a reference to the actual steps outlined in the “Abnormal CPU Usage” runbook, which is included within the sample runbooks. Use kubectl top pods to identify the pods affected by the issue, collect a JVM thread dump with jstack, and analyze the garbage collector with jstat.

Finally, review all recent deployments within the cluster. All these operations accurately reflect the company’s operational documentation: the retrieval pipeline identified and collected the sections of the runbook relevant to this scenario, then placed them within the model’s context to provide a response tailored to your use case.

Let’s now try asking another question using the optional filters for system and environment, and see how the retrieved context changes. If we ask a question with the payment-service and prod filters, the retrieval mechanism retrieves a different subset of chunks compared to the unfiltered version, since these filters narrow the set of candidates before the similarity search.

Conclusion and What’s Next

In this tutorial, we built a RAG system that answers questions by providing responses based on the content of your documentation. To do this, we first gathered a collection of runbooks written in Markdown and divided them into appropriately sized chunks. All of these chunks were embedded within the MongoDB Vector Store, along with associated metadata. Everything was connected to an LLM using a Spring Boot advisor.

The real gem of this code is that many of the tasks required for proper operation are handled without writing custom code. We only step in when we need to add our business logic, and not to write complicated methods for interacting with the Vector Store or the LLMs, which remain completely transparent and interchangeable. Pretty cool, right? The application focuses on writing the domain logic, such as which metadata to use for pre-filtering or which filters to apply. Everything else is declarative.

The one thing this version lacks is memory. If we try to ask a second question in the chat, we’ll see that our application won’t remember at all what it responded to just a few seconds earlier. Each response provides a completely new context, unaffected by what happened previously.

The second part of this series introduces two complementary memory systems. The first, called short-term memory, uses a sliding window to keep the current conversation in memory, so that more complex analyses of the issues at hand can be performed. The second system, called long-term memory, contains preferences, decisions, recommendations, and events from past conversations, which are referenced in all new responses to apply consistent patterns to resolution recommendations. The system will therefore remember that I prefer to perform a rollback using a Helm chart rather than using kubectl via the command line.

You can find the code for this article in the following repository. Try modifying the content with your own runbooks and documentation to see how it behaves in your use cases.

The post Building an AI-Powered Operations Assistant with Spring AI and MongoDB Atlas — Part 1: RAG Foundation appeared first on foojay.

When Should You Use a Cache With MongoDB?

Andrew Morgan — Tue, 05 May 2026 15:23:30 +0000

Table of Contents

Why were caches like Memcached & Redis invented, and why do they thrive?So, what's wrong with having a caching tier?What's different with MongoDB?What does AI think?SummaryLearn more about MongoDB design reviews

From time to time, I'll run a design review for an application being migrated from a relational database onto MongoDB, where the customer shares an architectural diagram showing a caching layer (typically Redis) sitting between the app server and MongoDB.

I like to keep the architecture as simple as possible—after all, each layer brings its own complexity and management costs—so I'll ask why the caching layer is there. Of course, the answer is always that it's there to speed up data access. This reveals a misunderstanding of both the reason why caching layers were created and what MongoDB provides.

I've yet to finish a design review without recommending that the cache tier be removed.

So to answer the question in the title of this article—when should you use a cache with MongoDB?—the answer is probably never. This article attempts to explain why, but if you get to the end and still think your application needs it, then I'd love to discuss your app with you.

Why were caches like Memcached & Redis invented, and why do they thrive?

Caching tiers were introduced because it was too slow for applications to read the required data directly from a relational database.

Does this mean there aren't smart developers working on Oracle, DB2, Postgres, MySQL, etc.? Why couldn't those developers make relational databases fast? The answer is that all those databases were written by great developers who included indexes, internal database caches, and other features to make reading a record as fast as possible.

The problem is that the application rarely needs to read just a single record from the normalised relational database. Instead, it typically needs to perform multiple joins across many tables to form a single business object. These joins are expensive (they're slow and consume many resources). For this reason, the application doesn't want to incur that cost every time they read the same business object. That's where the caching tier adds value—join the normalised, relational data once and then cache the results so that the application can efficiently fetch the same results many times.

There's also the issue of data distribution. Most relational databases were designed 50 years ago when an enterprise would run the database and any applications in a single data center. Fast forward to today, when enterprises and customers are spread worldwide, with everyone wanting to work with the same data. You don't want globally distributed app servers to suffer the latency and expense of continually fetching the same data from a database located on a different continent. You want a copy of the data located locally close to every app server that needs it.

Relational databases were not designed with this data distribution requirement in mind. RDBMS vendors have attempted to bolt on various solutions to work around this, but they're far from optimal. Instead, many enterprises delegate the data distribution to a distributed cache tier.

Note that Redis and Memcached are widely used for session handling for web applications where persistence isn't a requirement. In that case, the cache is the only data store (i.e., not a cache layer between the application and MongoDB). While you can (and people do) use MongoDB for session management, that's beyond the scope of this article.

So, what's wrong with having a caching tier?

Introducing a caching layer is often a great solution when your database can't deliver the performance and latency your application needs.

However, this extra data tier comes with costs. The obvious ones are the software licenses and hardware required to provide the caching service.

Less obvious is the extra load on developers. It's a new query language (and possibly programming language) to master. What happens when the data in the RDBMS changes? How are those changes propagated to your cache tier?

So, a cache tier has to pay its way by delivering tangible benefits over having your application access the database directly.

What's different with MongoDB?

The MongoDB document model.

In MongoDB, we want you to store your data structured in a way that makes it efficient to quickly satisfy your application's most frequent queries (or those with the toughest SLAs). MongoDB mirrors the structure of objects by letting a single record (document) contain embedded (nested) objects. Support for arrays allows one-to-many and many-to-many relationships without joining multiple collections.

In many cases, the business object required by the application will map to a single MongoDB document. In other cases, it might require multiple documents that can be fetched with a single, indexed lookup.

MongoDB has its own internal LRU (least recently used) cache, so if your document has been accessed recently, chances are it's already in memory. So, as with Redis, MongoDB can satisfy the application's query by fetching a single document/object from memory.

Note that MongoDB supports joins, but we try to structure your data to minimize their use.

The other value-add from a caching layer is data locality in distributed architectures. MongoDB has this built in. A MongoDB replica set has a single primary node that handles all writes, together with up to 49 secondary nodes—each with a copy of the data. For the lowest latency queries, you can place secondaries locally at each of your app server locations. MongoDB is responsible for keeping the data in the secondary nodes up to date with the primary, so you don't need to write and maintain any extra synchronization code.

What does AI think?

The responses created by generative AI are driven by the information that's been published by real people, and so it should represent popular opinion on a topic. I thought it would be interesting to see what an AI has come to understand as conventional wisdom on why people place a cache in front of MongoDB.

I asked ChatGPT 4o this question:

"Explain why I would use a cache layer (such as Redis) rather than having my application read data from MongoDB directly."

I'll summarize and respond to the key benefits of the cache tier identified by ChatGPT:

"Improved Performance. Redis operates entirely in memory, making it significantly faster than MongoDB, which relies on disk I/O for data retrieval." As described above, MongoDB has its own in-memory cache, so the documents you frequently access will be in memory, and no disk access is required.

"Reduced Load on MongoDB. Frequently accessing MongoDB directly for the same data increases query load, which can slow down the database, especially under heavy read traffic." MongoDB is scalable. Extra secondary nodes can be added to the replica set to add extra query bandwidth. MongoDB sharding (partitioning) can scale data capacity or write throughput horizontally.

"Handling High-Read Traffic. Applications with high read-to-write ratios (e.g., web apps, APIs) benefit from Redis’s ability to serve cached data quickly." MongoDB's database cache provides the same benefits without the extra developer effort to synchronize data changes.

"Faster Access to Frequently Used Data. Redis is ideal for caching frequently accessed or hot data (e.g., user sessions, configurations, or product details)." Frequently accessed, hot data will be held in MongoDB's in-memory database cache.

"Lower Latency for Geo-Distributed Applications. By replicating Redis caches closer to end-users, you can avoid high network latency when querying MongoDB from distant locations." Data locality can be solved by placing replicas near your app server sites.

"Support for Expiring Data (TTL). Redis has a built-in Time-to-Live (TTL) feature that automatically removes cached data after a specified duration." MongoDB uses an LRU cache, so any documents that are no longer being queried will be removed from memory if the space is needed for more recently queried data. MongoDB also has TTL indexes if you want to remove the documents from the database entirely or Atlas Online Archive if you want to move them to cheaper storage.

"Cost Efficiency. Reading from MongoDB repeatedly can be resource-intensive, especially with complex queries, leading to increased infrastructure costs." Your MongoDB schema should be designed so that your important queries don't require complex queries.

"Use Cases for Specialized Data Structures. Redis supports advanced data structures like lists, sets, sorted sets, hashes, and streams, which MongoDB doesn’t provide natively." MongoDB supports lists and sets. Hashes can be represented in MongoDB as an array of documents containing key-value pairs (the MongoDB attribute pattern). MongoDB time series collections meet the same needs as Redis streams.

"Resilience and Fault Tolerance. A cache layer can serve as a fallback if MongoDB is temporarily unavailable or under heavy load." MongoDB can scale vertically or horizontally to meet any load demands. Scaling can be automated when using MongoDB Atlas. MongoDB replica sets provide fault tolerance for both reads and writes.

"Simplified Complex Query Results. MongoDB can take time to compute complex queries (e.g., aggregations, joins) for frequently requested results." Your MongoDB schema should be designed to avoid the need to run complex queries frequently. Results can be stored (cached) in a MongoDB materialized view, avoiding the need to repeatedly execute the same complex query/aggregation.

Note that the response you get from ChatGPT is heavily skewed by the question you ask. If I change my prompt to "Explain why I shouldn't use a cache layer (such as Redis) rather than having my application read data from MongoDB directly," ChatGPT is happy to dissuade me from adding the cache layer, citing issues such as increased system complexity, data consistency issues, performance for write-heavy workloads, cost, query flexibility, maintenance and reliability, small data sets (where the active data set fits in MongoDB's cache), and real-time reporting.

Summary

A cache layer can add much value when your RDBMS cannot deliver the query performance your application demands. When using MongoDB, the database of record and cache functionality is combined in a single layer, saving you money and developer time.

A distributed cache can mitigate shortfalls in your RDBMS, but MongoDB has a built-in distribution.

Respond to this article if you still believe your application would benefit from a cache layer between your application and MongoDB. I'd love to take a look.

Learn more about MongoDB design reviews

Design reviews are a chance for a design expert from MongoDB to advise you on how best to use MongoDB for your application. The reviews are focused on making you successful using MongoDB. It's never too early to request a review. By engaging us early (perhaps before you've even decided to use MongoDB), we can advise you when you have the best opportunity to act on it.

This article explained how designing a MongoDB schema that matches how your application works with data can meet your performance requirements without needing a cache layer.

The post When Should You Use a Cache With MongoDB? appeared first on foojay.

Large-Scale ETL Pipeline Architecture

Matteo Rossi — Fri, 01 May 2026 19:23:04 +0000

Table of Contents

Rethinking ETL for modern systemsArchitectural building blocksEmbracing concurrency with reactive pipelinesBackpressure: the hidden heroDesigning for failure: error handling strategiesRetry and recovery patternsIdempotency: the cornerstone of safe retriesBatching vs streamingParallelizing transformationsIntegrating with messaging systemsObservability and monitoringPutting it all togetherTrade-offs and practical considerationsConclusion

Modern data-driven systems. ETL pipelines are no longer simply scheduled background processes that run silently overnight, one after another. They are the backbone of real-time analytics, powering operational dashboards and recommendation systems. They enable machine learning workflows.

This evolution, while creating enormous benefits, has—with the increase in data volume and the decrease in latency tolerance—called into question the traditional sequential ETL approach. A bottleneck for the speed is now required.

Designing an ETL pipeline today that operates at scale means tackling issues such as concurrency management, resilience against failures—whether total or, worse yet, partial—and observability. The challenge is no longer simply moving data, but doing so quickly, securely, predictably, and in a way that can be observed.

In this article, we will explore how to design a high-throughput ETL pipeline architecture using Java, focusing on concurrency models, error recovery strategies, and practical implementation techniques. To do this, we will leverage tools such as Project Reactor to build scalable, non-blocking pipelines. In the loading stage, we will consider MongoDB as the sink for transformed data.

Rethinking ETL for modern systems

Conceptually, the ETL (Extract, Transform, Load) model remains unchanged. It is not in opposition to the emerging ELT model, but rather complements it. Its implementation, however, has evolved significantly. Instead of monolithic batch processes, today’s pipelines are, by their very nature:

Distributed
Concurrent
Fault-tolerant
Incremental or streaming

At scale, to achieve high throughput goals, each stage of the pipeline must be independently scalable and resilient. Extraction may involve retrieving data from APIs, databases, or message queues. Transformation may require enrichment, validation, and aggregation. Loading could target data warehouses, MongoDB collections, search indexes, or downstream services.

What is the main challenge? Coordinating these stages efficiently and effectively, without introducing bottlenecks or single points of failure.

Architectural building blocks

A robust architecture for an ETL pipeline must consist of several loosely coupled layers, interconnected through well-defined and clear interfaces and boundaries. Each layer is a modular unit with its own specific responsibility.

In general, the architecture looks like this:

Source ingestion layer
Transformation layer
Loading layer
Error handling and recovery layer
Observability layer

The layers must communicate with one another, avoiding, where possible, heavy synchronous synchronization. The advantage is the ability to leverage asynchronous communication that accounts for backpressure mechanisms.

Embracing concurrency with reactive pipelines

One of the most effective ways to build high-throughput pipelines is to adopt reactive, non-blocking processing. We can use libraries like Project Reactor, which provide abstractions such as Flux and Mono that allow us to model data as streams and process them concurrently.

Let’s look at an example, starting with the basics. A simplified extraction and loading pipeline:

Flux pipeline =
    extract()
        .flatMap(this::transform)
        .flatMap(this::load)
        .doOnError(error -> log.error("Pipeline error", error));

To the casual observer, this flow might appear to be a sequential process. The secret lies in using `flatMap`, which enablesthe simultaneous processing of multiple records. Each stage can process the elements independently, and the pipeline naturally adapts to the available resources.

First point to note: concurrency. Concurrency must be controlled. Unrestricted parallelism can overload downstream systems, causing a cascade of problems.

.flatMap(this::transform, 10) // limit concurrency
.flatMap(this::load, 5)

We always balance system throughput and stability by appropriately adjusting the levels of concurrency that are both possible and necessary for each layer.

Backpressure: the hidden hero

In this type of system, producers often overwhelm consumers with the speed at which they generate data and events. Without proper control, this leads to memory overload and system instability.

A new hero is in town: reactive streams introduce the concept of backpressure, allowing downstream components to signal how much data they can handle. With Project Reactor, this mechanism is built into the model. For example:

extract()
    .onBackpressureBuffer(1000)
    .flatMap(this::transform, 10)
    .flatMap(this::load, 5);

In this case, when the downstream system is slower, we try to accumulate up to 1,000 elements in the buffer. Alternatively, instead of accumulating, we can discard or limit the elements, depending on the use case.

Choosing the right backpressure management strategy is critical and depends heavily on the use case: for financial or transactional data, discarding records is unacceptable, whereas for telemetry or log data, it is acceptable.

Designing for failure: error handling strategies

Let’s start with a basic premise: failures in this type of pipeline are inevitable. Network issues, invalid data, timeouts, and downstream service interruptions are all events that occur regularly. The goal is not to eliminate failures, but to manage them properly.

A simple pipeline like the one below fails immediately:

.flatMap(this::transform)
.flatMap(this::load)

If a single record fails, the entire flow could be interrupted and leave the system in an undefined state: generally speaking, that’s not the outcome we’d like.

Instead, let’s try to isolate failures at the individual record level:

.flatMap(record ->
    transform(record)
        .flatMap(this::load)
        .onErrorResume(error -> {
            log.warn("Failed processing record {}", record.getId(), error);
            return Mono.empty();
        })
)

This ensures that one bad record does not stop the entire pipeline.

Retry and recovery patterns

Some types of failures are temporary, and the operation can be safely retried once the outage has ended. Other types of failures, however, require compensation or manual intervention.

Here is a simple example of a retry:

.flatMap(record ->
    transform(record)
        .flatMap(this::load)
        .retryWhen(Retry.backoff(3, Duration.ofMillis(200)))
)

The system makes a total of three attempts, applying an exponential backoff between each attempt. However, these attempts must be used with great caution and care to avoid cascading errors.

To ensure more reliable recovery, we can also use a dead letter queue (DLQ). Failed records are retained to allow for subsequent analysis:

.onErrorResume(error -> 
    sendToDeadLetterQueue(record, error)
)

This allows the pipeline to continue processing while preserving problematic data.

Idempotency: the cornerstone of safe retries

Retry attempts have one mandatory prerequisite: they only work if they are idempotent. Without idempotence, repeated execution can lead to duplicate data and inconsistent states.

Let’s look at some examples: loading data into a database should be designed to tolerate multiple identical operations on the same objects and prevent the existence of duplicates. In practice:

Use upserts instead of inserts
Include unique keys in the communication
Track the processing status

When MongoDB is used as the sink, a common approach is to map the business identifier to _id, or to enforce a unique index on the key used to identify the record. This allows the loading stage to safely retry the same operation without creating duplicates.

A simple example using an upsert with MongoDB:

public Mono load(DataRecord record) {
return Mono.from(
     mongoCollection.replaceOne(
         Filters.eq("_id", record.getId()),
         toDocument(record),
         new ReplaceOptions().upsert(true)
)
);
}

This ensures that reprocessing the same record does not corrupt data.

Batching vs streaming

Another key decision in designing this type of pipeline is the processing model. Should data be processed in batches or as a continuous stream?

Batch processing improves efficiency by reducing the I/O workload:

.buffer(100)
.flatMap(this::bulkLoad)

Streaming, on the other hand, reduces latency and improves overall responsiveness.

In practice, hybrid approaches work best: this means, for example, processing small batches continuously:

.bufferTimeout(100, Duration.ofSeconds(1))
.flatMap(this::bulkLoad)

This balances throughput and latency effectively.

When MongoDB is the sink, batching can be implemented through bulk write operations:

public Mono bulkLoadToMongo(List records) {
     if (records == null || records.isEmpty()) {
         return Mono.empty();
     }

     List> writes = records.stream()
         .map(record -> new ReplaceOneModel(
             Filters.eq("_id", record.getId()),
             toDocument(record),
             new ReplaceOptions().upsert(true)
         ))
         .toList();

     return Mono.from(
         mongoCollection.bulkWrite(
             writes,
             new BulkWriteOptions().ordered(false)
         )
     );
}

Using unordered bulk operations improves throughput because individual write failures do not necessarily block the rest of the batch.

Parallelizing transformations

Processing operations often place a heavy load on the CPU; we can try to maximize performance by parallelizing them wherever possible:

extract()
    .parallel()
    .runOn(Schedulers.parallel())
    .flatMap(this::transform)
    .sequential()
    .flatMap(this::load);

This allows you to distribute the work across multiple CPU cores: the more processors I have available, the sooner my work will be completed. However, parallelization introduces a certain level of complexity that must be understood and managed, especially when order matters.

If order matters, the first thing to do is preserve it:

.flatMapSequential(this::transform, 10)

This maintains order while still allowing some concurrency.

Integrating with messaging systems

High-throughput ETL architectures often rely on messaging systems such as Apache Kafka. Instead of retrieving data in batches, the pipelines process events as they occur. This produces a continuous stream of data, and in the case of Kafka, a consumer can be implemented as follows:

receiver.receive()
    .flatMap(record ->
        transform(record.value())
            .flatMap(this::load)
            .doOnSuccess(v -> record.receiverOffset().acknowledge())
    )

This approach enables real-time processing and horizontal scalability, allowing the system to handle the incoming data flow.

Kafka also offers durability and reproducibility through its durable logs mechanism, as well as parallelism through its partitioning mechanism: these are essential features for large-scale ETL systems.

Observability and monitoring

In distributed systems—and this is nothing new—observability is just as important as correctness. Without adequate observability processes and mechanisms, debugging becomes slow, complicated, or even impossible.

A good ETL pipeline should provide:

Metrics on point-in-time and overall throughput
Error rates
Processing latency
Number of retries

How can we achieve all this? Tools like Micrometer integrate seamlessly with reactive pipelines and enable the collection of system observability metrics:

.doOnNext(record -> metrics.incrementProcessed())
.doOnError(error -> metrics.incrementErrors())

When we talk about observability, it’s essential to also discuss logging and tracing. Every record should include a correlation ID to track its path through the pipeline and make troubleshooting easier.

Putting it all together

Let’s combine the concepts into a more complete pipeline:

public Flux buildPipeline() {

    return extract()
        .onBackpressureBuffer(1000)
        .flatMap(record ->
            transform(record)
                .flatMap(this::load)
                .retryWhen(Retry.backoff(3, Duration.ofMillis(200)))
                .onErrorResume(error ->
                    sendToDeadLetterQueue(record, error)
                ),
            10
        )
        .bufferTimeout(100, Duration.ofSeconds(1))
        .flatMap(this::bulkLoadToMongo)
        .doOnError(error -> log.error("Pipeline failure", error));
}

This pipeline:

Controls concurrency
Handles backpressure
Retries transient failures
Isolates bad records
Uses batching for efficiency
Preserves system stability

Trade-offs and practical considerations

As with any architectural discussion, there is no universally optimal choice. There are several options, each with its own advantages and disadvantages, as well as trade-offs that must be accepted and understood.

Reactive pipelines reduce thread usage but increase cognitive complexity. Debugging asynchronous flows can be more difficult than with traditional imperative code, which is primarily characterized by thread-by-thread operations.

Retry mechanisms improve system resilience but can amplify the load during service interruptions: they should be used with caution and discretion.

Batch processing improves overall throughput but increases latency.

The key is to align architectural decisions with the requirements of the solution being built: a real-time fraud detection system has very different constraints compared to a nightly reporting process.

Conclusion

Designing large-scale ETL pipelines involves addressing and understanding the associated complexities: concurrency, integration patterns, error handling, and observability.

By leveraging reactive programming with tools such as Project Reactor and integrating them with streaming platforms like Apache Kafka, and using MongoDB as the sink for transformed data, it is possible to build high-performance pipelines with appropriate resilience requirements.

The real shift lies in the design and the model: we move from linear data processing to a stream-based way of thinking. Changing this paradigm means designing systems as data streams rather than as a set of sequential steps. This shift enables higher throughput, better fault isolation and recovery, and increased scalability that scales with incoming demand.

Building ETL pipelines is not just about moving data: it is about creating a reliable and scalable backbone for the entire data platform. All the code is available at this repository.

The post Large-Scale ETL Pipeline Architecture appeared first on foojay.

Building a Personalized Content Delivery System

Farhan Hasin Chowdhury — Thu, 23 Apr 2026 15:10:45 +0000

Table of Contents

Prerequisites1. Data model2. Project setup3. Building the content-based recommendation engine

UserProfileController
GameRepository
RecommendationService core logic
RecommendationController
Manual test

4. User ratings and affinity adjustment

Ratings endpoint
Affinity adjustment logic
MongoDB update
Before and after demo

5. Adding Spring AI embeddings and MongoDB Atlas Vector Search

Spring AI setup
Generating embeddings
DataSeeder update
Atlas Vector Search index
Vector search query
Results

6. Combining both signals

Merging approach
Unified response
Recommendation flow
Tuning weights

7. Testing the full workflowConclusion

Recommendation engines have a reputation for requiring specialized ML infrastructure: matrix factorization pipelines, training jobs, and model serving layers. That is one way to do it, but not the only way. If your data already lives in MongoDB and your application runs on Spring Boot, you can build a practical recommendation system using tools you already have. MongoDB aggregation pipelines handle the scoring math server-side, and Atlas Vector Search adds semantic matching without a separate vector database.

In this article, you will build an indie game discovery platform with two complementary recommendation approaches. The first is content-based preference scoring: users create profiles with weighted preferences for genres, tags, and game mechanics, and MongoDB aggregation pipelines score every game against those weights. When users rate games, the system adjusts their preference weights over time, so recommendations improve with each interaction. The second approach uses Spring AI embeddings and MongoDB Atlas Vector Search to catch semantic connections that literal tag matching misses. A game tagged "exploration" and "mystery" should appeal to someone who likes "adventure" and "narrative," even though the strings never overlap.

By the end, you will have a working recommendation API built with Java 21+, Spring Boot 3.x, Spring Data MongoDB, and Spring AI, combining both approaches into a single ranked result. The embedding layer uses OpenAI's text-embedding-3-small model, but any embedding provider that Spring AI supports will work. The complete source code is available in the companion repository on GitHub.

Prerequisites

Java 21 or later
Spring Boot 3.x (use Spring Initializr with the Spring Data MongoDB and Spring Web dependencies; Spring AI is added manually later in the article)
A MongoDB Atlas cluster (the free tier is sufficient, and you will need it for Atlas Vector Search). You can set up one by following the MongoDB Atlas Getting Started Guide.
An OpenAI API key (used for generating embeddings in the second half of the article)
Basic familiarity with Spring Boot (controllers, services, dependency injection)

1. Data model

The system needs two collections: one for games and one for user profiles. Start with the Game document:

@Document(collection = "games")
public class Game {

    @Id
    private String id;
    private String title;
    private String description;
    private String developer;
    private List genres;
    private List tags;
    private List mechanics;
    private double rating;
    private int releaseYear;

    public Game(String title, String description, String developer,
                List genres, List tags, List mechanics,
                double rating, int releaseYear) {
        this.title = title;
        this.description = description;
        this.developer = developer;
        this.genres = genres;
        this.tags = tags;
        this.mechanics = mechanics;
        this.rating = rating;
        this.releaseYear = releaseYear;
    }

    // getters and setters
}

The genres, tags, and mechanics fields are all List rather than single values. A game like Hades is both a roguelike and an action game. It has tags like "fast-paced" and "mythology" and mechanics like "permadeath" and "procedural-generation." Storing these as arrays works well with MongoDB because you can match and filter on array fields directly in queries and aggregation pipelines. The recommendation engine you will build in section 3 relies on this to find games that overlap with a user's preferences.

Next, the UserProfile document:

@Document(collection = "user_profiles")
public class UserProfile {

    @Id
    private String id;
    private String username;
    private Preferences preferences;
    private List ratings;

    public UserProfile(String username, Preferences preferences) {
        this.username = username;
        this.preferences = preferences;
        this.ratings = new ArrayList<>();
    }

    // getters and setters
}

The Preferences class is an embedded object that holds three weighted maps:

public class Preferences {

    private Map genres;
    private Map tags;
    private Map mechanics;

    public Preferences(Map genres, Map tags,
                       Map mechanics) {
        this.genres = genres;
        this.tags = tags;
        this.mechanics = mechanics;
    }

    // getters and setters
}

Each map key is an attribute (like "roguelike" or "pixel-art"), and the value is a weight between 0 and 1 representing how strongly the user prefers it. This is a deliberate choice over plain lists. A plain list of preferred genres tells you that someone likes roguelikes and platformers equally. A weighted map like {"roguelike": 0.9, "platformer": 0.4} tells you they strongly favor roguelikes but only have mild interest in platformers. When the scoring engine computes recommendations, it multiplies matched attributes by their weights, so higher-affinity preferences produce higher-ranked results.

The GameRating class is another embedded object:

public class GameRating {

    private String gameId;
    private int score;
    private Instant ratedAt;

    public GameRating(String gameId, int score, Instant ratedAt) {
        this.gameId = gameId;
        this.score = score;
        this.ratedAt = ratedAt;
    }

    // getters and setters
}

To make the document structure concrete, here is what a couple of game documents look like in MongoDB:

[
    {
        "_id": "64a1b2c3d4e5f6a7b8c9d0e1",
        "title": "Hollow Knight",
        "description": "A challenging 2D action-adventure through a vast underground kingdom.",
        "developer": "Team Cherry",
        "genres": ["metroidvania", "action", "platformer"],
        "tags": ["atmospheric", "difficult", "exploration", "hand-drawn"],
        "mechanics": ["ability-unlocks", "backtracking", "boss-fights"],
        "rating": 4.7,
        "releaseYear": 2017
    },
    {
        "_id": "64a1b2c3d4e5f6a7b8c9d0e2",
        "title": "Slay the Spire",
        "description": "A deck-building roguelike where you craft a unique deck and climb the Spire.",
        "developer": "Mega Crit Games",
        "genres": ["roguelike", "strategy", "card-game"],
        "tags": ["replayable", "turn-based", "procedural"],
        "mechanics": ["deck-building", "permadeath", "procedural-generation"],
        "rating": 4.8,
        "releaseYear": 2019
    }
]

Notice how each game has multiple genres, tags, and mechanics. When a user's preference map contains {"roguelike": 0.9, "strategy": 0.6}, the recommendation engine can match both keys against Slay the Spire's genres array and sum the weights to compute a relevance score.

The companion repository includes a DataSeeder component implemented as a CommandLineRunner that loads approximately 25 indie games into the games collection on startup. This gives you a meaningful dataset to test recommendations against without manual data entry.

2. Project setup

Head over to Spring Initializr and configure a new project. Select Maven as the build tool, Java 21 as the language version, and the latest Spring Boot 3.x release. For dependencies, add Spring Web and Spring Data MongoDB. These two are all you need for now. Spring AI gets added later in section 5 when you build the embedding-based recommendation layer. Generate the project, unzip it, and open it in your IDE.

The codebase is organized into several packages that you will create as you go. The domain package holds the entity classes you defined in the previous section. The repository package contains Spring Data MongoDB repository interfaces. The service package contains the recommendation logic. The controller package exposes REST endpoints for creating users, fetching recommendations, and submitting ratings. The seeder package contains the DataSeeder class that populates the database on startup. You do not need to create all of these up front. Each package gets introduced in the section where it first becomes relevant.

To connect the application to your MongoDB Atlas cluster, open src/main/resources/application.properties and add the following:

spring.data.mongodb.uri=${MONGODB_URI:mongodb://localhost:27017/indie-game-discovery?appName=devrel-tutorial-indie-game-discovery}

The ${MONGODB_URI:...} syntax reads the connection string from an environment variable called MONGODB_URI. The value after the colon is a fallback that points to a local MongoDB instance if the variable is not set. The appName query parameter identifies your application in Atlas connection logs and monitoring dashboards. To use your Atlas cluster, set the environment variable before starting the application:

export MONGODB_URI="mongodb+srv://:@/indie-game-discovery?appName=devrel-tutorial-indie-game-discovery"

Replace the placeholders with your Atlas credentials and cluster URL. If you followed the Atlas getting started guide linked in the prerequisites, you already have these values.

If you want to skip the incremental setup and jump straight into a working project, clone the companion repository. It contains the complete source code for every section, so you can follow along with the article or run the finished application directly.

3. Building the content-based recommendation engine

Before you can generate recommendations, you need endpoints for managing user profiles and a repository for querying games. Start with a simple request DTO and controller for user profiles.

UserProfileController

Create a CreateUserRequest record that captures the data needed to build a new profile:

public record CreateUserRequest(String username, Preferences preferences) {
}

The controller exposes two endpoints: one to create a profile and one to retrieve it by ID.

@RestController
@RequestMapping("/api/users")
public class UserProfileController {

    private final UserProfileRepository userProfileRepository;

    public UserProfileController(UserProfileRepository userProfileRepository) {
        this.userProfileRepository = userProfileRepository;
    }

    @PostMapping
    public ResponseEntity createUser(@RequestBody CreateUserRequest request) {
        UserProfile profile = new UserProfile(request.username(), request.preferences());
        UserProfile saved = userProfileRepository.save(profile);
        return ResponseEntity.ok(saved);
    }

    @GetMapping("/{userId}")
    public ResponseEntity getUser(@PathVariable String userId) {
        return userProfileRepository.findById(userId)
                .map(ResponseEntity::ok)
                .orElse(ResponseEntity.notFound().build());
    }
}

The UserProfileRepository is a standard Spring Data MongoDB interface:

public interface UserProfileRepository extends MongoRepository {
}

Constructor injection is used throughout the codebase. Spring resolves the single constructor automatically without needing an @Autowired annotation.

GameRepository

The game repository needs two queries: one to fetch all games and one to find games that match any of a given set of genres. Spring Data MongoDB derives both from method names:

public interface GameRepository extends MongoRepository {

    List findByGenresIn(List genres);
}

The findByGenresIn method queries the genres array field and returns any game where at least one genre appears in the provided list. You will not use this method for the main recommendation pipeline, but it is useful for quick filtering when you want to narrow results to a specific genre subset.

RecommendationService core logic

The recommendation engine needs to solve a specific problem: the user's preferences are stored as Map (weighted maps where keys are attributes and values are affinity scores), while each game stores its genres, tags, and mechanics as plain List arrays. To score a game, you need to find which keys in the user's preference maps appear in the game's arrays, then sum the corresponding weights.

Consider a concrete example. A user has these genre preferences:

{"roguelike": 0.9, "pixel-art": 0.7, "strategy": 0.5}

A game has the genres array ["roguelike", "pixel-art", "metroidvania"]. The matching genres are "roguelike" and "pixel-art," so the genre score is 0.9 + 0.7 = 1.6.

The same user also has tag preferences {"replayable": 0.8, "difficult": 0.6} and the game has tags ["replayable", "atmospheric"]. Only "replayable" matches, so the tag score is 0.8.

No mechanics match in this case, so the mechanic score is 0.0. The total is the sum across all three categories:

1.6 + 0.8 + 0.0 = 2.4

You could pull every game into the application and compute scores in Java, but that wastes network bandwidth. Aggregation pipelines let you push this computation to the database, so MongoDB returns only the scored and sorted results. The pipeline works in four conceptual steps:

Convert the user's preference map from an object ({"roguelike": 0.9, "pixel-art": 0.7}) into an array of key-value pairs using $objectToArray. This produces [{"k": "roguelike", "v": 0.9}, {"k": "pixel-art", "v": 0.7}].
Extract just the keys from that array and use $setIntersection to find which keys overlap with the game's genre/tag/mechanic arrays.
Use $filter on the key-value pair array to keep only entries whose key appeared in the intersection, then use $reduce to sum their values into a single score.
Add the computed score as a new field with $addFields and sort by score descending.

Since the pipeline references the user's preferences as literal values (not fields from the games collection), you need to build it dynamically in Java. Here is the RecommendationService:

@Service
public class RecommendationService {

    private final MongoTemplate mongoTemplate;
    private final UserProfileRepository userProfileRepository;

    public RecommendationService(MongoTemplate mongoTemplate,
                                 UserProfileRepository userProfileRepository) {
        this.mongoTemplate = mongoTemplate;
        this.userProfileRepository = userProfileRepository;
    }

    public List getRecommendations(String userId) {
        UserProfile user = userProfileRepository.findById(userId)
                .orElseThrow(() -> new RuntimeException("User not found: " + userId));

        Preferences prefs = user.getPreferences();

        Document genreScoreExpr = buildScoreExpression(prefs.getGenres(), "$genres");
        Document tagScoreExpr = buildScoreExpression(prefs.getTags(), "$tags");
        Document mechanicScoreExpr = buildScoreExpression(prefs.getMechanics(), "$mechanics");

        Document totalScore = new Document("$add", List.of(genreScoreExpr, tagScoreExpr, mechanicScoreExpr));

        AggregationOperation addScoreField = context ->
                new Document("$addFields", new Document("score", totalScore));

        AggregationOperation sortByScore = context ->
                new Document("$sort", new Document("score", -1));

        Aggregation aggregation = Aggregation.newAggregation(addScoreField, sortByScore);

        AggregationResults results =
                mongoTemplate.aggregate(aggregation, "games", GameRecommendation.class);

        return results.getMappedResults();
    }

    private Document buildScoreExpression(Map preferenceMap, String gameField) {
        if (preferenceMap == null || preferenceMap.isEmpty()) {
            return new Document("$literal", 0.0);
        }

        Document prefObject = new Document();
        preferenceMap.forEach(prefObject::append);

        Document prefArray = new Document("$objectToArray", new Document("$literal", prefObject));

        Document prefKeys = new Document("$map",
                new Document("input", prefArray)
                        .append("as", "pref")
                        .append("in", "$$pref.k"));

        Document matchedKeys = new Document("$setIntersection", List.of(prefKeys, gameField));

        Document matchedEntries = new Document("$filter",
                new Document("input", prefArray)
                        .append("as", "entry")
                        .append("cond", new Document("$in", List.of("$$entry.k", matchedKeys))));

        return new Document("$reduce",
                new Document("input", matchedEntries)
                        .append("initialValue", 0.0)
                        .append("in", new Document("$add", List.of("$$value", "$$this.v"))));
    }
}

The buildScoreExpression method constructs the aggregation expression for a single preference category. It takes the user's preference map and the game's array field name as parameters, then works through four steps:

Convert the preference map to a BSON Document and wrap it in $objectToArray to get a key-value pair array.
Extract just the keys with $map.
Use $setIntersection to find which keys overlap with the game's array field, then $filter to keep only the matched entries.
Sum the matched values with $reduce to produce the score for that category.

The getRecommendations method calls buildScoreExpression three times (once each for genres, tags, and mechanics), adds the three results together into a total score, and runs the pipeline against the games collection.

RecommendationController

The controller takes a user ID, calls the service, and returns the ranked list:

@RestController
@RequestMapping("/api/recommendations")
public class RecommendationController {

    private final RecommendationService recommendationService;

    public RecommendationController(RecommendationService recommendationService) {
        this.recommendationService = recommendationService;
    }

    @GetMapping("/{userId}")
    public ResponseEntity> getRecommendations(@PathVariable String userId) {
        List recommendations = recommendationService.getRecommendations(userId);
        return ResponseEntity.ok(recommendations);
    }
}

The GameRecommendation response DTO includes the game fields along with the computed score:

public class GameRecommendation {

    private String id;
    private String title;
    private String description;
    private String developer;
    private List genres;
    private List tags;
    private List mechanics;
    private double rating;
    private int releaseYear;
    private double score;

    public GameRecommendation() {
    }

    // getters and setters
}

MongoDB's aggregation result maps directly into this class because $addFields attaches the score field alongside the existing game fields. Spring Data MongoDB deserializes the output documents into GameRecommendation objects automatically.

Manual test

Start the application and create a user profile with weighted preferences:

curl -X POST http://localhost:8080/api/users \
  -H "Content-Type: application/json" \
  -d '{
    "username": "alex",
    "preferences": {
      "genres": {"roguelike": 0.9, "platformer": 0.5, "metroidvania": 0.7},
      "tags": {"difficult": 0.8, "atmospheric": 0.6, "replayable": 0.7},
      "mechanics": {"permadeath": 0.9, "procedural-generation": 0.6}
    }
  }'

Copy the returned id field and hit the recommendations endpoint:

curl http://localhost:8080/api/recommendations/

The response is a list of games sorted by score. Take Slay the Spire as an example. It matches on multiple fronts:

"roguelike" in genres (0.9)
"replayable" in tags (0.7)
"permadeath" (0.9) and "procedural-generation" (0.6) in mechanics

That gives it a total score of 3.1. Compare that to Hollow Knight, which scores well on "metroidvania" (0.7) and tags like "difficult" (0.8) and "atmospheric" (0.6), but lacks roguelike traits, so it ends up lower in the ranking. The scores map directly to the user's preference weights, which makes the results easy to explain and debug.

4. User ratings and affinity adjustment

The recommendation engine works, but the preference weights are static. A user sets their initial preferences once, and the system never learns from their behavior. You need a ratings endpoint that lets users score games they have played, and adjustment logic that updates preference weights based on those ratings.

Ratings endpoint

Create a RatingRequest record to capture the incoming data:

public record RatingRequest(String gameId, int score) {
}

Add a new endpoint to UserProfileController that accepts a rating for a specific user:

@PostMapping("/{userId}/ratings")
public ResponseEntity rateGame(@PathVariable String userId,
                                            @RequestBody RatingRequest request) {
    UserProfile updatedProfile = ratingService.applyRating(userId, request);
    return ResponseEntity.ok(updatedProfile);
}

The controller delegates all the work to a RatingService. Inject it through the constructor alongside the existing UserProfileRepository:

private final UserProfileRepository userProfileRepository;
private final RatingService ratingService;

public UserProfileController(UserProfileRepository userProfileRepository,
                             RatingService ratingService) {
    this.userProfileRepository = userProfileRepository;
    this.ratingService = ratingService;
}

Affinity adjustment logic

The RatingService handles the core adjustment logic. When a user rates a game, the service looks up the game's genres, tags, and mechanics, then adjusts the corresponding weights in the user's preference maps according to these rules:

A rating of 4 or 5 pushes weights upward toward 1.0.
A rating of 1 or 2 pushes weights downward toward 0.0.
A rating of 3 leaves weights unchanged.

The adjustment uses an exponential moving average formula that naturally keeps weights bounded between 0 and 1:

newWeight = oldWeight + learningRate * (targetDirection - oldWeight)

The learningRate is set to 0.15, which means each rating shifts the weight by 15% of the remaining distance toward the target. The targetDirection is 1.0 for high ratings (the user wants more of this attribute) and 0.0 for low ratings (the user wants less).

Because the formula always moves the weight a fraction of the distance between its current value and the target, it can never exceed 1.0 or drop below 0.0. Two quick examples show why:

A weight of 0.9 rated highly moves to 0.9 + 0.15 * (1.0 - 0.9) = 0.915. A small nudge, since it is already close to the ceiling.
A weight of 0.2 rated highly moves to 0.2 + 0.15 * (1.0 - 0.2) = 0.32. A larger jump, because there is more room to grow.

Here is a concrete example. Suppose a user rates a stealth-puzzle game 5 stars. The game has the following genres: ["stealth", "puzzle"]. The user's current genre weights include "stealth": 0.4 and "puzzle": 0.6. Since the rating is 5, the target direction is 1.0:

Stealth: 0.4 + 0.15 * (1.0 - 0.4) = 0.4 + 0.09 = 0.49. Rounded, stealth goes from 0.4 to roughly 0.49.
Puzzle: 0.6 + 0.15 * (1.0 - 0.6) = 0.6 + 0.06 = 0.66. Puzzle goes from 0.6 to 0.66.

The same formula applies to the game's tags and mechanics. If the game has a tag like "atmospheric" and the user's tag weight for it is 0.5, it becomes 0.5 + 0.15 * (1.0 - 0.5) = 0.575.

Here is the RatingService:

@Service
public class RatingService {

    private static final double LEARNING_RATE = 0.15;

    private final MongoTemplate mongoTemplate;
    private final GameRepository gameRepository;
    private final UserProfileRepository userProfileRepository;

    public RatingService(MongoTemplate mongoTemplate,
                         GameRepository gameRepository,
                         UserProfileRepository userProfileRepository) {
        this.mongoTemplate = mongoTemplate;
        this.gameRepository = gameRepository;
        this.userProfileRepository = userProfileRepository;
    }

    public UserProfile applyRating(String userId, RatingRequest request) {
        UserProfile user = userProfileRepository.findById(userId)
                .orElseThrow(() -> new RuntimeException("User not found: " + userId));

        Game game = gameRepository.findById(request.gameId())
                .orElseThrow(() -> new RuntimeException("Game not found: " + request.gameId()));

        int score = request.score();
        if (score == 3) {
            pushRatingOnly(userId, request);
            return userProfileRepository.findById(userId).orElseThrow();
        }

        double targetDirection = score >= 4 ? 1.0 : 0.0;
        Preferences prefs = user.getPreferences();

        Map updatedGenres = adjustWeights(prefs.getGenres(), game.getGenres(), targetDirection);
        Map updatedTags = adjustWeights(prefs.getTags(), game.getTags(), targetDirection);
        Map updatedMechanics = adjustWeights(prefs.getMechanics(), game.getMechanics(), targetDirection);

        updateProfileInMongo(userId, request, updatedGenres, updatedTags, updatedMechanics);

        return userProfileRepository.findById(userId).orElseThrow();
    }

    private Map adjustWeights(Map currentWeights,
                                              List gameAttributes,
                                              double targetDirection) {
        Map updated = new HashMap<>(currentWeights);
        for (String attribute : gameAttributes) {
            double oldWeight = updated.getOrDefault(attribute, 0.5);
            double newWeight = oldWeight + LEARNING_RATE * (targetDirection - oldWeight);
            updated.put(attribute, Math.round(newWeight * 1000.0) / 1000.0);
        }
        return updated;
    }

    private void pushRatingOnly(String userId, RatingRequest request) {
        GameRating rating = new GameRating(request.gameId(), request.score(), Instant.now());
        Query query = Query.query(Criteria.where("_id").is(userId));
        Update update = new Update().push("ratings", rating);
        mongoTemplate.updateFirst(query, update, UserProfile.class);
    }

    private void updateProfileInMongo(String userId, RatingRequest request,
                                      Map genres,
                                      Map tags,
                                      Map mechanics) {
        GameRating rating = new GameRating(request.gameId(), request.score(), Instant.now());
        Query query = Query.query(Criteria.where("_id").is(userId));
        Update update = new Update()
                .set("preferences.genres", genres)
                .set("preferences.tags", tags)
                .set("preferences.mechanics", mechanics)
                .push("ratings", rating);
        mongoTemplate.updateFirst(query, update, UserProfile.class);
    }
}

The adjustWeights method iterates over the game's attributes and applies the formula to each matching weight. If the user does not already have a weight for a particular attribute (for example, a genre they have never encountered before), it defaults to 0.5 as a neutral starting point and adjusts from there.

MongoDB update

The updateProfileInMongo method performs the preference update and rating storage in a single MongoDB operation. The $set operator replaces the preferences.genres, preferences.tags, and preferences.mechanics maps with the recalculated versions, while $push appends the new GameRating entry to the ratings array. Because both modifications happen in one updateFirst call, there is no window during which the document is partially updated.

Before and after demo

To see the feedback loop in action, hit the recommendations endpoint before and after submitting a rating. Using the same user from section 3:

# get recommendations before rating
curl http://localhost:8080/api/recommendations/

# rate a game highly
curl -X POST http://localhost:8080/api/users//ratings \
  -H "Content-Type: application/json" \
  -d '{"gameId": "", "score": 5}'

# get recommendations after rating
curl http://localhost:8080/api/recommendations/

Compare the two responses. Games that share genres, tags, or mechanics with the highly rated game will have moved up in the rankings because their matching weights increased. Games that do not share those attributes remain at their previous scores. Each rating nudges the profile slightly, and over several ratings, the preference weights settle into a profile that matches what the user actually enjoys.

5. Adding Spring AI embeddings and MongoDB Atlas Vector Search

The preference engine works well when a user's tags literally match a game's tags. But it misses semantic connections. A game tagged "exploration" and "mystery" should appeal to a user who likes "adventure" and "narrative," because those concepts are closely related. The preference engine scores that match at zero since none of the strings overlap.

Embeddings solve this problem. They represent text as high-dimensional vectors, with semantically similar concepts close together. Instead of checking whether two strings are identical, you measure the distance between their vector representations.

Spring AI setup

Add the Spring AI OpenAI starter to your pom.xml. You also need the Spring AI BOM to manage dependency versions:


    
        
            org.springframework.ai
            spring-ai-bom
            1.0.0
            pom
            import
        
    



    
        org.springframework.ai
        spring-ai-starter-model-openai

Then add the OpenAI configuration to application.properties:

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.embedding.options.model=text-embedding-3-small

The text-embedding-3-small model produces 1536-dimensional vectors. Store your API key in an environment variable rather than hardcoding it.

Generating embeddings

To generate an embedding, concatenate a game's description, genres, tags, and mechanics into a single text block, then pass the resulting text to the EmbeddingModel. First, add an embedding field to the Game class:

@Document(collection = "games")
public class Game {

    // ... existing fields ...

    private float[] embedding;

    // getter and setter for embedding
    public float[] getEmbedding() { return embedding; }
    public void setEmbedding(float[] embedding) { this.embedding = embedding; }
}

Then create a helper method that builds the text representation and generates the embedding:

@Service
public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    public EmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    public float[] generateGameEmbedding(Game game) {
        String text = game.getDescription() + " "
                + "Genres: " + String.join(", ", game.getGenres()) + ". "
                + "Tags: " + String.join(", ", game.getTags()) + ". "
                + "Mechanics: " + String.join(", ", game.getMechanics()) + ".";
        return embeddingModel.embed(text);
    }
}

The embed() method sends the text to OpenAI's embedding API and returns a float[] with 1536 values. Concatenating all of a game's metadata into one string gives the model enough context to produce a meaningful vector.

DataSeeder update

Update the DataSeeder to generate embeddings for each game on startup. After inserting the game documents, iterate over them and call the EmbeddingService:

@Component
public class DataSeeder implements CommandLineRunner {

    private final GameRepository gameRepository;
    private final EmbeddingService embeddingService;

    public DataSeeder(GameRepository gameRepository, EmbeddingService embeddingService) {
        this.gameRepository = gameRepository;
        this.embeddingService = embeddingService;
    }

    @Override
    public void run(String... args) {
        // ... existing game insertion logic ...

        List games = gameRepository.findAll();
        for (Game game : games) {
            if (game.getEmbedding() == null) {
                game.setEmbedding(embeddingService.generateGameEmbedding(game));
                gameRepository.save(game);
            }
        }
    }
}

The null check prevents re-generating embeddings on every restart. Each API call costs money, so you only want to embed games that do not already have a vector stored.

Atlas Vector Search index

Before you can query the embeddings, you need to create a Vector Search index in Atlas. Go to your cluster in the Atlas UI, select the Atlas Search tab, and click Create Search Index. Choose Atlas Vector Search as the index type, select the games collection, and use the following index definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}

Name the index vector_index. The numDimensions value must match the output of your embedding model, which is 1536 for text-embedding-3-small. Cosine similarity is the standard choice for text embeddings because it measures the angle between vectors regardless of their magnitude.

Vector search query

With the index in place, you can build a method to find games that are semantically similar to a user's preferences. The approach is: construct a text summary of the user's top preferences, embed it, then run a $vectorSearch aggregation against the games collection.

Add a findSimilarGames method to the RecommendationService:

public List findSimilarGames(String userId) {
    UserProfile user = userProfileRepository.findById(userId)
            .orElseThrow(() -> new RuntimeException("User not found: " + userId));

    Preferences prefs = user.getPreferences();
    String preferenceText = buildPreferenceText(prefs);
    float[] queryVector = embeddingModel.embed(preferenceText);

    List queryVectorList = new ArrayList<>();
    for (float f : queryVector) {
        queryVectorList.add((double) f);
    }

    Document vectorSearchStage = new Document("$vectorSearch",
            new Document("index", "vector_index")
                    .append("path", "embedding")
                    .append("queryVector", queryVectorList)
                    .append("numCandidates", 50)
                    .append("limit", 10));

    AggregationOperation vectorSearch = context -> vectorSearchStage;

    AggregationOperation addScore = context ->
            new Document("$addFields",
                    new Document("score", new Document("$meta", "vectorSearchScore")));

    Aggregation aggregation = Aggregation.newAggregation(vectorSearch, addScore);

    AggregationResults results =
            mongoTemplate.aggregate(aggregation, "games", GameRecommendation.class);

    return results.getMappedResults();
}

private String buildPreferenceText(Preferences prefs) {
    List parts = new ArrayList<>();
    prefs.getGenres().entrySet().stream()
            .sorted(Map.Entry.comparingByValue().reversed())
            .limit(5)
            .forEach(e -> parts.add(e.getKey()));
    prefs.getTags().entrySet().stream()
            .sorted(Map.Entry.comparingByValue().reversed())
            .limit(5)
            .forEach(e -> parts.add(e.getKey()));
    prefs.getMechanics().entrySet().stream()
            .sorted(Map.Entry.comparingByValue().reversed())
            .limit(3)
            .forEach(e -> parts.add(e.getKey()));
    return "Games with " + String.join(", ", parts);
}

The buildPreferenceText method extracts the user's top-weighted genres, tags, and mechanics and combines them into a natural-language string like "Games with roguelike, metroidvania, difficult, atmospheric, replayable, permadeath, procedural-generation." This string is embedded into a query vector, and $vectorSearch finds the game embeddings closest to it using cosine similarity.

The numCandidates parameter controls how many candidates the search considers internally before returning the final limit results. Setting numCandidates higher than limit improves accuracy at the cost of slightly more processing.

To make this work, add the EmbeddingModel as a dependency in RecommendationService alongside the existing fields:

private final MongoTemplate mongoTemplate;
private final UserProfileRepository userProfileRepository;
private final EmbeddingModel embeddingModel;

public RecommendationService(MongoTemplate mongoTemplate,
                             UserProfileRepository userProfileRepository,
                             EmbeddingModel embeddingModel) {
    this.mongoTemplate = mongoTemplate;
    this.userProfileRepository = userProfileRepository;
    this.embeddingModel = embeddingModel;
}

Results

Using the same user profile from section 3, vector search surfaces games like Outer Wilds (tagged "exploration" and "mystery") even though the user's preferences contain "adventure" and "narrative" rather than those exact terms. The preference engine gives Outer Wilds a low score because there is no literal tag overlap, but the embedding vectors for "exploration" and "adventure" are close in vector space, so $vectorSearch ranks it highly. This is the gap that embeddings fill.

6. Combining both signals

You now have two recommendation approaches that each capture something the other misses. Content-based scoring reflects what the user explicitly told you they want. Vector similarity catches semantic relationships that literal tag matching overlooks. The next step is to merge both into a single ranked result.

Merging approach

Add three new fields to the GameRecommendation class you created in section 3:

public class GameRecommendation {

    // ... existing fields ...

    private double contentScore;
    private double similarityScore;
    private double combinedScore;

    // getters and setters for all three score fields
}

The combined score uses a weighted formula: 0.6 * contentScore + 0.4 * similarityScore. Content-based scoring gets the higher weight because it reflects explicit user intent. When a user sets "roguelike": 0.9, they are telling you directly what they want. Similarity scores can surface unexpected results that drift from stated preferences, so giving content-based recommendations to the majority share keeps recommendations grounded in what the user asked for while still benefiting from semantic matches.

Add a getCombinedRecommendations method to RecommendationService that calls both approaches and merges the results:

public List getCombinedRecommendations(String userId) {
    List contentResults = getRecommendations(userId);
    List similarityResults = findSimilarGames(userId);

    double maxContentScore = contentResults.stream()
            .mapToDouble(GameRecommendation::getScore)
            .max().orElse(1.0);

    double maxSimilarityScore = similarityResults.stream()
            .mapToDouble(GameRecommendation::getScore)
            .max().orElse(1.0);

    Map merged = new LinkedHashMap<>();

    for (GameRecommendation rec : contentResults) {
        rec.setContentScore(rec.getScore() / maxContentScore);
        rec.setSimilarityScore(0.0);
        merged.put(rec.getId(), rec);
    }

    for (GameRecommendation rec : similarityResults) {
        double normalizedSimilarity = rec.getScore() / maxSimilarityScore;
        if (merged.containsKey(rec.getId())) {
            GameRecommendation existing = merged.get(rec.getId());
            existing.setSimilarityScore(normalizedSimilarity);
        } else {
            rec.setContentScore(0.0);
            rec.setSimilarityScore(normalizedSimilarity);
            merged.put(rec.getId(), rec);
        }
    }

    for (GameRecommendation rec : merged.values()) {
        double combined = 0.6 * rec.getContentScore() + 0.4 * rec.getSimilarityScore();
        rec.setCombinedScore(Math.round(combined * 1000.0) / 1000.0);
    }

    return merged.values().stream()
            .sorted(Comparator.comparingDouble(GameRecommendation::getCombinedScore).reversed())
            .toList();
}

Both scoring methods operate on different scales. Content-based scores are unbounded sums of matched weights, while similarity scores are cosine distances between 0 and 1. The method normalizes each set of scores by dividing each score by the maximum value in that set, bringing both into the 0 to 1 range before combining them. Games that appear in both result sets get both scores populated. Games that only appear in one set receive a zero for the missing score.

Unified response

Update the GET /api/recommendations/{userId} endpoint in RecommendationController to call getCombinedRecommendations instead of getRecommendations:

@GetMapping("/{userId}")
public ResponseEntity> getRecommendations(@PathVariable String userId) {
    List recommendations =
            recommendationService.getCombinedRecommendations(userId);
    return ResponseEntity.ok(recommendations);
}

The response now includes all three scores for each game:

[
    {
        "id": "64a1b2c3d4e5f6a7b8c9d0e2",
        "title": "Slay the Spire",
        "genres": ["roguelike", "strategy", "card-game"],
        "contentScore": 1.0,
        "similarityScore": 0.87,
        "combinedScore": 0.948
    },
    {
        "id": "64a1b2c3d4e5f6a7b8c9d0e1",
        "title": "Hollow Knight",
        "genres": ["metroidvania", "action", "platformer"],
        "contentScore": 0.68,
        "similarityScore": 0.92,
        "combinedScore": 0.776
    },
    {
        "id": "64a1b2c3d4e5f6a7b8c9d0e5",
        "title": "Outer Wilds",
        "genres": ["adventure", "exploration"],
        "contentScore": 0.12,
        "similarityScore": 0.81,
        "combinedScore": 0.396
    }
]

Slay the Spire leads because it scores well on both signals. Hollow Knight has a strong similarity score but a weaker content match. Outer Wilds has a low content score, but still appears because its high similarity score pulls it up.

Recommendation flow

The combined system follows this flow: user preferences and ratings feed into the content-based scoring pipeline, which computes a score for each game using MongoDB aggregation. In parallel, the user's top preferences are converted to a text summary and passed through the embedding model to produce a query vector. MongoDB Atlas Vector Search uses that vector to find semantically similar games. Both sets of scores are normalized and merged using the weighted formula, and the final output is a single ranked list of recommendations sorted by combined score.

Tuning weights

The 0.6/0.4 split is a reasonable starting point, not a universal answer. The right balance depends on how much preference data you have. When a user has submitted many ratings and their preference weights are well-calibrated, the content-based signal is reliable and deserves more weight. For new users who have set only a few initial preferences, the content-based scores may be sparse, and increasing the similarity weight (e.g to 0.5/0.5 or even 0.4/0.6) can yield better early recommendations by leaning on semantic connections.

Treat these weights as a tunable parameter, not a fixed constant. You could also make them dynamic per user, shifting toward content-based as the system accumulates more ratings.

7. Testing the full workflow

With all the pieces in place, walk through the full cycle: create a user, get initial recommendations, submit ratings, and observe how the results change.

Start the application. The DataSeeder loads games into the games collection and generates embeddings for each one on the first run. Once the application is ready, create a user profile:

curl -X POST http://localhost:8080/api/users \
  -H "Content-Type: application/json" \
  -d '{
    "username": "alex",
    "preferences": {
      "genres": {"roguelike": 0.9, "platformer": 0.5, "metroidvania": 0.7},
      "tags": {"difficult": 0.8, "atmospheric": 0.6, "replayable": 0.7},
      "mechanics": {"permadeath": 0.9, "procedural-generation": 0.6}
    }
  }'

The response includes the generated user ID. Copy it and request recommendations:

curl http://localhost:8080/api/recommendations/682f1a3b5e4d

[
    {"title": "Slay the Spire", "contentScore": 1.0, "similarityScore": 0.87, "combinedScore": 0.948},
    {"title": "Hades", "contentScore": 0.84, "similarityScore": 0.91, "combinedScore": 0.868},
    {"title": "Dead Cells", "contentScore": 0.77, "similarityScore": 0.83, "combinedScore": 0.794},
    {"title": "Hollow Knight", "contentScore": 0.68, "similarityScore": 0.92, "combinedScore": 0.776},
    {"title": "Outer Wilds", "contentScore": 0.12, "similarityScore": 0.81, "combinedScore": 0.396}
]

Slay the Spire leads because it hits roguelike (0.9), replayable (0.7), permadeath (0.9), and procedural-generation (0.6) all at once. Outer Wilds ranks low on content score since its tags do not literally match the user's preferences, but embeddings still pull it into the list.

Now submit a few ratings. Rate Hollow Knight highly and Slay the Spire lower:

curl -X POST http://localhost:8080/api/users/682f1a3b5e4d/ratings \
  -H "Content-Type: application/json" \
  -d '{"gameId": "64a1b2c3d4e5f6a7b8c9d0e1", "score": 5}'

curl -X POST http://localhost:8080/api/users/682f1a3b5e4d/ratings \
  -H "Content-Type: application/json" \
  -d '{"gameId": "64a1b2c3d4e5f6a7b8c9d0e2", "score": 2}'

curl -X POST http://localhost:8080/api/users/682f1a3b5e4d/ratings \
  -H "Content-Type: application/json" \
  -d '{"gameId": "64a1b2c3d4e5f6a7b8c9d0e5", "score": 4}'

The first rating pushes metroidvania, action, and platformer weights upward. The second pulls roguelike and card-game weights down. The third boosts adventure and exploration. Fetch recommendations again:

curl http://localhost:8080/api/recommendations/682f1a3b5e4d

[
    {"title": "Hollow Knight", "contentScore": 0.91, "similarityScore": 0.92, "combinedScore": 0.914},
    {"title": "Hades", "contentScore": 0.82, "similarityScore": 0.89, "combinedScore": 0.848},
    {"title": "Dead Cells", "contentScore": 0.74, "similarityScore": 0.84, "combinedScore": 0.780},
    {"title": "Outer Wilds", "contentScore": 0.38, "similarityScore": 0.81, "combinedScore": 0.552},
    {"title": "Slay the Spire", "contentScore": 0.61, "similarityScore": 0.85, "combinedScore": 0.706}
]

Hollow Knight jumped from fourth to first. Its content score increased from 0.68 to 0.91 because the 5-star rating boosted the weights for metroidvania, platformer, and atmospheric. Slay the Spire dropped because the 2-star rating pulled down roguelike and card-game weights. Outer Wilds moved up thanks to the 4-star rating increasing adventure and exploration weights, which also shifted the embedding query to favor similar games. Each rating adjusts the preference profile incrementally, and the combined scoring reflects those changes immediately.

Conclusion

You built a recommendation engine with two layers. Content-based preference scoring uses MongoDB aggregation pipelines to match games against weighted user preferences. Embedding-based similarity uses Spring AI and MongoDB Atlas Vector Search to surface games that are semantically related to a user's tastes, even when tags do not literally overlap. User ratings close the feedback loop by adjusting preference weights over time.

From here, try swapping in a different embedding model to see how it affects similarity results. Once you have enough users, add collaborative filtering to recommend games based on what similar users enjoyed. You could also incorporate additional signals like playtime, wishlist activity, or purchase history to make the preference model richer.

The complete source code is available in the companion repository. Clone it, plug in your MongoDB Atlas connection string and OpenAI API key, and start experimenting with your own game catalog and preference configurations.

The post Building a Personalized Content Delivery System appeared first on foojay.

Distributed Cache Invalidation Patterns

Matteo Rossi — Tue, 21 Apr 2026 13:52:03 +0000

Table of Contents

Why Cache Invalidation Becomes Hard in Distributed SystemsTime-Based Expiration (TTL)The Cache-Aside PatternEvent-Based Cache InvalidationVersioned Cache KeysMulti-Layer CachingEvent-Driven Cache RebuildsChoosing the Right StrategyFinal Thoughts

Caching is one of the most powerful tools developers have at their disposal for optimizing application performance. Caching systems can significantly reduce latency and reduce the load on databases or external systems by storing frequently accessed data as close as possible to the application layer. The result? Improved responsiveness and overall system usability.

In small monolithic applications, cache management is usually very simple. A service retrieves data from a database, stores it in memory, and fulfills subsequent requests by retrieving the data directly from the cache. When the data changes, the cache key is invalidated or updated.

Things get complicated—and not just a little—when the system evolves into a distributed architecture.

Modern, cloud-native applications run multiple service instances behind load balancers. Each instance can maintain its own local cache, and the system may include shared distributed caches such as Redis or Memcached. In these environments, maintaining cache consistency and coherence becomes much more difficult.

If one node updates a record while other nodes continue to serve stale records from the cache, users may notice inconsistent behavior across requests. The system may remain fast, but correctness is no longer guaranteed.

This is the main reason why cache invalidation is often considered one of the most complex issues to manage in distributed infrastructures.

In this article, we will explore several practical models for managing cache invalidation. We will focus on the different strategies developers can apply in real-world systems using tools such as Spring Boot, Redis, and Apache Kafka.

Why Cache Invalidation Becomes Hard in Distributed Systems

To better understand why cache invalidation in a distributed system is so complex, let’s consider how modern systems are typically implemented.

Most cloud applications, built according to 12-factor principles, run multiple instances of the same service to ensure scalability and fault tolerance. Each instance handles requests independently, and these applications often maintain a local in-memory cache to avoid repeated calls to the database or external services.

Let’s imagine a simple service tasked with retrieving information about products from the database:

A request arrives at instance A
The product data is loaded from the database
The result is stored in the local cache
Future requests are handled using the data in the cache

Now, let’s suppose that another request updates the product information. If the update occurs on instance B, only that instance is aware of it and will therefore invalidate its own cached record. Instance A might still retain the old value in memory.

The result? When the load balancer routes requests across instances, users might receive different responses depending on which node handles the request.

The following diagram shows and explains the current situation:

This problem becomes even more complex when the architecture includes multiple cache levels, such as:

In-memory caches within application instances
Distributed caches shared across services
CDN or edge caches

Ensuring that all these levels remain consistent is no trivial matter. As systems scale and become increasingly distributed, we must balance competing priorities: data freshness, data consistency, system performance, and operational complexity.

The solution? A good cache invalidation strategy should minimize stale data while keeping the system scalable and resilient. Let’s see how to do that.

Time-Based Expiration (TTL)

One of the simplest strategies for cache invalidation is to apply a time-based expiration, often implemented using a TTL (time-to-live).

With this strategy, the system allows cached values to expire after a predefined time-to-live (TTL) rather than actively invalidating records in the cache when the data changes. This is a simplified approach that avoids the need for distributed coordination among service instances.

For example, a Redis-based cache in a Spring Boot application can be configured with a default expiration time.

@Configuration
@EnableCaching

public class CacheConfig {
    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(10));
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(config)
                .build();
    }
}

Entries logically expire after ten minutes. Redis removes expired keys lazily when they are accessed, plus a background process periodically cleans them up. This means expired keys may still consume memory briefly after their TTL expires.

We can indicate that the result of a method should be cached by using Spring's caching abstraction:

@Service

public class ProductService {
    @Cacheable("products")
    public Product getProduct(String id) {
        return productRepository.findById(id).orElseThrow();
    }
}

The main advantage of TTL-based caching is definitely its simplicity: it works well when the application can tolerate short periods of outdated data.

However, TTL alone rarely solves the entire problem: if a record changes immediately after being cached, the system may serve outdated information for the entire TTL.

More proactive and effective invalidation strategies are necessary when dealing with highly dynamic data.

The Cache-Aside Pattern

A widely used approach to application-level caching is the “cache-aside” model, also known as the “lazy loading” mechanism. In this model, the application itself handles interactions with both the cache and the database (or any other system to be cached).

When reading data, the service first checks the cache. If the value is not there, the application fetches it from the database and stores it in the cache for future requests.

This model is exactly what Spring’s @Cacheable annotation implements:

@Cacheable(value = "products", key = "#id")

public Product getProduct(String id) {
    return productRepository.findById(id).orElseThrow();
}

When data changes, the application explicitly removes the corresponding cache entry.

@CacheEvict(value = "products", key = "#id")

public void updateProduct(Product product) {
    productRepository.save(product);
}

The next request will trigger the process again: reading from the database and repopulating the cache.

Cache-aside works very well in single-instance applications. In distributed systems, however, it invalidates the cache only on the node that performs the update. The other nodes may continue to serve outdated values unless additional coordination mechanisms are implemented.

Event-Based Cache Invalidation

A common approach to invalidating a distributed cache is to use event-driven communication.

Instead of relying on individual nodes to invalidate their own caches, services publish events whenever data changes. The other nodes listen for these events and invalidate their cache entries accordingly.

The typical workflow is as follows:

A record is updated
The service publishes an invalidation event
All application instances receive the event
Each instance deletes the corresponding cache entry.

For this purpose, messaging platforms such as Apache Kafka or RabbitMQ are typically used. For simpler systems, Redis Pub/Sub may be sufficient.

Let’s look at a small example that uses Redis to publish an invalidation message every time a product is updated.

@Service

public class ProductService {
    private final RedisTemplate redisTemplate;
    public void updateProduct(Product product) {
        productRepository.save(product);
        redisTemplate.convertAndSend(
                "cache-invalidation",
                product.getId()
        );
    }
}

Each service instance subscribes to the invalidation channel and clears the cache entry locally.

@Component

public class CacheInvalidationListener implements MessageListener {
    private final CacheManager cacheManager;
    @Override
    public void onMessage(Message message, byte[] pattern) {
        String productId = new String(message.getBody());
        cacheManager.getCache("products")
                .evict(productId);
    }
}

This approach ensures that all nodes have the opportunity to respond to the same stream of events, keeping caches synchronized across the entire system.

The main challenge lies in managing reliability issues, such as message delivery guarantees and duplicate events. For this reason, enterprise systems with strict requirements often rely on durable messaging platforms rather than the simple Pub/Sub model.

Versioned Cache Keys

Another effective strategy is using versioned cache keys. Instead of deleting cache entries when data changes, the system creates a new cache key with an incremented version.

For example:

product:123:v1

product:123:v2

When the product changes, the application increments the version number and writes the updated value under the new key; at this point, users automatically retrieve the latest version.

We can create a helper method to manage versioned keys:

public String buildCacheKey(String productId, int version) {
    return "product:" + productId + ":v" + version;
}

This technique eliminates race conditions in which one node invalidates a cache entry while another node is writing a new value to that entry.

Versioned keys are particularly useful in high-throughput systems, where invalidation events may arrive in random order. What is the drawback? Keys can accumulate over time, leading to cache overload. It is therefore necessary to implement a periodic cleanup process to remove obsolete and no-longer-useful versions.

Multi-Layer Caching

Many modern systems combine local in-memory caches with distributed caches. This multi-tiered approach reduces latency while maintaining the necessary scalability.

Let’s imagine a typical architecture:

One or more application instances
Local in-memory caches (e.g., Caffeine)
Distributed cache (e.g., Redis)
Database

The local cache ensures extremely fast reads, while the distributed cache ensures that data is shared across nodes. For example, we can configure our application to use Caffeine for local caching and Redis for distributed in-memory storage.

@Bean

public CacheManager cacheManager() {
    CaffeineCacheManager caffeineManager = new CaffeineCacheManager("products");
    caffeineManager.setCaffeine(
        Caffeine.newBuilder()
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .maximumSize(10_000)
    );
    return caffeineManager;
}

In a setup like this, invalidation events must clear both cache levels. While this adds complexity, it allows us to significantly reduce the number of remote cache calls and improve response times under heavy load. It’s important to note that local cache size should be tuned relative to the number of instances. With 10 instances each caching 10,000 entries, total memory consumption across the fleet is 100,000 entries. Size it carefully!

Event-Driven Cache Rebuilds

There are some architectural strategies, particularly those inspired by CQRS, where caches are not simply invalidated but are rebuilt from domain events.

In this case, the system maintains read models derived from a stream of events rather than storing arbitrary cache entries.

Every time an entity changes, the system emits an event of the type:

ProductCreatedEvent
ProductUpdatedEvent
InventoryAdjustedEvent

Consumers subscribe to these events and update read-optimized data structures.

A Kafka listener in a Spring Boot application might look like this:

@KafkaListener(topics = "product-events")

public void handleProductUpdate(ProductUpdatedEvent event) {
    cacheManager.getCache("products")
            .put(event.getProductId(), event.getProduct());
}

Applying this pattern transforms the cache into a projection of the event stream rather than a layer of temporary storage.

It is a powerful pattern, but it requires a mature event infrastructure and careful design focused on ensuring the consistency of the final result.

Choosing the Right Strategy

So what is the best approach? None. There is no single optimal approach to cache invalidation in distributed systems.

Different applications have different levels of tolerance for stale data, operational complexity, and infrastructure resilience. Furthermore, the best strategy depends on the data and the business process at hand. Every case is unique and must be treated as such.

In many real-world systems, a hybrid strategy is certainly the best approach.

A starting combination could be:

TTL expiration as a safety net
cache-aside loading for simplicity
event-driven invalidation for faster consistency

Systems with high-throughput requirements can adopt versioned keys or event-driven read patterns to ensure the overall effectiveness of the invalidation model.

Final Thoughts

Caching remains one of the most effective ways to improve the performance of distributed systems. When implemented effectively and in line with business requirements, it can drastically reduce the load on the database or external services and greatly improve response times and overall system latency.

However, there is a downside. Distributed caches introduce new challenges in terms of consistency and coordination. Without proper invalidation strategies, caches can serve stale data and compromise system correctness without anyone noticing.

The modern Java ecosystem offers excellent tools for implementing solid and robust caching solutions. Spring Boot simplifies cache integration within an application, whether local or distributed. Technologies like Redis and Apache Kafka enable scalable and resilient distributed coordination.

By combining models such as TTL expiration, cache-aside loading, event-driven invalidation, and multi-tier caching, you can build systems that remain fast and consistent even as they scale.

In conclusion, caching is not a feature to simply enable or disable. It is an architectural component to be integrated and managed within the ecosystem, designed alongside the application to ensure consistency and reliability.

If you’d like to take a look at the examples in the article, feel free to visit the repository.

The post Distributed Cache Invalidation Patterns appeared first on foojay.

CQRS in Java: Separating Reads and Writes Cleanly

Mike LaSpina — Thu, 16 Apr 2026 20:25:28 +0000

Table of Contents

What you'll learn

The Spring standard repositoryCreating separate repositories for the read and write

The read Repository
The write repository

When requirements changeThe double-edged sword of Spring updates in MongoDB

Increased network traffic
Oplog bloat and replacing documents

Conclusion

Further reading

What you'll learn

How the MongoDB Spring repository can be used to abstract MongoDB operations
Separating Reads and Writes in your application
How separating these can make schema design changes easier
Why you should avoid save() and saveAll() functions in Spring

The Command Query Responsibility Segregation (CQRS) pattern is a design method that segregates data access into separate services for reading and writing data. This allows a higher level of maintainability in your applications, especially if the schema or requirements change frequently. This pattern was originally developed with separate read and write sources in mind. However, implementing CQRS for a single data source is an effective way to abstract data from the application and make maintenance easier in the future. In this blog, we will use Spring Boot with MongoDB in order to create a CQRS pattern-based application.

Spring Boot applications generally have two main components to a repository pattern: standard repository items from Spring—in this case, MongoRepository—and then custom repository items that you create to perform operations beyond what is included with the standard repository. In our case, we will be using 2 custom repositories - ItemReadRepository and ItemWriteRepository to segregate the reads and writes from each other.

The code in this article is based on the grocery item sample app. View the updated version of this code used in this article. Note the connection string in the application.properties file passes the app name of 'devrel-blog-java-cqrs' to the DB.

The Spring standard repository

The standard repo items can extend the base MongoRepository class. This greatly reduces the amount of code needed for standard CRUD operations. However, if we're using the CQRS pattern, we'll want to use custom repositories for both the read and write functions to the DB. We can still use standard Spring functions for some of our read functionality.

Creating separate repositories for the read and write

To implement CQRS, we'll need two custom repos in Spring. One for the reads and one for the writes. Although this may add a little bit of complexity up front, this will make it easier to modify going forward. Consider the following changes that may need to be made in the future:

Adding the ability to send changes to Pub/Sub, such as Kafka.
Changes to document schema as the system grows - in this case, consider using the schema versioning pattern. Having separate read and write functionality makes changes much easier in this case.
New security requirements, such as permissioning and encryption.
Reading and writing to different data sources.

The read Repository

Our first repository will be for data reading. Note that this repo only queries the DB. In our case, we'll call it ItemReadRepository:

public interface ItemReadRepository extends MongoRepository {
   @Query("{name:'?0'}")
   GroceryItem findItemByName(String name);
   @Query(value="{category:'?0'}", fields="{'name' : 1, 'quantity' : 1}")
   List findAllbyCategory(String category);
   public long count();
}

Here, we define two different query functions plus a count function. In this case, we can use the standard MongoRepository functions by adding query annotations.

findItemByName: This passes the query {name: ''} to the find function in MongoRepository. As seen by the declaration, it returns a single GroceryItem. This maps to the MongoDB findOne function and translates to the MongoDB query:

db.groceryitem.findOne({"name" : ""})

findAllbyCategory: This returns a list of GroceryItems by category. Since this function returns a list, the MongoDB find function is used to return all items that meet the criteria. In this example, we add a projection using the 'fields' parameter to only return the 'name' and 'quantity' fields. The MongoDB find method is called under the covers:

db.groceryitem.find({"category" : ""}).
project({"name" : 1, "quantity" : 1})

count: This simply counts the items in the collection.

The write repository

The next repository will be for writing data (C, U, and D of CRUD). This repo only writes to the DB. In our case, we'll call it ItemWriteRepository:

public interface ItemWriteRepository {
   void updateItemQuantity(String itemName, float newQuantity);
   void bulkUpdateItemCategories(String category, String newCategory);
   void deleteAll();
   void deleteById(String id);
   void insert(GroceryItem item);
}

As discussed in my previous blog regarding Spring I/O and MongoDB updates, it's best to provide your own write functionality rather than relying on Spring's brute force approach of replacing the entire document - see the section regarding the Double-Edge Sword of Spring and MongoDB below.

We use the ItemWriteRepositoryImpl class in order to specify the exact operations to be carried out for each write function:

public class ItemWriteRepositoryImpl implements ItemWriteRepository {
   @Autowired
   MongoTemplate mongoTemplate;
   public void updateItemQuantity(String name, float newQuantity) {
      Query query = new Query(Criteria.where("name").is(name));
      Update update = new Update();
      update.set("quantity", newQuantity);
      UpdateResult result = mongoTemplate.updateFirst(query, update, GroceryItem.class);
      if(result == null)
         System.out.println("No documents updated");
      else
         System.out.println(result.getModifiedCount() + " document(s) updated");
   }

   public void bulkUpdateItemCategories(String category, String newCategory) {
      Query query = new Query(Criteria.where("category").is(category));
      Update update = new Update();
      update.set("category", newCategory);
      UpdateResult result = mongoTemplate.updateMulti(query, update, GroceryItem.class);
      if(result == null)
         System.out.println("No documents updated");
      else
         System.out.println(result.getModifiedCount() + " document(s) updated");
   }

   public void deleteAll() {
      DeleteResult result = mongoTemplate.remove(new Query(), GroceryItem.class);
      if(result == null)
         System.out.println("No documents deleted");
      else
         System.out.println(result.getDeletedCount() + " document(s) deleted");
   }

   public void insert(GroceryItem itm){
      GroceryItem result =  mongoTemplate.insert(itm);
      if(result == null)
         System.out.println("No document inserted");
      else
         System.out.println("  " + result.getName() + " inserted");
   }

   public void deleteById(String id){
      Query query = new Query();
      query.addCriteria(Criteria.where("_id").is(id));
      DeleteResult result = mongoTemplate.remove(query, GroceryItem.class);
      if(result == null)
         System.out.println("No documents deleted");
      else
         System.out.println(result.getDeletedCount() + " documents deleted");
   }
}

When requirements change

At some point in the future, we decide to add validation to the groceryItem so that the quantity must be greater than zero. Since we don't need to worry about this when reading data, we only need to modify the ItemWriteRepository in order to implement this check:

public boolean validate(GroceryItem itm){
   boolean result = true;
   if(itm.getItemQuantity() <= 0)
   {
      System.out.println("  **Validation error - quantity for item " +
            itm.getName() + " must be greater than zero.**");
      result = false;
   }
   return result;
}

We must also call the new validator - here's the modified insert function:

public void insert(GroceryItem itm){
   if(validate(itm)) {
      GroceryItem result = mongoTemplate.insert(itm);
      if (result == null)
         System.out.println("No document inserted");
      else
         System.out.println("  " + result.getName() + " inserted");
   }
   else
      System.out.println("  Could not insert " + itm.getName() + " due to validation error above.");
}

In our code example, we also have an "updateItemQuantity" function. Since this only takes a name and a new quantity, the rule regarding quantity > 0 can be applied in this function by checking the 'newQuantity' parameter before doing the update. When using the CQRS pattern with Spring, we can be certain that we only need to update a single repository ItemWriteRepository, to ensure that ALL sources of application writes will apply the rules consistently.

The double-edged sword of Spring updates in MongoDB

I've discussed this previously in my article Building Java Microservices with the Repository Pattern, but it bears repeating here. In the code examples above, we wrote our own update statement to change a category. This is preferred to calling the saveAll repository function for several reasons:

Extra data and network I/O are incurred by pulling all documents to the client and sending them all back to the DB for update.
Performance could be severely affected if a large number of documents need to be updated.
This uses the standard MongoRepository saveAll() function to update an existing document, which is generally considered an anti-pattern in MongoDB.

Why should we avoid save() and saveAll() when updating documents? The main reasons to avoid these are for network traffic and oplog bloat. Let's discuss these individually.

Increased network traffic

In the original example, changing the category of a set of items required each document to be retrieved from the DB to the client. When there are only a couple of documents, this amount of overhead won't make much of a difference. However, imagine the traffic we would incur if there were thousands of items that had to have their categories changed. This could also consume a significant amount of memory on the client fetching this list.

When the saveAll() method is called, Spring will iterate through each document in the list to determine if it's a new document needing to be inserted or an existing one needing to be replaced. This is also a drag on performance as it must iterate through the list, check for the existence of the document by _id, and then decide what to do. Each document in this list will be sent to the DB individually. This is also a non-atomic operation, which could result in a partial update should there be some sort of error or outage.

Oplog bloat and replacing documents

The MongoDB operation log (or oplog) is how MongoDB replicates writes from the primary to secondaries. The oplog is a capped collection in MongoDB, meaning it is typically a fixed size. Although this collection can grow beyond its configured size, it's best to minimize data from each operation in order to reduce storage requirements and data going across the network. As the size of each operation grows in the oplog, fewer can fit before the oldest ones are overwritten in the collection. This translates into a smaller oplog window, which is the time a secondary can be offline and able to catch up when coming back online.

Let's see an example. The standard save() method will replace the entire document based on the existing _id. The resulting oplog entry would look something like this (some fields are eliminated for brevity):

{
  "op": "u", // for Update
  "ns": "mygrocerylist.GroceryItem",
  "o2": {"_id": "Whole Wheat Biscuit"},
  "o": {
    "_id": "Whole Wheat Biscuit",
    "name": "Whole Wheat Biscuit",
    "quantity": 5,
    "category": "munchies",
    "_class": "com.example.mdbspringboot.model.GroceryItem"
  }
}

Using the 'updateMulti' function in our bulkUpdateItemCategories function, this translates to an update of a single field using the MongoDB $set operator. The oplog entry would be smaller, in this case:

{
  "op" : "u", //  for update
  "ns": "mygrocerylist.GroceryItem",
  "o2": {"_id": "Whole Wheat Biscuit"},
  "o" : { "$set" : { "category" : "munchies" } } 
}

In the case of the first example, all of the highlighted fields have not changed and are simply bloating the oplog. Imagine if this document had 200 fields—we would be including all of the fields in the oplog for a single field update! When updating documents, it's best to write your own repo functions to avoid sending all of the document's fields to the DB for replacement. Use the updateXXX repo functions to provide an update that uses the $set MongoDB function under the covers.

Conclusion

The Command Query Responsibility Segregation (CQRS) model can (and should) be used to separate read and write logic. This has several benefits:

Less complexity when implementing business logic in the application
Easier to maintain - all logic can be implemented in the write repository
Easier to add functionality - for example, sending updates to Kafka for downstream systems
Easy to separate read and write sources if needed
Less effort to change the schema of your documents

There are a few downsides to CQRS:

The initial code can appear more complicated as there are read and write repositories to maintain. This may not be worth the effort for a very simple microservice that reads and writes to a single DB.
Needs to be enforced - your organization's code review process should ensure that these are kept separate in order to ensure maintainability going forward

Avoid using the standard Spring save() and saveAll() methods to update documents. They take a brute force approach by replacing the entire document, which can lead to significantly more network traffic, poor performance, and negative impacts on cluster availability due to oplog bloat.

Building a Kotlin App with Spring Boot and MongoDB Search

Ricardo Mello — Thu, 09 Apr 2026 15:21:05 +0000

Table of Contents

DemonstrationPre-requisitesWhat is MongoDB Search?Load sample datasetCreating the MongoDB Search indexTesting our index in MongoDB CompassBuilding a Kotlin applicationCreating the projectAdding MongoDB driver dependencyEstablishing a connectionCreating the repositoryCreating a serviceCreating a controllerFinal application structureApplication structureRunning the applicationConclusion

One of my favorite activities is traveling and exploring the world. You know that feeling of discovering a new place and thinking, "How have I not been here before?" It's with that sensation that I'm always motivated to seek out new places to discover. Often, when searching for a place to stay, we're not entirely sure what we're looking for or what experiences we'd like to have. For example, we might want to rent a room in a city with a view of a castle. Finding something like that can seem difficult, right? However, there is a way to search for information accurately using MongoDB Search.

In this tutorial, we will learn to build an application in Kotlin that utilizes full-text search in a database containing thousands of Airbnb listings. We'll explore how we can find the perfect accommodation that meets our specific needs.

Demonstration

To achieve our goal, we will create a Kotlin Spring Boot application that communicates with MongoDB Atlas using the Kotlin Sync Driver.

The application will use a pre-imported database in Atlas called sample_airbnb, utilizing the listingsAndReviews collection, which contains information about various Airbnbs.

To identify the best Airbnb listings, we will create an endpoint that returns information about these listings. This endpoint will use the summary field from the collection to perform a full-text search with the fuzzy parameter in text operator. Additionally, we will filter the documents based on a minimum number of reviews, utilizing the search functionalities provided by MongoDB Search.

Pre-requisites

MongoDB Atlas account
- Get started with MongoDB Atlas for free! If you don’t already have an account, MongoDB offers a free-forever Atlas cluster.
Java 21+
Gradle 8.8+
IDE of your choice

What is MongoDB Search?

MongoDB Search is a feature in MongoDB Atlas that provides powerful and flexible search capabilities for your data. It integrates with Apache Lucene, enabling advanced text analysis, custom scoring, and result highlighting. This allows you to build sophisticated search functionality directly within your MongoDB applications.

To utilize MongoDB Search effectively, we will focus on three key operators: text, range, and compound. Although there are various operators available, our exploration will concentrate on these to illustrate their practical applications.

Text: This operator will be used to perform text searches within our endpoint, allowing for approximate matching and handling variations in the search terms.
Range: We will explore the range operator specifically with the gte (greater than or equals) condition for the number_of_reviews field. This will enable us to query and filter based on review counts effectively.
Compound: The compound operator will be used to combine the text fuzzy and range queries into a more complex and refined search. This will demonstrate how to merge multiple criteria for more sophisticated search functionality.

While this article will not delve deeply into all available operators, those interested in a more comprehensive exploration can refer to the MongoDB Atlas Search documentation for further details.

Load sample dataset

Before starting, you'll need to import the sample dataset, which includes several databases and collections, like the Airbnb list. After setting up your cluster, just click on "Database" in the left menu and choose "Load sample dataset," as shown in the image:

If everything goes smoothly, after the import, you will see our databases and collections displayed as shown in the image.

Creating the MongoDB Search index

After importing the collections, the next step is to create an index for the Airbnb collection. To do this, select "Database" from the side menu under “Deployment,” go to the "MongoDB Search" tab, and click on "JSON Editor," as shown in the image.

In the next step, select the sample_airbnb database and the listingsAndReviews collection (the Airbnb collection). Then, name your index "searchPlaces":

Note that we are using Dynamic Mappings for simplicity, which allows MongoDB Search to automatically index the fields of supported types in each document. For more details, I suggest checking out Define Field Mappings.

If everything goes well, the "searchPlaces" index will be created successfully, and you can view it here.

Testing our index in MongoDB Compass

To test our index, we need to create an aggregation pipeline. While there are various methods to test this, we will use MongoDB Compass for convenience. MongoDB Compass is a powerful GUI tool that facilitates managing and analyzing MongoDB data. It provides features to visualize schemas, build queries, and manage data through an intuitive interface.

We need to set up an aggregation pipeline to meet the following requirements: Filter the summary field by text and ensure a minimum number of reviews. Here’s the aggregation pipeline we will use for testing:

[
  {
    $search: {
      index: "searchPlaces",
      compound: {
        filter: [
          {
            range: {
              path: "number_of_reviews",
              gte: 50
            }
          },
          {
            text: {
              path: "summary",
              query: "Istambun",
              fuzzy: {
                maxEdits: 2
              }
            }
          }
        ]
      }
    }
  },
  {
    $limit: 5
  },
  {
    $project: {
      _id: 0,
      name: 1,
      summary: 1,
      number_of_reviews: 1,
      price: 1,    
      street: "$address.street",
    }
  }
]

Let’s break down each stage:

$search: The $search stage uses the MongoDB Search capabilities to perform a full-text search with additional filtering.
1. index: "searchPlaces": This specifies the search index to use. If the index name were "default," we would not need to specify it here.
2. compound: This allows you to combine multiple search criteria. The compound query here is used to filter the search results based on both text and range criteria.
3. filter: This contains an array of filter criteria applied to the search results.
4. range: This filters documents where the number_of_reviews field is greater than or equal to 50.
5. text: Text performs a full-text search on the summary field with the query "Istambun." The fuzzy option with maxEdits: 2 allows for fuzzy matching, meaning it can match terms that are similar to "Istambun" with up to two character edits (insertions, deletions, or substitutions).
$limit: This limits the number of documents returned by the query to 5. Using a limit is essential to maintain performance.
$project: This specifies which fields to include or exclude in the final result.

Simply run this pipeline to obtain the results. See:

Building a Kotlin application

Our application will be developed in Kotlin with Spring. It’s important to note that we will not be using Spring Data. Instead, we will use the Kotlin Sync Driver, which is specialized for communication between the application and MongoDB. The goal of our application is simple: to provide an endpoint that allows us to make requests and communicate with MongoDB Atlas.

Creating the project

To do this, we'll use the Spring Initializer official page to create our project:

As you can see, I have only added the Spring Web dependency.

Adding MongoDB driver dependency

The first thing we’ll do is open the build.gradle.kts file and add the mongodb-driver-kotlin-sync dependency.

dependencies {
 implementation("org.mongodb:mongodb-driver-kotlin-sync:5.1.1")
}

Establishing a connection

To establish our connection, we need to follow these steps. First, update the application.properties file with the required values.

spring.application.name=Airbnb Searcher

spring.data.mongodb.uri=mongodb+srv://user:pass@cluster0.cluster.mongodb.net/

spring.data.mongodb.database=sample_airbnb

Notice: Don't forget to change your MongoDB string connection.

Next, we will create a MongoConfig class within the config directory to set up the connection when our application starts.

package com.mongodb.searcher.application.config
import com.mongodb.kotlin.client.MongoClient
import com.mongodb.kotlin.client.MongoDatabase
import org.springframework.beans.factory.annotation.Value
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class MongoConfig {
   @Value("\${spring.data.mongodb.uri}")
   lateinit var uri: String
   @Value("\${spring.data.mongodb.database}")
   lateinit var databaseName: String

   @Bean
   fun getMongoClient(): MongoClient {
       return MongoClient.create(uri)
   }

   @Bean
   fun mongoDatabase(mongoClient: MongoClient): MongoDatabase {
       return mongoClient.getDatabase(databaseName)
   }
}

Great, we have defined our MongoConfig class, which will use the values from application.properties. Create the class AirbnbEntity within the resources package:

package com.mongodb.searcher.resources
import com.mongodb.searcher.domain.Airbnb
import org.bson.codecs.pojo.annotations.BsonId
import org.bson.codecs.pojo.annotations.BsonProperty
import org.bson.types.Decimal128

data class AirbnbEntity(
   @BsonId val id: String,
   val name: String,
   val summary: String,
   val price: Decimal128,
   @BsonProperty("number_of_reviews")
   val numbersOfReviews: Int,
   val address: Address
) {

   data class Address(
       val street: String,
       val country: String,
       @BsonProperty("country_code")
       val countryCode: String
   )

   fun toDomain(): Airbnb {
       return Airbnb(
           id = id,
           name = name,
           summary = summary,
           price = price,
           numbersOfReviews = numbersOfReviews,
           street = address.street
       )
   }
}

Creating the repository

Now, let’s create our class that will utilize the MongoDB Search index. To do this, create the AirbnbRepository class within the resources package.

package com.mongodb.searcher.resources

import com.mongodb.client.model.Aggregates
import com.mongodb.client.model.Projections
import com.mongodb.client.model.search.FuzzySearchOptions
import com.mongodb.client.model.search.SearchOperator
import com.mongodb.client.model.search.SearchOptions
import com.mongodb.client.model.search.SearchPath
import com.mongodb.kotlin.client.MongoDatabase
import org.slf4j.LoggerFactory
import org.springframework.stereotype.Repository

@Repository
class AirbnbRepository(
   private val mongoDatabase: MongoDatabase
) {
   companion object {
       private val logger = LoggerFactory.getLogger(AirbnbRepository::class.java)
       private const val COLLECTION = "listingsAndReviews"
   }

   fun find(query: String, minNumberReviews: Int): List {
       val collection = mongoDatabase.getCollection(COLLECTION)
       return try {
           collection.aggregate(
               listOf(
                   createSearchStage(query, minNumberReviews),
                   createLimitStage(),
                   createProjectionStage()
               )
           ).toList()
       } catch (e: Exception) {
           logger.error("An exception occurred when trying to aggregate the collection: ${e.message}")
           emptyList()
       }
   }

   private fun createSearchStage(query: String, minNumberReviews: Int) =
       Aggregates.search(
           SearchOperator.compound().filter(
               listOf(
                   SearchOperator.numberRange(SearchPath.fieldPath("number_of_reviews"))
                       .gte(minNumberReviews),
                   SearchOperator.text(SearchPath.fieldPath(AirbnbEntity::summary.name), query)
                       .fuzzy(FuzzySearchOptions.fuzzySearchOptions().maxEdits(2))
               )
           ),
           SearchOptions.searchOptions().index("searchPlaces")
       )

   private fun createLimitStage() =
       Aggregates.limit(5)
   private fun createProjectionStage() =
       Aggregates.project(
           Projections.fields(
               Projections.include(
                   listOf(
                       AirbnbEntity::name.name,
                       AirbnbEntity::id.name,
                       AirbnbEntity::summary.name,
                       AirbnbEntity::price.name,
                       "number_of_reviews",
                       AirbnbEntity::address.name
                   )
               )
           )
       )
}

Let's analyze the find method.

As you can see, the method expects a query string and an int (minNumberReviews) and returns a list of AirbnbEntity. This list is generated through an aggregation pipeline, which consists of three stages:

Search stage: Utilizes the $search operator to filter documents based on the query and the minimum number of reviews
Limit stage: Restricts the result set to a maximum number of documents
Projection stage: Specifies which fields to include in the returned documents (this stage is optional and included here just to illustrate how to use it)

Notice: Depending on the scenario, adding stages after the $search stage can drastically impact the application's performance. For more details, refer to our docs on performance considerations.

Creating a service

To continue with our project, let's create a domain package with two classes. The first will be our Airbnb.

package com.mongodb.searcher.domain

import org.bson.codecs.pojo.annotations.BsonId
import org.bson.codecs.pojo.annotations.BsonProperty
import org.bson.types.Decimal128

data class Airbnb(
   @BsonId val id: String,
   val name: String,
   val summary: String,
   val price: Decimal128,
   @BsonProperty("number_of_reviews")
   val numbersOfReviews: Int,
   val street: String
)

Next, our service:

package com.mongodb.searcher.domain

import com.mongodb.searcher.resources.AirbnbRepository
import org.springframework.stereotype.Service

@Service
class AirbnbService(
   private val airbnbRepository: AirbnbRepository
) {
   fun find(query: String, minNumberReviews: Int): List {
       require(query.isNotEmpty()) { "Query must not be empty" }
       require(minNumberReviews > 0) { "Minimum number of reviews must not be negative" }
      return airbnbRepository.find(query, minNumberReviews).map { it.toDomain() }
   }
}

Notice that this class is responsible for validating our inputs and accessing the repository.

Creating a controller

To enable REST communication, create the AirbnbController class within the application.web package:

package com.mongodb.searcher.application.web

import com.mongodb.searcher.domain.Airbnb
import com.mongodb.searcher.domain.AirbnbService
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestParam
import org.springframework.web.bind.annotation.RestController

@RestController
class AirbnbController(
   private val airbnbService: AirbnbService
) {
   @GetMapping("/airbnb/search")
   fun find(
       @RequestParam("query") query: String,
       @RequestParam("minNumberReviews") minNumberReviews: Int
   ): List {
       return airbnbService.find(query, minNumberReviews)
   }
}

Final application structure

Great. If all the steps have been followed, our folder structure should look similar to the one in the image:

Application structure

Running the application

Simply run the application and access the endpoint provided at 'http://localhost:8080/airbnb/search'. Below is an example of how to use it:

curl --location 'http://localhost:8080/airbnb/search?query=Istambun&minNumberReviews=50'

Conclusion

In this tutorial, we built a Kotlin-based Spring Boot application that uses MongoDB Search to find Airbnb listings efficiently. We demonstrated how to set up MongoDB Atlas, create a search index, and implement an aggregation pipeline for filtering and searching data.

While we focused on fuzzy matching and review count filtering, MongoDB Search offers many other powerful features, such as custom scoring and advanced text analysis.

Exploring these additional capabilities can further enhance your search functionality and provide even more refined results. The example source code used in this series is available on GitHub.

For more details on MongoDB Search, you can refer to the Exploring Search Capabilities With MongoDB Search article.

The post Building a Kotlin App with Spring Boot and MongoDB Search appeared first on foojay.

foojay – a place for friends of OpenJDK

What is Sharding in MongoDB and When Should You Use It?

A Practical Introduction to Horizontal Scaling

The Scaling Problem Most Databases Face

What is Horizontal Scaling?

What is Sharding in MongoDB?

MongoDB Sharded Cluster Architecture

1. Shards

2. Config Servers

3. Mongos Router

Choosing a Shard Key

Creating a Sharded Collection

Querying Data in a Sharded Cluster

When Should You Use Sharding?

Large datasets

High write throughput

Rapid data growth

When Sharding Might Be Overkill

Sharding vs Replication

Final Thoughts

Exploring MongoT (Atlas Search)

Let’s dive in!

Simple Example - Text Search

Breakdown Table (for a ~9ms $search aggregation path through MongoT)

Local Debugging

Sample Data

Interesting Example - Faceted Text Search

Lucene Indexing Strategy + Benefits over MongoD Indexes

Current State of MongoDB’s Built-In Search

Advanced Text Search

Multiple-index searching with merged results

Vector Search

Further Reading

Vector Search Example

Local Grafana Monitoring

Performance

Java Code Packages

So what can you learn from MongoT?

Wrap

AI-Powered Code Review Assistant: Automated Code Analysis with Spring AI and MongoDB

Prerequisites

1. Project setup

2. Storing and managing review patterns

Defining the pattern model

Creating the repository

Building the service layer

Exposing the REST endpoints

3. Embedding patterns with Spring AI and MongoDB Atlas Vector Search

Adding Spring AI dependencies

Generating embeddings

Seeding the pattern library

Creating the Atlas Vector Search index

Searching for similar patterns

4. Building the code review engine

Defining the submission and finding models

The review service

Prompt design

The review controller

Testing the review engine

5. Tracking review trends with aggregation pipelines

6. Testing the full workflow

Conclusion

Building an AI-Powered Operations Assistant with Spring AI and MongoDB Atlas — Part 1: RAG Foundation

The problem

What we are building

Why RAG and why MongoDB Atlas

How the Pieces Fit Together

Getting the Project Running

The Ingestion Pipeline

The Retrieval Pipeline

The Atlas Vector Search Index

Trying It Out

Conclusion and What’s Next

When Should You Use a Cache With MongoDB?

Why were caches like Memcached & Redis invented, and why do they thrive?

So, what's wrong with having a caching tier?

What's different with MongoDB?

What does AI think?

Summary

Learn more about MongoDB design reviews