
Using MongoDB for distributed locking

Teodora Brašančević

Software Engineer

When building distributed systems, one of the biggest challenges is ensuring consistency when multiple data-processing flows work with the same data. Here we examine our approach to distributed locking using MongoDB.

In our case, we were developing an Ad export system where multiple flows (Kafka consumers and APIs) were trying to process the same Ads in parallel. Without proper coordination and a locking mechanism, we risked race conditions that could lead to inconsistent results.

To prevent this, we implemented a distributed locking mechanism with MongoDB. We used MongoDB’s atomic updates and versioning to build a reliable, document-level locking system.

Here’s how we did it.

What we are dealing with

First, let’s see how the Ad flows are configured. On our platform, Ads are exported to an external platform in response to various triggers, for example:

Ad configuration:

  • New Ad created → export to external platform
  • Existing Ad updated → export to external platform
  • Deleted Ad → remove from external platform

User configuration:

  • User enables Ads synchronisation → export all their Ads
  • User disables Ads synchronisation → remove all their Ads
  • User opts out via API → remove all their Ads
  • User opts in via API → export all their Ads

Recovery:

  • Retry job → retries failed export and deletions

All these actions are handled by four different flows with the following sources:

  • Kafka topic with Ad creation, update and deletion
  • Kafka topic with user setting changes
  • API call for user opt out and opt in
  • Scheduled retry job for failed exports and deletions

Why is that a problem?

As data processing flows run in parallel, it is possible for multiple entry points to act on the same Ad at the same time, leading to race conditions in this distributed system. For example, consider this scenario:

  1. An Ad is updated, triggering an export operation. 
  2. Shortly after, the user disables automatic export, which triggers deletion.
  3. However, the export operation from step 1 completes after the deletion.
  4. As a result, the Ad gets re-exported even though it was meant to be removed.

To handle this scenario and many others, we need a way to guarantee consistency and ensure that only one flow can process a given Ad at a time, even if multiple are triggered simultaneously.

Our solution

This solution acts as a MongoDB-based distributed lock manager for Ads, ensuring consistency during parallel processing. To manage the document state, we integrated our locking mechanism directly into the Ad document itself. It includes three key fields, illustrated in the sketch after the list:

  • lockedBy – tells which flow (e.g. Kafka consumer, API call) is currently processing the Ad
  • exportStatus – tells the current export state, represented by the enum values PENDING, EXPORT_FAILED, EXPORTED, DELETION_PENDING and DELETION_FAILED
  • documentVersion – prevents acquiring a lock on a stale document
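
For illustration, a locked Ad document carrying these fields could look roughly like this. This is a simplified sketch: the class and collection names come from the snippets below, while everything apart from the three fields above is an assumption.

@Document("ads")
class AdEntity {

  @Id
  private Long id;

  // Which flow currently holds the lock, e.g. USER_SETTING_CHANGE
  private LockedBy lockedBy;

  // PENDING, EXPORT_FAILED, EXPORTED, DELETION_PENDING or DELETION_FAILED
  private ExportStatus exportStatus;

  // Incremented on every lock acquisition to reject updates based on stale reads
  private Long documentVersion;

  // ...remaining Ad fields omitted
}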

Now, the implementation. Before any flow starts processing an Ad, it tries to acquire a lock. The lock is only granted if:

  1. The document matches the expected Ad ID and document version, so there are no stale updates,
  2. And one of the following is true:
  • exportStatus is in a final state (EXPORTED, EXPORT_FAILED or DELETION_FAILED)
  • The Ad is already locked by the same flow
  • The Ad hasn’t been locked yet

public Ad acquireLock(Ad newAd, LockedBy lockedBy, ExportStatus pendingStatus) {
  // Default to version 1 for Ads that have never been versioned yet
  var documentVersion = Optional.ofNullable(newAd.getDocumentVersion()).orElse(1L);
  var adId = newAd.getId().getValue();
  // Only matches if the stored version is current and the Ad is lockable by this flow
  var query = getQueryForAcquiringLock(adId, documentVersion, lockedBy);
  // Atomically bump the version, record the lock owner and move the Ad into a pending state
  var update = new Update().set("documentVersion", documentVersion + 1)
      .set("lockedBy", lockedBy.name()).set("exportStatus", pendingStatus);
  try {
    var updateResult = mongoTemplate.updateFirst(query, update, AdEntity.class);
    // Fails if no document was modified, i.e. the lock could not be acquired
    validateUpdateResult(updateResult, adId, documentVersion);
  } catch (Exception e) {
    throw new RepositoryException(
        "Failed to update status and version for export for ad with id: " + adId
            + " and document version: " + documentVersion, e);
  }
  // Reflect the persisted lock state on the in-memory object
  newAd.setDocumentVersion(documentVersion + 1);
  newAd.setExportStatus(pendingStatus);
  newAd.setLockedBy(lockedBy);
  return newAd;
}


private static Query getQueryForAcquiringLock(Long id, Long documentVersion, LockedBy lockedBy) {
  // The document must match the expected Ad ID and version (no stale updates)...
  Criteria baseCriteria = Criteria.where("_id").is(id).and("documentVersion").is(documentVersion);
  // ...and be lockable: in a final export state, already locked by this flow, or not locked at all
  Criteria additionalCriteria = new Criteria().orOperator(
      Criteria.where("exportStatus")
          .in(ExportStatus.EXPORTED.name(), ExportStatus.EXPORT_FAILED.name(),
              ExportStatus.DELETION_FAILED.name()),
      Criteria.where("lockedBy").is(lockedBy.name()),
      Criteria.where("lockedBy").exists(false));
  return new Query(new Criteria().andOperator(baseCriteria, additionalCriteria));
}

This ensures that:

  • only one flow can act on an Ad at the same time;
  • flows can resume processing Ads they have already locked;
  • stale operations (based on old versions) are prevented.

Here is how one of the flows handles an update, step by step:

  1. Try to acquire the lock
  2. Update the status to PENDING and increment the version
  3. Process the Ad (export or delete)
  4. Update the status to EXPORTED
  5. Persist the updated Ad
log.info("Setting pending status and incrementing version for Ad id: {}",
    newAd.getId().getValue());
var updatedAd = concurrentAdRepository.acquireLock(newAd, lockedBy, ExportStatus.PENDING);
log.info("Map to external model for Ad id: {}", newAd.getId().getValue());
var externalAd = adMapper.toExternalAd(newAd, existingAd.getRemoteId().orElse(null));
log.info("Submitting Ad to external platform for Ad id: {}", newAd.getId().getValue());
apiClient.updateAd(newAd.getId().getValue(), externalAd);
log.info("Saving updated ad to repository for ad id: {}", newAd.getId().getValue());
updatedAd.setExportStatus(ExportStatus.EXPORTED);
concurrentAdRepository.update(updatedAd);

All flows follow the same locking pattern before proceeding with export or deletion.
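
The update call that persists the final state isn't shown in the snippet above. One simple way to implement it is to guard the write with the same document version check; this is only a sketch under that assumption, not the actual implementation:

public void update(Ad ad) {
  // Guard on the version written by acquireLock so only the lock holder's result is persisted
  var query = new Query(Criteria.where("_id").is(ad.getId().getValue())
      .and("documentVersion").is(ad.getDocumentVersion()));
  var update = new Update().set("exportStatus", ad.getExportStatus().name());
  var result = mongoTemplate.updateFirst(query, update, AdEntity.class);
  if (result.getModifiedCount() == 0) {
    throw new IllegalStateException("Ad " + ad.getId().getValue() + " was modified concurrently");
  }
}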

Why does this work?

If one consumer fails to acquire the lock because another flow got it first, it won’t drop the message. Thanks to the Kafka retry mechanism, it will keep retrying until it eventually succeeds. Even if the service is temporarily down, Kafka ensures that the message is still there when the service comes back online. For the opt-in / opt-out API flow, retries are handled through our Retry job, which picks up any Ads that didn’t finish processing and reattempts them later.

This retry behaviour is what allows our MongoDB locking approach to remain simple, while still guaranteeing eventual success for every message across all flows.
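
As an illustration, a listener can lean on this behaviour simply by letting the lock-acquisition exception propagate, so the record is re-delivered instead of dropped. This is a minimal Spring Kafka sketch; the topic, listener, event and service names are placeholders, not our actual code:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.stereotype.Component;
import org.springframework.util.backoff.FixedBackOff;

@Component
class AdEventListener {

  private final AdExportService adExportService; // placeholder for the service wrapping the locking flow

  AdEventListener(AdExportService adExportService) {
    this.adExportService = adExportService;
  }

  @KafkaListener(topics = "ad-events", groupId = "ad-export")
  void onAdEvent(AdEvent event) {
    // If acquireLock fails because another flow holds the lock, this throws and the
    // error handler re-delivers the record instead of dropping it.
    adExportService.handle(event);
  }
}

@Configuration
class KafkaRetryConfig {

  @Bean
  DefaultErrorHandler kafkaErrorHandler() {
    // Retry failed records every second until they eventually succeed;
    // with Spring Boot, a CommonErrorHandler bean is applied to the listener container factory.
    return new DefaultErrorHandler(new FixedBackOff(1000L, FixedBackOff.UNLIMITED_ATTEMPTS));
  }
}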

Why MongoDB?

We chose MongoDB because it offers exactly what we needed for safe and efficient locking. 

Its updateOne operation (updateFirst when using Spring Data MongoDB’s MongoTemplate) supports atomic conditional updates, meaning the database checks the condition and applies the update in a single, indivisible step.

MongoDB serializes these write operations on the primary node by default. This ensures that only one flow can modify a document at a time and that the update is applied only if the specified condition is met at the moment of execution. This approach was simple to implement, fully aligned with our data model, and easy to maintain and scale.
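
To make the atomicity concrete, here is roughly the same check-and-set expressed directly against the MongoDB Java driver. This is only an illustrative sketch (our code uses MongoTemplate as shown earlier), and the helper’s name and parameters are made up for the example:

import static com.mongodb.client.model.Filters.*;
import static com.mongodb.client.model.Updates.*;

import com.mongodb.client.MongoCollection;
import org.bson.Document;

class LockingExample {

  // Returns true if this flow won the lock; the filter and the $set run as one atomic operation
  static boolean tryLock(MongoCollection<Document> ads, long adId, long version, String flow) {
    var filter = and(
        eq("_id", adId),
        eq("documentVersion", version),
        or(in("exportStatus", "EXPORTED", "EXPORT_FAILED", "DELETION_FAILED"),
            eq("lockedBy", flow),
            exists("lockedBy", false)));
    var update = combine(
        set("documentVersion", version + 1),
        set("lockedBy", flow),
        set("exportStatus", "PENDING"));
    // modifiedCount == 1 means we acquired the lock; 0 means another flow got there first
    return ads.updateOne(filter, update).getModifiedCount() == 1;
  }
}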

Potential issues

So far, we’ve encountered one issue: stale distributed locks.

Stale distributed locks

A stale distributed lock occurs when an Ad is in a pending state but, due to an unexpected error, the flow gets interrupted, the lock isn’t released and the process isn’t retried. In other words, the Ad is stuck in limbo.

This happened to us when several Ads were locked by one of the flows and remained in a pending state. An exception stopped the flow, leaving the Ads locked and preventing retries. To resolve this, we manually updated their status to DELETION_FAILED using the following MongoDB query:

db.getCollection("ads").updateMany({lockedBy: "USER_SETTING_CHANGE", exportStatus: "PENDING"}, {$set: {exportStatus: "DELETION_FAILED"}})

This allowed our Retry job to pick them up and reprocess them after the fix was implemented.

If stale distributed locks were to become a recurring issue, aside from fixing the root cause, we would consider implementing the following solution (a rough sketch follows the list):

  • add a lockTimestamp to each Ad when a lock is acquired;
  • let the Retry job release old locks based on that timestamp;
  • optionally, implement an internal endpoint to clear stale locks on demand.
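
Such a timestamp-based cleanup could look roughly like this, assuming the same MongoTemplate setup and a new lockTimestamp field (none of this exists in our codebase yet):

public void releaseStaleLocks(Duration maxLockAge) {
  var cutoff = Instant.now().minus(maxLockAge);
  // Exports stuck in PENDING longer than the threshold; DELETION_PENDING would be handled analogously
  var query = new Query(Criteria.where("exportStatus").is(ExportStatus.PENDING.name())
      .and("lockTimestamp").lt(cutoff));
  // Move them to a retryable failed state and clear the lock owner
  var update = new Update().set("exportStatus", ExportStatus.EXPORT_FAILED.name()).unset("lockedBy");
  mongoTemplate.updateMulti(query, update, AdEntity.class);
}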

Final thoughts

To prevent race conditions in a system with multiple distributed flows running in parallel, we implemented a distributed locking mechanism using MongoDB. The lock acquisition is handled through MongoDB’s atomic conditional updates and we store the lock state directly within each document. We rely on the Kafka retry mechanism and our scheduled Retry job to handle retries. This approach helps us prevent concurrency issues across multiple flows and keeps our logic consistent and maintainable.

Teodora Brašančević

Software Engineer

Teodora is a backend software engineer experienced in building and maintaining microservice‑based systems. She is committed to clean code principles and works on solutions with reliability, clarity, and long‑term maintainability in mind. She enjoys improving development processes and continuously expanding her expertise to build robust, well‑structured systems.
