Technology · 7 min read · November 16, 2025

Launching Agent Runtime: Parallel MCP at Scale

AI agents need to make hundreds of MCP calls. Doing them sequentially is slow. Agent Runtime parallelizes everything. Here's how it works.

Lu Xian
Co-Founder
MCP · Performance · AI Agents · Agent Runtime

AI agents are only as fast as their slowest operation.

When an agent needs to query 100 database records, search 50 files, and call 20 APIs—all through MCP—doing them one at a time is a bottleneck. Sequential execution means your agent spends most of its time waiting.

Agent Runtime fixes this. It parallelizes MCP tool calls, manages concurrency, and handles failures gracefully. Here's what that looks like.

The Sequential Problem

Most MCP implementations execute tools sequentially:

// Sequential - slow
for (const item of items) {
  await mcpServer.callTool('process', { data: item });
}
// 100 items × 200ms per call = 20 seconds

With 100 MCP calls at 200ms each, you're waiting 20 seconds while the agent does almost nothing but block on network I/O.

Parallel Execution

Agent Runtime executes MCP calls concurrently:

// Parallel - fast
const promises = items.map(item => 
  runtime.call('process', { data: item })
);
await Promise.all(promises);
// 100 items ≈ 2s with a limit of 10 concurrent calls

Same 100 calls, now executing in parallel. With proper concurrency control (e.g., 10 concurrent connections), this drops to ~2 seconds instead of 20.
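A bare Promise.all fires every call at once, which is exactly what you don't want against a real server. The core pattern is a concurrency-limited map. Here's a minimal sketch of that pattern (illustrative only, not the runtime's actual internals):

```typescript
// Run an async task over each item, with at most `limit` tasks in flight.
// Results come back in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed index.
  // JS is single-threaded, so `next++` between awaits is race-free.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  // Start `limit` workers and wait for them to drain the list.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```

With this shape, `mapWithLimit(items, 10, item => runtime.call('process', { data: item }))` keeps exactly 10 calls in flight at a time.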

How Agent Runtime Works

1. Connection Pool Management

Maintains a pool of MCP connections:

  • Reuses connections across calls
  • Handles connection lifecycle
  • Manages authentication per session

const runtime = new AgentRuntime({
  maxConnections: 10,
  connectionTimeout: 5000,
});

2. Concurrency Control

Prevents overwhelming the MCP server:

  • Limits concurrent requests
  • Queues overflow requests
  • Respects rate limits

runtime.configure({
  maxConcurrent: 10, // Max 10 parallel calls
  queueSize: 1000,   // Queue up to 1000 requests
});
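The interaction of `maxConcurrent` and `queueSize` amounts to a semaphore with a bounded wait queue. A hypothetical sketch of that behavior, with a slot handed directly from a finishing call to the next queued one:

```typescript
// At most `maxConcurrent` callers run at once; at most `queueSize`
// callers may wait; anything beyond that is rejected immediately.
class BoundedSemaphore {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(
    private maxConcurrent: number,
    private queueSize: number
  ) {}

  async run<R>(task: () => Promise<R>): Promise<R> {
    if (this.active >= this.maxConcurrent) {
      if (this.waiters.length >= this.queueSize) {
        throw new Error("queue full");     // overflow is rejected
      }
      // Wait until a finishing task hands us its slot.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next();                    // hand the slot to the next waiter
      else this.active--;                  // or free it
    }
  }
}
```

The direct hand-off keeps the active count exact: a new caller can't sneak into a slot that has already been promised to a queued one.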

3. Automatic Retry & Failure Handling

Handles transient failures:

  • Retries failed calls with exponential backoff
  • Circuit breaker for failing services
  • Collects partial results

const results = await runtime.callMany('queryDatabase', queries, {
  retries: 3,
  failFast: false, // Continue even if some fail
});

// Returns: { success: [...], failed: [...] }
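Exponential backoff itself is a small loop: wait the base delay, then double it after each failed attempt. A sketch under the config names used above (`retries`, `retryDelay`, `backoffMultiplier` are illustrative):

```typescript
// Retry an async call with exponential backoff: wait retryDelay ms,
// then 2x, 4x, ... between attempts; rethrow once retries are exhausted.
async function withRetry<R>(
  call: () => Promise<R>,
  retries = 3,
  retryDelayMs = 1000,
  backoffMultiplier = 2
): Promise<R> {
  let delay = retryDelayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= retries) throw err;   // out of retries: give up
      await new Promise((r) => setTimeout(r, delay));
      delay *= backoffMultiplier;          // back off before the next try
    }
  }
}
```

Transient failures (timeouts, 429s) usually clear on the second or third attempt; permanent failures still surface quickly because the retry count is bounded.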

4. Result Aggregation

Collects and organizes results:

  • Maintains call order
  • Tracks success/failure per call
  • Provides progress callbacks

const results = await runtime.callMany('search', items, {
  onProgress: (completed, total) => {
    console.log(`${completed}/${total} complete`);
  }
});

Real-World Use Cases

1. Batch Data Processing

Process 1000 records through an MCP tool:

const records = await database.getRecords();

const processed = await runtime.callMany(
  'processRecord',
  records.map(r => ({ id: r.id, data: r.data })),
  { maxConcurrent: 20 }
);

Before: 1000 × 150ms = 150 seconds (2.5 minutes).
After: ~7.5 seconds with 20 concurrent calls.

2. Multi-Source Search

Search across multiple data sources:

const sources = ['github', 'slack', 'notion', 'drive'];

const results = await runtime.callMany(
  'search',
  sources.map(source => ({ source, query: 'bug reports' }))
);

All searches execute in parallel. Get results in the time of the slowest source, not the sum of all sources.
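You can see the "slowest source, not the sum" property with nothing but timers. This toy demo (fake latencies, no real MCP calls) starts all three searches together:

```typescript
// Simulate three sources with different latencies.
function fakeSearch(source: string, latencyMs: number): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`${source}: results`), latencyMs)
  );
}

async function searchAll(): Promise<string[]> {
  const start = Date.now();
  // All three timers start now, so total time ≈ the slowest (80ms),
  // not the sum (160ms).
  const results = await Promise.all([
    fakeSearch("github", 80),
    fakeSearch("slack", 30),
    fakeSearch("notion", 50),
  ]);
  console.log(`done in ~${Date.now() - start}ms`);
  return results;
}
```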

3. Parallel API Calls

Call multiple external APIs through MCP:

const apis = [
  { endpoint: 'weather', params: { city: 'SF' } },
  { endpoint: 'stock', params: { symbol: 'AAPL' } },
  { endpoint: 'news', params: { topic: 'tech' } },
];

const responses = await runtime.callMany('apiCall', apis);

Performance Comparison

| Scenario            | Sequential | Parallel (10 concurrent) | Speedup |
| ------------------- | ---------- | ------------------------ | ------- |
| 100 calls × 200ms   | 20s        | ~2s                      | 10x     |
| 500 calls × 150ms   | 75s        | ~7.5s                    | 10x     |
| 1000 calls × 100ms  | 100s       | ~10s                     | 10x     |

The speedup tracks the concurrency limit: 10 concurrent calls cuts wall-clock time by roughly 10x. The more calls you batch, the bigger the absolute savings.
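The numbers in the table follow a simple model: uniform calls execute in "waves" of size equal to the concurrency limit. A one-function sketch:

```typescript
// Estimated wall-clock time for n uniform calls under a concurrency
// limit: ceil(n / limit) sequential waves, each one call's latency long.
function estimateMs(n: number, latencyMs: number, limit: number): number {
  return Math.ceil(n / limit) * latencyMs;
}

// estimateMs(100, 200, 1)  -> 20000  (sequential: 20s)
// estimateMs(100, 200, 10) -> 2000   (10 concurrent: ~2s)
```

Real calls have variable latency, so the model is an upper bound on how clean the waves are, but it predicts the table's 10x rows exactly.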

Error Handling

Agent Runtime doesn't fail the entire batch if one call fails:

const results = await runtime.callMany('process', items, {
  failFast: false,
  onError: (error, item) => {
    console.error(`Failed to process ${item.id}: ${error}`);
  }
});

console.log(`Success: ${results.success.length}`);
console.log(`Failed: ${results.failed.length}`);

Get partial results. Log failures. Continue processing.
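The `{ success, failed }` shape falls out naturally from settling every call instead of failing fast. A sketch of that aggregation using standard `Promise.allSettled` (the interface names here are illustrative, not the runtime's exact types):

```typescript
interface BatchResult<T, R> {
  success: R[];
  failed: Array<{ item: T; error: unknown }>;
}

// Run every task to completion and split the outcomes, instead of
// letting one rejection sink the whole batch (as Promise.all would).
async function settleAll<T, R>(
  items: T[],
  task: (item: T) => Promise<R>
): Promise<BatchResult<T, R>> {
  const settled = await Promise.allSettled(items.map(task));
  const result: BatchResult<T, R> = { success: [], failed: [] };
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") result.success.push(s.value);
    else result.failed.push({ item: items[i], error: s.reason });
  });
  return result;
}
```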

Configuration Options

const runtime = new AgentRuntime({
  // Connection pool
  maxConnections: 10,
  connectionTimeout: 5000,
  keepAlive: true,
  
  // Concurrency
  maxConcurrent: 20,
  queueSize: 1000,
  
  // Retry logic
  retries: 3,
  retryDelay: 1000,
  backoffMultiplier: 2,
  
  // Circuit breaker
  failureThreshold: 5,
  resetTimeout: 30000,
});

When to Use Agent Runtime

Use it when:

  • You have multiple MCP calls that can run independently
  • Latency matters (agents waiting = bad UX)
  • You're processing batches of data
  • You need reliability with automatic retries

Skip it when:

  • You have only a few sequential calls
  • Call order matters (dependencies between calls)
  • Your MCP server can't handle concurrent requests

Getting Started

Install:

npm install @leanmcp/agent-runtime

Basic usage:

import { AgentRuntime } from '@leanmcp/agent-runtime';

const runtime = new AgentRuntime({
  mcpServerUrl: 'http://localhost:3001',
  maxConcurrent: 10,
});

// Single call
const result = await runtime.call('toolName', { param: 'value' });

// Parallel batch
const results = await runtime.callMany('toolName', [
  { param: 'value1' },
  { param: 'value2' },
  { param: 'value3' },
]);

AI agents shouldn't wait around. Agent Runtime makes them fast by default.