Launching Agent Runtime: Parallel MCP at Scale
AI agents need to make hundreds of MCP calls. Doing them sequentially is slow. Agent Runtime parallelizes everything. Here's how it works.
AI agents are only as fast as their slowest operation.
When an agent needs to query 100 database records, search 50 files, and call 20 APIs—all through MCP—doing them one at a time is a bottleneck. Sequential execution means your agent spends most of its time waiting.
Agent Runtime fixes this. It parallelizes MCP tool calls, manages concurrency, and handles failures gracefully. Here's what that looks like.
The Sequential Problem
Most MCP implementations execute tools sequentially:
// Sequential - slow
for (const item of items) {
await mcpServer.callTool('process', { data: item });
}
// 100 items × 200ms per call = 20 seconds
With 100 MCP calls at 200ms each, you're waiting 20 seconds—and nearly all of that time is spent blocked on network round-trips, not doing useful work.
Parallel Execution
Agent Runtime executes MCP calls concurrently:
// Parallel - fast
const promises = items.map(item =>
runtime.call('process', { data: item })
);
await Promise.all(promises);
// 100 items in ~2s with a limit of 10 concurrent calls
Same 100 calls, now executing in parallel. With proper concurrency control (e.g., 10 concurrent connections), this drops to ~2 seconds instead of 20.
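Under the hood, a concurrency limit like this can be implemented with a small pool of workers pulling from a shared index. Here's a minimal sketch of the idea (not Agent Runtime's actual internals; `mapWithLimit` is an illustrative helper):

```typescript
// Run `fn` over all items, but never more than `limit` calls in flight at once.
// Results come back in input order regardless of completion order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  // JS is single-threaded, so `next++` before the await is race-free.
  const worker = async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```

Spawning `limit` workers instead of chunking into fixed batches means a slow call never stalls the whole batch: the other workers keep pulling new items.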
How Agent Runtime Works
1. Connection Pool Management
Maintains a pool of MCP connections:
- Reuses connections across calls
- Handles connection lifecycle
- Manages authentication per session
const runtime = new AgentRuntime({
maxConnections: 10,
connectionTimeout: 5000,
});
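Connection reuse is what makes the pool pay off: opening a connection per call would eat most of the latency budget. A minimal sketch of the pattern (illustrative only, not the actual Agent Runtime pool):

```typescript
// A fixed-size pool: hand out an idle connection when one exists,
// create a new one up to `max`, otherwise make the caller wait.
class SimplePool<C> {
  private idle: C[] = [];
  private size = 0;
  private waiters: ((conn: C) => void)[] = [];

  constructor(private create: () => C, private max: number) {}

  acquire(): Promise<C> {
    if (this.idle.length > 0) return Promise.resolve(this.idle.pop()!);
    if (this.size < this.max) {
      this.size++;
      return Promise.resolve(this.create());
    }
    // Pool exhausted: resolve later, when release() hands us a connection.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(conn: C): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn); // hand directly to a waiting caller
    else this.idle.push(conn); // or park it for reuse
  }
}
```

A real pool would also evict dead connections and time out waiters; this shows only the reuse and back-pressure mechanics.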
2. Concurrency Control
Prevents overwhelming the MCP server:
- Limits concurrent requests
- Queues overflow requests
- Respects rate limits
runtime.configure({
maxConcurrent: 10, // Max 10 parallel calls
queueSize: 1000, // Queue up to 1000 requests
});
3. Automatic Retry & Failure Handling
Handles transient failures:
- Retries failed calls with exponential backoff
- Circuit breaker for failing services
- Collects partial results
const results = await runtime.callMany('queryDatabase', queries, {
retries: 3,
failFast: false, // Continue even if some fail
});
// Returns: { success: [...], failed: [...] }
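Exponential backoff is the standard way to retry transient failures without hammering a struggling server: the wait doubles after each attempt. A hedged sketch of the pattern (parameter names mirror the config shown later, but this is not Agent Runtime's implementation):

```typescript
// Retry `fn` up to `retries` times, doubling the delay between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
  backoffMultiplier = 2
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // retries exhausted: surface the error
      const delay = baseDelayMs * backoffMultiplier ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
      attempt++;
    }
  }
}
```

Production implementations usually add jitter (randomizing the delay) so that many clients retrying at once don't synchronize into waves.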
4. Result Aggregation
Collects and organizes results:
- Maintains call order
- Tracks success/failure per call
- Provides progress callbacks
const results = await runtime.callMany('search', items, {
onProgress: (completed, total) => {
console.log(`${completed}/${total} complete`);
}
});
Real-World Use Cases
1. Batch Data Processing
Process 1000 records through an MCP tool:
const records = await database.getRecords();
const processed = await runtime.callMany(
'processRecord',
records.map(r => ({ id: r.id, data: r.data })),
{ maxConcurrent: 20 }
);
Before: 1000 × 150ms = 150 seconds (2.5 minutes). After: ~7.5 seconds with 20 concurrent calls.
2. Multi-Source Search
Search across multiple data sources:
const sources = ['github', 'slack', 'notion', 'drive'];
const results = await runtime.callMany(
'search',
sources.map(source => ({ source, query: 'bug reports' }))
);
All searches execute in parallel. Get results in the time of the slowest source, not the sum of all sources.
3. Parallel API Calls
Call multiple external APIs through MCP:
const apis = [
{ endpoint: 'weather', params: { city: 'SF' } },
{ endpoint: 'stock', params: { symbol: 'AAPL' } },
{ endpoint: 'news', params: { topic: 'tech' } },
];
const responses = await runtime.callMany('apiCall', apis);
Performance Comparison
| Scenario | Sequential | Parallel (10 concurrent) | Speedup |
|---|---|---|---|
| 100 calls × 200ms | 20s | ~2s | 10x |
| 500 calls × 150ms | 75s | ~7.5s | 10x |
| 1000 calls × 100ms | 100s | ~10s | 10x |
Total time shrinks roughly linearly with the concurrency limit, as long as the server can keep up. The more calls, the bigger the win.
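The numbers in the table follow from a simple model: calls execute in "waves" of at most `concurrency` at a time, so total time is the number of waves times the per-call latency. A quick back-of-the-envelope helper (illustrative, and it ignores queueing and variance in real latency):

```typescript
// Estimated wall-clock time for N identical calls under a concurrency limit:
// ceil(calls / concurrency) waves, each taking one call's latency.
function estimateParallelMs(
  calls: number,
  latencyMs: number,
  concurrency: number
): number {
  return Math.ceil(calls / concurrency) * latencyMs;
}
```

For example, 100 calls at 200ms with 10 concurrent is 10 waves × 200ms = 2,000ms, matching the first row of the table.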
Error Handling
Agent Runtime doesn't fail the entire batch if one call fails:
const results = await runtime.callMany('process', items, {
failFast: false,
onError: (error, item) => {
console.error(`Failed to process ${item.id}: ${error}`);
}
});
console.log(`Success: ${results.success.length}`);
console.log(`Failed: ${results.failed.length}`);
Get partial results. Log failures. Continue processing.
Configuration Options
const runtime = new AgentRuntime({
// Connection pool
maxConnections: 10,
connectionTimeout: 5000,
keepAlive: true,
// Concurrency
maxConcurrent: 20,
queueSize: 1000,
// Retry logic
retries: 3,
retryDelay: 1000,
backoffMultiplier: 2,
// Circuit breaker
failureThreshold: 5,
resetTimeout: 30000,
});
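The circuit-breaker settings above follow the usual pattern: after `failureThreshold` consecutive failures the breaker "opens" and rejects calls immediately, then allows traffic again once `resetTimeout` has elapsed. A minimal sketch of that behavior (not the actual Agent Runtime class):

```typescript
// Fail fast once a service has failed `failureThreshold` times in a row;
// start letting calls through again after `resetTimeout` milliseconds.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,
    private resetTimeout = 30000
  ) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    const open =
      this.failures >= this.failureThreshold &&
      Date.now() - this.openedAt < this.resetTimeout;
    if (open) throw new Error('circuit open: failing fast');
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Failing fast matters in agent workloads: without it, every retry against a dead service burns the full timeout, multiplying the batch's latency.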
When to Use Agent Runtime
Use it when:
- You have multiple MCP calls that can run independently
- Latency matters (agents waiting = bad UX)
- You're processing batches of data
- You need reliability with automatic retries
Skip it when:
- You have only a few sequential calls
- Call order matters (dependencies between calls)
- Your MCP server can't handle concurrent requests
Getting Started
Install:
npm install @leanmcp/agent-runtime
Basic usage:
import { AgentRuntime } from '@leanmcp/agent-runtime';
const runtime = new AgentRuntime({
mcpServerUrl: 'http://localhost:3001',
maxConcurrent: 10,
});
// Single call
const result = await runtime.call('toolName', { param: 'value' });
// Parallel batch
const results = await runtime.callMany('toolName', [
{ param: 'value1' },
{ param: 'value2' },
{ param: 'value3' },
]);
AI agents shouldn't wait around. Agent Runtime makes them fast by default.