Roadmap

Tracked feature ideas, technical debt, and enhancements for MCPSmithy.

Improvements

Plugin / module system

Problem: Sources are limited to three built-in types (local, git, scrape) and tool template functions are a fixed set. There is no way to add external data sources or custom tool logic without modifying core code. Both extension points share the same shape, but there is no shared interface or registration mechanism that would allow logic to live outside the main binary.

Value: The core would stay small and security-auditable: config parsing, validation, sandbox, MCP protocol, template engine, BM25 indexing, and transport. Specialized logic like http_get (HTTP + .netrc auth), scrape (HTML parsing), and git (clone via local git binary) could migrate to officially maintained plugins. Community authors could add integrations without forking the project.

Why parked: The current three source types and built-in functions cover all concrete use cases today. A plugin system is an architecture decision with long-lived API commitments; designing the contract, versioning story, and security boundary before there is a forcing use case risks over-engineering.
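To make the "shared interface and registration mechanism" concrete, here is a minimal sketch of what such a contract could look like. Everything in it is hypothetical: the Source interface, the Registry type, and the staticSource toy implementation are illustrative names, not a committed MCPSmithy API, and the sketch ignores the versioning and security-boundary questions raised above.

```go
package main

import (
	"fmt"
	"sort"
)

// Source is a hypothetical contract for a data source plugin.
// Nothing here reflects an existing MCPSmithy interface.
type Source interface {
	// Type returns the identifier used in config, e.g. "local" or "git".
	Type() string
	// Fetch returns document contents keyed by path.
	Fetch() (map[string]string, error)
}

// Registry maps a source type name to its implementation.
type Registry struct {
	sources map[string]Source
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{sources: make(map[string]Source)}
}

// Register adds a source and rejects duplicate type names.
func (r *Registry) Register(s Source) error {
	if _, exists := r.sources[s.Type()]; exists {
		return fmt.Errorf("source type %q already registered", s.Type())
	}
	r.sources[s.Type()] = s
	return nil
}

// Lookup returns the source registered under a type name, if any.
func (r *Registry) Lookup(name string) (Source, bool) {
	s, ok := r.sources[name]
	return s, ok
}

// Types lists the registered type names in sorted order.
func (r *Registry) Types() []string {
	names := make([]string, 0, len(r.sources))
	for name := range r.sources {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// staticSource is a toy in-memory source used only for this demonstration.
type staticSource struct {
	name string
	docs map[string]string
}

func (s staticSource) Type() string                      { return s.name }
func (s staticSource) Fetch() (map[string]string, error) { return s.docs, nil }

func main() {
	reg := NewRegistry()
	_ = reg.Register(staticSource{name: "local", docs: map[string]string{"README.md": "hello"}})
	_ = reg.Register(staticSource{name: "wiki", docs: map[string]string{"Home": "welcome"}})
	fmt.Println(reg.Types()) // prints [local wiki]
}
```

In a design like this, the built-in types and any external ones would go through the same Register call, which is the property the Problem statement says is missing today.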

OpenTelemetry observability

Problem: Tool call log lines are isolated events. There is no way to correlate them back to the upstream request that triggered them, whether that's a user request hitting a backend API, a job in an agentic workload, or a pipeline step. Each tool/call done line has latency and outcome, but no link to the broader execution context it was part of.

Value: If MCPSmithy is part of an application stack (for example, a backend delegates to an agent, and the agent calls mcpsmithy tools over HTTP), OTel traceparent propagation would let tool calls appear as spans within the originating request's trace. You'd get end-to-end visibility across services: user request → backend → agent → tool call, all in one trace. That's meaningfully different from what log aggregation alone can provide.

Why parked: slog JSON output is sufficient for the current use cases: local/stdio for individual engineers and shared HTTP for doc server deployments. The application stack pattern (mcpsmithy embedded inside a product's agentic workload) is a real use case, but no one is running it yet. OTel is only worth building once that adoption exists: the SDK adds meaningful binary weight, and the integration requires operator infra (Jaeger, Tempo, etc.) to be in place anyway. Revisit when a concrete deployment requires cross-service trace correlation.
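As an illustration of the correlation that is missing today, the sketch below parses a W3C traceparent header (version 00) using only the standard library and attaches the trace ID to slog output. It is not the OTel SDK and creates no spans; ParseTraceparent, TraceContext, and the trace_id attribute name are assumptions made for this example, not existing MCPSmithy code.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
	"strings"
)

// TraceContext holds the fields of a W3C traceparent header.
type TraceContext struct {
	TraceID  string // 32 lowercase hex characters
	ParentID string // 16 lowercase hex characters
	Sampled  bool   // lowest bit of the trace-flags field
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// ParseTraceparent parses a version-00 header of the form
// "00-<trace-id>-<parent-id>-<flags>".
func ParseTraceparent(header string) (TraceContext, error) {
	parts := strings.Split(strings.TrimSpace(header), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceContext{}, errors.New("traceparent: unsupported format")
	}
	traceID, parentID, flags := parts[1], parts[2], parts[3]
	// All-zero IDs are invalid per the Trace Context specification.
	if !isLowerHex(traceID, 32) || strings.Trim(traceID, "0") == "" {
		return TraceContext{}, errors.New("traceparent: invalid trace-id")
	}
	if !isLowerHex(parentID, 16) || strings.Trim(parentID, "0") == "" {
		return TraceContext{}, errors.New("traceparent: invalid parent-id")
	}
	if !isLowerHex(flags, 2) {
		return TraceContext{}, errors.New("traceparent: invalid trace-flags")
	}
	// The sampled flag is the lowest bit, so it is set when the last hex digit is odd.
	sampled := strings.ContainsRune("13579bdf", rune(flags[1]))
	return TraceContext{TraceID: traceID, ParentID: parentID, Sampled: sampled}, nil
}

func main() {
	tc, err := ParseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Every line emitted through this logger carries the upstream trace ID.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)).With("trace_id", tc.TraceID)
	logger.Info("tool/call done", "tool", "search_for", "sampled", tc.Sampled)
}
```

Log-level correlation like this is a much smaller step than full span export, but it shows where the header would enter the request path.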

Chunking improvements

Problem: The current chunking strategies have two gaps. First, there is no upper bound on chunk size; a very large file becomes a single chunk that dilutes search scores. Second, the section strategy only works correctly for Markdown; code files have natural boundaries too, but there is no way to split them at the right granularity today.

Value: Bounded chunk sizes would prevent large files from dominating search results. Per-language chunking would improve search precision for code sources, returning the relevant function or declaration rather than the whole file.

Why parked: For the token-bound problem, splitting at fixed token boundaries destroys semantic meaning; a coherent section or whole file is always a better unit for the current search approach. This tradeoff only shifts once embedding-based search is added, so token chunking should be implemented alongside that, not before. For per-language chunking, auto-detection is correct and safe at current scale; whole-file results with preview snippets are sufficient for navigation. Revisit when code search quality becomes a demonstrated pain point.
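One way to bound chunk size without cutting at arbitrary positions is to pack whole paragraphs up to a budget and fall back to a hard split only when a single paragraph exceeds it. The sketch below assumes whitespace-separated words as a stand-in for tokens and blank lines as paragraph boundaries; ChunkByParagraph is a hypothetical helper, not one of the existing chunking strategies.

```go
package main

import (
	"fmt"
	"strings"
)

// ChunkByParagraph groups paragraphs into chunks of at most maxWords words.
// Paragraphs are kept whole where possible; a paragraph larger than the
// budget is split at word boundaries as a last resort.
func ChunkByParagraph(text string, maxWords int) []string {
	if maxWords < 1 {
		return []string{text} // no usable budget: keep the text whole
	}
	var chunks []string
	var current []string // paragraphs in the chunk being built
	count := 0           // words accumulated in current

	flush := func() {
		if len(current) > 0 {
			chunks = append(chunks, strings.Join(current, "\n\n"))
			current, count = nil, 0
		}
	}

	for _, para := range strings.Split(text, "\n\n") {
		words := strings.Fields(para)
		if len(words) == 0 {
			continue // skip empty paragraphs
		}
		if len(words) > maxWords {
			// Oversized paragraph: emit what we have, then hard-split it.
			flush()
			for start := 0; start < len(words); start += maxWords {
				end := start + maxWords
				if end > len(words) {
					end = len(words)
				}
				chunks = append(chunks, strings.Join(words[start:end], " "))
			}
			continue
		}
		if count+len(words) > maxWords {
			flush()
		}
		current = append(current, strings.TrimSpace(para))
		count += len(words)
	}
	flush()
	return chunks
}

func main() {
	text := "a b c\n\nd e\n\nf g h i j k"
	for i, c := range ChunkByParagraph(text, 5) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

The hard-split branch is exactly the case the paragraph above warns about: it guarantees the bound but may separate sentences that belong together.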

Environment variable substitution in config

Problem: There is no way to inject dynamic values, such as environment-specific paths or tokens, into the config without modifying the file directly. This makes it harder to share configs across environments or keep sensitive values out of version control.

Value: Substitution would make configs portable across environments without modification and keep sensitive values out of the config file itself.

Why parked: Path portability is already handled by relative paths and Docker mounts. The credential injection use case belongs to the action tools design; expansion at parse time is the wrong shape for secrets, since the config file is readable by the agent and any injected variable names would be visible to it. Any future design here must address that exposure at the same time.
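For reference, the mechanics of parse-time expansion are small. The sketch below uses os.Expand from the standard library and fails on undefined variables instead of silently substituting an empty string. ExpandConfig, the DOCS_ROOT variable, and the YAML-like fragment are all hypothetical. This is the parse-time shape that the paragraph above considers unsuitable for secrets, so treat it as an illustration for non-sensitive values only.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ExpandConfig replaces $VAR and ${VAR} references in raw using lookup.
// It returns an error listing every variable that lookup cannot resolve.
func ExpandConfig(raw string, lookup func(string) (string, bool)) (string, error) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		if v, ok := lookup(name); ok {
			return v
		}
		missing = append(missing, name)
		return ""
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("undefined variables: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	// A fixed map stands in for the process environment in this demo.
	env := map[string]string{"DOCS_ROOT": "/srv/docs"}
	lookup := func(name string) (string, bool) {
		v, ok := env[name]
		return v, ok
	}

	raw := "sources:\n  - path: ${DOCS_ROOT}/guides\n"
	out, err := ExpandConfig(raw, lookup)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Passing os.LookupEnv as the lookup argument would read from the real environment. Note that os.Expand treats every `$` as the start of a reference, so a real design would also need an escaping rule for literal dollar signs.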

MCP Prompts support

Problem: Conventions and design docs are currently discovered through the tool-based workflow: agents search with search_for or call a convention-listing tool. This works but requires multiple steps and offers limited UX for clients.

Value: The MCP spec defines a prompts capability that would allow conventions and docs to be exposed as native, browsable prompts (prompts/list and prompts/get), with no search needed. Conventions become discoverable as menu items, and non-agent users (humans in VS Code) can browse project context directly.

Why parked: Client support is sparse: most MCP clients don't support the prompts capability yet, which limits the practical benefit to users today.

Progress notifications

Problem: Long-running operations provide no feedback to the client, making them appear frozen.

Value: Users see real-time feedback on long operations, improving perceived performance and preventing timeout assumptions.

Why parked: Today's tool surface (search, file reads, templates) completes in milliseconds with no external dependencies. Progress reporting only becomes valuable when tools genuinely require background work: async fetches, external API calls, or compute-heavy operations.
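For context, a progress update in MCP is a JSON-RPC notification with method notifications/progress, carrying the progressToken supplied by the client plus progress and an optional total. The sketch below only builds that message. The Go type names, the progressJSON helper, and the token value are assumptions for illustration, and nothing here is wired into a transport.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// progressParams mirrors the params of an MCP progress notification.
type progressParams struct {
	ProgressToken any     `json:"progressToken"` // string or integer chosen by the client
	Progress      float64 `json:"progress"`
	Total         float64 `json:"total,omitempty"` // omitted when zero (unknown)
}

// notification is a JSON-RPC 2.0 notification: it has no id field.
type notification struct {
	JSONRPC string         `json:"jsonrpc"`
	Method  string         `json:"method"`
	Params  progressParams `json:"params"`
}

// progressJSON encodes one progress update as a JSON string.
func progressJSON(token any, progress, total float64) (string, error) {
	msg := notification{
		JSONRPC: "2.0",
		Method:  "notifications/progress",
		Params:  progressParams{ProgressToken: token, Progress: progress, Total: total},
	}
	b, err := json.Marshal(msg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := progressJSON("index-build-1", 50, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints {"jsonrpc":"2.0","method":"notifications/progress","params":{"progressToken":"index-build-1","progress":50,"total":100}}
}
```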

`logging/setLevel` capability

Problem: In remote/HTTP deployments, there is no way to change server log verbosity without restarting the container. Ambient server-side events (index build failures, source fetch errors) are invisible to the client, appearing only on the server's stderr, which is inaccessible when the server runs remotely.

Value: Runtime verbosity control without restart would help operators debug remote deployments. notifications/message forwarding would surface server health events directly in the client's interface.

Why parked: --log-level at startup covers the verbosity use case. The more useful half of this feature, forwarding server log events to the client via notifications/message, has no meaningful client support today. Revisit if a client we're targeting starts consuming those notifications.
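The runtime-verbosity half is cheap to prototype with the standard library: slog.LevelVar lets a handler's minimum level change while the process runs. The sketch below maps MCP's syslog-style level names onto slog's four levels. The mapping (for example, folding notice into info and critical into error) and the mcpToSlog helper are assumptions, not behavior defined by MCPSmithy.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
)

// mcpToSlog maps an MCP logging level name to the nearest slog level.
// MCP uses eight syslog-style names; slog has four, so several collapse.
func mcpToSlog(level string) (slog.Level, error) {
	switch level {
	case "debug":
		return slog.LevelDebug, nil
	case "info", "notice":
		return slog.LevelInfo, nil
	case "warning":
		return slog.LevelWarn, nil
	case "error", "critical", "alert", "emergency":
		return slog.LevelError, nil
	}
	return slog.LevelInfo, fmt.Errorf("unknown log level %q", level)
}

func main() {
	// The zero value of LevelVar is Info; the handler reads it on every record.
	var level slog.LevelVar
	logger := slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: &level}))

	logger.Debug("suppressed while the level is info")

	// Simulate handling a logging/setLevel request with level "debug".
	lv, err := mcpToSlog("debug")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	level.Set(lv)

	logger.Debug("emitted after the level change, no restart needed")
}
```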

MCP Resources support

Problem: Browsing project files requires calling the file_read tool, which isn't intuitive for clients that support native resource browsing.

Value: Enhanced UX for clients: files appear in a navigable resource tree rather than requiring tool invocations. Real-time file change notifications. Better parity with traditional client interfaces.

Why parked: The functionality already works via file_read. Resources are a UX layer that depends on client support. Revisit if client adoption of Resources becomes widespread.

Embedding-based semantic search and large corpus support

Problem: The current search fails when the user doesn't know the right vocabulary; a query like "how do I ship my app?" scores zero against a doc titled "Release Procedure" because the terms don't overlap. Embedding-based search doesn't have this limitation, but its storage and compute costs grow with corpus size in a way keyword search doesn't.

Value: Embedding-based search matches by meaning rather than exact terms, so vocabulary mismatch stops being a blocker. At project scale the infrastructure cost is modest; a dedicated vector index only becomes necessary at much larger corpus sizes than current use cases require.

Why parked: A capable agent running against a well-configured mcpsmithy instance can already bridge much of the vocabulary gap by rewriting queries before calling search_for. The infrastructure cost (embedding model dependency, vector storage, token-bounded chunking) is high relative to the marginal search quality improvement at current corpus sizes.
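The vocabulary gap from the Problem statement is easy to reproduce. The sketch below counts the distinct terms a query shares with a document title. It is a deliberately simplified stand-in for keyword scoring, not the BM25 implementation: a BM25 score sums contributions only from matching terms, so zero shared terms means a zero score. The tokenize and sharedTerms helpers are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases s and splits it on anything that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// sharedTerms counts the distinct query terms that also occur in doc.
func sharedTerms(query, doc string) int {
	docTerms := make(map[string]bool)
	for _, t := range tokenize(doc) {
		docTerms[t] = true
	}
	seen := make(map[string]bool)
	n := 0
	for _, t := range tokenize(query) {
		if docTerms[t] && !seen[t] {
			seen[t] = true
			n++
		}
	}
	return n
}

func main() {
	title := "Release Procedure"
	fmt.Println(sharedTerms("how do I ship my app?", title))   // prints 0
	fmt.Println(sharedTerms("release procedure steps", title)) // prints 2
}
```

The second query succeeds only because it already uses the document's vocabulary, which is the rewriting step a capable agent performs before calling search_for.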

Family-level initiatives

Strategic initiatives that span the broader Smithy family — agent runtime, model routing, Kubernetes operator, template registry — are tracked on the Smithy roadmap. Agent execution has already moved out of mcpsmithy and into agentsmithy; the remaining initiatives live at the family level until they have repos of their own.
