Perseus MCP Server: How It Works

This document explains the internal design, request flow, and extension points of the Perseus MCP server.

High-Level Design

The server is a single-process Python MCP tool host built with FastMCP:

All tool functions are async and return text payloads. Many return raw upstream responses unchanged, while the discovery, plaintext, and navigation fallback tools shape their responses locally.

Architecture Goals and Tradeoffs

This project is intentionally small and adapter-like. It does not attempt to mirror or warehouse Perseus data locally. Instead, it exposes a stable MCP tool surface over public Perseus/Scaife HTTP services so an LLM client can discover, search, retrieve, and navigate Greek texts on demand.

The main design goals are:

  1. LLM-client portability: any MCP-capable application should be able to run the same local command and receive the same tool names, descriptions, input schemas, and text outputs.
  2. Scholarly fidelity: raw CTS XML and Scaife JSON remain available for core retrieval and search operations so users can inspect upstream data rather than a lossy local rewrite. Convenience helpers such as get_passage_plaintext, get_author_resources, and navigation fallbacks are added where they reduce repetitive parsing or compensate for malformed upstream responses.
  3. Low operational burden: the MCP server itself requires no database, indexing job, API key, or background service. A user installs Python dependencies and runs the MCP server command from their client. Optional client-side LLM adapters, such as the OpenRouter notebook, have their own provider credentials.
  4. Readable extension path: adding another CTS operation should be a small tool wrapper around _cts_request(...), making the implementation easy to audit for classicists, students, and developers.

The tradeoff is that availability and latency depend on the upstream Perseus and Scaife services. The server also returns mostly text payloads instead of a fully normalized domain model; that preserves source fidelity but means some clients must parse XML or JSON themselves.

Why FastMCP?

FastMCP is used because it keeps this server close to the conceptual model of MCP: typed Python functions become MCP tools. For this repository, that choice provides several practical benefits:

A lower-level MCP implementation would give more manual control over protocol details, but it would add boilerplate that is not central to the research task. A standalone REST API would be familiar to web developers, but LLM applications would still need an MCP adapter to expose tools. FastMCP is therefore the smallest abstraction that serves the core user story: make Perseus research tools available to the LLM of the user’s choice.

External Services

1) Perseus CTS endpoint

Base URL:

CTS tools call this endpoint with a query parameter named request, plus optional parameters such as urn and level.

2) Scaife search endpoint

Base URL:

search_perseus normalizes Greek Unicode/Beta Code input, then calls this endpoint with q, kind=form, type=library, and page_num=1. The language argument determines whether Greek query normalization is applied; it is not currently sent as a Scaife language filter.
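As a rough illustration of the parameter set described above, the sketch below assembles the Scaife query parameters. build_search_params is a hypothetical name, the language value "greek" is an assumption, and plain NFC normalization stands in for the fuller _normalize_greek_query(...) step.

```python
import unicodedata


def build_search_params(query, language="greek"):
    """Assemble Scaife search parameters (hypothetical helper)."""
    if language == "greek":
        # Stand-in for _normalize_greek_query(...): NFC composition only.
        query = unicodedata.normalize("NFC", query)
    # `language` only gates normalization; it is not sent upstream.
    return {"q": query, "kind": "form", "type": "library", "page_num": 1}


# Eta followed by a combining perispomeni (U+0342) is composed into U+1FC6.
params = build_search_params("μη\u0342νιν")
print(params)
```

Note that the returned dictionary has no language key, which mirrors the behavior described above.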

Core HTTP Helpers

_get(url, params=None, timeout=20.0)

_cts_request(request, urn=None, **extra_params)

This abstraction keeps tool methods concise and consistent.

Tool Behavior

All tools are decorated with @mcp.tool and become MCP-exposed functions.

Author resource filtering

get_author_resources(author) is a convenience layer over CTS GetCapabilities. It fetches the capabilities XML, finds matching TextGroup or textgroup entries by case-insensitive author/group name or textgroup URN fragment, and returns JSON instead of raw XML. Each matched author entry includes the textgroup URN, names, works, work languages, titles, editions, and translations so clients can discover resource URNs without manually parsing the full capabilities response.
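The filtering step can be sketched with the standard library. This is an illustration under stated assumptions, not the server's code: the sample XML is invented and simplified (the real capabilities document is namespaced and much larger), find_author_resources is a hypothetical name, substring matching is a guess at the matching rule, and the JSON shape is reduced to a few fields.

```python
import json
import xml.etree.ElementTree as ET

# Invented, simplified sample for illustration only.
SAMPLE_CAPABILITIES = """
<TextInventory>
  <textgroup urn="urn:cts:greekLit:tlg0012">
    <groupname>Homer</groupname>
    <work urn="urn:cts:greekLit:tlg0012.tlg001">
      <title>Iliad</title>
      <edition urn="urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"/>
    </work>
  </textgroup>
  <TextGroup urn="urn:cts:greekLit:tlg0059">
    <groupname>Plato</groupname>
  </TextGroup>
</TextInventory>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix and lowercase the element name."""
    return tag.rsplit("}", 1)[-1].lower()


def children(element, name):
    return [child for child in element if local_name(child.tag) == name]


def find_author_resources(xml_text, author):
    needle = author.casefold()
    matches = []
    for group in ET.fromstring(xml_text).iter():
        # Lowercasing the tag accepts both TextGroup and textgroup.
        if local_name(group.tag) != "textgroup":
            continue
        urn = group.get("urn", "")
        names = [(n.text or "").strip() for n in children(group, "groupname")]
        in_urn = needle in urn.casefold()
        in_name = any(needle in name.casefold() for name in names)
        if not (in_urn or in_name):
            continue
        works = [
            {
                "urn": work.get("urn"),
                "titles": [(t.text or "").strip() for t in children(work, "title")],
                "editions": [e.get("urn") for e in children(work, "edition")],
            }
            for work in children(group, "work")
        ]
        matches.append({"urn": urn, "names": names, "works": works})
    return json.dumps({"author": author, "matches": matches}, ensure_ascii=False, indent=2)


print(find_author_resources(SAMPLE_CAPABILITIES, "homer"))
```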

Navigation fallbacks

The live Perseus CTS service may return malformed HTML for GetFirstUrn and GetPrevNextUrn. The server first attempts those CTS operations directly. If the response is not well-formed XML with the expected root element, it requests GetValidReff and constructs a small well-formed XML response from the ordered reference URNs. For get_prev_next_urn, the fallback derives the work or edition URN by removing the passage component after the final colon.

This fallback preserves the tool contract but is locally shaped output rather than a verbatim upstream response.
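The fallback logic can be sketched as three small steps: check whether the upstream response is usable, derive the parent URN, and rebuild a response from ordered references. The function names and the prev/next element names below are hypothetical, and the reference list is invented; the real server obtains it from GetValidReff.

```python
import xml.etree.ElementTree as ET


def is_expected_xml(text, root_name):
    """True only if `text` parses as XML and its root element matches."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Ignore any '{namespace}' prefix on the root tag.
    return root.tag.rsplit("}", 1)[-1] == root_name


def parent_urn(passage_urn):
    """Drop the passage component after the final colon."""
    return passage_urn.rsplit(":", 1)[0]


def prev_next_from_refs(passage_urn, ordered_refs):
    """Build a small well-formed response from ordered reference URNs."""
    index = ordered_refs.index(passage_urn)
    root = ET.Element("GetPrevNextUrn")
    has_next = index + 1 < len(ordered_refs)
    ET.SubElement(root, "prev").text = ordered_refs[index - 1] if index > 0 else ""
    ET.SubElement(root, "next").text = ordered_refs[index + 1] if has_next else ""
    return ET.tostring(root, encoding="unicode")


base = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"
refs = [f"{base}:1.{line}" for line in (1, 2, 3)]  # invented reference list
broken = "<html><body><p>Error<br></body></html>"  # unclosed <br>, not XML

if not is_expected_xml(broken, "GetPrevNextUrn"):
    print(parent_urn(refs[1]))
    print(prev_next_from_refs(refs[1], refs))
```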

Greek query normalization

Before Greek searches are sent to Scaife, search_perseus normalizes input with _normalize_greek_query(...). Unicode Greek is NFC-normalized, while detected or forced Beta Code is transliterated to Unicode Greek, including common breathings, accents, diaeresis, iota subscript, uppercase markers, and final sigma handling. query_format may be auto, betacode, or unicode; auto detects explicit Beta Code marks and short unaccented Beta Code-like queries.
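To make the transliteration step concrete, here is a deliberately partial sketch of Beta Code conversion. It is not the server's _normalize_greek_query(...): betacode_to_unicode is a hypothetical name, the sketch omits query_format detection, and it covers only the basic letters, the marks listed above, the uppercase marker, and final sigma.

```python
import re
import unicodedata

# Beta Code letters mapped to lowercase Greek, in matching order.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code marks mapped to Unicode combining characters.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def betacode_to_unicode(text):
    out = []
    upper = False
    pending = []  # marks written between '*' and its letter
    for ch in text.lower():
        if ch == "*":
            upper = True
        elif ch in MARKS:
            # After '*', marks precede the letter; otherwise they follow it.
            (pending if upper else out).append(MARKS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            out.append(letter.upper() if upper else letter)
            out.extend(pending)
            pending.clear()
            upper = False
        else:
            out.append(ch)
    # Compose letter + combining marks into precomposed code points.
    composed = unicodedata.normalize("NFC", "".join(out))
    # A sigma not followed by another word character becomes final sigma.
    return re.sub(r"σ(?!\w)", "ς", composed)


print(betacode_to_unicode("mh=nin a)/eide qea\\"))
print(betacode_to_unicode("*)axilh=os"))
```

Emitting combining marks first and composing once at the end keeps the mapping tables small and leaves the choice of precomposed code points to NFC.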

Unicode normalization findings

The Greek search path normalizes outgoing Greek queries to NFC because Perseus Greek text samples use composed Unicode for polytonic Greek. For example, canonical Iliad text such as μῆνιν ἄειδε θεὰ ... Ἀχιλῆος contains precomposed code points like U+1FC6 GREEK SMALL LETTER ETA WITH PERISPOMENI, U+1F04 GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA, and U+1F08 GREEK CAPITAL LETTER ALPHA WITH PSILI. A local Unicode check of that sample reports NFC-normalized text and not NFD-normalized text, so Beta Code conversion should emit composed Unicode Greek before search requests are sent to Scaife.

You can re-check a sample manually from the project root with:

python - <<'PY'
import unicodedata

sample = "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"
print("NFC", unicodedata.is_normalized("NFC", sample))
print("NFD", unicodedata.is_normalized("NFD", sample))
for character in sample:
    if character.strip():
        print(f"U+{ord(character):04X}", unicodedata.name(character, "UNKNOWN"))
PY

Error Model

Errors are not swallowed:

Potential future enhancement:

Data Contract

All tools currently return text payloads, which carry a mix of raw and locally shaped data:

This shape is deliberate for mixed human/LLM use. Raw XML or JSON lets a user verify exactly what came from Perseus/Scaife, while helper tools provide a friendlier path for common tasks where full CTS XML is too verbose. In an LLM client, the recommended pattern is to ask for discovery JSON first, choose a URN, then fetch passage text or raw XML as needed.

Potential future enhancement:

Extending the Server

To add a new CTS operation:

  1. Add a new async function with @mcp.tool.
  2. Call _cts_request("<OperationName>", urn=..., ...).
  3. Document it in README.md and docs/enduser.md.

To add a non-CTS endpoint:

  1. Add constants for base URL(s).
  2. Build a thin tool function that calls _get(...).
  3. Decide whether to return raw payload or normalized output.

Operational Notes