Status — living document │ Last updated: 22 June 2025
This document distils what I currently know about the full-detail output of the Morpheus Greek morphological analyser. While extensive, it is not yet exhaustive; please report omissions or corrections so I can refine this page and the code depending on it (like the Morphkit package). The main intention of this page is to document how data extraction from the Morpheus internal database was implemented.
A descriptive document about the function and usage of Morpheus is available online in raw format at GitHub or as a nicely rendered HTML page via GitHack. That document describes the inner workings at a high level and provides build instructions. The most formal description of Morpheus’s architecture, which as far as I know is not publicly available in digital form, is:
Generating and Parsing Classical Greek
GREGORY CRANE
Literary and Linguistic Computing, Volume 6, Issue 4, 1991, Pages 243–245,
https://doi.org/10.1093/llc/6.4.243
Published: 01 January 1991
Morpheus does not determine the morphological interpretations of a word by simple lookup. Instead, it analyses the textual form by identifying components such as the stem and the ending, and then infers the morphological features from them. A comment by Zachary Fletcher offers insight into how Morpheus works internally:
… Morpheus works differently from a relational database. When you ask it about καλῶν, it first tries to separate the stem from the ending and then checks both of them separately in its collection of stems and endings. The stems and endings are in the /stemlib/Greek directory I linked above. Morpheus also has some special case logic to deal with elision, crasis, dialectical differences (e.g. συν- vs. ξυν-), etc. If you’d like to figure out the logic, /src/anal/checkstring.c is a good place to start. (hyperlinks added by TJ)
Also included in the same discussion is an image from Gregory Crane’s article, which illustrates the high-level analytical flow of the Morpheus software.
Additional derived information and technical details can be found in open-access resources such as the digitalclassicist’s wiki page.
Several versions of Morpheus are available. Some provide an XML API to perform morphological lookups. One example is the build provided by the alpheios-project. Although it shares a codebase with the “standard” Morpheus, the Alpheios Project version includes more recent updates to its stem files.
To maintain clarity about data provenance and ensure full compatibility with the Perseids tools, this project uses the plain, vanilla Morpheus implementation, as the dataset is named Morpheus. Fortunately, a Docker container was available that provided a plain version with an API.
The installation of the Morpheus Docker container is described here. After starting the container, an API becomes available at its IP address on my local LAN, on port 1315. The effect of the options used is shown below (see the Jupyter notebook compare_response_two_API_calls.ipynb for details).
When called with ?opts=d?opts=n the response is:
:raw a)/nqrwpou
:workw a)nqrw/pou
:lem a)/nqrwpos
:prvb
:aug1
:stem a)nqrwp masc os_ou
:suff
:end ou masc gen sg os_ou
:raw a)/nqrwpou
:workw a_)nqrwpou=
:lem a)nqrwpo/omai
... etc ...
And when called with ?opts=n the response is:
a)/nqrwpou
<NL>N a)nqrw/pou,a)/nqrwpos masc gen sg os_ou</NL><NL>V a_)nqrwpou=,a)nqrwpo/omai imperf ind mp 2nd sg doric aeolic contr ow_pr,ow_denom</NL><NL>V a)nqrwpou=,a)nqrwpo/omai pres imperat mp 2nd sg contr ow_pr,ow_denom</NL><NL>V a)nqrwpou=,a)nqrwpo/omai imperf ind mp 2nd sg homeric ionic contr unaugmented ow_pr,ow_denom</NL><NL>N a)nqrwpou=,a)nqrwpw/ fem nom/voc/acc dual contr w_oos</NL>
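The compact response above can be taken apart with a few lines of Python. This is only a sketch based on the sample shown here: it assumes every analysis is wrapped in `<NL>…</NL>`, starts with a one-letter tag (`N` or `V` in the sample), and is followed by a `form,lemma` pair and then whitespace-separated features. The function name `parse_compact` is hypothetical and not part of Morphkit.

```python
import re

# One analysis per <NL>...</NL> element (assumption based on the sample response).
NL_ENTRY = re.compile(r"<NL>(.*?)</NL>", re.DOTALL)

def parse_compact(response_body):
    """Split a compact (?opts=n) response into one dict per <NL> entry."""
    entries = []
    for body in NL_ENTRY.findall(response_body):
        tokens = body.split()
        if len(tokens) < 2:
            continue  # skip anything that does not look like an analysis
        form, _, lemma = tokens[1].partition(",")
        entries.append({
            "tag": tokens[0],        # one-letter tag, e.g. "N" or "V"
            "form": form,            # accented working form
            "lemma": lemma,
            "features": tokens[2:],  # morphology, dialect, flags and codes, not yet sorted
        })
    return entries

sample = ("<NL>N a)nqrw/pou,a)/nqrwpos masc gen sg os_ou</NL>"
          "<NL>N a)nqrwpou=,a)nqrwpw/ fem nom/voc/acc dual contr w_oos</NL>")

for entry in parse_compact(sample):
    print(entry["tag"], entry["lemma"], entry["features"])
```

Note that the compact format does not say which feature belongs to which column, which is one reason the detailed output is used in the rest of this page.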
In this project, the core component of Morpheus (the “cruncher”) was invoked using the -d flag. This option dumps internal database information, providing the maximum level of detail available. The decoding schema described on this page corresponds specifically to the output format generated with these flags (i.e. ?opts=d?opts=n). This is independent of the method used to obtain the results, whether the script I initially used (which can be found here) or the API.
When Morpheus successfully analyses a token, it prints one or more records, each representing a distinct morphological parse. Every record is a block of colon-prefixed lines (e.g. :lem, :stem, :end). The analyser’s PrntAnalyses function writes these lines in a fixed order. At most 25 blocks are returned for any word.
For instance, the following four analysis blocks are returned when executing morphkit.get_word_blocks('*ai)gupti/wn', base_url):
:raw *ai)gupti/wn
:workw *ai)gupti/wn
:lem *ai)gu/ptios
:prvb
:aug1
:stem *ai)gupti os_h_on
:suff
:end wn fem gen pl os_h_on
:raw *ai)gupti/wn
:workw *ai)gupti/wn
:lem *ai)gu/ptios
:prvb
:aug1
:stem *ai)gupti os_h_on
:suff
:end wn masc/neut gen pl os_h_on
:raw *ai)gupti/wn
:workw *ai)gupti/wn
:lem *ai)guptio/w
:prvb
:aug1
:stem *ai)gupti ow_pr,ow_denom
:suff
:end wn imperf ind act 3rd pl doric aeolic contr ow_pr
:raw *ai)gupti/wn
:workw *ai)gupti/wn
:lem *ai)guptio/w
:prvb
:aug1
:stem *ai)gupti ow_pr,ow_denom
:suff
:end wn imperf ind act 1st sg doric aeolic contr ow_pr
The function morphkit.analyse_word_with_morpheus('*ai)gupti/wn', base_url) gathers the same blocks, but also analyses all their elements and stores them in a dictionary with labelled morphological details according to the schema detailed on this page.
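The block structure described above can be recovered from the raw text with a small helper. The sketch below assumes, as in the examples on this page, that every record starts with a `:raw` line and that all relevant lines begin with a colon. It is not the Morphkit implementation, and `split_into_blocks` is a hypothetical name.

```python
def split_into_blocks(raw_output):
    """Group colon-prefixed lines into records; each ':raw' line opens a new record."""
    blocks, current = [], None
    for line in raw_output.splitlines():
        if not line.startswith(":"):
            continue  # ignore blank lines and anything that is not a tagged line
        if line.startswith(":raw"):
            current = []
            blocks.append(current)
        if current is not None:
            current.append(line)
    return blocks

# Two of the four blocks shown above, with tabs written as \t.
sample = "\n".join([
    ":raw *ai)gupti/wn",
    ":workw *ai)gupti/wn",
    ":lem *ai)gu/ptios",
    ":prvb ",
    ":aug1 ",
    ":stem *ai)gupti\t\t\t\tos_h_on",
    ":suff ",
    ":end wn\tfem gen pl\t\t\tos_h_on",
    ":raw *ai)gupti/wn",
    ":workw *ai)gupti/wn",
    ":lem *ai)guptio/w",
    ":prvb ",
    ":aug1 ",
    ":stem *ai)gupti\t\t\t\tow_pr,ow_denom",
    ":suff ",
    ":end wn\timperf ind act 3rd pl\tdoric aeolic\tcontr\tow_pr",
])

blocks = split_into_blocks(sample)
print(len(blocks), "blocks;", len(blocks[0]), "lines in the first block")
```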
The following tables can be used to break down each line into its constituent information elements.
Tag/field | Always data present? | Typical contents | Notes/description |
---|---|---|---|
:raw | yes | Token as supplied (Beta Code) | The raw form of the word, as it was entered. This may include elision (indicated with an apostrophe). Crane gives the example ἐπέμπετ᾽, which could stand for ἐπέμπετε (“you [pl] were sending”) or ἐπέμπετο (“s/he was being sent”). |
:workw | yes | The working token after basic normalisation | In most cases, the raw and working word are identical. |
:lem | yes | The lemma (Beta Code) | The dictionary form or root of the Greek word, as determined by Morpheus. |
:prvb | | Preverb(s); dialect; morph‑flags | Details about an attached preposition (e.g., ἐν- or μετα-). |
:aug1 | | Augment / reduplication; dialect; morph‑flags | Augment, i.e. a prefix marking past/perfect tense (typically absent in non-past forms). |
:stem | | Stem; inherent morphology; dialect; morph‑flags; stem‑type code(s) | The base or stem type of the word. It also shows the paradigm the analysis was based upon (e.g. os_h_on, aor2). |
:suff | | Suffix segment | Suffixes, if any. Mostly empty in GNT corpora? (see suffix_gstr_of) |
:end | | Ending; full morph features; dialect; morph‑flags; paradigm code(s) | Ending details (the core morphological data), including grammatical information such as case, number, gender, mood, tense, and dialect. |
While the first three lines (:raw, :workw and :lem) always contain exactly one item, the lines that follow may contain a varying number of items. Items on each line are separated by tabs (\t).
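As a minimal sketch of this line layout, the helper below separates the tag from its tab-separated items. It assumes that the tag is followed by a single space or tab before the first item, which is how the examples on this page appear. The name `split_line` is hypothetical.

```python
import re

# Tag (colon plus letters/digits), one optional separator, then the rest of the line.
LINE_PATTERN = re.compile(r"^(:[a-z0-9]+)[ \t]?(.*)$")

def split_line(line):
    """Return (tag, fields) for one output line; fields are the tab-separated items."""
    match = LINE_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a tagged Morpheus line: {line!r}")
    tag, rest = match.groups()
    # An empty remainder means the line carries no data at all.
    fields = [item.strip() for item in rest.split("\t")] if rest.strip() else []
    return tag, fields

print(split_line(":lem a)/nqrwpos"))
print(split_line(":prvb"))
print(split_line(":end ou\tmasc gen sg\t\t\tos_ou"))
```

Empty strings are kept in the field list on purpose, so that the position of each item still tells which column it came from.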
:raw
The untouched input string. If an apostrophe appears at the end (classical elision), Morpheus does not expand it here; instead, a separate record is produced for each plausible expansion, shown on its :workw line.
:workw
The token after minimal normalisation, which Morpheus performs on each input token when run with the -S switch. This includes (in certain situations?) decapitalisation: the leading asterisk in ‘*tou=to’ (which marks an uppercase first letter) is removed, so that we receive output like ‘:raw *tou=to’ and ‘:workw tou=to’. Accents are also regularised so that the token matches a lexicon entry. For instance, the raw input kai\ is changed to kai/.
All analysis lines that follow, within the same output block, are based on this normalised :workw form.
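To see at a glance what normalisation did to a token, the two forms can simply be compared. The sketch below only distinguishes the two effects named above (removal of the capital marker, and any other change such as a regularised accent); the function name and the labels it returns are my own illustration, not Morpheus terminology.

```python
def describe_normalisation(raw, workw):
    """Report how the working word differs from the raw token (both in Beta Code)."""
    changes = []
    if raw.startswith("*") and not workw.startswith("*"):
        changes.append("decapitalised")
    # Compare the remainder once the capital marker is out of the way.
    if raw.lstrip("*") != workw.lstrip("*"):
        changes.append("respelled")  # e.g. an accent was regularised
    return changes

print(describe_normalisation("*tou=to", "tou=to"))
print(describe_normalisation("kai\\", "kai/"))
print(describe_normalisation("*ai)gupti/wn", "*ai)gupti/wn"))
```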
:lem
Lemma in Beta Code. My exporter adds a Unicode copy to the dictionary under the key lemma. [note: it should also take care of multiple senses!]
In certain cases there are two lemma entries, distinguished by a number appended to the Beta Code lemma. In such cases, the entries differ in grammatical role and meaning and should be treated as two separate lexemes that merely share the lemma string (homonymy).
For example, the word h)\ (Beta Code for the single-letter word ἢ) has two lemma entries, h)/1 and h)/2. Note that the backslash \ and forward slash / are accent marks belonging to the lemma; only the final digit functions as a numeric suffix, just as in the standard way of tagging homonyms (think ἢ¹ vs. ἢ²). Entry 1 has the part-of-speech tag conj (co-ordinating conjunction “or”), whereas entry 2 has the part-of-speech tag exclam (exclamatory particle “ah!”, “verily”, etc.).
For certain lemmata Morpheus adds a Beta Code suffix -pl (-πλ). In the GNT, the instances where Morpheus adds the suffix -pl are mostly linked to persons and, in a few cases, to places (e.g., *(ierosolu/mois; Ἱεροσολύμοις).
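Both lemma suffixes can be separated from the base lemma with a single regular expression. This is a sketch under two assumptions that I have not verified against the Morpheus source: that a homonym number, when present, is the last element of the lemma string, and that -pl directly precedes it when both occur. The function name `split_lemma` is hypothetical, and "lemma-pl" below is a placeholder string rather than a real lemma.

```python
import re

# base, then an optional "-pl", then an optional trailing homonym number.
LEMMA_PATTERN = re.compile(r"^(?P<base>.*?)(?P<pl>-pl)?(?P<number>\d+)?$")

def split_lemma(lemma):
    """Separate a Beta Code lemma from its optional -pl suffix and homonym number."""
    match = LEMMA_PATTERN.match(lemma)
    number = match.group("number")
    return {
        "base": match.group("base"),
        "has_pl_suffix": match.group("pl") is not None,
        "homonym_number": int(number) if number else None,
    }

print(split_lemma("h)/1"))
print(split_lemma("h)/2"))
print(split_lemma("a)/nqrwpos"))
print(split_lemma("lemma-pl"))  # placeholder, only to show the -pl branch
```

The accent marks / and \ stay part of the base, since only trailing digits are treated as a homonym number.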
:prvb (Preverb Line)
The following table shows the five-column structure of the line (columns are separated by tabs; \t).
Label | Prepositionpart | Unknown | Dialect | MorphFlags | Unknown |
---|---|---|---|---|---|
:prvb | E.g.: a)po/, a)na/ | - | E.g.: epic, doric | E.g.: prevb_aug, doubled_cons | - |
Prepositionpart: This can be empty or contain one or two prepositions. E.g. διακατηλέγχετο (‘diakathle/gxeto’; Acts 18:28) has two: dia/,kata/.
Dialect: contains dialect details arranged according to C.D. Buck, The Greek Dialects: Grammar, Selected Inscriptions, Glossary (Chicago: The University of Chicago Press, 1955), 9. The values are defined in dialect.h.
MorphFlags: contains details about morphological peculiarities. There are many possible values, which are defined in the file morphflags.h.
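Since the first column may hold zero, one or two preverbs, it is convenient to turn it into a list. The sketch below assumes that multiple preverbs are joined by a comma, as in the dia/,kata/ example; `split_preverbs` is a hypothetical helper name.

```python
def split_preverbs(preverb_field):
    """Split the first :prvb column into a list of zero, one or two preverbs."""
    return [part.strip() for part in preverb_field.split(",") if part.strip()]

print(split_preverbs(""))            # []
print(split_preverbs("a)po/"))       # ['a)po/']
print(split_preverbs("dia/,kata/"))  # ['dia/', 'kata/']
```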
:aug1 (Augment/Reduplication Line)
Same five-column structure as :prvb. The Augmentpart (column 1) shows the actual augment characters, e.g. e)>h).
Label | Augmentpart | Unknown | Dialect | MorphFlags | Unknown |
---|---|---|---|---|---|
:aug1 | E.g.: e)>h) | - | E.g.: attic, ionic | E.g.: syll_augment | - |
Dialect: contains dialect details arranged according to C.D. Buck, The Greek Dialects: Grammar, Selected Inscriptions, Glossary (Chicago: The University of Chicago Press, 1955), 9. The values are defined in dialect.h.
MorphFlags: contains details about morphological peculiarities. There are many possible values, which are defined in the files morphflags.h and/or morphkeys.h.
:stem
Label | Stempart | Morphology | Dialect | MorphFlags | Stemtype |
---|---|---|---|---|---|
:stem | E.g.: eu)xarist | E.g.: fem, masc sg, masc voc sg | E.g.: doric aeolic | E.g.: unaugmented | E.g.: aor2, os_h_on, numi |
Morphology: the usual morphological elements (number, gender, case, etc.).
Dialect: contains dialect details arranged according to C.D. Buck, The Greek Dialects: Grammar, Selected Inscriptions, Glossary (Chicago: The University of Chicago Press, 1955), 9. The values are defined in dialect.h.
MorphFlags: contains details about morphological peculiarities. There are many possible values, which are defined in the files morphflags.h and/or morphkeys.h.
Stemtype: lists the paradigm code(s) controlling endings and accent. There are many possible values, which seem to be defined in various tables in the code.
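The five columns can be mapped onto named keys once the line has been split on tabs. The sketch below takes the column order from the table above and assumes, based on the examples on this page, that values within a column are separated by spaces, except for stem types, which are separated by commas (ow_pr,ow_denom). The key names and the function name are illustrative choices, not the keys Morphkit uses.

```python
STEM_COLUMNS = ("stempart", "morphology", "dialect", "morphflags", "stemtype")

def decode_stem_fields(fields):
    """Map the tab-separated items of a :stem line onto named columns."""
    # Pad short lines so that missing trailing columns become empty strings.
    padded = list(fields) + [""] * (len(STEM_COLUMNS) - len(fields))
    record = dict(zip(STEM_COLUMNS, padded))
    record["morphology"] = record["morphology"].split()
    record["dialect"] = record["dialect"].split()
    record["morphflags"] = record["morphflags"].split()
    record["stemtype"] = [code for code in record["stemtype"].split(",") if code]
    return record

print(decode_stem_fields(["a)nqrwp", "masc", "", "", "os_ou"]))
print(decode_stem_fields(["*ai)gupti", "", "", "", "ow_pr,ow_denom"]))
```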
:suff
Label | Suffixpart | Unknown | Unknown | Unknown | Unknown |
---|---|---|---|---|---|
:suff | always empty? | | | | |
No occurrences in our GNT data. Is this due to a ‘dummy’ suffixtype.h?
:end (Ending & Morphology)
This line is of particular interest as it contains most of the morphological data I would like to capture.
Label | Endingpart | Morphology | Dialect | MorphFlags | PoS and decl |
---|---|---|---|---|---|
:end | E.g.: io/ntwn | E.g.: aor ind act 3rd sg | E.g.: doric laconia | E.g.: nu_movable | E.g.: ew_pr, reg_fut |
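The Morphology column of the :end line is a flat list of abbreviations, so a lookup table is needed to label each one. The sketch below is deliberately incomplete: it only contains values that occur in the examples on this page (there is, for instance, no dative and no perfect), and the category names are my own. Alternatives joined by a slash, such as masc/neut, are kept together as a list.

```python
# Only values seen in the examples on this page; extend before real use.
FEATURE_CATEGORIES = {
    "tense": {"pres", "imperf", "aor"},
    "mood": {"ind", "imperat"},
    "voice": {"act", "mp"},
    "person": {"1st", "2nd", "3rd"},
    "number": {"sg", "pl", "dual"},
    "gender": {"masc", "fem", "neut"},
    "case": {"nom", "gen", "acc", "voc"},
}

def label_morphology(morphology):
    """Return (labelled, unknown) for a space-separated morphology string."""
    labelled, unknown = {}, []
    for token in morphology.split():
        alternatives = token.split("/")  # e.g. "nom/voc/acc"
        for category, values in FEATURE_CATEGORIES.items():
            if all(alt in values for alt in alternatives):
                labelled[category] = alternatives
                break
        else:
            unknown.append(token)  # not in the lookup table above
    return labelled, unknown

print(label_morphology("imperf ind act 3rd pl"))
print(label_morphology("masc/neut gen pl"))
```

Returning the unrecognised tokens separately makes gaps in the lookup table visible instead of silently dropping data.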
Author | Tony Jurg |
Version | 2.3 |
Date | 22 June 2025 |