morphkit.parse_word_block

morphkit.parse_word_block(block: List[str], language: str = 'greek', debug: bool = False) → Tuple[str, List[Dict[str, Any]]]

Parse a single Morpheus output block of Beta-code lines into structured morphological data.

Each block corresponds to all analyses for one Greek form. Lines are labeled with prefixes like :raw, :lem, :stem, :end, etc. This function walks through those labels, extracts fields, and assembles a dictionary with morphological features.
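The label-walking step described above can be illustrated with a minimal sketch. It is not morphkit's actual implementation: the helper name split_labeled_line is hypothetical, the sample lines were written by hand for this illustration rather than captured from Morpheus, and the sketch assumes that the label is separated from its fields by whitespace and that the fields themselves are tab-separated.

```python
from typing import List, Tuple


def split_labeled_line(line: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: split ':label field<TAB>field' into (label, fields).

    Assumes the label is the first whitespace-delimited token and the
    remainder of the line holds tab-separated fields.
    """
    parts = line.rstrip("\n").split(None, 1)
    if not parts:
        # Empty or whitespace-only line: no label, no fields.
        return "", []
    label = parts[0].lstrip(":")
    # Lines such as ':prvb' carry a label but no fields.
    fields = [f.strip() for f in parts[1].split("\t")] if len(parts) > 1 else []
    return label, fields


# Hand-written sample lines (illustrative only, not real Morpheus output).
sample_block = [":raw lo/gos", ":lem lo/gos", ":end os\t masc nom sg"]
for sample_line in sample_block:
    print(split_labeled_line(sample_line))
```

Empty fields are kept as empty strings so that field positions stay stable, which matters if a caller interprets fields by index.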

Args:

block (List[str]):

A list of lines (strings) from Morpheus output, each starting with a label like :raw, :lem, :stem, or :end, followed by tab-separated fields. Usually this is the output of get_word_blocks().

language (str):

Optional argument. Defaults to ‘greek’. The only other accepted value is ‘latin’.

debug (bool):

Optional argument. Defaults to False. If set to True, the function prints debug information.

Returns:

Tuple[str, List[Dict[str, Any]]]:

A pair (raw_beta, parses)

The pair consists of:

raw_beta (str):

raw Beta-code form as returned by Morpheus (from the last :raw line).

parses (List[Dict]):

a list of parse dictionaries, one per analysis block. Each parse dictionary may contain keys such as:

  • “raw_bc”: the original betacode word.

  • “workw_bc”: the segment Morpheus analysed in betacode.

  • “lem_full_bc”: the full lemma form (including any homonym or pl suffix) in betacode.

  • grammatical features: “case”, “number”, “gender”, “tense”, “mood”, “voice”, “person”, “degree”.

  • lists: “morph_codes”, “morph_flags”, “dialects”.

  • computed fields such as “pos” (Part of Speech) and “morph” (SP tag).
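Because each parse dictionary only "may contain" the keys listed above, code that consumes the result is safer reading them with dict.get than with direct indexing. The sketch below works on a hand-written stand-in for the returned pair. The key names come from the list above, but the values are invented for illustration, and feature_summary is a hypothetical helper, not part of morphkit.

```python
from typing import Any, Dict, List, Tuple

# Grammatical feature keys as listed in the documentation above.
FEATURE_KEYS = ("case", "number", "gender", "tense",
                "mood", "voice", "person", "degree")


def feature_summary(parse: Dict[str, Any]) -> str:
    """Hypothetical helper: join whichever grammatical features are present."""
    return " ".join(str(parse[key]) for key in FEATURE_KEYS if parse.get(key))


# Hand-written stand-in for a (raw_beta, parses) result; values are assumed.
result: Tuple[str, List[Dict[str, Any]]] = (
    "lo/gos",
    [{"raw_bc": "lo/gos", "lem_full_bc": "lo/gos",
      "case": "nom", "number": "sg", "gender": "masc"}],
)

raw_beta, parses = result
for parse in parses:
    # Missing keys fall back to a placeholder instead of raising KeyError.
    print(raw_beta, parse.get("lem_full_bc", "?"), feature_summary(parse))
```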

Raises:

ValueError:

If the language parameter is invalid (only ‘greek’ and ‘latin’ are allowed).
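A caller that receives the language value from user input may want to handle this error explicitly. The following is a minimal sketch: parse_or_none is a hypothetical wrapper, not part of morphkit, and it takes the parsing function as an argument so that the example runs without the library installed.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

ParseResult = Tuple[str, List[Dict[str, Any]]]


def parse_or_none(
    parse_fn: Callable[..., ParseResult],
    block: List[str],
    language: str,
) -> Optional[ParseResult]:
    """Hypothetical wrapper: return None instead of raising ValueError."""
    try:
        return parse_fn(block, language=language)
    except ValueError:
        # Documented case: language is neither 'greek' nor 'latin'.
        return None
```

With morphkit installed, the call would look like parse_or_none(morphkit.parse_word_block, block, "latin"). Note that the wrapper swallows every ValueError raised inside the parser, so code that needs to tell failure causes apart should validate the language value before calling.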

General notes: