morphkit.parse_word_block
- morphkit.parse_word_block(block: List[str], language: str = 'greek', debug: bool = False) -> Tuple[str, List[Dict[str, Any]]]
Parse a single Morpheus output block of Beta-code lines into structured morphological data.
Each block contains all analyses for one word form. Lines are labeled with prefixes such as :raw, :lem, :stem, and :end. The function walks through those labels, extracts their fields, and assembles one dictionary of morphological features per analysis.
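The label walk described above can be pictured with a minimal sketch. This is not morphkit's implementation: the helper name split_labeled_line and the sample lines are hypothetical, and the sketch assumes only that each line starts with a :label followed by whitespace and then tab-separated fields.

```python
from typing import List, Tuple


def split_labeled_line(line: str) -> Tuple[str, List[str]]:
    """Split ':label<whitespace>f1<TAB>f2' into ('label', ['f1', 'f2']).

    Hypothetical helper for illustration; not part of morphkit.
    """
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError(f"not a labeled line: {line!r}")
    # Separate the label from the rest at the first run of whitespace.
    parts = line[1:].split(None, 1)
    if not parts:
        raise ValueError("line contains no label")
    label = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # Remaining fields are assumed to be tab-separated; empty ones are dropped.
    fields = [f.strip() for f in rest.split("\t") if f.strip()]
    return label, fields


# Made-up sample lines in the labeled style described above.
sample_block = [":raw lo/gos", ":lem lo/gos", ":end\tos\tmasc nom sg"]
for sample_line in sample_block:
    print(split_labeled_line(sample_line))
```

A parser built on such a helper would start a new parse dictionary when it meets the label that opens an analysis and fill it in until the closing label; the exact labels and field layout depend on the Morpheus output being read.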
Args:
- block (List[str]):
A list of lines (strings) from Morpheus output, each starting with a label like :raw, :lem, :stem, or :end, followed by tab-separated fields. Usually this is the output generated by the function get_word_blocks().
- language (str):
Optional argument. Defaults to 'greek'. The only other accepted value is 'latin'.
- debug (bool):
Optional argument. Defaults to False. If set to True, the function prints some debug information.
Returns:
- Tuple[str, List[Dict[str, Any]]]:
A pair (raw_beta, parses) consisting of:
- raw_beta (str):
raw Beta-code form as returned by Morpheus (from the last :raw line).
- parses (List[Dict]):
a list of parse dictionaries, one per analysis block. Each parse dictionary may contain keys such as:
“raw_bc”: the original betacode word.
“workw_bc”: the segment Morpheus analysed in betacode.
“lem_full_bc”: the full lemma form (including any homonym or pl suffix) in betacode.
grammatical features: “case”, “number”, “gender”, “tense”, “mood”, “voice”, “person”, “degree”.
lists: “morph_codes”, “morph_flags”, “dialects”.
computed fields such as “pos” (part of speech) and “morph” (SP tag).
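As a rough illustration of consuming the returned pair, the sketch below formats the grammatical features of each parse. The tuple is a hand-written stand-in rather than real morphkit output, the feature values ("nom", "sg", "masc") are assumed for illustration, and summarize_parse is a hypothetical helper. Because a parse dictionary only "may contain" each key, the code reads keys with .get() instead of indexing.

```python
from typing import Any, Dict, List, Tuple

# Grammatical feature keys listed in the Returns description.
FEATURE_KEYS = ("case", "number", "gender", "tense", "mood", "voice", "person", "degree")


def summarize_parse(parse: Dict[str, Any]) -> str:
    """Join the grammatical features present in one parse dictionary.

    Hypothetical helper for illustration; not part of morphkit.
    """
    # Keys are optional, so skip any that are missing or empty.
    present = [f"{key}={parse[key]}" for key in FEATURE_KEYS if parse.get(key)]
    lemma = parse.get("lem_full_bc", "?")
    return f"{lemma}: " + ", ".join(present)


# Hand-written stand-in for a (raw_beta, parses) result; values are illustrative only.
result: Tuple[str, List[Dict[str, Any]]] = (
    "lo/gos",
    [
        {
            "raw_bc": "lo/gos",
            "lem_full_bc": "lo/gos",
            "case": "nom",
            "number": "sg",
            "gender": "masc",
            "dialects": [],
        }
    ],
)

raw_beta, parses = result
for parse in parses:
    print(raw_beta, "->", summarize_parse(parse))
```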
Raises:
- ValueError:
If the language parameter is invalid (only ‘greek’ and ‘latin’ are allowed).
General notes: