morphkit.analyse_pos
- morphkit.analyse_pos(parse: Dict[str, Any], debug: bool = False) → str
Analyse a single Morpheus parse record and determine its part of speech.
Args:
- parse (dict):
A parse dictionary with the following structure:
{ 'raw_uc': '...', 'stam_codes': [...], ... 'morph_flags': [...], 'tense': 'present', ... }
- debug (bool):
Optional argument. Defaults to False. If set to True, the function prints some debug information.
Returns:
- str:
The determined Part of Speech label (e.g. ‘noun’, ‘verb’, ‘adverb’, …), or ‘unknown’ if no rule applies.
Steps:
The analysis consists of the following major steps, applied in order:
1. Verbs (presence of a ‘tense’ or ‘mood’ key).
   Note: one could argue for two dedicated POS classes, for participles and infinitives (cf. Wallace, GGBB, pp. 613 and 588). This was NOT done, in order to stay consistent with the current N1904-TF classification used by the feature sp. The differentiation between participles, infinitives and ‘other’ verb types is done in the module ‘init_compare_tags’.
2. Specific morph codes and flags → mapped POS (e.g. ‘conj’ → conjunction).
3. Indeclinable forms (‘indeclform’ flag):
   - Neuter-singular nominative/accusative → adverb.
   - Indeclinable numeral → numeral.
   - Indeclinable proper noun, if gender/number is present → proper noun.
   - Otherwise → other indeclinable noun.
4. Proclitic or enclitic forms → particle.
5. Anything with case or gender → noun.
6. If other_end_token == ‘adverbial’ → adverb.
7. Fallback → unknown.
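The steps above form a first-match rule cascade. The sketch below illustrates that ordering only. It is not morphkit's implementation. The function name `analyse_pos_sketch`, the `CODE_TO_POS` table, the keys `case`, `gender` and `number`, their values (e.g. `'neuter'`, `'nominative'`), the flags `'numeral'`, `'proclitic'` and `'enclitic'` inside `morph_flags`, and the returned labels other than ‘noun’, ‘verb’, ‘adverb’ and ‘unknown’ are all hypothetical assumptions.

```python
from typing import Any, Dict

# Hypothetical code-to-label table for step 2; only 'conj' is named in the docs.
CODE_TO_POS: Dict[str, str] = {
    "conj": "conjunction",
}


def analyse_pos_sketch(parse: Dict[str, Any]) -> str:
    """Hypothetical rule cascade: the first matching rule wins."""
    # Assumption: codes and flags are stored as strings in 'morph_flags'.
    flags = set(parse.get("morph_flags", []))

    # 1. Verbs: any parse carrying a tense or mood key.
    if "tense" in parse or "mood" in parse:
        return "verb"

    # 2. Specific morph codes/flags mapped directly to a POS label.
    for code, label in CODE_TO_POS.items():
        if code in flags:
            return label

    # 3. Indeclinable forms, checked in the documented sub-order.
    if "indeclform" in flags:
        if (parse.get("gender") == "neuter"
                and parse.get("number") == "singular"
                and parse.get("case") in {"nominative", "accusative"}):
            return "adverb"
        if "numeral" in flags:  # hypothetical numeral flag
            return "numeral"
        if "gender" in parse or "number" in parse:
            return "proper noun"
        return "indeclinable noun"

    # 4. Clitics (hypothetical flag names).
    if flags & {"proclitic", "enclitic"}:
        return "particle"

    # 5. Nominal morphology.
    if "case" in parse or "gender" in parse:
        return "noun"

    # 6. Adverbial ending.
    if parse.get("other_end_token") == "adverbial":
        return "adverb"

    # 7. Fallback.
    return "unknown"
```

Because every rule returns early, precedence is carried by source order alone. For example, an indeclinable neuter-singular accusative becomes an adverb before the gender/number test for proper nouns is ever reached.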
Example:
>>> parse = {'raw_uc': 'λέγω', 'tense': 'present', 'mood': 'indicative', ...}
>>> morphkit.analyse_pos(parse)
'verb'