morphkit.analyse_pos

morphkit.analyse_pos(parse: Dict[str, Any], debug: bool = False) → str

Analyse a single Morpheus parse record and determine its part of speech.

Args:

parse (dict):

A parse dictionary with the following structure:

 {
     'raw_uc': '...',
     'stam_codes': [...],
     ...
     'morph_flags': [...],
     'tense': 'present',
     ...
 }
debug (bool):

Optional argument. Defaults to False. If set to True, the function prints debug information.

Returns:

str:

The determined Part of Speech label (e.g. ‘noun’, ‘verb’, ‘adverb’, …), or ‘unknown’ if no rule applies.

Steps:

The analysis consists of the following major steps, applied in order; the first rule that matches determines the result:

  1. Verbs (presence of ‘tense’ or ‘mood’ keys).

    Note: one could argue for two dedicated POS classes, one for participles and one for infinitives, cf. Wallace, GGBB, pp. 613 and 588. This was deliberately not done, in order to stay in line with the current N1904-TF classification used by feature sp. The differentiation between participle, infinitive and ‘other’ verb types is done in module ‘init_compare_tags’.

  2. Specific morph codes and flags → mapped POS (e.g. ‘conj’ → conjunction).

  3. Indeclinable forms (‘indeclform’ flag):

    • Neuter-singular nom/acc → adverb.

    • Numeral indecl → numeral.

    • Proper noun indecl if gender/number present → proper noun.

    • Otherwise → other indeclinable noun.

  4. Proclitic or enclitic forms → particle.

  5. Anything with case or gender → noun.

  6. If other_end_token == adverbial → adverb.

  7. Fallback → unknown.
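
The ordering of the steps can be pictured as a first-match rule cascade. The following is a minimal sketch only, not morphkit's actual implementation. The function name sketch_pos, the FLAG_TO_POS table (apart from its ‘conj’ entry), the keys ‘case’, ‘gender’ and ‘number’ with their value spellings, the flag names ‘numeral’, ‘proper’, ‘proclitic’ and ‘enclitic’, and the labels ‘numeral’, ‘proper noun’ and ‘indeclinable noun’ are all assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical flag-to-label table; only the 'conj' entry comes from the text above.
FLAG_TO_POS = {"conj": "conjunction"}


def sketch_pos(parse: Dict[str, Any]) -> str:
    """Illustrative first-match cascade; not the real analyse_pos."""
    flags = set(parse.get("morph_flags", []))

    # Step 1: the presence of a 'tense' or 'mood' key marks a verb.
    if "tense" in parse or "mood" in parse:
        return "verb"

    # Step 2: specific flags map directly to a POS label.
    for flag, label in FLAG_TO_POS.items():
        if flag in flags:
            return label

    # Step 3: indeclinable forms, refined by four sub-rules.
    if "indeclform" in flags:
        if (parse.get("gender") == "neuter"            # assumed key and value
                and parse.get("number") == "singular"  # assumed key and value
                and parse.get("case") in ("nominative", "accusative")):
            return "adverb"
        if "numeral" in flags:                         # assumed flag name
            return "numeral"
        if "proper" in flags and (parse.get("gender") or parse.get("number")):
            return "proper noun"                       # assumed label
        return "indeclinable noun"                     # assumed label

    # Step 4: proclitic or enclitic forms (assumed flag names).
    if flags & {"proclitic", "enclitic"}:
        return "particle"

    # Step 5: anything carrying case or gender is treated as a noun.
    if parse.get("case") or parse.get("gender"):
        return "noun"

    # Step 6: adverbial ending token.
    if parse.get("other_end_token") == "adverbial":
        return "adverb"

    # Step 7: no rule applied.
    return "unknown"
```

Because the verb test comes first, a record that carries ‘tense’ is labelled ‘verb’ even if it also carries case and gender, as a participle would. This matches the note under step 1.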

Example:

parse = {'raw_uc':'λέγω','tense':'present','mood':'indicative', ...}
morphkit.analyse_pos(parse)
'verb'
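
Since the function returns ‘unknown’ rather than raising when no rule applies, callers may want to count how often that happens across a batch of records. The helper below is a hypothetical sketch, not part of morphkit; tally_pos and its parameters are illustrative names. It takes the analyser as an argument, so morphkit.analyse_pos can be passed in directly.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable


def tally_pos(parses: Iterable[Dict[str, Any]],
              analyse: Callable[[Dict[str, Any]], str]) -> Counter:
    """Count the POS labels that `analyse` returns over several parse records."""
    return Counter(analyse(p) for p in parses)
```

Called as tally_pos(parses, morphkit.analyse_pos), the count stored under ‘unknown’ shows how many records fell through every rule.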