morphkit.analyse_pos

morphkit.analyse_pos(parse: Dict[str, Any], debug: bool = False) → str[source]

Analyse a single Morpheus parse record and determine its part of speech.

Args:

parse (dict):
A parse dictionary with the following structure:
{
    'raw_uc': '...',
    'stam_codes': [...],
    ...
    'morph_flags': [...],
    'tense': 'present',
    ...
}
debug (bool):

Optional. Defaults to False. If set to True, the function prints some debug information.

Returns:

str:

The determined Part of Speech label (e.g. ‘noun’, ‘verb’, ‘adverb’, …), or ‘unknown’ if no rule applies.
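Because the return value is a plain string, with ‘unknown’ as the fallback, labels can be tallied or filtered directly. The following is a minimal sketch, not part of morphkit: the helper names tally_pos and unresolved are hypothetical, and the analyser is passed in as a callable so the snippet runs without the package installed.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Parse = Dict[str, Any]
Analyser = Callable[[Parse], str]


def tally_pos(parses: List[Parse], analyse: Analyser) -> Counter:
    """Count how often each POS label is returned (hypothetical helper)."""
    return Counter(analyse(p) for p in parses)


def unresolved(parses: List[Parse], analyse: Analyser) -> List[Parse]:
    """Collect the records for which no rule applied (label 'unknown')."""
    return [p for p in parses if analyse(p) == "unknown"]
```

In real use one would presumably pass morphkit.analyse_pos as the analyse argument.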

Steps:

The analysis consists of the following major steps:

Verbs (presence of ‘tense’ or ‘mood’ keys).

Note: one could argue for two dedicated POS classes, one for participles and one for infinitives, cf. Wallace GGBB p. 613 and p. 588. This was NOT done in order to stay in line with the current N1904-TF classification used by feature sp. The differentiation between participle, infinitive and ‘other’ verb types is done in module ‘init_compare_tags’.

Specific morph codes and flags → mapped POS (e.g. ‘conj’ → conjunction).

Indeclinable forms (‘indeclform’ flag):

Neuter-singular nom/acc → adverb.

Indeclinable numeral → numeral.

Indeclinable proper noun, if gender/number is present → proper noun.

Otherwise → other indeclinable noun.

Proclitic or enclitic forms → particle.

Anything with case or gender → noun.

If other_end_token == adverbial → adverb.

Fallback → unknown.
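The order of the steps above can be read as a first-match rule cascade. The sketch below illustrates that ordering only; it is not the morphkit implementation. Assumptions not found in the text above: the keys ‘case’, ‘gender’, ‘number’ and ‘other_end_token’, the flag names ‘numeral’, ‘proper’, ‘proclitic’ and ‘enclitic’, the case values, the label ‘indeclinable noun’, and the function name sketch_pos.

```python
from typing import Any, Dict

# Only 'conj' -> conjunction is named in the documentation; the real table is larger.
CODE_TO_POS = {"conj": "conjunction"}


def sketch_pos(parse: Dict[str, Any]) -> str:
    """Illustrative first-match cascade; key and flag names are assumptions."""
    flags = set(parse.get("morph_flags", []))
    codes = set(parse.get("stam_codes", []))

    # 1. Verbs: presence of a 'tense' or 'mood' key.
    if "tense" in parse or "mood" in parse:
        return "verb"

    # 2. Specific morph codes and flags map directly to a POS.
    for token in sorted(codes | flags):
        if token in CODE_TO_POS:
            return CODE_TO_POS[token]

    # 3. Indeclinable forms.
    if "indeclform" in flags:
        if (parse.get("gender") == "neuter"
                and parse.get("number") == "singular"
                and parse.get("case") in ("nominative", "accusative")):
            return "adverb"
        if "numeral" in flags:  # assumed flag name
            return "numeral"
        if "proper" in flags and (parse.get("gender") or parse.get("number")):
            return "proper noun"
        return "indeclinable noun"  # assumed label

    # 4. Proclitic or enclitic forms (assumed flag names).
    if flags & {"proclitic", "enclitic"}:
        return "particle"

    # 5. Anything carrying case or gender.
    if parse.get("case") or parse.get("gender"):
        return "noun"

    # 6. Adverbial ending token.
    if parse.get("other_end_token") == "adverbial":
        return "adverb"

    # 7. Fallback when no rule applies.
    return "unknown"
```

Because the first matching rule wins, a record that has a ‘tense’ key is labelled as a verb even if it also carries case and gender, which is how participles stay in the verb class.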

Example:

parse = {'raw_uc': 'λέγω', 'tense': 'present', 'mood': 'indicative', ...}
morphkit.analyse_pos(parse)
'verb'