morphkit.annotate_and_sort_analyses

morphkit.annotate_and_sort_analyses(full_analysis: Dict[str, Any], reference_morph: str, reference_lemma: str, base_key: str = 'lem_base_bc', full_key: str = 'lem_full_bc', morph_key: str = 'morph', sim_key: str = 'morph_similarity', lower_case: bool = True, debug: bool = False) → Dict[str, Any]

Annotate and sort the analyses in a morphkit-compatible structure, grouping them by base lemma and appending a homonym suffix to each base lemma. The suffix is the part of lem_full_bc that follows lem_base_bc.

Args:

full_analysis (Dict[str, Any]):

A dict with an ‘analyses’ list of blocks (dicts).

reference_morph (str):

The reference morph tag to compare against each block.

reference_lemma (str):

The Betacode lemma (base form, without suffix) to prioritize.

base_key (str):

Optional argument. Defaults to ‘lem_base_bc’. Key under which the base lemma is stored in each block.

full_key (str):

Optional argument. Defaults to ‘lem_full_bc’. Key under which the full lemma is stored in each block.

morph_key (str):

Optional argument. Defaults to ‘morph’. Key under which the raw morph string is stored.

sim_key (str):

Optional argument. Defaults to ‘morph_similarity’. Key under which to store the similarity string.

lower_case (bool):

Optional argument. Defaults to True. If set to True, convert lemmas to lowercase before comparison.

debug (bool):

Optional argument. Defaults to False. If set to True, the function prints debug information.

Returns:

Dict[str, Any]:

A new full_analysis dictionary with annotated and sorted analyses, and with lem_base_bc modified to include the homonym suffix when applicable.

Steps:

  1. Deep-copy the input to avoid mutating the original data.

  2. For each analysis block:

    1. Compute the homonym suffix as the portion of lem_full_bc after lem_base_bc.

    2. If non-empty, append “_(SUFFIX)” to lem_base_bc.

    3. Compute similarity percentages for each tag against reference_morph.

    4. Store sim_key as a slash-separated string of percentages.

    5. Store ‘_max_’ + sim_key as the integer max similarity for this block.

  3. Group blocks by their finalized lem_base_bc (with suffix).

  4. Identify which group key should be first:

    • If reference_lemma matches any finalized base lemma exactly, that group is first.

    • Else if normalize(reference_lemma) matches normalize(base lemma), that group is first.

  5. Compute for each group:

    • group_max: the highest block-level max similarity within that group.

  6. Sort groups so that:

    • The chosen reference group (if any) comes first.

    • Remaining groups follow in descending order of group_max.

  7. Within each group, sort its blocks by descending block-level max similarity.

  8. Flatten groups back into a single list.

  9. Remove temporary helper keys and return the new full_analysis dict.
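The steps above can be sketched in plain Python as follows. This is an illustrative approximation, not morphkit's actual implementation: the helper names tag_similarity, normalize, and sketch_annotate_and_sort are hypothetical, the similarity metric (share of matching character positions) is an assumption, as is the idea that the morph string holds slash-separated tags and that normalization means lowercasing and dropping the "_(SUFFIX)" part. The sample lemmas and positional tags are invented for the demonstration.

```python
import copy
from collections import defaultdict
from typing import Any, Dict


def tag_similarity(tag: str, reference: str) -> int:
    """Hypothetical metric: percentage of positions where the two tags agree."""
    length = max(len(tag), len(reference))
    if length == 0:
        return 100
    matches = sum(a == b for a, b in zip(tag, reference))
    return round(100 * matches / length)


def normalize(lemma: str) -> str:
    """Assumed normalization: drop a trailing '_(SUFFIX)' and lowercase."""
    return lemma.split('_(')[0].lower()


def sketch_annotate_and_sort(full_analysis: Dict[str, Any], reference_morph: str,
                             reference_lemma: str, base_key: str = 'lem_base_bc',
                             full_key: str = 'lem_full_bc', morph_key: str = 'morph',
                             sim_key: str = 'morph_similarity') -> Dict[str, Any]:
    result = copy.deepcopy(full_analysis)            # step 1: leave the input untouched
    max_key = '_max_' + sim_key
    groups = defaultdict(list)
    for block in result.get('analyses', []):
        base, full = block.get(base_key, ''), block.get(full_key, '')
        # steps 2.1-2.2: homonym suffix is whatever follows the base lemma
        suffix = full[len(base):] if base and full.startswith(base) else ''
        if suffix:
            block[base_key] = f'{base}_({suffix})'
        # steps 2.3-2.5: one percentage per tag (assumes '/'-separated tags)
        scores = [tag_similarity(t, reference_morph)
                  for t in block.get(morph_key, '').split('/')]
        block[sim_key] = '/'.join(str(s) for s in scores)
        block[max_key] = max(scores)
        groups[block.get(base_key, '')].append(block)  # step 3: group by final base lemma

    # step 4: exact match first, otherwise the first normalized match
    keys = list(groups)
    if reference_lemma in groups:
        first = reference_lemma
    else:
        target = normalize(reference_lemma)
        first = next((k for k in keys if normalize(k) == target), None)

    # steps 5-6: reference group first, the rest by descending group maximum
    group_max = {k: max(b[max_key] for b in groups[k]) for k in keys}
    ordered = sorted(keys, key=lambda k: (k != first, -group_max[k]))

    # steps 7-9: sort inside each group, flatten, drop the helper key
    flat = []
    for k in ordered:
        flat.extend(sorted(groups[k], key=lambda b: -b[max_key]))
    for block in flat:
        del block[max_key]
    result['analyses'] = flat
    return result


# Invented sample data, for illustration only.
sample = {'analyses': [
    {'lem_base_bc': 'logos', 'lem_full_bc': 'logos', 'morph': 'n-s---mn-'},
    {'lem_base_bc': 'lego', 'lem_full_bc': 'lego2', 'morph': 'v1spsa---/n-s---mn-'},
    {'lem_base_bc': 'lego', 'lem_full_bc': 'lego2', 'morph': 'v1spia---'},
]}
out = sketch_annotate_and_sort(sample, 'v1spia---', 'lego')
for block in out['analyses']:
    print(block['lem_base_bc'], block['morph_similarity'])
```

In this sketch the reference lemma 'lego' has no exact match among the finalized base lemmas, so the group is selected through the normalized comparison. Python's sorted is stable, so blocks with equal scores keep their original relative order.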