morphkit.annotate_and_sort_analyses
- morphkit.annotate_and_sort_analyses(full_analysis: Dict[str, Any], reference_morph: str, reference_lemma: str, base_key: str = 'lem_base_bc', full_key: str = 'lem_full_bc', morph_key: str = 'morph', sim_key: str = 'morph_similarity', lower_case: bool = True, debug: bool = False) -> Dict[str, Any]
Annotate and sort the analyses in a morphkit-compatible structure: group them by base lemma and append to each base lemma its homonym suffix, i.e. the part of lem_full_bc that remains after removing lem_base_bc.
Args:
- full_analysis (Dict[str, Any]):
A dict with an ‘analyses’ list of blocks (dicts).
- reference_morph (str):
The reference morph tag to compare against each block.
- reference_lemma (str):
The Betacode lemma (base form, without suffix) to prioritize.
- base_key (str):
Optional argument. Defaults to ‘lem_base_bc’. Key under which the base lemma is stored in each block.
- full_key (str):
Optional argument. Defaults to ‘lem_full_bc’. Key under which the full lemma is stored in each block.
- morph_key (str):
Optional argument. Defaults to ‘morph’. Key under which the raw morph string is stored.
- sim_key (str):
Optional argument. Defaults to ‘morph_similarity’. Key under which to store the similarity string.
- lower_case (bool):
Optional argument. Defaults to True. If set to True, convert lemmas to lowercase before comparison.
- debug (bool):
Optional argument. Defaults to False. If set to True, the function prints some debug information.
Returns:
- Dict[str, Any]:
A new full_analysis dictionary with annotated and sorted analyses, and with lem_base_bc modified to include the homonym suffix when applicable.
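A minimal call sketch, assuming morphkit is installed and importable under that name. The block contents below (the Betacode lemmas and the slash-separated morph tags) are invented sample values, not the output of a real analysis; only the key names and the parameters come from the description above.

```python
from typing import Any, Dict

# Hypothetical input: a dict with an 'analyses' list of blocks.
# The lemma and morph values are made up for illustration only.
full_analysis: Dict[str, Any] = {
    "analyses": [
        {"lem_base_bc": "le/gw", "lem_full_bc": "le/gw2", "morph": "V-PAI-1S/V-PAS-1S"},
        {"lem_base_bc": "lo/gos", "lem_full_bc": "lo/gos", "morph": "N-NSM"},
    ]
}

try:
    import morphkit
except ImportError:
    morphkit = None  # the call below is skipped when the package is absent

if morphkit is not None:
    result = morphkit.annotate_and_sort_analyses(
        full_analysis,
        reference_morph="N-NSM",
        reference_lemma="lo/gos",
    )
    # Each returned block carries the similarity string under sim_key.
    for block in result["analyses"]:
        print(block["lem_base_bc"], block["morph_similarity"])
```

Because the function works on a deep copy, full_analysis itself should be unchanged after the call.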
Steps:
1. Deep-copy the input to avoid mutating the original data.
2. For each analysis block:
   - Compute the homonym suffix as the portion of lem_full_bc after lem_base_bc.
   - If non-empty, append “_(SUFFIX)” to lem_base_bc.
   - Compute similarity percentages for each tag against reference_morph.
   - Store sim_key as a slash-separated string of percentages.
   - Store ‘_max_’ + sim_key as the integer max similarity for this block.
3. Group blocks by their finalized lem_base_bc (with suffix).
4. Identify which group key should be first:
   - If reference_lemma matches any finalized base lemma exactly, that group is first.
   - Else if normalize(reference_lemma) matches normalize(base lemma), that group is first.
5. Compute for each group:
   - group_max: the highest block-level max similarity within that group.
6. Sort groups so that:
   - The chosen reference group (if any) comes first.
   - Remaining groups follow in descending order of group_max.
7. Within each group, sort its blocks by descending block-level max similarity.
8. Flatten groups back into a single list.
9. Remove temporary helper keys and return the new full_analysis dict.
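The steps above can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not morphkit's actual code: the similarity metric (difflib's ratio), the splitting of the morph string on "/", and the normalize helper (lowercase, drop a trailing "_(SUFFIX)") are all stand-ins, since the description does not define them. The lower_case and debug options are left out for brevity, and sketch_annotate_and_sort is a hypothetical name.

```python
import copy
import re
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Any, Dict, List, Optional


def tag_similarity(tag: str, reference: str) -> int:
    """Stand-in metric (assumption): difflib ratio as an integer percentage."""
    return round(100 * SequenceMatcher(None, tag, reference).ratio())


def normalize(lemma: str) -> str:
    """Assumed normalization: lowercase and drop a trailing "_(SUFFIX)"."""
    return re.sub(r"_\(.*\)$", "", lemma.lower())


def sketch_annotate_and_sort(
    full_analysis: Dict[str, Any],
    reference_morph: str,
    reference_lemma: str,
    base_key: str = "lem_base_bc",
    full_key: str = "lem_full_bc",
    morph_key: str = "morph",
    sim_key: str = "morph_similarity",
) -> Dict[str, Any]:
    result = copy.deepcopy(full_analysis)  # never mutate the caller's data
    max_key = "_max_" + sim_key            # temporary helper key
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    for block in result.get("analyses", []):
        base, full = block.get(base_key, ""), block.get(full_key, "")
        # Homonym suffix: whatever follows the base lemma in the full lemma.
        suffix = full[len(base):] if base and full.startswith(base) else ""
        if suffix:
            block[base_key] = f"{base}_({suffix})"
        # Assumption: the raw morph string holds slash-separated tags.
        sims = [tag_similarity(t, reference_morph)
                for t in block.get(morph_key, "").split("/")]
        block[sim_key] = "/".join(str(s) for s in sims)
        block[max_key] = max(sims)
        groups[block[base_key]].append(block)

    # Pick the reference group: exact match first, then normalized match.
    ref_key: Optional[str] = None
    if reference_lemma in groups:
        ref_key = reference_lemma
    else:
        wanted = normalize(reference_lemma)
        ref_key = next((k for k in groups if normalize(k) == wanted), None)

    group_max = {k: max(b[max_key] for b in blocks) for k, blocks in groups.items()}
    # Reference group first (False sorts before True), then descending group_max.
    ordered = sorted(groups, key=lambda k: (k != ref_key, -group_max[k]))

    flat: List[Dict[str, Any]] = []
    for key in ordered:
        for block in sorted(groups[key], key=lambda b: -b[max_key]):
            block.pop(max_key)  # remove the temporary helper key
            flat.append(block)
    result["analyses"] = flat
    return result


# Invented sample data; the "id" field only makes the ordering easy to read.
sample = {"analyses": [
    {"id": "A", "lem_base_bc": "lego", "lem_full_bc": "lego", "morph": "V-PAI-1S"},
    {"id": "B", "lem_base_bc": "logos", "lem_full_bc": "logos2", "morph": "N-ASM/N-GSM"},
    {"id": "C", "lem_base_bc": "logos", "lem_full_bc": "logos", "morph": "N-GSM"},
    {"id": "D", "lem_base_bc": "logos", "lem_full_bc": "logos2", "morph": "N-NSM"},
]}
out = sketch_annotate_and_sort(sample, reference_morph="N-NSM", reference_lemma="logos")
print([b["id"] for b in out["analyses"]])
```

Python's sort is stable, so blocks and groups with equal similarity keep their input order. The description does not say how ties are broken, so that behavior is a property of this sketch only.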