morphkit.compare_tags

morphkit.compare_tags(tag1, tag2, debug=False)

Compare two morphological parsing tags by decoding them into features and computing a weighted similarity score.

This function is generated by init_compare_tags() and performs the following actions:

Uses decode_tag to turn each tag (e.g. “V-PAI-3S”) into a dict of grammatical features.

For each feature (Part of Speech, Tense, Case, etc.), looks up the similarity via prebuilt similarity functions.

Multiplies each similarity by its feature weight, sums the products, and divides by the total weight so that the overall score falls in the range [0.0, 1.0].

Returns both the overall score and a breakdown per feature.
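
The steps above can be sketched roughly as follows. This is only an illustration of the weighted-average idea, not morphkit's actual code: the feature names, the weights, the `exact_match` similarity rule, and the helper name `weighted_similarity` are all assumptions made for the example, and the inputs are assumed to be feature dicts that have already been decoded.

```python
# Illustrative sketch only: the weights and similarity rule are made up,
# not the values morphkit uses internally.

def exact_match(a, b):
    """Hypothetical similarity function: 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0

# Hypothetical per-feature weights and similarity functions.
WEIGHTS = {"Part of Speech": 5.0, "Case": 3.0, "Number": 1.0, "Gender": 1.0}
SIM_FUNCS = {feature: exact_match for feature in WEIGHTS}

def weighted_similarity(features1, features2, weights=WEIGHTS, sim_funcs=SIM_FUNCS):
    """Return (overall, details) for two already-decoded feature dicts."""
    total_score = 0.0
    total_weight = 0.0
    details = {}
    for feature, weight in weights.items():
        value1 = features1.get(feature)
        value2 = features2.get(feature)
        sim = sim_funcs[feature](value1, value2)
        total_score += sim * weight   # accumulate score x weight
        total_weight += weight
        details[feature] = {"tag1": value1, "tag2": value2, "similarity": sim}
    # Normalize by the total weight; guard against an empty weight table.
    overall = total_score / total_weight if total_weight else 0.0
    return overall, details

gold = {"Part of Speech": "Noun", "Case": "Nominative",
        "Number": "Singular", "Gender": "Masculine"}
generated = {"Part of Speech": "Noun", "Case": "Dative",
             "Number": "Singular", "Gender": "Masculine"}
overall, details = weighted_similarity(gold, generated)
print(overall)  # (5 + 0 + 1 + 1) / 10 = 0.7
```

With a graded similarity function in place of `exact_match` (for example, one that scores Nominative vs. Dative above zero), the same loop would yield partial credit per feature.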

Args:

tag1 (str):

The “gold standard” tag you expect (e.g. from a reference corpus).

tag2 (str):

The tag you want to evaluate against the “gold standard”.

debug (bool):

Optional argument. Defaults to False. If True, print each feature’s known vs. generated value, the raw similarity score, and the feature’s weight.

Returns:

dict:

A dictionary with the following structure:

"tag" (str)                    # echo of `tag2` (the generated tag).
"overall_similarity" (float)   # weighted, normalized [0.0–1.0].
"details" (dict)               # for each feature name, a sub-dict with:
    "tag1" (str)               # the decoded known feature.
    "tag2" (str)               # the decoded generated feature.
    "similarity" (float)       # the raw sim score (0.0–1.0).

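As a small sketch of how this structure might be consumed, the snippet below finds the feature that contributed the lowest similarity. It works on a hand-written dict that follows the layout described above, so the values are illustrative rather than real output, and `weakest_feature` is a hypothetical helper that is not part of morphkit.

```python
# Hand-written stand-in following the documented layout; values are illustrative.
result = {
    "tag": "N-DSM",
    "overall_similarity": 0.875,
    "details": {
        "Part of Speech": {"tag1": "Noun", "tag2": "Noun", "similarity": 1.0},
        "Case": {"tag1": "Nominative", "tag2": "Dative", "similarity": 0.2},
    },
}

def weakest_feature(result):
    """Return (feature_name, entry) for the lowest per-feature similarity."""
    return min(result["details"].items(), key=lambda item: item[1]["similarity"])

name, entry = weakest_feature(result)
print(name, entry["tag1"], "->", entry["tag2"])  # Case Nominative -> Dative
```
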
Example:

import morphkit

result = morphkit.compare_tags("N-NSM", "N-DSM")
print(result["overall_similarity"])
# 0.875
print(result["details"]["Case"])
# {"tag1": "Nominative", "tag2": "Dative", "similarity": 0.2}

Flow diagram:

       +----------------------------+
       | decode_tag(tag1)           |
       | decode_tag(tag2)           |
       +-------------+--------------+
                     |
                     v
+------------------------------------------+
|   Adjust POS if Mood = Participle/Inf    |
+--------------------+---------------------+
                     |
                     v
+------------------------------------------+
| for each feature in weights:             |
|   - get tag1/2 values                    |
|   - sim = sim_funcs[feature](tag1, tag2) |
|   - accumulate score × weight            |
|   - store details                        |
+--------------------+---------------------+
                     |
                     v
+------------------------------------------+
|  Normalize: total_score / total_weight   |
+--------------------+---------------------+
                     |
                     v
+------------------------------------------+
| Return: dict with tag1, tag2, similarity |
|          and per-feature details         |
+------------------------------------------+