Home

Welcome to the documentation for Morphkit, a Python research tool for processing the output of the Morpheus morphological analyser.

You are currently reading the |docs_label| documentation for Morphkit |release_version|. Use the version selector in the sidebar to switch between the stable release, development docs, and older tagged versions.

This package grew out of a research project to build a Text-Fabric dataset containing the Morpheus analytical data for each word of the Nestle1904 Greek New Testament. Several of its functions are specific to this use case.

In version 1.0.0, Morphkit is best understood as a semantic translation layer between two incompatible morphological systems: the raw Morpheus analyses and the Sandborg–Petersen (SP) / N1904-TF tagging conventions used in that project. The initial release is therefore tightly bound to the N1904-TF environment. It is packaged and documented so that the research workflow can be reproduced, not because it has already become a fully general standalone package.

Features

  • Research-oriented middleware around Morpheus output.

  • Translation of Morpheus analyses into SP / N1904-TF-style tags.

  • Intended primarily for Nestle1904 Text-Fabric scripts, notebooks, and exports.

  • Basic support for Latin within the same architecture.

Using this package

Installation

How to install the reproducible research snapshot

Usage

How to use this tool in its intended research setting

Architecture

How the 1.0.0 Morphkit translation layer is structured internally

License

How code and non-code materials in this repository are licensed

GitHub

You can find the project’s source code on GitHub and report issues or suggestions at the issue tracker.

Summary of functions

morphkit.analyse_pos

Analyse a single Morpheus parse record and determine its part of speech.

morphkit.analyse_morph_tag

Compute the Sandborg–Petersen morphological tag for a single Morpheus analyses block.

morphkit.analyse_word_with_morpheus

Query the Morpheus morphological analyser for a Greek word in Beta Code and parse its analyses.

morphkit.annotate_and_sort_analyses

Annotate and sort analyses in a morphkit-compatible structure, grouping them by base lemma and appending homonym suffixes, where each suffix is the part of lem_full_bc that remains after removing lem_base_bc.

morphkit.compare_tags

Compare two morphological parsing tags by decoding them into features and computing a weighted similarity score.

morphkit.decode_tag

Decode a morphological tag into a set of human-readable features.

morphkit.get_word_blocks

Retrieve the raw word-block data for a given Beta Code word from a Morpheus endpoint.

morphkit.init_compare_tags

Factory that initializes and returns a fully-configured compare_tags() function.

morphkit.parse_word_block

Parse a single Morpheus output block of Beta Code lines into structured morphological data.

morphkit.split_into_raw_blocks

Split the input text into blocks at each ':raw' header using a multiline regex.
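
To make the block-splitting idea in the last entry concrete, here is a minimal standalone sketch of splitting text at ':raw' headers with a multiline regular expression. It is not Morphkit's implementation: the helper name split_at_raw_headers, the exact pattern, and the sample input lines are assumptions chosen for illustration, and the real morphkit.split_into_raw_blocks may differ in signature and behaviour.

```python
import re

# Assumption: every block begins with a line that starts with ":raw".
# re.MULTILINE makes "^" match at the start of each line, not only at
# the start of the whole string.
RAW_HEADER = re.compile(r"^:raw\b", re.MULTILINE)


def split_at_raw_headers(text: str) -> list[str]:
    """Hypothetical helper: return one string per ':raw' block."""
    # Collect the offset of every ':raw' header line.
    starts = [match.start() for match in RAW_HEADER.finditer(text)]
    # Each block runs up to the next header, or to the end of the text.
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]


# Illustrative input only; real Morpheus output carries more fields.
sample = ":raw lo/gos\n:lem lo/gos\n:raw kai/\n:lem kai/\n"

for block in split_at_raw_headers(sample):
    print(repr(block))
```

In this sketch, any text before the first ':raw' header is dropped, and input without a header yields an empty list. Whether Morphkit treats such input the same way is not stated on this page.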