Home
Welcome to the documentation for Morphkit, a Python research tool for processing the output of the Morpheus morphological analyser.
You are currently reading the |docs_label| documentation for Morphkit |release_version|. Use the version selector in the sidebar to switch between the stable release, development docs, and older tagged versions.
This package was developed as part of a research project to build a Text-Fabric dataset containing the Morpheus analytical data for each word of the Nestle1904 Greek New Testament. Several functions are specific to this use case.
In 1.0.0, Morphkit is best understood as a semantic translation layer between two incompatible morphological systems: the raw Morpheus analyses and the SP / N1904-TF tagging conventions used in that project. The initial release is therefore tightly bound to the N1904-TF environment. It is packaged and documented so the research workflow can be reproduced, not because it has already become a fully general standalone package.
Features
Research-oriented middleware around Morpheus output.
Translation of Morpheus analyses into SP / N1904-TF-style tags.
Intended primarily for Nestle1904 Text-Fabric scripts, notebooks, and exports.
Basic support for Latin within the same architecture.
Using this package
- Installation
How to install the reproducible research snapshot
- Usage
How to use this tool in its intended research setting
- Architecture
How the 1.0.0 Morphkit translation layer is structured internally
- License
How code and non-code materials in this repository are licensed
GitHub
You can find the project’s source code on GitHub and report issues or suggestions at the issue tracker.
Summary of functions
Analyse a single Morpheus parse record and determine its part of speech.
Compute the Sandborg–Petersen morphological tag for a single Morpheus analyses block.
Query the Morpheus morphological analyser for a Greek word in Beta Code and parse its analyses.
Annotate and sort analyses in a Morphkit-compatible structure, grouping by base lemma and appending homonym suffixes extracted from lem_full_bc minus lem_base_bc.
Compare two morphological parsing tags by decoding them into features and computing a weighted similarity score.
Decode a morphological tag into a set of human-readable features.
Retrieve the raw word blocks data for a given Beta Code word from a Morpheus endpoint.
Factory that initializes and returns a fully-configured
Parse a single Morpheus output block of Beta Code lines into structured morphological data.
Split the input text into blocks at each ':raw' header using multiline regex.
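As an illustration of the last entry, the following is a minimal sketch of splitting Morpheus-style output at each ':raw' header with a multiline regular expression. The function name `split_raw_blocks`, the exact pattern, and the sample text are assumptions made for this example; they are not taken from Morphkit's source and may differ from the actual implementation.

```python
import re

# Assumed pattern: a zero-width match at the start of any line that begins
# with ':raw'. The lookahead keeps the header inside the block it opens.
RAW_HEADER = re.compile(r"^(?=:raw\b)", flags=re.MULTILINE)


def split_raw_blocks(text: str) -> list[str]:
    """Hypothetical helper: return one string per ':raw' block.

    Splitting on an empty match requires Python 3.7 or later.
    """
    parts = RAW_HEADER.split(text)
    # Drop anything that precedes the first ':raw' header (e.g. an empty
    # leading fragment) and trim surrounding whitespace from each block.
    return [part.strip() for part in parts if part.startswith(":raw")]


# Made-up sample resembling Morpheus output, for demonstration only.
sample = (
    ":raw lo/gos\n"
    ":lem lo/gos\n"
    ":raw a)/nqrwpos\n"
    ":lem a)/nqrwpos\n"
)

blocks = split_raw_blocks(sample)
for block in blocks:
    print(block.splitlines()[0])
```

Each returned block begins with its ':raw' line, so a block parser can read the queried word form from the first line and the analyses from the lines that follow.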