Home

Welcome to the documentation for Morphkit, a Python toolkit for processing the output of the Morpheus morphological analyser.

This package was developed as part of a research project to build a Text-Fabric dataset containing the Morpheus analytical data for each word of the Nestle1904 Greek New Testament. A number of functions are specifically related to this use case.

Features

  • Lightweight and modular morphological toolkit.

  • Compatible with Morpheus environments.

  • Designed for use with Greek New Testament texts (Sandborg–Petersen, or SP, tags).

  • Basic support for Latin.

Using this package

Installation

How to install this package in your Python environments

Usage

How to use this package

GitHub

You can find the project’s source code on GitHub and report issues or suggestions at the issue tracker.

Summary of functions

morphkit.analyse_pos

Analyse a single Morpheus parse record and determine its part of speech.

morphkit.analyse_morph_tag

Compute the Sandborg–Petersen morphological tag for a single Morpheus analyses block.

morphkit.analyse_word_with_morpheus

Query the Morpheus morphological analyser for a Greek word in Beta Code and parse its analyses.

morphkit.annotate_and_sort_analyses

Annotate and sort analyses in a morphkit-compatible structure, grouping by base lemma and appending homonym suffixes extracted from lem_full_bc minus lem_base_bc.

morphkit.compare_tags

Compare two morphological parsing tags by decoding them into features and computing a weighted similarity score.

morphkit.decode_tag

Decode a morphological tag into a set of human-readable features.

morphkit.get_word_blocks

Retrieve the raw word-block data for a given Beta Code word from a Morpheus endpoint.

morphkit.init_compare_tags

Factory that initialises and returns a fully configured compare_tags() function.

morphkit.parse_word_block

Parse a single Morpheus output block of Beta Code lines into structured morphological data.

morphkit.split_into_raw_blocks

Split the input text into blocks at each ':raw' header using a multiline regex.
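
To illustrate the kind of splitting described for morphkit.split_into_raw_blocks, here is a minimal sketch using only Python's standard re module. It is not Morphkit's implementation: the function name split_at_raw_headers, the regular expression, and the sample text are assumptions made for this example, and the actual function may differ in its signature and in how it treats text before the first header.

```python
import re

# Hypothetical stand-in for illustration only; not morphkit's own code.
# With re.MULTILINE, '^' matches at the start of every line, so the
# pattern finds each line that begins with the ':raw' header.
RAW_HEADER = re.compile(r"^:raw\b", re.MULTILINE)


def split_at_raw_headers(text: str) -> list[str]:
    """Return one block per ':raw' header, each block starting at its header.

    Text that precedes the first header is dropped (an assumption of this sketch).
    """
    starts = [match.start() for match in RAW_HEADER.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]


# Invented sample that only mimics the general shape of Morpheus output.
sample = (
    ":raw lo/gos\n"
    ":lem lo/gos\n"
    ":end os\tmasc nom sg\n"
    "\n"
    ":raw kai/\n"
    ":lem kai/\n"
)

blocks = split_at_raw_headers(sample)
for block in blocks:
    print(block.splitlines()[0])
```

Locating the header positions with finditer and slicing between them keeps each header attached to its own block, which a plain split on the header text would discard.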