This repository documents the creation of a new morphology-focused feature-set for the Nestle 1904 Greek New Testament Text-Fabric dataset (N1904-TF). The main goal is to add all possible morphological analyses to each Greek word, based on its textual form. To achieve this, the project uses the well-known Perseus Morpheus analyzer. The parses produced by Morpheus are ranked using a heuristic that compares them with existing Text-Fabric morphological features (such as case, number, and tense). The highest-ranked parse is the one that most closely matches the generally accepted interpretation of the word in its specific context.
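The ranking idea can be illustrated with a minimal sketch. It is not the repository's actual heuristic: it assumes each parse is a plain dictionary of feature values and simply counts how many of the compared features agree with the existing annotation. The function names, the dictionary layout, and the value spellings (`nominative`, `singular`) are hypothetical.

```python
# Features compared between a candidate parse and the existing annotation.
# The feature names come from the description above; everything else is assumed.
COMPARED_FEATURES = ("case", "number", "tense")


def score_parse(candidate, reference):
    """Count the compared features on which candidate and reference agree."""
    return sum(
        1
        for feature in COMPARED_FEATURES
        if feature in reference and candidate.get(feature) == reference[feature]
    )


def rank_parses(candidates, reference):
    """Return candidates sorted from best to worst match.

    sorted() is stable, so parses with equal scores keep their input order.
    """
    return sorted(
        candidates,
        key=lambda candidate: score_parse(candidate, reference),
        reverse=True,
    )


# Hypothetical data: two candidate parses for one word and its existing annotation.
reference = {"case": "nominative", "number": "singular"}
candidates = [
    {"case": "accusative", "number": "singular"},
    {"case": "nominative", "number": "singular"},
]
best = rank_parses(candidates, reference)[0]
print(best)
```

A real heuristic would likely weight features differently and normalize the value vocabularies of the two sources before comparing them; this sketch only shows the match-and-sort structure.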
This repository provides insight into the processing pipeline, including the Python code (primarily embedded in commented Jupyter Notebooks), intermediate data, and the resulting Text-Fabric feature files. The final feature files (*.tf) are included in the package available at the tonyjurg/N1904addons repository. This repository also explains how an executable instance of Morpheus was set up to run inside a Docker container.
The dataset builds on a previously developed Text-Fabric feature that added a betacode representation to each surface-level word. A new word-node feature, betacode, was created to store the betacode equivalent of the Unicode text found in the text feature.
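As a rough illustration of what a Unicode-to-betacode conversion involves, the following sketch decomposes each character into its base letter and combining diacritics, then maps both to ASCII. It is an assumption-laden simplification, not the code used to build the feature: the function name `to_betacode` is hypothetical, only lowercase letters are handled, both sigma forms map to `s`, and any other character is passed through unchanged.

```python
import unicodedata

# Lowercase Greek base letters and their betacode equivalents.
LETTERS = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "h", "θ": "q", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "c", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "f", "χ": "x", "ψ": "y",
    "ω": "w",
}

# Combining diacritics (as they appear after NFD decomposition).
DIACRITICS = {
    "\u0313": ")",   # smooth breathing
    "\u0314": "(",   # rough breathing
    "\u0301": "/",   # acute
    "\u0300": "\\",  # grave
    "\u0342": "=",   # circumflex (perispomeni)
    "\u0345": "|",   # iota subscript
    "\u0308": "+",   # diaeresis
}


def to_betacode(text):
    """Convert lowercase polytonic Greek to a simplified betacode string."""
    # NFD splits precomposed characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        LETTERS.get(char, DIACRITICS.get(char, char)) for char in decomposed
    )


print(to_betacode("λόγος"))
```

Capital letters need extra handling in betacode (an asterisk prefix with the diacritics placed before the letter), which this sketch deliberately leaves out.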
All procedures and tools are fully documented and openly accessible to ensure complete reproducibility. The workflow is implemented in Python using Jupyter Notebooks, with each stage of the process modularized into standalone notebooks or scripts. This openness aims to encourage reuse and highlight Text-Fabric’s transparency and flexibility.