N1904addons

Additional features for the N1904-TF, the syntactic annotated Text-Fabric dataset of the Greek New Testament.


N1904addons - Feature: morph_ttr

Feature group: statistic
Feature type: Node
Data type: str
Available for node types: book, chapter, verse, sentence, group, wg, phrase, subphrase, clause
Feature status:

Feature short description

Type-to-Token Ratio based on morph tags, computed over all word nodes under this node.

Feature values

A floating-point number stored as a string, representing a ratio in the range 0 to 1 (inclusive). The dot is a decimal point, not a thousands separator.
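Because the value is stored as a string, it needs converting before any numeric comparison. The following is a minimal sketch, assuming the value has already been read from the dataset (in Text-Fabric this is typically done with `F.morph_ttr.v(node)`); the helper name `parse_ratio` and the sample value are hypothetical.

```python
def parse_ratio(value):
    """Convert a stored ratio string such as "0.8" to a float in [0, 1]."""
    ratio = float(value)  # the dot is read as a decimal point
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"ratio out of range: {value!r}")
    return ratio


stored = "0.8"  # hypothetical feature value, not taken from the dataset
ratio = parse_ratio(stored)
```

The range check is an addition for illustration; it simply enforces the documented bounds.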

Detailed feature description

This feature provides the Morph-to-Token Ratio (MTR), a measure of morphological diversity. For a node covering N word tokens, it is the number of distinct morph tags among those tokens divided by N:

\[\text{MTR} = \frac{|\{\text{unique morphs in the text}\}|}{N}\]
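The definition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production code: the function name `morph_token_ratio` is hypothetical, the sample tags are invented for the example, and returning 0.0 for an empty input is a choice made here because the definition does not cover that case.

```python
def morph_token_ratio(morph_tags):
    """Return the number of unique morph tags divided by the token count."""
    tokens = list(morph_tags)
    if not tokens:
        return 0.0  # assumption: the definition is silent on empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical morph tags for a five-word span (illustrative values only).
sample = ["N-NSF", "V-PAI-3S", "N-NSF", "T-NSM", "N-NSM"]
mtr = morph_token_ratio(sample)  # 4 unique tags over 5 tokens
```

A span in which every word carries a different tag scores 1, while a span that repeats a single tag scores 1/N.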

Visualizing

The following plot compares the Type-to-Token Ratios measured over word form (TTR), lemma (LTR), and morphology (MTR) for each book of the New Testament. It shows that shorter books generally yield higher ratios, even though TTR is itself already a normalized measure. Various normalization methods exist to account for this length-related bias, and many of them are conveniently available in the Python package lexicalrichness.
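To make the idea of length normalization concrete, the sketch below implements two well-known measures by hand: Guiraud's root TTR and a moving-average TTR. It deliberately does not use the lexicalrichness package, and it is not the method behind the plot; the function names, the fallback for inputs shorter than one window, and the sample tags are all assumptions made for illustration.

```python
import math


def root_ttr(tokens):
    """Guiraud's root TTR: unique types divided by sqrt(token count)."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return len(set(tokens)) / math.sqrt(len(tokens))


def moving_average_ttr(tokens, window_size):
    """Mean TTR over every contiguous window of `window_size` tokens."""
    tokens = list(tokens)
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if len(tokens) < window_size:
        # Assumption: too short for a full window, so use the plain ratio.
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    starts = range(len(tokens) - window_size + 1)
    ratios = [len(set(tokens[i:i + window_size])) / window_size for i in starts]
    return sum(ratios) / len(ratios)


tags = ["A", "B", "A", "C"]  # placeholder tags, not real morph values
rttr = root_ttr(tags)                # 3 unique / sqrt(4)
mattr = moving_average_ttr(tags, 2)  # windows AB, BA, AC
```

The moving-average variant always measures the ratio over a fixed number of tokens, which is why it is less sensitive to the overall length of a book than the plain ratio.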

See also

Related features:

Data source

The production notebook can be found in this repository.