Additional features for the N1904-TF, the syntactic annotated Text-Fabric dataset of the Greek New Testament.
About this datasetThis repository offers a set of features built specifically for use with the N1904 Text-Fabric dataset. Text-Fabric is a powerful platform for analyzing biblical texts with linguistic and structural detail.
The features of this package can be grouped as follows:
These features describe the Morpheus analysis per word-node (more info).
Feature name | Data type | Available on node | Description | Examples |
---|---|---|---|---|
betacode | str |
word |
The Greek unicode word in betacode | *)ihsou= *xristou= |
mm_raw_bc | str |
word |
Betacode representation of the raw: field of Morpheus for this word node | *)ihsou= *xristou= |
mm_raw_uc | str |
word |
Unicode representation of the raw: field of Morpheus for this word node | Ἰησοῦς |
mm_num_lemmas | int |
word |
Number of different lemmas returned by the Morpheus analytic blocks for this word node | 1 3 |
mm_num_blocks | int |
word |
Total number of analytic Morpheus blocks return for this word node | 1 9 11 |
mm_num_morphs | int |
word |
Total number of returned different morph tags | 1 2 4 |
mm_max_sim | int |
word |
Maximum similarity with N1904-TF morph tag for any lemma | 100 90 11 |
These features summarize the Morpheus analysis grouped per lemma (more info).
Feature name | Data type | Available on node | Description | Examples |
---|---|---|---|---|
ms{num}_block_nums | str |
word |
Summary feature for grouped analysis #{num} providing a list with the associated blocknumbers. | 1/2 4/5/6 |
ms{num}_lem_base_bc | str |
word |
Summary feature for grouped analysis #{num} providing the base lemma (clean; without suffixes) encoded in betacode | ku/rios *solomw/n |
ms{num}_lem_base_uc | str |
word |
Summary feature for grouped analysis #{num} providing the base lemma (clean; without suffixes) encoded in unicode | φανάω ἀνά-διδράσκω |
ms{num}_lem_full_bc | str |
word |
Summary feature for grouped analysis #{num} providing the full lemma incl. homonym or pl-suffix identifiers, if any) encoded in betacode | *daui/dhs-pl h)/2 |
ms{num}_lem_full_uc | str |
word |
Summary feature for grouped analysis #{num} providing the full lemma incl. homonym or pl-suffix identifiers, if any) encoded in unicode | Ἰησοῦσ-πλ λέγω2 |
ms{num}_lem_homonym | str |
word |
lemma homonym indicator | |
ms{num}_lem_pl_suff | str |
word |
lemma pl suffix indicator | |
ms{num}_morph | str |
word |
Summary feature for grouped analysis #{num} providing a list of morphs | N-GSM/N-VSM/N-PRI V-IEI-2S |
ms{num}_morph_sim | str |
word |
Summary feature for grouped analysis #{num} providing a list of morps similairties to the N1904-TF morph | 100/93/56 56 |
ms{num}_num_blocks | int |
word |
Summary feature for grouped analysis #{num} providing the number of analytic blocks it summarizes | 3 2 |
These features provide access to detailes of each individual Morpheus analysis block (more info).
Feature name | Data type | Available on node | Description |
---|---|---|---|
md{num}_{inf_elem} | str |
word |
Overview of features providing access to data from individual Morpheus analytic blocks |
These features provide access to features based upon analysis of the Morpheus data in connection with N1904-TF morphosyntactic tagging (more info).
Feature name | Data type | Available on node | Description |
---|---|---|---|
ma{num}_pf_altern | str |
phrase |
Phrase function alternatives based on morphological tagging according to base N1904-TF (num=0) or Morpheus (num in range 1,12) |
ma{num}_pf_flip | int |
phrase |
Max function flip potential for phrase according to morphological tagging according to base N1904-TF (num=0) or Morpheus (num in range 1,12) |
ma_pf_max_flip | int |
phrase |
Max function flip potential for both N1904-TF and Morpheus based tagging. |
Feature name | Data type | Available on node | Description | Examples |
---|---|---|---|---|
lemma_entr_mbit | int |
word |
Absolute entropy of the lemma as predictor of its parent phrase function in mili-bits | number ranging from 0 to 2550 |
lemma_entr_norm | int |
word |
Normalized entropy of the lemma as predictor of its parent phrase function (scale 0-1000) | number ranging from 0 to 737 |
lemma_ttr | str |
book chapter sentence group wg phrase subphrase clause |
Type to Token Ratio based on lemma for all word nodes under this node | 0.10438939 0.06539666 |
morph_entr_mbit | int |
word |
Absolute entropy of a morph-tag as predictor of its parent phrase function in mili-bits | number ranging from 0 to 2727 |
morph_entr_norm | int |
word |
Normalized entropy of a morph as predictor of its parent phrase function (scale 0-1000) | number ranging from 0 to 788 |
morph_ttr | str |
book chapter sentence group wg phrase subphrase clause |
Type to Token Ratio based on lemma for all word nodes under this node | 0.11847122 0.36065573 |
num_words | int |
book chapter verse sentence group wg phrase subphrase clause |
Number of word nodes below this node | 2 15643 |
text_entr_mbit | int |
word |
Absolute entropy of the surface level wordform as predictor of its parent phrase function in mili-bits | number ranging from 0 to 2584 |
text_entr_norm | int |
word |
Normalized entropy of a surface level wordform as predictor of its parent phrase function (scale 0-1000) | number ranging from 0 to 747 |
text_ttr | str |
book chapter sentence group wg phrase subphrase clause |
Type to Token Ratio based on wordform for all word nodes under this node | 0.30498177 0.37854251 |
Feature name | Data type | Available on node | Description | Examples |
---|---|---|---|---|
osis_id | str |
book chapter verse |
Book name accoring to the OSIS (Open Scriptural Information Standard) ID | 2Thess Phlm |
Feature name | Data type | Available on node | Description | Examples |
---|---|---|---|---|
formaltag | str |
word |
The “form-oriented” morphological tag from GBI nodes data | V-AAI-3S |
orig_order | int |
word |
Word order as it is found in the XML Low Fat Tree implementation of the corpus | 567 1043 |
penntree | str |
sentence |
Penntree like syntax tree |