Additional features for the N1904-TF, the syntactically annotated Text-Fabric dataset of the Greek New Testament.
Feature group | Feature type | Data type | Available for node types | Feature status |
---|---|---|---|---|
Morpheus | Node | str | word | ✅ |
Summary feature for grouped analysis #{ind}, providing the base lemma (clean, without suffixes) encoded in Unicode.
This is a Morpheus summary data feature.
The value is the lemma in Unicode.
The following Python code demonstrates how to programmatically obtain details such as the lemma and morphological tags for each grouped analysis block.
# Assumes the dataset is already loaded and F and Fs are available in the namespace
wordNodes = F.otype.s("word")
for wordNode in wordNodes:
    for blockNumber in range(1, 9):
        # dynamically get F.ms{blockNumber}_lemma & morph
        lemma = Fs(f"ms{blockNumber}_lemma").v(wordNode)
        if not lemma:
            continue
        morph_string = Fs(f"ms{blockNumber}_morph").v(wordNode)
        # simplified morph string; retrieved the same way, not printed below
        morph_sim_string = Fs(f"ms{blockNumber}_morph_sim").v(wordNode)
        # decompose on the slash (an absent morph value yields an empty list)
        parts = morph_string.split("/") if morph_string else []
        # print what was found
        print(f"node={wordNode}, number={blockNumber} → lemma={lemma}, tags: {parts}")
node=1, number=1 → lemma=Βίβλος, tags: ['N-NSM']
node=1, number=2 → lemma=βίβλος, tags: ['N-NSF']
node=2, number=1 → lemma=γένεσις, tags: ['N-GSF']
node=3, number=1 → lemma=Ἰησοῦς, tags: ['N-GSM', 'N-VSM', 'N-PRI']
...
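The loop above can also be wrapped in a small helper that returns the analyses instead of printing them. The following is a minimal sketch, not part of the dataset tooling: `collect_analyses`, `sample_lookup`, and `SAMPLE` are hypothetical names, and the stand-in values are copied from the sample output above so that the block runs without Text-Fabric.

```python
from typing import Callable, List, Optional, Tuple

def collect_analyses(
    lookup: Callable[[str, int], Optional[str]],
    node: int,
    max_blocks: int = 8,
) -> List[Tuple[int, str, List[str]]]:
    """Return (blockNumber, lemma, tags) for every block that has a lemma."""
    analyses = []
    for block in range(1, max_blocks + 1):
        lemma = lookup(f"ms{block}_lemma", node)
        if not lemma:
            continue  # no analysis stored in this block
        morph = lookup(f"ms{block}_morph", node)
        # split the slash-separated tag string; a missing value gives no tags
        tags = morph.split("/") if morph else []
        analyses.append((block, lemma, tags))
    return analyses

# Stand-in data (assumption: taken from the sample output for nodes 1 and 3)
SAMPLE = {
    ("ms1_lemma", 1): "Βίβλος", ("ms1_morph", 1): "N-NSM",
    ("ms2_lemma", 1): "βίβλος", ("ms2_morph", 1): "N-NSF",
    ("ms1_lemma", 3): "Ἰησοῦς", ("ms1_morph", 3): "N-GSM/N-VSM/N-PRI",
}

def sample_lookup(feature: str, node: int) -> Optional[str]:
    """Hypothetical replacement for Fs(feature).v(node)."""
    return SAMPLE.get((feature, node))

for block, lemma, tags in collect_analyses(sample_lookup, 3):
    print(block, lemma, tags)
```

With the dataset loaded, the lookup could presumably be supplied as `lambda name, node: Fs(name).v(node)`, which keeps the helper independent of how the features are accessed.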
The snippet below uses a list comprehension to dynamically generate the names of the Morpheus features indexed by number. The resulting list can be passed directly to A.show() to display multiple feature layers at once.
# Dynamically generate feature names for all morphology sets
morphFeatureList = (
    [f'ms{ind}_lemma' for ind in range(1, 9)]
    + [f'ms{ind}_morph' for ind in range(1, 9)]
    + [f'ms{ind}_morph_sim' for ind in range(1, 9)]
)
# Display the query results with the Morpheus features
# (QueryResult is assumed to hold the results of an earlier query)
A.show(QueryResult, extraFeatures=morphFeatureList)
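If only some of the feature layers or blocks are needed, the name generation can be parameterised. The sketch below is an illustration under assumptions: `morpheus_feature_names` is a hypothetical helper, and the suffixes and the block count of 8 are taken from the snippet above.

```python
from typing import List, Sequence

def morpheus_feature_names(
    suffixes: Sequence[str] = ("lemma", "morph", "morph_sim"),
    max_blocks: int = 8,
) -> List[str]:
    """Build names such as ms1_lemma ... ms8_morph_sim, grouped by suffix."""
    return [
        f"ms{ind}_{suffix}"
        for suffix in suffixes                  # outer loop: one group per suffix
        for ind in range(1, max_blocks + 1)     # inner loop: block numbers 1..max_blocks
    ]

all_names = morpheus_feature_names()
lemma_only = morpheus_feature_names(suffixes=("lemma",), max_blocks=3)
print(len(all_names))
print(lemma_only)
```

The default call produces the same list, in the same order, as morphFeatureList, so it could be passed as extraFeatures in the same way.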
The image below shows a syntax tree with the display of these features enabled.
GitHub repository Create_morpheus_TF_dataset.