Text-Fabric dataset for Greek New Testament based upon Nestle 1904 (Low Fat tree dataset)
About Text-Fabric
In Text-Fabric, a “feature” is an attribute attached to nodes. Nodes represent linguistic elements of the text, such as words, word groups, sentences, and verses. Features carry the additional information about these nodes that makes linguistic analysis and data extraction possible.
The full feature set of this Text-Fabric dataset can be viewed under three groupings (feature category, node type, and data type):
Grid
: pertains to the arrangement and organization of the data.
Sectional
: encompasses attributes related to divisions within the text.
Lexical
: focuses on individual words, their meanings, and their lexical properties.
Orthographic
: deals with features related to the visual representation of the text.
Textcritical
: deals with features related to text-critical issues.
Morphological
: involves attributes that describe the internal structure and form of words.
Syntactic
: covers properties related to the arrangement of words and phrases into meaningful sentences.
Relational
: encompasses attributes that describe relationships or connections between nodes.

word
: represents individual words in the text.
wg (wordgroup)
: refers to a group of words that forms a cohesive unit.
sentence
: represents individual sentences in the text.
verse
: a division within a larger textual unit, specifically the biblical verse.
chapter
: a division that groups related content together, specifically the biblical chapter.
book
: the highest-level division within the text, corresponding to a Bible book.

string
: the datatype of the feature is string.
integer
: the datatype of the feature is integer.
link
: the datatype of the feature is link.
configuration
: configuration data.

Text-Fabric, true to its name, borrows the concepts of ‘warp’ and ‘weft’ from textile weaving to represent its data. The ‘warp’ is the foundational structured data: the linguistic units themselves, such as words and phrases. The ‘weft’ is the additional layers of information, known as features. These features hold linguistic data, annotations, and metadata, and are woven into the ‘warp’, which keeps structure and content clearly separated. This separation lets Text-Fabric handle complex linguistic datasets efficiently and flexibly.
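
The warp and weft split can be illustrated with a small toy model in plain Python. This is a sketch only: it does not use the Text-Fabric library or load this dataset. The names `otype` and `oslots` follow the names Text-Fabric gives its warp features, while the feature names (`text`, `verse`), the node numbers, and the helper functions are hypothetical and chosen for illustration.

```python
# Toy model of warp and weft (illustrative; not the Text-Fabric API).

# Warp: the structure. Each node has a type, and every non-word node
# records which word slots it spans.
otype = {1: "word", 2: "word", 3: "word", 4: "verse"}
oslots = {4: (1, 2, 3)}

# Weft: the features. Each feature maps node numbers to values.
# "text" holds string values, "verse" holds integer values
# (both feature names are assumptions made for this sketch).
features = {
    "text": {1: "Ἐν", 2: "ἀρχῇ", 3: "ἦν"},
    "verse": {4: 1},
}


def slots_of(node):
    """Return the word slots a node covers; a word node covers itself."""
    return oslots.get(node, (node,))


def feature_value(name, node):
    """Return the value of a feature for a node, or None if it is not set."""
    return features.get(name, {}).get(node)


def nodes_of_type(node_type):
    """Return all nodes of the given type in ascending order."""
    return sorted(n for n, t in otype.items() if t == node_type)


def verse_text(verse_node):
    """Join the text of every word slot that a verse node spans."""
    return " ".join(feature_value("text", s) for s in slots_of(verse_node))


if __name__ == "__main__":
    for verse in nodes_of_type("verse"):
        print(feature_value("verse", verse), verse_text(verse))
```

Because the features live in their own mappings, a new layer of annotation can be added as one more entry in `features` without touching `otype` or `oslots`, which is the separation of structure and content described above.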