{ "cells": [ { "cell_type": "markdown", "id": "d824a2cc-56ba-443c-90a4-7ded0d791353", "metadata": {}, "source": [ "# Hapax legomena in parasha #5: Chayei Sarah (Genesis 23:1-25:18)" ] }, { "cell_type": "markdown", "id": "60af701a-2af8-4297-b840-545749d9a1f1", "metadata": {}, "source": [ "## Table of Content (ToC)\n", "\n", "* 1 - Introduction\n", "* 2 - Load Text-Fabric app and data\n", "* 3 - Performing the queries\n", "* 4 - Required libraries\n", "* 5 - Further reading\n", "* 6 - Notebook version details" ] }, { "cell_type": "markdown", "id": "b17560e5-9e90-4008-ae35-04a34624a43a", "metadata": { "tags": [] }, "source": [ "# 1 - Introduction \n", "##### [Back to ToC](#TOC)\n", "\n", "A *hapax legomenon* (ἅπαξ λεγόμενον) is the term used in linguistics and philology to refer to a word or expression that appears only once within a specific context. Usually, this context is defined as the entire works of an author or a well-defined corpus of literature. The term comes from Greek, where \"hapax\" means \"once\" and \"legomenon\" means \"something said.\" In this Notebook, the context to determine the *hapax legomena* is the full text of the Tenach, or more precisely, the full Biblica Hebraica Stuttgartensia." ] }, { "cell_type": "markdown", "id": "0ad449f1-c3bf-4e45-95e0-6eefac4bc1dc", "metadata": {}, "source": [ "# 2 - Load Text-Fabric app and data \n", "##### [Back to ToC](#TOC)\n", "\n", "The following code will load the Text-Fabric version of the [Biblia Hebraica Stuttgartensia (Amstelodamensis)](https://etcbc.github.io/bhsa/) together with the additonal parasha related features from [tonyjurg/BHSaddons](https://github.com/tonyjurg/BHSaddons)." ] }, { "cell_type": "code", "execution_count": 1, "id": "21fbcf7c-98da-4319-a344-d644da5cfec5", "metadata": { "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "a290709d-978e-4d29-918c-fddd7d8041e4", "metadata": {}, "outputs": [], "source": [ "# Loading the Text-Fabric code\n", "# Note: it is assumed Text-Fabric is installed in your environment.\n", "from tf.fabric import Fabric\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "id": "14e1cb4c-554a-4785-a4dd-6a89778d5cd3", "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/etcbc/BHSA/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/BHSA/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/tonyjurg/BHSaddons/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.6.1, etcbc/BHSA/app v3, Search Reference
\n", " Data: etcbc - BHSA 2021, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots / node% coverage
book3910938.21100
chapter929459.19100
lex923046.22100
verse2321318.38100
half_verse451799.44100
sentence637176.70100
sentence_atom645146.61100
clause881314.84100
clause_atom907044.70100
phrase2532031.68100
phrase_atom2675321.59100
subphrase1138501.4238
word4265901.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
tonyjurg/BHSaddons/tf\n", "
\n", "\n", "
\n", "
\n", "aliyotnum\n", "
\n", "
str
\n", "\n", " The sequence number of the aliyot within the parasha\n", "\n", "
\n", "\n", "
\n", "
\n", "maftir\n", "
\n", "
str
\n", "\n", " Set to 1 if this verse is part of a maftir\n", "\n", "
\n", "\n", "
\n", "
\n", "parashahebr\n", "
\n", "
str
\n", "\n", " The name of the parasha in Hebrew\n", "\n", "
\n", "\n", "
\n", "
\n", "parashanum\n", "
\n", "
int
\n", "\n", " The sequence number of the parasha\n", "\n", "
\n", "\n", "
\n", "
\n", "parashatrans\n", "
\n", "
str
\n", "\n", " Transliteration of the Hebrew parasha name\n", "\n", "
\n", "\n", "
\n", "
\n", "parashaverse\n", "
\n", "
str
\n", "\n", " The sequence number of the verse within the parasha\n", "\n", "
\n", "\n", "
\n", "
\n", "wordboundary\n", "
\n", "
str
\n", "\n", " indicates wordboudaries (spaces OR maqaf)\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: etcbc/BHSA
  3. appPath: C:/Users/tonyj/text-fabric-data/github/etcbc/BHSA/app
  4. commit: gd905e3fb6e80d0fa537600337614adc2af157309
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: local
  11. localDir: C:/Users/tonyj/text-fabric-data/github/etcbc/BHSA/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: etcbc
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: etcbc
        • relative: /tf
        • repo: parallels
    • org: etcbc
    • relative: /tf
    • repo: BHSA
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: v1.8
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load the app and data\n", "BHSA = use (\"etcbc/BHSA\", version=\"2021\", mod=\"tonyjurg/BHSaddons/tf/\", hoist=globals())" ] }, { "cell_type": "markdown", "id": "bd939a75-8253-4119-b64d-3d5c403ed096", "metadata": {}, "source": [ "# 3 - Performing the queries \n", "##### [Back to ToC](#TOC)\n", "\n", "The Text-Fabric code in this Notebook first queries to find all word nodess for this parasha. The value for the feature freq_lex is then examined for all word nodes within this range. Whenever the value for freq_lex is set to one, the related word and the verse it is part of are reported as a *hapax legomenon*. The indicated verse is hyperlinked to the STEP Bible, allowing for easy review of the verse. Clicking on the Hebrew word will bring you to the relevant Hebrew verse in Bible Online Learner (BibleOL). The last three columns provide additional information regarding the type and function of the word." ] }, { "cell_type": "code", "execution_count": 4, "id": "f51beb93-6c01-4a85-9913-9fa896a47077", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.26s 1972 results\n" ] } ], "source": [ "# find all word nodes for this parasha (we can either use the transliterated name or the sequence number)\n", "parashaQuery = '''\n", "verse parashanum=5\n", " word\n", "'''\n", "parashaResults = BHSA.search(parashaQuery)" ] }, { "cell_type": "code", "execution_count": 5, "id": "5c47a638-aaaa-4edb-85e5-3883d8e4e88e", "metadata": {}, "outputs": [], "source": [ "# Store parashname, start and end verse for future use\n", "startNode=parashaResults[0][0]\n", "endNode=parashaResults[-1][0]\n", "parashaNameHebrew=F.parashahebr.v(startNode)\n", "parashaNameEnglish=F.parashatrans.v(startNode)\n", "bookStart,chapterStart,startVerse=T.sectionFromNode(startNode)\n", "parashaStart=f'{bookStart} {chapterStart}:{startVerse}'\n", "bookEnd,chapterEnd,startEnd=T.sectionFromNode(endNode)\n", "parashaEnd=f'{chapterEnd}:{startEnd}'\n", "\n", "def wrapHTML(body, title):\n", " output = (\n", " f'{title}'\n", " f'{body}

Data generated by `hapax.ipynb` at ' \n", " '`'\n", " 'github.com/tonyjurg/Parashot`

'\n", " )\n", " return output" ] }, { "cell_type": "markdown", "id": "53742352-a904-43b6-8ebe-a254dfba3be2", "metadata": {}, "source": [ "The following cell contains code that allows us to provide additional information in the table **as it is annotated in the BHSA dataset**. See also the caution on feature [nametype](https://github.com/ETCBC/bhsa/blob/master/docs/features/nametype.md): \n", "> It is unclear how completely and correctly this feature has been assigned." ] }, { "cell_type": "code", "execution_count": 6, "id": "47a7f4d7-8eae-4af5-8988-058534ec34b1", "metadata": {}, "outputs": [], "source": [ "# part of speech expantion table \n", "# https://github.com/ETCBC/bhsa/blob/master/docs/features/sp.md\n", "posMapping= {\n", " # \"abreviation\" (key) : \"description\"\n", " 'art':\t'article',\n", " 'verb':\t'verb',\n", " 'subs':\t'noun',\n", " 'nmpr':\t'proper noun',\n", " 'advb':\t'adverb',\n", " 'prep':\t'preposition',\n", " 'conj':\t'conjunction',\n", " 'prps':\t'personal pronoun',\n", " 'prde':\t'demonstrative pronoun',\n", " 'prin':\t'interrogative pronoun',\n", " 'intj':\t'interjection',\n", " 'nega':\t'negative particle',\n", " 'inrg':\t'interrogative particle',\n", " 'adjv':\t'adjective'\n", "}\n", "\n", "# Subclassification of part of speech (feature ls on word and lex nodes)\n", "# https://github.com/ETCBC/bhsa/blob/master/docs/features/ls.md\n", "subclassMapping = {\n", " # \"abreviation\" (key) : \"description\"\n", " 'nmdi':\t'distributive noun',\n", " 'nmcp':\t'copulative noun',\n", " 'padv':\t'potential adverb',\n", " 'afad':\t'anaphoric adverb',\n", " 'ppre':\t'potential preposition',\n", " 'cjad':\t'conjunctive adverb',\n", " 'ordn':\t'ordinal',\n", " 'vbcp':\t'copulative verb',\n", " 'mult':\t'noun of multitude',\n", " 'focp':\t'focus particle',\n", " 'ques':\t'interrogative particle',\n", " 'gntl':\t'gentilic',\n", " 'quot':\t'quotation verb',\n", " 'card':\t'cardinal',\n", " 'none': ''\n", "}\n", "\n", "# expand information in feature nametype (a comma separated list)\n", "# https://github.com/ETCBC/bhsa/blob/master/docs/features/nametype.md\n", "nametypeExpantions = {\n", " 'pers':\t'person',\n", " 'mens':\t'measurement unit',\n", " 'gens':\t'people',\n", " 'topo':\t'place',\n", " 'ppde':\t'demonstrative personal pronoun'\n", "}\n", "def expandNametype(inputText):\n", " outputText = inputText\n", " if inputText is not None:\n", " for old, new in nametypeExpantions.items():\n", " outputText = outputText.replace(old, new)\n", " return outputText" ] }, { "cell_type": "markdown", "id": "5e1cfcb4-f66d-43d3-98a8-9f0afb033d3a", "metadata": {}, "source": [ "The following cell performs the actual gathering of the hapax legomena:" ] }, { "cell_type": "code", "execution_count": 7, "id": "53bffc02-51ba-477b-b97f-2b70e3b8ec66", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Hapax Legomena for parasha Chayei Sarah (Genesis 23:1-25:18)

VerseHebrew WordEnglish GlossPart of SpeechSubclassName Type
Genesis 24:21מִשְׁתָּאֵ֖הgazeverb
Genesis 24:63שׂ֥וּחַ<uncertain>verb
Genesis 25:3אַשּׁוּרִ֥םAsshuritesproper nounpeople
Genesis 25:3לְטוּשִׁ֖יםLetushitesproper nounpeople
Genesis 25:3לְאֻמִּֽיםLeummitesproper nounpeople

5 hapaxes found.

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML, display\n", "\n", "def escapeMarkdown(text):\n", " return text.replace(\"<\", \"<\").replace(\">\", \">\") if text is not None else \"\"\n", "\n", "# Initialize HTML content\n", "reportTitle=f'Hapax Legomena for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})'\n", "htmlContent = f\"

{reportTitle}

\"\n", "htmlContent += \"\"\n", "htmlContent += \"\"\n", "\n", "numberOfHapax = 0\n", "\n", "# Generate table rows\n", "for verse, node in parashaResults:\n", " freq = F.freq_lex.v(node)\n", " if freq == 1:\n", " numberOfHapax += 1\n", " sectionTuple = T.sectionFromNode(node)\n", " wordLink = (\n", " f\"{F.g_word_utf8.v(node)}\"\n", " )\n", " linkSTEPbible = (\n", " f\"{sectionTuple[0]} {sectionTuple[1]}:{sectionTuple[2]}\"\n", " )\n", " htmlContent += (\n", " f\"\"\n", " f\"\"\n", " f\"\"\n", " f\"\"\n", " f\"\"\n", " f\"\"\n", " f\"\"\n", " f\"\"\n", " )\n", "\n", "# Close the table\n", "htmlContent += \"
VerseHebrew WordEnglish GlossPart of SpeechSubclassName Type
{linkSTEPbible}{wordLink}{escapeMarkdown(F.gloss.v(node))}{escapeMarkdown(posMapping.get(F.sp.v(node), ''))}{escapeMarkdown(subclassMapping.get(F.ls.v(node), ''))}{escapeMarkdown(expandNametype(F.nametype.v(node)))}
\"\n", "htmlContent += f\"

{numberOfHapax} hapaxes found.

\"\n", "\n", "\n", "# Display the HTML content in the notebook\n", "display(HTML(htmlContent))\n", "\n", "# Define the HTML filename and store to file\n", "fileName = f\"hapax_legomena({parashaNameEnglish.replace(' ','_')}).html\"\n", "htmlContentFull = wrapHTML(htmlContent,reportTitle)\n", "with open(fileName, \"w\", encoding=\"utf-8\") as file:\n", " file.write(htmlContentFull)\n", " \n", "# display download button\n", "downloadButton = f\"\"\"\n", "', '>').replace('\"', '"').replace(\"'\", ''')}\" target=\"_blank\">\n", " \n", "\n", "\"\"\"\n", "display(HTML(downloadButton))" ] }, { "cell_type": "markdown", "id": "93852912-fa5c-420a-88ed-8be3b090eb3a", "metadata": { "tags": [] }, "source": [ "# 4 - Required libraries \n", "##### [Back to ToC](#TOC)\n", "\n", "The scripts in this notebook require (beside `text-fabric`) the following Python libraries to be installed in the environment:\n", "\n", " IPython\n", "\n", "You can install any missing library from within Jupyter Notebook using either`pip` or `pip3`." ] }, { "cell_type": "markdown", "id": "bc5b6c04-4855-4d2d-aa9a-a1dac0256074", "metadata": {}, "source": [ "# 5 - Further reading \n", "##### [Back to ToC](#TOC)\n", "\n", "An discussion regarding Hapax Legomena, including details about ten hapaxes in the Hebrew Bible can be found at [The Torah.com](https://www.thetorah.com/article/hapax-legomena-ten-biblical-examples)." ] }, { "cell_type": "markdown", "id": "68573424-b71f-4596-95e7-468cf9ef9c1e", "metadata": {}, "source": [ "# 6 - Notebook version details\n", "##### [Back to ToC](#TOC)\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AuthorTony Jurg
Version1.2
Date18 November 2024
\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }