Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
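The columns of this table appear to be related by simple arithmetic. The sketch below checks that relationship using figures copied from the table. The helper names (`slots_per_node`, `coverage_pct`) and both formulas are assumptions made for illustration, not Text-Fabric's documented computation. The first formula only holds for node types that cover every word exactly once.

```python
# Figures copied from the table above; the formulas themselves are assumptions.
TOTAL_SLOTS = 426590  # number of word nodes (slots) in the dataset

node_counts = {"book": 39, "chapter": 929, "verse": 23213, "subphrase": 113850}


def slots_per_node(n_nodes, total_slots=TOTAL_SLOTS):
    """Average slots per node for a type that covers all slots without overlap."""
    return round(total_slots / n_nodes, 2)


def coverage_pct(n_nodes, avg_slots, total_slots=TOTAL_SLOTS):
    """Approximate share of slots covered, derived from node count and average size."""
    return round(100 * n_nodes * avg_slots / total_slots)


for name in ("book", "chapter", "verse"):
    print(name, slots_per_node(node_counts[name]))

# subphrase nodes do not cover every word, so coverage is estimated from the listed average
print("subphrase coverage:", coverage_pct(node_counts["subphrase"], 1.42))
```

For the fully covering types the computed averages match the table, and the subphrase estimate reproduces the 38 % shown.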
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
Data generated by `hapax.ipynb` at ' \n", " '`'\n", " 'github.com/tonyjurg/Parashot`
'\n", " )\n", " return output" ] }, { "cell_type": "markdown", "id": "53742352-a904-43b6-8ebe-a254dfba3be2", "metadata": {}, "source": [ "The following cell contains code that allows us to provide additional information in the table **as it is annotated in the BHSA dataset**. See also the caution on feature [nametype](https://github.com/ETCBC/bhsa/blob/master/docs/features/nametype.md): \n", "> It is unclear how completely and correctly this feature has been assigned." ] }, { "cell_type": "code", "execution_count": 6, "id": "47a7f4d7-8eae-4af5-8988-058534ec34b1", "metadata": {}, "outputs": [], "source": [ "# part of speech expansion table \n", "# https://github.com/ETCBC/bhsa/blob/master/docs/features/sp.md\n", "posMapping = {\n", " # \"abbreviation\" (key) : \"description\"\n", " 'art':\t'article',\n", " 'verb':\t'verb',\n", " 'subs':\t'noun',\n", " 'nmpr':\t'proper noun',\n", " 'advb':\t'adverb',\n", " 'prep':\t'preposition',\n", " 'conj':\t'conjunction',\n", " 'prps':\t'personal pronoun',\n", " 'prde':\t'demonstrative pronoun',\n", " 'prin':\t'interrogative pronoun',\n", " 'intj':\t'interjection',\n", " 'nega':\t'negative particle',\n", " 'inrg':\t'interrogative particle',\n", " 'adjv':\t'adjective'\n", "}\n", "\n", "# Subclassification of part of speech (feature ls on word and lex nodes)\n", "# https://github.com/ETCBC/bhsa/blob/master/docs/features/ls.md\n", "subclassMapping = {\n", " # \"abbreviation\" (key) : \"description\"\n", " 'nmdi':\t'distributive noun',\n", " 'nmcp':\t'copulative noun',\n", " 'padv':\t'potential adverb',\n", " 'afad':\t'anaphoric adverb',\n", " 'ppre':\t'potential preposition',\n", " 'cjad':\t'conjunctive adverb',\n", " 'ordn':\t'ordinal',\n", " 'vbcp':\t'copulative verb',\n", " 'mult':\t'noun of multitude',\n", " 'focp':\t'focus particle',\n", " 'ques':\t'interrogative particle',\n", " 'gntl':\t'gentilic',\n", " 'quot':\t'quotation verb',\n", " 'card':\t'cardinal',\n", " 'none': ''\n", "}\n", "\n", "# expand information in feature nametype (a comma-separated list)\n", "# https://github.com/ETCBC/bhsa/blob/master/docs/features/nametype.md\n", "nametypeExpantions = {\n", " 'pers':\t'person',\n", " 'mens':\t'measurement unit',\n", " 'gens':\t'people',\n", " 'topo':\t'place',\n", " 'ppde':\t'demonstrative personal pronoun'\n", "}\n", "def expandNametype(inputText):\n", " outputText = inputText\n", " if inputText is not None:\n", " for old, new in nametypeExpantions.items():\n", " outputText = outputText.replace(old, new)\n", " return outputText" ] }, { "cell_type": "markdown", "id": "5e1cfcb4-f66d-43d3-98a8-9f0afb033d3a", "metadata": {}, "source": [ "The following cell performs the actual gathering of the hapax legomena:" ] }, { "cell_type": "code", "execution_count": 7, "id": "53bffc02-51ba-477b-b97f-2b70e3b8ec66", "metadata": {}, "outputs": [ { "data": { "text/html": [ "Verse | Hebrew Word | English Gloss | Part of Speech | Subclass | Name Type |
---|---|---|---|---|---|
Genesis 24:21 | מִשְׁתָּאֵ֖ה | gaze | verb | | |
Genesis 24:63 | שׂ֥וּחַ | <uncertain> | verb | | |
Genesis 25:3 | אַשּׁוּרִ֥ם | Asshurites | proper noun | | people |
Genesis 25:3 | לְטוּשִׁ֖ים | Letushites | proper noun | | people |
Genesis 25:3 | לְאֻמִּֽים | Leummites | proper noun | | people |
5 hapaxes found.
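As a rough illustration of what gathering hapax legomena for a passage involves, the sketch below counts lexemes over a whole corpus and keeps those that occur exactly once and fall inside the passage. It is not the notebook's actual code, which works on the BHSA dataset through Text-Fabric. The toy corpus, the placeholder lexeme ids (`lex_a` and so on), and the helper names `find_hapaxes` and `in_genesis_24_25` are all hypothetical.

```python
from collections import Counter


def find_hapaxes(corpus, in_passage):
    """Return (reference, lexeme) pairs inside the passage whose lexeme
    occurs exactly once in the whole corpus."""
    counts = Counter(lexeme for _, lexeme in corpus)
    return [
        (ref, lexeme)
        for ref, lexeme in corpus
        if counts[lexeme] == 1 and in_passage(ref)
    ]


# Hypothetical toy corpus: (book, chapter, verse) references with placeholder lexeme ids
toy_corpus = [
    (("Genesis", 1, 1), "lex_a"),
    (("Genesis", 24, 21), "lex_b"),
    (("Genesis", 24, 63), "lex_a"),
    (("Genesis", 25, 3), "lex_c"),
    (("Exodus", 1, 1), "lex_d"),
]


def in_genesis_24_25(ref):
    """Hypothetical passage filter: Genesis chapters 24 and 25."""
    book, chapter, _ = ref
    return book == "Genesis" and chapter in (24, 25)


hapaxes = find_hapaxes(toy_corpus, in_genesis_24_25)
for ref, lexeme in hapaxes:
    print(ref, lexeme)
```

Counting over the whole corpus rather than over the passage alone is the important design choice here: a word that appears once in a parasha but again elsewhere is not a hapax.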
" ], "text/plain": [ "Verse | Hebrew Word | English Gloss | Part of Speech | Subclass | Name Type |
---|---|---|---|---|---|
{linkSTEPbible} | \"\n", " f\"{wordLink} | \"\n", " f\"{escapeMarkdown(F.gloss.v(node))} | \"\n", " f\"{escapeMarkdown(posMapping.get(F.sp.v(node), ''))} | \"\n", " f\"{escapeMarkdown(subclassMapping.get(F.ls.v(node), ''))} | \"\n", " f\"{escapeMarkdown(expandNametype(F.nametype.v(node)))} | \"\n", " f\"
{numberOfHapax} hapaxes found.
\"\n", "\n", "\n", "# Display the HTML content in the notebook\n", "display(HTML(htmlContent))\n", "\n", "# Define the HTML filename and store to file\n", "fileName = f\"hapax_legomena({parashaNameEnglish.replace(' ', '_')}).html\"\n", "htmlContentFull = wrapHTML(htmlContent, reportTitle)\n", "with open(fileName, \"w\", encoding=\"utf-8\") as file:\n", " file.write(htmlContentFull)\n", " \n", "# Display download button\n", "downloadButton = f\"\"\"\n", "', '>').replace('\"', '"').replace(\"'\", ''')}\" target=\"_blank\">\n", " \n", "\n", "\"\"\"\n", "display(HTML(downloadButton))" ] }, { "cell_type": "markdown", "id": "93852912-fa5c-420a-88ed-8be3b090eb3a", "metadata": { "tags": [] }, "source": [ "# 4 - Required libraries \n", "##### [Back to ToC](#TOC)\n", "\n", "The scripts in this notebook require (besides `text-fabric`) the following Python library to be installed in the environment:\n", "\n", " IPython\n", "\n", "You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`." ] }, { "cell_type": "markdown", "id": "bc5b6c04-4855-4d2d-aa9a-a1dac0256074", "metadata": {}, "source": [ "# 5 - Further reading \n", "##### [Back to ToC](#TOC)\n", "\n", "A discussion of hapax legomena, including details about ten hapaxes in the Hebrew Bible, can be found at [TheTorah.com](https://www.thetorah.com/article/hapax-legomena-ten-biblical-examples)." ] }, { "cell_type": "markdown", "id": "68573424-b71f-4596-95e7-468cf9ef9c1e", "metadata": {}, "source": [ "# 6 - Notebook version details\n", "##### [Back to ToC](#TOC)\n", "\n", "Author | Tony Jurg |\n", "
Version | 1.2 |\n", "
Date | 18 November 2024 |\n", "