# Building the N1904-TF morph browser

## Table of contents (ToC)<a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load N1904-TF with N1904addons</a>
* <a href="#bullet3">3 - Load the morphkit library</a>
* <a href="#bullet4">4 - Create a grouped morphology browser</a>
* <a href="#bullet5">5 - Download result as standalone interactive page</a>
* <a href="#bullet6">6 - Attribution and footnotes</a>
* <a href="#bullet7">7 - Required libraries</a>
* <a href="#bullet8">8 - Notebook version details</a> 

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This notebook collects every unique morphological tag from the N1904-TF version of the Greek New Testament. It retrieves the tags from Text-Fabric and picks a single example word for each tag (i.e., the first occurrence of a word carrying that morphological tag). It then obtains the analysis of that word from Morpheus through the API of a locally running service, and renders everything together in an interactive, collapsible HTML view grouped by part of speech. The result is a quick and easy way to browse and inspect how each morph tag relates to the Morpheus analysis.

# 2 - Load N1904-TF with N1904addons <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

The Morpheus lookups need the [betacode](https://github.com/tonyjurg/N1904addons/blob/main/docs/features/betacode.md) feature, so I load [N1904addons](https://github.com/tonyjurg/N1904addons) together with N1904-TF.

In [1]:
# Load the autoreload extension to automatically reload modules before executing code
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
from tf.fabric import Fabric
from tf.app import use

In [3]:
# Load the N1904-TF app and data with the additional features
A = use("CenterBLC/N1904", version="1.0.0", mod="tonyjurg/N1904addons/tf/", hoist=globals())

**Locating corpus resources ...**

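| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |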


Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes

In [None]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
A.dh(A.getCss())

# 3 - Load the morphkit library <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

This is my own library (under development), available via [github.com/tonyjurg/morphkit](https://github.com/tonyjurg/morphkit). It was developed to simplify interactions with the Morpheus service running in my Docker container.

In [7]:
import sys
sys.path.insert(0, "../../morphkit")    # relative to notebook dir
import morphkit
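
The relative path above only resolves correctly when the notebook is started from its own directory. As a minimal sketch, the hypothetical helper `addPackageDir` below (it is not part of morphkit) resolves the directory explicitly and fails early with a clear message if it does not exist:

```python
import sys
from pathlib import Path

def addPackageDir(relativeDir, baseDir=None):
    """Resolve relativeDir against baseDir (default: current working directory)
    and put it at the front of sys.path. Hypothetical helper, for illustration."""
    base = Path(baseDir) if baseDir is not None else Path.cwd()
    packageDir = (base / relativeDir).resolve()
    if not packageDir.is_dir():
        # Fail here rather than with a less obvious ImportError later on
        raise FileNotFoundError(f"Package directory not found: {packageDir}")
    if str(packageDir) not in sys.path:
        sys.path.insert(0, str(packageDir))
    return packageDir
```

Calling `addPackageDir("../../morphkit")` before `import morphkit` would then replace the plain `sys.path.insert` call.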

# 4 - Create a grouped morphology browser<a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

The code below creates the collapsible view using both Text-Fabric and morphkit. For each example word it performs a dynamic lookup of the Morpheus analytical results through the morphkit library.

In [5]:
from collections import defaultdict
from IPython.display import display, HTML
import html

# Base URL for Morpheus API on Docker
baseUrl = "http://10.0.1.156:1315/greek/"

# Step 1: Build one example per morph tag, grouped by part-of-speech initial
# Outer dict: key = first char of morph tag (POS), value = dict of examples
groupedMorphs = defaultdict(dict)

for wordNode in F.otype.s('word'):
    # Get the full morph tag for this word node
    morph = F.morph.v(wordNode)

    # Skip if missing or already recorded for this POS
    if not morph or morph in groupedMorphs[morph[0]]:
        continue

    # Retrieve the Unicode form and beta-code representation
    unicodeForm = F.unicode.v(wordNode)
    betaCode = F.betacode.v(wordNode)

    # Store one example per unique tag, organized under its POS bucket
    groupedMorphs[morph[0]][morph] = {
        "unicode": unicodeForm,
        "betacode": betaCode,
        "morph": morph
    }

# Step 2: Build HTML for interactive, collapsible view of grouped examples

# Helper to build full HTML content once
def buildGroupedMorphsHtml(grouped):
    # Header with summary of content
    htmlContent = "<h3>Unique Morph Tags by POS (1 example each)</h3>"

    # Loop through POS buckets in sorted order
    for pos in sorted(grouped.keys()):
        morphs = grouped[pos]
        # Create a collapsible <details> for each POS group
        htmlContent += f"<details><summary><b>{pos}</b> ({len(morphs)} items)</summary><ul style='margin-left:20px'>"

        # For each morph tag, fetch and display its morphological block
        for morphTag in sorted(morphs.keys()):
            entry = morphs[morphTag]
            betacode = entry["betacode"]
            morphBlock = morphkit.get_word_blocks(betacode, baseUrl)

            # Main list item showing tag, unicode, and beta-code
            htmlContent += f"<li><code>{morphTag}</code> — {entry['unicode']} — <b>{betacode}</b>"

            if morphBlock:
                # Escape HTML and determine if block indicates an error
                escapedBlock = html.escape(f"-----------------------------\n{morphBlock}\n-----------------------------")
                hasError = "Error: No response for" in morphBlock
                # Highlight errors with a warning style
                summaryStyle = "background-color: #fff3cd; border-left: 4px solid orange; padding: 2px;" if hasError else ""
                # The summary label is the same with or without an error
                summaryText = "Morphology details"

                # Nested <details> showing the raw morphology block
                htmlContent += "<details style='margin-left:10px'>"
                htmlContent += f"<summary style='{summaryStyle}'>{summaryText}</summary>"
                htmlContent += f"<pre>{escapedBlock}</pre></details>"

            # Close the list item
            htmlContent += "</li>"

        # Close the unordered list and POS <details>
        htmlContent += "</ul></details>"

    return htmlContent

# Step 3: Render the HTML in the notebook
def displayGroupedMorphs(grouped):
    htmlContent = buildGroupedMorphsHtml(grouped)
    display(HTML(htmlContent))

# Execute display function to show the interactive view
displayGroupedMorphs(groupedMorphs)
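
The grouping in Step 1 can be tried without Text-Fabric or a running Morpheus service. The sketch below repeats the same "first example per tag" logic on a small hand-made list. The four sample rows and the function name `groupFirstExamples` are illustrative assumptions, not data read from N1904-TF:

```python
from collections import defaultdict

# Toy stand-in for the corpus: (morph tag, Unicode form, Beta Code) per word.
# These rows are illustrative and are not read from N1904-TF.
sampleWords = [
    ("N-NSF", "Βίβλος", "*bi/blos"),
    ("N-GSF", "γενέσεως", "gene/sews"),
    ("N-NSF", "γένεσις", "ge/nesis"),        # tag already seen, so it is skipped
    ("V-AAI-3S", "ἐγέννησεν", "e)ge/nnhsen"),
]

def groupFirstExamples(words):
    """Keep the first word seen for each morph tag, bucketed by the tag's first character."""
    grouped = defaultdict(dict)
    for morph, unicodeForm, betaCode in words:
        # Skip words without a tag and tags that already have an example
        if not morph or morph in grouped[morph[0]]:
            continue
        grouped[morph[0]][morph] = {
            "unicode": unicodeForm,
            "betacode": betaCode,
            "morph": morph,
        }
    return grouped

groupedSample = groupFirstExamples(sampleWords)
for pos in sorted(groupedSample):
    print(pos, sorted(groupedSample[pos]))
```

The outer key is the first character of the tag, so all noun tags end up under `N` and all verb tags under `V`, which is what the collapsible view uses as its top-level groups.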

# 5 - Download result as standalone interactive page <a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

In [6]:
from IPython.display import FileLink

# Save to file + provide download link
def saveGroupedMorphsToHtml(grouped, outputPath="grouped_morphology_dynamic.html"):
    htmlContent = buildGroupedMorphsHtml(grouped)
    with open(outputPath, "w", encoding="utf-8") as f:
        f.write(f"<!DOCTYPE html><html><head><meta charset='utf-8'></head><body>{htmlContent}</body></html>")
    print(f"HTML file saved as: {outputPath}")
    display(FileLink(outputPath))

saveGroupedMorphsToHtml(groupedMorphs, "grouped_morphology_dynamic.html")

HTML file saved as: grouped_morphology_dynamic.html
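
Because `buildGroupedMorphsHtml` is called once for the notebook display and once more for the saved file, every example word is looked up in Morpheus twice. One possible way to avoid the second round of requests is to memoize the lookup. The sketch below shows the idea with `functools.lru_cache`. Here `fakeLookup` is a stand-in for the real morphkit call, `cachedLookup` is a hypothetical wrapper, and the URL is a placeholder:

```python
from functools import lru_cache

callCount = 0  # counts how often the (simulated) service is actually contacted

def fakeLookup(betacode, baseUrl):
    """Stand-in for the real lookup; only counts calls and returns a dummy string."""
    global callCount
    callCount += 1
    return f"analysis of {betacode} via {baseUrl}"

@lru_cache(maxsize=None)
def cachedLookup(betacode, baseUrl):
    """Hypothetical wrapper: repeated calls with the same arguments reuse the first result."""
    return fakeLookup(betacode, baseUrl)

placeholderUrl = "http://localhost:1315/greek/"  # placeholder, not the real service address
for _ in range(2):  # simulates building the HTML twice (display, then save)
    for word in ("lo/gos", "qeo/s"):
        cachedLookup(word, placeholderUrl)

print(f"service calls made: {callCount}")
```

Note that a cache like this also keeps error responses, so it would need to be cleared (`cachedLookup.cache_clear()`) after the service has been unavailable.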


# 6 - Attribution and footnotes <a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

This Jupyter notebook used the following sources for the analysis and implementation:

- [Morpheus Morphological Analyzer (Perseus Project)](https://github.com/perseids-tools/morpheus/)
- [Greek Beta Code standard](https://stephanus.tlg.uci.edu/encoding/BCM.pdf)
- [beta-code-py](https://github.com/perseids-tools/beta-code-py)
- ['Parsing Information for Robinson-like parsing tags Adapted from Ulrik Sandborg-Petersen's Description for Tischendorf 8th'](https://github.com/biblicalhumanities/Nestle1904/blob/master/morph/parsing.txt)
- Python package [morphkit](https://tonyjurg.github.io/morphkit/)

Greek base text: Nestle1904 Greek New Testament, edited by Eberhard Nestle, published in 1904 by the British and Foreign Bible Society.
> Nestle, Eberhard. Η Καινή Διαθήκη Novum Testamentum Graece (New York: Fleming H. Revell Company, 1904).

The 1913 reprint, which was transcribed by [Diego Santos](https://sites.google.com/site/nestle1904/home), is available [here](https://archive.org/details/hkainediathekete00lond/). All this material is in the public domain.

Betacode syntax follows the TLG/Perseus convention: [Thesaurus Linguae Graecae / Perseus Project spec.](https://stephanus.tlg.uci.edu/encoding/BCM.pdf)

The conversion code between Unicode and Betacode is available at [GitHub repository perseids-tools/beta-code-py](https://github.com/perseids-tools/beta-code-py).

The [N1904-TF dataset](https://centerblc.github.io/N1904/) is available under the [MIT licence](https://github.com/CenterBLC/N1904/blob/main/LICENSE.md), Copyright (c) 2025 Center of Biblical Languages and Computing (CBLC). Formal reference:
> Tony Jurg, Saulo de Oliveira Cantanhêde, & Oliver Glanz. (2024). *CenterBLC/N1904: Nestle 1904 Text-Fabric data*. Zenodo. DOI: [10.5281/zenodo.13117911](https://doi.org/10.5281/zenodo.13117910).

The Text-Fabric dataset [tonyjurg.github.io/N1904addons](https://tonyjurg.github.io/N1904addons/) is made available under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://github.com/tonyjurg/N1904addons/blob/main/LICENSE.md) license.

The [Anaconda Assistant](https://www.anaconda.com/capability/anaconda-assistant) (using [OpenAI](https://openai.com/) as backend) was used to debug and/or optimize the code in this Jupyter Notebook.

# 7 - Required libraries <a class="anchor" id="bullet7"></a>
##### [Back to ToC](#TOC)

Since the scripts in this notebook use Text-Fabric, they [currently (Apr 2025) require Python >=3.9.0](https://pypi.org/project/text-fabric), together with the following libraries installed in the environment:

``` python
    beta_code
    IPython.display
    json
    morphkit
```

You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
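
To see which libraries are missing before running the notebook, a check along the following lines may help. It is a minimal sketch: `findMissingModules` is a hypothetical helper, and the names passed to it are import names (`tf` is the import name used for Text-Fabric in this notebook). Since morphkit is loaded from a local directory via `sys.path`, it is reported as missing until that path has been added.

```python
import importlib.util

def findMissingModules(moduleNames):
    """Return the names from moduleNames that cannot be located in this environment."""
    missing = []
    for name in moduleNames:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing

print(findMissingModules(["tf", "beta_code", "IPython", "json", "morphkit"]))
```

An empty list means every module was found.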

# 8 - Notebook version details<a class="anchor" id="bullet8"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.3</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>29 April 2025</td>
    </tr>
  </table>
</div>