# Decode the unresolved words

## Table of contents (ToC)<a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Identify the words which could not be looked up</a>
* <a href="#bullet2">2 - Convert missing word list from Betacode to Unicode</a>
* <a href="#bullet3">3 - Analyse the list</a>
* <a href="#bullet7">7 - Required libraries</a>
* <a href="#bullet8">8 - Notebook version details</a>


# 1 - Identify the words which could not be looked up<a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

Note: The following cell was frozen (i.e. converted to a RAW cell) to prevent it from being executed, because its input files are no longer available.

Extracted 653 unresolved BetaCode words and saved to unresolvedBetacode.txt.

# 2 - Convert missing word list from Betacode to Unicode<a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)
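
The cells in this notebook obtain the Unicode form of each word by looking it up in N1904-TF. For comparison, the sketch below shows how a Beta Code string could be converted directly, without a corpus: each letter is mapped to its Greek counterpart, each diacritic sign to a combining character, and the result is composed with NFC normalisation. This is an illustrative alternative and not the method used here; the helper name `betaToUnicode` is hypothetical, only lowercase letters and the listed diacritics are handled, and it is an assumption that the input follows the common letter-then-diacritics order.

```python
import unicodedata

# Beta Code letters mapped to lowercase Greek letters
BETA_LETTERS = dict(zip("abgdezhqiklmncoprstufxyw",
                        "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic signs mapped to Unicode combining characters
BETA_DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}

def betaToUnicode(word):
    """Convert one lowercase Beta Code word to precomposed Unicode Greek.

    Hypothetical helper: characters outside the two tables above
    (for example the capital marker '*') are passed through unchanged.
    """
    chars = [BETA_LETTERS.get(c) or BETA_DIACRITICS.get(c, c) for c in word.lower()]
    # NFC merges each base letter with the combining marks that follow it
    text = unicodedata.normalize("NFC", "".join(chars))
    # A sigma at the end of a word takes its final form
    if text.endswith("σ"):
        text = text[:-1] + "ς"
    return text

print(betaToUnicode("lo/gos"))      # λόγος
print(betaToUnicode("a)/nqrwpos"))  # ἄνθρωπος
```

A table-driven approach like this keeps the mapping easy to audit, but a maintained converter such as the `beta_code` package listed under the required libraries is likely the safer choice for real data.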

## 2.1 - Load N1904-TF with the N1904addons feature set

In [1]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [5]:
# load the N1904 app and data
N1904 = use("CenterBLC/N1904", version="1.0.0", mod="tonyjurg/N1904addons/tf/", hoist=globals())

**Locating corpus resources ...**

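| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |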


Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes

# 3 - Analyse the list<a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

In [6]:
# File containing Greek words (one word per line)
wordsFile = 'unresolvedUnicode.txt'

# Output file for Louw-Nida domains
outputFile = 'greekWordFeatures.csv'

In [7]:
import json

# Paths to input and output files
inputFile = 'unresolvedBetacode.txt'      # File containing unresolved words in BetaCode
outputFile = 'unresolvedUnicode.json'    # File to save the converted Beta Code words

# Read Greek words in BetaCode from the input file
with open(inputFile, 'r', encoding='utf-8') as inFile:
    greekWords = [word.strip() for word in inFile.read().splitlines()]

# Initialize dictionary to store unique entries
uniqueWordsDict = {}

# Index all word nodes by their BetaCode form in a single pass over the corpus,
# so that each lookup below is a dictionary access instead of a full scan
nodesByBetacode = {}
for node in F.otype.s('word'):
    nodesByBetacode.setdefault(F.betacode.v(node), []).append(node)

# Iterate over the words to look them up in Text-Fabric
for word in greekWords:
    for node in nodesByBetacode.get(word, []):
        # Retrieve all relevant attributes
        unicode = F.unicode.v(node)
        gender = F.gender.v(node)
        lemma = F.lemma.v(node)
        morph = F.morph.v(node)
        mood = F.mood.v(node)
        number = F.number.v(node)
        person = F.person.v(node)
        pos = F.sp.v(node)
        tense = F.tense.v(node)
        typems = F.typems.v(node)
        voice = F.voice.v(node)

        # Create a dictionary of the current entry
        entry = {
            "betacode": word,
            "unicode": unicode,
            "gender": gender,
            "lemma": lemma,
            "morph": morph,
            "mood": mood,
            "number": number,
            "person": person,
            "pos": pos,
            "tense": tense,
            "typems": typems,
            "voice": voice,
        }

        # Create or update the entry for the Unicode word
        if unicode not in uniqueWordsDict:
            uniqueWordsDict[unicode] = {}

        # Check if the entry already exists in the sub-dictionary
        if entry not in uniqueWordsDict[unicode].values():
            currentIndex = len(uniqueWordsDict[unicode]) + 1
            uniqueWordsDict[unicode][str(currentIndex)] = entry

# Write the dictionary to the output JSON file
with open(outputFile, 'w', encoding='utf-8') as outFile:
    json.dump(uniqueWordsDict, outFile, ensure_ascii=False, indent=4)

print(f"Processed {len(greekWords)} words. Unique entries saved to {outputFile}.")


Processed 653 words. Unique entries saved to unresolvedUnicode.json.
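
The JSON file written by the cell above maps each Unicode word form to a sub-dictionary whose keys "1", "2", ... number the distinct analyses found for that form. A minimal sketch of how such a file could be read back and summarised follows. The helper name `countAnalyses` and the two sample entries are made up for illustration (the values are placeholders, not data from N1904-TF), and the sample is held in a string so that the block runs without the real `unresolvedUnicode.json`.

```python
import json

# Illustrative sample in the same shape as the output of the cell above
# (placeholder values, not actual N1904-TF data)
sampleJson = json.dumps({
    "formA": {
        "1": {"betacode": "formA-beta", "lemma": "lemmaA", "pos": "noun"},
        "2": {"betacode": "formA-beta", "lemma": "lemmaA", "pos": "adjective"},
    },
    "formB": {
        "1": {"betacode": "formB-beta", "lemma": "lemmaB", "pos": "verb"},
    },
}, ensure_ascii=False)

def countAnalyses(wordsDict):
    """Return {word form: number of distinct analyses stored for it}."""
    return {form: len(analyses) for form, analyses in wordsDict.items()}

# With the real file, the data would instead be loaded from
# 'unresolvedUnicode.json' using json.load on an open file handle
uniqueWords = json.loads(sampleJson)
counts = countAnalyses(uniqueWords)

# Collect the forms for which more than one analysis was stored
ambiguous = sorted(form for form, n in counts.items() if n > 1)
print(f"{len(counts)} forms, {len(ambiguous)} with more than one analysis: {ambiguous}")
```

Listing the forms with several analyses is one way to see which unresolved words are morphologically ambiguous in the corpus.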


# 7 - Required libraries <a class="anchor" id="bullet7"></a>
##### [Back to ToC](#TOC)

Since the scripts in this notebook use Text-Fabric, they [currently (Apr 2025) require Python >=3.9.0](https://pypi.org/project/text-fabric), together with the following libraries installed in the environment:

``` python
    beta_code
    IPython.display
    json
    morphkit
```

You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
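
Before installing anything, it can help to see which of the listed libraries are actually absent. The following is a minimal sketch using only the standard library; the helper name `findMissing` is hypothetical, and `tf` is included on the assumption that it is the import name of Text-Fabric, as used in the import cell of this notebook.

```python
import importlib.util

def findMissing(moduleNames):
    """Return the names from moduleNames that cannot be located for import."""
    missing = []
    for name in moduleNames:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ModuleNotFoundError, ValueError):
            # ModuleNotFoundError is raised for dotted names whose parent
            # package is absent; ValueError for modules without a spec
            found = False
        if not found:
            missing.append(name)
    return missing

required = ["tf", "beta_code", "IPython.display", "json", "morphkit"]
missing = findMissing(required)
print("Missing:", ", ".join(missing) if missing else "none")
```

Note that `json` is part of the Python standard library and never needs installing, and that the name used with `pip` can differ from the import name (Text-Fabric is imported as `tf` but published as `text-fabric`).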

# 8 - Notebook version details<a class="anchor" id="bullet8"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.0</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>19 March 2025</td>
    </tr>
  </table>
</div>