{
"cells": [
{
"cell_type": "markdown",
"id": "9bdebd80-43ef-452e-a27c-7b38f9847d0e",
"metadata": {},
"source": [
"# Create a JSON Knowledge Graph representing a Text-Fabric dataset (N1904-TF)"
]
},
{
"cell_type": "markdown",
"id": "462dc028-7cbc-43f8-b0ab-b857dc1afb1f",
"metadata": {},
"source": [
"## Table of content (ToC)\n",
"* 1 - Introduction\n",
"* 2 - Load the TF dataset\n",
"* 3 - Run part of the Doc4TF code\n",
"* 4 - Run the extra code\n",
"* 5 - The result: a JSON Knowledge Graph\n",
"* 6 - Notebook version details"
]
},
{
"cell_type": "markdown",
"id": "6d15763f-4f24-4828-8e5b-c806e471fbf9",
"metadata": {},
"source": [
"# 1 - Introduction \n",
"##### [Back to ToC](#TOC)\n",
"\n",
"In this notebook we will create the bare (JSON) Knowlede Graph. To create the source dictionairy we will re-use part of the code I created for [Doc4TF](https://github.com/tonyjurg/Doc4TF)."
]
},
{
"cell_type": "markdown",
"id": "1ebfc77b-cc01-49f0-931e-15e2bd4179ef",
"metadata": {},
"source": [
"## 2 - Load the TF dataset \n",
"##### [Back to ToC](#TOC)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d1cae453-890b-49bf-90aa-082ca82c36d7",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"**Locating corpus resources ...**"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"app: ~/text-fabric-data/github/CenterBLC/N1904/app"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/github/CenterBLC/N1904/tf/1.0.0"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" TF: TF API 12.6.1, CenterBLC/N1904/app v3, Search Reference
\n",
" Data: CenterBLC - N1904 1.0.0, Character table, Feature docs
\n",
" Node types
\n",
"\n",
" \n",
" Name | \n",
" # of nodes | \n",
" # slots / node | \n",
" % coverage | \n",
"
\n",
"\n",
"\n",
" book | \n",
" 27 | \n",
" 5102.93 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" chapter | \n",
" 260 | \n",
" 529.92 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" verse | \n",
" 7944 | \n",
" 17.34 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" sentence | \n",
" 8011 | \n",
" 17.20 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" group | \n",
" 8945 | \n",
" 7.01 | \n",
" 46 | \n",
"
\n",
"\n",
"\n",
" clause | \n",
" 42506 | \n",
" 8.36 | \n",
" 258 | \n",
"
\n",
"\n",
"\n",
" wg | \n",
" 106868 | \n",
" 6.88 | \n",
" 533 | \n",
"
\n",
"\n",
"\n",
" phrase | \n",
" 69007 | \n",
" 1.90 | \n",
" 95 | \n",
"
\n",
"\n",
"\n",
" subphrase | \n",
" 116178 | \n",
" 1.60 | \n",
" 135 | \n",
"
\n",
"\n",
"\n",
" word | \n",
" 137779 | \n",
" 1.00 | \n",
" 100 | \n",
"
\n",
"
\n",
" Sets: no custom sets
\n",
" Features:
\n",
"Nestle 1904 Greek New Testament
\n",
" \n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
material after the end of the word\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
1 if it is an apposition container\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
1 if the sentence, group, clause, phrase or wg has an article\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute before\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
book name (full name)\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
book name (abbreviated) from ref attribute in xml\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
grammatical case\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
chapter number, from ref attribute in xml\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
clause type\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute cls\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
clause type\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute criticalsign\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
clause rule (from xml attribute Rule)\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
grammatical degree\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
1 if the word is out of sequence in the xml\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
domain\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute framespec\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute function\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
grammatical gender\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
English gloss (BGVB)\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
xml id\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
type of junction\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
language the text is in\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
lexical lemma\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
transliteration of the word lemma\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
ln\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
verbal mood\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
morphological code\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
node id (as in the XML source data)\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
lemma normalized\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
annotation of linguistic nature\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
grammatical number\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
grammatical person\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
punctuation found after a word\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
biblical reference with word counting\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
number of referent\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute rela\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
role\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
syntactical rule\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
part-of-speach\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
strong number\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute subjrefspec\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
verbal tense\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
the text of a word\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
material after the end of the word (excluding critical signs)\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
translation of the word surface text according to the Berean Interlinear Bible\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
transliteration of the word surface text\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
syntactical type (on sentence, group, clause or phrase)\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
word in unicode characters without accents and diacritical markers\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
word in unicode characters plus material after it\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
this is XML attribute variant\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
verse number, from ref attribute in xml\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
verbal voice\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
frame\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
none
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
none
\n",
"\n",
"
parent relationship between words\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
this is XML attribute sibling\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
none
\n",
"\n",
"
number of subject referent\n",
"\n",
"
\n",
"\n",
"
\n",
" \n",
"\n",
" Settings:
specified
- apiVersion:
3
- appName:
CenterBLC/N1904
- appPath:
C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/app
- commit:
gdb630837ae89b9468c9e50d13bda05cfd3de4f18
- css:
''
dataDisplay:
- excludedFeatures:
[]
noneValues:
- sectionSep1:
- sectionSep2:
:
- textFormat:
text-orig-full
docs:
- docBase:
https://github.com/CenterBLC/N1904/tree/main/docs
- docPage:
about
- docRoot:
https://github.com/CenterBLC/N1904
featureBase:
https://github.com/CenterBLC/N1904/blob/main/docs/features/<feature>.md
- featurePage:
README
- interfaceDefaults: {fmt:
text-orig-full
} - isCompatible:
True
- local:
local
localDir:
C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/_temp
provenanceSpec:
- branch:
main
- corpus:
Nestle 1904 Greek New Testament
- doi:
10.5281/zenodo.13117910
- moduleSpecs:
[]
- org:
CenterBLC
- relative:
/tf
- repo:
N1904
- repro:
N1904
- version:
1.0.0
- webBase:
https://learner.bible/text/show_text/nestle1904/
- webHint:
Show this on the website
- webLang:
en
webUrl:
https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
- webUrlLex:
{webBase}/word?version={version}&id=<lid>
- release:
1.0.0
typeDisplay:
clause:
- condense:
True
- label:
{typ} {function} {rela} \\\\ {cls} {role} {junction}
- style:
''
group:
- label:
{typ} {function} {rela} \\\\ {typems} {role} {rule}
- style:
''
phrase:
- condense:
True
- label:
{typ} {function} {rela} \\\\ {typems} {role} {rule}
- style:
''
sentence:
- label:
{typ} {function} {rela} \\\\ {role} {rule}
- style:
''
subphrase:
- label:
{typ} {function} {rela} \\\\ {typems} {role} {rule}
- style:
''
verse:
- condense:
True
- label:
{book} {chapter}:{verse}
- style:
''
wg:
- condense:
True
- label:
{typems} {role} {rule} {junction}
- style:
''
word:
features:
- featuresBare: [
gloss
]
- writing:
grc
\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from tf.app import use\n",
"from collections import defaultdict\n",
"import json\n",
"\n",
"# Load the N1904 Text-Fabric dataset\n",
"A = use('CenterBLC/N1904', version='1.0.0', hoist=globals())"
]
},
{
"cell_type": "markdown",
"id": "937845a6-75dc-427b-8a3c-14669eb205aa",
"metadata": {},
"source": [
"# 3 - Run part of the Doc4TF code \n",
"##### [Back to ToC](#TOC)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d51d7b9b-8c04-4272-a4d2-46e2ad94097e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gathering generic details\n",
"Analyzing Node Features: ........................................................\n",
"Analyzing Edge Features: .....\n",
"Finished in 19.82 seconds.\n"
]
}
],
"source": [
"verbose=False\n",
"tableLimit=10\n",
"\n",
"# Initialize an empty dictionary to store feature data\n",
"featureDict = {}\n",
"import time\n",
"overallTime = time.time()\n",
"\n",
"def getFeatureDescription(metaData):\n",
" \"\"\"\n",
" This function looks for the 'description' key in the metadata dictionary. If the key is found,\n",
" it returns the corresponding description. If the key is not present, it returns a default \n",
" message indicating that no description is available.\n",
"\n",
" Parameters:\n",
" metaData (dict): A dictionary containing metadata about a feature.\n",
"\n",
" Returns:\n",
" str: The description of the feature if available, otherwise a default message.\n",
" \"\"\"\n",
" return metaData.get('description', \"No feature description\")\n",
"\n",
"def setDataType(metaData):\n",
" \"\"\"\n",
" This function checks for the 'valueType' key in the metadata. If the key is present, it\n",
" returns 'String' if the value is 'str', and 'Integer' for other types. If the 'valueType' key\n",
" is not present, it returns 'Unknown'.\n",
"\n",
" Parameters:\n",
" metaData (dict): A dictionary containing metadata, including the 'valueType' of a feature.\n",
"\n",
" Returns:\n",
" str: A string indicating the determined data type ('String', 'Integer', or 'Unknown').\n",
" \"\"\"\n",
" if 'valueType' in metaData:\n",
" return \"String\" if metaData[\"valueType\"] == 'str' else \"Integer\"\n",
" return \"Unknown\"\n",
"\n",
"def processFeature(feature, featureType, featureMethod):\n",
" \"\"\"\n",
" Processes a given feature by extracting metadata, description, and data type, and then\n",
" compiles frequency data for different node types in a feature dictionary. Certain features\n",
" are skipped based on their type. The processed data is added to a global feature dictionary.\n",
"\n",
" Parameters:\n",
" feature (str): The name of the feature to be processed.\n",
" featureType (str): The type of the feature ('Node' or 'Edge').\n",
" featureMethod (function): A function to obtain feature data.\n",
"\n",
" Returns:\n",
" None: The function updates a global dictionary with processed feature data and does not return anything.\n",
" \"\"\"\n",
" \n",
" # Obtain the meta data\n",
" featureMetaData = featureMethod(feature).meta\n",
" featureDescription = getFeatureDescription(featureMetaData)\n",
" dataType = setDataType(featureMetaData)\n",
"\n",
" # Initialize dictionary to store feature frequency data\n",
" featureFrequencyDict = {}\n",
"\n",
" # Skip for specific features based on type\n",
" if not (featureType == 'Node' and feature == 'otype') and not (featureType == 'Edge' and feature == 'oslots'):\n",
" for nodeType in F.otype.all:\n",
" frequencyLists = featureMethod(feature).freqList(nodeType)\n",
" \n",
" # Calculate the total frequency\n",
" if not isinstance(frequencyLists, int):\n",
" frequencyTotal = sum(freq for _, freq in frequencyLists)\n",
" else:\n",
" frequencyTotal = frequencyLists\n",
" \n",
" # Calculate the number of entries\n",
" if not isinstance(frequencyLists, int):\n",
" numberOfEntries = len(frequencyLists)\n",
" else:\n",
" numberOfEntries = 1 if frequencyLists != 0 else 0\n",
" # Check the length of the frequency table\n",
" truncated = True if numberOfEntries > tableLimit else False\n",
" \n",
" if not isinstance(frequencyLists, int):\n",
" if len(frequencyLists)!=0:\n",
" featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': frequencyLists[:tableLimit], 'total': frequencyTotal, 'truncated': truncated}\n",
" elif isinstance(frequencyLists, int):\n",
" if frequencyLists != 0:\n",
" featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': [(\"Link\", frequencyLists)], 'total': frequencyTotal, 'truncated': truncated}\n",
"\n",
" # Add processed feature data to the main dictionary\n",
" featureDict[feature] = {'name': feature, 'descr': featureDescription, 'type': featureType, 'datatype': dataType, 'freqlist': featureFrequencyDict}\n",
" \n",
"########################################################\n",
"# MAIN FUNCTION #\n",
"########################################################\n",
"\n",
"########################################################\n",
"# Gather general information #\n",
"########################################################\n",
"\n",
"print('Gathering generic details')\n",
"\n",
"# Initialize default values\n",
"corpusName = A.appName\n",
"liveName = ''\n",
"versionName = A.version\n",
"\n",
"# Trying to locate corpus information\n",
"if A.provenance:\n",
" for parts in A.provenance[0]: \n",
" if isinstance(parts, tuple):\n",
" key, value = parts[0], parts[1]\n",
" if verbose: print (f'General info: {key}={value}')\n",
" if key == 'corpus': corpusName = value\n",
" if key == 'version': versionName = value\n",
" # value for live is a tuple\n",
" if key == 'live': liveName=value[1]\n",
"if liveName is not None and len(liveName)>1:\n",
" # an URL was found\n",
" pageTitleMD = f'Doc4TF pages for [{corpusName}]({liveName}) (version {versionName})'\n",
" pageTitleHTML = f'Doc4TF pages for {corpusName} (version {versionName})
'\n",
"else:\n",
" # No URL found\n",
" pageTitleMD = f'Doc4TF pages for {corpusName} (version {versionName})'\n",
" pageTitleHTML = f'Doc4TF pages for {corpusName} (version {versionName})
'\n",
"\n",
"# Overwrite in case user provided a title\n",
"if 'customPageTitleMD_' in globals():\n",
" pageTitleMD = customPageTitleMD\n",
"if 'customPageTitleHTML' in globals():\n",
" pageTitleHTML = customPageTitleHTML\n",
"\n",
" \n",
"########################################################\n",
"# Processing node features #\n",
"########################################################\n",
"\n",
"print('Analyzing Node Features: ', end='')\n",
"for nodeFeature in Fall():\n",
" if not verbose: print('.', end='') # Progress indicator\n",
" processFeature(nodeFeature, 'Node', Fs)\n",
" if verbose: print(f'\\nFeature {nodeFeature} = {featureDict[nodeFeature]}\\n') # Print feature data if verbose\n",
"\n",
"########################################################\n",
"# Processing edge features #\n",
"########################################################\n",
"\n",
"print('\\nAnalyzing Edge Features: ', end='')\n",
"for edgeFeature in Eall():\n",
" if not verbose: print('.', end='') # Progress indicator\n",
" processFeature(edgeFeature, 'Edge', Es)\n",
" if verbose: print(f'\\nFeature {edgeFeature} = {featureDict[edgeFeature]}\\n') # Print feature data if verbose\n",
"\n",
"########################################################\n",
"# Sorting feature dictionary #\n",
"########################################################\n",
"\n",
"# Sort the feature dictionary alphabetically by keys\n",
"sortedFeatureDict = {k: featureDict[k] for k in sorted(featureDict)}\n",
"\n",
"# Print the sorted feature dictionary if verbose\n",
"if verbose:\n",
" print(\"\\nSorted Feature Dictionary:\")\n",
" for key, value in sortedFeatureDict.items():\n",
" print(f\"Feature {key} = {value}\")\n",
" \n",
"print(f'\\nFinished in {time.time() - overallTime:.2f} seconds.')"
]
},
{
"cell_type": "markdown",
"id": "a781c10a-7e95-46f2-a9b9-f94e725b5b3a",
"metadata": {},
"source": [
"# 4 - Run the extra code \n",
"##### [Back to ToC](#TOC)"
]
},
{
"cell_type": "markdown",
"id": "e3018ede-7247-4b94-8aba-649b6e41b6c9",
"metadata": {},
"source": [
"import json\n",
"\n",
"knowledgeGraph = {\n",
" \"nodes\": {},\n",
" \"edges\": []\n",
"}\n",
"\n",
"for featName, featInfo in featureDict.items():\n",
" # Determine if \"Node\" or \"Edge\" feature\n",
" featureKind = featInfo.get(\"type\", \"Node\") # \"Node\" or \"Edge\"\n",
" if featureKind.lower() == \"edge\":\n",
" featureType = \"edge_feature\"\n",
" else:\n",
" featureType = \"node_feature\"\n",
"\n",
" # Build a namespaced key for this feature\n",
" featureKey = f\"feature::{featName}\"\n",
"\n",
" # Make sure the feature node is in the graph\n",
" nodeEntry = knowledgeGraph[\"nodes\"].setdefault(featureKey, {\n",
" \"type\": featureType,\n",
" \"valid_on\": []\n",
" })\n",
"\n",
" # Store more metadata about the feature\n",
" nodeEntry[\"featureName\"] = featInfo.get(\"name\", featName) # e.g. \"after\"\n",
" nodeEntry[\"description\"] = featInfo.get(\"descr\", \"\") # e.g. \"material after the end of ...\"\n",
" nodeEntry[\"datatype\"] = featInfo.get(\"datatype\", \"\") # e.g. \"String\"\n",
"\n",
" # Collect node types from the freqlist\n",
" freqInfo = featInfo.get(\"freqlist\", {})\n",
" for freqKey, freqDict in freqInfo.items():\n",
" # freqKey might be \"phrase\", \"word\", etc.\n",
" # freqDict has \"nodetype\": \"phrase\" (or \"word\"), plus \"freq\", \"total\", ...\n",
" nodeTypeName = freqDict.get(\"nodetype\", freqKey)\n",
"\n",
" # Build a namespaced key for this node type\n",
" nodeTypeKey = f\"otype::{nodeTypeName}\"\n",
"\n",
" # Make sure that node type is declared\n",
" if nodeTypeKey not in knowledgeGraph[\"nodes\"]:\n",
" knowledgeGraph[\"nodes\"][nodeTypeKey] = {\n",
" \"type\": \"node_type\",\n",
" \"origName\": nodeTypeName\n",
" }\n",
"\n",
" # Record that this feature is valid on this node type\n",
" if nodeTypeKey not in nodeEntry[\"valid_on\"]:\n",
" nodeEntry[\"valid_on\"].append(nodeTypeKey)\n",
"\n",
" # Add an edge with frequency detail\n",
" knowledgeGraph[\"edges\"].append({\n",
" \"from\": featureKey,\n",
" \"to\": nodeTypeKey,\n",
" \"relation\": \"valid on\",\n",
" \"freqDetail\": freqDict\n",
" })\n",
"\n",
"# Output the JSON\n",
"outputPath = \"n1904_knowledge_graph.json\"\n",
"with open(outputPath, \"w\", encoding=\"utf-8\") as f:\n",
" json.dump(knowledgeGraph, f, indent=2)\n",
"\n",
"print(f\"Knowledge graph saved to {outputPath}\")\n",
"\n",
"# Summary\n",
"numNodeTypes = sum(1 for n, d in knowledgeGraph[\"nodes\"].items() if d[\"type\"] == \"node_type\")\n",
"numFeatures = sum(1 for n, d in knowledgeGraph[\"nodes\"].items() if d[\"type\"].endswith(\"_feature\"))\n",
"numEdges = len(knowledgeGraph[\"edges\"])\n",
"print(f\" - Node types: {numNodeTypes}\")\n",
"print(f\" - Features: {numFeatures}\")\n",
"print(f\" - Edges: {numEdges}\")"
]
},
{
"cell_type": "markdown",
"id": "1fa83632-db4a-49f6-8740-418183b986f2",
"metadata": {},
"source": [
"# 5 - The result: a JSON Knowledge Graph \n",
"##### [Back to ToC](#TOC)"
]
},
{
"cell_type": "markdown",
"id": "80a020d1-5a7e-46a0-9c78-83ddcfc348f4",
"metadata": {},
"source": [
"The resulting JSON is the actual Knowledge Graph which will be used as input for the [other notebook](generate_cytoscape_html.ipynb)."
]
},
{
"cell_type": "markdown",
"id": "0eba060e-a680-4da1-a545-546584aa6214",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"# 6 - Notebook version details\n",
"##### [Back to ToC](#TOC)\n",
"\n",
"\n",
"
\n",
" \n",
" Author | \n",
" Tony Jurg | \n",
"
\n",
" \n",
" Version | \n",
" 1.1 | \n",
"
\n",
" \n",
" Date | \n",
" 3 April 2025 | \n",
"
\n",
"
\n",
"
"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}