{ "cells": [ { "cell_type": "markdown", "id": "9bdebd80-43ef-452e-a27c-7b38f9847d0e", "metadata": {}, "source": [ "# Create a JSON Knowledge Graph representing a Text-Fabric dataset (N1904-TF)" ] }, { "cell_type": "markdown", "id": "462dc028-7cbc-43f8-b0ab-b857dc1afb1f", "metadata": {}, "source": [ "## Table of Contents (ToC)\n", "* 1 - Introduction\n", "* 2 - Load the TF dataset\n", "* 3 - Run part of the Doc4TF code\n", "* 4 - Run the extra code\n", "* 5 - The result: a JSON Knowledge Graph\n", "* 6 - Notebook version details" ] }, { "cell_type": "markdown", "id": "6d15763f-4f24-4828-8e5b-c806e471fbf9", "metadata": {}, "source": [ "# 1 - Introduction \n", "##### [Back to ToC](#TOC)\n", "\n", "In this notebook we will create the bare (JSON) Knowledge Graph. To create the source dictionary we will reuse part of the code I created for [Doc4TF](https://github.com/tonyjurg/Doc4TF)." ] }, { "cell_type": "markdown", "id": "1ebfc77b-cc01-49f0-931e-15e2bd4179ef", "metadata": {}, "source": [ "# 2 - Load the TF dataset \n", "##### [Back to ToC](#TOC)" ] }, { "cell_type": "code", "execution_count": 1, "id": "d1cae453-890b-49bf-90aa-082ca82c36d7", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/CenterBLC/N1904/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CenterBLC/N1904/tf/1.0.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.6.1, CenterBLC/N1904/app v3, Search Reference
\n", " Data: CenterBLC - N1904 1.0.0, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots / node% coverage
book275102.93100
chapter260529.92100
verse794417.34100
sentence801117.20100
group89457.0146
clause425068.36258
wg1068686.88533
phrase690071.9095
subphrase1161781.60135
word1377791.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Nestle 1904 Greek New Testament\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "\n", " material after the end of the word\n", "\n", "
\n", "\n", "
\n", " \n", "
int
\n", "\n", " 1 if it is an apposition container\n", "\n", "
\n", "\n", "
\n", "
\n", "articular\n", "
\n", "
int
\n", "\n", " 1 if the sentence, group, clause, phrase or wg has an article\n", "\n", "
\n", "\n", "
\n", "
\n", "before\n", "
\n", "
str
\n", "\n", " this is XML attribute before\n", "\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " book name (full name)\n", "\n", "
\n", "\n", "
\n", "
\n", "bookshort\n", "
\n", "
str
\n", "\n", " book name (abbreviated) from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "case\n", "
\n", "
str
\n", "\n", " grammatical case\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " chapter number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "clausetype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "cls\n", "
\n", "
str
\n", "\n", " this is XML attribute cls\n", "\n", "
\n", "\n", "
\n", "
\n", "cltype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "criticalsign\n", "
\n", "
str
\n", "\n", " this is XML attribute criticalsign\n", "\n", "
\n", "\n", "
\n", "
\n", "crule\n", "
\n", "
str
\n", "\n", " clause rule (from xml attribute Rule)\n", "\n", "
\n", "\n", "
\n", "
\n", "degree\n", "
\n", "
str
\n", "\n", " grammatical degree\n", "\n", "
\n", "\n", "
\n", "
\n", "discontinuous\n", "
\n", "
int
\n", "\n", " 1 if the word is out of sequence in the xml\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " domain\n", "\n", "
\n", "\n", "
\n", "
\n", "framespec\n", "
\n", "
str
\n", "\n", " this is XML attribute framespec\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " this is XML attribute function\n", "\n", "
\n", "\n", "
\n", "
\n", "gender\n", "
\n", "
str
\n", "\n", " grammatical gender\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " English gloss (BGVB)\n", "\n", "
\n", "\n", "
\n", "
\n", "id\n", "
\n", "
str
\n", "\n", " xml id\n", "\n", "
\n", "\n", "
\n", "
\n", "junction\n", "
\n", "
str
\n", "\n", " type of junction\n", "\n", "
\n", "\n", "
\n", "
\n", "lang\n", "
\n", "
str
\n", "\n", " language the text is in\n", "\n", "
\n", "\n", "
\n", "
\n", "lemma\n", "
\n", "
str
\n", "\n", " lexical lemma\n", "\n", "
\n", "\n", "
\n", "
\n", "lemmatranslit\n", "
\n", "
str
\n", "\n", " transliteration of the word lemma\n", "\n", "
\n", "\n", "
\n", "
\n", "ln\n", "
\n", "
str
\n", "\n", " ln\n", "\n", "
\n", "\n", "
\n", "
\n", "mood\n", "
\n", "
str
\n", "\n", " verbal mood\n", "\n", "
\n", "\n", "
\n", "
\n", "morph\n", "
\n", "
str
\n", "\n", " morphological code\n", "\n", "
\n", "\n", "
\n", "
\n", "nodeid\n", "
\n", "
str
\n", "\n", " node id (as in the XML source data)\n", "\n", "
\n", "\n", "
\n", "
\n", "normalized\n", "
\n", "
str
\n", "\n", " lemma normalized\n", "\n", "
\n", "\n", "
\n", "
\n", "note\n", "
\n", "
str
\n", "\n", " annotation of linguistic nature\n", "\n", "
\n", "\n", "
\n", "
\n", "num\n", "
\n", "
int
\n", "\n", " generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " grammatical number\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "person\n", "
\n", "
str
\n", "\n", " grammatical person\n", "\n", "
\n", "\n", "
\n", "
\n", "punctuation\n", "
\n", "
str
\n", "\n", " punctuation found after a word\n", "\n", "
\n", "\n", "
\n", "
\n", "ref\n", "
\n", "
str
\n", "\n", " biblical reference with word counting\n", "\n", "
\n", "\n", "
\n", "
\n", "referent\n", "
\n", "
str
\n", "\n", " number of referent\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " this is XML attribute rela\n", "\n", "
\n", "\n", "
\n", "
\n", "role\n", "
\n", "
str
\n", "\n", " role\n", "\n", "
\n", "\n", "
\n", "
\n", "rule\n", "
\n", "
str
\n", "\n", " syntactical rule\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " part-of-speach\n", "\n", "
\n", "\n", "
\n", "
\n", "strong\n", "
\n", "
int
\n", "\n", " strong number\n", "\n", "
\n", "\n", "
\n", "
\n", "subjrefspec\n", "
\n", "
str
\n", "\n", " this is XML attribute subjrefspec\n", "\n", "
\n", "\n", "
\n", "
\n", "tense\n", "
\n", "
str
\n", "\n", " verbal tense\n", "\n", "
\n", "\n", "
\n", "
\n", "text\n", "
\n", "
str
\n", "\n", " the text of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " material after the end of the word (excluding critical signs)\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " translation of the word surface text according to the Berean Interlinear Bible\n", "\n", "
\n", "\n", "
\n", "
\n", "translit\n", "
\n", "
str
\n", "\n", " transliteration of the word surface text\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " syntactical type (on sentence, group, clause or phrase)\n", "\n", "
\n", "\n", "
\n", "
\n", "typems\n", "
\n", "
str
\n", "\n", " morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)\n", "\n", "
\n", "\n", "
\n", "
\n", "unaccent\n", "
\n", "
str
\n", "\n", " word in unicode characters without accents and diacritical markers\n", "\n", "
\n", "\n", "
\n", "
\n", "unicode\n", "
\n", "
str
\n", "\n", " word in unicode characters plus material after it\n", "\n", "
\n", "\n", "
\n", "
\n", "variant\n", "
\n", "
str
\n", "\n", " this is XML attribute variant\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " verse number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "voice\n", "
\n", "
str
\n", "\n", " verbal voice\n", "\n", "
\n", "\n", "
\n", "
\n", "frame\n", "
\n", "
str
\n", "\n", " frame\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "parent\n", "
\n", "
none
\n", "\n", " parent relationship between words\n", "\n", "
\n", "\n", "
\n", "
\n", "sibling\n", "
\n", "
int
\n", "\n", " this is XML attribute sibling\n", "\n", "
\n", "\n", "
\n", "
\n", "subjref\n", "
\n", "
none
\n", "\n", " number of subject referent\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: CenterBLC/N1904
  3. appPath: C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/app
  4. commit: gdb630837ae89b9468c9e50d13bda05cfd3de4f18
  5. css: ''
  6. dataDisplay:
    • excludedFeatures: []
    • noneValues:
      • none
      • unknown
      • no value
      • NA
    • sectionSep1:
    • sectionSep2: :
    • textFormat: text-orig-full
  7. docs:
    • docBase: https://github.com/CenterBLC/N1904/tree/main/docs
    • docPage: about
    • docRoot: https://github.com/CenterBLC/N1904
    • featureBase:https://github.com/CenterBLC/N1904/blob/main/docs/features/<feature>.md
    • featurePage: README
  8. interfaceDefaults: {fmt: text-orig-full}
  9. isCompatible: True
  10. local: local
  11. localDir:C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/_temp
  12. provenanceSpec:
    • branch: main
    • corpus: Nestle 1904 Greek New Testament
    • doi: 10.5281/zenodo.13117910
    • moduleSpecs: []
    • org: CenterBLC
    • relative: /tf
    • repo: N1904
    • repro: N1904
    • version: 1.0.0
    • webBase: https://learner.bible/text/show_text/nestle1904/
    • webHint: Show this on the website
    • webLang: en
    • webUrl:https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: 1.0.0
  14. typeDisplay:
    • clause:
      • condense: True
      • label: {typ} {function} {rela} \\\\ {cls} {role} {junction}
      • style: ''
    • group:
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • phrase:
      • condense: True
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • sentence:
      • label: {typ} {function} {rela} \\\\ {role} {rule}
      • style: ''
    • subphrase:
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • verse:
      • condense: True
      • label: {book} {chapter}:{verse}
      • style: ''
    • wg:
      • condense: True
      • label: {typems} {role} {rule} {junction}
      • style: ''
    • word:
      • features:
        • lemma
        • sp
      • featuresBare: [gloss]
  15. writing: grc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from tf.app import use\n", "from collections import defaultdict\n", "import json\n", "\n", "# Load the N1904 Text-Fabric dataset\n", "A = use('CenterBLC/N1904', version='1.0.0', hoist=globals())" ] }, { "cell_type": "markdown", "id": "937845a6-75dc-427b-8a3c-14669eb205aa", "metadata": {}, "source": [ "# 3 - Run part of the Doc4TF code \n", "##### [Back to ToC](#TOC)" ] }, { "cell_type": "code", "execution_count": 2, "id": "d51d7b9b-8c04-4272-a4d2-46e2ad94097e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gathering generic details\n", "Analyzing Node Features: ........................................................\n", "Analyzing Edge Features: .....\n", "Finished in 19.82 seconds.\n" ] } ], "source": [ "verbose=False\n", "tableLimit=10\n", "\n", "# Initialize an empty dictionary to store feature data\n", "featureDict = {}\n", "import time\n", "overallTime = time.time()\n", "\n", "def getFeatureDescription(metaData):\n", " \"\"\"\n", " This function looks for the 'description' key in the metadata dictionary. If the key is found,\n", " it returns the corresponding description. 
If the key is not present, it returns a default \n", " message indicating that no description is available.\n", "\n", " Parameters:\n", " metaData (dict): A dictionary containing metadata about a feature.\n", "\n", " Returns:\n", " str: The description of the feature if available, otherwise a default message.\n", " \"\"\"\n", " return metaData.get('description', \"No feature description\")\n", "\n", "def setDataType(metaData):\n", " \"\"\"\n", " This function checks for the 'valueType' key in the metadata. If the key is present, it\n", " returns 'String' if the value is 'str', and 'Integer' for other types. If the 'valueType' key\n", " is not present, it returns 'Unknown'.\n", "\n", " Parameters:\n", " metaData (dict): A dictionary containing metadata, including the 'valueType' of a feature.\n", "\n", " Returns:\n", " str: A string indicating the determined data type ('String', 'Integer', or 'Unknown').\n", " \"\"\"\n", " if 'valueType' in metaData:\n", " return \"String\" if metaData[\"valueType\"] == 'str' else \"Integer\"\n", " return \"Unknown\"\n", "\n", "def processFeature(feature, featureType, featureMethod):\n", " \"\"\"\n", " Processes a given feature by extracting metadata, description, and data type, and then\n", " compiles frequency data for different node types in a feature dictionary. Certain features\n", " are skipped based on their type. 
The processed data is added to a global feature dictionary.\n", "\n", " Parameters:\n", " feature (str): The name of the feature to be processed.\n", " featureType (str): The type of the feature ('Node' or 'Edge').\n", " featureMethod (function): A function to obtain feature data.\n", "\n", " Returns:\n", " None: The function updates a global dictionary with processed feature data and does not return anything.\n", " \"\"\"\n", " \n", " # Obtain the meta data\n", " featureMetaData = featureMethod(feature).meta\n", " featureDescription = getFeatureDescription(featureMetaData)\n", " dataType = setDataType(featureMetaData)\n", "\n", " # Initialize dictionary to store feature frequency data\n", " featureFrequencyDict = {}\n", "\n", " # Skip for specific features based on type\n", " if not (featureType == 'Node' and feature == 'otype') and not (featureType == 'Edge' and feature == 'oslots'):\n", " for nodeType in F.otype.all:\n", " frequencyLists = featureMethod(feature).freqList(nodeType)\n", " \n", " # Calculate the total frequency\n", " if not isinstance(frequencyLists, int):\n", " frequencyTotal = sum(freq for _, freq in frequencyLists)\n", " else:\n", " frequencyTotal = frequencyLists\n", " \n", " # Calculate the number of entries\n", " if not isinstance(frequencyLists, int):\n", " numberOfEntries = len(frequencyLists)\n", " else:\n", " numberOfEntries = 1 if frequencyLists != 0 else 0\n", " # Check the length of the frequency table\n", " truncated = True if numberOfEntries > tableLimit else False\n", " \n", " if not isinstance(frequencyLists, int):\n", " if len(frequencyLists)!=0:\n", " featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': frequencyLists[:tableLimit], 'total': frequencyTotal, 'truncated': truncated}\n", " elif isinstance(frequencyLists, int):\n", " if frequencyLists != 0:\n", " featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': [(\"Link\", frequencyLists)], 'total': frequencyTotal, 'truncated': truncated}\n", "\n", " # Add 
processed feature data to the main dictionary\n", " featureDict[feature] = {'name': feature, 'descr': featureDescription, 'type': featureType, 'datatype': dataType, 'freqlist': featureFrequencyDict}\n", " \n", "########################################################\n", "# MAIN FUNCTION #\n", "########################################################\n", "\n", "########################################################\n", "# Gather general information #\n", "########################################################\n", "\n", "print('Gathering generic details')\n", "\n", "# Initialize default values\n", "corpusName = A.appName\n", "liveName = ''\n", "versionName = A.version\n", "\n", "# Trying to locate corpus information\n", "if A.provenance:\n", " for parts in A.provenance[0]: \n", " if isinstance(parts, tuple):\n", " key, value = parts[0], parts[1]\n", " if verbose: print (f'General info: {key}={value}')\n", " if key == 'corpus': corpusName = value\n", " if key == 'version': versionName = value\n", " # value for live is a tuple\n", " if key == 'live': liveName=value[1]\n", "if liveName is not None and len(liveName)>1:\n", " # an URL was found\n", " pageTitleMD = f'Doc4TF pages for [{corpusName}]({liveName}) (version {versionName})'\n", " pageTitleHTML = f'

Doc4TF pages for {corpusName} (version {versionName})

'\n", "else:\n", " # No URL found\n", " pageTitleMD = f'Doc4TF pages for {corpusName} (version {versionName})'\n", " pageTitleHTML = f'

Doc4TF pages for {corpusName} (version {versionName})

'\n", "\n", "# Overwrite in case user provided a title\n", "if 'customPageTitleMD' in globals():\n", " pageTitleMD = customPageTitleMD\n", "if 'customPageTitleHTML' in globals():\n", " pageTitleHTML = customPageTitleHTML\n", "\n", " \n", "########################################################\n", "# Processing node features #\n", "########################################################\n", "\n", "print('Analyzing Node Features: ', end='')\n", "for nodeFeature in Fall():\n", " if not verbose: print('.', end='') # Progress indicator\n", " processFeature(nodeFeature, 'Node', Fs)\n", " if verbose: print(f'\\nFeature {nodeFeature} = {featureDict[nodeFeature]}\\n') # Print feature data if verbose\n", "\n", "########################################################\n", "# Processing edge features #\n", "########################################################\n", "\n", "print('\\nAnalyzing Edge Features: ', end='')\n", "for edgeFeature in Eall():\n", " if not verbose: print('.', end='') # Progress indicator\n", " processFeature(edgeFeature, 'Edge', Es)\n", " if verbose: print(f'\\nFeature {edgeFeature} = {featureDict[edgeFeature]}\\n') # Print feature data if verbose\n", "\n", "########################################################\n", "# Sorting feature dictionary #\n", "########################################################\n", "\n", "# Sort the feature dictionary alphabetically by keys\n", "sortedFeatureDict = {k: featureDict[k] for k in sorted(featureDict)}\n", "\n", "# Print the sorted feature dictionary if verbose\n", "if verbose:\n", " print(\"\\nSorted Feature Dictionary:\")\n", " for key, value in sortedFeatureDict.items():\n", " print(f\"Feature {key} = {value}\")\n", " \n", "print(f'\\nFinished in {time.time() - overallTime:.2f} seconds.')" ] }, { "cell_type": "markdown", "id": "a781c10a-7e95-46f2-a9b9-f94e725b5b3a", "metadata": {}, "source": [ "# 4 - Run the extra code \n", "##### [Back to ToC](#TOC)" ] }, { "cell_type": "code", "execution_count": null, "outputs": [], "id": 
"e3018ede-7247-4b94-8aba-649b6e41b6c9", "metadata": {}, "source": [ "import json\n", "\n", "knowledgeGraph = {\n", " \"nodes\": {},\n", " \"edges\": []\n", "}\n", "\n", "for featName, featInfo in featureDict.items():\n", " # Determine if \"Node\" or \"Edge\" feature\n", " featureKind = featInfo.get(\"type\", \"Node\") # \"Node\" or \"Edge\"\n", " if featureKind.lower() == \"edge\":\n", " featureType = \"edge_feature\"\n", " else:\n", " featureType = \"node_feature\"\n", "\n", " # Build a namespaced key for this feature\n", " featureKey = f\"feature::{featName}\"\n", "\n", " # Make sure the feature node is in the graph\n", " nodeEntry = knowledgeGraph[\"nodes\"].setdefault(featureKey, {\n", " \"type\": featureType,\n", " \"valid_on\": []\n", " })\n", "\n", " # Store more metadata about the feature\n", " nodeEntry[\"featureName\"] = featInfo.get(\"name\", featName) # e.g. \"after\"\n", " nodeEntry[\"description\"] = featInfo.get(\"descr\", \"\") # e.g. \"material after the end of ...\"\n", " nodeEntry[\"datatype\"] = featInfo.get(\"datatype\", \"\") # e.g. 
\"String\"\n", "\n", " # Collect node types from the freqlist\n", " freqInfo = featInfo.get(\"freqlist\", {})\n", " for freqKey, freqDict in freqInfo.items():\n", " # freqKey might be \"phrase\", \"word\", etc.\n", " # freqDict has \"nodetype\": \"phrase\" (or \"word\"), plus \"freq\", \"total\", ...\n", " nodeTypeName = freqDict.get(\"nodetype\", freqKey)\n", "\n", " # Build a namespaced key for this node type\n", " nodeTypeKey = f\"otype::{nodeTypeName}\"\n", "\n", " # Make sure that node type is declared\n", " if nodeTypeKey not in knowledgeGraph[\"nodes\"]:\n", " knowledgeGraph[\"nodes\"][nodeTypeKey] = {\n", " \"type\": \"node_type\",\n", " \"origName\": nodeTypeName\n", " }\n", "\n", " # Record that this feature is valid on this node type\n", " if nodeTypeKey not in nodeEntry[\"valid_on\"]:\n", " nodeEntry[\"valid_on\"].append(nodeTypeKey)\n", "\n", " # Add an edge with frequency detail\n", " knowledgeGraph[\"edges\"].append({\n", " \"from\": featureKey,\n", " \"to\": nodeTypeKey,\n", " \"relation\": \"valid on\",\n", " \"freqDetail\": freqDict\n", " })\n", "\n", "# Output the JSON\n", "outputPath = \"n1904_knowledge_graph.json\"\n", "with open(outputPath, \"w\", encoding=\"utf-8\") as f:\n", " json.dump(knowledgeGraph, f, indent=2)\n", "\n", "print(f\"Knowledge graph saved to {outputPath}\")\n", "\n", "# Summary\n", "numNodeTypes = sum(1 for n, d in knowledgeGraph[\"nodes\"].items() if d[\"type\"] == \"node_type\")\n", "numFeatures = sum(1 for n, d in knowledgeGraph[\"nodes\"].items() if d[\"type\"].endswith(\"_feature\"))\n", "numEdges = len(knowledgeGraph[\"edges\"])\n", "print(f\" - Node types: {numNodeTypes}\")\n", "print(f\" - Features: {numFeatures}\")\n", "print(f\" - Edges: {numEdges}\")" ] }, { "cell_type": "markdown", "id": "1fa83632-db4a-49f6-8740-418183b986f2", "metadata": {}, "source": [ "# 5 - The result: a JSON Knowledge Graph \n", "##### [Back to ToC](#TOC)" ] }, { "cell_type": "markdown", "id": "80a020d1-5a7e-46a0-9c78-83ddcfc348f4", 
"metadata": {}, "source": [ "The resulting JSON is the actual Knowledge Graph which will be used as input for the [other notebook](generate_cytoscape_html.ipynb)." ] }, { "cell_type": "markdown", "id": "0eba060e-a680-4da1-a545-546584aa6214", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "# 6 - Notebook version details\n", "##### [Back to ToC](#TOC)\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AuthorTony Jurg
Version1.1
Date3 April 2025
\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }