The following bash script looks up the morphology of all Greek words in the Greek New Testament (GNT). It accepts a text file containing all GNT words in Betacode as input. The script runs within a Docker container, where it reads the words from the input file and pipes them, using `echo`, to `cruncher`, the Morpheus binary that performs the actual morphological analysis.
#!/bin/bash

# Variables
INPUT_FILE="gnt_words.txt"                 # Input file with GNT words
OUTPUT_FILE="gnt_morphology_results.txt"   # Output file for morphology results
# Command used to call cruncher. The leading "env" is needed: a NAME=value
# prefix that comes from an expanded variable is not treated as an assignment.
CRUNCHER_COMMAND="env MORPHLIB=stemlib bin/cruncher"

# Check if the input file exists
if [[ ! -f "$INPUT_FILE" ]]; then
    echo "Error: Input file '$INPUT_FILE' not found."
    exit 1
fi

# Clear or create the output file
: > "$OUTPUT_FILE"

echo "Processing words in '$INPUT_FILE'..."

# Process each word
while IFS= read -r word; do
    # Run cruncher and capture the result (word splitting of the variable is intended)
    result=$(echo "$word" | $CRUNCHER_COMMAND -d)

    # Append the result to the output file
    if [[ -n "$result" ]]; then
        echo "Word: $word" >> "$OUTPUT_FILE"
        echo "$result" >> "$OUTPUT_FILE"
        echo "-----------------------------" >> "$OUTPUT_FILE"
    else
        echo "Word: $word" >> "$OUTPUT_FILE"
        echo "Error: No response for '$word'" >> "$OUTPUT_FILE"
        echo "-----------------------------" >> "$OUTPUT_FILE"
    fi
done < "$INPUT_FILE"

echo "Morphology lookup complete. Results saved to '$OUTPUT_FILE'."
To run the script, first open a terminal (container console) in Portainer. In my case the script is called `process_words.sh`:
root@morpheus:/# /mnt/process_words.sh
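Once a run has finished, the output file can be summarised from the same console. The following is a minimal sketch that assumes the output layout written by the script above (one `Word: ` line per input word and an `Error: No response` line for each failure). The function name `summarize_results` is my own and not part of Morpheus.

```shell
#!/bin/sh
# Sketch: count analysed and unanalysed words in the results file.
# Assumes cruncher's own output never starts a line with "Word: ".
summarize_results() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    total=$(grep -c '^Word: ' "$1" || true)
    failed=$(grep -c '^Error: No response' "$1" || true)
    echo "analysed: $((total - failed)), no response: $failed"
}

# Example call:
#   summarize_results gnt_morphology_results.txt
```

A high "no response" count is a hint to check the Betacode in the input file or to try the switches described further below.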
Variables: The script contains the following variables: `INPUT_FILE`, which points to the file with the Greek words; `OUTPUT_FILE`, the name of the file where the morphological analysis is saved; and `CRUNCHER_COMMAND`, which defines how the `cruncher` binary is invoked, including the `MORPHLIB` environment variable and any desired options.
Processing Each Word: For each word in the input file, the script pipes the word into `cruncher` using `echo` and appends the result to the output file, along with the original input word.
Error Handling: If no result is returned, an error message (e.g., "Error: No response for ...") is recorded in the output file.
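One shell detail is worth checking when adapting `CRUNCHER_COMMAND`: the shell only treats `NAME=value` as an environment assignment when it is written literally in front of a command, not when it comes out of an expanded variable. If every word ends up with "No response", this may be the reason. The sketch below illustrates the behaviour, using `sh -c` as a stand-in for `bin/cruncher` (an assumption, so that it runs without Morpheus); the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: why "NAME=value command" stored in a variable does not set NAME.
# 'sh -c' is a stand-in for bin/cruncher so this runs without Morpheus.

# A leading NAME=value that comes out of an expansion is taken as the
# command name, so this invocation fails with "command not found".
PLAIN_CMD="MORPHLIB=stemlib sh -c"
if plain_out=$($PLAIN_CMD 'echo "$MORPHLIB"' 2>/dev/null); then
    plain_ok=yes
else
    plain_ok=no
fi

# Prefixing with env turns the assignment into an ordinary argument of env.
ENV_CMD="env MORPHLIB=stemlib sh -c"
env_out=$($ENV_CMD 'echo "$MORPHLIB"')

# Exporting the variable and storing only the command also works.
export_out=$(export MORPHLIB=stemlib; CMD="sh -c"; $CMD 'echo "$MORPHLIB"')

echo "plain variable works: $plain_ok"
echo "with env:             $env_out"
echo "with export:          $export_out"
```

Either the `env` prefix or an `export MORPHLIB=stemlib` line near the top of the script keeps the command in a single variable while still passing `MORPHLIB` to the binary.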
First check that `cruncher` works, for instance by running `echo 'a)/nqrwpos' | MORPHLIB=stemlib bin/cruncher -S` successfully from the terminal.
Prepare the input file, say `gnt_words.txt`, with one Greek word in Betacode per line. Move the file over to the container.
Move the script over to the Docker environment (in my case I named it `process_words.sh`).
Important! Make the script executable inside the Docker environment:
chmod +x process_words.sh
Now run the script from the directory where it is stored (in my case `/mnt`, the location shared with the Synology host). If you prefer to start from another location, add the path to the script:
./process_words.sh
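For the input-file step above, it can help to remove blank lines and duplicate words first, so that each form is analysed only once. This is an optional sketch, not part of the original workflow; the function name `unique_words` and the file name `raw_list.txt` are examples of my own.

```shell
#!/bin/sh
# Sketch: reduce a raw word list to one unique, non-empty word per line.
# Usage: unique_words raw_list.txt > gnt_words.txt   (file names are examples)
unique_words() {
    # Drop blank lines, then sort bytewise and remove duplicates.
    grep -v '^[[:space:]]*$' "$1" | LC_ALL=C sort -u
}
```

Note that sorting changes the word order, so the results file will follow the sorted order rather than the order of the text.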
If you encounter weird errors like `bash: $'\r': command not found` when running your script, the likely cause (if you created the script on a Windows machine) is that the script uses Windows line endings (`\r\n`), whereas Bash expects Unix/Linux line endings (`\n`). `\r` is a carriage return, and Bash doesn't know what to do with it.
There are various ways to fix this, such as running dos2unix or letting your text editor save the file in Unix format (in Notepad++: Edit -> EOL Conversion -> Unix (LF)).
You can also fix it on the fly using the stream editor (sed):
sed -i 's/\r$//' process_words.sh
Note that a similar (but not identical) issue is present when porting files that follow the classic Mac line-ending scheme (a bare `\r`) to a Linux/Unix environment.
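To check whether a file is affected before editing it, something like the following can be used. It is a minimal sketch using only standard tools (`grep`, `tr`, `mktemp`); the function names `has_cr` and `strip_cr` are my own. Unlike the sed command above, `tr -d '\r'` removes every carriage return in the file, not only those at the end of a line.

```shell
#!/bin/sh
# Sketch: detect and remove carriage returns in a script or word list.
cr=$(printf '\r')   # a literal carriage return character

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_cr() {
    grep -q "$cr" "$1"
}

# Rewrites the file in place without carriage returns, keeping its permissions.
strip_cr() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Example:
#   has_cr process_words.sh && strip_cr process_words.sh
```

The same check is useful for the input file: a trailing carriage return on each word would otherwise be passed on as part of the word.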
There are a few options to modify the behaviour of the `CRUNCHER_COMMAND`. The following tables are taken from the PerseusDL Morpheus documentation and are presented here with minor changes.
The following are the commonly used command-line switches.
Switch | Use |
---|---|
-L | sets language to Latin |
-I | sets language to Italian |
-S | turn off Strict case. For Greek, allows words with an initial capital to be recognized, so that for example the personification *tu/xhs at Soph. OT 1080 is recognized as the genitive singular of tu/xh. For languages in the Roman alphabet, allows words with initial capital or in all capitals to be recognized. |
-n | ignore accents. Allows words with no accents or breathings, or with incorrect ones, to be recognized. |
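To try these switches on individual words before changing the script, a small wrapper can be handy. The sketch below assumes the same invocation as used elsewhere on this page (`MORPHLIB=stemlib bin/cruncher`); `morph_lookup`, `CRUNCHER_BIN` and `MORPHLIB_DIR` are illustrative names of my own, not part of Morpheus.

```shell
#!/bin/sh
# Sketch: look up a single Betacode word with optional cruncher switches.
# CRUNCHER_BIN and MORPHLIB_DIR are illustrative names, not part of Morpheus.
CRUNCHER_BIN="${CRUNCHER_BIN:-bin/cruncher}"
MORPHLIB_DIR="${MORPHLIB_DIR:-stemlib}"

morph_lookup() {
    word=$1
    shift
    # Remaining arguments are passed to cruncher unchanged, e.g. -S -n.
    printf '%s\n' "$word" | MORPHLIB="$MORPHLIB_DIR" "$CRUNCHER_BIN" "$@"
}

# Examples (run from the Morpheus directory inside the container):
#   morph_lookup '*tu/xhs' -S        # accept an initial capital
#   morph_lookup 'anqrwpos' -S -n    # also accept missing accents and breathings
```

Whichever combination gives the best coverage can then be added to the switches used in the script.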
The following other switches are supported.
Switch | Use |
---|---|
-d | database format. This switch changes the output from "Perseus format" to "database format." Output appears in a series of tagged fields. |
-e | ending index. Instead of showing the analysis in readable form, this switch gives the indices of the tense, mood, case, number, and so on (as appropriate) in the internal tables. |
-k | keep beta-code. When "Perseus format" is enabled (the default), this switch does nothing. When "Perseus format" is off, Greek output is normally converted to the old Greek Keys encoding. This switch disables that conversion so that Greek output stays in beta-code. Note that the handling of this switch was not updated when Latin was implemented, so when "Perseus format" is disabled, Latin and Italian will also be converted to this Greek font encoding. Hence if you are disabling Perseus format in those languages, you should also set the -k switch. |
-l | show lemma. When this switch is set, instead of printing the entire analysis, cruncher will only show the lemma or headword from which the given form is made. |
-P | turn off Perseus format. Output will be in the form `$feminam& is^M &from$ femina^M $fe_minam^M [&stem $fe_min-& ]^M & a_ae fem acc sg^M`. Note the returns, without line feeds, between the fields. |
-V | analyze Verbs only. When this switch is set, words that are not verbs will not be recognized, and words that could be analyzed as either verb forms or noun forms will be treated as certainly verbs. |
The following switches, which appear in the main routine, do nothing. (Note TJ: if you want to trace this yourself, start at [stdiomorph.c](https://github.com/PerseusDL/morpheus/blob/master/src/anal/stdiomorph.c#L49).)
Switch | Use |
---|---|
-a | sets the SHOW_ANAL flag, which is never checked |
-b | sets the BUFFER_ANALS flag, which is no longer checked |
-c | sets the CHECK_PREVERB flag, which is no longer checked |
-i | sets the SHOW_FULL_INFO flag, which is never checked |
-m | sets the SHOW_MISSES flag, which is never checked |
-p | sets the PARSE_FORMAT flag, which is unconditionally turned on later anyway |
-s | sets the DBASESHORT flag, which is checked only in a routine that is never called |
-x | sets the LEXICON_OUTPUT flag, which is checked only in a routine that is never called |
Author | Tony Jurg |
Version | 1.4 |
Date | 23 April 2025 |