# Mapping between Unicode and Betacode

## Table of contents (ToC) <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - What is betacode?</a>
* <a href="#bullet2">2 - The beta-code python library</a>
* <a href="#bullet3">3 - Installation</a>
* <a href="#bullet4">4 - Example usage</a>
* <a href="#bullet5">5 - Remarks</a>
* <a href="#bullet6">6 - Notebook version details</a>

# 1 - What is betacode? <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

Beta Code is a system developed in the 1980s for representing ancient Greek (and some other special scripts) using only regular ASCII characters (A–Z, punctuation). Accents, breathings, and other diacritics are written with symbols such as `=`, `/`, `\`, `)`, and `(`.

It was created to make it possible to type, store, and search Greek texts on computers that could not handle Greek letters directly. 

Beta Code was mainly used in projects like the Thesaurus Linguae Graecae (TLG). Today, Unicode has made Beta Code mostly obsolete for display, but it's still used for certain tools and databases (like Morpheus).
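
As a rough illustration of the notation, the sketch below hand-rolls a converter for a tiny subset of Beta Code. It is not how the TLG or any library implements the conversion: the names `LETTERS`, `DIACRITICS`, and `toy_beta_to_greek` are hypothetical, only the letters needed for one phrase are covered, lowercase input is assumed, and final sigma is ignored. Each diacritic symbol is mapped to a Unicode combining mark, and NFC normalization then merges each letter with its marks into a precomposed character.

```python
import unicodedata

# Beta Code diacritic symbols and the Unicode combining marks they stand for.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}

# Illustrative subset of the letter table (assumption: lowercase input only).
LETTERS = {
    "a": "α", "e": "ε", "i": "ι", "k": "κ", "m": "μ",
    "o": "ο", "r": "ρ", "s": "σ", "w": "ω", "x": "χ",
}


def toy_beta_to_greek(text):
    """Convert a small subset of Beta Code to Unicode Greek."""
    pieces = []
    for ch in text:
        # Letters and diacritics are translated; anything else passes through.
        pieces.append(LETTERS.get(ch) or DIACRITICS.get(ch) or ch)
    # NFC composes base letters and combining marks into single code points.
    return unicodedata.normalize("NFC", "".join(pieces))


print(toy_beta_to_greek("xai=re w)= ko/sme"))  # χαῖρε ὦ κόσμε
```

Because the diacritics follow the letter they modify, a single left-to-right pass is enough for this subset; a complete converter also has to handle capitals, final sigma, and punctuation.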

# 2 - The beta-code python library <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

A small Python library is available that maps Unicode Greek to Betacode and vice versa. The code is available at [github.com/perseids-tools/beta-code-py](https://github.com/perseids-tools/beta-code-py).

The JSON file with the mapping (which is the core of the beta-code package) can be found [here](https://github.com/perseids-tools/beta-code-json/blob/master/unicode_to_beta_code.json).

# 3 - Installation <a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

In your Python environment:

```
pip install beta-code
```
Response:
```
Collecting beta-code
  Downloading beta_code-1.1.0-py3-none-any.whl.metadata (1.9 kB)
Downloading beta_code-1.1.0-py3-none-any.whl (8.4 kB)
Installing collected packages: beta-code
Successfully installed beta-code-1.1.0
```
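
To confirm from within Python that the installation succeeded, one option is to query the metadata of the installed distribution. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is hypothetical and not part of the beta-code package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if beta-code is installed, otherwise None.
print(installed_version("beta-code"))
```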

# 4 - Example usage <a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

First, import the library:

```python
import beta_code
```

Mapping from Unicode to Betacode is as simple as:

```python
beta_code.greek_to_beta_code(u'χαῖρε ὦ κόσμε')
```
Response:
```
'xai=re w)= ko/sme'
```

Likewise, mapping from Betacode back to Unicode:

```python
beta_code.beta_code_to_greek('xai=re w)= ko/sme')
```
Response:
```
'χαῖρε ὦ κόσμε'
```
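
Since the two functions are meant to be inverses of each other, a round-trip check can be a convenient sanity test. The sketch below is an illustration only: `round_trips` is a hypothetical helper, not part of the library, and it compares strings after NFC normalization on the assumption that visually identical Greek text may be encoded with different code points. The demo at the bottom runs only if `beta_code` is installed.

```python
import unicodedata


def round_trips(text, encode, decode):
    """Return True if decode(encode(text)) equals text after NFC normalization."""
    restored = decode(encode(text))
    return unicodedata.normalize("NFC", restored) == unicodedata.normalize("NFC", text)


try:
    import beta_code
except ImportError:
    beta_code = None  # library not installed; the demo below is skipped

if beta_code is not None:
    # Uses the two library functions shown in the examples above.
    print(round_trips('χαῖρε ὦ κόσμε',
                      beta_code.greek_to_beta_code,
                      beta_code.beta_code_to_greek))
```

Passing the conversion functions as arguments keeps the helper independent of the library, so it can be reused with any pair of encode and decode functions.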

# 5 - Remarks <a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

The `u` prefix in `u'χαῖρε ὦ κόσμε'` marks the literal as a Unicode string. Python 2 had two string types: `'abc'` was a byte string (raw bytes, not necessarily Unicode) and `u'abc'` was a Unicode string (actual text characters). Because Greek characters are not plain ASCII, Python 2 code needed the `u` prefix. In Python 3, which is the standard today, every string literal is Unicode by default, so the prefix is no longer needed. It is kept in the examples because Python 3 still accepts it as an optional prefix, so the same literal is valid in both Python 2 and Python 3.
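
A short check, assuming Python 3, illustrates that the prefix makes no difference there. The variable names are chosen for this sketch only.

```python
plain = 'χαῖρε'
prefixed = u'χαῖρε'

# In Python 3 both literals produce the same type and the same value.
print(type(prefixed) is str)  # True
print(plain == prefixed)      # True

# Greek letters are not ASCII, so the UTF-8 encoding needs more than
# one byte per character.
print(len(plain.encode('utf-8')) > len(plain))  # True
```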

# 6 - Notebook version details <a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.2</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>29 April 2025</td>
    </tr>
  </table>
</div>