OpenPecha is an e-text and annotations store made available on GitHub and through a set of APIs.
The project’s primary aim is to facilitate the collection, proofreading, and enrichment of e-texts by leveraging language technology and collaboration.
- Contains a dataset of more than 13,000 texts, which keeps growing in size and quality through contributions from core members and from apps that use our APIs
- Uses the OpenPecha Format (OPF), which stores annotation layers as standoff markup that points to character positions in a base text layer
- Includes a base layer, a table of contents layer, a footnotes layer, and a hyperlinks layer by default
- Supports virtually unlimited additional layers, for example for witnesses, commentaries, or groups of same-type tags
- Supports changes to the base layer through OpenPecha's Character Coordinate Translation Vector (CCTV), which keeps tags in annotation layers locked to their characters in the base layer even when those characters shift position
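The layered, standoff model described above can be illustrated with a small in-memory sketch. This is not the actual OPF schema or the OpenPecha toolkit API. The class names (`Annotation`, `LayeredText`), the method names, and the layer names `toc` and `footnotes` are hypothetical, chosen only to show the idea that annotations never alter the base text and instead record character offsets into it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A standoff annotation: a half-open range [start, end) of base-text characters."""
    start: int
    end: int
    payload: dict = field(default_factory=dict)


@dataclass
class LayeredText:
    """A base text plus named annotation layers that point into it by offset (hypothetical model)."""
    base: str
    layers: dict[str, list[Annotation]] = field(default_factory=dict)

    def annotate(self, layer: str, start: int, end: int, **payload) -> Annotation:
        # Annotations never modify the base text; they only record offsets.
        if not 0 <= start <= end <= len(self.base):
            raise ValueError(f"span [{start}, {end}) is outside the base text")
        ann = Annotation(start, end, dict(payload))
        # Any number of layers can be created on demand.
        self.layers.setdefault(layer, []).append(ann)
        return ann

    def text_of(self, ann: Annotation) -> str:
        # Resolve an annotation back to the base-text characters it covers.
        return self.base[ann.start:ann.end]


doc = LayeredText("Chapter 1. Homage. Chapter 2. Teaching.")
heading = doc.annotate("toc", 0, 10, title="Chapter 1")
note = doc.annotate("footnotes", 11, 17, text="A hypothetical note.")
print(doc.text_of(heading))  # Chapter 1.
print(doc.text_of(note))     # Homage
print(sorted(doc.layers))    # ['footnotes', 'toc']
```

Because each layer only stores offsets, new layers (witnesses, commentaries, and so on) can be added without touching the base text or the other layers.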
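The internals of OpenPecha's CCTV are not described here, so the following is only a sketch of the general problem it addresses: re-anchoring standoff spans after the base text is edited. It swaps in a generic technique, aligning the old and new base text with Python's `difflib.SequenceMatcher` and mapping each surviving character to its new position. The function names (`build_char_map`, `remap_span`, `remap_layer`) are hypothetical, and this should not be read as the CCTV algorithm.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Optional

Span = tuple[int, int]  # half-open [start, end) character range


def build_char_map(old: str, new: str) -> list[Optional[int]]:
    """For each character index in `old`, give its index in `new`, or None if deleted or replaced."""
    char_map: list[Optional[int]] = [None] * len(old)
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            # Characters in an unchanged run keep their relative order.
            for offset in range(i2 - i1):
                char_map[i1 + offset] = j1 + offset
    return char_map


def remap_span(start: int, end: int, char_map: list[Optional[int]]) -> Optional[Span]:
    """Move [start, end) so it still covers the surviving characters it originally covered.

    Text inserted between two surviving characters of the span ends up inside it;
    text at either edge (including a replacement of the first or last character)
    stays outside. Returns None if no original character survives, which also
    drops zero-width spans.
    """
    kept = [char_map[k] for k in range(start, end) if char_map[k] is not None]
    if not kept:
        return None
    return kept[0], kept[-1] + 1


def remap_layer(spans: list[Span], old: str, new: str) -> list[Optional[Span]]:
    """Re-anchor every span in one annotation layer after the base text changes."""
    char_map = build_char_map(old, new)
    return [remap_span(start, end, char_map) for start, end in spans]


old = "The Homaje chapter"
new = "Intro. The Homage chapter"  # a prefix added and a typo fixed
print(remap_layer([(4, 10)], old, new))  # [(11, 17)]
```

The span that covered "Homaje" now covers "Homage" in the edited text, even though both its offsets and one of its characters changed. Diffing whole texts is simple but can be slow on very long inputs; a production system would more likely record edits as they happen and translate offsets from those.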