
Welcome

OpenPecha is an e-text and annotations store made available on GitHub and through a set of APIs.

The project’s primary aim is to facilitate the collection, proofreading, and enrichment of e-texts by leveraging language technology and collaboration.

  • Download a featured dataset


    Get the latest OP datasets to train Tibetan language AI models.

    Featured datasets

  • Get the OP toolkit


    Install the OpenPecha toolkit with pip and get up and running in minutes.

    OP toolkit

  • Use the OpenPecha API


    Harness the power of OpenPecha with the OP API.

    OpenPecha API

  • Get the latest news


    Read our blog to learn the latest from OpenPecha and the Tibetan AI space.

    OpenPecha blog

Key features

  • Contains more than 13,000 texts, a collection that keeps growing in size and quality through contributions from core members and from apps that use our APIs
  • Uses the OpenPecha format (opf), whose annotation layers use standoff markup to link to characters in a base text layer
  • Includes a base layer, a table of contents layer, a footnotes layer, and a hyperlinks layer by default
  • Supports virtually unlimited additional layers for witnesses, commentaries, layers of same-type tags, and more
  • Supports changes to the base layer through OpenPecha's Character Coordinate Translation Vector (CCTV), which keeps tags in annotation layers locked to the right characters in the base layer even as those characters move
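
The layer model described above can be illustrated with a minimal sketch in plain Python. This is not the OpenPecha toolkit's API and not the actual CCTV algorithm. The names `Span`, `text_of`, and `shift_after_insert` are hypothetical, and the sketch assumes annotations are stored as character offsets into the base text and that the only edit is a single insertion.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical standoff annotation: offsets into the base text."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def text_of(base: str, span: Span) -> str:
    """Resolve a standoff span to the characters it points at."""
    return base[span.start:span.end]


def shift_after_insert(span: Span, pos: int, length: int) -> Span:
    """Adjust a span for `length` characters inserted at offset `pos`."""
    if pos <= span.start:
        # Insertion at or before the span start: both ends move right.
        return Span(span.start + length, span.end + length)
    if pos < span.end:
        # Insertion inside the span: only the end moves right.
        return Span(span.start, span.end + length)
    # Insertion at or after the span end: nothing changes.
    return span


# The base layer holds only text; annotation layers hold only offsets.
base = "Title\nFirst line of the text."
layers = {
    "toc": [Span(0, 5)],          # points at "Title"
    "hyperlinks": [Span(6, 16)],  # points at "First line"
}

# Edit the base layer, then translate every span in every layer.
insert_at, inserted = 0, "New "
new_base = base[:insert_at] + inserted + base[insert_at:]
new_layers = {
    name: [shift_after_insert(s, insert_at, len(inserted)) for s in spans]
    for name, spans in layers.items()
}

for name, spans in new_layers.items():
    print(name, [text_of(new_base, s) for s in spans])
```

Because the base text carries no inline tags, any number of layers can point at the same characters, and an edit to the base only requires recomputing offsets rather than rewriting each layer's content.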