TheorIA Dataset

A curated, open and high-quality dataset of Theoretical Physics equations and derivations.


Why TheorIA?

Machine learning models for theoretical physics are mostly trained on raw text from papers and books. What is missing is a curated dataset that collects the field's important equations and derivations.

TheorIA bridges that gap; it is the first dataset to provide a high-quality collection of theoretical physics equations, their derivations and explanations in a structured format.

🚀 Jump in! We've kicked off TheorIA with a handful of killer entries, and we need your help to grow it. If you're a theoretical‑physics buff—researcher, educator, or enthusiast—now's the time to contribute your favorite equations, crisp derivations, and clear explanations.

What's Inside?

High Quality

Each entry is crafted and carefully reviewed by someone with a background in physics. The contributor's name or ORCID is baked into the metadata.

Formal Derivations

Derivations are written step by step in AsciiMath and annotated for easier understanding, with programmatic formalization to check correctness.

Self-Contained JSON

One entry per file under the entries/ folder, so you can fork, version, and collaborate without conflicts.

Domain Tags

ArXiv‑style categories (e.g., gr-qc, hep-th) for easy filtering.

Global Manifest

manifest.json tracks versions, file list, covered domains, and updates.

Open License

CC‑BY 4.0—use it, remix it, teach with it.
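As a rough illustration of filtering by domain tag, the sketch below selects entries from a list of already-parsed entry dictionaries. The field name "domain" is an assumption made for this example, not something taken from the schema; check schemas/entry.schema.json for the real key and pass it through the key argument.

```python
def filter_by_domain(entries, domain, key="domain"):
    """Return the entries tagged with the given arXiv-style category.

    ASSUMPTION: `key` ("domain" by default) is a hypothetical field name.
    The real name is defined in schemas/entry.schema.json.
    """
    selected = []
    for entry in entries:
        value = entry.get(key)
        # Accept either a single tag ("gr-qc") or a list of tags.
        if value == domain or (isinstance(value, list) and domain in value):
            selected.append(entry)
    return selected
```

Calling filter_by_domain(entries, "gr-qc") on a list of parsed entries would keep only those tagged gr-qc, whether the tag is stored as a string or inside a list.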

How to Contribute

We are tiny right now, so every new entry counts! To contribute:

1. Fork and Clone

Fork the repository on GitHub and clone it to your local machine.

2. Create a JSON Entry

Create a JSON file in the entries/ folder following the instructions in the CONTRIBUTING.md file and the schema in schemas/entry.schema.json.

3. Submit a Pull Request

CI will automatically validate your JSON against the schema. If it passes, we will review and merge it.
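Before opening a pull request, it can help to confirm locally that every file at least parses as JSON. The sketch below is a hypothetical helper, not part of the repository, and it checks syntax only. It does not validate against schemas/entry.schema.json, so the CI check remains the authority.

```python
import json
from pathlib import Path


def find_invalid_json(entries_dir="entries"):
    """Return (filename, error message) pairs for files that fail to parse."""
    problems = []
    for path in sorted(Path(entries_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            # Record the problem and keep going so all bad files are reported.
            problems.append((path.name, str(exc)))
    return problems
```

An empty result from find_invalid_json("entries") means every file is syntactically valid JSON. A non-empty one lists the files to fix before pushing.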

Using TheorIA for Machine Learning

You can either use the individual JSON files or automatically generate a single merged file using a script (e.g., with jq):

jq -s '.' entries/*.json > dataset.json

Feed dataset.json (or per‑entry files) straight into your training pipeline.
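If jq is not available, a similar merge can be sketched in Python with the standard library alone. The function names below are illustrative. The only layout assumed is the one described above: one JSON file per entry under entries/, merged into a single JSON array.

```python
import json
from pathlib import Path


def merge_entries(entries_dir="entries", output_path="dataset.json"):
    """Collect all entry files into one JSON array, similar to `jq -s '.'`."""
    merged = []
    for path in sorted(Path(entries_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            merged.append(json.load(fh))
    with open(output_path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII symbols readable in the output.
        json.dump(merged, fh, ensure_ascii=False, indent=2)
    return merged


def iter_dataset(dataset_path="dataset.json"):
    """Yield entries one at a time from the merged file."""
    with open(dataset_path, encoding="utf-8") as fh:
        for entry in json.load(fh):
            yield entry
```

A training pipeline could call merge_entries() once and then loop over iter_dataset(), mapping each entry to whatever text or token format the model expects.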

License & Citation

Licensed under CC-BY 4.0. If you use TheorIA, please cite it.

Contact

Issues, questions, or exciting physics ideas? Drop an issue on GitHub—and let's build this thing together.