A curated, open and high-quality dataset of Theoretical Physics equations and derivations.
There is a lack of curated datasets covering the important equations and derivations of theoretical physics for training better machine learning models, beyond the raw text in papers and books.
TheorIA bridges that gap; it is the first dataset to provide a high-quality collection of theoretical physics equations, their derivations and explanations in a structured format.
🚀 Jump in! We've kicked off TheorIA with a handful of killer entries, and we need your help to grow it. If you're a theoretical‑physics buff—researcher, educator, or enthusiast—now's the time to contribute your favorite equations, crisp derivations, and clear explanations.
Each entry is crafted and carefully reviewed by someone with a background in physics. Contributor name or ORCID is baked into the metadata.
Derivations are written step by step in AsciiMath and annotated for easier understanding, with programmatic formalization to check correctness.
One entry per file under the entries/ folder, so you can fork, version, and collaborate without conflicts.
ArXiv‑style categories (e.g., gr-qc, hep-th) for easy filtering.
manifest.json tracks versions, file list, covered domains, and updates.
CC-BY 4.0: use it, remix it, teach with it.
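The category tags mentioned above lend themselves to simple filtering once entries are loaded as Python dictionaries. The following is a minimal sketch, not part of the project: the helper name `filter_entries` is made up for illustration, and the field that stores the category is an assumption, so it is passed in as `key` rather than hard-coded. Check schemas/entry.schema.json for the real field name.

```python
def filter_entries(entries, key, wanted):
    """Keep entries whose `key` equals `wanted`, or contains it when the value is a list."""
    kept = []
    for entry in entries:
        value = entry.get(key)  # None when the entry has no such field
        if value == wanted or (isinstance(value, list) and wanted in value):
            kept.append(entry)
    return kept
```

For example, `filter_entries(entries, "domain", "gr-qc")` would keep only general-relativity entries, assuming the category lives in a field called `domain` (a hypothetical name).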
We are tiny right now, so every new entry counts! To contribute:
Fork the repository on GitHub and clone it to your local machine.
Create a JSON file in the entries/ folder following the instructions in the CONTRIBUTING.md file and the schema in schemas/entry.schema.json.
CI will automatically validate your JSON against the schema. If it passes, we will review and merge it.
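Before pushing, it can help to run a rough local check on a new entry. The sketch below is not the project's CI and not a full JSON Schema validation: it only confirms that the file parses as JSON and that any keys listed under a top-level `required` array in the schema are present. The function name `preflight` is made up for illustration, and the presence of a `required` array in the schema is an assumption.

```python
import json
from pathlib import Path


def preflight(entry_path, schema_path):
    """Return a list of problems found in one entry file (empty list = none found)."""
    try:
        entry = json.loads(Path(entry_path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entry, dict):
        return ["top-level value is not a JSON object"]

    schema = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    # Assumption: mandatory top-level keys are listed under "required";
    # if the schema has no such array, nothing is checked here.
    required = schema.get("required", [])
    return [f"missing required key: {key}" for key in required if key not in entry]
```

A call such as `preflight("entries/my_entry.json", "schemas/entry.schema.json")` returns an empty list when nothing was flagged (the entry file name here is a placeholder). Passing this check does not mean CI will pass, since the real schema may constrain types, formats, and nested fields.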
You can either use the individual JSON files or automatically generate a single merged file using a script (e.g., with jq):
jq -s '.' entries/*.json > dataset.json
Feed dataset.json (or per-entry files) straight into your training pipeline.
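If jq is not available, the same merge can be sketched in Python with the standard library only. The function name `merge_entries` is made up for illustration. Like `jq -s '.'`, it collects the content of every JSON file into a single array, and it sorts the file names so the order is stable across runs.

```python
import json
from pathlib import Path


def merge_entries(entries_dir="entries", out_path="dataset.json"):
    """Collect every *.json file in entries_dir into one JSON array and write it out."""
    entries = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(entries_dir).glob("*.json"))  # sorted for a stable order
    ]
    Path(out_path).write_text(
        json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return entries
```

The returned list can be handed to a data loader directly, and the written file can be reloaded later with `json.loads(Path("dataset.json").read_text(encoding="utf-8"))`.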
Licensed under CC-BY 4.0. If you use it, please cite it.
Issues, questions, or exciting physics ideas? Drop an issue on GitHub—and let's build this thing together.