Building the most comprehensive, exact and open-source dataset of all physics results.
Our Aspiration: TheorIA aims to create a high-quality, structured dataset containing all physics results: complete, exact, and open-source, for the global physics community.
What Makes TheorIA Unique: There is a lack of structured physics datasets that systematically capture all physics results. TheorIA aims to serve both as a useful reference for researchers and as training data for next-generation AI models, including LLMs, in physics.
Current Status: Many entries are AI-generated and still require expert curation and validation to meet our quality standards. We are also working on improving the structure of the dataset.
We Need You: We're seeking physicists to review and curate entries. Your expertise ensures accuracy, and your name will be permanently associated with every entry you review, building your scientific legacy.
Each entry in the dataset is a physics result that has either been validated by a physicist or been generated by AI and is awaiting a specialist to review and improve it.
Step-by-step derivations in AsciiMath, annotated for easier understanding and for programmatic formalization to guarantee correctness.
One entry per file under the entries/ folder, so you can fork, version, and collaborate without conflicts.
arXiv-style categories (e.g., gr-qc, hep-th) for easy filtering.
Entries include their regime of validity, historical context, dependencies on other results, and other relevant metadata for a comprehensive understanding.
Every entry includes an automatically generated Jupyter notebook that opens directly in Google Colab for interactive exploration and verification of the physics derivations.
CC-BY 4.0: use it, remix it, teach with it.
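Because each entry lives in its own file under entries/ and carries an arXiv-style category, filtering by category takes only a few lines of code. The Python sketch below is illustrative, not part of the project: the helper names are hypothetical, files are assumed to be UTF-8, and the key that holds the category is not named here, so `field` is a placeholder you should replace with the key defined in schemas/entry.schema.json. Accepting either a single string or a list of categories is also a guess about the schema.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def load_entries(entries_dir: str = "entries") -> Iterator[dict]:
    """Yield each entry from its own JSON file, in sorted filename order (hypothetical helper)."""
    for path in sorted(Path(entries_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


def filter_by_category(entries: Iterable[dict], category: str, field: str) -> list[dict]:
    """Keep entries whose `field` equals `category`, or contains it if the value is a list.

    `field` is a hypothetical parameter: the real key name for the arXiv-style
    category must be taken from schemas/entry.schema.json.
    """
    selected = []
    for entry in entries:
        value = entry.get(field)
        if value == category or (isinstance(value, list) and category in value):
            selected.append(entry)
    return selected
```

For example, `filter_by_category(load_entries(), "gr-qc", field=CATEGORY_KEY)`, where `CATEGORY_KEY` stands for whatever key the schema actually uses.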
We are tiny right now, so every new entry counts! There are multiple ways to contribute:
User-friendly contribution system designed for scientists and researchers
Submit entries through structured issue forms
Fork the repository on GitHub and clone it to your local machine.
Create a JSON file in the entries/ folder following the instructions in the CONTRIBUTING.md file and the schema in schemas/entry.schema.json.
CI will automatically validate your JSON against the schema. If it passes, we will review and merge it.
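Before pushing, a quick local sanity check can catch the most common mistakes. The sketch below is a hypothetical helper, not the project's CI: it assumes schemas/entry.schema.json follows the usual JSON Schema convention of a top-level `required` array, and it only checks that the file parses as a JSON object and has those top-level keys. It does not check types, nested objects, or patterns, so CI remains the authority; for full local validation, the third-party `jsonschema` package provides `jsonschema.validate(instance, schema)`.

```python
import json
from pathlib import Path


def missing_required_keys(entry_path: str, schema_path: str = "schemas/entry.schema.json") -> list[str]:
    """Return top-level keys the schema marks as required but the entry lacks.

    Hypothetical pre-check: reads only the schema's top-level "required" array,
    so it is not a substitute for full JSON Schema validation.
    """
    schema = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    # json.loads raises json.JSONDecodeError if the entry file is malformed.
    entry = json.loads(Path(entry_path).read_text(encoding="utf-8"))
    if not isinstance(entry, dict):
        raise ValueError(f"{entry_path}: top-level value must be a JSON object")
    return [key for key in schema.get("required", []) if key not in entry]
```

An empty list means every required top-level key is present; anything else names the keys to add before opening your contribution.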
You can either use the individual JSON files or automatically generate a single merged file using a script (e.g., with jq):
jq -s '.' entries/*.json > dataset.json
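If jq is not available, the same merge can be sketched in Python. This is a hypothetical equivalent of the jq one-liner, assuming each file holds exactly one JSON object, as the one-entry-per-file layout implies (jq's slurp mode would also accept several values per file). Files are read in sorted filename order, which may differ slightly from your shell's glob order under some locales.

```python
import json
from pathlib import Path


def merge_entries(entries_dir: str = "entries", out_path: str = "dataset.json") -> int:
    """Collect every *.json file in entries_dir into one JSON array, like `jq -s '.'`.

    Returns the number of entries written (hypothetical helper).
    """
    merged = []
    for path in sorted(Path(entries_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            merged.append(json.load(f))
    with open(out_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII symbols readable in the output file.
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return len(merged)
```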
Feed dataset.json (or the per-entry files) straight into your training pipeline.
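Many training pipelines prefer JSON Lines (one record per line) over a single large array. The sketch below assumes that is what your pipeline wants; JSON Lines is not a format the project is stated to ship, and the helper name is hypothetical. It rewrites the merged dataset.json as one compact entry per line.

```python
import json
from pathlib import Path


def dataset_to_jsonl(dataset_path: str = "dataset.json", out_path: str = "dataset.jsonl") -> int:
    """Rewrite the merged JSON array as JSON Lines; returns the number of lines written."""
    entries = json.loads(Path(dataset_path).read_text(encoding="utf-8"))
    with open(out_path, "w", encoding="utf-8") as f:
        for entry in entries:
            # Compact separators drop optional whitespace; ensure_ascii=False keeps Unicode readable.
            f.write(json.dumps(entry, ensure_ascii=False, separators=(",", ":")) + "\n")
    return len(entries)
```

The resulting file can be read line by line, or, for example, loaded with the Hugging Face `datasets` library via `load_dataset("json", data_files="dataset.jsonl")`.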
Licensed under the CC-BY 4.0 license. If you use it, please cite it.
Issues, questions, or exciting physics ideas? Drop an issue on GitHub and let's build this thing together.