TheorIA Dataset

Building the most comprehensive, exact and open-source dataset of all physics results.


Why TheorIA?

Our Aspiration: TheorIA aims to create a high-quality structured dataset containing all physics results—complete, exact, and open-source for the global physics community.

What Makes TheorIA Unique: There is a lack of structured physics datasets that systematically capture all physics results. TheorIA aims to serve both as a useful reference for researchers and as training data for next-generation AI models and LLMs in physics.

Current Status: Many of our entries are AI-generated and still require expert curation and validation to meet our quality standards. We are also working on improving the structure of the dataset.

We Need You: We're seeking physicists to review and curate entries. Your expertise ensures accuracy, and your name will be permanently associated with every entry you review, building your scientific legacy.

What's Inside?

📋 Structured Entries

Each entry in the dataset is a physics result that has either been validated by a physicist or been generated by AI and is awaiting a specialist to review and improve it.

🧮 Formal Derivations

Step-by-step derivations written in AsciiMath, annotated for easier understanding and structured for programmatic formalization to verify correctness.

📁 Self-Contained JSON

One entry per file under the entries/ folder, so you can fork, version, and collaborate without conflicts.

🏷️ Domain Tags

ArXiv‑style categories (e.g., gr-qc, hep-th) for easy filtering.
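
As a rough illustration of tag-based filtering, the sketch below selects entries by category from a list of already-loaded entries. The field names "domain" and "name" are assumptions made for this example; the actual keys are defined in schemas/entry.schema.json and may differ. The sample records are made up for the demonstration and are not real dataset entries.

```python
def filter_by_domain(entries, domain, key="domain"):
    """Return the entries whose tag field equals `domain`.

    `key` defaults to "domain", a hypothetical field name; pass the
    real key from the schema if it differs. Entries lacking the key
    are skipped.
    """
    return [entry for entry in entries if entry.get(key) == domain]


# Made-up records, for illustration only.
sample = [
    {"name": "example_a", "domain": "gr-qc"},
    {"name": "example_b", "domain": "hep-th"},
    {"name": "example_c"},
]
selected = filter_by_domain(sample, "gr-qc")
print([entry["name"] for entry in selected])
```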

📚 Rich Context

Entries include regime validity, historical context, dependencies between results, and other relevant metadata for comprehensive understanding.

📓 Interactive Notebooks

Every entry includes an automatically generated Jupyter notebook that opens directly in Google Colab for interactive exploration and verification of the physics derivations.

🌐 Open License

CC-BY 4.0: use it, remix it, teach with it.

How to Contribute

We are tiny right now, so every new entry counts! There are multiple ways to contribute:

🐙 GitHub Issues

Submit entries through structured issue forms

  • ✓ Direct integration
  • ✓ Community discussion
  • ✓ Automatic processing

Submit via GitHub

Advanced: Direct Development

1. Fork and Clone

Fork the repository on GitHub and clone it to your local machine.

2. Create a JSON Entry

Create a JSON file in the entries/ folder following the instructions in the CONTRIBUTING.md file and the schema in schemas/entry.schema.json.

3. Submit a Pull Request

CI will automatically validate your JSON against the schema. If it passes, we will review and merge it.
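
Before opening a pull request, a quick local check can catch files that are not valid JSON at all. The sketch below uses only the Python standard library and is not the project's CI: it does not validate against schemas/entry.schema.json, and find_invalid_entries is a hypothetical helper name. It only checks that each file parses and that its top level is a JSON object (the object requirement is itself an assumption about the entry format).

```python
import json
from pathlib import Path


def find_invalid_entries(entries_dir="entries"):
    """Return a list of (filename, problem) pairs for files that fail the check.

    Hypothetical helper: checks JSON syntax and top-level type only,
    not conformance to the project's schema.
    """
    problems = []
    for path in sorted(Path(entries_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                data = json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append((path.name, f"not valid JSON: {exc}"))
            continue
        if not isinstance(data, dict):
            # Assumption: every entry is a JSON object at the top level.
            problems.append((path.name, "top-level value is not an object"))
    return problems
```

Calling find_invalid_entries() from the repository root returns an empty list when every file passes this basic check; full schema validation is still left to CI.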

Using TheorIA for Machine Learning

You can either use the individual JSON files or automatically generate a single merged file using a script (e.g., with jq):

jq -s '.' entries/*.json > dataset.json

Feed dataset.json (or per‑entry files) straight into your training pipeline.
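
For pipelines where jq is not available, the same merge can be sketched in Python using only the standard library. The function name merge_entries and its default paths are illustrative assumptions, not part of the TheorIA repository; the sketch only assumes that each file under entries/ holds one JSON value.

```python
import json
from pathlib import Path


def merge_entries(entries_dir="entries", output_path="dataset.json"):
    """Collect every *.json file in entries_dir into one JSON array.

    Mirrors `jq -s '.' entries/*.json > dataset.json`; files are read
    in sorted filename order. Names and defaults are illustrative.
    """
    merged = []
    for path in sorted(Path(entries_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            merged.append(json.load(handle))

    # Write the combined list as a single JSON array.
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(merged, handle, ensure_ascii=False, indent=2)
    return merged
```

Running merge_entries() from the repository root writes dataset.json and also returns the list, so the entries can be passed on to a training pipeline without re-reading the file.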

License & Citation

Licensed under CC-BY 4.0. If you use the dataset, please cite it.

Contact

Issues, questions, or exciting physics ideas? Drop an issue on GitHub and let's build this thing together.