Frequently Asked Questions (FAQ)

Why TheorIA Dataset?

TheorIA Dataset addresses the need for a high-quality, structured collection of theoretical physics derivations that are self-contained and mathematically rigorous. Unlike resources scattered across textbooks and papers, this dataset provides a unified format in which each entry includes a complete derivation, programmatic verification, and clear explanations suitable for machine learning applications and educational purposes.

For example, each physics article on Wikipedia follows a different structure, its derivations are hard to locate, and it carries a lot of surrounding context. We reduce each entry to the strict minimum needed to understand the concept and where it comes from.

How can TheorIA Dataset be used?

Will this remain a free initiative?

Yes, absolutely. TheorIA Dataset is released under the CC-BY-4.0 license and will always remain freely available to the academic community, researchers, educators, and anyone interested in theoretical physics. This is a commitment to open science and knowledge sharing.

How can I participate?

There are several ways to contribute:

See our Contributing Guidelines for detailed instructions on how to get started.

Why are there so many draft entries?

The draft entries are AI-generated and serve as a starting point for the dataset. However, they do not yet meet the quality standards we want for the final dataset. Current AI models are not yet good at producing rigorous physics derivations, and that is partly the point of this collective effort: to create high-quality training data that can help improve future AI systems in physics. This is precisely where people with physics knowledge are needed, to review, refine, and validate these entries so that they meet our rigorous academic standards. Your expertise in physics is crucial for turning these drafts into high-quality, peer-reviewed entries.

What makes a good entry?

A good entry should be:

Most importantly, follow the Contributing Guidelines strictly to ensure consistency and quality across all entries.

How do I know if my entry is correct?

The repository includes automatic checks, known as the schema, that ensure each entry meets a minimum specification for its format and field types. We also check that each entry's programmatic verification runs without errors.

If you are adding or modifying an entry in a local clone of the repository, you can use:

If you submit through the forms on the website, we will run these checks for you and keep you informed by email.
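As a rough illustration of what these two checks do (not the repository's actual tooling), here is a minimal Python sketch: it tests that an entry has a few required fields of the right type, then executes the entry's verification code and reports whether it raised an error. The field names `result_id`, `result_name`, `derivation`, and `programmatic_verification`, and the `code` key inside the latter, are hypothetical placeholders, not the real TheorIA schema. Because `exec` runs arbitrary code, only use this pattern on entries you trust.

```python
import json

# Hypothetical minimal "schema": required field -> expected Python type.
# These names are illustrative only; the real TheorIA schema may differ.
REQUIRED_FIELDS = {
    "result_id": str,
    "result_name": str,
    "derivation": list,
    "programmatic_verification": dict,
}


def check_schema(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(entry[field]).__name__}"
            )
    return problems


def run_verification(entry: dict) -> bool:
    """Execute the entry's verification code; True if it runs without error."""
    # Hypothetical layout: the code lives under programmatic_verification["code"].
    code = entry.get("programmatic_verification", {}).get("code", "")
    try:
        exec(code, {})  # fresh namespace so one entry cannot leak state into another
    except Exception as exc:
        print(f"verification failed: {exc!r}")
        return False
    return True


# Usage with a toy entry whose verification is a trivial assertion.
entry = json.loads("""{
  "result_id": "example_entry",
  "result_name": "Example",
  "derivation": [],
  "programmatic_verification": {"code": "assert 1 + 1 == 2"}
}""")
print(check_schema(entry) or "schema OK")
print(run_verification(entry))
```

Keeping the two checks separate mirrors the distinction drawn above: an entry can be well-formed yet contain verification code that fails, and each problem is easier to report on its own.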

Can I suggest improvements to existing entries?

Yes! You can:

What physics domains are included?

We follow the arXiv taxonomy for domain classification, including but not limited to:

How is quality ensured?