The primary setting for this work is Symbolic Regression (SR), the task of deriving mathematical formulas from observational data without any prior knowledge of the domain or problem. In essence, this is the scientific process performed by a computer: hypotheses are formulated, tested against the observations, and compared for explanatory value. Because the resulting models are mathematical formulas, SR can assist domain experts in almost any field.
The main contribution of this work is Prioritized Grammar Enumeration (PGE), a deterministic machine learning algorithm for solving Symbolic Regression. Working with a grammar’s rules, PGE prioritizes the enumeration of mathematical expressions in order to find the best-fit model. By recognizing large overlaps in the search space and introducing mechanisms for memoization, PGE can explore the space of all equations efficiently. Most notably, PGE provides reproducibility of results, a key aspect of any system used by scientists at large.
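As a rough illustration of the ideas named above (grammar-driven expansion, a priority ordering, and memoization of expressions already seen), here is a minimal Python sketch. It is not the PGE implementation. The two-terminal grammar, the string representation of expressions, the error-then-size priority, and the names `sse`, `canonical`, `expand`, and `search` are all assumptions made for this example.

```python
import heapq

# Hypothetical toy grammar (an assumption for illustration only).
TERMINALS = ["x", "1"]
OPERATORS = ["+", "*"]


def sse(expr, data):
    """Sum of squared errors of an expression string over (x, y) pairs."""
    total = 0
    for x, y in data:
        # eval is applied only to strings generated by this toy grammar.
        value = eval(expr, {"__builtins__": {}}, {"x": x})
        total += (value - y) ** 2
    return total


def canonical(left, op, right):
    """Order operands of the commutative operators so a+b and b+a coincide."""
    a, b = sorted([left, right])
    return f"({a} {op} {b})"


def expand(expr):
    """Apply each production once: combine expr with a terminal."""
    for op in OPERATORS:
        for term in TERMINALS:
            yield canonical(expr, op, term)


def search(data, max_steps=100):
    """Best-first enumeration with a memo of already-seen expressions."""
    seen = set(TERMINALS)
    # Priority: lower error first, then shorter expression, then text order.
    heap = [(sse(t, data), len(t), t) for t in TERMINALS]
    heapq.heapify(heap)
    best = heap[0]
    for _ in range(max_steps):
        if not heap:
            break
        candidate = heapq.heappop(heap)
        best = min(best, candidate)
        if best[0] == 0:
            break
        for child in expand(candidate[2]):
            if child not in seen:  # memoization: never score a form twice
                seen.add(child)
                heapq.heappush(heap, (sse(child, data), len(child), child))
    err, _, expr = best
    return expr, err


# Observations generated from y = x*x + x, used here as a toy target.
data = [(x, x * x + x) for x in range(5)]
print(search(data))  # prints ('((x * x) + x)', 0)
```

Because the queue ordering is fully determined by error, expression length, and text, and no random numbers are involved, repeated runs on the same data return the same expression. That is the sense of reproducibility this sketch is meant to convey.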
By the author
Symbolic Regression for mathematical discovery
Motivation
Overview, context, details, implementations
The Problem - Definition, classes, applications
Standing on the shoulders of others
Genetic Programming - The original Symbolic Regression algorithm
Deterministic, reproducible, and reliable Symbolic Regression
Theory - Rethinking the Symbolic Regression problem
Overcoming limitations in the original formulation
Decoupling - Separating into scalable services
This chapter focuses on testing the PGE algorithm
Overview - Synopsis for the benchmarks and purposes
Final thoughts and places to go
Open-source projects for PGE
Comparative results for the PyPGE, DEAP, and FFX Python packages
The system classes and equations used for benchmarking
Explicit Equations - Benchmarks from the SR literature
A quick reference