#### Overview

We'll first talk about measurements and benchmarking: what, why, and how we are measuring during experimentation. We'll then talk about the benchmarks: where they are drawn from, what they test, how they were generated and handled, as well as the environment in which benchmarking was performed. We'll then proceed through three groups of benchmarking tasks. The first group evaluates the original algorithm and provides the proof of concept. The second group explores how the algorithm performs under various conditions related to the data and numerical computations. This includes the amount of data, features, and noise, as well as error metrics, normalization, and other data-handling aspects. The third group of tasks demonstrates the efficacy of the enhancements to the PGE algorithm. We will conclude by applying the lessons learned to some difficult and open problems. A comparative evaluation against other SR implementations can be found in [Chapter 6](/sr/06-comparison/).

### Basic Experiments

#### Proof of Concept

#### Differential Equations

| Problem              | Solved | Iters | Models | Evals | I-Level | G-Level | F-Level | Functions |
| -------------------- |:------:|:-----:|:------:|:-----:|:-------:|:-------:|:-------:|:---------:|
| BacResp - dx         | -      | -     | -      | -     | -       | -       | -       | -         |
| BacResp - dy         | -      | -     | -      | -     | -       | -       | -       | -         |
| BarMags - dX         | -      | -     | -      | -     | -       | -       | -       | trig      |
| BarMags - dY         | -      | -     | -      | -     | -       | -       | -       | trig      |
| Glider - dv          | -      | -     | -      | -     | -       | -       | -       | trig      |
| Glider - dA          | -      | -     | -      | -     | -       | -       | -       | trig      |
| Ecoli - dG           | -      | -     | -      | -     | -       | -       | -       | -         |
| Ecoli - dA           | -      | -     | -      | -     | -       | -       | -       | -         |
| Ecoli - dL           | -      | -     | -      | -     | -       | -       | -       | -         |
| Lorenz - dx          | -      | -     | -      | -     | -       | -       | -       | -         |
| Lorenz - dy          | -      | -     | -      | -     | -       | -       | -       | -         |
| Lorenz - dz          | -      | -     | -      | -     | -       | -       | -       | -         |
| ShearFlow - dA       | -      | -     | -      | -     | -       | -       | -       | trig      |
| ShearFlow - dB       | -      | -     | -      | -     | -       | -       | -       | trig      |
| vanderPol - dx       | -      | -     | -      | -     | -       | -       | -       | -         |
| vanderPol - dy       | -      | -     | -      | -     | -       | -       | -       | -         |
| LotkaVolterra - dx   | -      | -     | -      | -     | -       | -       | -       | -         |
| LotkaVolterra - dy   | -      | -     | -      | -     | -       | -       | -       | -         |
| PredPrey - dx        | -      | -     | -      | -     | -       | -       | -       | -         |
| PredPrey - dy        | -      | -     | -      | -     | -       | -       | -       | -         |
| SimplePendulum - dA  | -      | -     | -      | -     | -       | -       | -       | trig      |
| SimplePendulum - dV  | -      | -     | -      | -     | -       | -       | -       | trig      |
| ChaoticPendulum - dA | -      | -     | -      | -     | -       | -       | -       | trig      |
| ChaoticPendulum - dV | -      | -     | -      | -     | -       | -       | -       | trig      |

#### Section Flow

- Eqn types & Data sets
  - Setup & Settings
    - Explicit
    - Diffeq
    - 2-stage SR
  - Machines
  - Data Set details
    - How generated
    - pre-processing
    - Links to Appendix
- Benchmark / Reproducibility
  - GP needs better
    - Designed for specific purpose
    - Diffeq, closer to real-world
  - Which benchmarks are good / bad, why?
  - Reproducibility, links to code & data
- PGE basic
  - most basic settings, to show possible under ideal conditions
  - explicit & diffeq
  - no noise
  - different function settings
  - different init / grow levels
  - different selection methods
- PGE settings
  - Setup & Settings
  - First formulation
  - various policies
    - initialization sets
    - expansion bases / subs
    - selection methods
- Numerical / data related
  - data policies
    - amount of noise
    - quantities / coverage
  - error metrics (x4)
    - mae, mse, rmae, rmse
  - fitness metrics (x2)
    - regular, normalized ~ error metric
  - diffeqs
    - point evaluation vs integration for N points
- PGE enhancements
  - Algorithm
    - Memoization
    - Searching (graph)
    - Parallel processing
- Other analyses to consider (future work?)
  - Coefficient value inheritance
  - feature selection
  - data transformations, scaling?
#### Measurement Details

- What measurements
- How measured
- What the output will look like
- How success is determined
  - solution found or not
    - test with sympy
    - in final pareto
    - threshold reached
  - error
    - average error
    - explained variance
  - number of iterations
  - eqn/point evaluations
  - real-time
    - whole run
    - to solution
    - percent per phase in an iteration
  - search space coverage
    - number of equations tested
    - graph metrics
    - unique vs total generated
#### Evaluation Settings

This section describes the benchmarks and testing environment used during evaluation.

##### Benchmarks

We use several benchmark groups to evaluate PGE:

1. Explicit benchmarks from the SR literature
1. Regression problems from the NIST website
1. Differential equations and dynamical systems
1. Self-generated benchmarks for specific testing

The majority of these datasets were chosen because there are prior published results, either in the SR literature or based on a specific regression model. We next provide a brief description of each benchmark group. Details, charts, data, and code can be found in [Appendix 3](/sr/A3-benchmarks/).

**Explicit problems** are drawn from the SR literature. The main source is the paper titled ["GP Needs Better Benchmarks."](https://cs.gmu.edu/~sean/papers/gecco12benchmarks3.pdf) We select a subset of these problems, those that have been used most often in published work.

**NIST problems** are taken from the NIST website. These benchmarks are used to test linear and non-linear regression techniques. Here we apply PGE to these same problems.

**Differential equation** benchmarks are almost exclusively drawn from the work of Hod Lipson and his students. The Lipson group has demonstrated GP-based SR on a wide range of dynamical systems. Here we show PGE can also solve these problems.

**Self-generated** benchmarks are the result of producing tests for stressing various aspects of the PGE system and SR implementations in general.

##### Execution Environment

1. Hardware specs
1. Linux distro, kernel version
1. Language / compiler versions
1. Package versions
### PGE Algorithm

| Problem   | SSC-T6 | SSC-T7 | PGE-1        | PGE-2        | PGE-3        |
| --------- |:------ |:------ |:------------ |:------------ |:------------ |
| Nguyen-01 | 0.0035 | 0.003  | **0.000003** | **0.000003** | **0.000003** |
| Nguyen-02 | 0.0075 | 0.007  | **0.000004** | **0.000004** | **0.000004** |
| Nguyen-03 | 0.009  | 0.0095 | **0.000003** | 0.094944     | **0.000003** |
| Nguyen-04 | 0.013  | 0.011  | 0.107365     | 0.000005     | **0.000004** |
| Nguyen-05 | 0.0045 | 0.0045 | 0.000068     | 0.000014     | **0.000001** |
| Nguyen-06 | 0.0045 | 0.0035 | 0.011798     | 0.000446     | **0.000036** |
| Nguyen-07 | 0.003  | 0.0035 | 0.002571     | 0.000351     | **0.000251** |
| Nguyen-08 | 0.0065 | 0.005  | **0.000001** | **0.000001** | **0.000001** |
| Nguyen-09 | 0.0264 | 0.0099 | 0.003898     | **0.000001** | 0.000270     |
| Nguyen-10 | 0.0122 | 0.0066 | **0.000002** | 0.000004     | 0.000004     |

| Problem  | AEG-T2 error | AEG-T2 equations | AEG-T4 error | AEG-T4 equations | PGE-1 error | PGE-1 equations | PGE-2 error | PGE-2 equations | PGE-3 error | PGE-3 equations |
| -------- |:------------ |:---------------- |:------------ |:---------------- |:----------- |:--------------- |:----------- |:--------------- |:----------- |:--------------- |
| Korns-01 | 0.00 | .15K    | 0.00 | .06K    | 0.000000 | .17K   | 0.000000 | .47K   | 0.000000 | .35K   |
| Korns-02 | 0.00 | 3.26K   | 0.00 | 113.00K | 0.027277 | 25.05K | 0.0055   | 34.07K | 0.1135   | 2.30K  |
| Korns-03 | 0.00 | 804.49K | 0.00 | 222.46K | 0.498    | 0.36K  | 0.0065   | 29.00K | 0.1245   | 1.96K  |
| Korns-04 | 0.00 | .59K    | 0.00 | .86K    | 0.000000 | .17K   | 0.000000 | 10.95K | 0.000000 | 28.06K |
| Korns-05 | 0.00 | .25K    | 0.00 | .16K    | 0.000000 | .17K   | 0.000000 | .48K   | 0.000000 | .35K   |
| Korns-06 | 0.00 | .13K    | 0.00 | .01K    | 0.000000 | .17K   | 0.000000 | .48K   | 0.000000 | .35K   |
| Korns-07 | 0.00 | 187.26K | 0.00 | 4.10K   | 0.031941 | 8.37K  | 0.0075   | 53.08K | 0.058696 | 9.45K  |
| Korns-08 | 0.00 | 5.99K   | 0.00 | 11.00K  | 0.021827 | 19.34K | 0.000000 | 9.86K  | 0.069829 | 54.18K |
| Korns-09 | 0.00 | 97.24K  | 0.00 | 116.81K | 0.1855   | 1.87K  | 0.000000 | 7.14K  | 0.0615   | .70K   |
| Korns-10 | 0.99 | 763.53K | 0.00 | 1.34M   | 0.055193 | 4.42K  | 0.008    | 78.19K | 0.107    | 1.65K  |
| Korns-11 | 1.00 | 774.89K | 0.00 | 4.7M    | 0.493    | .17K   | 0.0055   | 8.33K  | 0.1195   | 2.62K  |
| Korns-12 | 1.04 | 812.79K | 1.00 | 16.7M   | 0.117404 | 34.34K | 0.0065   | 44.27K | 0.124    | 1.67K  |

#### Diffeq Comparisons

| Problem     | GP-Time | GP-Evals | PGE-Time | PGE-Evals |
| ----------- | ------- | -------- | -------- | --------- |
| Glider-v    | 10.219  | 1030M    | 3.523    | 0.468M    |
| Glider-o    | 5.062   | 500M     | 0.895    | 0.223M    |
| BackResp-x  | 74.047  | 7590M    | 15.131   | 2.692M    |
| BackResp-y  | 30.547  | 3090M    | 21.929   | 3.825M    |
| PredPrey-x  | 81.718  | 8260M    | 94.879   | 9.008M    |
| PredPrey-y  | 290.578 | 29380M   | 81.592   | 8.323M    |
| BarMags-o1  | 11.750  | 1190M    | 10.344   | 1.883M    |
| BarMags-o2  | 15.609  | 1580M    | 3.551    | 1.005M    |
| ShearFlow-o | 3.562   | 360M     | 0.216    | 0.168M    |
| ShearFlow-p | 33.859  | 3420M    | 7.558    | 2.071M    |
| VanDerPol-x | 25.547  | 2580M    | 13.779   | 1.544M    |
| VanDerPol-y | 0.859   | 90M      | 0.354    | 0.089M    |
| LotkaVolt-x | 4.250   | 430M     | 0.336    | 0.938M    |
| LotkaVolt-y | 1.063   | 110M     | 0.449    | 0.952M    |
Figure # - Diffeq Results
#### Decoupling Into Services

After decoupling the services, the expansion phase became the most time-consuming part of the algorithm. The surmised reason for this is two-fold, though it warrants further investigation. First, the time required to process a model in the algebra and evaluation services became less than the time required to send messages, even when the other services are co-located. In addition to the network latency, there was extra overhead in our original messages due to using a human-readable format for the equation over the wire. This required the equations to be printed and then parsed at both ends of a service call. After converting the messages to use the same integer serialization used in the memoization process, we saw a 20% reduction in overall runtimes.
Figure # - Percent in each Phase
### Data Related Issues

Data coverage of the domain is an issue with problem domains: either they need extending, or there is an inability to effectively distinguish between different solutions.

- sampling
- volume
- features
- function complexity
- system complexity
- noisy features
- noisy channels
- multiple data sets / system parameters
- windowing
- weighting samples
- equation library?
### PGE Enhancements

Settings:

| Parameter      | Value |
| -------------- | ----- |
| Functions      | none  |
| Function Level | n/a   |
| Initial Level  | low   |
| Growing Level  | low   |
Figure # - 1D, Group 1
Figure # - 1D, Group 2

Settings:

| Parameter      | Value   |
| -------------- | ------- |
| Functions      | sin,cos |
| Function Level | linear  |
| Initial Level  | low     |
| Growing Level  | low     |
#### Explicit Problems

| Problem        | Solved | Iters  | Models    | Evals     | I-Level | G-Level | F-Level | Functions  |
| -------------- |:------:|:------:|:---------:|:---------:|:-------:|:-------:|:-------:|:----------:|
| koza_01        | yes    | 6      | 90        | 1.092M    | low     | low     | -       | -          |
| koza_02        | yes    | 7      | 112       | 1.075M    | low     | low     | -       | -          |
| koza_03        | yes    | 8      | 114       | 1.349M    | low     | low     | -       | -          |
| lipson_01      | yes    | 2      | 48        | 0.408M    | low     | low     | -       | -          |
| lipson_02      | -      | -      | -         | -         | -       | -       | -       | trig,extra |
| lipson_03      | -      | -      | -         | -         | -       | -       | -       | exp,trig   |
| -------------- | ------ | ------ | --------- | --------- | ------- | ------- | ------- | -------    |
| nguyen_01      | yes    | 3      | 60        | 0.557M    | low     | low     | -       | -          |
| nguyen_02      | yes    | 6      | 85        | 0.900M    | low     | low     | -       | -          |
| nguyen_03      | -      | -      | -         | -         | -       | -       | -       | -          |
| nguyen_04      | -      | -      | -         | -         | -       | -       | -       | -          |
| nguyen_05      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_06      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_07      | -      | -      | -         | -         | -       | -       | -       | exp,extra  |
| nguyen_08      | -      | -      | -         | -         | -       | -       | -       | exp,extra  |
| -------------- | ------ | ------ | --------- | --------- | ------- | ------- | ------- | -------    |
| nguyen_09      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_10      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_11      | NO     | -      | -         | -         | -       | -       | -       | $$x^y$$    |
| nguyen_12      | yes    | 6      | 447       | 10.03M    | low     | low     | -       | -          |
| -------------- | ------ | ------ | --------- | --------- | ------- | ------- | ------- | -------    |
| korns_01       | yes    | 0      | 210       | 1.316M    | low     | low     | -       | -          |
| korns_02       | yes    | 6      | 1575      | 38.87M    | low     | low     | -       | -          |
| korns_03       | yes    | 5      | 1491      | 43.48M    | low     | low     | -       | -          |
| korns_04       | -      | -      | -         | -         | -       | -       | -       | trig       |
| korns_05       | -      | -      | -         | -         | -       | -       | -       | exp        |
| korns_06       | -      | -      | -         | -         | -       | -       | -       | sqrt       |
| korns_07       | NO     | -      | -         | -         | -       | -       | -       | $$x^y$$    |
| korns_08       | -      | -      | -         | -         | -       | -       | -       | sqrt       |
| korns_09       | -      | -      | -         | -         | -       | -       | -       | exp,sqrt   |
| korns_10       | -      | -      | -         | -         | -       | -       | -       | -          |
| korns_11       | -      | -      | -         | -         | -       | -       | -       | trig       |
| korns_12       | -      | -      | -         | -         | -       | -       | -       | trig       |
| korns_13       | -      | -      | -         | -         | -       | -       | -       | trig,tan   |
| korns_14       | NO     | -      | -         | -         | -       | -       | -       | tanh       |
| korns_15       | NO     | -      | -         | -         | -       | -       | -       | $$x^y$$    |

### Enhancing PGE

Contents:

- Decoupling
- Evaluation
- Selection
- Expansion
- Enhanced PGE


This chapter describes our enhancements to the original PGE algorithm. After achieving a proof of concept, the limitations of the original implementation began to materialize. As with many iterative processes, each enhancement brought to light a new limitation. The enhancements described here follow the sequence in which they were implemented, in order to relate how each solution addressed one limitation and which limitations then came to light.



#### Decoupling into Services

Evaluation is the most time-consuming process in any SR implementation, generally exceeding 90% of the total runtime.

The main reasons for this are:

  1. Floating-point calculations are comparatively expensive
  2. Implementations generate many candidate models
  3. The volume of training data continues to increase
  4. The other phases tend not to require much time

One of the nice properties of SR evaluation is that it offers both data and functional parallelism. As a result, research into parallel and distributed GP has been extensive and fruitful (see Chapter 3).

We follow suit with PGE by splitting the algorithm into a set of services and deploying them to cloud infrastructure. We decoupled the PGE algorithm into three services:

  1. Main Search Loop
  2. Evaluation (for parallel benefits)
  3. Algebra (for a mature CAS)

We opted for the additional algebra service so that a mature Computer Algebra System (CAS) could be used. We chose SymPy because it had the desired functionality and was simple to integrate. We believe it is also possible to decouple the expansion and memoization functionality into their own services; this is, however, beyond the scope of this work. This enhancement was also made in conjunction with the last enhancement, waterfall evaluation.

Figure # - Decoupled PGE

##### The Three Services

**Algebra Service**

The original PGE algorithm internally implemented basic algebra operations to collect terms and perform simple rewrites. While decoupling this service, we opted to replace the internal algorithms with a third-party library. We chose SymPy for algebraic manipulations, as it offers a more mature and well-tested set of functionality.

Initially, we used the string representation of an equation when sending messages between services. To alleviate some of the overhead, we converted to using the same integer serialization used in the memoization process.
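The conversion can be sketched as follows. The opcode table and nested-tuple tree layout below are illustrative stand-ins, not PGE's actual encoding:

```python
OPS = {"add": 0, "mul": 1, "sin": 2, "var": 3, "const": 4}   # illustrative opcode table
ARITY = {"add": 2, "mul": 2, "sin": 1}                       # leaf ops carry a payload instead

def serialize(node, out=None):
    """Flatten a nested-tuple tree, in prefix order, into a list of ints."""
    if out is None:
        out = []
    op = node[0]
    out.append(OPS[op])
    if op in ARITY:
        for child in node[1:]:
            serialize(child, out)
    else:
        out.append(node[1])          # leaf payload: variable / constant index
    return out

def deserialize(seq):
    """Rebuild the tree by walking the integer sequence once."""
    names = {v: k for k, v in OPS.items()}
    pos = 0
    def read():
        nonlocal pos
        op = names[seq[pos]]
        pos += 1
        if op not in ARITY:
            payload = seq[pos]
            pos += 1
            return (op, payload)
        return (op, *(read() for _ in range(ARITY[op])))
    return read()

tree = ("add", ("var", 0), ("sin", ("var", 1)))   # x0 + sin(x1)
wire = serialize(tree)
print(wire)                          # [0, 3, 0, 2, 3, 1]
print(deserialize(wire) == tree)     # True
```

No printing or parsing of strings is needed at either end of a service call; the receiver walks the integer sequence directly.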

**Evaluation Service**

As evaluation is the most resource-intensive part of the search, it was the natural first choice to decouple. Each PGE search requires several evaluator instances, so we chose to have each instance concern itself with a single data set at a time.

PGE uses a third-party library, Levmar, to perform non-linear fitting of a model's parameters. PGE uses the analytic-Jacobian fitting function, which requires the Jacobian of the model, with respect to the parameters, to be calculated. The evaluator service calls upon the algebra service to perform this operation on its behalf. Since both the subset and fullfit evaluation phases make use of non-linear regression, both require the Jacobian to be available.
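The fitting step can be sketched with SciPy's Levenberg-Marquardt solver standing in for Levmar; the model $$y = c_0 e^{c_1 x}$$ and its hand-derived Jacobian are illustrative, not drawn from PGE:

```python
# Sketch: non-linear coefficient fitting with an analytic Jacobian.
import numpy as np
from scipy.optimize import least_squares

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * x)                      # synthetic "training" data

def residuals(c):
    return c[0] * np.exp(c[1] * x) - y

def jacobian(c):                               # d(residual)/d(c0) and d/d(c1)
    e = np.exp(c[1] * x)
    return np.stack([e, c[0] * x * e], axis=1)

fit = least_squares(residuals, x0=[1.0, 1.0], jac=jacobian, method="lm")
print(fit.x)                                   # ≈ [2.0, 1.5]
```

In the decoupled design, the symbolic differentiation behind `jacobian` is what the evaluator requests from the algebra service; the numeric solve stays local to the evaluator.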

Each evaluation service instance is first sent the necessary information for setup, including the parameters and the dataset to be used for evaluation. They then sit in an event loop waiting for a request to evaluate a model.

**Search Service**

The search loop exists as its own service and is the main control-flow structure. It remains basically the same as the original search loop. The main difference is that function calls to the algebra and evaluation services now require network activity. During each iteration, PGE delegates to the appropriate services at the necessary times.

##### Service Interactions

The algebra service is independent and does not depend on the other two services. The evaluation service depends upon the algebra service for calculating the analytical jacobian of a model w.r.t. the parameters. The search service depends upon both the algebra and evaluation service.

Each search service has its own set of evaluation service instances. The algebra services are more flexible, as they are independent. We explore several variations:

  1. Each service instance has its own dedicated algebra instance (additionally, the search process can have multiple instances)
  2. A search instance and its evaluator instances share a pool of algebra services
  3. Multiple PGE search processes share a larger pool of algebra service instances

We explore the trade-offs and appropriate situations for each case in the evaluation phase. The best setup is tied to both the problem at hand and the current implementation.

Our preliminary experience with decoupling allowed us to reduce the original count of algebra service instances after converting messages to use the serialized form of a model, as opposed to the human-readable version. This was most likely the result of removing the string processing necessary for tokenization and parsing into the syntax tree.



#### Progressive Evaluation

One of the issues in PGE is that a significant amount of effort is spent evaluating models which end up with poor fit. A useful idea is to evaluate models on a subset of the data. This method has been shown to provide good estimates with as few as just 8 data points [ hod:08:coevo_fp ].

Figure # - PGE Progressive Evaluation

We use this idea of subsampling and introduce an extra stage of processing, which we call peeked evaluation. Peeked evaluation provides a quick estimate of fitness from a small number of training points. The training data is uniformly sampled to produce the reduced data set.

With peeked evaluation, each equation must pass through an additional priority queue. The idea is to get a quick estimate of accuracy, and then prioritize full evaluation from these estimates. The main reason for incorporating peeked evaluation is to avoid fully evaluating models which will be poor candidates. Only models which are fully fit can be entered into the priority queue for expansion candidacy. In this way, there is an extra barrier to poor models producing more poor models.
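The uniform subsampling step can be sketched as follows; the function name and the choice of eight points (echoing the cited result) are ours:

```python
def uniform_subsample(points, n_peek=8):
    """Take every (len/n_peek)-th point so the subset still spans the domain."""
    if len(points) <= n_peek:
        return list(points)
    step = len(points) / n_peek
    return [points[int(i * step)] for i in range(n_peek)]

data = list(range(200))                  # stand-in for 200 training rows
print(uniform_subsample(data))           # [0, 25, 50, 75, 100, 125, 150, 175]
```

Uniform spacing, rather than random sampling, keeps the peek estimate stable across models so their estimated fitnesses remain comparable.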

Figure “peeked” below shows the pseudocode with the additional phases for evaluation and prioritization.

Figure #: PGE Peeked Flow

```
for each iteration:
    to_eval = FitQueue.Pop()
    fullfitEvaluate(to_eval)
    ExpandQueue.Push(to_eval)

    to_expand = ExpandQueue.Pop()
    expanded = expand(to_expand)
    unique = memoize(expanded)
    peekEvaluate(unique)
    FitQueue.Push(unique)
```

The loop is now composed of two pop-process-push sequences. At the beginning of an iteration, all equations have already been estimated under peeked evaluation. Several models are then popped from the FitQueue for full evaluation and subsequently pushed into the ExpandQueue.

Next the ExpandQueue is popped to select models to serve as the next expansion points. The new models are checked for uniqueness, estimated, and pushed into the FitQueue. Under this scheme, a single model can pass through all phases in a single iteration.

One interesting consequence is that the majority of coefficient tuning can happen in the peeked evaluation phase. Initially, we guess 1 for each coefficient value. Peeked evaluation then makes the best estimate it can on limited data. By the time a model reaches full evaluation, it has much better starting points for its coefficient values, and the full fitting procedure often needs fewer passes as a result.
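The effect can be sketched with a toy one-parameter model; the gradient-descent fitter and the model $$y = c x$$ are illustrative stand-ins for the real fitting procedure:

```python
def fit(xs, ys, c0, steps=25, lr=0.01):
    """Gradient descent on mean squared error for the model y = c * x."""
    c = c0
    for _ in range(steps):
        grad = sum(2 * (c * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        c -= lr * grad
    return c

xs = [i / 10 for i in range(100)]              # x in [0, 10)
ys = [3.0 * x for x in xs]                     # true coefficient is 3.0

c_peek = fit(xs[::12], ys[::12], c0=1.0)       # peeked fit: ~8 points, default guess of 1
c_full = fit(xs, ys, c0=c_peek)                # full fit, warm-started from the peek value
print(round(c_peek, 3), round(c_full, 3))      # both converge to 3.0
```

The full fit starts essentially at the optimum instead of at the default guess of 1, which is the inheritance the text describes.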



#### Selection Improvements

The original PGE algorithm used tree size and an error metric to calculate model fitness. It then used a simple Pareto sorting mechanism to sort the population. This section describes enhancements we make to the metrics, fitness, and selection processes.

##### More Model Metrics

Our experience showed that modeling error alone was often insufficient for effectively separating good models from the rest. We add several metrics which account for model quality while penalizing complexity. This is separate from the complexity-accuracy tradeoff managed at the global level, though these metrics do have a beneficial influence on the global selection and prioritization process.

**Penalized complexity** is a tree-size calculation which adds a penalty to specific types of nodes. In our experiments, we applied it to function nodes from the trigonometric and logarithmic families. The penalty accounts for their increased relative computational complexity; we set it at +2. The value, while effective, deserves further investigation beyond what we provide.
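A minimal sketch of the calculation, using the +2 penalty from our experiments; the nested-tuple node representation is illustrative:

```python
PENALIZED = {"sin", "cos", "tan", "log"}       # trigonometric / logarithmic families

def penalized_size(node, penalty=2):
    """Tree size where nodes in PENALIZED count as 1 + penalty."""
    size = 1
    if node[0] in PENALIZED:
        size += penalty
    if node[0] not in ("var", "const"):
        size += sum(penalized_size(child, penalty) for child in node[1:])
    return size

expr = ("add", ("var", 0), ("sin", ("var", 1)))     # x0 + sin(x1)
print(penalized_size(expr))                         # 4 nodes + 2 penalty = 6
```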

**Jacobian complexity** is the summed tree size of the Jacobians of the model; we include a penalized version as well. This metric minimizes complexity in several ways. It minimizes the absolute, summed size. It minimizes the number of model parameters, because fewer terms means a smaller sum. It also minimizes the nesting complexity of the original model: when coefficients depend upon one another, the Jacobians become quite complex and drastically increase the value of this metric.

**R-squared** measures the goodness of fit of a model by accounting for the amount of variance explained. Values range from zero to one and can be interpreted as the percentage of variance the model explains. One issue with this metric is that extra variables can be included to inflate its value. The adjusted R-squared addresses this by adding a penalty which increases with each extra variable [Taylor:1997:error].

**Reduced chi-squared** also measures the goodness of fit of a model. It incorporates the variance in the data and the degrees of freedom, effectively normalizing the value to the number of data points and the model complexity [Taylor:1997:error, Pearson:1900:chi].

**AIC/BIC** measure the relative quality of fit between models. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) say nothing about the absolute quality of a model; rather, they measure the quality of models relative to each other. Both add a penalty term based on the number of model parameters. BIC penalizes complexity more and is thus the more conservative metric [Akaike:1974:aic, Schwarz:1978:bic].
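The metrics above, in simplified textbook form (assuming Gaussian errors with unit weights), can be sketched as:

```python
import math

def metrics(y, yhat, k):
    """Goodness-of-fit metrics for n data points and k model parameters."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, yhat))      # residual sum of squares
    mean = sum(y) / n
    tss = sum((a - mean) ** 2 for a in y)                 # total sum of squares
    r2 = 1 - rss / tss
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)         # penalizes extra variables
    chi2_red = rss / (n - k)                              # reduced chi-squared, unit weights
    aic = n * math.log(rss / n) + 2 * k                   # least-squares form of AIC
    bic = n * math.log(rss / n) + k * math.log(n)         # penalty grows with log(n)
    return r2, r2_adj, chi2_red, aic, bic

y    = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.1, 10.0]
yhat = [1.1, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
r2, r2_adj, chi2_red, aic, bic = metrics(y, yhat, k=2)
```

With ten points, $$\ln n > 2$$, so BIC's complexity penalty exceeds AIC's, matching the "more conservative" characterization above.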

##### Improvement over Parent

**Improvement over the parent** refers to how much an accuracy or goodness-of-fit metric changes in an offspring, relative to its parent. We use a simple difference between the values, making sure to account for whether the metric is a minimizing or maximizing one.

One interesting situation arises when we find a new path to an existing model. In this case, we can borrow the edge-relaxation step from Dijkstra's algorithm to obtain a new relative improvement value.

##### Model Fitness

Model fitness is used to make relative comparisons during the selection process. The fitness for an individual model is usually a combination of complexity and accuracy in an attempt to regularize.

The original PGE algorithm used tree size and model accuracy, as determined by an error metric, as the components of the fitness metric. It then used the basic Pareto non-dominated sort to prioritize the next models to be expanded.

In our enhancements, we made several changes to how metrics are used in fitness. In addition to being able to use any of the new metrics, we added the ability to use multiple metrics at once, allowing the number of components to grow beyond one for size and one for accuracy. We then added the ability to weight each component separately. This proved beneficial, as we discovered the best policy was to weight the complexity measure so that it was equal to the combined accuracy metrics. We also found that competing certain metrics against each other was beneficial, such as minimizing BIC while maximizing the improvement of BIC.

One of the more significant changes was to normalize each metric across the set of models. For each metric in use, we normalize its values across the models and use the normalized value as that component of a model's fitness. This removes the effect of scale on the relative comparisons made during selection. Without normalization, one component can easily dominate the solution. Additionally, the dominating component often changed as the search progressed. For example, as error tends toward zero, size becomes the dominant factor, and distinguishing between models on error becomes more difficult for the algorithm.
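The normalization step can be sketched as follows; the min-max scheme and metric names are illustrative, and all metrics are assumed to be minimizing:

```python
def normalized_fitness(models, weights):
    """models: list of dicts metric -> value (all minimizing); returns one fitness per model."""
    fitness = [0.0] * len(models)
    for name, weight in weights.items():
        col = [m[name] for m in models]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0                  # guard against identical columns
        for i, v in enumerate(col):
            fitness[i] += weight * (v - lo) / span
    return fitness

models = [{"size": 5, "err": 0.90}, {"size": 11, "err": 0.02}]
print(normalized_fitness(models, {"size": 1.0, "err": 1.0}))   # [1.0, 1.0]: neither dominates
```

After scaling, a raw error of 0.9 and a raw size of 11 contribute on equal terms, which is exactly the scale effect the text describes removing.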

##### Density-based Pareto Methods

The original PGE algorithm used a simplistic Pareto sorting algorithm. Research in GP has shown that, by accounting for density along the frontier, better choices can be made. Section ? in Chapter 3 discussed the various improved algorithms for Pareto selection. We settled on the NSGA-II algorithm for its good balance between effectiveness and computational complexity.

In our experiments, the addition of NSGA-II did not, by itself, improve the prioritization process. The benefit came from its combined use with fitness values containing more than two components. NSGA-II distributes selection by accounting for density across all dimensions, making the use of more dimensions beneficial.
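The crowding-distance step of NSGA-II, which produces this spreading effect along the front, can be sketched as:

```python
def crowding_distance(front):
    """front: list of fitness tuples from one non-dominated front."""
    n = len(front)
    dist = [0.0] * n
    for m in range(len(front[0])):                        # one pass per objective
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        span = (hi - lo) or 1.0
        dist[order[0]] = dist[order[-1]] = float("inf")   # boundary points always kept
        for a, b, c in zip(order, order[1:], order[2:]):
            dist[b] += (front[c][m] - front[a][m]) / span # gap around middle point b
    return dist

front = [(1.0, 5.0), (2.0, 3.0), (2.1, 2.9), (4.0, 1.0)]
print(crowding_distance(front))
```

Points in sparse regions accumulate larger distances and are preferred during truncation, which is how selection pressure spreads across every fitness dimension rather than clustering.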



#### Expansion Improvements

This section describes improvements to the expansion phase and functionality. The first changes enable multiple levels of complexity and context awareness when applying the expansion operators. We then describe two new expansion operators designed to work on summations. Finally, we enable multiple tiers of model expansion, applying progressively complex expansion operations to progressively fewer models at a time.

##### Configurable Complexity Levels

The original PGE algorithm had three different sets of expansion functions, encompassing three different complexities or scenarios. We alter this idea for greater granularity: each of the four operators can have its own complexity level. The low, medium, and high options provide flexibility while simplifying the choices for an end-user.

The following tables describe the basis functions used for each of the operators. Their application remains the same as in the original PGE algorithm.


**Initialization**

| level | result set |
| ----- | ---------- |
| low   |            |
| med   |            |
| high  |            |

**Variable Substitution**

| level | result set |
| ----- | ---------- |
| low   |            |
| med   |            |
| high  |            |

**Multiplication Extension**

| level | result set |
| ----- | ---------- |
| low   |            |
| med   |            |
| high  |            |

**Addition Extension**

| level | result set |
| ----- | ---------- |
| low   |            |
| med   |            |
| high  |            |

##### Context-aware

The original expansion policies were applied recursively at each node in the model tree, without regard for the context in which these operations took place. This led to situations where trigonometric functions were embedded within each other, such as $$\sin(\cos(x))$$. It also led to the production of overly complex expressions involving coefficients and addition which are more simply expressed as a single-level summation. Such expressions also make the coefficients much more difficult to fit, as they involve many nonlinear relationships.

We incorporate context-aware expansion policies to deal with these undesirable situations and consequences. Here we include two types of context-awareness, functional and depth-based, though more options are generally possible. Context-aware operators are easily created with state tracking during the recursion process. In our experience, their use implies restriction based on context; however, there may also be situations where certain operations are only permitted within a particular context.

##### New Extension Operator

The addition and multiplication expansion policies extend the number of terms in their respective operators. They do this by drawing from a predefined pool of basis functions.

We introduce a new extension operator which only works at a root-level addition. This operator extends an addition with small modifications to its current terms. First, each term is expanded with a set of basis terms, similar to the multiplication expansion. Then, each of these new terms is included in the summation, as opposed to replacing an existing term.

This operator makes larger movements around the search space, effectively combining several smaller modifications and their intermediate steps. We chose to only apply it at a root-level addition because deeper recursion becomes overly complex very quickly.
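One plausible reading of the operator, generating one candidate sum per new product term (the tuple term representation is illustrative):

```python
def extend_addition(terms, basis):
    """Append (term * basis) products to the sum, one candidate per product."""
    out = []
    for t in terms:
        for b in basis:
            out.append(terms + [("mul", t, b)])   # keep all existing terms, add one
    return out

sums = extend_addition([("var", 0)], [("var", 1), ("sin", ("var", 0))])
print(len(sums))   # 1 term x 2 basis functions -> 2 candidate sums
```

Each candidate keeps the full existing summation and gains one product term, which is what distinguishes this operator from the replacing multiplication expansion.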

##### New Shrinkage Operator

Under the original policies, there was no explicit method to go backwards, that is, to decrease the size of a model. A decrease could only occur when a cancelling term was applied through an expansion operator.

We introduce a new shrinkage operator which removes terms from additions. The operator recurses over the entire tree; when an addition is found, it produces all possible expressions with one term removed.

The reason for adding this operator is to clean up intermediary terms left over in models. During the search process, it is often the case that additional terms are incorporated which either bridge a gap or are meant to deal with noise. This is particularly noticeable with trigonometric functions, whose output is in the range $$[-1,1]$$.

Figure # - Shrinkage Examples (from actual benchmark problems)

The shrinkage operator removes excess terms that arise from overfitting noise or from the intermediary terms needed during search progression. We exclude a similar operator for multiplication for two reasons. First, the process would generate a combinatorially large number of models. Second, in most cases those models can be reached through the usual multiplication expansion; that is, the process would be too similar to multiplying by a cancelling term to make a significant difference.
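The core of the operator at a single addition node can be sketched as:

```python
def shrink_addition(terms):
    """All versions of the sum with exactly one term removed."""
    return [terms[:i] + terms[i + 1:] for i in range(len(terms))]

print(shrink_addition(["x", "sin(x)", "c0"]))
# [['sin(x)', 'c0'], ['x', 'c0'], ['x', 'sin(x)']]
```

In the full operator, this is applied at every addition node found while recursing over the tree, and each resulting sum yields one candidate model.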

##### Progressive Expansion

Progressive expansion is the idea that a series of increasingly complex expansion schemes can be applied to a decreasing number of models. This follows the multiple-tiers-of-refinement motif.

Once a model is fully fit during a search, it becomes a candidate for expansion. If selected through the heap prioritization, it then has the expansion methods applied.

Under the progressive expansion design, each step in the series has its own selection heap and its own set of expansion policies. When a model has been selected and expanded it is pushed into the next tier’s heap.

Figure # - PGE Progressive Expansion

Example schemes for progressive expansion are:

  1. Increasing complexity of modifications
  2. Limiting variables available at early tiers
  3. Reducing contextual restrictions
  4. Delaying incorporation of functions
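The tiered design can be sketched as follows; the `Tier` class, the string models, and the stand-in policies are all illustrative:

```python
import heapq

class Tier:
    def __init__(self, policies):
        self.heap = []
        self.policies = policies

def step(tiers):
    """Advance every non-empty tier once, promoting children one tier up."""
    for i in reversed(range(len(tiers))):       # reversed: promotions wait one step
        if not tiers[i].heap or i + 1 >= len(tiers):
            continue
        fitness, model = heapq.heappop(tiers[i].heap)
        for policy in tiers[i].policies:
            heapq.heappush(tiers[i + 1].heap, (fitness, policy(model)))

# Tier 0: cheap additive growth; tier 1: variable products; tier 2: collects output.
tiers = [Tier([lambda m: m + "+term"]), Tier([lambda m: m + "*x"]), Tier([])]
heapq.heappush(tiers[0].heap, (0.5, "c0"))
step(tiers)
print(tiers[1].heap)   # [(0.5, 'c0+term')]
step(tiers)
print(tiers[2].heap)   # [(0.5, 'c0+term*x')]
```

Because each tier has its own heap, later (more expensive) policies only ever see the models that survived selection at every earlier tier.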



#### Enhanced PGE

This chapter has described many enhancements to the original PGE algorithm. The diagram below depicts PGE with these enhancements. The loop remains the same; however, several phases have been internally extended with progressive application and intermediate heaps. The next chapter will show their effectiveness over a range of experiments.

Figure # - Enhanced PGE

prev top
#### Overview

We'll first talk about measurements and benchmarking: what, why, and how we are measuring during experimentation. We'll then talk about the benchmarks: where they are drawn from, what they test, how they were generated and handled, as well as the environment in which benchmarking was performed.

We'll then proceed through three groups of benchmarking tasks. The first group evaluates the original algorithm and provides the proof of concept. The second group explores how the algorithm performs under various conditions related to the data and numerical computations. This includes the amount of data, features, and noise, as well as error metrics, normalization, and other data handling aspects. The third group of tasks demonstrates the efficacy of the enhancements to the PGE algorithm. We will conclude by applying the lessons learned to some difficult and open problems. A comparative evaluation against other SR implementations can be found in [Chapter 6](/sr/06-comparison/).

### Basic Experiments

#### Proof of Concept

#### Differential Equations

| Problem              | Solved | Iters  | Models    | Evals     | I-Level | G-Level | F-Level | Functions |
| -------------------- |:------:|:------:|:---------:|:---------:|:-------:|:-------:|:-------:|:---------:|
| BacResp - dx         | -      | -      | -         | -         | -       | -       | -       | -         |
| BacResp - dy         | -      | -      | -         | -         | -       | -       | -       | -         |
| BarMags - dX         | -      | -      | -         | -         | -       | -       | -       | trig      |
| BarMags - dY         | -      | -      | -         | -         | -       | -       | -       | trig      |
| Glider - dv          | -      | -      | -         | -         | -       | -       | -       | trig      |
| Glider - dA          | -      | -      | -         | -         | -       | -       | -       | trig      |
| Ecoli - dG           | -      | -      | -         | -         | -       | -       | -       | -         |
| Ecoli - dA           | -      | -      | -         | -         | -       | -       | -       | -         |
| Ecoli - dL           | -      | -      | -         | -         | -       | -       | -       | -         |
| Lorenz - dx          | -      | -      | -         | -         | -       | -       | -       | -         |
| Lorenz - dy          | -      | -      | -         | -         | -       | -       | -       | -         |
| Lorenz - dz          | -      | -      | -         | -         | -       | -       | -       | -         |
| ShearFlow - dA       | -      | -      | -         | -         | -       | -       | -       | trig      |
| ShearFlow - dB       | -      | -      | -         | -         | -       | -       | -       | trig      |
| vanderPol - dx       | -      | -      | -         | -         | -       | -       | -       | -         |
| vanderPol - dy       | -      | -      | -         | -         | -       | -       | -       | -         |
| LotkaVolterra - dx   | -      | -      | -         | -         | -       | -       | -       | -         |
| LotkaVolterra - dy   | -      | -      | -         | -         | -       | -       | -       | -         |
| PredPrey - dx        | -      | -      | -         | -         | -       | -       | -       | -         |
| PredPrey - dy        | -      | -      | -         | -         | -       | -       | -       | -         |
| SimplePendulum - dA  | -      | -      | -         | -         | -       | -       | -       | trig      |
| SimplePendulum - dV  | -      | -      | -         | -         | -       | -       | -       | trig      |
| ChaoticPendulum - dA | -      | -      | -         | -         | -       | -       | -       | trig      |
| ChaoticPendulum - dV | -      | -      | -         | -         | -       | -       | -       | trig      |

#### Section Flow

- Eqn types & Data sets
  - Setup & Settings
  - Explicit
  - Diffeq
  - 2-stage SR
  - Machines
- Data Set details
  - How generated
  - pre-processing
  - Links to Appendix
- Benchmark / Reproducibility
  - GP needs better
  - Designed for specific purpose
  - Diffeq, closer to real-world
  - Which benchmarks are good / bad, why?
  - Reproducibility, links to code & data
- PGE basic
  - most basic settings, to show possible under ideal conditions
  - explicit & diffeq
  - no noise
  - different function settings
  - different init / grow levels
  - different selection methods
- PGE settings
  - Setup & Settings
  - First formulation
  - various policies
    - initialization sets
    - expansion bases / subs
    - selection methods
- Numerical / data related
  - data policies
  - amount of noise
  - quantities / coverage
  - error metrics (x4): mae, mse, rmae, rmse
  - fitness metrics (x2): regular, normalized ~ error metric
  - diffeqs: point evaluation vs integration for N points
- PGE enhancements
  - Algorithm
  - Memoization
  - Searching (graph)
  - Parallel processing
- Other analyses to consider (future work?)
  - Coefficient value inheritance
  - feature selection
  - data transformations, scaling?
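The four error metrics listed in the outline can be sketched as below. Note that reading `rmae` and `rmse` as the *root* variants of MAE and MSE is our assumption; the text does not define them at this point.

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return np.mean(np.abs(y - yhat))

def mse(y, yhat):
    """Mean squared error."""
    return np.mean((y - yhat) ** 2)

def rmae(y, yhat):
    """Root mean absolute error (assumed reading of 'rmae')."""
    return np.sqrt(mae(y, yhat))

def rmse(y, yhat):
    """Root mean squared error."""
    return np.sqrt(mse(y, yhat))
```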
#### Measurement Details

- What measurements
- How measured
- What the output will look like
- How success is determined
  - solution found or not
    - test with sympy
    - in final pareto
    - threshold reached
  - error
    - average error
    - explained variance
- Number of iterations
- Eqn/point evaluations
- Real-time
  - whole run
  - to solution
  - percent per phase in an iteration
- Search space coverage
  - number of equations tested
  - graph metrics
  - unique vs total generated
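The "test with sympy" item above can be sketched as a symbolic-equivalence check: if the difference between the candidate model and the target equation simplifies to zero, they are the same function. The expressions and the helper `is_solution` here are illustrative.

```python
import sympy

# Hypothetical target and candidate; in practice both would be parsed
# from the benchmark definition and the model returned by the search.
x = sympy.Symbol('x')
target = x**2 + 2*x + 1
candidate = (x + 1)**2

def is_solution(cand, targ):
    """True when cand - targ simplifies to zero, i.e. the candidate
    is symbolically equivalent to the target equation."""
    return sympy.simplify(cand - targ) == 0

assert is_solution(candidate, target)
assert not is_solution(x**2, target)
```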
#### Evaluation Settings

This section describes the benchmarks and testing environment used during evaluation.

##### Benchmarks

We use several benchmark groups to evaluate PGE:

1. Explicit benchmarks from the SR literature
1. Regression problems from the NIST website
1. Differential equations and dynamical systems
1. Self-generated benchmarks for specific testing

The majority of these datasets were chosen because there are prior published results, either in the SR literature or based on a specific regression model. We next provide a brief description of each benchmark group. Details, charts, data, and code can be found in [Appendix 3](/sr/A3-benchmarks/).

**Explicit problems** are drawn from the SR literature. The main source is a paper titled ["GP Needs Better Benchmarks."](https://cs.gmu.edu/~sean/papers/gecco12benchmarks3.pdf) We select a subset of these problems, those that have been used more often in published work.

**NIST problems** are taken from the NIST website. These benchmarks are used to test linear and non-linear regression techniques. Here we apply PGE to these same problems.

**Differential equation** benchmarks are almost exclusively drawn from the work of Hod Lipson and his students. The Lipson group has demonstrated GP-based SR on a wide range of dynamical systems. Here we show that PGE can also solve these problems.

**Self-generated** benchmarks are the result of producing tests that stress various aspects of the PGE system, and of SR implementations in general.

##### Execution Environment

1. Hardware specs
1. Linux distro, kernel version
1. Language / compiler versions
1. Package versions
### PGE Algorithm

| Problem   | SSC-T6 | SSC-T7 | PGE-1        | PGE-2        | PGE-3        |
| --------- |:------ |:------ |:------------ |:------------ |:------------ |
| Nguyen-01 | 0.0035 | 0.003  | **0.000003** | **0.000003** | **0.000003** |
| Nguyen-02 | 0.0075 | 0.007  | **0.000004** | **0.000004** | **0.000004** |
| Nguyen-03 | 0.009  | 0.0095 | **0.000003** | 0.094944     | **0.000003** |
| Nguyen-04 | 0.013  | 0.011  | 0.107365     | 0.000005     | **0.000004** |
| Nguyen-05 | 0.0045 | 0.0045 | 0.000068     | 0.000014     | **0.000001** |
| Nguyen-06 | 0.0045 | 0.0035 | 0.011798     | 0.000446     | **0.000036** |
| Nguyen-07 | 0.003  | 0.0035 | 0.002571     | 0.000351     | **0.000251** |
| Nguyen-08 | 0.0065 | 0.005  | **0.000001** | **0.000001** | **0.000001** |
| Nguyen-09 | 0.0264 | 0.0099 | 0.003898     | **0.000001** | 0.000270     |
| Nguyen-10 | 0.0122 | 0.0066 | **0.000002** | 0.000004     | 0.000004     |

| Problem  | AEG-T2 error | AEG-T2 equations | AEG-T4 error | AEG-T4 equations | PGE-1 error | PGE-1 equations | PGE-2 error | PGE-2 equations | PGE-3 error | PGE-3 equations |
| -------- |:------------ |:---------------- |:------------ |:---------------- |:----------- |:--------------- |:----------- |:--------------- |:----------- |:--------------- |
| Korns-01 | 0.00 | .15K    | 0.00 | .06K    | 0.000000 | .17K   | 0.000000 | .47K   | 0.000000 | .35K   |
| Korns-02 | 0.00 | 3.26K   | 0.00 | 113.00K | 0.027277 | 25.05K | 0.0055   | 34.07K | 0.1135   | 2.30K  |
| Korns-03 | 0.00 | 804.49K | 0.00 | 222.46K | 0.498    | 0.36K  | 0.0065   | 29.00K | 0.1245   | 1.96K  |
| Korns-04 | 0.00 | .59K    | 0.00 | .86K    | 0.000000 | .17K   | 0.000000 | 10.95K | 0.000000 | 28.06K |
| Korns-05 | 0.00 | .25K    | 0.00 | .16K    | 0.000000 | .17K   | 0.000000 | .48K   | 0.000000 | .35K   |
| Korns-06 | 0.00 | .13K    | 0.00 | .01K    | 0.000000 | .17K   | 0.000000 | .48K   | 0.000000 | .35K   |
| Korns-07 | 0.00 | 187.26K | 0.00 | 4.10K   | 0.031941 | 8.37K  | 0.0075   | 53.08K | 0.058696 | 9.45K  |
| Korns-08 | 0.00 | 5.99K   | 0.00 | 11.00K  | 0.021827 | 19.34K | 0.000000 | 9.86K  | 0.069829 | 54.18K |
| Korns-09 | 0.00 | 97.24K  | 0.00 | 116.81K | 0.1855   | 1.87K  | 0.000000 | 7.14K  | 0.0615   | .70K   |
| Korns-10 | 0.99 | 763.53K | 0.00 | 1.34M   | 0.055193 | 4.42K  | 0.008    | 78.19K | 0.107    | 1.65K  |
| Korns-11 | 1.00 | 774.89K | 0.00 | 4.7M    | 0.493    | .17K   | 0.0055   | 8.33K  | 0.1195   | 2.62K  |
| Korns-12 | 1.04 | 812.79K | 1.00 | 16.7M   | 0.117404 | 34.34K | 0.0065   | 44.27K | 0.124    | 1.67K  |

Diffeq Comparisons

| Problem       | GP-Time    | GP-Evals    | PGE-Time    | PGE-Evals   |
|---------------|------------|-------------|-------------|-------------|
| Glider-v      | 10.219     | 1030M       | 3.523       | 0.468M      |
| Glider-o      | 5.062      | 500M        | 0.895       | 0.223M      |
| BackResp-x    | 74.047     | 7590M       | 15.131      | 2.692M      |
| BackResp-y    | 30.547     | 3090M       | 21.929      | 3.825M      |
| PredPrey-x    | 81.718     | 8260M       | 94.879      | 9.008M      |
| PredPrey-y    | 290.578    | 29380M      | 81.592      | 8.323M      |
| BarMags-o1    | 11.750     | 1190M       | 10.344      | 1.883M      |
| BarMags-o2    | 15.609     | 1580M       | 3.551       | 1.005M      |
| ShearFlow-o   | 3.562      | 360M        | 0.216       | 0.168M      |
| ShearFlow-p   | 33.859     | 3420M       | 7.558       | 2.071M      |
| VanDerPol-x   | 25.547     | 2580M       | 13.779      | 1.544M      |
| VanDerPol-y   | 0.859      | 90M         | 0.354       | 0.089M      |
| LotkaVolt-x   | 4.250      | 430M        | 0.336       | 0.938M      |
| LotkaVolt-y   | 1.063      | 110M        | 0.449       | 0.952M      |
Figure # - Diffeq Results
#### Decoupling Into Services

After decoupling the services, the expansion phase became the most time-consuming part of the algorithm. We surmise the reason for this is two-fold, though it warrants further investigation. First, the time required to process a model in the algebra and other services became less than the time required to send messages, even when the services are co-located. Second, in addition to the network latency, there was overhead in our original messages due to using a human-readable format for the equation over the wire. This required the equations to be printed and then parsed at both service ends. After converting the messages to use the same integer serialization used in the memoization process, we saw a 20% reduction in overall runtimes.
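The serialization change can be illustrated with a toy encoding; the opcode table and tree layout here are hypothetical, not PGE's actual wire format. The integer form round-trips with a single linear pass over the sequence, rather than printing an equation string on one side and parsing it on the other.

```python
# Toy pre-order integer encoding of an expression tree.
# Opcodes are illustrative, not PGE's actual serialization.
ADD, MUL, X, C = 0, 1, 2, 3   # C carries one payload int (e.g. a coefficient index)

def to_ints(node):
    """Flatten a nested-tuple expression tree into a flat list of ints."""
    op = node[0]
    if op == X:
        return [X]
    if op == C:
        return [C, node[1]]
    out = [op]
    for child in node[1:]:
        out.extend(to_ints(child))
    return out

def from_ints(seq, i=0):
    """Rebuild the tree from the flat sequence; returns (tree, next index)."""
    op = seq[i]
    if op == X:
        return (X,), i + 1
    if op == C:
        return (C, seq[i + 1]), i + 2
    left, i = from_ints(seq, i + 1)
    right, i = from_ints(seq, i)
    return (op, left, right), i

expr = (ADD, (MUL, (C, 3), (X,)), (C, 2))   # encodes c3*x + c2
tree, _ = from_ints(to_ints(expr))
assert tree == expr
```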
Figure # - Percent in each Phase
### Data Related Issues

- data coverage of the domain
  - issue with problem domains: they need extending, or there is an inability to effectively distinguish between different solutions
- sampling
- volume
- features
- function complexity
- system complexity
- noisy features
- noisy channels
- multiple data sets / system parameters
- windowing
- weighting samples
- equation library?
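The distinction between noisy features and noisy channels in the list above can be sketched as follows; the target function, noise levels, and sample count are hypothetical, not drawn from the benchmark suites.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-4.0, 4.0, 200)
y = x**3 - 2.0 * x                       # clean output channel

# Noisy channel: only the measured output is perturbed.
y_chan = y + rng.normal(0.0, 0.1 * y.std(), y.shape)

# Noisy features: the inputs are perturbed before the (unknown) true
# function is applied, so the error propagates through it nonlinearly.
x_feat = x + rng.normal(0.0, 0.1 * x.std(), x.shape)
y_feat = x_feat**3 - 2.0 * x_feat
```

In the channel case the error is additive and independent of the model; in the feature case it is shaped by the function itself, which is why the two stress an SR system differently.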
### PGE Enhancements

Settings:

| Parameter      | Value |
| -------------- | ----- |
| Functions      | none  |
| Function Level | n/a   |
| Initial Level  | low   |
| Growing Level  | low   |
Figure # - 1D, Group 1
Figure # - 1D, Group 2

Settings:

| Parameter      | Value   |
| -------------- | ------- |
| Functions      | sin,cos |
| Function Level | linear  |
| Initial Level  | low     |
| Growing Level  | low     |
#### Explicit Problems

| Problem        | Solved | Iters  | Models    | Evals     | I-Level | G-Level | F-Level | Functions  |
| -------------- |:------:|:------:|:---------:|:---------:|:-------:|:-------:|:-------:|:----------:|
| koza_01        | yes    | 6      | 90        | 1.092M    | low     | low     | -       | -          |
| koza_02        | yes    | 7      | 112       | 1.075M    | low     | low     | -       | -          |
| koza_03        | yes    | 8      | 114       | 1.349M    | low     | low     | -       | -          |
| lipson_01      | yes    | 2      | 48        | 0.408M    | low     | low     | -       | -          |
| lipson_02      | -      | -      | -         | -         | -       | -       | -       | trig,extra |
| lipson_03      | -      | -      | -         | -         | -       | -       | -       | exp,trig   |
| -------------- | ------ | ------ | --------- | --------- | ------- | ------- | ------- | ---------- |
| nguyen_01      | yes    | 3      | 60        | 0.557M    | low     | low     | -       | -          |
| nguyen_02      | yes    | 6      | 85        | 0.900M    | low     | low     | -       | -          |
| nguyen_03      | -      | -      | -         | -         | -       | -       | -       | -          |
| nguyen_04      | -      | -      | -         | -         | -       | -       | -       | -          |
| nguyen_05      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_06      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_07      | -      | -      | -         | -         | -       | -       | -       | exp,extra  |
| nguyen_08      | -      | -      | -         | -         | -       | -       | -       | exp,extra  |
| -------------- | ------ | ------ | --------- | --------- | ------- | ------- | ------- | ---------- |
| nguyen_09      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_10      | -      | -      | -         | -         | -       | -       | -       | trig       |
| nguyen_11      | NO     | -      | -         | -         | -       | -       | -       | $$x^y$$    |
| nguyen_12      | yes    | 6      | 447       | 10.03M    | low     | low     | -       | -          |
| -------------- | ------ | ------ | --------- | --------- | ------- | ------- | ------- | ---------- |
| korns_01       | yes    | 0      | 210       | 1.316M    | low     | low     | -       | -          |
| korns_02       | yes    | 6      | 1575      | 38.87M    | low     | low     | -       | -          |
| korns_03       | yes    | 5      | 1491      | 43.48M    | low     | low     | -       | -          |
| korns_04       | -      | -      | -         | -         | -       | -       | -       | trig       |
| korns_05       | -      | -      | -         | -         | -       | -       | -       | exp        |
| korns_06       | -      | -      | -         | -         | -       | -       | -       | sqrt       |
| korns_07       | NO     | -      | -         | -         | -       | -       | -       | $$x^y$$    |
| korns_08       | -      | -      | -         | -         | -       | -       | -       | sqrt       |
| korns_09       | -      | -      | -         | -         | -       | -       | -       | exp,sqrt   |
| korns_10       | -      | -      | -         | -         | -       | -       | -       | -          |
| korns_11       | -      | -      | -         | -         | -       | -       | -       | trig       |
| korns_12       | -      | -      | -         | -         | -       | -       | -       | trig       |
| korns_13       | -      | -      | -         | -         | -       | -       | -       | trig,tan   |
| korns_14       | NO     | -      | -         | -         | -       | -       | -       | tanh       |
| korns_15       | NO     | -      | -         | -         | -       | -       | -       | $$x^y$$    |