Diffstat (limited to 'thesis/parts/design.tex')
-rw-r--r-- | thesis/parts/design.tex | 5 |
1 files changed, 3 insertions, 2 deletions
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index fba4437..84643b1 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -138,11 +138,12 @@ We then perform regression, using the collection size $n$ to predict $t$.
 In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.
 
 In our implementation, we fit a function of the form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$, using regular least-squares fitting.
+Before fitting, we discard all observations that are more than one standard deviation out from the mean for a given $n$ value.
+
 Whilst we could use a more complex technique, in practice this is good enough: Most common operations are polynomial at worst, and more complex models risk overfitting.
-\todo{mention discarding outliers}
+
 %% Limitations
 This method works well for many operations and structures, although has notable limitations.
-
 In particular, implementations which defer work from one function to another will be extremely inconsistent.
 
 For example, \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and waits to sort the list until the contents of the list are read from (such as by using \code{contains}).
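For readers of this change, the procedure described in the added sentence can be sketched as follows. This is a minimal illustration in Python, not the thesis implementation: the names `discard_outliers`, `solve`, and `fit` are hypothetical, the use of the population standard deviation is an assumption (the text does not say which variant is used), and the least-squares step is done here via the normal equations purely to keep the sketch dependency-free.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def discard_outliers(observations):
    """Keep (n, t) pairs whose t is within one standard deviation
    of the mean t observed for the same n."""
    by_n = defaultdict(list)
    for n, t in observations:
        by_n[n].append(t)
    kept = []
    for n, ts in sorted(by_n.items()):
        mu = mean(ts)
        sigma = pstdev(ts)  # assumption: population std dev, not sample
        kept.extend((n, t) for t in ts if abs(t - mu) <= sigma)
    return kept


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (m[r][size] - tail) / m[r][r]
    return x


def fit(observations):
    """Least-squares fit of t = x0 + x1*n + x2*n^2 + x3*log2(n).
    Returns [x0, x1, x2, x3]; requires n >= 1 and at least four
    distinct n values."""
    rows = [[1.0, float(n), float(n) ** 2, math.log2(n)] for n, _ in observations]
    ts = [t for _, t in observations]
    k = 4
    # Normal equations: (A^T A) x = A^T t
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, ts)) for i in range(k)]
    return solve(ata, atb)
```

A caller would run `fit(discard_outliers(observations))` on a list of `(n, t)` pairs. Note that a one-standard-deviation cutoff is fairly aggressive: for roughly normal timings it would drop about a third of the observations at each $n$.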