From e091c37926281dbd5bf58b249d2d8d1b370897f2 Mon Sep 17 00:00:00 2001
From: Aria Shrimpton
Date: Fri, 29 Mar 2024 22:22:28 +0000
Subject: introduction, conclusion, and minor cleanup

---
 thesis/parts/design.tex | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index fba4437..84643b1 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -138,11 +138,12 @@ We then perform regression, using the collection size $n$ to predict $t$.
 In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.
 
 In our implementation, we fit a function of the form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$, using regular least-squares fitting.
+Before fitting, we discard all observations that are more than one standard deviation out from the mean for a given $n$ value.
+
 Whilst we could use a more complex technique, in practice this is good enough: Most common operations are polynomial at worst, and more complex models risk overfitting.
-\todo{mention discarding outliers}
+
 %% Limitations
 This method works well for many operations and structures, although has notable limitations.
-
 In particular, implementations which defer work from one function to another will be extremely inconsistent.
 For example, \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and waits to sort the list until the contents of the list are read from (such as by using \code{contains}).
 
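The outlier filtering and model fit described in this patch can be sketched as follows. This is a minimal, hypothetical illustration. It is not the thesis implementation, which this patch does not show. The names `Observation`, `discard_outliers`, `fit` and `solve` are invented for the sketch. It assumes the population standard deviation for each $n$ group and collection sizes $n \ge 1$, so that $\log_2 n$ is defined. It solves the least-squares problem through the normal equations with Gaussian elimination. A production version might prefer a QR-based solver for better numerical stability.

```rust
use std::collections::HashMap;

/// Hypothetical benchmark observation: collection size `n` and measured time `t`.
#[derive(Clone, Copy, Debug)]
struct Observation {
    n: usize,
    t: f64,
}

/// Keep only observations whose time lies within one (population) standard
/// deviation of the mean time for the same collection size `n`.
fn discard_outliers(obs: &[Observation]) -> Vec<Observation> {
    // Group all timings by collection size.
    let mut groups: HashMap<usize, Vec<f64>> = HashMap::new();
    for o in obs {
        groups.entry(o.n).or_default().push(o.t);
    }
    // Mean and standard deviation for each n.
    let stats: HashMap<usize, (f64, f64)> = groups
        .into_iter()
        .map(|(n, ts)| {
            let len = ts.len() as f64;
            let mean = ts.iter().sum::<f64>() / len;
            let var = ts.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / len;
            (n, (mean, var.sqrt()))
        })
        .collect();
    obs.iter()
        .copied()
        .filter(|o| {
            let (mean, sd) = stats[&o.n];
            (o.t - mean).abs() <= sd
        })
        .collect()
}

/// Basis functions for the model x0 + x1*n + x2*n^2 + x3*log2(n); assumes n >= 1.
fn basis(n: usize) -> [f64; 4] {
    let x = n as f64;
    [1.0, x, x * x, x.log2()]
}

/// Solve a 4x4 linear system with Gaussian elimination and partial pivoting.
fn solve(mut a: [[f64; 4]; 4], mut b: [f64; 4]) -> Option<[f64; 4]> {
    for col in 0..4 {
        // Pick the row with the largest pivot to limit rounding error.
        let pivot = (col..4).max_by(|&i, &j| a[i][col].abs().total_cmp(&a[j][col].abs()))?;
        if a[pivot][col].abs() < 1e-12 {
            return None; // singular: e.g. fewer than four distinct n values
        }
        a.swap(col, pivot);
        b.swap(col, pivot);
        for row in col + 1..4 {
            let f = a[row][col] / a[col][col];
            for k in col..4 {
                a[row][k] -= f * a[col][k];
            }
            b[row] -= f * b[col];
        }
    }
    // Back substitution.
    let mut x = [0.0; 4];
    for row in (0..4).rev() {
        let s: f64 = (row + 1..4).map(|k| a[row][k] * x[k]).sum();
        x[row] = (b[row] - s) / a[row][row];
    }
    Some(x)
}

/// Ordinary least squares via the normal equations (A^T A) x = A^T t.
fn fit(obs: &[Observation]) -> Option<[f64; 4]> {
    let mut ata = [[0.0; 4]; 4];
    let mut atb = [0.0; 4];
    for o in obs {
        let phi = basis(o.n);
        for i in 0..4 {
            atb[i] += phi[i] * o.t;
            for j in 0..4 {
                ata[i][j] += phi[i] * phi[j];
            }
        }
    }
    solve(ata, atb)
}

fn main() {
    // Synthetic timings from t = 2 + 3n + 0.5n^2 + 4 log2 n, plus one outlier at n = 8.
    let mut obs = Vec::new();
    for n in [1usize, 2, 4, 8, 16, 32] {
        let x = n as f64;
        let t = 2.0 + 3.0 * x + 0.5 * x * x + 4.0 * x.log2();
        for _ in 0..3 {
            obs.push(Observation { n, t });
        }
    }
    obs.push(Observation { n: 8, t: 1000.0 });
    let kept = discard_outliers(&obs);
    println!("kept {} of {} observations", kept.len(), obs.len());
    if let Some(x) = fit(&kept) {
        println!("x0={:.3} x1={:.3} x2={:.3} x3={:.3}", x[0], x[1], x[2], x[3]);
    }
}
```

Filtering runs within each $n$ group rather than across all observations, because timings for different collection sizes naturally differ by orders of magnitude. A single global threshold would discard most of the large-$n$ data.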