Diffstat (limited to 'thesis/parts/implementation.tex')

 thesis/parts/implementation.tex | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/thesis/parts/implementation.tex b/thesis/parts/implementation.tex
index 478280c..bc2802c 100644
--- a/thesis/parts/implementation.tex
+++ b/thesis/parts/implementation.tex
@@ -33,7 +33,7 @@ The library source can be found in \code{src/crates/library}.
 \code{VecMap} & A Vec of (K, V) tuples sorted by key, used as a Mapping \\
 \code{HashMap} & Hash map with quadratic probing \\
 \code{HashSet} & Hash map with empty values \\
-\code{BTreeMap} & B-Tree\citep{bayer_organization_1970} map with linear search. \\
+\code{BTreeMap} & B-Tree \citep{bayer_organization_1970} map with linear search. \\
 \code{BTreeSet} & B-Tree map with empty values \\
 \end{tabular}
 \caption{Implementations in our library}
@@ -46,7 +46,7 @@ We also added new syntax to Primrose's domain-specific language to support defin
 While performing integration testing, we found and fixed several other issues with the existing code:
 \begin{enumerate}
-\item Only push and pop operations could be modelled in properties. Ohter operations would raise an error during type-checking.
+\item Only push and pop operations could be modelled in properties. Other operations would raise an error during type-checking.
 \item The Rosette code generated for properties using other operations was incorrect.
 \item Some trait methods used mutable borrows unnecessarily, making it difficult or impossible to write safe Rust using them.
 \item The generated code would perform an unnecessary heap allocation for every created container, which could affect performance.
@@ -65,7 +65,7 @@ As Rust's generics are monomorphised, our generic code is compiled as if we were
 Each benchmark is run in a 'warmup' loop for a fixed amount of time (currently 500ms), then runs for a fixed number of iterations (currently 50).
 This is important because we are using least squares fitting - if there are less data points at higher $n$ values then our resulting model may not fit those points as well.
-We repeat each benchmark at a range of $n$ values: $10, 50, 100, 250, 500, 1,000, 6,000, 12,000, 24,000, 36,000, 48,000, 60,000$.
+We repeat each benchmark at a range of $n$ values: $10, 50, 100, 250, 500, 1000, 6000, 12000, 24000, 36000, 48000, 60000$.
 
 Each benchmark we run corresponds to one container operation.
 For most operations, we insert $n$ random values to a new container, then run the operation once per iteration.
@@ -73,14 +73,15 @@ For certain operations which are commonly amortized (\code{insert}, \code{push},
 As discussed previously, we discard all points that are outwith one standard deviation of the mean for each $n$ value.
 We use the least squares method to fit a polynomial of form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$.
-As most operations on common data structures are polynomial or logarithmic complexity, we believe that least squares fitting is good enough to capture the cost of most operations.
+
+As most operations on common data structures are polynomial or logarithmic complexity, we believe that this function is good enough to capture the cost of most operations.
 We originally experimented with coefficients up to $x^3$, but found that this led to overfitting.
 
 \section{Profiling}
 
-We implement profiling using a \code{ProfilerWrapper} type (\code{src/crates/library/src/profiler.rs}), which takes as type parameters the inner container implementation and an index, used later to identify what container type the output corresponds to.
+We implement profiling using the \code{ProfilerWrapper} type (\code{src/crates/library/src/profiler.rs}), which takes as type parameters the inner container implementation and an index, used later to identify what container type the output corresponds to.
 We then implement any primrose traits that the inner container implements, counting the number of times each operation is called.
-We also check the length of the container after each insertion operation, and track the maximum.
+We also check the length of the container after each insert operation, and track the maximum.
 Tracking is done per-instance, and recorded when the container goes out of scope and its \code{Drop} implementation is called.
 We write the counts of each operation and maximum size of the collection to a location specified by an environment variable.
@@ -109,7 +110,7 @@ In order to try and suggest an adaptive container, we use the following algorith
 \item Calculate the cost for each candidate in each partition individually.
 \item For each partition, find the best candidate and store it in the array \code{best}. Note that we don't sum across all partitions this time.
 \item Find the lowest index \code{i} where \code{best[i] != best[0]}
-\item Check that \code{i} splits the list properly: For all \code{j < i}, \code{best[j] == best[0]} and for all \code{j>=i}, \code{best[j] == best[i]}.
+\item Check that \code{i} splits the list properly: For all \code{j < i}, we require \code{best[j] == best[0]} and for all \code{j>=i}, we require \code{best[j] == best[i]}.
 \item Let \code{before} be the name of the candidate in \code{best[0]}, \code{after} be the name of the candidate in \code{best[i]}, and \code{threshold} be halfway between the maximum n values of partition \code{i} and partition \code{i-1}.
 \item Calculate the cost of switching as: $$
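The benchmarking hunks above describe the measurement scheme: a warmup loop bounded by a fixed time budget (currently 500ms), followed by a fixed number of timed iterations (currently 50) at each $n$. A minimal sketch of that loop in Rust; the function and variable names are hypothetical, and only the warmup/iteration structure comes from the text:

use std::time::{Duration, Instant};

// Run `op` untimed until the warmup budget is spent, then time a fixed
// number of iterations and return the individual samples.
fn bench_op<F: FnMut()>(mut op: F, warmup: Duration, iters: u32) -> Vec<Duration> {
    let start = Instant::now();
    while start.elapsed() < warmup {
        op(); // warmup: results discarded
    }
    (0..iters)
        .map(|_| {
            let t = Instant::now();
            op();
            t.elapsed() // one sample per measured iteration
        })
        .collect()
}

fn main() {
    let mut v = Vec::new();
    let samples = bench_op(|| v.push(0u64), Duration::from_millis(500), 50);
    println!("{:?}", samples);
}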
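The curve-fitting hunk makes the model explicit: a least squares fit over the basis $(1, n, n^2, \log_2 n)$. Writing $n_i$ for the benchmarked sizes and $t_i$ for the corresponding measured costs (the $t_i$ notation is ours), the fitted coefficients are

\[ (x_0, x_1, x_2, x_3) = \arg\min_{x} \sum_i \left( t_i - x_0 - x_1 n_i - x_2 n_i^2 - x_3 \log_2 n_i \right)^2, \]

which is linear in the coefficients even though it is nonlinear in $n$, so it can be solved directly by the usual normal equations.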
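The profiling hunks outline how ProfilerWrapper works: it is generic over the inner implementation and an index, forwards each trait method while counting calls, tracks the maximum length after inserts, and records its counts on Drop. A sketch of that shape, using a simplified stand-in trait and an invented environment variable name (PROFILER_OUT); the real trait set and output format live in src/crates/library/src/profiler.rs:

use std::env;
use std::fs::OpenOptions;
use std::io::Write;

// Stand-in for one of the container traits the wrapper forwards.
trait Insert<T> {
    fn insert(&mut self, value: T);
    fn len(&self) -> usize;
}

// `C` is the inner container implementation; `ID` identifies which
// container type the recorded counts correspond to.
struct ProfilerWrapper<C, const ID: usize> {
    inner: C,
    insert_count: u64,
    max_len: usize,
}

impl<T, C: Insert<T>, const ID: usize> Insert<T> for ProfilerWrapper<C, ID> {
    fn insert(&mut self, value: T) {
        self.insert_count += 1; // count every call to this operation
        self.inner.insert(value);
        // Check the length after each insert and track the maximum.
        self.max_len = self.max_len.max(self.inner.len());
    }
    fn len(&self) -> usize {
        self.inner.len()
    }
}

impl<C, const ID: usize> Drop for ProfilerWrapper<C, ID> {
    fn drop(&mut self) {
        // Record per-instance counts when the container goes out of scope,
        // appending to the file named by an environment variable.
        if let Ok(path) = env::var("PROFILER_OUT") {
            if let Ok(mut f) = OpenOptions::new().create(true).append(true).open(path) {
                let _ = writeln!(f, "{} insert={} max_len={}", ID, self.insert_count, self.max_len);
            }
        }
    }
}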
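Finally, the last hunk rewords the split-validity step of the adaptive-container algorithm, which is easiest to see in code. A sketch of steps 3 to 6 of that list; the function name, input representation, and Option-based failure handling are our choices:

// Given the best candidate per partition and each partition's maximum n,
// return (before, after, threshold) if the partitions split cleanly.
fn find_split<'a>(best: &[&'a str], max_n: &[usize]) -> Option<(&'a str, &'a str, usize)> {
    // Lowest index where the best candidate differs from best[0].
    let i = best.iter().position(|&b| b != best[0])?;
    // Check the split: all of best[..i] must equal best[0] (guaranteed by
    // how `i` was found) and all of best[i..] must equal best[i].
    if !best[i..].iter().all(|&b| b == best[i]) {
        return None;
    }
    // Threshold halfway between the maximum n of partitions i-1 and i.
    let threshold = (max_n[i - 1] + max_n[i]) / 2;
    Some((best[0], best[i], threshold))
}

fn main() {
    let best = ["BTreeSet", "BTreeSet", "HashSet", "HashSet"];
    let max_n = [100, 1_000, 10_000, 60_000];
    // -> Some(("BTreeSet", "HashSet", 5500)): switch around n = 5500.
    println!("{:?}", find_split(&best, &max_n));
}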