Diffstat (limited to 'thesis/parts/results.tex')
-rw-r--r-- | thesis/parts/results.tex | 38 |
1 files changed, 34 insertions, 4 deletions
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index 17b6088..896fa9d 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -14,8 +14,9 @@ We start by looking at our generated cost models, and comparing them both to the
 As we build a total of 51 cost models from our library, we will not examine all of them.
 We look at ones for the most common operations, and group them by containers that are commonly selected together.
 
-%% ** Insertion operations
+\subsection{Insertion operations}
 Starting with the \code{insert} operation, Figure \ref{fig:cm_insert} shows how the estimated cost changes with the size of the container.
+The lines correspond to our fitted curves, while the points indicate the raw observations they are drawn from.
 To help readability, we group these into regular \code{Container} implementations, and our associative key-value \code{Mapping} implementations.
 
 \begin{figure}[h]
@@ -27,11 +28,40 @@ To help readability, we group these into regular \code{Container} implementation
 \label{fig:cm_insert}
 \end{figure}
 
-%% ** Contains operations
-%% ** Comment on some bad/weird ones
+For \code{Vec}, we see that insertion is incredibly cheap, and gets slightly cheaper as the size of the container increases.
+This is to be expected, as Rust's Vector implementation grows by a multiple whenever it reaches its maximum capacity, so we would expect amortised inserts to require fewer resizes as $n$ increases.
 
-%% ** Conclusion
+\code{LinkedList} has a more stable, but significantly slower insertion.
+This is likely because it requires a heap allocation for every item inserted, no matter the current size.
+This would also explain why data points appear spread out more, as it can be hard to predict the performance of kernel calls, even on systems with few other processes running.
+
+It's unsurprising that these two implementations are the cheapest, as they have no ordering or uniqueness guarantees, unlike our other implementations.
+
+\code{HashSet} insertions are the next most expensive; however, the cost appears to rise as the size of the collection goes up.
+This is likely due to hash collisions being more likely as the size of the collection increases.
+
+\code{BTreeSet} insertions are also expensive; however, the cost appears to level out as the collection size goes up (a logarithmic curve).
+It's important to note that Rust's \code{BTreeSet}s are not based on binary tree search, but instead a more general tree search originally proposed by R Bayer and E McCreight\parencite{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an array.
+\todo{The standard library documentation states that searches are expected to take $B\log(n)$ comparisons on average\parencite{rust_documentation_team_btreemap_2024}, which would explain the logarithm-like growth.}
+
+Our two mapping types, \code{BTreeMap} and \code{HashMap}, mimic the behaviour of their set counterparts.
+
+Our two outlier containers, \code{SortedUniqueVec} and \code{SortedVec}, both have a substantially higher insertion cost which grows roughly linearly.
+Internally, both of these containers perform a binary search to determine where the new element should go.
+This would suggest we should see a roughly logarithmic complexity.
+However, as we will be inserting most elements near the middle of a list, we will on average be copying half the list every time.
+This could explain why we see a roughly linear growth.
+
+\todo{Graph this, and justify further}
+
+\subsection{Contains operations}
+
+We now examine the cost of the \code{contains} operation.
+
+\subsection{Outliers / errors}
+
+\subsection{Evaluation}
 
 %% * Predictions
 \section{Selections}
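Note on the \code{SortedVec} paragraph added in this diff: the reasoning (a logarithmic search followed by a linear copy) can be illustrated with a small Rust sketch. This is not the thesis library's implementation; `sorted_insert` and `elements_shifted` are hypothetical names introduced here only for illustration, and the sketch assumes the container is backed by a plain `Vec`.

```rust
/// Hypothetical helper (an assumption, not the thesis library's code):
/// insert `item` into an already-sorted Vec, keeping it sorted.
/// Returns the index the item was placed at.
fn sorted_insert<T: Ord>(v: &mut Vec<T>, item: T) -> usize {
    // O(log n) comparisons: binary search for the insertion point.
    // `Ok(i)` means an equal element was found at `i`; `Err(i)` is where
    // the item would go to keep the Vec sorted.
    let idx = match v.binary_search(&item) {
        Ok(i) | Err(i) => i,
    };
    // O(n - idx) moves: `Vec::insert` shifts every element from `idx`
    // onward one slot to the right.
    v.insert(idx, item);
    idx
}

/// Hypothetical helper: how many existing elements must be shifted when
/// inserting at `idx` into a Vec that currently holds `len` elements.
fn elements_shifted(len: usize, idx: usize) -> usize {
    len - idx
}
```

If insertion points fall near the middle on average, `elements_shifted` is about half the current length, so under these assumptions the copy would dominate the search as the container grows. That is consistent with the roughly linear growth described in the added text, though it does not by itself confirm it.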