%% * Testing setup, benchmarking rationale
\section{Testing setup}

%% ** Specs and VM setup
To ensure consistent results and reduce the chance of outliers, all benchmarks were run on a KVM virtual machine on server hardware.
We used 4 cores of an Intel Xeon E5-2687Wv4 CPU, and 4\,GiB of RAM.

%% ** Reproducibility
The VM was managed and provisioned using NixOS, meaning it can be easily reproduced with the exact software we used.

\section{Cost models}

We start by looking at our generated cost models, comparing them both to the observations they are based on and to what we expect from asymptotic analysis.
As we build a total of 51 cost models from our library, we will not examine all of them.
We look at the models for the most common operations, and group them by containers that are commonly selected together.

\subsection{Insertion operations}

Starting with the \code{insert} operation, Figure~\ref{fig:cm_insert} shows how the estimated cost changes with the size of the container.
The lines correspond to our fitted curves, while the points indicate the raw observations they are drawn from.
To help readability, we group these into regular \code{Container} implementations, and our associative key-value \code{Mapping} implementations.

\begin{figure}[h!]
  \centering
  \includegraphics[width=10cm]{assets/insert.png}
  \caption{Estimated cost of \code{insert} operation by implementation}
  \label{fig:cm_insert}
\end{figure}

For \code{Vec}, we see that insertion is very cheap, and gets slightly cheaper as the size of the container increases.
This is to be expected, as Rust's \code{Vec} implementation grows its capacity by a multiple whenever it is full, so we would expect amortised inserts to require fewer resizes as $n$ increases.

\code{LinkedList} has a more stable, but significantly slower insertion.
This is likely because it requires a heap allocation for every item inserted, no matter the current size.
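
To make the amortised argument for \code{Vec} concrete, the following sketch counts resizes under the assumption that the capacity starts at 1 and doubles whenever it is exhausted.
The initial capacity, the growth factor of 2, and the symbols $R(n)$ and $M(n)$ are assumptions made purely for illustration; the argument only needs growth by some multiple.
Under these assumptions a resize is triggered when the size reaches $1, 2, 4, \dots$, so inserting $n \geq 1$ elements gives
\begin{align*}
R(n) &= \lceil \log_2 n \rceil & \textrm{resizes} \\
M(n) &= \sum_{i=0}^{R(n)-1} 2^i = 2^{R(n)} - 1 < 2n & \textrm{element moves in total}
\end{align*}
The number of resizes per insert, $R(n)/n$, therefore shrinks towards zero as $n$ grows, while the number of moves per insert stays below 2, which is consistent with the slight downward trend in the fitted curve.
By contrast, \code{LinkedList} performs one heap allocation on every insert, so it sees no comparable amortisation.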
The per-item allocation would also explain why the \code{LinkedList} data points are more spread out, as it can be hard to predict the performance of kernel calls, even on systems with few other processes running.

It is unsurprising that these two implementations are the cheapest, as they have no ordering or uniqueness guarantees, unlike our other implementations.

\code{HashSet} insertions are the next most expensive, and the cost appears to rise as the size of the collection goes up.
This is likely due to hash collisions becoming more likely as the size of the collection increases.

\code{BTreeSet} insertions are also expensive; however, the cost appears to level out as the collection size goes up (a logarithmic curve).
It is important to note that Rust's \code{BTreeSet} is not based on a binary search tree, but on the more general B-tree originally proposed by R.~Bayer and E.~McCreight~\citep{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an array.

Our two mapping types, \code{BTreeMap} and \code{HashMap}, mimic the behaviour of their set counterparts.

Our two outlier containers, \code{SortedUniqueVec} and \code{SortedVec}, both have a substantially higher insertion cost which grows roughly linearly.
Internally, both of these containers perform a binary search to determine where the new element should go, which on its own would suggest a roughly logarithmic cost.
However, every element after the insertion point must then be moved along by one place.
As most elements are inserted near the middle of the list, we copy half of the list on average, which could explain the roughly linear growth.
\todo{This explanation could be better}

\subsection{Contains operations}

We now examine the cost of the \code{contains} operation.
Figure~\ref{fig:cm_contains} shows our built cost models.
These are grouped for readability, with the first graph showing lists, the second showing sets and sorted lists, and the third showing key-value mappings.
Notably, the observations in these graphs have a much wider spread than those for our \code{insert} operations.
This is probably because we look up a different random element in our container every time, so our observations show both the best and worst case of our data structures.
This is desirable assuming that \code{contains} operations are actually randomly distributed in the real world, which seems likely.

\begin{figure}[h!]
  \centering
  \includegraphics[width=10cm]{assets/contains.png}
  \caption{Estimated cost of \code{contains} operation by implementation}
  \label{fig:cm_contains}
\end{figure}

Both \code{LinkedList} and \code{Vec} implementations have roughly linear growth, which makes sense as these are not kept ordered.
\code{LinkedList} has a significantly higher cost at all points, and a wider spread of outliers.
This makes sense, as the items in a linked list are not guaranteed to be adjacent in memory, so traversing them is likely to be more expensive, making the best and worst cases further apart.
Some of the spread could also be explained by heap allocations being placed in different locations in memory, with more or less locality on each run.

\code{SortedVec} and \code{SortedUniqueVec} both exhibit a wide spread of observations, with what looks like a roughly linear growth.
Looking at the raw output, we find the following equations being used for each cost model:

\begin{align*}
C(n) &\approx 22.8 + 4.6\log_2 n + 0.003 n - (1 \times 10^{-9}) n^2 & \textrm{SortedVec} \\
C(n) &\approx -5.9 + 8.8\log_2 n - (4 \times 10^{-5}) n - (3 \times 10^{-8}) n^2 & \textrm{SortedUniqueVec}
\end{align*}

As both of these implementations use a binary search for \code{contains}, the dominating logarithmic factors are expected.
The remaining linear and quadratic terms are possibly a case of overfitting, as the observations for both implementations also have a wide spread.

\code{HashSet} appears roughly constant as expected, with only a slow logarithmic rise, probably due to an increasing number of collisions.
\code{BTreeSet} is consistently above it, with a slightly higher logarithmic rise.
The standard library documentation states that searches are expected to take $B\log(n)$ comparisons on average~\citep{rust_documentation_team_btreemap_2024}, which is in line with observations.

\code{BTreeMap} and \code{HashMap} both mimic their set counterparts, though they are more expensive in most places.
This is probably due to their larger entries exhausting the CPU cache more quickly.

\subsection{Evaluation}

In the cost models we examined, we found that most were in line with our expectations.

Although we will not examine them in detail, we briefly describe observations from the rest of the built cost models:

\begin{enumerate}
\item Our models for \code{push} and \code{pop} operations are nearly identical to those for \code{insert} operations, as they share the same inner implementation.
\item \code{first}, \code{last}, and \code{nth} operations show the time complexity we expect. However, some overfitting appears to occur, meaning our cost models may not generalise as well outside of the range of $n$ values they were benchmarked with.
\end{enumerate}

Overall, our cost models appear to be a good representation of each implementation's performance impact.
Future improvements could address the overfitting problems some operations had, either by pre-processing the data to detect and remove outliers, or by employing a more complex fitting procedure.

%% * Predictions
\section{Selections}

\subsection{Benchmarks}

%% ** Chosen benchmarks
Our test cases broadly fall into two categories: example cases, which just repeat a few operations many times, and our `real' cases, which are implementations of common algorithms and solutions to programming puzzles.
We expect the results from our example cases to be relatively unsurprising, while our real cases are more complex and harder to predict.
Most of our real cases are solutions to puzzles from Advent of Code~\citep{wastl_advent_2015}, a popular collection of programming puzzles.
Table~\ref{table:test_cases} lists and briefly describes our test cases.

\begin{table}[h!]
  \centering
  \begin{tabular}{|c|c|}
    Name & Description \\
    \hline
    example\_sets & Repeated insert and contains on a set. \\
    example\_stack & Repeated push and pop from a stack. \\
    example\_mapping & Repeated insert and get from a mapping. \\
    prime\_sieve & Sieve of Eratosthenes algorithm. \\
    aoc\_2021\_09 & Flood-fill-like algorithm (Advent of Code 2021, Day 9) \\
    aoc\_2022\_08 & Simple 2D raycasting (AoC 2022, Day 8) \\
    aoc\_2022\_09 & Simple 2D soft-body simulation (AoC 2022, Day 9) \\
    aoc\_2022\_14 & Simple 2D particle simulation (AoC 2022, Day 14) \\
  \end{tabular}
  \caption{Our test applications}
  \label{table:test_cases}
\end{table}

%% ** Effect of selection on benchmarks (spread in execution time)
Table~\ref{table:benchmark_spread} shows the difference in benchmark results between the slowest possible assignment of containers and the fastest.
Even in our example projects, we see that the wrong choice of container can slow down our programs substantially.

\begin{table}[h!]
  \centering
  \begin{tabular}{|c|c|c|}
    Project & Worst $-$ best time (seconds) & Maximum slowdown \\
    \hline
    aoc\_2021\_09 & 29.685 & 4.75 \\
    aoc\_2022\_08 & 0.036 & 2.088 \\
    aoc\_2022\_09 & 10.031 & 132.844 \\
    aoc\_2022\_14 & 0.293 & 2.036 \\
    prime\_sieve & 28.408 & 18.646 \\
    example\_mapping & 0.031 & 1.805 \\
    example\_sets & 0.179 & 12.65 \\
    example\_stack & 1.931 & 8.454 \\
  \end{tabular}
  \caption{Spread in total benchmark results by project}
  \label{table:benchmark_spread}
\end{table}

%% ** Summarise predicted versus actual
\subsection{Prediction accuracy}

We now compare the implementations suggested by our system to the selection that is actually best.
For now, we ignore suggestions for adaptive containers.
Table~\ref{table:predicted_actual} shows the predicted best assignments alongside the actual best assignment, obtained by brute force.
In all but two of our test cases (rows marked with *), we correctly identify the best implementation for every container.
\todo{but also its just vec/hashset every time, which is kinda boring. we should either get more variety (by adding to the library or adding new test cases), or mention this as a limitation in testing}

\begin{table}[h!]
  \centering
  \begin{tabular}{|c|c|c|c|}
    Project & Container Type & Best implementation & Predicted best \\
    \hline
    aoc\_2022\_09 & Set & HashSet & HashSet \\
    example\_stack & StackCon & Vec & Vec \\
    aoc\_2021\_09 & Set & HashSet & HashSet \\
    aoc\_2021\_09 & Map & HashMap & HashMap \\
    aoc\_2022\_14 & Set & HashSet & HashSet \\
    aoc\_2022\_14* & List & Vec & LinkedList \\
    aoc\_2022\_08 & Map & HashMap & HashMap \\
    example\_sets & Set & HashSet & HashSet \\
    example\_mapping & Map & HashMap & HashMap \\
    prime\_sieve* & Primes & HashSet & BTreeSet \\
    prime\_sieve* & Sieve & Vec & LinkedList \\
  \end{tabular}
  \caption{Actual best vs predicted best implementations}
  \label{table:predicted_actual}
\end{table}

%% ** Evaluate performance
\subsection{Evaluation}

%% ** Comment on distribution of best implementation

%% ** Surprising ones / Explain failures

%% * Performance of adaptive containers
\section{Adaptive containers}

\todo{These also need more work, and better test cases}

%% ** Find where adaptive containers get suggested

%% ** Comment on relative performance speedup

%% ** Suggest future improvements?

%% * Selection time / developer experience
%% \section{Selection time}

%% ** Mention speedup versus naive brute force