Diffstat (limited to 'thesis/parts/results.tex')
-rw-r--r-- | thesis/parts/results.tex | 53
1 file changed, 28 insertions, 25 deletions
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index 76247a4..4db297d 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -1,4 +1,4 @@
-In this chapter, we present the methodology used for benchmarking our system, and comment on the results we got.
+In this chapter, we present the methodology used for benchmarking our system, our results, and our analysis.
 We examine the produced cost models of certain operations in detail, with reference to the expected asymptotics of each operation.
 We then compare the selections made by our system to the actual optimal selections (obtained by brute force) for a variety of test cases.
 This includes examining when adaptive containers are suggested, and their effectiveness.
@@ -24,11 +24,11 @@ The most important software versions are listed below.
 \section{Cost models}

-We start by examining some of our generated cost models, and comparing them both to the observations they are based on, and what we expect from asymptotic analysis.
+We start by examining some of our generated cost models, comparing them both to the observations they are based on and to what we expect from asymptotic analysis.
 As we build a total of 77 cost models from our library, we will not examine them all in detail.
 We look at models of the most common operations, grouped by containers that are commonly selected together.

-\subsection{Insertion operations}
+\subsection{Insert operations}

 Starting with the \code{insert} operation, Figure \ref{fig:cm_insert} shows how the estimated cost changes with the size of the container.
 The lines correspond to our fitted curves, while the points indicate the raw observations we drew from.
@@ -46,7 +46,7 @@ The former appears to be in line with the observations, and is likely due to the
 The latter appears to diverge from the observations, and may indicate poor fitting.

 \code{LinkedList} has a significantly slower insertion.
-This is likely because it requires a syscall for heap allocation for every item inserted, no matter the current size.
+This is likely because it requires a heap allocation system call for every item inserted, no matter the current size.
 This would also explain why data points appear spread out more, as system calls have more unpredictable latency, even on systems with few other processes running.

 Notably, insertion appears to start to get cheaper past $n=24,000$, although this is only weakly suggested by observations.
@@ -66,8 +66,9 @@ This is what we expect for hash-based collections, with the slight growth likely
 \code{BTreeSet} has similar behaviour, but settles at a larger value overall.
 \code{BTreeMap} appears to grow more rapidly, and cost more overall.
+
 It's important to note that Rust's \code{BTreeSet} is not based on binary tree search, but instead a more general tree search originally proposed by \cite{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an unsorted array.
-The standard library documentation\citep{rust_documentation_team_btreemap_2024} states that search is expected to take $O(B\lg n)$ comparisons.
+The standard library documentation~\citep{rust_documentation_team_btreemap_2024} states that search is expected to take $O(B\lg n)$ comparisons.
 Since both of these implementations require searching the collection before inserting, the close-to-logarithmic growth seems to make sense.
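As a back-of-the-envelope check of the quoted bound (assuming linear search within each node, which the standard library documentation suggests for small nodes, rather than anything stated above): a tree whose nodes hold $B-1$ to $2B-1$ keys has height roughly $\log_B n$, and scanning one node costs at most $2B-1$ comparisons, so a search takes on the order of
\[
  (2B-1)\,\log_B n \;=\; \frac{2B-1}{\lg B}\,\lg n \;=\; O(B \lg n)
\]
comparisons, which is consistent with the close-to-logarithmic growth observed for \code{BTreeSet} and \code{BTreeMap} insertions.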
 \subsubsection{Small n values}

@@ -105,9 +106,9 @@ The observations in these graphs have a much wider spread than our \code{insert}
 This is probably because we attempt to get a different random element in our container every time, so our observations show the best and worst case of our data structures.
 This is desirable assuming that \code{contains} operations are actually randomly distributed in the real world, which seems likely.

-For the \code{SortedVec} family, we would expect to see roughly logarithmic growth, as contains is based on binary search.
+For the \code{SortedVec} family, we would expect to see roughly logarithmic growth, as we are performing a binary search.
 This is the case for \code{SortedVecMap}, however \code{SortedVec} and \code{SortedVecSet} both show exponential growth with a 'dip' around $n=25,000$.
-It's unclear why this happened, although it could be due to how the elements we query are randomly distributed throughout the list.
+It's unclear why this is; one reason could be that the elements we query are randomly distributed throughout the list, and this distribution may not be fair for all benchmarks.
 A possible improvement would be to run contains with a known distribution of values, including low, high, and not present values in equal parts.

 The \code{Vec} family exhibits roughly linear growth, which is expected, since this implementation scans through the whole array each time.
@@ -127,6 +128,7 @@ It's unclear why this is, however it could be related to the larger spread in ob
 Overall, our cost models appear to be a good representation of each implementation's performance impact.
 Future improvements should focus on improving accuracy at lower $n$ values, such as by employing a more complex fitting procedure, or on ensuring operations have their best and worst cases tested fairly.

+\newpage
 %% * Predictions
 \section{Selections}

@@ -135,10 +137,10 @@ We now proceed with end-to-end testing of the system, selecting containers for a
 \subsection{Benchmarks}

 %% ** Chosen benchmarks
-Our test programs broadly fall into two categories: Examples, which repeat a few operations many times, and real-life programs, which are implementations of common algorithms and solutions to programming puzles.
+Our test programs broadly fall into two categories: example programs, which repeat a few operations many times, and real-life programs, which are implementations of common algorithms and solutions to programming puzzles.
 We expect the results from our example programs to be relatively obvious, while our real programs are more complex and harder to predict.

-Most of our real programs are solutions to puzzles from Advent of Code\citep{wastl_advent_2015}, a popular collection of programming puzzles.
+Most of our real programs are solutions to puzzles from Advent of Code~\citep{wastl_advent_2015}, a popular collection of programming puzzles.
 Table \ref{table:test_cases} lists and briefly describes our test programs.

 \begin{table}[h!]
@@ -194,20 +196,20 @@ In all but two of our test cases (marked with *), we correctly identify the best
 \begin{table}[h!]
   \centering
-  \begin{tabular}{|c|c|c|c|c|}
-    Project & Container Type & Best implementation & Predicted best & \\
+  \begin{tabular}{c|c|c|c|c|}
+    & Project & Container Type & Best implementation & Predicted best \\
     \hline
-    aoc\_2021\_09 & Map & HashMap & HashMap & \\
-    aoc\_2021\_09 & Set & HashSet & HashSet & \\
-    aoc\_2022\_08 & Map & HashMap & HashMap & \\
-    aoc\_2022\_09 & Set & HashSet & HashSet & \\
-    aoc\_2022\_14 & Set & HashSet & HashSet & \\
-    aoc\_2022\_14 & List & Vec & LinkedList & * \\
-    example\_mapping & Map & HashMap & HashMap & \\
-    example\_sets & Set & HashSet & HashSet & \\
-    example\_stack & StackCon & Vec & Vec & \\
-    prime\_sieve & Primes & BTreeSet & BTreeSet & \\
-    prime\_sieve & Sieve & Vec & LinkedList & * \\
+    & aoc\_2021\_09 & Map & HashMap & HashMap \\
+    & aoc\_2021\_09 & Set & HashSet & HashSet \\
+    & aoc\_2022\_08 & Map & HashMap & HashMap \\
+    & aoc\_2022\_09 & Set & HashSet & HashSet \\
+    & aoc\_2022\_14 & Set & HashSet & HashSet \\
+    * & aoc\_2022\_14 & List & Vec & LinkedList \\
+    & example\_mapping & Map & HashMap & HashMap \\
+    & example\_sets & Set & HashSet & HashSet \\
+    & example\_stack & StackCon & Vec & Vec \\
+    & prime\_sieve & Primes & BTreeSet & BTreeSet \\
+    * & prime\_sieve & Sieve & Vec & LinkedList \\
   \end{tabular}
   \caption{Actual best vs predicted best implementations}
   \label{table:predicted_actual}
@@ -218,7 +220,7 @@ From looking at detailed profiling information, it seems that both of these cont
 Therefore this is likely caused by our cost models being inaccurate at small $n$ values, as mentioned in section \ref{section:cm_small_n}.

 Overall, our results suggest that our system is effective, at least for large enough $n$ values.
-Unfortunately, these tests are somewhat limited, as the best container is almost always predictable: \code{Vec} where uniqueness is not important, and \code{Hash*} otherwise.
+Unfortunately, these tests are somewhat limited, as the best container seems easy to predict in most cases: \code{Vec} where uniqueness is not important, and \code{Hash*} otherwise.
 Therefore, more thorough testing is needed to fully establish the system's effectiveness.

 \subsection{Adaptive containers}

@@ -304,7 +306,7 @@ This shows that adaptive containers as we have implemented them are not effectiv
 Even in cases where we never reach the size threshold, the presence of adaptive containers has an overhead which slows down the program 3x in the worst case (\code{example_mapping}, size = 150).
 One explanation for this could be that every operation now requires checking which inner implementation we are using, resulting in an additional check for each operation.

-More work could be done to minimise this overhead, although it's unclear exactly how much this could be minimised.
+More work could be done to minimise this overhead, although it's unclear how.
 It is also unclear if the threshold values that we suggest are the optimal ones.
 Currently, we decide our threshold by picking a value between two partitions with different best containers.

@@ -312,9 +314,10 @@ Future work could take a more complex approach that finds the best threshold val
 \subsection{Evaluation}

-Overall, we find that the main part of our container selection system appears to have merit.
+Overall, we find that the main part of our container selection system has merit.
 Whilst our testing has limitations, it shows that we can correctly identify the best container even in complex programs.
 More work is needed on improving our system's performance for very small containers, and on testing with a wider range of programs.

 Our proposed technique for identifying adaptive containers appears ineffective.
 The primary challenges appear to be in the overhead introduced to each operation, and in finding the correct point at which to switch implementations.
+This could also suggest that adaptive containers are less effective in lower-level compiled languages, as previous literature focused mostly on higher-level languages such as Java \citep{hutchison_coco_2013,costa_collectionswitch_2018}.
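To make the per-operation overhead described above concrete, the following is a minimal Rust sketch of the general adaptive-container pattern; the type, the fixed threshold, and the switching policy are illustrative assumptions rather than the implementation benchmarked here. Every call first dispatches on the currently active backend, and crossing the size threshold pays a one-off cost to move all elements across.

use std::collections::HashSet;

// Illustrative adaptive set: backed by a Vec while small, switching to a
// HashSet once it grows past a fixed size threshold.
enum AdaptiveSet {
    Small(Vec<u64>),
    Large(HashSet<u64>),
}

impl AdaptiveSet {
    // Hypothetical switch-over point; a real system would derive this value.
    const THRESHOLD: usize = 1024;

    fn new() -> Self {
        AdaptiveSet::Small(Vec::new())
    }

    fn insert(&mut self, x: u64) {
        // Every operation first checks which inner implementation is active:
        // this branch is the per-operation overhead discussed above.
        match self {
            AdaptiveSet::Small(v) => {
                if !v.contains(&x) {
                    v.push(x);
                }
            }
            AdaptiveSet::Large(s) => {
                s.insert(x);
            }
        }
        // Crossing the threshold pays a one-off cost to move every element.
        if let AdaptiveSet::Small(v) = self {
            if v.len() > Self::THRESHOLD {
                let moved: HashSet<u64> = v.drain(..).collect();
                *self = AdaptiveSet::Large(moved);
            }
        }
    }

    fn contains(&self, x: &u64) -> bool {
        match self {
            AdaptiveSet::Small(v) => v.contains(x),
            AdaptiveSet::Large(s) => s.contains(x),
        }
    }
}

fn main() {
    let mut set = AdaptiveSet::new();
    for i in 0..2_000u64 {
        set.insert(i);
    }
    assert!(set.contains(&1_999) && !set.contains(&5_000));
}

The branch itself is cheap, but it sits on the hot path of every operation, which matches the explanation given above for the measured overhead even when the threshold is never reached.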