Diffstat (limited to 'thesis/parts/results.tex')
-rw-r--r--  thesis/parts/results.tex | 76
1 file changed, 50 insertions(+), 26 deletions(-)
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index 6915b12..747631d 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -19,7 +19,7 @@ Starting with the \code{insert} operation, Figure \ref{fig:cm_insert} shows how
 The lines correspond to our fitted curves, while the points indicate the raw observations they are drawn from.
 To help readability, we group these into regular \code{Container} implementations, and our associative key-value \code{Mapping} implementations.
 
-\begin{figure}[h]
+\begin{figure}[h!]
   \centering
   \includegraphics[width=10cm]{assets/insert_containers.png}
   \par\centering\rule{11cm}{0.5pt}
@@ -42,7 +42,6 @@ This is likely due to hash collisions being more likely as the size of the colle
 \code{BTreeSet} insertions are also expensive, however the cost appears to level out as the collection size goes up (a logarithmic curve).
 It's important to note that Rust's \code{BTreeSet}s are not based on binary tree search, but instead a more general tree search originally proposed by R Bayer and E McCreight\parencite{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an array.
-\todo{The standard library documentation states that searches are expected to take $B\log(n)$ comparisons on average\parencite{rust_documentation_team_btreemap_2024}, which would explain the logarithm-like growth.}
 
 Our two mapping types, \code{BTreeMap} and \code{HashMap}, mimic the behaviour of their set counterparts.
@@ -52,7 +51,7 @@ This would suggest we should see a roughly logarithmic complexity.
 However, as we will be inserting most elements near the middle of a list, we will on average be copying half the list every time.
 This could explain why we see a roughly linear growth.
-\todo{Graph this, and justify further}
+\todo{This explanation could be better}
 
 \subsection{Contains operations}
 
@@ -64,7 +63,7 @@ Notably, the observations in these graphs have a much wider spread than our \cod
 This is probably because we attempt to get a different random element in our container every time, so our observations show the best and worst case of our data structures.
 This is desirable assuming that \code{contains} operations are actually randomly distributed in the real world, which seems likely.
 
-\begin{figure}[h]
+\begin{figure}[h!]
   \centering
   \includegraphics[width=10cm]{assets/contains_lists.png}
   \par\centering\rule{11cm}{0.5pt}
@@ -89,11 +88,11 @@ C(n) &\approx -5.9 + 8.8\log_2 n - (4 * 10^{-5}) n - (3 * 10^{-8}) * n^2 & \text
 \end{align*}
 
 As both of these implementations use a binary search for \code{contains}, the dominating logarithmic factors are expected.
-\code{SortedUniqueVec} likely has a larger $n^2$ coefficient due to more collisions happening at larger container sizes.
-\todo{elaborate: we insert that many random items, but some may be duplicates}
+This is possibly a case of overfitting, as the observations for both implementations also have a wide spread.
 
-\code{HashSet} appears roughly linear as expected, with only a slow logarithmic rise, probably due to collisions.
+\code{HashSet} appears roughly linear as expected, with only a slow logarithmic rise, probably due to an increasing number of collisions.
 \code{BTreeSet} is consistently above it, with a slightly higher logarithmic rise.
+The standard library documentation states that searches are expected to take $B\log(n)$ comparisons on average\parencite{rust_documentation_team_btreemap_2024}, which is in line with our observations.
 
 \code{BTreeMap} and \code{HashMap} both mimic their set counterparts, though are more expensive in most places.
 This is probably due to the increased size more quickly exhausting CPU cache.
@@ -124,7 +123,7 @@ We expect the results from our example cases to be relatively unsurprising, whil
 Most of our real cases are solutions to puzzles from Advent of Code\parencite{wastl_advent_2015}, a popular collection of programming puzzles.
 Table \ref{table:test_cases} lists and briefly describes our test cases.
 
-\begin{table}[h]
+\begin{table}[h!]
 \centering
 \begin{tabular}{|c|c|}
 Name & Description \\
@@ -147,29 +146,56 @@ Table \ref{table:test_cases} lists and briefly describes our test cases.
 
 Table \ref{table:benchmark_spread} shows the difference in benchmark results between the slowest possible assignment of containers, and the fastest.
 Even in our example projects, we see that the wrong choice of container can slow down our programs substantially.
-
-\begin{table}[h]
+\begin{table}[h!]
 \centering
-\begin{tabular}{|c|c|}
-  Project & Total difference between best and worst benchmarks (seconds) & Maximum slowdown from bad container choices \\
-  \hline
-  aoc\_2021\_09 & 29.685 & 4.75 \\
-  aoc\_2022\_08 & 0.036 & 2.088 \\
-  aoc\_2022\_09 & 10.031 & 132.844 \\
-  aoc\_2022\_14 & 0.293 & 2.036 \\
-  prime\_sieve & 28.408 & 18.646 \\
-  example\_mapping & 0.031 & 1.805 \\
-  example\_sets & 0.179 & 12.65 \\
-  example\_stack & 1.931 & 8.454 \\
+\begin{tabular}{|c|c|c|}
+Project & Worst $-$ best time (seconds) & Maximum slowdown \\
+\hline
+aoc\_2021\_09 & 29.685 & 4.75 \\
+aoc\_2022\_08 & 0.036 & 2.088 \\
+aoc\_2022\_09 & 10.031 & 132.844 \\
+aoc\_2022\_14 & 0.293 & 2.036 \\
+prime\_sieve & 28.408 & 18.646 \\
+example\_mapping & 0.031 & 1.805 \\
+example\_sets & 0.179 & 12.65 \\
+example\_stack & 1.931 & 8.454 \\
 \end{tabular}
 \caption{Spread in total benchmark results by project}
 \label{table:benchmark_spread}
 \end{table}
-
 %% ** Summarise predicted versus actual
 \subsection{Prediction accuracy}
 
+We now compare the implementations suggested by our system to the selection that is actually best.
+For now, we ignore suggestions for adaptive containers.
+
+Table \ref{table:predicted_actual} shows the predicted best assignments alongside the actual best assignment, obtained by brute force.
+In all but two of our test cases (marked with *), we correctly identify the best container.
+
+\todo{but also its just vec/hashset every time, which is kinda boring. we should either get more variety (by adding to the library or adding new test cases), or mention this as a limitation in testing}
+
+\begin{table}[h!]
+  \centering
+  \begin{tabular}{c|c|c|c|c|}
+    & Project & Container Type & Actual Best & Predicted Best \\
+    \hline
+    & aoc\_2021\_09 & Map & HashMap & HashMap \\
+    & aoc\_2021\_09 & Set & HashSet & HashSet \\
+    & aoc\_2022\_14 & Set & HashSet & HashSet \\
+    * & aoc\_2022\_14 & List & Vec & LinkedList \\
+    & example\_stack & StackCon & Vec & Vec \\
+    & example\_sets & Set & HashSet & HashSet \\
+    & example\_mapping & Map & HashMap & HashMap \\
+    & aoc\_2022\_08 & Map & HashMap & HashMap \\
+    * & prime\_sieve & Primes & BTreeSet & HashSet \\
+    & prime\_sieve & Sieve & Vec & Vec \\
+    & aoc\_2022\_09 & Set & HashSet & HashSet \\
+  \end{tabular}
+  \caption{Actual best vs predicted best implementations}
+  \label{table:predicted_actual}
+\end{table}
+
 %% ** Evaluate performance
 \subsection{Evaluation}
 
@@ -180,7 +206,7 @@ Even in our example projects, we see that the wrong choice of container can slow
 %% * Performance of adaptive containers
 \section{Adaptive containers}
 
-\todo{Try and make these fucking things work}
+\todo{These also need more work, and better test cases}
 
 %% ** Find where adaptive containers get suggested
 
@@ -189,8 +215,6 @@ Even in our example projects, we see that the wrong choice of container can slow
 
 %% ** Suggest future improvements?
 
 %% * Selection time / developer experience
-\section{Selection time}
-
-\todo{selection time}
+%% \section{Selection time}
 
 %% ** Mention speedup versus naive brute force