6 files changed, 57 insertions, 47 deletions
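
Review note: the results-processing steps that this change moves into the `alg:results` algorithm block in `thesis/parts/design.tex` can be sketched in Rust roughly as below. This is a minimal illustration, not the code in the Candelabra repository. The names `RunResult`, `Partition`, `partition_results`, and `MAX_N_TOLERANCE` are hypothetical. The sketch assumes that "within 100" is an inclusive bound and that "adjust the average" means an incremental running mean.

```rust
/// One profiled container instance: the maximum size it reached and a
/// count for each operation. (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
struct RunResult {
    max_n: f64,
    op_counts: Vec<f64>,
}

/// Averages of the results merged into this partition, plus a weight.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
struct Partition {
    avg_max_n: f64,
    avg_op_counts: Vec<f64>,
    weight: f64,
}

/// Results whose maximum n is within this distance of a partition's
/// average maximum n are merged into it (assumed inclusive).
const MAX_N_TOLERANCE: f64 = 100.0;

fn partition_results(results: &[RunResult]) -> Vec<Partition> {
    // Start with an empty list of partitions.
    let mut partitions: Vec<Partition> = Vec::new();

    for result in results {
        // Look for the first partition close enough to this result.
        let existing = partitions
            .iter_mut()
            .find(|p| (p.avg_max_n - result.max_n).abs() <= MAX_N_TOLERANCE);

        match existing {
            Some(p) => {
                // Update each running mean, then add 1 to the weight.
                let new_weight = p.weight + 1.0;
                p.avg_max_n += (result.max_n - p.avg_max_n) / new_weight;
                for (avg, count) in p.avg_op_counts.iter_mut().zip(&result.op_counts) {
                    *avg += (*count - *avg) / new_weight;
                }
                p.weight = new_weight;
            }
            // No such partition: create one from this result, with weight 1.
            None => partitions.push(Partition {
                avg_max_n: result.max_n,
                avg_op_counts: result.op_counts.clone(),
                weight: 1.0,
            }),
        }
    }

    // Normalize the weights so that they sum to 1.
    let total: f64 = partitions.iter().map(|p| p.weight).sum();
    if total > 0.0 {
        for p in &mut partitions {
            p.weight /= total;
        }
    }
    partitions
}
```

In this sketch a result joins the first matching partition in insertion order, so the outcome can depend on the order in which results arrive.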
diff --git a/thesis/main.tex b/thesis/main.tex
index 9b51754..7657f58 100644
--- a/thesis/main.tex
+++ b/thesis/main.tex
@@ -16,6 +16,8 @@
 \usepackage{hyperref}
 \bibliographystyle{unsrtnat}
 \setcitestyle{authoryear,open={(},close={)}}
+\usepackage{algorithm}
+\usepackage{algpseudocode}
 
 %% Convenience macros
 \newcommand{\code}[1]{\lstinline$#1$}
diff --git a/thesis/parts/abstract.tex b/thesis/parts/abstract.tex
index ae86901..8f13f9d 100644
--- a/thesis/parts/abstract.tex
+++ b/thesis/parts/abstract.tex
@@ -5,10 +5,10 @@ We present Candelabra, a system for selecting the best implementation of a conta
 Using the DSL proposed in \cite{qin_primrose_2023}, developers specify the way a container must behave and what operations it must be able to perform.
 Once they have done this, we are able to select implementations that meet those requirements, and suggest which will be the fastest based on the usage patterns of the user's program.
 
-Our system is designed with flexibility in mind, meaning it is easy to add new container implementations, and operations.
+Our system is designed with flexibility in mind, meaning it is easy to add new container implementations and operations.
 It is also able to scale up to larger programs, without suffering the exponential blowup in time taken that would happen with a brute-force approach.
 
-Our approach is able to suggest the fastest implementation in most of our tests, although further testing is required on a wider range of workloads.
+Our approach generates accurate estimates of each implementation's performance, and is able to find the fastest implementation in the majority of our tests.
 
 We also investigate the feasibility of adaptive containers, which switch implementation once the size reaches a certain threshold.
 In doing so, we identify several key concerns that future work should address.
diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex
index ea30f36..65372ef 100644
--- a/thesis/parts/background.tex
+++ b/thesis/parts/background.tex
@@ -79,7 +79,7 @@ Interfaces, or their closest equivalent, are often used to abstract over similar
 In Java, ordered collections implement the interface \code{List<E>}, with similar interfaces for \code{Set<E>}, \code{Queue<E>}, etc.
 This allows most code to be implementation-agnostic, with functional requirements specified by the interface used.
 
-Whilst this provides some flexibility, it still requires the developer to choose a concrete implementation at some point.
+While this provides some flexibility, it still requires the developer to choose a concrete implementation at some point.
 In most cases, developers will simply choose the most common implementation and assume it will be fast enough.
 
 Otherwise, developers are forced to guess based on their knowledge of specific implementations and their program's behaviour.
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index 796549e..ab9cbf0 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -1,9 +1,4 @@
-We now outline the design of our container selection system (Candelabra), and justify our design decisions.
-
-We first restate our aims and priorities for the system, illustrating its usage with an example.
-We then provide an overview of the container selection process, and each part in it.
-
-We leave detailed discussion of implementation for chapter \ref{chap:implementation}.
+We now outline the design of our container selection system (Candelabra), and justify our design decisions. We first restate our aims and priorities for the system, illustrating its usage with an example. We then provide an overview of the container selection process, and each part in it, although we leave detailed discussion of implementation for chapter \ref{chap:implementation}.
 
 \section{Aims \& Usage}
 
@@ -54,7 +49,7 @@ Here, the code generated uses \code{Vec} as the implementation for \code{Sieve},
 \begin{table}[h]
   \centering
   \begin{tabular}{|c|c|c|c|}
-    & Container Type & Implementation & Estimated cost \\
+    Best performing & Container Type & Implementation & Estimated cost \\
     \hline
     * & Sieve & LinkedList & 14179471355 \\
       & Sieve & Vec & 26151238698 \\
@@ -97,7 +92,7 @@ Primrose allows users to specify both the traits they require (syntactic propert
 Each container type that we want to select an implementation for is bound by a list of traits and a list of properties (lines 11 and 12 in Listing \ref{lst:selection_example}).
 
 %% Short explanation of selection method
-In brief, primrose works by:
+In brief, Primrose works by:
 
 \begin{itemize}
 \item Finding all implementations in the container library that implement all required traits
@@ -125,7 +120,7 @@ After this stage, we have a list of implementations for each container type we a
 \end{table}
 
 %% Abstraction over backend
-Although we use primrose in our implementation, the rest of our system isn't dependent on it, and it would be relatively simple to use a different approach for selecting based on functional requirements.
+Although we use Primrose in our implementation, the rest of our system isn't dependent on it, and it would be relatively simple to use a different approach for selecting based on functional requirements.
 
 \section{Cost Models}
 
@@ -173,19 +168,25 @@ We then aggregate all results for a single container type into a list of partiti
 
 Each partition simply stores an average value for each component of our results (maximum size and a count for each operation), along with a weight indicating how many results fell into that partition.
 
-Results are processed as follows:
-
-\begin{itemize}
-\item We start with an empty list of partitions.
-\item For each result, if there is a partition with an average max n value within 100 of that result's maximum n, add the result to that partition:
-  \begin{itemize}
-  \item Adjust the partition's average maximum n according to the new result
-  \item Adjust the partitions' average count of each operation according to the counts in the new result
-    \item Add 1 to the weight of the partition.
-  \end{itemize}
-\item If there is no such partition, create a new one with the values from the result, and with weight 1.
-\item Once all results have been processed, normalize the partition weights by dividing each by the sum of all weights.
-\end{itemize}
+Results are processed as in algorithm \ref{alg:results}.
+
+\begin{algorithm}
+  \caption{Results processing algorithm}
+  \label{alg:results}
+  \begin{algorithmic}[1]
+    \State Start with an empty list of partitions
+    \For{result in results}
+    \If{there is a partition with an average max n value within 100 of result's maximum n}
+    \State Adjust the partition's average maximum n according to the new result
+    \State Adjust the partition's average count of each operation according to the counts in the new result
+    \State Add 1 to the weight of the partition.
+    \Else
+    \State Add a new partition with the values from the result, and with weight 1.
+    \EndIf
+    \EndFor
+    \State Once all results have been processed, normalize the partition weights by dividing each by the sum of all weights.
+  \end{algorithmic}
+\end{algorithm}
 
 The use of partitions serves 3 purposes.
 The first is to compress the data, which speeds up processing and stops us running out of memory in more complex programs.
@@ -200,8 +201,7 @@ We now combine these to estimate the total cost of each implementation.
 For each implementation, our estimate for its total cost is:
 
 $$
-\sum_{o\in \mathit{ops}, (r_{o}, N, W) \in \mathit{partitions}} C_o(N) * r_o
-* W
+\sum_{o\in \mathit{ops}, (r_{o}, N, W) \in \mathit{partitions}} C_o(N) r_o W
 $$
 
 \begin{itemize}
diff --git a/thesis/parts/implementation.tex b/thesis/parts/implementation.tex
index bc2802c..d8997f0 100644
--- a/thesis/parts/implementation.tex
+++ b/thesis/parts/implementation.tex
@@ -80,7 +80,7 @@ We originally experimented with coefficients up to $x^3$, but found that this le
 \section{Profiling}
 
 We implement profiling using the \code{ProfilerWrapper} type (\code{src/crates/library/src/profiler.rs}), which takes as type parameters the inner container implementation and an index, used later to identify what container type the output corresponds to.
-We then implement any primrose traits that the inner container implements, counting the number of times each operation is called.
+We then implement any Primrose traits that the inner container implements, counting the number of times each operation is called.
 We also check the length of the container after each insert operation, and track the maximum.
 
 Tracking is done per-instance, and recorded when the container goes out of scope and its \code{Drop} implementation is called.
@@ -103,23 +103,31 @@ Selection is done per container type.
 For each candidate implementation, we calculate its cost on each partition in the profiler output, then sum these values to get the total estimated cost for each implementation.
 This is implemented in \code{src/crates/candelabra/src/profiler/info.rs} and \code{src/crates/candelabra/src/select.rs}.
 
-In order to try and suggest an adaptive container, we use the following algorithm:
-
-\begin{enumerate}
-\item Sort the list of partitions in order of ascending maximum n values.
-\item Calculate the cost for each candidate in each partition individually.
-\item For each partition, find the best candidate and store it in the array \code{best}. Note that we don't sum across all partitions this time.
-\item Find the lowest index \code{i} where \code{best[i] != best[0]}
-\item Check that \code{i} splits the list properly: For all \code{j < i}, we require \code{best[j] == best[0]} and for all \code{j>=i}, we require \code{best[j] == best[i]}.
-\item Let \code{before} be the name of the candidate in \code{best[0]}, \code{after} be the name of the candidate in \code{best[i]}, and \code{threshold} be halfway between the maximum n values of partition \code{i} and partition \code{i-1}.
-\item Calculate the cost of switching as:
-  $$
-  C_{\mathit{before,clear}}(\mathit{threshold}) + \mathit{threshold} * C_{\mathit{after,insert}}(\mathit{threshold})
-  $$
-\item Calculate the cost of not switching: The sum of the difference in cost between \code{before} and \code{after} for all partitions with index \code{> i}.
-\item If the cost of not switching is less than the cost of switching, don't make a suggestion.
-\item Otherwise, suggest an adaptive container which switches from \code{before} to \code{after} when $n$ gets above \code{threshold}. Its estimated cost is the cost for \code{before} up to partition \code{i}, plus the cost of \code{after} for all other partitions, and the cost of switching.
-\end{enumerate}
+In order to try and suggest an adaptive container, we use algorithm \ref{alg:adaptive_container}.
+
+\begin{algorithm}
+  \caption{Adaptive container suggestion algorithm}
+  \label{alg:adaptive_container}
+  \begin{algorithmic}
+    \State Sort $\mathit{partitions}$ in order of ascending maximum n values.
+    \State $\mathit{costs} \gets$ the cost for each candidate in each partition.
+    \State $\mathit{best} \gets$ the best candidate per partition
+    \State $i \gets$ the lowest index where $\mathit{best}[i] \neq \mathit{best}[0]$
+    \If{exists $j < i$ where $\mathit{best}[j] \neq \mathit{best}[0]$, or exists $j\geq i$ where $\mathit{best}[j] \neq \mathit{best}[i]$}
+    \State \Return no suggestion
+    \EndIf
+    \State $\mathit{before} \gets$ name of the candidate in \code{best[0]}
+    \State $\mathit{after} \gets$ name of the candidate in \code{best[i]}
+    \State $\mathit{threshold} \gets$ halfway between the max n values of partition $i$ and $i-1$
+    \State $\mathit{switching\_cost} \gets C_{\mathit{before,clear}}(\mathit{threshold}) + \mathit{threshold} \cdot C_{\mathit{after,insert}}(\mathit{threshold})$
+    \State $\mathit{not\_switching\_cost} \gets$ the sum of the difference in cost between $\mathit{before}$ and $\mathit{after}$ for all partitions with index $> i$.
+    \If{$\mathit{switching\_cost} > \mathit{not\_switching\_cost}$}
+      \State \Return no suggestion
+    \Else
+      \State \Return suggestion for an adaptive container which switches from $\mathit{before}$ to $\mathit{after}$ when $n$ gets above $\mathit{threshold}$. Its estimated cost is $\mathit{switching\_cost} + \sum_{k=0}^{i-1} \mathit{costs}[\mathit{before}][k] + \sum_{k=i}^{|\mathit{partitions}|-1} \mathit{costs}[\mathit{after}][k]$
+    \EndIf
+  \end{algorithmic}
+\end{algorithm}
 
 \section{Code Generation}
 
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index 4db297d..622cb12 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -197,7 +197,7 @@ In all but two of our test cases (marked with *), we correctly identify the best
 \begin{table}[h!]
   \centering
   \begin{tabular}{c|c|c|c|c|}
-    & Project & Container Type & Best implementation & Predicted best   \\
+   Incorrect & Project & Container Type & Best implementation & Predicted best   \\
     \hline
       &  aoc\_2021\_09   & Map      & HashMap  & HashMap \\
       &  aoc\_2021\_09   & Set      & HashSet  & HashSet \\
@@ -315,7 +315,7 @@ Future work could take a more complex approach that finds the best threshold val
 \subsection{Evaluation}
 
 Overall, we find that the main part of our container selection system has merit.
-Whilst our testing has limitations, it shows that we can correctly identify the best container even in complex programs.
+While our testing has limitations, it shows that we can correctly identify the best container even in complex programs.
 More work is needed on improving our system's performance for very small containers, and on testing with a wider range of programs.
 
 Our proposed technique for identifying adaptive containers appears ineffective.