Diffstat (limited to 'thesis/parts/design.tex')
-rw-r--r--  thesis/parts/design.tex  143
1 file changed, 90 insertions(+), 53 deletions(-)
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index 84643b1..01cd858 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -1,7 +1,6 @@
-This chapter outlines the design of our container selection system (Candelabra), and justifies our design decisions.
-
-We first describe our aims and priorities for the system, and illustrate its usage with an example.
+We now outline the design of our container selection system (Candelabra), and justify our design decisions.
+We first restate our aims and priorities for the system, illustrating its usage with an example.
We then provide an overview of the container selection process, and of each part in it.
We leave detailed discussion of implementation for chapter \ref{chap:implementation}.
@@ -12,7 +11,7 @@ As mentioned previously, we aim to create an all-in-one solution for container s
Flexibility is a high priority: It should be easy to add new container implementations, and to integrate our system into existing applications.
Our system should also be able to scale to larger programs, and remain convenient for developers to use.
-We chose to implement our system as a Rust CLI, and to work on programs also written in Rust.
+We chose to implement our system as a CLI, and to work on programs written in Rust.
We chose Rust both for the expressivity of its type system, and its focus on speed and low-level control.
However, most of the techniques we use are not tied to Rust in particular, and it should be possible to generalise them to other languages.
@@ -20,16 +19,11 @@ We require the user to provide their own benchmarks, which should be representat
Users specify their functional requirements by listing the required traits and properties they need for a given container type.
Traits are Rust's primary method of abstraction, and are similar to interfaces in object-oriented languages, or typeclasses in functional languages.
-Properties are specified in a lisp-like DSL as a predicate on a model of the container.
+Properties are specified in a Lisp-like DSL as a predicate on a model of the container.
For example, Listing \ref{lst:selection_example} shows code from our test case based on the sieve of Eratosthenes (\code{src/tests/prime_sieve} in the source artifacts).
-Here we request two container types: \code{Sieve} and \code{Primes}.
-The first must implement the \code{Container} and \code{Stack} traits, and must satisfy the \code{lifo} property. This property is defined at the top as only being applicable to \code{Stack}s, and requires that for any \code{x}, pushing \code{x} then popping from the container returns \code{x}.
-
-The second container type, \code{Primes}, must only implement the \code{Container} trait, and must satisfy the \code{ascending} property.
-This property requires that for all consecutive \code{x, y} pairs in the container, \code{x <= y}.
-\begin{figure}
+\begin{figure}[h]
\begin{lstlisting}[caption=Container type definitions for prime\_sieve,label={lst:selection_example}]
/*SPEC*
property lifo<T> {
@@ -47,20 +41,26 @@ type Primes<S> = {c impl (Container) | (ascending c)}
\end{lstlisting}
\end{figure}
+Here we request two container types: \code{Sieve} and \code{Primes}.
+The first must implement the \code{Container} and \code{Stack} traits, and must satisfy the \code{lifo} property. This property is defined at the top, and requires that for any \code{x}, pushing \code{x} then popping from the container returns \code{x}.
+
+The second container type, \code{Primes}, must implement the \code{Container} trait, and must satisfy the \code{ascending} property.
+This property requires that for all consecutive \code{x, y} pairs in the container, \code{x <= y}.
+
Once we've specified our functional requirements and provided a benchmark (\code{src/tests/prime_sieve/benches/main.rs}), we can simply run Candelabra to select a container: \code{candelabra-cli -p prime_sieve select}.
-This command outputs something like Table \ref{table:selection_output}, and saves the best combination of container types to be used the next time the program is run.
+This command outputs something like Table \ref{table:selection_output}, and saves the best combination of container types to be used the next time the program is run.
Here, the starred rows mark the chosen combination: the generated code uses \code{LinkedList} as the implementation for \code{Sieve}, and \code{BTreeSet} as the implementation for \code{Primes}.
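+For illustration, the generated code might amount to little more than a pair of type aliases along the following lines (a hypothetical sketch; the actual code generation is described in chapter \ref{chap:implementation}):
+
+\begin{lstlisting}[caption=Hypothetical generated code for prime\_sieve,label={lst:codegen_sketch}]
+// Hypothetical sketch: the selected implementations are exposed to
+// the rest of the program as ordinary type aliases.
+pub type Sieve<T> = std::collections::LinkedList<T>;
+pub type Primes<S> = std::collections::BTreeSet<S>;
+\end{lstlisting}
+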
\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|c|}
- Name & Implementation & Estimated Cost \\
+ & Container type & Implementation & Estimated cost \\
\hline
- Sieve & std::vec::Vec & 159040493883 \\
- Sieve & std::collections::LinkedList & 564583506434 \\
- Primes & primrose\_library::SortedVec & 414991320 \\
- Primes & std::collections::BTreeSet & 355962089 \\
- Primes & std::collections::HashSet & 309638677 \\
+ * & Sieve & LinkedList & 14179471355 \\
+ & Sieve & Vec & 26151238698 \\
+ & Primes & HashSet & 117005368 \\
+ & Primes & SortedVec & 112421356 \\
+ * & Primes & BTreeSet & 108931859 \\
\end{tabular}
\caption{Example output from selection command}
\label{table:selection_output}
@@ -68,16 +68,18 @@ Here, the code generated uses \code{Vec} as the implementation for \code{Sieve},
\section{Overview of process}
-Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project, then runs Primrose to find a list of implementations satsifying our functional requirements, from a pre-built library of container implementations.
+Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project.
+It then runs Primrose to find a list of implementations satisfying our functional requirements from a pre-built library of container implementations.
-Once we have this list, we then build a 'cost model' for each candidate type. This allows us to get an upper bound for the runtime cost of an operation at any given n.
+Once we have this list, we build a 'cost model' for each candidate type. This allows us to get an upper bound for the runtime cost of an operation at any given $n$.
+We choose to focus only on CPU time, and to disregard memory usage, due to the difficulty of accurately measuring memory footprint.\footnote{As Rust is not interpreted, we would need to hook into calls to the OS's memory allocator. This is very platform-specific, although the work-in-progress allocator API may make this easier in future.}
-We then run the user-provided benchmarks, using any of the valid candidates instrumented to track how many times each operation is performed, and the maximum size of the container.
+We then run the user-provided benchmarks, using a wrapper around one of the valid candidates to track how many times each operation is performed, and the maximum size the container reaches.
We combine this information with our cost models to estimate a total cost for each candidate, which is an upper bound on the total time taken for all container operations.
At this point, we also check if an 'adaptive' container would be better, by checking if one implementation performs better at smaller $n$ values, and another at higher $n$ values.
-Finally, we pick the implementation with the minimum cost, and generate code which sets the container type to use that implementation.
+Finally, we pick the implementation with the minimum cost, and generate code which allows the program to use that implementation.
Our solution requires little user intervention, integrates well with existing workflows, and its running time scales linearly with the number of container types in a given project.
@@ -95,26 +97,27 @@ Each container type that we want to select an implementation for is bound by a l
%% Short explanation of selection method
In brief, Primrose works by:
+
\begin{itemize}
\item Finding all implementations in the container library that implement all required traits
-\item Translate any specified properties to a Rosette expression
-\item For each implementation, model the behaviour of each operation in Rosette, and check that the required properties always hold
+\item Translating any specified properties to a Rosette expression
+\item For each implementation, modelling the behaviour of each operation in Rosette, and checking that the required properties always hold
\end{itemize}
-We use the code provided with the Primrose paper, with minor modifications elaborated on in Chapter \ref{chap:implementation}.
+We use the code provided in the Primrose paper, with minor modifications elaborated on in Chapter \ref{chap:implementation}.
-At this stage, we have a list of implementations for each container type we are selecting. The command \code{candelabra-cli candidates} will show this output, as in Table \ref{table:candidates_prime_sieve}.
+After this stage, we have a list of implementations for each container type we are selecting. The command \code{candelabra-cli candidates} will show this output, as in Table \ref{table:candidates_prime_sieve}.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|}
- Type & Implementation \\
+ Type & Candidate implementation \\
\hline
- Primes & primrose\_library::EagerSortedVec \\
- Primes & std::collections::HashSet \\
- Primes & std::collections::BTreeSet \\
- Sieve & std::collections::LinkedList \\
- Sieve & std::vec::Vec \\
+ Primes & EagerSortedVec \\
+ Primes & HashSet \\
+ Primes & BTreeSet \\
+ Sieve & LinkedList \\
+ Sieve & Vec \\
\end{tabular}
\caption{Usable implementations by container type for \code{prime_sieve}}
\label{table:candidates_prime_sieve}
@@ -125,7 +128,7 @@ Although we use primrose in our implementation, the rest of our system isn't dep
\section{Cost Models}
-Now that we have a list of possible implementations, we need to understand the performance characteristics of each of them.
+Now that we have a list of correct implementations for each container type, we need a way to understand the performance characteristics of each of them in isolation.
We use an approach similar to CollectionSwitch\citep{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.
%% Benchmarks
@@ -134,21 +137,23 @@ An implementation has a seperate cost model for each operation, which we obtain
For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes, and find the average execution time $t$ of \code{contains} at each.
%% Linear Regression
-We then perform regression, using the collection size $n$ to predict $t$.
+We then perform linear regression, using the collection size $n$ to predict $t$.
In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.
In our implementation, we fit a function of the form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$, using regular least-squares fitting.
Before fitting, we discard all observations that are more than one standard deviation from the mean for a given $n$ value.
-Whilst we could use a more complex technique, in practice this is good enough: Most common operations are polynomial at worst, and more complex models risk overfitting.
+Whilst we could use a more complex technique, in practice this is good enough: Most common operations are polynomial at worst, and more complex models are at higher risk of overfitting.
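+
+As an illustration, the fitting step might look like the following minimal sketch, assuming the \code{nalgebra} crate for linear algebra (names and error handling are simplified):
+
+\begin{lstlisting}[caption=Sketch of cost model fitting,label={lst:fitting_sketch}]
+use nalgebra::{DMatrix, DVector};
+
+// Fit t = x0 + x1*n + x2*n^2 + x3*log2(n) by ordinary least squares.
+// `ns` holds the collection sizes, `ts` the mean execution times.
+fn fit_cost_model(ns: &[f64], ts: &[f64]) -> DVector<f64> {
+    // Design matrix: one row per observation, one column per term.
+    let design = DMatrix::from_fn(ns.len(), 4, |row, col| match col {
+        0 => 1.0,
+        1 => ns[row],
+        2 => ns[row] * ns[row],
+        _ => ns[row].log2(),
+    });
+    let targets = DVector::from_column_slice(ts);
+    // Solve the least-squares problem via singular value decomposition.
+    design.svd(true, true).solve(&targets, 1e-12).expect("fit failed")
+}
+
+// Evaluating the fitted coefficients estimates the cost at a given n.
+fn estimate_cost(coeffs: &DVector<f64>, n: f64) -> f64 {
+    coeffs[0] + coeffs[1] * n + coeffs[2] * n * n + coeffs[3] * n.log2()
+}
+\end{lstlisting}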
%% Limitations
This method works well for many operations and structures, although it has notable limitations.
In particular, implementations which defer work from one function to another will be extremely inconsistent.
+
For example, \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and waits to sort the list until the contents of the list are read from (such as by using \code{contains}).
+Our cost models have no way to express that the runtime of one operation varies based on the history of previous operations.
We were unable to work around this, and so we have removed these variants from our container library.
-A potential solution could be to perform untimed 'warmup' operations before each operation, but this is complex because it requires some understanding of what operations will cause work to be deferred.
+One potential solution could be to perform untimed 'warmup' operations before each operation, but this is complex because it requires some understanding of what operations will cause work to be deferred.
At the end of this stage, we are able to reason about the relative cost of operations between implementations.
These models are cached for as long as our container library remains the same, as they are independent of what program the user is currently working on.
@@ -159,25 +164,42 @@ We now need to collect information about how the user's application uses its con
%% Data Collected
As mentioned above, the ordering of operations can have a large effect on container performance.
-Unfortunately, tracking every container operation in order quickly becomes unfeasible, so we settle for tracking the count of each operation, and the maximum size of each collection instance.
+Unfortunately, tracking every container operation in order quickly becomes infeasible.
+Instead, we settle for tracking the count of each operation, and the maximum size the collection reaches.
+
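+The instrumentation wrapper might look something like this minimal sketch (type and field names are illustrative, and only two operations are shown):
+
+\begin{lstlisting}[caption=Sketch of a profiling wrapper,label={lst:profiling_sketch}]
+use std::cell::Cell;
+use std::collections::HashSet;
+use std::hash::Hash;
+
+// Wraps one valid candidate (here a HashSet) and records operation
+// counts plus the maximum size the container reaches.
+struct ProfilingWrapper<T: Eq + Hash> {
+    inner: HashSet<T>,
+    insert_count: u64,
+    // Cell lets us count calls made through &self methods.
+    contains_count: Cell<u64>,
+    max_n: usize,
+}
+
+impl<T: Eq + Hash> ProfilingWrapper<T> {
+    fn insert(&mut self, value: T) -> bool {
+        self.insert_count += 1;
+        let inserted = self.inner.insert(value);
+        self.max_n = self.max_n.max(self.inner.len());
+        inserted
+    }
+
+    fn contains(&self, value: &T) -> bool {
+        self.contains_count.set(self.contains_count.get() + 1);
+        self.inner.contains(value)
+    }
+}
+\end{lstlisting}
+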
+Every instance or allocation of the collection produces a separate result.
+We then aggregate all results for a single container type into a list of partitions.
-Every instance or allocation of the collection is tracked separately, and results are collated after profiling.
-%% Segmentation
-Results with a close enough n value get sorted into partitions, where each partition stores the average count of each operation, and a weight indicating how common results in that partition were.
-This serves 3 purposes.
+Each partition simply stores an average value for each component of our results (maximum size and a count for each operation), along with a weight indicating how many results fell into that partition.
+Results are processed as follows, as sketched in the listing after this list:
+
+\begin{itemize}
+\item We start with an empty list of partitions.
+\item For each result, if there is a partition whose average maximum $n$ value is within 100 of that result's maximum $n$, add the result to that partition:
+ \begin{itemize}
+ \item Adjust the partition's average maximum $n$ according to the new result
+ \item Adjust the partition's average count of each operation according to the counts in the new result
+ \item Add 1 to the weight of the partition.
+ \end{itemize}
+\item If there is no such partition, create a new one with the values from the result, and with weight 1.
+\item Once all results have been processed, normalise the partition weights by dividing each by the sum of all weights.
+\end{itemize}
+
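+A minimal sketch of this procedure follows (with illustrative types; the real data structures differ):
+
+\begin{lstlisting}[caption=Sketch of partitioning profiler results,label={lst:partition_sketch}]
+struct Partition {
+    avg_max_n: f64,
+    avg_op_counts: Vec<f64>, // one running average per operation
+    weight: f64,
+}
+
+struct ProfileResult {
+    max_n: f64,
+    op_counts: Vec<f64>,
+}
+
+fn partition_results(results: &[ProfileResult]) -> Vec<Partition> {
+    let mut partitions: Vec<Partition> = Vec::new();
+    for r in results {
+        // Find a partition whose average maximum n is within 100.
+        let found = partitions
+            .iter()
+            .position(|p| (p.avg_max_n - r.max_n).abs() <= 100.0);
+        if let Some(i) = found {
+            // Fold the result into the partition's running averages.
+            let p = &mut partitions[i];
+            let w = p.weight;
+            p.avg_max_n = (p.avg_max_n * w + r.max_n) / (w + 1.0);
+            for (avg, count) in p.avg_op_counts.iter_mut().zip(&r.op_counts) {
+                *avg = (*avg * w + count) / (w + 1.0);
+            }
+            p.weight += 1.0;
+        } else {
+            partitions.push(Partition {
+                avg_max_n: r.max_n,
+                avg_op_counts: r.op_counts.clone(),
+                weight: 1.0,
+            });
+        }
+    }
+    // Normalise weights so that they sum to one.
+    let total: f64 = partitions.iter().map(|p| p.weight).sum();
+    for p in &mut partitions {
+        p.weight /= total;
+    }
+    partitions
+}
+\end{lstlisting}
+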
+The use of partitions serves three purposes.
The first is to compress the data, which speeds up processing and stops us running out of memory in more complex programs.
The second is to capture the fact that the number of operations will likely depend on the size of the container.
-The third is to aid in searching for adaptive containers, which we will elaborate on later.
+The third is to aid in searching for adaptive containers, a process which relies on understanding the different sizes of containers in the application.
\section{Selection process}
%% Selection process
-Once we have an estimate of how long each operation may take (from our cost models), and how often we use each operation (from our profiling information), we combine these to estimate the total cost of each implementation.
-For each implementation, our total cost estimate is:
+At this stage, we have an estimate of how long each operation may take (from our cost models), and how often we use each operation (from our profiling information).
+We now combine these to estimate the total cost of each implementation.
+For each implementation, our estimate for its total cost is:
$$
-\sum_{o\in \textrm{ops}} \sum_{(r_{o}, N, W) \in \textrm{partitions}} C_o(N) * r_o
+\sum_{o\in \mathit{ops},\, (r_{o}, N, W) \in \mathit{partitions}} C_o(N) \cdot r_o \cdot W
$$
@@ -185,32 +207,47 @@ $$
\item $C_o(N)$ is the cost estimated by the cost model for operation $o$ at $n$ value $N$
\item $r_o$ is the average count of a given operation in a partition
\item $N$ is the average maximum $n$ value in a partition
-\item $W$ is the weight of a partition, representing how many allocations fell in to this partition
+\item $W$ is the weight of a partition, proportional to how many results fell into this partition
\end{itemize}
Essentially, we scale an estimated worst-case cost of each operation by how frequently we think we will encounter it.
+This results in a pessimistic estimate of how much time we will spend in total on container operations for a given implementation.
+
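+Concretely, the computation might be sketched as follows, reusing the illustrative \code{Partition} type from Listing \ref{lst:partition_sketch} and assuming one fitted cost model per operation (coefficients as in the fitting sketch):
+
+\begin{lstlisting}[caption=Sketch of total cost estimation,label={lst:cost_sketch}]
+struct CostModel {
+    coeffs: [f64; 4], // x0 + x1*n + x2*n^2 + x3*log2(n)
+}
+
+impl CostModel {
+    fn estimate(&self, n: f64) -> f64 {
+        let [x0, x1, x2, x3] = self.coeffs;
+        x0 + x1 * n + x2 * n * n + x3 * n.log2()
+    }
+}
+
+// Sum C_o(N) * r_o * W over every operation and partition.
+// `models` is ordered the same way as each partition's op counts.
+fn estimate_total_cost(partitions: &[Partition], models: &[CostModel]) -> f64 {
+    let mut total = 0.0;
+    for p in partitions {
+        for (model, avg_count) in models.iter().zip(&p.avg_op_counts) {
+            total += model.estimate(p.avg_max_n) * avg_count * p.weight;
+        }
+    }
+    total
+}
+\end{lstlisting}
+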
+Now we simply pick the implementation with the smallest estimated cost that satisfies our functional requirements for that container type.
+
+This process is repeated for each container type in the project, and provides not only suggestions but rankings for each type.
+We are able to do this while only running our user's benchmarks once, which is ideal for larger applications or for long-running benchmarks.
\section{Adaptive containers}
-In many cases, the maximum size of a container type varies greatly between program runs.
-In these cases, it may be desirable to start off with one container type, and switch to another one if the size of the container grows greatly.
+The above process forms the core of our system, and is sufficient for normal container selection.
+However, a common situation in many programs is that the maximum size of a container type depends on the size of some input.
+In these cases, the user may write benchmarks for a range of sizes, and look for a container type that achieves good enough performance throughout the whole range.
+
+An alternative approach is to start off with whichever container type is best at small sizes, and switch to one more suited for large amounts of data once we grow past a certain threshold.
+In theory, this allows the best of both worlds: A lower-overhead container when the program is run on a small input, and a more complex one for longer runs.
For example, if a program requires a set, then for small sizes it may be best to keep a sorted list and use binary search for \code{contains} operations.
+This is what we do in our \code{SortedVecSet} container implementation.
But when the size of the container grows, the cost of doing \code{contains} may grow high enough that using a \code{HashSet} would actually be faster.
-Adaptive containers attempt to address this need, by starting off with one implementation (the low or before implementation), and switching to a new implemenation (the high or after implementation) once the size of the container passes a certain threshold.
+Adaptive containers attempt to address this need, by starting off with one implementation (referred to as the low or before implementation), and switching to a new implementation (the high or after implementation) once the size of the container passes a certain threshold.
-This is similar to systems such as CoCo\citep{hutchison_coco_2013} and in work by \"{O}sterlund\citep{osterlund_dynamically_2013}.
+This is similar to systems such as CoCo\citep{hutchison_coco_2013}, and to work by \"{O}sterlund\citep{osterlund_dynamically_2013}.
However, we decide when to switch container implementation before the program is run, rather than as it is running.
We also do so in a way that requires no knowledge of the implementation internals.
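+
+As an illustration, an adaptive set along these lines might be sketched as follows (names and the threshold value are hypothetical):
+
+\begin{lstlisting}[caption=Sketch of an adaptive set,label={lst:adaptive_sketch}]
+use std::collections::HashSet;
+use std::hash::Hash;
+
+const THRESHOLD: usize = 1024; // hypothetical; chosen at selection time
+
+// Starts as a sorted Vec (the low implementation) and switches to a
+// HashSet (the high implementation) once it grows past THRESHOLD.
+enum AdaptiveSet<T: Ord + Eq + Hash> {
+    Low(Vec<T>), // kept sorted; contains uses binary search
+    High(HashSet<T>),
+}
+
+impl<T: Ord + Eq + Hash> AdaptiveSet<T> {
+    fn insert(&mut self, value: T) {
+        match self {
+            AdaptiveSet::Low(v) => {
+                if let Err(pos) = v.binary_search(&value) {
+                    v.insert(pos, value);
+                }
+                // Adapt using only the containers' public interfaces.
+                if v.len() > THRESHOLD {
+                    let high: HashSet<T> = v.drain(..).collect();
+                    *self = AdaptiveSet::High(high);
+                }
+            }
+            AdaptiveSet::High(s) => {
+                s.insert(value);
+            }
+        }
+    }
+
+    fn contains(&self, value: &T) -> bool {
+        match self {
+            AdaptiveSet::Low(v) => v.binary_search(value).is_ok(),
+            AdaptiveSet::High(s) => s.contains(value),
+        }
+    }
+}
+\end{lstlisting}
+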
%% Adaptive container detection
-Using our list of partitions, we sort it by ascending container size and attempt to find a split.
+After regular container selection is done, we attempt to suggest an adaptive implementation for each container type.
+We first sort our list of partitions by ascending container size.
If we can split our partitions at some point such that everything to the left performs best with one implementation, and everything to the right with another, then we should be able to switch implementation around that $n$ value.
In practice, finding the correct threshold is more difficult: We must take into account the cost of transforming from one implementation to another.
If we adapt our container too early, we may do more work adapting it than we save over just sticking with our low implementation.
If we adapt too late, we have more data to move and less of our program gets to take advantage of the new implementation.
-
We choose the relatively simple strategy of switching halfway between two partitions.
-Our cost models let us estimate how expensive switching implementations will be, which we compare against how much we save by switching to the after implementation.
+
+Our cost models allow us to estimate how expensive switching implementations will be.
+We compare this estimate to how much better the high implementation is than the low one, to account for the overhead of changing implementations.
+
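+The search for a split point might be sketched as follows (again using the illustrative \code{Partition} type; the handling of switching costs is simplified away here):
+
+\begin{lstlisting}[caption=Sketch of the search for an adaptive split,label={lst:split_sketch}]
+// `sorted` is the partition list, sorted by ascending avg_max_n.
+// `best` returns the index of the cheapest implementation for a
+// single partition, judged using the cost models.
+fn find_split(
+    sorted: &[Partition],
+    best: impl Fn(&Partition) -> usize,
+) -> Option<f64> {
+    for i in 1..sorted.len() {
+        let (left, right) = sorted.split_at(i);
+        let low = best(&left[0]);
+        let high = best(&right[0]);
+        if low != high
+            && left.iter().all(|p| best(p) == low)
+            && right.iter().all(|p| best(p) == high)
+        {
+            // Switch halfway between the partitions either side of
+            // the split; the cost of switching is checked separately.
+            return Some((left[left.len() - 1].avg_max_n + right[0].avg_max_n) / 2.0);
+        }
+    }
+    None
+}
+\end{lstlisting}
+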
+A full explanation of our algorithm is given in section \ref{section:impl_selection}.