Diffstat (limited to 'thesis/parts')
-rw-r--r--  thesis/parts/background.tex     |  62
-rw-r--r--  thesis/parts/conclusion.tex     |   6
-rw-r--r--  thesis/parts/design.tex         | 143
-rw-r--r--  thesis/parts/implementation.tex |  88
-rw-r--r--  thesis/parts/introduction.tex   |   2
-rw-r--r--  thesis/parts/results.tex        |  83
6 files changed, 220 insertions, 164 deletions
diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex
index 6b4e295..e65afad 100644
--- a/thesis/parts/background.tex
+++ b/thesis/parts/background.tex
@@ -41,16 +41,16 @@ For a \code{HashSet}, this would include that there are never any duplicates.
A \code{Vec} would not have this property, but would have the property that insertion order is preserved.
To select a correct container implementation, we then need to ensure we meet some syntactic and semantic requirements specific to our program.
-So long as we specify our requiremets correctly, and use an implementation which provides all of the properties we're looking for, our program shouldn't be able to tell the difference.
+So long as we specify our requirements correctly, and use an implementation which provides all of the properties we're looking for, our program shouldn't be able to tell the difference.
\subsection{Non-functional requirements}
While meeting our program's functional requirements should ensure that it runs correctly, this doesn't say anything about our program's efficiency.
-We likely also want to choose the most efficient implementation available, striking a balance between runtime and memory usage.
+We also want to choose the most efficient implementation available, striking a balance between runtime and memory usage.
Prior work has demonstrated that the right container implementation can give substantial performance improvements.
-\cite{l_liu_perflint_2009} found and suggested fixes for ``hundreds of suboptimal patterns in a set of large C++ benchmarks,'' with one such case improving performance by 17\%.
-Similarly, \cite{jung_brainy_2011} demonstrates an average increase in speed of 27-33\% on real-world applications and libraries using a similar approach.
+Perflint\citep{l_liu_perflint_2009} found and suggested fixes for ``hundreds of suboptimal patterns in a set of large C++ benchmarks,'' with one such case improving performance by 17\%.
+Similarly, Brainy\citep{jung_brainy_2011} demonstrates an average increase in speed of 27-33\% on real-world applications and libraries using a similar approach.
If we can find a set of implementations that satisfy our functional requirements, then an obvious solution is to benchmark the program with each of these implementations in place.
This will obviously work, as long as our benchmarks are roughly representative of the real world.
@@ -61,7 +61,7 @@ This quickly becomes unfeasible, so we must explore other selection methods.
\section{Prior literature}
-In this section we outline methods for container selection available within and outside of current programming languages and their limitations.
+In this section we outline existing methods for container selection, in both current programming languages and literature.
\subsection{Approaches in common programming languages}
@@ -76,39 +76,40 @@ In other languages, collections are given as part of a standard library or must
Java comes with growable lists as part of its standard library, as does Rust.
In both cases, the standard library implementation is not special --- users can implement their own and use them in the same ways.
-Often interfaces, or their closest equivalent, are used to abstract over 'similar' collections.
+Interfaces, or their closest equivalent, are often used to abstract over 'similar' collections.
In Java, ordered collections implement the interface \code{List<E>}, with similar interfaces for \code{Set<E>}, \code{Queue<E>}, etc.
-This allows most code to be implementation-agnostic, however the developer must still choose a concrete implementation at some point.
-This means that developers are forced to guess based on their knowledge of the underlying implementations, or simply choose the most common implementation.
+This allows most code to be implementation-agnostic, but still requires the developer to choose a concrete implementation at some point.
+This means that developers are forced to guess based on their knowledge of the underlying implementations, or to simply choose the most common implementation.
\subsection{Rules-based approaches}
-One approach to the container selection problem is to allow the developer to make the choice initially, but use some tool to detect poor choices.
+One approach to this problem is to allow the developer to make the choice initially, but use some tool to detect poor choices.
Chameleon\citep{shacham_chameleon_2009} uses this approach.
-It first collects statistics from program benchmarks using a ``semantic profiler''.
+First, it collects statistics from program benchmarks using a ``semantic profiler''.
This includes the space used by collections over time and the counts of each operation performed.
These statistics are tracked per individual collection allocated and then aggregated by 'allocation context' --- the call stack at the point where the allocation occurred.
-These aggregated statistics are passed to a rules engine, which uses a set of rules to suggest container types which might improve performance.
+These aggregated statistics are passed to a rules engine, which uses a set of rules to suggest different container types which might have better performance.
This results in a flexible engine for providing suggestions which can be extended with new rules and types as necessary.
+A similar approach is used by \cite{l_liu_perflint_2009} for the C++ standard library.
However, adding new implementations requires the developer to write new suggestion rules.
-This can be difficult as it requires the developer to understand all of the existing implementations' performance characteristics.
+This can be difficult, as it requires the developer to understand all of the existing implementations' performance characteristics.
To satisfy functional requirements, Chameleon only suggests new types that behave identically to the existing type.
This results in selection rules being more restricted than they otherwise could be.
For instance, a rule cannot suggest a \code{HashSet} instead of a \code{LinkedList} as the two are not semantically identical.
Chameleon has no way of knowing if doing so will break the program's functionality and so it does not make the suggestion.
-CoCo \citep{hutchison_coco_2013} and work by \"{O}sterlund \citep{osterlund_dynamically_2013} use similar techniques, but work as the program runs.
-This works well for programs with different phases of execution, such as loading and then working on data.
+CoCo\citep{hutchison_coco_2013} and \cite{osterlund_dynamically_2013} use similar techniques, but work as the program runs.
+This was shown to work well for programs with different phases of execution, such as loading and then working on data.
However, the overhead from profiling and from checking rules may not be worth the improvements in other programs, where access patterns are roughly the same throughout.
\subsection{ML-based approaches}
-Brainy\citep{jung_brainy_2011} gathers statistics similarly, however it uses machine learning (ML) for selection instead of programmed rules.
+Brainy\citep{jung_brainy_2011} gathers similar statistics, but uses machine learning for selection instead of programmed rules.
ML has the advantage of being able to detect patterns a human may not be aware of.
For example, Brainy takes into account statistics from hardware counters, which are difficult for a human to reason about.
@@ -116,24 +117,19 @@ This also makes it easier to add new collection implementations, as rules do not
\subsection{Estimate-based approaches}
-CollectionSwitch\citep{costa_collectionswitch_2018} is an online solution which adapts as the program runs and new information becomes available.
+CollectionSwitch\citep{costa_collectionswitch_2018} is another solution, which attempts to estimate the performance characteristics of each implementation individually.
First, a performance model is built for each container implementation.
This gives an estimate of some cost for each operation at a given collection size.
-We call the measure of cost the ``cost dimension''.
-Examples of cost dimensions include memory usage and execution time.
+This cost might be a measurement of memory usage, or execution time.
-This is combined with profiling information to give cost estimates for each collection type and cost dimension.
-Switching between container types is then done based on the potential change in each cost dimension.
+The system then collects data on how the program uses containers as it runs, and combines this with the built cost models to estimate the performance impact for each collection type.
+It may then decide to switch between container types if the potential change in cost seems high enough.
For instance, we may choose to switch if we reduce the estimated space cost by more than 20\%, so long as the estimated time cost doesn't increase by more than 20\%.
By generating a cost model based on benchmarks, CollectionSwitch manages to be more flexible than rule-based approaches.
Like ML approaches, adding new implementations requires little extra work, but has the advantage of being possible without having to re-train a model.
-A similar approach is used by \cite{l_liu_perflint_2009} for the C++ standard library.
-It focuses on measuring the cost and frequency of more fine-grained operations, such as list resizing.
-However, it does not take the collection size into account.
-
\subsection{Functional requirements}
Most of the approaches we have highlighted focus on non-functional requirements, and use programming language features to enforce functional requirements.
@@ -142,24 +138,26 @@ We will now examine tools which focus on container selection based on functional
Primrose \citep{qin_primrose_2023} is one such tool, which uses a model-based approach.
It allows the application developer to specify semantic requirements using a Domain-Specific Language (DSL), and syntactic requirements using Rust's traits.
-The semantic requirements are expressed as a list of predicates, each representing a semantic property.
+Semantic requirements are expressed as a list of predicates, each representing a semantic property.
Predicates act on an abstract model of the container type.
-Each implementation also specifies the conditions it upholds using an abstract model.
-A constraint solver then checks if a given implementation will always meet the conditions required by the predicate(s).
+Each implementation specifies how it works on this abstract model, and a constraint solver checks if the two will always agree.
This allows developers to express any combination of semantic requirements, rather than limiting them to common ones (as in Java's approach).
It can also be extended with new implementations as needed, though this does require modelling the semantics of the new implementation.
-\cite{franke_collection_2022} uses a similar idea, but is limited to properties defined by the library authors and implemented on the container implementations.
+\cite{franke_collection_2022} uses an idea more similar to Java's standard library, where properties are defined by the library authors and container implementations opt in to providing them.
To select the final container implementation, both tools rely on benchmarking each candidate.
As we note above, this scales poorly.
\section{Contribution}
-We aim to create a container selection method that addresses both functional and non-functional requirements.
+Of the tools presented, none are able to deal with both functional and non-functional requirements properly.
+Our contribution is a system for container selection that addresses both of these aspects.
+
+Users are able to specify their functional requirements in a way that is expressive enough for most use cases, and easy to integrate with existing projects.
+We then find which implementations in our library satisfy these requirements, and estimate which will have the best performance.
-Users should be able to specify their functional requirements in a way that is expressive enough for most usecases, and easy to integrate with existing projects.
-It should also be easy to add new container implementations, and we should be able to use it on large projects without selection time becoming an issue.
+We also aim to make it easy to add new container implementations, and for our system to scale up to large projects without selection time becoming an issue.
-We focus on offline container selection (done before the program is compiled), however we also attempt to detect when changing implementation at runtime is desirable.
+Whilst the bulk of our system is focused on offline selection (done before the program is compiled), we also attempt to detect when changing implementation at runtime is desirable.
diff --git a/thesis/parts/conclusion.tex b/thesis/parts/conclusion.tex
index e018efa..cb0f9a4 100644
--- a/thesis/parts/conclusion.tex
+++ b/thesis/parts/conclusion.tex
@@ -2,14 +2,14 @@
We have presented an integrated system for container implementation selection, which can take into account the functional and non-functional requirements of the program it is working on.
%% Ease of extending / flexibility
-Our system is extremely flexible, and can be easily extended with new container types and new functionality on those types, as we showed by adding associative collections and several new data types.
+Our system is extremely flexible, and can be easily extended with new container types and new functionality on those types, as we showed by adding associative collections and several new data types to our library.
%% Demonstrated predictive power of profiling and benchmarking, although limited testing
-We demonstrated that benchmarking of container implementations and profiling of target applications can be done separately and then combined to suggest the fastest container implementation for a particular program.
+We demonstrated how benchmarking of container implementations and profiling of target applications can be done separately and then combined.
We prove that this approach has merit, although our testing had notable limitations that future work should improve on.
We also found that while linear regression is powerful enough for many cases, more research is required on how best to gather and preprocess data in order to best capture an implementation's performance characteristics.
%% Researched feasibility of adaptive containers, found issues with overhead and threshold detection
We test the effectiveness of switching container implementation as the n value changes, and in doing so find several important factors to consider.
%% Future work should focus on minimising overhead and finding the ideal threshold
-Future work should focus on minimising the overhead applied to every operation, as well as finding the correct threshold at which to switch implementation.
+Future work should focus on minimising the overhead applied to every operation, as well as on finding the correct threshold at which to switch implementation.
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index 84643b1..01cd858 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -1,7 +1,6 @@
-This chapter outlines the design of our container selection system (Candelabra), and justifies our design decisions.
-
-We first describe our aims and priorities for the system, and illustrate its usage with an example.
+We now outline the design of our container selection system (Candelabra), and justify our design decisions.
+We first restate our aims and priorities for the system, illustrating its usage with an example.
We then provide an overview of the container selection process, and each part in it.
We leave detailed discussion of implementation for chapter \ref{chap:implementation}.
@@ -12,7 +11,7 @@ As mentioned previously, we aim to create an all-in-one solution for container s
Flexibility is a high priority: It should be easy to add new container implementations, and to integrate our system into existing applications.
Our system should also be able to scale to larger programs, and remain convenient for developers to use.
-We chose to implement our system as a Rust CLI, and to work on programs also written in Rust.
+We chose to implement our system as a CLI, and to work on programs written in Rust.
We chose Rust both for the expressivity of its type system, and its focus on speed and low-level control.
However, most of the techniques we use are not tied to Rust in particular, and so should be possible to generalise to other languages.
@@ -20,16 +19,11 @@ We require the user to provide their own benchmarks, which should be representat
Users specify their functional requirements by listing the required traits and properties they need for a given container type.
Traits are Rust's primary method of abstraction, and are similar to interfaces in object-oriented languages, or typeclasses in functional languages.
-Properties are specified in a lisp-like DSL as a predicate on a model of the container.
+Properties are specified in a Lisp-like DSL as a predicate on a model of the container.
For example, Listing \ref{lst:selection_example} shows code from our test case based on the sieve of Eratosthenes (\code{src/tests/prime_sieve} in the source artifacts).
-Here we request two container types: \code{Sieve} and \code{Primes}.
-The first must implement the \code{Container} and \code{Stack} traits, and must satisfy the \code{lifo} property. This property is defined at the top as only being applicable to \code{Stack}s, and requires that for any \code{x}, pushing \code{x} then popping from the container returns \code{x}.
-
-The second container type, \code{Primes}, must only implement the \code{Container} trait, and must satisfy the \code{ascending} property.
-This property requires that for all consecutive \code{x, y} pairs in the container, \code{x <= y}.
-\begin{figure}
+\begin{figure}[h]
\begin{lstlisting}[caption=Container type definitions for prime\_sieve,label={lst:selection_example}]
/*SPEC*
property lifo<T> {
@@ -47,20 +41,26 @@ type Primes<S> = {c impl (Container) | (ascending c)}
\end{lstlisting}
\end{figure}
+Here we request two container types: \code{Sieve} and \code{Primes}.
+The first must implement the \code{Container} and \code{Stack} traits, and must satisfy the \code{lifo} property. This property is defined at the top, and requires that for any \code{x}, pushing \code{x} then popping from the container returns \code{x}.
+
+The second container type, \code{Primes}, must implement the \code{Container} trait, and must satisfy the \code{ascending} property.
+This property requires that for all consecutive \code{x, y} pairs in the container, \code{x <= y}.
+
Once we've specified our functional requirements and provided a benchmark (\code{src/tests/prime_sieve/benches/main.rs}), we can simply run Candelabra to select a container: \code{candelabra-cli -p prime_sieve select}.
-This command outputs something like Table \ref{table:selection_output}, and saves the best combination of container types to be used the next time the program is run.
+This command outputs something like table \ref{table:selection_output}, and saves the best combination of container types to be used the next time the program is run.
Here, the code generated uses \code{Vec} as the implementation for \code{Sieve}, and \code{HashSet} as the implementation for \code{Primes}.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|c|}
- Name & Implementation & Estimated Cost \\
+ & Container Type & Implementation & Estimated cost \\
\hline
- Sieve & std::vec::Vec & 159040493883 \\
- Sieve & std::collections::LinkedList & 564583506434 \\
- Primes & primrose\_library::SortedVec & 414991320 \\
- Primes & std::collections::BTreeSet & 355962089 \\
- Primes & std::collections::HashSet & 309638677 \\
+ * & Sieve & LinkedList & 14179471355 \\
+ & Sieve & Vec & 26151238698 \\
+ & Primes & HashSet & 117005368 \\
+ & Primes & SortedVec & 112421356 \\
+ * & Primes & BTreeSet & 108931859 \\
\end{tabular}
\caption{Example output from selection command}
\label{table:selection_output}
@@ -68,16 +68,18 @@ Here, the code generated uses \code{Vec} as the implementation for \code{Sieve},
\section{Overview of process}
-Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project, then runs Primrose to find a list of implementations satsifying our functional requirements, from a pre-built library of container implementations.
+Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project.
+It then runs Primrose to find a list of implementations satisfying our functional requirements from a pre-built library of container implementations.
-Once we have this list, we then build a 'cost model' for each candidate type. This allows us to get an upper bound for the runtime cost of an operation at any given n.
+Once we have this list, we build a 'cost model' for each candidate type. This allows us to get an upper bound for the runtime cost of an operation at any given n.
+We choose to focus only on CPU time, and disregard memory usage due to the difficulty of accurately measuring memory footprint.\footnote{As Rust is not interpreted, we would need to hook into calls to the OS's memory allocator. This is very platform-specific, although the allocator API currently in development may make this easier in future.}
-We then run the user-provided benchmarks, using any of the valid candidates instrumented to track how many times each operation is performed, and the maximum size of the container.
+We then run the user-provided benchmarks, using a wrapper around any of the valid candidates to track how many times each operation is performed, and the maximum size the container reaches.
We combine this information with our cost models to estimate a total cost for each candidate, which is an upper bound on the total time taken for all container operations.
At this point, we also check if an 'adaptive' container would be better, by checking if one implementation is better performing at a lower n, and another at a higher n.
-Finally, we pick the implementation with the minimum cost, and generate code which sets the container type to use that implementation.
+Finally, we pick the implementation with the minimum cost, and generate code which allows the program to use that implementation.
Our solution requires little user intervention, integrates well with existing workflows, and the time it takes scales linearly with the number of container types in a given project.
@@ -95,26 +97,27 @@ Each container type that we want to select an implementation for is bound by a l
%% Short explanation of selection method
In brief, Primrose works by:
+
\begin{itemize}
\item Finding all implementations in the container library that implement all required traits
-\item Translate any specified properties to a Rosette expression
-\item For each implementation, model the behaviour of each operation in Rosette, and check that the required properties always hold
+\item Translating any specified properties to a Rosette expression
+\item For each implementation, modelling the behaviour of each operation in Rosette, and checking that the required properties always hold
\end{itemize}
-We use the code provided with the Primrose paper, with minor modifications elaborated on in Chapter \ref{chap:implementation}.
+We use the code provided in the Primrose paper, with minor modifications elaborated on in Chapter \ref{chap:implementation}.
-At this stage, we have a list of implementations for each container type we are selecting. The command \code{candelabra-cli candidates} will show this output, as in Table \ref{table:candidates_prime_sieve}.
+After this stage, we have a list of implementations for each container type we are selecting. The command \code{candelabra-cli candidates} will show this output, as in Table \ref{table:candidates_prime_sieve}.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|}
- Type & Implementation \\
+ Type & Candidate implementation \\
\hline
- Primes & primrose\_library::EagerSortedVec \\
- Primes & std::collections::HashSet \\
- Primes & std::collections::BTreeSet \\
- Sieve & std::collections::LinkedList \\
- Sieve & std::vec::Vec \\
+ Primes & EagerSortedVec \\
+ Primes & HashSet \\
+ Primes & BTreeSet \\
+ Sieve & LinkedList \\
+ Sieve & Vec \\
\end{tabular}
\caption{Usable implementations by container type for \code{prime_sieve}}
\label{table:candidates_prime_sieve}
@@ -125,7 +128,7 @@ Although we use primrose in our implementation, the rest of our system isn't dep
\section{Cost Models}
-Now that we have a list of possible implementations, we need to understand the performance characteristics of each of them.
+Now that we have a list of correct implementations for each container type, we need a way to understand the performance characteristics of each of them in isolation.
We use an approach similar to CollectionSwitch\citep{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.
%% Benchmarks
@@ -134,21 +137,23 @@ An implementation has a seperate cost model for each operation, which we obtain
For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes, and find the average execution time $t$ of \code{contains} at each.
%% Linear Regression
-We then perform regression, using the collection size $n$ to predict $t$.
+We then perform linear regression, using the collection size $n$ to predict $t$.
In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.
In our implementation, we fit a function of the form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$, using regular least-squares fitting.
Before fitting, we discard all observations that are more than one standard deviation out from the mean for a given $n$ value.
-Whilst we could use a more complex technique, in practice this is good enough: Most common operations are polynomial at worst, and more complex models risk overfitting.
+Whilst we could use a more complex technique, in practice this is good enough: Most common operations are polynomial at worst, and more complex models are at higher risk of overfitting.
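+
+To illustrate, a fitted model can be evaluated as in the sketch below; the struct and field names are illustrative rather than our exact types.
+
+\begin{lstlisting}[language=Rust]
+/// Coefficients fitted by least squares for
+/// t(n) = x0 + x1*n + x2*n^2 + x3*log2(n).
+struct CostModel {
+    x0: f64,
+    x1: f64,
+    x2: f64,
+    x3: f64,
+}
+
+impl CostModel {
+    /// Estimated cost of one call to the modelled operation at size n.
+    fn estimate(&self, n: f64) -> f64 {
+        // log2 is undefined at n = 0, so clamp to 1 first.
+        let log_n = n.max(1.0).log2();
+        self.x0 + self.x1 * n + self.x2 * n * n + self.x3 * log_n
+    }
+}
+\end{lstlisting}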
%% Limitations
This method works well for many operations and structures, although has notable limitations.
In particular, implementations which defer work from one function to another will be extremely inconsistent.
+
For example, \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and waits to sort the list until the contents of the list are read from (such as by using \code{contains}).
+Our cost models have no way to express that the runtime of one operation varies based on the history of previous operations.
We were unable to work around this, and so we have removed these variants from our container library.
-A potential solution could be to perform untimed 'warmup' operations before each operation, but this is complex because it requires some understanding of what operations will cause work to be deferred.
+One potential solution could be to perform untimed 'warmup' operations before each operation, but this is complex because it requires some understanding of what operations will cause work to be deferred.
At the end of this stage, we are able to reason about the relative cost of operations between implementations.
These models are cached for as long as our container library remains the same, as they are independent of what program the user is currently working on.
@@ -159,25 +164,42 @@ We now need to collect information about how the user's application uses its con
%% Data Collected
As mentioned above, the ordering of operations can have a large effect on container performance.
-Unfortunately, tracking every container operation in order quickly becomes unfeasible, so we settle for tracking the count of each operation, and the maximum size of each collection instance.
+Unfortunately, tracking every container operation in order quickly becomes unfeasible.
+Instead, we settle for tracking the count of each operation, and the maximum size the collection reaches.
+
+Every instance or allocation of the collection produces a separate result.
+We then aggregate all results for a single container type into a list of partitions.
-Every instance or allocation of the collection is tracked separately, and results are collated after profiling.
-%% Segmentation
-Results with a close enough n value get sorted into partitions, where each partition stores the average count of each operation, and a weight indicating how common results in that partition were.
-This serves 3 purposes.
+Each partition simply stores an average value for each component of our results (maximum size and a count for each operation), along with a weight indicating how many results fell into that partition.
+Results are processed as follows:
+
+\begin{itemize}
+\item We start with an empty list of partitions.
+\item For each result, if there is a partition with an average max n value within 100 of that result's maximum n, add the result to that partition:
+ \begin{itemize}
+ \item Adjust the partition's average maximum n according to the new result.
+ \item Adjust the partition's average count of each operation according to the counts in the new result.
+ \item Add 1 to the weight of the partition.
+ \end{itemize}
+\item If there is no such partition, create a new one with the values from the result, and with weight 1.
+\item Once all results have been processed, normalize the partition weights by dividing each by the sum of all weights.
+\end{itemize}
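+
+A minimal sketch of this aggregation step is given below; the types are simplified for illustration, with operation counts keyed by name rather than our actual representation.
+
+\begin{lstlisting}[language=Rust]
+use std::collections::HashMap;
+
+/// Both a single profiler result and an aggregated partition have this shape.
+struct Partition {
+    avg_max_n: f64,
+    avg_op_counts: HashMap<String, f64>,
+    weight: f64, // 1.0 for a raw result
+}
+
+fn aggregate(results: Vec<Partition>) -> Vec<Partition> {
+    let mut partitions: Vec<Partition> = Vec::new();
+    for r in results {
+        // Find a partition whose average maximum n is within 100 of this result's.
+        match partitions
+            .iter()
+            .position(|p| (p.avg_max_n - r.avg_max_n).abs() <= 100.0)
+        {
+            Some(i) => {
+                // Fold the result into the partition's running averages, then bump its weight.
+                let p = &mut partitions[i];
+                let w = p.weight;
+                p.avg_max_n = (p.avg_max_n * w + r.avg_max_n) / (w + 1.0);
+                for (op, count) in r.avg_op_counts {
+                    let avg = p.avg_op_counts.entry(op).or_insert(0.0);
+                    *avg = (*avg * w + count) / (w + 1.0);
+                }
+                p.weight += 1.0;
+            }
+            None => partitions.push(Partition { weight: 1.0, ..r }),
+        }
+    }
+    // Normalise weights so they sum to 1.
+    let total: f64 = partitions.iter().map(|p| p.weight).sum();
+    for p in &mut partitions {
+        p.weight /= total;
+    }
+    partitions
+}
+\end{lstlisting}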
+
+The use of partitions serves 3 purposes.
The first is to compress the data, which speeds up processing and stops us running out of memory in more complex programs.
The second is to capture the fact that the number of operations will likely depend on the size of the container.
-The third is to aid in searching for adaptive containers, which we will elaborate on later.
+The third is to aid in searching for adaptive containers, a process which relies on understanding the different sizes of containers in the application.
\section{Selection process}
%% Selection process
-Once we have an estimate of how long each operation may take (from our cost models), and how often we use each operation (from our profiling information), we combine these to estimate the total cost of each implementation.
-For each implementation, our total cost estimate is:
+At this stage, we have an estimate of how long each operation may take (from our cost models), and how often we use each operation (from our profiling information).
+We now combine these to estimate the total cost of each implementation.
+For each implementation, our estimate for its total cost is:
$$
-\sum_{o\in \textrm{ops}} \sum_{(r_{o}, N, W) \in \textrm{partitions}} C_o(N) * r_o
+\sum_{o\in \mathit{ops}, (r_{o}, N, W) \in \mathit{partitions}} C_o(N) * r_o
* W
$$
@@ -185,32 +207,47 @@ $$
\item $C_o(N)$ is the cost estimated by the cost model for operation $o$ at n value $N$
\item $r_o$ is the average count of a given operation in a partition
\item $N$ is the average maximum N value in a partition
-\item $W$ is the weight of a partition, representing how many allocations fell in to this partition
+\item $W$ is the weight of a partition, proportional to how many results fell into this partition
\end{itemize}
Essentially, we scale an estimated worst-case cost of each operation by how frequently we think we will encounter it.
+This results in a pessimistic estimate of how much time we will spend in total on container operations for a given implementation.
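+
+Using the illustrative types from the earlier sketches (and \code{std::collections::HashMap}), this calculation is roughly:
+
+\begin{lstlisting}[language=Rust]
+/// Estimated total cost of one candidate implementation:
+/// the sum over all operations and partitions of C_o(N) * r_o * W.
+fn estimate_total_cost(
+    models: &HashMap<String, CostModel>, // one cost model per operation
+    partitions: &[Partition],            // aggregated profiling results
+) -> f64 {
+    partitions
+        .iter()
+        .map(|p| {
+            p.avg_op_counts
+                .iter()
+                .map(|(op, r_o)| models[op].estimate(p.avg_max_n) * r_o * p.weight)
+                .sum::<f64>()
+        })
+        .sum()
+}
+\end{lstlisting}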
+
+Now we simply pick the implementation with the smallest estimated cost that satisfies our functional requirements for that container type.
+
+This process is repeated for each container type in the project, and provides not only suggestions but rankings for each type.
+We are able to do this while only running our user's benchmarks once, which is ideal for larger applications or for long-running benchmarks.
\section{Adaptive containers}
-In many cases, the maximum size of a container type varies greatly between program runs.
-In these cases, it may be desirable to start off with one container type, and switch to another one if the size of the container grows greatly.
+The above process forms the core of our system, and is sufficient for normal container selection.
+However, a common situation in many programs is that the maximum size of a container type depends on the size of some input.
+In these cases, the user may write benchmarks for a range of sizes, and look for a container type that achieves good enough performance throughout the whole range.
+
+An alternative approach is to start off with whichever container type is best at small sizes, and switch to one more suited for large amounts of data once we grow past a certain threshold.
+In theory, this allows the best of both worlds: A lower overhead container when the program is being run on a small input, and a more complex one for longer runs.
For example, if a program requires a set, then for small sizes it may be best to keep a sorted list and use binary search for \code{contains} operations.
+This is what we do in our \code{SortedVecSet} container implementation.
But when the size of the container grows, the cost of doing \code{contains} may grow high enough that using a \code{HashSet} would actually be faster.
-Adaptive containers attempt to address this need, by starting off with one implementation (the low or before implementation), and switching to a new implemenation (the high or after implementation) once the size of the container passes a certain threshold.
+Adaptive containers attempt to address this need, by starting off with one implementation (referred to as the low or before implementation), and switching to a new implementation (the high or after implementation) once the size of the container passes a certain threshold.
-This is similar to systems such as CoCo\citep{hutchison_coco_2013} and in work by \"{O}sterlund\citep{osterlund_dynamically_2013}.
+This is similar to systems such as CoCo\citep{hutchison_coco_2013} and \cite{osterlund_dynamically_2013}.
However, we decide when to switch container implementation before the program is run, rather than as it is running.
We also do so in a way that requires no knowledge of the implementation internals.
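+
+As an illustration of the idea (this is a hand-written sketch, not our generated code), an adaptive set might look like the following, with the threshold chosen offline by the selection process:
+
+\begin{lstlisting}[language=Rust]
+use std::collections::HashSet;
+use std::hash::Hash;
+
+/// A set that stays a sorted Vec while small, then switches to a HashSet.
+enum AdaptiveSet<T> {
+    Low(Vec<T>), // kept sorted; contains() uses binary search
+    High(HashSet<T>),
+}
+
+impl<T: Ord + Hash> AdaptiveSet<T> {
+    // In our system this threshold is computed before compilation.
+    const THRESHOLD: usize = 1800;
+
+    fn insert(&mut self, value: T) {
+        match self {
+            AdaptiveSet::Low(v) => {
+                if let Err(pos) = v.binary_search(&value) {
+                    v.insert(pos, value);
+                }
+                // Adapt once the container grows past the threshold.
+                if v.len() > Self::THRESHOLD {
+                    let items = std::mem::take(v);
+                    *self = AdaptiveSet::High(items.into_iter().collect());
+                }
+            }
+            AdaptiveSet::High(s) => {
+                s.insert(value);
+            }
+        }
+    }
+
+    fn contains(&self, value: &T) -> bool {
+        match self {
+            AdaptiveSet::Low(v) => v.binary_search(value).is_ok(),
+            AdaptiveSet::High(s) => s.contains(value),
+        }
+    }
+}
+\end{lstlisting}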
%% Adaptive container detection
-Using our list of partitions, we sort it by ascending container size and attempt to find a split.
+After regular container selection is done, we attempt to suggest an adaptive implementation for each container type.
+We first sort our list of partitions by ascending container size.
If we can split our partitions in half such that everything to the left performs best with one implementation, and everything to the right with another, then we should be able to switch implementation around that n value.
In practice, finding the correct threshold is more difficult: We must take into account the cost of transforming from one implementation to another.
If we adapt our container too early, we may do more work adapting it than we save if we just stuck with our low implementation.
If we adapt too late, we have more data to move and less of our program gets to take advantage of the new implementation.
-
We choose the relatively simple strategy of switching halfway between two partitions.
-Our cost models let us estimate how expensive switching implementations will be, which we compare against how much we save by switching to the after implementation.
+
+Our cost models allow us to estimate how expensive switching implementations will be.
+We compare this estimate to how much better the high implementation is than the low one, to account for the overhead of changing implementations.
+
+A full explanation of our algorithm is given in section \ref{section:impl_selection}.
diff --git a/thesis/parts/implementation.tex b/thesis/parts/implementation.tex
index 8c5483d..bba1a3f 100644
--- a/thesis/parts/implementation.tex
+++ b/thesis/parts/implementation.tex
@@ -1,4 +1,4 @@
-This chapter elaborates on some implementation details glossed over in the previous chapter.
+We now elaborate on our implementation, explaining some of the finer details of our design.
With reference to the source code, we explain the structure of our system's implementation, and highlight areas with difficulties.
\section{Modifications to Primrose}
@@ -8,15 +8,15 @@ In order to facilitate integration with Primrose, we refactored large parts of t
This also required updating the older code to a newer edition of Rust, and improving the error handling throughout.
%% Mapping trait
-As suggested in the original paper, we added the ability to ask for associative container types: ones that map a key to a value.
-This was done by adding a new \code{Mapping} trait to the library, and updating the type checking and analysis code to support multiple type variables in container type declarations, and be aware of the operations available on mappings.
+As suggested in the original paper, we added the ability to deal with associative container types: key to value mappings.
+We added the \code{Mapping} trait to the implementation library, and updated the type checking and analysis code to support multiple type variables.
Operations on mapping implementations can be modelled and checked against constraints in the same way that regular containers can be.
They are modelled in Rosette as a list of key-value pairs.
\code{src/crates/library/src/hashmap.rs} shows how mapping container types can be declared, and operations on them modelled.
Table \ref{table:library} shows the library of container types we used.
-Most come from the Rust standard library, with the exceptions of \code{SortedVec} and \code{SortedUniqueVec}, which use \code{Vec} internally.
+Most come from the Rust standard library, with the exceptions of the \code{SortedVec} family of containers, which use \code{Vec} internally.
The library source can be found in \code{src/crates/library}.
\begin{table}[h]
@@ -46,95 +46,88 @@ We also added new syntax to Primrose's domain-specific language to support defin
While performing integration testing, we found and fixed several other issues with the existing code:
\begin{enumerate}
-\item Only push and pop operations could be modelled in properties without raising an error during type-checking.
-\item The Rosette code generated for properties using other operations would be incorrect.
+\item Only push and pop operations could be modelled in properties. Other operations would raise an error during type-checking.
+\item The Rosette code generated for properties using other operations was incorrect.
\item Some trait methods used mutable borrows unnecessarily, making it difficult or impossible to write safe Rust using them.
\item The generated code would perform an unnecessary heap allocation for every created container, which could affect performance.
\end{enumerate}
-We also added a requirement for all \code{Container}s and \code{Mappings} to implement \code{IntoIterator} and \code{FromIterator}, as well as to allow iterating over elements.
+We also added requirements to the \code{Container} and \code{Mapping} traits related to Rust's \code{Iterator} API.
+Among other things, this allows us to use for loops, and to more easily move data from one implementation to another.
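+
+For example, requiring implementations to be convertible to and from iterators means the contents of one candidate can be moved into another with a single \code{collect}. A sketch of the idea, using standard library types:
+
+\begin{lstlisting}[language=Rust]
+use std::collections::HashSet;
+
+/// Move every element from one container implementation to another.
+fn migrate<T, A, B>(from: A) -> B
+where
+    A: IntoIterator<Item = T>,
+    B: FromIterator<T>,
+{
+    from.into_iter().collect()
+}
+
+fn main() {
+    let sieve: Vec<u32> = vec![2, 3, 5, 7];
+    // e.g. when an adaptive container switches implementation:
+    let as_set: HashSet<u32> = migrate(sieve);
+    assert_eq!(as_set.len(), 4);
+}
+\end{lstlisting}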
\section{Building cost models}
%% Benchmarker crate
-In order to benchmark container types, we use a seperate crate (\code{src/crates/candelabra-benchmarker}) which contains benchmarking code for each trait in the Primrose library.
+In order to benchmark container types, we use a separate crate (\code{src/crates/benchmarker}) containing benchmarking code for each trait in the Primrose library.
When benchmarks need to be run for an implementation, we dynamically generate a new crate, which runs all benchmark methods appropriate for the given implementation (\code{src/crate/candelabra/src/cost/benchmark.rs}).
As Rust's generics are monomorphised, our generic code is compiled as if we were using the concrete type in our code, so we don't need to worry about affecting the benchmark results.
Each benchmark is run in a 'warmup' loop for a fixed amount of time (currently 500ms), then runs for a fixed number of iterations (currently 50).
-This is important because we use every observation when fitting our cost models, so varying the number of iterations would change our curve's fit.
-We repeat each benchmark at a range of $n$ values, ranging from $10$ to $60,000$.
+This is important because we are using least squares fitting: if there are fewer data points at higher $n$ values, then our resulting model may not fit those points as well.
+We repeat each benchmark at a range of $n$ values: $10, 50, 100, 250, 500, 1,000, 6,000, 12,000, 24,000, 36,000, 48,000, 60,000$.
Each benchmark we run corresponds to one container operation.
For most operations, we insert $n$ random values to a new container, then run the operation once per iteration.
For certain operations which are commonly amortized (\code{insert}, \code{push}, and \code{pop}), we instead run the operation itself $n$ times and divide all data points by $n$.
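+
+A much-simplified sketch of the measurement loop for a single (non-amortised) operation at one $n$ value follows; the real benchmarker also seeds the container with random values and handles the amortised cases described above.
+
+\begin{lstlisting}[language=Rust]
+use std::time::{Duration, Instant};
+
+/// Benchmark one operation: `setup` builds a container holding n elements,
+/// and `op` is the operation under test.
+fn benchmark_op<C>(
+    n: usize,
+    setup: impl Fn(usize) -> C,
+    op: impl Fn(&mut C),
+) -> Vec<Duration> {
+    // Warmup: run repeatedly for a fixed wall-clock budget (500ms).
+    let warmup_end = Instant::now() + Duration::from_millis(500);
+    while Instant::now() < warmup_end {
+        let mut c = setup(n);
+        op(&mut c);
+    }
+
+    // Measured phase: a fixed number of iterations (50), one observation each.
+    (0..50)
+        .map(|_| {
+            let mut c = setup(n);
+            let start = Instant::now();
+            op(&mut c);
+            start.elapsed()
+        })
+        .collect()
+}
+\end{lstlisting}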
-We use least squares to fit a polynomial to all of our data.
-As operations on most common data structures are polynomial or logarithmic complexity, we believe that least squares fitting is good enough to capture the cost of most operations.
-We originally experimented with coefficients up to $x^3$, but found that this led to bad overfitting.
+As discussed previously, we discard all points that are outwith one standard deviation of the mean for each $n$ value.
+We use the least squares method to fit a polynomial of form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$.
+As most operations on common data structures are polynomial or logarithmic complexity, we believe that least squares fitting is good enough to capture the cost of most operations.
+We originally experimented with coefficients up to $x^3$, but found that this led to overfitting.
\section{Profiling}
-We implement profiling by using a \code{ProfilerWrapper} type (\code{src/crates/library/src/profiler.rs}), which takes as type parameters the 'inner' container implementation and an index later used to identify what type the profiling info corresponds to.
+We implement profiling using a \code{ProfilerWrapper} type (\code{src/crates/library/src/profiler.rs}), which takes as type parameters the inner container implementation and an index, used later to identify what container type the output corresponds to.
We then implement any Primrose traits that the inner container implements, counting the number of times each operation is called.
We also check the length of the container after each insertion operation, and track the maximum.
-This tracking is done per-instance, and recorded when the instance goes out of scope and its \code{Drop} implementation is called.
+Tracking is done per-instance, and recorded when the container goes out of scope and its \code{Drop} implementation is called.
We write the counts of each operation and maximum size of the collection to a location specified by an environment variable.
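+
+The sketch below illustrates the shape of this wrapper for a \code{Vec}-backed container; the method names, fields, and the \code{CANDELABRA_PROFILER_OUT} environment variable are illustrative rather than our exact implementation.
+
+\begin{lstlisting}[language=Rust]
+use std::io::Write;
+
+/// Forwards operations to the inner implementation, counting calls and
+/// tracking the maximum length, then reports its totals when dropped.
+struct ProfilerWrapper<C> {
+    inner: C,
+    insert_count: u64,
+    contains_count: u64,
+    max_len: usize,
+}
+
+impl<T: PartialEq> ProfilerWrapper<Vec<T>> {
+    fn insert(&mut self, value: T) {
+        self.insert_count += 1;
+        self.inner.push(value);
+        self.max_len = self.max_len.max(self.inner.len());
+    }
+
+    fn contains(&mut self, value: &T) -> bool {
+        self.contains_count += 1;
+        self.inner.contains(value)
+    }
+}
+
+impl<C> Drop for ProfilerWrapper<C> {
+    fn drop(&mut self) {
+        // Append this instance's counts to the file named by the
+        // (illustrative) environment variable.
+        if let Ok(path) = std::env::var("CANDELABRA_PROFILER_OUT") {
+            let line = format!(
+                "max_n={} insert={} contains={}\n",
+                self.max_len, self.insert_count, self.contains_count
+            );
+            if let Ok(mut f) = std::fs::OpenOptions::new().append(true).create(true).open(path) {
+                let _ = f.write_all(line.as_bytes());
+            }
+        }
+    }
+}
+\end{lstlisting}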
When we want to profile a program, we pick any valid inner implementation for each selection site, and use that candidate with our profiling wrapper as the concrete implementation for that site.
+We then run all of the program's benchmarks once, which gives us an equal sample of data from each of them.
This approach has the advantage of giving us information on each individual collection allocated, rather than only statistics for the type as a whole.
For example, if one instance of a container type is used in a very different way from the rest, we will be able to see it more clearly than a normal profiling tool would allow us to.
-Although there is noticeable overhead in our current implementation, it's not important as we aren't measuring the program's execution time when profiling.
-Future work could likely improve the overhead by batching file outputs, however this wasn't necessary for us.
+Although there is noticeable overhead in our current implementation, this is not important as we aren't measuring the program's execution time when profiling.
+We could likely reduce profiling overhead by batching file outputs, however this wasn't necessary for us.
-\section{Selection and Codegen}
+\section{Container Selection}
+\label{section:impl_selection}
%% Selection Algorithm incl Adaptiv
Selection is done per container type.
For each candidate implementation, we calculate its cost on each partition in the profiler output, then sum these values to get the total estimated cost for each implementation.
-This provides us with estimates for each singular candidate.
+This is implemented in \code{src/crates/candelabra/src/profiler/info.rs} and \code{src/crates/candelabra/src/select.rs}.
In order to try and suggest an adaptive container, we use the following algorithm:
\begin{enumerate}
-\item Sort partitions in order of ascending maximum n values.
-\item Calculate the cost for each candidate and for each partition
+\item Sort the list of partitions in order of ascending maximum n values.
+\item Calculate the cost for each candidate in each partition individually.
\item For each partition, find the best candidate and store it in the array \code{best}. Note that we don't sum across all partitions this time.
\item Find the lowest index \code{i} where \code{best[i] != best[0]}
-\item Check that \code{i} partitions the list properly: For all \code{j < i}, \code{best[j] == best[0]} and for all \code{j>=i}, \code{best[j] == best[i]}.
+\item Check that \code{i} splits the list properly: For all \code{j < i}, \code{best[j] == best[0]} and for all \code{j>=i}, \code{best[j] == best[i]}.
\item Let \code{before} be the name of the candidate in \code{best[0]}, \code{after} be the name of the candidate in \code{best[i]}, and \code{threshold} be halfway between the maximum n values of partition \code{i} and partition \code{i-1}.
\item Calculate the cost of switching as:
$$
- C_{\textrm{before,clear}}(\textrm{threshold}) + \textrm{threshold} * C_{\textrm{after,insert}}(\textrm{threshold})
+ C_{\mathit{before,clear}}(\mathit{threshold}) + \mathit{threshold} * C_{\mathit{after,insert}}(\mathit{threshold})
$$
\item Calculate the cost of not switching: The sum of the difference in cost between \code{before} and \code{after} for all partitions with index \code{> i}.
-\item If the cost of not switching is less than the cost of switching, we can't make a suggestion.
-\item Otherwise, suggest an adaptive container which switches from \code{before} to \code{after} when $n$ gets above \code{threshold}. Its estimated cost is the cost for \code{before} up to partition \code{i}, plus the cost of \code{after} for all other partitions.
+\item If the cost of not switching is less than the cost of switching, don't make a suggestion.
+\item Otherwise, suggest an adaptive container which switches from \code{before} to \code{after} when $n$ gets above \code{threshold}. Its estimated cost is the cost for \code{before} up to partition \code{i}, plus the cost of \code{after} for all other partitions, and the cost of switching.
\end{enumerate}
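+
+The core of this procedure is sketched below, where \code{costs[c][p]} is the estimated cost of candidate \code{c} on partition \code{p} (partitions already sorted by ascending maximum n); the names are illustrative rather than our exact code.
+
+\begin{lstlisting}[language=Rust]
+/// Returns (before, after, threshold) if an adaptive container looks worthwhile.
+fn suggest_adaptive(
+    max_ns: &[f64],     // average maximum n of each partition, ascending
+    costs: &[Vec<f64>], // costs[candidate][partition]
+    switch_cost: impl Fn(usize, usize, f64) -> f64, // cost of clearing `before` and refilling `after`
+) -> Option<(usize, usize, f64)> {
+    // Best candidate for each partition individually (no summing here).
+    let best: Vec<usize> = (0..max_ns.len())
+        .map(|p| {
+            (0..costs.len())
+                .min_by(|&a, &b| costs[a][p].total_cmp(&costs[b][p]))
+                .unwrap()
+        })
+        .collect();
+
+    // Lowest index where the best candidate changes...
+    let i = best.iter().position(|&b| b != best[0])?;
+    // ...which must split the list cleanly into two halves.
+    if !best[i..].iter().all(|&b| b == best[i]) {
+        return None;
+    }
+
+    let (before, after) = (best[0], best[i]);
+    let threshold = (max_ns[i - 1] + max_ns[i]) / 2.0;
+
+    let cost_of_switching = switch_cost(before, after, threshold);
+    // What we lose by staying on `before` for the later partitions.
+    let cost_of_not_switching: f64 = (i..max_ns.len())
+        .map(|p| costs[before][p] - costs[after][p])
+        .sum();
+
+    if cost_of_not_switching < cost_of_switching {
+        None // switching costs more than it saves
+    } else {
+        Some((before, after, threshold))
+    }
+}
+\end{lstlisting}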
+\section{Code Generation}
+
%% Generated code (opaque types)
-As mentioned above, the original Primrose code would generate code as in Listing \ref{lst:primrose_codegen}.
+As mentioned in chapter \ref{chap:design}, we made modifications to Primrose's code generation in order to improve the resulting code's performance.
+The original Primrose code would generate code as in Listing \ref{lst:primrose_codegen}.
In order to ensure that users specify all of the traits they need, this code only exposes methods on the implementation that are part of the trait bounds given.
However, it does this by using a \code{dyn} object, Rust's mechanism for dynamic dispatch.
-Although this approach works, it adds an extra layer of indirection to every call: The caller must use the dyn object's vtable to find the method it needs to call.
-This also prevents the compiler from optimising across this boundary.
-
-In order to avoid this, we make use of Rust's support for existential types: Types that aren't directly named, but are inferred by the compiler.
-Existential types only guarantee their users the given trait bounds, therefore they accomplish the same goal of forcing users to specify all of their trait bounds upfront.
-
-Figure \ref{lst:new_codegen} shows our equivalent generated code.
-The type alias \code{Stack<S>} only allows users to use the \code{Container<S>}, \code{Stack<S>}, and \code{Default} traits.
-Our unused 'dummy' function \code{_StackCon} has the return type \code{Stack<S>}.
-Rust's type inference step sees that its actual return type is \code{Vec<S>}, and therefore sets the concrete type of \code{Stack<S>} to \code{Vec<S>} at compile time.
-
-Unfortunately, this feature is not yet in stable Rust, meaning we have to opt in to it using an unstable compiler flag (\code{feature(type_alias_impl_trait)}).
-At time of writing, the main obstacle to stabilisation appears to be design decisions that only apply to more complicated use-cases, therefore we are confident that this code will remain valid and won't encounter any compiler bugs.
-
\begin{figure}[h]
\begin{lstlisting}[caption=Code generated by original Primrose project,label={lst:primrose_codegen},language=Rust]
pub trait StackTrait<T> : Container<T> + Stack<T> {}
@@ -155,6 +148,17 @@ impl<T: 'static + Ord + std::hash::Hash> ContainerConstructor for Stack<T> {
\end{lstlisting}
\end{figure}
+Although this approach works, it adds an extra layer of indirection to every call: The caller must use the dyn object's vtable to find the method it needs to call.
+This also prevents the compiler from optimising across this boundary.
+
+In order to avoid this, we make use of Rust's support for existential types: Types that aren't directly named, but are inferred by the compiler.
+Existential types only guarantee their users the given trait bounds, therefore they accomplish the same goal of forcing users to specify all of their trait bounds upfront.
+
+Figure \ref{lst:new_codegen} shows our equivalent generated code.
+The type alias \code{Stack<S>} only allows users to use the \code{Container<S>}, \code{Stack<S>}, and \code{Default} traits.
+Our unused 'dummy' function \code{_StackCon} has the return type \code{Stack<S>}.
+Rust's type inference step sees that its actual return type is \code{Vec<S>}, and therefore sets the concrete type of \code{Stack<S>} to \code{Vec<S>} at compile time.
+
\begin{figure}[h]
\begin{lstlisting}[caption=Code generated with new method,label={lst:new_codegen},language=Rust]
pub type StackCon<S: PartialEq + Ord + std::hash::Hash> = impl Container<S> + Stack<S> + Default;
@@ -165,3 +169,7 @@ fn _StackCon<S: PartialEq + Ord + std::hash::Hash>() -> StackCon<S> {
}
\end{lstlisting}
\end{figure}
+
+Unfortunately, this feature is not yet in stable Rust, meaning we have to opt in to it using an unstable compiler flag (\code{feature(type_alias_impl_trait)}).
+At the time of writing, the main obstacle to stabilisation appears to be design decisions that only apply to more complicated use-cases; we are therefore confident that this code will remain valid and won't encounter any compiler bugs.
+
diff --git a/thesis/parts/introduction.tex b/thesis/parts/introduction.tex
index 17830a9..e24abea 100644
--- a/thesis/parts/introduction.tex
+++ b/thesis/parts/introduction.tex
@@ -28,7 +28,7 @@ It is easy to adopt our system incrementally, and we integrate with existing too
The time it takes to select containers scales roughly linearly, even in complex cases, allowing our system to be used even on larger projects.
%% **** Flexibility of selection
-It is also able to suggest adaptive containers: containers which switch from one underlying implementation to another once they get past a cretain size.
+We are also able to suggest adaptive containers: containers which switch from one underlying implementation to another once they get past a certain size.
%% **** Overview of results
Whilst we saw reasonable suggestions in our test cases, we found important performance concerns which future work could improve on.
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index f16c502..cc66073 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -1,6 +1,6 @@
-In this chapter, we present our results from benchmarking our system.
+In this chapter, we present the methodology used for benchmarking our system, and comment on the results we got.
We examine the produced cost models of certain operations in detail, with reference to the expected asymptotics of each operation.
-We then compare the selections made by our system to the actual optimal selections for a variety of test cases.
+We then compare the selections made by our system to the actual optimal selections (obtained by brute force) for a variety of test cases.
This includes examining when adaptive containers are suggested, and their effectiveness.
%% * Testing setup, benchmarking rationale
@@ -26,11 +26,11 @@ The most important software versions are listed below.
We start by examining some of our generated cost models, and comparing them both to the observations they are based on, and what we expect from asymptotic analysis.
As we build a total of 77 cost models from our library, we will not examine them all in detail.
-We look at models of the most common operations, and group them by containers that are commonly selected together.
+We look at models of the most common operations, grouped by containers that are commonly selected together.
\subsection{Insertion operations}
Starting with the \code{insert} operation, Figure \ref{fig:cm_insert} shows how the estimated cost changes with the size of the container.
-The lines correspond to our fitted curves, while the points indicate the raw observations these curves are fitted from.
+The lines correspond to our fitted curves, while the points indicate the raw observations we drew from.
\begin{figure}[h!]
\centering
@@ -39,13 +39,13 @@ The lines correspond to our fitted curves, while the points indicate the raw obs
\label{fig:cm_insert}
\end{figure}
-Starting with the operation on a \code{Vec}, we see that insertion is very cheap, and gets slightly cheaper as the size of the container increases.
+Starting with \code{Vec}, we see that insertion is very cheap, and gets slightly cheaper as the size of the container increases.
This roughly agrees with the expected $O(1)$ time of amortised inserts on a Vec.
However, we also note a sharply increasing curve when $n$ is small, and a slight 'bump' around $n=35,000$.
The former appears to be in line with the observations, and is likely due to the static growth rate of Rust's Vec implementation.
The latter appears to diverge from the observations, and may indicate poor fitting.
-\code{LinkedList} has a more stable, but significantly slower insertion.
+\code{LinkedList} has a significantly slower insertion.
This is likely because it requires a syscall for heap allocation for every item inserted, no matter the current size.
This would also explain why data points appear spread out more, as system calls have more unpredictable latency, even on systems with few other processes running.
Notably, insertion appears to start to get cheaper past $n=24,000$, although this is only weakly suggested by observations.
@@ -66,16 +66,17 @@ This is what we expect for hash-based collections, with the slight growth likely
\code{BTreeSet} has similar behaviour, but settles at a larger value overall.
\code{BTreeMap} appears to grow more rapidly, and cost more overall.
-It's important to note that Rust's \code{BTreeSet}s are not based on binary tree search, but instead a more general tree search originally proposed by \cite{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an array.
+It's important to note that Rust's \code{BTreeSet} is not based on binary tree search, but instead on a more general tree search originally proposed by \cite{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an array.
The standard library documentation\citep{rust_documentation_team_btreemap_2024} states that search is expected to take $O(B\lg n)$ comparisons.
-Since both of these implementations require searching the collection before inserting, the close-to-logarithmic growth makes sense.
+Since both of these implementations require searching the collection before inserting, the close-to-logarithmic growth seems reasonable.
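
For a rough sense of scale: if we assume the branching factor used by the standard library is around $B = 6$ (an implementation detail which may change), then searching $n = 10^6$ elements is expected to take on the order of
\[
  B \lg n \approx 6 \times 19.9 \approx 120
\]
comparisons, versus $\lg n \approx 20$ for an idealised binary search tree; the B-tree accepts more comparisons per node in exchange for fewer node accesses and better cache behaviour.
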
\subsubsection{Small n values}
+\label{section:cm_small_n}
-Whilst our main figures for insertion operations indicate a clear winner within each category, looking at small $n$ values reveals some more complexity.
+Whilst our main figures for insertion operations indicate a clear winner within each category, looking at small $n$ values reveals more complexity.
Figure \ref{fig:cm_insert_small_n} shows the cost models for insert operations on different set implementations at smaller n values.
-In particular, for $n<1800$ the overhead from sorting a vec is less than running the default hasher function (at least on this hardware).
+Note that for $n<1800$, the overhead of keeping a \code{Vec} sorted is less than that of running the default hash function (at least on this hardware).
We also see a sharp spike in the cost for \code{SortedVecSet} at low $n$ values, and a region of supposedly zero cost from around $n=200$ to $n=800$.
This seems inaccurate, and indicates that our current fitting procedure may not be able to deal with low $n$ values properly.
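
The crossover itself is easy to reproduce informally. The rough micro-benchmark below (illustrative only; the names and constants are ours, and our real measurements come from the benchmarking harness) compares inserting into a sorted \code{Vec} via binary search with inserting into a \code{HashSet}. The exact crossover point will of course differ between machines and key types.

\begin{verbatim}
use std::collections::HashSet;
use std::time::Instant;

// Rough comparison of a sorted-Vec set against HashSet at various sizes.
// Illustrative only: real measurements should use a proper benchmarking
// harness with warm-up and repetition.
fn main() {
    for &n in &[100u64, 1_000, 1_800, 10_000] {
        let keys: Vec<u64> = (0..n).map(|x| x.wrapping_mul(2654435761) % n).collect();

        let start = Instant::now();
        let mut sorted = Vec::new();
        for &k in &keys {
            if let Err(pos) = sorted.binary_search(&k) {
                sorted.insert(pos, k); // keep the Vec sorted on insert
            }
        }
        let sorted_time = start.elapsed();

        let start = Instant::now();
        let mut hashed = HashSet::new();
        for &k in &keys {
            hashed.insert(k); // hash the key on every insert
        }
        let hashed_time = start.elapsed();

        println!("n = {:5}: sorted vec {:?}, hash set {:?}", n, sorted_time, hashed_time);
    }
}
\end{verbatim}
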
@@ -100,22 +101,23 @@ Figure \ref{fig:cm_contains} shows our built cost models, again grouped for read
\label{fig:cm_contains}
\end{figure}
-Notably, the observations in these graphs have a much wider spread than our \code{insert} operations do.
+The observations in these graphs have a much wider spread than our \code{insert} operations do.
This is probably because we attempt to get a different random element in our container every time, so our observations show the best and worst case of our data structures.
This is desirable assuming that \code{contains} operations are actually randomly distributed in the real world, which seems likely.
For the \code{SortedVec} family, we would expect to see roughly logarithmic growth, as contains is based on binary search.
This is the case for \code{SortedVecMap}; however, \code{SortedVec} and \code{SortedVecSet} both show exponential growth with a `dip' around $n=25,000$.
-It's unclear why this happened, although it could be due to how the elements we query are distributed throughout the list.
+It's unclear why this happened, although it could be due to how the elements we query are randomly distributed throughout the list.
A possible improvement would be to run \code{contains} with a known distribution of query values, including low, high, and not-present values in equal parts.
The \code{Vec} family exhibits roughly linear growth, which is expected, since this implementation scans through the whole array each time.
+
\code{LinkedList} has roughly logarithmic growth, at a significantly higher cost.
The higher cost is expected, although it's unclear why growth is logarithmic rather than linear.
As the spread of points also appears to increase at larger $n$ values, it's possible that this is due to larger $n$ values causing a higher proportion of the program's memory to be dedicated to the container, resulting in better cache utilisation.
\code{HashSet} appears roughly constant, as expected, with only a slow logarithmic rise, probably due to an increasing number of collisions.
-\code{BTreeSet} is consistently above it, with a slightly higher logarithmic rise.
+\code{BTreeSet} is consistently above it, with a slightly faster logarithmic rise.
\code{BTreeMap} and \code{HashMap} both mimic their set counterparts, but with a slightly lower cost and growth rate.
It's unclear why this is; it could be related to the larger spread in observations for both implementations.
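
The query-distribution improvement suggested above could be sketched as follows (a hypothetical helper, not part of our harness), generating equal parts low, high, and guaranteed-absent queries.

\begin{verbatim}
// Build a query set for `contains` benchmarks with a controlled mix:
// one third elements from the lower half, one third from the upper half,
// and one third values guaranteed not to be present.
// Assumes `sorted_contents` is non-empty and sorted ascending.
fn contains_queries(sorted_contents: &[u64], count: usize) -> Vec<u64> {
    let len = sorted_contents.len();
    let half = (len / 2).max(1);
    let max = *sorted_contents.last().unwrap();
    (0..count)
        .map(|i| match i % 3 {
            0 => sorted_contents[i % half],             // present, lower half
            1 => sorted_contents[len - 1 - (i % half)], // present, upper half
            _ => max + 1 + i as u64,                    // guaranteed miss
        })
        .collect()
}

fn main() {
    let contents: Vec<u64> = (0..1_000).collect();
    println!("{:?}", contains_queries(&contents, 9));
}
\end{verbatim}
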
@@ -123,30 +125,30 @@ It's unclear why this is, however it could be related to the larger spread in ob
\subsection{Evaluation}
Overall, our cost models appear to be a good representation of each implementation's performance impact.
-Future improvements could address the overfitting problems some operations had, such as by employing a more complex fitting procedure, or by doing more to ensure operations have their best and worst cases tested fairly.
+Future improvements should focus on improving accuracy at lower $n$ values, such as by employing a more complex fitting procedure, or on ensuring operations have their best and worst cases tested fairly.
%% * Predictions
\section{Selections}
-We now proceed with end-to-end testing of the system, selecting containers for a selection of programs with varying needs.
+We now proceed with end-to-end testing of the system, selecting containers for a sample of test programs with varying needs.
\subsection{Benchmarks}
%% ** Chosen benchmarks
-Our test cases broadly fall into two categories: Example cases, which repeat a few operations many times, and 'real' cases, which are implementations of common algorithms and solutions to programming puzles.
-We expect the results from our example cases to be relatively unsurprising, while our real cases are more complex and harder to predict.
+Our test programs broadly fall into two categories: examples, which repeat a few operations many times, and real-life programs, which are implementations of common algorithms and solutions to programming puzzles.
+We expect the results from our example programs to be relatively obvious, while our real programs are more complex and harder to predict.
-Most of our real cases are solutions to puzzles from Advent of Code\citep{wastl_advent_2015}, a popular collection of programming puzzles.
-Table \ref{table:test_cases} lists and briefly describes our test cases.
+Most of our real programs are solutions to puzzles from Advent of Code\citep{wastl_advent_2015}, a popular collection of programming puzzles.
+Table \ref{table:test_cases} lists and briefly describes our test programs.
\begin{table}[h!]
\centering
\begin{tabular}{|c|c|}
Name & Description \\
\hline
- example\_sets & Repeated insert and contains on a set. \\
- example\_stack & Repeated push and pop from a stack. \\
- example\_mapping & Repeated insert and get from a mapping. \\
+ example\_sets & Repeated insert and contains operations on a set. \\
+ example\_stack & Repeated push and pop operations on a stack. \\
+ example\_mapping & Repeated insert and get operations on a mapping. \\
prime\_sieve & Sieve of Eratosthenes algorithm. \\
aoc\_2021\_09 & Flood-fill like algorithm (Advent of Code 2021, Day 9) \\
aoc\_2022\_08 & Simple 2D raycasting (AoC 2022, Day 8) \\
@@ -154,13 +156,14 @@ Table \ref{table:test_cases} lists and briefly describes our test cases.
aoc\_2022\_14 & Simple 2D particle simulation (AoC 2022, Day 14) \\
\end{tabular}
- \caption{Our test applications}
+ \caption{Our test programs}
\label{table:test_cases}
\end{table}
%% ** Effect of selection on benchmarks (spread in execution time)
Table \ref{table:benchmark_spread} shows the difference in benchmark results between the slowest possible assignment of containers, and the fastest.
-Even in our example projects, we see that the wrong choice of container can slow down our programs substantially, with the exception of two of our test cases which were largely unaffected.
+Even in our example programs, we see that the wrong choice of container can slow down our programs substantially.
+In all but two programs, the wrong implementation can more than double the runtime.
\begin{table}[h!]
\centering
@@ -176,15 +179,15 @@ example\_sets & $1.33$ & $1.6$ \\
example\_stack & $0.36$ & $19.2$ \\
prime\_sieve & $26093.26$ & $34.1$ \\
\end{tabular}
-\caption{Spread in total benchmark results by project}
+\caption{Spread in total benchmark results by program}
\label{table:benchmark_spread}
\end{table}
\subsection{Prediction accuracy}
-We now compare the implementations suggested by our system to the selection that is actually best, obtained by brute force.
-For now, we ignore suggestions for adaptive containers.
+We now compare the implementations suggested by our system to the actual best selection for each program.
+We leave analysis of adaptive container suggestions to section \ref{section:results_adaptive_containers}.
Table \ref{table:predicted_actual} shows the predicted best assignments alongside the actual best assignment, obtained by brute-force.
In all but two of our test cases (marked with *), we correctly identify the best container.
@@ -212,13 +215,14 @@ In all but two of our test cases (marked with *), we correctly identify the best
Both of these failures appear to be caused by being overly eager to suggest a \code{LinkedList}.
From looking at detailed profiling information, it seems that both of these container types had a relatively small amount of items in them.
-Therefore this is likely caused by our cost models being inaccurate at small $n$ values, such as in Figure \ref{fig:cm_insert_small_n}.
+Therefore this is likely caused by our cost models being inaccurate at small $n$ values, as mentioned in section \ref{section:cm_small_n}.
-Overall, our results show our system is able to suggest the best containers, at least for large enough $n$ values.
-Unfortunately, these tests are somewhat limited, as the best container seems relatively predictable: \code{Vec} where uniqueness is not important, and \code{Hash*} otherwise.
-Therefore more thorough testing is needed to fully establish the system's effectiveness.
+Overall, our results suggest that our system is effective, at least for large enough $n$ values.
+Unfortunately, these tests are somewhat limited, as the best container is almost always predictable: \code{Vec} where uniqueness is not important, and \code{Hash*} otherwise.
+Therefore, more thorough testing is needed to fully establish the system's effectiveness.
\subsection{Adaptive containers}
+\label{section:results_adaptive_containers}
We now look at cases where an adaptive container was suggested, and evaluate the result.
@@ -252,10 +256,10 @@ As the $n$ threshold after which we switch is outside the range we benchmark our
%% ** Comment on relative performance speedup
Table \ref{table:adaptive_perfcomp} compares our adaptive container suggestions with the fastest non-adaptive implementation.
-Since we must select an implementation for all containers before selecting a project, we show all possible combinations of adaptive and non-adaptive container selections.
+Since we must fix an implementation for every container before benchmarking a project, we show all possible combinations of adaptive and non-adaptive container selections where appropriate.
Note that the numbered columns indicate the benchmark `size', not the actual size that the container reaches within that benchmark.
-The exact definition of this varies by benchmark.
+What this means exactly varies by benchmark.
\begin{table}[h]
\centering
@@ -299,9 +303,18 @@ In the \code{aoc_2022_09} project, the adaptive container is marginally faster u
This shows that adaptive containers as we have implemented them are not effective in practice.
Even in cases where we never reach the size threshold, the presence of adaptive containers has an overhead which slows down the program 3x in the worst case (\code{example_mapping}, size = 150).
-One explanation for this could be that every operation now requires checking which inner implementation we are using, resulting in branching overhead.
-More work could be done to minimise the overhead introduced, such as by using indirect jumps rather than branching instructions.
+One explanation for this could be that every operation now requires checking which inner implementation is currently in use, adding a branch to every call.
+More work could be done to minimise this overhead, although it is unclear how far it could be reduced.
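
As a minimal sketch of the dispatch we mean (hypothetical type names; our generated adaptive containers are more general), every call must first branch on the active variant, and inserts must additionally check the size threshold:

\begin{verbatim}
use std::collections::HashSet;

// Hypothetical two-variant adaptive set.
enum AdaptiveSet {
    Small(Vec<u64>),
    Large(HashSet<u64>),
}

impl AdaptiveSet {
    fn insert(&mut self, x: u64, threshold: usize) {
        match self {
            AdaptiveSet::Small(v) => {
                if !v.contains(&x) {
                    v.push(x);
                }
                if v.len() > threshold {
                    // One-off migration cost when the threshold is crossed.
                    let migrated: HashSet<u64> = v.drain(..).collect();
                    *self = AdaptiveSet::Large(migrated);
                }
            }
            AdaptiveSet::Large(s) => {
                s.insert(x);
            }
        }
    }

    fn contains(&self, x: &u64) -> bool {
        // Even a simple query pays for the branch on the active variant.
        match self {
            AdaptiveSet::Small(v) => v.contains(x),
            AdaptiveSet::Large(s) => s.contains(x),
        }
    }
}

fn main() {
    let mut set = AdaptiveSet::Small(Vec::new());
    for i in 0..100 {
        set.insert(i, 64);
    }
    assert!(set.contains(&99));
}
\end{verbatim}
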
It is also unclear if the threshold values that we suggest are the optimal ones.
Currently, we decide our threshold by picking a value between two partitions with different best containers.
Future work could take a more complex approach that finds the best threshold value based on our cost models, and takes the overhead of all operations into account.
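
For example, a cost-model-driven approach might search for the point where one fitted model overtakes another, weighted by how often each operation occurs. A toy sketch follows; the cost functions and coefficients are invented for illustration and are not taken from our fitted models.

\begin{verbatim}
// Invented per-operation cost models (cost in arbitrary units at size n).
fn insert_cost_sorted_vec(n: f64) -> f64 { 1.5 + 0.02 * n }
fn insert_cost_hash_set(n: f64) -> f64 { 20.0 + 0.1 * n.log2() }
fn contains_cost_sorted_vec(n: f64) -> f64 { 2.0 + 0.5 * n.log2() }
fn contains_cost_hash_set(n: f64) -> f64 { 18.0 + 0.05 * n.log2() }

// Weighted cost per operation, given the observed mix of operations.
fn total_cost(n: f64, insert_frac: f64,
              insert_cost: fn(f64) -> f64, contains_cost: fn(f64) -> f64) -> f64 {
    insert_frac * insert_cost(n) + (1.0 - insert_frac) * contains_cost(n)
}

// Smallest n at which the hash-based implementation becomes cheaper.
fn crossover(insert_frac: f64, max_n: u64) -> Option<u64> {
    (1..=max_n).find(|&n| {
        let nf = n as f64;
        total_cost(nf, insert_frac, insert_cost_hash_set, contains_cost_hash_set)
            < total_cost(nf, insert_frac, insert_cost_sorted_vec, contains_cost_sorted_vec)
    })
}

fn main() {
    match crossover(0.5, 100_000) {
        Some(n) => println!("switch implementations at n = {}", n),
        None => println!("no crossover in range"),
    }
}
\end{verbatim}
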
+
+\subsection{Evaluation}
+
+Overall, we find that the core of our container selection system appears to have merit.
+Whilst our testing has limitations, it shows that we can correctly identify the best container even in complex programs.
+More work is needed on improving our system's performance for very small containers, and on testing with a wider range of programs.
+
+Our proposed technique for identifying adaptive containers appears ineffective.
+The primary challenges appear to be in the overhead introduced to each operation, and in finding the correct point at which to switch implementations.