author Aria Shrimpton <me@aria.rip> 2024-03-10 18:39:02 +0000
committer Aria Shrimpton <me@aria.rip> 2024-03-10 18:39:02 +0000
commit 4dca559b1a7d4ad6b104bec3f0d909cb68259fe4
tree 9d67e912f13ddb8d661c7f0915d9efd55c255245 /thesis
parent 6a08fd153587608d66a088bd5deee9eeee40c5c0
more writing
Diffstat (limited to 'thesis')
-rw-r--r-- thesis/parts/design.tex 87
-rw-r--r-- thesis/parts/implementation.tex 3
-rw-r--r-- thesis/parts/results.tex 1
3 files changed, 60 insertions, 31 deletions
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index 9b8ebce..18efa9c 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -68,16 +68,16 @@ Here, the code generated uses \code{Vec} as the implementation for \code{Sieve},
\section{Overview of process}
-Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project, then runs Primrose to find a list of implementations satsifying our functional requirements.
+Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project, then runs Primrose to select, from a pre-built library of container implementations, those satisfying our functional requirements.
Once we have this list, we then build a 'cost model' for each candidate type. This allows us to get an upper bound for the runtime cost of an operation at any given n.
-We then run the user-provided benchmarks, using any of the valid candidates instrumented to track the number of each operation, and the maximum n value it reaches.
+We then run the user-provided benchmarks, using any of the valid candidates instrumented to track how many times each operation is performed, and the maximum size of the container.
We combine this information with our cost models to estimate a total cost for each candidate, which is an upper bound on the total time taken for all container operations.
At this point, we also check if an 'adaptive' container would be better, by checking if one implementation is better performing at a lower n, and another at a higher n.
-Finally, we pick the container with the minimum cost, and create a new Rust file where the chosen container type is exported.
+Finally, we pick the implementation with the minimum cost, and generate code which sets the container type to use that implementation.
Our solution requires little user intervention, integrates well with existing workflows, and the time it takes scales linearly with the number of container types in a given project.
@@ -86,7 +86,8 @@ We now go into more detail on how each step works, although we leave some specif
\section{Functional requirements}
%% Explain role in entire process
-As described in Chapter \ref{chap:background}, any implementation we pick needs to satisfy the program's functional requirements. We do this by integrating Primrose \parencite{qin_primrose_2023} as a first step.
+As described in Chapter \ref{chap:background}, any implementation we pick must satisfy the program's functional requirements.
+To do this, we integrate Primrose \parencite{qin_primrose_2023} as a first step.
Primrose allows users to specify both the traits they require in an implementation (essentially the API and methods available), and what properties must be satisfied.
@@ -107,80 +108,108 @@ At this stage, we have a list of implementations for each container type we are
\begin{table}[h]
\centering
\begin{tabular}{|c|c|}
- Type & Implementation & File \\
+ Type & Implementation \\
\hline
- Primes & primrose\_library::EagerSortedVec & prime\_sieve/src/types.pr.rs \\
- Primes & std::collections::HashSet & prime\_sieve/src/types.pr.rs \\
- Primes & std::collections::BTreeSet & prime\_sieve/src/types.pr.rs \\
- Sieve & std::collections::LinkedList & prime\_sieve/src/types.pr.rs \\
- Sieve & std::vec::Vec & prime\_sieve/src/types.pr.rs \\
+ Primes & primrose\_library::EagerSortedVec \\
+ Primes & std::collections::HashSet \\
+ Primes & std::collections::BTreeSet \\
+ Sieve & std::collections::LinkedList \\
+ Sieve & std::vec::Vec \\
\end{tabular}
\caption{Usable implementations by container type for \code{prime_sieve}}
\label{table:candidates_prime_sieve}
\end{table}
%% Abstraction over backend
-Although we use primrose in our implementation, the rest of our system isn't dependent on it, and it would be relatively simple to use a different approach to select based on functional requirements.
+Although we use Primrose in our implementation, the rest of our system does not depend on it, and it would be relatively simple to use a different approach for selecting based on functional requirements.
\section{Cost Models}
+Now that we have a list of possible implementations, we need to understand the performance characteristics of each of them.
We use an approach similar to CollectionSwitch\parencite{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.
%% Benchmarks
-Each operation has a seperate cost model, which we build by executing the operation repeatedly on collections of various sizes.
+An implementation has a separate cost model for each operation, which we obtain by executing the operation repeatedly on collections of various sizes.
For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes, and find the average execution time $t$ of \code{contains} at each.
%% Linear Regression
-We then perform linear regression, using the collection size $n$ to predict $t$.
+We then perform regression, using the collection size $n$ to predict $t$.
In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.
-Once we have the data, we fit a polynomial to the data.
-Whilst we could use a more complex technique, in practice this is good enough: Very few common operations are above $O(n^3)$, and factors such as logarithms are usually 'close enough'.
+In our implementation, we fit a function of the form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$, using regular least-squares fitting.
+Whilst we could use a more complex technique, in practice this is good enough: Most common operations are polynomial at worst, and more complex models risk overfitting.
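As a minimal sketch of how a fitted model of this form might be evaluated, the following assumes the four coefficients have already been obtained by least-squares fitting. The \code{CostModel} type, its field name, and the clamping of $n$ to at least 1 are illustrative assumptions, not details taken from the tool.

```rust
/// Coefficients of a fitted cost model: x0 + x1*n + x2*n^2 + x3*log2(n).
/// The struct and field names are hypothetical, chosen for illustration.
#[derive(Debug, Clone, Copy)]
pub struct CostModel {
    pub coeffs: [f64; 4],
}

impl CostModel {
    /// Estimated cost of one operation on a container of size `n`.
    pub fn estimate(&self, n: f64) -> f64 {
        let [x0, x1, x2, x3] = self.coeffs;
        // Assumption: clamp n to at least 1 so that the logarithmic term
        // stays defined for empty containers.
        let n = n.max(1.0);
        x0 + x1 * n + x2 * n * n + x3 * n.log2()
    }
}
```

For an operation such as \code{Vec::contains}, we would expect the fitted $x_1$ term to dominate, with $x_2$ and $x_3$ close to zero.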
%% Limitations
This method works well for many operations and structures, although it has notable limitations.
-For example, the container implementation \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and only sorts them when an operation that relies on the order is called.
+In particular, implementations which defer work from one operation to another produce very inconsistent measurements.
+For example, \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and defers sorting until the contents of the list are read (such as by \code{contains}).
We were unable to work around this, and so we have removed these variants from our container library.
A potential solution could be to perform untimed 'warmup' operations before each operation, but this is complex because it requires some understanding of what operations will cause work to be deferred.
+At the end of this stage, we are able to reason about the relative cost of operations between implementations.
+These models are cached for as long as our container library remains the same, as they are independent of what program the user is currently working on.
+
\section{Profiling applications}
+We now need to collect information about how the user's application uses its container types.
+
%% Data Collected
As mentioned above, the ordering of operations can have a large effect on container performance.
Unfortunately, tracking every container operation in order quickly becomes unfeasible, so we settle for tracking the count of each operation, and the maximum size of each collection instance.
-Every instance/allocation of the collection is tracked separately, and results are collated after profiling.
+Every instance or allocation of the collection is tracked separately, and results are collated after profiling.
%% Segmentation
Results with a close enough n value get sorted into partitions, where each partition stores the average count of each operation, and a weight indicating how common results in that partition were.
This serves 3 purposes.
The first is to compress the data, which speeds up processing and stops us running out of memory in more complex programs.
The second is to capture the fact that the number of operations will likely depend on the size of the container.
-The third is to aid in searching for adaptive containers, which will be elaborated on later.
-
-%% Limitations w/ pre-benchmark steps
-\todo{not taking into account 'preparatory' operations during benchmarks}
+The third is to aid in searching for adaptive containers, which we will elaborate on later.
-\section{Selection process \& adaptive containers}
+\section{Selection process}
%% Selection process
Once we have an estimate of how long each operation may take (from our cost models), and how often we use each operation (from our profiling information), we combine these to estimate the total cost of each implementation.
For each implementation, our total cost estimate is:
-\[ \sum_{op\in \textrm{ops}} \sum_{(r_{op}, N) \in \textrm{partitions}} C_\textrm{op}(N) * r_\textrm{op} \]
+\[
+\sum_{o \in \textrm{ops}} \sum_{(r_o, N, W) \in \textrm{partitions}} C_o(N) \cdot r_o \cdot W
+\]
+
+\begin{itemize}
+\item $C_o(N)$ is the cost estimated by the cost model for operation $o$ at n value $N$
+\item $r_o$ is the average count of operation $o$ in a partition
+\item $N$ is the average of the maximum n values observed in a partition
+\item $W$ is the weight of a partition, representing how many allocations fell into this partition
+\end{itemize}
+
+Essentially, we scale an estimated worst-case cost of each operation by how frequently we think we will encounter it.
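The total cost estimate above can be sketched as a double sum over partitions and operations. This is only an illustration: the \code{Partition} struct, its field names, and the \code{cost_of} callback (standing in for the cost models $C_o$) are hypothetical and not taken from the tool.

```rust
use std::collections::HashMap;

/// One partition of profiling results (all names here are hypothetical).
pub struct Partition {
    /// N: the average maximum container size in this partition.
    pub avg_max_n: f64,
    /// W: how many allocations fell into this partition.
    pub weight: f64,
    /// r_o: the average count of each operation, keyed by operation name.
    pub avg_op_counts: HashMap<String, f64>,
}

/// Total cost estimate for a single implementation.
/// `cost_of(op, n)` stands in for the cost model C_o(N) of that implementation.
pub fn total_cost(partitions: &[Partition], cost_of: impl Fn(&str, f64) -> f64) -> f64 {
    partitions
        .iter()
        .map(|p| {
            p.avg_op_counts
                .iter()
                // C_o(N) * r_o * W for one operation in one partition.
                .map(|(op, r)| cost_of(op.as_str(), p.avg_max_n) * r * p.weight)
                .sum::<f64>()
        })
        .sum()
}
```

Running this once per candidate and taking the minimum gives the selection step described in this section.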
-where $C_{op}$ is the cost estimated by the cost model for operation $op$ at n value $N$, and $r_{op}, N$ is the average count of a given operation and the maximum N in a partition.
+\section{Adaptive containers}
+In many cases, the maximum size of a container type varies greatly between program runs.
+In these cases, it may be desirable to start off with one container type, and switch to another one if the size of the container grows greatly.
-%% Adaptive container detection
-adaptive containers are implemented using const generics, and a wrapper class.
+For example, if a program requires a set, then for small sizes it may be best to keep a sorted list and use binary search for \code{contains} operations.
+But when the size of the container grows, the cost of doing \code{contains} may grow high enough that using a \code{HashSet} would actually be faster.
-they are detected by finding the best implementation for each partition, sorting by n, and seeing if we can split the partitions in half where a different implementation is best on each side
+Adaptive containers attempt to address this need by starting off with one implementation (the low or before implementation), and switching to a new implementation (the high or after implementation) once the size of the container passes a certain threshold.
-we then check if the cost saving is greater than the cost of a clear operation and n insert operations
+This is similar to systems such as CoCo \parencite{hutchison_coco_2013} and to work by \"{O}sterlund \parencite{osterlund_dynamically_2013}.
+However, we decide when to switch container implementation before the program is run, rather than as it is running.
+We also do so in a way that requires no knowledge of the implementation internals.
+
+%% Adaptive container detection
+Using our list of partitions, we sort it by ascending container size and attempt to find a split.
+If we can split our partitions in two such that everything to the left performs best with one implementation, and everything to the right with another, then we should be able to switch implementation around that n value.
-%% Code generation
+In practice, finding the correct threshold is more difficult: We must take into account the cost of transforming from one implementation to another.
+If we adapt our container too early, we may do more work adapting it than we save, and would have been better off staying with our low implementation.
+If we adapt too late, we have more data to move and less of our program gets to take advantage of the new implementation.
+We choose the relatively simple strategy of switching halfway between two partitions.
+Our cost models let us estimate how expensive switching implementations will be, which we compare against how much we save by switching to the after implementation.
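The split search described above can be sketched as follows, assuming the best candidate for each partition has already been computed and the partitions are sorted by ascending size. The \code{Split} type and \code{find_split} function are hypothetical names, and the comparison of switching cost against savings is deliberately left out.

```rust
/// Result of a successful split (names are hypothetical).
#[derive(Debug, PartialEq)]
pub struct Split {
    pub before: String,
    pub after: String,
    pub threshold: f64,
}

/// `best[k]` is the cheapest candidate for partition `k`, and `max_n[k]` is
/// that partition's maximum n value. Both slices are assumed to be sorted
/// by ascending n already.
pub fn find_split(best: &[&str], max_n: &[f64]) -> Option<Split> {
    if best.len() != max_n.len() {
        return None;
    }
    let first = *best.first()?;
    // Lowest index whose best candidate differs from the first partition's.
    let i = best.iter().position(|&b| b != first)?;
    // Everything from `i` onwards must agree, otherwise there is no clean split.
    if best[i..].iter().any(|&b| b != best[i]) {
        return None;
    }
    Some(Split {
        before: first.to_string(),
        after: best[i].to_string(),
        // Switch halfway between the two neighbouring partitions.
        threshold: (max_n[i - 1] + max_n[i]) / 2.0,
    })
}
```

A result of \code{None} means either that one implementation is best everywhere, or that the best implementation changes more than once, so no single threshold applies.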
diff --git a/thesis/parts/implementation.tex b/thesis/parts/implementation.tex
index b1e156a..1c131ed 100644
--- a/thesis/parts/implementation.tex
+++ b/thesis/parts/implementation.tex
@@ -78,11 +78,12 @@ This provides us with estimates for each singular candidate.
In order to try and suggest an adaptive container, we use the following algorithm:
\begin{enumerate}
+\item Sort partitions in order of ascending maximum n values.
\item Calculate the cost for each candidate and for each partition
\item For each partition, find the best candidate and store it in the array \code{best}. Note that we don't sum across all partitions this time.
\item Find the lowest index \code{i} where \code{best[i] != best[0]}
\item Check that \code{i} partitions the list properly: For all \code{j < i}, \code{best[j] == best[0]} and for all \code{j>=i}, \code{best[j] == best[i]}.
-\item Let \code{before} be the name of the candidate in \code{best[0]}, \code{after} be the name of the candidate in \code{best[i]}, and \code{threshold} be the maximum n value of partition \code{i}.
+\item Let \code{before} be the name of the candidate in \code{best[0]}, \code{after} be the name of the candidate in \code{best[i]}, and \code{threshold} be halfway between the maximum n values of partition \code{i} and partition \code{i-1}.
\item Calculate the cost of switching as:
$$
C_{\textrm{before,clear}}(\textrm{threshold}) + \textrm{threshold} \cdot C_{\textrm{after,insert}}(\textrm{threshold})
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index 2e2373a..b07e581 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -28,7 +28,6 @@ To help readability, we group these into regular \code{Container} implementation
\label{fig:cm_insert}
\end{figure}
-
For \code{Vec}, we see that insertion is incredibly cheap, and gets slightly cheaper as the size of the container increases.
This is to be expected, as Rust's Vector implementation grows by a multiple whenever it reaches its maximum capacity, so we would expect amortised inserts to require fewer resizes as $n$ increases.
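One way to observe this growth behaviour is to count how often the reported capacity changes while pushing elements. The sketch below is illustrative only: \code{count_reallocations} is a hypothetical helper, and the standard library guarantees amortised constant-time \code{push} without specifying the exact growth factor, so the precise count may differ between Rust versions.

```rust
/// Counts how many times a `Vec` changes capacity while pushing `n` elements.
/// Each capacity change corresponds to one resize of the backing buffer.
pub fn count_reallocations(n: usize) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut reallocations = 0;
    let mut last_capacity = v.capacity();
    for x in 0..n as u64 {
        v.push(x);
        if v.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = v.capacity();
        }
    }
    reallocations
}
```

With geometric growth, the number of resizes grows roughly logarithmically in $n$, so the resizing cost per insert shrinks as the container gets larger.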