author Aria Shrimpton <me@aria.rip> 2024-02-19 21:27:55 +0000
committer Aria Shrimpton <me@aria.rip> 2024-02-19 21:28:03 +0000
commit d3be4136218ac50f4f46ff739f858300e5313a85 (patch)
tree a6e781e68e28fa7e69c7a013eb353353bb904c61 /thesis/parts/design.tex
parent d0a3f00a1c4116433343d11b63872888b0c2aaf6 (diff)
work on design chapter
Diffstat (limited to 'thesis/parts/design.tex')
-rw-r--r-- thesis/parts/design.tex | 55
1 file changed, 36 insertions(+), 19 deletions(-)
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index 6c3afab..1ff0048 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -10,17 +10,17 @@ We aim to create a program for container selection that can select based on both
Flexibility is a high priority: it should be easy to add new container implementations, and to integrate our system into existing applications.
Our system should also be able to scale to larger programs, and remain convenient for developers to use.
-We chose to implement our system as a Rust CLI, and limit it to selecting containers for Rust programs.
-We require the user to provide their own benchmarks, which should be representative of a typical application run.
+We chose to implement our system as a Rust CLI, and to work on programs also written in Rust.
+We require the user to provide their own benchmarks, which should be representative of a typical application run; without these, we have no consistent way to evaluate speed.
-The user can specify their functional requirements by listing the required traits, and specifying properties that must always hold in a lisp-like language.
+Users can specify their functional requirements by listing the required traits, and specifying properties that must always hold in a lisp-like language.
-For example, Listing \ref{lst:selection_example} shows code from our test case based on the sieve of Eratosthenes (\code{src/tests/prime\_sieve} in the source artifacts).
+For example, Listing \ref{lst:selection_example} shows code from our test case based on the sieve of Eratosthenes (\code{src/tests/prime_sieve} in the source artifacts).
Here we request two container types: \code{Sieve} and \code{Primes}.
The first must implement the \code{Container} and \code{Stack} traits, and must satisfy the \code{lifo} property. This property is defined at the top as only being applicable to \code{Stack}s, and requires that for any \code{x}, pushing \code{x} then popping from the container returns \code{x}.
The second container type, \code{Primes}, must only implement the \code{Container} trait, and must satisfy the \code{ascending} property.
-This property requires that at any point, for all consecutive \code{x, y} pairs in the container, \code{x $\leq$ y}.
+This property requires that at any point, for all consecutive \code{x, y} pairs in the container, \code{x <= y}.
\begin{figure}
\begin{lstlisting}[caption=Container type definitions for prime\_sieve,label={lst:selection_example}]
@@ -40,29 +40,29 @@ type Primes<S> = {c impl (Container) | (ascending c)}
\end{lstlisting}
\end{figure}
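+
+To make the meaning of \code{ascending} concrete, the plain Rust function below expresses the same condition; this is purely an illustration of the property's semantics, not how Primrose actually verifies it.
+
+\begin{lstlisting}[caption={Illustrative Rust equivalent of the ascending property}]
+// Holds iff every consecutive pair (x, y) in the slice satisfies x <= y.
+// Illustration only: Primrose checks properties symbolically, not at runtime.
+fn is_ascending<T: Ord>(items: &[T]) -> bool {
+    items.windows(2).all(|w| w[0] <= w[1])
+}
+\end{lstlisting}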
-Once we've specified our functional requirements and provided a benchmark (\code{src/tests/prime\_sieve/benches/main.rs}), we can simply run candelabra to select a container:
+Once we've specified our functional requirements and provided a benchmark (\code{src/tests/prime_sieve/benches/main.rs}), we can simply run candelabra to select a container:
\todo{Show selection process}
Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project, then runs Primrose to find a list of implementations satisfying our functional requirements.
Once it has this list, it estimates a ``cost'' for each candidate, which is an upper bound on the total time taken for all container operations.
-At this point, it also checks if an 'adaptive' container would be better, by checking if one implementation is better performing at a lower n, and another at a higher n.
+At this point, it also checks if an ``adaptive'' container would be better, by checking if one implementation is better performing at a lower n, and another at a higher n.
Finally, it picks the container with the minimum cost, and creates a new Rust file where the chosen container type is exported.
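+
+As a sketch of this final step, the listing below picks a minimum-cost candidate. The names and types here are assumed for illustration, and are not taken from the actual implementation.
+
+\begin{lstlisting}[caption={Sketch of minimum-cost selection (names assumed)}]
+struct Candidate {
+    implementation: String,
+    cost: f64, // upper bound on total time taken for all container operations
+}
+
+// Pick the candidate with the lowest estimated cost.
+fn select_best(candidates: Vec<Candidate>) -> Option<Candidate> {
+    candidates.into_iter().min_by(|a, b| a.cost.total_cmp(&b.cost))
+}
+\end{lstlisting}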
Our tool requires little user intervention, integrates well with existing workflows, and the time it takes scales linearly with the number of container types in a given project.
-\section{Selecting for functional requirements}
+\section{Functional requirements}
%% Explain role in entire process
-Before we can select the fastest container, we first need to find a list of candidates which satisfy the program's functional requirements.
+As described in Chapter \ref{chap:background}, any implementation we pick needs to satisfy the program's functional requirements. We find implementations that do so by integrating Primrose \parencite{qin_primrose_2023} as a first step.
Primrose allows users to specify both the traits they require in an implementation (essentially the API and methods available), and what properties must be satisfied.
Each container type that we want to select an implementation for is bound by a list of traits and a list of properties (lines 11 and 12 in Listing \ref{lst:selection_example}).
%% Short explanation of selection method
-The exact internals are beyond the scope of this paper, but in brief this works by:
+In brief, Primrose works by:
\begin{itemize}
\item Finding all implementations in the container library that implement all required traits
\item Translating any specified properties to a Rosette expression
@@ -72,7 +72,24 @@ The exact internals are beyond the scope of this paper, but in brief this works
We use the code provided with the Primrose paper, with minor modifications elaborated on in Chapter \ref{chap:implementation}.
%% Abstraction over backend
-\todo{Abstraction over backend}
+Although we use Primrose in our implementation, the rest of our system isn't dependent on it, and it would be relatively simple to swap in a different approach for selecting based on functional requirements.
+
+At this stage, we have a list of candidate implementations for each container type we are selecting. The command \code{candelabra-cli candidates} shows this list, as in Table \ref{table:candidates_prime_sieve}.
+
+\begin{table}[h]
+ \centering
+ \begin{tabular}{|c|c|c|}
+ Type & Implementation & File \\
+ \hline
+ Primes & primrose\_library::EagerSortedVec & prime\_sieve/src/types.pr.rs \\
+ Primes & std::collections::HashSet & prime\_sieve/src/types.pr.rs \\
+ Primes & std::collections::BTreeSet & prime\_sieve/src/types.pr.rs \\
+ Sieve & std::collections::LinkedList & prime\_sieve/src/types.pr.rs \\
+ Sieve & std::vec::Vec & prime\_sieve/src/types.pr.rs \\
+ \end{tabular}
+ \caption{Usable implementations by container type for \code{prime_sieve}}
+ \label{table:candidates_prime_sieve}
+\end{table}
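+
+From this list, selection will eventually export a single chosen implementation for each container type. As a purely hypothetical illustration (the real generated file may differ in form), the exported aliases could look like the following:
+
+\begin{lstlisting}[caption={Hypothetical sketch of generated type exports}]
+// Hypothetical sketch; not the actual generated code.
+pub type Sieve<T> = std::vec::Vec<T>;
+pub type Primes<T> = std::collections::BTreeSet<T>;
+\end{lstlisting}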
\section{Cost Models}
@@ -95,8 +112,8 @@ This method works well for many operations and structures, although has notable
For example, the container implementation \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and only sorts them when an operation that relies on the order is called.
-We were unable to work around this, although a potential solution could be to perform untimed 'warmup' operations before each operation.
-this is complex because it requires some understanding of what operations will have deferred work for them.
+We were unable to work around this, and so we have removed these variants from our container library.
+A potential solution could be to perform untimed ``warmup'' operations before each timed operation, but this is complex because it requires some understanding of which operations will cause work to be deferred.
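+
+The sketch below illustrates this proposed (and unimplemented) warmup idea, assuming hypothetical \code{warmup} and \code{op} closures supplied by the benchmark; \code{warmup} would force any deferred work, such as sorting, to run before timing begins.
+
+\begin{lstlisting}[caption={Sketch of untimed warmup before a timed operation}]
+use std::time::{Duration, Instant};
+
+// `warmup' and `op' are hypothetical closures supplied by the benchmark.
+fn time_op<C>(c: &mut C, warmup: impl Fn(&mut C), op: impl Fn(&mut C)) -> Duration {
+    warmup(c); // untimed: flush any deferred work first
+    let start = Instant::now();
+    op(c); // timed: only the operation of interest
+    start.elapsed()
+}
+\end{lstlisting}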
\todo{Find a place for this}
Whilst it would be possible to share this data across computers, micro-architecture can have a large effect on collection performance \parencite{jung_brainy_2011}, so we calculate it on demand.
@@ -105,23 +122,23 @@ Whilst it would be possible to share this data across computers, micro-architect
%% Data Collected
As mentioned above, the ordering of operations can have a large effect on container performance.
-Unfortunately, tracking every container operation in order quickly becomes unfeasible, so we settle for tracking the count of each operation, and the size of the collection.
+Unfortunately, tracking every container operation in order quickly becomes infeasible, so we settle for tracking the count of each operation, and the maximum size of each collection instance.
-Every instance of the collection is tracked separately, and results are collated after profiling.
+Profiling is done per-instance: each place a new container is allocated is tracked separately.
+Although we collect these results separately, we immediately summarise them into a list of partitions.
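+
+A minimal sketch of this per-instance tracking is shown below; the type and field names are assumed, and the real instrumentation tracks the count of every container operation rather than just \code{push}.
+
+\begin{lstlisting}[caption={Sketch of per-instance operation counting (names assumed)}]
+#[derive(Default)]
+struct OpCounts {
+    push: u64,    // one counter per operation in the full version
+    max_n: usize, // maximum size this instance reached
+}
+
+struct Profiled<T> {
+    inner: Vec<T>, // the underlying container being profiled
+    counts: OpCounts,
+}
+
+impl<T> Profiled<T> {
+    fn push(&mut self, x: T) {
+        self.counts.push += 1;
+        self.inner.push(x);
+        self.counts.max_n = self.counts.max_n.max(self.inner.len());
+    }
+}
+\end{lstlisting}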
%% Segmentation
-
-results with a close enough n value get sorted into partitions, where each partition has the average amount of each operation, and a weight indicating how common results in that partition were.
-this is done to compress the data, and also to allow searching for adaptive containers later
+Results are grouped by their n value: results with close enough n values are put into the same partition, and our metrics are averaged within each partition.
+This provides a form of compression, and is also used to detect situations where an adaptive container should be used (see next section).
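+
+The sketch below shows one way this grouping could work; the ``close enough'' criterion used here (within a factor of two) is an assumed threshold, not the one we actually use.
+
+\begin{lstlisting}[caption={Sketch of partitioning results by n value (threshold assumed)}]
+// Each result is (n, op count); returns (average n, average count, weight).
+fn partition(mut results: Vec<(usize, f64)>) -> Vec<(f64, f64, usize)> {
+    results.sort_by_key(|&(n, _)| n);
+    let mut groups: Vec<Vec<(usize, f64)>> = Vec::new();
+    for r in results {
+        // Assumed criterion: within 2x of the group's smallest n value.
+        if groups.last().map_or(false, |g| (r.0 as f64) < (g[0].0 as f64) * 2.0) {
+            groups.last_mut().unwrap().push(r);
+        } else {
+            groups.push(vec![r]);
+        }
+    }
+    groups.into_iter().map(|g| {
+        let w = g.len() as f64;
+        let avg_n = g.iter().map(|&(n, _)| n as f64).sum::<f64>() / w;
+        let avg_c = g.iter().map(|&(_, c)| c).sum::<f64>() / w;
+        (avg_n, avg_c, g.len())
+    }).collect()
+}
+\end{lstlisting}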
%% Limitations w/ pre-benchmark steps
-
\todo{not taking into account 'preparatory' operations during benchmarks}
\section{Selection process \& adaptive containers}
%% Selection process
+
%% Adaptive container detection
Adaptive containers are implemented using const generics and a wrapper type.
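+
+A minimal sketch of this approach is shown below, assuming two example backing implementations and a single \code{insert} operation; the real wrapper's interface, backing types, and switching rule may all differ.
+
+\begin{lstlisting}[caption={Sketch of an adaptive container via const generics (details assumed)}]
+use std::collections::BTreeSet;
+
+// N is the size threshold at which we migrate from the low-n
+// implementation to the high-n one (both backing types assumed).
+enum Adaptive<T: Ord, const N: usize> {
+    Low(Vec<T>),       // assumed to perform better at small sizes
+    High(BTreeSet<T>), // assumed to perform better at large sizes
+}
+
+impl<T: Ord, const N: usize> Adaptive<T, N> {
+    fn insert(&mut self, x: T) {
+        if let Adaptive::Low(v) = self {
+            if v.len() >= N {
+                // Threshold reached: move contents to the other implementation.
+                let migrated: BTreeSet<T> = std::mem::take(v).into_iter().collect();
+                *self = Adaptive::High(migrated);
+            }
+        }
+        match self {
+            Adaptive::Low(v) => v.push(x),
+            Adaptive::High(s) => { s.insert(x); }
+        }
+    }
+}
+\end{lstlisting}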