author     Aria Shrimpton <me@aria.rip>    2024-01-30 18:18:39 +0000
committer  Aria Shrimpton <me@aria.rip>    2024-01-30 18:18:39 +0000
commit     cd8c8ddba45babdd60057bbc6714350b6b96ba67 (patch)
tree       642304b6922aa796c3ef47d0e6f123ab739358bb /thesis/parts
parent     73ff29c9b40e911d2e01c862db44107e250550f7 (diff)
writing: design
Diffstat (limited to 'thesis/parts')
-rw-r--r--  thesis/parts/design.tex       50
-rw-r--r--  thesis/parts/methodology.tex   5
2 files changed, 50 insertions, 5 deletions
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
new file mode 100644
index 0000000..ce7fa8e
--- /dev/null
+++ b/thesis/parts/design.tex
@@ -0,0 +1,50 @@
+\todo{Introduction}
+\todo{Aims / expected input}
+
+\section{Overview of approach}
+
+Given a program, we select a container implementation by:
+
+\begin{itemize}
+\item finding the implementations that satisfy the program's functional requirements,
+\item estimating the cost of each operation, for each implementation, at any given collection size $n$,
+\item profiling the program to rank the relative `importance' of each operation, and
+\item combining the two to estimate the relative cost of each implementation (sketched below).
+\end{itemize}
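+
+As a sketch of the final step (the notation here is illustrative rather than the exact formulation used), suppose profiling shows that operation $o$ is performed $f_o$ times at an average collection size of $\bar{n}$, and that the fitted cost model of implementation $i$ estimates the cost of $o$ at size $n$ as $c_{i,o}(n)$.
+The estimated relative cost of implementation $i$ is then
+
+\[
+  C_i = \sum_{o} f_o \cdot c_{i,o}(\bar{n}),
+\]
+
+and the implementation with the lowest $C_i$ is selected.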
+
+\subsection{Cost Estimation}
+
+We use an approach similar to CollectionSwitch\parencite{costa_collectionswitch_2018}, which assumes that the dominant factor in an operation's running time is the current size of the collection.
+
+Each operation has a separate cost model, which we build by executing the operation repeatedly on collections of various sizes.
+
+For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes, and find the average execution time $t$ of \code{contains} at each.
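+
+A minimal sketch of such a measurement loop is shown below; the sizes, repetition count, and use of \code{std::hint::black\_box} are illustrative choices rather than the exact benchmarking harness used.
+
+\begin{verbatim}
+use std::time::Instant;
+
+fn main() {
+    const REPS: u32 = 1_000;
+    for n in [10usize, 100, 1_000, 10_000, 100_000] {
+        let v: Vec<usize> = (0..n).collect();
+        let start = Instant::now();
+        for i in 0..REPS {
+            // black_box discourages the compiler from optimising the call away
+            std::hint::black_box(v.contains(&std::hint::black_box(i as usize % (n + 1))));
+        }
+        let avg_ns = start.elapsed().as_nanos() as f64 / f64::from(REPS);
+        println!("n = {n:>7}: average contains() time = {avg_ns:.1} ns");
+    }
+}
+\end{verbatim}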
+
+We then perform least-squares regression, using the collection size $n$ to predict the average execution time $t$.
+In the case of \code{Vec::contains}, we would expect the fitted polynomial to be roughly linear.
+
+This method works well for many operations and structures, although it has notable limitations.
+
+For example, the container implementation \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and only sorts them when an operation that relies on the order is called.
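+
+To illustrate (this sketch is hypothetical, not Primrose's actual implementation), the lazy behaviour amounts to something like:
+
+\begin{verbatim}
+struct LazySortedVec<T: Ord> {
+    items: Vec<T>,
+    sorted: bool,
+}
+
+impl<T: Ord> LazySortedVec<T> {
+    fn new() -> Self {
+        Self { items: Vec::new(), sorted: true }
+    }
+
+    fn push(&mut self, value: T) {
+        // Cheap: append at the end and defer sorting.
+        self.items.push(value);
+        self.sorted = false;
+    }
+
+    fn contains(&mut self, value: &T) -> bool {
+        // Order-dependent operation: pays for the deferred sort first.
+        if !self.sorted {
+            self.items.sort();
+            self.sorted = true;
+        }
+        self.items.binary_search(value).is_ok()
+    }
+}
+\end{verbatim}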
+
+This means that operations performed on a full container will appear more expensive than they typically are, since they must also do work `deferred' from the untimed benchmark setup.
+One way to mitigate this would be to perform some untimed `warmup' operations before measurement begins, although this is not currently done.
+
+\todo{Implement warmup operations, or revise this paragraph}
+
+Once we have this data, we fit a polynomial to it.
+Whilst we could use a more complex model, in practice a polynomial is good enough: very few common operations are worse than $O(n^3)$, and factors such as logarithms are usually `close enough'.
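+
+Concretely, assuming a cubic basis (the exact degree is an illustrative choice), the fitted model takes the form
+
+\[
+  t(n) \approx x_0 + x_1 n + x_2 n^2 + x_3 n^3,
+\]
+
+with the coefficients $x_j$ chosen to minimise the squared error against the benchmark measurements.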
+
+We cache this data for as long as the implementation is unchanged.
+Whilst it would be possible to share this data across machines, micro-architecture can have a large effect on collection performance\parencite{jung_brainy_2011}, so we compute the models on demand on each machine.
+
+\subsection{Profiling}
+
+As mentioned above, the ordering of operations can have a large effect on container performance.
+Unfortunately, tracking every container operation in order quickly becomes infeasible, so we instead track only the count of each operation and the size of the collection.
+
+Every instance of the collection is tracked separately, and results are collated after profiling.
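+
+As an illustration of this idea (the wrapper and names below, such as \code{ProfiledVec}, are hypothetical and not Primrose's actual instrumentation), a profiling wrapper might count operations per instance like so:
+
+\begin{verbatim}
+use std::collections::HashMap;
+
+// Hypothetical wrapper: counts how many times each operation is called on
+// this instance, alongside the collection's size.
+struct ProfiledVec<T> {
+    inner: Vec<T>,
+    op_counts: HashMap<&'static str, u64>,
+}
+
+impl<T: PartialEq> ProfiledVec<T> {
+    fn new() -> Self {
+        Self { inner: Vec::new(), op_counts: HashMap::new() }
+    }
+
+    fn record(&mut self, op: &'static str) {
+        *self.op_counts.entry(op).or_insert(0) += 1;
+    }
+
+    fn push(&mut self, value: T) {
+        self.record("push");
+        self.inner.push(value);
+    }
+
+    fn contains(&mut self, value: &T) -> bool {
+        self.record("contains");
+        self.inner.contains(value)
+    }
+
+    // Read off at the end of profiling, then collated across instances.
+    fn report(&self) -> (usize, &HashMap<&'static str, u64>) {
+        (self.inner.len(), &self.op_counts)
+    }
+}
+\end{verbatim}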
+
+\todo{Combining}
+
+\todo{Summary}
diff --git a/thesis/parts/methodology.tex b/thesis/parts/methodology.tex
deleted file mode 100644
index 9269e77..0000000
--- a/thesis/parts/methodology.tex
+++ /dev/null
@@ -1,5 +0,0 @@
-\todo{Introduction}
-\todo{Overview of approach}
-\todo{Cost Estimation}
-\todo{Profiling}
-\todo{Extensions to primrose}