3 files changed, 53 insertions, 8 deletions
diff --git a/thesis/main.tex b/thesis/main.tex
index 16066cc..21de962 100644
--- a/thesis/main.tex
+++ b/thesis/main.tex
@@ -12,7 +12,7 @@
 
 %% Convenience macros
 \newcommand{\code}{\texttt}
-\newcommand{\todo}[1]{\colorbox{yellow}{#1} \newline}
+\newcommand{\todo}[1]{\colorbox{yellow}{TODO: #1} \newline}
 
 \begin{document}
 \begin{preliminary}
@@ -56,8 +56,8 @@ from the Informatics Research Ethics committee.
 \chapter{Background}
 \input {parts/background}
 
-\chapter{Methodology}
-\input{parts/methodology}
+\chapter{Design}
+\input{parts/design}
 
 \chapter{Implementation}
 \input{parts/implementation}
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
new file mode 100644
index 0000000..ce7fa8e
--- /dev/null
+++ b/thesis/parts/design.tex
@@ -0,0 +1,50 @@
+\todo{Introduction}
+\todo{Aims / expected input}
+
+\section{Overview of approach}
+
+Selection of an implementation proceeds in the following steps:
+
+\begin{itemize}
+\item Finding a list of implementations that satisfy the program's functional requirements
+\item Estimating the cost of each operation, for each implementation, for any given collection size $n$
+\item Profiling the program to rank the `importance' of each operation
+\item Combining the two to create an estimate of the relative cost of each implementation
+\end{itemize}
+
+\subsection{Cost Estimation}
+
+We use an approach similar to CollectionSwitch~\parencite{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.
+
+Each operation has a separate cost model, which we build by executing the operation repeatedly on collections of various sizes.
+
+For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes, and find the average execution time $t$ of \code{contains} at each.
+
+We then perform linear regression, using the collection size $n$ to predict $t$.
+In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.
+
+This method works well for many operations and structures, although it has notable limitations.
+
+For example, the container implementation \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and only sorts them when an operation that relies on the order is called.
+
+This means that operations performed on a full container will appear much more expensive than they should, since they must do the work `deferred' by the benchmarking setup.
+To prevent this, we perform some untimed `warmup' operations.
+
+\todo{Not true yet: we do not currently perform warmup operations}
+
+Once we have the timing data, we fit a polynomial to it.
+Whilst we could use a more complex technique, in practice this is good enough: very few common operations are above $O(n^3)$, and factors such as logarithms are usually approximated `closely enough' by a polynomial.
+
+We cache this data for as long as the implementation is unchanged.
+Whilst it would be possible to share this data across computers, micro-architecture can have a large effect on collection performance~\parencite{jung_brainy_2011}, so we calculate it on demand on each machine.
+
+\subsection{Profiling}
+
+As mentioned above, the ordering of operations can have a large effect on container performance.
+Unfortunately, tracking every container operation in order quickly becomes infeasible, so we settle for tracking the count of each operation and the size of the collection.
+
+Every instance of the collection is tracked separately, and results are collated after profiling.
+
+\todo{Combining}
+
+\todo{Summary}
diff --git a/thesis/parts/methodology.tex b/thesis/parts/methodology.tex
deleted file mode 100644
index 9269e77..0000000
--- a/thesis/parts/methodology.tex
+++ /dev/null
@@ -1,5 +0,0 @@
-\todo{Introduction}
-\todo{Overview of approach}
-\todo{Cost Estimation}
-\todo{Profiling}
-\todo{Extensions to primrose}