\todo{Introduction}
\todo{Aims / expected input}

\section{Overview of approach}

Selection of a container implementation proceeds in four steps (a sketch of this pipeline follows the list):

\begin{itemize}
\item Finding the implementations that satisfy the program's functional requirements
\item Estimating the cost of each operation, for each implementation, at any given collection size $n$
\item Profiling the program to rank the relative 'importance' of each operation
\item Combining the cost estimates and profiling data into an estimate of the relative cost of each implementation
\end{itemize}
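The sketch below shows how these steps might fit together. It is illustrative only: the type and function names (\code{CostModel}, \code{Profile}, and so on) are hypothetical assumptions, not Primrose's actual API.

\begin{verbatim}
// Hypothetical sketch of the selection pipeline; all names are
// illustrative assumptions, not Primrose's actual API.
struct CostModel; // per-implementation, per-operation cost models
struct Profile;   // operation counts and sizes gathered by profiling

impl CostModel {
    // Estimated time for `op` on a collection of size n, read off the
    // fitted polynomial described below.
    fn cost(&self, op: &str, n: usize) -> f64 { unimplemented!() }
}

impl Profile {
    fn count(&self, op: &str) -> u64 { unimplemented!() }
    fn avg_size(&self) -> usize { unimplemented!() }
}

// Pick the candidate with the lowest estimated total cost: the sum,
// over operations, of (profiled count) * (cost at the profiled size).
// Panics if `candidates` is empty; fine for a sketch.
fn select_best<'a>(
    candidates: &'a [(&'a str, CostModel)],
    ops: &[&str],
    profile: &Profile,
) -> &'a str {
    candidates
        .iter()
        .min_by(|(_, a), (_, b)| {
            let cost = |m: &CostModel| -> f64 {
                ops.iter()
                    .map(|op| profile.count(op) as f64
                        * m.cost(op, profile.avg_size()))
                    .sum()
            };
            cost(a).partial_cmp(&cost(b)).unwrap()
        })
        .map(|(name, _)| *name)
        .unwrap()
}
\end{verbatim}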

\subsection{Cost Estimation}

We use an approach similar to CollectionSwitch\parencite{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.

Each operation has a separate cost model, which we build by executing the operation repeatedly on collections of various sizes.

For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes, and find the average execution time $t$ of \code{contains} at each.

We then perform regression, using the collection size $n$ to predict the mean execution time $t$.
In the case of \code{Vec::contains}, we would expect the fitted cost model to be roughly linear in $n$.
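A minimal sketch of such a benchmarking loop is shown below. The specific sizes and repetition counts are illustrative choices, not the real harness's values, and a real harness would also need to control for allocation and caching effects.

\begin{verbatim}
use std::time::Instant;

// Measure the mean execution time of Vec::contains at several
// collection sizes, producing (size, mean time) pairs to fit.
fn benchmark_contains() -> Vec<(usize, f64)> {
    let sizes: [usize; 5] = [10, 100, 1_000, 10_000, 100_000];
    let reps = 1_000usize;
    let mut results = Vec::new();

    for &n in &sizes {
        let v: Vec<usize> = (0..n).collect();
        let start = Instant::now();
        for i in 0..reps {
            // black_box stops the optimiser deleting the unused result.
            std::hint::black_box(v.contains(&(i % (n + 1))));
        }
        let mean = start.elapsed().as_secs_f64() / reps as f64;
        results.push((n, mean)); // (collection size, seconds per call)
    }
    results
}
\end{verbatim}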

This method works well for many operations and structures, although it has notable limitations.

For example, the container implementation \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and only sorts them when an operation that relies on the order is called.
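A simplified sketch of this kind of lazy structure is given below; it illustrates the deferral technique only, and is not Primrose's actual \code{LazySortedVec} implementation.

\begin{verbatim}
// Simplified sketch of a lazily-sorted vector: O(1) insertion, with
// sorting deferred until an order-dependent operation is called.
struct LazySorted<T: Ord> {
    data: Vec<T>,
    sorted: bool,
}

impl<T: Ord> LazySorted<T> {
    fn new() -> Self {
        LazySorted { data: Vec::new(), sorted: true }
    }

    // Insertion just pushes to the end and marks the order as stale.
    fn insert(&mut self, x: T) {
        self.data.push(x);
        self.sorted = false;
    }

    // Operations that rely on the order must first pay for the
    // deferred sort.
    fn contains(&mut self, x: &T) -> bool {
        if !self.sorted {
            self.data.sort();
            self.sorted = true;
        }
        self.data.binary_search(x).is_ok()
    }
}
\end{verbatim}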

This means that operations performed on an already-populated container can appear much more expensive than they are in steady state, since they must also do work 'deferred' by the benchmarking setup.
One mitigation would be to perform some untimed 'warmup' operations before measurement begins, although our benchmarks do not currently do this.

Once we have this data, we fit a polynomial to it using least squares.
Whilst we could use a more complex technique, in practice this is good enough: very few common operations are above $O(n^3)$, and factors such as logarithms are usually 'close enough' to a low-degree polynomial.
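Concretely, for each operation we choose coefficients minimising the squared error over the measured points; assuming a cubic model (the choice of maximum degree here is an assumption, motivated by the $O(n^3)$ bound above):

\[
\min_{a_0,\dots,a_3} \; \sum_i \left( t_i - \sum_{k=0}^{3} a_k n_i^k \right)^2
\]

where $(n_i, t_i)$ are the measured collection sizes and mean execution times, and the fitted $\sum_k a_k n^k$ becomes the operation's cost model.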

We cache this data for as long as the implementation is unchanged.
Whilst it would be possible to share this data across computers, micro-architecture can have a large effect on collection performance\parencite{jung_brainy_2011}, so we calculate it on demand.

\subsection{Profiling}

As mentioned above, the ordering of operations can have a large effect on container performance.
Unfortunately, tracking every container operation in order quickly becomes infeasible, so we settle for tracking the number of times each operation is called, along with the size of the collection.

Every instance of the collection is tracked separately, and results are collated after profiling.
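A sketch of how such per-instance tracking might look is given below; the wrapper type and field names are hypothetical, not our actual instrumentation.

\begin{verbatim}
use std::collections::HashMap;

// Hypothetical per-instance profiling wrapper. Each wrapped container
// counts its own operations and records the sizes it reaches; counts
// from all instances are collated after the profiling run.
struct ProfiledVec<T> {
    inner: Vec<T>,
    op_counts: HashMap<&'static str, u64>,
    max_size: usize,
}

impl<T: PartialEq> ProfiledVec<T> {
    fn new() -> Self {
        ProfiledVec {
            inner: Vec::new(),
            op_counts: HashMap::new(),
            max_size: 0,
        }
    }

    fn record(&mut self, op: &'static str) {
        *self.op_counts.entry(op).or_insert(0) += 1;
        self.max_size = self.max_size.max(self.inner.len());
    }

    fn push(&mut self, x: T) {
        self.record("push");
        self.inner.push(x);
    }

    fn contains(&mut self, x: &T) -> bool {
        self.record("contains");
        self.inner.contains(x)
    }
}
\end{verbatim}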

\todo{Combining}
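One natural formulation, following the overview above, is to weight each operation's estimated cost by how often profiling saw it occur. As a sketch (the exact weighting is an assumption):

\[
C_{\text{impl}} = \sum_{o} w_o \cdot c_{\text{impl},o}(\bar{n})
\]

where $w_o$ is the profiled count of operation $o$, $c_{\text{impl},o}$ is that operation's fitted cost model for the implementation, and $\bar{n}$ is a representative collection size taken from profiling.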

\todo{Summary}