From cd8c8ddba45babdd60057bbc6714350b6b96ba67 Mon Sep 17 00:00:00 2001
From: Aria Shrimpton
Date: Tue, 30 Jan 2024 18:18:39 +0000
Subject: writing: design

---
 thesis/main.tex              |  6 +++---
 thesis/parts/design.tex      | 50 ++++++++++++++++++++++++++++++++++++++++++++
 thesis/parts/methodology.tex |  5 -----
 3 files changed, 53 insertions(+), 8 deletions(-)
 create mode 100644 thesis/parts/design.tex
 delete mode 100644 thesis/parts/methodology.tex

diff --git a/thesis/main.tex b/thesis/main.tex
index 16066cc..21de962 100644
--- a/thesis/main.tex
+++ b/thesis/main.tex
@@ -12,7 +12,7 @@
 
 %% Convenience macros
 \newcommand{\code}{\texttt}
-\newcommand{\todo}[1]{\colorbox{yellow}{#1} \newline}
+\newcommand{\todo}[1]{\colorbox{yellow}{TODO: #1} \newline}
 
 \begin{document}
 \begin{preliminary}
@@ -56,8 +56,8 @@ from the Informatics Research Ethics committee.
 
 \chapter{Background}
 \input {parts/background}
-\chapter{Methodology}
-\input{parts/methodology}
+\chapter{Design}
+\input{parts/design}
 
 \chapter{Implementation}
 \input{parts/implementation}
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
new file mode 100644
index 0000000..ce7fa8e
--- /dev/null
+++ b/thesis/parts/design.tex
@@ -0,0 +1,50 @@
+\todo{Introduction}
+\todo{Aims / expected input}
+
+\section{Overview of approach}
+
+We select a container implementation by:
+
+\begin{itemize}
+\item Getting a list of implementations that satisfy the program's functional requirements
+\item Estimating the cost of each operation, for each implementation, for any given collection size $n$
+\item Profiling the program to rank operation `importance'
+\item Combining the two to create an estimate of the relative cost of each implementation
+\end{itemize}
+
+\subsection{Cost Estimation}
+
+We use an approach similar to CollectionSwitch\parencite{costa_collectionswitch_2018}, which assumes that the main factor in an operation's execution time is the current size of the collection.
+
+Each operation has a separate cost model, which we build by executing the operation repeatedly on collections of various sizes.
+
+For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes and find the average execution time $t$ of \code{contains} at each size.
+
+We then use linear regression over powers of the collection size $n$ to predict $t$.
+For \code{Vec::contains}, we would expect the fitted polynomial to be roughly linear in $n$.
+
+This method works well for many operations and structures, although it has notable limitations.
+
+For example, the container implementation \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and only sorts them when an operation that relies on the order is called.
+
+This means that operations performed on a full container will appear much slower than they should, since they must first do the work (here, sorting) `deferred' by the benchmarking setup.
+To prevent this, we perform some untimed `warmup' operations.
+
+\todo{No the fuck we don't}
+
+Once we have the data, we fit a polynomial to it.
+Whilst we could use a more complex technique, in practice this is good enough: very few common operations are above $O(n^3)$, and terms such as $\log n$ are usually approximated closely enough by a low-degree polynomial.
+
+We cache this data for as long as the implementation is unchanged.
+Whilst it would be possible to share this data across computers, micro-architecture can have a large effect on collection performance\parencite{jung_brainy_2011}, so we instead calculate it on demand on each machine.
+
+\subsection{Profiling}
+
+As mentioned above, the ordering of operations can have a large effect on container performance.
+Unfortunately, tracking every container operation in order quickly becomes infeasible, so we settle for tracking the count of each operation and the size of the collection.
+
+Every instance of the collection is tracked separately, and results are collated after profiling.
+
+\todo{Combining}
+
+\todo{Summary}
diff --git a/thesis/parts/methodology.tex b/thesis/parts/methodology.tex
deleted file mode 100644
index 9269e77..0000000
--- a/thesis/parts/methodology.tex
+++ /dev/null
@@ -1,5 +0,0 @@
-\todo{Introduction}
-\todo{Overview of approach}
-\todo{Cost Estimation}
-\todo{Profiling}
-\todo{Extensions to primrose}
-- 
cgit v1.2.3
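The cost-model procedure described in the new design.tex (time an operation such as `Vec::contains` at several collection sizes, then fit a polynomial in n) can be sketched in Rust as below. This is a minimal illustration under stated assumptions, not the thesis's or Primrose's implementation: the helper names `time_contains`, `fit_polynomial` and `predict`, the chosen sizes, the repetition count and the cubic degree are all hypothetical. It solves the least-squares normal equations directly after rescaling n into [0, 1], which is adequate for the low degrees the text mentions (at most cubic) but would not be for high degrees. It also omits the untimed warmup the text discusses.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: average time in nanoseconds of `Vec::contains` on a
/// `Vec` of length `n`, searching for an absent value so the whole vector is scanned.
fn time_contains(n: usize, reps: u32) -> f64 {
    let v: Vec<u64> = (0..n as u64).collect();
    let start = Instant::now();
    for _ in 0..reps {
        black_box(v.contains(black_box(&u64::MAX)));
    }
    start.elapsed().as_nanos() as f64 / reps as f64
}

/// Hypothetical helper: least-squares fit of t ~ c0 + c1*n + ... + cd*n^d.
/// Returns [c0, ..., cd]. Assumes at least `degree + 1` distinct sizes.
fn fit_polynomial(ns: &[f64], ts: &[f64], degree: usize) -> Vec<f64> {
    assert_eq!(ns.len(), ts.len());
    let m = degree + 1;
    assert!(ns.len() >= m, "need at least degree + 1 samples");
    // Rescale n into [0, 1] (scale is at least 1) to keep the system well conditioned.
    let scale = ns.iter().cloned().fold(1.0_f64, f64::max);
    // Build the normal equations (X^T X) c = X^T t as an augmented matrix.
    let mut a = vec![vec![0.0; m + 1]; m];
    for (&n, &t) in ns.iter().zip(ts) {
        let x = n / scale;
        let powers: Vec<f64> = (0..m).map(|k| x.powi(k as i32)).collect();
        for i in 0..m {
            for j in 0..m {
                a[i][j] += powers[i] * powers[j];
            }
            a[i][m] += powers[i] * t;
        }
    }
    // Gaussian elimination with partial pivoting.
    for col in 0..m {
        let pivot = (col..m)
            .max_by(|&r1, &r2| a[r1][col].abs().partial_cmp(&a[r2][col].abs()).unwrap())
            .unwrap();
        a.swap(col, pivot);
        for row in (col + 1)..m {
            let factor = a[row][col] / a[col][col];
            for k in col..=m {
                let delta = factor * a[col][k];
                a[row][k] -= delta;
            }
        }
    }
    // Back substitution.
    let mut c = vec![0.0; m];
    for i in (0..m).rev() {
        let mut sum = a[i][m];
        for j in (i + 1)..m {
            sum -= a[i][j] * c[j];
        }
        c[i] = sum / a[i][i];
    }
    // Undo the rescaling: c[k] was fitted against (n / scale)^k.
    for (k, ck) in c.iter_mut().enumerate() {
        *ck /= scale.powi(k as i32);
    }
    c
}

/// Evaluate the fitted polynomial at collection size `n` (Horner's method).
fn predict(coeffs: &[f64], n: f64) -> f64 {
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * n + c)
}

fn main() {
    // Hypothetical benchmark sizes; real cost models would use many more samples.
    let sizes = [64usize, 128, 256, 512, 1024, 2048, 4096];
    let ns: Vec<f64> = sizes.iter().map(|&n| n as f64).collect();
    let ts: Vec<f64> = sizes.iter().map(|&n| time_contains(n, 1_000)).collect();
    let model = fit_polynomial(&ns, &ts, 3);
    println!("coefficients: {:?}", model);
    println!("predicted t(10000) = {:.1} ns", predict(&model, 10_000.0));
}
```

Running `main` prints the fitted coefficients and an extrapolated estimate. The actual timings depend on the machine, which is the reason the text gives for computing cost models locally rather than sharing them.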