From e091c37926281dbd5bf58b249d2d8d1b370897f2 Mon Sep 17 00:00:00 2001
From: Aria Shrimpton
Date: Fri, 29 Mar 2024 22:22:28 +0000
Subject: introduction, conclusion, and minor cleanup

---
 thesis/parts/conclusion.tex   | 16 +++++++++++++++-
 thesis/parts/design.tex       |  5 +++--
 thesis/parts/introduction.tex | 13 ++++++-------
 thesis/parts/results.tex      |  4 ++--
 4 files changed, 26 insertions(+), 12 deletions(-)

(limited to 'thesis/parts')

diff --git a/thesis/parts/conclusion.tex b/thesis/parts/conclusion.tex
index a00234d..e018efa 100644
--- a/thesis/parts/conclusion.tex
+++ b/thesis/parts/conclusion.tex
@@ -1 +1,15 @@
-\todo{Summarise, etc.}
+%% Presented a system accounting for functional and non-functional requirements
+We have presented an integrated system for container implementation selection, which can take into account the functional and non-functional requirements of the program it is working on.
+
+%% Ease of extending / flexibility
+Our system is extremely flexible, and can be easily extended with new container types and new functionality on those types, as we showed by adding associative collections and several new data types.
+
+%% Demonstrated predictive power of profiling and benchmarking, although limited testing
+We demonstrated that benchmarking of container implementations and profiling of target applications can be done separately and then combined to suggest the fastest container implementation for a particular program.
+We showed that this approach has merit, although our testing had notable limitations that future work should address.
+We also found that while linear regression is powerful enough for many cases, more research is required on how best to gather and preprocess data to capture an implementation's performance characteristics.
+
+%% Researched feasibility of adaptive containers, found issues with overhead and threshold detection
+We tested the effectiveness of switching container implementation as the $n$ value changes, and in doing so found several important factors to consider.
+%% Future work should focus on minimising overhead and finding the ideal threshold
+Future work should focus on minimising the overhead applied to every operation, as well as on finding the correct threshold at which to switch implementation.
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index fba4437..84643b1 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -138,11 +138,12 @@ We then perform regression, using the collection size $n$ to predict $t$.
 In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.
 
 In our implementation, we fit a function of the form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$, using regular least-squares fitting.
+Before fitting, we discard all observations that are more than one standard deviation from the mean for a given $n$ value.
+
 Whilst we could use a more complex technique, in practice this is good enough:
 Most common operations are polynomial at worst, and more complex models risk overfitting.
-\todo{mention discarding outliers}
+
 %% Limitations
 This method works well for many operations and structures, although it has notable limitations.
-
 In particular, implementations which defer work from one function to another will be extremely inconsistent.
 For example, \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and waits to sort the list until the contents of the list are read from (such as by using \code{contains}).
diff --git a/thesis/parts/introduction.tex b/thesis/parts/introduction.tex
index 068b1a5..aae5474 100644
--- a/thesis/parts/introduction.tex
+++ b/thesis/parts/introduction.tex
@@ -7,7 +7,7 @@ A common requirement when programming is the need to keep a collection of data t
 Often, programmers will have some requirements they want to impose on this collection, such as not storing duplicate elements, or storing the items in sorted order.
 
 %% **** Functionally identical implementations
-However, implementing these collection types manually is usually a waste of time, as is fine-tuning their implementation to perform better.
+However, implementing these collection types manually is usually a waste of time, as is fine-tuning a custom implementation to perform better.
 Most programmers will simply use one or two collection types provided by their language.
 
 %% **** Large difference in performance
@@ -16,18 +16,17 @@ The underlying implementation of container types which function the same can hav
 %% *** Motivate w/ effectiveness claims
 We propose a system, Candelabra, for the automatic selection of container implementations, based on both user-specified requirements and inferred requirements for performance.
-From our testing, we are able to select the best performing containers for a program, in significantly less time than brute force.
+In our testing, we are able to accurately select the best-performing containers for a program in significantly less time than brute force.
 
 %% *** Overview of aims & approach
 %% **** Ease of adding new container types
-We have designed our system with flexibility in mind --- adding new container implementations requires little effort.
+We have designed our system with flexibility in mind: adding new container implementations requires little effort.
 
 %% **** Ease of integration into existing projects
 It is easy to adopt our system incrementally, and we integrate with existing tools to make doing so easy.
 %% **** Scalability to larger projects
 The time it takes to select containers scales roughly linearly, even in complex cases, allowing our tool to be used even on larger projects.
 
 %% **** Flexibility of selection
-Our system is also able to suggest adaptive containers --- containers which switch underlying implementation as they grow.
-
-%% *** Overview of results
-\todo{Overview of results}
+Our system is also able to suggest adaptive containers: containers which switch underlying implementation as they grow.
+%% **** Overview of results
+Whilst we saw reasonable suggestions in our test cases, we found the overhead of switching and of checking the current implementation to be more of a problem than expected; future work could improve on this.
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index 6562d38..944ecd8 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -116,8 +116,8 @@ As the spread of points also appears to increase at larger $n$ values, its possi
 
 \code{HashSet} appears roughly linear as expected, with only a slow logarithmic rise, probably due to an increasing number of collisions.
 \code{BTreeSet} is consistently above it, with a slightly higher logarithmic rise.
 
-\code{BTreeMap} and \code{HashMap} both mimic their set counterparts, but with a slightly lower overall cost this time.
-\todo{It's unclear why this is.}
+\code{BTreeMap} and \code{HashMap} both mimic their set counterparts, but with a slightly lower cost and growth rate.
+It's unclear why this is, though it could be related to the larger spread in observations for both implementations.
 
 \subsection{Evaluation}
--
cgit v1.2.3
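The design.tex hunk above describes fitting a cost model of the form $x_0 + x_1 n + x_2 n^2 + x_3 \log_2 n$ by ordinary least squares, after discarding observations more than one standard deviation from the mean for each $n$. A minimal sketch of that procedure follows; the function names and the pure-Python normal-equations solver are our own illustration, not Candelabra's actual code:

```python
import math

def filter_outliers(obs):
    """Drop observations more than one standard deviation from the
    mean time for their n value (the pre-fit step from design.tex)."""
    by_n = {}
    for n, t in obs:
        by_n.setdefault(n, []).append(t)
    kept = []
    for n, ts in by_n.items():
        mean = sum(ts) / len(ts)
        std = math.sqrt(sum((t - mean) ** 2 for t in ts) / len(ts))
        kept.extend((n, t) for t in ts if abs(t - mean) <= std)
    return kept

def features(n):
    # Model: t = x0 + x1*n + x2*n^2 + x3*log2(n)
    return [1.0, float(n), float(n) ** 2, math.log2(n)]

def fit(obs):
    """Ordinary least squares via the normal equations (A^T A)x = A^T b,
    solved by Gaussian elimination with partial pivoting."""
    rows = [features(n) for n, _ in obs]
    ts = [t for _, t in obs]
    k = len(rows[0])
    # Augmented system [A^T A | A^T b].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, ts))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution.
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x

def predict(x, n):
    return sum(c * f for c, f in zip(x, features(n)))
```

With timings that are close to linear in $n$, the fit places essentially all of the weight on the constant and linear terms, matching the expectation stated for \code{Vec::contains}.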