Diffstat (limited to 'thesis/parts')
 thesis/parts/background.tex     | 10 +++++-----
 thesis/parts/design.tex         |  6 +++---
 thesis/parts/implementation.tex |  2 +-
 thesis/parts/results.tex        |  6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex
index a4383bc..f705aad 100644
--- a/thesis/parts/background.tex
+++ b/thesis/parts/background.tex
@@ -77,7 +77,7 @@ This means that developers are forced to guess based on their knowledge of the u
 \subsection{Rules-based approaches}
 
 One approach to the container selection problem is to allow the developer to make the choice initially, but use some tool to detect poor choices.
-Chameleon\parencite{shacham_chameleon_2009} uses this approach.
+Chameleon\citep{shacham_chameleon_2009} uses this approach.
 
 It first collects statistics from program benchmarks using a ``semantic profiler''.
 This includes the space used by collections over time and the counts of each operation performed.
@@ -94,13 +94,13 @@ This results in selection rules being more restricted than they otherwise could
 For instance, a rule cannot suggest a \code{HashSet} instead of a \code{LinkedList} as the two are not semantically identical.
 Chameleon has no way of knowing if doing so will break the program's functionality and so it does not make the suggestion.
 
-CoCo \parencite{hutchison_coco_2013} and work by \"{O}sterlund \parencite{osterlund_dynamically_2013} use similar techniques, but work as the program runs.
+CoCo \citep{hutchison_coco_2013} and work by \"{O}sterlund \citep{osterlund_dynamically_2013} use similar techniques, but work as the program runs.
 This works well for programs with different phases of execution, such as loading and then working on data.
 However, the overhead from profiling and from checking rules may not be worth the improvements in other programs, where access patterns are roughly the same throughout.
 
 \subsection{ML-based approaches}
 
-Brainy\parencite{jung_brainy_2011} gathers statistics similarly, however it uses machine learning (ML) for selection instead of programmed rules.
+Brainy\citep{jung_brainy_2011} gathers statistics similarly, however it uses machine learning (ML) for selection instead of programmed rules.
 
 ML has the advantage of being able to detect patterns a human may not be aware of.
 For example, Brainy takes into account statistics from hardware counters, which are difficult for a human to reason about.
@@ -108,7 +108,7 @@ This also makes it easier to add new collection implementations, as rules do not
 
 \subsection{Estimate-based approaches}
 
-CollectionSwitch\parencite{costa_collectionswitch_2018} is an online solution which adapts as the program runs and new information becomes available.
+CollectionSwitch\citep{costa_collectionswitch_2018} is an online solution which adapts as the program runs and new information becomes available.
 
 First, a performance model is built for each container implementation.
 This gives an estimate of some cost for each operation at a given collection size.
@@ -131,7 +131,7 @@ However, it does not take the collection size into account.
 Most of the approaches we have highlighted focus on non-functional requirements, and use programming language features to enforce functional requirements.
 We will now examine tools which focus on container selection based on functional requirements.
 
-Primrose \parencite{qin_primrose_2023} is one such tool, which uses a model-based approach.
+Primrose \citep{qin_primrose_2023} is one such tool, which uses a model-based approach.
 It allows the application developer to specify semantic requirements using a Domain-Specific Language (DSL), and syntactic requirements using Rust's traits.
 The semantic requirements are expressed as a list of predicates, each representing a semantic property.
diff --git a/thesis/parts/design.tex b/thesis/parts/design.tex
index 18efa9c..6d9c482 100644
--- a/thesis/parts/design.tex
+++ b/thesis/parts/design.tex
@@ -87,7 +87,7 @@ We now go into more detail on how each step works, although we leave some specif
 
 %% Explain role in entire process
 As described in Chapter \ref{chap:background}, any implementation we pick must satisfy the program's functional requirements.
-To do this, we integrate Primrose \parencite{qin_primrose_2023} as a first step.
+To do this, we integrate Primrose \citep{qin_primrose_2023} as a first step.
 
 Primrose allows users to specify both the traits they require in an implementation (essentially the API and methods available), and what properties must be satisfied.
 
@@ -126,7 +126,7 @@ Although we use primrose in our implementation, the rest of our system isn't dep
 \section{Cost Models}
 
 Now that we have a list of possible implementations, we need to understand the performance characteristics of each of them.
-We use an approach similar to CollectionSwitch\parencite{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.
+We use an approach similar to CollectionSwitch\citep{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.
 
 %% Benchmarks
 An implementation has a separate cost model for each operation, which we obtain by executing the operation repeatedly on collections of various sizes.
@@ -199,7 +199,7 @@ But when the size of the container grows, the cost of doing \code{contains} may
 
 Adaptive containers attempt to address this need, by starting off with one implementation (the low or before implementation), and switching to a new implementation (the high or after implementation) once the size of the container passes a certain threshold.
 
-This is similar to systems such as CoCo\parencite{hutchison_coco_2013} and in work by \"{O}sterlund\parencite{osterlund_dynamically_2013}.
+This is similar to systems such as CoCo\citep{hutchison_coco_2013} and in work by \"{O}sterlund\citep{osterlund_dynamically_2013}.
 However, we decide when to switch container implementation before the program is run, rather than as it is running.
 We also do so in a way that requires no knowledge of the implementation internals.
diff --git a/thesis/parts/implementation.tex b/thesis/parts/implementation.tex
index cd7b4b7..706dbd5 100644
--- a/thesis/parts/implementation.tex
+++ b/thesis/parts/implementation.tex
@@ -32,7 +32,7 @@ The library source can be found in \code{src/crates/library}.
     \code{SortedUniqueVec} & Vec kept in sorted order, with no duplicates \\
     \code{HashMap} & Hash map with quadratic probing \\
     \code{HashSet} & Hash map with empty values \\
-    \code{BTreeMap} & B-Tree\parencite{bayer_organization_1970} map with linear search. \\
+    \code{BTreeMap} & B-Tree\citep{bayer_organization_1970} map with linear search. \\
     \code{BTreeSet} & B-Tree map with empty values \\
   \end{tabular}
   \caption{Implementations in our library}
diff --git a/thesis/parts/results.tex b/thesis/parts/results.tex
index 747631d..dbc8b14 100644
--- a/thesis/parts/results.tex
+++ b/thesis/parts/results.tex
@@ -41,7 +41,7 @@ It's unsurprising that these two implementations are the cheapest, as they have
 This is likely due to hash collisions being more likely as the size of the collection increases.
 
 \code{BTreeSet} insertions are also expensive, however the cost appears to level out as the collection size goes up (a logarithmic curve).
-It's important to note that Rust's \code{BTreeSet}s are not based on binary tree search, but instead a more general tree search originally proposed by R. Bayer and E. McCreight\parencite{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an array.
+It's important to note that Rust's \code{BTreeSet}s are not based on binary tree search, but instead a more general tree search originally proposed by R. Bayer and E. McCreight\citep{bayer_organization_1970}, where each node contains $B-1$ to $2B-1$ elements in an array.
 
 Our two mapping types, \code{BTreeMap} and \code{HashMap}, mimic the behaviour of their set counterparts.
@@ -92,7 +92,7 @@ This is possibly a case of overfitting, as the observations for both implementat
 
 \code{HashSet} appears roughly linear as expected, with only a slow logarithmic rise, probably due to an increasing amount of collisions.
 \code{BTreeSet} is consistently above it, with a slightly higher logarithmic rise.
-The standard library documentation states that searches are expected to take $B\log(n)$ comparisons on average\parencite{rust_documentation_team_btreemap_2024}, which is in line with observations.
+The standard library documentation states that searches are expected to take $B\log(n)$ comparisons on average\citep{rust_documentation_team_btreemap_2024}, which is in line with observations.
 
 \code{BTreeMap} and \code{HashMap} both mimic their set counterparts, though are more expensive in most places.
 This is probably due to the increased size more quickly exhausting CPU cache.
@@ -120,7 +120,7 @@ Future improvements could address the overfitting problems some operations had,
 Our test cases broadly fall into two categories: example cases, which just repeat a few operations many times, and our `real' cases, which are implementations of common algorithms and solutions to programming puzzles.
 We expect the results from our example cases to be relatively unsurprising, while our real cases are more complex and harder to predict.
 
-Most of our real cases are solutions to puzzles from Advent of Code\parencite{wastl_advent_2015}, a popular collection of programming puzzles.
+Most of our real cases are solutions to puzzles from Advent of Code\citep{wastl_advent_2015}, a popular collection of programming puzzles.
 Table \ref{table:test_cases} lists and briefly describes our test cases.
 
 \begin{table}[h!]
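For reference, the substantive change throughout this commit is mechanical: every biblatex-native \parencite{...} becomes the natbib-style \citep{...}. A minimal sketch of the two spellings, assuming an authoryear setup (the thesis preamble itself is outside this diff):

    % \parencite is biblatex's own parenthetical citation command:
    \usepackage[style=authoryear]{biblatex}
    Chameleon\parencite{shacham_chameleon_2009} uses this approach.

    % \citep comes from the natbib package, or from biblatex's
    % natbib compatibility option:
    \usepackage[style=authoryear, natbib=true]{biblatex}
    Chameleon\citep{shacham_chameleon_2009} uses this approach.

With either command the rendered output is a parenthetical citation along the lines of "(Shacham et al. 2009)".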
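The design.tex hunks sit inside the thesis's cost-model and adaptive-container discussion, which this diff shows only in fragments. As a worked sketch of the idea, with the model form assumed for illustration rather than taken from the thesis: each operation's benchmarked cost is fitted as a function of the collection size $n$,

    % Hypothetical fitted cost of one operation on a collection of
    % size n; the choice of basis functions here is an assumption.
    C_{op}(n) \approx c_0 + c_1 n + c_2 \log_2 n

and an adaptive container would switch from its ``low'' to its ``high'' implementation at the first size $n^*$ where the high implementation's estimated cost, weighted by how often each operation occurs ($f_{op}$), becomes the cheaper option:

    n^* = \min\{\, n : \sum_{op} f_{op} \, C^{high}_{op}(n) < \sum_{op} f_{op} \, C^{low}_{op}(n) \,\}

As the design.tex fragment notes, this decision is made before the program runs, unlike CoCo's runtime switching.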
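The implementation.tex table describes the library's \code{HashMap} as a hash map with quadratic probing. For readers unfamiliar with the term, the standard formulation is below; the constants are illustrative, not taken from the thesis code:

    % Probe sequence for key k in a table of m slots: on the i-th
    % collision, step quadratically instead of to the adjacent slot.
    slot(k, i) = (h(k) + c_1 i + c_2 i^2) \bmod m

Quadratic probing disperses colliding keys faster than linear probing, avoiding primary clustering; this fits the results.tex observation that insertion cost rises as collisions become more likely in larger collections.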
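The $B\log(n)$ search cost quoted from the standard library documentation in results.tex follows from the node layout described in the same hunk. A short derivation, hedged in that the documentation's exact constants and conventions may differ:

    % Each node holds B-1 to 2B-1 sorted elements, so the branching
    % factor is roughly B and the tree height is about log_B(n):
    h \approx \log_B n = \frac{\log_2 n}{\log_2 B}

    % Linear search within a node costs at most 2B-1 comparisons,
    % so a lookup performs on the order of:
    (2B - 1)\log_B n = O\!\left(\tfrac{B}{\log_2 B}\,\log_2 n\right) \text{ comparisons}

For fixed $B$ this is logarithmic in $n$, matching the levelling-off observed for both \code{BTreeSet} searches and insertions.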