Diffstat (limited to 'thesis/parts/background.tex')
-rw-r--r-- | thesis/parts/background.tex | 93 |
1 files changed, 44 insertions, 49 deletions
diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex index 7b995b0..630e253 100644 --- a/thesis/parts/background.tex +++ b/thesis/parts/background.tex @@ -1,27 +1,28 @@ In this chapter, we provide an overview of the problem of container selection, and its effect on program correctness and performance. -We then provide an overview of how modern programming languages approach this problem, and how existing literature differs. -Finally, we examine the gaps in the existing literature, and explain how this paper aims to contribute to it. +We then provide an overview of how modern programming languages approach this problem, and how existing literature contributes. +Finally, we examine the gaps in the existing literature, and explain how this paper aims to contribute. \section{Container Selection} The vast majority of programs will make extensive use of collection data types - types intended to hold many different instances of other data types. This can refer to anything from fixed-size arrays, to growable linked lists, to associative key-value mappings or dictionaries. -In some cases, these are built-in parts of the language: In Go, a list of ints has type \code{[]int} and a map from string to string has type \code{map[string]string}. -In other languages, these are instead part of some standard library, or in some cases must be defined by the user. +In some cases, core collections are built-in parts of the language: In Go, a growable list (or vector) of ints has type \code{[]int}. + +In other languages, vectors are instead part of some standard library, or must be defined by the user. In Rust, you might write \code{Vec<isize>} and \code{HashMap<String, String>} for the same purpose. This forces us to make a choice upfront: what type should we use? In this case the answer is obvious - the two have very different purposes and don't support the same operations.
However, if we were to consider \code{Vec<isize>} and \code{HashSet<isize>}, the answer is much less obvious. If we care about the ordering, or about preserving duplicates, then we must use \code{Vec<isize>}. -But if we don't, then \code{HashSet<isize>} may be more performant - for example if we use \code{contains} a lot. +But if we don't, then \code{HashSet<isize>} might be more performant if we use \code{contains} a lot. -We refer to this problem as container selection, and split it into two parts: Functional requirements, and non-functional requirements. +We refer to this problem as container selection, and say that we must satisfy both functional requirements, and non-functional requirements. \subsection{Functional requirements} -Functional requirements refers to a similar definition as is normally used for software: The container must behave the way that the program expects it to. +The functional requirements tell us how the container will be used, and how it must behave. Continuing with our previous example, we can see that \code{Vec} and \code{HashSet} implement different methods. \code{Vec} implements methods like \code{.get(index)} and \code{.push(value)}, while \code{HashSet} implements neither - they don't make sense for an unordered collection. @@ -33,10 +34,9 @@ In object-oriented programming, we might say they must implement an interface. In Rust, we would say that they implement a trait, or that they belong to a type class. However, syntactic properties alone are not always enough to select an appropriate container. -Suppose our program only requires a container to have \code{.insert(value)}, \code{.contains(value)}, and \code{.len()}. -Both \code{Vec} and \code{HashSet} will satisfy these requirements. -However, our program might rely on \code{.len()} returning a count including duplicates. -In this case, \code{HashSet} would give us different behaviour, possibly causing our program to behave incorrectly. 
+Suppose our program only requires a container to have \code{.insert(value)} and \code{.len()}. +Both \code{Vec} and \code{HashSet} will satisfy these requirements; however, our program might rely on \code{.len()} returning a count including duplicates. +In this case, \code{HashSet} would give us different behaviour, causing our program to behave incorrectly. To express this, we say that a container implementation also has ``semantic properties'' that must satisfy our requirements. Intuitively we can think of this as what conditions the container upholds. @@ -44,42 +44,44 @@ For a \code{HashSet}, this would include that there are never any duplicates, wh \subsection{Non-functional requirements} -While meeting the functional requirements is generally enough to ensure a program runs correctly, we also want to ensure we choose the 'best' type we can. -For our purposes, this will simply be the type that minimises runtime, although other approaches also consider the balance between memory usage and time. +While meeting the functional requirements is generally enough to ensure a program runs correctly, we also want to ensure we choose the 'best' type that we can. +For our purposes, we will only consider program run time, although other approaches also consider the balance between memory usage and time. Prior work has shown that properly considering container selection can give substantial performance improvements, even in large applications. For instance, tuning performed in \cite{chung_towards_2004} achieved an up to 70\% increase in the throughput of a complex web application, and a 15-40\% decrease in the runtime of several scientific applications. -\cite{l_liu_perflint_2009} found and suggested fixes for ``hundreds of suboptimal patterns in a set of large C++ benchmarks'', with one such case improving performance by 17\%.
+\cite{l_liu_perflint_2009} found and suggested fixes for ``hundreds of suboptimal patterns in a set of large C++ benchmarks,'' with one such case improving performance by 17\%. Similarly, \cite{jung_brainy_2011} achieves an average speedup of 27-33\% on real-world applications and libraries. -If we assume we can find a selection of types that satisfy the functional requirements, then one obvious solution is just to benchmark the program with each of these implementations in place, and see which works best. +If we assume we can find a selection of types that satisfy our functional requirements, then one obvious solution is to benchmark the program with each of these implementations in place, and see which works best. This will obviously work, so long as our benchmarks are roughly representative of 'real world' inputs. Unfortunately, this technique scales poorly for bigger applications. As the number of container types we must select increases, the number of combinations we must try increases exponentially (assuming they all have roughly the same number of candidates). -This quickly becomes unfeasible, and so we must find other ways of improving our performance. +This quickly becomes unfeasible, and so we must find other selection methods. \section{Prior Literature} +This section outlines the options available in current programming languages, and in existing literature. + \subsection{Approaches in common programming languages} Modern programming languages broadly take one of two approaches to container selection. Some languages, usually higher-level ones, recommend built-in structures as the default, using implementations that perform fine for the vast majority of use-cases. -Popular examples include Python, which uses \code{[1, 2, 3]} and \code{\{'one': 1\}} for lists and maps respectively; and Go, which uses \code{int[]\{1, 2, 3\}} and \code{map[string]int\{"one": 1\}} for the same purposes. 
-This approach prioritises developer ergonomics: programmers writing in these languages do not need to think about how these are implemented in the vast majority of cases. -In both languages, other implementations are possible to a certain extent, although these aren't usually preferred and come at the cost of code readability. +One popular example is Python, which uses dynamic arrays as its default implementation. +This approach prioritises developer ergonomics: Programmers do not need to think about how these are implemented. +Usually, other implementations are possible, but are used only when needed and come at the cost of code readability. In other languages, collections are given as part of a standard library, or must be written by the user. -For example, C does not support growable lists at the language level - users must bring in their own implementation or use an existing library. -Java comes with growable lists and maps as part of its standard library, as does Rust (with some macros to make use easier). +Java comes with growable lists as part of its standard library, as does Rust (with some macros to make use easier). In both cases, the ``blessed'' implementation of collections is not special - users can implement their own. -In many languages, interfaces or their closest equivalent are used to distinguish 'similar' collections. +Often, interfaces or their closest equivalent are used to distinguish 'similar' collections. In Java, ordered collections implement the interface \code{List<E>}, while similar interfaces exist for \code{Set<E>}, \code{Queue<E>}, etc. This means that when the developer chooses a type, the compiler enforces the syntactic requirements of the collection, and the writer of the implementation ``promises'' they have met the semantic requirements. -Other languages give much weaker guarantees, for instance Rust has no typeclasses for List or Set.
-Its closest equivalents are traits like \code{Index<I>} and \code{IntoIterator}, neither of which make semantic guarantees. + +Many other languages give much weaker guarantees, for instance Rust has no typeclasses for List or Set. +Its closest equivalents are traits like \code{Index<I>} and \code{IntoIterator}, neither of which have particularly strong semantic guarantees. Whilst the approach Java takes is the most expressive, both of these approaches either put the choice on the developer, or remove the choice entirely. This means that developers are forced to guess based on their knowledge of the underlying implementations, or more often to just pick the most common implementation. @@ -89,47 +91,33 @@ The papers we will examine all attempt to choose for the developer, based on a v Chameleon\parencite{shacham_chameleon_2009} is a tool for Java codebases, which uses a rules engine to identify sub-optimal choices. -First, it runs the program with some representative input, and collects data on the collections used using a ``semantic profiler''. -This data includes the space used by collections, the minimum space that could be used by all of the items of that collection, and the counts of each operation performed. +It works by collecting data from benchmarks using a ``semantic profiler''. +This data includes the space used by collections over time, and the counts of each operation performed. These statistics are tracked per individual collection allocated, and then aggregated by 'allocation context' - a portion of the callstack where the allocation occured. These aggregated statistics are then passed to a rules engine, which uses a set of rules to suggest places a different container type might improve performance. -For example, a rule could check when a linked list often has items accessed by index, and suggest a different list implementation as a replacement. 
This results in a flexible engine for providing suggestions, which can be extended with new rules and types as necessary. Unfortunately, this does require the developer to come up with and add replacement rules for each implementation. In many cases, there may be patterns that could be used to suggest a better option, but that the developer does not see or is not able to formalise. -Chameleon also relies only on the existing type to decide what it can suggest. +To satisfy functional requirements, Chameleon only suggests new types that behave identically to the existing type. This results in selection rules needing to be more restricted than they otherwise could be. For instance, a rule cannot suggest a \code{HashSet} instead of a \code{LinkedList}, as the two are not semantically identical. Chameleon has no way of knowing if doing so will break the program's functionality, and so it does not make a suggestion. \subsection{Brainy} -%% - uses ai model to predict based on target microarchitecture, and runtime behaviour -%% - uses access pattersn, etc. -%% - also assumes semantically identical set of candidates -%% - uses application generator for training data -%% - focuses on the performance difference between microarchitectures -%% - intended to be run at each install site - -Brainy\parencite{jung_brainy_2011} also focuses on non-functional requirements, but uses Machine Learning techniques instead of set rules. +Brainy\parencite{jung_brainy_2011} also focuses on non-functional requirements, but uses machine learning techniques instead of defined rules. Similar to Chameleon, Brainy runs the program with example input, and collects statistics on how collections are used. Unlike Chameleon, these statistics include some hardware counters, such as cache utilisation and branch misprediction rate. 
-This profiling information is then fed to an ML model, which predicts the implementation likely to be most performant for the specific program and microarchitecture, from the models that the model was trained to use. +This profiling information is then fed to an ML model, which predicts the best implementation. Of the existing literature, Brainy appears to be the only method which directly accounts for hardware factors. The authors propose that their tool can be run at install-time for each target system, and then used by developers or by applications integrated with it to select the best data structure for that hardware. -This allows it to compensate for the differences in performance that can come from different hardware configurations - for instance, the size of the cache may affect the performance of a growable list compared to a linked list. -The paper itself demonstrates the effectiveness of this, stating that ``On average, 43\% of the randomly generated applications have different optimal data structures [across different architectures]''. - -The model itself is trained on a dataset of randomly generated applications, which are randomly generated sequences of operations. -This is intended to avoid overfitting on specific applications, as a large number of applications with different characteristics can be generated. -However, the applications generated are unlikely to be representative of real applications. -In practice, there are usually patterns of certain combinations that are repeated, meaning the next operation is never truly random. +The paper itself demonstrates the effectiveness of this, finding that ``On average, 43\% of the randomly generated applications have different optimal data structures [across different architectures].'' Brainy determines which types satisfy the functional requirements based on the original data structure (vector, list, set), and whether the order is ever used. 
This allows for a bigger pool of containers to choose from, for instance a vector can also be swapped for a set in some circumstances. @@ -143,15 +131,22 @@ First, a performance model is built for each container implementation. This is done by performing each operation many times in succession, varying the length of the collection. This data is used to fit a polynomial, which gives an estimate of cost per operation at a given n. -The total cost for each collection type is then calculated for each individual instance over its lifetime. -If switching to another implementation will drop the average total cost more than a certain threshold, then CollectionSwitch will start using that collection for newly allocated instances, and may also switch existing instances over to it. +This is then combined with the frequency of each operation to give cost estimates for each collection type, operation, and 'cost dimension' (time and space). +Rules then decide when switching to a new implementation is worth it based on these cost estimates and defined thresholds. By generating a cost model based on benchmarks, CollectionSwitch manages to be more flexible than other rules-based approaches such as Chameleon. - -%% TODO: comment on functional selection +It expects applications to use Java's \code{List}, \code{Set}, and \code{Map} interfaces, which express enough functional requirements for most problems. \subsection{Primrose} -%% TODO +Primrose \parencite{qin_primrose_2023} focuses on the functional requirements of container selection. + +It allows the application developer to specify both syntactic and semantic requirements using a Lisp DSL. +The available implementations are then checked against these requirements using an SMT solver, to obtain a set of usable implementations. + +Developers must then choose which of these implementations will work best for their non-functional requirements.
+ +This allows developers to express any combination of semantic requirements, rather than limiting them to common ones, as Java's approach does. +It can also be extended with new implementations as needed, although this does require modelling the semantics of the new implementation. \section{Contributions}