author | Aria <me@aria.rip> | 2023-10-20 18:20:50 +0100 |
---|---|---|
committer | Aria <me@aria.rip> | 2023-10-20 18:20:50 +0100 |
commit | da823974b3e417bef457e26c49113f90f1075be1 (patch) | |
tree | 742810e9d49a4a1cf0b6495b414b1cfdc40bd938 /thesis/parts | |
parent | 718d7490f4e1a8b3cad6eac94f74765b8fe13f3d (diff) |
first draft of background done
Diffstat (limited to 'thesis/parts')
-rw-r--r-- | thesis/parts/background.tex | 26 |
1 files changed, 17 insertions, 9 deletions
diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex
index 8977152..a0326df 100644
--- a/thesis/parts/background.tex
+++ b/thesis/parts/background.tex
@@ -6,7 +6,7 @@ We then provide an overview of approaches from modern programming languages and
 The vast majority of programs will make extensive use of collection data types --- types intended to hold multiple instances of other types.
 In many languages, the standard library provides a variety of collections, forcing users to choose which is best for their program.
-Consider the Rust types \code{Vec<T>}, a dynamic array, and \code{HashSet<T>}, a hash-based set.
+Consider the Rust types \code{Vec}, a dynamic array, and \code{HashSet}, a hash-based set.
 If a user cares about ordering, or about preserving duplicates, then they must use \code{Vec<T>}.
 But if they do not, then \code{HashSet<T>} might be more performant, provided \code{contains} is used repeatedly.
@@ -34,7 +34,7 @@ For a \code{HashSet}, this would include that there are never any duplicates, wh
 \subsection{Non-functional requirements}
-While meeting the functional requirements should ensure our program runs correctly, we also want to choose the 'best' type that we can, striking an ideal balance between runtime and memory usage.
+While meeting the functional requirements should ensure our program runs correctly, we also want to choose the 'best' type that we can, striking a balance between runtime and memory usage.
 Prior work has demonstrated that proper container selection can result in substantial performance improvements.
 \cite{l_liu_perflint_2009} found and suggested fixes for ``hundreds of suboptimal patterns in a set of large C++ benchmarks,'' with one such case improving performance by 17\%.
@@ -49,7 +49,7 @@ This quickly becomes unfeasible, so we must explore other selection methods.
 \section{Prior literature}
-In this section we outline methods for container selection available within and outside of current programming languages and their limitations based on existing literature on the topic.
+In this section we outline methods for container selection available within and outside of current programming languages and their limitations.
 \subsection{Approaches in common programming languages}
@@ -62,9 +62,9 @@ Often other implementations are possible, but are used only when needed and come
 In other languages, collections are given as part of a standard library or must be written by the user.
 Java comes with growable lists as part of its standard library, as does Rust.
-In both cases, the ``blessed'' implementation of collections is not special --- users can implement their own and use them in the same ways.
+In both cases, the standard library implementation is not special --- users can implement their own and use them in the same ways.
-Often interfaces, or their closest equivalent, are used to distinguish 'similar' collections.
+Often interfaces, or their closest equivalent, are used to abstract over 'similar' collections.
 In Java, ordered collections implement the interface \code{List<E>}, with similar interfaces for \code{Set<E>}, \code{Queue<E>}, etc.
 This allows most code to be implementation-agnostic, however the developer must still choose a concrete implementation at some point.
@@ -119,10 +119,9 @@ By generating a cost model based on benchmarks, CollectionSwitch manages to be m
 Like ML approaches, adding new implementations requires little extra work, but has the advantage of being possible without having to re-train a model.
 A similar approach is used by \cite{l_liu_perflint_2009} for the C++ standard library.
-It focuses on measuring more fine-grained operations, such as list resizing.
+It focuses on measuring the cost and frequency of more fine-grained operations, such as list resizing.
 However, it does not take the collection size into account.
-
 \subsection{Functional requirements}
 Most of the approaches we have highlighted focus on non-functional requirements, and use programming language features to enforce functional requirements.
@@ -139,9 +138,18 @@ A constraint solver then checks if a given implementation will always meet the c
 This allows developers to express any combination of semantic requirements, rather than limiting them to common ones (as in Java's approach).
 It can also be extended with new implementations as needed, though this does require modelling the semantics of the new implementation.
-\cite{franke_collection_2022} also uses the idea of refinement types, but is limited to properties defined by the library authors and implemented on the container implementations.
+\cite{franke_collection_2022} uses a similar idea, but is limited to properties defined by the library authors and implemented on the container implementations.
 To select the final container implementation, both tools rely on benchmarking each candidate.
 As we note above, this scales poorly.
-We will be creating a container selection method that primarily uses the Primrose approach while incorporating elements of CollectionSwitch's approach in order to combat the issue of scaling that many existing implementations face.
+\section{Contribution}
+
+We aim to create a container selection method that addresses both functional and non-functional requirements in a scalable way.
+
+Primrose will be used as the first step, in order to select candidates based on functional requirements.
+We will then collect statistics from user-provided benchmarks, and create cost estimates for each candidate, similar to CollectionSwitch.
+Unlike CollectionSwitch, this will be done offline, rather than as the program is running.
+
+This will provide an integrated and flexible solution that addresses both functional and non-functional requirements.
+We believe no such solution exists in current literature.