Diffstat (limited to 'thesis/parts/background.tex')
-rw-r--r-- | thesis/parts/background.tex | 43 |
1 file changed, 21 insertions, 22 deletions
diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex index 825c986..38cadac 100644 --- a/thesis/parts/background.tex +++ b/thesis/parts/background.tex @@ -1,29 +1,31 @@ -\todo{Integrate with rest of paper} - In this chapter we provide an overview of the problem of container selection and its effect on program correctness and performance. We then provide an overview of approaches from modern programming languages and existing literature. +Finally, we explain how our system is novel and which weaknesses in existing literature it addresses. \section{Container Selection} The vast majority of programs will make extensive use of collection data types --- types intended to hold multiple instances of other types. -In many languages, the standard library provides a variety of collections, forcing users to choose which is best for their program. -Consider the Rust types \code{Vec}, a dynamic array, and \code{HashSet}, a hash-based set. -If a user cares about ordering, or about preserving duplicates, then they must use \code{Vec<T>}. -But if they do not, then \code{HashSet<T>} might be more performant, provided \code{contains} is used repeatedly. +In many languages, the standard library provides a variety of collections, from which users can choose the one best suited to their program. +This saves users a lot of time; however, selecting the best type is not always straightforward. + +Consider a program that needs to store and query a set of numbers, and does not care about ordering or duplicates. +If the number of items ($n$) is small enough, it might be fastest to use a dynamic array and scan through it each time we want to check whether a number is present. +On the other hand, if the set is much larger, we may want the expected constant-time lookups provided by hash sets, at the cost of a higher fixed overhead per operation, such as hashing each value. -We refer to this problem as container selection, and say that we must satisfy both functional requirements and non-functional requirements. 
+In this case, there are two factors driving our decision. +Our functional requirement is that ordering and duplicates do not matter, and our non-functional requirement is that the program should be fast. \subsection{Functional requirements} -The functional requirements tell us how the container will be used and how it must behave. +Functional requirements tell us how the container will be used and how it must behave. +Continuing with our previous example, we will compare Rust's \code{Vec} type (a dynamic array) with the \code{HashSet} type (a hash-based set). -Continuing with our previous example, we can see that \code{Vec} and \code{HashSet} implement different methods. -\code{Vec} implements \code{.get(index)}, while \code{HashSet} does not; this would not be possible for an unordered collection. -If we attempt to replace \code{Vec} with \code{HashSet}, the resulting program will likely not compile. -We will call the operations a container implements the ``syntactic properties'' of the container. -In object-oriented programming, we might say they must implement an ``interface'', while in Rust, we would say that they implement a ``trait''. +Note that the two types have different methods: \code{Vec} implements \code{.get(index)}, while \code{HashSet} does not, since indexing makes no sense for an unordered collection. +If we were building a program that needed an ordered collection, it would probably no longer compile after replacing \code{Vec} with \code{HashSet}. +We will call the operations a container provides the ``syntactic properties'' of the container. +In object-oriented programming, we might say the container implements an ``interface'', while in Rust we would say that it implements a ``trait''. However, syntactic properties alone are not always enough to select an appropriate container. Suppose our program only requires a container to have \code{.insert(value)} and \code{.len()}. 
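The following Rust sketch makes the situation just described concrete: a program that needs only \code{.insert(value)} and \code{.len()}. It defines a hypothetical \code{Container} trait, which is our own illustration and not part of the standard library. The trait is implemented for \code{Vec} and \code{HashSet} and used from a hypothetical generic helper, \code{insert_all_and_count}. Both types satisfy the trait, so the helper compiles with either. However, they disagree on \code{len} once a duplicate is inserted, because \code{Vec} keeps duplicates and \code{HashSet} discards them.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical trait capturing the only syntactic properties our example
/// program needs: `.insert(value)` and `.len()`.
pub trait Container<T> {
    fn insert(&mut self, value: T);
    fn len(&self) -> usize;
}

impl<T> Container<T> for Vec<T> {
    fn insert(&mut self, value: T) {
        // Vec has no single-argument insert, so we append instead.
        self.push(value);
    }

    fn len(&self) -> usize {
        // Fully qualified call: resolves to Vec's inherent `len`.
        Vec::len(self)
    }
}

impl<T: Eq + Hash> Container<T> for HashSet<T> {
    fn insert(&mut self, value: T) {
        // The inherent method reports whether the value was new; we ignore
        // that, so duplicates are silently dropped.
        HashSet::insert(self, value);
    }

    fn len(&self) -> usize {
        HashSet::len(self)
    }
}

/// Hypothetical client code. It type-checks for any `Container`, so the
/// compiler cannot tell us which container preserves the behaviour we want.
pub fn insert_all_and_count<C: Container<i32>>(container: &mut C, items: &[i32]) -> usize {
    for &item in items {
        container.insert(item);
    }
    container.len()
}
```

For example, inserting 3, 1, 3, 2, 1 gives a length of 5 with a \code{Vec} but 3 with a \code{HashSet}. The contrast with membership queries is worth noting. There, \code{Vec::contains} (a linear scan) and \code{HashSet::contains} (a hash lookup) give the same answers, so only the non-functional requirements separate the two types.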
@@ -75,7 +77,7 @@ This means that developers are forced to guess based on their knowledge of the u \subsection{Rules-based approaches} One approach to the container selection problem is to allow the developer to make the choice initially, but use some tool to detect poor choices. -Chameleon\parencite{shacham_chameleon_2009} is a solution of this type. +Chameleon \parencite{shacham_chameleon_2009} uses this approach. It first collects statistics from program benchmarks using a ``semantic profiler''. This includes the space used by collections over time and the counts of each operation performed. @@ -92,8 +94,7 @@ This results in selection rules being more restricted than they otherwise could For instance, a rule cannot suggest a \code{HashSet} instead of a \code{LinkedList} as the two are not semantically identical. Chameleon has no way of knowing if doing so will break the program's functionality and so it does not make the suggestion. -%% TODO: Don't use citations as nouns -\cite{hutchison_coco_2013} and \cite{osterlund_dynamically_2013} use similar techniques, but work as the program runs. +CoCo \parencite{hutchison_coco_2013} and work by \"{O}sterlund \parencite{osterlund_dynamically_2013} use similar techniques, but apply them while the program is running. This works well for programs with different phases of execution, such as loading and then working on data. However, the overhead from profiling and from checking rules may not be worth the improvements in other programs, where access patterns are roughly the same throughout. @@ -148,11 +149,9 @@ As we note above, this scales poorly. \section{Contribution} -We aim to create a container selection method that addresses both functional and non-functional requirements in a scalable way. +We aim to create a container selection method that addresses both functional and non-functional requirements. -Primrose will be used as the first step, in order to select candidates based on functional requirements. 
-We will then collect statistics from user-provided benchmarks and create cost estimates for each candidate, similarly to CollectionSwitch. -Unlike CollectionSwitch, this will be done offline rather than as the program is running. +Users should be able to specify their functional requirements in a way that is expressive enough for most use cases and easy to integrate with existing projects. +It should also be easy to add new container implementations, and the method should scale to large projects without selection time becoming an issue. -This will provide an integrated and flexible solution that addresses both functional and non-functional requirements. -We believe no such solution exists in current literature. +We focus on offline container selection (performed before the program is compiled); however, we also attempt to detect when switching implementations at runtime is desirable.
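The rules-based approach described above for Chameleon can be illustrated with a small Rust sketch, under clearly hypothetical assumptions:

- The \code{Profile} counters stand in for the operation counts a semantic profiler might gather.
- The rules and their conditions are invented for illustration and are not Chameleon's actual rules. Chameleon targets Java collections; Rust names are used only for consistency with this chapter.
- \code{Kind}, \code{suggest} and \code{semantically_equivalent} are our own names.

The important part is the final filter. A rule may fire, but its suggestion is discarded unless the replacement is semantically equivalent to the current collection. So in this sketch, as in the text, a \code{LinkedList} is never replaced by a \code{HashSet}.

```rust
/// Collection kinds known to this hypothetical selector.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Vec,
    VecDeque,
    LinkedList,
    HashSet,
}

/// Hypothetical operation counts gathered by profiling one collection.
#[derive(Debug, Default)]
pub struct Profile {
    pub push_back: u64,
    pub push_front: u64,
    pub index_get: u64,
    pub contains: u64,
}

/// Ordered sequences that keep duplicates can replace one another without
/// changing behaviour; a set cannot stand in for a sequence (or vice versa).
fn semantically_equivalent(from: Kind, to: Kind) -> bool {
    let is_sequence = |k: Kind| matches!(k, Kind::Vec | Kind::VecDeque | Kind::LinkedList);
    from == to || (is_sequence(from) && is_sequence(to))
}

/// Applies made-up selection rules in order and returns the first suggestion
/// that is semantically safe, or `None` if the current choice should stay.
pub fn suggest(current: Kind, p: &Profile) -> Option<Kind> {
    // Each rule pairs a condition on the profile with a proposed replacement.
    let rules = [
        // Front insertion is O(n) on Vec but amortised O(1) on VecDeque.
        (current == Kind::Vec && p.push_front > p.push_back, Kind::VecDeque),
        // Frequent membership tests would favour a hash set...
        (current == Kind::LinkedList && p.contains > p.push_back, Kind::HashSet),
        // Positional access is O(n) on a linked list but O(1) on Vec.
        (current == Kind::LinkedList && p.index_get > 0, Kind::Vec),
    ];
    rules
        .iter()
        .filter(|(applies, _)| *applies)
        .map(|&(_, to)| to)
        // ...but a suggestion that could change the program's behaviour
        // is never made.
        .find(|&to| semantically_equivalent(current, to))
}
```

Consider a profile dominated by membership tests on a \code{LinkedList}. The hash-set rule fires, but its suggestion is filtered out, so \code{suggest} returns \code{None} unless a later, semantics-preserving rule (here, switching to \code{Vec} for positional access) also applies.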