author | Aria Shrimpton <me@aria.rip> | 2024-02-17 16:02:03 +0000 |
---|---|---|
committer | Aria Shrimpton <me@aria.rip> | 2024-02-19 21:28:03 +0000 |
commit | 1edff7368236365c8db83e87c2110c42292b2907 (patch) | |
tree | a775f027e8913d0a7a6ee516750715fa26343341 /thesis | |
parent | 57c443ab2c13923812524176a0dc7f7e3fe3fc29 (diff) |
introduction & redraft background
Diffstat (limited to 'thesis')
-rw-r--r-- | thesis/main.tex | 6 |
-rw-r--r-- | thesis/parts/background.tex | 43 |
-rw-r--r-- | thesis/parts/introduction.tex | 34 |
3 files changed, 55 insertions, 28 deletions
diff --git a/thesis/main.tex b/thesis/main.tex
index d16e956..bd7f755 100644
--- a/thesis/main.tex
+++ b/thesis/main.tex
@@ -4,15 +4,15 @@
 \documentclass[logo,bsc,singlespacing,parskip]{infthesis}
 \usepackage{ugcheck}
-\usepackage{microtype}
 \usepackage[dvipsnames]{xcolor}
+\usepackage{microtype}
 \usepackage[style=numeric]{biblatex}
 \addbibresource{biblio.bib}
 %% Convenience macros
-\newcommand{\code}{\texttt}
-\newcommand{\todo}[1]{\colorbox{yellow}{TODO: #1} \newline}
+\newcommand{\code}{\lstinline}
+\newcommand{\todo}[1]{\colorbox{yellow}{TODO: #1} \par}
 %% Code blocks
 \usepackage{listings}
diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex
index 825c986..38cadac 100644
--- a/thesis/parts/background.tex
+++ b/thesis/parts/background.tex
@@ -1,29 +1,31 @@
-\todo{Integrate with rest of paper}
-
 In this chapter we provide an overview of the problem of container selection and its effect on program correctness and performance.
 We then provide an overview of approaches from modern programming languages and existing literature.
+Finally, we explain how our system is novel, and the weaknesses in existing literature it solves.
 \section{Container Selection}
 The vast majority of programs will make extensive use of collection data types --- types intended to hold multiple instances of other types.
-In many languages, the standard library provides a variety of collections, forcing users to choose which is best for their program.
-Consider the Rust types \code{Vec}, a dynamic array, and \code{HashSet}, a hash-based set.
-If a user cares about ordering, or about preserving duplicates, then they must use \code{Vec<T>}.
-But if they do not, then \code{HashSet<T>} might be more performant, provided \code{contains} is used repeatedly.
+In many languages, the standard library provides a variety of collections, with users able to choose which is best for their program.
+This saves users a lot of time, however selecting the best type is not always straightforward.
+
+Consider a program which needs to store and query a set of numbers, and doesn't care about ordering or duplicates.
+If the number of items ($n$) is small enough, it might be fastest to use a dynamic array, and scan through each time we want to check if a number is inside.
+On the other hand, if the set we deal with is much larger, we may want the constant-time lookups provided by hash sets, at the cost of a generally slower lookup.
-We refer to this problem as container selection, and say that we must satisfy both functional requirements and non-functional requirements.
+In this case, there are two factors driving our decision.
+Our functional requirements, that we don't care about ordering or duplicates, and our non-functional requirements, that we want our program to be fast.
 \subsection{Functional requirements}
-The functional requirements tell us how the container will be used and how it must behave.
+Functional requirements tell us how the container will be used and how it must behave.
+Continuing with our previous example, we'll compare Rust's \code{Vec} type (a dynamic array), with the \code{HashSet} type.
-Continuing with our previous example, we can see that \code{Vec} and \code{HashSet} implement different methods.
-\code{Vec} implements \code{.get(index)}, while \code{HashSet} does not; this would not be possible for an unordered collection.
-If we attempt to replace \code{Vec} with \code{HashSet}, the resulting program will likely not compile.
+Note that the two types have different methods: \code{Vec} implements \code{.get(index)}, while \code{HashSet} does not; \code{HashSet}s aren't ordered so this doesn't make sense.
+If we were building a program that needed an ordered collection, replacing \code{Vec} with \code{HashSet} probably wouldn't compile.
-We will call the operations a container implements the ``syntactic properties'' of the container.
-In object-oriented programming, we might say they must implement an ``interface'', while in Rust, we would say that they implement a ``trait''.
+We will call the operations a container provides the ``syntactic properties'' of the container.
+In object-oriented programming, we might say they must implement an ``interface'', while in Rust, we could say that they implement a ``trait''.
 However, syntactic properties alone are not always enough to select an appropriate container.
 Suppose our program only requires a container to have \code{.insert(value)} and \code{.len()}.
@@ -75,7 +77,7 @@ This means that developers are forced to guess based on their knowledge of the u
 \subsection{Rules-based approaches}
 One approach to the container selection problem is to allow the developer to make the choice initially, but use some tool to detect poor choices.
-Chameleon\parencite{shacham_chameleon_2009} is a solution of this type.
+Chameleon\parencite{shacham_chameleon_2009} uses this approach.
 It first collects statistics from program benchmarks using a ``semantic profiler''.
 This includes the space used by collections over time and the counts of each operation performed.
@@ -92,8 +94,7 @@ This results in selection rules being more restricted than they otherwise could
 For instance, a rule cannot suggest a \code{HashSet} instead of a \code{LinkedList} as the two are not semantically identical.
 Chameleon has no way of knowing if doing so will break the program's functionality and so it does not make the suggestion.
-%% TODO: Don't use citations as nouns
-\cite{hutchison_coco_2013} and \cite{osterlund_dynamically_2013} use similar techniques, but work as the program runs.
+CoCo \parencite{hutchison_coco_2013} and work by \"{O}sterlund \parencite{osterlund_dynamically_2013} use similar techniques, but work as the program runs.
 This works well for programs with different phases of execution, such as loading and then working on data.
 However, the overhead from profiling and from checking rules may not be worth the improvements in other programs, where access patterns are roughly the same throughout.
@@ -148,11 +149,9 @@ As we note above, this scales poorly.
 \section{Contribution}
-We aim to create a container selection method that addresses both functional and non-functional requirements in a scalable way.
+We aim to create a container selection method that addresses both functional and non-functional requirements.
-Primrose will be used as the first step, in order to select candidates based on functional requirements.
-We will then collect statistics from user-provided benchmarks and create cost estimates for each candidate, similarly to CollectionSwitch.
-Unlike CollectionSwitch, this will be done offline rather than as the program is running.
+Users should be able to specify their functional requirements in a way that is expressive enough for most usecases, and easy to integrate with existing projects.
+It should also be easy to add new container implementations, and we should be able to use it on large projects without selection time becoming an issue.
-This will provide an integrated and flexible solution that addresses both functional and non-functional requirements.
-We believe no such solution exists in current literature.
+We focus on offline container selection (done before the program is compiled), however we also attempt to detect when changing implementation at runtime is desirable.
diff --git a/thesis/parts/introduction.tex b/thesis/parts/introduction.tex
index 42f29c3..452a6d8 100644
--- a/thesis/parts/introduction.tex
+++ b/thesis/parts/introduction.tex
@@ -1,3 +1,31 @@
-\todo{Motivation: Effect of structure selection on performance}
-\todo{Shortfalls in existing work: Flexibility, scalability}
-\todo{Contributions: Speed, Accuracy, etc.}
+
+%% *** Introduce problem
+
+%% **** Container types common in programs
+
+The vast majority of programs will make extensive use of collection data types --- types intended to hold multiple instances of other types.
+This allows programmers to use things like growable lists, sets, or trees without worrying about implementing them themselves.
+
+%% **** Functionally identical implementations
+
+However, this still leaves the problem of selecting the ``best'' underlying implementation.
+Most programmers will simply stick with the same one every time, with some languages like Python even building in a single implementation for everyone.
+%% **** Large difference in performance
+While this is simplest, it can have a drastic effect on performance in many cases (\cite{l_liu_perflint_2009}, \cite{jung_brainy_2011}).
+
+%% *** Motivate w/ effectiveness claims
+
+%% *** Overview of aims & approach
+
+We propose a system for the automatic selection of container implementations, based on both user-specified requirements and inferred requirements for performance.
+%% **** Scalability to larger projects
+%% **** Ease of integration into existing projects
+%% **** Ease of adding new container types
+Our system is built to be scalable, both in the sense that it can be applied to large projects, and that new container types can be added with ease.
+
+%% **** Flexibility of selection
+We are also able to detect some cases where the optimal container type varies at runtime, and supply containers which start off as one implementation, and move to another when it is more optimal to do so.
+
+%% *** Overview of results
+\todo{Overview of results}
+
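The redrafted background.tex contrasts scanning a dynamic array with hash-set lookups, and notes that Vec offers .get(index) while an unordered set has no notion of position. The sketch below illustrates that contrast in Rust. It is not part of this commit: the helper names contains_vec and contains_set are hypothetical, and it makes no claim about which container is faster for a given n.

```rust
use std::collections::HashSet;

/// Membership test by linear scan over a dynamic array: O(n) per query.
/// (Hypothetical helper, for illustration only.)
fn contains_vec(items: &[u32], x: u32) -> bool {
    items.iter().any(|&v| v == x)
}

/// Membership test through a hash set: expected O(1) per query,
/// but every lookup pays for hashing the key.
/// (Hypothetical helper, for illustration only.)
fn contains_set(items: &HashSet<u32>, x: u32) -> bool {
    items.contains(&x)
}

fn main() {
    let v: Vec<u32> = vec![3, 1, 4, 1, 5];
    let s: HashSet<u32> = v.iter().copied().collect();

    // Functional difference: Vec keeps insertion order and duplicates,
    // HashSet keeps neither.
    assert_eq!(v.len(), 5);
    assert_eq!(s.len(), 4);

    // Positional access is meaningful for Vec only.
    assert_eq!(v.get(0), Some(&3));

    // For a pure membership query, both containers give the same answer.
    assert!(contains_vec(&v, 4) && contains_set(&s, 4));
    assert!(!contains_vec(&v, 9) && !contains_set(&s, 9));
}
```

When the program only asks membership questions, either container satisfies the functional requirement, so the choice falls to the non-functional one (speed), which depends on the number of items and would have to be measured.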