From 59702ad97a03230c61a186768046b4a94aead024 Mon Sep 17 00:00:00 2001 From: Aria Date: Sat, 30 Sep 2023 16:15:05 +0100 Subject: start writing background chapter --- thesis/main.tex | 3 +++ thesis/parts/background.tex | 31 ++++++++++++++++++++++++++++++- 2 files changed, 33 insertions(+), 1 deletion(-) (limited to 'thesis') diff --git a/thesis/main.tex b/thesis/main.tex index 05fe7ac..4024cd5 100644 --- a/thesis/main.tex +++ b/thesis/main.tex @@ -6,6 +6,9 @@ \usepackage{microtype} +%% Convenience macros +\newcommand{\code}{\texttt} + \begin{document} \begin{preliminary} diff --git a/thesis/parts/background.tex b/thesis/parts/background.tex index 32f06ad..f2490ef 100644 --- a/thesis/parts/background.tex +++ b/thesis/parts/background.tex @@ -1 +1,30 @@ -Some background information +This chapter introduces the problem of container selection and its effect on program correctness and performance. +It then surveys how current programming languages approach this problem, and how the existing literature proposes to solve it. +Finally, we examine the gaps in the existing literature, and how this thesis aims to address them. + +\section{Container Types} + +The majority of programs make extensive use of collection data types, that is, types intended to hold many different instances of other data types. + +In many cases, these collections have very different properties and purposes. +For instance, a \code{HashMap} is associative, mapping arbitrary keys to values and disallowing duplicate keys. +By contrast, a \code{HashSet} stores some set of values, without ordering or keys. +A social networking site may use a \code{HashMap} to map usernames to followers, and a \code{HashSet} to store a set of names of followers. + +Here, a different set of operations makes sense for each of \code{HashMap} and \code{HashSet}. +Each therefore exposes a different set of methods. 
\code{HashMap} would likely have methods such as \code{insert(Key, Value)} and \code{get(Key)}, whereas \code{HashSet} would have neither and would instead have \code{insert(T)} and \code{contains(T)}. +We will refer to the set of methods supported by a container as its ``syntactic properties''. + +However, syntactic properties alone are not enough to identify a container. +Note that an ordered container such as a \code{Vector} would be able to provide the same methods as a \code{HashSet}, plus some additional ones. +As an application developer, we may require a container that does not allow duplicates, a constraint that \code{HashSet} satisfies but \code{Vector} does not. +Therefore, we say that a container implementation must also have ``semantic properties''. We will avoid defining these formally for now, although informally they can be thought of as conditions that will always hold for the container. + +Depending on the structure of the program, these collections will have varying interfaces: for instance, they may be associative (mapping key to value), ordered (mapping index to value), or unordered (only keeping track of whether an element is contained or not). +In many programming languages, different implementations of these collections will implement a shared interface, for instance \code{Collection} in Java. +However, these interfaces are normally concerned only with the programming interface, and make no guarantees about the semantic properties of the implementation. In Java, both the \code{HashSet} and \code{ArrayList} classes implement \code{Collection}, but the former does not store duplicates while the latter does. + +In practice, the main way for developers to guarantee the semantic properties of some container is to pick a concrete implementation rather than an interface. +This forces the developer to make a comparatively low-level choice, for instance between \code{HashSet} and \code{LinkedHashSet}. 
+In many cases, the developer does not care about or understand the implications of this choice, and so will simply choose arbitrarily. +Depending on the application, however, the choice of concrete implementation can have a large effect on performance. -- cgit v1.2.3
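The chapter's point that a shared interface fixes only the syntactic properties of a container can be illustrated with a small Java sketch. It is not part of the patch: the class name `CollectionSemantics` and the helper `sizeAfterInserts` are hypothetical names chosen for illustration, and only the standard `java.util` classes named in the text are used.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical illustration, not taken from the thesis sources.
class CollectionSemantics {
    // Adds three names, one of them repeated, using only the shared
    // Collection interface, then reports how many elements are stored.
    static int sizeAfterInserts(Collection<String> followers) {
        followers.add("alice");
        followers.add("bob");
        followers.add("alice"); // duplicate of the first element
        return followers.size();
    }

    public static void main(String[] args) {
        // Same method calls, different semantic properties.
        System.out.println(sizeAfterInserts(new HashSet<>()));   // prints 2
        System.out.println(sizeAfterInserts(new ArrayList<>())); // prints 3
    }
}
```

Both calls type-check against `Collection<String>`, yet the results differ, so a caller that relies on the absence of duplicates has to name `HashSet` (or another set implementation) explicitly.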