In this chapter, we provide an overview of the problem of container selection, and its effect on program correctness and performance.
We then provide an overview of how modern programming languages approach this problem, and how existing literature differs.
Finally, we examine the gaps in the existing literature, and explain how this paper aims to contribute to it.
\section{Container Selection}
The vast majority of programs make extensive use of collection data types - types intended to hold many instances of other data types.
This can refer to anything from fixed-size arrays, to growable linked lists, to associative key-value mappings or dictionaries.
In some cases, these are built-in parts of the language: In Go, a list of ints has type \code{[]int} and a dictionary from string to string has type \code{map[string]string}.
In other languages, these are instead part of some standard library.
In Rust, you might write \code{Vec<isize>} and \code{HashMap<String, String>} for the same types.
This forces us to make a choice upfront: what type should we use?
In this case the answer is obvious - the two have very different purposes and don't support the same operations.
However, if we were to consider \code{Vec<isize>} and \code{HashSet<isize>}, the answer is much less obvious.
If we care about the ordering, or about preserving duplicates, then we must use \code{Vec<isize>}.
But if we don't, then \code{HashSet<isize>} may be more performant - for example if we use \code{contains} a lot.
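For instance, both types answer the same \code{contains} query, but at very different costs - a minimal sketch:

```rust
use std::collections::HashSet;

// Both containers answer the same membership query, but a Vec performs a
// linear scan while a HashSet uses an expected-constant-time hash lookup.
fn main() {
    let vec: Vec<isize> = (0..1000).collect();
    let set: HashSet<isize> = vec.iter().copied().collect();

    // Identical results; O(n) for the Vec, expected O(1) for the HashSet.
    assert!(vec.contains(&999) && set.contains(&999));
    assert!(!vec.contains(&-1) && !set.contains(&-1));
}
```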
We refer to this problem as container selection, and divide it into two parts: functional requirements and non-functional requirements.
\subsection{Functional requirements}
Functional requirements here mirror the usual definition in software engineering: the container must behave the way that the program expects it to.
Continuing with our previous example, we can first note that \code{Vec} and \code{HashSet} implement different sets of methods.
\code{Vec} implements methods like \code{.get(index)} and \code{.push(value)}, while \code{HashSet} implements neither - they don't make sense for an unordered collection.
Similarly, \code{HashSet} implements \code{.replace(value)} and \code{.is\_subset(other)}, neither of which make sense for \code{Vec}.
If we try to swap \code{Vec} for \code{HashSet}, the resulting program may not compile.
These restrictions form the first part of our functional requirements - the ``syntactic properties'' of the containers must satisfy the program's requirements.
In object-oriented programming, we might say they must implement an interface.
However, syntactic properties alone are not always enough to select an appropriate container.
Suppose our program only requires a container to have \code{.insert(value)}, \code{.contains(value)}, and \code{.len()}.
Both \code{Vec} and \code{HashSet} will satisfy these requirements.
However, our program might rely on \code{.len()} returning a count including duplicates.
In this case, \code{HashSet} would give us different behaviour, possibly causing our program to behave incorrectly.
To express this, we say that a container implementation also has ``semantic properties'' that must satisfy our requirements.
Intuitively, we can think of these as the invariants the container upholds.
For a set, this would include that it never contains duplicates. % TODO
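This difference is easy to demonstrate: both types provide \code{len} with the same signature, but they report different counts once duplicates are inserted.

```rust
use std::collections::HashSet;

fn main() {
    let items = [1, 2, 2, 3];

    let mut vec: Vec<i32> = Vec::new();
    let mut set: HashSet<i32> = HashSet::new();
    for &x in &items {
        vec.push(x); // Vec keeps duplicates...
        set.insert(x); // ...a HashSet silently discards them.
    }

    // Syntactically interchangeable calls, semantically different results.
    assert_eq!(vec.len(), 4);
    assert_eq!(set.len(), 3);
}
```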
\subsection{Non-functional requirements}
While meeting the functional requirements is generally enough to ensure a program runs correctly, we also want to choose the ``best'' implementation we can.
There are many measures of ``best'', but we will focus primarily on time: how the choice of container affects the runtime of the program.
If we can find a selection of types that satisfy the functional requirements, one obvious solution is to benchmark the program with each implementation in place, and see which performs best.
This works, but note that alongside our program we must also develop benchmarks.
If the benchmarks are flawed, or do not represent how our program is used in practice, we may see drastically different results in the ``real world''.
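Such a benchmark might be sketched as follows; this is a deliberately naive illustration using \code{std::time::Instant}, with none of the warm-up, repetition, or statistical analysis a real harness would need:

```rust
use std::collections::HashSet;
use std::time::Instant;

// Naive benchmark sketch: time the same workload against each candidate
// container. A flawed or unrepresentative workload gives misleading results.
fn main() {
    let n: isize = 10_000;

    let start = Instant::now();
    let vec: Vec<isize> = (0..n).collect();
    let vec_hits = (0..100).filter(|x| vec.contains(x)).count();
    let vec_time = start.elapsed();

    let start = Instant::now();
    let set: HashSet<isize> = (0..n).collect();
    let set_hits = (0..100).filter(|x| set.contains(x)).count();
    let set_time = start.elapsed();

    // Same functional behaviour, potentially very different cost.
    assert_eq!(vec_hits, set_hits);
    println!("Vec: {:?}, HashSet: {:?}", vec_time, set_time);
}
```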
%% TODO: Motivate how this improves performance
\section{Prior Literature}
\subsection{Approaches in common programming languages}
%% TODO
\subsection{Chameleon}
Chameleon\parencite{shacham_chameleon_2009} is a solution that focuses on the non-functional requirements of container selection.
First, it runs the program on some example input, collecting data on the collections used via a ``semantic profiler''.
This data includes the space used by collections, the minimum space that could be used by all of the items of that collection, and the number of each operation performed.
These statistics are tracked per individual collection allocated, and then aggregated by ``allocation context'' - the portion of the call stack where the allocation occurred.
These aggregated statistics are then passed to a rules engine, which uses a set of rules to suggest places a different container type might improve performance.
For example, a rule could detect that a linked list frequently has items accessed by index, and suggest an array-backed list as a replacement.
This results in a flexible engine for providing suggestions, which can be extended with new rules and types as necessary.
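As an illustration, such a rule might look like the following sketch; the \code{UsageStats} struct and the threshold are invented for exposition, and do not reflect Chameleon's actual rule format:

```rust
// Illustrative sketch of a Chameleon-style replacement rule; the statistics
// struct and the threshold are invented, not taken from the paper.
#[derive(Debug)]
struct UsageStats {
    container: &'static str,
    index_accesses: u64, // operations like get(i)
    total_ops: u64,
}

// Rule: a linked list that is mostly accessed by index would likely be
// faster as an array-backed list.
fn suggest(stats: &UsageStats) -> Option<&'static str> {
    if stats.container == "LinkedList"
        && stats.index_accesses * 2 > stats.total_ops
    {
        return Some("ArrayList");
    }
    None
}

fn main() {
    let stats = UsageStats {
        container: "LinkedList",
        index_accesses: 80,
        total_ops: 100,
    };
    assert_eq!(suggest(&stats), Some("ArrayList"));
}
```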
%% todo: something about online selection part
Unfortunately, this does require the developer to come up with and add replacement rules for each implementation.
In many cases, there may be patterns that could be used to suggest a better option, but that the developer does not see or is not able to formalise.
Chameleon also makes no attempt to select based on functional requirements.
This results in selection rules needing to be more restricted than they otherwise could be.
For instance, a rule cannot suggest a \code{HashSet} instead of a \code{LinkedList}, as the two are not semantically identical.
Chameleon has no way of knowing if doing so will break the program's functionality, and so it does not make a suggestion.
\subsection{Brainy}
%% - uses ai model to predict based on target microarchitecture, and runtime behaviour
%% - uses access pattersn, etc.
%% - also assumes semantically identical set of candidates
%% - uses application generator for training data
%% - focuses on the performance difference between microarchitectures
%% - intended to be run at each install site
Brainy\parencite{jung_brainy_2011} also focuses on non-functional requirements, but uses machine learning techniques instead of a fixed set of rules.
Similar to Chameleon, Brainy runs the program with example input, and collects statistics on how collections are used.
Unlike Chameleon, these statistics include some hardware counters, such as cache utilisation and branch misprediction rate.
This profiling information is then fed to an ML model, which predicts the implementation likely to be most performant for the specific program and microarchitecture, from the set of implementations the model was trained on.
Of the existing literature, Brainy appears to be the only method which directly accounts for hardware factors.
The authors propose that their tool can be run at install-time for each target system, and then used by developers or by applications integrated with it to select the best data structure for that hardware.
This allows it to compensate for the differences in performance that can come from different hardware configurations - for instance, the size of the cache may affect the performance of a growable list compared to a linked list.
The paper itself demonstrates the effectiveness of this, stating that ``On average, 43\% of the randomly generated applications have different optimal data structures [across different architectures]''.
The model itself is trained on a dataset of randomly generated applications: random sequences of operations on a collection.
This is intended to avoid overfitting on specific applications, as a large number of applications with different characteristics can be generated.
However, the applications generated are unlikely to be representative of real applications.
In practice, real programs usually repeat certain patterns of operations, meaning the next operation is never truly random.
Brainy determines which types satisfy the functional requirements based on the original data structure (vector, list, set), and whether the order is ever used.
This allows for a larger pool of containers to choose from; for instance, a vector can be swapped for a set in some circumstances.
However, this approach is still limited in the semantics it can identify; for instance, it cannot differentiate a stack or a queue from any other type of list.
\subsection{CollectionSwitch}
%% - online selection - uses library so easier to integrate
%% - collects access patterns, size patterns, etc.
%% - performance model is built beforehand for each concrete implementation, with a cost model used to estimate the relative performance of each based on observed usage
%% - switches underlying implementation dynamically
%% - also able to decide size thresholds where the implementation should be changed and do this
%% - doesn't require specific knowledge of the implementations, although does still assume all are semantically equivalent
CollectionSwitch\parencite{costa_collectionswitch_2018} takes a different approach to the container selection problem, adapting as the program runs and new information becomes available.
First, a performance model is built for each container implementation.
This is done by performing each operation many times in succession, varying the length of the collection.
This data is used to fit a polynomial, which estimates the cost of each operation at a given collection size $n$.
The total cost for each candidate implementation is then estimated for each individual instance over its lifetime.
If switching to another implementation would reduce the average total cost by more than a certain threshold, CollectionSwitch starts using that implementation for newly allocated instances, and may also switch existing instances over to it.
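The cost-model idea can be illustrated with the following sketch; the polynomial shapes, coefficients, and usage profile are invented for exposition, not taken from CollectionSwitch:

```rust
// Illustrative sketch of a CollectionSwitch-style cost model: each operation's
// cost is a polynomial in the collection size n, fitted offline from
// benchmarks. All coefficients here are made up for exposition.
struct CostModel {
    // cost(n) = c0 + c1 * n for a single operation
    contains: (f64, f64),
    insert: (f64, f64),
}

fn op_cost((c0, c1): (f64, f64), n: f64) -> f64 {
    c0 + c1 * n
}

// Estimate the total lifetime cost of one instance from its observed profile.
fn total_cost(m: &CostModel, avg_len: f64, lookups: u64, inserts: u64) -> f64 {
    lookups as f64 * op_cost(m.contains, avg_len)
        + inserts as f64 * op_cost(m.insert, avg_len)
}

fn main() {
    // Linear-scan list vs hash set (made-up coefficients).
    let list = CostModel { contains: (5.0, 1.0), insert: (10.0, 0.0) };
    let set = CostModel { contains: (20.0, 0.0), insert: (30.0, 0.0) };

    let (len, lookups, inserts) = (1000.0, 500, 100);
    // With many lookups on a large collection, the set's flat cost wins,
    // so a switch would be suggested for this usage profile.
    assert!(total_cost(&set, len, lookups, inserts)
        < total_cost(&list, len, lookups, inserts));
}
```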
By generating a cost model based on benchmarks, CollectionSwitch manages to be more flexible than other rules-based approaches such as Chameleon.
%% TODO: comment on functional selection
\subsection{Primrose}
%% TODO
\section{Contributions}