This chapter outlines the design of our container selection system (Candelabra), and justifies our design decisions.
We first describe our aims and priorities for the system, and illustrate its usage with an example.
We then provide an overview of the container selection process, and of each part of it.

We leave detailed discussion of the implementation to Chapter \ref{chap:implementation}.

\section{Aims \& Usage}

As mentioned previously, we aim to create an all-in-one solution for container selection that can select based on both functional and non-functional requirements.
Flexibility is a high priority: it should be easy to add new container implementations, and to integrate our system into existing applications.
Our system should also be able to scale to larger programs, and remain convenient for developers to use.

We chose to implement our system as a Rust CLI, and to work on programs also written in Rust.
We chose Rust both for the expressivity of its type system, and for its focus on speed and low-level control.
However, most of the techniques we use are not tied to Rust in particular, and so should generalise to other languages.

We require the user to provide their own benchmarks, which should be representative of a typical application run; without them, we have no consistent way to evaluate speed.

Users specify their functional requirements by listing the traits and properties they require for a given container type.
Traits are Rust's primary method of abstraction, and are similar to interfaces in object-oriented languages, or typeclasses in functional languages.
Properties are specified in a Lisp-like DSL as a predicate on a model of the container.

For example, Listing \ref{lst:selection_example} shows code from our test case based on the sieve of Eratosthenes (\code{src/tests/prime_sieve} in the source artifacts).
Here we request two container types: \code{Sieve} and \code{Primes}.
The first must implement the \code{Container} and \code{Stack} traits, and must satisfy the \code{lifo} property.
This property is defined at the top as only being applicable to \code{Stack}s, and requires that for any \code{x}, pushing \code{x} then popping from the container returns \code{x}.

The second container type, \code{Primes}, need only implement the \code{Container} trait, and must satisfy the \code{ascending} property.
This property requires that for all consecutive \code{x, y} pairs in the container, \code{x <= y}.

\begin{figure}
\begin{lstlisting}[caption=Container type definitions for prime\_sieve,label={lst:selection_example}]
/*SPEC*
property lifo {
  \c <: (Stack) ->
    (forall \x -> ((equal? (pop ((op-push c) x))) x))
}

property ascending {
  \c -> ((for-all-consecutive-pairs c) leq?)
}

type Sieve = {c impl (Container, Stack) | (lifo c)}
type Primes = {c impl (Container) | (ascending c)}
*ENDSPEC*/
\end{lstlisting}
\end{figure}

Once we have specified our functional requirements and provided a benchmark (\code{src/tests/prime_sieve/benches/main.rs}), we can run Candelabra to select a container: \code{candelabra-cli -p prime_sieve select}.
This command produces output like that in Table \ref{table:selection_output}, and saves the best combination of container types to be used the next time the program is run.
Here, the generated code uses \code{Vec} as the implementation for \code{Sieve}, and \code{HashSet} as the implementation for \code{Primes}.
\begin{table}[h]
  \centering
  \begin{tabular}{|c|c|c|}
    Name & Implementation & Estimated Cost \\
    \hline
    Sieve & std::vec::Vec & 159040493883 \\
    Sieve & std::collections::LinkedList & 564583506434 \\
    Primes & primrose\_library::SortedVec & 414991320 \\
    Primes & std::collections::BTreeSet & 355962089 \\
    Primes & std::collections::HashSet & 309638677 \\
  \end{tabular}
  \caption{Example output from selection command}
  \label{table:selection_output}
\end{table}

\section{Overview of process}

Our tool integrates with Rust's packaging system (Cargo) to discover the information it needs about our project, then runs Primrose to find a list of implementations satisfying our functional requirements.

Once we have this list, we build a `cost model' for each candidate type.
This allows us to get an upper bound for the runtime cost of an operation at any given $n$.

We then run the user-provided benchmarks, using any one of the valid candidates, instrumented to track the number of times each operation is called and the maximum $n$ value the container reaches.

We combine this information with our cost models to estimate a total cost for each candidate, which is an upper bound on the total time taken for all container operations.
At this point, we also check whether an `adaptive' container would be better, by checking whether one implementation performs better at lower $n$, and another at higher $n$.

Finally, we pick the container with the minimum cost, and create a new Rust file in which the chosen container type is exported.

Our solution requires little user intervention and integrates well with existing workflows, and the time it takes scales linearly with the number of container types in a given project.

We now go into more detail on how each step works, although we leave some specifics until Chapter \ref{chap:implementation}.

\section{Functional requirements}

%% Explain role in entire process
As described in Chapter \ref{chap:background}, any implementation we pick needs to satisfy the program's functional requirements.
We do this by integrating Primrose \parencite{qin_primrose_2023} as a first step.

Primrose allows users to specify both the traits they require in an implementation (essentially the API and methods available), and the properties that must be satisfied.

Each container type that we want to select an implementation for is bound by a list of traits and a list of properties (lines 11 and 12 in Listing \ref{lst:selection_example}).

%% Short explanation of selection method
In brief, Primrose works by:

\begin{itemize}
\item Finding all implementations in the container library that implement all required traits
\item Translating any specified properties to a Rosette expression
\item For each implementation, modelling the behaviour of each operation in Rosette, and checking that the required properties always hold
\end{itemize}

We use the code provided with the Primrose paper, with minor modifications elaborated on in Chapter \ref{chap:implementation}.

At this stage, we have a list of implementations for each container type we are selecting.
The command \code{candelabra-cli candidates} shows this list, as in Table \ref{table:candidates_prime_sieve}.

\begin{table}[h]
  \centering
  \begin{tabular}{|c|c|c|}
    Type & Implementation & File \\
    \hline
    Primes & primrose\_library::EagerSortedVec & prime\_sieve/src/types.pr.rs \\
    Primes & std::collections::HashSet & prime\_sieve/src/types.pr.rs \\
    Primes & std::collections::BTreeSet & prime\_sieve/src/types.pr.rs \\
    Sieve & std::collections::LinkedList & prime\_sieve/src/types.pr.rs \\
    Sieve & std::vec::Vec & prime\_sieve/src/types.pr.rs \\
  \end{tabular}
  \caption{Usable implementations by container type for \code{prime_sieve}}
  \label{table:candidates_prime_sieve}
\end{table}

%% Abstraction over backend
Although we use Primrose in our implementation, the rest of our system does not depend on it, and it would be relatively simple to use a different approach to select based on functional requirements.
\section{Cost Models}

We use an approach similar to CollectionSwitch \parencite{costa_collectionswitch_2018}, which assumes that the main factor in how long an operation takes is the current size of the collection.

%% Benchmarks
Each operation has a separate cost model, which we build by executing the operation repeatedly on collections of various sizes.

For example, to build a cost model for \code{Vec::contains}, we would create several \code{Vec}s of varying sizes, and find the average execution time $t$ of \code{contains} at each.

%% Linear Regression
Once we have this data, we fit a polynomial to it using linear regression, with the collection size $n$ as the predictor of $t$.
In the case of \code{Vec::contains}, we would expect the resulting polynomial to be roughly linear.

Whilst we could use a more complex technique, in practice this is good enough: very few common operations are above $O(n^3)$, and factors such as logarithms are usually approximated closely enough by a polynomial.

%% Limitations
This method works well for many operations and structures, although it has notable limitations.

For example, the container implementation \code{LazySortedVec} (provided by Primrose) inserts new elements at the end by default, and only sorts them when an operation that relies on the order is called.
The time an operation takes therefore depends on which operations preceded it, which a model based only on $n$ cannot capture.

We were unable to work around this, and so we have removed these variants from our container library.
A potential solution could be to perform untimed `warmup' operations before each operation, but this is complex because it requires some understanding of which operations will cause work to be deferred.

\section{Profiling applications}

%% Data Collected
As mentioned above, the ordering of operations can have a large effect on container performance.
Unfortunately, tracking every container operation in order quickly becomes infeasible, so we settle for tracking the count of each operation, and the maximum size of each collection instance.
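
To make the recorded data concrete, the following is a minimal sketch of an instrumented container, assuming a \code{Vec}-backed wrapper.
The type \code{Profiled}, its fields and its method names are hypothetical and chosen for illustration; they are not Candelabra's actual profiling code.

\begin{lstlisting}[caption={Sketch of an instrumented container (hypothetical names)}]
use std::collections::HashMap;

/// Hypothetical profiling wrapper around a Vec (illustration only).
struct Profiled<T> {
    inner: Vec<T>,
    /// Number of calls seen for each operation name.
    op_counts: HashMap<&'static str, u64>,
    /// Largest size this instance has reached so far.
    max_n: usize,
}

impl<T: PartialEq> Profiled<T> {
    fn new() -> Self {
        Profiled { inner: Vec::new(), op_counts: HashMap::new(), max_n: 0 }
    }

    /// Count one call to `op` and update the maximum size seen.
    fn record(&mut self, op: &'static str) {
        *self.op_counts.entry(op).or_insert(0) += 1;
        self.max_n = self.max_n.max(self.inner.len());
    }

    fn push(&mut self, x: T) {
        self.inner.push(x);
        self.record("push");
    }

    fn pop(&mut self) -> Option<T> {
        self.record("pop");
        self.inner.pop()
    }

    /// Takes `&mut self` only because the call has to be counted.
    fn contains(&mut self, x: &T) -> bool {
        self.record("contains");
        self.inner.contains(x)
    }
}

fn main() {
    let mut c = Profiled::new();
    for i in 0..5 {
        c.push(i);
    }
    c.contains(&3);
    c.pop();
    println!("counts = {:?}, max n = {}", c.op_counts, c.max_n);
}
\end{lstlisting}

Only the per-operation counts and the maximum size survive in this sketch; the order in which operations were called is deliberately not stored.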
Every instance of the collection is tracked separately, and results are collated after profiling.

%% Segmentation
Results with similar $n$ values are grouped into partitions, where each partition stores the average count of each operation, and a weight indicating how common results in that partition were.
This serves three purposes.

The first is to compress the data, which speeds up processing and stops us running out of memory in more complex programs.

The second is to capture the fact that the number of operations will likely depend on the size of the container.

The third is to aid in searching for adaptive containers, which we discuss later in this chapter.

%% Limitations w/ pre-benchmark steps
\todo{not taking into account 'preparatory' operations during benchmarks}

\section{Selection process \& adaptive containers}

%% Selection process
Once we have an estimate of how long each operation may take (from our cost models), and how often we use each operation (from our profiling information), we combine these to estimate the total cost of each implementation.
For each implementation, our total cost estimate is:

\[
  \sum_{\textrm{op} \in \textrm{ops}} \sum_{(r_{\textrm{op}}, N) \in \textrm{partitions}} C_{\textrm{op}}(N) \cdot r_{\textrm{op}}
\]

where $C_{\textrm{op}}(N)$ is the cost estimated by the cost model for operation $\textrm{op}$ at $n$ value $N$, and $r_{\textrm{op}}$ and $N$ are, respectively, the average count of that operation and the maximum $n$ in a partition.

%% Adaptive container detection
Adaptive containers are implemented using const generics and a wrapper type.

They are detected by finding the best implementation for each partition, sorting the partitions by $n$, and checking whether the partitions can be split into two groups such that a different implementation is best on each side.

We then check that the cost saving is greater than the cost of a clear operation and $n$ insert operations.

%% Code generation
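
The cost estimate and the split search described in this section can be sketched as follows.
This is a minimal illustration under stated assumptions rather than Candelabra's implementation: the names \code{CostModel}, \code{Partition}, \code{total_cost} and \code{find_split} are hypothetical, cost models are assumed to be stored as polynomial coefficients, partition weights are ignored, and the final comparison against the cost of a clear and $n$ inserts is left out.

\begin{lstlisting}[caption={Sketch of cost estimation and adaptive split detection (hypothetical names)}]
/// Polynomial cost model for one operation: coeffs[i] multiplies n^i.
struct CostModel {
    coeffs: Vec<f64>,
}

impl CostModel {
    /// Evaluate the polynomial at n using Horner's rule.
    fn estimate(&self, n: f64) -> f64 {
        self.coeffs.iter().rev().fold(0.0, |acc, c| acc * n + c)
    }
}

/// One partition of the profiling data.
struct Partition {
    max_n: f64,
    /// Average count of each operation, in the same order as the models.
    avg_counts: Vec<f64>,
}

/// Cost of one partition: sum over operations of C_op(N) * r_op.
fn partition_cost(models: &[CostModel], p: &Partition) -> f64 {
    models.iter().zip(&p.avg_counts).map(|(m, r)| m.estimate(p.max_n) * r).sum()
}

/// Total cost of one implementation: sum over all partitions.
fn total_cost(models: &[CostModel], partitions: &[Partition]) -> f64 {
    partitions.iter().map(|p| partition_cost(models, p)).sum()
}

/// Index of the cheapest implementation for a single partition.
fn best_impl(impls: &[Vec<CostModel>], p: &Partition) -> usize {
    let mut best = 0;
    for i in 1..impls.len() {
        if partition_cost(&impls[i], p) < partition_cost(&impls[best], p) {
            best = i;
        }
    }
    best
}

/// If one implementation wins for every partition below some n, and a
/// different one wins for every partition from there on, return
/// (low-n winner, high-n winner, n at which the second group starts).
fn find_split(impls: &[Vec<CostModel>], partitions: &[Partition]) -> Option<(usize, usize, f64)> {
    let mut sorted: Vec<&Partition> = partitions.iter().collect();
    sorted.sort_by(|a, b| a.max_n.total_cmp(&b.max_n));
    let best: Vec<usize> = sorted.iter().map(|p| best_impl(impls, p)).collect();
    let low = *best.first()?;
    let high = *best.last()?;
    if low == high {
        return None;
    }
    // First partition where the low-n winner stops being the best.
    let split = best.iter().position(|&b| b != low)?;
    if best[split..].iter().all(|&b| b == high) {
        Some((low, high, sorted[split].max_n))
    } else {
        None
    }
}

fn main() {
    // Operations are [insert, contains]; all coefficients are made up.
    let impls = vec![
        // Cheap insert, linear contains.
        vec![CostModel { coeffs: vec![10.0] }, CostModel { coeffs: vec![5.0, 1.0] }],
        // Constant but more expensive insert and contains.
        vec![CostModel { coeffs: vec![50.0] }, CostModel { coeffs: vec![50.0] }],
    ];
    let partitions = vec![
        Partition { max_n: 1000.0, avg_counts: vec![1000.0, 1000.0] },
        Partition { max_n: 10.0, avg_counts: vec![10.0, 10.0] },
    ];
    for (i, models) in impls.iter().enumerate() {
        println!("impl {}: total cost {}", i, total_cost(models, &partitions));
    }
    println!("split: {:?}", find_split(&impls, &partitions));
}
\end{lstlisting}

In this sketch a split is only reported when the per-partition winners form exactly two contiguous groups once sorted by $n$; any other pattern is treated as having no adaptive candidate.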