#+TITLE: Tasks

* TODO Write background chapter
DEADLINE: <2023-10-20 Fri>

** TODO Problem Introduction

- applications use many different container types
- developers often only care about the functional requirements/semantics of these containers
- however, they are usually forced to specify a concrete implementation (examples)

** TODO Motivation

- justify performance benefit

** TODO Look into Perflint

https://ieeexplore.ieee.org/abstract/document/4907670

** TODO Brainy

- uses an AI model to predict the best-performing candidate, based on target microarchitecture and runtime behaviour
- uses access patterns, etc.
- also assumes a semantically identical set of candidates
- uses an application generator for training data
- focuses on the performance difference between microarchitectures
- intended to be run at each install site

** TODO Redraft Chameleon

** TODO CollectionSwitch

- online selection
- uses a library, so easier to integrate
- collects access patterns, size patterns, etc.
- performance model is built beforehand for each concrete implementation, with a cost model used to estimate the relative performance of each based on observed usage
- switches underlying implementation dynamically
- also able to decide size thresholds at which the implementation should be changed, and to switch at those thresholds
- doesn't require specific knowledge of the implementations, although it does still assume all are semantically equivalent

** TODO Primrose

- Primrose allows specifying syntactic and semantic properties, and gives concrete implementations satisfying these properties
- however, this only deals with the functional requirements for the program, and not non-functional requirements
- it is still up to the developer to choose which of these performs best, or to brute-force the choice

** TODO other papers

[20] MITCHELL, J. C. Representation independence and data abstraction. In POPL ’86: Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (New York, NY, USA, 1986), ACM, pp. 263–276.
* Workshop background

- gives context for project
- motivates project and explains importance/contributions, as well as feasibility

1. what has been done previously
2. where does project fit
3. what does reader need to know

2 & 3 guide 1

audience is someone with a background in informatics, but not necessarily in the same problem space

* Planned design

- Based on design used by CollectionSwitch
  - Least intervention required per implementation
- Integrate with Primrose to get the candidate collections
  - Ideally this would just be using the Rust crate, or having a JSON interface to a CLI
- For each collection and for each 'critical operation', generate a cost estimate when the collection is a given size - $C_{op}(n)$
  - Perform operation repeatedly at various n, and fit a polynomial to that
  - Requires some trait constraints, and some annotation of traits to know what are 'critical operations'
  - This step should only need to be run once per computer
    - could be shared by default and run again for better accuracy
- Semantic Profiler
  - For each allocated collection:
    - Max size (in terms of items)
    - # of each operation
  - This should be aggregated by 'allocation site' (specified by last few bits of callstack).
    - Not sure how to do this, maybe look at how tracing crate does it
  - Requires user to write their own benchmarks
    - criterion is popular for this, and might have hooks?
  - doesn't need to be /super/ lightweight, just enough to not make things painful to run.
- Approximate a cost for each candidate as $\sum_{op} C_{op}(n) \cdot \frac{\#op}{\#total}$.
  - We could extend this to suggest different approaches if there is a spread of max n.
- If time allows, could attempt to create a 'wrapper type' that switches between collections as n changes, using rules decided by something similar to the above algorithm.
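
Rough sketch of the "perform operation repeatedly at various n, and fit a polynomial" step, to check it is feasible without extra dependencies. Everything here is an assumption for illustration: the names =measure= and =fit_polynomial= are hypothetical, the fit is plain least squares via the normal equations, and timing uses =std::time::Instant= rather than criterion. The measured operation is assumed not to change the collection's size.

```rust
use std::time::Instant;

/// Time `op` on a collection built at each size in `sizes`.
/// Returns (sizes as f64, mean nanoseconds per call).
/// Assumption: `op` leaves the collection size unchanged (e.g. a lookup).
fn measure<C>(
    sizes: &[usize],
    reps: u32,
    build: impl Fn(usize) -> C,
    op: impl Fn(&mut C),
) -> (Vec<f64>, Vec<f64>) {
    let mut xs = Vec::new();
    let mut ys = Vec::new();
    for &n in sizes {
        let mut collection = build(n);
        let start = Instant::now();
        for _ in 0..reps {
            op(&mut collection);
        }
        let elapsed_ns = start.elapsed().as_nanos() as f64;
        xs.push(n as f64);
        ys.push(elapsed_ns / f64::from(reps));
    }
    (xs, ys)
}

/// Least-squares polynomial fit; coefficients are returned lowest power first.
/// Returns None if there are too few samples or the system is singular.
fn fit_polynomial(xs: &[f64], ys: &[f64], degree: usize) -> Option<Vec<f64>> {
    let m = degree + 1;
    if xs.len() != ys.len() || xs.len() < m {
        return None;
    }
    // Augmented normal equations (A^T A | A^T y), A being the Vandermonde matrix.
    let mut a = vec![vec![0.0; m + 1]; m];
    for (&x, &y) in xs.iter().zip(ys) {
        let powers: Vec<f64> = (0..m).map(|k| x.powi(k as i32)).collect();
        for i in 0..m {
            for j in 0..m {
                a[i][j] += powers[i] * powers[j];
            }
            a[i][m] += powers[i] * y;
        }
    }
    // Gaussian elimination with partial pivoting.
    for col in 0..m {
        let pivot = (col..m).max_by(|&r, &s| a[r][col].abs().total_cmp(&a[s][col].abs()))?;
        if a[pivot][col].abs() < 1e-12 {
            return None;
        }
        a.swap(col, pivot);
        for row in (col + 1)..m {
            let factor = a[row][col] / a[col][col];
            for k in col..=m {
                let sub = factor * a[col][k];
                a[row][k] -= sub;
            }
        }
    }
    // Back substitution.
    let mut coeffs = vec![0.0; m];
    for i in (0..m).rev() {
        let tail: f64 = ((i + 1)..m).map(|j| a[i][j] * coeffs[j]).sum();
        coeffs[i] = (a[i][m] - tail) / a[i][i];
    }
    Some(coeffs)
}
```

Usage would be something like =measure(&[10, 100, 1000], 1000, build_vec, lookup)= followed by =fit_polynomial(&xs, &ys, 2)=. The normal equations get ill-conditioned for large n or high degree, so a real version might rescale n or use a proper linear algebra crate.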
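
Sketch of the candidate cost approximation $\sum_{op} C_{op}(n) \cdot \frac{\#op}{\#total}$ from the planned design, to pin down what data the profiler needs to hand over. The types =CostModel= and =Profile= and the functions =estimated_cost= and =cheapest= are hypothetical names, not part of Primrose or CollectionSwitch. It assumes each $C_{op}$ is a polynomial (coefficients lowest power first) and that n is taken to be the max size recorded for the allocation site.

```rust
use std::collections::HashMap;

/// Polynomial cost model for one operation (coefficients lowest power first).
struct CostModel {
    coeffs: Vec<f64>,
}

impl CostModel {
    /// Evaluate C_op(n) using Horner's rule.
    fn estimate(&self, n: f64) -> f64 {
        self.coeffs.iter().rev().fold(0.0, |acc, &c| acc * n + c)
    }
}

/// What the profiler records for one allocation site.
struct Profile {
    max_n: usize,
    op_counts: HashMap<String, u64>,
}

/// Weighted cost: sum over ops of C_op(n) * #op / #total.
/// Returns None if no operations were recorded or an operation has no model.
fn estimated_cost(models: &HashMap<String, CostModel>, profile: &Profile) -> Option<f64> {
    let total: u64 = profile.op_counts.values().sum();
    if total == 0 {
        return None;
    }
    // Assumption: use the maximum observed size as n.
    let n = profile.max_n as f64;
    let mut cost = 0.0;
    for (op, &count) in &profile.op_counts {
        let model = models.get(op)?;
        cost += model.estimate(n) * count as f64 / total as f64;
    }
    Some(cost)
}

/// Name of the candidate with the lowest estimated cost, if any can be costed.
fn cheapest<'a>(
    candidates: &'a HashMap<String, HashMap<String, CostModel>>,
    profile: &Profile,
) -> Option<&'a str> {
    candidates
        .iter()
        .filter_map(|(name, models)| {
            estimated_cost(models, profile).map(|cost| (name.as_str(), cost))
        })
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
}
```

With a linear =contains= model for a vector-like candidate and constant models for a hash-set-like candidate, a lookup-heavy profile at n = 100 should favour the hash set. Using only max n ignores how the size is distributed over time, which is where the "spread of max n" extension would come in.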