#+TITLE: Tasks
* TODO Write background chapter
DEADLINE: <2023-10-20 Fri>
** TODO Problem Introduction
- applications use many different container types
- developers often only care about the functional requirements/semantics of these containers
- however, they are usually forced to specify a concrete implementation (examples)
** TODO Motivation
- justify performance benefit
** TODO Look into Perflint
https://ieeexplore.ieee.org/abstract/document/4907670
** TODO Brainy
- uses an AI model to predict the best implementation based on the target microarchitecture and runtime behaviour
- uses access patterns, etc.
- also assumes semantically identical set of candidates
- uses application generator for training data
- focuses on the performance difference between microarchitectures
- intended to be run at each install site
** TODO Redraft Chameleon
** TODO CollectionSwitch
- online selection - implemented as a library, so easier to integrate
- collects access patterns, size patterns, etc.
- performance model is built beforehand for each concrete implementation, with a cost model used to estimate the relative performance of each based on observed usage
- switches underlying implementation dynamically
- also able to decide size thresholds at which the implementation should be changed, and to perform that switch
- doesn't require specific knowledge of the implementations, although it does still assume all candidates are semantically equivalent
** TODO Primrose
- Primrose allows specifying syntactic and semantic properties, and gives concrete implementations satisfying these properties
- however, this only deals with the functional requirements of the program, and not the non-functional requirements
- it is still up to the developer to choose which of these performs best, or to brute-force the choice
** TODO other papers
Mitchell, J. C. Representation independence and data abstraction. In POPL '86: Proceedings of the 13th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (New York, NY, USA, 1986), ACM, pp. 263-276.
* Planned design
- Based on the design used by CollectionSwitch
- Least intervention required per implementation
- Integrate with primrose to get the candidate collections
- Ideally this would just use the Primrose Rust crate directly, or a JSON interface to its CLI
- For each collection and each 'critical operation', generate a cost estimate for when the collection holds a given number of items - $C_{op}(n)$
- Perform the operation repeatedly at various $n$, and fit a polynomial to the timings (see the fitting sketch at the end of this section)
- Requires some trait constraints, and some annotation of traits to identify the 'critical operations'
- This step should only need to be run once per computer
- could be shared by default and run again for better accuracy
- Semantic Profiler (rough sketch at the end of this section)
- For each allocated collection:
- Max size (in terms of items)
- # of each operation
- This should be aggregated by 'allocation site' (identified by the last few frames of the call stack).
- Not sure how to do this, maybe look at how tracing crate does it
- Requires user to write their own benchmarks
- criterion is popular for this, and might have hooks?
- doesn't need to be /super/ lightweight, just enough to not make things painful to run.
- Approximate a cost for each candidate as $\sum_{op} C_{op}(n) \cdot \frac{\#op}{\#total}$ (see the cost-estimate sketch at the end of this section).
- We could extend this to suggest different approaches if there is a spread of max n.
- If time allows, could attempt to create a 'wrapper type' that switches between collections as $n$ changes, using rules decided by something similar to the above algorithm (a very rough sketch of this follows below).
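
A minimal sketch of how the cost-estimate step could work, assuming =HashSet::insert= as the example 'critical operation': time the operation at several collection sizes and fit a low-degree polynomial by least squares. =fit_polynomial= and =time_insert_at= are hypothetical names, and a real run would go through criterion rather than one-shot =Instant= timings.

#+BEGIN_SRC rust
use std::collections::HashSet;
use std::time::Instant;

/// Fit a polynomial of degree `deg` to (x, y) points by least squares,
/// solving the normal equations with naive Gaussian elimination.
/// Returns coefficients c[0] + c[1]*x + ... + c[deg]*x^deg.
fn fit_polynomial(xs: &[f64], ys: &[f64], deg: usize) -> Vec<f64> {
    let m = deg + 1;
    let mut a = vec![vec![0.0; m]; m];
    let mut b = vec![0.0; m];
    for (&x, &y) in xs.iter().zip(ys) {
        // powers[k] = x^k, needed up to x^(2*deg)
        let mut powers = vec![1.0; 2 * m - 1];
        for k in 1..2 * m - 1 {
            powers[k] = powers[k - 1] * x;
        }
        for i in 0..m {
            for j in 0..m {
                a[i][j] += powers[i + j];
            }
            b[i] += powers[i] * y;
        }
    }
    // Gaussian elimination with partial pivoting.
    for col in 0..m {
        let pivot = (col..m)
            .max_by(|&i, &j| a[i][col].abs().partial_cmp(&a[j][col].abs()).unwrap())
            .unwrap();
        a.swap(col, pivot);
        b.swap(col, pivot);
        for row in col + 1..m {
            let factor = a[row][col] / a[col][col];
            for k in col..m {
                a[row][k] -= factor * a[col][k];
            }
            b[row] -= factor * b[col];
        }
    }
    // Back substitution.
    let mut coeffs = vec![0.0; m];
    for row in (0..m).rev() {
        let mut sum = b[row];
        for k in row + 1..m {
            sum -= a[row][k] * coeffs[k];
        }
        coeffs[row] = sum / a[row][row];
    }
    coeffs
}

/// Rough cost (ns) of one insert into a HashSet already holding `n` items.
/// Criterion would do a much better job of controlling for noise.
fn time_insert_at(n: usize, reps: u32) -> f64 {
    let mut total = 0.0;
    for _ in 0..reps {
        let mut set: HashSet<usize> = (0..n).collect();
        let start = Instant::now();
        std::hint::black_box(set.insert(n + 1));
        total += start.elapsed().as_nanos() as f64;
    }
    total / reps as f64
}

fn main() {
    let sizes = [10usize, 100, 1_000, 10_000, 100_000];
    let xs: Vec<f64> = sizes.iter().map(|&n| n as f64).collect();
    let ys: Vec<f64> = sizes.iter().map(|&n| time_insert_at(n, 100)).collect();
    // C_insert(n) ≈ c0 + c1*n + c2*n^2
    let c = fit_polynomial(&xs, &ys, 2);
    println!("C_insert(n) ≈ {:.3} + {:.3e}*n + {:.3e}*n^2", c[0], c[1], c[2]);
}
#+END_SRC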
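
A rough sketch of the semantic profiler idea: a hypothetical =ProfiledVec= wrapper records operation counts and max size in a global registry keyed by the site that constructed it, approximated here with =#[track_caller]= rather than real call-stack capture.

#+BEGIN_SRC rust
use std::collections::HashMap;
use std::panic::Location;
use std::sync::{Mutex, OnceLock};

/// Usage statistics for one allocation site.
#[derive(Default, Debug)]
struct SiteStats {
    max_size: usize,
    op_counts: HashMap<&'static str, u64>,
}

/// Global registry, keyed by "file:line" of the allocation site.
fn registry() -> &'static Mutex<HashMap<String, SiteStats>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, SiteStats>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// A Vec wrapper that records which operations run and how large the
/// collection gets, attributed to the site where it was constructed.
struct ProfiledVec<T> {
    inner: Vec<T>,
    site: String,
}

impl<T> ProfiledVec<T> {
    #[track_caller]
    fn new() -> Self {
        let loc = Location::caller();
        let site = format!("{}:{}", loc.file(), loc.line());
        ProfiledVec { inner: Vec::new(), site }
    }

    fn record(&self, op: &'static str) {
        let mut reg = registry().lock().unwrap();
        let stats = reg.entry(self.site.clone()).or_default();
        *stats.op_counts.entry(op).or_insert(0) += 1;
        stats.max_size = stats.max_size.max(self.inner.len());
    }

    fn push(&mut self, value: T) {
        self.inner.push(value);
        self.record("push");
    }

    fn contains(&self, value: &T) -> bool
    where
        T: PartialEq,
    {
        self.record("contains");
        self.inner.contains(value)
    }
}

fn main() {
    let mut xs = ProfiledVec::new(); // this line is the allocation site
    for i in 0..1000 {
        xs.push(i);
    }
    let _ = xs.contains(&42);
    println!("{:#?}", registry().lock().unwrap());
}
#+END_SRC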
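
A sketch of the cost approximation itself, combining hypothetical fitted models with profiled counts to evaluate $\sum_{op} C_{op}(n) \cdot \frac{\#op}{\#total}$ for one candidate; the numbers in =main= are made up.

#+BEGIN_SRC rust
use std::collections::HashMap;

/// Polynomial cost model for one operation: cost(n) = c0 + c1*n + c2*n^2 + ...
struct CostModel {
    coeffs: Vec<f64>,
}

impl CostModel {
    /// Evaluate the fitted polynomial at n (Horner's method).
    fn cost(&self, n: f64) -> f64 {
        self.coeffs.iter().rev().fold(0.0, |acc, &c| acc * n + c)
    }
}

/// Estimated cost of one candidate: each operation's cost at the profiled
/// max size, weighted by that operation's share of all recorded calls.
fn estimate_cost(
    models: &HashMap<&str, CostModel>,
    op_counts: &HashMap<&str, u64>,
    max_n: f64,
) -> f64 {
    let total: u64 = op_counts.values().sum();
    if total == 0 {
        return 0.0;
    }
    op_counts
        .iter()
        .filter_map(|(op, &count)| {
            models
                .get(op)
                .map(|m| m.cost(max_n) * count as f64 / total as f64)
        })
        .sum()
}

fn main() {
    // Hypothetical fitted models for a Vec-backed candidate.
    let mut vec_models = HashMap::new();
    vec_models.insert("push", CostModel { coeffs: vec![12.0, 0.0] });
    vec_models.insert("contains", CostModel { coeffs: vec![5.0, 0.8] }); // linear scan

    // Profiled usage for one allocation site: mostly lookups, up to 10_000 items.
    let mut counts = HashMap::new();
    counts.insert("push", 10_000u64);
    counts.insert("contains", 50_000u64);

    let est = estimate_cost(&vec_models, &counts, 10_000.0);
    println!("estimated cost per call: {:.1} ns", est);
}
#+END_SRC

The candidate with the lowest estimate would be the one suggested to the developer.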
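
Finally, a very rough sketch of what the 'wrapper type' could look like: a set that keeps its elements in a =Vec= while small and migrates to a =HashSet= once it grows past a threshold, which would in practice come from the fitted cost models rather than the made-up constant used here.

#+BEGIN_SRC rust
use std::collections::HashSet;
use std::hash::Hash;

/// A set-like wrapper that starts as a Vec (cheap for small n) and migrates
/// to a HashSet once it grows past a threshold.
enum AdaptiveSet<T> {
    Small(Vec<T>),
    Large(HashSet<T>),
}

impl<T: Eq + Hash> AdaptiveSet<T> {
    /// Placeholder; the real threshold would be derived from the cost models.
    const THRESHOLD: usize = 128;

    fn new() -> Self {
        AdaptiveSet::Small(Vec::new())
    }

    fn insert(&mut self, value: T) {
        match self {
            AdaptiveSet::Small(v) => {
                if !v.contains(&value) {
                    v.push(value);
                }
                if v.len() > Self::THRESHOLD {
                    // Migrate the existing elements into a HashSet.
                    let set: HashSet<T> = v.drain(..).collect();
                    *self = AdaptiveSet::Large(set);
                }
            }
            AdaptiveSet::Large(s) => {
                s.insert(value);
            }
        }
    }

    fn contains(&self, value: &T) -> bool {
        match self {
            AdaptiveSet::Small(v) => v.contains(value),
            AdaptiveSet::Large(s) => s.contains(value),
        }
    }
}

fn main() {
    let mut s = AdaptiveSet::new();
    for i in 0..1000 {
        s.insert(i);
    }
    assert!(s.contains(&500));
    assert!(matches!(s, AdaptiveSet::Large(_)));
}
#+END_SRC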