author    Aria <me@aria.rip> 2023-10-01 17:03:09 +0100
committer Aria <me@aria.rip> 2023-10-01 17:03:09 +0100
commit    57e87634490eca3333e2283b596ed48f887cfb89 (patch)
tree      9a3761711bbc5094a94fac9e078d5a76ea780dcd
parent    5b60993829edaab8254491358ac11a0a19268168 (diff)
some notes on planned design
 Tasks.org | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/Tasks.org b/Tasks.org
index 82b31ec..1b71fb5 100644
--- a/Tasks.org
+++ b/Tasks.org
@@ -47,4 +47,27 @@ https://ieeexplore.ieee.org/abstract/document/4907670
[20] MITCHELL, J. C. Representation independence and data abstraction. In POPL ’86: Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (New York, NY, USA, 1986), ACM, pp. 263–276.
-* NEXT Make notes on planned design
+* Planned design
+
+The design used by CollectionSwitch is the one that requires the least intervention per new collection type.
+
+We need some way to integrate with primrose to get the candidate collections.
+Ideally this would just mean using the Rust crate directly, or talking to a CLI over a JSON interface.
+
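+A minimal sketch of what that JSON interface might look like, assuming serde/serde_json on the consuming side; the schema and names here are invented, not primrose's actual output:
+
+#+BEGIN_SRC rust
+// Hypothetical schema for the candidate collections a primrose CLI could
+// emit. All names (Candidates, Candidate, field names) are illustrative.
+use serde::Deserialize;
+
+#[derive(Deserialize)]
+struct Candidates {
+    // Collection types satisfying the requested syntactic/semantic properties.
+    candidates: Vec<Candidate>,
+}
+
+#[derive(Deserialize)]
+struct Candidate {
+    // e.g. "std::collections::BTreeSet"
+    name: String,
+    // Operations the type must support, e.g. ["insert", "contains"]
+    operations: Vec<String>,
+}
+
+fn parse_candidates(json: &str) -> serde_json::Result<Candidates> {
+    serde_json::from_str(json)
+}
+#+END_SRC
+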
+For each collection and for each 'critical operation', we generate a performance estimate by performing the operation repeatedly at various sizes $n$ and fitting a polynomial to the results.
+This gives us an estimate of the cost of each operation when the collection is a given size: $C_{op}(n)$.
+This step should only need to be run once per computer, or the results could even be shared by default and re-run locally for better accuracy.
+
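+A rough sketch of this step, assuming a plain least-squares fit over wall-clock timings; all names are invented, and a real version would use a proper benchmark harness and many more samples:
+
+#+BEGIN_SRC rust
+use std::time::Instant;
+
+// Polynomial cost model: coeffs[i] multiplies n^i.
+struct CostModel {
+    coeffs: Vec<f64>,
+}
+
+impl CostModel {
+    // Estimated cost of one operation at collection size n (Horner's rule).
+    fn estimate(&self, n: f64) -> f64 {
+        self.coeffs.iter().rev().fold(0.0, |acc, &c| acc * n + c)
+    }
+}
+
+// Least-squares fit of a degree-`deg` polynomial via the normal equations.
+fn fit_polynomial(xs: &[f64], ys: &[f64], deg: usize) -> CostModel {
+    let m = deg + 1;
+    let mut g = vec![vec![0.0; m]; m]; // G[i][j] = sum of x^(i+j)
+    let mut r = vec![0.0; m];          // r[i]    = sum of y * x^i
+    for (&x, &y) in xs.iter().zip(ys) {
+        let mut pow = vec![1.0; 2 * m - 1];
+        for k in 1..pow.len() {
+            pow[k] = pow[k - 1] * x;
+        }
+        for i in 0..m {
+            r[i] += y * pow[i];
+            for j in 0..m {
+                g[i][j] += pow[i + j];
+            }
+        }
+    }
+    // Solve G * coeffs = r by Gauss-Jordan elimination with partial pivoting.
+    for col in 0..m {
+        let piv = (col..m)
+            .max_by(|&a, &b| g[a][col].abs().partial_cmp(&g[b][col].abs()).unwrap())
+            .unwrap();
+        g.swap(col, piv);
+        r.swap(col, piv);
+        let d = g[col][col];
+        for j in col..m {
+            g[col][j] /= d;
+        }
+        r[col] /= d;
+        let pivot_row = g[col].clone();
+        let pivot_r = r[col];
+        for row in 0..m {
+            if row != col {
+                let f = g[row][col];
+                for j in col..m {
+                    g[row][j] -= f * pivot_row[j];
+                }
+                r[row] -= f * pivot_r;
+            }
+        }
+    }
+    CostModel { coeffs: r }
+}
+
+// Very rough timing of Vec::push with the collection pre-filled to size n.
+fn time_push(n: usize) -> f64 {
+    let mut v: Vec<u64> = (0..n as u64).collect();
+    let start = Instant::now();
+    for i in 0..1_000 {
+        v.push(i);
+    }
+    start.elapsed().as_secs_f64() / 1_000.0
+}
+
+fn main() {
+    let sizes = [100.0, 1_000.0, 10_000.0, 100_000.0];
+    let times: Vec<f64> = sizes.iter().map(|&n| time_push(n as usize)).collect();
+    let model = fit_polynomial(&sizes, &times, 2);
+    println!("estimated push cost at n = 50000: {:.3e}s", model.estimate(50_000.0));
+}
+#+END_SRC
+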
+Then, we need a 'semantic profiler'. For our cases, this should collect:
+ - Max size (in terms of memory used?)
+ - Max size (in terms of items)
+ - # of each operation
+for each individually allocated collection.
+This should then be aggregated by 'allocation site' (identified by the last few frames of the call stack); the data collected per instance is sketched below.
+This does require the user to write their own benchmarks - we could maybe hook into criterion for data collection, as it is already popular.
+This profiler doesn't need to be /super/ lightweight, just enough to not make things painful to run.
+
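+A minimal sketch of the data the profiler might record, with aggregation happening when each instance is dropped; all names here are invented:
+
+#+BEGIN_SRC rust
+use std::collections::HashMap;
+
+// An allocation site: the last few return addresses on the call stack when
+// the collection was constructed. A depth of 3 is an arbitrary choice here.
+#[derive(Clone, Copy, PartialEq, Eq, Hash)]
+struct AllocationSite {
+    frames: [usize; 3],
+}
+
+// Counters kept alongside one live collection instance.
+#[derive(Default)]
+struct InstanceStats {
+    max_items: usize,
+    max_bytes: usize,
+    op_counts: HashMap<&'static str, u64>,
+}
+
+impl InstanceStats {
+    // Called by an instrumented wrapper around every critical operation.
+    fn record_op(&mut self, op: &'static str, items: usize, bytes: usize) {
+        *self.op_counts.entry(op).or_insert(0) += 1;
+        self.max_items = self.max_items.max(items);
+        self.max_bytes = self.max_bytes.max(bytes);
+    }
+}
+
+// Aggregated profile: one entry per allocation site, merged on drop.
+#[derive(Default)]
+struct Profile {
+    by_site: HashMap<AllocationSite, Vec<InstanceStats>>,
+}
+
+impl Profile {
+    fn record_drop(&mut self, site: AllocationSite, stats: InstanceStats) {
+        self.by_site.entry(site).or_default().push(stats);
+    }
+}
+#+END_SRC
+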
+Then we can approximate a cost for each candidate as $\sum_{op} C_{op}(n) \cdot \frac{\#op}{\#total}$.
+We could extend this to suggest different approaches when there is a wide spread in the maximum $n$ observed.
+
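+Reusing the CostModel sketch from above, the scoring step might look like this (again, all names are illustrative):
+
+#+BEGIN_SRC rust
+use std::collections::HashMap;
+
+// Score one candidate: each operation's fitted cost at the profiled size,
+// weighted by the fraction of operations the profiler attributed to it.
+fn candidate_cost(
+    models: &HashMap<&'static str, CostModel>, // C_op from the benchmark step
+    op_counts: &HashMap<&'static str, u64>,    // #op from the profiler
+    n: f64,                                    // profiled (max) collection size
+) -> f64 {
+    let total: u64 = op_counts.values().sum();
+    op_counts
+        .iter()
+        .map(|(op, &count)| models[op].estimate(n) * count as f64 / total as f64)
+        .sum()
+}
+#+END_SRC
+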
+If time allows, we could attempt to create a 'wrapper type' that switches between collections as $n$ changes, using rules decided by something similar to the above algorithm; a rough sketch is below.
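+
+A very rough sketch of such a wrapper over a set, with an invented fixed threshold standing in for the crossover point the cost models would compute:
+
+#+BEGIN_SRC rust
+use std::collections::BTreeSet;
+
+// Placeholder threshold; a real version would derive this from where the
+// candidates' fitted cost curves cross.
+const SWITCH_AT: usize = 1024;
+
+enum AdaptiveSet<T: Ord> {
+    Small(Vec<T>),      // cheap at small n
+    Large(BTreeSet<T>), // better asymptotics at large n
+}
+
+impl<T: Ord> AdaptiveSet<T> {
+    fn new() -> Self {
+        AdaptiveSet::Small(Vec::new())
+    }
+
+    fn insert(&mut self, item: T) {
+        match self {
+            AdaptiveSet::Small(v) => {
+                if !v.contains(&item) {
+                    v.push(item);
+                }
+                if v.len() > SWITCH_AT {
+                    // Migrate once n crosses the estimated crossover point.
+                    let tree: BTreeSet<T> = std::mem::take(v).into_iter().collect();
+                    *self = AdaptiveSet::Large(tree);
+                }
+            }
+            AdaptiveSet::Large(s) => {
+                s.insert(item);
+            }
+        }
+    }
+
+    fn contains(&self, item: &T) -> bool {
+        match self {
+            AdaptiveSet::Small(v) => v.contains(item),
+            AdaptiveSet::Large(s) => s.contains(item),
+        }
+    }
+}
+#+END_SRC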