Mitigating cadCAD overhead

Hi!

I am trying to model Ethereum’s EIP 1559, which is a mechanism for pricing blockchain transactions. The idea is that user transactions accumulate over time (the “demand”, like a mempool), while block producers follow a policy to choose which transactions to include, depending on the amount they receive from each transaction (the “tip”). The transactions included in a block are then removed from the demand.

It can be quite elegantly written using cadCAD (e.g., in this notebook). Particularly, I am simulating what happens to the mechanism when demand spikes suddenly, so my state object representing the demand grows quite large.

The simulation becomes quite slow when the spike hits (you can also see a plot of that spike before the run in the notebook). I wanted to check whether this was due to how I manage the demand object, which needs to be sorted and deleted from in each step. I also logged the time it takes for a step to complete (“loop time”):

As you can see from the log above (also in the notebook), around step 32 the loop time starts increasing, while my sorting and deleting operations remain fairly quick.
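For reference, the loop-time logging is just a timer wrapped around each step. A minimal standalone sketch of the pattern (the `step` function and demand contents here are hypothetical stand-ins, not the notebook’s code):

```python
import time

def run_with_timing(step_fn, state, n_steps):
    """Run n_steps of a simulation loop, logging wall-clock time per step."""
    loop_times = []
    for _ in range(n_steps):
        t0 = time.perf_counter()
        state = step_fn(state)
        loop_times.append(time.perf_counter() - t0)
    return state, loop_times

# Hypothetical step: include (i.e. remove) the five highest-tip transactions
def step(demand):
    included = sorted(demand, key=demand.get, reverse=True)[:5]
    return {tx: tip for tx, tip in demand.items() if tx not in included}

demand = {f"tx{i}": i % 100 for i in range(1000)}
final, times = run_with_timing(step, demand, 10)
```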

To compare, I also produced the “same” simulation, this time without cadCAD, in this notebook. The loop times are much shorter, as it doesn’t seem to suffer from the same overhead (apparently I cannot put two images in a post, but I left the output of the simulation run in the notebook).

All in all, it seems to me that some of the operations happening in the background of cadCAD’s execution take a bit of time when the state grows.

  • This could probably be mitigated by keeping the demand out of the state, but conceptually this is very much part of my system specifications, or at least something I want cadCAD to “track” and feed into my policies.
  • Another way might be to only keep in state changes to the demand (as in, differentials), since cadCAD records historical state, so at any step I should be able to reconstruct the current demand based on historical demand and historical included transactions, though this would result in code that I don’t think would be very intuitive and seems to negate the benefits of cadCAD. The simulation without cadCAD records historical state too.
  • Maybe there is another much simpler reason why loop times are increasing!
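To illustrate the second option, the differential bookkeeping could look something like this (a standalone sketch with a made-up diff format, not cadCAD’s actual history object):

```python
# Instead of copying the full demand each step, record only what changed
# (arrivals and inclusions) and reconstruct the current demand by replaying
# the diffs over the initial demand.

def reconstruct(initial_demand, history):
    """Replay per-step diffs to recover the current demand."""
    demand = dict(initial_demand)
    for diff in history:
        demand.update(diff["arrived"])      # new transactions this step
        for tx in diff["included"]:         # removed by the block producer
            demand.pop(tx, None)
    return demand

initial = {"tx0": 10, "tx1": 20}
history = [
    {"arrived": {"tx2": 30}, "included": ["tx1"]},
    {"arrived": {}, "included": ["tx2"]},
]
current = reconstruct(initial, history)  # {"tx0": 10}
```

As I said, this works but moves the “real” state out of view, which is exactly what I’d rather avoid.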

I was wondering if you had more insight into what could be improved or which operations you believe could be causing the overhead. I hope the examples in the notebooks are minimal enough. I also looked into cadCAD’s source though did not do additional logging from there.

Looking forward to your ideas!

1 Like

Thanks for raising this, Barnabe.

One of the causes behind this is the fact that, behind the scenes, cadCAD deepcopies state objects between timesteps. In your non-cadCAD notebook I saw an increase in loop time when I deepcopied the demand object instead of just copying it. But that doesn’t seem to explain it all, so it might be worth timing some of the internals of cadCAD’s source to get the full picture.
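A toy demonstration of the difference (not cadCAD internals, just the relative cost of shallow vs. deep copies of a large dict of objects, which is roughly what happens to state between timesteps):

```python
import copy
import time

class Transaction:
    def __init__(self, tip):
        self.tip = tip

demand = {i: Transaction(i) for i in range(50_000)}

t0 = time.perf_counter()
shallow = demand.copy()                 # copies the dict, shares the objects
shallow_t = time.perf_counter() - t0

t0 = time.perf_counter()
deep = copy.deepcopy(demand)            # allocates a new Transaction per entry
deep_t = time.perf_counter() - t0

# deep_t is typically orders of magnitude larger than shallow_t,
# and it grows with the size of the state object.
```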

@JEJodesty might have some ideas of how to address this. He’s spent some time working on ways to minimize memory usage due to large objects being copied over and over, and some of the solutions might be fit to address the computation bottleneck of this use case.

1 Like

Thank you Markus!
Indeed, I was only doing a shallow copy of demand since there are no side effects here (I am not mutating the objects demand holds, so I can keep multiple references to the same Transaction).

I also tried changing demand.copy() to deepcopy(demand); this clearly slows down the simulation but is still somewhat faster than the cadCAD version. Adding the log of loop times here:

Could memory management be part of the cadCAD API, like could I specify that I don’t need a deep copy always? (maybe keeping deepcopy as a default still since that’s the most intuitive)

1 Like

Yes, we have something like this on the roadmap. In the meantime, I wonder if overriding the __deepcopy__ method to return a shallow copy instead could work as a workaround? In your case, instead of storing the demand in a dict you’d store it in a custom subclass of dict. Actually, if anyone in the community wants to do some research on this and implement a demo, I think it would be extremely valuable.
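Something along these lines (a sketch of the idea, untested against cadCAD itself; it is only safe if the stored objects are never mutated in place):

```python
import copy

class ShallowDict(dict):
    """A dict whose deepcopy is a shallow copy, keeping per-timestep
    copies cheap. The returned dict is a new object, but its values are
    shared references to the originals."""
    def __deepcopy__(self, memo):
        return ShallowDict(self)

demand = ShallowDict({"tx1": object(), "tx2": object()})
copied = copy.deepcopy(demand)
# copied is a distinct dict, but copied["tx1"] is the same object
# as demand["tx1"] — no per-entry allocation happened.
```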

1 Like

@barnabemonnot, there is a cadCAD fork that uses immutable objects instead of deepcopy() at https://github.com/danlessa/cadCAD/tree/tweaks

This can generate huge speedups on some simulations. Just clone it and install locally by using pip3 install -e .

4 Likes

@barnabemonnot we ran into this same issue last week in the context of a client project, and @danlessa came up with this idea. We’re still evaluating this and how best to add it to the next release of cadCAD, but it looks very promising. In some of our internal benchmarks, simulation time was cut down by a factor of 100!

I ran commit 9e6732a of the EIP 1559 model and these were the results.

So thanks again for flagging this, @barnabemonnot

2 Likes

Thanks @danlessa and @markusbkoch!

Indeed, from your results it looks like the individual operations broadly add up to the loop time, which is ideal. Will definitely try out Danillo’s fork!

I have yet to try Dan’s fork, but was thinking that the main reason I would want deepcopy is so that I can query the dataset after the simulation, and pretty much get anything out of the historical states.

These “anythings” I want are observers, i.e. functions of the state that map to some other domain, usually a “simple” domain, like numbers or arrays of elementary types, not deep objects. For instance given the state containing a dictionary of all transactions, an observer could be the number of transactions in the dictionary. Measuring the observers is done ex post in the tutorials by calling lambda functions on the resulting df. This assumes that we can loop over historical states.

Instead, we could define a state variable for each observer we want to measure, e.g. keep the number of transactions in the state, but this might “pollute” the state space, and observers are morally not like state variables. So another idea is to specify observers (as in, lambda functions) much like state update blocks are specified before the simulation, with observers called after each state update. Instead of obtaining the complete dataset of historical states and then mapping observers over every time step, the observers would be called at each state update. There would then be no need to keep around copies of the state at every step; we would simply record the measured observers, which are “smaller” than the state. Would that work?
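Roughly what I have in mind, as a standalone sketch (this is not a cadCAD API, and the update function and observer here are hypothetical):

```python
# Observers are functions of state, called after each state update; only
# their (small) outputs are recorded, never full copies of the state.

def run(initial_state, update_fn, observers, n_steps):
    state = initial_state
    records = []
    for _ in range(n_steps):
        state = update_fn(state)
        records.append({name: fn(state) for name, fn in observers.items()})
    return state, records

# Hypothetical example: state is the demand dict, one observer counts txs
observers = {"num_txs": lambda s: len(s)}

def update(demand):
    # include (remove) the two highest-tip transactions
    for tx in sorted(demand, key=demand.get, reverse=True)[:2]:
        del demand[tx]
    return demand

state, records = run({f"tx{i}": i for i in range(10)}, update, observers, 3)
# records -> [{"num_txs": 8}, {"num_txs": 6}, {"num_txs": 4}]
```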

Edit to change observable to observer, which seems to be the more correct term. Also tried out @danlessa 's fork and it works like a charm on the EIP 1559 notebook! I also tried it out on a different notebook where my state variable is a deep object (the state variable is a custom Network which contains BeaconState objects) and as expected it is faster but does not retain historical state. The observer pattern would be useful here.

1 Like

We use that pattern often, even when the relevant state variable is not a “deep object”. We call them “metrics”, but I can see how “observer” could be a more general term.

Computing the metric in the simulation (as if it were just another state variable) can increase code readability and improve version control, since diffing jupyter notebooks is often a pain. Also, in some cases the equation that defines the metric might depend on a system parameter being swept and computing it ex-post like we do in the tutorials could require some additional data munging. On the other hand, it might come at the expense of an additional Partial State Update Block, which could potentially increase the time it takes to run the simulation. But it does seem to be the only way of working around the unavailability of historical state when that’s the case.
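For concreteness, computing a metric as just another state variable would look like this under cadCAD’s state-update-function convention (the variable names and the demand structure are illustrative):

```python
# cadCAD state update functions take (params, substep, state_history,
# previous_state, policy_input) and return a (variable, value) tuple.

def s_num_txs(params, substep, state_history, previous_state, policy_input):
    # metric: number of transactions currently in demand
    return ("num_txs", len(previous_state["demand"]))

# An extra Partial State Update Block dedicated to metrics
metric_psub = {
    "policies": {},
    "variables": {"num_txs": s_num_txs},
}
```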

This discussion is super interesting, I didn’t see it in this forum. Thank you for sharing!

Yes, it does seem that it’s less effort to add them as state variables, I tried this out here. To keep their “ontologies” somewhat separate in the declaration, I created simple functions to add user-defined metrics/observers to the PSUBs (this is only cosmetic). Combined with @danlessa’s fork the performance is really improved!
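The helper is something along these lines (a hypothetical sketch, not the notebook’s exact code): observers are declared on their own as plain functions of state, and a small function wraps them into one extra PSUB appended to the model’s blocks.

```python
def add_observers(psubs, observers):
    """Append one PSUB that records each observer as a state variable.

    observers: dict mapping a variable name to a function of state.
    """
    def make_suf(name, fn):
        # adapt a plain function of state to cadCAD's
        # (params, substep, state_history, state, policy_input) signature
        return lambda params, substep, sH, s, _input: (name, fn(s))

    block = {
        "policies": {},
        "variables": {name: make_suf(name, fn) for name, fn in observers.items()},
    }
    return psubs + [block]

# Usage: declare observers separately, then bolt them on before the run
observers = {"num_txs": lambda s: len(s["demand"])}
psubs = add_observers([], observers)
```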

2 Likes