One of our recent proprietary client projects is improving the autonomous driving (AD) simulator the client uses to train AI models for self-driving cars. The existing codebase references literally hundreds of third-party libraries and includes modules for spatial indexing, 3D rendering, traffic simulation, agent controllers, environment simulation, physics engines, and more.
Client: "It's slow."
Well, of course it's slow! But how do you diagnose the problem and propose a solution in such a complex environment? The key is understanding everything in the system: every component, what it does, how fast it should be, and how fast it actually is when measured.
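To make "measuring how fast it is" concrete, here is a minimal sketch of per-component timing inside a frame loop. It is not the client's code: the component names (`physics`, `traffic`, `rendering`) and the `simulate_frame` function are hypothetical stand-ins, and `time.sleep` merely simulates the cost of each subsystem.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per component name.
timings = defaultdict(float)

@contextmanager
def timed(component):
    """Add the elapsed wall-clock time of the enclosed block to timings[component]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[component] += time.perf_counter() - start

def simulate_frame():
    # Hypothetical stand-ins for real subsystems; sleep() simulates their cost.
    with timed("physics"):
        time.sleep(0.002)
    with timed("traffic"):
        time.sleep(0.001)
    with timed("rendering"):
        time.sleep(0.004)

def profile(num_frames):
    """Run num_frames frames; return (component, ms per frame) pairs, slowest first."""
    timings.clear()
    for _ in range(num_frames):
        simulate_frame()
    per_frame_ms = {name: 1000.0 * total / num_frames
                    for name, total in timings.items()}
    return sorted(per_frame_ms.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, ms in profile(20):
        print(f"{name:10s} {ms:6.2f} ms/frame")
```

A breakdown like this shows which component eats most of each frame, so optimization effort goes where it pays off. For finer, function-level detail, Python's built-in `cProfile` module is the usual next step.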
When we started, the simulator struggled to reach 10 frames per second (fps); after only a basic first-pass optimization, it now reaches 25 fps, a 2.5x speedup. Next on the roadmap is porting the simulator core from Python to C (while still letting clients use the same Python APIs), vectorizing it along the way. We have no doubt we can reach 100 fps, and we'll be disappointed if we don't reach 1000 fps, all while providing more functionality to boot.
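It helps to translate these frame rates into per-frame time budgets. The short sketch below does only that arithmetic; the helper names `frame_budget_ms` and `speedup` are ours, for illustration.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

def speedup(old_fps, new_fps):
    """Throughput ratio between two frame rates."""
    return new_fps / old_fps

if __name__ == "__main__":
    for fps in (10, 25, 100, 1000):
        print(f"{fps:5d} fps -> {frame_budget_ms(fps):6.1f} ms per frame")
    print(f"10 -> 25 fps is a {speedup(10, 25):.1f}x speedup")
```

Going from 10 to 25 fps cuts the frame time from 100 ms to 40 ms, a factor of 2.5. Hitting 100 fps means the whole simulation step must fit in 10 ms, and 1000 fps leaves just 1 ms per frame.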