Why are commands so slow?

Deril

03 Jan, 2012 01:50 PM

Hello,
I like this framework very much for its cleaner code and DI, but its performance does not look good.
I am doing some serious stress testing on RobotLegs.

To run one empty command RL eats 0.07ms of execution time.

To put it in perspective: in 1 ms you can run 14 commands.

If you run your application at 30 FPS you have 33 ms at your disposal. That gives you 460 commands per ENTER_FRAME before empty command calls use up your entire execution time (with no room left for rendering or any other code).

For comparison: PureMVC does this work about 8 times faster, giving you 3500 empty commands per ENTER_FRAME.

For comparison: you can call an object's function 90,000 times per ENTER_FRAME before using up the execution time.


460 Commands per frame tick can be very limiting.
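
The frame-budget arithmetic above can be reproduced with a short sketch. It is written in Python rather than ActionScript purely for illustration; the function names are hypothetical, and the only inputs are the 0.07 ms per-command cost and the 30 FPS frame rate quoted in the post. Without the intermediate rounding (14 commands per ms, 33 ms per frame) the result comes out slightly higher than 460, but in the same range.

```python
# Frame-budget arithmetic using the figures quoted in the post.
COST_PER_COMMAND_MS = 0.07   # measured cost of one empty command (debug build)
FPS = 30                     # target frame rate


def frame_budget_ms(fps):
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / fps


def calls_per_frame(cost_ms, fps):
    """How many calls of the given cost fit into a single frame."""
    return int(frame_budget_ms(fps) / cost_ms)


print(f"frame budget: {frame_budget_ms(FPS):.1f} ms")
print(f"commands per ms: {int(1 / COST_PER_COMMAND_MS)}")
print(f"commands per frame: {calls_per_frame(COST_PER_COMMAND_MS, FPS)}")
```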

For example, it's easy to imagine 500 objects on screen that each invoke a command per frame to do something with them. You have no alternative but to create commands that affect many objects at once.


If I understand correctly, an RL command call has this work to do:

1: creation of the event object (and its destruction) (uses ~0.0006 ms, plus more if the garbage collector is triggered)
2: creation of the Command object (and its destruction) (uses ~0.0006 ms)
3: injection magic...
4: call to execute() (uses ~0.0003 ms)

Point 1 can be improved by using signals (it raised the command count from 14 to 17 per 1 ms).
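
The four steps listed above can be sketched roughly as follows. This is a simplified, hypothetical command map written in Python for illustration only; it is not the Robotlegs implementation, and every class and method name here (`CommandMap`, `map_value`, `map_event`, `dispatch`, `LogCommand`) is an assumption made for the example.

```python
class Event:
    """Step 1: an event object is created for every dispatch."""

    def __init__(self, type_, payload=None):
        self.type = type_
        self.payload = payload


class CommandMap:
    """Hypothetical, heavily simplified command map (not Robotlegs code)."""

    def __init__(self):
        self._commands = {}   # event type -> command class
        self._values = {}     # dependency name -> instance

    def map_value(self, name, instance):
        self._values[name] = instance

    def map_event(self, event_type, command_class):
        self._commands[event_type] = command_class

    def dispatch(self, event):
        command_class = self._commands[event.type]
        command = command_class()                      # step 2: new command per event
        for name in getattr(command_class, "inject", ()):
            setattr(command, name, self._values[name])  # step 3: injection
        command.event = event
        command.execute()                              # step 4: the actual work


class LogCommand:
    inject = ("log",)   # names of the dependencies this command wants

    def execute(self):
        self.log.append(self.event.payload)


log = []
command_map = CommandMap()
command_map.map_value("log", log)
command_map.map_event("greet", LogCommand)
command_map.dispatch(Event("greet", "hello"))
print(log)
```

Even in this toy version, steps 1 to 3 are pure overhead around the single `execute()` call, which is the pattern the measurements above describe.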

Any other ideas on how to improve command performance?

Is it possible to have lazy commands triggered by an event, the same way we have lazy mediators?

  1. 1 Posted by jos on 03 Jan, 2012 02:18 PM

    So, is it fast enough for your application? Or are these numbers just to have numbers? While benchmarks are important, the real key is, is it fast enough for YOUR application?

    If you are building a game, then you probably don't want to be using RL in your game's main loop. But for controlling the rest of the game - view transitions between screens, backend communications, setup/init of external data, etc etc, where speed is perhaps not as critical, then you are golden.

    My 2 cents.

  2. Support Staff 2 Posted by Till Schneidereit on 03 Jan, 2012 02:50 PM

    Thanks for your feedback, Raima.

    I agree that that's a lot of overhead. The majority of that probably
    comes from the Swiftsuspenders DI container that creates the commands,
    whereas only a smaller part should be caused by Robotlegs itself.

    Which is why I went to great efforts to tune the performance for
    Swiftsuspenders 2, which Robotlegs 2 will use.

    Based on that, Robotlegs 2 should be a lot faster than 1. Once it's
    properly optimized itself, that is. We didn't really do that, yet, but
    Swiftsuspenders is already tuned as much as I know how to.

    On top of these gains, it will be possible to use advanced measures
    such as command instance pooling to further get the overhead down.
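
Command instance pooling, as mentioned above, could look roughly like this. It is a generic sketch of the pooling technique in Python, not the Robotlegs 2 design; `CommandPool`, `acquire`, `release`, and `CountCommand` are hypothetical names.

```python
class CommandPool:
    """Reuses command instances instead of constructing one per invocation."""

    def __init__(self, command_class):
        self._command_class = command_class
        self._free = []       # instances waiting to be reused
        self.created = 0      # how many instances were ever constructed

    def acquire(self):
        if self._free:
            return self._free.pop()
        self.created += 1
        return self._command_class()

    def release(self, command):
        self._free.append(command)


class CountCommand:
    def __init__(self):
        self.runs = 0

    def execute(self):
        self.runs += 1


pool = CommandPool(CountCommand)
for _ in range(1000):
    command = pool.acquire()
    command.execute()
    pool.release(command)

print(pool.created)
```

The trade-off is that a pooled command keeps its state between runs (here `runs` keeps counting), so any per-invocation fields have to be reset before reuse.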

  3. 3 Posted by Deril on 03 Jan, 2012 04:03 PM

    jos - I need to stress test the tools and be able to "sell" them for big games. With the games I have had to deal with, I can easily imagine using a lot of commands per frame without getting into the main game loop.
    It's especially dangerous when you have to see the application in perspective: it has to be able to grow for 2-3 years.

    Till Schneidereit - that sounds great! I am waiting for ver.2 eagerly.

    I wonder if it is possible to post-process the application for release, something like SWF optimization tools sometimes do.

    Maybe it's possible to remove the dynamic nature of injection by post-processing the application for release. Imagine: instead of getting the class definition dynamically every time you use it, it could be done once by post-processing the application. It would scan all application classes and create definitions of the things that have to be injected.

    Is it something that would improve application execution...?

    That leads me to another idea... is it possible to add extra functions to the injector, to let developers create and add these concrete classes with a description of what must be injected?
    So in situations where performance becomes a problem, developers could take the load off the Swiftsuspenders DI, replacing it with some manual work.

  4. Support Staff 4 Posted by Till Schneidereit on 03 Jan, 2012 04:13 PM

    > I wonder if it is possible to post-process the application for release, something like SWF optimization tools sometimes do.
    >
    > Maybe it's possible to remove the dynamic nature of injection by post-processing the application for release.
    > Imagine: instead of getting the class definition dynamically every time you use it, it could be done once by post-processing the application. It would scan all
    > application classes and create definitions of the things that have to be injected.

    That is certainly possible - but quite a lot of work. Swiftsuspenders
    is already pretty complex and adding an ahead of time mechanism as a
    build-time step would probably increase its complexity quite a lot.
    That isn't to say that it'll never happen, but I don't think that I
    will work on such a feature any time soon. Patches, on the other hand,
    are very much welcome ;)

    >
    > Is it something that would improve application execution...?
    >
    > That leads me to another idea... is it possible to add extra functions to the injector, to let developers create and add these concrete classes with a description of what must be injected?
    > So in situations where performance becomes a problem, developers could take the load off the Swiftsuspenders DI, replacing it with some manual work.

    You can do that in Swiftsuspenders 2 already: Simply create your own
    DependencyProvider[1] and let it create your instances the fastest way
    possible.

    [1]: https://github.com/tschneidereit/SwiftSuspenders/blob/master/src/org/swiftsuspenders/dependencyproviders/DependencyProvider.as
    Here are two small samples:
    https://github.com/tschneidereit/SwiftSuspenders/tree/master/test/org/swiftsuspenders/support/providers
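
The general idea of replacing reflective injection with a hand-written provider can be sketched as below. This is a language-neutral illustration in Python and deliberately does not reproduce the Swiftsuspenders `DependencyProvider` interface; `MiniInjector`, `map_provider`, `get_instance`, and the sample classes are all hypothetical.

```python
class GameModel:
    """Stand-in dependency (hypothetical)."""


class MoveCommand:
    model: GameModel   # dependency declared as a class annotation

    def execute(self):
        return self.model


class MiniInjector:
    """Hypothetical mini-injector with a reflective path and a provider path."""

    def __init__(self):
        self._providers = {}   # class -> callable(injector) returning an instance

    def map_provider(self, cls, provider):
        self._providers[cls] = provider

    def get_instance(self, cls):
        provider = self._providers.get(cls)
        if provider is not None:
            return provider(self)            # fast path: hand-written construction
        return self._reflective_create(cls)  # slow path: inspect the class

    def _reflective_create(self, cls):
        instance = cls()
        for name, dep_cls in getattr(cls, "__annotations__", {}).items():
            setattr(instance, name, self.get_instance(dep_cls))
        return instance


def move_command_provider(injector):
    """Manual wiring: the developer states exactly what gets injected."""
    command = MoveCommand()
    command.model = injector.get_instance(GameModel)
    return command


model = GameModel()
injector = MiniInjector()
injector.map_provider(GameModel, lambda inj: model)

slow = injector.get_instance(MoveCommand)                  # reflective path
injector.map_provider(MoveCommand, move_command_provider)
fast = injector.get_instance(MoveCommand)                  # hand-written path
print(slow.model is fast.model)
```

Both paths yield an equivalent command; the provider path simply skips the class inspection in exchange for wiring that has to be maintained by hand.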

  5. 5 Posted by Deril on 03 Jan, 2012 04:38 PM

    I am reading it at the moment! (ver 1.6)

    I see class description IS read once. Then data is stored to m_injecteeDescriptions : Dictionary;

    I will dig into the code!

    Thanks for your time!

  6. Support Staff 6 Posted by Shaun Smith on 04 Jan, 2012 03:06 PM

    There is always going to be overhead when introducing a framework into your codebase, but when used properly that overhead should be entirely negligible. A game (or application) that fires off enough commands in a given frame for the impact to be noticeable has something very wrong with its architecture.

    Anything performance critical, like the game loop or rendering engine, should not be flowing through framework machinery.

    Some examples of appropriate commands:

    LoadUserProfileCommand
    LoadGameWorldCommand
    UnlockAchievementCommand

    Some examples of very inappropriate commands:

    FireBulletCommand
    CheckCollisionsCommand
    RenderFrameCommand

    An application framework is designed to co-ordinate collaboration between functional areas of your application, NOT to actually do the work that those functional areas themselves should be doing. I think that this is a big mistake that a lot of people make.

    That's not to say that we don't care about performance - we have tried to keep things as lightweight as possible, and will create a performance test harness for V2 to tune things further.

  7. 7 Posted by Jos on 04 Jan, 2012 03:53 PM

    Here is another reply, by Shaun, to a game-related question.

  8. 8 Posted by Deril on 05 Jan, 2012 11:22 AM

    "A game (or application) that fires off enough commands in a given frame for the impact to be noticeable has something very wrong with its architecture."

    Or it can just be a very big application. And even if you have only 3-5 commands for the main loop every tick, I can imagine many random commands stacking up and having a great impact on performance, especially if you want to break the application down into modules well.

    I made a mistake in my testing: I was testing in debug mode :) A release build boosts performance, so that it takes 33 empty RobotLegs commands to eat up 1 ms of execution time.

    This is still a bad number, and I am looking for options to boost it.

    Thanks for the responses.

    PS: there is no need to be hostile. I am not attacking your Shrine of RobotLegs. Robotlegs is great! I love it!
    But I need to be skeptical and critical because the applications that I want to use it on have tight performance limits and a huge scope, and involve the lives of many people over the course of a couple of years.

    I will release results of my stress testing with code shortly...

  9. Support Staff 9 Posted by Shaun Smith on 05 Jan, 2012 11:32 AM

    > PS: there is no need to be hostile. I am not attacking your Shrine of RobotLegs. Robotlegs is great! I love it!

    Sorry, I did not mean to sound hostile, that wasn't my intention at all.

  10. Support Staff 10 Posted by Shaun Smith on 05 Jan, 2012 01:14 PM

    > Or it can just be a very big application. And even if you have only 3-5 commands for the main loop every tick...

    It's not about the size or complexity of the application. Firing even a single application command on every tick is just not a good idea. Again, please don't think I'm being hostile, these are just my opinions, take it all with a pinch of salt.

    > I can imagine many random commands stacking up and having a great impact on performance.

    I'm suggesting that if this happens then there is something very wrong with the design of the application. Application commands should not be tied to the game loop at all. Anything that needs to happen on every tick should be happening elsewhere - inside of an engine, module, or object.

    When it's time to pause, load a new level or send data to a server, then firing off a command to co-ordinate that task is appropriate.

    But updating game entities, detecting collisions and rendering scenes are low-level implementation details that should not be forced through an application framework. It doesn't matter how big the application is, it's about working at the right level of abstraction and using the right tools for the job.

    For example, I have a huge application that has been in development for a number of years. It has a bunch of commands, but not that many. They exist to bootstrap the application and co-ordinate its various sub systems.

    Within that app there is a drawing tool that people can use to draw pictures on photos. The drawing tool makes use of the Command Pattern to provide undo/redo functionality. But those commands are lightweight and do not involve the rest of my application in any way. They are not dispatched on the application's event bus, they are not injected with application dependencies, and nothing outside of the drawing tool module even knows that they exist. Only when it's time to actually do something with a drawing does the rest of the application get involved.

    The drawing tool should not know or care about what the application does with a drawing after a user hits the save button. And the application should not know or care about how the drawing tool works internally. If the drawing tool dispatched events and fired commands on the application's event bus then I would have tightly coupled my application to the drawing tool, and for no good reason.
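
A self-contained undo/redo history of the kind described above might be sketched like this. It is a generic Command Pattern illustration in Python, not the code of the application being discussed; `AddStroke` and `History` are hypothetical names, and the "canvas" is just a list.

```python
class AddStroke:
    """Lightweight command: no injection, no event bus, just execute/undo."""

    def __init__(self, canvas, stroke):
        self.canvas = canvas
        self.stroke = stroke

    def execute(self):
        self.canvas.append(self.stroke)

    def undo(self):
        self.canvas.pop()   # valid because commands are undone in reverse order


class History:
    """Keeps the done/undone stacks local to the drawing tool."""

    def __init__(self):
        self._done = []
        self._undone = []

    def run(self, command):
        command.execute()
        self._done.append(command)
        self._undone.clear()   # a new action invalidates the redo stack

    def undo(self):
        if self._done:
            command = self._done.pop()
            command.undo()
            self._undone.append(command)

    def redo(self):
        if self._undone:
            command = self._undone.pop()
            command.execute()
            self._done.append(command)


canvas = []
history = History()
history.run(AddStroke(canvas, "line"))
history.run(AddStroke(canvas, "circle"))
history.undo()
after_undo = list(canvas)
history.redo()
print(after_undo, canvas)
```

Nothing outside the module needs to know these commands exist, which is the decoupling the post argues for.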

    But that's enough about all of that. Benchmarking is fun and when you release your results I'll certainly check them out. I'm sure you know all of this already, but please be sure to take the usual things into account when benchmarking: running multiple iterations to get min, max, and deviation values, garbage collection interference, system instability, warmup etc. http://gskinner.com/blog/archives/2010/02/performancetest.html
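
The benchmarking precautions listed above (warmup, multiple runs, min/max/deviation, reducing garbage collection interference) can be sketched as a tiny harness. This is an illustrative Python version using only the standard library; it is not the PerformanceTest tool linked above, and the `benchmark` function and its parameters are assumptions for the example.

```python
import gc
import statistics
import time


def benchmark(func, iterations=2000, runs=10, warmup=2):
    """Return per-call timing statistics in milliseconds for func."""
    # Warmup passes are executed but not recorded.
    for _ in range(warmup):
        for _ in range(iterations):
            func()

    samples = []
    for _ in range(runs):
        gc.collect()   # start each run from a freshly collected heap
        start = time.perf_counter()
        for _ in range(iterations):
            func()
        elapsed = time.perf_counter() - start
        samples.append(elapsed * 1000.0 / iterations)

    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }


stats = benchmark(lambda: None)
print(stats)
```

Reporting the spread rather than a single number makes it easier to tell a real difference between two frameworks from measurement noise.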

    > For comparison: you can call an object's function 90,000 times per ENTER_FRAME before using up the execution time.

    Exactly! Which is why you should be calling a method instead of executing a command :)

  11. 11 Posted by Deril on 06 Jan, 2012 09:44 PM

    My post with performance testing:

    http://www.mindscriptact.com/robotlegs-vs-puremvc-performance-battle/

    Thanks for your help.

  12. Support Staff 12 Posted by Shaun Smith on 07 Jan, 2012 12:32 AM

    Coolio, I've replied on your blog (I tweaked your tests a bit).

  13. 13 Posted by Deril on 09 Jan, 2012 11:44 AM

    To Shaun: you hit the right spot with your tweaks!

    So an empty command uses ~0.007 ms, and every injection costs ~0.003 ms.

    That makes extending Command a very poor choice for creating your commands,
    as it has 5 injections and none of them are used extensively (not in my practice, at least).

    contextView:DisplayObjectContainer: as I mentioned in one of my previous posts, this one is almost never used.
    injector, commandMap, mediatorMap are needed for set-up commands... and almost nowhere else?
    (I guess commandMap is an exception, as it is needed for executing other commands.)
    eventDispatcher is also rarely used. In most cases I let Models do the talking.

    I imagine you would use ~3 injections per command on average, so it's still slow, but three times less than I initially thought it would be.
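
The cost figures above amount to a simple linear model: a fixed cost per command plus a cost per injection. A short Python sketch, using only the two numbers quoted in this post (the function name is hypothetical), gives about 0.016 ms for a command with 3 injections and about 0.022 ms for one that extends Command with its 5 injections.

```python
EMPTY_COMMAND_MS = 0.007   # cost of a command with no injections (from the post)
PER_INJECTION_MS = 0.003   # additional cost of each injection (from the post)


def command_cost_ms(injections):
    """Estimated cost of one command call with the given number of injections."""
    return EMPTY_COMMAND_MS + injections * PER_INJECTION_MS


for count in (0, 3, 5):
    cost = command_cost_ms(count)
    print(f"{count} injections: {cost:.3f} ms, about {int(1 / cost)} commands per ms")
```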

    Thanks!

  14. Support Staff 14 Posted by Shaun Smith on 09 Jan, 2012 03:28 PM

    Yeh, I don't think we'll have those abstract classes (Command, Actor etc) in RL2 - people tend to make their own abstract classes, and as you pointed out, the various injected dependencies are hardly ever actually used anyway.

  15. Support Staff 15 Posted by Shaun Smith on 10 Jan, 2012 12:56 AM

    @deril: Any chance you'll update your post with the new stats? Otherwise, if you feel that it's fair because most people probably extend mvcs.Command, I understand. In that case, could you at least add a "Command with no dependencies" row?

  16. 16 Posted by Deril on 10 Jan, 2012 11:02 AM

    Post updated... (sorry for the delay... I am preparing a RobotLegs workshop...)

    In any case, my conclusions have not changed. It's just that commands are not as bad as I thought they were.

    Oh, and I updated and cleaned up the code...

  17. Support Staff 17 Posted by Shaun Smith on 10 Jan, 2012 11:26 AM

    Aha, excellent!

  18. 18 Posted by Deril on 11 Jan, 2012 05:58 PM

    Hm... the more I dig into this, the more I want to know MORE! :)

    I want to know why every line of RobotLegs and Swiftsuspenders is written :) and it's not always easy to understand, and not having the RobotLegs v2 code does not help.

    In any case, I have a feeling that it is very bloated. Have a look here : http://www.mindscriptact.com/blogFiles/frameworkTests/RobotLegsComm... this is a log of 2 empty commands running... it's huge!!

  19. Support Staff 19 Posted by Till Schneidereit on 11 Jan, 2012 09:58 PM

    Hi Deril,

    sorry for jumping into the discussion this late.

    Please have a look at the RL2 source here:
    https://github.com/robotlegs/robotlegs-framework/tree/version2/
    The included version of Swiftsuspenders can be found here:
    https://github.com/tschneidereit/SwiftSuspenders/

    Right now, that new version isn't that optimized, causing the overhead
    of command invocations to be even bigger than in RL1. I have a
    prototype of a hacked version of the command map that reduces that
    overhead and brings it to within 30% to 50% of PureMVC's.

    The more important point, though, is that that overhead buys you a
    drastically reduced amount of coupling between the different actors.
    Not only can you use anything that has a public `execute` method as a
    command, you also don't need a central `facade` or any other
    singletons, for that matter. And you can inject into your command
    whatever you need (and have mapped in your context, of course) without
    having to have that knowledge carried through several different layers
    of the framework.

    As you note, that decoupling comes at the price of having a lot of
    indirection (and thus overhead and long stacktraces). To some extent,
    that will be reduced in RL2, but mostly, it's the price you have to
    pay. In cases where that is too high, Robotlegs simply isn't the
    right tool for the job. But here I very much agree with Shaun: In
    those cases, it's extremely likely that it's not the right job for
    other reasons anyway and you should seriously look into entity
    frameworks like Ember2 or something else altogether.

    Having said all that, I'd of course be very interested in any changes
    we could make to improve performance. For Swiftsuspenders 2, I tried
    my very best to shave off every bit of overhead I could, but maybe I'm
    missing some major pieces.

    cheers,
    till

  20. Support Staff 20 Posted by Shaun Smith on 12 Jan, 2012 01:39 AM

    Hey Deril,

    Most of the stacktrace comes from the automated dependency injection side of things. Analysing a class and finding and resolving its dependencies is fairly complex. But what's actually happening here is that we're shifting work from the developer to the machine.

    C is faster than Ruby, but I'd much rather write Ruby and let the machine do all of the boring work while I focus on actually delivering a product. The same principle applies here.

    For the shortest stacktrace possible you could simply do this:

    new SomeCommand().execute();
    

    But that wouldn't be very convenient. You have to weigh up convenience and development speed with raw execution speed. Personally, I'm fine with letting the machine do what it's good at if it lets me build my applications more quickly, reduces boilerplate, and reduces the number of bugs I might introduce if I did everything manually.

    Again, that's not to say that we don't care about performance at all. We do keep a close eye on performance. But, it's important to keep things in perspective. Executing a piece of code that does hardly anything at all is obviously going to have a shallower stacktrace than executing a piece of code that does a lot of boring work that I'd rather not do myself.

    However, after saying all that, it's still useful having you look into these things. When we start optimising the performance of RL2 it will be handy to have these stats and benchmarks to look back at.

  21. Ondina D.F. closed this discussion on 23 Feb, 2012 10:34 AM.
