I see a lot of threads here about how to mitigate the impact of having a large number of insignificant countries, and the consensus seems to be that the main performance bottleneck is memory-related. This isn't really a technical forum, but I'm going to take the liberty of making a technical post on how to go about solving the performance issue.
Firstly, you have to have tools. Either write them yourselves or get someone else to write them for you (hint, hint). VTune is good for some things but inadequate for industrial-strength performance analysis.
The first thing you need is a UDP talker/listener class so whatever you log on your main machine can be piped to another machine and displayed graphically. We have to be able to look over and "see" the bottleneck, but displaying it on the machine running the app is going to invalidate your measurements. You all seem to be pretty savvy net programmers, but you can just lift this directly from Microsoft Press's "Network Programming for Windows" book, one of those orange thingies.
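To make the talker side concrete, here's a minimal sketch. It is not lifted from the book; the ProfileSample layout, the encode_sample helper, and the UdpTalker name are all my own assumptions for illustration. It fires one small datagram per sample at the display machine and doesn't wait for a reply, so the cost on the measured machine stays small.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

#ifdef _WIN32
#  include <winsock2.h>   // link with ws2_32.lib
#  include <ws2tcpip.h>
typedef SOCKET socket_handle;
static const socket_handle kBadSocket = INVALID_SOCKET;
static void close_socket(socket_handle s) { closesocket(s); }
#else
#  include <arpa/inet.h>
#  include <netinet/in.h>
#  include <sys/socket.h>
#  include <unistd.h>
typedef int socket_handle;
static const socket_handle kBadSocket = -1;
static void close_socket(socket_handle s) { close(s); }
#endif

// One profiling sample (hypothetical layout, not from any real tool).
struct ProfileSample {
    std::uint32_t frame;
    std::uint32_t section_id;
    std::uint32_t microseconds;
};

// Convert a sample to network byte order so the listener can decode it
// regardless of its own endianness.
static void encode_sample(const ProfileSample& s, std::uint32_t wire[3]) {
    wire[0] = htonl(s.frame);
    wire[1] = htonl(s.section_id);
    wire[2] = htonl(s.microseconds);
}

// Fire-and-forget UDP sender aimed at the display machine.
class UdpTalker {
public:
    UdpTalker(const char* ipv4, std::uint16_t port) {
        assert(ipv4 != nullptr);
#ifdef _WIN32
        WSADATA wsa;
        if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
            throw std::runtime_error("WSAStartup failed");
#endif
        sock_ = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
        if (sock_ == kBadSocket) throw std::runtime_error("socket() failed");
        std::memset(&dest_, 0, sizeof dest_);
        dest_.sin_family = AF_INET;
        dest_.sin_port = htons(port);
        if (inet_pton(AF_INET, ipv4, &dest_.sin_addr) != 1) {
            close_socket(sock_);
            throw std::runtime_error("bad IPv4 address");
        }
    }
    ~UdpTalker() {
        close_socket(sock_);
#ifdef _WIN32
        WSACleanup();
#endif
    }
    UdpTalker(const UdpTalker&) = delete;
    UdpTalker& operator=(const UdpTalker&) = delete;

    // Returns true if the whole datagram was handed to the network stack.
    // UDP gives no delivery guarantee; a dropped sample is just a gap.
    bool send(const ProfileSample& s) {
        std::uint32_t wire[3];
        encode_sample(s, wire);
        auto sent = sendto(sock_, reinterpret_cast<const char*>(wire),
                           sizeof wire, 0,
                           reinterpret_cast<const sockaddr*>(&dest_),
                           sizeof dest_);
        return sent == static_cast<decltype(sent)>(sizeof wire);
    }

private:
    socket_handle sock_;
    sockaddr_in dest_;
};
```

The listener on the second machine would bind a UDP socket to the same port, call recvfrom in a loop, and run ntohl over each field before drawing it.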
Next, you need a capable inline profiling DLL. I'd start by implementing the one described in Wyatt's article in the May 1998 Game Developer magazine; the source is available for download.
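I don't have the article's code in front of me, so take the following as a generic sketch of what "inline profiling" means and not as that implementation: you drop a timer object into the scope you care about, and it records the elapsed time when the scope exits. The names (ScopedProfile, PROFILE_SCOPE, profile_table) are made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Accumulated timing for one named code section.
struct SectionStats {
    std::uint64_t calls = 0;
    std::uint64_t total_ns = 0;
};

// Table of per-section totals. A production profiler would use fixed
// slots instead of a map so the lookup doesn't perturb the measurement.
inline std::map<std::string, SectionStats>& profile_table() {
    static std::map<std::string, SectionStats> table;
    return table;
}

// RAII timer: starts on construction, records elapsed time on destruction.
class ScopedProfile {
public:
    explicit ScopedProfile(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(name != nullptr);
    }
    ~ScopedProfile() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        SectionStats& s = profile_table()[name_];
        s.calls += 1;
        s.total_ns += static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed)
                .count());
    }
    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// One use per scope, since the guard variable name is fixed.
#define PROFILE_SCOPE(name) ScopedProfile profile_scope_guard_(name)
```

Usage is one line at the top of whatever you suspect, e.g. `PROFILE_SCOPE("update_countries");`, and at the end of each frame you walk profile_table() and ship the totals to the second machine.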
Someone is going to have to write a graphical display to put the results up on the second machine. I've got a hokey bar-code app that I wouldn't admit to having written, but it works. Also, on the client side, you have to be able to display a histogram of excessively lengthy frames, and have the option of enabling a bar-code "freeze" if there's an exceptionally large time/memory spike. Basic Windows programming.
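The bookkeeping behind the histogram and the "freeze" trigger is small. Here's a rough sketch with the drawing left out; the bucket count, the FrameHistogram name, and the thresholds are assumptions I picked for illustration, not anything from my bar-code app.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Buckets frame times by duration and flags spikes.
class FrameHistogram {
public:
    static constexpr std::size_t kBuckets = 8;

    // bucket_ms: width of each bucket; spike_ms: threshold for a freeze.
    FrameHistogram(double bucket_ms, double spike_ms)
        : bucket_ms_(bucket_ms), spike_ms_(spike_ms) {
        assert(bucket_ms > 0.0);
    }

    // Records one frame. Returns true when the frame is long enough that
    // the display should freeze so someone can look at it.
    bool record(double frame_ms) {
        if (frame_ms < 0.0) frame_ms = 0.0;
        std::size_t i = static_cast<std::size_t>(frame_ms / bucket_ms_);
        if (i >= kBuckets) i = kBuckets - 1;  // last bucket catches overflow
        ++counts_[i];
        return frame_ms >= spike_ms_;
    }

    std::size_t count(std::size_t bucket) const { return counts_.at(bucket); }

private:
    double bucket_ms_;
    double spike_ms_;
    std::array<std::size_t, kBuckets> counts_{};
};
```

With 10 ms buckets and a 100 ms spike threshold, a 16.7 ms frame lands in bucket 1 and a 250 ms frame lands in the overflow bucket and trips the freeze.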
If you're doing your own memory allocation, instrumenting the total amount is trivial. I suspect this isn't the case for this app, as it isn't for most Windows programs.
Picking out abuses of memory we actually allocate *when it is allocated*, and displaying them immediately, is a bit tricky but not impossible. On starting the app, make a call to your allocation function. Log your stack frame right after you return and grab the memory address of the allocation function. Then disable write protection on the code segment containing the function, overwrite the start of the supplied library routine with a jump to your own allocator, and hook in your instrumentation there. Again, pipe the results on a frame-by-frame basis to the second machine and display them. Actually, I've forgotten some of this. There might be better ways to instrument where and how much memory is allocated, perhaps replacing the default "new/new[]" operators with your own, but the vital thing is that it be done and reported in a timely fashion.
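The operator-replacement route is the easier one to show. What follows is a sketch of that alternative, not of the code-patching trick, and it only sees allocations that go through the C++ operators; direct malloc or OS heap calls slip past it. The counter names are made up.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Running totals, read once per frame and shipped to the display machine.
static std::atomic<std::size_t> g_alloc_calls{0};
static std::atomic<std::size_t> g_alloc_bytes{0};

// Replacement global operator new: count the request, then defer to malloc.
void* operator new(std::size_t size) {
    g_alloc_calls.fetch_add(1, std::memory_order_relaxed);
    g_alloc_bytes.fetch_add(size, std::memory_order_relaxed);
    if (size == 0) size = 1;  // malloc(0) may return null
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Array form forwards to the scalar form so both are counted the same way.
void* operator new[](std::size_t size) { return ::operator new(size); }

// Matching deletes must release with free, since new used malloc.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
void operator delete[](void* p, std::size_t) noexcept { std::free(p); }
```

Snapshot both counters at the start and end of each frame and the difference tells you how many allocations that frame made and how many bytes it asked for. To find out where they came from, you'd record the caller inside operator new as well, which I've left out here.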
There is no way I know of to directly instrument a virtual memory swap, which isn't to say it can't be done; I just haven't done it. It's trivial, however, with an inline profiling system, to isolate exactly what event triggered the swap. To be frank, I doubt that we'd have to go further than this plus some basic memory usage instrumentation, but if we have to, I've done it and I can dig up the code.
None of this is easy. After it's been done a few times, though, it's significantly easier, and again, I'm off for some indeterminate amount of time and would love to work on it pro bono, so why not? I can't see a reason this app shouldn't scream on a 256 MB machine.
Best,
Randall