The code isn't optimal, especially on the merged mining side. Currently there is no optimal method to handle both block chains without a more advanced miner-pool communication protocol.
Excluding merged mining issues, latency is a large factor. When a block change occurs, every single miner's effective hashing power is zero until they begin working on an updated block header. Thus a pool's effective hashing power drops from its total hashing power to 0 and then rises as miners are updated. So a pool's long-term effective hashing power depends on how quickly it can deal with a block change.
This involves three components.
Detecting block change. A good pool should have a large number of connections to the Bitcoin network to minimize the delay in learning of a block change. A good pool operator will ensure they maintain connections "close" to (within 1 or 2 hops of) every major pool.
Recalculating block headers. During a block each miner will complete their work at differing times and thus getwork requests are staggered. However when a block changes the pool needs to update every miner's block header at once. A pool lacking sufficient processing power to quickly compute block headers will have miners working on stale work longer and thus have a higher average stale %. Updating miners in the order of their hashing power could reduce the pool's overall stales slightly. I don't know if any pool currently does that.
Update miners. The latency of the miner's link is beyond the control of the pool, but a pool can improve efficiency by running multiple pool servers, reducing the number of hops to all miners. Pool servers should be located as close to miners as possible. A pool consisting mostly of miners in Asia shouldn't use a US East Coast datacenter, for example.
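To illustrate why updating miners in order of hashing power could reduce stales, here is a rough sketch. It assumes the pool pushes new work to miners one at a time and that each update takes a fixed amount of time; the miner names, hashrates, and the 10 ms figure are made up for illustration, and a real pool would overlap updates to some degree.

```python
def wasted_hashes(hashrates, order, seconds_per_update):
    """Hashes spent on stale work if miners are updated one at a time.

    Assumption: the miner in position k (1-based) keeps hashing stale
    work for k * seconds_per_update before its new header arrives.
    """
    wasted = 0.0
    for position, miner in enumerate(order, start=1):
        wasted += hashrates[miner] * position * seconds_per_update
    return wasted


# Hypothetical miners and hashrates in hashes per second.
hashrates = {"a": 400e6, "b": 50e6, "c": 1200e6}

arrival_order = list(hashrates)  # whatever order they happen to be in
fastest_first = sorted(hashrates, key=hashrates.get, reverse=True)

stale_arrival = wasted_hashes(hashrates, arrival_order, 0.01)
stale_sorted = wasted_hashes(hashrates, fastest_first, 0.01)

print(f"arrival order: {stale_arrival:.3g} stale hashes")
print(f"fastest first: {stale_sorted:.3g} stale hashes")
```

Under this simple model, serving the largest miners first always wastes the fewest hashes, but the gain shrinks as the time per update gets smaller.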
NTimeRolling reduces the number of communications (getworks) required for a given number of hashes. This makes the pool more efficient, as a given amount of hardware can support more clients. However, it doesn't reduce load at a block change.
Implementing Long Polling ensures that miners are notified when a block change occurs (minus the latency indicated above) rather than continuing to work on stale data until they complete it. Completing one nonce range takes roughly 10 seconds for a 400 MH/s miner. Without long polling, the miner will on average waste 5 seconds per block change working on data that can never produce a valid block. Given that block changes occur roughly every 600 seconds, that is about 1% of hashing time wasted on invalid block headers. Slower miners have a longer period between getworks and thus waste a greater % of their hashing time. No pool can be efficient without a good Long Polling implementation.
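The arithmetic behind those numbers can be sketched as follows. This assumes one getwork covers the full 32-bit nonce range, that a block change lands on average halfway through the range, and that the miner only fetches new work when it finishes the range; the function names are mine, not from any pool or miner software.

```python
NONCE_RANGE = 2**32        # nonces covered by one getwork
BLOCK_INTERVAL_S = 600     # average seconds between block changes


def seconds_per_getwork(hashrate_hps):
    """Time to exhaust one nonce range at the given hashes per second."""
    return NONCE_RANGE / hashrate_hps


def stale_fraction_without_lp(hashrate_hps):
    """Rough fraction of hashing time wasted per block with no long polling.

    Assumption: on average the block change lands halfway through the
    current nonce range, so half a getwork interval is wasted.
    """
    return (seconds_per_getwork(hashrate_hps) / 2) / BLOCK_INTERVAL_S


for mhs in (400, 100):
    rate = mhs * 1e6
    print(f"{mhs} MH/s: {seconds_per_getwork(rate):.1f} s per getwork, "
          f"{stale_fraction_without_lp(rate):.2%} wasted")
```

This simple model only makes sense while the getwork interval is well below the block interval; in practice miners also refresh work on a timer, which caps the waste for very slow miners.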
Both NTimeRolling and Long Polling (LP) require a miner which properly understands these commands.