A great article here on diagnosing problems with batch system performance: https://www.simple-talk.com/sql/performance/the-etl-from-hell—diagnosing-batch-system-performance-issues/
Basically it goes like this – when there are problems with batch processes overlapping or timing out, companies typically try the following:
- Can we ‘fix’ this with more hardware? (this is expensive, and a temporary fix)
- Can we change the product – for example from SQL Server to Oracle? (this only works with a redesign. Usually the product isn’t the issue!)
- Turnkey solution (expensive and typically doesn’t address the root issue)
- Split systems into parts, each on isolated servers (hard to synchronize running jobs across your now-separate systems)
Instead of these knee-jerk reactions, there’s a better way – both tactical and strategic – to fix this:
Tactical
- Gather statistics – how long is a process likely to take?
- Get a baseline of long-running problem queries
Strategic
- Determine processing time windows
- Create a task list in Excel – and include internal/external dependencies
- Monitor the system – and include management in monitoring reports
- Define threads – tasks in sequence – and processing times
- Implement batch system control that runs off your table of tasks/dependencies