ETL from hell – getting past denial

A great article here on diagnosing problems with batch system performance: https://www.simple-talk.com/sql/performance/the-etl-from-hell—diagnosing-batch-system-performance-issues/

Basically it goes like this – when there’s problems with batch processes overlapping or timing out, companies typically try the following:

  1. Can we ‘fix’ this with more hardware? (this is expensive, and a temporary fix)
  2. Can we change the product – for example from SQL to Oracle? (this only work with a redesign. Usually the product isn’t the issue!)
  3. Turnkey solution (expensive and typically doesn’t address the root issue)
  4. Split systems into parts, each on isolated servers (hard to synchronize running jobs across your now-separate systems)

Instead of these knee-jerk reactions, there’s a better way – both strategic and tactical – to fix this:

  • Tactical
    • Gather statistics – how long is a process likely to take?
    • Get a baseline of long-running problem queries
  • Strategic
    • Determine processing time windows
    • Create a task list in Excel – and include internal/external dependencies
    • Monitor the system – and include management in monitoring reports
    • Define threads – tasks in sequence – and processing times
    • Implement batch system control that runs off your table of tasks/dependencies

Noodling around with Entity Framework.

Entity Framework is the default way that Microsoft wants us to be noodling around with data. And it’s easy to see why – this is MSFT’s answer to the challenge posed by Ruby, well over ten years ago. EF gets better with every version.



From the Getting Started documentation, it should just be a simple matter of

  1. Add a new diagram – and make sure for delete (for example) all changes are set to cascade.
  2. Add a new web form + master page.
  3. Add an Entity Data Source. In design view, configure it to point to your Entities data source you created in step #1.
  4. Add a new GridView. Select your entity data source. Select Enable Paging…. down to Enable Delete. Delete the ID fields.
  5. Set the Dates template, etc on any custom fields.
  6. Set up the ContextTypeName attribute to enable lazy loading.

However, I’m getting a “The provider did not return a ProviderManifest instance” error message. This usually indicates an issue with the connection string. I’ve tried switching to Integrated Security, changing the username/pwd – no dice. I honestly think this is a bug with EF6, since I’m using the most recent version of Entity Framework and I can’t recall seeing this issue in EF5.

Since the forms use sprocs by and large, and we are using webforms, I’m OK with – for now – going old-school and using sprocs not EF for my data connections. For our new app though I am going to revisit this and use either Linq-to-SQL or preferably that nifty BackboneJS/KnockoutJS + MVC stack and a webapi data layer.