Month: March 2014

Warren Buffett’s Ten Rules.

Went to Jimmy John’s the other day, and this was on the wall. Love it!

Warren Buffett’s Ten Rules for Success

1. Reinvest Your Profits. This makes sense not only in the stock market, but in a small business as well. Entrepreneurs who bleed all the profits out of a business find that they may struggle to grow the business into something larger and more valuable.

2. Be Willing to Be Different. Buffett didn’t make his fortune by following the crowd. Instead, he invested when everyone else was panicking and sold when everyone else was buying – a contrarian strategy that has consistently beaten the market for him. Doing what everybody else is doing – the same way they are doing it – is the recipe for becoming average. Nobody pays extra for “average”.

3. Never Suck Your Thumb. Once you’ve gathered the information you need, make a decision. To Buffett, any time wasted getting to a decision is just “thumb sucking”. Success comes from grabbing opportunities as soon as you recognize them.

4. Spell Out the Deal in Advance. Your bargaining position is never stronger than before you commit. So take advantage of that window to spell out the details and specifics of any deal before you start. This is especially true when working with friends or family.

5. Watch Small Expenses. In the investment world, this translates into watching not just the returns offered by investment funds, but also the fees charged by the fund managers. This is so true in every aspect of small business and personal finance, as well.

6. Limit What You Borrow. Buffett claims never to have borrowed a significant amount of money. His advice is to stay debt-free, then save and invest. It’s a very counter-cultural stance (see #2), in contrast to those who preach getting rich with Other People’s Money.

7. Be Persistent. This is an advantage the small entrepreneur has over larger, more established competitors. Persistence and ingenuity can, and often do, win against long odds. If you’ve done your research, taken care of the details, watched your expenses, and stayed out of debt, your success through persistence may only be a matter of time.

8. Know When to Quit. Don’t throw good money after bad. Resist the temptation to salvage a bad deal with a last-minute home run.

9. Assess the Risk. Buffett recommends thinking through both the best-case and worst-case scenarios. This clarifies the risks and rewards of any venture, which is critical to the decision-making process.

10. Know What Success Means. Buffett doesn’t measure success in dollars. As he says, “When you get to my age, you’ll measure your success in life by how many of the people you want to have love you actually do love you.” Here is wisdom.

Fun with Ebola

Spent the weekend camping at the coast and ended up with bronchitis. Probably not the best time to be reading “The Hot Zone”, but that’s what I did. Frankly, it scares the hell out of me. Collapsing and bleeding out must be one of the worst ways to die.

Taking some numbers off the internet and playing around with them in Excel turns up some interesting facts. Notice how close the red and blue bars are below: just about everybody who catches Ebola dies, and these aren’t small samples – outbreaks often run to hundreds of cases. So far, Ebola hasn’t spread widely outside a few outbreak areas (helped along by hospitals with substandard hygiene practices, like sharing needles). But we know the virus can spread via the respiratory route – it did in Reston, VA back in 1989 with a colony of crab-eating macaques, and again in 2012 with pigs. It hasn’t yet made that animal-to-human, then human-to-human respiratory jump.

[Chart: cases vs. deaths for each recorded filovirus outbreak]

The chart above covers the whole filovirus family – the five Ebola species plus Marburg. Using a slicer, let’s look at just Ebola:

[Chart: the same cases-vs-deaths view, filtered to Ebola outbreaks only]

So, an interesting little look at the kinds of analytics you can do with PowerPivot in Excel and a simple data table. And a glimpse into Big Data. I could have sliced this up by country, etc.

And it illuminates how data can shed light on a big problem. In this case, Ebola is hiding out somewhere in Africa, trying like a safecracker to “break into” the human race – a different combination each time. Notice the cluster of events after 1994: outbreaks start appearing on an almost yearly basis. The chart below shows those gaps across time more clearly. If the virus ever manages to spread through the air human-to-human, with our global transportation network shuttling it around the planet, there’d be no stopping it. With a 67% average fatality rate, it’d be a new Black Death. Terrifying.

[Chart: outbreaks plotted across time, showing the post-1994 clustering]

Data warehousing and working within structures

I saw the great Hilary Hahn a few days ago at the Oregon Symphony.

The amazing thing about a great concert violinist like Hilary is that she’s a fantastic innovator – but always within bounds. You see her rattling off 45 minutes (more or less straight) of beautiful, expressive music, always within the framework of the larger orchestra, working with it in perfect sync.

It’s similar with data warehousing. Twenty or so years into my career, I’ve seen it done wrong multiple times (and right only a few), and it always comes down to not working within the framework of what analytical databases are designed to do. It’s MUCH cheaper not to invent your own organizational definition of what data warehousing is, yet that strange compulsion dogs many DW projects. It’s odd that so many companies go down this route – you NEVER see people quibbling over relational rules in OLTP systems the way we do with analytical stores!

Microsoft’s BI best-practices doc is pretty plain vanilla, but it does break a DW down into the pieces below. Failing to understand any one of them can mean wasting many man-years of effort and millions of dollars in rewrites.

The guide is, sadly, a real snoozer. I’ll break it down in a few sentences:

  1. A data source layer – where we get our data from. (Duh.)
  2. A data integration layer that handles data extraction and cleansing.
    1. It will likely contain both a staging area (a temporary area housing raw tables from the source) and…
    2. a loading area (where data is cleansed, validated, transformed, and in a format consumable by the DW).
  3. A data storage area that holds your dimensional model, including partitions for truly huge tables (yes, you’ll have these). Heavily indexed.
  4. A data analysis area holding OLAP cubes (SSAS, Cognos, etc.) and other data-mining business logic.
  5. A data presentation layer: dashboards, flat reports, and moldable data (PowerPivot, yay!)

Let’s not get bogged down in details here, but you want a staging area that contains your raw tables (and deltas) – this looks EXACTLY like the source data, without relational ties (or even indexes, to keep load times to a minimum). Sometimes I drop and re-add indexes post-load to cut down on ETL times.

Your loading area contains consumable objects that are multi-purpose and generic – think a “Products”, “Customers”, and “Suppliers” set of tables, where each row defines a customer, a supplier, and so on. This is a relational store but not an OLTP system – you won’t (or shouldn’t) see tons of lookup tables here. Think very flat, breaking some rules with denormalization (e.g. ProductCategory1, ProductCategory2, etc.) in the name of performance. The goal is a set of objects that will be a “one stop shop” for data, hopefully robust enough to survive organizational change with a minimum of bloodshed. For example, we’re currently writing a DW based on an ERP that we’re “pretty sure” will be going away in three months. But I’m reasonably confident that, even if we cut over to SAP, I’ll still have a table of “Items”, “Products”, and so on. By layering things this way, I minimize disruption to the users who are becoming dependent on our reports.
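As an aside, here’s a minimal sketch of that drop-indexes / bulk-load / re-add-indexes pattern for a staging table, in C#. The connection strings, the Staging.Customers table, and the IX_Staging_Customers index are made-up names for illustration; in practice this kind of thing usually lives in an SSIS package or an agent job rather than a standalone program.

using System.Data.SqlClient;

class StagingLoader
{
    // Placeholder connection strings -- substitute your own source and DW servers.
    const string SourceConn = "Data Source=erp-db;Initial Catalog=ERP;Integrated Security=SSPI;";
    const string StagingConn = "Data Source=dw-db;Initial Catalog=DW;Integrated Security=SSPI;";

    static void Main()
    {
        using (SqlConnection src = new SqlConnection(SourceConn))
        using (SqlConnection dst = new SqlConnection(StagingConn))
        {
            src.Open();
            dst.Open();

            // 1. Drop the index so the raw load stays fast.
            Exec(dst, "IF EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Staging_Customers' " +
                      "AND object_id = OBJECT_ID('Staging.Customers')) " +
                      "DROP INDEX IX_Staging_Customers ON Staging.Customers;");

            // 2. Truncate and bulk-copy the source rows over exactly as they are -- no transformation here.
            Exec(dst, "TRUNCATE TABLE Staging.Customers;");
            using (SqlCommand cmd = new SqlCommand("SELECT * FROM dbo.Customers", src))
            using (SqlDataReader reader = cmd.ExecuteReader())
            using (SqlBulkCopy bulk = new SqlBulkCopy(dst) { DestinationTableName = "Staging.Customers" })
            {
                bulk.WriteToServer(reader);
            }

            // 3. Re-create the index post-load for the downstream transform queries.
            Exec(dst, "CREATE INDEX IX_Staging_Customers ON Staging.Customers (CustomerID);");
        }
    }

    static void Exec(SqlConnection conn, string sql)
    {
        using (SqlCommand cmd = new SqlCommand(sql, conn))
            cmd.ExecuteNonQuery();
    }
}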

So, breaking this down even further, a data warehouse looks like this:

  1. A staging area separate from your source data.
    1. Format: Raw tables exactly like your source tables, maybe split out using schemas.
  2. A loading area that contains a canonical view of your data from an organizational/decision-making standpoint.
    1. Format: Denormalized tables that define specific entities, highly indexed, relational but not hyperrelational.
  3. A storage area with your star schema.
    1. Format: A central fact table(s) and dimensional tables for slicing. You can have many fact tables and several different data stores to present your data in a way that’s most consumable by different departments/groups.
  4. A set of OLAP cubes for consumption by your client tools.
  5. Client tools so your users can slice and dice data themselves instead of bogging your development teams down with reporting.

Yeats

The Song of Wandering Aengus – Yeats

 

I went out to the hazel wood,
Because a fire was in my head,
And cut and peeled a hazel wand,
And hooked a berry to a thread;
And when white moths were on the wing,
And moth-like stars were flickering out,
I dropped the berry in a stream
And caught a little silver trout.

When I had laid it on the floor
I went to blow the fire a-flame,
But something rustled on the floor,
And someone called me by my name:
It had become a glimmering girl
With apple blossom in her hair
Who called me by my name and ran
And faded through the brightening air.

Though I am old with wandering
Through hollow lands and hilly lands,
I will find out where she has gone,
And kiss her lips and take her hands;
And walk among long dappled grass,
And pluck till time and times are done,
The silver apples of the moon,
The golden apples of the sun.

 

Console heck.

For the record, I hate writing console applications. There’s only so much magic you can pull off without a UI to interact with. But sometimes there’s no alternative. Take this problem set:

Basically, this is a software key provisioning process: a central server acts as a key repository, holding a pool of allocated keys. There could be 20 or more provisioning stations that need to pull down these keys, one at a time, assign each to a serial number, add a hash code, and pass it back to the central database for more processing. Conceptually, a “user” runs this console application on each station – but since the process will be hands-free, it made sense to write the server side as an ETL package (handling license requests and keeping track of assigned serial numbers), paired with a simple console application that can be kicked off from the command line on the individual stations.

The actual implementation console-side was relatively easy – so much so that I won’t put it all here. The console app calls a sproc to receive a datatable and writes it out as an XML file; the file is then decorated with some attributes (updating the process state and adding a hash), picked up again, and the record is updated back to the database. Pretty well-trodden ground.

The codeset and SQL are here (except the SSIS) – FFKI
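For a sense of the shape of it, here’s a minimal sketch of that console flow. The connection string, sproc names (dbo.GetNextKey, dbo.UpdateKeyAssignment), and column names are placeholders for illustration – they’re not the real FFKI objects.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Security.Cryptography;
using System.Text;

class KeyClient
{
    // Placeholder connection string -- the real settings live with the FFKI codeset.
    const string Conn = "Data Source=central-db;Initial Catalog=Provisioning;Integrated Security=SSPI;";

    static void Main(string[] args)
    {
        string serialNumber = args[0]; // serial number supplied by the provisioning station

        // 1. Call the sproc and land the next available key in a DataTable.
        DataTable keys = new DataTable("Key");
        using (SqlDataAdapter da = new SqlDataAdapter("dbo.GetNextKey", Conn))
        {
            da.SelectCommand.CommandType = CommandType.StoredProcedure;
            da.SelectCommand.Parameters.AddWithValue("@SerialNumber", serialNumber);
            da.Fill(keys);
        }

        // 2. Decorate: add the hash and process state, then dump the table out as XML.
        keys.Columns.Add("Hash", typeof(string));
        keys.Columns.Add("ProcessState", typeof(int));
        foreach (DataRow row in keys.Rows)
        {
            row["Hash"] = ComputeHash(row["LicenseKey"] + serialNumber);
            row["ProcessState"] = 1; // 1 = assigned, awaiting server-side pickup
        }
        keys.WriteXml(serialNumber + ".xml", XmlWriteMode.IgnoreSchema);

        // 3. Push the assignment back to the central database.
        using (SqlConnection conn = new SqlConnection(Conn))
        using (SqlCommand cmd = new SqlCommand("dbo.UpdateKeyAssignment", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@SerialNumber", serialNumber);
            cmd.Parameters.AddWithValue("@Hash", keys.Rows[0]["Hash"]);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    static string ComputeHash(string input)
    {
        using (SHA256 sha = SHA256.Create())
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(input)));
    }
}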

However, the Output SSIS package was funky. Following the steps in this post, I added a data flow to output to an ADO.NET recordset, and then added a ForEach loop against that source.

I originally was going to do a database call within the ForEach loop, but that seemed wasteful. I mean, the data is RIGHT THERE. Why call the database twice – once to access the records in table form to build a recordset, and a second time to form XML and output it to a file? So – shudder – I used a cursor.

Tough to admit, but the cursor is actually very performant and works fine as the source for an ADO.NET recordset. Each row can then be mapped into variables – SerialNumber and OutputXML – inside a ForEach loop, and that’s what our script works against. Plain and simple: it accepts them and writes them out to a file. Easy peasy. We could have updated the records as a set, but I didn’t want dirty records (in case some were inserted or updated to 1 mid-process), so that’s done individually as part of the loop. (Another solution is yet another ToUpdate status: 2 for mid-process and 3 for done.)
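For reference, the per-iteration script inside that ForEach loop can be tiny. The variable names here (User::SerialNumber, User::OutputXML, User::OutputPath) are assumptions – map whatever you actually called them into the Script Task’s ReadOnlyVariables – and it needs using System.IO;.

public void Main()
{
    string serialNumber = Dts.Variables["User::SerialNumber"].Value.ToString();
    string outputXml = Dts.Variables["User::OutputXML"].Value.ToString();
    string outputPath = Dts.Variables["User::OutputPath"].Value.ToString();

    // One file per serial number -- the downstream pickup keys off the file name.
    File.WriteAllText(Path.Combine(outputPath, serialNumber + ".xml"), outputXml);

    Dts.TaskResult = (int)ScriptResults.Success;
}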

It was the little things I’d forgotten – like handling the opening command-line arguments, and even how to compile the thing with csc from the command line. How quaint!
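For anyone else who has forgotten: it’s just the standard Main(string[] args) signature, and csc will build it from a plain command prompt. The argument order below is an assumption for the sketch.

// Compile from the command line (no Visual Studio required):
//   csc /out:Ffki.exe Program.cs
// Then run it with its arguments:
//   Ffki.exe ABC12345 C:\Drops\Out
class Program
{
    static void Main(string[] args)
    {
        // args[0] = serial number, args[1] = output folder (assumed ordering)
        string serialNumber = args.Length > 0 ? args[0] : "";
        string outputFolder = args.Length > 1 ? args[1] : ".";
        System.Console.WriteLine("Provisioning " + serialNumber + " -> " + outputFolder);
    }
}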

SSIS Package Joy

The SSIS package code is below. Very spaghetti, but it works; I don’t think the Script Task structure in SSIS encourages modular programming. (Yes, it’s a bit of a cop-out. I found the same thing true with the console app: if there’s no reuse potential and one Main() entry point, what’s the sense in splitting it out into umpteen methods and separate classes?)

Basic order of events:

  1. Reads XML from FilePath + FileName using StreamReader
  2. Adds <Keys> root using FileContent.Replace
  3. Sets up an archive and process folder using ArchivePath (or ProcessPath) + FileName. We could have used FilePath but it’s possible they’ll want separate file drops in the future.
  4. Writes the content to the ProcessPath XML file using StreamWriter, and moves the original file to the archive destination.


// Inside the SSIS ScriptMain class. Needs using System.IO; and using System.Windows.Forms; (for MessageBox).
public void Main()
{
    String ErrInfo = "";
    String FilePath = Dts.Variables["User::FilePath"].Value.ToString() + Dts.Variables["User::FileName"].Value.ToString();
    //MessageBox.Show("Filename: " + FilePath);

    try
    {
        String FileContent; // Variable to store file contents
        FileContent = ReadFile(FilePath, ref ErrInfo);

        if (ErrInfo.Length > 0)
        {
            Dts.Log("Error while reading File " + FilePath, 0, null);
            Dts.Log(ErrInfo, 0, null);
            Dts.TaskResult = (int)ScriptResults.Failure;
            return;
        }

        //FileContent before replace
        //MessageBox.Show(FileContent);

        // Wrap the fragment in a <Keys> root element.
        // (Assumes one <Key> element per file; a multi-key file would need real XML handling.)
        FileContent = FileContent.Replace("<Key>", "<Keys><Key>");
        FileContent = FileContent.Replace("</Key>", "</Key></Keys>");

        //FileContent after replace
        //MessageBox.Show(FileContent);

        Dts.Variables["User::FileContent"].Value = FileContent;

        String ArchivePath = Dts.Variables["User::ArchivePath"].Value.ToString() + Dts.Variables["User::FileName"].Value.ToString();
        String ProcessPath = Dts.Variables["User::ProcessPath"].Value.ToString() + Dts.Variables["User::FileName"].Value.ToString();
        //MessageBox.Show(ArchivePath);

        // Write the contents back to the process file.
        if (File.Exists(ProcessPath))
        {
            File.Delete(ProcessPath);
        }
        WriteToFile(ProcessPath, FileContent, ref ErrInfo);

        if (ErrInfo.Length > 0)
        {
            Dts.Log("Error while writing File " + ProcessPath, 0, null);
            Dts.Log(ErrInfo, 0, null);
            Dts.TaskResult = (int)ScriptResults.Failure;
            return;
        }

        // ...and move the original file to the archive destination.
        if (File.Exists(ArchivePath))
        {
            File.Delete(ArchivePath);
        }
        File.Move(FilePath, ArchivePath);

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception e)
    {
        Dts.Log(e.Message, 0, null);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}

public String ReadFile(String FilePath, ref String ErrInfo)
{
    String strContents;
    StreamReader sReader;

    try
    {
        sReader = File.OpenText(FilePath);
        strContents = sReader.ReadToEnd();
        sReader.Close();
        return strContents;
    }
    catch (Exception e)
    {
        // Capture the error for the caller before surfacing it.
        ErrInfo = e.Message;
        MessageBox.Show(ErrInfo);
        return "";
    }
}

public void WriteToFile(String FilePath, String strContents, ref String ErrInfo)
{
    StreamWriter sWriter;

    try
    {
        sWriter = new StreamWriter(FilePath);
        sWriter.Write(strContents);
        sWriter.Close();
    }
    catch (Exception e)
    {
        // Capture the error for the caller before surfacing it.
        ErrInfo = e.Message;
        MessageBox.Show(ErrInfo);
    }
}

I think, at this point – since I can’t read values at run-time from app.config, and I’m out of time and need to move on – that I’ll just add them as global variables directly in the class, however hokey that is.
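In other words, something like this – a minimal sketch of the hard-coded-globals fallback, with placeholder names and values (the comment notes what the app.config route would normally supply via ConfigurationManager):

class Program
{
    // What app.config would normally supply, e.g. ConfigurationManager.AppSettings["KeyServer"].
    // Hard-coded here instead -- hokey, but it ships.
    static readonly string KeyServer = "central-db";
    static readonly string DropFolder = @"C:\Drops\Out";
    static readonly int RetryCount = 3;

    static void Main(string[] args)
    {
        System.Console.WriteLine("Using " + KeyServer + ", dropping files to " + DropFolder +
            ", retries: " + RetryCount);
    }
}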