Thursday, February 22, 2007

Useless reports and GIGO

I got pulled into a couple of meetings today. Both involved analysis I had done on data submitted to me, and both were called because the resulting reports had problems. I will talk about one of them, as it illustrates the point better.


A person (the manager) entered information through Excel. I sorted and categorized the information based on the unique identifiers in one of the columns. The meeting started off with: why does this report have 4 different numbers and identifiers for 'XYZ'? According to the manager, I should have merged the 'X YZ', 'XZY' (typo), and 'X Y Z' identifiers with 'XYZ' to get one correct number, the report was of no use, and where did I get this information anyway?


I tried to explain that this was the information that person had entered in Excel. I then offered a couple of options for enumerated/listed identifiers so that the entries would be consistent. "But I do it this way because I can [enter uber entry technique process] it very fast." That is good... but then why are we having this meeting? "Because the report is useless."


The sooner you can control the data input process, the better for everyone. I am looking for comments and suggestions to help control scenarios like this that do not involve nerf bats with a big 'GIGO' (garbage in, garbage out) label on them.
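One possible starting point, as a minimal sketch rather than a finished solution: check each entry against an enumerated list before it ever reaches the report. The `ALLOWED` list and the `normalize` and `check` helpers below are made up for illustration, and the sketch assumes that whitespace and letter case never carry meaning in an identifier. Spacing variants are merged automatically, while a typo such as 'XZY' is kicked back with a suggestion instead of being silently merged.

```python
import difflib

# Hypothetical enumerated list of valid identifiers. In practice this would
# come from whoever owns the data, not from the report writer.
ALLOWED = {"XYZ", "ABC"}


def normalize(raw):
    """Remove all whitespace and upper-case the entry ('X Y Z' -> 'XYZ')."""
    return "".join(raw.split()).upper()


def check(raw):
    """Return (identifier, None) if the entry is valid,
    otherwise (None, list of close matches from ALLOWED)."""
    key = normalize(raw)
    if key in ALLOWED:
        return key, None
    return None, difflib.get_close_matches(key, sorted(ALLOWED), n=3)


if __name__ == "__main__":
    for entry in ["XYZ", "X YZ", "XZY", "X Y Z"]:
        identifier, suggestions = check(entry)
        if identifier is not None:
            print(f"{entry!r} accepted as {identifier!r}")
        else:
            print(f"{entry!r} rejected, did you mean one of {suggestions}?")
```

The point of rejecting rather than guessing is that the correction goes back to the person who typed the value, which is where the fix belongs. The same idea can be applied at entry time with a drop-down list in the spreadsheet itself.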

Monday, February 12, 2007

Pentaho BI

It has been several months since my last blog post, and I feel like I should be struck with rosary beads and dunked in holy water for waiting so long. But, without further ado:

Pentaho is a 'conglomerate' open source project that has put together several related projects under one umbrella. BI, or Business Intelligence, is the entire process of obtaining, scrubbing, analyzing, reporting, and then re-analyzing/re-reporting on business data.

Pentaho has combined many of the elements needed to handle most of the BI stack under a friendly LGPL/no-cost license, allowing you to start using Business Intelligence in even the smallest of projects. That is huge... usually a project had to push over 1/4 of a million (depending on your resources/randomly selected amount) before you could really engage Business Intelligence. Now, that is no longer the case. :-)

I had the wonderful opportunity to go to the Advanced Implementation Workshop in Orlando, FL from Jan 29-Feb 1, 2007. Having used a portion of Pentaho on previous projects, I had sufficient knowledge to get the most out of the Workshop.

In addition, I was able to meet with Matt Casters of the Kettle (Pentaho Data Integration) project, Julian Hyde of the Mondrian (Pentaho Data Analysis Services) project, and Thomas Morgner of the JFreeReport (Pentaho Reporting) project. Talking with them added to my confidence, as I got a good impression that they thoroughly understood their individual domains.

Pentaho is still 'in the rough': there are occasional user-facing design and presentation items to clear up, as well as enterprise functionality, from a developer standpoint, that needs to be resolved or added. Most of these issues are minor and relate to the full BI Suite (which itself is only a couple of years old), while the individual projects have been around for quite some time.

Overall, I am very impressed and will start working more heavily with Pentaho, regardless of whether it is (initially) just reporting or a more full-fledged BI solution. Now, if only they could replace JPivot...