Wednesday, December 12, 2007

A Response to DSS Governance

Every couple of years or so, you get one of those rare students in a class - one who actually teaches you something about the topic. Bruce Fowler was one of those students in a Data Warehousing course that I taught earlier this year. He's a data warehouse manager for a resource management company, and so has an interest in DW management and governance issues.

He posted a response to the DSS Governance post I put up recently, but due to the new comments system that Peter's been playing around with, it didn't come through. It was substantive enough, and included some diagrams, that I thought it reasonable to post (with Bruce's permission!) his comments as a separate entry. All images in (and linked to by) this entry are Copyright © 2007 Bruce Fowler.

The comparison of IT, railroads and power as "infrastructure technologies" - sharing characteristics of competitive advantage, ubiquity and finally commoditisation (followed by a loss of strategic benefit and value) is a fairly long bow. It is important to remember that the catalyst for the demise of rail was not its cost or availability (at least not directly), it was the advent of more cost effective and time efficient technological alternatives (combustion engines, aeroplanes). I am not sure I would support the contention that the commoditisation of a technology has any specific correlation to the technology’s loss of strategic initiative or competitive advantage.

Railway lines continue to provide cost effective and logistically efficient means of transporting large volumes of material across the country (provided the infrastructure exists and the alternative means remain less cost effective and time efficient), and are being used in new and innovative ways to supplement conventional income streams for logistics organisations through integrated fibre networks. Power companies continue to explore delivery of new products and services over existing infrastructure (i.e. broadband over powerlines), and are currently reinventing themselves in biofuels space to enable delivery of “green” energy to a more environmentally conscious market. Combustion engines are being redesigned to be more fuel efficient and “environmentally friendly”.

The delivery platforms have been around for some time and form part of our everyday lives – their use is evolving in new and innovative ways. IT – perhaps more than any other technology platform – has the capacity to continue to be adapted and evolved to meet the ever-changing demands of its user base. Commodity? Yes. Does it matter? Of course it does.

Back on topic …

I suspect there is a significant difference in the structure, objectives and necessities of Corporate Governance and IT Governance; and of the relationship between the two in comparison to the same for DW/DSS Governance. Even then, there are perhaps different factors that need to be considered from the perspective of DW Governance versus DSS Governance, and their respective relationships with IT Governance.

Consider the basis for the introduction of Corporate Governance – a means of managing the seemingly inevitable consequences of the centralisation of power and decision making authority (Husted, 1999); then consider the basis for the introduction of IT Governance – the patterns of authority for the significant IT activities of an organisation including IT Infrastructure, IT Use and IT Project Management. In simplistic terms, the former focuses on risk mitigation, the latter on efficiency and effectiveness.

Over time, the use of Corporate Governance models as a direct risk mitigation strategy has given way to an army of formalised standards, auditing and reporting obligations – spanning multiple levels of business, including technology operations – administered by committees through levels of delegation of authority. The line between Corporate Governance and IT Governance has blurred, and the need to maintain alignment of IT (infrastructure, use and project management) with the organisation's mission objectives is now forefront in the minds of most informed corporate executives. This alignment recognises the critical strategic importance of IT to successful business operation.

Get it right, and IT can (at a minimum) provide a stable platform from which other strategic endeavours can be launched. Get it wrong, and a failed IT system or project can (in the best of cases) reduce your business efficiency or effectiveness, or (in the most severe of circumstances) end your business (someone say ERP?).

A pencil is a commodity. IT is a tool that can be used to create competitive business advantage, or as easily be misused resulting in catastrophic business or process failure.

I find myself off topic again …

The evolution of DW Governance arrangements depends on the confluence of many factors that interact with one another in a number of complex ways (Sambamurthy and Zmud, 1999). The key factors, their interactions and dependencies are included below.

Given the nature of the relationships identified below, perhaps we could identify the interactions as a loosely coupled hierarchy: with each child exhibiting some characteristics of its parent, and the will and initiative to move around (and sometimes break out of) the boundaries defined by the ever-watchful parent (who will evolve and adapt their boundaries to meet the growing needs and demands of the child, but have the foresight and capacity to bring the child back into line if needs be).

Thanks Bruce. Generally, I vigorously agree with everything you've written here. If you would like a copy of Bruce's original diagrams, drop me a line and I'll pass on the request.

Monday, December 10, 2007

DSS Governance

In the last post, I mentioned I had been looking at the issue of governance and DSS. In fact, this is something I've been thinking about since a student asked in a lecture if there was anything on data warehouse governance a couple of years ago, and I've just written a paper for the bi-annual conference for the academic DSS community, IFIP Working Group 8.3.

The paper is currently under review, so I won't post it here yet (I'll put up a link when it's gone through that process), but I thought I'd put the basic argument out there for people to comment on, since it's all still conceptual at this stage.

IT governance is an important topic - on the one hand corporate governance is a big thing; and on the other, we've got Nicholas Carr telling us that IT is not a strategic advantage for organisations. The IT industry needs to ensure that we're managing an important corporate resource effectively.

As a significant chunk of the IT industry, DSS (ie. BI) is very much a part of IT governance. Unfortunately, there's not a lot of academic work that talks about how to do this effectively for DSS (there's a bit on data warehousing, but that's it).

The argument I make in the paper is based on the idea that DSS is different to other kinds of IT in two ways:

  1. DSS are chaotic systems. They evolve. They can and should evolve quickly. If they don't evolve, then there's something wrong: learning isn't taking place, and the system isn't doing what it's supposed to: provide support with semi- or un-structured decision-making. Sure, evolutionary development is used to build all kinds of systems, but there is usually some end-point in mind where the system becomes stable (relatively). This isn't the case for DSS.
  2. DSS are subversive systems. They're designed to deal with strategic decisions (a corollary of being built for semi- and un-structured decisions). Their use deliberately changes some aspect of an organisation's structure (not the physical structure, but the organisation's strategic direction, policies, values, procedures, work-flows, etc.). Other systems may have this effect too, but often it's not deliberate. With DSS it's intentional - part of its raison d'être.

IT governance is largely about how to control IT resources, enforce standards, and manage changes in a methodical fashion. It's based on a mindset of stability and prediction. Although there's been a lot written on IT governance (check out Weill & Ross's excellent book on IT governance), and a lot of it focuses on the appropriate approach for given organisational types (eg. centralised versus decentralised management cultures), there's nothing I've seen that actually takes characteristics of the technology into account. My assertion in the paper is that the underlying assumptions of a given governance approach should be consistent with the underlying assumptions embedded in the technology being governed.

Bureaucratic approaches that are appropriate for managing technologies like transaction-processing systems - steering committees, IT councils, service level agreements, etc - are inappropriate for chaotic systems. Enforcement of an organisational structure on the operation of a technology is inappropriate for a technology designed to question and change that same structure. Excessive control can (and has, we've seen it) stifle and eventually kill a DSS project.

The conclusion that I come to is that for DSS to thrive, the developers and users need the autonomy to play around with the system's design and functionality without going through multiple layers of bureaucracy. DSS should operate, therefore, in a kind of 'governance sandbox', where the DSS team are trusted to do the right thing as they see it. This kind of approach needs some clear boundaries however, including clear goals and objectives, and what constitutes overstepping the mark. This in turn requires a pre-existing, well planned general IT governance strategy.

Anyway, those are my current thoughts. Feel free to shoot through any comments. A couple of issues spring to mind, such as how do different kinds of DSS technologies differ in their governance requirements - eg. data warehouses versus dashboards versus small-scale throwaway spreadsheets. What specific mechanisms work for DSS governance inside the 'sandbox'? How much scope should DSS developers and users have? All food for future research...

Monday, November 12, 2007

IT Archaeology: Whatever Happened to SDS?

I've been digging through some ancient texts (in IT terms) over the past week or so, looking at the issue of IT governance and how it relates to the development of decision support systems. In doing so, I read again an article from 1971 in Sloan Management Review that first coined the term 'decision support system' written by two academics from MIT: Anthony Gorry and Michael Scott Morton.

In defining DSS, they also defined a second class of information system that I'd completely forgotten about, known as a 'structured decision system' (SDS). The term 'structured' comes from an adaptation of a model of decision-types by Nobel laureate Herbert Simon. In Gorry & Scott Morton's terms, decisions are either structured (well-understood, fairly easy to resolve, lend themselves to well-defined workflows or decision-rules), unstructured (difficult, high levels of ambiguity, no clear process for making the decision) or somewhere in-between (semi-structured).

Gorry & Scott Morton argued that DSS are designed to support semi- and un-structured decision problems. For systems that support structured decisions like inventory control, short-term forecasting and so on, they coined the term SDS.

This got me thinking. I had always seen today's BI systems as the inheritor of the DSS concept - basically the latest term in a long string of marketing names for systems designed to support managerial decision making. Now, I'm not so sure. In looking at Gorry & Scott Morton's definitions, most BI tools seem to be targeted more at structured, rather than unstructured decisions. Couple this with efforts by people like Howard Dresner to shift the BI concept more and more to enterprise reporting, and I reckon that BI is more about SDS than DSS.

Which is a problem.

There is a class of decisions that really need the support of systems that can help the decision-maker through the decision-making process by embedding principles of good decision-making in the system, and helping them make sense of the information they have. These decisions tend to be strategic and important, and the potential ROI for a system that can improve decision-making in this area goes way beyond being able to run an operational report that used to take hours in 30 seconds. In other words, there is a real need for the decision support systems that Gorry & Scott Morton described 36 years ago, and I don't think that BI tools are currently being used to build them.

This is not to say that they're not being built, of course. Instead, I think they tend to fly under the radar more, as individual strategic decision-makers throw together systems to answer specific questions in a way that they're comfortable with: witness the continued pervasiveness of Excel in the upper echelons of organisations despite the flash-wizardry of the latest 3D-pie-charting engine; skunk-works projects for individual managers; the explosion of independent data marts.

As I said above, the impact of the effective use of a DSS far outstrips that of an SDS because the decisions made fundamentally and directly affect the strategic direction of a firm. It's critical that good quality DSS are developed, and I don't think we're going about it in the right way at the moment. Excel can be dangerous if not used properly, but because everyone is focused on the highly-visible SDS-like BI projects, the DSS-needs of an organisation are often addressed in an ad hoc way.

It's a pity that we don't pay attention more to what was written in the past about IT. Not only do you realise that we keep making the same mistakes despite the changing technology, you come across interesting ideas that can change the way you perceive today's industry.

Wednesday, October 24, 2007

The Myth of BI for the Masses

Glenn Alsup over at the Viewmark Blog has a great post on how it's next to impossible to properly satisfy all user needs with a single analytics tool/data warehouse. I've blogged on this theme before, but it's worth hammering home time and again, since it's diametrically opposed to what the vendors want you to think. The problem with BI is knowing the information requirements prior to users using the system. Given the role of BI is to support decision making, people don't know what they need until they start to work through the issues - the very thing the BI system is supposed to support. No data warehouse design, however big, is ever going to be able to anticipate the specific information requirements needed by a decision-maker, especially if the decision is a strategic one (the real sweet-spot for BI).

My personal view is that it's better to spend money on a team with some great analytical skills than a software solution that ends up as a glorified enterprise reporting tool. In many decision situations, small-scale, non-permanent 'ephemeral' systems (so-called 'personal DSS') thrown together to answer specific questions are more useful than multi-million dollar BI systems. But then that doesn't sell software licenses, consulting or support contracts.

BI Marketers are eevil. As in fruuuuits of the deveeeil.

Just received via spam from a well known vendor:

Markets Demand Accurate Forecasts - Learn how to improve your forecasts by a minimum 50%

If I'm likely to fall for a line like that, I'd be better off doing a basic stats course to improve my forecasting accuracy. It's silly statements like this that give BI vendors the reputation they have today. Shame on you.

Tuesday, October 23, 2007

CRM: Seeing things from other people's perspectives.

We've just finished teaching for the semester here at Monash, and one of the subjects I taught was a Masters level unit on customer relationship management. As part of teaching the unit, I created a blog, which I've just switched off, but thought this post was worth saving, and relevant to this blog here. It was originally posted on the 2nd of August, 2007, and appears slightly edited (for context) here.

Sometimes we lose sight of the point of our business initiatives, failing to put ourselves in the shoes of stakeholders like customers (or users, in the case of BI).

Fournier, S., Dobscha, S. & Mick, D.G. (1998) Preventing the Premature Death of Relationship Marketing, Harvard Business Review, Jan/Feb98, V. 76, I. 1, pp. 42-51

I came across the HBR article above today and was struck by its relevance to the Barry Schwartz video below* (and the one viewed in this week's seminar). Although it's a bit old now (nearly 10 years), the article talks about how the idea of relationship marketing (the underpinning of CRM) is often subverted by the very activities marketers engage in to fulfil it. The relevance of Schwartz's idea of the paradox of choice to CRM is that it raises doubts about this ideal of the one-to-one relationship between a customer and a company. The article above builds on this theme very nicely. From the article:

Every company wants the rewards of long-term, committed partnerships. But people maintain literally hundreds of one-to-one relationships in their personal lives - with spouses, co-workers, casual acquaintances. And clearly, only a handful of them are of a close and committed nature. How can we expect people to do any more in their lives as consumers?

"It's overkill," said one woman we interviewed, referencing the number of advances from companies wanting to initiate or improve their relationship with her. "One is more meaningless than the next."

The article points out that consumer-satisfaction is at an all-time low, despite all these relationship-marketing efforts. I reckon a lot of that has to do with the phenomenon that Schwartz talks about, but relationship-marketing, aided and abetted by CRM, seems to only exacerbate it all. You have to wonder how effective our BI systems are at doing what they try to do: improve the decision-making process.

[* The Schwartz video referred to was posted on the original blog - it is a presentation by Barry Schwartz to Google on his concept of the Paradox of Choice. You can watch it here. It's worth watching in its own right. ]

10 Keys to BI Success

This CIO magazine headline (10 Keys to a Successful Business Intelligence Strategy) popped up in the links to the right recently, and it's worth the read. It's spot on about most things, especially recommendations nine and ten that you should start simple, go for 'low-hanging fruit'. Author Diann Daniel is also quite right in pointing out that the main driver needs to be a c-level executive other than the CIO.

I'm not a fan of the 'X keys/steps to "insert good thing here"' approach to journalism, but good stuff nevertheless.

Thursday, October 11, 2007

SAP & Business Objects

Everyone will have heard by now about the massive buyout of Business Objects by SAP. At just under US$7 billion, this is almost twice as much as Oracle paid for Hyperion several months ago (ok, 7 billion US is not that much anymore, but 4.7 billion euro is a nice stash of cash). While the deal is still awaiting approval by shareholders and regulators, it's probable it will go ahead.

The most obvious observation to be made about the deal is that it's a reaction to the Oracle/Hyperion deal, which it no doubt is, in part. There is another angle, though, touched on by Joshua Greenbaum over at Enterprise Anti-matter. One of the points he makes is that the ERP market is starting to stagnate as a result of market saturation. This is certainly the case in the market here in Australia: a conversation I had with a contact in a large BI/Data Warehousing company highlighted the fact that a lot of former ERP consultants are applying for jobs in business intelligence. If the industry research companies are to be believed, BI is still experiencing growth and is apparently on every CIO's 'radar' for the next year (I am so sick of military metaphors for business...).

Joshua also mentions the different perspectives that ERP and BI companies have on what it is that they do, with ERP obviously focussed on transactions, while BI is tools-oriented. This is certainly true, but I think that it goes further than that.

Transaction-oriented systems, like ERP or any OLTP system, have inherently different design requirements from systems designed to support decision-making. Transaction systems need to process lots of small packets of data very quickly, are designed for efficient data entry and storage, and are mainly used by people with relatively little organisational authority. They are also, comparatively, easy to design. The workflows and processes supported by these systems (or in the case of ERP, imposed by these systems) are well understood and possible to document in minute detail.

Decision support tools, though, are designed to answer questions and facilitate a decision-making process. Making decisions is a fundamentally different kind of activity to keeping track of business events. The need for a decision-support tool necessarily means that the users (ie. decision-makers) are functioning in an environment characterised by uncertainty. They may have an idea of the kinds of questions they need to answer, but they can't be absolutely sure that this list is complete (and practice shows that it never is). Indeed, in answering those questions, a whole range of new questions invariably arises. In short, the use of a decision-support tool is fundamentally a learning process:

  1. The users don't know exactly what they need
  2. They don't necessarily need what they want
  3. Use of the system itself changes their understanding of what they need
  4. The system therefore needs to change to help answer new questions (and back to 3 again..)

Learning is an evolutionary process, which means the development and use of decision-support tools also needs to be evolutionary. The difference between ERP and BI is not just a tools-based perspective, it's a need for a fundamentally different mindset on the development and use of the system. When you throw users with a massive amount of organisational clout (senior executives) and cognitive factors associated with decision-making into the mix, the practice of BI and ERP are worlds apart.

Unfortunately, a lot of IT folk don't understand this difference, and we see time and again the use of old-fashioned engineering approaches to BI projects. Such approaches may work fine for transactional systems, like ERP, but doom many BI projects to failure. SAP's own BI module, SAP/BW, is widely loathed as a decision-support tool, in large part because it was developed by people with a transactional mindset. I'm aware of one very large BI project that is currently underway (not using Business Objects or SAP/BW) that exhibits exactly this engineering approach: 18 months into the project, not a single user-facing aspect of the system has been delivered to anyone, with all the work going to the back-end infrastructure. Without something for the users to react against, the learning process can't kick off, and the requirements (against which all this infrastructure is being designed) cannot possibly be properly understood. Down the track, one of two things will happen (unless something changes): either the requirements will change and the system will have to be redesigned; or requirements will change, the system won't be updated, and no-one will use it because it doesn't answer their questions. The only way around this is to deliver some quick wins in the form of one or more pilot projects that focus on easy, yet strategically important business areas - get the system in front of some user eyeballs, and the requirements elicitation process will be driven by their feedback.

In any case, to bring this back to SAP and Business Objects, the move makes a lot of sense for SAP. They pick up expertise on developing systems for decision support, not just recording transactions. Hopefully, they recognise this, and the move is not just motivated by a reaction to Oracle and Hyperion, or to tap into a growth market. The fact that SAP are saying that Business Objects will continue to operate as a separate business suggests they want to keep the skill base around, and hopefully it will translate into some skills transfer in the direction of SAP.

Friday, August 31, 2007

Free Advertising

I'm not usually in the habit of giving free kicks to commercial ventures, but this is worth plugging. Long time associate of the Monash BI group, Martin Kratky runs BI consultancy Intalign. In great news for the BI industry here in Melbourne, Sydney and Brisbane, they are bringing out Stephen Few to run some executive workshops in October.

As you may know, Stephen is an academic at UC Berkeley specialising in data visualisation, with several excellent texts dealing with data presentation and the use (and misuse!) of the now ubiquitous dashboard. Stephen also runs consultancy Perceptual Edge, and has a well-read blog at Visual Business Intelligence. He is passionate, doesn't pull his punches and has a message that all BI consultants and developers should hear.

Monday, April 30, 2007

HP: "Ooh look! We're a data warehousing vendor too!"

While hunting around in their software portfolio, it looks like HP discovered that they do data warehousing too. HP have long provided server hardware for Oracle-based databases with their NonStop line of servers. Turns out that the Tandem hardware design that HP had used to build the NonStop server line was originally developed to support OLAP-oriented databases as well as OLTP. Now HP want to compete with Teradata. I wonder what Teradata think of that...

HP claim 200,000 BI implementations per year. Um, yeah, ok. Ben Barnes talks about it all in the video below. Be warned though (if the IDG statistic above wasn't enough) - Ben claims that HP's product suite is "next generation" BI, because it provides an enterprise-wide information resource rather than silo-ed information stores. Teradata in particular would be raising more than an eyebrow there, since it's been their bread-and-butter for decades. It's what data warehousing has been for a long, long time (often with disastrous results when the enterprise approach has been naively adopted). If it wasn't for the (c) 2007 text superimposed on the video, you'd swear it was 1989. As for Ben's use of the idea of parallel querying and data load as a selling point, well, try telling that to users as the data in reports changes before their eyes...

Here are the main points of Ben's pitch for HP flavoured BI:

  • Cost-effectiveness, based, as far as I can tell, on the same argument that other DW appliance vendors use.
  • No need for the batch-window, upload data as people query it (see above).
  • Reliability for a large userbase - fine, but HP aren't the only ones selling hardware/DBMS for warehouses with large userbases in a reliable way.
  • Minimal need for tuning - again, standard appliance pitch.
Nothing at all revolutionary or "next generation" here, and certainly some worrying evidence that HP don't know much about BI beyond the hardware. Very little that's tricky about BI (and by the way, 50,000 users using a DW is not BI) has anything to do with the hardware or software platform.

Thanks to Craig for passing the Yahoo! article along.

Friday, March 2, 2007

A Giant On The Move

The news is out, and the speculation has been confirmed: Oracle is going to buy Hyperion for a cool US$3.3 billion. The purpose of the move is to let Oracle have a crack at toppling SAP from its enterprise systems pedestal, and so is only partly to do with the BI industry. Oracle's President Charles Phillips must have struggled to keep the smirk off his face when he announced:

Thousands of SAP customers rely on Hyperion as their financial consolidation, analysis and reporting system of record... Now Oracle's Hyperion software will be the lens through which SAP's most important customers view and analyze their underlying SAP ERP data.
Indeed, Cognos and Business Objects seem to be sitting back and enjoying the show a bit - they seem to think they'll pick up a few rats jumping ship as Hyperion/SAP customers re-evaluate their software license portfolio. Or maybe it's more to do with the anticipation that they'll be looked at as a potential marriage partner for SAP.

So what does this mean for the BI landscape? Probably not a lot. SAP may pick up one of the other BI vendors. Hyperion customers will probably get crappier service, but then that's already been happening apparently as Hyperion has grown. Fewer players in the market place will also lessen the likelihood of any fundamental innovation in the kinds of BI products available - but then the current crop of players don't really seem to be doing anything earth-shattering in that respect either. In reality, this is an ERP industry story that will encourage the view that BI is just another module of an enterprise system that provides enterprise reporting.

Tuesday, February 27, 2007

Active Data Warehousing and ROI

This has been up a little while now, but it's worth pointing out - Take some time to listen to this Teradata podcast. It stars Rob Armstrong, Director of Data Warehousing Support...

Thursday, February 22, 2007

Gartner says ... "Business Intelligence" is top CIO technology priority in 2007

Interesting press release from Gartner summarising the results of their latest large survey of CIOs.

Typical self-fulfilling stuff from Gartner - they write it, CIOs read it and do it, and as a result it becomes real. I've always worried about the power Gartner and other analyst firms have; they never really give much detail about their research method - I know I'm only reading their press release but what does surveying more than 1400 CIOs mean? Did they get 1400 odd replies or did they just stuff 1400 envelopes and get back a much smaller sample ... they often don't tell. Anyway, no CIO ever got sacked for aligning their strategy with what Gartner tells them their strategy should be, so we all need to know about it, as this report is going to be read and regurgitated again and again (especially by the BI vendors!).

Tuesday, February 20, 2007

SAS Meets The Wizard of Oz

We're not in Kansas anymore, Toto! Is SAS The Great Oz behind the curtain, or Glinda, there to help poor business folk get back to the farm?

Ok, ok. Enough with the videos, I know. Posts with substance are on their way, as soon as next week's semester start is over and done with.

Tuesday, January 23, 2007

Finnish BI Video

Hot on the heels of the Digg BI Stories RSS feed I've added a feed for BI videos on YouTube. Not a lot there at the moment, but thought I'd link to this one. It's a Finnish promotional video with SAS at what I'm guessing is a business software conference, and it's here purely because of the way they say "decision support" and "business intelliggence". Those wacky Finns...

You're a BI What? The Myopic View of BI Vendors

An excellent article that I came across while adding the Digg functionality to this blog yesterday (notice the new Digg It! button below, and the list of BI stories on Digg on the right!) does a wonderful job of outlining some of the major shortcomings of the BI industry at the moment. Neil Raden from consulting outfit Hired Brains and the fingertips behind the Addicted to BI blog, gets stuck in, criticising vendors for a myopic product/customer/sales view of organisations, and being too focused on the software/hardware tool, rather than the all-important decision support. Shades of Peter Keen there, who made the same criticism of DSS developers and vendors back in the 1980s - it's the first 'S' in DSS that's the most important, ie. support should take precedence over the system (since the latter is only a means to achieving the former).

Raden also talks about the potential of Web 2.0 (ugh) concepts that may be of benefit to decision-makers: collaboration, tagging, etc. Although I'm not a fan of the label, I am very supportive of Web 2.0 thinking that sees social interaction and bottom-up creation of content as the key to useful tools on the web. If we can overcome the problems we currently have in getting BI software users to contribute metadata, etc., then some interesting things might happen.

My favourite quote from Neil comes at the end, though, and goes directly to the issue of the provision of support to decision-makers. He makes the point that most business people are tech-savvy, often more aware of the latest tech trends than internal IT support staff. They just won't put up with crappy support:

"Look, I'm playing a 3-D video strategy game with four people in China I don't even know, while I'm downloading data to my iPod, while I'm answering messages in Yahoo messenger. Are you going to tell me I can't have a report for three months because it has to go through QA?"

Tuesday, January 16, 2007

Binary Search Broken

Still on the theme of ubiquitous bugs, I was chatting with a friend the other day and he mentioned that the canonical description of the binary search algorithm contained a bug. The Official Google Research Blog has all the details. The binary search algorithm, for those of you who don't have a CompSc background, is an amazingly efficient algorithm for searching an ordered list of items - elegant and simple, it's much more efficient than a simple linear search: a straight-through iterative search is order N (on average, for a list of N elements, the search will take N/2 iterations to find a specific element); binary search is order log₂ N. The basic algorithm works like this:

  1. Find the midpoint in the ordered list
  2. Compare the middle element to the value you are searching for.
  3. If the element is greater than the search key, then discard the top half of the list.
  4. If the element is less than the search key, then discard the bottom half of the list.
  5. With the remaining list, find the new midpoint, and repeat until the search term is found.
For a list of 1000 elements, finding a specific element using a linear search would take, on average, 500 comparisons. The algorithm above, a little less than 10. I remember being blown away by this algorithm in an undergraduate lecture - it's so simple, and so powerful. In fact, this divide and conquer approach is used for a number of list-based operations: sorting, searching, and so on.
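
To make those comparison counts concrete, here is a minimal Java sketch that counts how many elements each approach examines on a sorted list of 1000 integers. The class and method names (SearchCost, sortedList, linearSteps, binarySteps) are made up for illustration, and the midpoint is computed as low + (high - low) / 2 simply as one way of finding the middle of the remaining range.

```java
// Illustrative sketch only: counts elements examined, not a library implementation.
final class SearchCost {

    // Builds the sorted list 0, 1, ..., n-1 (hypothetical test data).
    static int[] sortedList(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    // Straight-through scan: returns how many elements were examined.
    static int linearSteps(int[] a, int key) {
        int steps = 0;
        for (int value : a) {
            steps++;
            if (value == key) {
                break;
            }
        }
        return steps;
    }

    // Binary search: returns how many midpoints were examined.
    static int binarySteps(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        int steps = 0;
        while (low <= high) {
            int mid = low + (high - low) / 2; // middle of the remaining range
            steps++;
            if (a[mid] > key) {
                high = mid - 1;   // step 3: discard the top half
            } else if (a[mid] < key) {
                low = mid + 1;    // step 4: discard the bottom half
            } else {
                break;            // found
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] a = sortedList(1000);
        long linearTotal = 0;
        long binaryTotal = 0;
        int binaryWorst = 0;
        for (int key = 0; key < a.length; key++) {
            linearTotal += linearSteps(a, key);
            int steps = binarySteps(a, key);
            binaryTotal += steps;
            binaryWorst = Math.max(binaryWorst, steps);
        }
        System.out.println("linear average: " + (double) linearTotal / a.length);
        System.out.println("binary average: " + (double) binaryTotal / a.length);
        System.out.println("binary worst:   " + binaryWorst);
    }
}
```

Averaged over every key in the list, the linear scan examines roughly 500 elements, while the binary search never needs more than 10.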

A typical implementation of the binary search algorithm, and the one used in the Java Development Kit and other code libraries, looks like this (taken from the blog post linked to above, and direct from the JDK):

    public static int binarySearch(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;

        while (low <= high) {
            int mid = (low + high) / 2;
            int midVal = a[mid];

            if (midVal < key)
                low = mid + 1;
            else if (midVal > key)
                high = mid - 1;
            else
                return mid; // key found
        }
        return -(low + 1); // key not found.
    }

The problem is in the line that finds the midpoint in the list: (low + high) / 2. For most applications, this will work fine, but as low and high get very large, there's a danger that their sum exceeds the maximum value a Java int can hold (that's 2^31-1, or about 2 billion). In other words, if the search list contains more than about a billion elements (2^30), the sum can overflow to a negative value (since the topmost bit represents the sign of a number), the midpoint comes out negative, and Java throws an ArrayIndexOutOfBoundsException when you try to look up that element.
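
The overflow is easy to reproduce without allocating a huge array. The sketch below just does the midpoint arithmetic on two large index values; the class name and the particular values of low and high are arbitrary choices for illustration.

```java
// Illustrative sketch: shows the int overflow in (low + high) / 2.
final class MidpointOverflow {

    // The midpoint calculation as written in the listing above.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    public static void main(String[] args) {
        int low = 1_500_000_000;   // both values fit comfortably in an int...
        int high = 2_000_000_000;
        // ...but their sum (3.5 billion) does not, so it wraps around.
        System.out.println(naiveMid(low, high)); // prints -397483648
    }
}
```

A negative midpoint is not a valid array index, which is why the lookup a[mid] fails.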

There are solutions of course (there are other ways to calculate the midpoint without adding two very large numbers together). But the bug is a timebomb for any application that needs to search or sort very large lists. Sure, 2 billion is a large number, but I'm sure there's a few data warehouses out there that would be dangerously close to that number in terms of fact table rows. Be sure that your DBMS vendor is all across this - it took 10 years for the bug to show up in Java.
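
One such fix, suggested in the Google Research post, is to compute the midpoint as low + (high - low) / 2, so that no intermediate value exceeds high; the post also mentions an unsigned shift, (low + high) >>> 1. A minimal sketch of the first option follows. The class name and the safeMid helper are made up for illustration, and the search itself is otherwise the same as the JDK listing above.

```java
// Illustrative sketch: same search, with an overflow-safe midpoint.
final class SafeBinarySearch {

    // Midpoint without adding two large numbers together.
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }

    // Returns the index of key in the sorted array a,
    // or -(insertion point + 1) if key is not present.
    static int binarySearch(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;

        while (low <= high) {
            int mid = safeMid(low, high);
            int midVal = a[mid];

            if (midVal < key) {
                low = mid + 1;
            } else if (midVal > key) {
                high = mid - 1;
            } else {
                return mid; // key found
            }
        }
        return -(low + 1); // key not found
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 5, 7, 11, 13};
        System.out.println(binarySearch(a, 7)); // prints 3
        System.out.println(binarySearch(a, 4)); // prints -3
        System.out.println(safeMid(1_500_000_000, 2_000_000_000)); // prints 1750000000
    }
}
```

With the same large index values that trip up (low + high) / 2, the safe midpoint stays positive and inside the range.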

Thursday, January 11, 2007

Excel Patch for Standard Deviation Bug

Just noticed a new update for the Mac version of Microsoft Office 2004 (11.3.3) - note that it's not yet showing up in the automatic update tool. One of the fixes included in the update is for:

an issue that causes standard deviation calculations to produce inaccurate results when the calculations are used in PivotTable reports.

For those of you running Macs (and there's a few, judging by our logs) and using StDev in your pivot tables (or using a tool that does), get on updating.

It does make you think - reliance on any one tool always exposes us to the risk that bugs or kludges in implementation will give us incorrect results, and particularly so with Excel given its ubiquity. I couldn't find any more details on the bug in my quick hunt on Microsoft's site, or a Google search, but did turn up these papers critiquing Excel 97's implementation of a number of statistical functions (referred to and addressed by Microsoft in this KnowledgeBase article):
  • Knüsel, L. On the Accuracy of Statistical Distributions in Microsoft Excel 97, Computational Statistics and Data Analysis, 26, 375-377, 1998.
  • McCullough, B.D. & B. Wilson, On the accuracy of statistical procedures in Microsoft Excel 97, Computational Statistics and Data Analysis, 31, 27-37, 1999.
For those of you interested in the use of spreadsheets (in general, not just Excel) and the associated risks, check out the European Spreadsheet Risks Interest Group's site.