Monash Centre for Decision Support and Enterprise Systems Research

Thursday, April 17, 2008

Data Warehouse / BI Security

Peter O'Donnell and I are currently supervising an honours student who is looking at the issue of data warehouse security, with a view to doing a survey of DW security practices in Australian companies. It's still early days, but one of the things that Justin has found is that there is very little literature (academic or otherwise) talking about the issue (either highlighting problems or outlining best practice). This is both good and bad news: it means that Justin will be making a real contribution, but he's going to have trouble writing the literature review section of his thesis!

To give you some idea of where our thinking is at, here's a generic architecture for the flow of information through a data warehouse:

Each component of the diagram above is a potential security problem. Just the ETL process, for example, poses problems of massive amounts of data moving around a network, taken out of what is presumably an initially secure environment. We've found very little that talks about securing the individual components of the architecture, or about taking a holistic view and securing the whole process, end-to-end. On the flip side, security often poses a problem from a functionality or performance perspective - what can we do to make the whole thing as responsive and functional as possible while still protecting an important organisational asset?

Any thoughts, war stories, pointers to resources or comments would be appreciated!

Tuesday, April 1, 2008

DSS Governance Paper

Here's a link to the final paper that I mentioned previously in the post on DSS governance. Hopefully some people find it useful and/or thought-provoking.

Wednesday, March 19, 2008

Trends in Data Warehousing


Last year I co-authored a book chapter with two other colleagues, Peter O'Donnell and David Arnott, on the use of data warehouses for decision support, and it's just recently been published. The book is called Handbook on Decision Support Systems edited by Frada Burstein (another Monash colleague) and Clyde Holsapple. One section of the chapter that I wrote looked at current trends in DW practice, and I thought, as I wrote it in late 2006, that it would probably be better as a blog post, than part of a chapter in a (hopefully long-lived) book. Here's the excerpt. I'd be interested to hear what other people think are the big trends in DW and where it's headed.

Current Trends and the Future of Data Warehousing Practice

Forecasting future trends in any area of technology is always an exercise in inaccuracy, but there are a number of noticeable trends which will have a significant impact in the short-to-medium term. Many of these are a result of improvements and innovations in the underlying hardware and database management system (DBMS) software. The most obvious of these is the steady increase in the size and speed of data warehouses connected to the steady increase in processing power of CPUs available today, improvements in parallel processing technologies for databases, and decreasing prices for data storage. This trend can be seen in the results of Winter Corporation's "Top Ten Program," which surveys companies and reports on the top ten transaction-processing and data warehouse databases, according to several different measures. Figure 11 depicts the increase in reported data warehouse sizes from the 2003 and 2005 surveys (2007 data has not yet been released):


Ten Largest Global Data Warehouses by Database Size, 2003/2005. From Winter Corporation.

The data warehousing industry has seen a number of recent changes that will continue to have an impact on data warehouse deployments in the short-to-medium term. One of these is the introduction by several vendors, such as Teradata, Netezza and DATAllegro, of the concept of a data warehouse 'appliance' (Russom, 2005). The idea of an appliance is a scalable, plug-and-play combination of hardware and DBMS that an organization can purchase and deploy with minimal configuration. The concept is not uncontroversial (see Gaskell, 2005 for instance), but is marketed heavily by some vendors nevertheless.

Another controversial current trend is the concept of 'active' data warehousing. Traditionally, the refresh of data in a data warehouse occurs at regular, fixed points in time in batch mode. This means that data in the data warehouse is always out of date by a small amount of time (the time since the last execution of the ETL process). Active data warehousing is an attempt to approach real-time, constant refreshing of the data in the warehouse: as transactions are processed in source systems, new data flows through immediately to the warehouse. To date, however, there has been very limited success in achieving this, as it depends not just on the warehouse itself, but also on the ability of the source systems to cope with the additional data-handling load. Many ETL processes are scheduled to execute at times of minimal load (e.g. overnight or on weekends), but active warehousing shifts this processing to peak times for transaction-processing systems. Added to this are the minimal benefits that can be derived from having up-to-the-second data in the data warehouse, with most uses of the data not so time-sensitive that decisions made would be any different. As a result, the rhetoric of active data warehousing has shifted to "right-time" data warehousing (see Linstedt, 2006 for instance), which relaxes the real-time requirement for a more achievable 'data when it's needed' standard. How this right-time approach differs significantly in practice from standard scheduling of ETL processing is unclear.
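To make the staleness point concrete, here is a minimal Python sketch of how out of date warehouse data is at query time under a fixed batch schedule. The function name `staleness_at`, the example dates and the assumption that each ETL run completes instantly are all illustrative choices, not anything taken from the chapter or from a particular ETL tool.

```python
from datetime import datetime, timedelta


def staleness_at(query_time, first_load, refresh_interval):
    """Return how old the warehouse data is at query_time.

    Assumes (hypothetically) that ETL runs start at first_load, repeat
    every refresh_interval, and complete instantly.
    """
    if query_time < first_load:
        raise ValueError("query_time precedes the first ETL load")
    elapsed = query_time - first_load
    # timedelta // timedelta gives the number of whole intervals elapsed
    completed_intervals = elapsed // refresh_interval
    last_load = first_load + completed_intervals * refresh_interval
    return query_time - last_load


# Hypothetical schedule: first load at 02:00, query two days later at 15:30.
first_load = datetime(2008, 3, 17, 2, 0)
query_time = datetime(2008, 3, 19, 15, 30)

nightly = staleness_at(query_time, first_load, timedelta(days=1))
hourly = staleness_at(query_time, first_load, timedelta(hours=1))

print("Nightly batch staleness:", nightly)
print("Hourly batch staleness:", hourly)
```

Under these assumptions, moving from a nightly to an hourly schedule only changes the `refresh_interval` argument, which is one way of seeing why "right-time" warehousing can look much like ordinary ETL scheduling with a shorter interval.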

Other than issues of hardware and software, a number of governance issues are introducing change to the industry. One of these is the prevalence of outsourcing information systems - in particular the transaction-processing systems that provide the source data for warehouse projects. With many of these systems operated by third party vendors, governed by service level agreements that do not cover extraction of data for warehouses, data warehouse developers are facing greater difficulties in getting access to source systems. Arnott (2006) describes one such project where the client organization had no IT staff at all, and all 13 source systems were operated off-site. The outsourcing issue is compounded by data quality problems, which are a common occurrence. Resolution of data quality problems is difficult even when source systems are operated in-house: political confrontations over who should pay for rectifying data quality problems, and even recognition of data quality as a problem (in many cases, it's only a problem for data warehouse developers, as the transaction processing system that provides the source data is able to cope with the prevailing level of data quality) can be difficult to overcome. When the system is operated off-site and in accordance with a contractual service level agreement that may not have anticipated the development of a data warehouse, these problems become even more difficult to resolve.

In addition to the issues of outsourcing, alternative software development and licensing approaches are becoming more commonplace. In particular, a number of open source vendors have released data warehousing products, such as Greenplum's Bizgres DBMS (also sold as an appliance) based on the Postgres relational DBMS. Other open source tools such as MySQL have also been used as the platform for data warehousing projects (Ashenfelter, 2006). The benefits of the open source model are not predominantly to do with the licensing costs (the most obvious difference to proprietary licensing models), but rather have more to do with increased flexibility, freedom from a relentless upgrade cycle, and varied support resources that are not deprecated when a new version of the software is released (Wheatley, 2004). Hand-in-hand with alternative licensing models is the use of new approaches to software development, such as Agile methodologies (see http://www.agilealliance.org) (Ashenfelter, 2006). The adaptive, prototyping oriented approaches of the Agile methods are probably well suited to the adaptive and changing requirements that drive data warehouse development.

The increased use of enterprise resource planning (ERP) systems is also having an impact on the data warehousing industry at present. Although ERP systems have quite different design requirements to data warehouses, vendors such as SAP are producing add-on modules (SAP Business Warehouse) that aim to provide business intelligence-style reporting and analysis services without the need for a separate data warehouse. The reasoning behind such systems is obvious: since an ERP system is an integrated tool capturing transaction data in a single location, the database resembles a data warehouse, insofar as it's a centralized, integrated repository. However, the design aims of a data warehouse that dictate the radically different approach to data design described above in Sections 3.1 and 4 mean that adequate support for management decision-making requires something other than simply adding a reporting module to an ERP system. Regardless, the increased usage of ERP systems means that data warehouses will need to interface with these tools more and more. This will further drive the market for employees with the requisite skill set to work with the underlying data models and databases driving common ERP systems.

Finally, Microsoft's continued development of its SQL Server database engine has had a major impact on Business Intelligence vendors. Because of Microsoft's domination of end users' desktops, it is able to integrate its BI tools with other productivity applications such as Microsoft Excel, Microsoft Word and Microsoft PowerPoint with more ease than its competitors. The dominance of Microsoft on the desktop, combined with the pricing of SQL Server and the bundling of BI tools with the DBMS, means that many business users already have significant BI infrastructure available to them, without purchasing expensive software from other BI vendors. Although SQL Server has been traditionally regarded as a mid-range DBMS, not suitable for large-scale data warehouses, Microsoft is actively battling this perception. It recently announced a project to develop very large data warehouse applications for an external and an internal client, to handle data volumes up to 270 terabytes (Computerworld, 2006). If Microsoft is able to dispel the perception that SQL Server is only suited for mid-scale applications, it will be put into direct competition with large-scale vendors such as Oracle, IBM and Teradata, with significantly lower license fees. Even if this is not achieved, the effect that Microsoft has had on business intelligence vendors will flow through to data warehousing vendors, with many changes being driven by perceptions of what Microsoft will be doing with forthcoming product releases.

Friday, February 29, 2008

Threat Level "Burgundy, If You Will"

Just when you thought you'd seen every stupid data visualisation trick out there, someone invents the "magic pie chart" and the rotating "statistical lazy susan." The US cable news networks are outdoing themselves during the US presidential primaries, and breaking just about every rule in the data visualisation book. BI vendors, eat your hearts out! Check out this gem from The Daily Show with Jon Stewart:


Wednesday, December 12, 2007

A Response to DSS Governance

Every couple of years or so, you get one of those rare students in a class - one who actually teaches you something about the topic. Bruce Fowler was one of those students in a Data Warehousing course that I taught earlier this year. He's a data warehouse manager for a resource management company, and so has an interest in DW management and governance issues.

He posted a response to the DSS Governance post I put up recently, but due to the new comments system that Peter's been playing around with, it didn't come through. It was substantive enough, and included some diagrams, that I thought it reasonable to post (with Bruce's permission!) his comments as a separate entry. All images in (and linked to by) this entry are Copyright © 2007 Bruce Fowler.

The comparison of IT, railroads and power as "infrastructure technologies" - sharing characteristics of competitive advantage, ubiquity and finally commoditisation (followed by a loss of strategic benefit and value) is a fairly long bow. It is important to remember that the catalyst for the demise of rail was not its cost or availability (at least not directly), it was the advent of more cost effective and time efficient technological alternatives (combustion engines, aeroplanes). I am not sure I would support the contention that the commoditisation of a technology has any specific correlation to the technology’s loss of strategic initiative or competitive advantage.

Railway lines continue to provide cost effective and logistically efficient means of transporting large volumes of material across the country (provided the infrastructure exists and the alternative means remain less cost effective and time efficient), and are being used in new and innovative ways to supplement conventional income streams for logistics organisations through integrated fibre networks. Power companies continue to explore delivery of new products and services over existing infrastructure (i.e. broadband over powerlines), and are currently reinventing themselves in biofuels space to enable delivery of “green” energy to a more environmentally conscious market. Combustion engines are being redesigned to be more fuel efficient and “environmentally friendly”.

The delivery platforms have been around for some time and form part of our everyday lives – their use is evolving in new and innovative ways. IT – perhaps more than any other technology platform – has the capacity to continue to be adapted and evolved to meet the ever-changing demands of its user base. Commodity? Yes. Does it matter? Of course it does.

Back on topic …

I suspect there is a significant difference in the structure, objectives and necessities of Corporate Governance and IT Governance; and of the relationship between the two in comparison to the same for DW/DSS Governance. Even then, there are perhaps different factors that need to be considered from the perspective of DW Governance versus DSS Governance, and their respective relationships with IT Governance.

Consider the basis for the introduction of Corporate Governance – a means of managing the seemingly inevitable consequences of the centralisation of power and decision making authority (Husted, 1999); then consider the basis for the introduction of IT Governance – the patterns of authority for the significant IT activities of an organisation including IT Infrastructure, IT Use and IT Project Management. In simplistic terms, the former focuses on risk mitigation, the latter on efficiency and effectiveness.

Over time, the use of Corporate Governance models as a direct risk mitigation strategy has given way to an army of formalised standards, auditing and reporting obligations – spanning multiple levels of business, including technology operations – administered by committees through levels of delegation of authority. The line between Corporate Governance and IT Governance has blurred, and the need to maintain alignment of IT (infrastructure, use and project management) with the organisation's mission objectives is now forefront in the minds of most informed corporate executives. This alignment recognises the critical strategic importance of IT to successful business operation.

Get it right, and IT can (at a minimum) provide a stable platform from which other strategic endeavours can be launched. Get it wrong, and a failed IT system or project can (in the best of cases) reduce your business efficiency or effectiveness, or (in the most severe of circumstances) end your business (someone say ERP?).

A pencil is a commodity. IT is a tool that can be used to create competitive business advantage, or as easily be misused resulting in catastrophic business or process failure.

I find myself off topic again …

The evolution of DW Governance arrangements depends on the confluence of many factors that interact with one another in a number of complex ways (Sambamurthy and Zmud, 1999). The key factors, their interactions and dependencies are included below.

Given the nature of the relationships identified below, perhaps we could identify the interactions as a loosely coupled hierarchy: with each child exhibiting some characteristics of its parent, and the will and initiative to move around (and sometimes break out of) the boundaries defined by the ever-watchful parent (who will evolve and adapt their boundaries to meet the growing needs and demands of the child, but have the foresight and capacity to bring the child back into line if needs be).

Thanks Bruce. Generally, I vigorously agree with everything you've written here. If you would like a copy of Bruce's original diagrams, drop me a line and I'll pass on the request.

Monday, December 10, 2007

DSS Governance

In the last post, I mentioned I had been looking at the issue of governance and DSS. In fact, this is something I've been thinking about since a student asked in a lecture a couple of years ago if there was anything on data warehouse governance, and I've just written a paper for the biennial conference of the academic DSS community, IFIP Working Group 8.3.

The paper is currently under review, so I won't post it here yet (I'll put up a link when it's gone through that process), but I thought I'd put the basic argument out there for people to comment on, since it's all still conceptual at this stage.

IT governance is an important topic - on the one hand corporate governance is a big thing; and on the other, we've got Nicholas Carr telling us that IT is not a strategic advantage for organisations. The IT industry needs to ensure that we're managing an important corporate resource effectively.

As a significant chunk of the IT industry, DSS (i.e. BI) is very much a part of IT governance. Unfortunately, there's not a lot of academic work that talks about how to do this effectively for DSS (there's a bit on data warehousing, but that's it).

The argument I make in the paper is based on the idea that DSS is different to other kinds of IT in two ways:

  1. DSS are chaotic systems. They evolve. They can and should evolve quickly. If they don't evolve, then there's something wrong: learning isn't taking place, and the system isn't doing what it's supposed to: provide support with semi- or un-structured decision-making. Sure, evolutionary development is used to build all kinds of systems, but there is usually some end-point in mind where the system becomes stable (relatively). This isn't the case for DSS.
  2. DSS are subversive systems. They're designed to deal with strategic decisions (a corollary of being built for semi- and un-structured decisions). Their use deliberately changes some aspect of an organisation's structure (not the physical structure, but the organisation's strategic direction, policies, values, procedures, work-flows, etc.). Other systems may have this effect too, but often it's not deliberate. With DSS it's intentional - part of its raison d'être.

IT governance is largely about how to control IT resources, enforce standards, and manage changes in a methodical fashion. It's based on a mindset of stability and prediction. Although there's been a lot written on IT governance (check out Weill & Ross's excellent book on IT governance), and a lot of it focuses on the appropriate approach for given organisational types (eg. centralised versus decentralised management cultures), there's nothing I've seen that actually takes characteristics of the technology into account. My assertion in the paper is that the underlying assumptions of a given governance approach should be consistent with the underlying assumptions embedded in the technology being governed.

Bureaucratic approaches that are appropriate for managing technologies like transaction-processing systems - steering committees, IT councils, service level agreements, etc. - are inappropriate for chaotic systems. Enforcement of an organisational structure on the operation of a technology is inappropriate for a technology designed to question and change that same structure. Excessive control can stifle and eventually kill a DSS project (and it has; we've seen it).

The conclusion that I come to is that for DSS to thrive, the developers and users need the autonomy to play around with the system's design and functionality without going through multiple layers of bureaucracy. DSS should operate, therefore, in a kind of 'governance sandbox', where the DSS team are trusted to do the right thing as they see it. This kind of approach needs some clear boundaries however, including clear goals and objectives, and what constitutes overstepping the mark. This in turn requires a pre-existing, well planned general IT governance strategy.

Anyway, those are my current thoughts. Feel free to shoot through any comments. A couple of issues spring to mind, such as how do different kinds of DSS technologies differ in their governance requirements - eg. data warehouses versus dashboards versus small-scale throwaway spreadsheets. What specific mechanisms work for DSS governance inside the 'sandbox'? How much scope should DSS developers and users have? All food for future research...

Monday, November 12, 2007

IT Archaeology: Whatever Happened to SDS?

I've been digging through some ancient texts (in IT terms) over the past week or so, looking at the issue of IT governance and how it relates to the development of decision support systems. In doing so, I read again an article from 1971 in Sloan Management Review that first coined the term 'decision support system' written by two academics from MIT: Anthony Gorry and Michael Scott Morton.

In defining DSS, they also defined a second class of information system that I'd completely forgotten about, known as a 'structured decision system' (SDS). The term 'structured' comes from an adaptation of a model of decision-types by Nobel laureate Herbert Simon. In Gorry & Scott Morton's terms, decisions are either structured (well-understood, fairly easy to resolve, lend themselves to well-defined workflows or decision-rules), unstructured (difficult, high levels of ambiguity, no clear process for making the decision) or somewhere in-between (semi-structured).

Gorry & Scott Morton argued that DSS are designed to support semi- and un-structured decision problems. For systems that handle structured decisions like inventory control, short-term forecasting and so on, they coined the term SDS.

This got me thinking. I had always seen today's BI systems as the inheritor of the DSS concept - basically the latest term in a long string of marketing names for systems designed to support managerial decision making. Now, I'm not so sure. In looking at Gorry & Scott Morton's definitions, most BI tools seem to be targeted more at structured, rather than unstructured decisions. Couple this with efforts by people like Howard Dresner to shift the BI concept more and more to enterprise reporting, and I reckon that BI is more about SDS than DSS.

Which is a problem.

There is a class of decisions that really need the support of systems that can help the decision-maker through the decision-making process by embedding principles of good decision-making in the system, and helping them make sense of the information they have. These decisions tend to be strategic and important, and the potential ROI for a system that can improve decision-making in this area goes way beyond being able to run an operational report that used to take hours in 30 seconds. In other words, there is a real need for the decision support systems that Gorry & Scott Morton described 36 years ago, and I don't think that BI tools are currently being used to build them.

This is not to say that they're not being built, of course. Instead, I think they tend to fly under the radar more, as individual strategic decision-makers throw together systems to answer specific questions in a way that they're comfortable with: witness the continued pervasion of Excel in the upper echelons of organisations despite the flash-wizardry of the latest 3D-pie-charting engine; skunk-works projects for individual managers; the explosion of independent data marts.

As I said above, the impact of the effective use of a DSS far outstrips that of an SDS because the decisions made fundamentally and directly affect the strategic direction of a firm. It's critical that good quality DSS are developed, and I don't think we're going about it in the right way at the moment. Excel can be dangerous if not used properly, but because everyone is focused on the highly-visible SDS-like BI projects, the DSS-needs of an organisation are often addressed in an ad hoc way.

It's a pity that we don't pay attention more to what was written in the past about IT. Not only do you realise that we keep making the same mistakes despite the changing technology, you come across interesting ideas that can change the way you perceive today's industry.