Data fragmentation was the topic of the last post - and this week's meandering thoughts have also been on data fragmentation, on measures of its complexity (now that is a bit of a mind bender), and on whether the advent of Cloud Computing (aka mainframes) will help sort the fragmentation mess out.
The problem, as I see it, is that everything starts with a plan to have a central 'Big Data' repository (aka Computing Centre) from which all decision-making analysis can be driven. In reality, though - out in the field - individuals need some local, specific analysis to help them do their job. So they take a data extract from the 'Big Data' and do what they need to do. The trouble is that, over time, these extracts can take on a life of their own, along with the growth of all sorts of associated ecosystems. This cycle can continue right down to the level of individual spreadsheets!
Aside: I have to come clean and confess that I have made extensive use of Excel (filter functions) this week. Given my panning of Excel programming, this does feel a little hypocritical - however, the filters have proven very useful, which just illustrates how easily you can get drawn into this! It's not been real coding though - so I think I am still OK ;)
So, where is all this going? The question is whether it is possible to measure the complexity of this fragmentation using some measure of the fractal dimension of the data sets - that's a thought from the MOOC I'm taking! Could this be used to estimate the amount of effort required to consolidate the fragmented data? In fact, how do you calculate the dimension of a dataset? Will Cloud Computing help solve some of these problems going forward? The root cause of the fragmentation is people wanting something that locked-down corporate systems do not provide - will the new Cloud systems give people the freedom to build (under proper supervision) what they need locally, or will it all end up in this non-virtuous cycle again? What is the probability of the fragmentation occurring again?
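On the "how do you calculate the dimension of a dataset?" question, one standard estimator is box counting: cover the data with a grid of boxes of side eps, count how many boxes contain at least one point, and see how that count grows as eps shrinks. The slope of log(count) against log(1/eps) is the estimate. What follows is only a minimal sketch, and it assumes the data can be treated as points in a numeric space, which is a big assumption for real business data. The function names and the sample points are made up for illustration, and nothing here says whether the number relates to consolidation effort.

```python
import math
import random


def box_count(points, eps):
    """Count the distinct grid boxes of side eps that hold at least one point."""
    return len({tuple(math.floor(c / eps) for c in p) for p in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(eps) against log(1/eps)."""
    xs = [math.log(1.0 / eps) for eps in sizes]
    ys = [math.log(box_count(points, eps)) for eps in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical sample data: points along a diagonal line, and points
# scattered across the whole unit square.
rng = random.Random(0)
line = [(t, t) for t in (rng.random() for _ in range(20000))]
square = [(rng.random(), rng.random()) for _ in range(20000)]

sizes = [1 / 4, 1 / 8, 1 / 16, 1 / 32]
line_dim = box_counting_dimension(line, sizes)
square_dim = box_counting_dimension(square, sizes)
print(f"line: {line_dim:.2f}, square: {square_dim:.2f}")
```

With enough points the line should come out close to 1 and the filled square close to 2. A fractal-like data set would land somewhere in between, and the estimate depends heavily on the range of box sizes chosen and on how many points are available.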
Need to watch the next lecture on the course - maybe there is no connection!!
Obviously more questions than answers here - the revival continues...