Saturday, 13 April 2013

Bottom-up and Top-down

No ..... not that sort .....

Validation and verification related of course!

The Question:
You have a brave new view of the future for your operations - big data related to big assets and all that new stuff is banging on your door. Why aren't you using it to improve efficiencies in the business? Everyone else is - just read about what you can do.

Issue:
You have an operation that currently runs not that badly, is very complex, and has lots of fragmented data. How can you start to introduce a new big-data type system into what you do?

The Solution:
You need to start by gathering requirements for the new system, look through them, and then see which can be implemented and on what timescales - of course - simples!

Well, that's all very good from the 10,000 ft management helicopter view of the problem. The next step in this world is a bit of Star Trek Management (STM):

'Make it so', number one.

and off we go.

Meanwhile, in a universe near you, requirements gathering has started, as the 'make it so' at this level doesn't involve much thinking, just a bit of organising of meetings. This usually goes well; everyone wants to get their 'issues' out on the table, "and I want a yellow button in the top corner of the screen" type stuff, along with "we would like to manage risk at an enterprise level". Result - one big bucket full of requirements! Yes, yes, I can hear you requirements management types - structured approach, attributes, blah, blah... Unfortunately, here in the real world the Captain wants progress, and NOW! So things happen, and the feedback is good, everyone is venting - carry on, number one! More workshops - they work. The bucket gets fuller and fuller - big data gone mad. We need a management tool for all this, so roll out some requirements software to manage it all. Phew, thank goodness that existed - now we can relax, can't we? But no - it's just a fancy bucket - we shall have to engage (STM) brain to figure out what to do with all this data (sorry - poor STM jokes).
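To be fair to the requirements management types, what turns the bucket into something useful is not the fancy software but a handful of attributes on each entry, so you can slice the pile later. A minimal sketch of the idea (the field names and values here are my own invention, not any particular tool's schema):

    from dataclasses import dataclass

    @dataclass
    class Requirement:
        """One entry in the bucket, with just enough attributes to filter on."""
        req_id: str          # e.g. "REQ-001"
        text: str            # the raw statement from the workshop
        source: str          # who raised it and where
        level: str           # "enterprise" aspiration vs "yellow-button" detail
        status: str = "raw"  # raw -> analysed -> accepted / rejected

    bucket = [
        Requirement("REQ-001", "I want a yellow button in the top corner", "workshop 1", "detail"),
        Requirement("REQ-002", "Manage risk at an enterprise level", "workshop 2", "enterprise"),
    ]

    # The attributes are what stop it being 'just a fancy bucket':
    enterprise_asks = [r for r in bucket if r.level == "enterprise"]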

Number one, "where are we" - "we have a bucket full sir"

So, the problem with the bottom-up set of activities is that you end up in a position where you can't see the wood for the trees, while the STM top-down view of the world launches a raft of projects that you are never sure will connect with the real world. The conclusion so far is that, unless you do both bottom-up and top-down, you will never figure out whether your big data related initiatives will be viable and add value to the business.

Not thought further than this yet ..... sits down and puts fist on chin .....



Saturday, 6 April 2013

3D's of computing (Data, Devil, Detail)

This week has seen a flurry of activity under the banner of Big Data!

The finale was Friday evening, watching a recording of this week's Horizon programme on 'Big Data' - which I watched with three of my advisors - sounded like geek heaven to us. Anyway, the programme unfolded: blah, blah, big data, lots of 1s and 0s flashing over the screen to show you where the big data was coming from and going to. As it went on, though, I personally was having trouble keeping my face straight - to the annoyance of one of my advisors, who kept telling me to shut up. Having slept on it, and having been immersed in a real live project for the past month or so directly dealing with Very Big Data (maybe that will catch on - VBD ;) ), the things that were bothering me boil down to the following:

  1. there's a 'smoke and mirrors' feel about a lot of this big data talk. Certainly there is vast potential for mining data but, from what I've seen, 'ordinary' companies are miles away from being in a position to exploit it fully. Enter the big data repository suppliers, who will solve all your big data consolidation and mining problems for you. Off you go....
  2. enter the mythical 'algorithm' - is having this central repository going to work? As in the Horizon programme, when you need to access the data, all you do is create the algorithm to do what you need - simples! You have your data, you can access it from anywhere at any time (oh yes you can); what are you going to do with it? (In my world you should have thought of that beforehand, but that's another story.) You have your bucket of data and want to fish out some 'benefit'. What do you do? You write an algorithm - and most of this algorithm is just searching and filtering and displaying, not much 'algorithm' about that. However, there could be an analysis element in this algorithm too - sounds like you need to dust off the old Fortran compiler to me! What's the problem? The problem is spreadsheets: everyone wants to run their own personal 'algorithm' dealing with their own specific needs - and quite rightly too! They take an extract of the big data, do some work on it, write the report and off they go (see the sketch after this list). Well, probably a bit more than that, but you get the idea! All this leads to fragmentation (again) of the data set, as it is difficult to re-upload your work back into the mother ship.
  3. what's needed, of course, is a managed way of allowing access to the big data and development of local 'algorithms' - sounds like app development to me! These can use and refresh the big data appropriately. Sorry, I seem to have entered the smoke and mirrors zone again. Great aspiration, but do 'ordinary' companies really have the quality of data to allow meaningful apps to be developed?
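Point 2 in sketch form: the typical personal 'algorithm' is a filter plus an aggregate over an extract, which is exactly why it is so easy to knock up and so hard to feed back into the mother ship. A hypothetical example (the file and column names are invented for illustration):

    import pandas as pd

    # Take a personal extract from the 'big data' (a hypothetical assets table).
    extract = pd.read_csv("assets_extract.csv")

    # Most of the 'algorithm' is just searching and filtering...
    overdue = extract[extract["days_since_inspection"] > 365]

    # ...and displaying: a local report that never flows back upstream,
    # so the data set fragments again.
    report = overdue.groupby("region")["asset_id"].count()
    report.to_csv("my_overdue_report.csv")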
The thoughts continue, keep smiling......

Saturday, 30 March 2013

Rain Cloud Computing

It all seems to be going pear shaped - rant, rant.

Maybe it's the credit crunch finally starting to kick in, but there seems to have been a rapid up-tick in the removal of freemium services or, worse still, a charge starting to be applied (now that's against the driving principle for me).

Examples that I have noticed this week: Google Reader (gone), Scoop.it shared curation (payment now required), LinkedIn book reading list (disappeared - along with all that bumph I wrote?).

All this puts a distinct shadow over 'cloud-based services' in my mind. In fact, we don't really have a 'cloud' - as I've pointed out in previous blogs - what we have is a load of separate islands of adventure, each with its own data store and application arrangements, under the total control of whoever runs the site. Sounds like a mainframe, operates like a mainframe, it is a mainframe - as they say.

So you get lured in on a cloud-like freemium offering and then, once you have been locked in, the squeeze is put on. Probably get this blog taken down now, just you see!

Some things you will probably think worthwhile paying for; others will fall by the wayside - which services do you really need? Don't give your data away without due consideration - that goes for companies too - you have been warned!

Why can't we build a proper cloud - rather than these islands - with source data accessible by any means and not controlled by any one outfit? Or is that a bit naive ;)


Saturday, 23 March 2013

Compuplexity

Complex systems seem to be raising their heads quite a bit these days - or is it just me noticing complexity now that I am into my MOOC?

Just had another example from my friend Ian at the BCS, relating to a paper on complex system failures - ref http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf

What I have been thinking about is that the computer system I have been working on recently also falls into this complex system arena - see previous posts for background. Some of the parallels are:


  1. it is fundamentally composed of simple elements - spreadsheets (yuk) and simple calculation engines are used at the coal face,
  2. it has grown by the 'system' taking the good bits and improving upon them, and killing off the parts that don't work well - though there is a lot of old code still lying around!
  3. nobody really knows how it works - it just does,
  4. people are by far the biggest element of the system and make it work despite it running with flaws.
So according to my MOOC that would put it in the complexity box - and that's without considering fragmentation functions!

So now the question is: how do you go about improving such a beast? Well, that's the job really, but it isn't so straightforward a question to answer now that I have had a look in the box. The simple - helicopter - view is to just consider it a box. Let's just buy a better box! However, some critical things hang off the back end of the output from this box, so you had better not muck those up - or you are in jail!

Given the paper referenced above - even simple changes can have big impacts!

More thought required.....


Friday, 15 March 2013

Badge addiction!

I've registered on Foursquare - shock horror - have no idea why - got bored and was playing with it!

It reminds me of clocking on at the mill - yes I did work in a mill in a previous life - when we still had some in the UK that is!

However, I now have powers - I am the Mayor of the Travelodge Milton Keynes Central - wow! I can see how these badges become addictive. I'm going to try and become the mayor of my local ASDA next. Seeing as I seem to meet most of my work colleagues there doing the weekly Saturday shopping, this could end up as a cross-company badge competition!

It's all a bit scary - you can see what others have been up to, and others can see what you have been up to - useful for filling out your timesheet! Though I have discovered that connecting up with a work colleague that I wouldn't normally have much to do with has created an odd relationship. I know how he gets to work, what he does for lunch and when he arrives home for the weekend. Is this a good thing? Not entirely sure. There is something there, but it feels a little voyeuristic to be honest.

I can see that by 'checking in' you could also meet up with new contacts - useful on the work front as well as socially. I now feel obliged to check in at my mayoral residence(s), and weirdly I do feel a sense of responsibility to these places!

Obviously something that needs a bit more investigation.....

Sunday, 10 March 2013

Big Data and Fractals all in one post!

Data fragmentation was the topic of the last post - and this week's meandering thoughts have also been on data fragmentation and measures of its complexity (now that is a bit of a mind-bender), and on whether the advent of Cloud Computing (aka mainframes) will help in sorting the fragmentation mess out.

The problem as I see it is that everything starts with a plan for a central 'Big Data' repository (aka Computing Centre) from which all decision-making analysis can be driven. However, in reality - out in the field - individuals need some local, specific analysis to be performed to help them do their job. So they take a data extract from the 'Big Data' and do what they need to do. The problem is that these extracts, over time, can take on a life of their own, along with the growth of all sorts of other associated ecosystems. This cycle of events can continue down to individual spreadsheet level!

Aside: I have to come clean and confess that I have made extensive use of Excel (filter functions) this week - given my panning of Excel programming this does feel a little hypocritical - however, they have proven very useful - just illustrating the ease with which you can get drawn into this! It's not been real coding though, so I think I am still OK ;)

So, where is all this going? The question is: is it possible to measure the complexity of this fragmentation using some measure of the fractal dimension of the data sets - that's a thought from the MOOC course I'm taking! Can this be used to estimate the amount of effort required to consolidate the fragmented data? In fact, how do you calculate the dimension of a dataset (one standard recipe is sketched below)? Will Cloud Computing help solve some of these problems going forward? The root cause of the fragmentation is people wanting something that corporate locked-down systems do not provide - will the new Cloud systems give people the freedom to build (under proper supervision) what they need locally, or will it end up in this non-virtuous cycle again? What is the probability of the fragmentation occurring again?
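On 'how do you calculate the dimension of a dataset': the standard recipe from the fractals world is the box-counting estimate - cover the points with boxes of shrinking size and fit the slope of log N(eps) against log(1/eps). A rough sketch for a point set in the unit square (whether this is a sensible complexity measure for fragmented business data is exactly the open question):

    import numpy as np

    def box_counting_dimension(points, epsilons):
        """Estimate the box-counting dimension of a point set.

        points:   (n, d) array of coordinates scaled into the unit cube.
        epsilons: decreasing box sizes, e.g. 1/4, 1/8, ...
        """
        counts = []
        for eps in epsilons:
            # Assign each point to a box and count the occupied boxes.
            occupied = np.unique(np.floor(points / eps), axis=0)
            counts.append(len(occupied))
        # Dimension ~ slope of log N(eps) versus log(1/eps).
        slope, _ = np.polyfit(np.log(1.0 / np.asarray(epsilons)), np.log(counts), 1)
        return slope

    # Sanity check: points along a diagonal line should come out near 1.
    t = np.random.rand(5000)
    print(box_counting_dimension(np.column_stack([t, t]),
                                 [2.0**-k for k in range(2, 8)]))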

Need to watch the next lecture on the course - maybe there is no connection!!

Obviously more questions than answers here - the revival continues ......


Sunday, 3 March 2013

Big Data fragmentation function ....

Got involved in 'Big Data' type activities this week, then some 'physics from the past' emerged out of random thought processes!

Big Data Project (BDP - names withheld to protect the innocent) started with "this is the system diagram", as someone draws a big system diagram with loads of connections. Holy smoke, how am I going to get my head around this one, was the overriding thought! We are talking about a massive company with massive data requirements - a definition of 'Big Data'. Data has been replicated, re-used and added to across geographic and functional boundaries, not to mention individual personal modifications down at Excel (yuk, yuk, yuk) level.

BDP's goal is to try and specify the core functionality of all of this. Well, we have started to plug away at unpicking it using process maps, system diagrams and data flows, so the fog is starting to clear.

The question in my mind, though, was how did it all get into this position in the first place? All of the above was done for the right reasons - to get the day job done. Each core data element seems to have spawned a few siblings, which in turn have spawned more. It would be useful to know if there is some measure of 'robustness' for each and every data repository, and what their history has been.

Data stores exploding into many fragments, which then exploded even further like a palm firework, were the images in my mind. That bizarrely made a connection to my particle physics past! This seemed a bit like the tracks we used to trawl through from the JADE central detector - on night shifts - burned forever into my memory bank!

Which then led me on to thinking about fragmentation functions - essentially how you characterise the cascade of particles from the central annihilation - electron and positron in the JADE case.

In summary (ish):

"Fragmentation functions represent the probability for a parton to fragment into a particular hadron carrying a certain fraction of the parton's energy. Fragmentation functions incorporate the long distance, non-perturbative physics of the hadronization process in which the observed hadrons are formed from final state partons of the hard scattering process and, like structure functions, cannot be calculated in perturbative QCD, but can be evolved from a starting distribution at a defined energy scale. If the fragmentation functions are combined with the cross sections for the inclusive production of each parton type in the given physical process, predictions can be made for the scaled momentum, xp, spectra of final state hadrons. Small xp fragmentation is significantly affected by the coherence (destructive interference) of soft gluons, whilst scaling violation of the fragmentation function at large xp allows a measurement of $\alpha_s$."
(ref; http://ppewww.ph.gla.ac.uk/preprints/97/08/gla_hera/node5.html)
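Decoded a little: x_p is just the hadron's momentum as a fraction of the maximum available, and the measured spectrum factorises - schematically; this is the textbook form, not anything specific to JADE - into calculable parton-level production convolved with the non-perturbative fragmentation function:

    % scaled momentum: hadron momentum as a fraction of the beam energy
    x_p = \frac{2 p_h}{\sqrt{s}}

    % perturbative coefficients C_i convolved with fragmentation functions D_i^h
    \frac{1}{\sigma_{\mathrm{tot}}} \frac{d\sigma^h}{dx_p}
      = \sum_i \int_{x_p}^{1} \frac{dz}{z}\,
        C_i\big(z, \alpha_s(Q^2)\big)\, D_i^h\big(x_p/z,\, Q^2\big)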



So now you know!

I'm sure the data we have now started off as 'Big Data' in some form prior to fragmentation. So, is there an analogy between particle fragmentation and data fragmentation, and thus a means of potentially predicting the fragmentation of new Big Data repositories within an organisation?
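Purely as a toy, the analogy can at least be simulated: treat each data store as something that spawns local extracts with some probability per review cycle, and watch how fast the fragment count explodes. A Galton-Watson style sketch, with every parameter invented rather than measured:

    import random

    def simulate_fragmentation(cycles, spawn_prob=0.3, max_children=3, seed=42):
        """Toy branching model of data-store fragmentation.

        Each cycle, every live store attempts up to max_children extracts,
        each succeeding with probability spawn_prob (all numbers invented).
        Returns the total number of data stores after the given cycles.
        """
        random.seed(seed)
        stores = 1  # start from one central 'Big Data' repository
        for _ in range(cycles):
            spawned = sum(
                1
                for _ in range(stores)
                for _ in range(max_children)
                if random.random() < spawn_prob
            )
            stores += spawned
        return stores

    # Expected growth factor per cycle is 1 + max_children * spawn_prob = 1.9,
    # so the fragment count grows geometrically - the palm firework effect.
    print([simulate_fragmentation(n) for n in range(1, 6)])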

Oh well it was nice thinking about it anyway.....