Big Data Project (BDP; names withheld to protect the innocent) started with someone drawing a big system diagram with loads of connections and saying 'this is the system diagram'. 'Holy smoke, how am I going to get my head around this one?' was the overriding thought! We are talking about a massive company with massive data requirements - a definition of 'Big Data'. Data has been replicated, re-used and added to across geographic and functional boundaries, not to mention individual personal modifications down at Excel (yuk, yuk, yuk) level.
BDP's goal is to try and specify the core functionality of all of this. Well, we have started to plug away at unpicking it using process maps, system diagrams and data flows, so the fog is starting to clear.
The question in my mind, though, was how did it all get into this position in the first place? All of the above was done for the right reasons - to get the day job done. Each core data element seems to have spawned a few siblings, which in turn have spawned more. It would be useful to know whether there is some measure of 'robustness' for each and every data repository, and what its history has been.
Data stores exploding into many fragments, which then exploded even further like a palm firework, were the images in my mind. That bizarrely made a connection to my particle physics past! This seemed a bit like the tracks we used to trawl through from the JADE central detector - on night shifts - burned forever into my memory bank!
Which then led me on to thinking about fragmentation functions - essentially how you characterise the cascade of particles from the central annihilation - electron and positron in the JADE case.
In summary (ish):
"Fragmentation functions represent the probability for a parton to fragment into a particular hadron carrying a certain fraction of the parton's energy. Fragmentation functions incorporate the long distance, non-perturbative physics of the hadronization process in which the observed hadrons are formed from final state partons of the hard scattering process and, like structure functions, cannot be calculated in perturbative QCD, but can be evolved from a starting distribution at a defined energy scale. If the fragmentation functions are combined with the cross sections for the inclusive production of each parton type in the given physical process, predictions can be made for the scaled momentum, xp, spectra of final state hadrons. Small xp fragmentation is significantly affected by the coherence (destructive interference) of soft gluons, whilst scaling violation of the fragmentation function at large xp allows a measurement of ."
So now you know!
I'm sure the data we have now started off as 'Big Data' in some form prior to fragmentation. So, is there an analogy between particle fragmentation and data fragmentation, and thus a means of potentially predicting the fragmentation of new Big Data repositories within an organisation?
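Just to make the idea concrete, here is a toy sketch of what a 'data fragmentation' model might look like. It is emphatically not a real fragmentation function, only a simple branching process: one original repository, where every store independently spawns a few derived copies in the next generation. The function names, the copy probability and the maximum number of copies are all made-up assumptions for illustration, not anything measured on BDP.

```python
import random


def simulate_fragmentation(generations, copy_probability, max_copies, seed=0):
    """Toy branching process: number of data stores created per generation.

    Each store can spawn up to max_copies derived copies in the next
    generation; each potential copy appears with probability
    copy_probability. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    counts = [1]  # generation 0: the single original repository
    for _ in range(generations):
        children = 0
        for _ in range(counts[-1]):
            # Count how many of this store's potential copies get made.
            children += sum(
                rng.random() < copy_probability for _ in range(max_copies)
            )
        counts.append(children)
    return counts


def expected_counts(generations, copy_probability, max_copies):
    """Mean number of stores per generation for the same toy model."""
    mean_children = copy_probability * max_copies
    return [mean_children ** g for g in range(generations + 1)]


if __name__ == "__main__":
    print(simulate_fragmentation(5, copy_probability=0.5, max_copies=3))
    print(expected_counts(5, copy_probability=0.5, max_copies=3))
```

If the average number of copies per store is above one, the count grows geometrically with each generation, which is roughly the palm firework picture. Whether anything like this could be fitted to the history of real repositories is exactly the open question.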
Oh well, it was nice thinking about it anyway...