Big Data
The Internet made mass data accessibility possible. Back when most computers stored only megabytes of data on an internal hard drive, gigabyte drives existed but were priced for the enterprise; I remember seeing a 1GB hard drive sell for over $1,000. Now we throw 32GB USB drives around like paper clips, and we are moving past 8TB, 7200 RPM drives toward systems storing multiple PBs of data. With all this data, it is easy to be overwhelmed. This is known as information overload: the sheer volume of available information makes the relevant information unusable, and we can no longer tell usable data from unusable data.
In recent years, multi-core processors combined with multi-socket servers have made HPC (High Performance Computing) possible. HPC, or grid computing, links these highly dense compute servers (local or geographically dispersed) into a giant super-computing system. With this type of system, algorithms that would traditionally take days or weeks to compute finish in minutes. These gigantic systems laid the foundation for companies to build smaller-scale HPC systems in-house for R&D (Research and Development).
This concept of collecting data in a giant repository and sifting through it for patterns was first called data-mining. Data-mining is the same concept used by the Googles and the Yahoos of the world, who pioneered it as a way to navigate the ever-growing world of information available on the Internet. Google released an ingenious, light-weight piece of software called “Google Desktop”, a mini version of data-mining for the home computer. I personally think it was one of the best tools I have ever used on my computer. It was later discontinued for reasons I am not aware of.
The advancements in processing power made data-mining possible, but for many companies it was an expensive proposition: data-mining was limited by how fast data could be read from storage. This is where the story changes. Today, with SSD pricing and density shifting thanks to better error correction, fault predictability, and manufacturing advancements, storage has finally caught up.
The ability of servers to quickly pull data from SSD storage to feed HPC systems opens up opportunities that were not possible before. This is called “Big Data”. Companies can now take advantage of mined data: they can look for trends, correlate, and analyze data quickly to make strategic business decisions and seize market opportunities. For example, a telecommunications company can mine its data for dialing patterns that are abnormal for its subscribers; the faster fraud is identified, the smaller the financial loss. Another example is a retail company looking to maximize profits by stocking its shelves with “hot” ticket items, which it can achieve by analyzing sold items and trending crowd-sourced data from different information outlets.
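To make the fraud example concrete, here is a minimal sketch of what "abnormal dialing pattern" detection could look like at its simplest. This is an illustration, not how any real carrier does it: the `flag_anomalous_callers` helper and its z-score threshold are assumptions for the example, and production systems would use far richer features than raw call counts.

```python
from statistics import mean, stdev

def flag_anomalous_callers(call_counts, threshold=3.0):
    """Flag subscribers whose call volume deviates from the population
    mean by more than `threshold` standard deviations.
    (Hypothetical helper; real fraud detection uses many more signals.)"""
    counts = list(call_counts.values())
    mu, sigma = mean(counts), stdev(counts)
    if sigma == 0:
        return []  # every subscriber behaves identically; nothing stands out
    return [sub for sub, n in call_counts.items()
            if abs(n - mu) / sigma > threshold]

# Fifty subscribers with ordinary volumes, plus one sudden outlier.
calls = {f"sub{i}": 20 + (i % 5) for i in range(50)}
calls["sub_x"] = 400
print(flag_anomalous_callers(calls))  # ['sub_x']
```

The same shape of analysis, run continuously over a carrier's full call-detail records on an HPC cluster, is what turns mined data into the fast fraud response described above.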
SSD drives are enabling the data-mining/Big Data world for companies that are becoming leaner and more laser-focused on strategic decisions. In turn, these HPC systems pay for themselves through the overall savings and profitability that Big Data delivers. The opportunities are endless now that Big Data has extended into the cloud. Combine that collaboration with Open Source software and the results are astounding: we are producing cures for diseases, securing financial institutions, and turning trends and innovations into inventions. We are living in very exciting times.