How Big Data Affects the Data Center – Harnessing Big Data Challenges for the Data Center
Big data, as the name suggests, brings a massive load of data for the data center to manage. Managing big data is not limited to managing databases; it also includes capturing, organizing, maintaining, storing, searching, sharing, analyzing and presenting data when required. To meet these numerous requirements, data centers need to be adequately equipped.
To understand how to equip data centers for big data, let us first look at how big data impacts them. The first impact is a change in transactions. Most data centers today focus on online transaction processing, i.e., receiving a request and sending a response: file servers keeping data and transmitting it on demand, DNS servers answering queries as they arrive, web servers serving content when requested, and so on. Batch processes, in contrast, take a back seat; they are either executed after office hours or run at low priority. With big data, this changes: batch processes become as important as online transaction processing and have to run in real time. For example, when a prospective customer of an online retailer selects an item and adds it to the shopping cart, the system should automatically profile the customer, match that profile against similar customer profiles and make recommendations based on the analytic results, all in real time. To support transactions and processes happening in real time, a data center must be equipped with enough processing power, storage I/O and network bandwidth. However, "enough" capacity will only handle the current workload; to handle big data, data center capacity has to be more than just enough.
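The real-time profiling step described above can be sketched in a few lines. This is a minimal illustration, not a production recommender: the profile shape (per-category purchase counts) and the customer IDs are hypothetical, and a real system would run this against a much larger, continuously updated store.

```python
from collections import Counter

# Hypothetical customer profiles: counts of purchases per item category.
profiles = {
    "c1": Counter({"books": 5, "electronics": 2}),
    "c2": Counter({"books": 4, "garden": 1}),
    "c3": Counter({"toys": 6}),
}

def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    if not dot:
        return 0.0
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b))

def most_similar(customer, profiles):
    """Return the other customer whose purchase profile is closest."""
    target = profiles[customer]
    others = [(c, similarity(target, p))
              for c, p in profiles.items() if c != customer]
    return max(others, key=lambda t: t[1])[0]
```

Matching a shopper against `most_similar` profiles and recommending what the matched customer bought is the kind of batch-style analytic that big data pushes into the real-time request path.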
The second impact of big data is on how databases are stored in data centers. Data must not only be captured but also stored and curated. Since big data analytics is only as good as the data it runs on, the quality of the stored data must be high. Quality here refers to data that is free from capture errors and is complete, clean, accurate and not duplicated. Clean, good-quality data is an important constituent that has a direct effect on the success of a big data analytics project.
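The completeness and deduplication checks mentioned above can be sketched as a simple ingestion filter. This is a minimal sketch under assumed record shapes (dicts with `id` and `email` fields); a real pipeline would also normalize values and validate formats before anything reaches the store.

```python
def clean_records(records, required=("id", "email")):
    """Drop incomplete or duplicate records before they enter the store.

    A record is incomplete if any required field is missing or empty;
    a record is a duplicate if its id has already been seen.
    """
    seen = set()
    cleaned = []
    for rec in records:
        if any(not rec.get(field) for field in required):
            continue  # incomplete: a required field is missing or empty
        if rec["id"] in seen:
            continue  # duplicate id already accepted
        seen.add(rec["id"])
        cleaned.append(rec)
    return cleaned
```

Filtering at capture time is cheaper than cleaning after the fact, which is why data quality is treated as a storage concern and not only an analytics concern.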
The third impact is on data retention. Depending on which industry segment your organization falls in, you will be subject to different data retention regulations. To comply with these regulations, the IT department needs to ensure that all data is properly stored and archived, can be accessed easily when required, is backed up and can be presented to the regulatory authority upon request. With big data, this poses a challenge for the systems that archive and subsequently retrieve data. Data centers, therefore, have to be adequately equipped with the resources necessary for data storage.

The fourth impact of big data is on the organization's business processes and requirements. Enterprises are increasingly evaluating big data technologies to enable performance-enhancing strategies across their organizations. Four key focus areas include: faster queries, reports, dashboards and key performance indicators (KPIs) delivered from the database; a single view of everything, consolidated into a high-performance database; a single version of the truth, governed through tight controls on the trustworthiness of information in the database; and self-service information exploration (interactive visualization, what-if analysis, simulation and modeling of information in the database).
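The retention requirement discussed above, keeping data live for a regulated window and then moving it to an archive tier, can be sketched as a simple partitioning step. This is an illustrative sketch only: the record shape (a `created` timestamp per record) and the 365-day window are assumptions, since actual retention periods are dictated by the applicable regulation.

```python
from datetime import datetime, timedelta

def partition_by_retention(records, retention_days=365, now=None):
    """Split records into (keep, archive) based on a retention window.

    Records newer than the cutoff stay in live storage; older ones
    are routed to the archival tier, where they must remain retrievable
    for regulators.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=retention_days)
    keep, archive = [], []
    for rec in records:
        (keep if rec["created"] >= cutoff else archive).append(rec)
    return keep, archive
```

Note that archived data is not deleted data: the same code path that moves records out of live storage must guarantee they can still be located and presented on request.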
Big Data, Big Data Center Challenges
Technologies like Hadoop, MongoDB and other big data solutions were created to help businesses and organizations store, analyze and process the vast expanses of unstructured data created by today's SMAC-centric (social, mobile, analytics and cloud) activities and processes. The challenges are huge for vendors providing data center services: big data is invariably cloud-driven, and the underlying infrastructure must be resilient and dynamic enough to support the horizontal scaling that these solutions require. Data centers must be ready to provide and support infrastructure for increasingly larger compute and storage requirements. This involves not only the network but also the power — namely, reliable and renewable power that leaves a smaller carbon footprint while accommodating shifting demands safely and efficiently.
Today's data centers are application-centric, powering the many business SaaS applications, standalone websites and e-commerce offerings on the web. Tomorrow's data centers need to be data-centric: storage and infrastructure capacity must be expanded to support IoT- and big-data-generated information. This also affects future bandwidth in data centers, as resources will be consumed mostly by IoT sensors and machines rather than by user activity and behavior.