The BIG Picture on BIG DATA

Posted on Apr 1st, 2012 and filed under Features.

The rush is on to capture the value found in massive waves of data—and deliver it to government clients in new ways they can use

It can detect a pandemic in the offing, an inappropriate tax payment, a costly electricity drain. It can guide decisions that change entire government agencies and resource use for years to come. And it’s all around us—potentially as easy to harness as the flow of wind or water.

It’s big data, and it’s transforming the way federal agencies and government contracting companies operate.

With the leap to the cloud giving big data the room to reveal its potential, agencies and GovCon companies can collect, store, and use more data than ever before. The problem this raises is simple, even if the solution is not: Civilian and defense agencies have more data than they know what to do with. And hidden in these exponentially increasing words, numbers, and images is information that can help vastly increase efficiency, productivity, and national security.

How to extract value from this tsunami of information—and how to best serve the agencies facing this problem—is a challenge many GovCon leaders are confronting head on. Big data is sparking new technology developments, strategy, and resource positioning, and it’s directing the dynamic GovCon M&A activity toward new goals.

But just how big—and valuable—is big data?

Beyond the Buzzword

The numbers are impressive. According to IBM, we create 2.5 quintillion bytes of data every day, so much that 90 percent of the data in the world today has been created in the last two years alone. A McKinsey Global Institute paper notes that $600 buys a disk drive that can store all the world’s music, and that 30 billion pieces of content are shared on Facebook every month (a figure that is probably outdated by the time you read this).

But one of the best big data definitions comes from a pioneer: Gartner research vice president Doug Laney, who recently told commenters in his blog that big is “entirely relative. … Therefore, the tongue-in-cheek definition I use is, ‘Big data is data that’s an order of magnitude bigger than you’re accustomed to, Grasshopper.’”

In 2001, working as an analyst at META Group before it became part of Gartner, Laney threw down in a research note the now-ubiquitous “3Vs” of big data: volume, velocity, and variety.

  • Volume. Big data starts measuring in terabytes (1,000 gigabytes) and even petabytes (1,000 terabytes) of information. Cisco, in its 2015 predictions, sees amounts beyond this as commonplace: 1,000 petabytes equals an exabyte—and 1,000 exabytes equals a zettabyte. In a report, the company defined a zettabyte like this: the equivalent of 250 billion DVDs, 36 million years of HD video, or the volume of the Great Wall of China, if an 11-ounce cup of coffee represents a gigabyte of data.
  • Velocity. Big data is also fast data. “Often time-sensitive big data must be used as it is streaming into the enterprise in order to maximize its value to the business,” IBM reports in the big data basics section of its website. Cisco’s prediction: “It would take over five years to watch the amount of video that will cross global networks every second in 2015.”
  • Variety. Beyond numbers, tables, and words, big data includes video surveillance from sensors, click streams, medical records, snapshots posted to Facebook, tweets, and more. At some level, it’s all fair game, and it all needs to find some kind of common ground to be useful. Big data is more often of the unstructured variety, which brings some of the biggest challenges. Cisco predicts: “Most of the Internet traffic by 2015 will be video, specifically, long-form video of more than seven minutes.”
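The unit ladder in the Volume bullet can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the decimal units given in the text (each step is a factor of 1,000), and the names `UNITS` and `bytes_in` are made up for this example.

```python
# Decimal storage units as described in the text: each step is a factor of 1,000.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte"]
BYTES_PER_GB = 10**9


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the named decimal units."""
    steps = UNITS.index(unit)  # how many factors of 1,000 above a gigabyte
    return BYTES_PER_GB * 1000**steps


zettabyte = bytes_in("zettabyte")
gigabytes_per_zettabyte = zettabyte // bytes_in("gigabyte")

# Implied capacity per disc if one zettabyte fills 250 billion DVDs.
gb_per_dvd = gigabytes_per_zettabyte / 250e9

print(f"1 zettabyte = {zettabyte:.0e} bytes")
print(f"1 zettabyte = {gigabytes_per_zettabyte:,} gigabytes")
print(f"Implied DVD capacity: {gb_per_dvd:.1f} GB")
```

Under these assumptions, the 250-billion-DVD comparison implies about 4 GB per disc, which is in the right range for a single-layer DVD (4.7 GB).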

What It Means to GovCon

As a consumer, you can easily see big data in play when an ad tailored to your tastes pops up on your Facebook page. But how does this transformative technology operate in the GovCon world?

“The rise of data analytics technology has allowed government agencies to analyze the outcomes of their programs, tighten the efficiency of their operations, and identify areas to cut costs,” pointed out a report from the Government Business Council, underwritten by Deloitte, titled “Demanding More: How Federal Agencies Use Data Analysis to Drive Mission.”

“The wealth of information that the public sector manages—from the data it collects from government programs to the data it is able to mine—affords opportunities for insight through analysis,” the report stated.  At the same time, budget cuts caused the Office of Management and Budget to specifically ask that agencies “acquire, analyze, evaluate, and use data to improve policy and operational decisions.”

Yet agencies surveyed for the GBC report expressed uncertainty at meeting the OMB request. “Information is currency in both business and government; but the research found that many federal agencies don’t have the data they need readily available, even with difficult financial decisions looming,” wrote Erin Dian Dumbacher, associate director of research at GBC.

And that’s something many in GovCon see as an opportunity. “Big data represents two important trends impacting federal agencies: the growth of large, unstructured data sets and real-time streaming of information from multiple sources coupled with the need to apply sophisticated analytics to massive amounts of information in order to draw insights,” said Kay Kapoor, managing director, Accenture Federal Services. “This need is growing rapidly, as agencies experience exponential growth in data volumes and complexity.”

“Imagine being able to conduct a free text search over billions of dollars’ worth of transactions in financial systems,” said Ray Muslimani, president and CEO of GCE, which does big data work for the Department of Labor and other agencies. “Or being able to identify spending trends across your agency and compare them to other agencies in real time. Or getting instant notifications from the service when patterns are identified behind the scenes. That’s the type of functionality and intelligence that big data can bring to the enterprise.”

GCE’s agency financial management solutions earned the company a place as an honoree in last year’s Government Big Data Solutions Awards, given by consultants CTO Labs; the winner was the GSA for its USASearch.

And because of its sheer size, big data touches on other hot topics: mobility, sensors, social networking, predictive analytics, and most of all, the cloud. “In practice, there is a strong tie between cloud analytics and big data,” said Mark Herman, executive vice president, Booz Allen Hamilton, who leads the company’s Cloud Analytics Center of Excellence. “We see big data capabilities as part of a broader cloud ecosystem.”

“Right now, it’s still a little bit of a mysterious topic for many of the folks who are dealing with the day-to-days of ‘how do I close the data center?’ and ‘how do I take my utilization of a server from 15 percent to 70 percent?’” said Yogesh Khanna, vice president and chief technology officer, CSC. His company was an early adopter in big data, crunching numbers for the CDC, NASA, and the EPA.

“For the first time, whether it’s through the high-performance parallel computing schemes that have become economical, or just the advent of cloud computing as a new delivery model, people can do extensive parallel processing of gobs and gobs of data … that allows us to detect patterns, using the right tools.”

Different Stages of Adoption

Federal agencies vary in their big data capabilities. Some are working on collecting data with different types of sensors, while others are using advanced analytics to try to predict behavior.

This doesn’t necessarily reflect big data maturity levels, GovCon experts say. Agencies have different needs: Those seeking to integrate health records need different kinds of capacities and security levels than those sharing climate science data with the public as transparently and completely as possible.

For instance, Data.gov, operated under GSA, started in 2009 with 47 data sets. The site now publishes at least twice that many sets in a single month, giving citizens open access to data ranging from the latest food recalls to progress on federal data center consolidation. The site is now moving to a single, unified cloud as part of a $21-million, five-year contract with CGI.

Other agencies are deep in analytics: executing sophisticated mining for improper payments, for instance.

Whether an agency is most concerned with protecting or disseminating information, it must look at three categories as it works with big data: collection, storage and processing, and analytics. Adding to the big data challenge, these categories often happen at the same time—data is processed and stored at once and analyzed as soon as it is collected. This simultaneous action makes alignment and strategic thinking critical. What you do with the data at any point will affect all other points. Set yourself up incorrectly, and you risk re-creating the same problems you came in with—silos, hidden data, excess storage, lost value, to name a few.

“As a new sensor or new data set gets created, we look at how agencies would integrate that information into their enterprise architecture, how they will store that data, and how they will make it readily available to their customers,” said Matthew Fahle, senior account executive, Accenture. “You have to have a balanced view across the framework.”

Taking a closer look at the categories:

Collection: From a teen’s cell phone GPS signal to sensors on a UAV in a conflict zone, data is collected constantly—and much of it by the government. Two issues most influence GovCon big data activity: historical data and unstructured data.

Using historical data, which has already been collected, and integrating this with new data is like trying to analyze a snowball rolling downhill—it picks up volume with every second. Unstructured data—video, images, everything that’s not easy to compare relationally—is a significant challenge at every stage of the process, and more and more, it represents the bulk of big data.

“To develop patterns out of unstructured data automatically, without human intervention, using analytic tools, is tough,” said Khanna. “The IT industry as a whole has been focused on structured data, but it’s the unstructured kind that is powering much of the revolution.”

Further complicating matters, said Khanna, are data disciplines left over from the past. “Data is going to waste and not being fully leveraged because it didn’t fit the data model or the retention criteria.”

Storage/processing: In its major role as a storage provider to the federal government, NetApp has developed all kinds of efficiencies in dealing with structured data. But with unstructured data, “there aren’t enough disk drives in the world to store all the data these sensors are going to suck in, nor does anyone have enough money to buy all those disk drives,” said Mark Weber, senior vice president at NetApp U.S. Public Sector.

This remains true even as storage costs go down and capacity goes up: Recently, IBM reported it encoded a single data bit onto a surface consisting of 12 atoms (typical storage would need a million atoms to store the same amount of information). And NetApp is now able to provide upwards of 55 petabytes of storage for the Department of Energy’s Sequoia supercomputer at Lawrence Livermore National Laboratory.

Regardless, nobody needs to spend 30 years storing three days of video surveillance in which nothing happens, so the trick is to find ways to analyze and store only what matters.
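As a rough illustration of storing only what matters, the sketch below filters surveillance footage before it is archived. Everything in it is hypothetical: the `Segment` type, the `activity_score` field (imagined as the output of some upstream detector), and the 0.2 threshold are assumptions made for this example, not a description of any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_hour: int        # hours since the recording began
    activity_score: float  # hypothetical 0.0-1.0 score from an upstream detector


def segments_to_keep(segments, threshold=0.2):
    """Return only the segments whose activity score meets the threshold."""
    return [s for s in segments if s.activity_score >= threshold]


# Three days of hourly footage in which only two hours show any activity.
footage = [Segment(hour, 0.0) for hour in range(72)]
footage[10] = Segment(10, 0.8)
footage[47] = Segment(47, 0.35)

kept = segments_to_keep(footage)
print(f"Kept {len(kept)} of {len(footage)} segments: hours {[s.start_hour for s in kept]}")
```

In practice the hard part is the detector that produces the score; the filter itself is simple once that analysis exists, which is why analysis and storage decisions end up so closely linked.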

The need to stream together historical and real-time data presents other storage challenges. Developing tools for this stage, Weber said, represents a huge opportunity. Most clients today are buying proportionately more software than simply storage capacity from NetApp because they’re seeking protection, retrieval, and the ability to use cost-effective levels of storage. Government agencies also face different regulatory requirements for data storage, and putting these seamlessly into practice with full compliance presents another opportunity.

One added wrinkle to consider in storage and processing: Data is currency, and everyone who collects it seeks ways to trade and share it to squeeze the most value out. No one wants to end up with the data equivalent of a rare baseball card hidden at the bottom of a box in the basement.

Analytics: Analytics is where the value of big data is concentrated. The point of all the collection and storage is to make better decisions based on what you learn from the data. Of course, it’s not the analytics that are new, but the scale.

Big data analytics can offer a clear picture of the current situation, forecast the future, or even change it. Brad Eskind, principal and federal technology and analytics leader at Deloitte, calls them “predictive and prescriptive analytics” because they help determine what will occur and what behavior to adjust. “I would say we’ve turned the corner on the descriptive aspects of big data. We’re now into the capability of providing the types of insights and foresight that can help improve outcomes.”

“The scale provided by the cloud provides a basis to transform the process of analysis from one of stitching together sparse data to derive conclusions to one of extracting conclusions from aggregation and distillation of massive data and data reflections,” said Herman of Booz Allen Hamilton. “In essence, this capability is the means by which to deliver on the promise of big data by creating new value for clients.”

“Effectively, the model of the past is being flipped,” Khanna said. Where once applications that were developed to fulfill a business need generated data, now the data drives the development of applications. “It’s a future of data as a service,” he said.

The potential of big data derives not only from this vastly enhanced pattern recognition and analysis, but from the way it happens in near-real time, thanks to the simultaneity that big data and cloud analytics allow. Khanna sketches a picture from financial services: A trader on the floor can make split-second decisions based on global information from the past few seconds analyzed in context with historical data. With such a snapshot, it’s hard not to see big data as a transformative factor in the financial market—or in any number of other markets (see sidebar, Prime Big Data Markets).
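The idea of reading incoming data against a historical baseline can be sketched in a few lines. This is a deliberately simplified illustration, not how any trading or agency system actually works: the function name `flag_outliers`, the sample prices, and the z-score threshold of 3.0 are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_outliers(history, stream, z_threshold=3.0):
    """Yield (value, z_score) for streamed values far from the historical mean."""
    baseline = mean(history)
    spread = stdev(history)  # assumes the history is not constant (spread > 0)
    for value in stream:
        z = (value - baseline) / spread
        if abs(z) >= z_threshold:
            yield value, z


# Historical context (already stored) and new values arriving one at a time.
historical_prices = [100.0, 101.0, 99.0, 100.5, 99.5, 100.0, 101.5, 98.5]
incoming_prices = [100.2, 99.8, 107.0, 100.1]

alerts = list(flag_outliers(historical_prices, incoming_prices))
for value, z in alerts:
    print(f"Unusual value {value} (z-score {z:.1f})")
```

Because `flag_outliers` is a generator, each incoming value is judged as it arrives rather than after the whole stream has been collected. A real system would also update the baseline over time, which this sketch leaves out.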

Intelligence is another natural area where the integration of historical and current data can guide near-real-time decisions. For instance, SAIC is leading a team at Georgia Tech looking at applying big data/business intelligence techniques to detecting insider threats.

Going End-to-End

Several GovCon companies have expanded their core competencies to address multiple services across the big data framework. SAP’s Sybase company is just one that has developed advanced analytics specifically to handle big data.

GeoEye, recognizing there’s more to big data than collecting imagery, has invested in analytics, hosting, dissemination, and processing technologies while continuing its core capacity to collect, process, and disseminate more than 1 million square kilometers of high-resolution imagery daily. Last year, GeoEye won a $3.8-billion, 10-year contract with the National Geospatial-Intelligence Agency for imagery and other products and services.

“To eliminate the complexity of dealing with big data, we have moved into providing hosting and dissemination services directly to our government customers,” said Brian O’Toole, GeoEye chief technology officer. “In the past, the U.S. government would be solely responsible for receiving our imagery, cataloging the raw data, and making it available in a government-run archival system. Now we provide those services, through a software-as-a-service model.”

Big Challenges Still Ahead

With all the benefits and GovCon potential attached to big data, there are still a few obstacles to realizing the full value. The biggest, as usual with government functions, is security. How do you avoid showing your hand with data being shuffled and cut on every possible playing table?

Compliance issues could get lost in the shuffle, some have pointed out. The promise of using healthcare IT and records in research has already prompted calls from the federal Health Information Technology Policy Committee to clarify what constitutes “research” and to define how widely big health data will be disseminated and used.

“To address big data security, Northrop Grumman has developed unique methods that protect data in the cloud regardless of specific service models,” said Kathy Warden, vice president and general manager, Northrop Grumman Information Systems, Cyber Intelligence Division. “This approach prevents exposure of the data to unauthorized users and protects the data from being accessed by cloud provider insiders, like system administrators. The solution allows users to securely migrate mission-critical data to the cloud.”

Big data by its nature destroys silos. And if agencies want to reap the greatest benefit, they’ll need the capabilities, the policy backup, and the security to exchange data freely.

Khanna sees a lack of standards in managing unstructured data as a key roadblock. The big players such as Google, Amazon, and Facebook aren’t talking to each other about standardization—yet efficient government use of big data depends on standards. The National Institute of Standards and Technology has been working for some time to get standards in unstructured data, and IT lab director Chuck Romine recently told media that it’s going to step up the pace.

Another emerging challenge and major need is the “readability” of data analysis. GeoEye is focused on visual representation, using its EyeQ platform to provide on-demand visualization in near-real time, even in its legacy applications. In addition to working in every other aspect of big data, SAP-GSS provides solutions that render big data insights in intuitive, visual, user-friendly formats.

While big data may appear to present a storage nightmare, its abundance is actually a good problem to have. “Data that used to be ‘retired’ to tape when data warehousing can now be kept alive for future analysis to yield valuable insights that we cannot imagine now,” said Peter Doolan, group vice president, Oracle Public Sector.

Such “zombie data” presents an opportunity—and Sotera in particular has lost no time in exploring it. “We have found that big data opens up a large number of non-traditional analytical techniques that augment traditional data analysis,” said Russell Richardson, CEO, Potomac Fusion, a Sotera Defense Solutions company. “These techniques offer a new purpose for information already collected, and in most cases, this existing data is re-purposed to derive military intelligence not conceived of when the data was originally collected.”

Lastly, the speed at which big data is changing the landscape presents a challenge—but one that might motivate faster changes in realizing big data’s value in government. “In the past, government has been known to set its own standards for performance and security,” said Rich Rosenthal, chief technology officer, TASC. “This is a case where the government could take advantage of the commercial sector’s forward thinking, adopt its approaches, and let the entrepreneurs move full-speed ahead.”


The Open-Source Factor

When you use Twitter, LinkedIn, or dozens of other social media services, you’ve used Hadoop, the open-source technology at the heart of big data.

But Hadoop is actually an “ecosystem” built around two core pieces: the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing. The Apache technology hasn’t yet encountered a data set too large to analyze and can run on a single computer or across many. It is described as “forgiving” or “self-healing” because it compensates for problems found on servers.
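For readers unfamiliar with the MapReduce pattern, here is a minimal single-machine sketch of the idea using the classic word-count example. It is plain Python and does not use Hadoop or its APIs; the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen for this illustration only. In a real cluster the map and reduce steps would run in parallel on many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data drives decisions"]

# Each document can be mapped independently, which is what makes the pattern parallel.
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

The point of the design is that neither the map step nor the reduce step needs to see the whole data set at once, so the work can be spread across as many machines as the data requires.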

Many GovCon companies work with Cloudera, which develops an open-source distribution of Hadoop, integrating and “cleaning up” products to ensure reliability. Oracle’s Big Data Appliance uses Cloudera Hadoop, for instance; Sotera is another using this combination.

However, because it’s open and free, agencies themselves have been building their own big data technologies. NSA, for one, has had a distributed intelligence database with Hadoop for two years.

So why would agencies look to contractors if they can do it themselves? Because while Hadoop is forgiving, it’s not endlessly elastic. Getting the right fit for an agency’s mission can be a challenge. In short, it’s not free if it takes up valuable staff resources to adopt it.

“Contractors have an opportunity to enhance and extend the open-source technologies in a way that focuses their power on the business needs of the government and makes them easier for agency personnel to adopt,” said Peter Doolan, group vice president, Oracle Public Sector. “None of the [solutions] completely solve a business need ‘out of the box,’ so they must be adapted to the specific use and then maintained. When the total cost of ownership is considered, including ancillaries like end-user training, sometimes the commercial solutions, while more expensive up front, may prove to be significantly cheaper over time.”

Microsoft, meanwhile, is concentrating on bringing big data to the masses. “There are two major challenges facing any organization utilizing big data today,” said Susie Adams, chief technology officer, Microsoft, “installing and maintaining the complex array of software and hardware needed to run big data workloads and finding individuals with the skill sets needed to operate this infrastructure.” Microsoft addresses both, she said, by enabling its Office, business intelligence, cloud, and enterprise data processing platforms to connect to Hadoop, which can run as a service on Azure or on-premises on a Windows server.

A pioneer in open source, Red Hat continues to lead, and late last year it acquired Gluster, an open-source storage solutions provider. “There has been a tremendous level of interest from our federal customers and prospects,” said Paul Smith, vice president and general manager, public sector, Red Hat. “The federal government is a big user of open-source technologies.”

Prime Big Data Markets

Notorious bank robber Willie Sutton, when asked why he robbed banks, is said to have replied: “Because that’s where the money is.” Taking the same approach to market focus, Booz Allen Hamilton is “loosely tied to the Willie Sutton principle—in other words, we are going where the data is kept,” said Mark Herman, executive vice president.

In which markets do government contracting companies see the data “banked”? Intelligence, of course, is the most obvious. But here are a few others:

Financial and improper payments: Analytics are already advanced in this market, and big data just makes the case stronger and the fraud detection more precise. Deloitte is traditionally strong in this area. OPM is using SAS analytic software to scan for improper payments in federal employee health insurance claims.

Electronic health records: This area is being mined for value far and wide, from CMS to the VA. The benefits include better quality of care, increased productivity, and elimination of redundancies. SAS software is being used to analyze millions of records in the CMS Chronic Condition Data Warehouse, a case of historical and current data working together for new insights.

Climate and weather science: From solar flares to rogue waves, understanding and predicting weather on earth and in space both depends on big data and benefits from it. Breaking silos could lead to advances such as coordinating emergency services, social media, and weather prediction during a storm, for instance.

Trade and immigration: The flow of goods and people generates enormous amounts of data, all of which could guide decisions in economic policy, law and drug enforcement, public health action, food safety measures, homeland security, and more.

Many more markets stand to benefit from big data where government contracting can apply. Supply chain and logistics, with its constantly growing complexity, is another natural market, one in which several government contractors, including SAIC, have expertise. Utilities are another, and again, SAIC shows up here, providing “smart grid” solutions in an area where usage data is vast and individualized, timing is everything, and significant, ongoing savings in costs and energy can be realized once what’s learned from that data is applied.

Big Data Helps Drive Acquisitions

Many GovCon companies beefed up their big data capacities through acquisitions. Here are a few transactions that resulted in enhanced big data positioning:

IBM expanded big data analytics software offerings by acquiring intelligence analytics firm i2 Group.

Red Hat acquired Gluster, an open-source storage provider.

General Dynamics purchased healthcare IT provider Vangent, with its strengths in electronic health records and data analytics.

CSC bought Maricom Systems, Inc., a provider of healthcare IT business intelligence and data management, as well as iSOFT Group Limited, whose former CEO it appointed chief operating officer of CSC’s healthcare business.

TASC acquired TexelTek, which specializes in geospatial, visualization, and analytics.

ICF acquired Ironworks, a web, mobile, and social media platform provider in health, energy, and financial services sectors.

Sotera bought Potomac Fusion, which develops cloud and ISR solutions.

Deloitte bought Ubermind, which develops intuitive mobile applications.

NetApp bought LSI’s Engenio external storage systems business, whose fast and dense storage performance represents a significant big data opportunity.

Oracle acquired Endeca, with capabilities in unstructured data and business intelligence.

ManTech acquired Evolvent, a healthcare systems integrator with contracts for big data related projects for CMS and the VA.

