The Four Hundred
Volume 18, Number 19 -- May 18, 2009

Jeff Jonas Explores the Nature of Data in COMMON Keynote

Corrected and Updated: May 21 and June 24, 2009

by Alex Woodie

Jeff Jonas knows a lot about data. Not only does the chief scientist of IBM's Entity Analytics group have a lot of actual data in his head, but he knows how to manipulate it and to get answers to security-related questions for governmental agencies and Las Vegas casinos. But without a breakthrough in how we store and query data, we'll soon be overwhelmed with more data than we can handle, leading to a decrease in understanding, Jonas told COMMON attendees during a keynote address at last month's convention.

At first glance, it appeared an interesting decision by COMMON managers to invite Jonas--an expert in creating IT security systems, with a focus on identity detection systems--to talk to a group of business-minded AS/400-types. Sure, the resident of Las Vegas, Nevada, probably worked with AS/400s while helping some of the biggest casinos on the Strip get a better grasp on identifying employees, vendors, and guests who might be participating in scams (it's a well-known fact that the biggest casinos run AS/400 iron). But as a security expert, Jonas' expertise is only tangentially related to AS/400 technology. Or is it?

In fact, Jonas' experience working with casinos and governmental agencies led to some interesting observations that cut across technological boundaries. During his keynote, Jonas--who blogs at www.jeffjonas.typepad.com--wowed the audience with enlightening stories about the nature of data, secrets of data mining, and the types of technological breakthroughs that are necessary if the IT industry wants to continue to claim that it's widening the breadth of what's knowable by users, companies, governments--by human beings--not shrinking it. From a business-oriented, AS/400 point of view, this has applicability to business intelligence and security.

First, Jonas established his bona fides with his audience. He founded Systems Research and Development in 1983. There, he developed a technology called Non Obvious Relationship Awareness (NORA) that can be used to spot similar identities across two or more databases. It was (and apparently still is) used in Vegas casinos. In 2001, SRD received funding from In-Q-Tel, which Jonas described as the private venture capital arm of the CIA, to help find criminals. After 9/11, Jonas was called to work with government agencies to help find terrorists. In 2005, SRD was bought by IBM and turned into IBM's Entity Analytics Solutions group. He's a true IT security geek and an accomplished triathlete, with a somewhat imposing demeanor and a rapid-fire way of talking.

IBM distinguished engineer Jeff Jonas, from his YouTube video on distributed security.

During his keynote, Jonas related a story that provided a good entry into the types of questions about data that he wrestles with. On a trip to Washington D.C., Jonas spoke with a counter-terrorism intelligence analyst at a governmental agency. "What do you wish you could have if you could have anything?" Jonas asked her. Answers to my questions faster, she said. "It sounds reasonable," Jonas told the audience, "but then I realized it was insane." Insane, because "What if the question was not a smart question today, but it's a smart question on Thursday?" Jonas says.

The point is, we cannot assume that the data needed to answer the query existed and had been recorded before the query was asked. In other words, it's a timing problem. "I said, 'What are the chances you could have every smart question, every day?'" Jonas asked. It's not a trivial question, and it doesn't have an easy answer. But asking every smart question every day is Jonas' goal, however technically difficult (he says it is attainable).

According to Jonas, organizations need to be asking questions constantly if they want to get smarter. If you don't query your data and test your previous assumptions with each new piece of data that you get, then you're not getting smarter.

Jonas related an example of a financial scam at a bank. An outside perpetrator is arrested, but investigators suspect he may have been working with somebody inside the bank. Six months later, one of the employees changes their home address in the payroll system to the same address as in the case. How would they know that occurred, Jonas asked. "They wouldn't know. There's not a company out there that would have known, unless they're playing the game of data finds data and the relevance finds the user."

This led Jonas to expound his first principle. "If you do not treat new data in your enterprise as part of a question, you will never know the patterns, unless someone asks."
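The bank example can be pictured in a few lines of Python. What follows is only a rough sketch of the "data finds data" idea, not Jonas' actual technology: the `ContextIndex` class, the record IDs, and the addresses are all hypothetical, and real identity matching is far more involved than comparing lowercased strings. The point it illustrates is the ordering: every incoming record is first used as a question against what is already stored, and only then saved.

```python
from collections import defaultdict


class ContextIndex:
    """Toy index that treats every incoming record as a query first."""

    def __init__(self):
        # Maps (field, normalized value) -> ids of records already seen.
        self._by_value = defaultdict(set)

    def ingest(self, record_id, record):
        """Store a record and return ids of earlier records sharing a value."""
        keys = [(field, str(value).strip().lower())
                for field, value in record.items()]
        matches = set()
        # Ask the question before storing: who already has these values?
        for key in keys:
            matches |= self._by_value[key]
        # Then remember the new record so that future data can find it.
        for key in keys:
            self._by_value[key].add(record_id)
        return matches


index = ContextIndex()
index.ingest("case-17", {"address": "12 Elm St"})    # address from the fraud case
index.ingest("emp-204", {"address": "9 Oak Ave"})    # employee's original address
# Six months later the employee changes the address in payroll.
alerts = index.ingest("emp-204-update", {"address": "12 elm st "})
print(alerts)  # {'case-17'}
```

Nobody had to think of asking whether an employee now lives at the address in the case; the address change itself raised the match.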

Constantly asking questions and evaluating new pieces of data can help an organization overcome what Jonas calls enterprise amnesia. "The smartest your organization can be is the net sum of its perceptions," Jonas told COMMON attendees.

Getting smarter by asking questions with every new piece of data is the same as putting a picture puzzle together, Jonas said. This is something that Jonas calls persistent context. "You find one piece that's simply blades of grass, but this is the piece that connects the windmill scene to the alligator scene," he says. "Without this one piece that you asked about, you'd have no way of knowing these two scenes are connected."

Sometimes, new pieces reverse earlier assertions. "The moment you process a new transaction (a new puzzle piece) it has the chance of changing the shape of the puzzle, and right before you go to the next piece, you ask yourself, 'Did I learn something that matters?'" he asks. "The smartest your organization is going to be is considering the importance right when the data is being stitched together."

Another project (not related to the government, but a commercial effort) had Jonas assisting an organization in compiling a database that correlated the identities of Americans with pieces of data from public records (such as property records, DMV records, phone books, etc.). He knew there were about 300 million people in the U.S. But as Jonas started loading the data into his warehouse, the machine soon counted more than 300 million Americans. "We keep loading it, and pretty soon it says there are 600 million people in America--and if the number kept climbing to three billion, it surely would be a piece of junk. But my theory was it would collapse," he said.

He was right. Consider what happens when two records that share the same name are treated as describing two different people. "What happens is a third record shows up in the future that works like glue, which causes them to collapse," he said. Eventually, "the more data we loaded, the fewer number of people there were."
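The "glue" effect is easy to reproduce in miniature. The sketch below is an assumption-laden toy, not a description of how NORA or any IBM product works: it uses a simple union-find structure, the `EntityResolver` class and the sample phone number and address are invented for illustration, and it merges records whenever they share any identifying value. Names are deliberately left out of the identifiers, since two records with the same name may or may not be the same person.

```python
class EntityResolver:
    """Toy resolver: records that share an identifying value are merged."""

    def __init__(self):
        self._parent = {}  # record id -> parent record id (union-find forest)
        self._owner = {}   # (field, value) -> first record id that carried it

    def _find(self, rid):
        # Walk up to the root, shortening the path along the way.
        while self._parent[rid] != rid:
            self._parent[rid] = self._parent[self._parent[rid]]
            rid = self._parent[rid]
        return rid

    def add(self, rid, identifiers):
        self._parent[rid] = rid
        for key in identifiers.items():
            if key in self._owner:
                # Shared identifier: fold the two entities into one.
                self._parent[self._find(rid)] = self._find(self._owner[key])
            else:
                self._owner[key] = rid

    def entity_count(self):
        return len({self._find(rid) for rid in self._parent})


resolver = EntityResolver()
resolver.add("r1", {"phone": "555-0100"})     # a "Pat Smith" with a phone number
resolver.add("r2", {"address": "12 Elm St"})  # a "Pat Smith" with an address
before = resolver.entity_count()
# The third record carries both values and glues the first two together.
resolver.add("r3", {"phone": "555-0100", "address": "12 Elm St"})
after = resolver.entity_count()
print(before, after)  # 2 1
```

Three records went in, yet the number of distinct people dropped from two to one, which is the collapse Jonas predicted at much larger scale.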

But large numbers can also work against you. At another federal agency (he wouldn't say which), Jonas got to thinking: What if they had a very large data warehouse in the basement with 4 exabytes (EB) of data, and it was expanding at the rate of 5 TB per minute? "You sit there and you realize you don't get to Friday night and run a batch job to answer the question of what does it all mean," he says. "You could use all the computing power and energy on Earth and you wouldn't be able to do it." The "it" he is referring to, of course, is seeing how each new piece of data affects all the other pieces of data.

"What's happening is data volumes are growing at this pace, yet an organization's ability to make sense of them isn't keeping up," Jonas said. "Today, say you can make sense of 7 percent of what's available, and in a few years it might be 4 percent, and in a few years after that it might be one percent. So the percentage of what's knowable is on the decline."

So while the sum of our knowledge is increasing, the ratio of what's knowable to the data that's available is getting smaller. Without some new technology to help "stitch things together," as Jonas puts it, we'll soon be wallowing in gobs of structured and unstructured data, with no discernable path out.

"I think the only way forward is going from applying algorithms to individual transactions, to first placing information in context--pixels to pictures--and only applying algorithms after one sees how the transaction relates to the other data," he said. "It's the only way that I can see that it's going to close this sense-making gap."

There is one thing software vendors can do to make their sense-making products more useful for the coming information explosion, Jonas said: Unify the data and the tools people use to query it.

Jonas sees this type of technology--loading queries into a database as data--helping to overcome the counter-terrorism intelligence analyst's dilemma of knowing when a question can be answered. "This is a nice and easy method that enables a future piece of data to find the question," he said in a follow-up e-mail after this story was first published. "In other words, if the question asked by the user has no answer today…if a piece of data that can answer the question arrives tomorrow, the system can alert the user that their question is now true."
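One way to picture queries stored as data is a store that keeps unanswered questions next to the facts and checks them as each new fact arrives. The sketch below is a minimal illustration under that assumption; the `StandingQueries` class, the `ask` and `load` method names, and the sample passport values are hypothetical and are not drawn from any IBM product.

```python
class StandingQueries:
    """Toy store where unanswered questions are kept alongside the data."""

    def __init__(self):
        self._facts = set()   # (field, value) pairs loaded so far
        self._waiting = {}    # (field, value) -> users waiting on that value

    def ask(self, user, field, value):
        """Answer now if possible; otherwise persist the question as data."""
        key = (field, value)
        if key in self._facts:
            return True
        self._waiting.setdefault(key, []).append(user)
        return False

    def load(self, field, value):
        """Store a new fact and return the users whose question it answers."""
        key = (field, value)
        self._facts.add(key)
        return self._waiting.pop(key, [])


store = StandingQueries()
answered_today = store.ask("analyst", "passport", "X1234567")
unrelated = store.load("passport", "Z7654321")  # new data, but not a match
notified = store.load("passport", "X1234567")   # the data finds the question
print(answered_today, unrelated, notified)  # False [] ['analyst']
```

The analyst's question had no answer on the day it was asked, but it did not have to be asked again: the arriving record triggered the alert.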

IBM has shown a lot of interest lately in developing so-called "smart" sensor technology that sounds a lot like what Jonas is proposing. But is such a self-aware system even technologically attainable? "I see [the technological challenges] as trivial," he says in his e-mail follow-up. "This works well and is quite attainable."



This article has been corrected. In paragraph 13, the records on Americans that Jonas once analyzed for a private company came from public sources, such as property records and the phone book, not credit cards or employment rolls, as the story originally stated. Also, in paragraph 15, Jonas never worked for the government agency mentioned in the story, and had no direct knowledge of A) whether there was in fact a data warehouse in the basement, and B) how big that data warehouse might be if there, indeed, was one. Jonas also clarified several other statements attributed to him in the story.







Editor: Timothy Prickett Morgan
Contributing Editors: Dan Burger, Joe Hertvik, Brian Kelly, Shannon O'Donnell,
Mary Lou Roberts, Victor Rozek, Kevin Vandever, Hesh Wiener, Alex Woodie
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed


Copyright © 1996-2009 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
