Volume 9, Number 38 -- October 20, 2009

Info Builders Prophesizes World Series Winner with Predictive Analytics

Published: October 20, 2009

by Alex Woodie

When it comes to making predictions, there are as many techniques as there are people. Some may involve a twig, a piece of string, and a really strong hunch, while others use giant supercomputers and trillions of bytes of data. One company trying to democratize the data-driven approach is i OS business intelligence software vendor Information Builders, which recently crunched baseball statistics to come up with the most probable winner of the World Series. Hint: the winner's initials start with "LA," not "NY."

Unlike any other sport, baseball is a game deeply rooted in statistics. More than one hundred years' worth of data has been meticulously recorded, making the results of every at-bat of every inning of every game in every season on hand for posterity. This data is available free of charge from several sites on the Web, which made for a fun and easy test bed of data for WebFOCUS RStat, a new predictive analytics component of Info Builders' business intelligence software suite.

To get started with its World Series prediction project, Info Builders downloaded the statistics that it figured would make the most sense for its purpose. The company restricted its search to all the teams that made the playoffs since divisional play began in 1969, or fewer than 200 teams, according to Kevin Quinn, vice president of product marketing at New York City-based Info Builders.

The goal of the exercise was to determine which statistics correlated most closely with the teams that have won the World Series, and then to crunch the data to determine the winner of the World Series. In other words, the software looks at what was the most common statistical denominator among teams that won the World Series in the past, and applies that to the present teams and their statistical footprints.
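As a rough illustration of what "correlated most closely" can mean in practice, the sketch below ranks two statistics by their Pearson correlation with a won-the-Series flag. This is not Info Builders' code or RStat's method; the team rows are made up for the example, and the names pearson and rank_stats are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)

# Made-up playoff teams: (winning pct, team ERA, won the World Series? 1/0).
# These rows are illustrative only, not real season data.
teams = [
    (0.630, 3.40, 1),
    (0.580, 4.10, 0),
    (0.600, 3.60, 1),
    (0.560, 4.30, 0),
    (0.590, 3.90, 0),
    (0.610, 3.50, 1),
]

def rank_stats(rows, names):
    """Rank each named statistic by the strength of its correlation with winning."""
    won = [row[-1] for row in rows]
    scores = {
        name: pearson([row[i] for row in rows], won)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

ranking = rank_stats(teams, ["win_pct", "era"])
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

A positive score means higher values of the statistic go with winning; a negative score, as one would expect for ERA, means lower is better.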

Info Builders pulled all kinds of data from the archives, including things like batting averages, ERA, and runs scored, according to Quinn, who for the record is not a Yankees fan. ("There are 4 million people from New York that hate the Yankees. They're called Mets fans.")

Then came the hard part: Figuring out the best way to interpret the data, which can sometimes resemble art more than science. "You basically play with the data," Quinn says. "You try a couple of different algorithms until you see an algorithm that seems to come up with something that seems logical. That's what we did."

Some of the approaches didn't work. For example, one algorithm said that winning percentage and the number of team walks were the most predictive of a World Series crown. Under that model, RStat predicted that the New York Yankees had the highest probability--19 to 20 percent--of winning it all. However, the tool also found that every other team had a 0 percent probability, which didn't make any sense.
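The kind of sanity check that rules out such a model can be sketched in a few lines. This is only an illustration: the function name looks_degenerate and the 0.001 floor are assumptions, and 0.195 is simply the midpoint of the 19 to 20 percent range reported for the rejected model. The second set of numbers comes from the final model described later in this story.

```python
def looks_degenerate(probabilities, floor=0.001):
    """Return True when at most one team gets a probability above `floor`."""
    above_floor = [team for team, p in probabilities.items() if p > floor]
    return len(above_floor) <= 1

# Rejected model: one team near 19.5 percent, every other team at zero.
rejected = {"Yankees": 0.195, "Dodgers": 0.0, "Angels": 0.0, "Phillies": 0.0}

# Final model: several teams with plausible, non-zero chances.
accepted = {"Dodgers": 0.34, "Angels": 0.32, "Yankees": 0.29}

print(looks_degenerate(rejected))
print(looks_degenerate(accepted))
```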

"That's possible with any software. You throw so much stuff at it that it doesn't mean anything," Quinn says. "That's why there's a little bit of work that goes into predictive analytics. You need to have an understanding of the data, what generally is considered to make sense. You need to narrow things down from your own logic standpoint, then you start to come up with models that come up with predictions that seem to make sense."

In the end, Info Builders settled on a group of statistics believed to be the most indicative of a World Series winner. They included winning percentage, runs scored, batting average, total extra base hits, ERA, and fielding percentage.

After running all of the teams' season stats through decision tree and linear logistical regression algorithms, RStat determined that the Los Angeles Dodgers had a 34 percent chance of winning the World Series, compared to 32 percent for the Los Angeles Angels of Anaheim, and 29 percent for the New York Yankees. The next closest teams, including defending World Series champion Philadelphia Phillies--which currently hold a 2-1 lead over the hated Dodgers (editor's prerogative) in the NLCS--had chances lower than 15 percent.
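For readers who have not seen logistic regression, one of the two techniques named above, here is a minimal sketch in plain Python. It is not RStat's implementation: the gradient-descent fitting loop, the function names fit_logistic and predict_probability, and the two standardized features are all assumptions chosen for illustration, and the training rows are made up.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to z
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_probability(weights, bias, x):
    """Estimated probability that a team with features x wins it all."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Made-up training rows: (winning pct z-score, ERA z-score), label 1 = champion.
X = [
    (1.2, -1.2), (0.8, -0.8), (0.6, -0.2), (0.2, -0.6), (-0.3, 0.3),
    (-1.2, 1.2), (-0.8, 0.8), (-0.6, 0.2), (-0.2, 0.6), (0.4, -0.4), (-1.0, 1.0),
]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

weights, bias = fit_logistic(X, y)
p_strong = predict_probability(weights, bias, (1.0, -1.0))  # wins a lot, low ERA
p_weak = predict_probability(weights, bias, (-1.0, 1.0))    # the reverse
print(f"strong team: {p_strong:.2f}, weak team: {p_weak:.2f}")
```

Each team is scored on its own here, so the probabilities need not add up to 100 percent across teams, which is consistent with the figures quoted above.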

Obviously, unless you're in the T-shirt business, who wins the World Series will not be terribly significant to the future of your company. But substitute the search for a World Series winner with a search for the optimal inventory level, or the search for the hottest sales region, and you can see how this software may apply to your business.

"What we're trying to show here is the software can be used for any purpose," Quinn says. "Baseball is a fun thing, but it can be used to predict everything from what students are the most likely to graduate from university to using it to predict the best time to discount prices to maximize profits and sales."

And while the RStat software runs on Windows or Linux, there's nothing to prevent System i shops from using it to analyze their historical data, housed in Info Builders' WebFOCUS software running on the System i.







Editor: Alex Woodie
Contributing Editors: Dan Burger, Joe Hertvik,
Shannon O'Donnell, Timothy Prickett Morgan
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed


Copyright © 1996-2009 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
