We invented Reduced Error Logistic Regression (see Press Release for our US patent 8,032,473). Reduced Error Logistic Regression (RELR) is a completely
automated and very general predictive analytics method that allows Choice, Rating, Forecast, Survival Analysis and Interval Category modeling applications. A significant amount of evidence now indicates that RELR solves the costly "Breiman Quiet Scandal" problem related to ineffective and arbitrary attempts to avoid overfitting, multicollinearity error, prediction error, wrong signs of regression coefficients, and unstable variable selection in advanced analytics (see Executive White Paper). Unlike all other data mining or regression algorithms, RELR is the highly stable and most probable optimization solution in a predictive model that also models error and has no arbitrary parameters. For this reason, a different modeler will generate the very same model and an independent data sample is likely to generate a very similar model given even a minimal sample size.
The unique feature of RELR is that error is accurately modeled and subtracted to give highly accurate, stable, parsimonious, and interpretable predictive models. These models are generated automatically and without any labor-intensive manual effort. An accurate RELR model with very few parameters and correct regression coefficient signs can be built automatically in situations where it could take a skilled statistician several weeks to build a less accurate model of the same data, one that would also require many more parameters and much more complex modeling. The reasons for these RELR labor savings are:
1. RELR produces accurate interaction effects, so separate models do not need to be built for separate segments.
2. RELR produces correctly signed regression coefficients, so time-consuming and risky sign adjustments need not be performed.
3. RELR completely automates pre-processing of data.
4. RELR handles error very effectively and thus completely avoids complex cross-validations and adjustments to deal with error.
5. RELR's highly parsimonious variable selection is completely automated and avoids the back-and-forth changes seen in stepwise regression.
With high dimensional data, a RELR model built with 1000-5000 observations is often more accurate than models built with competing automated or manual methods with 100,000 observations, as RELR's models asymptote in accuracy at very small sample sizes. RELR models maintain this accuracy advantage even at larger sample sizes, but the advantage will always be most apparent at smaller sample sizes, such as below 15,000 training observations, or with high dimensional data that have a large number of candidate variables and/or candidate interaction effects or nonlinear effects. However, completely independent research indicates that RELR can also maintain this accuracy advantage with fairly low dimensional data having fewer than 50 candidate variables and a moderate-sized training sample of 13,000 balanced binary observations; see RELRCaseStudyAugust2011.
RELR has significantly outperformed all standard and widely used algorithms that it has been tested against in controlled tests of predictive accuracy using average squared error, classification accuracy based upon a prior probability threshold, or the KS statistic. While RELR has yet to be compared directly to Ensemble Modeling in controlled tests, RELR has properties that are similar to Ensemble Models built from hundreds of regression sub-models in that RELR's regression coefficients have very little error. Compared to Ensemble Models, RELR allows highly parsimonious and interpretable models to be built almost immediately, whereas ensemble models are not interpretable and may require a long time to build from hundreds of sub-models. For a high level discussion of this RELR error reduction, see the Executive White Paper for a non-technical review across a number of studies, or see a technical JSM Proceedings paper that can be downloaded from Papers and Presentations.
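For readers unfamiliar with the three accuracy measures named above (average squared error, classification accuracy at a prior probability threshold, and the KS statistic), the following is a minimal Python sketch of how such measures are commonly computed for a binary outcome. It is not taken from MyRELR or from any of the cited studies. The function names and the toy data are hypothetical, and the reading of "prior probability threshold" as the observed share of positive outcomes is an assumption.

```python
from typing import Sequence


def average_squared_error(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Mean of squared differences between 0/1 outcomes and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def accuracy_at_prior_threshold(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Classification accuracy when a case is called positive if its predicted
    probability is at least the observed share of positives (assumed meaning
    of "prior probability threshold")."""
    prior = sum(y_true) / len(y_true)
    correct = sum(1 for y, p in zip(y_true, p_hat) if (1 if p >= prior else 0) == y)
    return correct / len(y_true)


def ks_statistic(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical score distributions of positives and negatives (0 to 1 scale)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(p_hat, y_true))
    cum_pos = cum_neg = 0
    largest_gap = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume all cases tied at this score before comparing the two CDFs.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        largest_gap = max(largest_gap, abs(cum_pos / n_pos - cum_neg / n_neg))
    return largest_gap


# Hypothetical toy data: eight balanced binary outcomes with model scores.
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

ase = average_squared_error(outcomes, scores)
acc = accuracy_at_prior_threshold(outcomes, scores)
ks = ks_statistic(outcomes, scores)
```

The KS statistic is often reported multiplied by 100, so a value of 0.65 on the scale used here would be quoted as 65.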
Our off-the-shelf software product is called MyRELR™. MyRELR is a SAS macro that can be easily installed for use with SAS or the GUI product Enterprise Guide. The earliest adopters of MyRELR have been
very seasoned and skilled statisticians or analytics professionals, who may also be senior analytics
executives or manage analytics groups. They use RELR because it takes the
randomness and guesswork out of predictive modeling and makes their efforts or their team's efforts much more productive. Our MyRELR SAS macro software does not require any SAS programming expertise, as one fills out a menu to build or score a model. We have sold our MyRELR software to SAS users at companies using it in media,
marketing, marketing research, entertainment, and financial services
applications. We also license our RELR patent and rapid optimization trade secrets for RELR development in big data environments such as Hadoop/MapReduce, in a C++ application called SkyRELR whose development is just beginning.
Recent News
Oct 4, 2011. We were issued a patent today for our RELR technology by the US Patent Office. A full description of the significance of this patent is available at this link to our Patent Press Release Article.
August 4, 2011. A new case study is available from a completely independent researcher with no connections to Rice Analytics. It compares RELR with Random Forests, Logistic Regression, LASSO, LARS, Stepwise Regression, and Bayesian Networks and shows that RELR outperforms these other algorithms in classification accuracy by an average of 2-4%. Here is the link RELRCaseStudyAugust2011 to the page on our website where it can be viewed.
July 26, 2011. We are now offering a three-part training course on the RELR algorithm, which can be viewed immediately for free through recorded web sessions or can be taken live for a cost. You can access the free training or enroll for the live training on the Training & Licensing page of this website.
September 27, 2010. St. Louis, MO (USA) - A very brief article written by Dan Rice and entitled "Is the AUC the Best Measure?" was published in the online industry newsletter KDnuggets.com this month. This article reviews new data that indicate that the AUC is significantly less accurate than other measures of predictive modeling accuracy. Professor David Hand of Imperial College in London, the President of the Royal Statistical Society and one of the most widely cited machine learning and AUC researchers, communicated with us about this article. We reference his work in the postscript, and it makes an even stronger case that there is a major problem with the AUC as a measure of predictive modeling accuracy. The complete one page article with the postscript can be read by clicking on this link to the AUC Article page of this website.
January 8, 2010. St. Louis, MO (USA) - We announce today that Rice Analytics has decided to name its flagship Reduced Error Logistic Regression (RELR) software product MyRELR. In the four preceding years of research, development, beta tests, and rollout, it was simply called RELR. RELR and Reduced Error Logistic Regression are terms for the statistical regression method, but they do not lend themselves to use as names for a branded software product. MyRELR is the name chosen because it fits the software product category, it incorporates the previous RELR identity, and it can be trademarked. Because the names of the regression method and the software product can usually be used interchangeably, we will continue to use the term RELR, but we will now use MyRELR in specific reference to the branded software product.
September 9, 2009. St. Louis, MO (USA) - Our new executive white paper written by Dan Rice and entitled "Breiman's Quiet Scandal: Stepwise Logistic Regression and RELR" appeared in the Publications section of the online industry newsletter KDnuggets.com on August 27, 2009 (issue 09:n16). This item had the Most Clicks by Subscribers and was the 2nd Most Viewed item overall of 41 items that were published that week. This article, written in "plain business English" for executives, reviews the major difficulties with Stepwise Logistic Regression that were pointed out by the late statistician Leo Breiman. This article also reviews evidence that our RELR method may be a solution to these problems. The complete white paper can be downloaded by clicking this link to the Executive White Paper page of this website.
June 15, 2009. St. Louis, MO (USA) - Dan Rice gave an invited address last week at the 2009 Classification Society Annual Conference from June 11-13 at the Washington University Medical School. This conference brought together roughly a hundred experts from major universities and businesses in the areas of machine learning, choice modeling, and classification research. This conference was truly international in scope and had attendees from many industrialized countries. However, the relatively small size of this conference compared to JSM allowed for an extended discussion between attendees over the course of several days. The title of this talk was "Reduced Error Logistic Regression". This talk can be downloaded from the Papers and Presentations page of this website.
June 3, 2009. St. Louis, MO (USA) - We have now updated the Case Studies page of this website with credit scoring results from three major banks and one credit card company. The most impressive result is that one user reports a lift from RELR in the KS statistic from roughly 40 to 65 compared to other methods.
February 12, 2009. St. Louis, MO (USA) - Rice Analytics, a SAS Alliance Partner, and the exclusive provider of Reduced Error Logistic Regression (RELR) software announced today that it is proud to be a sponsor of this year's Midwest SAS User Group Conference (MWSUG) in Cleveland, Ohio in October, 2009. Dan Rice was a speaker at the 2008 MWSUG conference in a session on Reduced Error Logistic Regression. The MWSUG is one of the larger regional statistical conferences - approximately 300 people attended the 2008 conference in Indianapolis, Indiana.
August 6, 2008. Denver, CO (USA) - Dan Rice spoke today at the Data Mining and Machine Learning Session of the 2008 Joint Statistical Meetings in Denver, Colorado. This session was chaired by Bill Heavlin of Google Inc. and had good speakers from the United States Army, Medical University of China, University of Alabama, Bell Labs, and the University of California at Berkeley. This session was extremely well attended with a standing-room-only crowd. This standing-room-only crowd and the lively discussions prompted Bill Heavlin to say that this session "was the best session at the conference". The title of Rice's talk was "Generalized Reduced Error Logistic Regression Machine". In this presentation, Rice provided evidence that Reduced Error Logistic Regression is able to reduce error significantly compared to Penalized Logistic Regression, Step-Wise Logistic Regression and four other standard methods. A full article coinciding with this talk and published in JSM 2008 Proceedings can be downloaded from the Papers and Presentations page of this website. The Joint Statistical Meetings is one of the largest gatherings of statisticians in the world. Approximately 5000 people attended this conference in Denver this summer.
Copyright, 2006-2011 Rice Analytics, All rights reserved.
Rice Analytics™, ParsedRELR™, MyRELR™ and SkyRELR™ are trademarks of Rice Analytics, St. Louis, MO. SAS, Enterprise Miner, and Enterprise Guide are trademarks of SAS Institute, Cary, NC.