Machine Learning

elasticNet

nz.ac.waikato.cms.weka : elasticNet

An implementation of the elastic net method for linear regression.

Last Version: 1.0.1

Release Date:

DTNB

nz.ac.waikato.cms.weka : DTNB

Class for building and using a decision table/naive Bayes hybrid classifier. At each point in the search, the algorithm evaluates the merit of dividing the attributes into two disjoint subsets: one for the decision table, the other for naive Bayes. Initially, all attributes are modelled by the decision table; a forward selection search is then used, where at each step the selected attributes are modelled by naive Bayes and the remainder by the decision table. At each step, the algorithm also considers dropping an attribute entirely from the model. For more information, see: Mark Hall, Eibe Frank: Combining Naive Bayes and Decision Tables. In: Proceedings of the 21st Florida Artificial Intelligence Research Society Conference (FLAIRS), 318-319, 2008.

Last Version: 1.0.3

Release Date:

WekaExcel

nz.ac.waikato.cms.weka : WekaExcel

WekaExcel adds support for directly reading from and writing to spreadsheets in Microsoft Excel 97-2007 format. It uses Apache POI (http://poi.apache.org/), specifically POI-HSSF and POI-XSSF (http://poi.apache.org/spreadsheet/), to read and write Excel spreadsheets.
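
A minimal round-trip sketch, assuming the package follows Weka's standard converter API; the class names ExcelLoader/ExcelSaver and the file names are assumptions, so check the package's documentation.

    import java.io.File;
    import weka.core.Instances;
    import weka.core.converters.ExcelLoader;  // assumed class name from this package
    import weka.core.converters.ExcelSaver;   // assumed class name from this package

    public class ExcelRoundTrip {
        public static void main(String[] args) throws Exception {
            // Read an Excel workbook into Weka's Instances format.
            ExcelLoader loader = new ExcelLoader();
            loader.setSource(new File("data.xlsx"));
            Instances data = loader.getDataSet();

            // Write the same data back out as a spreadsheet.
            ExcelSaver saver = new ExcelSaver();
            saver.setInstances(data);
            saver.setFile(new File("copy.xlsx"));
            saver.writeBatch();
        }
    }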

Last Version: 1.0.8

Release Date:

naiveBayesTree

nz.ac.waikato.cms.weka : naiveBayesTree

Class for generating a decision tree with naive Bayes classifiers at the leaves. For more information, see: Ron Kohavi: Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In: Second International Conference on Knowledge Discovery and Data Mining, 202-207, 1996.

Last Version: 1.0.2

Release Date:

NNge

nz.ac.waikato.cms.weka : NNge

Nearest-neighbor-like algorithm using non-nested generalized exemplars (which are hyperrectangles that can be viewed as if-then rules). For more information, see: Brent Martin (1995). Instance-Based Learning: Nearest Neighbor with Generalization. Hamilton, New Zealand; Sylvain Roy (2002). Nearest Neighbor with Generalization. Christchurch, New Zealand.

Last Version: 1.0.2

Release Date:

CLOPE

nz.ac.waikato.cms.weka : CLOPE

A fast and effective clustering algorithm for transactional data. For more information, see: Yiling Yang, Xudong Guan, Jinyuan You: CLOPE: a fast and effective clustering algorithm for transactional data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 682-687, 2002.

Last Version: 1.0.2

Release Date:

averagedOneDependenceEstimators

nz.ac.waikato.cms.weka : averagedOneDependenceEstimators

AODE achieves highly accurate classification by averaging over all of a small space of alternative naive-Bayes-like models that have weaker (and hence less detrimental) independence assumptions than naive Bayes. The resulting algorithm is computationally efficient while delivering highly accurate classification on many learning tasks. For more information, see G. Webb, J. Boughton, Z. Wang (2005). Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning. 58(1):5-24.

Last Version: 1.2.1

Release Date:

hiddenNaiveBayes

nz.ac.waikato.cms.weka : hiddenNaiveBayes

Constructs a Hidden Naive Bayes classification model with high classification accuracy and AUC. For more information, see: H. Zhang, L. Jiang, J. Su: Hidden Naive Bayes. In: Twentieth National Conference on Artificial Intelligence, 919-924, 2005.

Last Version: 1.0.2

Release Date:

localOutlierFactor

nz.ac.waikato.cms.weka : localOutlierFactor

A filter that applies the LOF (Local Outlier Factor) algorithm to compute an outlier score for each instance in the data. Can use multiple cores/CPUs to speed up the LOF computation for large datasets. Nearest neighbor search methods and distance functions are pluggable. For more information, see: Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander (2000). LOF: Identifying Density-Based Local Outliers. ACM SIGMOD Record. 29(2):93-104.
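
A usage sketch under assumptions: that the filter lives at weka.filters.unsupervised.attribute.LOF, follows Weka's standard batch-filter protocol, and accepts -min/-max options bounding the neighbourhood size k.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.LOF;  // assumed location of the filter

    public class LofScores {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");

            LOF lof = new LOF();
            lof.setOptions(new String[]{"-min", "10", "-max", "40"});  // assumed option names
            lof.setInputFormat(data);

            // The filter appends an LOF score attribute to each instance.
            Instances scored = Filter.useFilter(data, lof);
            System.out.println(scored.attribute(scored.numAttributes() - 1).name());
        }
    }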

Last Version: 1.0.4

Release Date:

ordinalLearningMethod

nz.ac.waikato.cms.weka : ordinalLearningMethod

An implementation of the Ordinal Learning Method (OLM). Further information regarding the algorithm and variants can be found in: Arie Ben-David (1992). Automatic Generation of Symbolic Multiattribute Ordinal Knowledge-Based DSSs: Methodology and Applications. Decision Sciences. 23:1357-1372.

Last Version: 1.0.2

Release Date:

filteredAttributeSelection

nz.ac.waikato.cms.weka : filteredAttributeSelection

This package provides two meta attribute selection evaluators that can apply an arbitrary filter to the input data before executing the actual attribute selection scheme. One filters data and then passes it to an attribute evaluator (FilteredAttributeEval), and the other filters data and then passes it to a subset evaluator (FilteredSubsetEval).

Last Version: 1.0.2

Release Date:

realAdaBoost

nz.ac.waikato.cms.weka : realAdaBoost

Class for boosting a 2-class classifier using the Real AdaBoost method. For more information, see: J. Friedman, T. Hastie, R. Tibshirani (2000). Additive Logistic Regression: a Statistical View of Boosting. Annals of Statistics. 28(2):337-407.

Last Version: 1.0.2

Release Date:

ensembleLibrary

nz.ac.waikato.cms.weka : ensembleLibrary

Manages a library of ensemble classifiers.

Last Version: 1.0.4

Release Date:

conjunctiveRule

nz.ac.waikato.cms.weka : conjunctiveRule

This class implements a single conjunctive rule learner that can predict numeric and nominal class labels. A rule consists of antecedents "AND"ed together and the consequent (class value) for the classification/regression. Here the consequent is the distribution of the available classes (or the mean for a numeric value) in the dataset. If a test instance is not covered by the rule, it is predicted using the default class distribution/value of the training data not covered by the rule. This learner selects an antecedent by computing the information gain of each antecedent and prunes the generated rule using Reduced Error Pruning (REP) or simple pre-pruning based on the number of antecedents. For classification, the information of one antecedent is the weighted average of the entropies of the data covered and not covered by the rule; for regression, it is the weighted average of the mean-squared errors of the data covered and not covered by the rule. In pruning, the weighted average of the accuracy rates on the pruning data is used for classification, while the weighted average of the mean-squared errors on the pruning data is used for regression.

Last Version: 1.0.4

Release Date:

metaCost

nz.ac.waikato.cms.weka : metaCost

This metaclassifier makes its base classifier cost-sensitive using the method specified in Pedro Domingos: MetaCost: A general method for making classifiers cost-sensitive. In: Fifth International Conference on Knowledge Discovery and Data Mining, 155-164, 1999. This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).
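
A minimal sketch of making J48 cost-sensitive with MetaCost; it assumes MetaCost exposes the same setClassifier/setCostMatrix properties as Weka's CostSensitiveClassifier, and the cost values and file name are illustrative.

    import weka.classifiers.CostMatrix;
    import weka.classifiers.meta.MetaCost;  // provided by this package
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MetaCostDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("credit.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // 2x2 cost matrix: misclassifying class 1 as class 0 costs 5, the reverse costs 1.
            CostMatrix costs = new CostMatrix(2);
            costs.setCell(0, 1, 1.0);
            costs.setCell(1, 0, 5.0);

            MetaCost mc = new MetaCost();
            mc.setClassifier(new J48());  // the base learner to make cost-sensitive
            mc.setCostMatrix(costs);
            mc.buildClassifier(data);
            System.out.println(mc);
        }
    }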

Last Version: 1.0.3

Release Date:

probabilityCalibrationTrees

nz.ac.waikato.cms.weka : probabilityCalibrationTrees

Provides probability calibration trees (PCTs) for local calibration of class probability estimates. To achieve calibration of a base learner, the PCT class must be used as the meta learner in the CascadeGeneralization class, which is also included in this package. The classifier to be calibrated must be used as the base learner in the CascadeGeneralization class. The CascadeGeneralization class can also be used independently to perform CascadeGeneralization for ensemble learning. The code for PCTs is largely the same as the LMT code for growing logistic model trees. For more details, see the ACML paper on probability calibration trees.

Last Version: 1.0.0

Release Date:

dualPerturbAndCombine

nz.ac.waikato.cms.weka : dualPerturbAndCombine

Class for building and using classification and regression trees based on the closed-form dual perturb and combine algorithm described in: Pierre Geurts, Louis Wehenkel: Closed-form dual perturb and combine for tree-based models. In: Proceedings of the 22nd International Conference on Machine Learning, 233-240, 2005.

Last Version: 1.0.0

Release Date:

simpleEducationalLearningSchemes

nz.ac.waikato.cms.weka : simpleEducationalLearningSchemes

Simple learning schemes for educational purposes (Prism, Id3, IB1 and NaiveBayesSimple).

Last Version: 1.0.2

Release Date:

LibLINEAR

nz.ac.waikato.cms.weka : LibLINEAR

A wrapper class for the liblinear tools (the liblinear classes, typically the jar file, need to be on the classpath to use this classifier). For more information, see: Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin (2008). LIBLINEAR - A Library for Large Linear Classification.

Last Version: 1.9.7

Release Date:

fuzzyUnorderedRuleInduction

nz.ac.waikato.cms.weka : fuzzyUnorderedRuleInduction

FURIA: Fuzzy Unordered Rule Induction Algorithm. For details please see: Jens Christian Huehn, Eyke Huellermeier (2009). FURIA: An Algorithm for Unordered Fuzzy Rule Induction. Data Mining and Knowledge Discovery.

Last Version: 1.0.2

Release Date:

alternatingDecisionTrees

nz.ac.waikato.cms.weka : alternatingDecisionTrees

Binary-class and multi-class alternating decision trees. For more information, see: Freund, Y., Mason, L.: The alternating decision tree learning algorithm. In: Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 124-133, 1999; Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Eibe Frank, Mark Hall: Multiclass alternating decision trees. In: ECML, 161-172, 2001.

Last Version: 1.0.5

Release Date:

attributeSelectionSearchMethods

nz.ac.waikato.cms.weka : attributeSelectionSearchMethods

This package provides four search methods for attribute selection: ExhaustiveSearch, GeneticSearch, RandomSearch and RankSearch. See: David E. Goldberg (1989). Genetic algorithms in search, optimization and machine learning. Addison-Wesley. Mark Hall, Geoffrey Holmes (2003). Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering. 15(6):1437-1447.
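
A sketch of plugging one of these search methods into Weka's attribute selection API, assuming GeneticSearch lives in weka.attributeSelection alongside the built-in evaluators; CfsSubsetEval stands in for any subset evaluator.

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.CfsSubsetEval;
    import weka.attributeSelection.GeneticSearch;  // from this package
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class GeneticSelection {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(new CfsSubsetEval());  // any subset evaluator works here
            sel.setSearch(new GeneticSearch());
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }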

Last Version: 1.0.7

Release Date:

extraTrees

nz.ac.waikato.cms.weka : extraTrees

Package for generating a single Extra-Tree. Use with the RandomCommittee meta classifier to generate an Extra-Trees forest for classification or regression. This classifier requires all predictors to be numeric. Missing values are not allowed. Instance weights are taken into account. For more information, see Pierre Geurts, Damien Ernst, Louis Wehenkel (2006). Extremely randomized trees. Machine Learning. 63(1):3-42.
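
A sketch of the forest construction described above, assuming the single tree is exposed as weka.classifiers.trees.ExtraTree; RandomCommittee and its properties are standard Weka.

    import weka.classifiers.meta.RandomCommittee;
    import weka.classifiers.trees.ExtraTree;  // assumed class name from this package
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ExtraTreesForest {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("numeric.arff");  // all predictors must be numeric
            data.setClassIndex(data.numAttributes() - 1);

            // An Extra-Trees forest: RandomCommittee averages many randomised Extra-Trees.
            RandomCommittee forest = new RandomCommittee();
            forest.setClassifier(new ExtraTree());
            forest.setNumIterations(100);
            forest.buildClassifier(data);
            System.out.println(forest);
        }
    }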

Last Version: 1.0.2

Release Date:

fuzzyLaticeReasoning

nz.ac.waikato.cms.weka : fuzzyLaticeReasoning

The Fuzzy Lattice Reasoning Classifier uses the notion of Fuzzy Lattices for creating a Reasoning Environment. The current version can be used for classification using numeric predictors. For more information see: I. N. Athanasiadis, V. G. Kaburlasos, P. A. Mitkas, V. Petridis: Applying Machine Learning Techniques on Air Quality Data for Real-Time Decision Support. In: 1st Intl. NAISO Symposium on Information Technologies in Environmental Engineering (ITEE-2003), Gdansk, Poland, 2003; V. G. Kaburlasos, I. N. Athanasiadis, P. A. Mitkas, V. Petridis (2003). Fuzzy Lattice Reasoning (FLR) Classifier and its Application on Improved Estimation of Ambient Ozone Concentration.

Last Version: 1.0.2

Release Date:

hotSpot

nz.ac.waikato.cms.weka : hotSpot

HotSpot learns a set of rules (displayed in a tree-like structure) that maximize/minimize a target variable/value of interest. With a nominal target, one might want to look for segments of the data where there is a high probability of a minority value occurring (given the constraint of a minimum support). For a numeric target, one might be interested in finding segments where the target is higher on average than in the whole data set. For example, in a health insurance scenario, find which health insurance groups are at the highest risk (have the highest claim ratio), or which groups have the highest average insurance payout.

Last Version: 1.0.14

Release Date:

wavelet

nz.ac.waikato.cms.weka : wavelet

A filter for wavelet transformation. For more information see: Wikipedia (2004). Discrete wavelet transform. Kristian Sandberg (2000). The Haar wavelet transform. University of Colorado at Boulder, USA.

Last Version: 1.0.2

Release Date:

timeseriesForecasting

nz.ac.waikato.cms.weka : timeseriesForecasting

Provides a time series forecasting environment for Weka. Includes a wrapper for Weka regression schemes that automates the process of creating lagged variables and date-derived periodic variables and provides the ability to do closed-loop forecasting. New evaluation routines are provided by a special evaluation module and graphing of predictions/forecasts is provided via the JFreeChart library. Includes both command-line and GUI user interfaces. Sample time series data can be found in ${WEKA_HOME}/packages/timeseriesForecasting/sample-data.
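
A minimal forecasting sketch following the WekaForecaster API this package documents; the field name "passengers", the data file and the choice of SMOreg as base regressor are illustrative assumptions.

    import java.util.List;
    import weka.classifiers.evaluation.NumericPrediction;
    import weka.classifiers.functions.SMOreg;
    import weka.classifiers.timeseries.WekaForecaster;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ForecastDemo {
        public static void main(String[] args) throws Exception {
            Instances series = DataSource.read("airline.arff");  // hypothetical series with a "passengers" field

            WekaForecaster forecaster = new WekaForecaster();
            forecaster.setFieldsToForecast("passengers");
            forecaster.setBaseForecaster(new SMOreg());  // any Weka regression scheme

            forecaster.buildForecaster(series, System.out);  // lagged/date-derived variables are created internally
            forecaster.primeForecaster(series);              // prime with the most recent history

            // Closed-loop forecast for the next 12 steps ahead.
            List<List<NumericPrediction>> forecast = forecaster.forecast(12, System.out);
            for (List<NumericPrediction> step : forecast) {
                System.out.println(step.get(0).predicted());
            }
        }
    }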

Last Version: 1.1.27

Release Date:

isolationForest

nz.ac.waikato.cms.weka : isolationForest

Class for building and using a classifier built on the Isolation Forest anomaly detection algorithm. For more information, see: Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou: Isolation Forest. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, 413-422, 2008.

Last Version: 1.0.2

Release Date:

classificationViaClustering

nz.ac.waikato.cms.weka : classificationViaClustering

A simple meta-classifier that uses a clusterer for classification. For clustering algorithms that use a fixed number of clusters, like SimpleKMeans, the user has to make sure that the number of clusters to generate is the same as the number of class labels in the dataset in order to obtain a useful model. Note: at prediction time, a missing value is returned if no cluster is found for the instance. The code is based on the 'clusters to classes' functionality of the weka.clusterers.ClusterEvaluation class by Mark Hall.
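
A sketch showing the cluster-count constraint mentioned above, assuming the classifier is weka.classifiers.meta.ClassificationViaClustering with a setClusterer property; iris.arff is the usual 3-class example.

    import weka.classifiers.meta.ClassificationViaClustering;  // provided by this package
    import weka.clusterers.SimpleKMeans;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ClusterClassifier {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Match the number of clusters to the number of class labels (3 for iris).
            SimpleKMeans kmeans = new SimpleKMeans();
            kmeans.setNumClusters(data.numClasses());

            ClassificationViaClustering cvc = new ClassificationViaClustering();
            cvc.setClusterer(kmeans);
            cvc.buildClassifier(data);
            System.out.println(cvc);
        }
    }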

Last Version: 1.0.7

Release Date:

javaFXScatter3D

nz.ac.waikato.cms.weka : javaFXScatter3D

A visualization component for displaying a 3D scatter plot of the data using JavaFX 3D. Requires JavaFX to be available. This version adds built-in sampling controls to the GUI. The default sampling percentage is set so that a maximum of 5000 instances are plotted. The user can adjust this higher or lower to suit their available processing speed and memory.

Last Version: 1.0.0

Release Date:

phmm4weka

nz.ac.waikato.cms.weka : phmm4weka

This Java software implements Profile Hidden Markov Models (PHMMs) for protein classification for the WEKA workbench. Standard PHMMs and newly introduced binary PHMMs are used. In addition the software allows propositionalisation of PHMMs.

Last Version: 1.1.3

Release Date:

tiny-weka

nz.ac.waikato.cms.weka : tiny-weka

The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This artifact represents the bare API of the developer version, with no package manager, PMML, XML or user interface. It is aimed at commercial applications that license some of WEKA's algorithms.

Last Version: 3.9.15955

Release Date:

sequentialInformationalBottleneckClusterer

nz.ac.waikato.cms.weka : sequentialInformationalBottleneckClusterer

Clusters data using the sequential information bottleneck (sIB) algorithm. Note: only the hard clustering scheme is supported; sIB assigns each instance to the cluster that has the minimum cost/distance to the instance. The trade-off parameter beta is set to infinity, so 1/beta is zero. For more information, see: Noam Slonim, Nir Friedman, Naftali Tishby: Unsupervised document classification using sequential information maximization. In: Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, 129-136, 2002.

Last Version: 1.0.2

Release Date:

complementNaiveBayes

nz.ac.waikato.cms.weka : complementNaiveBayes

Class for building and using a Complement class Naive Bayes classifier. For more information, see: Jason D. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger: Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In: ICML, 616-623, 2003. Note: the TF, IDF and length normalization transforms described in the paper can be performed through weka.filters.unsupervised.attribute.StringToWordVector.
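
A sketch combining the classifier with the StringToWordVector transforms noted above via FilteredClassifier; the data file is illustrative, and the TF/IDF setters are standard StringToWordVector properties.

    import weka.classifiers.bayes.ComplementNaiveBayes;  // provided by this package
    import weka.classifiers.meta.FilteredClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class CnbText {
        public static void main(String[] args) throws Exception {
            Instances docs = DataSource.read("articles.arff");  // string attribute plus class
            docs.setClassIndex(docs.numAttributes() - 1);

            // Apply the TF/IDF transforms recommended in the paper.
            StringToWordVector s2wv = new StringToWordVector();
            s2wv.setTFTransform(true);
            s2wv.setIDFTransform(true);

            FilteredClassifier fc = new FilteredClassifier();
            fc.setFilter(s2wv);
            fc.setClassifier(new ComplementNaiveBayes());
            fc.buildClassifier(docs);
            System.out.println(fc);
        }
    }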

Last Version: 1.0.3

Release Date:

alternatingModelTrees

nz.ac.waikato.cms.weka : alternatingModelTrees

Grows an alternating model tree by minimising squared error. For more information see "Eibe Frank, Michael Mayo, Stefan Kramer: Alternating Model Trees. In: Proceedings of the ACM Symposium on Applied Computing, Data Mining Track, 2015".

Last Version: 1.0.0

Release Date:

logarithmicErrorMetrics

nz.ac.waikato.cms.weka : logarithmicErrorMetrics

Provides root mean square logarithmic error and mean absolute logarithmic error for evaluating regression schemes.

Last Version: 1.0.0

Release Date:

RBFNetwork

nz.ac.waikato.cms.weka : RBFNetwork

RBFNetwork implements a normalized Gaussian radial basis function network. It uses the k-means clustering algorithm to provide the basis functions and learns either a logistic regression (discrete class problems) or linear regression (numeric class problems) on top of that. Symmetric multivariate Gaussians are fit to the data from each cluster. If the class is nominal it uses the given number of clusters per class. RBFRegressor implements radial basis function networks for regression, trained in a fully supervised manner using WEKA's Optimization class by minimizing squared error with the BFGS method. It is possible to use conjugate gradient descent rather than BFGS updates, which is faster for cases with many parameters, and to use normalized basis functions instead of unnormalized ones.

Last Version: 1.0.8

Release Date:

EMImputation

nz.ac.waikato.cms.weka : EMImputation

Replaces missing numeric values using Expectation Maximization with a multivariate normal model. Described in: Schafer, J.L. Analysis of Incomplete Multivariate Data. New York: Chapman and Hall, 1997.

Last Version: 1.0.2

Release Date:

kfPMMLClassifierScoring

nz.ac.waikato.cms.weka : kfPMMLClassifierScoring

A Knowledge Flow plugin that provides a Knowledge Flow step for scoring test sets or instance streams using a PMML classifier.

Last Version: 1.0.3

Release Date:

ensemblesOfNestedDichotomies

nz.ac.waikato.cms.weka : ensemblesOfNestedDichotomies

A meta classifier for handling multi-class datasets with 2-class classifiers by building an ensemble of nested dichotomies. For more information, see: Lin Dong, Eibe Frank, Stefan Kramer: Ensembles of Balanced Nested Dichotomies for Multi-class Problems. In: PKDD, 84-95, 2005; Eibe Frank, Stefan Kramer: Ensembles of nested dichotomies for multi-class problems. In: Twenty-first International Conference on Machine Learning, 2004.

Last Version: 1.0.6

Release Date:

kfKettle

nz.ac.waikato.cms.weka : kfKettle

Knowledge Flow step that provides an entry point for data coming from the Kettle ETL tool.

Last Version: 1.0.5

Release Date:

oneClassClassifier

nz.ac.waikato.cms.weka : oneClassClassifier

Performs one-class classification on a dataset. The classifier reduces the class being classified to just a single class, and learns the data without using any information from other classes. The testing stage will classify as 'target' or 'outlier' - so in order to calculate the outlier pass rate the dataset must contain information from more than one class. Also, the output varies depending on whether the label 'outlier' exists in the instances used to build the classifier. If so, then 'outlier' will be predicted; if not, then the label will be considered missing when the prediction does not favour the target class. The 'outlier' class will not be used to build the model if there are instances of this class in the dataset. It can simply be used as a flag; you do not need to relabel any classes. For more information, see: Kathryn Hempstalk, Eibe Frank, Ian H. Witten: One-Class Classification by Combining Density and Class Probability Estimation. In: Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, ECML/PKDD 2008, Berlin, 505-519, 2008.
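
A minimal sketch, assuming the classifier is weka.classifiers.meta.OneClassClassifier and that a -tcl option names the target class label; both are assumptions to verify against the package's documentation.

    import weka.classifiers.meta.OneClassClassifier;  // assumed class name from this package
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OneClassDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("target_only.arff");
            data.setClassIndex(data.numAttributes() - 1);

            OneClassClassifier occ = new OneClassClassifier();
            occ.setOptions(new String[]{"-tcl", "target"});  // assumed option for the target class label
            occ.buildClassifier(data);  // learns from the target class only

            // Predictions come back as 'target' or missing/'outlier', per the description above.
            System.out.println(occ);
        }
    }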

Last Version: 1.0.4

Release Date:

supervisedAttributeScaling

nz.ac.waikato.cms.weka : supervisedAttributeScaling

Package containing a class that rescales the attributes in a classification problem based on their discriminative power. This is useful as a pre-processing step for learning algorithms such as the k-nearest-neighbour method, to replace simple normalization. Each attribute is rescaled by multiplying it with a learned weight. All attributes excluding the class are assumed to be numeric and missing values are not permitted. To achieve the rescaling, this package also contains an implementation of non-negative logistic regression, which produces a logistic regression model with non-negative weights.

Last Version: 1.0.2

Release Date:

dagging

nz.ac.waikato.cms.weka : dagging

This meta classifier creates a number of disjoint, stratified folds out of the data and feeds each chunk of data to a copy of the supplied base classifier. Predictions are made via majority vote, since all the generated base classifiers are put into the Vote meta classifier. Useful for base classifiers that are quadratic or worse in time complexity with respect to the number of instances in the training data. For more information, see: Ting, K. M., Witten, I. H.: Stacking Bagged and Dagged Models. In: Fourteenth International Conference on Machine Learning, San Francisco, CA, 367-375, 1997.
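
A usage sketch, assuming the class is weka.classifiers.meta.Dagging with setClassifier/setNumFolds properties; SMO is chosen as an example of a base learner with superlinear training time.

    import weka.classifiers.functions.SMO;
    import weka.classifiers.meta.Dagging;  // assumed class name from this package
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class DaggingDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("large.arff");
            data.setClassIndex(data.numAttributes() - 1);

            Dagging dagging = new Dagging();
            dagging.setClassifier(new SMO());  // expensive base learner benefits most from Dagging
            dagging.setNumFolds(10);           // assumed property: 10 disjoint stratified folds
            dagging.buildClassifier(data);
            System.out.println(dagging);
        }
    }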

Last Version: 1.0.3

Release Date:

citationKNN

nz.ac.waikato.cms.weka : citationKNN

Modified version of the Citation kNN multi-instance classifier. For more information, see: Jun Wang, Jean-Daniel Zucker: Solving the Multiple-Instance Problem: A Lazy Learning Approach. In: 17th International Conference on Machine Learning, 1119-1125, 2000.

Last Version: 1.0.2

Release Date:

denormalize

nz.ac.waikato.cms.weka : denormalize

An instance filter that collapses instances with a common grouping ID value into a single instance. Useful for converting transactional data into a format that Weka's association rule learners can handle. IMPORTANT: assumes that the incoming batch of instances has been sorted on the grouping attribute. The values of nominal attributes are converted to indicator attributes. These can be either binary (with f and t values) or unary with missing values used to indicate absence. The latter is Weka's old market basket format, which is useful for Apriori. Numeric attributes can be aggregated within groups by computing the average, sum, minimum or maximum.
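
A sketch of the batch-filter protocol, with loudly flagged assumptions: the class path weka.filters.unsupervised.instance.Denormalize and the -G grouping option are guesses at this package's API, not confirmed names.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.instance.Denormalize;  // assumed location of the filter

    public class BasketFormat {
        public static void main(String[] args) throws Exception {
            // The batch MUST already be sorted on the grouping attribute.
            Instances transactions = DataSource.read("transactions.arff");

            Denormalize denorm = new Denormalize();
            denorm.setOptions(new String[]{"-G", "first"});  // assumed option: the grouping-ID attribute
            denorm.setInputFormat(transactions);

            // One output instance per grouping ID.
            Instances baskets = Filter.useFilter(transactions, denorm);
            System.out.println(baskets.numInstances() + " baskets");
        }
    }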

Last Version: 1.0.3

Release Date:

hyperPipes

nz.ac.waikato.cms.weka : hyperPipes

Class implementing a HyperPipe classifier. For each category a HyperPipe is constructed that contains all points of that category (essentially recording the attribute bounds observed for each category). Test instances are classified according to the category that "most contains the instance". Does not handle numeric classes or missing values in test cases. An extremely simple algorithm, but it has the advantage of being extremely fast, and it works quite well when you have "smegloads" of attributes.

Last Version: 1.0.2

Release Date:

paceRegression

nz.ac.waikato.cms.weka : paceRegression

Class for building pace regression linear models and using them for prediction. Under regularity conditions, pace regression is provably optimal when the number of coefficients tends to infinity. It consists of a group of estimators that are either overall optimal or optimal under certain conditions. The current work on pace regression theory, and therefore also this implementation, does not handle: missing values, non-binary nominal attributes, or the case where n - k is small, where n is the number of instances and k the number of coefficients (the threshold used in this implementation is 20). For more information see: Wang, Y. (2000). A new approach to fitting linear models in high dimensional spaces. Hamilton, New Zealand. Wang, Y., Witten, I. H.: Modeling for optimal probability prediction. In: Proceedings of the Nineteenth International Conference on Machine Learning, Sydney, Australia, 650-657, 2002.

Last Version: 1.0.2

Release Date:

ordinalStochasticDominance

nz.ac.waikato.cms.weka : ordinalStochasticDominance

An implementation of the Ordinal Stochastic Dominance Learner. Further information regarding the OSDL algorithm can be found in: S. Lievens, B. De Baets, K. Cao-Van (2006). A Probabilistic Framework for the Design of Instance-Based Supervised Ranking Algorithms in an Ordinal Setting. Annals of Operations Research; Kim Cao-Van (2003). Supervised ranking: from semantics to algorithms; Stijn Lievens (2004). Studie en implementatie van instantie-gebaseerde algoritmen voor gesuperviseerd rangschikken [Study and implementation of instance-based algorithms for supervised ranking].

Last Version: 1.0.2

Release Date:

chiSquaredAttributeEval

nz.ac.waikato.cms.weka : chiSquaredAttributeEval

Evaluates the worth of an attribute by computing the value of the chi-squared statistic with respect to the class.
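
A sketch pairing the evaluator with Weka's Ranker search, as is standard for single-attribute evaluators; the data file is illustrative.

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.ChiSquaredAttributeEval;  // provided by this package
    import weka.attributeSelection.Ranker;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ChiSquaredRanking {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data.arff");
            data.setClassIndex(data.numAttributes() - 1);

            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(new ChiSquaredAttributeEval());
            sel.setSearch(new Ranker());  // single-attribute evaluators pair with Ranker
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }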

Last Version: 1.0.4

Release Date: