Data splitter

The data splitter splits the instance data into a training set and a test set. It takes a number of parameters that control the splitting algorithm, which is explained in the corresponding data mapping section.

Required parameters

  • input-data - the dataset to split
  • output-data-train - training part of the split dataset
  • output-data-test - test part of the split dataset

Optional parameters

  • training-users - fraction of training-only users; defaults to 0.5
  • test-user-training-ratings - fraction of test users' ratings that go into the training set; defaults to 0.25
  • cold-users - fraction of cold (test-only) users; defaults to 0
  • cold-items - fraction of cold (test-only) items; defaults to 0
  • ignored-users - fraction of ignored users; defaults to 0
  • ignored-items - fraction of ignored items; defaults to 0
  • remove-occasional-cold-items - flag that takes no value; when present, occasionally produced cold items are removed
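
The user-level fractions can be read as shares of the user population. The sketch below is only an illustration under an assumption this page does not state: that training-users, cold-users and ignored-users must sum to at most 1, and that the remaining users become test users (whose ratings are then divided according to test-user-training-ratings). The helper name user_partition is hypothetical and is not part of the tool; the data mapping section remains the authoritative description of the algorithm.

```python
from fractions import Fraction


def user_partition(training_users="0.5", cold_users="0", ignored_users="0"):
    """Return the share of users falling into each group.

    Assumption (not confirmed by this page): the three fractions sum to at
    most 1 and the remaining users are treated as test users.
    Fractions are passed as decimal strings to avoid floating-point error.
    """
    shares = {
        "training-only": Fraction(training_users),
        "cold": Fraction(cold_users),
        "ignored": Fraction(ignored_users),
    }
    for name, share in shares.items():
        if not 0 <= share <= 1:
            raise ValueError(f"{name} fraction must lie in [0, 1], got {share}")
    remainder = 1 - sum(shares.values())
    if remainder < 0:
        raise ValueError("user fractions sum to more than 1")
    shares["test"] = remainder
    return shares


# Settings used in the example on this page: --ignored-users 0.8 --training-users 0.1
for group, share in user_partition(training_users="0.1", ignored_users="0.8").items():
    print(f"{group}: {float(share):.2f}")
```

Under this reading, the example settings would ignore 80% of the users, keep 10% as training-only users, and leave the remaining 10% as test users.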

Example

Learner Recommender SplitData --input-data AllData.dat
                              --output-data-train TrainingSet.dat 
                              --output-data-test TestSet.dat
                              --ignored-users 0.8 
                              --training-users 0.1 
                              --remove-occasional-cold-items
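
When the splitter is driven from a script, the same invocation can be assembled programmatically. The following is a minimal sketch, assuming the Learner executable can be found from the working directory or the PATH. The helper build_split_command is hypothetical; it uses only the option names listed above.

```python
def build_split_command(input_data, train_out, test_out, flags=(), **options):
    """Assemble the argument list for `Learner Recommender SplitData`.

    Keyword options use underscores in place of hyphens, so
    ignored_users=0.8 becomes `--ignored-users 0.8`.
    Entries in `flags` are value-less switches such as
    "remove-occasional-cold-items".
    """
    cmd = [
        "Learner", "Recommender", "SplitData",
        "--input-data", input_data,
        "--output-data-train", train_out,
        "--output-data-test", test_out,
    ]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    for flag in flags:
        cmd.append("--" + flag)
    return cmd


command = build_split_command(
    "AllData.dat", "TrainingSet.dat", "TestSet.dat",
    flags=["remove-occasional-cold-items"],
    ignored_users=0.8,
    training_users=0.1,
)
print(" ".join(command))

# To actually run the splitter (requires the Learner executable to be available):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.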

©2009-2015 Microsoft Corporation. All rights reserved.