QLearning Class Reference

Implements Q-learning.

#include <qlearning.h>

Inherits Configurable and Storeable.


Public Member Functions

 QLearning (double eps, double discount, double exploration, int eligibility, bool random_initQ=false, bool useSARSA=false, int tau=1000)
virtual ~QLearning ()
virtual void init (unsigned int stateDim, unsigned int actionDim, RandGen *randGen=0)
 initialisation with the given number of actions and states
virtual unsigned int select (unsigned int state)
 selection of an action given the current state.
virtual unsigned int select_sample (unsigned int state)
 selection of an action given the current state.
virtual unsigned int select_keepold (unsigned int state)
 select with preference for the old action (90% if it is good) and the second-best action (30%)
virtual double learn (unsigned int state, unsigned int action, double reward, double learnRateFactor=1)
matrix::Matrix getActionValues (unsigned int state)
 returns the vector of values for all actions given the current state
virtual void reset ()
 tells the Q-learning that the agent was reset, so that it forgets its memory.
virtual unsigned int getStateDim () const
 returns the number of states
virtual unsigned int getActionDim () const
 returns the number of actions
virtual double getCollectedReward () const
 returns the collected reward
virtual const matrix::Matrix & getQ () const
 returns the Q table (mxn) == (states x actions)
virtual bool store (FILE *f) const
 stores the object to the given file stream (binary).
virtual bool restore (FILE *f)
 loads the object from the given file stream (binary).

Static Public Member Functions

static int valInCrossProd (const std::list< std::pair< int, int > > &vals)
 expects a list of (value, range) pairs and returns the associated state
static std::list< int > ConfInCrossProd (const std::list< int > &ranges, int val)
 expects a list of ranges and a state/action and returns the configuration

Public Attributes

bool useSARSA
 if true, use the SARSA strategy, otherwise Q-learning

Protected Attributes

double eps
double discount
double exploration
double eligibility
bool random_initQ
int tau
 time horizon for averaging the reward
matrix::Matrix Q
 Q table (mxn) == (states x actions)
int * actions
int * states
double * rewards
int ringbuffersize
double * longrewards
int t
bool initialised
double collectedReward
RandGen * randGen

Detailed Description

Implements Q-learning, a tabular reinforcement learning algorithm. Optionally the SARSA update rule is used instead (see useSARSA), and updates can be propagated backwards in time via the eligibility parameter.
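
For reference, with learning rate eps (epsilon) and discount factor discount (gamma), the textbook updates that these options correspond to are (a sketch of the standard rules; the actual learn() implementation may additionally update several past steps via eligibility):

    % Q-learning update (useSARSA = false): bootstrap from the greedy next action
    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \varepsilon \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

    % SARSA update (useSARSA = true): bootstrap from the action actually taken
    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \varepsilon \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]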


Constructor & Destructor Documentation

QLearning ( double  eps,
double  discount,
double  exploration,
int  eligibility,
bool  random_initQ = false,
bool  useSARSA = false,
int  tau = 1000 
)

Parameters:
eps learning rate (typically 0.1)
discount discount factor for Q-values (typically 0.9)
exploration exploration rate (typically 0.02)
eligibility number of steps to update backwards in time
random_initQ if true Q table is filled with small random numbers at the start (default: false)
useSARSA if true, use the SARSA strategy, otherwise Q-learning (default: false)
tau number of time steps over which the reward is averaged (for collectedReward)
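
A minimal usage sketch with the typical values from above (the state/action dimensions and the environment hooks applyAction/observeReward are hypothetical placeholders, not part of this interface):

    #include <qlearning.h>

    // Construct with the documented typical parameter values.
    QLearning ql(0.1 /*eps*/, 0.9 /*discount*/, 0.02 /*exploration*/,
                 1 /*eligibility*/);
    ql.init(16 /*stateDim*/, 4 /*actionDim*/);  // hypothetical sizes

    unsigned int state = 0;
    for (int step = 0; step < 1000; ++step) {
      unsigned int action = ql.select(state);
      unsigned int next   = applyAction(action);   // hypothetical hook
      double reward       = observeReward();       // hypothetical hook
      ql.learn(state, action, reward);
      state = next;
    }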

~QLearning (  )  [virtual]


Member Function Documentation

std::list< int > ConfInCrossProd ( const std::list< int > &  ranges,
int  val 
) [static]

expects a list of ranges and a state/action and returns the configuration

unsigned int getActionDim (  )  const [virtual]

returns the number of actions

matrix::Matrix getActionValues ( unsigned int  state  ) 

returns the vector of values for all actions given the current state

double getCollectedReward (  )  const [virtual]

returns the collected reward
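
The members tau, rewards, longrewards and ringbuffersize suggest that the collected reward is a running average over the last tau time steps. A minimal sketch of that ring-buffer idea (an assumption about the implementation, not the actual code):

    // Assumed scheme: keep the last tau rewards and average them.
    double ringAverage(const double* buffer, int size, int filled) {
      int n = (filled < size) ? filled : size;  // entries written so far
      double sum = 0;
      for (int i = 0; i < n; ++i) sum += buffer[i];
      return (n > 0) ? sum / n : 0.0;
    }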

virtual const matrix::Matrix& getQ (  )  const [inline, virtual]

returns the Q table (mxn) == (states x actions)

unsigned int getStateDim (  )  const [virtual]

returns the number of states

void init ( unsigned int  stateDim,
unsigned int  actionDim,
RandGen * randGen = 0 
) [virtual]

initialisation with the given number of action and states

Parameters:
actionDim number of actions
stateDim number of states
randGen random number generator to use (optional)

double learn ( unsigned int  state,
unsigned int  action,
double  reward,
double  learnRateFactor = 1 
) [virtual]

void reset (  )  [virtual]

tells the Q-learning that the agent was reset, so that it forgets its memory.

Please note that the Q-table update happens one step later, so if a final reward is received you should call learn one more time before calling reset.
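
In practice this means an episode should end as in the following sketch (terminalReward, lastState and lastAction are hypothetical placeholders):

    // Credit the final reward first, since the Q-table update lags one step ...
    ql.learn(lastState, lastAction, terminalReward);
    // ... then let the learner forget its short-term memory.
    ql.reset();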

bool restore ( FILE *  f  )  [virtual]

loads the object from the given file stream (binary).

Implements Storeable.

unsigned int select ( unsigned int  state  )  [virtual]

selection of an action given the current state.

The policy is to take the action with the highest value, or a random action at the exploration rate.
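
A sketch of this epsilon-greedy rule in terms of the public interface (uniformRandom and uniformRandomInt are hypothetical RNG helpers, and the row/column layout of getActionValues is assumed):

    unsigned int epsilonGreedy(QLearning& ql, unsigned int state,
                               double exploration) {
      if (uniformRandom() < exploration)             // explore ...
        return uniformRandomInt(ql.getActionDim());  // ... random action
      matrix::Matrix vals = ql.getActionValues(state);
      unsigned int best = 0;                         // ... else take argmax
      for (unsigned int a = 1; a < ql.getActionDim(); ++a)
        if (vals.val(0, a) > vals.val(0, best))      // layout assumed: 1 x actionDim
          best = a;
      return best;
    }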

unsigned int select_keepold ( unsigned int  state  )  [virtual]

select with preference for the old action (90% if it is good) and the second-best action (30%)

unsigned int select_sample ( unsigned int  state  )  [virtual]

selection of an action given the current state.

The policy is to sample from the above-average actions, with a bias towards the old action (exploration is also included).

bool store ( FILE *  f  )  const [virtual]

stores the object to the given file stream (binary).

Implements Storeable.
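
Typical save/load usage via the Storeable interface (a sketch; the file name is hypothetical):

    #include <cstdio>

    // Save the learned Q table to a binary file.
    FILE* f = fopen("qtable.bin", "wb");
    if (f) { ql.store(f); fclose(f); }

    // Later: restore it into a compatibly initialised instance.
    f = fopen("qtable.bin", "rb");
    if (f) { ql.restore(f); fclose(f); }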

int valInCrossProd ( const std::list< std::pair< int, int > > &  vals  )  [static]

expects a list of (value, range) pairs and returns the associated state
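
A sketch of the idea behind valInCrossProd and its inverse ConfInCrossProd, assuming a mixed-radix encoding (the digit order is an assumption, not taken from the source):

    #include <list>
    #include <utility>

    // Encode (value, range) pairs into one index: mixed-radix number.
    int encodeCrossProd(const std::list< std::pair<int,int> >& vals) {
      int state = 0;
      for (std::list< std::pair<int,int> >::const_iterator it = vals.begin();
           it != vals.end(); ++it)
        state = state * it->second + it->first;  // state = state*range + value
      return state;
    }

    // Decode an index back into one value per range (inverse of the above).
    std::list<int> decodeCrossProd(const std::list<int>& ranges, int val) {
      std::list<int> conf;
      for (std::list<int>::const_reverse_iterator it = ranges.rbegin();
           it != ranges.rend(); ++it) {
        conf.push_front(val % *it);  // extract least-significant digit
        val /= *it;
      }
      return conf;
    }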


Member Data Documentation

int* actions [protected]

double collectedReward [protected]

double discount [protected]

double eligibility [protected]

double eps [protected]

double exploration [protected]

bool initialised [protected]

double* longrewards [protected]

matrix::Matrix Q [protected]

Q table (mxn) == (states x actions)

RandGen* randGen [protected]

bool random_initQ [protected]

double* rewards [protected]

int ringbuffersize [protected]

int* states [protected]

int t [protected]

int tau [protected]

time horizon for averaging the reward

bool useSARSA

if true, use the SARSA strategy, otherwise Q-learning


The documentation for this class was generated from the following files:
Generated on Fri Oct 30 16:29:02 2009 for Robot Simulator of the Robotics Group for Self-Organization of Control by  doxygen 1.4.7