From: Tim Freeman (tim@fungible.com)
Date: Mon Aug 20 2007 - 06:35:48 MDT
From: Matt Mahoney <matmahoney@yahoo.com>
>[1] Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine
>Intelligence, Proc. Annual machine learning conference of Belgium and The
>Netherlands (Benelearn-2006). Ghent, 2006.
>http://www.vetta.org/documents/ui_benelearn.pdf
Excellent reference. Thanks for posting it. I like how they deal
with the problem of how to discount delayed gratification: they fold
the discounting for delay into the environment itself, and require the
total reward from the environment to be no more than 1. (That's
equation 2 on page 5.)
I had never seen a principled, parameter-free way to do that before.
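To make the idea concrete, here is a minimal sketch of an environment that does its own discounting. The geometric schedule, the gamma parameter, and the function names are my own assumptions for illustration, not anything taken from the paper; the paper only requires that the total reward be bounded by 1. The point is that the agent just adds up what it receives and needs no discount parameter of its own.

```python
def bounded_rewards(raw_rewards, gamma=0.5):
    """Yield rewards whose running total can never exceed 1.

    raw_rewards: iterable of per-step payoffs, each in [0, 1].
    gamma: hypothetical decay rate chosen by the environment, 0 < gamma < 1.
    Step k (counting from 0) is scaled by (1 - gamma) * gamma**k, and those
    weights sum to 1 over an infinite run, so the total reward is at most 1.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    weight = 1.0 - gamma
    for raw in raw_rewards:
        if not 0.0 <= raw <= 1.0:
            raise ValueError("raw rewards must lie in [0, 1]")
        yield raw * weight
        weight *= gamma


def total_reward(rewards):
    """The agent's score: a plain, undiscounted sum of what it received."""
    return sum(rewards)
```

An environment that pays the maximum at every step still totals less than 1, and a payoff that arrives later is worth less only because the environment itself pays less for it.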
-- Tim Freeman http://www.fungible.com tim@fungible.com