So the last two days I was sprinting at PyCon CA. My original intent was to investigate tools for formalizing experiments with machine learning, and reinforcement learning in particular. What we started with (and ultimately ended with) was a kind of "implement reinforcement (Deep Q) learning from scratch" project, where we tried to reconstruct my memories of Deep Q as a simple stand-alone solution and then spent the rest of the time debugging the result.
We had lots of moments where we wound up seeing no learning due to dumb/simple/avoidable errors (mostly mine). At least some of those were due to us trying to build up the solution from scratch, level by level. That is, we first tried a random agent, then an agent that just mapped state directly to best action... and oops, it turns out that one is really just not going to work. The Q function really is needed to make the problem tractable, and we lost a lot of time trying to get to a stable point before moving to the final solution. We were seeing lots of "passing" solutions with a straight random search (and half a dozen random searches that were not intentionally random searches, but were actually vanishing or exploding gradients), but obviously the end of that process was junk. We also had our epsilon decay calculation wrong, which meant that we'd just stop exploring way too early...
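For what it's worth, the epsilon schedule itself is only a few lines. Here's a minimal sketch of epsilon-greedy action selection with linear decay; the names (`eps_start`, `decay_steps`, etc.) and the linear schedule are illustrative assumptions, not our actual sprint code:

```python
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    # Linearly anneal epsilon from eps_start down to eps_end; the clamp
    # on frac means exploration never drops below eps_end. Getting this
    # arithmetic wrong is exactly how you stop exploring too early.
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def choose_action(q_values, step):
    # With probability epsilon, pick a random action (explore);
    # otherwise pick the action with the highest Q value (exploit).
    if random.random() < epsilon_at(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

A plot (or even a print) of `epsilon_at` over the first few thousand steps would have made our bug visible immediately.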
At the end of the sprint, it really does seem I need to focus more on formalizing and on providing testing/insight during experiments. So many of the problems would have been obvious if we'd had tooling to show that given fields and values were being set to 0 or 1. At two different points we spent a significant amount of time debugging only to discover we had 0 records being passed into training due to a missed append call. Dumb/simple metrics, insight into the weight tables (so you can see whether patterns are forming or everything is just collapsing to a single value), historic trial data to show you progress/regressions, etc. etc.
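The checks I'm describing are almost embarrassingly small. A sketch of the two that would have saved us the most time, with hypothetical function names, assuming the network's weights can be flattened into a plain list of floats:

```python
def check_batch(batch):
    # Fail fast instead of silently training on an empty batch --
    # this is the "missed append call" bug made loud.
    if len(batch) == 0:
        raise ValueError("empty training batch -- check your append calls")
    return batch


def weight_summary(weights):
    # Min/max/mean/spread of a flat list of weights. A network that is
    # collapsing to a single value shows all of these converging; a
    # healthy one keeps a nonzero spread.
    lo, hi = min(weights), max(weights)
    mean = sum(weights) / len(weights)
    return {"min": lo, "max": hi, "mean": mean, "spread": hi - lo}
```

Logging `weight_summary` once per episode, and wrapping every training call in `check_batch`, is roughly the level of tooling I mean: trivial to write, and it turns hours of silent failure into an immediate error or an obvious flat line in a log.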