A raw collection of things I've read here and there about making RL work better; I've since forgotten the sources. Hopefully I can refine this sometime.
Flexible function approximation
Experience replay buffer + mini-batch SGD
Double Q-Learning - reduce maximization bias
Averaged Q-Learning (average several previous Q estimates) - reduce variance
Optimistic initializations - initialize to upper bound of Q-values
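
A minimal sketch of how a few of these could fit together, assuming a tabular setting rather than a function approximator. It shows optimistic initialization, a replay buffer with mini-batch updates, and the Double Q-Learning update; averaging is left out. The toy chain environment, the `step` and `train` helpers, and all hyperparameters are made up for illustration and don't come from any of the forgotten sources.

```python
import random
from collections import deque


def make_q_table(n_states, n_actions, r_max, gamma):
    # Optimistic initialization: start every entry at the upper bound
    # r_max / (1 - gamma) on the discounted return.
    upper = r_max / (1.0 - gamma)
    return [[upper] * n_actions for _ in range(n_states)]


def argmax(values):
    # Index of the largest value; ties go to the lowest index.
    return max(range(len(values)), key=values.__getitem__)


def double_q_update(qa, qb, batch, alpha, gamma, rng):
    # Double Q-learning: one table picks the greedy next action, the other
    # evaluates it, which reduces the maximization bias of a single max.
    for s, a, r, s_next, done in batch:
        learn, evaluate = (qa, qb) if rng.random() < 0.5 else (qb, qa)
        if done:
            target = r
        else:
            best = argmax(learn[s_next])
            target = r + gamma * evaluate[s_next][best]
        learn[s][a] += alpha * (target - learn[s][a])


def step(state, action, n_states):
    # Hypothetical chain: action 1 moves right, action 0 moves left.
    # Reaching the last state pays reward 1 and ends the episode.
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=200, n_states=4, gamma=0.9, alpha=0.1, batch_size=8, seed=0):
    rng = random.Random(seed)
    qa = make_q_table(n_states, 2, r_max=1.0, gamma=gamma)
    qb = make_q_table(n_states, 2, r_max=1.0, gamma=gamma)
    buffer = deque(maxlen=500)  # experience replay buffer
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 50:
            # Act greedily on the sum of both tables; the optimistic
            # starting values are what drive exploration here.
            a = argmax([qa[s][i] + qb[s][i] for i in range(2)])
            s_next, r, done = step(s, a, n_states)
            buffer.append((s, a, r, s_next, done))
            # Mini-batch update on transitions sampled from the buffer.
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
            double_q_update(qa, qb, batch, alpha, gamma, rng)
            s, steps = s_next, steps + 1
    return qa, qb
```

Calling `train()` returns the two tables. Acting greedily on their sum is just one reasonable choice for the behavior policy, not the only one.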