Conference paper
Addressing partial observability in reinforcement learning for energy management
1. Department of Technology, Management and Economics, Technical University of Denmark
2. Sustainability, Department of Technology, Management and Economics, Technical University of Denmark
3. Energy Economics and System Analysis, Sustainability, Society and Economics, Department of Technology, Management and Economics, Technical University of Denmark
4. Energy Systems Analysis, Sustainability, Department of Technology, Management and Economics, Technical University of Denmark
5. Northumbria University
6. Norwegian University of Science and Technology
Automatic control of energy systems is affected by uncertainty in multiple factors, including weather, prices and human activities. The literature relies on Markov-based control, which takes into account only the current state. This limits control performance, since previous states provide additional context for decision making.
We present two ways to learn non-Markovian policies, based on recurrent neural networks and variational inference. We evaluate both methods on a simulated data centre HVAC control task. The results show that the off-policy stochastic latent actor-critic algorithm can, within three months of training and without prior knowledge, maintain the temperature in the predefined range while reducing energy consumption by more than 5% compared to Markovian policies.
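To make the idea of a non-Markovian policy concrete, the sketch below conditions the action on the observation history through a recurrent network instead of on the current state alone. This is a minimal illustration only, not the paper's implementation: the class name, observation/action dimensions, network widths and the PyTorch framework choice are assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's code): a recurrent policy that
# summarises the observation history, addressing partial observability.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        # LSTM compresses past observations into a belief-like hidden state
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        # Head maps the hidden state to a bounded continuous action (e.g. HVAC setpoints)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),
        )

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) -- the history a Markovian policy would discard
        out, hidden = self.lstm(obs_seq, hidden)
        return self.head(out[:, -1]), hidden  # action from the latest hidden state

# Usage: step through observations one at a time, carrying the hidden state forward.
policy = RecurrentPolicy(obs_dim=8, act_dim=2)
hidden = None
obs = torch.zeros(1, 1, 8)  # placeholder observation (e.g. temperatures, prices, time features)
action, hidden = policy(obs, hidden)
```

The variational-inference alternative discussed in the paper (stochastic latent actor-critic) would instead learn a latent state-space model and act on the inferred latent state; the recurrent encoder above is only the simpler of the two approaches.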
| Field | Value |
|---|---|
| Language | English |
| Publisher | ACM |
| Year | 2021 |
| Pages | 324-328 |
| Types | Conference paper |
| DOI | 10.1145/3486611.3488730 |
| ORCIDs | Biemann, Marco and Liu, Xiufeng |
Keywords: Computing methodologies; HVAC control; Learning paradigms; Machine learning; Markov processes; Mathematics of computing; POMDP; Probabilistic reasoning algorithms; Probability and statistics; Reinforcement learning; Stochastic processes; Variational methods; energy management; recurrent neural networks; reinforcement learning; variational inference