Quantifying capability gaps via information relaxation and deep reinforcement learning in infinite-horizon Markov decision processes