I am trying to design an RNN with attention that detects the central element of a sequence. For an RNN alone this is not an easy task. With attention it should be, but I am not entirely certain how to design it. The goal of this question is to understand both mechanisms better.


For example, suppose the input sequence is (10,20,30) or (10,20,30,40,50). When the RNN reads the input 30 it should output 20, when it reads 50 it should output 30, and so forth.


My idea for the RNN's hidden state is to just increase it by 1 at every step. The hidden state h would just be a scalar, e.g. (10,20,30) produces the states (1,2,3).
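
To make the counter idea concrete, here is a minimal sketch of such a cell. It assumes a linear recurrence h_t = w_h * h_{t-1} + w_x * x_t + b with handcrafted weights w_h = 1, w_x = 0, b = 1 and an initial state of 0; the function name `counter_rnn` and these parameter names are only illustrative.

```python
def counter_rnn(inputs, w_h=1.0, w_x=0.0, b=1.0):
    """Scalar linear RNN cell: h_t = w_h * h_{t-1} + w_x * x_t + b.

    With w_h = 1, w_x = 0, b = 1 the state ignores the input values
    and simply counts the number of steps seen so far.
    """
    h = 0.0  # assumed initial hidden state
    states = []
    for x in inputs:
        h = w_h * h + w_x * x + b
        states.append(h)
    return states


print(counter_rnn([10, 20, 30]))  # [1.0, 2.0, 3.0]
```

Because w_x is 0, the state carries only position information and nothing about the values themselves, so the values have to come from the attention step.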

But now I am stuck, as attention should work with both the input and the hidden state. What I need are attention scores that select the middle element, e.g. (0,1,0) * (10,20,30) = 20. The scoring function I came up with is s(h, number, i) = 1 if h/2 == i else 0. But there I am using the index i as an additional parameter (a positional encoding), and I am wondering whether I can do it without it.
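
The scoring function can be sketched as hard (one-hot) attention over the inputs seen so far. This is only one possible reading: it assumes the index i is counted from 0 and that h/2 means integer division, since otherwise an odd h never matches an index. The function names `score` and `attend_middle` are made up for illustration.

```python
def score(h, number, i):
    """Hard attention score: 1 for the position at index h // 2, else 0.

    `number` (the input value) is kept to match s(h, number, i) but is
    unused: the score depends only on the counter h and the index i.
    """
    return 1.0 if int(h) // 2 == i else 0.0


def attend_middle(inputs):
    """At each step, return the attention-weighted sum of the inputs so far."""
    outputs = []
    h = 0  # counter hidden state, increased by 1 per step
    for t in range(len(inputs)):
        h += 1
        seen = inputs[: t + 1]
        weights = [score(h, x, i) for i, x in enumerate(seen)]
        outputs.append(sum(w * x for w, x in zip(weights, seen)))
    return outputs


print(attend_middle([10, 20, 30]))          # [10.0, 20.0, 20.0]
print(attend_middle([10, 20, 30, 40, 50]))  # [10.0, 20.0, 20.0, 30.0, 30.0]
```

This reproduces the desired outputs (20 at input 30, 30 at input 50), but it also shows the dependency on the index: the score never looks at the input value, only at h and i.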


What other approaches are there to handcraft an RNN with attention that extracts the element at the half-way position of a sequence?