Tag Archives: python

How does ExecuTorch handle the cross-attention KV cache?

Context: In encoder-decoder transformer models, a decoder layer normally contains a cross-attention block that computes key and value projections from the encoder hidden states and calculates attention scores between those and the query projection. Notice that in common Seq2seq models … Continue reading
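Since the encoder hidden states are fixed once encoding finishes, the cross-attention K and V projections can be computed a single time and cached for every decoder step. Below is a minimal single-head NumPy sketch of that idea; it is a generic illustration, not ExecuTorch's actual implementation, and all names (`CrossAttention`, `prefill`, `step`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CrossAttention:
    """Single-head cross attention with a static KV cache.

    Because encoder hidden states do not change during decoding,
    K and V are projected once and reused at every decoder step.
    (Illustrative sketch; not the ExecuTorch implementation.)
    """

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.w_q = rng.standard_normal((d_model, d_model)) * scale
        self.w_k = rng.standard_normal((d_model, d_model)) * scale
        self.w_v = rng.standard_normal((d_model, d_model)) * scale
        self.k_cache = None  # filled once, from encoder output
        self.v_cache = None

    def prefill(self, encoder_hidden):
        """Project K/V once from encoder output (src_len, d_model)."""
        self.k_cache = encoder_hidden @ self.w_k
        self.v_cache = encoder_hidden @ self.w_v

    def step(self, decoder_hidden):
        """One decoder step (1, d_model): only Q is recomputed."""
        q = decoder_hidden @ self.w_q
        scores = (q @ self.k_cache.T) / np.sqrt(q.shape[-1])
        return softmax(scores) @ self.v_cache  # (1, d_model)
```

During generation, `prefill` runs once after the encoder, and each subsequent token only pays for the query projection and the attention itself, which is the saving the cache buys.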
