I had Claude Code pull the OTEL trace and calculate cost based on the token counts in the responses. I'll double-check later today, though, if I remember.
Edit:
I do see that the first request shows 0 cache read tokens and 7k cache write tokens. The next request shows 7k cache read tokens and 900 cache write tokens. The agent run summary is:
usage {
  cache_read_input_tokens: 244586
  cache_write_input_tokens: 38399
  completion_tokens: 8131
  input_tokens: 1172
  output_tokens: 8131
  prompt_tokens: 1172
  total_tokens: 292288
}
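For anyone who wants to redo the cost math by hand, here's a rough sketch of the calculation from that usage summary. The token counts are the ones from the trace; the per-million-token prices are placeholders I'm assuming for illustration (no model or rate is named here), so swap in the rates for whatever model you actually run. The `Pricing` class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. ASSUMED placeholder rates, not taken from the trace."""
    input: float
    output: float
    cache_write: float
    cache_read: float


# Token counts copied from the agent run summary.
USAGE = {
    "cache_read_input_tokens": 244586,
    "cache_write_input_tokens": 38399,
    "input_tokens": 1172,
    "output_tokens": 8131,
    "total_tokens": 292288,
}

# Hypothetical rates; replace with the published prices for your model and region.
ASSUMED_PRICING = Pricing(input=3.00, output=15.00, cache_write=3.75, cache_read=0.30)


def run_cost(usage: dict, pricing: Pricing) -> dict:
    """Return the cost in USD for each token class, plus the total."""
    per_million = 1_000_000
    parts = {
        "input": usage["input_tokens"] * pricing.input / per_million,
        "output": usage["output_tokens"] * pricing.output / per_million,
        "cache_write": usage["cache_write_input_tokens"] * pricing.cache_write / per_million,
        "cache_read": usage["cache_read_input_tokens"] * pricing.cache_read / per_million,
    }
    parts["total"] = sum(parts.values())
    return parts


def cache_read_share(usage: dict) -> float:
    """Fraction of all prompt-side tokens that were served from the cache."""
    prompt_side = (
        usage["input_tokens"]
        + usage["cache_read_input_tokens"]
        + usage["cache_write_input_tokens"]
    )
    return usage["cache_read_input_tokens"] / prompt_side


if __name__ == "__main__":
    for name, usd in run_cost(USAGE, ASSUMED_PRICING).items():
        print(f"{name:12s} ${usd:.4f}")
    print(f"cache read share of prompt tokens: {cache_read_share(USAGE):.1%}")
```

The four token classes add up to the reported `total_tokens`, which is a quick sanity check that nothing is being double counted (`prompt_tokens` and `completion_tokens` are just aliases of `input_tokens` and `output_tokens` in this summary).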
I do see a recent issue in the Strands Agent issue tracker about the 1hr TTL getting ignored and defaulting to a 5m TTL. I haven't validated the cache TTL, but these agent runs take ~2-3m, so a 5m TTL is sufficient.
I also checked the AWS bill and see separate Usage SKUs:
USE1-MP:USE1_CacheWriteInputTokenCount-Units   $0.34
USE1-MP:USE1_OutputTokenCount-Units            $0.27
USE1-MP:USE1_CacheReadInputTokenCount-Units    $0.16
USE1-MP:USE1_InputTokenCount-Units             $0.01
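A quick way to see how much of that bill is cache traffic is to total the four lines and take the cache share. This is just arithmetic on the dollar amounts listed above; the short dictionary keys are my own labels, not the SKU names.

```python
from decimal import Decimal

# Dollar amounts copied from the bill lines; keys are shorthand labels.
BILL = {
    "cache_write": Decimal("0.34"),
    "output": Decimal("0.27"),
    "cache_read": Decimal("0.16"),
    "input": Decimal("0.01"),
}

total = sum(BILL.values(), Decimal("0"))
cache_share = (BILL["cache_write"] + BILL["cache_read"]) / total

if __name__ == "__main__":
    print(f"total billed: ${total}")
    print(f"cache read + write share: {cache_share:.1%}")
```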