undefined | Better HN

0 pointspbgcp202611d ago0 comments

" it supports prompt caching" May I ask if you checked that? I use "{"cachePoint": { "type": "default" }" and I found 2 things: * 1) even if stated in the Doco, Bedrock Converse API does not allow 1hr expiry time, only 5m - gives error when attempted; * 2) Bedrock Converse API does accept up to 4 cachePoint's but does NOT cache and returns zeroes. LOL. It was confirmed by some other people on Github. (Note: VertexAI does cache properly reducing the bill drastically, so I use Vertex instead of OpenRouter.)

0 comments

nijave10d ago

I had Claude Code pull the OTEL trace and calculate cost based on token counts in the responses. I'll double check later today tho if I remember

Edit: I do see the first request shows 0 cache read, 7k cache write tokens. The next request shows 7k cache read, 900 cache write tokens. The agent run summary is:

usage {

cache_read_input_tokens 244586

cache_write_input_tokens 38399

completion_tokens 8131

input_tokens 1172

output_tokens 8131

prompt_tokens 1172

total_tokens 292288

}

I do see a recent issue in the Strands Agent issue tracker about 1hr TTL getting ignored and defaulting to 5m TTL. I haven't validated cache TTL but these agent runs take ~2-3m so a 5m TTL is sufficient.

I also checked the AWS bill and see separate Usage SKUs

USE1-MP:USE1_CacheWriteInputTokenCount-Units $0.34

USE1-MP:USE1_OutputTokenCount-Units $0.27

USE1-MP:USE1_CacheReadInputTokenCount-Units $0.16

USE1-MP:USE1_InputTokenCount-Units $0.01

j / k navigate · click thread line to collapse