For a postpaid service with usage-based billing, there are no separate "free" and "paid" plans (which is clearly what you have in mind when you say "tiers" here).
The "free tier" of these services is a set of monthly usage credits, one per usage SKU, set up so that if you use reasonable "just testing" amounts of resources, your bill for the month is credited down to $0.
And yes, this does mean that even when you're paying for some AWS services, you're still benefitting from the "free tier" for any service whose usage isn't exceeding those free-tier limits. That's why it's a [per-SKU usage] tier, rather than a "plan."
If you're familiar with electricity providers telling you that you're about to hit a "step-up rate" for your electricity usage for the month — that's exactly the same type of usage tier system. Except theirs goes [cheap usage] -> [expensive usage], whereas IaaS providers' tiers go [free usage] -> [costed usage].
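To make the "per-SKU usage tier" idea concrete, here is a minimal sketch of how such a bill could be computed. The SKU names, prices, and free allowances are made up for illustration and are not real AWS numbers.

```python
from decimal import Decimal

# Hypothetical SKUs, unit prices, and monthly free allowances (illustrative only).
PRICE_PER_UNIT = {
    "compute-hours": Decimal("0.01"),
    "storage-gb-month": Decimal("0.02"),
}
FREE_UNITS_PER_MONTH = {
    "compute-hours": Decimal("750"),
    "storage-gb-month": Decimal("5"),
}

def monthly_bill(usage):
    """Return {sku: amount owed} after applying each SKU's free-tier credit."""
    bill = {}
    for sku, units in usage.items():
        free = FREE_UNITS_PER_MONTH.get(sku, Decimal("0"))
        # Only usage above the free allowance is billable; never go negative.
        billable_units = max(units - free, Decimal("0"))
        bill[sku] = billable_units * PRICE_PER_UNIT[sku]
    return bill

# "Just testing" usage: every SKU is under its allowance, so the bill is $0.
small = monthly_bill({
    "compute-hours": Decimal("100"),
    "storage-gb-month": Decimal("1"),
})

# Paying customer: storage exceeds its allowance, compute is still free.
mixed = monthly_bill({
    "compute-hours": Decimal("700"),
    "storage-gb-month": Decimal("105"),
})
```

In the `mixed` case the account owes money for storage but nothing for compute, which is the sense in which a paying customer still benefits from the free tier.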
> Amazon should halt the application when it exceeds quota.
There is no easy way to do this in a distributed system (which is why IaaS services don't even try; and why their billing dashboards are always these weird detached things that surface billing only in monthly statements and coarse-grained charts, with no visibility into the raw usage numbers.)
There's a lot of inherent complexity in converting "usage" into "billable usage." It involves not just muxing usage credit-spend together, but also classifying spend from each system into a SKU [where the appropriate bucket for the same usage can change over time]; and then a lot of lookups into various control-plane systems to figure out whether any bounded or continuous discounts and credits should be applied to each SKU.
And that means that this conversion process can't happen in the services themselves. It needs to be a separate process pushed out to some specific billing system.
Usually, this means that the services that generate billable usage are just asynchronously pushing out "usage-credit spend events" into something like a log or message queue; and then a billing system is, asynchronously, sucking these up and crunching through them to emit/checkpoint "SKU billing events" against an invoice object tied to a billing account.
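As a rough sketch of that shape (not any provider's actual implementation), the following uses an in-memory deque to stand in for the log or message queue. The names `UsageEvent`, `Invoice`, `SKU_FOR`, `emit_usage`, and `run_billing_batch` are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

@dataclass
class UsageEvent:
    account: str
    service: str   # which data-plane service emitted the event
    metric: str    # e.g. "requests"
    units: int

@dataclass
class Invoice:
    account: str
    units_by_sku: dict = field(default_factory=lambda: defaultdict(int))

# Hypothetical classification table; a real one would change over time.
SKU_FOR = {
    ("object-store", "requests"): "storage-requests",
    ("object-store", "bytes-out"): "egress-gb",
}

usage_log = deque()   # stands in for a log or message queue
invoices = {}         # account -> Invoice

def emit_usage(event):
    """Data-plane side: fire and forget, with no billing logic."""
    usage_log.append(event)

def run_billing_batch(max_events):
    """Billing side: drain up to max_events and checkpoint them onto invoices."""
    processed = 0
    while usage_log and processed < max_events:
        ev = usage_log.popleft()
        sku = SKU_FOR[(ev.service, ev.metric)]
        invoice = invoices.setdefault(ev.account, Invoice(ev.account))
        invoice.units_by_sku[sku] += ev.units
        processed += 1
    return processed

# The service emits ten events, but the billing worker has only caught up on four.
for _ in range(10):
    emit_usage(UsageEvent("acct-1", "object-store", "requests", 100))

run_billing_batch(max_events=4)
billed = invoices["acct-1"].units_by_sku["storage-requests"]   # usage billing knows about
pending = sum(ev.units for ev in usage_log)                   # usage still in flight
```

The gap between `billed` and `billed + pending` is the part of the account's usage that the billing side cannot yet see.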
Because of all the extra steps in this pipeline, the cumulative usage that an IaaS knows about for a given billing account (i.e. the figure it could act on, say by firing a webhook when one of those billing events hits an MQ topic) might be something like 5 minutes behind the actual incoming usage-credit spend.
Which means that, by the time any "trigger" to shut down your application because it exceeded a "quota" went through, your application would have already spent 5 minutes more of credits.
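The overshoot is just spend rate multiplied by lag. A back-of-the-envelope sketch, where the per-second spend rates are invented purely for illustration:

```python
from decimal import Decimal

def overshoot_cost(spend_per_second, billing_lag_seconds):
    """Spend accrued between crossing a quota and the billing system noticing."""
    return spend_per_second * billing_lag_seconds

LAG_SECONDS = 5 * 60  # the roughly 5 minute lag discussed above

# Hypothetical spend rates in dollars per second.
hobby_app = overshoot_cost(Decimal("0.0001"), LAG_SECONDS)
large_app = overshoot_cost(Decimal("50"), LAG_SECONDS)
```

The lag is the same for both accounts; only the spend rate differs, so the overshoot scales linearly with how heavily loaded the application is.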
And again, for a large, heavily-loaded application — the kind these services are designed around — that extra five minutes of usage could correspond to millions of dollars of extra spend.
Which is, obviously, unacceptable from a customer perspective. No customer would accept a "quota system" that says you're in a free plan, yet charges you, because you accrued an extra 5 minutes of usage beyond the free plan's limits before the quota could "kick in."
But nor would the IaaS itself just be willing to eat that bill for the actual underlying costs of serving that extra 5 minutes of traffic, because that traffic could very well have an underlying cost of "millions of dollars."
So instead they just say "no, we won't implement a data-plane billable-usage-quota feature; if you want it, you can either implement it yourself [since your L7 app can observe its usage 'live' much better than our infra can] or, more idiomatically to our infra, you can ensure that any development project is configured with appropriate sandboxing + other protections to never get into a situation where any resource could exceed its free-tier-credited usage in the first place."
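For the "implement it yourself" option, a minimal sketch of an in-app guard might look like the following. It assumes a single process and a budget you pick yourself; `LocalUsageGuard`, `UsageBudgetExceeded`, and `handle_request` are hypothetical names, and a multi-instance app would need a shared counter instead.

```python
import threading

class UsageBudgetExceeded(RuntimeError):
    """Raised when a request would push usage past the local budget."""

class LocalUsageGuard:
    """Counts billable units in-process and refuses work past a budget."""

    def __init__(self, budget_units):
        self.budget_units = budget_units
        self.used_units = 0
        self._lock = threading.Lock()

    def charge(self, units):
        # Check and update under one lock so concurrent requests cannot overshoot.
        with self._lock:
            if self.used_units + units > self.budget_units:
                raise UsageBudgetExceeded(
                    f"would use {self.used_units + units} of {self.budget_units} units"
                )
            self.used_units += units

guard = LocalUsageGuard(budget_units=1000)

def handle_request(payload_units):
    guard.charge(payload_units)   # raises before any metered call is made
    # ... the metered cloud API call would go here ...
    return "ok"
```

Because the check happens before the metered call, the app stops at its budget without waiting for any billing pipeline to catch up.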