Journalism works.
https://en.wikipedia.org/wiki/Investigations_into_the_Eric_A...
Or was it perhaps one of those cases where they found issues, but the only way to really know whether the deleterious impact is significant is to push it to prod?
How do you QA a black-box, non-deterministic system? I'm not being facetious, seriously asking.
EDIT: Formatting
The thing is (and maybe this is what the parent meant by non-determinism, in which case I agree it's a problem): in this brave new use case, the space of possible interactions dwarfs anything machines have dealt with before. And it seems inevitable that the space of possible misunderstandings arising from those interactions will balloon similarly, simply because of how radically different our AI interlocutor is from what (actually, whom) we're used to interacting with in this world of representation and human life situations.
Considering Louis Rossmann's videos on his adventures with NYC bureaucracy (e.g. [0]), the QAers might not have known the laws any better than the chat bot.
I'm sure they QA'd it, but QA was probably "does this give me good results" (almost certainly 'yes' with an LLM), not "does this consistently not give me bad results".
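For what it's worth, the usual answer to the "how do you QA a non-deterministic system" question above is to make the QA statistical: sample the bot many times per question and bound the failure rate, instead of eyeballing one good-looking run. A minimal sketch, where `ask_bot` and the rubric string are hypothetical stand-ins, not anything from the city's actual harness:

    def ask_bot(question: str) -> str:
        # Call the chatbot under test; stubbed out here.
        raise NotImplementedError

    def violates_rubric(answer: str) -> bool:
        # Example rubric: flag answers asserting something the law forbids.
        return "you may refuse section 8 vouchers" in answer.lower()

    def eval_question(question: str, n: int = 100, max_bad: int = 0) -> bool:
        # "Consistently not bad": the bad count stays within max_bad as n grows.
        bad = sum(violates_rubric(ask_bot(question)) for _ in range(n))
        return bad <= max_bad

The distinction the parent draws lives in that last function: "gives me good results" is one passing run; "consistently doesn't give me bad results" is `max_bad` holding at zero as `n` grows.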
LLMs can handle search because search is intentionally garbage now and because they can absorb that into their training set.
Asking highly specific questions about NYC governance, which can change daily, is almost certainly 'not' going to give you good results with an LLM. The technology is not well suited to this particular problem.
Meanwhile, if an LLM actually did give you good results, it's an indication that the city is so bad at publishing information that citizens cannot rightfully discover it on their own. That is a fundamental problem and should be solved, instead of layering a $600k barely-working "chat bot" on top of the mess.
https://dl.acm.org/doi/epdf/10.1145/3780063.3780066 (PDF loads slowly...)
https://apnews.com/article/eric-adams-crypto-meme-coin-942ba...
I think he is simply not very bright, and got mesmerized by all the shiny promises AI and crypto make without the slightest understanding of how any of it actually works. I do not understand how he got into office in the first place.
There’s no amount of QA that could save this.
Perhaps a big fat check was involved.
Just days out of office, he made a few million off a crypto scam. Buffoonishly corrupt. https://finance.yahoo.com/news/eric-adams-promoted-memecoin-...
What I do know is that if it were done my way, it would be pretty easy for it to do what the Google AI does: say it's not responsible and give links for humans to fact-check it. I've noticed a dramatic drop in hallucinations after it had to provide links to its sources. Still not zero, though.
I’ve noticed that Google does a fair job of linking to relevant sources, but it’s still fairly common for it to confabulate something the source doesn’t say or even directly contradicts. It seems to hit an underlying inability to reason: if the source covers more than one thing, it’s prone to taking an input like “X does A while Y does B” and emitting “Y does A” or “X does A and B”. It’s a fascinating failure mode that so far seems insurmountable.
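A partial mitigation (a sketch under assumptions, not how Google or the city actually does it) is to verify citations mechanically before showing an answer: fetch each cited URL and check that the claim is actually supported. The substring check below is naive (a real pipeline would use an entailment model), and the claim/URL pair is invented for illustration:

    import urllib.request

    def fetch_text(url: str) -> str:
        # Fetch the cited page; real code would strip HTML and handle errors.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace").lower()

    def claim_supported(claim: str, url: str) -> bool:
        # Naive check: a key phrase from the claim must appear in the source.
        return claim.lower() in fetch_text(url)

    # Hypothetical claim/citation pair emitted by the bot:
    citations = [("landlords must accept housing vouchers",
                  "https://example.gov/housing-rights")]
    for claim, url in citations:
        if not claim_supported(claim, url):
            print(f"UNSUPPORTED: {claim!r} vs {url}")

This would catch the "link says the opposite" case, though not the subtler A/B swaps described above.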
I thought Gemini just started providing citations in the last few months. Are you saying they should have beaten Google to the punch on this? As part of the $500,000 budget?
Also, a search says the Google AI added links in 2024. So there's that.
When was the last time there was positive news involving Microsoft? This bot could've easily been on AWS or GCP, but I find it hilarious that here they are, getting dragged yet again.
https://organiser.org/2026/02/10/339366/world/zohran-mamdani...
This was experimental tech... while I admire cities attempting to implement AI, it seems they did not spend enough tax dollars on it!
[0] https://abc7ny.com/post/ai-artificial-intelligence-eric-adam...