Unless training on the pelican increases all SVG ability, then good job.
Is it possible they use the same base pre-trained model and just fine-tuned and RL-ed it better (which, of course, is where all the secret sauce training magic is these days anyhow)? That would be odd, especially for a major version bump, but it's sort of what having the same training cutoff points to?
> This model is not a modification or a fine-tune of a prior model.
I'm curious why they decided not to update the training data cutoff date too.
https://gally.net/temp/20251107pelican-alternatives/index.ht...
There seem to be one or two parsing errors. I'll fix those later.
I tried the first one and 5 Pro gives this: https://imgur.com/a/EhYroCE
https://gally.net/temp/20251107pelican-alternatives/index.ht...
As your example shows, GPT-5 Pro would probably be better than GPT-5.1, but the tokens are over ten times more expensive and I didn’t feel like paying for them.
I can only expect that the next step is something like "Have your AI read our AI's auto-generated summary", and so forth until we are all the way at Douglas Adams's Electric Monk:
> The Electric Monk was a labour-saving device, like a dishwasher or a video recorder. Dishwashers washed tedious dishes for you, thus saving you the bother of washing them yourself; video recorders watched tedious television for you, thus saving you the bother of looking at it yourself. Electric Monks believed things for you, thus saving you what was becoming an increasingly onerous task, that of believing all the things the world expected you to believe.
- from "Dirk Gently's Holistic Detective Agency"
Had to change to Electric Mentor....
What are we even doing.
The possibility was already a topic in the series "Mozart in the Jungle", where they made a robot which supposedly finished Mozart's Requiem.
That's basically "The Washing Machine Tragedy" by Stanisław Lem in a nutshell.
Not a single engineer has ever been laid off because of AI. Any company claiming this is the case is trying to cover up bad decisions.
"We're automating with AI" sounds better to investors than "We overhired and now need to downsize" or "We made some bad market bets and now need to free up cash flow".
> Not a single engineer has ever been laid off because of AI. Any company claiming this is the case is trying to cover up bad decisions.
I don't suppose these assertions are based on anything. If "AI" reduces the amount of time an engineer spends writing crud, boilerplate, test cases, random scripts, etc., and they have 5% more time to do other things, then all else being equal a project can be done with 5% fewer engineers.
Does AI result in greater productivity for engineers, and does greater productivity per person mean demand can be satisfied with fewer people?
Are you insane??? Big tech has literally made some of the biggest layoffs ever over the past few months.
https://old.reddit.com/r/wallstreetbets/comments/1oz6gjp/new...
You’re spot on that people should think of who is on the other side of the trades they’re taking, and be extremely paranoid of being adversely selected.
Disallowing people from making terrible trades seems…paternalistic? Idk
https://kalshi.com/markets/kxminajmention/nicki-minaj/kxmina...
The mention markets are pure degenerate gambling and everyone involved knows that
None of whom were forced by anyone to place bets in the first place.
For the kind of person playing these sorts of games, that's actually really "hype".
Basically these tools can get you to a 100k LOC project without much effort, but it's not going to be a serious product. A serious product still requires understanding.
The initial prompt was, in case people don't want to log in:
Make a turn based chess like game. Instead of normal chess board use an hexagonal grid. Make the board diagonal shaped. Instead of traditional chess pieces we are going to use spaceship designs. Each spaceship has unique abilities that influence the board or their own skill. For 2 players, turn based. Show me what you got.
Standard Context(≤ 200K tokens)
Input $2.00 vs $1.25 (Gemini 3 pro input is 60% more expensive vs 2.5)
Output $12.00 vs $10.00 (Gemini 3 pro output is 20% more expensive vs 2.5)
Long Context(> 200K tokens)
Input $4.00 vs $2.50 (same +60%)
Output $18.00 vs $15.00 (same +20%)
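The percentages above follow directly from the listed per-million-token prices. Here is a minimal Python sketch of that arithmetic, using only the figures quoted in this comment; the dictionary layout and the function name `percent_increase` are illustrative and not taken from any Google API or price sheet.

```python
# Prices in USD per 1M tokens, as quoted above: (2.5 price, Gemini 3 Pro price).
PRICES = {
    ("standard", "input"): (1.25, 2.00),
    ("standard", "output"): (10.00, 12.00),
    ("long", "input"): (2.50, 4.00),
    ("long", "output"): (15.00, 18.00),
}


def percent_increase(old: float, new: float) -> float:
    """Return the relative price increase from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for (context, direction), (old, new) in PRICES.items():
        increase = percent_increase(old, new)
        print(f"{context:8s} {direction:6s} ${old:.2f} -> ${new:.2f} (+{increase:.0f}%)")
```

Both context tiers come out the same because the long-context prices scale input and output by the same ratios as the standard tier.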
"Failed to generate content, quota exceeded: you have reached the limit of requests today for this model. Please try again tomorrow."
"You've reached your rate limit. Please try again later."
Update: as of 3:33 PM UTC, Tuesday, November 18, 2025, it seems to be enabled.
I reckon it's an API key thing... you can more explicitly select a "paid API key" in AI Studio now.
when i signed up for billing via cloud console and entered my credit card, i got $300 "free credits".
i haven't thrown a difficult problem at gemini 3 pro yet, but i'm sure i got to see it in some of the A/B tests in aistudio for a while. i could not tell which model was clearly better, one was always more succinct and i liked its "style" but they usually offered about the same solution.
Gemini 3:
The cognitive dissonance in this thread is staggering.
We are sitting here cheering for a model that effectively closes the loop on Google’s total information dominance, while simultaneously training our own replacements.
Two things in this thread should be terrifying, yet are being glossed over in favor of "look at the pelican SVG":
1. *The Model Card Leak:* As user `rvz` pointed out, the leaked model card implies training on "user data... in accordance with Google's relevant terms of service." Given the recent terms updates, this essentially confirms that Gemini 3 is likely refined on the corpus of Gmail and Drive data that they’ve been hoarding for decades. We are cheering for a product sold back to us at a 60% markup (input costs up to $2.00/M) that was built on our own private correspondence.
2. *The "Agentic" Trap:* Everyone praising how `Gemini 3` solves complex bugs one-shot or handles entire PRs is missing the macro view. We aren’t looking at a productivity tool; we are looking at the devaluation of the software engineer. When the entry barrier drops to "can you prompt a seminal agent," the economic value of what we do on this forum evaporates.
Google has successfully gamified us into feeding the very beast that will make the "14-minute human solve time" (referenced by `lairv`) irrelevant. We are optimizing for our own obsolescence while paying a monopoly rent to do it.
Why is the sentiment here "Wow, cool clock widget" instead of "We just handed the keys to the kingdom to the biggest ad-tech surveillance machine in history"?
That feels like something between a hallucination and an intentional fallacy that popped up because you specifically said "intense discussion". The increase is 60% on input tokens from the old model, but it's not a markup, and especially not "sold back to us at X markup".
I've seen more and more of these kinds of hallucinations as these models seem to be RL'd to not be sycophants: they're slowly inching in the opposite direction, where they tell small fibs or embellish in a way that seems meant to add more weight to their answers.
I wonder if it's a form of reward hacking, since it trades being maximally accurate for being confident, and that might result in better rewards than being accurate and precise.
Nonetheless, Gemini 3 failed this test. It failed to start a discussion. Its points were shallow and too AI-esque.
Looking at it again it's actually a completely nonsensical sentence that just happens to resemble a sensible statement in a way that would fool most people.
RL is definitely showing some busting seams at this point.
Also, can you guess which pelican SVG was gemini 3 vs 2.5? https://xcancel.com/xundecidability/status/19908113191723213...
> Eliezer Yudkowsky is a central figure in the article, mentioned multiple times as the intellectual originator of the community from which the "Zizians" splintered. His ideas and organizations are foundational to the entire narrative.
The story isn't about Yudkowsky. At each level of summarization you have to make hard decisions about what to keep. Not every story about the United States needs to mention George Washington.
Asking the follow up "what are ALL the individuals mentioned in the story" results in both models listing ~40 names and both of those lists include Yudkowsky.
> The training dataset also includes: publicly available datasets that are readily downloadable; data obtained by crawlers; licensed data obtained via commercial licensing agreements; user data (i.e., data collected from users of Google products and services to train AI models, along with user interactions with the model) in accordance with Google’s relevant terms of service, privacy policy, service-specific policies, and pursuant to user controls, where appropriate; other datasets that Google acquires or generates in the course of its business operations, or directly from its workforce; and AI-generated synthetic data.
So your Gmail messages are being read by Gemini and put into the training set for future models. Oh dear, and Google is being sued over using Gemini to analyze users' data, which potentially includes Gmail messages, by default.
Where is the outrage?
[0] https://web.archive.org/web/20251118111103/https://storage.g...
[1] https://www.yahoo.com/news/articles/google-sued-over-gemini-...
"gmail being read by gemini" does NOT mean "gemini is trained on your private gmail correspondence". it can mean gemini loads your emails into a session context so it can answer questions about your mail, which is quite different.
That said, LLMs are the most data-greedy technology of all time, and it wouldn't surprise me that companies building them feel so much pressure to top each other they "sidestep" their own TOSes. There are plenty of signals they are already changing their terms to train when previously they said they wouldn't--see Anthropic's update in August regarding Claude Code.
If anyone ever starts caring about privacy again, this might be a way to bring down the crazy AI capex / tech valuations. It is probably possible, if you are a sufficiently funded and motivated actor, to tease out evidence of training data that shouldn't be there based on a vendor's TOS. There is already evidence some IP owners (like NYT) have done this for copyright claims, but you could get a lot more pitchforks out if it turns out Jane Doe's HIPAA-protected information in an email was trained on.
Google is betting that moving email and cloud is such a giant hassle that almost no one will do it, and ditching YT and Maps is just impossible.
> in accordance with Google’s relevant terms of service, privacy policy
> The default FPS it's analyzing video at is 1
Source?
"By default 1 frame per second (FPS) is sampled from the video."
Maybe they are keeping that itself secret, but more likely they probably just have had humans generate an enormous number of examples, and then synthetically build on that.
No benchmark is safe, when this much money is on the line.
> When you think about divulging this information that has been helpful to your competitors, in retrospect is it like, "Yeah, we'd still do it," or would you be like, "Ah, we didn't realize how big a deal transformer was. We should have kept it indoors." How do you think about that?
> Some things we think are super critical we might not publish. Some things we think are really interesting but important for improving our products; We'll get them out into our products and then make a decision.
If they had found something fundamentally new, I doubt they would've snuck it into Gemini 3. Probably would cook on it longer and release something truly mindblowing. Or, you know, just take over the world with their new omniscient ASI :)
I would cancel my ChatGPT sub immediately if Gemini had a desktop app, and I may still do so and live without the desktop app if it continues to impress me as much as it has so far.
It's really, really, really good so far. Wow.
Note that I haven't tried it for coding yet!
I completely understand the appeal of having local and offline applications, but the ChatGPT desktop app doesn't work without an internet connection anyways. Is it just the convenience? Why is a dedicated desktop app so much better than just opening a browser tab or even using a PWA?
Also, have you looked into open-webui or Msty or other provider-agnostic LLM desktop apps? I personally use Msty with Gemini 2.5 Pro for complex tasks and Cerebras GLM 4.6 for fast tasks.
(1) The ability to add context via a local app's integration with OS-level resources is big. With Claude, e.g., I hit Option-SPC, which brings up a prompt bar. From there, taking a screenshot that will get sent with my prompt is as simple as dragging a bounding box. This is great. Beyond that, I can add my own MCP connectors and give my desktop app direct access to relevant context in a way that doesn't work via a web UI. It may also be inconvenient to give context to a web UI in some cases where, e.g., I have a folder of PDFs I want it to be able to reference.
(2) Its own icon that I can CMD-TAB to is so much nicer. Maybe that works with a PWA? Not really sure.
(3) Even if I can't use an LLM when offline, having access to my chats for context has been repeatedly valuable to me.
I haven't looked at provider-agnostic apps and, TBH, would be wary of them.
Good point. I can see why integrated support for local filesystem tools would be useful, even though I prefer manually uploading specific files to avoid polluting the context with irrelevant info.
> Its own icon that I can CMD-TAB to is so much nicer
Fair enough. I personally prefer Firefox's tab organization to my OS's window organization, but I can see how separating the LLM into its own window would be helpful.
> having access to my chats for context has been repeatedly valuable to me.
I didn't at all consider this. Point ceded.
> I haven't looked at provider-agnostic apps and, TBH, would be wary of them.
Interesting. Why? Is it security? The ones I've listed are open source and auditable. I'm confident that they won't steal my API keys. Msty has a lot of advanced functionality that I haven't seen in other interfaces like allowing you to compare responses between different LLMs, export the entire conversation to Markdown, and edit the LLM's response to manage context. It also sidesteps the problem of '[provider] doesn't have a desktop app' because you can use any provider API.
"Users"? Or people that get presented with it and ignore it?
I'd be comfortable calling a 'user' anyone who clicked to expand the little summary. Not sure what else you'd call them.
Normal users (i.e. not grumpy techies ;) ) probably just go with the flow rather than finding it irritating.
Cringe. To get to 2 billion a month they must be counting anyone who sees an AI overview as a user. They should just go ahead and claim the "most quickly adopted product in history" as well.
It is good at writing something from scratch (like spitting out its training set).
Claude is still superior for programming and debugging. Gemini is better at daily life questions and creative writing.
For example, in a basic python script that uses os.path.exists, it forgets the basic "import os", and then, "I apologize for the oversight".
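The failure mode described here is easy to picture. A minimal sketch follows; the function name `config_exists` and the example path are made up for illustration. Without the import on the first line, the call to `os.path.exists` raises a `NameError` the first time `os` is used.

```python
import os  # the line the model reportedly left out; without it, `os` is undefined


def config_exists(path: str) -> bool:
    """Return True if `path` exists on the local filesystem."""
    return os.path.exists(path)


if __name__ == "__main__":
    # Hypothetical path, used only to exercise the function.
    print(config_exists("/tmp/example.cfg"))
```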
Input: $1.25 -> $2.00 (per 1M tokens)
Output: $10.00 -> $12.00
Squeezes a bit more margin out of app layer companies, certainly, but there's a good chance that for tasks that really require a sota model it can be more than justified.
If we're paying more for a more petaflop heavy model, it makes sense that costs would go up. What really would concern me is if companies start ratcheting prices up for models with the same level of performance. My hope is raw hardware costs and OSS releases keep a lid on the margin pressure.
Do regular users know how to disable AI Overviews, if they don't love them?
If it's good enough to be useful on your code base, it better be good enough to instruct you on how to use it.
How easy it is depends on whether or not they've built that kind of thing in
Antigravity is an IDE you install.
Colab is a place to run notebooks in the cloud.
Thanks for sorting all this out! Still exploring the first two, so I really don't know.
Very good. Nice work! These things will definitely change lives.
Good at vibe coding 10:30 - step change where it's actually useful
AGI still 5-10 years. Needs reasoning, memory, world models.
Is it a bubble? - Partly 22:00
What's fun to do with Gemini to show the relatives? Suggested taking a selfie with the app and having it edit. 24:00 (I tried and said make me younger. Worked pretty well.)
Also interesting - apparently they are doing an agent to go through your email inbox and propose replies automatically 4:00. I could see that getting some use.
Is that all? So they just need to invent:
1. Thought
2. A mechanism for efficiently encoding and decoding arbitrary percepts
3. A formal model of the world
And then the existing large language models can handle the rest.
Yep, 5 years and a hundred billion dollars or so should do the trick.
Amazing and hilarious
Gemini 3 worked much better and I actually committed the changes that it created. I don't mean it's revolutionary or anything, but it provided a nice summary of my request and created a decent, simple solution. Sonnet had created a bunch of overarching changes that I would not even bother reviewing. Seems nice. Will probably use it for 2 weeks until someone else releases a 1.0001x better model.
OpenAI is going for the consumer-grade AI market, as opposed to a company like Anthropic making a specialized developer tool. Google can inject their AI tool in front of everybody in the world, and already have with Google AI search. All of these models are just going to reach parity eventually, but Google is burning cash compared to OpenAI burning debt. It seems like for consumer-grade purposes, AI use will just be free sooner or later (DeepSeek is free, Google AI search is free, students can get Gemini Pro for free for a year already). So all I'm seeing that OpenAI has is Sora, which seems like a business loser though I don't really understand it, and also ChatGPT seems to own the market of people roleplaying with chat bots as companions (which doesn't really seem like a multi-trillion dollar business but I could be wrong).
But I am looking forward to trying it out. I find Gemini to be great at handling large-context tasks, and Google's inference costs seem to be among the cheapest.
Given the nature of how those models work, you don't need exact replicas.
2026: cure cancer
> gemini-3-pro-preview-ais-applets
> gemini-3-pro-preview
Glad to see Google still can't get out of its own way.
Come on, you can’t be serious.
Can the mathematicians here try asking it to invent novel math related to [Insert your field of specialization] and see if it comes up with something new and useful?
Try lowering the temperature, use SymPy etc.
What are you doing to help kill bash?
import decimal

def solve_kangaroo_limit():
    # Set precision to handle the "digits different from six" requirement
    decimal.getcontext().prec = 50

    # For U(0,1), H(x) approaches 2x + 2/3 very rapidly (exponential decay of error)
    # At x = 10^6, the value is indistinguishable from the asymptote
    x = 10**6
    limit_value = decimal.Decimal(2) * x + decimal.Decimal(2) / decimal.Decimal(3)
    print(f"H({x}) ≈ {limit_value}")
    # Output: 2000000.66666666666666666666...

if __name__ == "__main__":
    solve_kangaroo_limit()
....p.s. for airheads=idiots: """decimal.Decimal(2) / decimal.Decimal(3)""" == 0.6666666666666666666666666666666666666666666666666666666666666666666666666 ... This is your Fukingly 'smart' computer???
Everyone is talking about the release of Gemini 3. The benchmark scores are incredible. But as we know in the AI world, paper stats don't always translate to production performance on all tasks.
We decided to put Gemini 3 through its paces on some standard Vision Language Model (VLM) tasks – specifically simple image detection and processing.
The result? It struggled where I didn't expect it to.
Surprisingly, VLM Run's Orion (https://chat.vlm.run/) significantly outperformed Gemini 3 on these specific visual tasks. While the industry chases the "biggest" model, it’s a good reminder that specialized agents like Orion are often punching way above their weight class in practical applications.
Has anyone else noticed a gap between Gemini 3's benchmarks and its VLM capabilities?