Keyframe · 3y ago · 0 points

There's definitely something amiss. Maybe we're just not seeing the whole picture, but Google still has the best potential out there. Not only has a vast amount of fundamental research come out of their doors (and presumably there's more), but they also have their own compute resources and an up-to-date copy of internet.zip, gmail.zip and youtube.zip to train on, versus the small and stale data (compared to Google's) that OpenAI trained on, like Common Crawl. What gives, Google? Get on it!

edit: I forgot all about google_maps.zip / waze.gz and all the juicy traffic data coming from Android, which probably already relies heavily on AI.

21 comments · 5 top-level

kmeisthax · 3y ago · 8 in thread

The difference between OpenAI and Google is that the latter's ethical concerns with AI are more deeply held. Google gave us the Stochastic Parrots paper[0] - effectively a very long argument as to why they shouldn't build their own ChatGPT. OpenAI uses ethics as a handwave to justify becoming a for-profit business selling access to proprietary models through an API, citing the ability to implement user-hostile antifeatures as a deliberate prosocial benefit.

To be clear, Google does use AI. They use it so heavily that they've designed four generations of training accelerators. All the fancy knowledge graph features used to keep you from clicking anything on the SERP are powered by large language models. The only thing they didn't do is turn Google Search into a chatbot, at least not until Microsoft and OpenAI one-upped them and Google felt competitive pressure to build what they thought was garbage.

And yes, Google's customers share that belief. Remember that when Google Bard gets a fact about exoplanets wrong, it's a scandal. When Bing tries to gaslight its users into thinking that time stopped at the same time GPT-4's training did, it's funny. Bing can afford to make mistakes that Google can't, because nobody uses Bing if they want good search results. They use Bing if they can't be arsed to change the defaults[1].

[0] Or at least they did, then they fired the woman who wrote it

[1] And yes that is why Microsoft really pushes Bing and Edge hard in Windows.

tarsinge · 3y ago

It wasn't just some random fact that Bard got wrong; the error happened during their official public demo. It was a "scandal" because it showed that Google was indeed unprepared and had no better product. Not even preparing and fact-checking their demo beforehand was the cherry on top.

Ethics is a false excuse, because rushing that out shows they never cared either. It was just PR and their bluff was called.

Also, I skimmed the Stochastic Parrots paper and I’m unimpressed. I’m unfamiliar with the subject, but many points seem unproven/political rather than scientific, with a fixation on training data instead of studying emergent properties, and many opinions, notably regarding social activism. Maybe it was already discussed here on HN. Edit: found here: https://news.ycombinator.com/item?id=34382901

actionfromafar · 3y ago

Google and ethics, now that’s an oxymoron

kmeisthax · 3y ago

> I’m unfamiliar with the subject but many points seem unproven/political rather than scientific

You're exactly the kind of person Stochastic Parrots was trying to warn us about - you bought into the AI hype.

AI models are extremely sensitive to the initial statistical conditions of their dataset. A good example of this is image regurgitation in diffusion models: if you include the same image n times in the data set, it effectively gets n times the number of training epochs, and is far more likely to be memorized. Stable Diffusion's propensity to draw bad copies of the Getty Images logo is another example; there are so many watermarks and signatures in the training data that learning how to draw them measurably reduces loss. In my own AI training adventures[0], the image generator I trained loves to draw maps all the time, no matter what the prompt is, because Wikimedia Commons hosts an absolutely unconscionable number of them.
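To make the duplication point concrete, here is a minimal Python sketch that counts how often each distinct item is seen in one pass over a dataset and then drops exact duplicates. The function names and the toy byte strings standing in for image files are hypothetical, and hashing raw bytes with SHA-256 is only one assumed way to detect exact copies; it is not how any particular model's pipeline is known to work.

```python
import hashlib
from collections import Counter


def content_key(item: bytes) -> str:
    """Exact-duplicate key: SHA-256 of the raw bytes (an assumed choice)."""
    return hashlib.sha256(item).hexdigest()


def exposures_per_epoch(dataset: list[bytes]) -> Counter:
    """Count how many times each distinct item is seen in one pass."""
    return Counter(content_key(item) for item in dataset)


def deduplicate(dataset: list[bytes]) -> list[bytes]:
    """Keep the first copy of each distinct item, preserving order."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for item in dataset:
        key = content_key(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# Toy stand-ins for image files: one "map" image appears five times.
dataset = [b"map"] * 5 + [b"cat", b"dog"]

counts = exposures_per_epoch(dataset)  # the "map" item is seen 5 times per epoch
deduped = deduplicate(dataset)         # each distinct item now appears once
```

Exact-byte hashing misses near-duplicates such as resized or recompressed copies, so a real pipeline would probably need some fuzzier similarity measure; that part is left out of this sketch.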

Stochastic Parrots is arguing that we can't effectively filter five terabytes[1] of training set text for every statistical bias. Since HN is allergic to social justice language, I'll put it in terms that are more politically correct here: gradient descent is vulnerable to Sybil attacks. Because you can only scrape content written by people who are online, the terminally online will decide what the model thinks, filtered through the underpaid moderators who are censoring your political opinions on TwitBook.

Of course, OpenAI will try anyway[2]. The best they've come up with is to use RLHF to deliberately encode a center-left bias into a language model that otherwise would be about as far-right as your average /pol/ user. This has helped ChatGPT avoid the fate of, say, Microsoft's Tay; but it is just sweeping the problem under the rug.

The other main prong of Stochastic Parrots is energy usage. The reason OpenAI hasn't been outcompeted by actual open AI models is that it takes shittons of electricity and hardware to train these things. Stable Diffusion and BLOOM are the biggest open competitors to OpenAI, but they're being funded purely through burning venture capital. FOSS is sustainable because software development is cheap enough that people can do it as volunteer work. AI training is almost the opposite: extremely large capital costs that can only be recouped by the worst abuses of proprietary software.

[0] I am specifically trying to build a diffusion model trained purely on public domain images, called PD-Diffusion.

[1] No problem. We are Google. Five terabytes is so little that I've forgotten how to count that low.

[2] When filtering the dataset for DALL-E 2, OpenAI found that removing porn from the training set made the image generator's biases far worse. i.e. if you asked for a stock photo of a CEO, pre-filter DALL-E would give about 60% male, 40% female examples; post-filter DALL-E would only ever draw male CEOs.
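As a rough illustration of how a content filter can shift a dataset's balance, the Python sketch below applies different removal rates to two groups and recomputes their shares. All the numbers here (600/400 starting counts, 5% and 50% removal rates) are made up for the example and are not OpenAI's figures; `share_after_filter` is a hypothetical helper.

```python
def share_after_filter(
    counts: dict[str, int], removal_rate: dict[str, float]
) -> dict[str, float]:
    """Return each group's share of the dataset after a filter removes
    a (possibly different) fraction of every group."""
    kept = {group: n * (1.0 - removal_rate[group]) for group, n in counts.items()}
    total = sum(kept.values())
    return {group: k / total for group, k in kept.items()}


# Hypothetical starting mix: 60% / 40%.
before = {"male": 600, "female": 400}

# Hypothetical filter that wrongly removes far more of one group.
rates = {"male": 0.05, "female": 0.50}

after = share_after_filter(before, rates)
# 570 of 770 remaining items are "male": the share rises from 0.60 to about 0.74.
```

The point is only that a filter whose false positives correlate with a group changes the proportions the model trains on, even if the filter was aimed at something else entirely.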

neel8986 · 3y ago

>> To be clear, Google does use AI. They use it so heavily that they've designed four generations of training accelerators.

This, +100. Somehow there is a perception that chatbots are the only AI research or product that matters, and that every AI organisation's ability will be judged by its ability to create chatbots.

visarga · 3y ago

LLMs are the end-game for almost all NLP and CV tasks. You can freely specify the task description, input and output formats, unlike discriminative models. You don't need to retrain, don't need many examples, and most importantly - it works on tasks the developers of the LLM were not aware of at design time - "developer aware generalisation". LLMs are more like new programming languages than applications; pre-2020 neural nets were mostly applications.

binkHN · 3y ago

> ...nobody uses Bing if they want good search results.

Sadly, I think I'd argue that nobody has good search results anymore. Google's results have been SEO'd to the hilt and most of the results are blog spam garbage nowadays.

nullc · 3y ago

> The only thing they didn't do is turn Google Search into a chatbot,

No, they turned google search into what it is now.

For me, trying google bard was an instant reminder of the change in behavior in google search from 15 years ago to today.

We used to have a search where you could enter obscure flags to Linux commands and find their documentation or source code. Today we have a Google search that often only tells you how some Kardashian or recent political drama sounds like the technical term you were searching for.

GPT-4 has some of the same "excessively smart" failure modes, but it (and GPT-3.5, for that matter) is so much more useful than Bard (which hits the user with "I can't do that, Dave" 100x more often than ChatGPT's already excessive behavior) that it's a useful addition to the toolbox. Too bad the toolbox hardly includes plain search anymore.

richardw · 3y ago

OpenAI releasing imperfect products is exactly what they said they would do. We need society to understand what the state and risks are. The 6-month-wait shitstorm is what happens when society gets the merest glimmer of the potential. I applaud them for doing this rather than focusing on protecting their brand.

danans · 3y ago · 6 in thread

> gmail.zip

Despite what people often write and believe here, the access controls on PII data at Google are incredibly strict. You can't just arbitrarily train on people's personal data. I know, because when I was there, working on search backend data mining, in order to get access to anonymized search and web logs, I had to sign paperwork that essentially said I'd be taken to the cleaners if I abused the access.

> What gives, Google? Get on it

It's a very difficult decision to intentionally destabilize the space you are the leader in, for all the reasons you can imagine. In a sense, Google needed someone else with nothing to lose to shake up the space. How they execute in the new reality is yet to be seen. The biggest challenge they may have right now isn't technological, but that "ChatGPT" has become a sort of brand, like Kleenex and, well, Google.

paulkon · 3y ago

Well put, the brand awareness of ChatGPT is the biggest challenge they have now.

dmix · 3y ago

Meh, people would much prefer to type their prompts into a Google search box than open a separate GPT app. I doubt the real issue here is a marketing one. Despite ChatGPT's massive growth numbers, the market is pretty immature; it's still very much open and not yet decided.

Many markets had early leaders who got stomped by later entrants.

emilsedgh · 3y ago

I don't think it is.

I'd prioritize their problems like this:

1. LLMs don't have the lucrative business model that Google needs.

2. The quality of their language model is really lacking as of now.

You fix 1 and 2, ChatGPT's branding is nothing. Google is the biggest advertisement machine in the world and they can market the hell out of their product. Just see how Chrome gained ground on Firefox for example.

Google is still used several times more than ChatGPT, and if you resolve 1 and 2, Google will make their money and their users will have no incentive to go to ChatGPT.

Keyframe (OP) · 3y ago

You're right on both counts.

However, whatever's going on inside I still strongly believe in that company! Sometimes though it just feels like they don't themselves.

illiarian · 3y ago

> Despite what people often write and believe here, the access controls on PII data at Google are incredibly strict. You can't just arbitrarily train on people's personal data.

And yet Google is the largest online advertiser in the world. And yet, GMail used to (I don't know if it still does) push ads into people's inboxes.

I have as much belief in their PII controls as in their "Don't be evil" motto.

Lacerda69 · 3y ago

I've used Gmail since the beta and never saw an ad there. What are you talking about?

anileated · 3y ago · 2 in thread

Google could have stopped sending traffic to webmasters and pivoted to directly providing answers based on scraped data long, long ago, but Google knew webmasters would be up in arms over such a blatant bait and switch taking away their traffic and revenue.

OpenAI subverted this by riding on the “open” part of their name at first—before doing a 180-degree turn and selling out to Microsoft.

mejutoco · 3y ago

I think it is more a case of the advertisers, since there would be little opportunity to show ads.

Receiving traffic to sites is nice, especially for already highly-ranked results, but these are not the people buying the ads.

anileated · 3y ago

They could just as easily show ads in answers; the advertisers wouldn’t care. In fact, I can see how a major advertiser would prefer that an ad be shown in Google’s own trusted UI rather than on some random website next to who knows what sort of content (that motivation is behind YouTube’s “demonetization”).

arcatech · 3y ago

Google is great at technology and bad at making actual products. This all makes sense to me.

narrator · 3y ago

The announcement felt cautious and political, like they are running for technological ruler of the world rather than being a company trying to make money. This is probably why they are not going to get very far against their competitors despite having so much potential. They care too much about what the EU and governments everywhere think of them now. They are no longer a profit-making entity that disrupts and pushes the rules. They are part of maintaining the status quo.
