I almost wish there were less precise and more obscure options out there today: a search engine that purposefully didn't index mainstream news, social media, or shopping/product sites.
Filters, refinement, user preferences, etc. Search is a powerful tool, and that power has been growing exponentially since 1995. But almost all of that power is channeled into instant results, intuitiveness, "sane defaults" and such. All the capabilities are in the background, and the assumed user is the lowest common denominator on all fronts: low effort, low technical capability, low understanding, etc. This is most people (me included) most of the time, but not all of the time. Maybe I want to filter SEO spam (like Pinterest) more aggressively. Etc.
This isn't a slam on Google really. One thing can't do everything, and what Google search does is what most people need most of the time. It's definitely the thing that makes the most money. But... none of a search engine's power has been channeled into making search a power tool. It's a tool that everyone uses many times per day, but there is no learning curve. No getting better. You can't really invest effort and get rewarded for that effort.
Pity really, all this power is there. It's right under the surface. Let us be more than the lazy, dumb user sometimes.
Google have reversed the user interface paradigm. Instead of users learning the software and telling it what to do, the software learns the user. I don't necessarily mean NNs or personalisation. I mean that the paradigm is software-centric: "If the user does X, how should the software respond?" instead of "If the user wants Y, how does she make the software do it?" That's great for intuitiveness, but it also creates a frame where the software gets better over time but the user never does.
The problem with this model, though, is support. A service with 100 million untrained end users each paying a dollar will need to provide quality customer service to 100 million untrained end users. It is much easier to sell ads to 10,000 "affiliates" who feel that they have some special arrangement and thus accept some responsibility of their own, and make $10,000 off each of those.
https://greasyfork.org/en/scripts/1682-google-hit-hider-by-d...
This is really the precedent that mattered, I think. Remember one line of a poem? You can probably find the rest. Recall a Douglas Adams piece about rice farming in Bali or Java or something... you can find it. As you say, it took boolean gymnastics. But, that really just means modifying your search terms and scrolling through some results.
The fact that it was less powerful than 2020 Google put more power and responsibility in users' hands. A user needed to use a search engine like a tool.
I want what you want too... Everything Google does (mail/search/YouTube, etc.), but designed for more user effort. Finding the best-for-most result instantly is great, but sometimes you want the tool to help you find a result on page 112 within a few minutes.
Bring back a little bit of a directory feel even. Let me narrow down, refine and shape results gradually. Assume that I will rummage through a bunch of crap to find what I want.
Unfortunately that didn't scale with the size of the web; at some point the number of pages got large enough that smart ranking like Google's became more effective than the "look at all pages of results" usage that AltaVista required. It felt like a monumental transition, like when a program gets too big for main memory and the computer starts swapping to disk.
That reminded me of Architext and EWS (Excite for Web Servers)[2], but after a good bit of Googling (ahem), it seems difficult to find much about most of the mid-to-late-90s standalone local search engines any more, except perhaps Inktomi. Most are not mentioned in Wikipedia's search engine timeline[3].
1. https://en.wikipedia.org/wiki/Microsoft_Development_Center_N...
2. https://www.wired.com/1998/01/excite-moves-to-patch-search-s...
3. https://en.wikipedia.org/wiki/Timeline_of_web_search_engines
EDIT:
Using the search engine Qwant, I was able to find an original EWS (Excite for Web Servers) help page:
http://www1.udel.edu/Excite/AT-helpdoc.html
At the time, it was amazingly out of the box:
Excite for Web Servers makes it easy for you to add searching -- Excite, Inc.'s advanced concept-based searching -- to your Web site.
Excite for Web Servers provides a simple Web-browser interface for doing all the things necessary to enable concept-based searching of collections of documents -- administering, indexing, and searching over the collections.
In particular, one can:
- define a document collection -- that is, specify a set of documents to be considered a single collection over which one can search,
- design customized pages for displaying to users who wish to search over that collection,
- index that collection, monitoring the progress, and search the collection.
With Excite for Web Servers, it's easy to set up concept-based-searchable Web sites in minutes.
I distinctly remember someone suggesting it to me at our CS lab in college. A few of us had never heard of it, and we all started to do some searches to try it out. There was silence for about 5 minutes, and then someone said "this is really good."
Say what you want about what Google has turned into, but it was an incredibly important tool that came around at the right time.
Also, it makes me really happy that they've kept the "I'm Feeling Lucky" button around.
On top of that, as you say, anything that isn’t from a huge web property or corporate site is so heavily penalized you’re lucky to ever see it even if it is new.
The utility of google as a web search engine is definitely declining over time.
This 2000 New Yorker article does a pretty good job:
https://www.newyorker.com/magazine/2000/05/29/search-and-dep...
It was actually how I discovered Google. I was a grad student at the time (English literature) and dreamed of accessing all the primary and secondary sources on a particular topic from the convenience of my own keyboard (still a pipe dream today). I cycled through every search engine I could find looking for one that would actually return results relevant to my search terms. I remember AltaVista being touted as one of the best but still failing to satisfy.
As soon as I read this New Yorker article, I checked out google.com and had that same eureka experience: finally, this is the one.
Google worked surprisingly well, obviously they were going to be hugely important (unless another player appeared and did a big next leap), and obviously they were smart. (On the side, I also got warm-fuzzies that the search hits I got seemed to have a strong Linux bias.)
The grad student who asked, coincidentally, ended up deciding not to finish his PhD, and went to Google. :)
But I also remember just how blown away I was when I first saw AltaVista do its thing. AltaVista was as amazing in 1995 as Google was in 2000. Innovation goes in waves.
Using AltaVista for me meant digging through page after page of results. Comprehensiveness was the main concern. It was up to the user to evaluate the relevance of the results.
Early Google made similar claims about comprehensively searching millions of pages, but as we know today, they are intent on inferring meaning and purpose. They actively discourage and prevent users from combing through page after page of results. User evaluation (i.e., intelligence) is not expected. Google attempts to evaluate results for the user based on popularity, originally estimated primarily by counting backlinks. Popularity as a filter is useful sometimes but deeply flawed at others. It's arguable Google has dulled, atrophied or stunted development of web users' analytical skills. When it first appeared on the web, Google had no paid placements and no advertising. What was not to like? They later abandoned their original mission to avoid the influence of paid placement. They became beholden to advertising.
It was not difficult to see when and where the influence of advertising came into AltaVista. However when this started to happen at Google, Google tried to hide the ads by making them text-only. As if the influence was not there.
We need another AltaVista, where user evaluation of results is allowed and encouraged, with a mission statement like the original Page and Brin paper announcing Google: no influence by advertising. Ultimately, PageRank depended on human discretion: the decision whether or not to link to another page. We soon learned that this discretion, this choice to link or not to link, is easily driven by money once people know it affects PageRank. Google quickly got gamed, and it has been trying to pretend it can manage this ever since.
We need non-commercial search.
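For anyone who hasn't seen why a link decision carries so much weight, here is a minimal power-iteration sketch of textbook PageRank. It assumes the commonly cited damping factor of 0.85 and a made-up four-page web; Google's production ranking obviously involves far more than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Pages with no outlinks ("dangling" pages) spread their rank evenly.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link passes an equal slice of this page's rank to each target.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "hub", which links back to "a".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Pages nobody links to ("b" and "c") are stuck at the baseline jump share, which is exactly why buying or trading links became so profitable.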
Google's aversion to zero-result queries is a problem, and possibly the problem, with their "accuracy," so to speak.
I was interning at Xerox PARC (adjacent to Stanford campus) in summer 2000. As far as I remember, I used AltaVista at the time.
Somebody asked a question, and one of the researchers/intern mentors said to "Google it"! I think there were some puzzled looks, but we tried it, and I started using Google and never went back.
When I returned to college in the fall, I remember my former housemate saying how good Google was. She had started using it too. Once people started using it, they never stopped!
It was my first real job at a large company and taught me a lot about working in corporate America.
I saw so many mistakes made in the year I worked there, mistakes that were obvious even at the time to a lot of us who worked there. At the same time, there are many similarities to what happens in other very well-funded projects trying to make sense of a new technology and a new way of doing business within a large, very important company with a very different business model.
I have seen virtually the exact same playbook play out in the enterprise blockchain space on multiple occasions over the last 5 years.
It is sad in many ways to see what happened to DEC (probably more so than AltaVista). It was such an innovative company back in the 60s and 70s, but unlike IBM weren't able to reinvent themselves in first the new 80s world of PCs and then later internet. A classic case of the innovator's dilemma.
AltaVista itself largely died because, in a misguided attempt to manage the innovator's dilemma, they just tried to rebrand everything network-oriented they had as AltaVista.
People only remember the search engine now, and for good reason. But we had AltaVista firewalls, gigabit routers, network cards, mail servers (both SMTP and X.400 (!!!)), and a bunch of other junk without a coherent strategy. Everything that had anything to do with networking got the AltaVista logo on it.
The focus became on selling their existing junk using the now hip AltaVista brand, but the AltaVista search itself was not given priority.
I learnt a lot from my experience there, grew to be extremely skeptical, learnt to love Dilbert, and also learnt how cool the DEC hardware and Digital Unix were compared to the Sun SPARC and Solaris stuff I had to work on afterwards.
In my opinion, marketing wasn't the problem; the problem was fast-following a disruptive technology without understanding how it worked. Once details of PageRank were published it was too late.
No, this wasn't it. DEC actually had quite a good PC business, which died when everybody's did, killed off by the Asian manufacturers.
DEC also had plenty of technical stuff as well as cash flow to ride it out.
DEC's demise was purely a result of executive-level and board-level malfeasance.
After the board forced the founder Ken Olsen (who wasn't a great CEO but actually did have vision) out, Robert Palmer (who had no vision AND was incompetent) (and not the singer--the singer probably would have been better as CEO) was given marching orders to sell off the company. Which he did--with no vision whatsoever. Lots of people tried to fight against it, but any division which started righting itself immediately got flogged off.
The patent lawsuit allowed DEC to jettison a bunch of its fab capacity to Intel, which then made it an attractive target for Compaq.
People love to comment that the merger killed COMPAQ when the reality was the entire US domestic PC industry was completely collapsing.
HOWEVER, to give you an idea as to how badly the "hostile giveaway" was managed, Compaq effectively bought DEC for less than their enterprise service annual revenue--about $2 billion per year. HP later milked this stream for more than a decade.
So, in the middle of a PC industry collapse, no executive could figure out how to convert a $2 billion annual revenue stream for enterprise services plus a whole bunch of leading edge technology into a profitable company.
This shows just how shit-tastic the executive management for both DEC and COMPAQ really were.
But, hey, the DEC board got their stock bump and cashed out.
Wow. Mind blown.
I think we're all familiar with disruption, but Innovator's Dilemma poses a theory or framework as to why it happens. It's rather brilliant and seems to fit every case of market disruption I can think of.
Incumbents serve existing markets and don't care about the small markets served by disruptive startups. They're making small, iterative improvements to their product.
Disruptive tech eventually hits parabolic improvement and by this time the incumbent can't catch up.
https://en.m.wikipedia.org/wiki/The_Innovator%27s_Dilemma
The competing S-curve idea is brilliant.
Thanks for sharing your anecdote and helping me learn something fundamental today.
It's more power than any one company should ever have.
There was a time when Google refused to filter results through human intervention or for political reasons, but that time has long passed.
I'd love to hear more about this, if you are willing to elaborate.
I'm curious about the relation to corporate blockchain.
I think it's the idea of companies seeing new tech, trying to leverage it somehow in their own businesses, and failing - badly.
Here's my own anecdotal evidence with blockchain:
I work in a very large health care company. For about four months, there was a huge buzz around the company about how to leverage block chain in health care. The main idea was using block chain to manage patient accounts and PPE.
We had all the execs brought in to do huge presentations. They brought in people from IBM to talk about Hyperledger, along with other block chain companies. They posted videos and articles about how this was going to transform healthcare; we were all told this was going to be huge. They told people they were going to form a new team and hire developers, and that this was going to be a huge focus in 2020.
Six months later? You couldn't find a single resource on any of it on any of the internal company sites. All the presentations stopped, the execs stopped talking about block chain seemingly overnight, and poof! The idea of block chain, any mention of it, and the "revolution" that was supposed to follow completely disappeared into the abyss, never to be heard from again.
I have no idea how much they sank into the notion that block chain could be used for health care, or how many people they hired or the contracts they signed with IBM, but I can only assume they lost a lot of money before they finally realized it wasn't going to work out.
They should have merged with a PC maker, obviously.
> but unlike IBM weren't able to reinvent themselves in first the new 80s world of PCs and then later internet.
Ironically, IBM did fail to reinvent itself despite inventing the PC, and its cake was stolen from it.
DEC was acquired by Compaq [1]:
> In 1998, Compaq acquired Digital Equipment Corporation for a then-industry record of US$9 billion. The merger made Compaq, at the time, the world's second largest computer maker in the world in terms of revenue behind IBM.
Maybe you were being sarcastic.
For the first decade of the web, there were a handful of search engines competing, rising and falling in popularity. The best were AltaVista and FAST.
One thing that was noticeable back then was that bad search engines (and search engines that 'jumped the shark' and became bad) generally did so in similar ways:
a) they included paid results, or devoted too much real-estate to advertising
b) when they failed to find results, they tried to trick the user by showing related results (e.g., omitting or substituting terms)
c) they avoided 'logical and' for search terms, in favor of 'logical or', making it difficult for users to search with precision.
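The difference in point (c) shows up clearly on a toy inverted index: intersecting posting lists (AND) narrows results as terms are added, while taking their union (OR) only ever widens them. This is a minimal sketch with invented documents, not how any particular engine stored its index.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_and(index, terms):
    """Documents containing every term (posting-list intersection)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, terms):
    """Documents containing at least one term (posting-list union)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


# Invented documents.
docs = {
    1: "alpha workstation running digital unix",
    2: "sparc workstation running solaris",
    3: "alpha particles and radiation",
    4: "digital camera reviews",
}
index = build_index(docs)
query = ["alpha", "unix"]
print("AND:", sorted(search_and(index, query)))  # prints AND: [1]
print("OR: ", sorted(search_or(index, query)))   # prints OR:  [1, 3]
```

An empty AND result is precisely the situation in (b), where a weaker engine would quietly fall back to looser matching instead of telling you nothing matched.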
The people at Google surely believe their recent changes have nothing to do with all that. As far as I'm concerned, aside from the extra millions of dollars they've spent on AI research, it's the same old story. Nobody needs a somewhat smarter version of AskJeeves.
Google 2020 : Here is the most SEO-gamed review site from 2015
Right now the best paying job many people with unspecialized skills can get is "tricking people into clicking things they shouldn't." Google is sorely taxed trying to keep up with the antics of a million people whose career is trying to game Google. Early Google was better because the people really desperate for money couldn't even afford to get online. That was a glaring inequity that doubled as a crude spam filter [1]. I think about this every time a real live person telephones me on behalf of "Windows Support."
[1] This is a large part of what I miss when I'm pining for the early Web. Practically everyone publishing online then had to be either more affluent than average or cleverer than average to get into Club Web. People contributing on the early Web were almost all financially situated independent of what they were contributing, so Web participation was almost all done out of passion rather than financial desperation. Authors didn't worry about how to get paid for what they wrote online and readers didn't worry about how to support their favorite sites either. People in the club were understood to have other means of sustenance. If you didn't, you wouldn't be in the club in the first place!
> Nobody needs a somewhat smarter version of AskJeeves.
That's exactly what I need. It's not right for everything, but asking questions is the natural form of information seeking for a human. Being able to do that well is a huge value add.
nobody wants the kind of 'smarter AskJeeves' that technology is currently capable of producing.
Had I written that, the next thing you know, someone would be calling me an idiot and informing me about the wonders of GPT3 and Tesla autopilot :)
Interestingly, it didn’t just boil down to a Quora/StackOverflow model; it wasn’t a “wisdom of crowds” thing. Instead, your question really was used as a search query — but instead of searching a pool of documents, it would search a pool of experts, matching you with an expert who knows about similar things†, then facilitating contact with them (and forwarding them your initial query/question to start off the conversation, like a Helpdesk system.)
† Not sure how they did this part — for academic experts, they could “just” fulltext-index their corpus of published journal papers, to build up a “knowledge fingerprint” of the expert. Not sure what they would do for people in industry without a stream of publications, though.
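One way to approximate the footnote's "knowledge fingerprint" guess is to treat each expert's writing as a single document, weight terms with TF-IDF, and rank experts by cosine similarity to the question. This is a sketch of that guess only, not how Aardvark actually worked (the mechanism is unknown here); the expert names and texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(corpus):
    """Return TF-IDF weights per expert and the IDF table used to build them."""
    n = len(corpus)
    doc_freq = Counter()
    for text in corpus.values():
        doc_freq.update(set(tokenize(text)))
    # Smoothed IDF: terms every expert uses still get weight 1.
    idf = {term: math.log(n / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {}
    for name, text in corpus.items():
        tf = Counter(tokenize(text))
        vectors[name] = {t: count * idf[t] for t, count in tf.items()}
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_experts(corpus, question):
    """Rank experts by similarity between their fingerprint and the question."""
    vectors, idf = tfidf_vectors(corpus)
    # Question terms no expert ever used carry no weight.
    q_tf = Counter(tokenize(question))
    q_vec = {t: c * idf[t] for t, c in q_tf.items() if t in idf}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical fingerprints: each expert's concatenated publications.
experts = {
    "dr_rice": "rice paddy irrigation in bali terraces and water temples",
    "dr_alpha": "alpha processor cache design and 64-bit risc pipelines",
    "dr_index": "inverted index compression and ranking for web search",
}
for name, score in rank_experts(experts, "how do search engines rank web pages"):
    print(f"{name}: {score:.3f}")
```

Note the plain whitespace tokenizer misses "rank" vs. "ranking"; a real system would at least stem terms.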
Sadly, Google bought them, shut down the Aardvark product, and probably just put the engineers on regular SRE code-slinging tasks. It almost seems like Google felt threatened. And — hint hint — nothing’s stopping anyone from building something like this again :)
There is nothing else on the contemporary horizon challenging Google in the same way in that regard on the open web, whatever bloat it now carries.
it blew everything else away pretty much immediately.
Right, I forgot people actually believe that. Um, they were a little better, maybe noticeably so to one out of a thousand people. But wow, that's not why people switch search engines in droves.
The real reason, aside from their gift for self-promotion (I first heard about them in a science glossy, which was rare for a web 'company'), is that they had a cute, zany name, and didn't do the three things I mentioned.
Google was far more popular because of its spartan design, than its quality, regardless of how people mythologize the company now.
It's interesting because it seems like the UX equivalent of "burning down furniture to heat the house" -- how does this kind of thing become so institutionalized at companies?
Is this merely the natural end season of the corporate life cycle, where after innovation and growth the now engorged and dying corpse must be parted out and sold by the pound? There's something so uncomfortably Darwinian to me about that. But I suppose that's also why it's common: it works.
You're in charge of revenue for a division. You give an estimate of $X for the current quarter and $Y for the next quarter; your boss changes your estimate to $1.5X and pushes it up the chain. Now there's two weeks left in the quarter and projections are that you'll only reach $1.1X, so your boss pushes you to stick more ads and make them bigger 'just for two weeks', but also reminds you that your revenue target for next quarter is $1.5Y, so maybe you should keep the big ads.
I'm sure the changes in search over the last decades helped the majority of users, but sometimes a "Google for developers" would have been nice.
I find the Google Assistant useful. I use it in Japanese, which is a huge plus, too. You can boost your language learning and practice.
I thought it was weird that "googol" was spelled incorrectly and that Google's logo was ugly even by Paint Shop Pro 4 standards. It looked like search for kids. I assumed the librarian didn't know anything about computers and dismissed her advice. Within a few months everyone was using Google.
But I'm always reading positive stuff about American public libraries: that they are not really just about borrowing books, but also free internet, photocopying, showers, and some kind of free social program to help poor people with anything information-related, like job searches or government forms.
This may be true of public libraries (at least major branches) and university libraries in the USA, but is it true even of high school libraries?
Today, everything looks like it is made for kids.
https://blog.prototypr.io/are-we-designing-for-children-an-a...
I'm sure most here have been frustrated by the difficulty of getting "good" results on searches, even with modifiers. But what most troubles me is that Google's memory/history has grown smaller and smaller, as if it has Alzheimer's: searches that used to return results now bring back none.
The web corpus is huge, which leads to follow-on problems: it's expensive to fetch, to process, and to host the resulting indexes. Fetching is also tricky because you're likely to get blocked from sites if you're too aggressive.
To justify all that expense, you need a lot of users, but it will be hard to get those users because there are 20 years of 'search = google' to compete with. Yahoo search user testing from 10+ years ago found that users would prefer search results displayed with Google branding over search results with Yahoo branding, regardless of the search results. Maybe it wouldn't be so bad if it's Google vs a new hip name, but you have to somehow cultivate that hipness. Bing doesn't have it, and Amazon tried doing web search and quit pretty quickly (but maybe it's used for Alexa?).
You'd realistically need to build out an advertising platform too. Using Google's ad platform while trying to compete with their core market seems like a bad idea. Using Microsoft's ad platform is probably not going to be a good experience, but maybe you can start with it.
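A common way to avoid getting blocked is to enforce a minimum delay between requests to the same host while letting different hosts proceed in parallel. The sketch below only computes the schedule (no actual fetching); the 2-second delay and the URLs are arbitrary assumptions, and a real crawler would also honor robots.txt, including the non-standard Crawl-delay directive some sites set.

```python
from urllib.parse import urlsplit


def polite_schedule(urls, min_delay=2.0, start=0.0):
    """Assign each URL a fetch time (seconds) so that no single host is
    contacted more than once per `min_delay` seconds, while different hosts
    can be fetched concurrently. Returns (fetch_time, url) pairs in time order.
    """
    next_free = {}  # host -> earliest time we may contact it again
    schedule = []
    for url in urls:
        host = urlsplit(url).hostname
        fetch_at = next_free.get(host, start)
        next_free[host] = fetch_at + min_delay
        schedule.append((fetch_at, url))
    return sorted(schedule)


# Hypothetical crawl frontier.
frontier = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/x",
    "http://example.com/c",
    "http://example.org/y",
]
for t, url in polite_schedule(frontier):
    print(f"t={t:4.1f}s  {url}")
```

The catch the comment hints at: politeness caps per-site throughput, so covering a huge corpus means crawling many hosts at once, which pushes up the fetching cost.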
Which would be easy enough to store in 2020, but then you need to preprocess it in a way that is amenable to both search and result ranking. But let's say your indexing is super good and matches the compressed version: 80 TiB. Throw that in EBS, and you're paying $6k/month just to store it. You also need CPU and memory to actually compute from that data, though! If we instead use i3.metal instances, you're looking at about $2,700/month each, and you'll need 15 of them for 3x replication. $40k per month isn't bad if you're a startup with VC funding. But... we also need network egress... All this just to be literally the common denominator search engine with zero users.
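The back-of-the-envelope numbers above add up like this. The dollar figures are the commenter's 2020 estimates, not current AWS prices, and egress is left out just as it is in the text.

```python
# The commenter's rough 2020 monthly estimates (USD); not current AWS list prices.
EBS_STORAGE_PER_MONTH = 6_000   # ~80 TiB of index on EBS
I3_METAL_PER_MONTH = 2_700      # one i3.metal instance
REPLICAS = 3
INSTANCES_PER_REPLICA = 5       # 15 instances / 3x replication

instances = REPLICAS * INSTANCES_PER_REPLICA
compute = instances * I3_METAL_PER_MONTH
total_before_egress = compute + EBS_STORAGE_PER_MONTH

print(f"instances: {instances}")
print(f"compute:   ${compute:,}/month")
print(f"+ storage: ${total_before_egress:,}/month (egress not included)")
```

So the "$40k per month" is essentially the compute line alone ($40,500); keeping the EBS copy as well brings it to about $46,500 before any network egress.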
So, how do your users get to you? In 2020, you have three major sources: browser searches, phone searches and direct traffic. If you want to be in the browsers, you're going to have to pay, and if you want to be default you must pay more than the incumbents who have their business model figured out. And bid at scales of roughly your Series B and C combined. Phone OSes, same deal: you need to be prepared to bid high, and in volume. Direct traffic is basically word of mouth / marketing driven, and for our common crawl search, we can assume is relatively nil. So even search traffic has an acquisition cost, and almost all of the sources run their own search engine that you would need to bid against.
So at this point you need to start thinking about revenue, because every query you get literally costs you money. We know that search engine ads work okay, since the user is clearly expressing intent. But those different users have very different values: someone searching from an iPhone or MacBook Pro is likely more valuable to advertisers than a 10-year-old Linux laptop running Firefox with Adblock and a Pi-Hole DNS server. And without traffic nobody's going to bother running campaigns on your platform.
Alternative revenue strategies seem unlikely to work -- Google is free and Bing literally pays users, so subscription seems unworkable. You could try to find a niche, the way DDG has, and perhaps chisel away at market share slowly, but you'd need some content indexed that is unavailable to competitors, and that will come at a price.
There are only 9 companies crawling and indexing the web independently: Google, Bing, Yandex, Baidu, Sogou, Mojeek, Gigablast, Naver.kr, seznam.cz
as listed here, with other search partners (of Google and Microsoft)
https://twitter.com/SearchEngineMap/lists
Disclosure: I maintain that resource and work at Mojeek
Maybe there would be more search engines in a less anti-competitive market... monopolies are hard to tackle.
https://web.archive.org/web/20010119175000/http://astalavist...
Some years later I remember AltaVista suddenly became full of paid links and ads, to the point of unusability. This is when Google came in, with no ads, no paid links, and actual good search results.
The irony... now Google fills at least half the first search page with paid-for links and unusable results.
Unfortunately, nobody (in the long run) gives away something completely for free. I would pay $1-2/month for a search portal without paid links and no sell-off of my private info.
Working on the conservative assumptions that an average person will run 10 searches a day and click on 1 search ad a day, Google will make $83.39 a month from the average user. It is likely much more though.
Still think the business opportunity for a really good paid-for search engine is there. Of course it's not trivial to make a search engine but as I feel Google's usability is in a falling trend, the bar is getting lower..
Source: https://ig.ft.com/how-much-is-your-personal-data-worth/
DDG provides almost the same results as Google, yet it feels much less efficient. I think, for the single-box-with-single-keyword-search market, Google is the best we can do, but there might still be room for other search engines.
Also, if you need historical, political or medical information, that is 3 domains where Google is already out of the game.
There is a lot of room today for a search engine that would only return technical results, and not politically or racially motivated results like Google does (Google had a project to promote races other than Whites, and thus started not returning some results depending on the race of the scientist).
I find myself using every non-Google search engine in these areas. I'm pretty lazy about search engines, but it bothered me enough.
Progress is always about moving from one thing to the next.
I don't think the UI switchover was uniquely AltaVista-- remember this was the time when everyone wanted to be a portal and had to have a section with sports scores, repackaged news, and stock tracking.
Yahoo was the last man standing on that path, but I seem to recall a lot of hay being made about the Excite/@Home stuff where the ISPs were supposed to push their portal on unsuspecting customers.
I remember when I left DEC for Apple in the mid-80's and my manager told me I was making a mistake going to work for a company making toy computers.
DEC was a true leader during the mini-computer era, but after that nope. Happens to a lot of companies.
One of Dawkins' memorable lines is "Descendants are common. Ancestors are exceptionally rare"
You could say crocodilians succeeded and dinosaurs failed. A croc is still a croc, but the dinosaurs are hummingbirds and seagulls. If you think about it though, both are ancestors... an exceptional success.
Altavista is a Khan.
Slightly off-topic, but I went to Wikipedia to remind myself of the specific category of dinosaur that birds are descended from, and the 'Today's featured article' was about Achelousaurus, a ceratopsid dinosaur! I think this is the first time I've had such a close match to the thing I was interested in. From there, it was just three clicks to the article I needed [0], which, incidentally, stated that "The present scientific consensus is that birds are a group of maniraptoran theropod dinosaurs that originated during the Mesozoic Era".
Within a day of showing us google, every kid in the class used google exclusively. They were so much better than their competition at the time.
I'd construct searches along the lines of:
(Word OR Word) AND (Word NEAR Word)
And get great results. Of course, the Web is way too big and JavaScript-y for that now.
I also love Paul Graham's framework for imagining the future and working backwards. If we think like that, Google is nowhere near the form of a final solution to information retrieval. An ideal state would be retrieving the correct information the first time, with everything you need bundled into the page. If that problem is solved, then you have to tackle the question of why the user was asking the query in the first place, and how your product can help people solve their underlying problem so that the query is never repeated!
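A query like (Word OR Word) AND (Word NEAR Word) needs a positional index, because NEAR depends on where terms occur, not just whether they occur. Here is a minimal evaluator for exactly that query shape. The 10-word window is an assumption based on how AltaVista's NEAR was commonly described; the pages are made up (echoing the Douglas Adams rice-farming search mentioned earlier).

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [word positions]} for lowercased, whitespace-split text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def docs_with(index, term):
    """Set of doc ids containing term (does not insert missing terms)."""
    return set(index.get(term.lower(), {}))


def near(index, a, b, window=10):
    """Docs where some occurrence of a is within `window` words of some b."""
    a, b = a.lower(), b.lower()
    hits = set()
    for doc_id in docs_with(index, a) & docs_with(index, b):
        if any(abs(i - j) <= window
               for i in index[a][doc_id] for j in index[b][doc_id]):
            hits.add(doc_id)
    return hits


def or_and_near(index, or_terms, near_pair, window=10):
    """Evaluate (t1 OR t2 OR ...) AND (x NEAR y)."""
    either = set().union(*(docs_with(index, t) for t in or_terms))
    x, y = near_pair
    return either & near(index, x, y, window=window)


# Made-up pages; punctuation omitted so whitespace splitting is enough.
pages = {
    1: "douglas adams on rice farming in bali",
    2: "rice farming in the mekong delta",
    3: "java coffee and rice then a long digression about many other things before finally farming",
}
index = build_positional_index(pages)
print(sorted(or_and_near(index, ["bali", "java"], ("rice", "farming"))))  # prints [1]
```

Page 2 fails the OR clause, and page 3 fails NEAR because "rice" and "farming" are 11 words apart, so only page 1 survives.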
https://www.datacenterknowledge.com/archives/2009/02/11/paix...
I ran a build cluster in the server room in the basement where Altavista used to be located. The server room was actually pretty small - just a few rows of racks. We still had a sign in our office that said "Altavista Operations". It's pretty mindblowing just thinking how small internet-scale things were back then compared to now.
Marginalising Google back then was as insane as so many other tech fads since then, including IoT, Bitcoin, XP/agile, Netbooks, 3D TVs (remember those?), and so on.
There is an upside to not having used anything Google - to this day I have zero reliance on any single product of theirs.
Then a colleague introduced me to the early version of google.com - a minimum viable product, before that phrase became popular.
Within a short period of time, without being aware of it, I almost completely stopped using AltaVista, because Google was so much better.
Memories.
If you want to find Mr. Bean and search for "Bean", you will find... beans. Type "Mr. Bean".
Only years later someone told me about google.
> As of 1998, it used 20 multi-processor machines using DEC's 64-bit Alpha processor. Together, the back-end machines had 130 GB of RAM and 500 GB of hard disk drive space,
I'm typing this on a machine with 20 threads, 64GB of RAM, and a hair over 12TB of disk.
Not useful, except for the bling factor. 3D VR search in 1998!
Any subsequent AltaVista history is pretty much an irrelevant "all over but the shouting".
FTFY.