Amazon Kendra: Enterprise Search Service

(aws.amazon.com)

195 points · garysieling · 6y ago · 55 comments

aerovistae · 6y ago

Thank God. Even today it's almost always faster to use Google than a site's search bar. Maybe this will start to change that.

papito · 6y ago

Even Google's Search Appliance wasn't a game changer. Reality check - it's DEAD.

Google disrupted the market by factoring links into its algorithm, something that is rather meaningless in a proprietary context.
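To make the point about links concrete, here is a minimal sketch of the simplified textbook PageRank (power iteration with a damping factor of 0.85). It is not Google's production ranking, and the page graphs are made up for illustration. When many pages link to one hub, the hub rises to the top; on an intranet-style corpus with no links at all, every document ends up with the same score, so the signal adds nothing to relevance.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank across the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

For example, `pagerank({"a": ["home"], "b": ["home"], "c": ["home", "a"], "home": ["a"]})` ranks `"home"` highest, while `pagerank({"doc1": [], "doc2": [], "doc3": []})` gives each document 1/3.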

occamsrazorwit · 6y ago

I've heard from the industry that the issue with GSA was that it just wasn't that good. It tried to adapt Google's core search into a box, but Google's algorithm is only great because of scale. The box doesn't have as much of an opportunity to learn from user interactions and thus fails to meet expectations.

crawdog · 6y ago

The GSA "just worked". Also having the name recognition for public search did a lot to drive its enterprise popularity. It didn't have a lot of bells/whistles, but it addressed the commodity search problem well.

Agreed, relevancy is a problem. PageRank works well for public content, but internal search has plenty of relevancy problems. Having little control over this certainly hurt.

Managing on-prem hardware/appliances is a difficult business. I don't fault them for moving to a more scalable model.

Reedx · 6y ago

It's not going to change many sites at this price point though. It'll be a minimum of $1,800/mo(!) and that's only for 10k documents, 4k queries/day.

https://aws.amazon.com/kendra/pricing/
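Breaking the quoted minimum ($1,800/mo for 10k documents and 4k queries/day) into unit costs makes the price point easier to compare. A small sketch, assuming a 30-day month and a query allowance spread evenly over the day; `pricing_summary` is a hypothetical helper, not anything from the AWS pricing page.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pricing_summary(monthly_price, queries_per_day, documents, days_per_month=30):
    """Break a flat monthly price into per-query and per-document figures.

    Assumes every allowed query is actually used and that the month has
    `days_per_month` days (30 by default, an approximation).
    """
    queries_per_month = queries_per_day * days_per_month
    return {
        # Average throughput if the daily allowance is spread evenly.
        "max_avg_qps": queries_per_day / SECONDS_PER_DAY,
        "cost_per_query": monthly_price / queries_per_month,
        "cost_per_document": monthly_price / documents,
    }


summary = pricing_summary(1800, 4000, 10_000)
```

With the figures from the comment this works out to about $0.015 per query, $0.18 per document per month, and roughly 0.046 queries per second on average.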

cavisne · 6y ago

This is a trend for AWS. The building blocks (S3, EBS, EC2, Lambda, Dynamo) are priced at cost + margin, and prices tend to improve.

The more niche, higher-level services like Kendra are priced based on their value to a medium-to-large company.

They don’t expect individual developers to use this, or build anything on top of it. They expect a partner or employee of the company to do a pilot on the developer pricing, then convert to the enterprise pricing.

It’s a somewhat annoying trend but imo Google Cloud is a much worse offender here, everything new from them seems to be on prem “call sales for pricing” aimed at the enterprise.

choppaface · 6y ago

The pricing sort of makes sense for hardware:

* 3x m5.12xlarge (192GB RAM) = $6.90 / hr

* Kendra Enterprise 150GB "documents" = $7 / hr

But for the AWS offering, you get less than one mean query per second (capped per day). I would think Elasticsearch on the same hardware would offer a couple of orders of magnitude more throughput.
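The comparison is easier to read per month and per second. A rough sketch using the two hourly figures from the comment and the common 730-hours-per-month approximation. The 40,000 queries/day cap is an assumption borrowed from the preview limits quoted later in the thread, not a documented Enterprise Edition limit.

```python
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12 ≈ 730


def monthly(hourly_rate):
    """Convert an hourly price into an approximate monthly price."""
    return hourly_rate * HOURS_PER_MONTH


def mean_qps(queries_per_day):
    """Average queries per second if a daily cap is spread evenly."""
    return queries_per_day / (24 * 60 * 60)


ec2_month = monthly(6.90)     # 3x m5.12xlarge, hourly figure from the comment
kendra_month = monthly(7.00)  # Kendra Enterprise, hourly figure from the comment
kendra_qps = mean_qps(40_000) # assumed daily cap (preview limit)
```

That comes to roughly $5,037/mo for the EC2 instances versus $5,110/mo for Kendra, and a 40k/day cap averages out to about 0.46 queries per second.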

The AWS pages talk about "document scanning," so perhaps this product is poised more towards replacing an office full of humans and filing cabinets, which most definitely costs more than $7/hr. This is the gateway product to wanting Elasticsearch.

dewey · 6y ago

But that's not really what it's built for. It's an internal company search. Not a search function as a service for websites (as provided by Algolia).

crawdog · 6y ago

Interesting to see more players joining the market. You can't walk into a large enterprise and start your search conversation with "Your developers just build ____". Otherwise the customer will want to build it themselves.

The killer feature I haven't seen with many of these solutions is easy, out-of-the-box integration with internal systems (Atlassian Confluence, JIRA, Remedy, SharePoint, FileSystem, Intranet). When you have a SaaS search engine, it's difficult to export that data, and even worse to secure it. Ironically, Plumtree Software (bought by BEA -> Oracle) had all of this in their product in 2001. What's old is new again... Those features are prime for a comeback.

I think this is a space where Elastic can do well with an on-prem or managed cloud offering that is "behind the firewall", integrated with the customer's environment. Add in term vector search support, ML for document/query understanding, and integration with the customer's security model (Active Directory) and it would be compelling.

fiddlewin · 6y ago

You will need very powerful hardware to deploy the deep learning models on-prem for incremental learning.

And most of the time, while not indexing, the hardware would be sitting there sleeping. Probably not very cost-effective for enterprises.

hueving · 6y ago

> the hardware would be sitting there sleeping. Probably not very cost-effective for enterprises.

Not to be condescending, but idle hardware isn't even on the radar as far as waste goes in enterprises. An on-prem solution that is idle for 364 days of the year is completely fine for most of these companies.

For the ones that do care, that's what virtual machines and over-subscription are for.

nl · 6y ago

> You will need very powerful hardware to deploy the deep learning models on-prem for incremental learning.

This isn't true.

I've built (neural-network) vector-based search extensions for search. You don't train the model; you use a pretrained model (one that understands English in your domain) and then use it as an encoder.

Sometimes there is a once-off pretraining process for domain adaptation, but honestly this isn't a big deal. Even on a CPU-based machine you could do this overnight or over a weekend, and since it is once-off, that time doesn't really matter.
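A minimal sketch of the "pretrained model as an encoder" pattern: documents are encoded once at indexing time, the query is encoded at search time, and results are ranked by cosine similarity, with no training step anywhere. The `encode` function is a hypothetical stand-in that hashes words into a vector so the example runs without downloading a model. Unlike a real neural encoder, it only matches overlapping words, so you would swap in an actual pretrained sentence-embedding model to get semantic matching.

```python
import hashlib
import math

DIM = 256  # hypothetical embedding size


def encode(text):
    """Stand-in for a pretrained neural encoder (hypothetical).

    Hashes lowercase tokens into a fixed-size vector and L2-normalizes it,
    so the dot product of two encodings is their cosine similarity.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs):
    """Encode every document once at indexing time; no model training happens."""
    return [(doc_id, encode(text)) for doc_id, text in docs.items()]


def search(index, query, k=3):
    """Rank documents by cosine similarity to the encoded query."""
    q = encode(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc_id) for doc_id, vec in index]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]
```

Usage: `search(build_index({"vpn": "how to reset your vpn password", "lunch": "cafeteria lunch menu for this week"}), "reset vpn password", k=1)` returns `["vpn"]`.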

crawdog · 6y ago

For large (mature) enterprises, I believe at this point it's safe to expect some level of hybrid cloud architecture. I also agree it would be very difficult/impossible to support this for "realtime" indexing.

genS3 · 6y ago

Lucidworks and Sinequa are already doing that in the enterprise search space

deevin9 · 6y ago

Coveo is doing it better

whitezebra · 6y ago

Hey HN, we're building a similar product at https://evertrove.co -- we don't have the limits Kendra currently has, and integrate with a lot more services. We're still early and figuring out what the pricing structure should be, but we're making it a lot more competitive than Kendra is.

We'd love to talk to you if you're interested in using Kendra. We're also wondering if there's more value on the Question Answering side of things, or the document retrieval side of things? Would love your thoughts!

softwaredoug · 6y ago

Question answering doesn't replace search, it's a new search use case. People who ask questions want one single answer. It doesn't replace the many existing search use cases (comparing/contrasting items; known-item lookup; problem solving for lazier users w/ fewer keywords)

In fact, while I do notice people doing question answering, users are also exceedingly lazy and want even more out of a search UI with fewer keywords. I just went to an e-commerce search UI and searched for backpack, and got something closer to search-y recommendations targeted around the kinds of backpacks I might want.

jamra · 6y ago

Is this HIPAA compliant? And how do we contact you? It's a little difficult not having an email address.

evertrove · 6y ago

Sorry about that, our mistake. Please email us at founders@evertrove.co

coderunner · 6y ago

Are you guys/girls bootstrapping this? I can't find much info out there. How long have you been working on it?

technics256 · 6y ago

How do we reach you?

evertrove · 6y ago

Sorry, we've been having some trouble with our mail servers. Please email us at evertrove.search@gmail.com for now! Would love to hear your thoughts.

citilife · 6y ago

Curious how it compares to my offering:

https://insideropinion.com/

The main issue is giving access to documents, which most enterprise customers do not want to do... Further, most info is in employees' heads, not in documentation.

garysieling (OP) · 6y ago

From the demo it looked like an alternate way to search things like corporate portals. I.e. they're trying to improve the search that products like SharePoint provide with some ML integration.

citilife · 6y ago

Very similar, it also learns from context of a conversation what is likely in a document, a page, video, audio, etc. This alleviates the need to parse media.

james_s_tayler · 6y ago

So someone just needs to develop an offering that burrows into employees' heads, extracts the data, and makes it searchable across the organization.

Ninjaneered · 6y ago

Great idea and nice website!

Seems like this could integrate well with an enterprise wiki (an attempt to document what is in the employees' heads).

xfalcox · 6y ago

Damn this is a really expensive alternative to Algolia.

jpadkins · 6y ago

it's internal enterprise search, not site search. harder problem.

hueving · 6y ago

Is it? The main difference is additional connectors and access control filters. Both are not really hard from a technology standpoint.
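A minimal sketch of what a document-level access control filter does, assuming each indexed document carries the set of groups allowed to read it (for example, groups synced from a directory such as Active Directory; the `Document` type, documents, and group names here are hypothetical). Filtering after retrieval, as below, keeps the example short; a production system would normally push the filter into the query itself so that result counts and pagination don't leak the existence of hidden documents.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Groups whose members may read this document (hypothetical ACL model).
    allowed_groups: frozenset = field(default_factory=frozenset)


def keyword_search(docs, query):
    """Naive retrieval: any document sharing at least one query term."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.text.lower().split())]


def secure_search(docs, query, user_groups):
    """Drop every hit the user is not entitled to see (document-level ACL)."""
    groups = set(user_groups)
    return [d.doc_id for d in keyword_search(docs, query) if d.allowed_groups & groups]
```

Usage: with a salary document restricted to `{"hr"}` and a handbook open to `{"all-staff", "hr"}`, a user in `["all-staff"]` searching for "salary" only gets the handbook back.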

teknopurge · 6y ago

no, it really isn't. search has been solved; most of the issue is getting content connected and centrally indexed, not the actual searching of the index. (been doing this for 12 years, back when solr and FAST were en vogue)

genS3 · 6y ago

do they use the elastic fork they did a while ago?

arnocaj · 6y ago

They explicitly mention Question Answering. Could it be that they use something like BERT trained on the SQuAD dataset and fine-tuned on additional content? If so, BERT is very demanding in terms of required GPU hardware...

genS3 · 6y ago

pretty sure they use some BERT + rules + a classic search engine + a LOT of marketing kool-aid

cj · 6y ago

This is cool, but much of the functionality they're demoing isn't available in the preview. See the disclaimer at the bottom of the page:

> Kendra’s preview will not include incremental learning, query auto-completion, custom synonyms, or analytics. The preview will only offer connectors for SharePoint online, JDBC, and Amazon S3. It will be limited to a maximum of 40k queries per day, 100k documents indexed, and one index per account.

Aeolun · 6y ago

i.e. the preview is mostly useless for any enterprise...

msoad · 6y ago

I've worked on setting up Google Cloud Search. GCS is good for our use case because all of our employees use Google G Suite for email, calendar, and one-off sites. However, it took 2 years for it to be mature enough for us to actually deploy it. We're still missing connectors for some major data sources like Slack.

Hopefully Amazon moves faster and offers more out-of-the-box data sources. They are missing G Suite content, which a lot of orgs rely on these days. It would be interesting to see what their strategy is there.

tcbasche · 6y ago

GCS - not to be confused with Google Cloud Storage ;)

MediumD · 6y ago

Shameless Plug Alert

Building a similar enterprise search product at http://landria.io/ that has a lot of additional features & enhancements over a unified keyword index + ML.

We also have a terraform config if you would like to boot it up within your own private cloud!

Any feedback would be greatly appreciated

tchalla · 6y ago

This dictionary definition of Kendra (kendră) probably makes sense in this context:

kendra (IndE)

noun C

a centre for some activity (research, study, business, art, etc.)

CodeSheikh · 6y ago

So the idea is that I feed all of the content for my website to Kendra (hosted in the cloud), and whenever a user performs a search on my website, Kendra returns results to me via a REST(?) call and I display the sorted results back to the user, right? Is the index going to live locally within my ecosystem for faster retrieval, with Kendra updating it via some push mechanism? To be honest, instead of bootstrapping a Lucene/Solr-esque solution, it might not be a bad idea to ride your search on the shoulders of the Amazon AI search giant.
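The request/response flow asked about here can be sketched with the AWS SDK for Python. This assumes the boto3 `kendra` client's `query` call (`IndexId`, `QueryText`) and a response containing `ResultItems` with `Type`, `DocumentTitle`, and `DocumentURI`; check the field names against the current API reference. In this setup the index is hosted on the AWS side and the application only sends queries. The client is passed in so the function can be tried with a stub, and `kendra_titles` is a hypothetical helper name.

```python
def kendra_titles(client, index_id, query_text):
    """Send one query to a Kendra index and return (type, title, uri) tuples.

    `client` is expected to be a boto3 Kendra client (boto3.client("kendra")),
    or any object with a compatible `query(IndexId=..., QueryText=...)` method.
    """
    response = client.query(IndexId=index_id, QueryText=query_text)
    results = []
    for item in response.get("ResultItems", []):
        # Result types include answers, FAQ matches, and plain documents.
        title = item.get("DocumentTitle", {}).get("Text", "")
        results.append((item.get("Type"), title, item.get("DocumentURI")))
    return results
```

Usage against a real index would look like `kendra_titles(boto3.client("kendra"), "your-index-id", "how do I reset my vpn")`, where the index id is a placeholder. Rendering and sorting the returned list for the user stays on your side.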

davchana · 6y ago

I do not know if it is inspired by it or not, and my 2G internet is not fast enough to open this page, but the name Kendra means "center" in Hindi, with the exact same spelling.

joeAtBiome · 6y ago

Hello everyone at HN! The team @ Biome (https://www.trybiome.com) is building a unified search platform for finding and organizing internal information. Biome integrates with your existing SaaS applications (Github, Slack, etc.) to surface content no matter where it’s stored.

If you are interested in a search solution like Biome, please feel free to reach out so we can talk more and learn the best way we can empower your team to be more productive.

collsni · 6y ago

Sure are a lot of product plugs going on in the comments.

stepstep1 · 6y ago

Is Kendra just a wrapped Elasticsearch? Initial offering doesn't look like much, only thing "new" is FAQ.

hooloovoo_zoo · 6y ago

It would be cool if they added this to their Kindle e-readers so you could perform better searches of your library.

stepstep1 · 6y ago

Is Kendra just a wrapped ES? The only things novel are FAQ creation and the data sources.

lovelearning · 6y ago

Coming from a Solr/Lucene/Algolia background, my opinions on this:

What's good:

==========

- Focused search for question and answer databases (such as customer FAQs)

- ML-based semantic search without requiring any explicit configuration

- Connectors for S3, AWS-hosted MySQL/PG, SharePoint. Searching data already in the AWS ecosystem (S3, Aurora) is now easier, and likely faster and cheaper in some respects too, e.g. by saving incoming/outgoing bandwidth

- Document-level access control at all pricing plans

- Managed search (similar to Algolia)

What's similar to existing search systems (Solr / ES / Algolia):

==========

- Indexing: All data has to be processed into "field:value" structure prior to indexing

- Indexing file formats: Plain text, HTML, PDF, MS DOCX, MS PPT

- Searching: Usual boolean filters and faceting but only at field level.

- Searching: Field and value boosts for relevance, but only at index-time

- Results: Highlighting support

What's missing:

===========

- No multi-lingual support. Only English. Given that it's AWS, I'm very surprised by this actually (or I've missed something in their docs)

- Can't configure text analysis for English. I feel this'll return relevant results for formal-style content, but probably not for informal-style content like emails.

- No connectors for common internal systems: Outlook, JIRA, Confluence

- No built-in support for CSV, XLS, JSON (that one's odd!). They'll all require preprocessing which means additional infra costs.

- Doesn't seem to support range- / query- facets. I feel lack of range facets is a big problem, especially for numerical data.

- No query-time relevance tuning

- No field-level access control

- Scores are not returned in results

- Common post-searching functionality is missing: rescoring, grouping, clustering
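For anyone who hasn't used range facets: the engine buckets a numeric field of the matching documents into ranges and returns a count per bucket, which drives filters like "price: 50 to 100". A minimal sketch over a plain list of values (a hypothetical price field of the current hits; real engines compute this from the index), using half-open buckets:

```python
from bisect import bisect_right
from collections import Counter


def range_facets(values, edges):
    """Count numeric values into half-open buckets [edges[i], edges[i + 1]).

    `edges` must be sorted ascending. Values below edges[0] or at/above
    edges[-1] fall outside every bucket and are ignored.
    """
    counts = Counter()
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the bucket's lower edge
        if 0 <= i < len(edges) - 1:
            counts[(edges[i], edges[i + 1])] += 1
    # Report every bucket, including empty ones, in edge order.
    return [(lo, hi, counts[(lo, hi)]) for lo, hi in zip(edges, edges[1:])]
```

For example, `range_facets([19.99, 45.0, 75.5, 120.0, 499.0, 650.0], [0, 50, 100, 500])` returns `[(0, 50, 2), (50, 100, 1), (100, 500, 2)]`; the 650.0 value lies past the last edge.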

What's unknown:

============

- I don't see any information about phrase or proximity searches. Of course, they are usually relevance hacks in keyword-based systems, but sometimes users really need exact phrase matches. Does their ML backend handle this somehow?

- All search systems fall short when handling proper nouns: names, places, things, scientific names. It's possible to alleviate this to some extent using part-of-speech-aware indexing. Not sure if Kendra does it in its ML backend.
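In keyword engines, exact phrase and proximity matching are usually built on a positional index. A minimal sketch of that classic approach (lowercase whitespace tokens only, no stemming; the documents are made up). It says nothing about how Kendra's backend actually works. The check backtracks over candidate positions, because greedily taking the earliest next position can miss a valid match once gaps are allowed.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} using lowercase whitespace tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def _match_from(index, doc_id, terms, i, prev, slop):
    """True if terms[i:] can follow position `prev`, each within `slop` extra tokens."""
    if i == len(terms):
        return True
    for p in index[terms[i]][doc_id]:
        if prev < p <= prev + 1 + slop and _match_from(index, doc_id, terms, i + 1, p, slop):
            return True
    return False


def phrase_search(index, phrase, slop=0):
    """Doc ids containing the phrase terms in order.

    slop=0 is an exact phrase match; slop=n lets up to n other tokens sit
    between each pair of consecutive terms (a simple proximity match).
    """
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    return [
        doc_id
        for doc_id in sorted(candidates)
        if any(_match_from(index, doc_id, terms, 1, start, slop)
               for start in index[terms[0]][doc_id])
    ]
```

With documents "reset the vpn password today" (d1), "password for the vpn reset" (d2), and "vpn password reset guide" (d3), the exact phrase "vpn password" matches d1 and d3. "reset password" matches nothing exactly but matches d1 with `slop=2`.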

mlboss · 6y ago

What kind of technology might they be using for this?

vkaku · 6y ago

Where's the AWS Kitchen Sink Service? :)
