Add search to your Jekyll blog

(jzhang.io)

33 points · jameszhang · 12y ago · 17 comments

swanson · 12y ago · 3 in thread

Here's another approach: make a templated JSON file that mimics a database backend. When you generate the site (or just push to a GitHub Pages repo), the JSON file will be populated with the "search index" of your site. Something like this: https://github.com/swanson/lagom/blob/master/site.json

Then you can use whatever JavaScript you'd like to search through your pages by pulling down the JSON file into memory and slicing and dicing to your heart's content.
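To make the idea concrete, here is a minimal sketch of both halves: emitting the index at build time and filtering it once it is in memory. It is written in Python for brevity; in a real Jekyll site the index would come from a Liquid template and the search would run in the browser. The field names (`title`, `url`, `tags`) and the function names are assumptions for illustration, not the layout of the linked site.json.

```python
import json


def build_index(posts):
    """Serialize the searchable fields of each post, as a build step would."""
    return json.dumps([
        {"title": p["title"], "url": p["url"], "tags": p.get("tags", [])}
        for p in posts
    ])


def search(index_json, query):
    """Return entries whose title or tags contain the query, ignoring case."""
    needle = query.lower()
    return [
        entry for entry in json.loads(index_json)
        if needle in entry["title"].lower()
        or any(needle in tag.lower() for tag in entry["tags"])
    ]


# Hypothetical sample data standing in for the generated site.
posts = [
    {"title": "Add search to Jekyll", "url": "/search/", "tags": ["jekyll"]},
    {"title": "Bloom filters", "url": "/bloom/", "tags": ["data-structures"]},
]
index_json = build_index(posts)
hits = search(index_json, "jekyll")
```

Because the whole index is loaded up front, this only stays practical while the JSON file is small enough to download on every visit.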

derefr · 12y ago

This is really an approach that should get more attention.

People know all about database-backed websites (with an AJAX/REST server) and statically generated sites (with no server at all).

But there's a middle ground where you statically generate precomputed AJAX/REST responses to any query you could make, shove them in an S3 bucket, maybe put a CDN (e.g. Cloudflare) in front of it, and then treat that as your "server", building a traditional AJAXy frontend to talk to it. Scales nigh-on infinitely.

The best part is, it's not necessarily read-only! You can write a "real" server, too, that just handles updates. When it receives a request, it GETs the old version of the object it's modifying from S3 (no need for a database!), patches it up with the AJAXed-in data, and PUTs it back. (And it follows up with a CDN single-file-purge API call, if that's relevant.)
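A rough sketch of that GET, patch, PUT cycle, assuming JSON objects. `ObjectStore` is a hypothetical in-memory stand-in for the bucket, and `purge` is a hypothetical hook where a CDN purge call would go; neither is a real S3 or CDN API.

```python
import json


class ObjectStore:
    """In-memory stand-in for a bucket: keys map to serialized JSON bodies."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects[key]

    def put(self, key, body):
        self._objects[key] = body


def handle_update(store, key, patch, purge=lambda key: None):
    """Fetch the stored object, merge the patch into it, and write it back."""
    current = json.loads(store.get(key))   # GET the old version
    current.update(patch)                  # patch it with the submitted data
    store.put(key, json.dumps(current))    # PUT it back
    purge(key)                             # hypothetical CDN purge hook
    return current


store = ObjectStore()
store.put("posts/1.json", json.dumps({"title": "Hello", "likes": 0}))
purged = []
updated = handle_update(store, "posts/1.json", {"likes": 1}, purge=purged.append)
```

One caveat with this sketch: two updates to the same key that overlap in time can overwrite each other, since nothing here locks or versions the object.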

comice · 12y ago

Pretty much the approach taken with lunr.js (or you can store the search index itself, which is also JSON): http://lunrjs.com/

fjw · 12y ago

I like this approach a lot. If you have hundreds or thousands of posts, it would also be pretty easy to only append new posts to the JSON file instead of regenerating the entire thing.
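A small sketch of that incremental update, assuming each index entry is keyed by its URL. The entry shape and the function name are illustrative assumptions.

```python
import json


def append_new_posts(index_json, posts):
    """Add only posts whose URL is not already present in the index."""
    index = json.loads(index_json)
    known = {entry["url"] for entry in index}
    for post in posts:
        if post["url"] not in known:
            index.append({"title": post["title"], "url": post["url"]})
            known.add(post["url"])
    return json.dumps(index)


# Hypothetical existing index and the full list of posts at build time.
existing = json.dumps([{"title": "First post", "url": "/first/"}])
all_posts = [
    {"title": "First post", "url": "/first/"},
    {"title": "Second post", "url": "/second/"},
]
updated = json.loads(append_new_posts(existing, all_posts))
```

This version never picks up edits to old posts; catching those would need something like a content hash per entry.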

StavrosK · 12y ago · 2 in thread

This got me thinking: the obvious way to do full-text search in the post body would be to build an index of the form {"word": [post_id, post_id, post_id]}, mapping each word to the posts it appears in. However, this could be huge.

Does anyone know if there's a technique that uses a list of posts, each with a Bloom filter that contains all the words in that post? I.e. you iterate over all the posts and check each post's Bloom filter for membership of all the search terms.

Since the number of words is (probably) much greater than the number of posts, you just need to loop a few hundred times at most, and you gain a lot in saved space. Also, since a Bloom filter has no false negatives, you are, at least, guaranteed to find all the posts that mention the specified words (with maybe a few "junk" ones in between, but which should be easy for the reader to filter out).

You can't do weighting with this technique, but it should at least be a quick way to figure out which post IDs you want to show.

Does anything like this exist currently?

EDIT: Here's a quick proof of concept: http://nbviewer.ipython.org/gist/skorokithakis/d115ab734d9ad...

It works fine, but the filters are a bit large (2 KB each), so I'm not sure how much space you save.

EDIT 2: This was so much fun that I wrote it up: http://www.stavros.io/posts/bloom-filter-search-engine/
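For readers who don't want to open the notebook, here is an independent minimal sketch of the one-filter-per-post idea. It is not the code from the linked proof of concept. The filter is sized at 16384 bits (2 KB) to mirror the figure quoted above; the four SHA-256-based hash functions and the tokenizer are arbitrary assumptions.

```python
import hashlib
import re


class BloomFilter:
    """Fixed-size Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits=16384, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def words(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_filters(posts):
    """Build one Bloom filter per post, holding every word in its body."""
    filters = {}
    for post_id, body in posts.items():
        bloom = BloomFilter()
        for word in words(body):
            bloom.add(word)
        filters[post_id] = bloom
    return filters


def search(filters, query):
    """Return IDs of posts whose filter claims to contain every query term."""
    terms = words(query)
    return sorted(
        post_id for post_id, bloom in filters.items()
        if all(term in bloom for term in terms)
    )


# Hypothetical sample posts.
posts = {
    "jekyll-search": "Add search to your Jekyll blog",
    "bloom": "Bloom filters have no false negatives",
}
filters = build_filters(posts)
results = search(filters, "jekyll search")
```

A post that really contains every term is always returned; a post that doesn't may occasionally slip in as a false positive, which is the "junk" result mentioned above.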

hedgehog · 12y ago

Very nice. You will save some space and allow for some typos if you stem and Soundex the words before insertion. You can also save space and improve the run time somewhat if, rather than many separate Bloom filters, you build one large one where each item is post ID + word. If you do that, you can also insert each word bare so you get O(1) empty result sets, which is helpful if you're updating the results with every keystroke in a search box.
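A sketch of the single-filter variant described above, leaving out stemming and Soundex. The key format `post_id:word`, the filter size, and the hash scheme are assumptions chosen for illustration.

```python
import hashlib
import re


class BloomFilter:
    """Fixed-size Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits=65536, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def words(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_combined(posts):
    """Insert both 'post_id:word' and the bare word into one shared filter."""
    bloom = BloomFilter()
    for post_id, body in posts.items():
        for word in words(body):
            bloom.add(f"{post_id}:{word}")  # answers "is word in this post?"
            bloom.add(word)                 # answers "is word in any post?"
    return bloom


def search(bloom, post_ids, query):
    """Check bare words first, then test each post ID combined with each term."""
    terms = words(query)
    if not all(term in bloom for term in terms):
        return []  # some term appears in no post, so skip the per-post loop
    return sorted(
        post_id for post_id in post_ids
        if all(f"{post_id}:{term}" in bloom for term in terms)
    )


# Hypothetical sample posts.
posts = {
    "jekyll-search": "Add search to your Jekyll blog",
    "bloom": "Bloom filters have no false negatives",
}
combined = build_combined(posts)
results = search(combined, posts, "jekyll blog")
```

The bare-word check costs the same no matter how many posts exist, which is what makes the empty-result case cheap for search-as-you-type.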

StavrosK · 12y ago

Huh, very nice idea! That should, indeed, save a ton of space and be much simpler when searching! I'll try that now, thank you.

EDIT: Hmm, turns out it's pretty much the same size, which makes sense, I guess: http://nbviewer.ipython.org/gist/skorokithakis/0abbfebced25f...

suter · 12y ago · 1 in thread

We implemented search on our help site (http://help.simpletax.ca) using the excellent lunr.js (http://lunrjs.com) + this Jekyll plug-in (https://github.com/slashdotdash/jekyll-lunr-js-search). The search index is updated when you build your Jekyll site, so it's a piece of cake to maintain and gives you full text search. I don't know how well it would scale, but if you are in the hundreds of posts, you should be ok.

jameszhang (OP) · 12y ago

That's awesome, especially for a help site. My company does the same thing with our documentation.

paukiatwee · 12y ago · 1 in thread

My Jekyll blog uses lunr.js for full-text search with a JSON file as the backend. Check it out here: http://dreamand.me/web/fulltext-search-at-jekyll-site/ and lunr.js: https://github.com/olivernn/lunr.js

jameszhang (OP) · 12y ago

That's neat, but is something wrong? I get this error on your site: https://cloudup.com/cti8Egu8fQS

shriphani · 12y ago

How about using a lightweight TF-IDF engine like whistlepig [https://github.com/wmorgan/whistlepig]? It is implemented in ANSI C.

I had no issues writing a Racket wrapper for it using the FFI: http://blog.shriphani.com/2013/08/27/racket-whistlepig-bindi...

captn3m0 · 12y ago

Even though it's a nice hack, it's more of a CSS-based filter that can be used irrespective of Jekyll. A Jekyll-specific search should include searching the articles themselves. I've been wanting to build something similar using the excellent lunr.js. The idea would be to rebuild the index before each push to the repo, and to have a special page (search.html) that loads the index via Ajax and shows the results as you type.

Still, cool hack.

rgrieselhuber · 12y ago

Swiftype?
