edward_rolf on Hacker News

1

A latch-free MVCC document database/search library

(github.com)

2 points by edward_rolf 9 years ago | 0 comments

2

What is vector space?

(medium.com)

3 points by edward_rolf 9 years ago | 0 comments

3

How can an indie dev keep pace with the big five? [pdf]

(github.com)

1 point by edward_rolf 9 years ago | 0 comments

4

Show HN: Home-made database with log-structured writes

Hi, I'm Marcus. I've spent the last 15 months or so coding a document database at home, after work and on weekends (every waking hour). My plan for this particular weekend was to implement log-structured writes. It went faster than I expected, and the result was faster query times and fast, append-only writes.

I've been working on this problem for almost a decade, and being this close to the finish line feels surreal. This is my seventh iteration, I believe. Every time I realized my architecture was built on a bad model, I walked away, often in fury and depressed over my weak coding abilities, but I bounced back pretty quickly. I've gotten to a place where I'm comfortable throwing away months and months of code. Each time I started over, I started from scratch.

I did this out of love for and fascination with search and NLP, and to get good at coding. I started late in life as a professional programmer and have always felt the need to catch up with those younger than me. Today I feel like I've achieved something.

Here is a demo of a search engine [0] built on ResinDB [1]:

[0] http://searchpanels.com/ [1] https://github.com/kreeben/resin/
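The log-structured, append-only write path mentioned above can be illustrated with a minimal sketch. It is not ResinDB's actual on-disk format; it assumes length-prefixed JSON records in a single file and an in-memory dict mapping each key to the offset of its newest record. `AppendOnlyLog` and its methods are hypothetical names.

```python
import json
import os
import struct
import tempfile


class AppendOnlyLog:
    """Hypothetical log-structured document store (illustration only).

    Every put() appends a length-prefixed JSON record to the end of one file.
    An in-memory dict maps each key to the byte offset of its newest record,
    so older versions remain in the file but are no longer reachable.
    """

    HEADER = struct.Struct("<I")  # 4-byte little-endian payload length

    def __init__(self, path):
        self.index = {}  # key -> offset of the latest record for that key
        self.file = open(path, "a+b")  # writes always append; reads can seek
        self._rebuild_index()

    def _rebuild_index(self):
        # Replay the log from the beginning; a later record for a key wins.
        self.file.seek(0)
        offset = 0
        while True:
            header = self.file.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break  # end of log (or a partial header at the tail)
            (length,) = self.HEADER.unpack(header)
            payload = self.file.read(length)
            if len(payload) < length:
                break  # partially written record at the tail
            self.index[json.loads(payload)["key"]] = offset
            offset += self.HEADER.size + length
        # Drop any partial record at the tail so new appends stay aligned.
        self.file.truncate(offset)

    def put(self, key, doc):
        payload = json.dumps({"key": key, "doc": doc}).encode("utf-8")
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(self.HEADER.pack(len(payload)) + payload)
        self.file.flush()
        self.index[key] = offset  # point the key at the new version

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.file.seek(offset)
        (length,) = self.HEADER.unpack(self.file.read(self.HEADER.size))
        return json.loads(self.file.read(length))["doc"]

    def close(self):
        self.file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "docs.log")
    log = AppendOnlyLog(path)
    log.put("doc1", {"title": "first draft"})
    log.put("doc1", {"title": "second draft"})  # an update is just another append
    log.close()

    reopened = AppendOnlyLog(path)  # index is rebuilt by replaying the log
    print(reopened.get("doc1"))  # {'title': 'second draft'}
    reopened.close()
```

Because nothing is overwritten in place, every write is a sequential append; the trade-off is that stale versions accumulate, so a real store would eventually need some form of compaction.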

7 points by edward_rolf 9 years ago | 1 comment

5

The Apple of Space and Search

(medium.com)

2 points by edward_rolf 9 years ago | 0 comments

6

Show HN: Vector space/tf-idf bag-of-words implementation

(github.com)

2 points by edward_rolf 9 years ago | 0 comments

7

Portable index for storage engines

(github.com)

1 point by edward_rolf 9 years ago | 0 comments

8

Show HN: Document db with word2vec-driven scoring and Levenshtein automata

(github.com)

2 points by edward_rolf 9 years ago | 1 comment

9

David vs. Goliath

(medium.com)

1 point by edward_rolf 9 years ago | 0 comments

10

Show HN: A new, modern take on web search (like DDG but for non-geeks)

(searchpanels.com)

2 points by edward_rolf 9 years ago | 1 comment

11

Show HN: Community-driven web search with instant answers

(searchpanels.com)

1 point by edward_rolf 9 years ago | 0 comments

12

How do you crawl responsibly

(medium.com)

2 points by edward_rolf 9 years ago | 0 comments

13

Public web index slices

(medium.com)

1 point by edward_rolf 9 years ago | 0 comments

14

Ask HN: How do I crawl responsibly?

I've been developing an in-process search engine for a while. Now it's time to experiment with distributing it over many machines and serving a public GUI, but I'm wary because I have never crawled the web before.

To start with, I'm going to index as much data as I can fit on an entry-level cloud machine, and because I'm very poor I'll be asking for donations to extend the scope of the index.

Say I start with Wikipedia, Project Gutenberg and a couple of news sites. The first two will be easy: they publish dumps of their data, and I don't think Wikipedia would mind at all if I put a tiny amount of pressure on their servers for the good cause of building a free, anonymous and open web search. But what about the rest of the internet? Will they mind?

People crawl and scrape the web all the time for different purposes. I'm looking for some advice so that I don't piss anyone off with my crawler. What tools/strategies do you suggest I use?

Cheers!
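For the "what tools/strategies" question, two widely followed courtesies are honoring each site's robots.txt and spacing out requests to the same host. The sketch below shows both using Python's standard `urllib.robotparser`. The `PolitenessPolicy` class, the bot name and contact URL, and the 5-second default delay are illustrative assumptions, not advice from the thread. Hosts whose robots.txt has not been loaded are treated as off-limits.

```python
import time
import urllib.robotparser
from urllib.parse import urlsplit

# Hypothetical identity; a real crawler should name itself and give a contact URL.
USER_AGENT = "ExampleSearchBot/0.1 (+https://example.com/bot)"


class PolitenessPolicy:
    """Per-host robots.txt rules plus a minimum delay between requests."""

    def __init__(self, default_delay=5.0, clock=time.monotonic):
        self.default_delay = default_delay  # seconds between hits to one host
        self.clock = clock                  # injectable for testing
        self.robots = {}                    # host -> RobotFileParser
        self.last_hit = {}                  # host -> time of the last request

    def load_robots(self, host, robots_txt):
        # A live crawler would use set_url()/read() to fetch the file;
        # parse() accepts robots.txt text that is already in hand.
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def allowed(self, url):
        parser = self.robots.get(urlsplit(url).netloc)
        if parser is None:
            return False  # conservative: no robots.txt loaded for this host yet
        return parser.can_fetch(USER_AGENT, url)

    def delay_for(self, host):
        # Honor the site's Crawl-delay if it asks for more than our default.
        parser = self.robots.get(host)
        requested = parser.crawl_delay(USER_AGENT) if parser else None
        return max(self.default_delay, float(requested or 0))

    def wait_time(self, url):
        # Seconds to wait before this URL's host may be hit again.
        host = urlsplit(url).netloc
        last = self.last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, last + self.delay_for(host) - self.clock())

    def record_hit(self, url):
        self.last_hit[urlsplit(url).netloc] = self.clock()


if __name__ == "__main__":
    policy = PolitenessPolicy()
    policy.load_robots("example.org", "User-agent: *\nDisallow: /private/\nCrawl-delay: 10\n")
    for url in ["https://example.org/public/page", "https://example.org/private/page"]:
        print(url, policy.allowed(url))
```

A fetch loop would check `allowed()`, sleep for `wait_time()`, fetch, then call `record_hit()`. Keeping the per-host clock separate from the fetch code makes the delay logic easy to test without touching the network.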

1 point by edward_rolf 9 years ago | 2 comments

15

A binary trie

(medium.com)

2 points by edward_rolf 9 years ago | 0 comments
