Ask HN: Where do you find interesting papers to read?

11 points · Obertr 3y ago · 12 comments

I'd like to start reading a lot more academic papers on NLP and LLMs, but I'm not sure where to look for interesting ones. It feels like there is an overload of them because ChatGPT makes them so easy to generate.

My main source right now is Twitter: the arXiv links retweeted most by the people I follow.

My favourite ones are:

https://twitter.com/arxiv_cs_cl

https://twitter.com/papers_daily

Where do you mainly find good papers?

PaulHoule 3y ago · 2 in thread

I am working on a smart RSS reader that collects about 1000 articles on a good day from various sources, including CS papers from arXiv. Most days it selects about 300 articles (summary only) that I browse through in a TikTok-like interface (I judge one article at a time, so I get valid negatives, unlike the typical “learning to rank” problem). I can favorite an article to retrieve it later, say I like it to see more like it in the future without saving it, or say I dislike it.

It is powered by transformer models and sbert.net; these are used to assign articles to 20 clusters generated daily, and I see the top 15 from each cluster. This does a reasonable job of handling a diverse feed that includes CS abstracts, trade publication articles, sports news, etc. I have high satisfaction on days when the system gets a lot of articles (it peaks on Thursday) but less on the weekends, when I sometimes backfill high-scoring articles from the previous week.
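The "top 15 from each cluster" step could look roughly like the sketch below. It assumes the embedding, clustering and scoring have already happened elsewhere; the function name, the tuple layout and the toy data are all made up for illustration and are not taken from the actual reader.

```python
from collections import defaultdict


def top_per_cluster(articles, per_cluster=15):
    """Group (title, cluster_id, score) tuples by cluster and keep the
    highest-scoring `per_cluster` titles from each group."""
    by_cluster = defaultdict(list)
    for title, cluster_id, score in articles:
        by_cluster[cluster_id].append((score, title))

    picked = {}
    for cluster_id, items in by_cluster.items():
        # Sort by score only, best first; ties keep their input order.
        items.sort(key=lambda pair: pair[0], reverse=True)
        picked[cluster_id] = [title for _, title in items[:per_cluster]]
    return picked


# Hypothetical toy data: (title, cluster_id, predicted score).
sample = [
    ("paper A", 0, 0.91),
    ("paper B", 0, 0.40),
    ("paper C", 0, 0.75),
    ("match report", 1, 0.62),
    ("trade piece", 1, 0.15),
]
selection = top_per_cluster(sample, per_cluster=2)
```

Picking per cluster rather than globally is what stops one high-scoring topic from crowding out the rest of a mixed feed.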

I tried using fine-tuned BERT-like models for classification and got them to equal the performance of the embedding-based system after a huge amount of work and a much longer training time. My problem is pretty noisy and there is some limit to how high I can get the AUC.

Are you tracking your satisfaction somewhere?

Interested in your embedding-based system: is that an embedding layer plus a neural net?

Sounds very cool overall:)

PaulHoule 3y ago

I’ve thought about satisfaction and mood tracking (I am sure these are linked) but haven’t built anything that I really use other than my memory.

The embedding system uses a probability-calibrated SVM. My average AUC is 0.77; I hear TikTok gets in the low 0.80s, and they are using collaborative filtering. I got 0.72 with a bag-of-words and logistic regression model.
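For anyone unsure what those AUC numbers mean: AUC is the probability that a randomly chosen liked article gets a higher score than a randomly chosen disliked one. Here is a minimal pure-Python sketch of that definition. The labels and scores are invented, and in practice a library routine would do this far more efficiently than the quadratic loop here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (liked, disliked) pairs in which the liked
    item received the higher score; tied pairs count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one liked and one disliked item")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical judgements: 1 = liked, 0 = disliked, with model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)  # 5 of the 9 pairs are ranked correctly
```

On this reading, 0.5 is a coin flip and 1.0 is a perfect ranking, so 0.72 to 0.77 is a real but modest gain.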

From a product standpoint it has the disadvantage that it takes about 1000 judgements to really get good. Right now I am training over the last 40 days of data because it doesn’t really get better with more than that, which is good news because the compute and storage are nicely bounded.
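A fixed training window like that is simple to express. The sketch below assumes judgements are stored as (day, article_id, liked) tuples, which is a guess at the layout and not how the reader actually stores them; the function name is hypothetical too.

```python
from datetime import date, timedelta


def recent_judgements(judgements, today, window_days=40):
    """Keep only records whose day falls within the last `window_days`
    days, counting `today` as the most recent day in the window."""
    cutoff = today - timedelta(days=window_days)
    return [j for j in judgements if cutoff < j[0] <= today]


# Hypothetical records: (day, article_id, liked).
today = date(2023, 6, 30)
history = [
    (today, "a1", True),
    (today - timedelta(days=39), "a2", False),  # oldest day still kept
    (today - timedelta(days=40), "a3", True),   # just outside the window
]
training_set = recent_judgements(history, today)
```

Because anything older than the cutoff is ignored, the training set stops growing once the window is full, which is the bounded compute and storage mentioned above.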

bjourne 3y ago · 1 in thread

Read a random paper about LLMs and look at what it cites. Read those papers and look at what those cite. And so on. You'll soon figure out what the academic community considers the seminal papers in that field.
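If you wanted to automate that snowballing, it is a breadth-first walk over the citation graph that counts how often each reference turns up. A rough sketch follows, assuming you already have a mapping from each paper to the papers it cites; the dict, the paper names and the function name are all hypothetical, and getting real citation data is a separate problem.

```python
from collections import Counter, deque


def snowball(start, cites, max_depth=2):
    """Breadth-first walk over `cites` (paper -> list of cited papers),
    counting how many of the expanded papers cite each reference."""
    cited_by = Counter()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand papers beyond the depth limit
        for ref in cites.get(paper, []):
            cited_by[ref] += 1
            if ref not in seen:
                seen.add(ref)
                queue.append((ref, depth + 1))
    return cited_by


# Hypothetical citation graph.
cites = {
    "random LLM paper": ["paper X", "paper Y", "seminal paper"],
    "paper X": ["seminal paper"],
    "paper Y": ["seminal paper", "paper Z"],
}
counts = snowball("random LLM paper", cites, max_depth=2)
most_cited = counts.most_common(1)
```

The papers that keep reappearing near the top of the counts are the candidates for "seminal".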

This is a good approach. It is basically how all literature reviews are done in academia.

When I find a paper I'm interested in, I usually follow its citations.

The last time I was interested in a topic (tree segmentation) I used elicit.org* and found it really nice for finding new papers.

* From the FAQ:

If you ask a question, Elicit will show relevant papers and summaries of key information about those papers in an easy-to-use table.

https://www.quantamagazine.org/ and https://scholar.google.com/

tikkun 3y ago

There's paperswithcode which has a ranking of sorts.

throwaway29303 3y ago

https://paperswelove.org/

This is assuming you have access behind the research paper paywalls. Not everyone does and sci-hub doesn't always have access to recent papers.