Something I yearn for is a way to quickly learn about recent advances in niche topics I'm not yet familiar with, mostly so I can quickly push forward or rule out ideas of my own. I feel like it should be possible to make an LLM aware of new papers on arXiv, or at least of abstracts collected from journals, either via online fine-tuning or by feeding it summarized context from articles matched by some other method. Essentially, I want something that, given a prompt asking for recent advances in a certain field, spits out papers that may be worth checking, ideally with a quick summary relating each one to the prompt (this could go beyond the abstract and pull out an idea from the main body).
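To make the "match articles via some other method, then hand them to the LLM as context" variant a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the function names are made up, the records in `PAPERS` are placeholders rather than real papers, and plain bag-of-words cosine similarity stands in for whatever matching method (most likely embeddings) a real system would use. The LLM call itself is left out; the sketch only builds the context that would be passed to it.

```python
import math
import re
from collections import Counter

# Made-up placeholder records, NOT real papers. A real setup would load
# abstracts from wherever they are collected.
PAPERS = [
    {"title": "Placeholder paper A",
     "abstract": "graph neural networks for predicting molecular properties"},
    {"title": "Placeholder paper B",
     "abstract": "sea surface temperature trends from satellite records"},
    {"title": "Placeholder paper C",
     "abstract": "anomaly detection in sensor time series for maintenance"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_abstracts(prompt, papers, top_k=3):
    """Return up to top_k (score, title, abstract) tuples, best match first."""
    query = Counter(tokenize(prompt))
    scored = []
    for paper in papers:
        score = cosine(query, Counter(tokenize(paper["abstract"])))
        if score > 0:  # drop papers that share no words with the prompt
            scored.append((score, paper["title"], paper["abstract"]))
    return sorted(scored, reverse=True)[:top_k]


def build_context(prompt, papers, top_k=3):
    """Assemble the text that would be handed to an LLM for summarizing."""
    lines = [f"Question: {prompt}", "Candidate papers:"]
    for score, title, abstract in rank_abstracts(prompt, papers, top_k):
        lines.append(f"- {title} (similarity {score:.2f}): {abstract}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_context("graph neural networks molecular", PAPERS))
```

The part I don't know how to do well is everything around this: keeping the collection fresh, matching on meaning rather than shared words, and getting summaries that actually relate the paper to the prompt.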
Right now my workflow for this problem is mostly finding some sort of seed paper, and juggling abstracts on something like connectedpapers.
Does anyone know what the state of the art for this is, and whether something useful already exists or is being worked on in the space?
For a university data visualisation course I need to create insightful visualisation designs for climate (change) data. Apart from the usual meteorological office, World Bank, etc. data that is easy to find, does anyone know of interesting datasets that relate to the topic? These could also be datasets that measure second-order effects over a long time, or that, viewed alongside other resources, paint an interesting picture.
Many times I've found interesting links in HN comments, so I thought I'd ask and perhaps hit a great source here - thanks!
I'm a Data Scientist who has worked for some time in predictive maintenance, specifically for the chemical industry. I'm going back to university for a Master's degree in Data Science, but I already want to broaden my horizons and look for interesting fields to work in.
I would like to do something meaningful and would like to find out what's possible perhaps in green tech, education, or other fields I'm not aware of. Making the chemical industry greener is obviously something with a potentially great effect, but it's very difficult to get datasets one could develop a product with.
Any ideas?
Most solutions seem to expose an API per model, but I think in my case this is unnecessary: in principle, each model will be called every hour (or other predefined frequency), and its prediction stored in a database. From there the user/dashboard will access the predictions as well as the time series they come from.
I could deploy a Docker container per model, but given that 99.9% of the time there is nothing to do but wait for new values, I'm thinking there could be a smarter way: something along the lines of a queue from which the models are executed at every timestep.
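As a rough illustration of the queue idea (not a deployment recipe), here is a minimal Python sketch using only the standard library. All names are hypothetical: the models are stand-in functions, the `predictions` table layout is an assumption, and an in-memory SQLite database stands in for the real one. Each queue entry is `(due_time, model_name, interval)`; a single worker pops whatever is due, stores the prediction together with the model version, and reschedules the job.

```python
import heapq
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical predictions table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE predictions (model TEXT, version TEXT, ts INTEGER, value REAL)"
    )
    return db


def run_due_jobs(queue, models, db, now):
    """Run every job due at or before `now`, store its prediction, reschedule it.

    queue  -- heap of (due_time, model_name, interval_seconds) tuples
    models -- dict mapping model_name to (version, predict_function)
    Returns the number of predictions written.
    """
    ran = 0
    while queue and queue[0][0] <= now:
        due, name, interval = heapq.heappop(queue)
        version, predict = models[name]
        value = predict(due)
        db.execute(
            "INSERT INTO predictions VALUES (?, ?, ?, ?)",
            (name, version, due, value),
        )
        # Put the job back on the queue for its next timestep.
        heapq.heappush(queue, (due + interval, name, interval))
        ran += 1
    db.commit()
    return ran


if __name__ == "__main__":
    # Stand-in models: one runs hourly, one every two hours.
    models = {
        "pump_a": ("v1", lambda ts: 0.5),
        "pump_b": ("v2", lambda ts: ts / 3600),
    }
    queue = [(0, "pump_a", 3600), (0, "pump_b", 7200)]
    heapq.heapify(queue)
    db = make_db()
    print(run_due_jobs(queue, models, db, now=3600), "predictions stored")
```

On its own this is still a single process, so it does nothing for the single-point-of-failure concern; the queue and the workers would each need to be something managed or replicated.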
I do want to avoid a single point of failure, need to be able to version and manage the models after deployment, and need to be able to deploy this in the Azure ecosystem. Does anyone have experience doing anything like this? I'm looking into things like seldon.io and Databricks, which look like they can provide what I need, but I suspect they may be overkill.