Right now it has an index of ~70k conference talks / lectures / speeches. I'm working on extracting slide text and audio quality (for ranking), and on adding more historical content.
How do you scrape all that data?
My other project is an Amazon affiliate site that has lists of 4+ star Amazon programming books[1]. I'm constantly adding more books and languages to it.
Finally, I'm working on Plsm[2], my Elixir library that generates Ecto models from an existing database table. It currently supports MySQL and Postgres.
link is not working.
www is broken
A machine learning driven web application firewall
http://fsecurify.com/fwaf-machine-learning-driven-web-applic...
1. Your data is severely imbalanced, so accuracy is a very misleading metric to use here. From what I see, you have a 1:20 imbalance (malicious vs non-malicious). This both distorts the metrics and induces bias in classification.
2. I'd like to second the other comment asking you for calibration curves, and I'd also like to see what your minority class performance looks like in terms of precision, recall, F-beta, and average precision (area under the precision-recall curve).
3. Then, try and see if resampling helps or hurts the predictive performance; the answer typically speaks to the level of noise and small disjuncts in the data.
4. I see you've done a 0.2 split for test-train, but try and eliminate split bias by using stratified cross validation. This would ensure that you didn't just get lucky with random seed = 42 and get a really great test set.
All of these can be implemented using sklearn and imbalanced-learn [0]. Not included: a deeper dive into cost-sensitive and adversarial techniques. Let me know if you have any more questions and keep up the good work!
[0]: https://github.com/scikit-learn-contrib/imbalanced-learn
Source: PhD in imbalanced machine learning.
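To make points 1, 2 and 4 above concrete, here is a rough, dependency-free sketch. It is not the original author's code: the toy labels, the "always benign" baseline and the helper names minority_metrics and stratified_folds are all made up for illustration. In practice sklearn's StratifiedKFold and precision_recall_fscore_support cover the same ground.

```python
import random
from collections import defaultdict


def minority_metrics(y_true, y_pred, positive=1, beta=1.0):
    """Accuracy plus precision, recall and F-beta for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }


def stratified_folds(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Synthetic labels with a 1:20 ratio (1 = malicious, 0 = non-malicious).
labels = [1] * 50 + [0] * 1000

# A useless model that never flags anything still scores about 95% accuracy,
# while its minority-class recall is zero.
always_benign = [0] * len(labels)
baseline = minority_metrics(labels, always_benign)
print(baseline)

# Every test fold gets the same share of malicious samples, so no single
# lucky split decides the reported score.
for train_idx, test_idx in stratified_folds(labels, k=5, seed=42):
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

With a real model you would fit on each train fold, predict on the matching test fold, and report the mean and spread of the minority-class metrics across folds instead of a single accuracy number.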
One thing to add: the data is not that imbalanced. I only used 100,000 non-malicious and 50,000 malicious queries, so it's actually 2:1. I didn't use all the non-malicious queries.
Thanks again.
[0] http://scikit-learn.org/stable/modules/generated/sklearn.cal...
https://github.com/jmathai/elodie
I've been working on Elodie for over a year. It's organized over 15,000 of my personal photos and videos for me. It's also helped me craft a hands-free backup system.
I've written pretty extensively about it so I'll just link to those posts.
[0] (motivation) https://medium.com/vantage/understanding-my-need-for-an-auto...
[1] (solution) https://medium.com/@jmathai/introducing-elodie-your-personal...
[2] (adaptation for google photos) https://medium.com/swlh/my-automated-photo-workflow-using-go...
[3] (one year reflection) https://artplusmarketing.com/one-year-of-using-an-automated-...
[4] (protecting against bit rot) https://medium.com/vantage/how-to-protect-your-photos-from-b...
I wrote about it a bit here [1] under "the file system as a real time reflection of its contents".
[0] https://github.com/jmathai/elodie/issues/35
[1] https://medium.com/@jmathai/introducing-elodie-your-personal...
Nevertheless, it was an interesting read.
You mention a "few weeks" of development and testing. How many few weeks?
Well, I've run the email server that is now the second server for nearly half a year, I think. Developing the site, creating tests and binding it to the email server took maybe 2-3 weeks. It is hard to tell because I had the landing page and backend developed shortly after the first test server was set up and then finished it much later within a little more than a week (I think, I tend to underestimate these things :).
To be fair: I've used my own Rails template, which saved some time, and I've reused a lot of code for Stripe from a different project as well.
I have been doing a lot of customer development lately and I needed a tool that made it easy to reach out to large-ish groups of people whilst testing the efficacy of different messages.
I chose to build my own product because I could not find another tool that was simple enough to use (all the others were full-blown CRMs), I did not want to pay for a subscription to those other tools, and (most importantly) I wanted to use this project to learn a couple of new things (JavaScript ES6, using Google APIs, building task queues).
It's pretty rough and I built it just for myself (I don't allow signups because I took a couple security shortcuts in the design) but it works and has increased my productivity dramatically.
10/10 would code again
Dext: https://github.com/vutran/dext
Dext CLI: https://github.com/vutran/dext-cli
This is the result: http://87.is/mapspray/ . There are sample files on this page that can be shown.
The main unique feature of this project is that it handles KML file styles beyond what is preserved by, e.g., converting them into GeoJSON. Since it serves my needs, I've run out of steam on it and it will probably remain incomplete for the foreseeable future.
The hope is that this will make it a little easier to build collaborative apps. More recently, though, I have been spending some time designing a new programming language (I know, I know, we already have so many), so I haven't spent as much time on otter.
Snakepit is a Docker-enabled framework for analysis and triage of malware samples in a networked and containerized environment. It's designed for the easy addition of whatever tools you want to use. Written in Python, rarely worked on when fully sober.
We will see how far I get on that.
I decided I wanted to learn Go so I used this as a way to learn.
The social login stopped working reliably a little while back, so right now it's sort of in read-only mode.
I've started a free open alpha-test which you can check out here: