Dive into Deep Learning (opens in new tab)

(d2l.ai)

528 pointssoohyung6y ago89 comments

89 comments

58 comments · 9 top-level

fareesh6y ago· 36 in thread

As an engineer I find myself in this type of situation quite often - if anyone can point me to some good resources or has any advice, I'd be quite grateful:

- Some non-technical stakeholder comes to me and says "can we solve this problem with Machine Learning?" usually it's something like "there need to be two supervisors on the factory floor at all times, and I want an email alert everytime there are less than 2 supervisors for more than 20 minutes"

- I ask for some sample footage to build a prototype and get a few very poor quality videos, at a very different standard from what I see in most of these tutorials.

- I find some pre-trained model that is able to do people detection or face detection and return bounding rectangles and download it in whatever form

- After about 30 minutes of fiddling and googling errors, I run it against the sample footage

- I get about 60% accuracy - this is no good. Where do I go from here? Keep trying different models? There are all sorts of models like YOLO and SSD and RetinaNet and YOLO2 and YOLO3.

- At some point I try a bunch of models and all of them are at best 75% good. At this point I figure I should train it with my own dataset, and so I guess I need to arrange to have this stuff labelled. In my experience stakeholders are usually willing to appoint someone to do it but they want to know how much footage they need to label and whether their team will need special training to do the labelling and after it's all done is this even going to work?

What are some effective / opinionated workflows for this part of the overall process that have worked well for you? What's a labelling tool that non-technical users can use intuitively? How good are tools/services like Mechanical Turk and Ground Truth?

This part of the process costs time and money - stakeholders, particularly managers who are non-technical tend to want an answer beforehand - "If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?". How do you handle these kinds of conversations?

I find this space fairly well-populated with ML tutorials and resources but haven't been able to find content that is focused on this part of the process.

newfeatureok6y ago

I'm somewhat surprised at the responses for this.

I believe your issue can be easily solved - have supervisors wear a distinctive color from a non-supervisor. For example let's say it's yellow.

OK so now you have yellow wearing supervisors and everyone else. To resolve the issue you have described acquire a month or so of footage, with labels per minute describing how many yellow wearing supervisors and how many people (in total) there are.

So the data you have is:

1. Yellow wearing supervisors

2. Total amount of workers on the floor

Then with this data you can train a network to do what you're describing pretty easily. Assuming there are a lot of workers on the floor, trying to do person detection or face detection would require too much data. Just have a uniform enforced and train on the colors/presence.

deepGem6y ago

This is a pragmatic and valid approach. No matter what anyone else says.

Imagine, you told a 10 YO child to do this task. Even the child would ask the same question - how do I know who is a supervisor and who is not.

Not only is face recognition hard, it is almost impossible to accomplish in a factory floor like setting. Not totally impossible but it is really really hard. Face detection is still possible but face recognition is far more computationally expensive. You'll need a shit ton of data and you'll need access to the employee database. You'll need a whole new engineering pipeline to make this happen and of course a team.

Compared to that expense and time, you are way better off getting the company to approve special vests for supes.

1 more reply

fareesh6y ago

Sorry but it was a scenario I imagined and not something that happened in reality. I can't talk about some of the real-world scenarios that I am asked to consult on, so I made up a rather poorly thought-out one.

1 more reply

mrspeaker6y ago

"Easily solved - just have them wear special clothes." Everything is easy if you can arbitrarily change the requirements!

3 more replies

SubiculumCode6y ago

This is not bad, but once in this territory, why not just add some tracking beacon to a badge?

1 more reply

Gatsky6y ago

Reminds me of this:

https://userweb.cs.txstate.edu/~br02/cs1428/ShortStoryForEng...

asiachick6y ago

Just passing on info but ANA (the airline company) has colored helmets in their maintenance to facility to distinguish supervisors (color 1) from non supervisors (color 2) and 1st year employees (colors 3) and guests (color 4). I don't know if they do any tracking.

1 more reply

voodootrucker6y ago

- We use GCP for labeling [1]

- Yolov3 is state of the art for speed. I think RetinaNet does better if you have the horse power.

- I can't recommend FastAI [2] enough for learning things to try.

- 60% on a frame by frame basis might be enough as long as you have a low false positive rate you can tell. Combine with OpenCV mean shift if you need real time.

- Start small. Show success with pre-trained models, then move on to transfer learning. Start with a small dataset. Agree on a metric beforehand.

- Use a notebook. [3] Play around, don't let it run for days then look at the result.

[1] https://cloud.google.com/ai-platform/data-labeling/docs/

[2] https://course.fast.ai/

[3] https://github.com/Mersive-Technologies/yolov3/blob/master/f...

Edit: formatting

fareesh6y ago

Thanks I will check out these resources

fxtentacle6y ago

Most AI stuff is just horribly over-hyped, so the sad truth might be that what you are seeing is the state of the art and nobody else has found a better way yet.

As a practical example, figuring out where a given pixel moves from one video frame to the next one, when working on real-world videos, the best known algorithms get about 50% of the pixels correct. With clever filtering, you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.

NVIDIA / Google / Microsoft / Amazon will tell you that you need to buy or rent more GPUs or Cloud GPU servers and do more training with more data. And there's plenty of companies in cheap labor countries offering to do your data annotation at a very reasonable rate. But both of them are just trying to sell to you. They don't care if it will solve your problem, as long as you're feeling hopeful enough to buy their stuff.

Judging from the bad results that even Google / Facebook / NVIDIA show at benchmarks, having a near-unlimited budget is still not enough to make ML work nicely.

Oh and for these image classification networks like YOLO, they have their own flavor of problems: https://www.inverse.com/article/56914-a-google-algorithm-was...

throwlaplace6y ago

>As a practical example, figuring out where a given pixel moves from one video frame to the next one, when working on real-world videos, the best known algorithms get about 50% of the pixels correct. With clever filtering, you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.

what do you mean by this? optical flow isn't really a learning problem? it's a classical problem with very good classical algorithms

https://www.mia.uni-saarland.de/Publications/brox-eccv04-of....

https://people.csail.mit.edu/celiu/OpticalFlow/

https://github.com/pathak22/pyflow

1 more reply

tel6y ago

There are a load of questions here.

> Where do I go from here? Keep trying different models?

> ...after [the labeling is] all done is this even going to work?

> [How to label]

> If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?

Generally, you're discussing the space of model improvement and refinement. This is the costliest and most dangerous part of any ML pipeline. Without good evaluation, stakeholder support, and real reason to believe that the algorithm can be improved this is just a hole to throw money into.

The short answer to most questions is that you don't really know. Generally speaking, more data will improve ML algorithm performance, especially if that data is more specific to your problem. That said, more data may not actually substantially improve performance.

You will get much more leverage by using existing systems, accepting whatever error rate you receive, and building systems and processes around these tools to play to their strengths. People have suggested asking the floor managers to wear a certain color. You could also use the probabilistic bounds implied by the accuracies you're seeing to build a system which doesn't replace manual monitoring, but augments it.

Perhaps you can emit a warning when there's a likelihood exceeding some threshold that there aren't enough people on the floor. This makes it easier for the person monitoring manually, catches the worst case scenarios, and helps improve the accuracy of the entire monitoring system.

Not only can these systems be implemented more cheaply, they will provide early wins for your stakeholders and provide groundwork for a case to invest in the actual ML. They might also reduce the problem space that you're working in to a place where you can judge accuracy better and build theories about why the models might be underperforming. This will support experiments to try out new models, augment the system with other models, or even try to fine-tune or improve the models themselves for your particular situation.

In terms of software development lifecycles, it's relatively late in the game when you can afford the often nearly bottomless investment of "machine learning research". Early stages should just implement existing, simple models with minimal variation and work on refining the problem such that bigger tools can be supported down the line if the value is there.

fareesh6y ago

Thanks - this validates many of the assumptions I had about this part of the process.

It has been challenging communicating many of these realities to non-technical folks, who seem to be quite misguided about implementing these types of systems as opposed to "non-ML" systems where there is a less imperfect and more predictable idea of what's possible, how well it will work, and how much effort is required to pull it off.

2 more replies

mpfundstein6y ago

Gee, you just described my practice :-)

1 more reply

sickcodebruh6y ago

Have you tried fast.ai's Practical Deep Learning For Coders? https://course.fast.ai/ I think it's great for answering many of the exact questions you have.

I was able to answer my own versions of many of those questions after the first few video lessons. It demonstrated to us that our data is a great fit for machine learning. I didn't feel comfortable turning my experiments into something production-worthy but I feel confident enough to at least have conversations about it and sketch out a possible plan for what a contractor could work on this year.

fareesh6y ago

There seem to be a lot of courses in this space - I'll give this one a try since you're recommending it. Most of them seem to focus more on the theoretical / math aspects of stuff, which is quite interesting but I find it more interesting to implement these things and solve real-world problems.

1 more reply

ssivark6y ago

Awesome summary. Welcome to some lessons/truths (circa 2019 state of technology):

1. Deep learning (by itself) is often a shitty solution. It takes a lot of fiddling with not just the models, but also the training data — to get anything useful. Often the data generation team/effort becomes larger than the model-building effort.

2. It is hopeless to use neural networks as an end-to-end solution. This example will involve studying whether detections are correlated/independent in neighboring frames... whether information can be pooled across frames... whether you can use that to build a robust real-time of the scene of interest, etc. That will involve lots of judicious software system design using broader ideas from ML / statistical reasoning.

This is why I find it hopelessly misleading to tell people to just find tutorials with TensorFlow/Pytorch and get started. You really need to understand what’s going on to be able to build useful systems.

That’s apart from all the thorny ethical questions raised by monitoring humans.

salty_biscuits6y ago

You need to start from what sort of accuracy do you need for the task from a business perspective (including what is acceptable in terms of false positives and false negatives). Just back of the envelope stuff. You have a rough idea of the "I copied stuff other people has done rate" and the "I spent few a days mucking about rate". This stuff always follows a logistic curve with time, starting at your first rate and asymptotically going to high 90%. Use this to get a ball park estimate of how long it will take / cost. If the accuracy required is close to 100% you can probably give up straight away. For things like this that I have done in the past, a good mental model has been if it isn't worth "manually automating" the task (i.e. paying someone somewhere to watch a webcam and send the email so you always have the end product and you eventually get labeled data as a byproduct) it might not be worth trying to automate it.

Eridrus6y ago

> "If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?"

Start by labeling some data yourself. If you need to scale things up, you're going to need very clear rubrics for how things should be labeled and you're not going to be able to make them without having labeled some data yourself.

Definitely think about what the easiest form of your task is. Labeling bounding boxes is time intensive, labeling whether there are 2 or more supervisors on the floor should be a lot easier, and you can easily label a bunch of frames all at once.

You're going to need to figure out what tooling you will need for labeling, is this available out of the box, or will you need something custom?

Label X data points yourself and do some transfer learning. Label another X data points and see how much better things get.

The rough rule of thumb is performance increases logarithmically with data[1]. After you have a few points on the curve about how much better things get from more data, fit a logarithmic curve and make a prediction of how much data you will need, though be prepared that you might be off by a factor of 10.

As others have mentioned, it's worth thinking about false positive/negative tradeoffs and how much you care about either.

If the numbers you're extrapolating to aren't satisfactory, then yeah, you need to keep messing around with your training until you bend the curve enough that it seems like you'll get there with labeled data.

[1] https://ai.googleblog.com/2017/07/revisiting-unreasonable-ef...

carbocation6y ago

60% on a per-frame basis might be enough if all you need to do is identify the condition "two supervisors are not on the floor" for at least 20 minutes.

As in, if you compute your per-frame score and compare it over bigger chunks of time, is it sufficiently different when 2 are on the floor and 2 are not?

fareesh6y ago

I wrote random numbers for the sake of narrating a scenario but yeah I suppose you could do Supervisor Present Y/N for 180 frame chunks @30fps and pick up the value per minute

dna_polymerase6y ago

For annotation, check out Prodigy [0].

Generally speaking, as classification systems themselves are pretty dumb there isn't really a way to know what architecture will work best for your task, other than trial and error. Of course you can optimize parameters in a less chaotic way (grid-search or AutoML). In my experience it mostly boils down to data. Try augmentation methods, acquiring more data or transfer learning with varying degrees of layer relearning.

[0]: https://prodi.gy/

fareesh6y ago

Looks great, will give it a try. I'm assuming I can just host this somewhere and send users the link.

rckoepke6y ago

Honestly, I'd recommend trying Google's AutoML. I'm not a shill or employee at all, I've just had really great luck with it identifying my cats (each by name) with only a small amount (couple hundred, low thousands) labeled frames.

In my case it probably used transfer learning on like a resnet-150 or inception or something. Regardless, it approaches the limits of what an expert in machine learning can accomplish, so you'll know very quickly whether you need higher quality video / yellow vests.

zmmmmm6y ago

The shocking thing that at least I ran into is the sheer quantity of training data you really need. The large companies doing this successfully are using utterly gigantic libraries of training data that are beyond anything others could ever come up with. It really brought home to me what a blunt intstrument deep learning really is.

fareesh6y ago

Is there some kind of rule of thumb for a minimum of how much data is needed for various types of problems?

1 more reply

jointpdf6y ago

I would recommend CVAT for annotation (for images/video): https://github.com/opencv/cvat

In general, annotating data for object detection or segmentation tends to be very hard to do effectively—expect low quality and inconsistent labels.

lowdose6y ago

It's about agile development. I would really like a write up how Google for example reduced the energy consumption in their data centers. I have a hunch 30% energy reduction was based on an insight on the specific causal relationship between demand and supply flows. This kicked-off a development sprint and eventually lead to a major energy reduction. A traditional waterfall project planning starting with a requirement to reduce 30% energy collapses before it starts.

nutanc6y ago

I would do the following:

- Manually scan through a couple of hours of data and setup a human baseline.

- Run standard algorithms and find their accuracy.

- Find errors in the model and analyze why the errors are happening. Is the model classifying some other object as a supervisor? Is the model not classifying the supervisor in certain lighting conditions or scenarios.

- Retrain the model with the failure scenarios so that it learns.

devit6y ago

Seems much easier and accurate to have the supervisors install an app on their mobile phones that checks whether the phones move (and thus being carried by a person) via accelerometer and whether they are on the factory floor via wi-fi/bluetooth beacons and reports to a central server.

In general, it's much better to not use machine learning at all if at all possible.

joshvm6y ago

At this stage the sibling comment to use GCP is pretty solid recommendation.

You can use Google for labelling (Mechancial Turk style), and AutoML Vision to train your model. It's going to be a bit pricey, but cheaper than your time to do the equivalent and will give you an educated guess at how much work it'll be to beat it. It costs about $100 to train a cloud vision model, I think (not including labelling)? You can also try the API for free to see how well Google does at finding people, they have better off the shelf models than you can get publicly.

https://cloud.google.com/vision/automl/docs/

You can try exploiting other things. Is your scene static? Try using frame differences as a feature. If it's a fixed environment then you should get a boost when fine tuning a model, versus some general person detector. COCO pretrained models should be quite good at finding people out of the box.

I wrote my own labelling tool specifically for Yolo which you may find useful (ie you label your data and export to a train-ready format): https://github.com/jveitchmichaelis/deeplabel

People who are not experienced are usually terrible at tagging images. They're not consistent, they miss objects and they don't understand why it's an issue. It will be faster to pay an "expert" service like mechanical turk, or do it yourself.

Basically a lot of your questions are open research problems. How much data do you need? Not a clue. It depends how your model is failing, which is always worth checking anyway. Figure out what the model is bad at and try and improve it, it should be doable to figure out where that 25% is going.

You should do better with a model like Faster-RCNN or its ilk. AutoML will do something like this, and you can try Facebook’s Detectron2 toolkit, or the Tensorflow Object Detection API.

Detecting unique people is a hard problem, by the way (eg two people versus the same person detected twice). You're better off just using an established method like RFID tags for presence/absence.

Another sibling made a great point. Don't detect people, train a model to output the number of people in the frame. This is how ML is applied to camera trap data with animals. In your case you can reduce this to a binary classification problem - >= 2 people, positive output.

proc06y ago

Isn't this a human learning problem? Just tell your supervisors to be aware of their counterpart on the floor, at all times?

meritt6y ago

Why is this a machine learning problem? Does your factory not have keycard access? Or just require your supervisors to carry some sort of RFID/BLE tracking device. These are well-solved problems.

fareesh6y ago

I had to lie about specifics in the example because I post under my real name and there are things I can't talk about :)

Apologies - I figured the primary intent of my comment - i.e. the questions at the end, would be the focus of most responses

throwlaplace6y ago

>What's a labelling tool that non-technical users can use intuitively?

i haven't used it but microsoft has this

https://github.com/microsoft/VoTT

>"If we spend all this time and money labelling footage, how well is the going to work?"

"not well at all because we don't have facebook/google scale training data. let's try to figure out a conventional way to do it". for the supervisors problem i would recommend bluetooth beacons.

hogFeast6y ago

Probably other ways to do this. RFID/GPS/something that tracks people going through a doorway...or get a better camera...I am really not sure why this needs ML though, this is not a new problem and you are taking a nuke to it when a good ol hammer works just fine (I know this isn't what you asked, I don't care...this kind of illogic cuts companies to death little by little).

throwlaplace6y ago· 5 in thread

this looks pretty good. certainly much better than goodfellow's deep learning book. definitely much the diagrams and code are much appreciated but i'm curious why mxnet over pytorch?

sanjose3216y ago

I find this comment amusing, have you read the goodfellow's book? That book is amazing.

hnarayanan6y ago

I suppose you and I have very different notions of the word 'amazing'.

throwlaplace6y ago

i read the first half of it very closely and skimmed the second.

bensaccount6y ago

all the authors look to be Amazon employees and I think MXNet is Amazon's "chosen" DL framework.

throwlaplace6y ago

ah that makes sense. should've googled author's names. i just assumed they were academic because of the large number of unis using the book.

1 more reply

dragandj6y ago· 4 in thread

I'll chip in with my book, which is written with programmers in mind, implements everything from scratch, works on CPU and GPU, at great speed. Directly links theory to implementation, and you can use it along with Goodfellow's Deep Learning book. Also, discusses all steps, and does not skip gradients by using autograd.

Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL, DNNL, Java, and Clojure.

https://aiprobook.com/deep-learning-for-programmers/

vga8056y ago

And, what makes me want to dive into this the most, there's some Clojure! Will definitely have to take a look a this one. Thanks.

dragandj6y ago

There's lots of Clojure! (in relative terms. In absolute terms, there's not much of it because Clojure is so concise and powerful that everything is implemented with very little code :)

mpfundstein6y ago

Is there a print version (in the planning)? I usually don’t buy ebooks

dragandj6y ago

Only a limited hand-crafted hardcover edition is planned. That being said, you can print a dead tree version from the PDF at your local printing shop (or at home) if you care about the text, and not that much about binding.

2 more replies

whoevercares6y ago· 2 in thread

Does MxNet as a DL framework still have a place given Pytorch/tensorflow pretty much dominated all use cases?Amazon/AWS still “officially” supported it but given its product driven culture it could replace it with whatever framework that move faster and is more demanded by customers. Vendor Lock-in in this case probably won’t work as well since Amazon is not quite a leader in this case

samcodes6y ago

MXNet existed before AWS picked it, and it has a lot of strengths. I’d use it (especially with Gluon) over TF any day. But that said, PyTorch is usually easy to use on AWS... the preference for MXNet seems weak

thatsenough6y ago

It existed at CMU, but it seems like even CMU has moved over to PyTorch. I think Amazon just doesn't want to seem like an "also ran" by conceding to one of its competitor's frameworks.

whoisnnamdi6y ago· 1 in thread

Great guide - though unless I missed it I think this is missing the latest advancements around Transformers, BERT, ELMo, etc.

This stuff is pretty fresh, so it's understandable, but the NLP chapter would be greatly enhanced by covering these newer topics

enitihas6y ago

Is there any book which has more than a passing mention of BERT?

bor1000036y ago· 1 in thread

Has anyone read this book ? It look very attractive but I want to hear some feedback before bookmarking another ML book.

lindbergh6y ago

Kinda did, but mostly the first chapters, actually up to CNN chapter (where real modern DL start). But so far, I really liked what I read. It has a very good blend of code and theory, with hands on applications throughout the whole book. Most importantly, all those applications could perfectly be copy pasted into your own environment. So it actually reminded me of a very thorough tutorial on a framework, more say than a regular textbook, although the authors don't compromise on mathematical arguments (but don't get lost in it either, they skimmed pretty fast on regularization theory imho). If you've had previous exposure to classical ML, I think it's a fantastic introduction to DL, enough to get started.

sanxiyn6y ago

See also Dive into Deep Learning Compiler from the same team: http://tvm.d2l.ai/

dang6y ago

Discussed a year ago: https://news.ycombinator.com/item?id=18838808

kolleykibber6y ago

RFID at the doors?

j / k navigate · click thread line to collapse

89 comments

58 comments · 9 top-level

fareesh6y ago· 36 in thread

As an engineer I find myself in this type of situation quite often - if anyone can point me to some good resources or has any advice, I'd be quite grateful:

- I ask for some sample footage to build a prototype and get a few very poor quality videos, at a very different standard from what I see in most of these tutorials.

- I find some pre-trained model that is able to do people detection or face detection and return bounding rectangles and download it in whatever form

- After about 30 minutes of fiddling and googling errors, I run it against the sample footage

- I get about 60% accuracy - this is no good. Where do I go from here? Keep trying different models? There are all sorts of models like YOLO and SSD and RetinaNet and YOLO2 and YOLO3.

I find this space fairly well-populated with ML tutorials and resources but haven't been able to find content that is focused on this part of the process.

newfeatureok6y ago

I'm somewhat surprised at the responses for this.

I believe your issue can be easily solved - have supervisors wear a distinctive color from a non-supervisor. For example let's say it's yellow.

So the data you have is:

1. Yellow wearing supervisors

2. Total amount of workers on the floor

deepGem6y ago

This is a pragmatic and valid approach. No matter what anyone else says.

Imagine, you told a 10 YO child to do this task. Even the child would ask the same question - how do I know who is a supervisor and who is not.

Compared to that expense and time, you are way better off getting the company to approve special vests for supes.

1 more reply

fareesh6y ago

1 more reply

mrspeaker6y ago

"Easily solved - just have them wear special clothes." Everything is easy if you can arbitrarily change the requirements!

3 more replies

SubiculumCode6y ago

This is not bad, but once in this territory, why not just add some tracking beacon to a badge?

1 more reply

Gatsky6y ago

Reminds me of this:

https://userweb.cs.txstate.edu/~br02/cs1428/ShortStoryForEng...

asiachick6y ago

1 more reply

voodootrucker6y ago

- We use GCP for labeling [1]

- Yolov3 is state of the art for speed. I think RetinaNet does better if you have the horse power.

- I can't recommend FastAI [2] enough for learning things to try.

- 60% on a frame by frame basis might be enough as long as you have a low false positive rate you can tell. Combine with OpenCV mean shift if you need real time.

- Start small. Show success with pre-trained models, then move on to transfer learning. Start with a small dataset. Agree on a metric beforehand.

- Use a notebook. [3] Play around, don't let it run for days then look at the result.

[1] https://cloud.google.com/ai-platform/data-labeling/docs/

[2] https://course.fast.ai/

[3] https://github.com/Mersive-Technologies/yolov3/blob/master/f...

Edit: formatting

fareesh6y ago

Thanks I will check out these resources

fxtentacle6y ago

Most AI stuff is just horribly over-hyped, so the sad truth might be that what you are seeing is the state of the art and nobody else has found a better way yet.

Judging from the bad results that even Google / Facebook / NVIDIA show at benchmarks, having a near-unlimited budget is still not enough to make ML work nicely.

Oh and for these image classification networks like YOLO, they have their own flavor of problems: https://www.inverse.com/article/56914-a-google-algorithm-was...

throwlaplace6y ago

what do you mean by this? optical flow isn't really a learning problem? it's a classical problem with very good classical algorithms

https://www.mia.uni-saarland.de/Publications/brox-eccv04-of....

https://people.csail.mit.edu/celiu/OpticalFlow/

https://github.com/pathak22/pyflow

1 more reply

tel6y ago

There are a load of questions here.

> Where do I go from here? Keep trying different models?

> ...after [the labeling is] all done is this even going to work?

> [How to label]

> If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?

fareesh6y ago

Thanks - this validates many of the assumptions I had about this part of the process.

2 more replies

mpfundstein6y ago

Gee, you just described my practice :-)

1 more reply

sickcodebruh6y ago

Have you tried fast.ai's Practical Deep Learning For Coders? https://course.fast.ai/ I think it's great for answering many of the exact questions you have.

fareesh6y ago

1 more reply

ssivark6y ago

Awesome summary. Welcome to some lessons/truths (circa 2019 state of technology):

That’s apart from all the thorny ethical questions raised by monitoring humans.

salty_biscuits6y ago

Eridrus6y ago

> "If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?"

You're going to need to figure out what tooling you will need for labeling, is this available out of the box, or will you need something custom?

Label X data points yourself and do some transfer learning. Label another X data points and see how much better things get.

As others have mentioned, it's worth thinking about false positive/negative tradeoffs and how much you care about either.

[1] https://ai.googleblog.com/2017/07/revisiting-unreasonable-ef...

carbocation6y ago

60% on a per-frame basis might be enough if all you need to do is identify the condition "two supervisors are not on the floor" for at least 20 minutes.

As in, if you compute your per-frame score and compare it over bigger chunks of time, is it sufficiently different when 2 are on the floor and 2 are not?

fareesh6y ago

I wrote random numbers for the sake of narrating a scenario but yeah I suppose you could do Supervisor Present Y/N for 180 frame chunks @30fps and pick up the value per minute

dna_polymerase6y ago

For annotation, check out Prodigy [0].

[0]: https://prodi.gy/

fareesh6y ago

Looks great, will give it a try. I'm assuming I can just host this somewhere and send users the link.

rckoepke6y ago

zmmmmm6y ago

fareesh6y ago

Is there some kind of rule of thumb for a minimum of how much data is needed for various types of problems?

1 more reply

jointpdf6y ago

I would recommend CVAT for annotation (for images/video): https://github.com/opencv/cvat

In general, annotating data for object detection or segmentation tends to be very hard to do effectively—expect low quality and inconsistent labels.

lowdose6y ago

nutanc6y ago

I would do the following:

- Manually scan through a couple of hours of data and setup a human baseline.

- Run standard algorithms and find their accuracy.

- Retrain the model with the failure scenarios so that it learns.

devit6y ago

In general, it's much better to not use machine learning at all if at all possible.

joshvm6y ago

At this stage the sibling comment to use GCP is pretty solid recommendation.

https://cloud.google.com/vision/automl/docs/

I wrote my own labelling tool specifically for Yolo which you may find useful (ie you label your data and export to a train-ready format): https://github.com/jveitchmichaelis/deeplabel

You should do better with a model like Faster-RCNN or its ilk. AutoML will do something like this, and you can try Facebook’s Detectron2 toolkit, or the Tensorflow Object Detection API.

Detecting unique people is a hard problem, by the way (eg two people versus the same person detected twice). You're better off just using an established method like RFID tags for presence/absence.

proc06y ago

Isn't this a human learning problem? Just tell your supervisors to be aware of their counterpart on the floor, at all times?

meritt6y ago

Why is this a machine learning problem? Does your factory not have keycard access? Or just require your supervisors to carry some sort of RFID/BLE tracking device. These are well-solved problems.

fareesh6y ago

I had to lie about specifics in the example because I post under my real name and there are things I can't talk about :)

Apologies - I figured the primary intent of my comment - i.e. the questions at the end, would be the focus of most responses

throwlaplace6y ago

>What's a labelling tool that non-technical users can use intuitively?

i haven't used it but microsoft has this

https://github.com/microsoft/VoTT

>"If we spend all this time and money labelling footage, how well is the going to work?"

"not well at all because we don't have facebook/google scale training data. let's try to figure out a conventional way to do it". for the supervisors problem i would recommend bluetooth beacons.

hogFeast6y ago

throwlaplace6y ago· 5 in thread

this looks pretty good. certainly much better than goodfellow's deep learning book. definitely much the diagrams and code are much appreciated but i'm curious why mxnet over pytorch?

sanjose3216y ago

I find this comment amusing, have you read the goodfellow's book? That book is amazing.

hnarayanan6y ago

I suppose you and I have very different notions of the word 'amazing'.

throwlaplace6y ago

i read the first half of it very closely and skimmed the second.

bensaccount6y ago

all the authors look to be Amazon employees and I think MXNet is Amazon's "chosen" DL framework.

throwlaplace6y ago

ah that makes sense. should've googled author's names. i just assumed they were academic because of the large number of unis using the book.

1 more reply

dragandj6y ago· 4 in thread

Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL, DNNL, Java, and Clojure.

https://aiprobook.com/deep-learning-for-programmers/

vga8056y ago

And, what makes me want to dive into this the most, there's some Clojure! Will definitely have to take a look a this one. Thanks.

dragandj6y ago

There's lots of Clojure! (in relative terms. In absolute terms, there's not much of it because Clojure is so concise and powerful that everything is implemented with very little code :)

mpfundstein6y ago

Is there a print version (in the planning)? I usually don’t buy ebooks

dragandj6y ago

2 more replies

whoevercares6y ago· 2 in thread

samcodes6y ago

thatsenough6y ago

It existed at CMU, but it seems like even CMU has moved over to PyTorch. I think Amazon just doesn't want to seem like an "also ran" by conceding to one of its competitor's frameworks.

whoisnnamdi6y ago· 1 in thread

Great guide - though unless I missed it I think this is missing the latest advancements around Transformers, BERT, ELMo, etc.

This stuff is pretty fresh, so it's understandable, but the NLP chapter would be greatly enhanced by covering these newer topics

enitihas6y ago

Is there any book which has more than a passing mention of BERT?

bor1000036y ago· 1 in thread

Has anyone read this book ? It look very attractive but I want to hear some feedback before bookmarking another ML book.

lindbergh6y ago

sanxiyn6y ago

See also Dive into Deep Learning Compiler from the same team: http://tvm.d2l.ai/

dang6y ago

Discussed a year ago: https://news.ycombinator.com/item?id=18838808

kolleykibber6y ago

RFID at the doors?

j / k navigate · click thread line to collapse