Since I joined, we've gone from <1k hours to >10k hours, and I've been really excited by how much our setup has changed. I've been implementing lots of improvements across the data pipeline and the operations side. Now that we train lots of models on the data, the model results also inform how we collect it (e.g. we care a lot less about noise now that we have more data).
We're definitely still improving the whole system, but at this point we've learned a lot that I wish someone had told us when we started, so we thought we'd share it in case any of you are doing human data collection. We're also all very curious to hear any feedback from the community!
I have dreamed many times about the same story, but with Apple or Epic Games. But they have millions of human beings testing their products FOR FREE all over the world, hahahaha
But it feels eerie to read a detailed story of how they built and improved their setup and what obstacles they encountered, complete with photos, without any mention of who is doing the things we are reading about. There is no mention of the staff or even the founders anywhere on the website.
I had a hard time judging how large this project even is. The homebuilt booths and trial-and-error workflow sound like a three-person garage startup, but the booking schedule suggests a larger team.
(At least there is an author line on that blog post. I had to google the names to get some background on this company.)
You should consider an "about us" page :)
Though, I suppose if the model had LLM-like context where it kept track of brain data and speech/typing from earlier in the conversation then it could perform in-context learning to adapt to the user.
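A toy sketch of what that conditioning could look like, purely as illustration: interleave embedded EEG windows from earlier turns with their known text, then decode the current window against that context. Everything below (the dimensions, the transformer backbone, all the names) is invented, not anything from the post:

    # Hypothetical: adapt to a user in-context by conditioning on
    # earlier (EEG, text) pairs, the way an LLM conditions on its prompt.
    import torch
    import torch.nn as nn

    class InContextDecoder(nn.Module):
        def __init__(self, eeg_dim=64, d_model=128, vocab_size=256):
            super().__init__()
            self.eeg_proj = nn.Linear(eeg_dim, d_model)       # embed EEG windows
            self.tok_emb = nn.Embedding(vocab_size, d_model)  # embed known text tokens
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, ctx_eeg, ctx_tokens, cur_eeg):
            # ctx_eeg: (B, T_ctx, eeg_dim) with ctx_tokens: (B, T_ctx) from
            # earlier in the conversation; cur_eeg: (B, T_cur, eeg_dim) to decode.
            ctx = self.eeg_proj(ctx_eeg) + self.tok_emb(ctx_tokens)
            seq = torch.cat([ctx, self.eeg_proj(cur_eeg)], dim=1)
            h = self.encoder(seq)
            return self.head(h[:, ctx.size(1):])  # logits for the current window only

    model = InContextDecoder()
    logits = model(torch.randn(1, 10, 64), torch.randint(0, 256, (1, 10)),
                   torch.randn(1, 5, 64))
    print(logits.shape)  # torch.Size([1, 5, 256])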
We only got any generalization to new users after we had >500 individuals in the dataset, fwiw. There are some interesting MRI studies finding a similar thing: once you have enough individuals in the dataset, you start seeing generalization to new ones.
Have you played at all with thought-to-voice? Intuitively I’d think EEG readout would be more reliable for spoken rather than typed words, especially if you’re not controlling for keyboard fluency.
It does generalize between typed and spoken, i.e. it does much better on spoken decoding if we've also trained on the typing data, which is what we were hoping to see.
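For anyone curious what pooling the two modalities might look like mechanically, here's a minimal sketch: one shared decoder trained on typed and spoken windows together, distinguished only by a learned modality tag. The architecture, names, and dimensions are all assumptions, not the actual pipeline:

    # Hypothetical: pool typed + spoken sessions into one training set so
    # the spoken decoder benefits from the larger typing corpus.
    import torch
    import torch.nn as nn

    TYPED, SPOKEN = 0, 1

    class SharedDecoder(nn.Module):
        def __init__(self, eeg_dim=64, d_model=128, vocab_size=256):
            super().__init__()
            self.proj = nn.Linear(eeg_dim, d_model)
            self.modality = nn.Embedding(2, d_model)  # typed-vs-spoken tag
            self.rnn = nn.GRU(d_model, d_model, batch_first=True)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, eeg, modality_id):
            # eeg: (B, T, eeg_dim); modality_id: (B,)
            x = self.proj(eeg) + self.modality(modality_id).unsqueeze(1)
            h, _ = self.rnn(x)
            return self.head(h)  # (B, T, vocab_size)

    model = SharedDecoder()
    opt = torch.optim.Adam(model.parameters())
    # one mixed batch, mostly typing data plus some spoken data
    eeg = torch.randn(4, 20, 64)
    mods = torch.tensor([TYPED, TYPED, TYPED, SPOKEN])
    targets = torch.randint(0, 256, (4, 20))
    loss = nn.CrossEntropyLoss()(model(eeg, mods).reshape(-1, 256),
                                 targets.reshape(-1))
    loss.backward()
    opt.step()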
Both of these modes are incredibly slow thinking. Consciously shifting from thinking in concepts to thinking in words is like slamming on the brakes for a school zone on an autobahn.
I've gathered most people think in words they can "hear in their head", most people can "picture a red triangle" and literally see one, and so on. Many folks who are multi-lingual say they think in a language, or dream in that language, and know which one it is.
Meanwhile, some people think less verbally or less visually, perhaps not verbally or visually at all, with no language (no words) involved.
A blog post shared here last month discussed a person trying to access this conceptual mode, which he thinks is like "shower thoughts" or physicists solving things in their heads while staring into space, except "under executive function". He described most of his thoughts as words he can hear in his head, with these concepts more like vectors. I agree with that characterization.
I'm curious what % of folks you've scanned may be in this non-word mode, or if the text and voice requirement forces everyone into words.
That said, the way to 10-20x data collection would be to open a couple of other data collection centers outside SF, in high-population cities. Right now there's a big advantage in keeping data collection totally in-house, because it's so much easier to debug and improve while we're so small. But now that we've mostly worked out the process, it should be very straightforward to replicate the entire ops/data pipeline across 3-4 parallel data collection centers.
“the room seemed colder” -> “there was a breeze, even a gentle gust”
Very interesting!
* A ceiling-based pulley system could help take the physical load off the users and may allow for increased sensor density. Some large/public VR setups do this.
* I'm sure you considered it, but a double-conversion UPS might reduce the noise floor of your sensors and could potentially support multiple booths. Expensive, though, and it's already been mentioned that data quantity > quality at this stage. Maybe a future fine-tuning step could leverage this.
Cool write-up, and I hope to see more in the future!
A couple of questions: What's the relationship between the number of hours of neurodata you collect and the quality of your predictions? Does it help to get less data from more people, or more data from fewer people?
For a given amount of data, is it better to have more people with less data per person or fewer people with more data per person?
What you are trying to do is BIG, and I love it. I hope you'll have more than 1M in a few months!
Keep pushing, team!!!
If you mean the text quality scoring system: when we added that, it improved the amount of text we got per hour of neural data by 30-35%. (That includes the fact that we filter which participants we invite back based on their text quality scores.)
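A toy sketch of that kind of return-invite filter, for the curious -- the scoring heuristic and cutoff below are made up for illustration, not the actual scoring system:

    # Hypothetical: average a per-transcript quality score for each
    # participant and only re-book those above a cutoff.
    from statistics import mean

    def text_quality(transcript: str) -> float:
        # toy proxy: fraction of words longer than two characters
        words = transcript.split()
        return sum(len(w) > 2 for w in words) / max(len(words), 1)

    def returners(sessions: dict[str, list[str]], cutoff: float = 0.8) -> list[str]:
        # sessions maps participant id -> list of session transcripts
        return [pid for pid, transcripts in sessions.items()
                if mean(text_quality(t) for t in transcripts) >= cutoff]

    print(returners({"p1": ["the quick brown fox"], "p2": ["a b c d"]}))  # ['p1']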
We tried Google/Facebook/Instagram ads, and we tried paying for some video placements. Basically none of the explicit advertising worked at all, and it wasn't worth the money. Though for what it's worth, none of us are experts in advertising, so we might have been going about it wrong -- we didn't put loads of effort into iterating once we realized it wasn't working.
Those predictions sound good enough to get you CIA funding.
[see https://news.ycombinator.com/item?id=45988611 for explanation]