In 1981, weather disasters caused $3.5 billion in damages in the United States. In 2023, that number was $94.9 billion (https://www.ncei.noaa.gov/access/billions/time-series). The National Weather Service spends billions annually on its network of weather balloons, satellites, and aircraft sensors – generating hundreds of terabytes of data every day. This data, called observation data, is fed into massive supercomputers running advanced physics to produce global weather forecasts. Despite this cost, there are still places in the US where we don't know what the temperature will be two days from now: https://www.washingtonpost.com/climate-environment/interacti.... And for the rest of the world that lacks weather infrastructure? There’s always the Weather Rock: https://en.wikipedia.org/wiki/Weather_rock.
The most important data for these forecasts come from vertical data ‘slices’ of the atmosphere, called soundings. Every day 2,500 single-use latex radiosondes are launched across the globe to collect these soundings. They stay aloft for about two hours before popping and falling back to Earth. Launch sites for these systems are sparse in Latin America and Africa, and they’re completely non-existent over oceans. This leaves about 80% of the globe with inadequate weather data for accurate predictions.
The coverage gap became painfully evident to Max and Alex during their time at Urban Sky. While building balloons for high-altitude aerial imaging, they kept running into a problem: no matter what weather forecast they used, they couldn’t get accurate wind predictions for the upper atmosphere. They tried all of the free and commercial forecast products, but none of them were accurate enough. Digging into it more, we learned that a big part of the problem was the lack of high-quality in-situ data at those altitudes.
To solve this problem, our systems ascend and descend between sea level and 65,000ft several times a day to collect vertical data soundings. Each vehicle (balloon + payload) weighs less than a pound and can be launched from anywhere in the world, per the FAA and ICAO reg. Here’s one we launched from Potrero Hill in SF, https://youtu.be/75fN5WpRWH0 and here’s another near the Golden Gate Bridge, https://youtu.be/7yLmzLPUFVQ. Although we can’t “drive” these balloons laterally, we can use opposing wind layers to target or avoid specific regions. Here’s what a few simulated flight paths look like, to give you an idea: https://youtu.be/F_Di8cjaEUY
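As a rough illustration of the "opposing wind layers" idea (a generic sketch, not our actual flight software), the snippet below picks the altitude whose wind makes the most progress toward a desired bearing. The input format, the function name `best_layer`, and the wind values are all hypothetical.

```python
import math

def best_layer(wind_profile, target_bearing_deg):
    """Pick the layer whose wind carries the balloon fastest toward the target.

    wind_profile: list of (altitude_ft, toward_bearing_deg, speed_kt), where the
    bearing is the direction the wind blows TOWARD (hypothetical input format).
    Returns (altitude_ft, progress_kt); progress is negative if every layer
    moves the balloon away from the target.
    """
    def progress(layer):
        _, bearing, speed = layer
        # Component of the wind vector along the desired direction of travel.
        return speed * math.cos(math.radians(bearing - target_bearing_deg))

    best = max(wind_profile, key=progress)
    return best[0], progress(best)

# Made-up profile: eastward wind low down, westward wind in the middle layer.
profile = [(20000, 90, 40), (45000, 270, 25), (60000, 80, 10)]
print(best_layer(profile, 270))
```

With this made-up profile, a westward target selects the 45,000 ft layer, while an eastward target would select the 20,000 ft layer.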
Our payload uses a satellite transceiver for communications and a small, thin film solar panel array to generate power. In addition to the weather data, we also get real-time telemetry from the vehicles, which we use to optimize their flight paths. This includes maintaining the optimal spacing between balloons and steering them to a recovery zone at the end of their lifespan so we can recycle them.
These systems spend most of their time in the stratosphere which is an extremely unforgiving environment. We’ll often see temperatures as low as -80°C while flying near the equator. Throughout the day, we experience extreme temperature cycling as they ascend and descend through the atmosphere. We’ll often encounter 100mph+ wind shears near the boundary with the troposphere (the tropopause) that can rip apart the balloon envelope. These conditions make the stratosphere a very difficult place to deploy to prod.
The real magic of what we’re building will come into play when we have hundreds of these systems in the air over data-sparse regions. But even now, we can do useful and interesting things with them. Some of our early customers are companies who fly very big, very expensive things into the stratosphere. They use our balloons to give them a clear idea of what conditions are ahead of their operations, and we’re working on a forecast product specifically designed for the stratosphere.
The combination of long duration and low cost is novel. We can theoretically maintain thousands of balloons in the atmosphere at any given time for a tenth of the cost of one useful weather satellite. We’re also using the data we collect to train AI models that produce forecasts with better accuracy than existing numerical (supercomputer) forecasts. Because we’re collecting totally unique data over areas that lack observation, our models will maintain a consistent edge versus models that are only trained on open data.
We’re really excited to be launching Sorcerer here with you! We’d love to hear what you think. And if you find one of our balloons in the Bay Area: Sorry! It’s still a work in progress (and please get it back to us).
I’ll leave you all with a bonus video of Paul Buchheit launching one of our balloons, which we thought was pretty cool: https://www.youtube.com/watch?v=-sngF9VvDzg
Asking because my research at the University of Oxford was around hyper space-efficient data transfer from remote locations for a fraction of the price.
The result was an award-winning technology (https://jsonbinpack.sourcemeta.com) to serialise plain JSON that was proven to be more space-efficient than every tested alternative (including Protocol Buffers, Apache Avro, ASN.1, etc) in every tested case (https://arxiv.org/abs/2211.12799).
If it's interesting, I'd love to connect and discuss (jv@jviotti.com) how at least the open-source offering could help.
If you want to get deeper, I published two (publicly available) deep papers studying the current state of JSON-compatible binary serialization that you might enjoy. They study in a lot of detail technologies like Protocol Buffers, CBOR, MessagePack, and others that were mentioned in the thread:
- https://arxiv.org/abs/2201.02089
- https://arxiv.org/abs/2201.03051
Hope they are useful!
> When transmitting data over the Internet, time is the bottleneck, making computation essentially free in comparison.
i thought this was an odd sales pitch from the jsonbinpack site, given that a central use-case is IoT, which frequently runs on batteries or power-constrained environments where there's no such thing as "essentially free"
That said, the production-ready implementation of JSON BinPack is designed to run on low powered devices and still provide those same benefits.
A lot of the current work is happening at https://github.com/sourcemeta/jsontoolkit, a dependency of JSON BinPack that implements a state-of-the-art JSON Schema compiler (I'm a TSC member of JSON Schema btw) to do fast and efficient schema evaluation within JSON BinPack on low powered devices compared to the current prototype (which requires schema evaluation for resolving logical schema operators). Just an example of the complex runtime-efficiency tracks we are pursuing.
I would imagine that CPUs are much more efficient than a satellite transmitter, probably? I guess you'd have to balance the additional computational energy required vs. the savings in energy from less transmitting.
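A back-of-the-envelope way to frame that balance, as a sketch only: compare the CPU energy spent encoding against the radio energy saved by sending fewer bytes. The per-byte energy figures below are placeholders I made up, not measurements of any real transceiver or processor.

```python
def net_energy_saved_j(raw_bytes, packed_bytes, tx_j_per_byte, cpu_j_per_byte):
    """Energy saved (joules) by encoding before transmit; negative means a loss.

    tx_j_per_byte: radio energy per transmitted byte (hypothetical figure).
    cpu_j_per_byte: CPU energy to encode one input byte (hypothetical figure).
    """
    tx_saved = (raw_bytes - packed_bytes) * tx_j_per_byte
    cpu_spent = raw_bytes * cpu_j_per_byte
    return tx_saved - cpu_spent


def break_even_ratio(tx_j_per_byte, cpu_j_per_byte):
    """Largest packed/raw size ratio at which encoding still pays off."""
    return 1.0 - cpu_j_per_byte / tx_j_per_byte


# Placeholder numbers: transmitting a byte costs 1000x more than encoding one.
print(net_energy_saved_j(1000, 400, 1e-3, 1e-6))
print(break_even_ratio(1e-3, 1e-6))
```

If transmitting a byte really is orders of magnitude more expensive than encoding one, almost any size reduction is a net win; the closer the two costs get, the more the compression ratio has to earn its keep.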
That means that you can use any tooling/approach from the wide JSON Schema ecosystem to manage schema evolution. A popular one from the decentralised systems world is Cambria (https://www.inkandswitch.com/cambria/).
That said, I do recognise that schema evolution tech in the JSON Schema world is not as great as it should be. I'm a TSC member of JSON Schema and a few of us are definitely thinking hard on this problem too and trying to make it even better than the competition.
Asking because we use msgpack in production at work and it can sometimes be a bit slower to encode/decode than is ideal when dealing with real-time data.
The TLDR is that if you use JSON BinPack in schema-less mode, it's still more space-efficient than MessagePack, but not by a huge margin (depends on the type of data, of course). But if you start passing a JSON Schema along with your data, the results become way smaller.
Please reach out to jv@jviotti.com. I would love to discuss your use case more.
Of course there is still a lot to do, but the idea being that what you get with JSON BinPack is extremely close to what you would have done for manually encoding your data, except that you don't have to worry about encoding things yourself :) Thus you get the best of both worlds: the nicety of JSON and the space-efficiency of manual encoding.
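To make the "manual encoding" comparison concrete, here is a minimal sketch using only Python's standard `json` and `struct` modules. It does not use JSON BinPack and says nothing about its actual wire format; the telemetry record and its field layout are invented for illustration.

```python
import json
import struct

# Hypothetical telemetry record; field names and values are made up.
record = {"lat": 37.7599, "lon": -122.4148, "alt_m": 19812, "temp_c": -56.5}

# Schema-less: keys and punctuation travel with every message.
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Hand-rolled "schema-driven" encoding: both sides agree on field order and
# types up front, so only the values are sent.
# "<ffHf" = little-endian float32, float32, uint16, float32 (14 bytes total).
LAYOUT = struct.Struct("<ffHf")
as_packed = LAYOUT.pack(
    record["lat"], record["lon"], record["alt_m"], record["temp_c"]
)

print(len(as_json), len(as_packed))
```

The packed form is 14 bytes because the field names never leave the device. The price of doing this by hand is that the layout has to be maintained manually on both ends, and the float32 fields lose some precision compared with the JSON text.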
> Our payload uses a satellite transceiver for communications
Does that surprise anyone? I would not have guessed the growth to be on such a scale. The chart suggests that severe storms are the main culprit.
I read a paper a few years back which dove into how the data sources for weather damage assessment have changed a lot over the years. Much of the increase is due to more complete reporting and changes in categorization. Also, nowadays more things are insured and modern IT has made gathering the insurance reporting far more exhaustive. Plus local, state and federal agencies responsible for relief and/or recovery are gathering and reporting increasing amounts of data with each decade since the 70s (in part because their budgets rely on it). Factors like these mean in prior decades the total damage costs may have been more similar to today's than they appear but a lot of the damage data we gather and report now wasn't counted or gathered then.
Although I have no experience related to weather science, I remember the paper because it made me realize how many broad-based, multi-decadal historical data comparisons we see should have sizable error bars (which never make it into the headline and rarely even into the article). Data sources, gathering and reporting methods and motivations are rarely constant on long time scales - especially since the era of modern computing. Of course, good data scientists try to adjust for known variances but in a big ecosystem with so many evolving sources, systems, entities and agencies, it quickly gets wickedly complex.
Using [0] $3.5 bn in 1981 would have been worth $11.7 bn in 2023.
Another comment [1] noted (but unfortunately didn't cite) that two years later the damage was assessed at $36 bn, or $110 bn in 2023 dollars.
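For anyone who wants the arithmetic spelled out, here is a small sketch using only the 1981 and 2023 figures quoted in this thread; nothing below comes from an outside dataset.

```python
nominal_1981_bn = 3.5    # 1981 damages in 1981 dollars (from the post)
real_1981_bn = 11.7      # the same figure in 2023 dollars (from the comment above)
nominal_2023_bn = 94.9   # 2023 damages (from the post)

inflation_factor = real_1981_bn / nominal_1981_bn   # about 3.34x
nominal_growth = nominal_2023_bn / nominal_1981_bn  # about 27.1x
real_growth = nominal_2023_bn / real_1981_bn        # about 8.1x

print(round(inflation_factor, 2), round(nominal_growth, 1), round(real_growth, 1))
```

So the headline comparison looks like a 27x increase, but after adjusting for inflation it is closer to 8x, before accounting for any changes in reporting.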
Obviously what we're doing can't prevent severe weather from happening, but even very small improvements in accuracy and timelines can have a massive beneficial effect when a disaster does happen. My cofounders and I are all from Florida, so hurricanes are the most visceral examples for us. When hurricanes hit, there are always issues along the lines of "we didn't have the right resources in the right places to respond effectively." Those types of issues can be combated with better info.
[0]: https://www.statista.com/statistics/818411/weather-catastrop...
I'm picturing having a few dozen at launch site containers, launching them at the start of a day of flying, having them programmed to land in a rural area that a member can pick them up from and return to the launch sites.
Loon used pumps and an interior air ballast like blimps do. So clearly there are a few ways.
This sentence is legendary
Right now, most weather balloons fall back to Earth and stay where they land unless someone happens across them (since they can't be controlled and only last a couple of hours).
Congratulations on a great non-SaaS market and product!
What does this look like in practice? As you mentioned I know you don't really have any lateral control, but I imagine you can wait for it to overfly somewhere convenient to descend?
> Can I ask how you handled the regulatory aspect of launching from urban areas in SF? I can imagine the FAA would have given you some trouble?
We actually fall under a weather balloon exemption to the normal FAA rules for unmanned balloon flights. Here’s a quick rundown of the relevant rules and regulations (Part 101.1) for weather balloon flights in the U.S. Our balloons fully comply with all of these:
- Any on-board cellular tracking devices must be set to Airplane mode before takeoff (we don't have any cellular)
- Each individual payload box/package must weigh less than 6 pounds (ours is <250g)
- If a payload has a weight-to-size ratio exceeding 3.0 ounces per square inch, it must weigh less than 4 pounds.
- Calculation: Divide the total payload weight in ounces by the area of its smallest face in square inches.
- If multiple payloads are carried by a single balloon, their combined weight must be under 12 pounds.
- The string connecting the payload to the balloon must break under an impact force of no more than 50 pounds (our string is 30g)
- It’s prohibited to design or operate an unmanned free balloon in a way that poses a hazard to people or property.
- Dropping objects from the balloon that could endanger people or property is not allowed.
We don't have to notify the FAA about our operations as long as we meet these criteria. To be super safe, we don't launch near airports or other busy areas.
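The weight rules above can be sketched as a quick check. This encodes only the summary given in this comment, not the regulation text itself, and the function names and example dimensions are hypothetical.

```python
OZ_PER_LB = 16.0

def payload_ok(weight_oz, smallest_face_sq_in):
    """Check one payload package against the weight limits summarized above."""
    ratio = weight_oz / smallest_face_sq_in  # ounces per square inch
    # Dense packages (ratio above 3.0 oz/in^2) get the stricter 4 lb limit.
    limit_lb = 4.0 if ratio > 3.0 else 6.0
    return weight_oz < limit_lb * OZ_PER_LB

def balloon_ok(payloads):
    """payloads: list of (weight_oz, smallest_face_sq_in) tuples on one balloon."""
    total_oz = sum(w for w, _ in payloads)
    return total_oz < 12.0 * OZ_PER_LB and all(payload_ok(w, a) for w, a in payloads)

# A 250 g payload is about 8.8 oz; the 4 sq in face is a made-up dimension.
print(payload_ok(8.8, 4.0))
```

The string strength and hazard rules are not modeled here, since they are not simple numeric thresholds on the payload.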
Is there any history of weather balloons having caused damage to aircraft? It seems like it could be really bad, but that the aircraft would also have to be exceptionally unlucky.
[0](https://www.ecfr.gov/current/title-14/chapter-I/subchapter-F...)
However, as they learned from experience, Loon was slowly increasing their upper limit on how long their balloons could stay up.
Let's look at the composition vs height:
https://en.wikipedia.org/wiki/Atmosphere_of_Earth#/media/Fil...
I understand helium will be sparse at low heights, but at each height there is a diversity of species, among which will be a lighter one. Could a balloon be oversized so that instead of using helium, at each height an 80% fill of the locally (that height) lightest gas could keep the balloon afloat. Oversized for this but also for the added weight for separating equipment for the wanted gas and associated panels to power it.
I.e. what prevents a balloon from flying indefinitely? At least up until radiation damage of the balloon membrane...
Have such attempts been made and what were the lessons?
Max, the first engineer at Urban Sky, hit me up and asked if I wanted to build their mission control. At the time, Urban Sky was just a four-person team, so they couldn’t pay me as much, but I jumped at the chance, even though it meant taking about half my usual salary.
Funny enough, my SaaS background actually helped me create mission control software that was way ahead of the curve!
I guess my advice is, find a small company you're passionate about, where you can make a big impact, and be open to taking a pay cut. It helps the company take less of a risk on you, and you get to work on something that really matters. Plus, when you’re solving real problems, things tend to work out, and eventually, you’ll end up making what you should in salary.
“ Each vehicle (balloon + payload) weighs less than a pound and can be launched from anywhere in the world, per the FAA and ICAO reg”
Florida recently passed a law that does not allow PicoBalloon or your weather balloon type launches from Florida soil. It will result in a $150 fine.
HB321
https://www.flsenate.gov/Session/Bill/2024/321/BillText/er/P...
Article
https://www.cbsnews.com/miami/news/floridas-balloon-ban-will...
A person who is 6 years of age or younger who intentionally releases, organizes the release of, or intentionally causes to be released balloons as prohibited by s. 379.233 does not violate subsection (4) and is not subject to the penalties specified in subparagraph 1.
Just saw that - No Florida launches on the horizon luckily
It would be very cool if you could do an open house for bay area geeks to come and just ooh and ahh at the gadgetry. Even a virtual open house would be cool. Something less than a full demo, and more focused on the story behind the gestation and launch of the project (and then a demo.)
Right now we're focused on the stratospheric forecasts because that's what we know really well (and we already have some interested customers). Our data/models are great for all kinds of forecasts, including ground forecasts, and we'll quickly expand beyond the stratosphere.
Takes into account lots of stuff (e.g. attenuation from air, ozone, and water vapor) with the goal of estimating solar power at any altitude/latitude/day/time.
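A heavily simplified sketch of that kind of estimate, for anyone curious. It is not the model described above: it lumps all attenuation into one empirical air-mass term instead of treating ozone and water vapor separately, assumes an isothermal atmosphere with a 7.4 km scale height, and uses a flat-Earth air-mass approximation that breaks down when the sun is near the horizon.

```python
import math

SOLAR_CONSTANT_W_M2 = 1361.0   # approximate top-of-atmosphere irradiance
SCALE_HEIGHT_M = 7400.0        # assumed scale height for the pressure falloff

def solar_elevation_deg(lat_deg, day_of_year, solar_hour):
    """Sun elevation angle from latitude, day of year, and local solar time."""
    decl = math.radians(-23.44) * math.cos(2 * math.pi * (day_of_year + 10) / 365.0)
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(lat_deg)
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))

def direct_irradiance_w_m2(lat_deg, day_of_year, solar_hour, altitude_m):
    """Rough direct-beam irradiance; 0 when the sun is below the horizon."""
    elev = solar_elevation_deg(lat_deg, day_of_year, solar_hour)
    if elev <= 0.0:
        return 0.0
    # Less air overhead at altitude, so scale the air mass by relative pressure.
    pressure_ratio = math.exp(-altitude_m / SCALE_HEIGHT_M)
    air_mass = pressure_ratio / math.sin(math.radians(elev))
    # Empirical clear-sky attenuation as a function of air mass.
    return SOLAR_CONSTANT_W_M2 * 0.7 ** (air_mass ** 0.678)

print(direct_irradiance_w_m2(0.0, 81, 12.0, 0.0))
print(direct_irradiance_w_m2(0.0, 81, 12.0, 20000.0))
```

Even this crude version shows the main effect: at 20 km there is so little air overhead that the panel sees most of the top-of-atmosphere irradiance.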
Seems feasible, never went past the analysis phase though :)
"Worldwide, most radiosonde observations are taken daily at 00Z and 12Z (6 a.m. and 6 p.m. EST)" - NOAA.gov
I know we get more weather data from other sources, but it seems insane that these 2 launch times per day (per balloon location) are what make up most of our current weather forecasting data.
You mentioned solar. Do you have the capability (or plans) to run these over night as well?
As of right now power constraints mean we maintain tracking throughout the night, but cannot execute altitude maneuvers. We have a solution to this cooking though!
Forecasts are made of so much input data it's insane. Yes, balloons matter, but they're not the only thing. They are the only decent source of conditions aloft, and the jet stream is the main controller of our surface weather, so it makes sense.
If it's not proprietary, I'd love to know - how do you "steer" vertically between different wind layers to move in the direction you want to go?
Can't wait to see where you guys take this!
So I can't get into exactly how we do our altitude control, but the Google Loon project has a really great explanation of how they made their (very big) balloons go up and down: https://x.company/projects/loon/
Loon made all of their research public after they shut down, and we're obviously heavily inspired by their work. Our systems use a lot of the tech they pioneered, just on a much, much smaller scale (for reference, Loon's balloons were the size of tennis courts). Here's the PDF in case you're interested in checking out the 400+ page writeup: https://storage.googleapis.com/x-prod.appspot.com/files/The%...
I saw in a response you said the balloons will periodically return to sea level and ascend (which sounds like a fun design challenge by itself.) Will you be doing so near populated areas as well?
Good luck!
ARM is a DOE program that ships top-tier instrumentation to various sites around the world. Loads of university researchers will follow, and you end up with a massive open source data pool. Houston in particular was focused on Aerosol effects on precipitation in the Coastal-Urban environment. There were loads of balloon launches from sites all over the city during the campaign, from large ozonesondes to the tiny sparv embedded foam cup ones (https://sparvembedded.com/products/windsond)
I'm in NY, and my university NOAA department has a focus on PBL Ozone measurements lately. My work in particular is focused on low cost UAV profiling up to about 150m, with a pipe dream of doing 0-3km.
I'm just a grad student, but if anything there sounds interesting feel free to email and I can try and get you in touch with more knowledgeable people.
https://news.ycombinator.com/item?id=41173161
Such that we can see them?
---
As others mentioned, this is a fantastic launch.
I'd love to have one permanently tethered to a place of my choice using a fiber-optic+carbon/kevlar thread to hold it in place with data coming down the fiber, and have the camera and pico compute data and radios powered by solar.
One difference though is that the ARGO floats are unfortunately not recycled, and just wash up on various beaches. (I'm curious whether you think you can realistically collect many of these mini balloons?)
If you do want to control the lateral position of fleets of sensors, oceanographers also now have "gliders", which are basically small powered drone submarines. These are used by a few groups, but most of the gliders in the world are operated by the US Navy, who launch them out of torpedo tubes to survey local ocean conditions (which is badass).
https://oceanservice.noaa.gov/facts/ocean-gliders.html
The recorded measurements present an interesting data assimilation challenge - they record data along 3D trajectories (4D including time), sampling jagged and twisting lines through the 4D space. But we normally prefer to think of weather/ocean data as gridded, so you need to interpolate the trajectory data onto the grid, whilst keeping the result physically-consistent. Oceanographers use systems like ECCO for ocean state estimation, which effectively find the "ocean of best fit" to various data sources.
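The crudest possible version of "trajectory onto grid" is plain binning, sketched below. This is not what ECCO or any real assimilation system does: it ignores time, does no interpolation between cells, and enforces no physical consistency. The cell sizes and sample values are made up.

```python
from collections import defaultdict

def grid_average(samples, cell_deg=1.0, cell_m=1000.0):
    """Average point samples into regular (lat, lon, altitude) cells.

    samples: iterable of (lat_deg, lon_deg, alt_m, value).
    Returns {(i_lat, i_lon, i_alt): mean value}; empty cells are simply absent.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, alt, value in samples:
        # Floor division maps each sample to the index of its containing cell.
        cell = (int(lat // cell_deg), int(lon // cell_deg), int(alt // cell_m))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Three made-up samples along one path; the first two share a cell.
samples = [
    (37.2, -122.4, 500.0, 10.0),
    (37.8, -122.9, 900.0, 20.0),
    (38.1, -122.4, 500.0, 5.0),
]
print(grid_average(samples))
```

The gaps this leaves (most cells are empty, and neighbouring cells can disagree) are exactly why state estimation uses a physical model to fill in the grid.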
Interestingly, ECCO uses an auto-differentiable form of the governing equations for the ocean flow to ensure that updates stay physically consistent. This works by using a differentiable ocean fluid model called [MITgcm](https://github.com/MITgcm/MITgcm) to perform runs which match experimental data as closely as possible, minimizing a loss function through gradient descent. The gradient is of a loss function (error) with respect to model input parameters + forcings, which is calculated by running MITgcm in adjoint mode, i.e. automatic differentiation. Therefore this approach is sort of ML before it was cool (they were doing all this well before the new batch of AI weather models). See slides 9-18 of this deck for a nice explanation:
https://firebasestorage.googleapis.com/v0/b/firescript-577a2...
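To give a flavour of the adjoint idea, here is a toy sketch with a one-variable model, x[t+1] = a * x[t] + f, where the unknown forcing f is fitted to observations. It has nothing to do with MITgcm's actual code, and the backward sweep is written by hand rather than generated by an automatic differentiation tool. The learning rate is tuned for these specific values of `a` and the run length.

```python
def forward(x0, a, f, steps):
    """Run the toy model x[t+1] = a * x[t] + f and return states x[1..steps]."""
    states, x = [], x0
    for _ in range(steps):
        x = a * x + f
        states.append(x)
    return states

def loss_and_grad(x0, a, f, obs):
    """Squared-error loss and dL/df, with the gradient from one backward sweep."""
    states = forward(x0, a, f, len(obs))
    loss = sum((x - y) ** 2 for x, y in zip(states, obs))
    grad, adjoint = 0.0, 0.0
    for x, y in zip(reversed(states), reversed(obs)):
        # Total sensitivity dL/dx[t]: the local misfit plus what flows back
        # from later states through the model dynamics.
        adjoint = 2.0 * (x - y) + a * adjoint
        # Every state depends directly on f with slope 1.
        grad += adjoint
    return loss, grad

def fit_forcing(x0, a, obs, f=0.0, lr=5e-4, iters=200):
    """Plain gradient descent on the forcing f."""
    for _ in range(iters):
        _, grad = loss_and_grad(x0, a, f, obs)
        f -= lr * grad
    return f

# Synthetic "observations" generated with a known forcing of 0.5.
obs = forward(1.0, 0.9, 0.5, 20)
print(fit_forcing(1.0, 0.9, obs))
```

The point of the backward sweep is cost: one forward run plus one reverse run yields the gradient with respect to every input at once, which is what makes this tractable for a model with millions of parameters and forcings.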
The trajectory data is also interesting because it's sort of tabular, but also you often want to query it in an array-like 4D space. You could also call it a "ragged" array. We have nice open-source tools for gridded (non-ragged) arrays (e.g. xarray and zarr, and the pangeo.io project) but I think we could provide scientists with better tools for trajectory-like data in general. If that seems relevant to you I would love to chat.
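One common way to store such data, sketched below in plain Python, is a contiguous ragged layout: all samples go in one flat sequence, with a list of offsets marking where each trajectory starts and ends. The class and method names are hypothetical, and a real tool would use typed arrays and a spatial index instead of the linear scan shown here.

```python
import bisect

class RaggedTrajectories:
    """Contiguous ragged layout: one flat list of samples plus per-trajectory offsets."""

    def __init__(self):
        self.samples = []    # flat list of (time_s, lat, lon, alt_m, value)
        self.offsets = [0]   # trajectory i occupies samples[offsets[i]:offsets[i + 1]]

    def add(self, trajectory):
        """Append one trajectory (a list of sample tuples of any length)."""
        self.samples.extend(trajectory)
        self.offsets.append(len(self.samples))

    def trajectory(self, i):
        """Tabular-style access: all samples of trajectory i, in order."""
        return self.samples[self.offsets[i]:self.offsets[i + 1]]

    def owner(self, flat_index):
        """Which trajectory a flat sample index belongs to."""
        return bisect.bisect_right(self.offsets, flat_index) - 1

    def in_box(self, lat_range, lon_range, alt_range):
        """Array-style query: flat indices of samples inside a lat/lon/alt box."""
        return [k for k, (_, lat, lon, alt, _) in enumerate(self.samples)
                if lat_range[0] <= lat <= lat_range[1]
                and lon_range[0] <= lon <= lon_range[1]
                and alt_range[0] <= alt <= alt_range[1]]
```

The same structure serves both access patterns: slicing by offsets gives you one flight as a table, while the box query treats the flat list as points in space and `owner` maps hits back to the flight they came from.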
P.S: Sorcerer seems awesome, and I applaud you for working on something hard-tech & climate-tech!
That being said, trajectory-based data tooling could be super interesting to us. Let's definitely chat: austin@sorcerer.earth
And re: recovery, we're pretty confident we'll be able to recover the majority of our systems. Being in the air has the advantage that we can choose to 'beach' ourselves in a specific location, rather than the first place we run across land like with the buoys. At his previous company, Alex wrote a prediction engine able to get similar balloon systems to land in a predicted 1kmx1km zone for recovery
The idea that we'll be able to run ML weather models using "raw" observations and skip or implicitly incorporate an assimilation is spot-on - there's been an enormous shift in the AI-weather community over the past year to acknowledge that this is coming, and very soon.
But... in your launch announcement you seem to imply that you're already using your data for building and running these types of models. Can you clarify how you're actually going to be using your data over the next 12-24 months while this next-generation AI approach matures? Are you just doing traditional assimilation with NWP?
Also, to the point about reanalysis - that's almost certainly not correct. There are massive avenues of scientific research which rely on a fully-assimilated and reconciled, corrected, consistent analysis of atmospheric conditions. AI models in the form of foundation models or embeddings might provide new pathways to build reanalysis products, but they are a vital and critical tool and will likely be so for the foreseeable future.
Can help with the federal contract side and mass manufacturing etc.
Charles@turnsys.com
In case you’re curious, here’s where NOAA stores all their GFS related forecasts: https://registry.opendata.aws/noaa-gfs-bdp-pds/
1) What parameters are you measuring ? Did you think about also measuring gases?
2) What's your business model?
2. The US National Weather Service actually has a commercial data buy program called MESONET [0], where they buy weather data from both academic and commercial partners. We're in the process of becoming one of their commercial partners now. Once we are, a single balloon will pay for itself in a matter of days with the data it can collect, which will let us scale up the number of systems we have deployed. The data we collect right now also lets us build niche weather forecast products like the stratospheric wind forecast mentioned in the post. Once we have enough balloons up, we can start producing useful weather forecast products at a regional and then global scale.
One question that came to mind, and this applies to all weather balloons, not yours specifically: with the large number of weather balloons launched daily, how is it that more aren’t sucked into airplane engines, causing potential disaster for the airplane? Thanks
Our balloons are actually cheap and steerable enough that we plan to fly them into TS/Hurricanes to get data way out in the Atlantic, farther than hurricane hunters can operate.
Are you hiring? This is really exciting work.
[0] https://www.ycombinator.com/companies/industry/hard-tech
Also, your website runs like a dog, and makes me not want to find out. It runs like a dog displaying dots ... on a sphere. With like, four (4), other images of note. Seriously, talk to one of these WASM demo people, this is sad for a company that posts here. "Book a Demo" with somebody that can write WebGL. I booked and canceled an appointment on Thu (4) Aug (8) 22, 2024 8:00 on your incredibly vague G. Calendar signup. It's a 42 joke. Learn WebGL.
And while you're at it, stop making websites that write 1000x / $ and "Book a Demo" without even vaguely mentioning cost. Give me a $100,000, I'll give you a $100. This sounds like an excellent business from my perspective. How bout you hand me your bank account, and I'll hand you $100.
Sorcerer was also an amazing Infocom game. Good company.
So you really think you can launch a giant network of balloons and have that data integrated into the NOAA/NCEP model suite? Even if you get over the red tape it will take 10+ years to integrate this shit into the data assimilation program. You claim that you can input your balloon data into magical AI and it produces better forecasts than the GFS? What is the standard of measurement? I don't actually believe you at all.
the idea sounds great tho!