We tried for years with OpenStates to run a free legislative tracking product before eventually partnering it with a commercial provider willing to contribute the resources to keep it alive and help out with the open source pieces (shout out to Plural, nice folks).
Believe me when I say that this space is a classic nerd tar pit. It looks like a relatively easy problem: a few hundred scrapers, search, and some basic CRM functionality, and you're off to the races.
The problem is that behind the scenes the data is very complicated, and the sources constantly change and break in goofy ways. You need to be running hundreds of scrapers constantly (many of them against Akamai or Cloudflare), and working around new source website bugs or procedural edge cases every week. It doesn't scale the way product or web search does, where you can just ignore broken pages; the penalty for missing things is too high.

Tuning your workflow so people find what they need without getting buried is tough, because there are tens of thousands of bills a session touching topics people think they care about, like "AI" or "taxes". On top of that, the low or zero budget clientele is often that mix of high expectations and low domain knowledge that makes them a big support burden.
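A minimal sketch of that last point (hypothetical code, not OpenStates itself): unlike a web crawler, a legislative scraper fleet has to treat a silently empty source as a failure to investigate, never as a success to skip.

```python
# Hypothetical sketch: why legislative scrapers can't just skip broken
# pages the way a general web crawler can. An empty result is treated as
# an alert to investigate, never a quiet success.

def run_scrapers(scrapers):
    """Run each scraper; separate real results from silent failures."""
    results, alerts = {}, []
    for name, scrape in scrapers.items():
        try:
            bills = scrape()
        except Exception as exc:
            alerts.append(f"{name}: crashed ({exc})")
            continue
        if not bills:
            # A legislature with zero bills almost certainly means the
            # source site changed, not that there is nothing to index.
            alerts.append(f"{name}: returned no bills; source may have changed")
            continue
        results[name] = bills
    return results, alerts

def broken_source():
    # Stands in for a site whose layout changed overnight.
    raise RuntimeError("layout changed")

# Stand-in scrapers; real ones would fetch and parse state websites.
scrapers = {
    "state_a": lambda: ["HB 1", "SB 2"],
    "state_b": lambda: [],   # silently empty -- must be flagged, not skipped
    "state_c": broken_source,
}
results, alerts = run_scrapers(scrapers)
```

The point of the sketch is the alerting branch: every failure mode lands in a queue a human has to work through, which is where the weekly grind comes from.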
FiscalNote burned $750 million in VC money on this and just went under this week, granted with a series of spectacular own-goals.
I wish this author the best of luck, and if you want to team up on scrapers please give us a shout. But please be aware that you're promising the moon, and try to build a model that will be financially and effort-sustainable. Keeping this stuff going is a _slog_. I'm really hoping that someone can bring the professional level tools to normal people.
What percentage of that went towards solving the actual problem?
3/4 of a billion dollars is enough to pay many, many people $50 an hour to sit at a screen 24/7 and refresh any number of websites you want.
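For scale, the arithmetic behind that claim, using only the two figures from the comment:

```python
# Back-of-envelope on the $750M figure.
budget = 750_000_000             # VC money burned, in dollars
hourly = 50                      # $/hour for a person refreshing websites
hours = budget / hourly          # total person-hours the budget buys
seat_hours_per_year = 24 * 365   # one around-the-clock "seat"
seat_years = hours / seat_hours_per_year

print(f"{hours:,.0f} person-hours ~= {seat_years:,.0f} seat-years of 24/7 coverage")
```

That works out to 15 million person-hours, or roughly 1,700 seat-years: on the order of 170 around-the-clock seats sustained for a decade.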
1) Government authored/sponsored materials are not considered “published” until they are available in both human and machine readable digital formats at no cost to the reader.
2) Publications of such materials
2.1) must not keep a record of individual usage beyond that technically necessary to provide access to the publication
2.2) notwithstanding 2.1, must take reasonable steps to secure digital publications from abuse that creates an undue administrative burden
Yes, your target users don’t have a lot of money, but they also deserve a sense of whether or not you’re going to keep maintaining this project. Additionally, they are generally NOT technical and will not have the skills necessary to set up or maintain this platform.
Without a paid offering, they will have to run the software and will not have any clarity about your long term commitment to the project. Feel free to reach out to me. My email address is in my profile.
Public comment is one of the least effective mechanisms for influencing policy, at least at the margin. You can drastically amplify your influence with a simple change: move from public to private commentary, directly and personally addressing your state and local reps. They all have email addresses and I think it's more likely than not you'll be surprised when they (and it'll actually be them, not some factotum) respond to your email.
This would stop working if everybody did it, but I live in an unusually (famously, in fact) engaged municipality and have been unreasonably successful at influencing policy, and the evidence I see is that almost nobody does this.
There's probably a civic tech thingy to do here. Though I'd also be mindful of the appearance of canvassing. My experience is that decisionmakers very quickly clock canvassing efforts, and then mentally bucket input into "low-effort" and "high-effort", often in a way that amplifies smaller interests.
I also think you can probably get a long way just by doing a better job than your policy adversaries at presenting information. Another thing I've noticed reps quickly clocking: the commenters who clearly have never read a budget, or who don't know the difference between an Enterprise Fund and the General Fund. These are also problems tech can solve, by digesting and contextualizing data so people can present informed (or informed-sounding) arguments.
Emailing them privately in advance of the meeting will give them the opportunity to think about your input and, in some cases, reply and engage with you about the policy. It might not change their mind, but it will definitely help them see others' perspectives on their upcoming decision.
The line at the bottom of the page does a better job of describing what specifically this project is:
"FireStriker is a free civic engagement and legislative intelligence platform for community organizations, unions, PACs, and activists."
It relies on automated scraping + human confirmation. Louis Rossman describes how it works in https://www.youtube.com/watch?v=W420BOqga_s
Not sure this should be free. You want to be sustainable, but not for profit.
Charging the customer is the small guy's weapon to keep going.
The big guys can also do that, but they can subsidize loss-making departments, sell data, sell stock, etc.
Make it GPLv3 open source if you want to offer a free version.
It is a completely fake concern. See here: https://blog.andymasley.com/p/the-ai-water-issue-is-fake
I can get behind "AI water use is not a serious concern" if all you are talking about is selling inference, and you're comparing some sort of usage metric (e.g. "water use per request"). Water and power use for inference is on the level of other heavy Internet products like video streaming or cloud compute.
There is a lot I can't ignore, though. Model training is incredibly demanding, so much that OpenAI was trying to get $1 trillion in investment to practically double the number of data centers in the United States by 2030. That is a serious concern when we have to make decisions between, say, consumer water availability and tech investment in water-scarce areas like Arizona and New Mexico.
In Oregon, there are some unique problems with Amazon's water deals in Umatilla, where they are increasing nitrate concentration of the local groundwater through evaporative cooling, and refusing to pay for on-site treatment.
I can go on about other environmental harms, but I think you should take a more nuanced look at the issue. Having ChatGPT summarize a news article is not an unreasonable demand compared to other compute activities, but AI in general is driving compute demand so high that the general public is forced to reckon with a problem that's been there since the beginning: the expansion, operation and use of the Internet has physical environmental consequences.
I've known this for a long time and ironically this very article proves me right. The article is straight LLM slop, right from the mouth of GPT, Gemini or Claude. It's overt. Out of all 30 articles on the current HN frontpage, this one is by far the highest % LLM-written of all of them - I took the time to skim through each of them. This tells you all you need to know. With this in mind it's poor behavior to accuse GP of all people of astroturfing.
Training is missing from the analysis entirely (as someone else noted)
Inference water use is indeed minimal per prompt, no argument there, but training the old GPT-3 consumed roughly 5.4 million liters of water; LLaMA 3, around 22 million. These are huge events happening multiple times a year across the industry, and folding them into national averages seems like exactly the statistical simplification the article criticizes everyone else for doing…
"Small nationally" ≠ "fine locally"
The Dalles, Oregon is the clearest example. In 2012, Google used 12% of the city's water supply. Today it consumes about a third, around 1.19 million gallons per day, and a sixth data center in the same area comes online in 2026.
The city is now pursuing a $260 million reservoir expansion into a national forest (!), where 95% of the projected new water demand will be industrial, not residential. Residents are looking at a potential 99% rate increase by 2036 to fund infrastructure that may exist primarily to serve one company. Apparently the city fought a 13-month legal battle just to keep those numbers secret. That's a community being reshaped around a single tenant.
Hays County, Texas residents sharing the Edwards Aquifer with incoming data centers voted to block one. Memphis is watching xAI draw 5 million gallons per day. Bloomberg found two-thirds of new U.S. data centers since 2022 are sited in high water-stress zones. Municipalities in Arizona have already passed ordinances capping data center water use.
This, to me, looks like a problem in the making. AI water use isn't a national crisis yet, but local impacts are already real, training costs are systematically underreported, and the five-year trajectory in water-stressed regions deserves serious attention indeed.
There are many good criticisms of data centers. And yet the water issue always comes up first. Must we spew falsehoods just to make our political message catchy? I suppose yes: in times of war/politics, the laws/truths are silent. But it doesn't have to be so here.
I've never had it come up first. Neat how 2 people can have 2 opposite experiences based on their different life paths.
Anyways: Between our 2 opposite experiences, it might as well be totally random, so I don't think the ordering of concerns is that important. Better to focus on substance, like the concerns themselves.
Or maybe the explanation is that nearly no one actually read it, that seems the more likely one.
https://firestriker.org/blog/building-firestriker-why-im-mak...
--
related, author is a friend with a less active HN account https://news.ycombinator.com/user?id=blakeofwilliam