It provides a git like pull/push workflow to edit sheets/docs/slides. `pull` converts the google file into a local folder with agent friendly files. For example, a google sheet becomes a folder with a .tsv, a formula.json and so on. The agent simply edits these files and `push`es the changes. Similarly, a google doc becomes an XML file that is pure content. The agent edits it and calls push - the tool figures out the right batchUpdate API calls to bring the document in sync.
None of the existing tools allow you to edit documents. Invoking batchUpdate directly is error prone and token inefficient. Extrasuite solves these issues.
In addition, Extrasuite also uses a unique service token that is 1:1 mapped to the user. This means that edits show up as "Alice's agent" in google drive version history. This is secure - agents can only access the specific files or folders you explicitly share with the agent.
This is still very much alpha - but we have been using this internally for our 100 member team. Google sheets, docs, forms and app scripts work great - all using the same pull/push metaphor. Google slides needs some work.
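To make the pull/edit/push loop above concrete, here is a minimal sketch of the "edit" step only, assuming the pulled folder contains a plain tab-separated file. The file contents and the `set_cell` helper are hypothetical and are not taken from Extrasuite itself.

```python
import csv
import io

# Hypothetical contents of a pulled sheet's .tsv file (the layout is an assumption).
PULLED_TSV = "item\tqty\tprice\nwidget\t2\t3.50\ngadget\t1\t10.00\n"


def set_cell(tsv_text: str, key: str, column: str, value: str) -> str:
    """Return TSV text with `column` set to `value` on the row whose first field is `key`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    col = rows[0].index(column)  # look the column up by its header name
    for row in rows[1:]:
        if row[0] == key:
            row[col] = value
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# The agent edits the local file; a later push would sync the change back.
edited = set_cell(PULLED_TSV, "widget", "qty", "5")
print(edited)
```

The point of the design is that the agent only touches ordinary text files, so nothing in this step needs to know about the Sheets API.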
This is for Confluence. Markdown all the way:
https://github.com/grantcarthew/acon
Jira, same, markdown all the way:
https://github.com/grantcarthew/ajira
Other agent tools:
IMO, this is a better approach than the one used by Anthropic docx editing skill.
1. Did you compare this one with other document editing agents? Did you have any other ideas on how to make AI see and make edits to documents?
2. What happens if the document is a big book? How do you manage context when loading big documents?
PS: I'm working on an AI agent for Zoho Writer (gdocs alternative) and I've landed on a similar HTML-based approach. The difference is I ask the AI to use my minimal commands (addnode, replacenode, removenode) to operate over the HTML and convert them into ops.
This works pretty well for me.
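A rough sketch of what such a minimal command set could look like, using Python's standard `xml.etree.ElementTree` on a well-formed fragment. The command names come from the comment above; the argument shapes (`id`, `parent`, `html`) and the helper names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def find_by_id(root, node_id):
    """Return the first element whose id attribute equals node_id."""
    for el in root.iter():
        if el.get("id") == node_id:
            return el
    raise KeyError(node_id)


def parent_map(root):
    """Map every child element to its parent (ElementTree has no parent pointers)."""
    return {child: parent for parent in root.iter() for child in parent}


def apply_command(root, cmd):
    """Apply one addnode / replacenode / removenode command in place."""
    op = cmd["op"]
    if op == "addnode":
        find_by_id(root, cmd["parent"]).append(ET.fromstring(cmd["html"]))
    elif op == "replacenode":
        target = find_by_id(root, cmd["id"])
        parent = parent_map(root)[target]
        index = list(parent).index(target)
        parent.remove(target)
        parent.insert(index, ET.fromstring(cmd["html"]))
    elif op == "removenode":
        target = find_by_id(root, cmd["id"])
        parent_map(root)[target].remove(target)
    else:
        raise ValueError(f"unknown op: {op}")


doc = ET.fromstring('<body id="b"><p id="p1">Hello</p><p id="p2">Old</p></body>')
for cmd in [
    {"op": "replacenode", "id": "p2", "html": '<p id="p2">New</p>'},
    {"op": "addnode", "parent": "b", "html": '<p id="p3">Bye</p>'},
    {"op": "removenode", "id": "p1"},
]:
    apply_command(doc, cmd)

result = ET.tostring(doc, encoding="unicode")
print(result)
```

Each command maps to one small, checkable tree mutation, which is presumably what makes the conversion to editor ops tractable.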
re. what happens if it's a big book - each "tab" in the google doc is a folder with its own document.xml. A top-level index.xml captures the table of contents across tabs. The agent reads index.xml and then decides what else to read. I am now improving this by giving it xpath expressions so it can directly pick the specific sections of interest.
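As an illustration of the index-then-XPath idea, here is a small sketch using the limited XPath subset in Python's standard library. The element and attribute names (`tab`, `section`, `title`, `path`) are assumptions, not the tool's real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical index.xml; element and attribute names are assumptions.
INDEX_XML = """
<index>
  <tab id="t1" title="Introduction" path="t1/document.xml"/>
  <tab id="t2" title="Pricing" path="t2/document.xml"/>
</index>
"""

# Hypothetical per-tab document.xml.
TAB_XML = """
<document>
  <section id="s1"><h1>Plans</h1><p>Three plans are offered.</p></section>
  <section id="s2"><h1>Discounts</h1><p>Annual billing saves 20%.</p></section>
</document>
"""

# Step 1: read only the small index and decide which tab matters.
index = ET.fromstring(INDEX_XML)
tab_path = index.find(".//tab[@title='Pricing']").get("path")

# Step 2: pull a single section out of that tab instead of loading all of it.
document = ET.fromstring(TAB_XML)
section = document.find(".//section[@id='s2']")
text = " ".join(t.strip() for t in section.itertext() if t.strip())

print(tab_path)
print(text)
```

Only the index and the one selected section ever need to enter the agent's context.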
Philosophically, we wanted "declarative" instead of "imperative". Our key design - the agent needs to "think" in terms of the business, and not worry about how to edit the document. We move all the reconciliation logic into the library, and free the agent from worrying about the google doc. Same approach in other libraries as well.
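The declarative idea can be sketched as a diff: the agent produces the desired state, and the library derives the operations. The example below uses `difflib` over a list of paragraphs. It is a simplification under stated assumptions and does not reflect how the real tool maps changes to batchUpdate requests.

```python
from difflib import SequenceMatcher


def reconcile(before, after):
    """Derive edit operations that turn the `before` paragraph list into `after`."""
    ops = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged paragraphs need no operation
        ops.append({
            "op": tag,                  # "replace", "delete" or "insert"
            "at": i1,                   # position in the original document
            "remove": before[i1:i2],
            "insert": after[j1:j2],
        })
    return ops


before = ["Title", "Old intro", "Body"]
after = ["Title", "New intro", "Body", "Appendix"]
ops = reconcile(before, after)
for op in ops:
    print(op)
```

The agent never sees `ops`; it only writes `after`, which is the "think in terms of the business" part.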
You need to rewrite your CLI for AI agents - https://news.ycombinator.com/item?id=47252459.
I think that's pretty cool so I put the post in the SCP (https://news.ycombinator.com/item?id=26998308).
https://pchalasani.github.io/claude-code-tools/integrations/...
They handle embedded images in both directions. There are similar gsheet2csv and csv2gsheet tools in the same repo.
Similar to the posted tool, there is a first time set up involving creating an app, that is documented above.
[1] in the sense there are multiple annoying clicks/steps to get a markdown doc to look good in Gdocs. You'd know the pain if you've tried it.
And then you have to edit it thereafter with their WYSIWYG editor. I was hoping there was an actual "markdown mode" where I could avoid getting into wars about whether this bullet point is in a new list or part of the previous list, etc.
IIRC there's no way to get markdown back out either, once you've realized you made the wrong choice.
I ran into similar issues as you for the image handling, and the workaround I use is to use pandoc to convert to docx as a first step and then import that as a Google Doc using the API, as Google Docs seems to handle docx much better than markdown from what I've seen.
I did a few CLIs with codex in the last few weeks. I do simple ops with this stuff. I've had a few use cases for new features where previously I would have had to build some kind of quick and dirty admin UI just to use and test a new API feature before being able to integrate it into our product. With a generated cli, I can just play with it from the command line. Or make codex do that for me.
A good cli with a modern command line argument parser, well documented options, bash/zsh auto complete, pretty colors, etc. is generally nice to have. I mapped resources to commands and sub commands, made it add parameters with sensible defaults or optional ones. Then I got lazy and just asked it what else it thought it was missing, it made some suggestions and I gave it the thumbs up and it all got added. I even generated a simple interactive TUI at some point. Because why not? I also made it generate a md skill file explaining how to use the cli that you can just drop in your skills directory.
This CLI dynamically generates itself at run time though
gws doesn't ship a static list of commands. It reads Google's own Discovery Service at runtime and builds its entire command surface dynamically
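To show what building a command surface from a discovery document might look like, here is a minimal sketch with `argparse`. The inline `SAMPLE_DISCOVERY` dict only mimics the resources/methods/parameters shape of a Discovery document; its contents are illustrative, and this is not how gws itself is implemented.

```python
import argparse

# Tiny inline stand-in for a Discovery document. A real one would be fetched
# at runtime; this sample and its contents are illustrative assumptions.
SAMPLE_DISCOVERY = {
    "name": "drive",
    "resources": {
        "files": {
            "methods": {
                "list": {
                    "httpMethod": "GET",
                    "path": "files",
                    "parameters": {
                        "pageSize": {"type": "integer"},
                        "q": {"type": "string"},
                    },
                },
                "get": {
                    "httpMethod": "GET",
                    "path": "files/{fileId}",
                    "parameters": {"fileId": {"type": "string", "required": True}},
                },
            }
        }
    },
}

TYPES = {"integer": int, "string": str}


def build_parser(doc):
    """Create one subcommand per resource and one nested subcommand per method."""
    parser = argparse.ArgumentParser(prog=doc["name"])
    resources = parser.add_subparsers(dest="resource", required=True)
    for res_name, res in doc["resources"].items():
        methods = resources.add_parser(res_name).add_subparsers(dest="method", required=True)
        for method_name, method in res["methods"].items():
            m = methods.add_parser(method_name)
            m.set_defaults(http_method=method["httpMethod"], path=method["path"])
            for param, spec in method.get("parameters", {}).items():
                m.add_argument(
                    f"--{param}",
                    type=TYPES.get(spec["type"], str),
                    required=spec.get("required", False),
                )
    return parser


parser = build_parser(SAMPLE_DISCOVERY)
args = parser.parse_args(["files", "list", "--pageSize", "10"])
print(args.resource, args.method, args.pageSize, args.http_method, args.path)
```

Nothing about `files` or `list` is hard-coded in `build_parser`, so a new API method shows up as a new subcommand as soon as the document describes it.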
You're not exactly describing rocket science. This is basically how websites work, there's never been anything stopping anyone from doing dynamic UI in TUIs except the fact that TUI frameworks were dog poop until a few years ago (and there was no Windows Terminal, so no Windows support). Try doing that in ncurses instead of Ratatui or whatever, it's horrendous
> This is not an officially supported Google product.
Looked like an official Google Product on the first glance.
That or it's a personal project that IARC decided could live in the workspace project.
Disc: Former Googler
Also known as every single Google product
Seems like it was made by Google employee: https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
googleworkspace/cli appears to be more of a hobby project developed by a single Google employee.
There is an official process where an engineer can apply to a committee to have Google waive any copyright claim. That requires additional work so if your goal is simply to publish the code as open source and you do not mind it living under the Google org, using the Google repo path is usually much faster.
Disclaimer: ex-googler, not a lawyer, not arguing whether or not the situation with copyright assignment is legally enforceable or not/good or bad/etc.
Multiple errors and issues along the way, now I'm on `gws auth login`, and trying to pick the OAuth scopes. I go ahead and trust their defaults and select `recommended`, only to get a warning that this is too many scopes and may error out (then why is this the recommended setting??), and then yeah, it errors out when trying to authenticate in the browser.
The error tells me I need to verify my app, so I go to the app settings in my cloud console and try to verify and there's no streamlined way to do this. It seems the intended approach is for me to manually add, one by one, each of the 85 scopes that are on the "recommended" list, and then go through the actual verification.
Have the people that built and released this actually tried to install and run this, just a single time, purely following their own happy path?
It's wild that this process is still so challenging. There's got to be some safe, streamlined way to set up an app identity you own that can only be used to access your own account.
My guess is that organizationally within Google, the developer app authorization process must have many teams involved in its implementation and many other outside stakeholders. A single unified team wouldn't be responsible for this confusion and complexity. I get why... it's a huge source of bad actors. But there's got to be a better way.
It’s a very different experience than AWS though and takes some getting used to.
Google Workspace API keys and roles were always confusing to me on so many levels... and they just seem to keep topping that confusion. No one is addressing the core (honestly, not sure if that is even possible at this point).
Just being able to send commands to my Nest thermostat (which I own, and is on the same LAN) involved creating a cloud account, a "project" (wtf is a project, I just want an API key damnit, this isn't JIRA), a billing account, enabling billing, enabling the billing account, creating another account somewhere else on some other Google site, going through mountains of 2FA issues in the process where I had to tap "Yes" on another device instead of the device I was actually using, enabling the project in the other account, installing it, publishing it, paying $5 somewhere in between and I didn't understand exactly for what, ...
Why the hell can't I set my temperature with a simple "curl" command to the thermostat's LAN IP? At the most with a simple "Authorization: Bearer" header?
Access blocked: [app name] is not approved by Advanced Protection. Error 400: policy_enforced
Getting the authentication to work is a real pain, and it's basically preventing people from accessing an otherwise really good and useful MCP.
Imagine a marketing person trying to set it up...
# API Keys in Settings
1. Go to Settings -> API Keys Page
2. Create Token (set scope and expiration date)
# OAuth flow
1. `gws login` shows url to visit
2. Login with Google profile & select data you want to share
3. Redirect to localhost page confirms authentication
I get that I need to configure Project and OAuth screens if I want to develop an application for other users that uses GCP services. This is fine. But I am trying to access my own data over a (/another) HTTP API. This should not be hard.
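For reference, the URL-printing and localhost-redirect steps of the wished-for OAuth flow can be sketched with the standard library alone. The client ID, port, and helper names below are placeholders. The sketch still assumes a registered client ID exists, which is exactly the setup step being complained about, and it omits the code-for-token exchange.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth 2.0 authorization endpoint.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_login_url(client_id, redirect_uri, scopes, state):
    """Step 1: the URL a CLI would print for the user to visit."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{AUTH_ENDPOINT}?{query}"


def parse_redirect(redirect_url, expected_state):
    """Step 3: extract the authorization code from the localhost redirect."""
    params = parse_qs(urlparse(redirect_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch")
    if "error" in params:
        raise ValueError(params["error"][0])
    return params["code"][0]


state = secrets.token_urlsafe(16)  # random value to tie the redirect to this login
url = build_login_url(
    "EXAMPLE_CLIENT_ID",           # placeholder, not a real client ID
    "http://127.0.0.1:8085/",      # placeholder loopback address and port
    ["https://www.googleapis.com/auth/drive.readonly"],
    state,
)
# Simulated redirect, as a browser would send it back to the local listener.
code = parse_redirect(f"http://127.0.0.1:8085/?code=abc123&state={state}", state)
print(url)
print(code)
```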
Google have over a billion very non-technical users.
The friction of not having this in the account page that everyone has access to probably saves both parties lots of heartbreak.
We’re trying to create a single unified cli to every service on the planet, and make sure that everything can be set up with 3 clicks
Google's Gemini can read Google Docs directly.
They really don't want you to use another LLM product.
So they make the setup as difficult as possible.
The install script checks the OS and Arch, and pulls the right Rust binary.
Then, they get an upgrade mechanism out of the box too, and an uninstall mechanism.
NPM has become the de facto standard for installing any software these days, because it is present on every OS.
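The "check OS and arch, then pick a binary" step is small enough to sketch. The mapping below uses real Rust target triples, but which targets any given project publishes is an assumption, and the function names are hypothetical.

```python
import platform

# Illustrative mapping from (OS, CPU) to Rust target triples; which targets a
# given project actually ships prebuilt binaries for is an assumption.
TARGETS = {
    ("linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("linux", "aarch64"): "aarch64-unknown-linux-gnu",
    ("darwin", "x86_64"): "x86_64-apple-darwin",
    ("darwin", "arm64"): "aarch64-apple-darwin",
    ("windows", "amd64"): "x86_64-pc-windows-msvc",
}


def pick_target(system, machine):
    """Map an OS name and CPU architecture to a prebuilt binary target."""
    key = (system.lower(), machine.lower())
    try:
        return TARGETS[key]
    except KeyError:
        raise RuntimeError(f"no prebuilt binary for {system}/{machine}") from None


def detect_target():
    """Pick the target for the machine this script is running on."""
    return pick_target(platform.system(), platform.machine())


print(pick_target("Darwin", "arm64"))
```

An install hook would then download the archive named after that target, which is where the arbitrary-code-at-install-time concern comes in.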
That's the arbitrary code execution at install time aspect of npm that developers should be extra wary of in this day and age. Saner node package managers like pnpm ignore the build script and you have to explicitly approve it on a case-by-case basis.
That said, you can execute code with build.rs with cargo too. Cargo is just not a build artifact distribution mechanism.
Honestly I’m shocked to see so many people supporting this
That's not remotely true. If there is a standard (which I wouldn't say there is), it's either docker or curl|bash. Nobody is out there using npm to install packages except web devs, this is absolutely ridiculous on Google's part.
What?!? Must not be in any OS I've ever installed.
Now tar, on the other hand, exists even in windows.
When I use apt-get, I have no idea what languages the packages were written in.
I wish I could use an API/CLI to query/geoquery my photos.
I get better experience if I just copy-paste the sheet data into Gemini web. And IIRC copy-paste is just space "delimited" by default.
What is the practical difference between a "discovery service"+API and an MCP server? Surely humans and LLMs are better off using a "discovery service"+API in all cases? What would be the benefit of MCP?
I have done it many times, using the swagger.json as a "discovery service" and then having the agent utilize that API. A good OpenAPI spec was working perfectly fine for me all the way back when OpenAI introduced GPTs.
If we standardized on a discovery/ endpoint, or something like that, as a more compact description of the API to reduce token usage compared to consuming the somewhat bloated full OpenAPI spec, you would have everything you need right there.
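One way to picture the "more compact description" idea is to reduce an OpenAPI document to one line per operation. The spec below is a made-up minimal example, and the output format is an assumption rather than any existing standard.

```python
# Minimal OpenAPI-shaped dict for illustration; the paths are made up.
SPEC = {
    "openapi": "3.0.0",
    "paths": {
        "/files": {
            "get": {
                "operationId": "listFiles",
                "summary": "List files",
                "parameters": [{"name": "q", "in": "query"}],
            },
            "post": {"operationId": "createFile", "summary": "Create a file"},
        },
        "/files/{id}": {
            "get": {
                "operationId": "getFile",
                "summary": "Get one file",
                "parameters": [{"name": "id", "in": "path"}],
            },
        },
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def compact(spec):
    """Reduce a spec to one short line per operation to save tokens."""
    lines = []
    for path, item in spec["paths"].items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as shared "parameters"
            params = ",".join(p["name"] for p in op.get("parameters", []))
            lines.append(f"{method.upper()} {path} ({params}) - {op.get('summary', '')}")
    return lines


summary = compact(SPEC)
print("\n".join(summary))
```

An agent could read this listing first and fetch the full schema only for the operation it decides to call.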
The MCP side quest for AI has been one of the most annoying things in AI in recent years. Complete waste of time.
I consider it a good first attempt, but indeed hope for a sort of mcp2.0
Because of FOMO a lot of higher-ups decided that "we must do an MCP to show that we're also part of the cool kids" and to give an answer to their even-higher-ups about "What are you doing regarding AI?"
The project has been approved, a lot of time has been sunk into the project, so nobody wants to admit that "hmmm actually now it's irrelevant our existing API + a skill.md is enough"
I've seen that in at least 4 companies my friends work in, so I would be surprised if it's not something like that here too.
On the contrary claude code, in my experience, has been perfectly able to use `stripe` `gh` and to construct on the fly a figma cli (once instructed to do it).
Better this than a Google dashboard, or slopped together third party libs. I know Google says they don't support it, but they'll probably support it better than someone outside of Google can support it.
[1] https://workspaceupdates.googleblog.com/2025/12/workspace-st...
The decision to pass all params as a JSON string to --params makes it unfriendly for humans to experiment with, although Claude Code managed to one-shot the right command for me, so I guess this is fine. This is an intentional design per https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
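For readers unfamiliar with the pattern, a single JSON-valued `--params` flag can be sketched as below. The `example` program and `parse_json_object` helper are hypothetical and are not gws's actual argument handling.

```python
import argparse
import json


def parse_json_object(text):
    """argparse type: accept only a JSON object and return it as a dict."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"invalid JSON: {exc.msg}") from None
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return value


parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--params", type=parse_json_object, default={})

# One flag carries every parameter, nested values included.
args = parser.parse_args(["--params", '{"pageSize": 5, "q": "name contains \'report\'"}'])
print(args.params)
```

The trade-off is visible in the last call: the quoting is awkward to type by hand, but a model emits a JSON object as easily as it emits individual flags.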
Probably someone's hobby project or 20% time at best.
Clever, but frustrating that they don’t bother to provide any docs on the actual commands this supports.
[1] https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
Reading lots of comments about "MCP vs CLI" -- reminds me a bit of the "agent vs. script/app/rpa" debates. It's usually not one or the other, but rather, both.. or the right tool for the job (and that can shift over time).
Biggest complaint we have about MCP is bigger context windows and token spend. Tools do exist that address this. I have just one MCP endpoint with a half dozen tools behind it, including Gmail, Google Calendar, Docs, Github, Notion, and more. Uses tool search tool (ToolIQ) with tiny context footprint. Give it a whirl. https://venn.ai
Remember this repo is not an agent. It's just a cli tool to operate over gsuite documents that happens to have an MCP command and a bunch of skills prebundled.
That's a new one. I guess the hope is agents are good at navigating cli and it also democratizes the ecosystem to be used by any agent as opposed to Microsoft (which only allows Copilot to work in its ecosystem)
In other words: this is the FANGs worst enemy. This is adblock * 1000, as far as consumers are concerned.
[0] https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypert...
https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
- No need to worry about transport layer stuff at all, including auth or headers. This is baked in, so saves context.
- They are self describing with --help and then nested --help commands, way better than trying to decipher an OpenAPI spec. You usually don't even need an agent skill, just call the --help and the LLM figures it out.
i’d rather not waste the context tokens re implementing their cli from scratch, if indeed it does a good job.
I wonder why they didn't do this on Python or Ruby, them being the superior languages where `==` works, blah, blah ...
npm install -g @googleworkspace/cli
gws auth setup
{ "error": { "code": 400, "message": "gcloud CLI not found. Install it from https://cloud.google.com/sdk/docs/install", "reason": "validationError" } }
Which takes you to...
https://docs.cloud.google.com/sdk/docs/install-sdk
Where you have to download a tarball, extract it and run a shell script.
I mean how hard is it to just imitate everyone else out there and make it a straight up npm install?
The contributors are a Google DRE, 5 bots / automating services, and a dev in Canada.
1. A GCP project (needed for OAuth) 2. Enabled APIs in said project
> Disclaimer
> Caution
> This is not an officially supported Google product.
But saying "it's for AI" is a corporate life hack for you to get permission to build said better tooling... =)
CharmCLI golang
Nushell rust
Warp. Shell
These were all around in 2020, which is also when alt shells started getting popular, probably for the same reasons they still are.
> requires setting up gcloud cli first, necessitates making a Google Cloud project
cmon google how come even your attempts at good ux start out with bad ux? let me just oauth with my regular google account like every other cli tool out there. gh cli, claude, codex - all are a simple “click ok” in the browser to log in. wtf.
and the slow setup - i need to make my own oauth app & keys??
EDIT: oh yeah and get my oauth app verified all so i can use it with my own account
(all pass through)
https://news.ycombinator.com/item?id=47157398
IMHO, CLI tools are better than MCP more often than not.
EDIT: and here is similar opinion from author himself: https://news.ycombinator.com/item?id=47252459
I mean it's great that we get this, hopefully it can continue to be maintained and I'd love to see a push for similar stuff for other products and at other companies.
checks https://github.com/googleworkspace/cli/tags
v0.1.1 2 days ago
v0.2.2 yesterday
v0.3.3 18 hours ago
v0.4.2 9 hours ago
v0.5.0 8 minutes ago
Interesting times we live in..
I mean I have personal gmail,drive, keep, etc. Will it work there?
Google Workspace is their corporate offering (think Microsoft suite competitor)
Also, what I find fascinating is that the repo was initialized 3 days ago so it seems it's still a work in progress.
Sounds handy. But use at your own risk.