This seems backwards, somehow. Like you're asking for an nth view and an nth API, and services are being asked to provide accessibility bridges redundant with our extant offerings.
Sites are now expected to duplicate effort by manually defining schemas for the same actions — like re-describing a button's purpose in JSON when it's already semantically marked up?
No, I don't think you're thinking about this right. It's more like Hacker News would expose an MCP when you visit it, presenting an alternative and parallel interface to the page, not "click button" tools.
You're both right. The page can expose MCP tools via a form element, which is as simple as adding an attribute to an existing form and aligns completely with existing semantic HTML (e.g. submitting an HN comment). Additionally, the page can define tools in JavaScript that aren't backed by forms; e.g. YouTube could provide a transcript MCP tool, defined in JS, that fetches the video's transcript.
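A rough sketch of those two registration paths, modeled as a plain page-side tool registry. All the names here (`registerTool`, the schema shapes) are invented for illustration and are not the actual WebMCP API:

```javascript
// Hypothetical page-side tool registry, illustrating the two paths above.
const pageTools = new Map();

function registerTool(name, { description, inputSchema, execute }) {
  pageTools.set(name, { description, inputSchema, execute });
}

// Path 1: a tool derived from an existing semantic form, e.g. HN's
// comment form. The schema mirrors the form's named inputs.
registerTool("post_comment", {
  description: "Submit a comment on the current thread",
  inputSchema: { text: "string" },
  execute: ({ text }) => `submitted: ${text}`, // would really POST the form
});

// Path 2: a JS-defined tool with no form counterpart, e.g. a transcript
// fetcher on a video page.
registerTool("get_transcript", {
  description: "Fetch the transcript for the current video",
  inputSchema: {},
  execute: () => "[00:00] hello world", // would really fetch() the transcript
});

// An agent discovering and invoking a tool:
const result = pageTools.get("post_comment").execute({ text: "nice post" });
```

The point of the sketch is only that both kinds of tool end up in the same registry with the same shape, whether they were lifted from a form or hand-written in JS.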
I think that REST and HTML could probably already be used for this purpose, BUT HTML is often littered with elements used for visual structure rather than semantics.
In an ideal world HTML documents would be very simple, everything visual would be done via CSS, and JavaScript would be completely optional.
In such a world agents wouldn't really need a dedicated protocol (and websites would be much faster to load and render, besides being much lighter on CPU and battery).
> html could probably be already used for this purpose
You're right, and it already is: tools like the Playwright MCP can easily parse a webpage and get things done with existing markup today.
> BUT html is often littered with elements used for visual structure rather than semantics.
This actually doesn't make much of a difference to a tool like Playwright, because it uses a snapshot of the accessibility tree, which only reflects semantic markup and ignores presentation.
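A toy illustration of why presentational markup doesn't hurt an accessibility-tree consumer. This is not Playwright's real algorithm, just the general idea: semantic nodes survive, while bare layout wrappers (`div`/`span` with no role or accessible name) are flattened away:

```javascript
// Elements that are purely presentational when they carry no role/name.
const PRESENTATIONAL = new Set(["div", "span"]);

function axTree(node) {
  const children = (node.children || []).flatMap(axTree);
  if (PRESENTATIONAL.has(node.tag) && !node.role && !node.name) {
    return children; // skip the wrapper, keep its semantic descendants
  }
  return [{ tag: node.tag, name: node.name || "", children }];
}

// A typical "littered" DOM: three nested divs around one real button.
const dom = {
  tag: "div",
  children: [
    {
      tag: "div",
      children: [{ tag: "div", children: [{ tag: "button", name: "Reply" }] }],
    },
  ],
};

const snapshot = axTree(dom);
// snapshot: [{ tag: "button", name: "Reply", children: [] }]
```

However deep the wrapper nesting, the snapshot the agent sees is just the button.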
> In such a world agents wouldn’t really need a dedicated protocol
They still do, though, because they can work better when given specific tools. WebMCP could provide tools not available on the page. Say an agent hits the dominoes.com landing page. The page could provide an order_pizza tool that the agent can call, saving a bunch of navigation, clicks, and scrolling. It calls order_pizza with "Two large pepperoni pizzas for John at <address>", and the whole process is done.
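A hedged sketch of that order_pizza idea: one declared tool collapses a multi-step click-through into a single call. Everything here (the tool name, the schema shape, the confirmation string) is invented for illustration:

```javascript
// Hypothetical page-declared tool replacing the whole checkout flow.
const orderPizzaTool = {
  name: "order_pizza",
  description: "Place a delivery order in one call",
  inputSchema: {
    items: "array", // e.g. [{ size: "large", topping: "pepperoni", qty: 2 }]
    name: "string",
    address: "string",
  },
  execute({ items, name, address }) {
    // In a real page this would drive the site's own checkout logic;
    // here we just summarize what would otherwise take many UI steps.
    const count = items.reduce((n, item) => n + item.qty, 0);
    return `order placed: ${count} pizza(s) for ${name} at ${address}`;
  },
};

// The agent's single call, instead of navigate/click/scroll/fill:
const confirmation = orderPizzaTool.execute({
  items: [{ size: "large", topping: "pepperoni", qty: 2 }],
  name: "John",
  address: "123 Main St",
});
```

The interesting design point is that the tool runs inside the page's own session, so it can reuse the site's existing cart and checkout code rather than reimplementing it server-side.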
I see two totally different things here compared to where we are today:
1. This is a contextual API built into each page. Historically, sites could offer an API, but that API was a parallel experience, a separate machine-to-machine channel that didn't augment or extend the actual user session. The MCP API offered here is provided by the page (not the server/site), in a fully dynamic manner (what's offered can reflect the current state of the page), and it layers atop the user session. That's totally different.
2. This opens an expectation that sites have a standard means of control available. This has two subparts:
2a. There are dozens of different API systems to pick from when exposing your site. GitHub got halfway from REST to GraphQL, then turned back. Some sites use tRPC or Cap'n Web or gRPC. There has never actually been one accepted way for machines to talk to your site; there's been a fractal maze of offerings on the web. This is one consistent offering, mirroring what everyone is already using now anyway.
2b. Offering APIs for your site has gone out of favor in general, and where an API does exist it often has high walls and barriers. But now the people putting their fingers in that leaky dam are patently clearly Not Going To Make It: the LLMs will script and control the browser if they have to, and it's much less painful to just lean in to what users want to do and expose a good WebMCP API that your users can enjoy to be effective and get shit done, like they have wanted to do all along. If WebMCP takes off at all, it will reset expectations: the internet is for end users, and their agency, their ability to work your site as they please via their preferred modalities, is king. WebMCP points us toward an RFC 8890-compliant future by directly enabling user agency. https://datatracker.ietf.org/doc/html/rfc8890
To guard against this, the best course of action is probably modularization and composition, right? The Unix philosophy, i.e. building small, focused tools out of small, focused tools.
Yes, I've thought that could work. Returning to a more protected, object-oriented programming model (with hard-defined interfaces) could be one way: "make these changes but restrict yourself to this object", etc.
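A small sketch of that "restrict yourself to this object" idea: instead of handing the agent the whole mutable object, hand it a frozen, narrow interface, so the exposed surface is the contract. The names (`exposeInterface`, the account shape) are illustrative:

```javascript
// Hand out a hard-defined interface rather than the underlying object.
function exposeInterface(account) {
  const api = {
    deposit(amount) {
      if (amount <= 0) throw new Error("invalid amount");
      account.balance += amount;
      return account.balance;
    },
    getBalance() {
      return account.balance;
    },
    // Deliberately no withdraw and no direct field access: anything not
    // on this surface is simply not available to the caller.
  };
  return Object.freeze(api); // the interface itself can't be extended
}

const account = { balance: 100, owner: "alice" };
const api = exposeInterface(account);

const newBalance = api.deposit(50);
```

An agent given only `api` can make the permitted changes but can't reach `account.owner` or bolt new methods onto the interface, which is exactly the confinement the OOP-style approach is after.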
Currently experimenting with agent-of-empires for tmux+worktrees to parallelize code changes.