A Model Context Protocol (MCP) server providing advanced web scraping and extraction tools powered by Puppeteer. Easily extendable with your own tools.
This project is an MCP server built on @purinton/mcp-server. It exposes a set of Puppeteer-powered tools via the Model Context Protocol, making them accessible to AI agents and automation clients.
Key Features:
- Dynamic tool loading from the tools/ directory
- Advanced web scraping and extraction using Puppeteer
- Simple to add or modify tools
- HTTP API with authentication
- Built for easy extension
Below is a list of tools provided by this MCP server. Each tool can be called via the MCP protocol or HTTP API.
Name: web-browse
Description: Extract text or JSON from a web page with advanced options (uses Puppeteer under the hood).
Input Schema:
{
"url": "string",
"method": "string (optional, default: GET)",
"headers": "object (optional)",
"body": "string (optional)"
}
Example Request:
{
"tool": "web-browse",
"args": {
"url": "https://example.com"
}
}
Example Response:
{
"url": "https://example.com",
"text": "Example Domain"
}
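The example request above can be assembled programmatically. The following is a minimal sketch, not the server's documented client: `buildToolRequest` is a hypothetical helper, and the use of `POST` with a JSON body shaped like the example request is an assumption. The endpoint URL is not given in this README, so the caller must supply it from the @purinton/mcp-server documentation.

```javascript
// Hypothetical helper: builds the arguments for fetch() to call a tool.
// Assumptions: the HTTP API accepts a POST with a JSON body shaped like the
// example request above, and the token is sent as a Bearer token.
function buildToolRequest(endpointUrl, token, tool, args = {}) {
  if (!token) {
    throw new Error('A bearer token (MCP_TOKEN) is required');
  }
  return {
    url: endpointUrl,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      // Same shape as the example request: { tool, args }
      body: JSON.stringify({ tool, args }),
    },
  };
}

// Usage (endpointUrl is a placeholder; take the real one from the library docs):
// const { url, options } = buildToolRequest(endpointUrl, process.env.MCP_TOKEN,
//   'web-browse', { url: 'https://example.com' });
// const res = await fetch(url, options);
// console.log(await res.json());
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before sending it.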
- Install dependencies:
  npm install
- Configure environment variables:
  - MCP_PORT: (optional) Port to run the server (default: 1234)
  - MCP_TOKEN: (required) Bearer token for authentication
- Start the server:
  node puppeteer.mjs
- Call tools via HTTP or MCP client.
See the @purinton/mcp-server documentation for protocol/API details.
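The environment variables from the setup steps above could be validated before startup along these lines. This is a sketch under stated assumptions, not the way the server itself reads its configuration: `loadConfig` is a hypothetical helper, and only the variable names, the default port of 1234, and the fact that the token is required come from this README.

```javascript
// Hypothetical helper: reads MCP_PORT and MCP_TOKEN from an env-like object.
// Defaults to process.env; pass a plain object to test it in isolation.
function loadConfig(env = process.env) {
  const token = env.MCP_TOKEN;
  if (!token) {
    throw new Error('MCP_TOKEN is required');
  }
  // MCP_PORT is optional and falls back to the documented default of 1234.
  const rawPort = env.MCP_PORT;
  const port = rawPort === undefined || rawPort === '' ? 1234 : Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${rawPort}`);
  }
  return { port, token };
}

// Usage:
// loadConfig({ MCP_TOKEN: 'abc' });                   // { port: 1234, token: 'abc' }
// loadConfig({ MCP_TOKEN: 'abc', MCP_PORT: '8080' }); // { port: 8080, token: 'abc' }
```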
To add a new tool:
- Create a new file in the tools/ directory (e.g., tools/mytool.mjs):
import { z, buildResponse } from '@purinton/mcp-server';
export default async function ({ mcpServer, toolName, log }) {
  mcpServer.tool(
    toolName,
    "Write a brief description of your tool here",
    { echoText: z.string() },
    async (_args, _extra) => {
      log.debug(`${toolName} Request`, { _args });
      const response = 'Hello World!';
      log.debug(`${toolName} Response`, { response });
      return buildResponse(response);
    }
  );
}
- Document your tool in the Available Tools section above.
- Restart the server to load new tools.
You can add as many tools as you like. Each tool is a self-contained module.
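The template above declares an `echoText` input but returns a fixed string. One possible way to use the input is to keep the tool's logic in a small pure function and call it from the handler. This is only a sketch: `echoHandler` is a hypothetical name, and the wiring shown in the comment simply mirrors the template.

```javascript
// Hypothetical pure handler: takes the validated tool arguments and
// returns the text that the tool should send back.
function echoHandler({ echoText }) {
  return `You said: ${echoText}`;
}

// Inside tools/mytool.mjs, the callback from the template would become:
//   async (_args, _extra) => buildResponse(echoHandler(_args))
```

Separating the logic from the registration call means the function can be exercised without starting the server.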
You can run this server as a background service on Linux using the provided puppeteer.service file.
Copy puppeteer.service to your systemd directory (usually /etc/systemd/system/):
sudo cp puppeteer.service /etc/systemd/system/
- Make sure the WorkingDirectory and ExecStart paths in the service file match where your project is installed (default: /opt/puppeteer).
- Ensure your environment file exists at /opt/puppeteer/.env if you use one.
sudo systemctl daemon-reload
sudo systemctl enable puppeteer
sudo systemctl start puppeteer
sudo systemctl status puppeteer
The server will now run in the background and restart automatically on failure or reboot.
You can run this MCP server in a Docker container using the provided Dockerfile.
docker build -t puppeteer .
Set your environment variables (such as MCP_TOKEN) and map the port as needed:
docker run -d \
-e MCP_TOKEN=your_secret_token \
-e MCP_PORT=1234 \
-p 1234:1234 \
--name puppeteer \
puppeteer
- Replace your_secret_token with your desired token.
- You can override the port by changing the -e MCP_PORT and -p values.
If you make changes to the code, rebuild the image and restart the container:
docker build -t puppeteer .
docker stop puppeteer && docker rm puppeteer
# Then run the container again as above
For help, questions, or to chat with the author and community, visit: