
URL Extractor

Find every URL in any block of text or HTML — deduplicated, with domain breakdown

How Do You Find All URLs in a Document or Webpage?

The URL extractor uses a regular expression to find every http:// and https:// link in your input. It works on:

  • Plain text. Email content, blog posts, transcripts, document exports, chat logs.
  • HTML source. Open any webpage, right-click → View page source, copy, paste. Catches every URL — anchors, image references, script sources, stylesheet links.
  • Markdown. Blog posts and docs exported to Markdown paste in directly; because the tool matches the URL itself, links inside [text](url) syntax are caught too.
  • JSON or XML. API responses or config files often contain URL fields.

The tool deduplicates URLs and counts how many times each one appears in the source — useful for finding the most-linked external domains in a long article or the most-referenced internal pages in your site footer.
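The approach above can be sketched in a few lines of Python. The exact pattern the tool uses isn't published, so the regex here is an illustrative assumption; a `Counter` gives both deduplication (its keys) and per-URL occurrence counts (its values):

```python
import re
from collections import Counter

# Illustrative pattern: match http:// or https:// up to the next character
# that normally terminates a URL in text or HTML (whitespace, quotes, brackets).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")

def extract_urls(text: str) -> Counter:
    """Return every http/https URL found in `text` with its occurrence count."""
    return Counter(URL_PATTERN.findall(text))

sample = (
    '<a href="https://example.com/a">one</a> '
    "plus https://example.com/a and https://example.org/"
)
counts = extract_urls(sample)
# counts.keys() is the deduplicated URL list;
# counts["https://example.com/a"] reports how often that URL appeared.
```

Real-world inputs need more care (trailing punctuation, protocol-relative `//` links, URLs split across lines), which is why a purpose-built extractor beats a quick one-off script.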

What's the Difference Between This and a Web Crawler?

A URL extractor is a one-pass tool: it finds URLs in the text you give it. A web crawler (Screaming Frog, Sitebulb, OnCrawl) is recursive: it visits each URL it finds, fetches that page, finds the URLs inside that page, and so on until it has discovered every reachable page on the site.

Use the URL extractor when:

  • You have a single document or page and want all the URLs out of it.
  • You need to audit external link references in a blog post.
  • You want a flat list of URLs from a sitemap or knowledge-base export.

Use a crawler when:

  • You need every URL on an entire website (50+ pages).
  • You also need status codes, page titles, meta descriptions, and load times.
  • You want to find broken links and redirect chains.
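The one-pass vs. recursive distinction is easiest to see in code. This sketch uses a hypothetical in-memory "site" (a dict mapping page URLs to page text) in place of real HTTP fetches; only the control flow is the point:

```python
import re
from collections import deque

URL_RE = re.compile(r"https?://[^\s\"'<>]+")

# Hypothetical site: page URL -> page text. A real crawler would fetch
# each page over HTTP; this stand-in keeps the example self-contained.
PAGES = {
    "https://site.test/": "see https://site.test/about and https://site.test/blog",
    "https://site.test/about": "back to https://site.test/",
    "https://site.test/blog": "external link: https://example.com/post",
}

def extract(text: str) -> set:
    """One pass: the URLs present in a single block of text. That's an extractor."""
    return set(URL_RE.findall(text))

def crawl(start: str) -> set:
    """Recursive discovery: follow each found URL into its own page. That's a crawler."""
    seen, queue = set(), deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        # Off-site URLs have no page text here, so they are recorded but not followed.
        for found in extract(PAGES.get(url, "")):
            if found not in seen:
                queue.append(found)
    return seen
```

Extracting from the homepage alone yields two URLs; crawling from it discovers all four reachable URLs, including the external one found two hops in.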

Why Does Domain Breakdown Matter for SEO?

When you extract URLs from a webpage or document, the domain breakdown tells you who you're linking to most. That matters for several SEO and editorial decisions:

  • Outbound link diversity. A blog post that links 10 times to one domain looks like an affiliate page or a paid placement. Diverse outbound links signal a balanced editorial voice. Aim for at least 3 different external domains per long-form post.
  • Authority signals. Linking to high-authority sources (Wikipedia, .gov, .edu, major publications) signals to search engines that your content is well-researched.
  • Competitor mentions. If your post links to competitors more than to your own pages, you're sending traffic and authority to them. Audit periodically to keep the balance right.
  • Internal vs external split. A healthy article links to a few external authoritative sources AND deep into your own site. The internal/external split this tool surfaces is a quick health check.
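The domain breakdown and internal/external split come straight out of the extracted URL list. A minimal sketch, assuming `own_domain` is the hostname you treat as internal (`mysite.com` below is a placeholder):

```python
from collections import Counter
from urllib.parse import urlparse

def domain_breakdown(urls, own_domain):
    """Count URLs per domain, then split the totals into internal vs external."""
    domains = Counter(urlparse(u).netloc for u in urls)
    internal = sum(n for d, n in domains.items() if d == own_domain)
    external = sum(domains.values()) - internal
    return domains, internal, external

urls = [
    "https://mysite.com/pricing",
    "https://mysite.com/blog/post",
    "https://en.wikipedia.org/wiki/SEO",
    "https://competitor.com/tool",
]
domains, internal, external = domain_breakdown(urls, "mysite.com")
# domains ranks who you link to most; internal/external is the split described above.
```

A stricter version would also fold subdomains (`www.mysite.com`, `blog.mysite.com`) into the internal bucket before comparing.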

How Does Hyperleap AI Use URL Extraction?

When you train a Hyperleap AI agent on your website, the agent extracts URLs from every page it crawls — to discover linked sub-pages, to identify the booking-link destinations it should share with customers, and to spot resources it should reference in its answers. URL extraction is one of the most fundamental operations in turning a website into a usable knowledge base. See how Hyperleap AI agents work →

Need an AI chatbot for your website?

Hyperleap AI Agents answer customer questions, capture leads, and work 24/7.

Get Started Free