Website URLs let you index live web pages, making their content available for AI retrieval. This works well for existing documentation, knowledge bases, and product pages.
How Website Indexing Works
When you add a URL, Hyperleap:
- Fetches the page — Downloads the HTML content
- Extracts text — Removes navigation, ads, and scripts
- Processes content — Chunks and indexes the text
- Stores for retrieval — Makes content searchable by AI
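The first three steps above can be sketched with Python's standard library. This is a minimal illustration, not Hyperleap's implementation. The set of tags treated as navigation or boilerplate (`SKIP_TAGS`), the overlapping word-window chunking, and the helper names `fetch_html`, `extract_text`, and `chunk_text` are all assumptions chosen for the example. Step 4 (storing chunks in a searchable index) depends on the retrieval backend and is left out.

```python
import urllib.request
from html.parser import HTMLParser

# Assumed set of tags whose contents are treated as non-content (navigation, scripts, etc.).
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Step 1: download the raw HTML (hypothetical helper; requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Step 2: keep visible text, skipping anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text that is not inside a skipped tag.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Step 3: split text into overlapping word windows ready for indexing."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, `extract_text('<nav>Home | Docs</nav><main><h1>Pricing</h1><p>Plans start at $10.</p></main>')` returns `'Pricing Plans start at $10.'` because the navigation text is skipped. The overlap between windows means a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.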
Adding Website URLs
Open Your Source
Navigate to the Source where you want to add websites.
Go to Website URLs Tab
Click the "Website URLs" tab.
Add URL
Click "Add URL" and enter the full URL (including https://).
Configure Options
Choose whether to index just this page or crawl linked pages.
Save and Index
Click "Add" to start indexing the page(s).
Indexing Options
Single Page
Index just the URL you provide. Best for:
- Landing Pages: specific marketing pages
- Blog Posts: individual articles
- FAQ Pages: frequently asked questions
- Product Pages: individual product information
Crawl Mode
Index the page and follow links to related pages. Best for:
- Documentation sites with multiple pages
- Knowledge bases
- Help centers
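Crawl mode amounts to a bounded link-following traversal. The sketch below assumes a breadth-first crawl that stays on the starting URL's host and stops after `max_pages` pages; Hyperleap's actual link-selection rules and limits are not documented here, and `crawl`, `extract_links`, and `max_pages` are hypothetical names. The page fetcher is passed in as a function so the logic can run without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url: str, html: str) -> list[str]:
    """Resolve relative links against base_url and drop #fragments."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in collector.hrefs]


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> list[str]:
    """Breadth-first crawl limited to the start URL's host; returns URLs in visit order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        visited.append(url)
        for link in extract_links(url, html):
            parsed = urlparse(link)
            same_site = parsed.scheme in ("http", "https") and parsed.netloc == host
            if same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Restricting the crawl to one host keeps a documentation crawl from wandering onto external sites, and the `seen` set prevents revisiting pages that link to each other.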
URL Status
After adding URLs, you'll see their status:
- Pending: queued for indexing
- Indexing: currently being processed
- Indexed: content is searchable
- Failed: an error occurred during indexing
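The four statuses can be modeled as a small state machine. This is an illustrative sketch only: the forward path follows the order in which the statuses are listed, while returning a URL to Pending after a refresh or retry is an assumption, not documented Hyperleap behavior. `UrlStatus`, `ALLOWED`, and `advance` are hypothetical names.

```python
from enum import Enum


class UrlStatus(Enum):
    PENDING = "Pending"    # queued for indexing
    INDEXING = "Indexing"  # currently being processed
    INDEXED = "Indexed"    # content is searchable
    FAILED = "Failed"      # an error occurred during indexing


# Assumed transitions. Moving back to PENDING (e.g. after a refresh or retry)
# is a hypothetical rule included for illustration.
ALLOWED = {
    UrlStatus.PENDING: {UrlStatus.INDEXING},
    UrlStatus.INDEXING: {UrlStatus.INDEXED, UrlStatus.FAILED},
    UrlStatus.INDEXED: {UrlStatus.PENDING},
    UrlStatus.FAILED: {UrlStatus.PENDING},
}


def advance(current: UrlStatus, new: UrlStatus) -> UrlStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def is_searchable(status: UrlStatus) -> bool:
    """Only fully indexed URLs contribute content to retrieval."""
    return status is UrlStatus.INDEXED
```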
Best Practices
URL Selection
- Use canonical URLs — The main version of each page
- Avoid dynamic URLs — Parameters can cause duplicate content
- Prefer HTTPS — Secure pages are more likely to be accessible
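To settle on one canonical form per page before adding it, a URL can be normalized along the lines of the guidelines above. A minimal sketch, assuming the site serves identical content over HTTPS and that the parameters in the hypothetical `TRACKING_PARAMS` and `TRACKING_PREFIXES` do not change what the page shows; Hyperleap is not documented to apply this normalization itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed to affect tracking only, not page content.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid", "ref", "sessionid"}


def normalize_url(url: str) -> str:
    parts = urlsplit(url.strip())
    # Prefer HTTPS (assumes the same page is served over HTTPS).
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    host = parts.hostname or ""  # hostname is already lowercased
    port = parts.port
    netloc = host if port in (None, 80, 443) else f"{host}:{port}"
    # Drop tracking parameters and sort the rest so equivalent URLs compare equal.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS and not key.lower().startswith(TRACKING_PREFIXES)
    )
    # Fragments (#section) never change the fetched page, so they are removed.
    return urlunsplit((scheme, netloc, parts.path or "/", urlencode(query), ""))
```

With this, `HTTP://Docs.Example.com/Guide?utm_source=x&b=2&a=1#intro` and `https://docs.example.com/Guide?a=1&b=2` map to the same string, so the page is only added once. The path is left case-sensitive because many servers treat `/Guide` and `/guide` as different pages.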
Content Quality
- Text-heavy pages work best — Images and videos aren't indexed
- Well-structured content — Headings help organize chunks
- Public pages only — Login-protected content can't be indexed
Keeping Content Fresh
- Re-index when content changes — Click "Refresh" on updated URLs
- Remove outdated URLs — Delete pages that no longer exist
- Schedule re-indexing — For frequently updated content
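Deciding when to re-index can combine both ideas above: refresh when the page text has changed, and refresh anyway once the last index is older than a chosen age. A sketch under stated assumptions: the SHA-256 fingerprint of whitespace-normalized text, the `IndexedPage` record, and the `max_age` threshold are illustrative choices, not Hyperleap settings.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so layout-only changes are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class IndexedPage:
    url: str
    fingerprint: str     # fingerprint of the text at the last index
    indexed_at: datetime


def needs_reindex(page: IndexedPage, current_text: str, now: datetime, max_age: timedelta) -> bool:
    """Re-index if the text changed, or if the last index is at least max_age old."""
    if content_fingerprint(current_text) != page.fingerprint:
        return True
    return now - page.indexed_at >= max_age
```

A frequently updated page might use a short `max_age` such as a day, while a stable FAQ could rely mostly on the fingerprint check.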
Troubleshooting
Page Won't Index
Check that:
- The URL is publicly accessible (not behind a login)
- The page allows crawling (check robots.txt)
- The URL is correctly formatted
- The site isn't blocking automated access
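Two of these checks can be automated before adding a URL: whether it is well formed, and whether the site's robots.txt permits fetching it. A sketch using Python's standard `urllib.robotparser`; the user agent `"*"` is a placeholder, since the user agent Hyperleap's crawler sends is not stated here, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_well_formed(url: str) -> bool:
    """A full URL needs an http(s) scheme and a host, e.g. https://example.com/page."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def robots_url(url: str) -> str:
    """robots.txt always lives at the root of the host."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt content you already have."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def robots_allows_live(url: str, user_agent: str = "*") -> bool:
    """Same check against the live site (makes a network request)."""
    parser = RobotFileParser()
    parser.set_url(robots_url(url))
    parser.read()  # a 401 or 403 response makes can_fetch return False for every URL
    return parser.can_fetch(user_agent, url)
```

For example, with a robots.txt containing `User-agent: *` and `Disallow: /private/`, `robots_allows` returns `False` for `https://example.com/private/page` and `True` for `https://example.com/docs`. A URL entered without `https://`, such as `example.com/page`, fails `is_well_formed`.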
Content Not Retrieved
If the AI doesn't reference your web content:
- Verify the URL is in "Indexed" status
- Check that the Source is connected to your chatbot/tool
- Try asking questions using exact phrases from the page
Managing Website URLs
In the Website URLs tab, you can:
- View: see the indexed content from each URL
- Refresh: re-index to get the latest content
- Delete: remove a URL from the Source
Next Steps
Learn about Workspaces to collaborate with your team on AI projects.