Data Connection: Web Crawler

A step-by-step guide to integrating a web crawler into your knowledge base.

Adding a Web Crawler

It takes three simple steps to add a web crawler:

  1. From your knowledge base, navigate to “Upload & Index” → “Data Sources”.
  2. Select “+ Data Sources”.
  3. Choose the Web Crawler option to add a website for crawling and indexing into your knowledge base.

1. Name Your Data Source

Assign a meaningful name to this data source. This name will help you easily identify the purpose or nature of the link within your knowledge base.

2. Provide the URL

Enter the URL of the website you want to crawl. Ensure the link is accurate and active.
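Before submitting, it can help to sanity-check the URL. As a minimal sketch (the function name and threshold are illustrative, not part of the product), a syntactic check confirms the link is an absolute http(s) URL with a host:

```python
from urllib.parse import urlparse

def looks_like_valid_url(url: str) -> bool:
    """Basic syntactic check: the crawler needs an absolute
    http(s) URL with a host. This does not verify the page
    is reachable, only that the link is well-formed."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(looks_like_valid_url("https://example.com/docs"))  # True
print(looks_like_valid_url("example.com/docs"))          # False (missing scheme)
```

To confirm the link is also active, you could additionally issue a request to it and check for a successful response.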

3. Set the Maximum Crawl Depth

Specify how deep the crawler should explore the website.

  • Level 0: Crawls only the specified URL.
  • Level 1: Crawls the specified URL and follows links on that page.
  • Level 2: Crawls the specified URL, follows links on that page, and also follows links on subsequent pages.
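The depth levels above amount to a breadth-first traversal that stops following links after a fixed number of hops. A minimal sketch, using a toy in-memory link graph in place of a real website (all page paths are hypothetical):

```python
from collections import deque

# Toy link graph standing in for a real website (hypothetical pages).
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": [],
    "/blog/post-1": [],
}

def crawl(start, max_depth):
    """Breadth-first crawl that indexes pages up to max_depth
    hops from the start URL, mirroring the Level 0/1/2 setting."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't follow links found on this page
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

print(sorted(crawl("/", 0)))  # ['/']
print(sorted(crawl("/", 1)))  # ['/', '/about', '/blog']
```

With depth 2, the crawl would also reach `/team` and `/blog/post-1`, one hop beyond the pages found at depth 1.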

Common Issues

When using a web crawler, you may encounter certain limitations, such as:

  • Paywalls: If the URL is behind a paywall, the web crawler may be unable to access the full content.
  • JavaScript-Rendered Content: Some websites use JavaScript to dynamically load content, making it inaccessible to traditional web crawlers.

For best results, ensure the target website doesn’t have restrictions that could impede crawling.
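One way to spot the JavaScript-rendering issue ahead of time is to fetch the page's raw HTML and check how much text it contains without running any scripts. The heuristic below is only a sketch (the 50-character threshold is an arbitrary assumption for illustration): pages served as an empty shell, a bare root `<div>` plus script tags, usually need JavaScript to render.

```python
import re

def likely_js_rendered(html: str) -> bool:
    """Heuristic: if the <body> holds almost no visible text
    once script tags and markup are stripped, the page likely
    relies on JavaScript and a plain crawler will see little."""
    body = re.search(r"<body[^>]*>(.*?)</body>", html, re.S | re.I)
    if not body:
        return True
    text = re.sub(r"<script.*?</script>", "", body.group(1), flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", "", text).strip()
    return len(text) < 50  # assumption: threshold chosen for illustration

shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(likely_js_rendered(shell))  # True
```

If a page fails this kind of check, its content may not be indexed correctly by the crawler.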
