Cincraw Crawler Specification

Crawling Policy

Data collected by this bot

Data in the body of the web page (HTML, text, images, and JavaScript)

HTML head information (mainly whether a meta tag's content attribute contains noindex)

HTTP header information (status code, redirect_status, redirect_url, and mimeType)

JavaScript and CSS files (those required to render the web page)
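The head and HTTP header items in the list above can be pictured as one record per page. The Python sketch below is only an illustration under stated assumptions: it treats noindex as detected when a meta tag named robots lists it in its content attribute, reads redirect_status and redirect_url from the first response when that response was a 3xx redirect, and stores mimeType as mime_type. The names NoindexDetector, PageRecord and build_record are hypothetical and do not describe Cincraw's actual data format.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class NoindexDetector(HTMLParser):
    """Records whether a <meta name="robots"> tag lists noindex in its content attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <meta ... />
        # tags also arrive here via the default handle_startendtag.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if "noindex" in directives:
            self.noindex = True


@dataclass
class PageRecord:
    status: int                     # status of the final response
    redirect_status: Optional[int]  # status of the first response if it was a redirect (assumption)
    redirect_url: Optional[str]     # Location header of that redirect
    mime_type: str                  # Content-Type without parameters ("mimeType")
    noindex: bool


def build_record(hops: List[Tuple[int, Dict[str, str]]], html: str) -> PageRecord:
    """hops: (status, headers) per response in request order; header names are lowercase."""
    first_status, first_headers = hops[0]
    final_status, final_headers = hops[-1]
    redirected = 300 <= first_status < 400
    detector = NoindexDetector()
    detector.feed(html)
    detector.close()
    return PageRecord(
        status=final_status,
        redirect_status=first_status if redirected else None,
        redirect_url=first_headers.get("location") if redirected else None,
        mime_type=final_headers.get("content-type", "").split(";")[0].strip().lower(),
        noindex=detector.noindex,
    )


# Example: a 301 redirect followed by an HTML page that asks not to be indexed.
record = build_record(
    [(301, {"location": "https://example.com/new"}),
     (200, {"content-type": "text/html; charset=utf-8"})],
    '<head><meta name="robots" content="noindex, nofollow"></head>',
)
# record.status == 200, record.redirect_status == 301, record.noindex is True
```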

After acquiring this data, we render the web page and save it as a screenshot image.

We do not deliberately concentrate access on specific websites

Our proprietary URL selection algorithm determines the crawl order by considering only one factor: avoiding load on the websites being accessed.
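The actual selection algorithm is proprietary and not described here. As a generic illustration of crawl ordering driven purely by load avoidance, the following sketch uses a common per-host politeness technique: a min-heap keyed by the earliest time each host may be fetched again, so that no host is requested more than once per min_interval seconds. The class name PoliteScheduler and the 10-second default interval are hypothetical.

```python
import heapq
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Set, Tuple
from urllib.parse import urlsplit


class PoliteScheduler:
    """Hands out URLs so that each host is fetched at most once per min_interval seconds."""

    def __init__(self, min_interval: float = 10.0) -> None:
        self.min_interval = min_interval
        self.queues: Dict[str, Deque[str]] = defaultdict(deque)  # host -> pending URLs
        self.heap: List[Tuple[float, str]] = []                  # (earliest allowed fetch time, host)
        self.scheduled: Set[str] = set()                         # hosts currently in the heap
        self.last_fetch: Dict[str, float] = {}                   # host -> time of its last fetch

    def add(self, url: str, now: float) -> None:
        host = urlsplit(url).hostname or ""
        self.queues[host].append(url)
        if host not in self.scheduled:
            # A host seen before must still respect the interval since its last fetch.
            ready_at = self.last_fetch.get(host, float("-inf")) + self.min_interval
            heapq.heappush(self.heap, (max(now, ready_at), host))
            self.scheduled.add(host)

    def next_url(self, now: float) -> Optional[str]:
        """Return a URL whose host may be fetched at time now, or None if every host is cooling down."""
        if not self.heap or self.heap[0][0] > now:
            return None
        _, host = heapq.heappop(self.heap)
        self.last_fetch[host] = now
        url = self.queues[host].popleft()
        if self.queues[host]:
            heapq.heappush(self.heap, (now + self.min_interval, host))
        else:
            self.scheduled.discard(host)
        return url
```

With two URLs queued for one host and one for another, the scheduler serves each host once, then returns None until the first host's interval has elapsed. Keeping a separate queue per host means a site with many pending URLs cannot crowd out the others or receive bursts of requests.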

We never deliberately concentrate access on any specific website or page.

We always send our User-Agent

The crawler always identifies itself with the Cincraw User-Agent when accessing web pages.
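A minimal sketch of always sending the crawler's identity with Python's standard urllib, assuming every request is built through a single helper. The User-Agent value "Cincraw/1.0" and the helper name make_request are placeholders; the real header value is not given in this document.

```python
import urllib.request
from typing import Mapping, Optional

# Hypothetical value: the exact Cincraw User-Agent string is not given in this document.
CINCRAW_USER_AGENT = "Cincraw/1.0"


def make_request(url: str, extra_headers: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build a request that always carries the crawler's User-Agent header."""
    # Drop any caller-supplied User-Agent (in any letter case), then set ours,
    # so the crawler's identity can never be omitted or replaced.
    headers = {k: v for k, v in dict(extra_headers or {}).items() if k.lower() != "user-agent"}
    headers["User-Agent"] = CINCRAW_USER_AGENT
    return urllib.request.Request(url, headers=headers)
```

Passing the result to urllib.request.urlopen sends the header on every fetch. Routing all requests through one helper is what makes "always" enforceable in practice.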

Advertising links are not visited

The crawler does not follow links in advertisements displayed on web pages (banner ads, native ads, affiliate ads, etc.).
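The document does not say how advertising links are recognized. As one hypothetical approach, the sketch below collects links from a page and skips any anchor marked rel="sponsored" (the attribute value commonly used to label paid links) or pointing at a host on a blocklist. AD_HOSTS, LinkCollector and crawlable_links are illustrative names, not part of Cincraw.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit

# Hypothetical ad hosts; how Cincraw actually recognises ads is not described.
AD_HOSTS = {"ads.example.net", "affiliate.example.org"}


class LinkCollector(HTMLParser):
    """Collects absolute <a href> URLs, skipping links that look like advertisements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = {name: (value or "") for name, value in attrs}
        href = attr.get("href", "")
        if not href:
            return
        url = urljoin(self.base_url, href)
        rel = set(attr.get("rel", "").lower().split())
        host = urlsplit(url).hostname or ""
        if "sponsored" in rel or host in AD_HOSTS:
            return  # treated as an ad link: never enqueued for crawling
        self.links.append(url)


def crawlable_links(html: str, base_url: str) -> List[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```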

In addition, we extract canonical page URLs from the patterns found in URLs specified as canonical, and we periodically review and check the crawl URL list so that, as far as possible, only those canonical URLs are crawled.
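One way to read "extracting canonical URLs from the patterns in canonical URLs" is to learn which query parameters a site consistently strips in its rel="canonical" links, then strip the same parameters from newly discovered URLs before they enter the crawl list. The sketch below implements that reading; it is an assumption about the approach, and the function names and the shop.example URLs are hypothetical.

```python
from html.parser import HTMLParser
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if "canonical" in attr.get("rel", "").lower().split() and attr.get("href"):
            self.href = attr["href"]


def canonical_url(html: str, page_url: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.href) if finder.href else None


def learn_dropped_params(pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """From (page_url, canonical_url) pairs, find query keys that canonical URLs never keep."""
    seen: Set[str] = set()
    kept: Set[str] = set()
    for page, canon in pairs:
        seen |= {k for k, _ in parse_qsl(urlsplit(page).query)}
        kept |= {k for k, _ in parse_qsl(urlsplit(canon).query)}
    return seen - kept


def normalize(url: str, dropped: Set[str]) -> str:
    """Rewrite a URL toward its likely canonical form: drop learned keys and the fragment."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in dropped]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

With a single observed pair the learned set is only a guess; accumulating many (page, canonical) pairs per site makes it more reliable, and a periodic review of the crawl list is a natural point to refresh it.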

We do not keep cookies from the web pages we access

We never access other web pages while holding cookies stored from a previous page.

(All cookies are deleted.)

We do not load analytics tags or ad measurement tags

To keep our crawler's visits from being mixed into the analytics data of the web pages we visit, we block the loading of, and requests for, common analytics tags such as Google Tag Manager and Google Analytics.
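A minimal sketch of the blocking rule, assuming the renderer lets the crawler inspect each outgoing request (as headless-browser request interception does) and abort it when a predicate returns True. The two hosts listed are those that serve Google Tag Manager and Google Analytics; the name ANALYTICS_HOSTS and the function should_block are hypothetical, and a real blocklist would likely be longer.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; the crawler's actual list of analytics hosts is not published here.
ANALYTICS_HOSTS = (
    "googletagmanager.com",   # Google Tag Manager
    "google-analytics.com",   # Google Analytics
)


def should_block(request_url: str) -> bool:
    """True if the request targets a listed analytics host or one of its subdomains."""
    host = (urlsplit(request_url).hostname or "").lower()
    # Match the host itself or a true subdomain, so "notgoogletagmanager.com" is not blocked.
    return any(host == h or host.endswith("." + h) for h in ANALYTICS_HOSTS)
```

Matching on the host rather than on a substring of the full URL avoids blocking unrelated resources whose paths merely mention an analytics product.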