# robotparser-scala ![Build Status](https://camo.githubusercontent.com/62f380ae54096dcd4c49d122bc64b688e604f94f868ac9229a5379d3e2f3fe14/68747470733a2f2f7365637572652e7472617669732d63692e6f72672f62697a72656163682f726f626f747061727365722d7363616c612e706e673f6272616e63683d6d6173746572)
robotparser-scala implements a parser for the `robots.txt` file format in Scala.
## Setup
Add robotparser-scala as a dependency in `build.sbt`:

```scala
libraryDependencies += "jp.co.bizreach" %% "robotparser-scala" % "0.0.5"
```
## Usage
Parse a `robots.txt` file as follows:

```scala
import jp.co.bizreach.robot._
import java.io.InputStream

val stream: InputStream = ...
val robotsTxt = RobotsTxtParser.parse(stream)
```
This returns a `RobotsTxt` instance. The default character encoding is UTF-8.
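For example, the input stream could come from fetching a site's `robots.txt` over HTTP with the standard `java.net` API. This is only a sketch: the URL below is a placeholder, and only `RobotsTxtParser.parse` from the examples above is assumed.

```scala
import jp.co.bizreach.robot._
import java.io.InputStream
import java.net.URL

// Placeholder URL for illustration; substitute a real site.
val url = new URL("https://example.com/robots.txt")
val stream: InputStream = url.openStream()
try {
  // Parse with the default UTF-8 encoding.
  val robotsTxt = RobotsTxtParser.parse(stream)
  // ... use robotsTxt here ...
} finally {
  // The caller owns the stream, so close it explicitly.
  stream.close()
}
```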
To parse a sitemap file:

```scala
import jp.co.bizreach.robot._
import java.io.InputStream

val stream: InputStream = ...

SitemapParser.parse(stream) match {
  // Sitemap file
  case x: Urlset => ...
  // Sitemap Index file
  case x: Sitemapindex => ...
}
```
`SitemapParser` supports the following formats:

- XML Sitemap
- XML Sitemap Index
- Text Sitemap
- gz (gzip-compressed)
The result is a `Urlset` or `Sitemapindex` instance. The default character encoding is UTF-8.
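As a self-contained sketch, a text sitemap can also be parsed from an in-memory stream. The sitemap content below is made up, and whether a text sitemap produces a `Urlset` is an assumption; the pattern match mirrors the example above.

```scala
import jp.co.bizreach.robot._
import java.io.ByteArrayInputStream

// A minimal text sitemap: one URL per line (content is made up).
val textSitemap = "https://example.com/\nhttps://example.com/about\n"
val stream = new ByteArrayInputStream(textSitemap.getBytes("UTF-8"))

SitemapParser.parse(stream) match {
  case x: Urlset       => println(s"URL set: $x")       // expected for a text sitemap
  case x: Sitemapindex => println(s"Sitemap index: $x")
}
```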