logtrix
![Javadoc](https://camo.githubusercontent.com/28b1eaf11f49ad672b4192f202bfa7d6d6807f091ea8654cd828b81f86972cf6/68747470733a2f2f7777772e6a617661646f632e696f2f62616467652f6f72672e6e657470726573657276652f6c6f67747269782e737667)
Examples
Parsing a log file
try (CrawlLogIterator log = new CrawlLogIterator(Paths.get("crawl.log"))) {
    for (CrawlDataItem line : log) {
        System.out.println(line.getStatusCode());
        System.out.println(line.getURL());
    }
}
Grouping the summary by registered domain, host, or a custom key
CrawlSummary.byRegisteredDomain(log);
CrawlSummary.byHost(log);
CrawlSummary.byKey(log, item -> item.getCaptureBegan().toString().substring(0, 4)); // by year
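To make the idea behind `byKey` concrete, here is a minimal, self-contained sketch of counting items per key with plain Java streams. It does not use logtrix and is not how `CrawlSummary` is implemented; the `Capture` class and `countByKey` method are hypothetical names introduced only for this illustration. The key function mirrors the "by year" lambda above, on the assumption that the capture time is a `java.time.Instant`, whose `toString()` starts with the four-digit year.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class GroupByKeySketch {
    // Hypothetical stand-in for a crawl log entry; not a logtrix class.
    static final class Capture {
        final String url;
        final Instant captureBegan;

        Capture(String url, Instant captureBegan) {
            this.url = url;
            this.captureBegan = captureBegan;
        }
    }

    // Counts how many items map to each key; a TreeMap keeps the keys sorted.
    static <T> Map<String, Long> countByKey(List<T> items, Function<T, String> keyFn) {
        return items.stream()
                .collect(Collectors.groupingBy(keyFn, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Capture> captures = List.of(
                new Capture("http://example.org/a", Instant.parse("2019-03-01T10:15:30Z")),
                new Capture("http://example.org/b", Instant.parse("2019-11-20T08:00:00Z")),
                new Capture("http://example.org/c", Instant.parse("2020-01-05T12:00:00Z")));

        // Same key extraction as the "by year" example: first four characters of the timestamp.
        Map<String, Long> byYear =
                countByKey(captures, c -> c.captureBegan.toString().substring(0, 4));

        System.out.println(byYear); // {2019=2, 2020=1}
    }
}
```

Any function that returns a string can serve as the key, so the same pattern covers grouping by host, by MIME type, or by month.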
Limiting to the top N results
CrawlSummary.build(log).topN(10); // top 10 status codes, MIME types, etc.
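As a rough illustration of what a top-N cut over a table of counts involves, the sketch below sorts entries by descending count and keeps the first N. It is a generic example in plain Java, not logtrix's implementation; `TopNSketch` and its `topN` method are hypothetical names, and breaking ties by key is an assumption made here only to keep the output deterministic.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class TopNSketch {
    // Returns the n entries with the highest counts, in descending order of count.
    // Ties are broken by key (an assumption of this sketch).
    static Map<String, Long> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                // LinkedHashMap preserves the sorted order of the stream.
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Made-up counts per status code, for illustration only.
        Map<String, Long> statusCounts = Map.of("200", 120L, "404", 7L, "301", 15L, "-4", 2L);
        System.out.println(topN(statusCounts, 2)); // {200=120, 301=15}
    }
}
```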
Working with status codes
StatusCodes.describe(404); // "Not found"
StatusCodes.describe(-4); // "HTTP timeout"
StatusCodes.isError(-4); // true
StatusCodes.isServerError(503); // true
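The calls above mix ordinary HTTP codes with negative, crawler-specific ones. The following sketch shows one way such a classification could look. It is an assumption-laden illustration, not the `StatusCodes` source: the class `StatusCodeSketch` is hypothetical, the rule "negative or 400 and above counts as an error" is a guess chosen to be consistent with the four results shown above, and only the two descriptions that appear in this README are included.

```java
final class StatusCodeSketch {
    // Assumption: negative codes are crawler-level failures, and 4xx/5xx are HTTP errors.
    static boolean isError(int code) {
        return code < 0 || code >= 400;
    }

    // HTTP 5xx range.
    static boolean isServerError(int code) {
        return code >= 500 && code <= 599;
    }

    // Only the two descriptions shown in the README; everything else falls through.
    static String describe(int code) {
        switch (code) {
            case 404:
                return "Not found";
            case -4:
                return "HTTP timeout";
            default:
                return "Unknown (" + code + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(404));      // Not found
        System.out.println(describe(-4));       // HTTP timeout
        System.out.println(isError(-4));        // true
        System.out.println(isServerError(503)); // true
        System.out.println(isError(200));       // false
    }
}
```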
Command-line interface
Output a JSON crawl summary grouped by registered domain:
java -jar target/*.jar -g registered-domain crawl.log
For more options:
java -jar target/*.jar --help
Compiling
Install Maven and then run:
mvn package