Cloudflare-IUAM-Solver
A simple pure-Java library and CLI tool for getting past Cloudflare's anti-bot mechanism (a.k.a. "I'm Under Attack Mode", or IUAM), implemented with HtmlUnit.
Prerequisites
- JDK 11
CLI Tool
Install
$ curl -LO https://github.com/ninja-beans/cloudflare-iuam-solver/releases/download/0.1.0/cfis
$ chmod +x cfis 
Usage
Print a cookie string.
$ ./cfis -c https://www.example.com
cf_clearance=XXXXXXXXXXXXXXXXXXXX-XXXXXXXXXX-X-XXX;__cfduid=XXXXXXXXXXXXXXXXXXXX; 
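If the cookie string is consumed from Java rather than passed straight to curl, it can be split into name/value pairs first. The following is a minimal sketch, assuming the output keeps the `name=value;name=value;` shape shown above; `CookieStringParser` is a hypothetical helper and not part of this library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of cloudflare-iuam-solver): parses the
// "name=value;name=value;" string printed by `cfis -c`.
public class CookieStringParser {

  /** Splits a cookie string into an ordered name -> value map. */
  public static Map<String, String> parse(String cookieString) {
    Map<String, String> cookies = new LinkedHashMap<>();
    for (String pair : cookieString.split(";")) {
      String trimmed = pair.trim();
      int eq = trimmed.indexOf('=');
      if (eq <= 0) {
        continue; // skip empty segments and entries without a name
      }
      cookies.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
    }
    return cookies;
  }

  public static void main(String[] args) {
    // Pass the cookie string as the first argument.
    parse(args[0]).forEach((name, value) -> System.out.println(name + " -> " + value));
  }
}
```

A `LinkedHashMap` is used so the cookies keep the order in which the tool printed them.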
Download HTML content with curl.
$ ./cfis -c https://www.example.com > cookie.txt
$ ./cfis -u > ua.txt
$ curl -s --cookie "$(cat cookie.txt)" -A "$(cat ua.txt)" https://www.example.com/ 
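The same request can be sketched with the Java 11 HttpClient by sending the saved cookie string and user agent as plain headers. This is only an illustration under assumptions: `CfisRequestBuilder` is a hypothetical class, and it assumes `cookie.txt` and `ua.txt` were written by the commands above and sit in the working directory.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of cloudflare-iuam-solver): mirrors the
// curl command by attaching the cookie string and user agent as headers.
public class CfisRequestBuilder {

  /** Builds a GET request carrying the given cookie string and user agent. */
  public static HttpRequest build(String url, String cookieString, String userAgent) {
    return HttpRequest.newBuilder()
        .uri(URI.create(url))
        // trim() drops the trailing newline left by the shell redirect;
        // header values must not contain line breaks.
        .header("Cookie", cookieString.trim())
        .header("User-Agent", userAgent.trim())
        .GET()
        .build();
  }

  public static void main(String[] args) throws Exception {
    var cookie = Files.readString(Path.of("cookie.txt"));
    var userAgent = Files.readString(Path.of("ua.txt"));
    var request = build(args[0], cookie, userAgent);
    var response = HttpClient.newHttpClient().send(request, BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

As in the curl command, the cookie and the user agent are sent together.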
Extract all images with curl and xmllint.
$ eval $(./cfis --curl https://www.example.com/) | xmllint --xpath "//img" --html - 2> /dev/null 
Java Library
Install
<dependency>
  <groupId>com.ninja-beans.crawler</groupId>
  <artifactId>cloudflare-iuam-solver-parent</artifactId>
  <version>0.1.0</version>
  <type>pom</type>
</dependency> 
Usage
Scraping with Java 11 HttpClient and Jsoup.
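import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpClient.Redirect;
import java.net.http.HttpClient.Version;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
// IuamSolver is provided by this library; import it from the library's package.

public class App {
  public static void main(final String[] args) throws IOException, InterruptedException {
    var url = args[0];
    // Solve the IUAM challenge; the result carries the cookies and user agent.
    var result = IuamSolver.solve(url);
    // 1. Create an HttpClient that reuses the solver's cookies
    var client = HttpClient
        .newBuilder()
        .version(Version.HTTP_1_1)
        .followRedirects(Redirect.NORMAL)
        .cookieHandler(result.getCookieManager()).build();
    // 2. Send the request with the solver's user agent and get the response
    var request = HttpRequest.newBuilder().header("Accept", "*/*")
        .header("User-Agent", result.getResponse().getUserAgent())
        .GET()
        .uri(URI.create(url))
        .build();
    var response = client.send(request, BodyHandlers.ofString(StandardCharsets.UTF_8));
    // 3. Parse the response
    var doc = Jsoup.parse(response.body(), url);
    var elm = doc.getElementById("title");
    System.out.println(doc.title());
    // getElementById returns null when the page has no element with id "title"
    if (elm != null) {
      System.out.println(elm.html());
    }
  }
}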