Gecco-Spring

Gecco Crawler With Spring

GroupId: com.geccocrawler
ArtifactId: gecco-spring
Last Version: 1.3.0
Type: jar
Description: Gecco-Spring, Gecco Crawler With Spring
Project URL: https://github.com/xtuhcy/gecco-spring
Project Organization: Pivotal Software, Inc.
Source Code Management: https://github.com/xtuhcy/gecco-spring


How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.geccocrawler/gecco-spring/ -->
<dependency>
    <groupId>com.geccocrawler</groupId>
    <artifactId>gecco-spring</artifactId>
    <version>1.3.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.geccocrawler/gecco-spring/
implementation 'com.geccocrawler:gecco-spring:1.3.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.geccocrawler/gecco-spring/
implementation("com.geccocrawler:gecco-spring:1.3.0")

Buildr:

'com.geccocrawler:gecco-spring:jar:1.3.0'

Ivy:

<dependency org="com.geccocrawler" name="gecco-spring" rev="1.3.0">
  <artifact name="gecco-spring" type="jar" />
</dependency>

Groovy Grape:

@Grapes(
    @Grab(group='com.geccocrawler', module='gecco-spring', version='1.3.0')
)

Scala SBT:

libraryDependencies += "com.geccocrawler" % "gecco-spring" % "1.3.0"

Leiningen:

[com.geccocrawler/gecco-spring "1.3.0"]

Dependencies

compile (2)

Group / Artifact                                Type  Version
com.geccocrawler : gecco                        jar   1.3.0
org.springframework.boot : spring-boot-starter  jar   1.5.3.RELEASE

Project Modules

There are no modules declared in this project.

gecco-spring

Integrates the Gecco crawler with Spring. Spring Boot is supported starting with version 1.2.9, and Spring has been upgraded to 4.x.

Download

<dependency>
    <groupId>com.geccocrawler</groupId>
    <artifactId>gecco-spring</artifactId>
    <version>x.x.x</version>
</dependency>


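Initializing Gecco

Gecco is started once Spring has finished loading its beans. Extend the SpringGeccoEngine class to initialize your GeccoEngine. Note in particular that GeccoEngine must be run in non-blocking mode, using start():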

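import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.geccocrawler.gecco.GeccoEngine;
import com.geccocrawler.gecco.spring.SpringGeccoEngine;

@SpringBootApplication
@Configuration
public class App {

    @Bean
    public SpringGeccoEngine initGecco() {
        return new SpringGeccoEngine() {
            // Called once Spring has finished loading its beans.
            @Override
            public void init() {
                GeccoEngine.create()
                // Resolve pipelines from Spring beans.
                .pipelineFactory(springPipelineFactory)
                // Package to scan for crawler classes.
                .classpath("com.geccocrawler.gecco.spring")
                // Seed URL for the crawl.
                .start("https://github.com/xtuhcy/gecco")
                // Interval between requests, in milliseconds.
                .interval(3000)
                .loop(true)
                // Non-blocking start.
                .start();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        SpringApplication.run(App.class, args);
    }

}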
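
The requirement to use non-blocking start() can be illustrated without Gecco or Spring. The sketch below is a library-free illustration only: ToyEngine, its run(), start(), and stop() methods, and the latch-based "crawl loop" are hypothetical stand-ins, not Gecco's implementation. It shows that a non-blocking start hands the long-running loop to a separate thread, so the caller (here standing in for the init() callback) returns right away instead of waiting for the loop to end.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class NonBlockingStartSketch {

    /** Hypothetical stand-in for a crawl engine; not part of Gecco or Spring. */
    static class ToyEngine {
        private final CountDownLatch stopSignal = new CountDownLatch(1);
        private final CountDownLatch finished = new CountDownLatch(1);

        /** Blocking mode: the calling thread is held until the loop ends. */
        void run() {
            try {
                // Stands in for a crawl loop that keeps running until stopped.
                stopSignal.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                finished.countDown();
            }
        }

        /** Non-blocking mode: the loop runs on its own thread; the caller returns at once. */
        void start() {
            Thread worker = new Thread(this::run, "toy-engine");
            worker.setDaemon(true);
            worker.start();
        }

        /** Asks the loop to end. */
        void stop() {
            stopSignal.countDown();
        }

        boolean isFinished() {
            return finished.getCount() == 0;
        }

        /** Waits up to the given time for the loop to end; returns whether it did. */
        boolean awaitFinished(long timeoutMillis) {
            try {
                return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        // Returns immediately, so a startup callback calling this could complete.
        engine.start();
        System.out.println("finished right after start(): " + engine.isFinished());
        engine.stop();
        System.out.println("finished after stop(): " + engine.awaitFinished(2000));
    }
}
```

Calling run() directly in place of start() in this sketch would hold the calling thread until stop() is invoked from elsewhere, which is the situation the note above warns against inside a startup callback.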

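Developing a Pipeline

Pipelines are developed the same way as before. The only difference is that you no longer define the pipeline's name with @PipelineName("consolePipeline"). Instead, declare the pipeline with Spring's @Service; the Spring bean name is the pipeline name. For example: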

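import org.springframework.stereotype.Service;

import com.alibaba.fastjson.JSON;
import com.geccocrawler.gecco.pipeline.Pipeline;
import com.geccocrawler.gecco.spider.SpiderBean;

// The bean name "consolePipeline" is also the pipeline name.
@Service("consolePipeline")
public class ConsolePipeline implements Pipeline<SpiderBean> {
    @Override
    public void process(SpiderBean bean) {
        System.out.println(JSON.toJSONString(bean));
    }
}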

You can also define the pipeline with @Configuration and @Bean. For example:

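import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class BeanConfigure {

    @Bean(name="consolePipeline")
    public ConsolePipeline consolePipeline() {
        return new ConsolePipeline();
    }
}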
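
Both definitions above register a bean named consolePipeline, and that name is what identifies the pipeline. The following library-free sketch illustrates the idea of resolving a pipeline by its bean name. Everything in it is hypothetical: ToyPipeline, ToyPipelineFactory, register, and getPipeline are illustrative names, and the map stands in for a container lookup. It is not how gecco-spring's factory is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BeanNamePipelineSketch {

    /** Hypothetical stand-in for a pipeline interface. */
    interface ToyPipeline {
        void process(Object bean);
    }

    /** Hypothetical factory that looks pipelines up by bean name. */
    static class ToyPipelineFactory {
        // Stands in for the container's registry of named beans.
        private final Map<String, ToyPipeline> beansByName = new LinkedHashMap<>();

        /** Registers a pipeline under a unique bean name. */
        void register(String beanName, ToyPipeline pipeline) {
            if (beansByName.putIfAbsent(beanName, pipeline) != null) {
                throw new IllegalStateException("Duplicate bean name: " + beanName);
            }
        }

        /** Returns the pipeline registered under the given name, or fails loudly. */
        ToyPipeline getPipeline(String name) {
            ToyPipeline pipeline = beansByName.get(name);
            if (pipeline == null) {
                throw new IllegalArgumentException("No pipeline bean named: " + name);
            }
            return pipeline;
        }
    }

    public static void main(String[] args) {
        ToyPipelineFactory factory = new ToyPipelineFactory();
        List<Object> processed = new ArrayList<>();

        // The registration name plays the role of the Spring bean name.
        factory.register("consolePipeline", bean -> processed.add(bean));

        // A crawler that refers to "consolePipeline" would resolve it by that name.
        factory.getPipeline("consolePipeline").process("example page");
        System.out.println("processed items: " + processed);
    }
}
```

Failing loudly on an unknown name is a design choice made for this sketch; it makes a mismatch between the name a crawler refers to and the registered bean name visible immediately.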

DEMO

See the test cases under src/test in the source code for detailed examples.

Versions

Version
1.3.0
1.2.9
1.2.8
1.2.7
1.2.6
1.2.5
1.2.4
1.2.3
1.2.0
1.1.3
1.1.2
1.1.1
1.1.0
1.0.9
1.0.8
1.0.7
1.0.6
1.0.4
1.0.3
1.0.0