access-log

Categories: Net
GroupId: net.sanori.spark
ArtifactId: access-log_2.11
Last Version: 0.1.0
Type: jar
Description: access-log
Project URL: https://github.com/sanori/spark-access-log
Project Organization: SanoriNet
Source Code Management: https://github.com/sanori/spark-access-log
How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/net.sanori.spark/access-log_2.11/ -->
<dependency>
    <groupId>net.sanori.spark</groupId>
    <artifactId>access-log_2.11</artifactId>
    <version>0.1.0</version>
</dependency>

Gradle (Groovy):

// https://jarcasting.com/artifacts/net.sanori.spark/access-log_2.11/
implementation 'net.sanori.spark:access-log_2.11:0.1.0'

Gradle (Kotlin):

// https://jarcasting.com/artifacts/net.sanori.spark/access-log_2.11/
implementation ("net.sanori.spark:access-log_2.11:0.1.0")

Buildr:

'net.sanori.spark:access-log_2.11:jar:0.1.0'

Ivy:

<dependency org="net.sanori.spark" name="access-log_2.11" rev="0.1.0">
  <artifact name="access-log_2.11" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='net.sanori.spark', module='access-log_2.11', version='0.1.0')
)

sbt:

libraryDependencies += "net.sanori.spark" % "access-log_2.11" % "0.1.0"

Leiningen:

[net.sanori.spark/access-log_2.11 "0.1.0"]

Dependencies

compile (4)

Group / Artifact                      Type  Version
org.scala-lang : scala-library        jar   2.11.12
org.apache.spark : spark-core_2.11    jar   2.3.2
org.apache.spark : spark-sql_2.11     jar   2.3.2
org.apache.spark : spark-hive_2.11    jar   2.3.2

test (1)

Group / Artifact                      Type  Version
org.scalatest : scalatest_2.11        jar   3.0.5

Project Modules

There are no modules declared in this project.

access.log parser for Spark SQL

Simple HTTPd log (a.k.a. access.log) parser for Spark SQL.

Currently, Combined and Common log formats are supported.
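
For reference, the Combined format is the Common format followed by two extra quoted fields, the referer and the user agent. The sketch below is not the parser used by access-log; it is a plain-Scala illustration, with a hypothetical pattern `LogLine` and helper `fields`, of how a single regular expression with an optional tail can accept both formats. It can be pasted into the Scala REPL or spark-shell.

```scala
// Illustrative only: a hand-written pattern, not the library's implementation.
object LogFormats {
  // host ident user [time] "request" status bytes, optionally followed by "referer" "user-agent"
  private val LogLine =
    """^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)(?: "([^"]*)" "([^"]*)")?$""".r

  // Returns the nine captured fields, or None when the line matches neither format.
  // Groups that did not take part in the match (the Common format has no tail) become "".
  def fields(line: String): Option[List[String]] =
    LogLine.unapplySeq(line).map(_.map(g => if (g == null) "" else g))
}

// Sample lines in Common and Combined format
val common =
  """127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326"""
val combined =
  common + """ "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)""""

println(LogFormats.fields(common))
println(LogFormats.fields(combined))
```

For the Common line the last two fields come back empty, which is the same convention as the empty-string defaults described under "What is provided".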

How to use

SQL (spark-sql)

When starting spark-sql:

spark-sql --packages net.sanori.spark:access-log_2.11:0.1.0

In SQL, you can create a user-defined function and use it:

-- attach ToCombined as to_combined(text_line)
CREATE OR REPLACE FUNCTION to_combined
AS "net.sanori.spark.ToCombined";

-- read raw log file as one column table
CREATE OR REPLACE TEMP VIEW accessLogText
USING text
OPTIONS (path "access.log");

-- create parsed log as a table
CREATE OR REPLACE TEMP VIEW accessLog
AS SELECT log.*
    FROM (
        SELECT to_combined(value) AS log
        FROM accessLogText
    );

Spark SQL (spark-shell)

When starting spark-shell:

spark-shell --packages net.sanori.spark:access-log_2.11:0.1.0

Or in build.sbt:

libraryDependencies += "net.sanori.spark" %% "access-log" % "0.1.0"
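
For context, a complete build.sbt around that line might look like the sketch below. The project name and version are placeholders, and marking Spark as "provided" is an assumption about how the application is deployed; the Scala and Spark versions are taken from the dependency list above.

```scala
// build.sbt (sketch): name and version are placeholders
name := "access-log-example"
version := "0.1.0-SNAPSHOT"

// %% appends the Scala binary version, so a 2.11.x build resolves access-log_2.11
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Assumption: Spark is supplied by the cluster at runtime, hence "provided"
  "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided",
  "net.sanori.spark" %% "access-log" % "0.1.0"
)
```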

DataFrame

import net.sanori.spark.accessLog.to_combined
import org.apache.spark.sql.functions._

val lineDf = spark.read.text("access.log")
val logDf = lineDf
  .select(to_combined(col("value")).as("log"))
  .select(col("log.*"))

Dataset

import net.sanori.spark.accessLog.toCombinedLog

val lineDs = spark.read.textFile("access.log")
val logDs = lineDs.map(toCombinedLog)

RDD

import net.sanori.spark.accessLog.toCombinedLog

val lines = sc.textFile("access.log")
val rdd = lines.map(toCombinedLog)

What is provided

Combined or Common log lines are transformed into a table with the following columns:

name           type       default value
remoteAddr     String     ""
remoteUser     String     ""
time           Timestamp  1970-01-01T00:00:00Z
request        String     ""
status         String     ""
bytesSent      Long       null
httpReferer    String     ""
httpUserAgent  String     ""
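
To make the defaults concrete, the following sketch models one row as a plain Scala case class. The names `AccessLogRow` and `parseBytes` are hypothetical and not part of the library; they only mirror the columns and default values listed above, with `Option[Long]` standing in for a nullable Long.

```scala
import java.sql.Timestamp
import scala.util.Try

// Hypothetical stand-in for one parsed row; field names and defaults follow the table above.
case class AccessLogRow(
  remoteAddr: String = "",
  remoteUser: String = "",
  time: Timestamp = new Timestamp(0L), // the epoch, 1970-01-01T00:00:00Z
  request: String = "",
  status: String = "",
  bytesSent: Option[Long] = None,      // None plays the role of SQL null
  httpReferer: String = "",
  httpUserAgent: String = ""
)

// Assumed convention: a size field that is not a number (for example "-") maps to no value.
def parseBytes(raw: String): Option[Long] = Try(raw.toLong).toOption

// A row with every default, and a row with a few fields filled in
val fallback = AccessLogRow()
val row = AccessLogRow(remoteAddr = "127.0.0.1", status = "200", bytesSent = parseBytes("2326"))
```

Note that `status` is a String rather than a number, so comparisons such as `status = "404"` need a string literal.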

Other information

How to build

sbt clean package

generates access-log_2.11-0.1.0.jar in target/scala-2.11.

Motivation

  • To simplify analysis of web server logs
  • Most web (HTTP) server logs are in Combined or Common log format.
  • To provide a user-defined function that can be used from the spark-sql command

Alternative

If you want to view access.log as a table on Hive rather than on Spark, or want to process various log formats, nielsbasjes/logparser might be a better solution.

Contribution

Suggestions, ideas, comments, and pull requests are welcome.

Versions

Version
0.1.0