fixed-width

A library for parsing fixed-width data with Apache Spark

GroupId: za.co.absa
ArtifactId: fixed-width_2.12
Last Version: 0.2.0
Type: jar
Description: A library for parsing fixed-width data with Apache Spark
Project URL: https://github.com/AbsaOSS/fixed-width
Project Organization: ABSA Group Limited
Source Code Management: http://github.com/AbsaOSS/fixed-width/tree/master


How to add to project

Maven:

<dependency>
    <groupId>za.co.absa</groupId>
    <artifactId>fixed-width_2.12</artifactId>
    <version>0.2.0</version>
</dependency>

Gradle (Groovy DSL):

implementation 'za.co.absa:fixed-width_2.12:0.2.0'

Gradle (Kotlin DSL):

implementation("za.co.absa:fixed-width_2.12:0.2.0")

Buildr:

'za.co.absa:fixed-width_2.12:jar:0.2.0'

Ivy:

<dependency org="za.co.absa" name="fixed-width_2.12" rev="0.2.0">
  <artifact name="fixed-width_2.12" type="jar" />
</dependency>

Grape:

@Grapes(
  @Grab(group='za.co.absa', module='fixed-width_2.12', version='0.2.0')
)

SBT:

libraryDependencies += "za.co.absa" % "fixed-width_2.12" % "0.2.0"

Leiningen:

[za.co.absa/fixed-width_2.12 "0.2.0"]

Dependencies

compile (1)

Group / Artifact                         Type   Version
org.scala-lang : scala-library           jar    2.12.12

provided (3)

Group / Artifact                         Type   Version
org.apache.spark : spark-core_2.12       jar    2.4.7
org.apache.spark : spark-sql_2.12        jar    2.4.7
org.apache.spark : spark-catalyst_2.12   jar    2.4.7

test (1)

Group / Artifact                         Type   Version
org.scalatest : scalatest_2.12           jar    3.0.5

Project Modules

There are no modules declared in this project.

Fixed-Width Data Source for Apache Spark

A library for parsing fixed-width data with Apache Spark. A fixed-width file is a flat file in which every column occupies a fixed number of characters; the width of each column is specified in the schema.
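To make the format concrete, here is a minimal sketch in plain Scala of how one line splits into columns once the widths are known. The helper `sliceFixedWidth` is hypothetical and only illustrates the idea; it is not part of the fixed-width library's API.

```scala
// Hypothetical helper for illustration only; not the fixed-width library's API.
// Splits one line into fields, given each column's width in characters.
def sliceFixedWidth(line: String, widths: Seq[Int]): Seq[String] = {
  // Running start offsets: for widths 10, 10 these are 0, 10 (and 20, dropped by zip).
  val starts = widths.scanLeft(0)(_ + _)
  widths.zip(starts).map { case (width, start) =>
    // StringOps.slice clamps out-of-range indices instead of throwing,
    // so a line shorter than the schema yields short or empty fields.
    line.slice(start, start + width)
  }
}

// Two 10-character columns, each value padded with trailing spaces.
val line = "John".padTo(10, ' ') + "Smith".padTo(10, ' ')
val fields = sliceFixedWidth(line, Seq(10, 10))
```

Each field keeps its padding; stripping it is what the `trimValues` option is for.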

Usage

You can link against this library in your program at the following coordinates:

Scala 2.11

groupId: za.co.absa
artifactId: fixed-width_2.11
version: 0.2.0

Scala 2.12

groupId: za.co.absa
artifactId: fixed-width_2.12
version: 0.2.0

Using with Spark shell

This package can be added to Spark using the --packages command line option. For example, to include it when starting the spark shell:

Spark compiled with Scala 2.11

$SPARK_HOME/bin/spark-shell --packages za.co.absa:fixed-width_2.11:0.2.0

Spark compiled with Scala 2.12

$SPARK_HOME/bin/spark-shell --packages za.co.absa:fixed-width_2.12:0.2.0

Usage in code

import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession

val sparkBuilder = SparkSession.builder().appName("Example")
val spark = sparkBuilder.getOrCreate()

// The width of each column (in characters) is set through the "width" metadata key
val metadata = new MetadataBuilder().putLong("width", 10).build()
val schema = StructType(
  List(
    StructField("someColumn", StringType, true, metadata),
    StructField("secondColumn", StringType, true, metadata)
  )
)

// Read the file with the fixed-width data source, trimming whitespace around values
val dataframe = spark
  .read
  .format("fixed-width")
  .option("trimValues", "true")
  .schema(schema)
  .load("/path/to/data/fixedWidthData")
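In the example above both columns share one metadata object with a width of 10. When columns have different widths, a small helper can build the schema from (name, width) pairs. This is a sketch only: `fixedWidthSchema` is a hypothetical name, not part of the library, and it simply reuses the `"width"` metadata key from the example.

```scala
import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField, StructType}

// Hypothetical helper (not part of the fixed-width library): every column becomes a
// nullable string whose width is stored under the "width" metadata key.
def fixedWidthSchema(columns: Seq[(String, Long)]): StructType =
  StructType(columns.map { case (name, width) =>
    val metadata = new MetadataBuilder().putLong("width", width).build()
    StructField(name, StringType, nullable = true, metadata = metadata)
  })

// Three columns of 5, 20 and 2 characters.
val schema = fixedWidthSchema(Seq("id" -> 5L, "name" -> 20L, "country" -> 2L))
```

The resulting `schema` can be passed to `.schema(schema)` on the reader exactly as in the example.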

Options

Option       Value                        Explanation
trimValues   true/false                   Whether whitespace around each value is trimmed
charset      charset name (e.g. UTF-8)    The charset the fixed-width file was written with (any valid charset name)
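Both options affect how a record turns into values. As a minimal sketch, assuming (as the description above states) that widths count characters rather than bytes, a record's bytes would be decoded with the configured charset first, then sliced, then optionally trimmed. `parseRecord` is a hypothetical name and this is not the library's implementation.

```scala
import java.nio.charset.Charset

// Hypothetical sketch, not the library's internals. Assumes widths are counted in
// characters after decoding the bytes with the configured charset.
def parseRecord(bytes: Array[Byte], widths: Seq[Int], charset: String, trimValues: Boolean): Seq[String] = {
  val line = new String(bytes, Charset.forName(charset))
  val starts = widths.scanLeft(0)(_ + _)
  widths.zip(starts).map { case (width, start) =>
    val raw = line.slice(start, start + width)
    if (trimValues) raw.trim else raw
  }
}

// "José" is 4 characters but 5 bytes in UTF-8, so slicing raw bytes
// instead of decoded characters would shift every following column.
val bytes = ("Jos\u00e9".padTo(6, ' ') + "PT").getBytes("UTF-8")
val fields = parseRecord(bytes, Seq(6, 2), "UTF-8", trimValues = true)
```

Decoding before slicing is what keeps multi-byte characters from misaligning columns; with `trimValues = false` the first field would keep its two padding spaces.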