Scala regex collection
Scala-regex-collection is a pure Scala collection of regular expressions.
Add the library to your project
libraryDependencies += "com.github.gekomad" %% "scala-regex-collection" % "1.0.1"
Using Library
Patterns
You can use the predefined patterns listed below or define your own.
- Email ([email protected])
- Email1 ([email protected])
- Email simple ($@%.$)
Ciphers
- UUID (1CC3CCBB-C749-3078-E050-1AACBE064651)
- MD5 (23f8e84c1f4e7c8814634267bd456194)
- SHA1 (1c18da5dbf74e3fc1820469cf1f54355b7eec92d)
- SHA256 (000020f89134d831f48541b2d8ec39397bc99fccf4cc86a3861257dbe6d819d1)
URL, IP, MAC Address
- IP (10.192.168.1)
- IP_6 (2001:db8:a0b:12f0::1)
- URLs (http://abc.def.com)
- Youtube (https://www.youtube.com/watch?v=9bZkp7q19f0)
- Facebook (https://www.facebook.com/thesimpsons - https://www.facebook.com/pages/)
- Twitter (https://twitter.com/rtpharry)
- MAC Address (fE:dC:bA:98:76:54)
HEX
- HEX (#F0F0F0 - 0xF0F0F0)
Bitcoin
- Bitcoin Address (3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v)
Phone numbers
- US phone number (555-555-5555 - (416)555-3456)
- Italian Mobile Phone (+393471234561 - 3381234561)
- Italian Phone (02 645566 - 02/583725 - 02-583725)
Date time
- 24 Hours time (23:50:00)
- LocalDateTime (2000-12-31T11:21:19)
- LocalDate (2000-12-31)
- LocalTime (11:21:19)
- OffsetDateTime (2011-12-03T10:15:30+01:00)
- OffsetTime (10:15:30+01:00)
- ZonedDateTime (2016-12-02T11:15:30-05:00)
- MDY (1/12/1902 - 12/31/1902)
- MDY2 (1-12-1902)
- MDY3 (01/01/1900 - 12/31/9999)
- MDY4 (01-12-1902 - 12-31-2018)
- DMY (1/12/1902)
- DMY2 (12-31-1902 - 1-12-1902)
- DMY3 (01/12/1902 - 01/12/1902)
- DMY4 (01-12-1902 - 01-12-1902)
- Time (8am - 8 pm - 11 PM - 8:00 am)
Crontab
- Crontab expression (5 4 * * *)
Codes
- Italian fiscal code (BDAPPP14A01A001R)
- Italian VAT code (13297040362)
- Italian Iban (IT28 W800 0000 2921 0064 5211 151 - IT28W8000000292100645211151)
- US states (FL - CA)
- US states1 (Connecticut - Colorado)
- US zip code (43802)
- US streets (123 Park Ave Apt 123 New York City, NY 10002)
- US street numbers (P.O. Box 432)
- Italian zip code (23887)
- German streets (Mühlenstr. 33)
Currency
- USD Currency ($1.00 - 1,500.00)
- EUR Currency (0,00 € - 133,89 EUR - 133,89 EURO)
- YEN Currency (¥1.00 - 15.00 - ¥-1213,120.00)
Strings
- Not ASCII (テスト。)
- Single char ASCII (A)
- A-Z string (abc)
- String and number (a1)
- ASCII string (a1%)
Logs
- Apache error ([Fri Dec 16 02:25:55 2005] [error] [client 1.2.3.4] Client sent malformed Host header)
Numbers
- Number1 (99.99 - 1.1 - .99)
- Unsigned32 (0 - 122 - 4294967295)
- Signed (-10 - +122 - 99999999999999999999999999)
- Percentage (10%)
- Scientific (-2.384E-03)
- Single number (1)
- Celsius (-2.2 °C)
- Fahrenheit (-2.2 °F)
Coordinates
- Coordinate (N90.00.00 E180.00.00)
- Coordinate1 (45°23'36.0" N 10°33'48.0" E)
- Coordinate2 (12:12:12.223546"N - 15:17:6"S - 12°30'23.256547"S)
Programming
- Comments (/* foo */)
Use the library
Validate String
Returns an Option[String] containing the matched string, or None if the input does not match
import com.github.gekomad.regexcollection._
import com.github.gekomad.regexcollection.Validate.validate
import java.time.LocalDateTime
assert(validate[Email]("[email protected]") == Some("[email protected]"))
assert(validate[Email]("baz") == None)
assert(validate[MD5]("fc42757b4142b0474d35fcddb228b304") == Some("fc42757b4142b0474d35fcddb228b304"))
assert(validate[LocalDateTime]("2000-12-31T11:21:19") == Some("2000-12-31T11:21:19"))
findAll
Example extracting all emails from a string
import com.github.gekomad.regexcollection.Email
import com.github.gekomad.regexcollection.Validate.findAll
assert(findAll[Email]("bar [email protected] hi hello [email protected]") == List("[email protected]", "[email protected]"))
assert(findAll[Email]("[email protected]") == List("[email protected]"))
assert(findAll[Email]("ddddd") == List())
findFirst
Example extracting the first email from a string
trait Bar
import com.github.gekomad.regexcollection.Validate.findFirst
import com.github.gekomad.regexcollection.Validate.findFirstIgnoreCase
import com.github.gekomad.regexcollection.Collection.Validator
implicit val myValidator = Validator[Bar]("""Bar@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*""")
assert(findFirstIgnoreCase[Bar]("bar [email protected] hi hello [email protected] 123 [email protected]") == Some("[email protected]"))
assert(findFirst[Bar]("bar [email protected] hi hello [email protected] 123 [email protected]") == Some("[email protected]"))
Get pattern
Returns the pattern currently used for a type, for example Email:
import com.github.gekomad.regexcollection.Email
import com.github.gekomad.regexcollection.Validate.regexp
assert(regexp[Email] == """[a-zA-Z0-9\.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*""")
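Because the pattern comes back as a plain String, it can be reused outside the library. The following is a minimal sketch that uses only the standard library. The pattern literal is copied from the assertion above; the names `PatternReuse`, `emailPattern` and `isEmail` are illustrative and not part of scala-regex-collection. The sketch checks for a full-string match, which is an assumption about how you would want to use the pattern.

```scala
import java.util.regex.Pattern

object PatternReuse {
  // Pattern text copied from the regexp[Email] assertion in this README.
  val emailPattern: String =
    """[a-zA-Z0-9\.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"""

  // Compile once and reuse; Pattern instances are immutable and thread-safe.
  private val compiled: Pattern = Pattern.compile(emailPattern)

  // True only when the whole input matches (illustrative helper).
  def isEmail(s: String): Boolean = compiled.matcher(s).matches()
}
```

In a project that depends on the library, `emailPattern` could presumably be replaced by `regexp[Email]` so the two never drift apart.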
Modify default pattern
The default pattern for any type can be overridden, for example for Email:
import com.github.gekomad.regexcollection.Email
import com.github.gekomad.regexcollection.Validate.validate
import com.github.gekomad.regexcollection.Collection.Validator
val email = "abc,a@%.d"
// the default pattern does not match this string
assert(validate[Email](email) == None)
// with a custom pattern in implicit scope, the string matches
implicit val validator = Validator[Email](""".+@.+\..+""")
assert(validate[Email](email) == Some("abc,a@%.d"))
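The override works through Scala's implicit resolution: a validator declared in local scope is chosen before one found through a companion object. The sketch below reproduces that mechanism with the standard library only. It is not the library's implementation; `MiniValidator`, `MiniValidate`, `Mail` and the simplified default pattern are all hypothetical stand-ins.

```scala
import scala.util.matching.Regex

// Hypothetical stand-in for the library's Validator; A is only a type-level tag.
final case class MiniValidator[A](regex: Regex)

// Marker type playing the role of Email.
trait Mail

object MiniValidator {
  // Default instance, found through the companion object (implicit scope).
  implicit val defaultMail: MiniValidator[Mail] =
    MiniValidator[Mail]("""[a-z0-9.]+@[a-z0-9]+\.[a-z]+""".r)
}

object MiniValidate {
  // Returns the input when the whole string matches the pattern selected by A.
  def validate[A](s: String)(implicit v: MiniValidator[A]): Option[String] =
    if (v.regex.pattern.matcher(s).matches()) Some(s) else None
}

object OverrideDemo {
  import MiniValidate.validate

  val input = "abc,a@%.d"

  // No local implicit here, so the companion default is used.
  val withDefault: Option[String] = validate[Mail](input)

  // A local implicit takes precedence over the companion default.
  val withOverride: Option[String] = {
    implicit val loose: MiniValidator[Mail] = MiniValidator[Mail](""".+@.+\..+""".r)
    validate[Mail](input)
  }
}
```

Keeping the override inside a small block, as `withOverride` does, limits the custom pattern to the code that needs it.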
Matching your own type
Defining a pattern for Bar type
trait Bar
import com.github.gekomad.regexcollection.Validate.validate
import com.github.gekomad.regexcollection.Validate.validateIgnoreCase
import com.github.gekomad.regexcollection.Collection.Validator
// pattern for strings starting with "Bar"
implicit val myValidator = Validator[Bar]("Bar.*")
assert(validate[Bar]("a string") == None)
assert(validate[Bar]("Bar foo") == Some("Bar foo"))
assert(validate[Bar]("bar foo") == None)
assert(validateIgnoreCase[Bar]("bar foo") == Some("bar foo"))
findAllIgnoreCase
Retrieve all emails using findAll and findAllIgnoreCase
trait Bar
import com.github.gekomad.regexcollection.Collection.Validator
import com.github.gekomad.regexcollection.Validate._
//get all Alice's emails
implicit val myValidator = Validator[Bar]("""Alice@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*""")
val s = "bar [email protected] hi hello [email protected] 123 [email protected]"
assert(findAll[Bar](s) == List("[email protected]"))
assert(findAllIgnoreCase[Bar](s) == List("[email protected]", "[email protected]"))
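The difference between the two calls can be reproduced with the standard library alone. This sketch assumes that the IgnoreCase variants behave like the regex `(?i)` flag, which the README does not confirm. The object name `CaseDemo`, the shortened pattern and the sample addresses are illustrative only.

```scala
import scala.util.matching.Regex

object CaseDemo {
  // Shortened, illustrative pattern: "Alice@" followed by dot-separated labels.
  val pattern = """Alice@[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*"""

  val text = "bar alice@example.com hi hello Alice@example.org 123 bob@example.net"

  // Case-sensitive search: only the capitalised address matches.
  val sensitive: List[String] = new Regex(pattern).findAllIn(text).toList

  // Prefixing (?i) makes the whole pattern case-insensitive.
  val insensitive: List[String] = new Regex("(?i)" + pattern).findAllIn(text).toList
}
```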
Using a function pattern
Instead of using a regular expression to match a string, you can define a function pattern
Example matching even numbers
trait Foo
import com.github.gekomad.regexcollection.Validate.validate
import com.github.gekomad.regexcollection.Collection.Validator
// Returns the input when it parses as an even Int, otherwise None
def even: String => Option[String] = s =>
  scala.util.Try(s.toInt).toOption.filter(_ % 2 == 0).map(_ => s)
implicit val validator: Validator[Foo] = Validator[Foo](even)
assert(validate[Foo]("42") == Some("42"))
assert(validate[Foo]("41") == None)
assert(validate[Foo]("hello") == None)
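Function patterns are most useful for checks that are awkward to express as a regular expression, such as numeric ranges. As a further sketch, the function below accepts strings that parse to a port number between 0 and 65535. The names `PortPattern` and `port` are hypothetical; only the `String => Option[String]` shape is taken from the example above.

```scala
import scala.util.Try

object PortPattern {
  // Accepts strings that parse to an Int in the port range 0..65535.
  // Anything that fails to parse, or falls outside the range, yields None.
  val port: String => Option[String] = s =>
    Try(s.toInt).toOption.filter(n => n >= 0 && n <= 65535).map(_ => s)
}
```

Since it has the same type as `even`, it could presumably be passed to `Validator` in the same way, with a marker trait of your own in place of `Foo`.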
Bugs and Feedback
For bugs, questions and discussions please use GitHub Issues.
License
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Special Thanks
To regexlib.com