Show HN: Sift – Write readable, type-safe regular expressions in Java
hackernews
🔬 Research
#java
#review
#show hn
#library
#regex
#type-safe
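Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
Sift is a fluent DSL built to make regular expressions in Java readable and type-safe. It uses a strict type-state machine to guarantee valid grammar at compile time, and IDE autocompletion suggests only the methods that are grammatically valid at each step. It also features immutable, thread-safe builders, modular composition for reusable patterns, and lazy validation.
Body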
The Type-Safe Regex Builder for Java. If it compiles, it works.

You've seen this before. Someone writes a regex, it works, and six months later nobody, including the author, can read it:

    // What does this even do?
    Pattern p = Pattern.compile("^(?=[\\p{Lu}])[\\p{L}\\p{Nd}_]{3,15}+[0-9]?$");

You add a character class, break the balance of brackets, and find out at runtime. You copy a regex from Stack Overflow, miss an escape, and watch it fail silently in production. You duplicate the same validation pattern across DTOs and forget to update one of them.

There is a better way. Sift is a fluent DSL that turns regex construction into readable, self-documenting Java code. Its state machine enforces grammatical correctness at compile time: if your pattern compiles, it is structurally valid.

    // The same pattern, written with Sift:
    String regex = Sift.fromStart()
        .exactly(1).upperCaseLettersUnicode()                          // Must start with an uppercase letter
        .then()
        .between(3, 15).wordCharactersUnicode().withoutBacktracking()  // ReDoS-safe
        .then()
        .optional().digits()                                           // May end with a digit
        .andNothingElse()
        .shake();
    // Result: ^[\p{Lu}][\p{L}\p{Nd}_]{3,15}+[0-9]?$

Your IDE guides every step. Wrong transitions simply do not exist as methods.

Gradle:

    // Core engine: zero external dependencies
    implementation 'com.mirkoddd:sift-core:'
    // Optional: Jakarta Validation / Hibernate Validator integration
    implementation 'com.mirkoddd:sift-annotations:'
    // Optional: Engine RE2J
    implementation 'com.mirkoddd:sift-engine-re2j:'
    // Optional: Engine GraalVM
    implementation 'com.mirkoddd:sift-engine-graalvm:'

Maven:

    <dependency>
        <groupId>com.mirkoddd</groupId>
        <artifactId>sift-core</artifactId>
        <version>latest</version>
    </dependency>
    <dependency>
        <groupId>com.mirkoddd</groupId>
        <artifactId>sift-annotations</artifactId>
        <version>latest</version>
    </dependency>
    <dependency>
        <groupId>com.mirkoddd</groupId>
        <artifactId>sift-engine-graalvm</artifactId>
        <version>latest</version>
    </dependency>
    <dependency>
        <groupId>com.mirkoddd</groupId>
        <artifactId>sift-engine-re2j</artifactId>
        <version>latest</version>
    </dependency>

Sift targets Java 8 bytecode for maximum compatibility, including legacy Spring Boot 2.x and Android.

| Method | Generates | Use when |
|---|---|---|
| Sift.fromStart() | ^... | Validating from the start of a line (affected by MULTILINE flag) |
| Sift.fromAbsoluteStart() | \A... | Validating from the absolute start of the string (CRLF/Multi-Line safe) |
| Sift.fromAnywhere() | ... | Building reusable fragments or searching within text |
| Sift.fromWordBoundary() | \b... | Matching whole words |
| Sift.fromPreviousMatchEnd() | \G... | Iterative parsing |
| Sift.filteringWith(flag) | (?i)... | Global flags (case-insensitive, multiline, dotall) |

| Method | Effect |
|---|---|
| .shake() | Returns the raw regex String |
| .sieve() | Compiles with the default JDK engine → SiftCompiledPattern |
| .sieveWith(engine) | Compiles with a custom engine → SiftCompiledPattern |
| .andNothingElse() | Appends $ and seals the pattern; affected by MULTILINE flag and trailing newlines |
| .andNothingElseAbsolutely() | Appends \z: absolute end of string, completely immune to multi-line and CRLF injection bypasses |
| .andNothingElseBeforeFinalNewline() | Appends \Z: end of string, or just before a final \n |

The real power of Sift is the ability to name your building blocks and compose them. Every Sift.fromAnywhere() call returns a reusable SiftPattern that can be embedded anywhere without carrying unwanted anchors.
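Why unanchored fragments matter for reuse can be seen without Sift at all. The sketch below uses only `java.util.regex` and plain string concatenation; it is not Sift's API, and the class name, constants, and `findsDateInLog` helper are hypothetical names chosen for illustration. It embeds the same date fragment into a larger log-line pattern twice: once unanchored, and once carrying its own `^` and `$`.

```java
import java.util.regex.Pattern;

class FragmentCompositionSketch {
    // Unanchored building blocks (illustrative constants, not part of Sift).
    static final String YEAR = "[0-9]{4}";
    static final String MONTH = "[0-9]{2}";
    static final String DAY = "[0-9]{2}";
    static final String DATE = YEAR + "-" + MONTH + "-" + DAY;

    // The same date, but carrying its own start and end anchors.
    static final String ANCHORED_DATE = "^" + DATE + "$";

    // Embeds the given date fragment between literal brackets in a log-line pattern,
    // then anchors the outer pattern exactly once.
    static boolean findsDateInLog(String dateFragment, String line) {
        Pattern p = Pattern.compile("^\\[" + dateFragment + "\\] .+$");
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "[2026-03-13] service started";
        // The unanchored fragment composes cleanly.
        System.out.println(findsDateInLog(DATE, line));          // true
        // The embedded ^ can never match at index 1, so the whole pattern fails.
        System.out.println(findsDateInLog(ANCHORED_DATE, line)); // false
    }
}
```

With raw strings nothing stops the anchored variant from being embedded, and the failure only shows up at runtime as a pattern that never matches.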
    // Define named building blocks
    SiftPattern year  = Sift.fromAnywhere().exactly(4).digits();
    SiftPattern month = Sift.fromAnywhere().exactly(2).digits();
    SiftPattern day   = Sift.fromAnywhere().exactly(2).digits();
    SiftPattern dash  = Sift.fromAnywhere().character('-');

    // Compose them into a date block
    SiftPattern dateBlock = year.followedBy(dash, month, dash, day);

    // Embed inside a larger pattern
    String logRegex = Sift.fromStart()
        .of(dateBlock)
        .followedBy(' ')
        .then().oneOrMore().anyCharacter()
        .andNothingElse()
        .shake();
    // Result: ^[0-9]{4}-[0-9]{2}-[0-9]{2} .+$

Root vs Fragment: Patterns built with fromStart() or fromAbsoluteStart(), or closed with andNothingElse(), andNothingElseAbsolutely(), or andNothingElseBeforeFinalNewline(), become Root SiftPatterns: they are sealed and cannot be embedded.

Sift patterns are not just validators. They are fully equipped extraction tools.

    // Define a structured pattern with named groups
    NamedCapture yearGroup  = SiftPatterns.capture("year", Sift.exactly(4).digits());
    NamedCapture monthGroup = SiftPatterns.capture("month", Sift.exactly(2).digits());
    NamedCapture dayGroup   = SiftPatterns.capture("day", Sift.exactly(2).digits());

    SiftPattern datePattern = Sift.fromStart()
        .namedCapture(yearGroup)
        .followedBy('-')
        .then().namedCapture(monthGroup)
        .followedBy('-')
        .then().namedCapture(dayGroup)
        .andNothingElse();

    // Extract structured data directly, with no Matcher boilerplate
    Map<String, String> fields = datePattern.extractGroups("2026-03-13");
    // → { "year": "2026", "month": "03", "day": "13" }

    // Extract all matches from a larger text
    List<String> prices = Sift.fromAnywhere()
        .oneOrMore().digits()
        .sieve()
        .extractAll("Order: 3 items at 25 and 40 euros");
    // → ["3", "25", "40"]

    // Stream results lazily for large inputs
    Sift.fromAnywhere().oneOrMore().lettersUnicode()
        .streamMatches(largeText)
        .filter(word -> word.leng
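For comparison, here is roughly what the Matcher boilerplate mentioned above looks like when the same two extractions are written directly against `java.util.regex`. This is a sketch in plain JDK code, not Sift's implementation; the class name and the `extractDate` and `findAll` helpers are hypothetical, and the regex strings are written by hand to mirror the patterns in the examples. It sticks to Java 8 APIs, in line with the compatibility target stated earlier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherBoilerplateSketch {
    // Hand-written equivalent of the named-group date pattern.
    private static final Pattern DATE =
        Pattern.compile("^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})$");

    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Returns the named groups in declaration order, or an empty map if the input does not match.
    static Map<String, String> extractDate(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = DATE.matcher(input);
        if (m.matches()) {
            // java.util.regex has no public way to enumerate group names on Java 8,
            // so each name has to be repeated here by hand.
            fields.put("year", m.group("year"));
            fields.put("month", m.group("month"));
            fields.put("day", m.group("day"));
        }
        return fields;
    }

    // Collects every non-overlapping run of digits in the text.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(extractDate("2026-03-13"));
        // {year=2026, month=03, day=13}
        System.out.println(findAll("Order: 3 items at 25 and 40 euros"));
        // [3, 25, 40]
    }
}
```

The group names appear twice, once in the regex and once in the lookup code, which is the kind of duplication a typed capture object is meant to remove.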
This analysis was written by the Genesis Park editorial team with the help of AI. The original is available via the source link.