Tuesday, May 8, 2012

Two cool regex features


Unicode Character Properties

\p{spec}


where spec is a Unicode property name, for example:

  • L - letter
  • Ll - lowercase letter
  • P - punctuation
  • Cyrillic - Cyrillic letter
  • InSpecials - the Specials block, U+FFF0..U+FFFF (some flavors, such as .NET, spell block names with Is instead: IsSpecials)
... and many more.
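
As a rough illustration of what these specs mean: Python's built-in re module does not support \p{...}, so this sketch checks the same properties with the standard unicodedata module instead. The helper names has_category and in_specials are made up for this example, and it only covers general categories (L, Ll, P) and the Specials block range listed above, not scripts such as Cyrillic.

```python
import unicodedata

def has_category(ch: str, spec: str) -> bool:
    # Hypothetical helper: True if the character's Unicode general category
    # (e.g. "Ll", "Lu", "Po") starts with spec, so "L" covers every letter
    # category while "Ll" covers only lowercase letters.
    return unicodedata.category(ch).startswith(spec)

def in_specials(ch: str) -> bool:
    # Hypothetical helper: the Specials block is the range U+FFF0..U+FFFF.
    return 0xFFF0 <= ord(ch) <= 0xFFFF

print(has_category("a", "L"))    # lowercase Latin letter
print(has_category("A", "Ll"))   # uppercase letter, so not Ll
print(has_category("!", "P"))    # punctuation
print(in_specials("\ufffd"))     # U+FFFD REPLACEMENT CHARACTER
```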

Character Class Subtraction

Example:


[a-z-[aeiuo]]


Matches a single lowercase letter that is not a vowel.
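
Not every regex engine accepts the subtraction syntax; Python's built-in re module, for one, does not. A common workaround, sketched here as an assumption rather than an exact equivalent in every flavor, is a negative lookahead that rules out the subtracted characters before the base class is matched. The name CONSONANT is just for this example.

```python
import re

# Emulates [a-z-[aeiou]]: first assert the next character is not a vowel,
# then consume one character from a-z.
CONSONANT = re.compile(r"(?![aeiou])[a-z]")

print(CONSONANT.findall("regex"))  # ['r', 'g', 'x']
```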


Subtraction can be combined with Unicode character properties:

Letters outside the Basic Latin block (roughly, non-English letters) - [\p{L}-[\p{IsBasicLatin}]]
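
The same "letters minus Basic Latin" idea can be sketched without regex support for \p{...} or subtraction. This Python version assumes the Basic Latin block is U+0000..U+007F and uses the standard unicodedata module; the function name non_basic_latin_letters is made up for illustration.

```python
import unicodedata

def non_basic_latin_letters(text: str) -> list:
    # Keep characters whose general category is a letter (L*),
    # then "subtract" everything inside Basic Latin (U+0000..U+007F).
    return [
        ch for ch in text
        if unicodedata.category(ch).startswith("L") and ord(ch) > 0x7F
    ]

# ASCII letters, digits and punctuation are dropped; Cyrillic letters are kept.
print(non_basic_latin_letters("abc \u0436\u0443\u043a 1!"))
```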


More examples here


