perlmonger wrote 2004-10-13 10:43 am

[geek] pondering

I'm cleaning Windows-sourced text data for HTML display: translating the cp1252 0x80-0x9f range into Unicode character entities so the bloody "smart quotes" and other dross from crap pasted in from MS-Word (yay) have a chance of rendering on other platforms. I'm doing this in ColdFusion 5.
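
For the curious, the simple approach looks roughly like this. It's a Python sketch rather than ColdFusion, and the names (`build_cp1252_entity_map`, `clean_simple`) are made up for illustration. It assumes the cp1252 bytes have arrived as the code points 0x80-0x9f (i.e. the text was read as Latin-1), and it leans on Python's own cp1252 codec for the mapping instead of a hand-typed table.

```python
def build_cp1252_entity_map():
    """Map each code point in 0x80-0x9f to a numeric HTML entity.

    Assumption: the input text holds cp1252 bytes as raw code points
    (e.g. "\x93" for a left double quote), as happens when cp1252 data
    is read as Latin-1.
    """
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # A few slots in this range are unassigned in Python's cp1252
            # codec; leave those characters alone.
            continue
        table[chr(byte)] = "&#%d;" % ord(ch)
    return table


ENTITY_MAP = build_cp1252_entity_map()


def clean_simple(text):
    # One full search-and-replace pass per mapped character.
    # The entities are pure ASCII, so no pass can disturb an earlier one.
    for bad, entity in ENTITY_MAP.items():
        text = text.replace(bad, entity)
    return text
```

Every pass walks the whole string whether or not the character is present, which is exactly the cost the question below is about.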

Rhetorical question (because I don't have time or energy to benchmark, and the simple solution works just fine): at what text length and crap-character density does a simple 32x search and replace become less efficient than (1) an incremental regexp scan through the string for any of the characters, building the result as I go, or (2) shelling out to a Perl script to do the job properly?

Of course, if ColdFusion supported expressions as the replacement string in regexp search and replace, or even returned the position/length of multiple hits of a repeating pattern in regexp find, this would all be irrelevant... ;)
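
What "expressions as the replacement string" buys you, sketched in Python (not ColdFusion, which is the whole complaint). `re.sub` accepts a function as the replacement, so a single scan handles every character in the range. The table here is a small hand-picked subset for illustration, and `clean_single_pass` is a made-up name; it assumes the cp1252 bytes show up as the code points 0x80-0x9f.

```python
import re

# Illustrative subset only: a real table would cover the whole range.
ENTITIES = {
    "\x85": "&#8230;",  # horizontal ellipsis
    "\x91": "&#8216;",  # left single quote
    "\x92": "&#8217;",  # right single quote
    "\x93": "&#8220;",  # left double quote
    "\x94": "&#8221;",  # right double quote
    "\x96": "&#8211;",  # en dash
    "\x97": "&#8212;",  # em dash
}

# Matches any single character in the 0x80-0x9f range.
C1_RANGE = re.compile("[\x80-\x9f]")


def clean_single_pass(text):
    # The replacement is computed per match; characters missing from the
    # table are passed through unchanged.
    return C1_RANGE.sub(lambda m: ENTITIES.get(m.group(0), m.group(0)), text)
```

One pass over the string, one dictionary lookup per hit, regardless of how many distinct characters are in the table.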
