[geek] pondering
Oct. 13th, 2004 10:43 am
I'm cleaning Windows-sourced text data for HTML display: translating the cp1252 0x80-0x9f range into Unicode character entities so the bloody "smart quotes" and other dross from crap pasted in from MS-Word (yay) has a chance of rendering on other platforms. I'm doing this in ColdFusion 5.
Rhetorical question (because I don't have time or energy to benchmark, and the simple solution works just fine): at what text length and crap-character density does a simple 32 x search and replace become less efficient than (1) an incremental regexp scan through the string for any of the characters, building the result as I go, or (2) shelling out to a Perl script to do the job properly?
Of course, if ColdFusion supported expressions as the replacement string in regexp search and replace, or even returned the position/length of multiple hits of a repeating pattern in regexp find, this would all be irrelevant... ;)
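For comparison, here is a rough sketch of the single-pass version in a language whose regexp replace does accept a function as the replacement. It is Python rather than ColdFusion 5, and the names `CP1252_ENTITIES`, `C1_RANGE` and `clean_cp1252` are made up for illustration. It assumes the text arrives with one character per byte (as if decoded as Latin-1), so the offending bytes show up as characters 0x80-0x9f. Note that five bytes in that range (0x81, 0x8d, 0x8f, 0x90, 0x9d) are unassigned in cp1252; the sketch simply leaves them alone.

```python
import re

# Build the lookup once: raw character (0x80-0x9f) -> numeric character entity.
CP1252_ENTITIES = {}
for byte in range(0x80, 0xA0):
    try:
        ch = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Unassigned in cp1252 (0x81, 0x8d, 0x8f, 0x90, 0x9d): no mapping.
        continue
    CP1252_ENTITIES[chr(byte)] = "&#%d;" % ord(ch)

# One character class covering the whole range, so the string is scanned once.
C1_RANGE = re.compile("[\x80-\x9f]")


def clean_cp1252(text):
    """Replace cp1252 0x80-0x9f characters with numeric entities in one pass.

    Assumption: `text` holds one character per original byte.
    Characters with no cp1252 mapping are returned unchanged.
    """
    return C1_RANGE.sub(
        lambda m: CP1252_ENTITIES.get(m.group(0), m.group(0)), text
    )


def clean_cp1252_naive(text):
    """The 'simple' approach: one search and replace per mapped character."""
    for raw, entity in CP1252_ENTITIES.items():
        text = text.replace(raw, entity)
    return text
```

Both functions should give the same result on the same input; the difference is only in how many times the string gets walked, which is exactly the thing I haven't benchmarked.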
(no subject)
Date: 2004-10-13 03:37 am (UTC)
That's by Simon and Garfunkel, you know.