
0 points · MostAwesomeDude · 15y ago · 0 comments

I knew you were going to point out mbstring. I am talking about an actual Unicode string type. I am talking about using regular string functions on Unicode strings without caring whether or not they contain characters not in ASCII or Latin-1. I expect you'll next talk about how the function overloading "feature" is the correct thing to do, and you'll conveniently ignore how the documentation recommends against it and warns of undefined behavior.

Here's a fun question: How do you strtok(), implode(), or explode() mbstrings?

dexen · 15y ago

For background:

- I've seen the Python (2.x) way (separate `unicode' and `str' types, and no way to set your own conversion defaults for the whole process -_-') and found it unwieldy. Lotsa boilerplate to keep around I/O.

- I've seen the GNU way (a messy pile depending on LC_* environment vars and files in /usr/, with a hierarchy of precedence) and disliked it.

- I've seen the Plan 9 way (char * / char[] for UTF-8 and Rune * / Rune[] for UTF-32) and it suits me. I see and use (non-PHP) software built on that, and it works reliably. I'm happy :)

implode() and explode() work with UTF-8. I hope you know the technicalities of why -- UTF-8 was designed that way, and the functions take a complete string as the glue/separator. No idea about UTF-7, -16 or -32 (nor the April Fool's -9 and -18); I guess some of those would break terribly. I don't care at this moment. My sites run in Polish, English, Russian and, in the near future, Czech, all on UTF-8.
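To make the "designed that way" part concrete, here is a minimal sketch in Python rather than PHP, using byte strings to model a byte-oriented split and join. The sample text and the variable names are made up for illustration. The property it leans on is that UTF-8 lead bytes and continuation bytes come from disjoint ranges, so a complete encoded separator can only match at a character boundary.

```python
# Model of a byte-oriented split/join (the explode()/implode() situation)
# on UTF-8 data. Assumption: the separator is a complete UTF-8 string.

text = "zażółć;gęślą;jaźń"        # illustrative sample, not from the thread
data = text.encode("utf-8")
sep = "ż".encode("utf-8")          # two bytes: 0xC5 0xBC

# Purely byte-wise split; nothing here knows about Unicode.
parts = data.split(sep)

# Every piece is still valid UTF-8 (decode would raise otherwise).
decoded = [p.decode("utf-8") for p in parts]

# The byte-wise result matches a character-aware split of the same text.
same_as_char_split = decoded == text.split("ż")

# Joining with the same separator round-trips byte for byte.
round_trip = sep.join(parts) == data
```

Note that "ź" and "ń" in the sample share their first byte (0xC5) with the separator, yet they are left intact, because only the full two-byte sequence counts as a match.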

Your only remaining point is strtok(). You are right. I lost. PHP lost. Have a nice day -- and kudos for bringing up that cool function :-)

EDIT: now that I think about it, it seems I've abused explode() for that job -- the output array, with empty strings removed, is equivalent to calling strtok() repeatedly till it returns FALSE. Seems a bit brute-force-ish.
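A rough sketch of that equivalence, again in Python with byte strings standing in for PHP's byte-oriented strings. Both helper names are hypothetical, and strtok_like is only a model that assumes each byte of the delimiter argument is treated as a separate delimiter. The two approaches agree for a single-byte delimiter and diverge once the delimiter is a multibyte character.

```python
def tokens_via_split(data: bytes, sep: bytes) -> list:
    """Split on one complete separator and drop the empty pieces."""
    return [piece for piece in data.split(sep) if piece]


def strtok_like(data: bytes, delims: bytes) -> list:
    """Byte-wise tokenizer model: every single byte in delims is a delimiter."""
    tokens, current = [], bytearray()
    for byte in data:
        if byte in delims:
            # A delimiter byte ends the current token; empty tokens are skipped.
            if current:
                tokens.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)
    if current:
        tokens.append(bytes(current))
    return tokens


# Single-byte delimiter: both approaches give the same tokens.
ascii_data = b"a,,b,c,"
ascii_split = tokens_via_split(ascii_data, b",")
ascii_strtok = strtok_like(ascii_data, b",")

# Multibyte delimiter: "ż" is 0xC5 0xBC, and "ź" (0xC5 0xBA) shares its
# first byte, so the byte-set tokenizer cuts "ź" in half.
utf8_data = "aźbżc".encode("utf-8")
utf8_sep = "ż".encode("utf-8")
utf8_split = tokens_via_split(utf8_data, utf8_sep)
utf8_strtok = strtok_like(utf8_data, utf8_sep)
```

In the multibyte case the split-based version returns "aźb" and "c", while the byte-set model produces a middle token that starts with a stray continuation byte and is no longer valid UTF-8.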

As for that warning and the undefined behavior, I'll assume you're referring to the following passage:

  It is not recommended to use the function overloading option in the per-directory context, because it's not confirmed yet to be stable enough in a production environment and may lead to undefined behaviour.

Reading comprehension: don't reconfigure it per directory (that is, for just your project rather than the whole server). With that in mind, you get a reliable site.
