MarkusWolff | September 20, 2007 at 18:47 · Filed under PHP, Development
Welcome to yet another job offer post :-)
We at Jimdo are looking for reinforcements! You’ll work in Hamburg, one of the most beautiful cities in Germany, with a young team in an enthusiastic atmosphere. Honestly, I hadn’t had this much fun in years before I started working here :-)
We’re looking for PHP experts, system administrators, web designers and JavaScript gods (or a human vessel with equal powers).
What are you still waiting for? Go to the official job page, right now, don’t hesitate:
http://www.jimdo.com/jobs.php
Note: If you’re a resident of the European Union, you’re free to work in any EU country you like! You don’t speak German? Not a problem as long as you speak and understand English well enough. You’ll miss out on some of the jokes for a while, but you can still watch me lose each and every Bomberman tournament held in this office, which should be satisfactory enough. Oh, we also have excellent coffee! Still not interested? What’s wrong with you? :-)
MarkusWolff | September 19, 2007 at 17:14 · Filed under PHP, Development
When you’re building international websites, there’s always something new to learn, especially if one of the languages your website is available in uses a character set different from anything you’re used to. For jimdo.com, the greatest challenge so far is the Chinese version.
Jimdo lets you define tags for your website. You can separate the tags with whitespace, but it’s also possible to use commas, like this:
tag1, tag2, tag3
Chinese users are naturally far more used to typing non-ASCII characters than us Westerners, and, lo and behold, Unicode has its own special comma character with integrated whitespace, the fullwidth comma “,” (U+FF0C), which Chinese users use quite frequently:
科学,思考,心情
As we’re using good ol’ regular expressions to split up the tag strings into single tags, one might think, “no problem, I’ll just add another character to the regex pattern”, like so:
$tags = preg_split("/[\s,;:,]+/", $input, null, PREG_SPLIT_NO_EMPTY);
And eureka, it works! Or does it? Nope. UTF-8 uses multiple bytes for many characters, and preg_split, like so many other current PHP functions, assumes one byte = one character, so you may encounter strange side effects. Here’s an example using the above pattern on a random string with some German umlauts:
Splitting up 'Bääh Blöök Dübel' yields:
Array
(
[0] => Bääh
[1] => Blöök
[2] => D�
[3] => bel
)
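Why does the split land in the middle of “Dübel”, of all places? Here is a minimal sketch of what is probably going on, assuming the special comma in the pattern is the fullwidth comma U+FF0C and all strings are UTF-8. Without the “u” modifier, the comma’s three bytes sit in the character class as three separate one-byte delimiters, and one of them (0xBC) is also the second byte of “ü”. The variable names are made up for illustration:

```php
<?php
// Assumption: the special comma is U+FF0C, written here as its UTF-8 bytes.
$fullwidthComma = "\xEF\xBC\x8C";
$umlautU        = "\xC3\xBC"; // "ü" (U+00FC) encoded as UTF-8

echo bin2hex($fullwidthComma), "\n"; // efbc8c
echo bin2hex($umlautU), "\n";        // c3bc

// No "u" modifier: the class contains the single bytes EF, BC and 8C.
$pattern = "/[\\s,;:" . $fullwidthComma . "]+/";
$parts   = preg_split($pattern, "D" . $umlautU . "bel", -1, PREG_SPLIT_NO_EMPTY);

// The byte BC inside "ü" is treated as a delimiter, leaving a stray C3 behind.
foreach ($parts as $part) {
    echo bin2hex($part), "\n"; // 44c3, then 62656c ("bel")
}
```

The “ä” (C3 A4) and “ö” (C3 B6) share no byte with the comma, which would explain why only the last word breaks.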
What to do? Simple: Add the Unicode modifier, “u”, to the pattern:
$tags = preg_split("/[\s,;:,]+/u", $input, null, PREG_SPLIT_NO_EMPTY);
Now preg_split correctly recognizes multibyte characters and yields the expected results:
Splitting up 'Bääh Blöök Dübel' yields:
Array
(
[0] => Bääh
[1] => Blöök
[2] => Dübel
)
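If you want to wrap this up for reuse, something along these lines might do. It is only a sketch, not Jimdo’s actual code: the function name split_tags is hypothetical, the special comma is assumed to be U+FF0C (written as an escape so the pattern doesn’t depend on the file’s encoding), and the limit is passed as -1 (“no limit”) rather than null. One thing to keep in mind: with the “u” modifier, preg_split returns false when the input isn’t valid UTF-8, so the sketch falls back to an empty list in that case.

```php
<?php
/**
 * Hypothetical helper: split a tag string on whitespace, ASCII
 * punctuation and the fullwidth comma (assumed to be U+FF0C).
 */
function split_tags(string $input): array
{
    // \x{FF0C} only works as a code point escape together with the "u" modifier.
    $tags = preg_split('/[\s,;:\x{FF0C}]+/u', $input, -1, PREG_SPLIT_NO_EMPTY);

    if ($tags === false) {
        // With "u", invalid UTF-8 input makes preg_split() fail.
        return [];
    }

    return $tags;
}

// Assumes this file is saved as UTF-8.
print_r(split_tags("科学,思考,心情"));
print_r(split_tags("Bääh Blöök Dübel"));
```

Whether an empty list is the right reaction to broken input is a design choice; you could just as well reject the input or log the value of preg_last_error().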
Another lesson learned.