#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode FB_QUIET );

# Read raw bytes, write UTF-8.
binmode STDIN, ':bytes';
binmode STDOUT, ':encoding(UTF-8)';

my $out;
while ( <> ) {
    $out = '';
    while ( length ) {
        # Decode as much valid UTF-8 as possible from the front of $_;
        # with FB_QUIET, decode removes the bytes it consumed from $_.
        $out .= decode( "utf-8", $_, FB_QUIET );
        # Anything left starts with a byte that is not valid UTF-8:
        # treat that single byte as ISO-8859-1 and carry on.
        $out .= decode( "iso-8859-1", substr( $_, 0, 1 ), FB_QUIET ) if length;
    }
    print $out;
}

The problem is that Perl internally represents strings as sequences of numbers. Not even sequences of bytes, but sequences of numbers that could either be codepoints or bytes resulting from the encoding of such a sequence of codepoints. ...as a developer you are perfectly free to make this assumption any way you please at any given point in your codebase. It's not even clear that either of the two is particularly "preferred" at large or a best practice or anything like that.
To make things worse, there is no way to know which is which, i.e. a string itself is happily ignorant about the assumptions that people will/should make about it. And Perl will happily concatenate strings making different kinds of assumptions, or double- or triple-encode them as you please, or decode something that hasn't been encoded in the first place.
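A minimal sketch of the double-encoding and silent mixing described above, using only the core Encode module. The sample word and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( encode );

# A character string: one element per codepoint ("h", U+00F4, "p", ...).
my $chars = "h\x{F4}pital";

# Encoding once gives the UTF-8 bytes: U+00F4 becomes 0xC3 0xB4.
my $bytes = encode( 'UTF-8', $chars );

# Nothing stops a second encode: each byte is taken for a codepoint
# and encoded again, so the two non-ASCII bytes become four.
my $twice = encode( 'UTF-8', $bytes );

# Concatenating a character string with a byte string is also accepted.
my $mixed = $chars . $bytes;

printf "chars: %d, bytes: %d, twice: %d, mixed: %d\n",
    length($chars), length($bytes), length($twice), length($mixed);
```

This prints "chars: 7, bytes: 8, twice: 10, mixed: 15" without a single warning under strict and warnings, which is the point: none of the four strings records which kind it is.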
This leads to jumbles of numbers that aren't anything in particular. They simply work well enough for sloppy programmers to not realize when they are making mistakes, but badly enough to almost guarantee that encoding errors will crop up on users' screens regularly.
Now, given that this is how the language works, be my guest jumping into a 100k LOC Perl codebase that dozens of programmers have touched over a decade, passing around and munging together strings not just within their own codebase, but also using strings stored to and retrieved from elsewhere, in some cases places where no one knows anymore where they initially came from or where they will ultimately go to.
Thank you for being so civil. IMO displaying a badly encoded string beats crashing on a runtime error most of the time. I'd rather see "hôpital" than "Error 500", if you will. Maybe don't think your personal assumptions carry any validity outside of your own choices, preferences, or uses.
I can imagine the difficulty of working with a huge codebase lacking refactoring and maybe even predating UTF-8, but where would you be if it had been written in Python 2.5 originally?
Any Python programmer would tell you: Starting a new project in 2022 in Python 2.5 is professional malpractice.
But that's what the original post seems to be saying: That Perl 5 has somehow managed to fix any of what was fundamentally wrong with it. ...and that couldn't be further from the truth. And people in this thread are saying that maybe they should have another look into Perl 5 as a serious option for starting out a new codebase in 2022. ...and that's a very bad idea.
Sure: If you started out a new codebase in Perl 5 in 2022, there are coding standards you could adopt to avoid getting yourself into a pickle where string encodings are concerned. But without the interpreter helping you out on that front, it'll produce ugly code, and take mental discipline and disciplined code reviewing practices on a team. It's solving a problem that Python solves for you so much more easily and effectively. You could go with Perl 6 / Raku, but why would you? What does it have to recommend it over Python or Ruby, other than a Perl programmer's nostalgia for being a little Perl-like?
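To make "coding standards" concrete, one common convention is to decode at the point where bytes enter the program and encode only where they leave, so that everything in between is characters. A rough sketch, assuming UTF-8 at both boundaries; `from_outside` and `to_outside` are hypothetical helper names, not part of any library.

```perl
use strict;
use warnings;
use Encode qw( encode decode FB_CROAK LEAVE_SRC );

# Hypothetical boundary helpers: the names are invented for this sketch.
# FB_CROAK makes invalid input die instead of being passed along silently;
# LEAVE_SRC keeps Encode from modifying the argument in place.
sub from_outside {
    my ($bytes) = @_;
    return decode( 'UTF-8', $bytes, FB_CROAK | LEAVE_SRC );
}

sub to_outside {
    my ($chars) = @_;
    return encode( 'UTF-8', $chars, FB_CROAK | LEAVE_SRC );
}

my $input  = "h\xC3\xB4pital";        # UTF-8 bytes, e.g. read from a file
my $text   = from_outside($input);    # characters from here on
my $output = to_outside($text);       # bytes again, only when leaving

printf "%d bytes in, %d characters inside, %d bytes out\n",
    length($input), length($text), length($output);

# Latin-1 bytes are not valid UTF-8, so the boundary check dies here.
my $ok = eval { from_outside("h\xF4pital"); 1 };
print "invalid input rejected\n" unless $ok;
```

The interpreter enforces none of this: the helpers only work if every code path goes through them, which is where the discipline and the code reviews come in.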
You could say the transition from Perl 5 to Perl 6 is just like the transition from Python 2 to Python 3. The difference is: Perl is simply late by at least a decade.
The point that the article is trying to refute, namely that Perl is for dinosaurs, in my mind just absolutely stands.
The debate between weak typing and strong typing is as old as the hills. But in much of the modern era, strong typing, of which Python is an example, seems to have decidedly prevailed.