10 Comments
Marc Rochkind

Yup. I first noticed this problem in 1973. C should always be the last choice, to be used only when it is the best choice.

Tim

This is an interesting post because it highlights a widely held misconception about C. That C has such primitive string functions is not a flaw of the language that will ever be fixed, because it is not a flaw at all! You are expecting C to be something it's not: a safe, functional, modern language. C is a systems programming language, and if you want to use it in a safe application space you must provide the 'safety' yourself. Adding more kludges to the language isn't helping. If you want all that safety built in, use a different language.

Jodie

C strings are, and always have been, hot garbage, and std::string does not really address the fundamental issue.

Not only are they a common source of bugs and attacks, they are also inefficient: it is VERY common in C programs to call strlen() over and over, resulting in needless O(N) work each time.
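A minimal sketch of the repeated-strlen() pattern, for illustration only. The function names `upcase_rescan` and `upcase_cached` are made up for this example. Because the loop body writes to the string, a compiler generally cannot assume the length is unchanged, so the first version may call strlen() on every iteration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Potentially quadratic: strlen() is evaluated in the loop condition,
   so the string may be rescanned on every iteration. */
void upcase_rescan(char *s) {
    for (size_t i = 0; i < strlen(s); i++) {
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}

/* Linear: measure the string once and reuse the cached length. */
void upcase_cached(char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}
```

Both functions produce the same result; only the number of scans over the string differs.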

Pascal got this right by using the first character in the array as the length.
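To make the length-prefix idea concrete, here is a rough sketch in C. The type `lstring` and the functions `lstring_set` and `lstring_length` are hypothetical names, not part of any standard library, and the sketch stores the length in a `size_t` field rather than in the first byte of the array, with a fixed capacity of 64 bytes picked only to keep it short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length travels with the bytes. */
typedef struct {
    size_t len;     /* number of bytes in data, not counting the terminator */
    char data[64];  /* fixed capacity, chosen only to keep the sketch short */
} lstring;

/* Store src in out. Returns 0 on success, -1 if src does not fit. */
int lstring_set(lstring *out, const char *src) {
    size_t n = strlen(src);            /* the only O(N) scan, done once */
    if (n >= sizeof out->data) {
        return -1;                     /* refuse instead of overflowing */
    }
    memcpy(out->data, src, n + 1);     /* copy the terminator too */
    out->len = n;
    return 0;
}

/* O(1): reads the stored length, no scan for a terminator. */
size_t lstring_length(const lstring *s) {
    return s->len;
}
```

Keeping the terminator in `data` means the bytes can still be handed to functions that expect an ordinary C string.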

Arneb

Idk, what does the PowerShell kanji stuff have to do with anything?

strlen(有り難う) returning 12 feels rather natural to me. I don't think I have ever cared about the number of Unicode code points in a string in my life. Most of the time, all I want to know is how big it is in memory. Sometimes I care about how big it is on a screen, but that depends on font choice and other things outside the scope of any strlen-type functionality. Besides, the existence of Unicode modifier symbols kinda raises the question of whether "the number of Unicode code points in a string" is a meaningful measure to begin with.
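The difference between bytes and code points can be sketched as follows, assuming the string is valid UTF-8. The helper `utf8_code_points` and the constant `ARIGATOU` are names invented for this example. The text is written as explicit byte escapes so the result does not depend on the source file's encoding.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a valid UTF-8 string by counting every byte
   that is not a continuation byte (bit pattern 10xxxxxx). */
size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* "有り難う" as UTF-8 bytes: four characters, three bytes each. */
const char ARIGATOU[] = "\xE6\x9C\x89\xE3\x82\x8A\xE9\x9B\xA3\xE3\x81\x86";
```

With these definitions, `strlen(ARIGATOU)` is 12 while `utf8_code_points(ARIGATOU)` is 4. A string such as `"e\xCC\x81"` (an "e" followed by a combining acute accent) counts as 2 code points even though it displays as one character, which is the ambiguity raised above.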

Ali Gray

The output of strcmp should be different as you’ve got two ‘%s’ there. 👍

DDR

Hm, I got emoji output to work on console by going like

SetConsoleCP(CP_UTF8); SetConsoleOutputCP(CP_UTF8);

like at https://github.com/DDR0/Wincrawl/blob/master/Wincrawl2/io.cpp#L24, maybe that'd help with Japanese too?

Garrett

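#include <stdio.h>
#include <string.h>

int main() {
    char source[] = "Hello, world!";
    char* destination = source; // destination points at source's own storage
    // Copies the string onto itself; strcpy with overlapping buffers is undefined behavior
    strcpy(destination, source);
    printf("Source: %s\n", source);
    printf("Destination: %s\n", destination);
    return 0;
}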

This copies the source into itself! The first example, with char destination[20];, actually creates a separate 20-byte buffer in memory, whereas this second version points destination back at source.
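For contrast, a sketch of copying into a genuinely separate buffer, with the destination size checked. The helper name `copy_into` is hypothetical, and using snprintf for the bounded copy is one option among several, not necessarily what the post intended.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst, a distinct buffer of dst_size bytes.
   Returns 1 if the whole string fit, 0 if it was truncated
   or if dst_size is 0. dst and src must not overlap. */
int copy_into(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;
    }
    /* snprintf always terminates dst and reports the length it needed. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Called as `copy_into(destination, sizeof destination, source)` with `char destination[20];`, the result lives in its own storage, so later changes to `source` do not affect it.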

cervedin

You need to change the Windows code page from 1252 to UTF-8, but this will break MSSQL during its upgrades. Windows, so fun

Lerk

I think "有り難う" means "thank you" instead of "hello" :)

Diego Crespo

Doh! I had workshopped a few examples and forgot to switch it out. Thanks for pointing it out!
