I’ve been on a C kick recently as I learn the intricacies of low-level programming. As a Data Scientist/Python Programmer I work with strings all the time. People say that handling strings in C ranges anywhere from tricky to downright awful. I was curious, so I decided to see how deep the rabbit hole went.
This is an interesting post because it highlights a widely held misconception about C. The fact that C has such primitive string functions is not a flaw of the language that will ever be fixed, because it is not a flaw at all! You are expecting C to be something it's not: a safe, functional, modern language. C is a systems programming language, and if you want to use it in a safe application space you must provide the 'safety' yourself. Adding more kludges to the language isn't helping. If you want all that safety built in, use a different language.
Idk, what does the Powershell kanji stuff have to do with anything?
strlen(有り難う) returning 12 feels rather natural to me. I don't think I've ever cared about the number of Unicode points in a string in my life. Most of the time, all I want to know is how big it is in memory. Sometimes I care about how big it is on a screen, but that's affected by font choice and other things outside the scope of any strlen-type functionality. Besides, the existence of Unicode modifier symbols kinda raises the question of whether "the number of Unicode points in a string" is a well-defined operation to begin with.
Yup. I first noticed this problem in 1973. C should always be the last choice, to be used only when it is the best choice.
The output of strcmp should be different as you’ve got two ‘%s’ there. 👍
Hm, I got emoji output to work on console by going like
SetConsoleCP(CP_UTF8); SetConsoleOutputCP(CP_UTF8)
like at https://github.com/DDR0/Wincrawl/blob/master/Wincrawl2/io.cpp#L24, maybe that'd help with Japanese too?
#include <stdio.h>
#include <string.h>

int main() {
    char source[] = "Hello, world!";
    char* destination = source;
    strcpy(destination, source); // Copy the source string to the destination string
    printf("Source: %s\n", source);
    printf("Destination: %s\n", destination);
    return 0;
}
This copies the source into itself! The first example, with char destination[20];, actually creates a new 20-byte buffer in memory; this second version just points destination back at source.
You need to change the Windows code page from 1252 to UTF-8, but this will break MSSQL during its upgrades. Windows, so fun.
I think "有り難う" means "thank you" instead of "hello" :)
Doh! I had workshopped a few examples and forgot to switch it out. Thanks for pointing it out!