I’ve been on a C kick recently as I learn the intricacies of low-level programming. As a Data Scientist/Python Programmer, I work with strings all the time. People say that handling strings in C ranges anywhere from tricky to downright awful. I was curious, so I decided to see how deep the rabbit hole went.
This is an interesting post because it highlights a widely held misconception about C. The fact that C has such primitive string functions is not a flaw of the language that will ever be fixed, because it is not a flaw at all! You are expecting C to be something it's not: a safe, functional, modern language. C is a systems programming language, and if you want to use it in a safe application space you must provide the 'safety' yourself. Adding more kludges to the language isn't helping. If you want all that safety built in, use a different language.
C strings are, and always have been, hot garbage, and std::string does not really address the fundamental issue.
Not only are they a common source of bugs and attacks, they are also inefficient: it is VERY common in C programs to call strlen() over and over, resulting in needless O(N) complexity.
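To make the repeated-strlen() cost concrete, here is a minimal sketch. The function names are made up for illustration. An optimizing compiler can sometimes hoist the call out of the loop on its own, but that is not something to rely on.

```c
#include <stddef.h>
#include <string.h>

/* Potentially quadratic: strlen() may rescan the whole string
   every time the loop condition is evaluated. */
size_t count_spaces_slow(const char *s) {
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        if (s[i] == ' ') {
            count++;
        }
    }
    return count;
}

/* Linear: measure the string once and reuse the cached length. */
size_t count_spaces_fast(const char *s) {
    size_t count = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ' ') {
            count++;
        }
    }
    return count;
}
```

Both functions return the same answer; only the amount of scanning differs.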
Pascal got this right by storing the length in the first byte of the array.
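For anyone who hasn't seen the Pascal approach, a rough C sketch of a length-prefixed string might look like this. The PString type and both helpers are hypothetical, and the 255-byte cap is an assumption that comes from using a single length byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical Pascal-style short string: one length byte, then the data. */
typedef struct {
    unsigned char len;   /* number of bytes in use, 0..255 */
    char data[255];      /* raw bytes, not NUL-terminated */
} PString;

/* Build a PString from a C string.
   Returns 0 on success, -1 if the source does not fit. */
int pstring_from_cstr(PString *out, const char *src) {
    size_t n = strlen(src);
    if (n > sizeof out->data) {
        return -1;
    }
    out->len = (unsigned char)n;
    memcpy(out->data, src, n);
    return 0;
}

/* O(1): the length is read from the prefix, no scan needed. */
size_t pstring_length(const PString *s) {
    return s->len;
}
```

The trade-off is the hard length limit: a one-byte prefix cannot describe anything longer than 255 bytes.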
Idk, what does the PowerShell kanji stuff have to do with anything?
strlen(有り難う) returning 12 (its size in bytes as UTF-8) feels rather natural to me. I don't think I ever cared about the number of Unicode code points in a string in my life. Most of the time, all I want to know is how big it is in memory. Sometimes I care about how big it is on a screen, but that's impacted by font choice and other things outside the scope of any strlen-type functionality. Besides, the existence of Unicode combining and modifier characters kinda raises the question of whether "the number of Unicode code points in a string" is a meaningful thing to ask for to begin with.
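The bytes-versus-code-points gap can be shown with a small sketch. The helper name utf8_code_points is made up, and it assumes the input is already valid UTF-8 (it does no validation).

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a valid UTF-8 string by counting every byte
   that is NOT a continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the UTF-8 bytes of 有り難う, strlen() gives 12 while this helper gives 4. For "e" followed by the combining acute accent U+0301, it gives 2 even though a reader sees one character, which is the ambiguity the comment above points at.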
Yup. I first noticed this problem in 1973. C should always be the last choice, to be used only when it is the best choice.
The output of strcmp should be different as you’ve got two ‘%s’ there. 👍
Hm, I got emoji output to work on console by going like
SetConsoleCP(CP_UTF8); SetConsoleOutputCP(CP_UTF8)
like at https://github.com/DDR0/Wincrawl/blob/master/Wincrawl2/io.cpp#L24, maybe that'd help with Japanese too?
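Wrapped up as a helper, that suggestion might look roughly like the sketch below. The name enable_utf8_console is made up; SetConsoleCP and SetConsoleOutputCP are the Win32 calls quoted above, and the non-Windows branch is a deliberate no-op so the same file builds everywhere. Whether Japanese text then renders also depends on the console font, which this does not touch.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: ask the Windows console to use UTF-8 for both
   input and output. Returns 1 on success, 0 on failure.
   On other platforms it does nothing and reports success. */
int enable_utf8_console(void) {
#ifdef _WIN32
    return (SetConsoleCP(CP_UTF8) && SetConsoleOutputCP(CP_UTF8)) ? 1 : 0;
#else
    return 1;
#endif
}
```

This changes the code page only for the console the program is attached to, so it would be called once near the start of main() before any output.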
#include <stdio.h>
#include <string.h>

int main() {
    char source[] = "Hello, world!";
    char* destination = source; // Bug: destination aliases source, no new buffer exists
    strcpy(destination, source); // Copies the string onto itself
    printf("Source: %s\n", source);
    printf("Destination: %s\n", destination);
    return 0;
}
This copies the source into itself! The first example, with char destination[20];, actually allocates a separate 20-byte buffer in memory, whereas this second version just points destination back at source.
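A sketch of what the fix could look like: give the destination its own storage and check that the string fits before copying. The helper name copy_string is made up for illustration. It is also worth knowing that the C standard leaves strcpy() undefined when source and destination overlap, so the aliased version is not merely pointless.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including
   the terminating NUL. Returns 0 on success, -1 if dst is too small.
   dst and src must not overlap. */
int copy_string(char *dst, size_t dst_size, const char *src) {
    size_t needed = strlen(src) + 1;  /* +1 for the terminating NUL */
    if (needed > dst_size) {
        return -1;
    }
    memcpy(dst, src, needed);
    return 0;
}
```

With char source[] = "Hello, world!"; and char destination[20];, calling copy_string(destination, sizeof destination, source) succeeds and leaves two independent strings, so changing one no longer changes the other.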
You need to change the Windows code page from 1252 to UTF-8, but this will break MSSQL during its upgrades. Windows, so fun.
I think "有り難う" means "thank you" instead of "hello" :)
Doh! I had workshopped a few examples and forgot to switch it out. Thanks for pointing it out!