C Strings and my slow descent to madness

Apr 6, 2023

I’ve been on a C kick recently as I learn the intricacies of low-level programming. As a Data Scientist/Python programmer, I work with strings all the time. People say that handling strings in C ranges anywhere from tricky to downright awful, so out of curiosity I decided to see how deep the rabbit hole went.

10 Comments

Marc Rochkind

Apr 6, 2023

Yup. I first noticed this problem in 1973. C should always be the last choice, to be used only when it is the best choice.

Tim

Apr 7, 2023 (edited)

This is an interesting post because it highlights a widely held misconception about C. C's primitive string functions are not a flaw that will someday be fixed, because they are not a flaw at all! You are expecting C to be something it's not: a safe, functional, modern language. C is a systems programming language, and if you want to use it where safety matters, you must provide that 'safety' yourself. Adding more kludges to the language doesn't help. If you want all that safety built in, use a different language.

Jodie

Mar 3

C strings are, and always have been, hot garbage, and std::string does not really address the fundamental issue.

Not only are they a common source of bugs and attacks, they are also inefficient: it is VERY common for C programs to call strlen() over and over, paying a needless O(N) scan on every call, which becomes O(N²) when the call sits inside a loop over the string.

Pascal got this right by using the first character in the array as the length.

Arneb

Apr 17, 2023 (edited)

Idk, what does the PowerShell kanji stuff have to do with anything?

strlen("有り難う") returning 12 feels rather natural to me. I don't think I've ever cared about the number of Unicode code points in a string. Most of the time, all I want to know is how big it is in memory. Sometimes I care about how big it is on screen, but that depends on font choice and other things outside the scope of any strlen-type function. Besides, the existence of Unicode modifier symbols kinda raises the question of whether "the number of Unicode code points in a string" is a well-defined operation to begin with.

Ali Gray

Apr 7, 2023 (edited)

The output of strcmp should be different, as you’ve got two ‘%s’ there. 👍

DDR

Apr 6, 2023

Hm, I got emoji output working in the console by calling

SetConsoleCP(CP_UTF8); SetConsoleOutputCP(CP_UTF8);

as in https://github.com/DDR0/Wincrawl/blob/master/Wincrawl2/io.cpp#L24. Maybe that would help with Japanese too?

Garrett

Apr 6, 2023

#include <stdio.h>
#include <string.h>

int main(void) {
    char source[] = "Hello, world!";
    char* destination = source; // destination now aliases source; no new memory is reserved
    strcpy(destination, source); // "copies" source onto itself (an overlapping copy)
    printf("Source: %s\n", source);
    printf("Destination: %s\n", destination);
    return 0;
}

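This copies the source into itself! The first example, with char destination[20];, actually creates a new 20-character buffer in memory, whereas this second version just points destination back at source. Strictly speaking, strcpy between overlapping objects is undefined behavior, so it isn't even guaranteed to be a harmless no-op.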
