Aravind
scriptonist's blog

Something about Go strings that you should know

Aravind

Published on Jun 10, 2019

What will be the output of the following code snippet?

package main

import (
    "fmt"
)

func main() {
    var s = "abcdè"
    fmt.Println(len(s))
}

The string s has 5 characters, so the output should be 5, right? Or is it? 🤔

If we run the code here: https://play.golang.com/p/4F4ZkyWJAiQ, we find that the output is 6. That's interesting. Why is that? Let's try to understand.

In Go, a string is in effect a read-only slice of bytes, and len(s) returns the number of bytes in it, not the number of characters. Let's examine our string and what it contains.

package main

import (
    "fmt"
)

func main() {
    var s = "abcdè"
    // "% x" prints every byte of s in hex, separated by spaces.
    fmt.Printf("% x", s)
}

https://play.golang.com/p/Q0m5l6fndhy

On executing the code, we are greeted with the following output.

61 62 63 64 c3 a8

Now things are starting to make sense: the string actually holds 6 bytes, which is why len reported 6.
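If what you actually want is the number of characters (Unicode code points, which Go calls runes) rather than the number of bytes, the standard library's unicode/utf8 package can count them. The following is a minimal sketch; the helper name byteAndRuneCount is made up for this example and is not from the original post.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper for this sketch. It returns
// the number of bytes in s and the number of runes (code points) in s.
func byteAndRuneCount(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "abcdè"

	nBytes, nRunes := byteAndRuneCount(s)
	fmt.Println(nBytes, nRunes) // 6 5

	// Converting to []rune also gives one element per code point,
	// at the cost of allocating a new slice.
	fmt.Println(len([]rune(s))) // 5
}
```

Note that a rune is a code point, which is not always the same as what a reader perceives as one character; for this particular string the two coincide.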

But what do the hex values 61 62 63 64 c3 a8 represent? They are not random. Let's find out.

package main

import (
    "fmt"
)

func main() {
    var s = "abcdè"
    // "%+q" prints s as a quoted string with non-ASCII characters escaped.
    fmt.Printf("%+q", s)
}

The %+q verb prints a double-quoted string in which every non-ASCII character is written as an escape sequence.

https://play.golang.com/p/IlQUDUX9DHB

"abcd\u00e8"

Since %+q escapes non-ASCII characters, we can see that the last character is the Unicode code point U+00E8. Go source code is UTF-8, so a string literal holds the UTF-8 encoding of its text, and the bytes c3 a8 that we saw earlier are exactly the UTF-8 encoding of U+00E8. In other words, the string's underlying bytes are the UTF-8 encoding of its characters: one byte for each of the ASCII letters a, b, c and d, and two bytes for è.
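To see how U+00E8 turns into c3 a8, here is a small sketch that builds the two-byte UTF-8 form by hand and compares it with what utf8.EncodeRune produces. The helper encodeTwoByte is hypothetical and only valid for code points in the range U+0080 to U+07FF; it is meant to illustrate the bit layout, not to replace the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeTwoByte is a hypothetical helper for illustration. It encodes a
// code point in the range U+0080..U+07FF as two UTF-8 bytes:
//
//	110xxxxx 10xxxxxx
//
// where the x bits are the 11 low bits of the code point.
func encodeTwoByte(r rune) []byte {
	return []byte{
		0xC0 | byte(r>>6),   // leading byte: marker 110 + top 5 bits
		0x80 | byte(r&0x3F), // continuation byte: marker 10 + low 6 bits
	}
}

func main() {
	// By hand: 0xE8 is 000 1110 1000 in binary (11 bits).
	fmt.Printf("% x\n", encodeTwoByte(0x00E8)) // c3 a8

	// The same result from the standard library.
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, 0x00E8)
	fmt.Printf("% x\n", buf[:n]) // c3 a8
}
```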

We can confirm the mapping from bytes to characters in a bash shell on a UTF-8 terminal:

$ printf '\x61\n'
a
$ printf '\x62\n'
b
$ printf '\x63\n'
c
$ printf '\x64\n'
d
$ printf '\xC3\xA8\n'
è
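The same byte-to-character mapping can be observed from inside Go. The sketch below relies on the fact that a for range loop over a string decodes UTF-8 and yields runes together with their starting byte offsets, while indexing with s[i] yields a single byte. The helper runeOffsets is a made-up name for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets is a hypothetical helper for this sketch. It returns the
// byte offset at which each rune of s begins.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "abcdè"

	// range decodes the string rune by rune; i is a byte offset.
	for i, r := range s {
		fmt.Printf("offset %d: %c %U, %d byte(s)\n", i, r, r, utf8.RuneLen(r))
	}

	// Indexing does not decode anything: s[4] is only the first byte of è.
	fmt.Printf("%x\n", s[4]) // c3

	fmt.Println(runeOffsets(s)) // [0 1 2 3 4]
}
```

The loop runs five times even though len(s) is 6, because the last iteration consumes both bytes of è.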

So that is the reason behind this behaviour: len counts bytes, and è takes two bytes in UTF-8. Feel free to post any questions and feedback below in the comments.
