Disqus Comments

randy cole • 4 years ago

Thanks for posting this Jimmy - I can see how this could really trip you up if not aware of it! I wasn't aware of the ICU standard myself so you have saved me a lot of time. You rock!

Dan Sutton • 4 years ago

Interesting. I saved this one -- I have a feeling that in the future, this will save me hours.

Mariusz • 4 years ago

"Underneath the covers, IndexOf and Contains do subtly different things, it turns out. While IndexOf is an ordinal search, Contains is a linguistic search"

Isn't it the opposite? IndexOf is "smarter" (linguistic) and Contains is "dumber" (ordinal)?

Tom Winter • 4 years ago

It appears you have these backwards: "While IndexOf is an ordinal search, Contains is a linguistic search"

https://github.com/dotnet/r...
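
For reference, a minimal sketch of the corrected statement, assuming .NET 5 or later with C# 9 top-level statements. The sample string and variable names are illustrative, not taken from the post. Contains(string) is documented as an ordinal search, while IndexOf(string) without a StringComparison argument is culture-sensitive, so its result can differ between NLS and ICU:

```csharp
using System;

string sampleText = "Hello\r\nWorld";   // illustrative sample, not from the post

// Contains(string) always performs an ordinal search.
bool containsNewline = sampleText.Contains("\n");

// IndexOf(string) with no StringComparison argument is culture-sensitive.
// With ICU it has been reported to return -1 here, because "\r\n" is
// treated as a single unit; with NLS it finds the "\n".
int linguisticIndex = sampleText.IndexOf("\n");

// Passing StringComparison.Ordinal makes the search platform-independent.
int ordinalIndex = sampleText.IndexOf("\n", StringComparison.Ordinal);

Console.WriteLine($"Contains: {containsNewline}");
Console.WriteLine($"IndexOf (culture-sensitive): {linguisticIndex}");
Console.WriteLine($"IndexOf (ordinal): {ordinalIndex}");
```

The culture-sensitive result is deliberately left without an expected value, since it depends on which globalization library the runtime uses.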

jbogard • 4 years ago

Sigh, of course. Thanks!

bluechrism • 4 years ago

I saw this recently with a test going between .NET Framework 4.7.2 and .NET Core 3.1. In 3.1, string.GetHashCode() now has a few overloads, and our code analysis tool flagged a warning that I wasn't using a StringComparison version (CA1307). My normal usage would be InvariantCulture, which led to a test failure. Ordinal passed. I haven't had time yet to look into why, or whether Ordinal is the correct choice here.

Mark Smeltzer • 4 years ago

Ordinal means it just compares the raw binary values of the strings. Invariant still uses Unicode equivalence rules. For example, with Ordinal, a string containing an extra Unicode zero-width space at the end would not match the base string. With Invariant, the string with the extra zero-width space would match the base string.

Unless there's a great need for the user experience to match on Unicode equivalence rules, Ordinal is usually the way to go. In some languages, there are multiple ways of arriving at the same effective character -- and in that case, the equivalence rules (and the correct culture) do make a difference.
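
A small sketch of the "multiple ways of arriving at the same character" case, assuming a runtime with full globalization support (not invariant-globalization mode). The composed and decomposed spellings of "é" are my own illustration, not from the post:

```csharp
using System;

string composed = "\u00E9";      // "é" as a single code point
string decomposed = "e\u0301";   // "e" followed by a combining acute accent

// Ordinal compares UTF-16 code units, so the two spellings differ.
bool ordinalEqual = string.Equals(composed, decomposed, StringComparison.Ordinal);

// InvariantCulture applies linguistic rules and is expected to treat
// canonically equivalent sequences as equal.
bool invariantEqual = string.Equals(composed, decomposed, StringComparison.InvariantCulture);

// Normalizing both sides (Form C by default) lets an ordinal comparison match.
bool normalizedEqual = string.Equals(
    composed.Normalize(), decomposed.Normalize(), StringComparison.Ordinal);

Console.WriteLine($"Ordinal: {ordinalEqual}");
Console.WriteLine($"InvariantCulture: {invariantEqual}");
Console.WriteLine($"Ordinal after Normalize(): {normalizedEqual}");
```

If equivalent spellings need to match but culture rules are not wanted, normalizing on input and comparing ordinally is one possible middle ground.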

For determining things like hash codes, object equivalence, etc., I almost always use Ordinal. If the text is known to be ASCII or US-culture, I also use OrdinalIgnoreCase. However, if the text is unknown, then it is important to use the correct culture.

In many cases, that concern applies only to the presentation layer (e.g. grouping, sort order). But it can also affect some reporting scenarios, where grouping and sort order can change the report results.
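
To make the hash code point concrete, here is a minimal sketch, assuming .NET Core 2.0 or later. The keys and the stock dictionary are hypothetical and only there for illustration:

```csharp
using System;
using System.Collections.Generic;

// OrdinalIgnoreCase hashing ignores case but uses no culture data,
// so keys that differ only in case hash the same way.
int upperHash = StringComparer.OrdinalIgnoreCase.GetHashCode("SKU-42");
int lowerHash = StringComparer.OrdinalIgnoreCase.GetHashCode("sku-42");
bool sameHash = upperHash == lowerHash;

// The explicit StringComparison overload of GetHashCode. The value is
// only stable within a single process run, so it is not printed here.
int explicitHash = "SKU-42".GetHashCode(StringComparison.Ordinal);

// Passing the comparer to a collection keeps hashing and equality consistent.
var stock = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    ["SKU-42"] = 7
};
bool found = stock.TryGetValue("sku-42", out int quantity);

Console.WriteLine($"Same hash: {sameHash}");
Console.WriteLine($"Found: {found}, quantity: {quantity}");
```

Supplying the comparer once at construction avoids having to remember a StringComparison argument at every lookup.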

Michel Renaud • 4 years ago

Very interesting and definitely good to know as I'm about to upgrade a "personal project that's moving too slowly" from 3.1 to 5.0.