Stuff Michael Meeks is doing
all_whitespace) so:
    while (p != end) {
        if (!g_ascii_isspace (*p))
            return FALSE;
        p = g_utf8_next_char (p);
    }

3x faster:

    while (p != end) {
        if (G_UNLIKELY (*p != ' ' && *p != '\t' && *p != '\n' && *p != '\r'))
            return FALSE;
        p++;
    }

When parsing utf-8 text and breaking on ASCII tokens, there is no need to care at all about the wunder-non-ascii multi-byte sequences; so don't, it really slows things down. Everyone trying to write utf-8 parsing code needs to type man utf-8 and read for a while before typing.
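Why stepping one byte at a time is safe: in utf-8 every byte of a multi-byte sequence has its top bit set (0x80 or above), so it can never be mistaken for an ASCII character such as a space or a tab. Here is a minimal stand-alone sketch of the faster loop in plain C, without GLib. The function names all_whitespace_bytes and all_whitespace_str are made up for illustration and are not taken from any real code base.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-alone version of the byte-wise loop; the name and
 * signature are illustrative assumptions, not an existing API. */
static bool
all_whitespace_bytes (const char *p, const char *end)
{
    assert (p <= end);

    while (p != end) {
        /* Lead and continuation bytes of multi-byte utf-8 sequences are
         * all >= 0x80, so they never compare equal to these ASCII
         * characters and simply fall through to "not whitespace". */
        if (*p != ' ' && *p != '\t' && *p != '\n' && *p != '\r')
            return false;
        p++;
    }
    return true;
}

/* Convenience wrapper for NUL-terminated strings (also hypothetical). */
static bool
all_whitespace_str (const char *s)
{
    return all_whitespace_bytes (s, s + strlen (s));
}
```

Note that this only recognises the four ASCII characters tested in the loop; non-ASCII spaces such as U+00A0 or U+2003 count as non-whitespace here, which matches an ASCII-only check but may not be what every caller wants.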
My content in this blog and associated images / data under
images/
and data/
directories are (usually)
created by me and (unless obviously labelled otherwise) are licensed under
the public domain, and/or if that doesn't float your boat a CC0
license. I encourage linking back (of course) to help people decide for
themselves, in context, in the battle for ideas, and I love fixes /
improvements / corrections by private mail.
In case it's not painfully obvious: the reflections reflected here are my own; mine, all mine ! and don't reflect the views of Collabora, SUSE, Novell, The Document Foundation, Spaghetti Hurlers (International), or anyone else. It's also important to realise that I'm not in on the Swedish Conspiracy. Occasionally people ask for formal photos for conferences or fun.
Michael Meeks (michael.meeks@collabora.com)