January 2010 Archives

Illegal character 0x1FFFF

$ perl -le 'use warnings; my $x=chr(0x1FFFF)' 
Unicode character 0x1ffff is illegal at -e line 1.

XML supports UTF-8, so I check that a string is valid UTF-8 and use it in XML if it is. Right? No!!!

There are some "non-illegal" characters that are perfectly valid in UTF-8 (or even in plain old ASCII), but are invalid in XML. The most obvious one is 0x00. Here is what the W3C XML 1.0 specification says:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

I spent some time playing with it and the result is XML::Char->valid(). The development version of Data::asXML is using it now. If you want, have a look at the test suite and try to break it. :-)
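To make the Char production above concrete, here is a minimal sketch that transcribes it into a Perl character class. The helper name is_xml_string is made up for this illustration. It is not the XML::Char API and not necessarily how that module is implemented. It assumes the string is already decoded into Perl characters.

```perl
use strict;
use warnings;

# Character class transcribed from production [2] Char of XML 1.0.
my $XML_CHAR = qr/[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper (NOT the XML::Char API): returns 1 if every
# character of the decoded string is allowed in XML 1.0, else 0.
sub is_xml_string {
    my ($string) = @_;
    return $string =~ /\A$XML_CHAR*\z/ ? 1 : 0;
}

print is_xml_string("plain text\n") ? "ok\n" : "not ok\n";   # prints "ok"
print is_xml_string("nul\x{0}byte") ? "ok\n" : "not ok\n";   # prints "not ok"
```

Note that a string can pass a UTF-8 validity check and still fail this one, which is the whole point of the post.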

Once I told potyl: "Hey, let's have a wiki, it will be easy for everyone to use". He wasn't excited, not at all. Why? What is so bad about a wiki? Look at this table of 130+ wiki syntaxes. Anyone still complaining that there are too many similar choices on CPAN? The wiki community decided to solve the problem by creating yet another wiki syntax...

What does this have to do with Perl 6 regexp grammars? After Damian's talk at YAPC::EU::2009 I really wanted to try out Regexp::Grammars. I finally found some time during Christmas and here is the result:

use Regexp::Grammars;
use re 'eval';
my $parser = qr@
<Wiki>
 
<rule: Wiki> <[Block]>*
<rule: Block> <.EmptyLine> | <Code> | <Para>
<token: Para> <Heading> | <List> | <TextLines>
<token: EmptyLine> ^ \h* \R
<token: TextLines> (?:^ (?! <Code> | <Heading> | <List> | <EmptyLine> ) [^\h] .+? \v)+
<token: CodeStart> ^ \{\{\{ \h* \v
<token: CodeEnd> ^ \}\}\} \h* \v
<token: Code> <.CodeStart> <CodeLines> <.CodeEnd>
<token: CodeLines> .+?
<token: Heading> <HeadingStart> \s <HeadingText> \s =+ \h* \v
<token: HeadingStart> ^=+
<token: HeadingText> [^=\v]+
<token: List> <[ListItem]>+
<token: ListItem> ^ <ListItemSpaces> <ListItemType> \h+ <ListItemText> \v
<token: ListItemSpaces> \h+
<token: ListItemType> (\*|\d+\.|a\.|i\.)
<token: ListItemText> .+?
@xms;

It is probably not the best piece of regexp grammar, and Perl 6 experts will surely spot some errors, but hey, it works! "Works on my computer™". I've used it to transform TracWiki syntax to XHTML divs and then, using XSL, to DokuWiki syntax. Here are the scripts that do it completely.

I found it!


Of course someone else already did it. What were you expecting, Jozef? Well, exactly that, and I knew it (just read on), but it took some time to find it.

What am I talking about? Quintura, a search engine based on a tag cloud. I've seen this kind of searching before in Open Clip Art, so I was really not expecting to come up with something new at all. Sorry, next time I'll fail again, don't worry. ;-) The idea is really nice: using a tag cloud to hint at search directions, with both positive and negative keywords. Cool!

Anything else? Yes! Besides some more exercise with jQuery, my little project opened my eyes. I've started to look for more search engines. Off the top of my head I knew just a few of them, about 10. So I went to the Wikipedia page on search engines, which lists a couple more, and finally to the Wikipedia list of search engines.

And? Google, Yahoo, Bing, Wikipedia. This guy has a really limited view of the topic! Exactly... So I looked a little deeper and found the top 100 list. 100? Sounds awful. Well, according to GoshMe, there are more than half a million search engines out there on the Internet. Believe it or not, let's not count them by hand.

And finally I found this - Segev, Elad (2008). "Search engines and power: A politics of online (mis-) information." Webology, 5(2), Article 54. Available at: http://www.webology.ir/2008/v5n2/a54.html. If you are not a fan of reading long articles, here are some quotes from it to support the motivation:

It is argued and indicated that the dominant American search engines tend to commodify online information and intensify the asymmetry of information flow worldwide, supporting the growth of mainstream, commercial and very often US-centric information. It is therefore suggested that together with their important role in organizing the Web, search engines reinforce certain inequalities and understandings of the world.

[I]t has been suggested that any new type of search engine will either be acquired by one of the bigger American hubs or become commercialized in order to support its competitive position. Consequently, it will not be able to sustain information equality and diversity of alternative views, and will mainly represent the views of the richer and more popular nodes.

Hence, it is suggested that behind the so-called "transparent" services there are various political, economic and increasingly informational forces that continuously shape and reshape the representation of the World.

One other piece of information caught my attention - Quaero, a work in progress that got a couple of M€: 298 M€ according to its own page, and 99 M€ from the French government according to Wikipedia (which is not listed on the official page at all). One million more or less ;-) it is definitely an interesting budget.

And the takeaways? There Is More Than One Way To Do It. There Is More Than One View Of The World. And the pages one visits are shaping and reshaping one's personal world and opinions. Obvious? Yes, now yes for me. :-)

The world is not black and white,


and there is no ultimate truth, and... From the computer's point of view it is chaos. Fortunately, from the human point of view it is beautiful. Some will replace the word beautiful with colourful, some with diverse, some with always different, some with pain and some with hell...

We are now right there in this digital age. There are 10 types of people: those that understand binary and those that don't. You like something or not, you use something or you don't, you are right or wrong, you do right or you do wrong, you follow someone or you don't care, you love someone or you hate. Or? No? Or mostly yes? Mostly no? More yes than no?

Now look at it a different way - the world is base 4. Yes? No? Probably yes? More likely no? :-) Who can say for sure? But you are free to choose any of the four answers. Where did I go for the inspiration? To DNA - ACGT. It is there inside us!

I don't mind if you tell me it is bullshit. For some it is, for some it is not, some will say it most likely is, and some that there is hope for the idea. :-)

(actually there is also another state - no opinion, but shhhhh don't tell anyone that it's possible to ignore the world)

  1. welcome-to-wikipedia.png
  2. criteria-for-speedy-deletion.png
  3. Talk:Meonl.com

    I was inspired by other similar pages that are already in Wikipedia and linked from Search_engines - Blackle.com, Sperse!_Search, Info.com, etc. If you remove Meonl.com, then please give some reasons why Wikipedia keeps the other ones.

    Jozef.kutej (talk) 11:43, 9 January 2010 (UTC)

  4. wikipedia-this-page-has-been-deleted.png

The article:

URL: http://meonl.com/
Commercial?: No
Type of site: Search engine
Registration: No
Available language(s): English
Created by: Jozef Kutej
Launched: January 2010
Current status: Active

Meonl is a website powered by any search engine that allows itself to be run in frames (HTML_element#Frames). Having one input box to search a couple of search engines at once makes it very easy to compare the results.

History

Created as a New Year 2010 free-time jQuery exercise.

Concept

The idea is to make use of wide screens to display more than one search engine at a time, side by side. While there are just a couple of major search engines, there are a lot of local ones that users will simply never try. A split screen keeps the favourite search engine and also shows the results from an alternative one.

The concept of a split screen is also a test of how internet users would like the idea of having a lot of information on one screen. If it proves to be useful, then webmasters should start to think of optimizing not only for low screen resolutions but also for high ones.

Examination

As there is rarely one true answer, there is never one true set of search results. Having results from two different companies (that have different people, different cultures and different indexes) gives interesting results.

Usage

The http://search.meonl.com/en/ URL accepts two parameters, "s" and "q". Using "s", a list of desired search engine columns can be set. Using "q", the search query is set.
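A rough sketch of how such a URL could be put together in Perl. Only the "s" and "q" parameter names come from the description above. The comma-separated format of "s", the engine names and both helper function names are assumptions made for illustration, not documented behaviour of the site.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
sub percent_encode {
    my ($text) = @_;
    utf8::encode($text);    # operate on the UTF-8 bytes of the string
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: "s" is ASSUMED to be a comma-separated list of
# engine names; the real format may differ.
sub meonl_search_url {
    my ($query, @engines) = @_;
    return 'http://search.meonl.com/en/?s='
        . join(',', map { percent_encode($_) } @engines)
        . '&q=' . percent_encode($query);
}

# Engine names "google" and "yahoo" are placeholders.
print meonl_search_url('perl regexp grammars', 'google', 'yahoo'), "\n";
# prints http://search.meonl.com/en/?s=google,yahoo&q=perl%20regexp%20grammars
```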

Examples

External links

search-meonl-com.png

Google vs Cuil vs Yahoo vs ...

A wide screen offers a lot of horizontal space, so why not use it and search with two (or more) search engines at the same time? Try search.meonl.com (about).

Where is the Perl in there? In the back. In the name of "static can be more", the meonl.com files are pure HTML+CSS+JS generated using Template Toolkit's ttree and some Makefile rules.

The search.meonl.com page is basically an input box and one or more iframes with search engines. Submitting text in the input box reloads the search engine iframes with a search URL.

Why was search.meonl.com created? I got an idea for how to do web searches in a slightly different way. But then I realized: who the hell will try just another search engine??? Well, no one of course :-), but if there is a way to keep Google (or any other major search engine) next to the new one, maybe there will be a little chance... I will probably never have enough time, money, energy and whatever else one needs to start and finish such a project, but at least now there is a way to compare how good the current search engines are, side by side.

The few days I have spent playing and searching with multiple search engines at once made me realize that, yes - the search engines are different. And yes - the other search engines (other than Google, in my case) can sometimes give better results. :-)

I feel like no one ever told the truth to me
About growing up and what a struggle it would be
In my tangled state of mind,
I've been looking back to find where I went wrong.
-- Queen - Too Much Love Will Kill You

There are people who say reinventing the wheel is bad, that it shouldn't be done. That we should find an existing project and contribute there. Some even say that there are too many variants and choices of CPAN modules doing the same thing, and that this is wrong. That it is counterproductive and scary for newbies. There are people who call themselves the CPAN police and hunt down new uploaders, trying to show them how many mistakes they made...

Now look at the kids. They experience deferred success 1000 times a day. Even if they don't fail, they play most of the day. They play by repeating, copying from adults or other kids. They say the same sentences wrongly over and over. They do the same things over and over. They fail over and over. They are just kids. Everyone knows that this is how they learn. Next to the kids there is always some adult. While the kids are really small, the adults seem perfect to them, because they just do everything perfectly.

After growing up, one day, kids find out that the adults are not perfect. They don't always do the right things. And they don't know everything. The trouble is that there are many adults who think that they are perfect. But that is a different topic. Let's go back to childhood.

To be precise, the Perl programmer's childhood. The Perl programmer's life. The difference is that everyone is free to be born into the Perl world and grow up here. The other nice thing is that everyone is free to leave it and go live a different life.

/me a Perl kid. I like to play, I like to try out things. I like to reinvent the wheel over and over. I don't mind that there are Perl grown-ups that do the same thing much better than I do. I don't mind that I will hit the ground while doing weird experiments. And? It's fun and everybody has to fall the first time. (and second and third and ...) And I'm just a kid!

The Perl world is different to the "reality". The biggest difference is that it's hard to see the age. Everyone is growing at a different rate and some will never grow up - like me :-P

So I would like to encourage all kids to come play with toys, throw them away if they don't like them any more, and not be ashamed that they "just" built "another" sandcastle. You can always destroy it and build another one, can't you?

Now back to the desert of the real. There is plenty of legacy Perl code everywhere around us. Legacy sometimes means undocumented, unmaintained, badly written or just not understood. Swearing at and blaming the people who wrote it will not fix the situation. Everyone is doing the best he can, considering his experience, mood, moon phase, weather conditions, ... at the time of writing the code. If the code gets the job done, it is good. If writing it makes someone happy, it is even better. And if there is no replacement, it is the best code ever!

About this Archive

This page is an archive of entries from January 2010 listed from newest to oldest.