Friday, November 14, 2008

Reading Notes Eleven

This week's first article, on web crawling, was a little interesting. It helped simplify, somewhat, the complicated process of crawling. Without crawling we wouldn't have giants like Google or Yahoo. It also explained spamming a little, and I found it interesting that spammers send one version of their content to search sites and entirely different content to the user. I'm very much against Internet regulation, but that is just plain wrong. This in turn made me realize just how valuable my anti-spam software is.
The second article was about the Open Archives Initiative and its attempt to improve metadata harvesting. This one was really hard to keep reading... it had good information in it, but it was covered up by way too much technical language. It also discussed the Digital Library Federation, and if you can overcome the boredom, this article offers some good insight and potential help for conducting independent research.
The last article focused on the Deep Web. OK, first off, I was blown away by the key findings expressed in the article. If those things are accurate, we have no real concept of how much information is available to us on the Internet. And fully 95% of that info is public and free! I was astounded! The section about how search engines work made it a little easier to grasp why Google misses so much information. Imagine the monster Google could truly become if it began to penetrate the deep web and not just continue scraping through the shallows of the surface web! This article was one of the best ones I've read this semester for this class. I enjoyed all the statistics and tables and was continually shocked as I realized how much material is out there that most people have no clue exists. This article, like the last one, offers some great direction for accomplishing research.

3 comments:

Theresa said...

Nate,

I would have to agree with you on the Deep Web. Wouldn't it make everyone's life a bit easier if Google and other large search engine websites did some deep penetration (sorry, needed to get my Madden in) and made more relevant information available for research?

Denise said...

I was also pretty floored when I read about the Deep Web. I had no idea, and apparently, I'm not alone in that.

Domenic Sorace said...

I agree with you on the OAI article... there was a lot of useful information that was just cluttered up and surrounded by technical jargon that made it really hard to read and understand. It was nice that they reintroduced XML, HTTP, and Dublin Core... At least now we know that everything we've talked about ties together in some way...