<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Google: How do you do it?</title>
	<atom:link href="http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/feed/" rel="self" type="application/rss+xml" />
	<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=google-how-do-you-do-it</link>
	<description>Kevin Kubasik's Personal Blog</description>
	<lastBuildDate>Sat, 17 Jul 2010 15:28:22 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
	<item>
		<title>By: RyanTheRobot</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-285</link>
		<dc:creator>RyanTheRobot</dc:creator>
		<pubDate>Sun, 11 Nov 2007 08:21:47 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-285</guid>
		<description>I seriously wonder how Google does half the things that it does... and how their product quality is consistently one of the best in the industry. They are such an awesome company...</description>
		<content:encoded><![CDATA[<p>I seriously wonder how Google does half the things that it does&#8230; and how their product quality is consistently one of the best in the industry. They are such an awesome company&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tobu</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-284</link>
		<dc:creator>Tobu</dc:creator>
		<pubDate>Fri, 09 Nov 2007 14:55:25 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-284</guid>
		<description>I&#039;ve been reading these lectures:
http://www.ee.technion.ac.il/courses/049011/spring05/index_files/Page337.html

The second one touches on indexes. No huge insights there (delta compression maybe?), but it did clarify some concepts for me.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been reading these lectures:<br />
<a href="http://www.ee.technion.ac.il/courses/049011/spring05/index_files/Page337.html" rel="nofollow">http://www.ee.technion.ac.il/courses/049011/spring05/index_files/Page337.html</a></p>
<p>The second one touches on indexes. No huge insights there (delta compression maybe?), but it did clarify some concepts for me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Kubasik</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-283</link>
		<dc:creator>Kevin Kubasik</dc:creator>
		<pubDate>Mon, 05 Nov 2007 06:07:31 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-283</guid>
		<description>@Tobu: We don&#039;t do this at the moment. I&#039;ve thought about doing it, but it seems like it would cost far too much at retrieval time. I might look into a proof-of-concept implementation to test this, but it&#039;s a lot of work if it ends up being too expensive =/

@noname: yeah, I checked out some dbm implementations, but I couldn&#039;t figure out whether we would get compressed text content. It seems like most of them just dump the content on disk, much like our old system, which took up far too much space... Does anyone know anything specific wrt this?</description>
		<content:encoded><![CDATA[<p>@Tobu: We don&#8217;t do this at the moment. I&#8217;ve thought about doing it, but it seems like it would cost far too much at retrieval time. I might look into a proof-of-concept implementation to test this, but it&#8217;s a lot of work if it ends up being too expensive =/</p>
<p>@noname: yeah, I checked out some dbm implementations, but I couldn&#8217;t figure out whether we would get compressed text content. It seems like most of them just dump the content on disk, much like our old system, which took up far too much space&#8230; Does anyone know anything specific wrt this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: noname</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-282</link>
		<dc:creator>noname</dc:creator>
		<pubDate>Sat, 03 Nov 2007 02:55:32 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-282</guid>
		<description>I&#039;m not sure, but maybe you&#039;ll find this useful: http://tokyocabinet.sourceforge.net/</description>
		<content:encoded><![CDATA[<p>I&#8217;m not sure, but maybe you&#8217;ll find this useful: <a href="http://tokyocabinet.sourceforge.net/" rel="nofollow">http://tokyocabinet.sourceforge.net/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tobu</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-281</link>
		<dc:creator>Tobu</dc:creator>
		<pubDate>Fri, 02 Nov 2007 21:01:08 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-281</guid>
		<description>Re index size. In Beagle, do you compress words by replacing them with a reference to an index entry (assuming a minimum length and a minimum number of occurrences)? I suspect it would cut down size. You would need index entries to be permanent then (or possibly refcounted).</description>
		<content:encoded><![CDATA[<p>Re index size. In Beagle, do you compress words by replacing them with a reference to an index entry (assuming a minimum length and a minimum number of occurrences)? I suspect it would cut down size. You would need index entries to be permanent then (or possibly refcounted).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Kubasik</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-280</link>
		<dc:creator>Kevin Kubasik</dc:creator>
		<pubDate>Fri, 02 Nov 2007 20:37:12 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-280</guid>
		<description>If we were just querying Google, then that would be an easy solution. However, many people like having search results locally, and it&#039;s a marketed feature of GDS, unless I&#039;m misunderstanding what people want/are looking for.</description>
		<content:encoded><![CDATA[<p>If we were just querying Google, then that would be an easy solution. However, many people like having search results locally, and it&#8217;s a marketed feature of GDS, unless I&#8217;m misunderstanding what people want/are looking for.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: grakic</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-279</link>
		<dc:creator>grakic</dc:creator>
		<pubDate>Fri, 02 Nov 2007 20:33:27 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-279</guid>
		<description>Can&#039;t you just use HTTP to get search results from Gmail? IMHO, search results are useless if the user is offline, so I don&#039;t see a major drawback here.</description>
		<content:encoded><![CDATA[<p>Can&#8217;t you just use HTTP to get search results from Gmail? IMHO, search results are useless if the user is offline, so I don&#8217;t see a major drawback here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Walther</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-278</link>
		<dc:creator>Walther</dc:creator>
		<pubDate>Fri, 02 Nov 2007 19:32:26 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-278</guid>
		<description>Take a look at http://libgmail.sourceforge.net
It is a Python library for Gmail. I think it is HTTP-based, but I didn&#039;t look at how it works exactly. It seems to be pretty efficient for searches.</description>
		<content:encoded><![CDATA[<p>Take a look at <a href="http://libgmail.sourceforge.net" rel="nofollow">http://libgmail.sourceforge.net</a><br />
It is a Python library for Gmail. I think it is HTTP-based, but I didn&#8217;t look at how it works exactly. It seems to be pretty efficient for searches.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: glandium</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-273</link>
		<dc:creator>glandium</dc:creator>
		<pubDate>Fri, 02 Nov 2007 15:50:51 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-273</guid>
		<description>Surely, GDS uses some open source library for SSL encryption, so you could replace it with your own that would log the data sent to be encrypted.

Anyways, my guess is that GDS directly gets indexes from Gmail, via a proprietary Google POP extension. Why download a lot of data and index it yourself when powerful servers already did it for you? All GDS has to do is use the same indexing algorithms as Google, which is not impossible. At least, that&#039;s how I&#039;d do it.</description>
		<content:encoded><![CDATA[<p>Surely, GDS uses some open source library for SSL encryption, so you could replace it with your own that would log the data sent to be encrypted.</p>
<p>Anyways, my guess is that GDS directly gets indexes from Gmail, via a proprietary Google POP extension. Why download a lot of data and index it yourself when powerful servers already did it for you? All GDS has to do is use the same indexing algorithms as Google, which is not impossible. At least, that&#8217;s how I&#8217;d do it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Kubasik</title>
		<link>http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/comment-page-1/#comment-275</link>
		<dc:creator>Kevin Kubasik</dc:creator>
		<pubDate>Fri, 02 Nov 2007 12:47:55 +0000</pubDate>
		<guid isPermaLink="false">http://kubasik.net/blog/2007/11/02/google-how-do-you-do-it/#comment-275</guid>
		<description>@nick: There is some support. The problem is that IMAP tends to be a much more complex protocol, the Gmail implementation isn&#039;t 100%, and right now there are some performance problems on that front.

Also, since the indexing is really a one-time sequential skim, POP really meets most of our needs. Really, I would think that most users (especially now with IMAP) could just register their Gmail accounts in Thunderbird, Evolution or KMail to get them indexed.</description>
		<content:encoded><![CDATA[<p>@nick: There is some support. The problem is that IMAP tends to be a much more complex protocol, the Gmail implementation isn&#8217;t 100%, and right now there are some performance problems on that front.</p>
<p>Also, since the indexing is really a one-time sequential skim, POP really meets most of our needs. Really, I would think that most users (especially now with IMAP) could just register their Gmail accounts in Thunderbird, Evolution or KMail to get them indexed.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
