
Google Docs Presentations: A Major Disappointment

Google Docs has revolutionized the office suite, namely the word processor. Collaboration is easy, smooth, integrated, and automatic; what’s more, all your documents are accessible from anywhere, and all the common features I need are present. While I don’t really use spreadsheets very often, my few simple attempts at using Google Docs for spreadsheets were easy enough. Needless to say, when I heard that a Presentation component was to be added, I was excited.

Now, I’m a far cry from a PowerPoint guru (I’ve used it maybe twice), but with an upcoming presentation at the Ubuntu Utah user group, I figured I should probably slap a few slides together. Since I want a sane contingency for the exploding laptop, forgotten laptop, or limited presentation machine, I figured this would be a great chance to stretch my presenting legs and give it a try. Too bad I’m not that lucky: I was unable to save _any_ changes. I could create a new presentation and modify it as much as I wanted, but closing it would lose all my work… lovely (this is in Firefox 2.0, 3.0 trunk, IE7, and Opera 8.24). I’m willing to give Google a few hours and then try again (it’s possible it’s some small downtime; it’s still beta after all ;) ).

Another gripe (which I noticed while reading the Help to find similar reports of such bugs) is that there is a very limited selection of templates. While I might be able to upload new templates authored in PowerPoint or OpenOffice, couldn’t they at least let me change the color schemes? (If I can and I’m just missing how to do it, please share!)

I’ll post an update soon, and let you all know if I had any luck saving anything….

Update: This is fixed, and now it’s kinda cool! I’m going to try it this Saturday at the Ubuntu Utah user group and see how it goes.

Google: How do you do it?

So it’s not a big surprise that an oft-requested feature for Beagle is the ability to index a user’s Gmail messages (like Google Desktop Search does). Today we (the Beagle developers) started to investigate just how this is done. While POP3 (and now IMAP) access is available, using it means downloading all of a user’s mail, indexing it, and then caching the text so we can display it. My initial investigation into GDS for Linux revealed that it was calling home via POP3S (POP3 over SSL) and downloading lots of data. I have assumed that it simply iterates over all messages (via POP3), downloading them, indexing them, and caching the compressed content somewhere in Google’s custom indexes.
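If GDS really does just crawl POP3, the core loop is simple. Below is a minimal sketch using the standard RFC 1939 commands (STAT for the message count, RETR n to fetch message n, with dot-stuffed multi-line replies). Pop3Crawler is a hypothetical name, not Beagle or Google code, and it assumes a text channel that is already connected and logged in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch: a naive POP3 "crawler" over an already-connected,
// already-authenticated text channel. Not Beagle or Google code.
public static class Pop3Crawler
{
    // A STAT reply looks like "+OK 2 320": 2 messages totalling 320 octets.
    public static int ParseStatCount(string reply)
    {
        if (reply == null || !reply.StartsWith("+OK", StringComparison.Ordinal))
            throw new InvalidOperationException("Unexpected STAT reply: " + reply);
        return int.Parse(reply.Split(' ')[1]);
    }

    // A multi-line reply ends with a line holding a single ".".
    // Lines beginning with "." are dot-stuffed, so one leading dot is dropped.
    static string ReadMultiLine(TextReader reader)
    {
        var body = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null && line != ".")
        {
            if (line.StartsWith(".", StringComparison.Ordinal)) line = line.Substring(1);
            body.Append(line).Append('\n');
        }
        return body.ToString();
    }

    // Downloads every message: one STAT, then RETR 1..n.
    public static List<string> FetchAll(TextReader reader, TextWriter writer)
    {
        writer.Write("STAT\r\n");
        writer.Flush();
        int count = ParseStatCount(reader.ReadLine());
        var messages = new List<string>();
        for (int i = 1; i <= count; i++)
        {
            writer.Write("RETR " + i + "\r\n");
            writer.Flush();
            string status = reader.ReadLine();
            if (status == null || !status.StartsWith("+OK", StringComparison.Ordinal))
                throw new InvalidOperationException("RETR failed: " + status);
            messages.Add(ReadMultiLine(reader));
        }
        return messages;
    }
}
```

Against a real server the reader and writer would wrap an SslStream (995 is the conventional POP3S port), with USER/PASS sent first; that plumbing is left out so the loop itself stays visible.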

Now, I had originally planned on this post being an open plea to anyone and everyone at Google asking them to open up the Gmail access API, but seeing as it’s just plain old ugly POP3 (maybe with a cool extension), we’re stuck biting the bullet and implementing a remote mail access layer.

Anyways, given how incredible Google has been in a million other situations, I thought I would throw out two wildly out-of-this-world questions. I wouldn’t expect to get a response, but before I spend the time figuring it all out, I felt like I should at least ask.

  • Are there some special POP Extensions available in Gmail? Is there some helper web api? Or does GDS really just have a POP3 crawler?
  • Is your compression/text storage library open source (or documented in some research paper at all)? Beagle has always struggled with how best to store copies of a document’s text so that it can be made available in interfaces. While we do have a new hybrid text cache (text over 4k on the filesystem, text under 4k in a sqlite db, all compressed), we are still nowhere near as small as the GDS indexes. A cursory examination reveals that the GDS indexes are some form of on-disk b-tree, but how are you compressing all that text so small? Is there some substitution/reconstruction algorithm? (It seems like that would be wildly expensive, but who knows.)
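For readers curious what a hybrid cache like that looks like, here is a minimal sketch of the routing idea: compress everything with gzip, keep small entries in a database and large ones on disk. It is not Beagle's actual code; HybridTextCache is a hypothetical name, a Dictionary stands in for the sqlite table, and "4k" is assumed to mean 4096 bytes of uncompressed UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical sketch of a hybrid text cache, not Beagle's implementation.
public class HybridTextCache
{
    public const int Threshold = 4 * 1024; // assumed: 4 KB of uncompressed UTF-8

    // Stand-in for the sqlite table that holds small entries.
    readonly Dictionary<string, byte[]> smallStore = new Dictionary<string, byte[]>();
    readonly string directory;

    public HybridTextCache(string directory)
    {
        this.directory = directory;
        Directory.CreateDirectory(directory);
    }

    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(raw, 0, raw.Length);
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    public static string Decompress(byte[] data)
    {
        using (var input = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var reader = new StreamReader(input, Encoding.UTF8))
            return reader.ReadToEnd();
    }

    // Returns true if the entry went to the filesystem, false if to the "database".
    public bool Store(string key, string text)
    {
        byte[] compressed = Compress(text);
        if (Encoding.UTF8.GetByteCount(text) > Threshold)
        {
            File.WriteAllBytes(PathFor(key), compressed);
            smallStore.Remove(key);
            return true;
        }
        smallStore[key] = compressed;
        return false;
    }

    public string Load(string key)
    {
        byte[] data;
        if (smallStore.TryGetValue(key, out data)) return Decompress(data);
        string path = PathFor(key);
        return File.Exists(path) ? Decompress(File.ReadAllBytes(path)) : null;
    }

    // Assumes keys are already safe file names (e.g. a hash of the document URI).
    string PathFor(string key) { return Path.Combine(directory, key + ".gz"); }
}
```

The split keeps the database small and fast for the common case while large documents avoid bloating it; per-entry gzip, though, cannot exploit redundancy across documents, which may be part of why such a cache stays larger than a purpose-built index.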

Anyways, it’s a long shot, and it’s pretty far out there, but for the sake of not passing up answers I can’t seem to find elsewhere on the net, I have asked.

How Much I Rely on Class Libraries

Ok, so I was stamping out a recent (quite simple) assignment for a class. The assignment required separating each place value in an integer. Since the information came into the program as a string, I stamped out a simple solution using some easy loops and handy methods available in the String class (of the .Net 2.0 class libraries). However, a quick skim over the rubric revealed 5% for correctly using div and mod to parse out the integer values. I immediately flipped back into my program, then froze for a second…

then another…

then 2 more…

I was completely blanking. I know it’s a simple task; it’s just that I have become so accustomed to the incredible abundance of a dozen methods for every little task that I blanked for a good minute, just stonewalled. I knew I had written this code before (everyone has done it at least once in some ‘Intro to Programming Theory’ class), and the concept was easy enough, but it just wouldn’t come. Cursing the professor for such a trivial demand, I went and got a cup of coffee.

Upon returning, I realized how stupid and petty I was being. While I do often rely on cool class libraries and the methods they provide, I really just have to stop being so self-righteous and realize how likely it was that much of the class would be completely stuck on such a task. We spend so much time today learning about existing technologies and APIs that we forget the core of programming: Problem Solving.

I quickly slapped together the following C# method:

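using System;
using System.Collections.Generic;

// Wrapper class added so the snippet compiles on its own.
static class Assignment
{
    // Splits a numeric string such as "4096" into its place values {4, 0, 9, 6},
    // using div (/) and mod (%) rather than String methods.
    static int[] toIntArray(string input)
    {
        int number = Convert.ToInt32(input); // parse once, outside the loop
        List<int> ints = new List<int>();
        for (int numdigits = input.Length; numdigits > 0; numdigits--)
        {
            // Divide to shift the wanted digit into the ones place...
            int digit = number / (int)Math.Pow(10.0, (double)(numdigits - 1));
            // ...then mod to keep only that digit.
            digit = digit % 10;
            ints.Add(digit);
        }
        return ints.ToArray();
    }
}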

In retrospect, it’s quite simple. I just hope that this was my ‘moment of realization’ and that I don’t get so inundated again that I can’t do the simple stuff on my own anymore.
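For comparison, the textbook div/mod loop peels digits off the right-hand end with % 10 and / 10, so it needs neither Math.Pow nor the string's length. A minimal sketch (PlaceValues and Digits are hypothetical names; it assumes a non-negative int):

```csharp
using System;
using System.Collections.Generic;

public static class PlaceValues
{
    // Returns the digits of a non-negative integer, most significant first.
    public static int[] Digits(int number)
    {
        if (number < 0) throw new ArgumentOutOfRangeException("number");
        if (number == 0) return new[] { 0 };
        var digits = new List<int>();
        while (number > 0)
        {
            digits.Add(number % 10); // mod: take the ones digit
            number /= 10;            // div: drop the ones digit
        }
        digits.Reverse();            // collected least significant first
        return digits.ToArray();
    }
}
```

One trade-off: because it starts from an int, leading zeros in the input string ("007") are lost, whereas the string-length approach keeps them.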

SQLite Linq Provider

Ok, so as many of you may have noticed in my last post, I’ve taken a real interest in C# 3.0 and its nifty new features. Now, I’m mostly just excited about the simple collection manipulations, but the whole Linq to SQL thing is nagging at the back of my mind. While complete CRUD will no doubt take some serious code, simple query support is not that difficult to implement. After following mattwar’s blog series on a generic DB provider for Linq, I decided that I wanted that awesome glory against a lean and mean sqlite db.

So, after an hour or so of messing and meddling, I have a few samples working (see the attached zip). It’s a Visual Studio 2008 Beta 2 project (I’ll be writing autotools magic for Mono later this week), so sorry for that; the code can still be imported into MonoDevelop, only the solution/project files won’t work. Anyways, I have tested a smattering of JOINs and all sorts of simple selects without issue. However, the elements of sqlite that behave differently tend to do so silently and without complaint, making it harder to be certain that everything is working. I’m planning on fleshing out a set of test queries, but for now I could really use help with testing/checking the SQL (as I’m no sqlite guru).
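The heart of a query-only provider is walking the expression tree that Linq hands you and emitting SQL text. Here is a minimal sketch of that idea, not the attached code: WhereTranslator and Person are hypothetical names, and it assumes predicates built only from comparisons, &&/||, column accesses and literal constants. Captured local variables, method calls and NULL handling are not covered, and a real provider should emit parameters instead of inlining values.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

public class Person { public string Name { get; set; } public int Age { get; set; } }

// Hypothetical sketch: turns a simple predicate into a SQL WHERE fragment.
public class WhereTranslator : ExpressionVisitor
{
    readonly StringBuilder sql = new StringBuilder();

    public static string Translate<T>(Expression<Func<T, bool>> predicate)
    {
        var translator = new WhereTranslator();
        translator.Visit(predicate.Body);
        return translator.sql.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        sql.Append('(');
        Visit(node.Left);
        sql.Append(' ').Append(OperatorFor(node.NodeType)).Append(' ');
        Visit(node.Right);
        sql.Append(')');
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Assumes every member access is a column on the lambda parameter.
        sql.Append(node.Member.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        if (node.Value is string) // quote strings, doubling embedded quotes
            sql.Append('\'').Append(((string)node.Value).Replace("'", "''")).Append('\'');
        else
            sql.Append(node.Value);
        return node;
    }

    static string OperatorFor(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Equal: return "=";
            case ExpressionType.NotEqual: return "<>";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.GreaterThanOrEqual: return ">=";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.LessThanOrEqual: return "<=";
            case ExpressionType.AndAlso: return "AND";
            case ExpressionType.OrElse: return "OR";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}
```

For example, p => p.Age > 30 && p.Name == "Kevin" becomes ((Age > 30) AND (Name = 'Kevin')). ExpressionVisitor is public from .NET 4 onward; on the .NET 3.5 that ships with VS 2008 you would need your own visitor base class.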

Known Issues

  • DataType sloppiness - I hope to handle this better (so that a DateTime stored as a string in a TEXT column would extract to a DateTime successfully); right now you pretty much need to stick to strings or numeric values.
  • Inefficient Queries - Not being a SQL master, I can’t say that much of what’s generated is the best way to do things. Please, if you know better, share!
  • OrderBy issues - It’s just hard as heck to get working. It seems to work fine sometimes, but no promises.
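One possible fix for the DateTime sloppiness is to always write dates into TEXT columns in a single round-trippable format and parse them back with the same rules. A minimal sketch, assuming ISO 8601 via the "o" format specifier; SqliteDates is a hypothetical helper, not part of the provider or of ADO.Net.

```csharp
using System;
using System.Globalization;

public static class SqliteDates
{
    // "o" is the round-trip format, e.g. 2007-10-20T14:30:00.0000000Z for a UTC value.
    // Storing everything as UTC also makes the TEXT values sort chronologically.
    public static string ToText(DateTime value)
    {
        return value.ToString("o", CultureInfo.InvariantCulture);
    }

    // Parse a TEXT column back into a DateTime, preserving its DateTimeKind.
    public static DateTime FromText(string text)
    {
        return DateTime.Parse(text, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
    }
}
```

The provider would call ToText when binding a DateTime parameter and FromText when materializing a DateTime member from a TEXT column.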

Anyways, play around, have fun, and note that you need the Sqlite provider for ADO.Net (duh).

Linq To Sqlite Download

I’m looking at db_linq, which is a full (bi-directional, change tracking, general awesome craziness) solution; this is really just a way to query a sqlite db. I might try to add a sqlite provider to db_linq at some point, it’s just that their system is very different from my implementation, so there wouldn’t be much shared code. :(

The Changing Face of High-Level Programming

Ok, so I’m sure most MS .Net devs have already seen these posts far too many times, but for the Mono users out there, I have a little treat. While Moonlight and WPF get tons of hype, I think the biggest and most exciting change coming soon to a C# compiler near you is support for lambda expressions, anonymous types, and extension methods.

Now, on the whole this doesn’t sound all that exciting. Before a few months ago, I had never really used lambda expressions to accomplish much beyond passing that unit in an intro to CS class. Individually, there’s nothing to jump for joy about, but used in conjunction, they produce startlingly clean and readable code.
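To see all three features working together in a few lines, here is a small sketch: an extension method (IsBlank, hypothetical and defined only for this example) called inside a lambda, with the results projected into an anonymous type.

```csharp
using System;
using System.Linq;

// Hypothetical extension method, defined only for this example.
public static class StringExtensions
{
    // "this" on the first parameter makes it callable as word.IsBlank(),
    // and it even works on a null reference.
    public static bool IsBlank(this string s)
    {
        return s == null || s.Trim().Length == 0;
    }
}

public static class FeatureDemo
{
    public static string Describe(string[] words)
    {
        var summary = words
            .Where(w => !w.IsBlank())                                        // lambda + extension method
            .Select(w => new { Word = w.Trim(), Length = w.Trim().Length }); // anonymous type
        return string.Join(", ", summary.Select(x => x.Word + ":" + x.Length).ToArray());
    }
}
```

FeatureDemo.Describe(new[] { " Kevin ", "", "Mono" }) returns "Kevin:5, Mono:4". The anonymous type never needs a name because it only lives inside the query.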

To demonstrate this I’ve whipped up two examples that I was fiddling with as I read a million tutorials. They aren’t fancy XML or Database providers, just some simple (and quite common in my experience) text parsing tasks that have disproportionately complex code. We will use some of the new C# 3.0 features to make far cleaner and more readable code.

The first example is an exclusion string: a set of characters that are not allowed to appear in another string.

var illegalchars = "abcdefg"; 
string testString1 = "Kevin"; 
string testString2 = "hijkmlppp";

The ‘old’ way of checking both strings for one of the illegal chars:

foreach (char c in illegalchars) {
    if (testString1.Contains(c) || testString2.Contains(c))
        Console.WriteLine("illegal char!");
}

Using awesome new stuff:

if (testString1.Intersect(illegalchars).Any()
    || testString2.Intersect(illegalchars).Any())
    Console.WriteLine("Linq found it too");
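Both checks can be wrapped into small, runnable methods. The sketch below (class and method names are hypothetical) puts the Linq version next to String.IndexOfAny, a long-standing framework method that answers the same question in one call.

```csharp
using System;
using System.Linq;

public static class Exclusion
{
    // Linq: does s share any character with the illegal set?
    public static bool HasIllegalLinq(string s, string illegal)
    {
        return s.Intersect(illegal).Any();
    }

    // Framework method: index of the first matching character, or -1 if none.
    public static bool HasIllegalIndexOfAny(string s, string illegal)
    {
        return s.IndexOfAny(illegal.ToCharArray()) >= 0;
    }
}
```

With the sample strings, "Kevin" is flagged (its 'e' is in "abcdefg") while "hijkmlppp" is not; both checks are case-sensitive.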

Our next example is ‘exploding’ or splitting a series of values out of a string (CSV and PSV are common examples of this) into an array:

 string pipeDelined = "Kevin | McCool | Kubasik";

An old solution might have been (I know we could optimize this, or clean it up, just making a point ;) ):

List<string> names = new List<string>();
foreach (string s in pipeDelined.Split('|')) {
    var ts = s.Trim();
    if (ts == "") continue;
    names.Add(ts);
}
var allNames = names.ToArray();

Using our cool new C# 3.0 tools, we can change this to the super-sexy:

var allLinqNames = pipeDelined.Split('|')
    .Select(s => s.Trim())
    .Where(s => s != "")
    .ToArray();
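Here is the same pipeline as a runnable method, plus a variant using StringSplitOptions.RemoveEmptyEntries (the class and method names are hypothetical). RemoveEmptyEntries only drops truly empty fields, so whitespace-only fields still become "" after Trim and the Where filter is still needed.

```csharp
using System;
using System.Linq;

public static class PipeSplit
{
    // The Linq version from the post, wrapped in a method.
    public static string[] Split(string input)
    {
        return input.Split('|')
                    .Select(s => s.Trim())
                    .Where(s => s != "")
                    .ToArray();
    }

    // Variant: let Split drop empty fields, then trim and filter the rest.
    public static string[] SplitWithOptions(string input)
    {
        return input.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(s => s.Trim())
                    .Where(s => s.Length > 0)
                    .ToArray();
    }
}
```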

While a hardened child of OOP (via C# and Java) might baulk at the new syntax, I think that it can quickly start to grow on a developer. Moreover, it has the distinct advantage of being unambiguous, and makes reading someone else’s dense code much more fluid. 

I really can’t wait for C# 3.0, and not for those flashy APIs, just for the simple syntactic sugar that is already making me lazier by the minute.

Waking Up In the Middle of the Night

It happens… even running on so little sleep, I still find myself waking.

Fortunately, this time I awoke with an awesome realization. I’ve been pounding my brain against the wall for a week now on how to further refine and increase the accuracy of my original relation-based ranking system. My initial results had been less than stellar when unleashed upon the desktop as a whole. In controlled situations (where my defined relationships’ weights were proportionate and scaled) the results were excellent, but I was hoping this ‘lowest common denominator’ of sorts would be the answer. I was mistaken. After being more or less tossed back to square one, I was less than optimistic, to say the least.

However, at 2:30 this morning all that seems irrelevant, as I believe I have determined the key to blazingly accurate desktop search results (specifically over large search sets, on the order of shared drives with thousands of documents, images, e-mails and other media files without any real semantic system to start from). In my original design I made the mistake of using fixed-proportion weights for my relationships, a mistake also seen in many ObjectRank-based systems. By fixed proportion, I mean that an astronomical amount of time has gone into determining how important an ‘author’ relationship is compared to a ‘creation date’ relationship. I (like many before me) was using a weight × termsimilarityindex type of system for each relationship. As a result, I was spending tons of time and effort trying to strike the proper balance, and in most cases when I got one situation to work, I completely destroyed another.
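Written out, the fixed-proportion scheme is a weighted sum over relationship types. The symbols are introduced here only for illustration: R is the set of relationship types, w_r a hand-tuned constant weight for type r, and s_r(q, d) the term-similarity index between query q and document d through relationship r.

```latex
\mathrm{score}(q, d) = \sum_{r \in R} w_r \, s_r(q, d)
```

Because each w_r is a single constant shared by every user and every desktop, raising w_author to fix one situation shifts the ranking everywhere else too, which is exactly the balancing act described above.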

I think my <sarcasm>brilliant</sarcasm> revelation is becoming obvious, but bear with me.

We cannot pretend that authorship means the same thing to all users. A simple example is the large number of users who still operate relatively isolated desktops, where they are the only author of most of the content; if someone emails them a document, it will have a hard time weighing up. However, creation/modification date would probably serve as a solid indicator of relationship, as one person can really only work on one thing at a time.

I wish I had something better to show than just this (I’m mostly writing this down so I don’t forget it in the morning :) ), but I’ve determined that we need a deeper dimension of weighting on relationships (when scoring). While one possibility is to just add another variable to our existing weight-determination system, I am leaning towards something broader. What if the programmer only had to specify a relationship, and the system built an individualized weight for each relationship from a combination of how often it occurs, how closely it parallels term-based similarity, and how often that relationship type was used to rank a selected result (this would require GUI integration, but for this proof of concept that’s ok in my head)?
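To make one of those signals concrete, here is a minimal sketch of turning "how often this relationship type ranked a result the user actually selected" into per-user weights. RelationshipWeights is a hypothetical name, the add-one smoothing is an assumption chosen for illustration, and the other two proposed signals (occurrence and agreement with term similarity) are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: per-user relationship weights learned from selections.
public class RelationshipWeights
{
    readonly Dictionary<string, int> selections = new Dictionary<string, int>();

    public RelationshipWeights(IEnumerable<string> relationshipTypes)
    {
        foreach (string r in relationshipTypes) selections[r] = 0;
    }

    // Call when the user opens a result that was ranked via this relationship.
    // The relationship must be one of the types passed to the constructor.
    public void RecordSelection(string relationship)
    {
        selections[relationship]++;
    }

    // Add-one smoothing keeps unused relationships above zero,
    // and the weights across all types always sum to 1.
    public double Weight(string relationship)
    {
        int total = selections.Values.Sum();
        return (selections[relationship] + 1.0) / (total + selections.Count);
    }
}
```

For the isolated-desktop user above, eight selections driven by "date" and none by "author" yield weights of 0.9 and 0.1, with no programmer tuning involved.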

All of a sudden, the massive programmer burden of a relational ranking system is removed! (It takes a lot of specific code to handle each relationship and its weights/different characteristics properly.) While there would be a massive up-front cost to tweaking and tuning the system that determines those individual relationship weights, it would be time well spent: as new data types/sources are added, there is no additional work beyond declaring/mapping the relevant relationships.

Once the sun has actually risen, I’ll try to start the process of actually codifying what I’m trying to say. If I’ve actually made enough sense that anyone understands what I’m getting at and has any thoughts/comments/criticisms, please share!

Banshee Ipod Playlist Support

It looks like the monster might finally be laid to rest. After almost 2 years, one of the most basic feature requests for Banshee looks like it will finally be fulfilled: playlist syncing to iPods. While there has always been a plethora of patches in varying states of readiness floating around, none of them ever got into trunk. I am very pleased to have checked in a working (and, at the moment, building) patch which enables the management of iPod playlists through Banshee.

I know that the patch has been in better shape; there were a dozen different times when a commit might have made sense, but in the end, ipod-sharp is a moving target, and trying to hit it and Banshee with stable APIs at the same time (without a freeze ;) ) has proven to be quite difficult (no hard feelings to the Banshee devs, they keep new features coming, and fast). Anyways, there are a few known bugs with this patch, most of which (in my super-limited testing) stem from ipod-sharp being in the middle of an API shift, and trunk isn’t working.

Anyways, I wanted to make a list of Features and Bugs, mainly so the two don’t get confused. Since a big part of this patch was trying to determine exactly what ‘expected behavior’ was, there’s a lot of room to grow.

Known Bugs

  • Major Performance Issues - This just needed to go in eventually, and maybe the new ipod-sharp API will have a better solution, but as it stands, everything (meaning the entire music library) must be iterated over to find a corresponding track. Some preliminary work was done to get more content sorted/hashed, but there’s still a lot of work to do here.
  • Double Tracks on iPod - Depending on your version of ipod-sharp, and what random steps you take to get things building against it, there is a common issue where a playlist dragged from the Library onto an iPod results in duplicates of every song in the playlist on the iPod. This should be easy enough to track down if someone just has the time and patience.
  • New ipod-sharp API - As there will eventually be a new ipod-sharp API, someone needs to migrate the current logic to the new API, should be mostly the same except for the device detection logic.
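The performance and duplicate-track bugs above both come down to matching tracks. Here is a minimal sketch of the hashing idea: index the tracks already on the device once by a normalized (artist, album, title) key, then look up in constant time and skip anything already present before copying. Track, TrackIndex and the choice of key fields are hypothetical, not Banshee or ipod-sharp APIs.

```csharp
using System;
using System.Collections.Generic;

public class Track
{
    public string Artist, Album, Title;
    public Track(string artist, string album, string title)
    { Artist = artist; Album = album; Title = title; }
}

// Hypothetical sketch: a hash index over the tracks already on the iPod.
public class TrackIndex
{
    readonly Dictionary<string, Track> byKey = new Dictionary<string, Track>();

    // Trimming and case-folding are assumptions about what counts as "the same track".
    public static string KeyFor(Track t)
    {
        return (t.Artist ?? "").Trim().ToLowerInvariant() + "\u0001" +
               (t.Album ?? "").Trim().ToLowerInvariant() + "\u0001" +
               (t.Title ?? "").Trim().ToLowerInvariant();
    }

    public TrackIndex(IEnumerable<Track> deviceTracks)
    {
        foreach (Track t in deviceTracks) byKey[KeyFor(t)] = t; // one pass over the device
    }

    // The device's copy of a library track, or null if it is not on the device.
    public Track Find(Track libraryTrack)
    {
        Track match;
        return byKey.TryGetValue(KeyFor(libraryTrack), out match) ? match : null;
    }

    // Tracks from a dragged playlist that actually need copying:
    // skips ones already on the device and repeats within the playlist.
    public List<Track> TracksToCopy(IEnumerable<Track> playlist)
    {
        var toCopy = new List<Track>();
        var queued = new HashSet<string>();
        foreach (Track t in playlist)
        {
            string key = KeyFor(t);
            if (!byKey.ContainsKey(key) && queued.Add(key)) toCopy.Add(t);
        }
        return toCopy;
    }
}
```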

Behavior Issues/Features

  • A Playlist from the Library to the iPod with the same name will result in the iPod version being overwritten.
  • Dragging a track from the library to an iPod playlist will result in that track being copied to the iPod again
  • Click and Drag support for playlists on the iPod; it’s recommended that you drag songs from the iPod’s library
  • Rename of iPod playlists
  • Does not synchronize all library playlists to iPod automatically, only those which are placed onto the iPod

I think that’s most of it. Once iPod support in Banshee has leveled out a little bit, I plan on adding support for On-The-Go playlists and Smart Playlists. Anyways, I know that it’s far from a perfect commit, but after porting this patch through so many API changes, design shifts, and general bitrot, I really just wanted to get it out of Bugzilla.

The obligatory screenshot:

Banshee With iPod Playlists

Note: I’ve tested this with the latest iPod Firmware, if you run the Hash tool as you normally would, it should work fine.

Building More Relationships in Beagle

Today I checked in a few fun changes to Beagle focused on emphasizing relationships between entities. It doesn’t sound like a whole lot of fun, but it’s kinda nifty.

New Query Context Options

  1. Find Documents by same author.
  2. Find E-mails from same contact.
  3. Find Pages from same site.

In addition (building upon Beagle’s new External Metadata system), I have added support for tracking Firefox downloads to files. A file downloaded with Firefox has an extra property (beagle:Origin) which denotes the URL it was downloaded from. I haven’t started to integrate this new information on the UI side, as I want to add support for Epiphany, Opera, and Konqueror first. Eventually, I would love to see this kind of mapping for downloaded mail attachments, but that’s a little more difficult.
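As an illustration of what a per-file origin URL enables, here is a minimal sketch of "find other files downloaded from the same site": compare the host part of each stored URL. It uses no Beagle API; SameSite is a hypothetical name, the file-to-URL map stands in for whatever beagle:Origin lookups would return, and it assumes every stored value is an absolute URL (anything else makes new Uri throw).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: files whose origin URL shares a host with the given file's.
public static class SameSite
{
    public static List<string> FromSameSite(string file, IDictionary<string, string> origins)
    {
        string origin;
        if (!origins.TryGetValue(file, out origin)) return new List<string>();
        string host = new Uri(origin).Host;
        return origins
            .Where(kv => kv.Key != file &&
                         string.Equals(new Uri(kv.Value).Host, host, StringComparison.OrdinalIgnoreCase))
            .Select(kv => kv.Key)
            .OrderBy(k => k, StringComparer.Ordinal)
            .ToList();
    }
}
```

Matching on the exact host means www.example.com and example.com count as different sites; grouping by registered domain would need extra rules.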

Anyways, this is more work towards my eventual goal of a ranking system based upon relationships among desktop data. And since no feature-centric blog post is complete without screenshots, I present:

Original Query

The Resulting Query

Beagle’s powerful and simple query language makes stuff like this really easy; it’s just a matter of knowing what properties warrant special treatment like this. I’m open to ideas, what

The Deed Is Done

A successful (so far) WordPress upgrade. Version 2.3 seems cool enough. If you spot a problem, please let me know!

Edit: It appears that I may have spoken too soon… Some fickle behavior from plugins that used to like categories, but I guess that was to be expected. =/

Relationships in the Desktop - Relational Desktop Search and Beagle

I’ve been working on and off on a writeup concerning the use of Beagle to build an intelligent ‘rank’ for desktop entities; in short, a ranking system (not unlike PageRank or the like) to organize desktop search results by far more than just keyword/date. I know the writing sucks, and it’s not 100% complete yet. In addition, I don’t have much in terms of code to share (yet).

To summarize (for those lazybones out there), I’m thinking of using fairly universal and constant relationships (Creator, Creation Date, Modification Date(s), Parent/Source, and maybe others) to recurse deep into desktop relationships. By adding relevancy to the root hit for every child it has (logarithmically decreased with each level of recursion), we can get far more accurate desktop search results when querying a simple keyword/phrase. In addition, the children of a hit could often be considered hits themselves, if found in enough ‘root’ hits.
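Here is a minimal sketch of that recursion under stated assumptions: a child at depth k contributes its own keyword relevance multiplied by 1 / log2(k + 1), so depth 1 counts fully and depth 3 counts half. That decay function is only one reading of "logarithmically decreased", and RelationalRank, the graph shape and the score map are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the recursive relevancy idea; graph, scores and decay are assumptions.
public static class RelationalRank
{
    // children: entity -> related entities (same creator, same day, parent/source, ...)
    // baseScore: ordinary keyword relevance of each entity (missing means 0)
    public static double Score(string root,
                               IDictionary<string, List<string>> children,
                               IDictionary<string, double> baseScore,
                               int maxDepth)
    {
        var visited = new HashSet<string> { root };
        return Get(baseScore, root) + Walk(root, 1, children, baseScore, maxDepth, visited);
    }

    // Depth-first walk; a node reachable by two paths is counted once,
    // at whichever depth the walk reaches it first.
    static double Walk(string node, int depth, IDictionary<string, List<string>> children,
                       IDictionary<string, double> baseScore, int maxDepth, HashSet<string> visited)
    {
        List<string> kids;
        if (depth > maxDepth || !children.TryGetValue(node, out kids)) return 0.0;
        double decay = 1.0 / Math.Log(depth + 1, 2); // depth 1 -> 1.0, depth 3 -> 0.5
        double total = 0.0;
        foreach (string kid in kids)
        {
            if (!visited.Add(kid)) continue; // relationships can form cycles
            total += decay * Get(baseScore, kid)
                   + Walk(kid, depth + 1, children, baseScore, maxDepth, visited);
        }
        return total;
    }

    static double Get(IDictionary<string, double> scores, string key)
    {
        double s;
        return scores.TryGetValue(key, out s) ? s : 0.0;
    }
}
```

The maxDepth cap and the visited set matter in practice: desktop relationships are dense and cyclic (a document links to its author, who links back to the document).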

It’s a loose and patchy idea, and miles from a realistic implementation; however, thanks to the awesomeness of Lucene, comparing two in-index documents for textual relevancy (based on term frequency) is not impossible. (I have not considered the performance of these comparisons yet; they may be too slow to be realistic without serious optimization.)
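One standard way to compare two documents by term frequency is the cosine of the angle between their term-frequency vectors: 1 means the same mix of terms, 0 means no shared terms. A minimal sketch, not Lucene's own scoring (which adds IDF and length norms), assuming the per-document term counts have already been obtained (Lucene can store term vectors per document); TermSimilarity and its naive tokenizer are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TermSimilarity
{
    // Cosine similarity of two term-frequency vectors.
    public static double Cosine(IDictionary<string, int> a, IDictionary<string, int> b)
    {
        double dot = 0, normA = 0, normB = 0;
        foreach (var kv in a)
        {
            normA += (double)kv.Value * kv.Value;
            int other;
            if (b.TryGetValue(kv.Key, out other)) dot += (double)kv.Value * other;
        }
        foreach (int f in b.Values) normB += (double)f * f;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Naive tokenizer for the example: lowercase, split on anything not a-z or 0-9.
    public static Dictionary<string, int> TermFrequencies(string text)
    {
        var tf = new Dictionary<string, int>();
        foreach (string word in Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+"))
        {
            if (word.Length == 0) continue;
            int n;
            tf.TryGetValue(word, out n); // n stays 0 for a new term
            tf[word] = n + 1;
        }
        return tf;
    }
}
```

Comparing "beagle desktop search" with "Desktop search with Beagle" gives 3 / (√3 · 2) ≈ 0.866. The cost concern above is real: each comparison walks both term lists, so comparing every hit against every related entity adds up quickly.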

Anyways, I’m working on it in Google Docs, so you can check out the full document here. I’ll post once I’ve finished my research/planning etc.

Please, share your thoughts! This is in the ‘major brainfart’ stage, so it’s open to whatever from anyone. I want to hear ideas!


Powered by ScribeFire.

Port of Mirage to Windows

So, while I’m at work, I tend to have some sort of music going in the background. Since I don’t have my whole personal library available (at least, not yet), I’ve become a member of the streaming radio revolution. The obvious choices (for personalized stations, and music you actually want to listen to) are Pandora and Last.fm. I’ve really grown to like both services (although I have a slight preference for Last.fm at the moment). The main issue for me is that I have 5,000 songs sitting on my hard drive, all by artists I like, so why can’t I get the same awesome intelligent matching among those tracks?

Mirage is an implementation of a master’s thesis on music analysis; the proof-of-concept code was written in C# (under Mono) and targeted at Linux, specifically the Banshee music player. When at home, I love Banshee. I’ve done my fair share of development work on it, and I always have a fresh svn checkout of it to see what’s new. However, work is on a Windows machine, and I want this cool nifty awesomeness there as well. As a result, I have embarked upon a port of the Mirage library, as well as the creation of an iTunes plugin to make this code useful ;)

Since I figured some other people might be interested in this, I made a project at Google Code. There’s not much there yet, just some clumsy stabs at working with Visual Studio (I’m using the 2008 Beta 2… it’s ‘free’ as in beer). Anyways, I’ve jotted down some erratic thoughts as to possible goals/design choices.

The current installer should run fine if you have iTunes installed, and should parse mp3 and aac files fine (sorry, 32-bit only at the moment). However, I really need to do some more investigation before I’ll know if I’m passing the right data into the library.

Anyways, it’s cool and all kinds of fun to start using COM ;) If anyone has experience with Windows ports and wants to lend a hand or some advice, it’s all more than welcome.


Updated Beagle Packages for Gutsy Available

Beagle support in Ubuntu has been less than stellar up until this point (across all releases), and unfortunately, the best that we can really hope for in the immediate future is acceptable. This is mostly because only a few of Beagle’s developers are running Ubuntu, and accurately reproducing common errors is difficult. To top it all off, the de facto Ubuntu contact at this point is me, and I haven’t had the available time to really track down some of the more difficult bugs.

However, this problem reached an all-time low when the beagle source package stopped building in Gutsy. This spurred us into action (our urgency increasing as we realized how close Gutsy was to shipping), and as a result there are now updated Ubuntu Gutsy packages (based upon the new 0.2.18 bugfix release of Beagle) available for testing. Thanks to Launchpad’s new super-awesome Personal Package Archive system, you only need to add the following sources, or download from the corresponding link. (NOTE! The versioning of these debs will not force an update if they are accepted into main; you will need to reinstall should they be accepted at their current version number!)

deb     http://ppa.launchpad.net/kkubasik/ubuntu gutsy main 
deb-src http://ppa.launchpad.net/kkubasik/ubuntu gutsy main 

Please report bugs with these packages either to Beagle in Launchpad or to the dashboard-hackers mailing list. The more feedback we get in the next few days, the better the chance that Ubuntu Gutsy will ship a solid Beagle.

 

Beagle Ubuntu Package Update


With everything that has been swarming all over my plate lately, I haven’t had a chance to really keep on top of the Beagle packages in Ubuntu, and as a result, they are currently pretty crappy. I have a branch (meant to be feisty-updates, but I was in a hurry and didn’t feel like branching) with a building deb configuration for Gutsy. I hope to have binaries/sources available for testing later this week.

The branch is hosted here:

https://code.launchpad.net/~kkubasik/beagle/feisty-update

Just do the following to try and build:

 

    bzr branch http://bazaar.launchpad.net/~kkubasik/beagle/feisty-update
    cd feisty-update
    sudo apt-get build-dep beagle
    bzr builddeb -w --split

It’s Been A While

A recent e-mail mentioning my blog made me realize just how long it’s been since I’ve posted anything. This is just me saying that I’m sorry! I’ve been busy (duh), but I didn’t really realize how much of my old development work was being sidelined until I finally started to catch up on my dev e-mails (4236 unread and counting….). So if I haven’t responded to something in the past month, I promise, I’m getting there. In addition, I just realized how horribly outdated local revisions/patches/branches become when you leave them unmerged/unattended for a month, so I have some real fun coming trying to get some of my cool metadata stuff in Beagle/Dashboard working against the current trunk.

Hopefully I’ll be back in a day or two with an awesome list of everything I accomplished/caught up on. However, it’s entirely possible that all I will be able to say is that I read and responded to my e-mail =/.

Real Beryl

I know that the Beryl Project is fading from existence, but I still couldn’t help myself when I saw this enormous chunk of it at the Smithsonian.

Beryl