Tag Archive for 'Mono'

Mono GSOC Projects: Linq to SQLite

So I noticed that one of the accepted proposals for the Mono project is to create a LINQ provider for SQLite. Major props to this (it’s something I totally want to see!) and I’m glad to see that LINQ in Mono is going to be its own beast. I love it when the FOSS community just takes a technology and runs with it! Anyways, I wanted to try and get in touch with the mentor/student of this project and share my experience (as the author of the current LINQ to SQLite component). But contact info seemed hard to come by, so I thought I would post what I had learned.

First, people really want this, and there are several half-complete implementations floating around, including mine (read only, no commit/update/delete support) and this one.

Second, support for just queries is quite easy. Support for complete CRUD is tedious but not too difficult (lots of examples already exist). Support for generating/mapping/reflecting a database into real Linq objects is the tricky part (specifically the UI elements, when you are unable to just piggyback on the Visual Studio work).

Anyways, all the luck in the world to this GSOC project. I would really like to see a working implementation come from this!

Tomboy Tagging: A Third Try

Ok, I replaced our original autocomplete system with something a little more reserved, based on a Gtk.Entry. Stealing heavily from a little-known F-Spot widget, I concocted a simple tagbar. Tab cycles through the completions and Enter selects. To give a rough idea of what’s going on I made a quick screencast.

Google Vids

OGG

Xvid - Avi 

Let me know what you all think. If this is what we want to base our work on, then I’ll clear out the almost 500 lines of commented code from our past revisions. If we need to try something new, we can do that too.

Tomboy Hackfest: Part 2

Alright! Some cool news! The Mono Hackfest at the Novell OSTC in Provo, Utah was a success (I would say; people showed up and we talked about features ;) ). I took several photos in the hope that I might have another photo-riffic blog post, but alas, my flash wasn’t on, and they are all pretty much worthless. That aside, it was pretty cool to root around more of the Novell OSTC campus.

As a by-product of the ‘Hack’ing portion of the hackfest, I am happy to report the enabling of tagging in Tomboy. While we are still working out the specifics of the tagging interface, there is now a super-experimental version of our newest iteration (based somewhat strongly on the Blogger.com tagging interface, which we all agreed was somewhat well designed). While this is mostly implemented, there are a few issues (mostly based on my lack of Gtk knowledge/experience) with some of the autocomplete logic. I basically created a new Gtk Window which is composed of a ListView and hovers over the entry area (actually a GtkTextView, as a GtkEntry wouldn’t handle text markup). There are 2 real problems at the moment.

  1. I need to handle keyboard input intelligently enough to allow selection of an autocomplete option. I just need someone more familiar with how keypress events are handled to take a look at my code and figure out which widgets I should listen to the keypress events on, etc.
  2. I need to get the autocomplete popup widget to show in the right place (should be easy enough to get; again, I just need someone a little more familiar with the Gtk API so I don’t spend another hour looking for the window positioning information), and the widget needs to close when a note is closed. (Right now the autocomplete box hangs around. This should again just be a matter of subscribing to a window destroy event, but my previous attempts have resulted in some messes.)

Anyways, if anyone has the time to offer a hand/check any of this out, just drop by #tomboy (I’ll be in and out due to exams, but I’ll do my best to answer any questions) or feel free to just fix it right off the bat ;) Here’s a quick and dirty screenshot of the problem as it exists (you can see the autocomplete dropdown isn’t quite right).

Tomboy Tagging Screenshot

In addition to traditional tags, we have added a new little tidbit for Addin Developers, the concept of System Tags. In short, any tag added to a Tomboy note with the system: prefix will not be displayed. While this seems a little stupid at first glance, this allows us to easily implement things like Tasks, and allow Addins to associate their own data with tags while not implementing their own data store, and still maintaining backwards compatibility. For example if I wanted to implement ‘Contacts’ in Tomboy (NOT A FEATURE THAT SHOULD BE IMPLEMENTED IMHO)  I could simply add the following tags to store all the information I needed for my Addin:

  • system:Contact
  • system:FirstName:Kevin
  • system:LastName:Kubasik
  • system:EMail:KevinAtKubasikDotNet

And so on. Anything with the ‘system:’ prefix will be hidden from the user, but is still stored with each note.
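
To make the idea concrete, here is a minimal sketch of how an Addin might split visible tags from system tags and read its own data back out. The `SystemTags` class and its method names are hypothetical and are not Tomboy’s actual API; the only thing taken from the description above is the ‘system:’ prefix and the `system:Key:Value` layout of the example tags.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Tomboy.
public static class SystemTags
{
    public const string Prefix = "system:";

    public static bool IsSystemTag(string tag)
    {
        return tag.StartsWith(Prefix, StringComparison.Ordinal);
    }

    // The tags a UI would actually display to the user.
    public static List<string> VisibleTags(IEnumerable<string> tags)
    {
        return tags.Where(t => !IsSystemTag(t)).ToList();
    }

    // Reads "system:Key:Value" tags into a dictionary.
    // Marker tags without a value (e.g. "system:Contact") map to "".
    public static Dictionary<string, string> ReadSystemData(IEnumerable<string> tags)
    {
        var data = new Dictionary<string, string>();
        foreach (var tag in tags.Where(IsSystemTag))
        {
            var body = tag.Substring(Prefix.Length);
            var split = body.IndexOf(':');
            if (split < 0)
                data[body] = "";
            else
                data[body.Substring(0, split)] = body.Substring(split + 1);
        }
        return data;
    }
}
```

With the contact tags listed above plus an ordinary tag like `work`, `VisibleTags` would return only `work`, while `ReadSystemData` would hand the Addin `FirstName`, `LastName` and `EMail` entries.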

Tomboy Hackfest Tonight at the Novell OSTC

We’ll be hacking it up tonight at 6:00PM MST at the Novell Open Source Technology Center. The rough TODO for the night seems to be Tags, Tasks and maybe even a backend to query Beagle. ;) Anyways, if you’re in the greater Salt Lake City area, come on down! If you’re a little further away but want to join in anyways, join in on #tomboy!

See you tonight!

Mono 1.2.6 Memory Usage

So, I’ve heard a lot of hype about the upcoming 1.2.6 release of Mono being faster, leaner, and more stable than ever before (due largely to Novell’s acquisition of a QA team dedicated to Mono). Beagle has always gotten flack over memory use, and as a result, we are relentless in our hunt for abused memory. And while it is wonderfully satisfying to reduce memory usage, it’s really hard to beat dropping megabytes of resident memory for free :). I’m running Ubuntu Gutsy and its 1.2.4 release of Mono, but in my quest for some real numbers to back up all this talk I built the current SVN trunk of Mono.

Even my most optimistic expectations put our potential benefit at maybe 2 or 3 MB less resident memory than Beagle running under Mono 1.2.4. On my test setup, Beagle 0.3pre consumed (after my recent Opera backend fix) around 110 MB of VM and 36 MB of RSS (averaged over a 2 hour run). After building and installing Mono 1.2.6, the same 2 hour run was averaging 72 MB of VM and 27 MB of RSS (roughly 35% less VM and 25% less RSS)! It’s still far from perfect, but free memory reduction is just plain cool :).

One observation about the general pattern of allocation and collection under 1.2.6: it ‘idles’ much lower than 1.2.4. While some actions always push the memory usage up, 1.2.6 *appeared* to return to its lower memory point much faster, and more regularly.

Anyways, I just wanted to say, props to everyone on the Mono team for rocking my socks.

Google: How do you do it?

So it’s not a big surprise that an oft-requested feature for Beagle is the ability to index a user’s Gmail messages (like Google Desktop Search). Today we (the Beagle developers) started to investigate just how this is done. While POP3 (and now IMAP) are available, using them means downloading all of a user’s mail, indexing it, and then caching the text so we can display it. Now, my initial investigation into GDS for Linux revealed that it was calling home via POP3s and downloading lots of data. I have assumed that it was simply iterating over all messages (via POP3), downloading them, indexing them, and caching the compressed content somewhere in Google’s custom indexes.

Now, I had originally planned on this post being an open plea to any and everyone at Google asking them to open up the Gmail access API, but seeing as it’s just the plain old ugly POP3 (maybe with a cool extension), we’re stuck biting the bullet and implementing a remote mail access layer.

Anyways, given how incredible Google has been in a million other situations, I thought I would throw out 2 wildly out-of-this-world questions. I wouldn’t expect to get a response, but before I spend the time figuring it all out, I felt like I should at least ask.

  • Are there some special POP Extensions available in Gmail? Is there some helper web api? Or does GDS really just have a POP3 crawler?
  • Is your compression/text storage library open source? (or documented in some research paper at all?) Beagle has always struggled with how to best handle storing copies of a document’s text so that it might be made available in interfaces. While we do have a new hybrid text cache (text over 4k on the filesystem, under in a sqlite db, all compressed) we were still nowhere near as small as the GDS indexes. A cursory examination reveals that the GDS indexes are some form of b-tree on disk, but how are you compressing all that text so small? Is there some substitution/reconstruction algorithm? (It seems like that would be wildly expensive, but who knows).
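
For readers who haven’t seen the hybrid text cache mentioned above, the routing idea can be sketched roughly like this. This is not Beagle’s actual code: the `HybridTextCache` class is hypothetical, two in-memory dictionaries stand in for the sqlite db and the filesystem, gzip is an assumed choice of compressor, and applying the 4k threshold to the uncompressed UTF-8 size is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical sketch of a size-routed, compressed text cache.
public class HybridTextCache
{
    public const int Threshold = 4096; // the "4k" cut-off from the post

    // Stand-ins for the real backends (assumptions for illustration).
    public readonly Dictionary<string, byte[]> SmallStore = new Dictionary<string, byte[]>(); // "sqlite db"
    public readonly Dictionary<string, byte[]> LargeStore = new Dictionary<string, byte[]>(); // "filesystem"

    public static byte[] Compress(string text)
    {
        var raw = Encoding.UTF8.GetBytes(text);
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray is still valid after the stream has been closed.
            return buffer.ToArray();
        }
    }

    public static string Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public void Store(string uri, string text)
    {
        // Drop any older copy first, in case the text crossed the threshold.
        SmallStore.Remove(uri);
        LargeStore.Remove(uri);

        var target = Encoding.UTF8.GetByteCount(text) > Threshold ? LargeStore : SmallStore;
        target[uri] = Compress(text);
    }

    public string Load(string uri)
    {
        byte[] data;
        if (SmallStore.TryGetValue(uri, out data) || LargeStore.TryGetValue(uri, out data))
            return Decompress(data);
        return null;
    }
}
```

The split keeps lots of tiny texts out of the filesystem (where each would cost at least a block) while keeping big blobs out of the database.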

Anyways, it’s a long shot, and it’s pretty far out there, but for the sake of not passing up answers that I can’t seem to find elsewhere on the net, I have asked.

SQLite Linq Provider

Ok, so as many of you may have noticed in my last post I’ve taken a real interest in C# 3.0 and its new nifty features. Now, I’m mostly just excited for the simple collection manipulations, but the whole Linq to SQL thing is nagging in the back of my mind. Now while complete CRUD will no doubt take some serious code, simple query support is not that difficult to implement. After following mattwar’s blog series on a generic DB provider for Linq, I decided that I wanted that awesome glory against a lean and mean sqlite db.

So, after about an hour or so of messing and meddling I have a few samples working (the attached zip). It’s a Visual Studio 2008 Beta 2 project (I’ll be writing autotools magic for Mono later this week), so sorry for that. The code can still be imported into MonoDevelop; just the solution/project files won’t work. Anyways, I have tested a smattering of JOINs and all sorts of simple selects without issue. However, the elements of sqlite that behave differently tend to do so silently and without complaint, making it harder to be certain that everything is working. I’m planning on fleshing out a set of test queries, but for now I could really just use the help with testing/checking the SQL (as I’m no sqlite guru).
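
To give a feel for why query-only support is approachable, here is a minimal sketch of the core trick: walking a lambda’s expression tree and emitting a SQL WHERE clause. This is not the code in the attached zip. `WhereTranslator` and `Person` are hypothetical names, it leans on the `ExpressionVisitor` base class that ships in `System.Linq.Expressions` from .NET 4 onward (older code had to hand-roll the visitor), and it inlines literals for readability where a real provider would use ADO.NET parameters.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

// Hypothetical row type used only for the example.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Translates a simple predicate into a SQL WHERE fragment.
// Only handles: column-vs-constant comparisons joined by && / ||.
public class WhereTranslator : ExpressionVisitor
{
    private readonly StringBuilder sql = new StringBuilder();

    public static string Translate<T>(Expression<Func<T, bool>> predicate)
    {
        var translator = new WhereTranslator();
        translator.Visit(predicate.Body);
        return translator.sql.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        sql.Append("(");
        Visit(node.Left);
        sql.Append(" ").Append(Operator(node.NodeType)).Append(" ");
        Visit(node.Right);
        sql.Append(")");
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Assumption: a property on the lambda parameter maps to a column of the same name.
        if (node.Expression != null && node.Expression.NodeType == ExpressionType.Parameter)
        {
            sql.Append(node.Member.Name);
            return node;
        }
        throw new NotSupportedException("Only direct column access is supported: " + node);
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        if (node.Value == null)
            throw new NotSupportedException("null comparisons are not handled in this sketch");
        if (node.Value is string)
            // Inlined for readability only; real code should bind a parameter instead.
            sql.Append("'").Append(((string)node.Value).Replace("'", "''")).Append("'");
        else
            sql.Append(Convert.ToString(node.Value, CultureInfo.InvariantCulture));
        return node;
    }

    private static string Operator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Equal: return "=";
            case ExpressionType.NotEqual: return "<>";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.GreaterThanOrEqual: return ">=";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.LessThanOrEqual: return "<=";
            case ExpressionType.AndAlso: return "AND";
            case ExpressionType.OrElse: return "OR";
            default: throw new NotSupportedException("Unsupported operator: " + type);
        }
    }
}
```

For example, `WhereTranslator.Translate<Person>(p => p.Name == "Kevin" && p.Age > 21)` yields `((Name = 'Kevin') AND (Age > 21))`. Captured local variables, method calls, ordering and joins are all left out, which is roughly where the tedious part begins.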

Known Issues

  • DataType sloppiness - I hope to handle this better (so that storing a DateTime string in TEXT would extract to DateTime successfully). Right now you need to pretty much just use strings or numeric values.
  • Inefficient Queries - Not being a SQL master, I can’t say that much of what’s generated is the best way to do things. Please, if you know then share!
  • OrderBy issues - It’s just hard as heck to get working. It seems to work fine sometimes, but no promises.

Anyways, play around, have fun, and note that you need the Sqlite provider for ADO.Net (duh).

Linq To Sqlite Download

I’m looking at db_linq, which is a full (bi-directional, change tracking, general awesome craziness) solution, whereas this is really just a way to query a sqlite db. I might try to add a sqlite provider to db_linq at some point. It’s just that their system is very different from my implementation, so there wouldn’t be too much shared code. :(

The Changing Face of High-Level Programming

Ok, so I’m sure most MS .Net devs have already seen these posts far too many times, but for the Mono users out there, I have a little treat. While Moonlight and WPF get tons of hype, I think the biggest and most exciting change coming soon to a C# compiler near you is support for lambda expressions, anonymous types, and extension methods.

Now on the whole this doesn’t sound all that exciting, I mean, before a few months ago, I had never really used lambda expressions to accomplish much beyond pass that unit in an intro to CS class. Individually, there’s nothing to jump for joy about, but when used in conjunction, we can produce startlingly clean and readable code.

To demonstrate this I’ve whipped up two examples that I was fiddling with as I read a million tutorials. They aren’t fancy XML or Database providers, just some simple (and quite common in my experience) text parsing tasks that have disproportionately complex code. We will use some of the new C# 3.0 features to make far cleaner and more readable code.

The first example is an exclusion string, or a set of characters that are not allowed in another string.

var illegalchars = "abcdefg"; 
string testString1 = "Kevin"; 
string testString2 = "hijkmlppp";

The ‘old’ way of checking both strings for one of the illegal chars:

foreach (char c in illegalchars) {
    if (testString1.Contains(c) || testString2.Contains(c))
        Console.WriteLine("illegal char!");
}

Using awesome new stuff:

if (testString1.Intersect(illegalchars).Any()
    || testString2.Intersect(illegalchars).Any())
    Console.WriteLine("Linq found it too");

Our next example is ‘exploding’ or splitting a series of values out of a string (CSV and PSV are common examples of this) into an array:

 string pipeDelined = "Kevin | McCool | Kubasik";

An old solution might have been (I know we could optimize this, or clean it up, just making a point ;) ):

List<string> names = new List<string>();
foreach (string s in pipeDelined.Split('|')) {
    var ts = s.Trim();
    if (ts == "") continue;
    names.Add(ts);
}
var allNames = names.ToArray();

Using our cool new C# 3.0 tools, we can change this to the super-sexy:

var allLinqNames = pipeDelined.Split('|')
    .Select(s => s.Trim())
    .Where(s => s != "")
    .ToArray();
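
Extension methods get a mention up top but no example of their own, so here is a small sketch of how the two snippets could be wrapped up for reuse. The `StringExtensions` class and the `SplitAndTrim` / `ContainsAny` names are made up for illustration; the bodies are the same Linq chains used above.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; the names are not from any library.
public static class StringExtensions
{
    // Splits on the separator, trims each piece, and drops empty pieces.
    public static string[] SplitAndTrim(this string source, char separator)
    {
        return source.Split(separator)
                     .Select(s => s.Trim())
                     .Where(s => s != "")
                     .ToArray();
    }

    // True if any character of 'chars' appears in the source string.
    public static bool ContainsAny(this string source, string chars)
    {
        return source.Intersect(chars).Any();
    }
}
```

With these in scope, the call sites read like built-in string methods: `"Kevin | McCool | Kubasik".SplitAndTrim('|')` gives the three names, and `"Kevin".ContainsAny("abcdefg")` is true because of the `e`.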

While a hardened child of OOP (via C# and Java) might baulk at the new syntax, I think that it can quickly start to grow on a developer. Moreover, it has the distinct advantage of being unambiguous, and makes reading someone else’s dense code much more fluid. 

I really can’t wait for C# 3.0, and not for those flashy APIs, just the simple syntactic sugar that is already making me lazier by the minute.

Waking Up In the Middle of the Night

It happens.. even running on so little sleep, I still find myself waking.

Fortunately, this time I awoke with an awesome realization. I’ve been pounding my brain against the wall for a week now on how to further refine/increase the accuracy of my original relation-based ranking system. My initial results had been less than stellar when unleashed upon the desktop as a whole. In controlled situations (where my defined relationships’ weights were proportionate and scaled) the results were excellent, but I was hoping this ‘lowest common denominator’ of sorts would be the answer. I was mistaken. After being more or less tossed back to square one, I was less than optimistic to say the least.

However, at 2:30 this morning all that seems irrelevant, as I believe I have determined the key to blazingly accurate desktop search results (specifically over large search sets, on the order of shared drives with thousands of documents, images, e-mails and other media files without any real semantic system to start). In my original design I made the mistake of utilizing fixed-proportion weights for my relationships, a mistake also seen in many ObjectRank-based systems (PDF alert!). By fixed proportion, I mean that an astronomical amount of time has gone into determining how important an ‘author’ relationship is when compared to a ‘creation date’ relationship. I (like many before me) was using a weight × term similarity index type of system for each relationship. As a result I was spending tons of time and effort trying to strike the proper balance, and in most cases when I got one situation to work, I completely destroyed another.
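
As a rough illustration of the fixed-proportion scheme described above, the score for a result is just a sum of weight × similarity over its relationships. The class name, relationship names and numbers below are invented for the example, and this is only a sketch of the general shape, not the actual ranking code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of fixed-proportion relationship scoring.
public static class FixedWeightRanking
{
    // weights:      hand-tuned importance per relationship type (same for every user)
    // similarities: term-similarity index per relationship for one candidate result
    public static double Score(IDictionary<string, double> weights,
                               IDictionary<string, double> similarities)
    {
        double total = 0;
        foreach (var pair in similarities)
        {
            double weight;
            if (weights.TryGetValue(pair.Key, out weight))
                total += weight * pair.Value;
        }
        return total;
    }
}
```

With weights of `author = 2.0` and `created = 0.5`, a result with similarities `author = 0.1` and `created = 0.8` scores 2.0 × 0.1 + 0.5 × 0.8 = 0.6. The trouble is that the 2.0 and 0.5 are baked in for everyone, so tuning them for one kind of desktop skews the ranking on another.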

I think my < sarcasm > brilliant </sarcasm > revelation is becoming obvious, but bear with me.

We cannot pretend that authorship means the same thing to all users. A simple example is the large number of users who still operate relatively isolated desktops, where they are the only author for most of the content. If someone emails them a document, it will have a hard time weighing up. However, creation date/modification date would probably serve as a solid indicator of relationship, as one person can really only work on one thing at a time.

I wish I had something better to show than just this (I’m mostly writing this down so I don’t forget it in the morning :) ) but I’ve determined that we need a deeper dimension of weight on relationship weighting (when scoring). While one possibility is to just add another variable to our existing weight-determination system, I am leaning towards something more broad. What if the programmer only had to specify a relationship, and the system built an individualized weight for each relationship through a combination of its occurrence, how closely it paralleled term-based similarity, and how often that relationship type was used to rank a selected result (this would require GUI integration, but for this proof of concept that’s ok in my head)?

All of a sudden, the massive programmer burden of a relational ranking system is removed! (It takes a lot of specific code to handle each relationship and its weights/different characteristics properly.) While there would be a massive front-end cost to tweaking and tuning the system which determines those individual relationship weights, it would be time well spent: as new data types/sources are added, there is no additional work beyond declaring/mapping the relevant relationships.

Once the sun has actually risen, I’ll try to start the process of actually codifying what I’m trying to say. If I’ve actually made enough sense that anyone understands what I’m getting at and has any thoughts/comments/criticisms, please share!

Banshee iPod Playlist Support

It looks like the monster might finally start to lay itself to rest. After almost 2 years, one of the most basic feature requests for Banshee looks like it will finally be fulfilled. I’m talking about playlist syncing to iPods. While there have been a plethora of patches in varying states of readiness always floating around, it just never got into trunk. I am very pleased to have checked in a working (and building at the moment) patch which enables the management of iPod playlists through Banshee.

I know that the patch has been in better shape, and there were a dozen different times that a commit might have made sense, but in the end, ipod-sharp is a moving target, and trying to hit it and Banshee with stable APIs at the same time (without a freeze ;) ) has proven to be quite difficult (no hard feelings to the Banshee devs, they keep new features coming, and fast). Anyways, there are a few known bugs with this patch, most of which (in my super-limited testing) stem from ipod-sharp being in the middle of an API shift, and trunk isn’t working.

Anyways, I wanted to make a list of Features and Bugs, namely so the 2 don’t get confused. Since a big part of this patch was trying to determine exactly what ‘expected behavior’ was, there’s a lot of room to grow.

Known Bugs

  • Major Performance Issues - This just needed to eventually go in, and maybe the new ipod-sharp API will have a better solution, but as it stands since I started working on this, everything (meaning the entire music library) must be iterated over to find a corresponding track. Some preliminary work was done to get more content sorted/hashed, but there’s still a lot of work to do here.
  • Double Tracks on iPod - Depending on your version of ipod-sharp, and what random steps you take to get things building against your version, there is a common issue where a Playlist dragged from the Library onto an iPod will result in duplicates of every song in the playlist on the iPod. This should be easy enough to track down if someone just has the time and patience.
  • New ipod-sharp API - As there will eventually be a new ipod-sharp API, someone needs to migrate the current logic to the new API, should be mostly the same except for the device detection logic.
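
On the performance bug above, the sorted/hashed direction could look something like the following sketch: build a dictionary over the library once, then look up each playlist track in constant time instead of walking the whole library. The `Track` and `TrackIndex` types are hypothetical stand-ins, not Banshee or ipod-sharp classes, and matching on artist/album/title is an assumption about what ‘corresponding track’ means.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical track type for illustration only.
public class Track
{
    public string Artist { get; set; }
    public string Album { get; set; }
    public string Title { get; set; }
}

public static class TrackIndex
{
    // Assumption: two tracks correspond if artist, album and title match
    // after trimming and case folding.
    public static string KeyFor(Track track)
    {
        return string.Join("\u001f", new[]
        {
            (track.Artist ?? "").Trim().ToLowerInvariant(),
            (track.Album ?? "").Trim().ToLowerInvariant(),
            (track.Title ?? "").Trim().ToLowerInvariant()
        });
    }

    // One pass over the library; later lookups avoid a full scan.
    public static Dictionary<string, Track> Build(IEnumerable<Track> library)
    {
        var index = new Dictionary<string, Track>();
        foreach (var track in library)
        {
            var key = KeyFor(track);
            if (!index.ContainsKey(key)) // keep the first of any duplicates
                index[key] = track;
        }
        return index;
    }

    public static Track Find(Dictionary<string, Track> index, Track wanted)
    {
        Track match;
        return index.TryGetValue(KeyFor(wanted), out match) ? match : null;
    }
}
```

Building the index costs one pass over the library; after that, syncing a playlist of n tracks is n dictionary lookups rather than n full scans.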

Behavior Issues/Features

  • A Playlist from the Library to the iPod with the same name will result in the iPod version being overwritten.
  • Dragging a track from the library to an iPod playlist will result in that track being copied to the iPod again
  • Click and Drag support for playlists on iPod; it’s recommended that you drag songs from the iPod’s library
  • Rename of iPod playlists
  • Does not synchronize all library playlists to iPod automatically, only those which are placed onto the iPod

I think that’s most of it. Once iPod support in Banshee has leveled out a little bit, I plan on adding support for On-The-Go playlists and Smart Playlists. Anyways, I know that it’s far from a perfect commit, but after porting this patch through so many API changes, design shifts, and general bitrot, I really just wanted to get it out of Bugzilla.

The obligatory screenshot:

Banshee With iPod Playlists

Note: I’ve tested this with the latest iPod Firmware, if you run the Hash tool as you normally would, it should work fine.

Building More Relationships in Beagle

Today I checked in a few fun changes to Beagle focused on the idea of emphasizing relationships between entities. It doesn’t sound like a whole lot of fun, but it’s kinda nifty.

New Query Context Options

  1. Find Documents by same author.
  2. Find E-mails from same contact.
  3. Find Pages from same site.

In addition (building upon Beagle’s new External Metadata system) I have added support for the tracking of Firefox downloads to files. The file downloaded with Firefox has an extra property (beagle:Origin) which denotes the Url it was downloaded from. I haven’t started to integrate anything on the UI side with this new information, as I want to add support for Epiphany, Opera, and Konqueror. Eventually, I would love to see this kind of mapping from downloaded mail attachments, but that’s a little more difficult.

Anyways, this is more work towards my eventual goal of a ranking system based upon relationships (among desktop data). Anyways, I know that no feature-centric blog post is complete without screenshots, so I present:

Original Query

The Resulting Query

Beagle’s powerful and simple query language makes stuff like this really easy; it’s just a matter of knowing what properties warrant special treatment like this. I’m open to ideas, what

Relationships in the Desktop - Relational Desktop Search and Beagle

I’ve been working on and off on a writeup concerning the use of Beagle to build an intelligent ‘rank’ for desktop entities. Or, in short, a Ranking system (not unlike Page Rank or the like) to organize desktop search results by far more than just keyword/date. I know the writing sucks, and it’s not 100% complete yet. In addition, I don’t have much in terms of code to share (yet).

To summarize (for those lazybones out there) I’m thinking of utilizing fairly universal and constant relationships (Creator, Creation Date, Modification Date(s), Parent/Source, and maybe others) to recurse deep into desktop relationships. By adding relevancy to the root hit for every child it has (logarithmically decreased at each level of recursion) we can have far more accurate desktop search results when querying a simple keyword/phrase. In addition, the children of a hit could often be considered hits themselves, if found in enough ‘root’ hits.
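
A tiny sketch of the recursion described above might look like this. Everything here is an assumption for illustration: the `Entity` type, the choice of dividing a related entity’s relevance by log2(depth + 1), and the depth cap. The writeup itself doesn’t pin down the exact damping function.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical desktop entity with relationship links to other entities.
public class Entity
{
    public string Name;
    public double BaseRelevance;               // e.g. keyword relevance from the index
    public List<Entity> Related = new List<Entity>();
}

public static class RelationalRank
{
    // Score = the root's own relevance plus the relevance of everything reachable
    // from it, damped logarithmically by how many hops away it is.
    public static double Score(Entity root, int maxDepth)
    {
        var visited = new HashSet<Entity> { root };
        return root.BaseRelevance + Accumulate(root, 1, maxDepth, visited);
    }

    private static double Accumulate(Entity parent, int depth, int maxDepth, HashSet<Entity> visited)
    {
        if (depth > maxDepth) return 0;

        double total = 0;
        // Assumed damping: log2(depth + 1), so direct children count in full
        // (log2(2) = 1) and deeper levels count progressively less.
        double damping = Math.Log(depth + 1, 2);

        foreach (var child in parent.Related)
        {
            if (!visited.Add(child)) continue; // relationships can form cycles
            total += child.BaseRelevance / damping;
            total += Accumulate(child, depth + 1, maxDepth, visited);
        }
        return total;
    }
}
```

The visited set matters more than it looks: desktop relationships (same author, same source) loop back on themselves constantly, and without it the recursion would never finish.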

It’s a loose and patchy idea, and miles from a realistic implementation. However, thanks to the awesomeness of Lucene, comparing 2 in-index documents for textual relevancy (based on Term Frequency) is not impossible. (I have not considered the performance elements of these comparisons yet; they may be too slow to be realistic without serious optimization.)

Anyways, I’m working on it in Google Docs, so you can check out the full document here. I’ll post once I’ve finished my research/planning etc.

Please, share your thoughts! This is in the ‘major brainfart’ stage, so it’s open to whatever from anyone. I want to hear ideas!

Powered by ScribeFire.

Port of Mirage to Windows

So, while I’m at work, I tend to have some sort of music going on in the background. Since I don’t have my whole personal library available (at least, not yet), I’ve become a member of the streaming radio revolution. The obvious choices (for personalized stations, and music you actually want to listen to) are Pandora and Last.fm. I’ve really grown to like both services (although I have a slight preference for Last.fm at the moment). The main issue for me is that I have 5,000 songs sitting on my hard drive, all artists I like, so why can’t I get the same awesome intelligent matching among those tracks?

Mirage is an implementation of a Master’s thesis on music analysis. The proof of concept code was written in C# (under Mono) and targeted at Linux, specifically the Banshee music player. When at home, I love Banshee; I’ve done my fair share of development work on it, and always have a fresh svn checkout of it to see what’s new. However, work is on a Windows machine, and I want this cool nifty awesomeness there as well. As a result, I have embarked upon a port of the Mirage library, as well as the creation of an iTunes plugin to make this code useful ;)

Since I figured some other people might be interested in this I made a project at Google Code. There’s not much there yet, just some clumsy stabs at working with Visual Studio (I’m using the 2008 Beta 2… it’s ‘free’ as in beer). Anyways, I’ve jotted down some erratic thoughts as to possible goals/design choices.

The current installer should run fine if you have iTunes installed, and parse mp3 and aac files fine. (sorry 32 bit only at the moment) However, I really need to do some more investigation before I’ll know if I’m passing the right data into the library.

Anyways, it’s cool and all kinds of fun to start using COM ;) If anyone has experience with Windows ports and wants to lend a hand or some advice, it’s all more than welcome.
