
General File Support - Supporting adding non-media files

Posted: Tue Feb 07, 2012 10:21 am
by diffy
Hi

I use Subsonic in a non-conventional way. I actually use it as a general purpose search engine for my company's files. It works fine and everybody is really happy! My users search for files and download them. Sometimes they upload files, and love the zip functionality.

I can see in the server log, however, that Subsonic is not that happy: it sometimes struggles with files that ffmpegparser cannot parse, for obvious reasons (ffmpeg can't parse PDF :)).
The transcode directory is empty, but it still tries to parse these files.

I have mentioned before in the forums that I am now using Subsonic as a general purpose search engine.

What I would like to suggest is that an extra file type is supported:
I would like to have an option to add *all* other files to the database as well, for example a tick-box that says "index all files in selected folder? *.*"
It would also be good if ffmpegparser and the other parsers *only* touched files whose extensions are listed in the music or movie file types.
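To make the suggestion concrete, here is a minimal sketch of the idea in Python. It is not Subsonic's code: the function name `plan_indexing` and the extension list are assumptions made up for illustration. Every file gets indexed, but only files with a listed media extension are flagged for the media parser.

```python
import os

# Assumed media extensions for illustration; Subsonic keeps its own lists in its settings.
MEDIA_EXTENSIONS = {".mp3", ".ogg", ".flac", ".avi", ".mp4", ".mkv"}


def plan_indexing(root, media_extensions=MEDIA_EXTENSIONS):
    """Walk `root` and return (path, parse_as_media) for every file found."""
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Compare extensions case-insensitively so "SONG.MP3" counts as media.
            extension = os.path.splitext(name)[1].lower()
            plan.append((os.path.join(dirpath, name), extension in media_extensions))
    return plan
```

With something like this, a PDF would still end up in the search index, but it would never be handed to ffmpeg.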

Right now I have to manually add every file extension that I want to have available. I can see that Subsonic isn't very happy about it, and the file totals in the upper left of the screen are not 100% correct.

Yet it IS working really well, and I can now upload files larger than 100 megabytes without a problem. Apache+PHP has problems doing that (even when tuning php.ini)...

Re: General File Support - Supporting adding non-media files

Posted: Tue Feb 07, 2012 5:54 pm
by ytechie
What I think would be a great idea is a separate server for general searching, downloading, and uploading of all file types. Subsonic is a bit clunky if you aren't using it for music and media, and rightfully so. I'm thinking about a business-oriented application that would use the Lucene search indexer just as Subsonic does, and would be specifically designed to be a multipurpose search facility available through a web interface.

Instead of playing files, compatible files would have a view option which would display the file in the browser. The share functions would be optimized for business needs, and folders could have different per-user permissions.

I am a fan of simplicity, and I think that a separate server system for exactly this purpose would be absolutely amazing!

Google makes appliances for searching in the enterprise, but they are not open source. They are far from it. They also have quite a price tag.

Re: General File Support - Supporting adding non-media files

Posted: Tue Feb 07, 2012 8:17 pm
by diffy
The Google Appliance would probably be the real McCoy.
But it's just fun to work with this application, and out of love for free open source solutions I intend to continue with this :) And using technologies for what they weren't intended to do is something I've done all my life :D

I guess I should try to learn some Java/JSP and just modify it myself. Still, if anyone has some suggestions, I'd appreciate it!

Re: General File Support - Supporting adding non-media files

Posted: Wed Feb 08, 2012 6:19 am
by diffy
I am looking into Apache Solr and Apache Tika to build a general purpose search and download solution.
Once I am wiser, I might get it into the Subsonic sources as well.

Re: General File Support - Supporting adding non-media files

Posted: Wed Feb 08, 2012 7:32 am
by ytechie
Check out Alfresco. I started looking into server solutions for you quite a few hours ago, and I came across it. Installing it right now.

Re: General File Support - Supporting adding non-media files

Posted: Thu Feb 09, 2012 6:05 pm
by diffy
Alfresco looks nice... A tad large tho, a whopping 500+ megabytes...

I have now tried to make sense of the Apache Lucene jungle and it's full of different ways of approaching it.

Apache Lucene -> turns into:
Apache Nutch
Apache Solr
Apache Tika
Apache Jackrabbit
Apache Sling

The Apache Jackrabbit Wikipedia page lists a couple of CMSs that use Jackrabbit:
http://en.wikipedia.org/wiki/Apache_Jackrabbit

Magnolia (CMS) - an Open Source content management system based on Apache Jackrabbit
Hippo CMS - an Open Source content management system based on Apache Jackrabbit
LogicalDOC - Open Source, Enterprise Document Management that uses Apache Jackrabbit
Nuxeo - Open Source ECM based on Apache Jackrabbit
OpenKM - Open Source KM based on Apache Jackrabbit
Sakai Project - Open Source Collaboration and Learning Environment based on Apache Sling and Apache Jackrabbit

Needless to say I have stopped trying to roll my own in this case :-)

Re: General File Support - Supporting adding non-media files

Posted: Fri Feb 10, 2012 4:15 am
by ytechie
I actually just installed Nuxeo. It's a bit clunky though. It works like a charm, but it uses plenty of resources.

Re: General File Support - Supporting adding non-media files

Posted: Tue Feb 14, 2012 8:06 am
by diffy
I tried Nuxeo out, which was really robust, well made and hugely extensible.
But it couldn't do the one thing I wanted - index my existing file library without having to submit every file through the interface or a bulk-uploader. I asked in the forums and they told me that Nuxeo couldn't do that, but I was free to write a module. :) It did seem like a really awesome tool and system to build anything on from scratch though, and I hope I get to play with it in the future.

OpenKM was a little more primitive, and it also required me to upload each file and couldn't access an existing file library.

I was able to get Apache Tika to index PDF files; it's an amazing project. It can scan all these file types, pull text out of documents and metadata out of media files, and even recognize which language a document was written in. It only holds the data and metadata there for you 'in the air', though, and you need to pass it along to some processor like Apache Solr or Apache Jackrabbit. It had no problems dealing with 100 MB PDFs and indexing the entire document. Apache Solr can take this input from Apache Tika in the form of XML submissions, using any tool that can do an HTTP POST, for example curl. Examples exist in the distributions too, but the documentation is sparse and not detailed enough for me to get going.
I haven't been able to automate or get anywhere closer in that respect, and therefore I am still trying to find a ready-made solution. :)
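For anyone trying the same thing, the "XML submit" step can be sketched roughly as below. This is only an illustration under assumptions: it assumes the text has already been extracted (for example by Tika), that Solr is listening at a local update URL like the one in the code, and that the Solr schema has fields named `id` and `text`. The helper names `build_solr_add_xml` and `post_to_solr` are made up for this example.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_solr_add_xml(fields):
    """Build a Solr <add><doc>...</doc></add> message from a dict of fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > when serializing
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(xml_bytes, update_url="http://localhost:8983/solr/update?commit=true"):
    """POST the XML message to Solr. The URL is an assumed local default."""
    request = urllib.request.Request(
        update_url,
        data=xml_bytes,  # supplying data makes urllib send a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as `post_to_solr(build_solr_add_xml({"id": path, "text": extracted_text}))` would then submit one document. The field names have to match whatever the Solr schema actually defines, and the update path differs between Solr versions and setups.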

Now I'm off to see if Alfresco, DocMGR or KnowledgeTree can do this.

Re: General File Support - Supporting adding non-media files

Posted: Tue Feb 14, 2012 4:39 pm
by Richard Fearnhead
ytechie wrote: What I think would be a great idea is a separate server for general searching, downloading, and uploading of all file types. Subsonic is a bit clunky if you aren't using it for music and media, and rightfully so. I'm thinking about a business-oriented application that would use the Lucene search indexer just as Subsonic does, and would be specifically designed to be a multipurpose search facility available through a web interface.

Instead of playing files, compatible files would have a view option which would display the file in the browser. The share functions would be optimized for business needs, and folders could have different per-user permissions.

I am a fan of simplicity, and I think that a separate server system for exactly this purpose would be absolutely amazing!

Google makes appliances for searching in the enterprise, but they are not open source. They are far from it. They also have quite a price tag.


DotNetNuke?
PHP-Nuke?
SharePoint?

Re: General File Support - Supporting adding non-media files

Posted: Wed Feb 15, 2012 8:26 am
by diffy
The intent is to stay with Free and Open Source Software, and very, very far away from SharePoint or any other cash cow from one of the major corporate vendors.

Re: General File Support - Supporting adding non-media files

Posted: Wed Feb 15, 2012 9:33 am
by ytechie
By the way, check out Nuxeo DM. It's fairly easy to install, and it works so well! It looks great too.

Re: General File Support - Supporting adding non-media files

Posted: Thu Feb 16, 2012 10:55 am
by diffy
Cool, I'll check it out. So far Nuxeo is the best of all the ones I have tried...
Looking forward to checking Nuxeo DM.

And Apache Tika works great, but I have no idea how to feed what it extracts from documents into another system or database. So until then... :)
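One possible shape for that missing piece, sketched with Python's built-in `sqlite3` module: the `extract_text` argument is a hypothetical callable standing in for whatever wraps Tika, and the table layout and function names are assumptions made up for this example, not anything Tika or Subsonic provides.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Create (if needed) a one-table store for extracted document text."""
    connection = sqlite3.connect(db_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS documents (path TEXT PRIMARY KEY, body TEXT)"
    )
    return connection


def store_document(connection, path, extract_text):
    """Run the supplied extractor on `path` and save whatever text it returns."""
    body = extract_text(path)  # hypothetical callable, e.g. a wrapper around Tika
    connection.execute(
        "INSERT OR REPLACE INTO documents (path, body) VALUES (?, ?)", (path, body)
    )
    connection.commit()


def search(connection, term):
    """Return paths whose stored text contains `term` (simple substring match)."""
    rows = connection.execute(
        "SELECT path FROM documents WHERE body LIKE ? ORDER BY path",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]
```

A plain `LIKE` match is nowhere near what Lucene or Solr can do, but it shows the flow: extract, store, search.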