Chris
10 months ago
in The Disadvantages of Microsoft SharePoint 2007 as a Document Management System on ChangeForge...
Thank you for such a detailed reply and on the weekend as well…
Firstly I am more than happy for you to publish this conversation, and also give you permission to edit it as you see fit. From your reply and the articles you have published (the ones that I have read) it would appear you have no vested interest in editing our conversation to change the context of my thoughts…
I would agree with many of your points, if not all of them. At Data Liberation we have worked with images, within our DMS application, but also with our Data Capture (OMR) application, and we moved very quickly to using a SQL database for storing the images after enduring the pain of lost files etc within file systems.
Just to cover a couple of points you raise: as I mentioned in my first email, we use two distinct databases, one for meta/index data and one for images/documents. Our system takes whatever it is supplied and stores the file as a blob image in one database, thereby ensuring the integrity of the documents (users can look at a document but cannot update it, though they can of course add new versions). If we recognise the file type (e.g. most image formats, Word, RTF, TXT, PDF etc) the system will either OCR the document or strip the text out of the document and store this in the other database. Additionally, users can add their own metadata to the file. By having all this text-based information in one database we are able to query the documents very quickly and then retrieve a document only if the user requests it.
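For readers who want to picture the two-database layout described above, here is a minimal sketch. It is not iiArc's implementation: it uses two in-memory SQLite databases as stand-ins for the SQL Server databases, and the table names, column names, the SHA-256 checksum and the functions `store_document`, `search` and `fetch` are all assumptions made for illustration.

```python
import hashlib
import sqlite3

# Hypothetical stand-ins for the two databases described in the comment.
blob_db = sqlite3.connect(":memory:")   # images/documents
index_db = sqlite3.connect(":memory:")  # metadata and extracted text

blob_db.execute(
    "CREATE TABLE document_blob ("
    " doc_id TEXT NOT NULL, version INTEGER NOT NULL,"
    " sha256 TEXT NOT NULL, content BLOB NOT NULL,"
    " PRIMARY KEY (doc_id, version))"
)
index_db.execute(
    "CREATE TABLE document_index ("
    " doc_id TEXT NOT NULL, version INTEGER NOT NULL,"
    " filename TEXT NOT NULL, extracted_text TEXT, user_tags TEXT,"
    " PRIMARY KEY (doc_id, version))"
)


def store_document(doc_id, filename, content, extracted_text="", user_tags=""):
    """Insert a new version; existing versions are never updated."""
    row = blob_db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM document_blob WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    version = row[0] + 1
    digest = hashlib.sha256(content).hexdigest()  # assumed integrity check
    blob_db.execute(
        "INSERT INTO document_blob VALUES (?, ?, ?, ?)",
        (doc_id, version, digest, content),
    )
    index_db.execute(
        "INSERT INTO document_index VALUES (?, ?, ?, ?, ?)",
        (doc_id, version, filename, extracted_text, user_tags),
    )
    blob_db.commit()
    index_db.commit()
    return version


def search(term):
    """Query only the text database; no blob is read here."""
    pattern = f"%{term}%"
    return index_db.execute(
        "SELECT doc_id, version FROM document_index"
        " WHERE extracted_text LIKE ? OR user_tags LIKE ?"
        " ORDER BY doc_id, version",
        (pattern, pattern),
    ).fetchall()


def fetch(doc_id, version):
    """Retrieve the blob only when requested, verifying its checksum."""
    digest, content = blob_db.execute(
        "SELECT sha256, content FROM document_blob"
        " WHERE doc_id = ? AND version = ?",
        (doc_id, version),
    ).fetchone()
    if hashlib.sha256(content).hexdigest() != digest:
        raise ValueError("stored blob does not match its checksum")
    return content
```

Note that the two commits in this sketch are not atomic across the two databases; a production system would need to handle a failure between them.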
As we use MS SQL 2005 (with sights on SQL 2008 before the end of the year) we have the benefit of being able to do incremental backups of either database. In our case we do log backups every 15 minutes on both databases, giving us almost continuous backup protection; full backups happen overnight.
The one advantage you highlight is direct access to documents in the situation where a file has been corrupted. This, I would agree, is much easier with a file system and (in comparison) extremely difficult with a database. My response is that, given the additional integrity a SQL database provides, it is very unlikely that a single document would be corrupted; there is a greater chance of the entire image database becoming corrupted.
I think we can both agree that, no matter what approach is taken, backups and the backup strategy are vital to any CMS/DMS. The systems become a hugely valuable resource to a company and the loss or even partial loss of any of the data contained within them can be potentially devastating.
I am very happy, when time permits you, to supply you with any information you would like on iiArc. The only area I would have reservations on is the way that we implement encryption of the uploaded documents/images, but otherwise I would thoroughly enjoy defending our approach.
As you mentioned, SaaS and SharePoint will have a huge impact on the CMS/DMS market in the coming years… It could easily be argued that SharePoint has already changed the CMS/DMS landscape massively… and I believe that some of the current bigger players within the SME market will need to change their sales models and product sets to meet the more demanding and much better informed clients that now exist, or run the risk of losing market share and potentially disappearing altogether…
Kindest Regards
Chris Morgan
Managing Director
Data Liberation Ltd
1 reply
10 months ago
in The Disadvantages of Microsoft SharePoint 2007 as a Document Management System on ChangeForge...
Hi there... in your article "The Disadvantages of Microsoft SharePoint 2007 as a Document ..." you state "Documents housed within database" as being one of the big disadvantages of SharePoint...
I was wondering why you thought this... if the database houses metadata, index information etc, as well as the image/raw doc why is this a problem... does not the benefit of having a database supply the data integrity for all items better than having to worry about the joys of links/tags to an external data store...
As you may see, at DL we have created an Online Document Archiving solution (Instant Intelligence Archiving) which we sell via a Channel using a SaaS model, and all the documents/images are stored within a database alongside (but in a separate DB from) the indexing data (OCRed text, index information supplied by the user, etc)... we did this to help with our own DR process... Yes, there is a speed issue with getting Blob data out of the DB, but with the speed of processors that exist now this speed hit is becoming less and less of a problem.
I would be very interested in hearing your thoughts....
Kindest Regards
Chris
1 reply
ChangeForge | Ken Stewart
Chris, thank you for e-mailing me, and I would like to thank you for stopping by ChangeForge. I greatly appreciate your question.
First, let me qualify that I am not a database engineer or DBA. That aside, I work in a position whereby I have been exposed to a small number of CMS/DMS solutions to include some big names like EMC (Legato) Application xTender, and some smaller ones you probably have never heard of.
So here's my take:
We have 2 differing formats for CMS/DMS presentations: 1) the unstructured and "crawl the sprawl" route (e.g. Google), and 2) the highly structured route as in traditional CMS/DMS offerings. I am focused more on the latter, just to clarify.
Traditionally, metadata is stored within a structured format to increase the transactional return of information - and to increase overall transaction speed and efficiency. You even see this in Business Intelligence (BI) software where they are cubing data to help increase the return of large volumes of information. However, in most cases of document management we do not need as high a computational load as an operations company at a billion dollar+ organization would. Again, my article was focused more around SMBs - which I would think would be appropriate to your SaaS offering as well (not having looked in depth at the offering).
To clarify, my statement was geared more towards what I consider maintainability of the infrastructure. As you know, text is smaller and can be compressed more so than binary image files (traditionally TIF, PDF, BMP). As such, it stands to reason that searches on raw text should be much faster than having to parse image files.
Second, in maintaining the necessary archives (in an on-premise solution) keeping the image files outside of the database can make for much cleaner backups. Traditionally, backup agents handle backups of raw files (in an NTFS file format for instance) much more cleanly than in very large databases. Usually, the image repository of a CMS/DMS is the largest part of an installation - so making this as flexible as possible is to the benefit of the maintainer.
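To illustrate why file-level incrementals are straightforward for an image repository kept on disk, here is a minimal sketch that copies only the files modified since the previous backup. It is not how any particular backup agent works: the function name `incremental_backup`, the use of modification time as the change signal, and the assumption that the destination lies outside the source tree are all choices made for this example.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, dest: Path, since: float) -> list[Path]:
    """Copy files under `source` modified after `since` (epoch seconds).

    Returns the copied paths relative to `source`.
    Assumes `dest` is not located inside `source`.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > since:
            relative = path.relative_to(source)
            target = dest / relative
            # Recreate the directory layout of the repository in the backup.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(relative)
    return copied
```

Real agents typically rely on more robust change tracking (archive bits, journals or checksums) than a modification time, which can be altered or skewed.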
Third, the ability for administrators of the CMS/DMS solution to access and maintain images is key. We have found it much easier to manage documents outside of the database in instances where an image file has gone corrupt (or is thought to be corrupt) and we can access the file directly. This usually happens in situations where the originals are often and quickly destroyed once reliability of the system is established. You might argue security as a counterpoint to this, and this is a difficult challenge but one that can be answered generally.
Last, and to harp on the backups, many solutions I've worked with support multiple DBs (e.g. SQL, MySQL, Oracle, DB2, etc.). I have worked with a MySQL version of a database where the images were stored within the database, and major backup software vendors did not (at the time of my research) make an agent that allows for differential and/or incremental backups, thus making restoration a very dangerous thing - especially in situations where documents are destroyed very soon after initial scan.
I would submit that I am not familiar with IIA architecture or design - and have no doubt CMS/DMS development may one day overcome this. At this point, my experience over the last 3 years has led me to this conclusion. This is not completely scientific, but many ECM vendors and experts alike also share my opinion. SharePoint has some limitations outside of this as well, as I have learned in working with one of our Microsoft Gold Certified Partners that recently conducted an in-depth study for a worldwide automotive corporation.
Again, this is not to say storing the documents within a database is a bad thing in a SaaS offering. I might enjoy taking a tour of your software as time permits over the next few weeks. I firmly believe both SharePoint and SaaS have a huge role to play in the CMS/DMS space, and I have on-going research to do in these areas.
Obviously, you e-mailed me, so I was wondering if you would be agreeable to me posting this conversation thread in Disqus comments? If not, I will abide by your wishes and look forward to continuing this conversation.
Thanks for making me think about this,
Ken
With regards to the shifting marketplace, I would whole-heartedly agree. Microsoft, if not by education alone, has shifted the landscape already. I look for the future of CMS/DMS to have many consolidations and many closures... That being said, I would venture to say you are positioning your company very smartly if trends continue.