            * Functional specification for issue #516: Obliterate *


#TODO
+ add the missing functional requirements
+ add more details and examples - where needed - on the functional requirements
+ add the missing non-functional requirements
+ probably need to split up some of the requirements to make them smaller
  (and SMARTer).
+ add a clear description of the cascading effect:
  revisions contain directories, directories contain files and property
  changes, files contain content changes and property changes, content and
  property changes can be merged.
+ verify that with the documented requirements (functional and non-functional)
  the use cases and examples can be 'solved'.
+ fill in use case vs requirements table.
+ let a native English speaker review for clarity and propose better keywords
+ review on mailing list(s).
+ finalize
#END-TODO



I. Overview

This document serves as the functional specification for what's commonly
called the 'svn obliterate' feature.

II. Use Cases

    1. Disable all access to confidential information in a repository.
       [security]

       A. Description
          This is the case where a user has added information to the repository
          that should not have been made public. The distribution of this
          information must be halted, and where it has been distributed, it must
          be removed.

          This use case typically requires removal of any trace of that
          information from the whole history of the repository. In short, if a
          confidential file was copied, also obliterate the copy.

       B. Examples
          + A user adds documents with confidential information to the
            repository. Distribution to working copies and mirrors needs to be
            stopped ASAP.
          + A user adds source code to the repository and finds out later that
            it infringes certain intellectual property rights. All traces of
            the infringing source code, including all derivatives, need to be
            removed from the repository.

       C. Primary actor triggering this use case
          A key user of the repository who knows what confidential information
          should be removed, and who can estimate the impact of obliteration
          (which paths, which revision range(s) etc.).

          Normal users should not be able to obliterate. For those users we
          already have 'svn rm'.

    2. Remove obsolete information from a repository and free the associated
       disc space.
       [disc space]

       A. Description
          This is the case where unneeded or obsolete information is stored in the
          repository, taking up lots of disc space. In order to free up disc
          space, this information may be obliterated.

          This use case typically requires removal of certain subsets of the
          repository while leaving later revisions intact. In short, if an
          obsolete file was copied, leave the copy intact.

          This use case is often combined with archiving of the obsolete
          information: archive first, then obliterate.

       B. Examples
          + User adding a whole set of development tools, huge binaries or
            external libraries to the product by mistake.
          + Users managing huge files (MBs/GBs) as part of their normal workflow.
            These files can be removed when work on newer versions has started.
          + Users adding source code, assets and build deliverables in the same
            repository. Certain assets or build deliverables can be removed

          + When a project is moved to its own repository, the project's files may
            be obliterated from the original repository. This includes moving old
            projects to an archive repository.
          + Repositories set up to store product deliverables. Those deliverables
            for old unmaintained versions, like everything older than a revision
            or date, may be obliterated from the repository.
          + Removal of dead branches whose changes have not been and will not be
            included in the main development line.

       C. Primary actor triggering this use case
          A repository administrator who is concerned about disc space usage.
          However, only a key user can decide which information may be
          obliterated.

III. Current solution

    1. Dump -> Filter -> Load
       Subversion already has a solution in place to completely remove
       information from a repository. It's a combination of dumping a
       repository to text format (svnadmin dump), using filters to remove some
       nodes or revisions from the text (svndumpfilter) and then loading it
       back into a new repository (svnadmin load).

       Where svndumpfilter is used to remove information from a repository,
       obliterate should cover at least all of its features.
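
       As an illustration only, the Dump -> Filter -> Load sequence could be
       driven from a small Python script like the sketch below. The repository
       locations and the excluded path are placeholders and the helper names
       are hypothetical; only the 'svnadmin dump', 'svndumpfilter exclude',
       'svnadmin create' and 'svnadmin load' subcommands come from Subversion.

```python
import subprocess


def dump_filter_load_commands(src_repo, dst_repo, excluded_paths):
    """Return the argv lists for one dump -> filter -> load run."""
    return {
        "create": ["svnadmin", "create", dst_repo],
        "dump": ["svnadmin", "dump", src_repo],
        "filter": ["svndumpfilter", "exclude", *excluded_paths],
        "load": ["svnadmin", "load", dst_repo],
    }


def run_dump_filter_load(src_repo, dst_repo, excluded_paths):
    """Stream the dump through the filter into a freshly created repository."""
    cmds = dump_filter_load_commands(src_repo, dst_repo, excluded_paths)
    subprocess.run(cmds["create"], check=True)
    dump = subprocess.Popen(cmds["dump"], stdout=subprocess.PIPE)
    filt = subprocess.Popen(cmds["filter"], stdin=dump.stdout,
                            stdout=subprocess.PIPE)
    dump.stdout.close()  # the filter now owns the read end of the pipe
    load = subprocess.Popen(cmds["load"], stdin=filt.stdout)
    filt.stdout.close()
    exit_codes = [load.wait(), filt.wait(), dump.wait()]
    if any(exit_codes):
        raise RuntimeError(f"dump/filter/load failed: {exit_codes}")


if __name__ == "__main__":
    # Placeholder paths; nothing is executed here, the commands are only shown.
    for name, argv in dump_filter_load_commands(
            "/srv/svn/old", "/srv/svn/new", ["trunk/SECRET"]).items():
        print(name, " ".join(argv))
```

       Note that the whole repository passes through the pipe even when only
       one path is excluded, which is the cost the disadvantages below refer
       to.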

    2. Advantages of current solution

       + svndumpfilter exists today.
       + It has the most basic include and exclude filters built-in.
       + Its functionality is reasonably well understood.

    3. Disadvantages of current solution

       + svndumpfilter has a series of issues (8 right now, see the issue
         tracker).
       + Its filtering options are limited to including or excluding paths;
         there is no wildcard support.
       + Filtering is based on path names, not on nodes.
       + Due to its streamy way of working it has no random access to the
         source or target repository, hence it can't rewrite copies or later
         modifications on filtered files.
       + It uses an intermediate text format and requires filtering the whole
         repository, not only the relevant revisions, which makes it slow.
       + Requires the extra disc space for the output repository.
       + The svndumpfilter code is not actively maintained.
       + Requires shell access on repository server or at least access to
         dump files.

IV. Detailed functional requirements

    0. Overview

       The workflow of the obliterate solution can be defined in six steps:

       1. SELECT the lines of history to obliterate.
       2. LIMIT the range of obliteration to a revision or revision range.
       3. DEFINE how to handle the consequences of obliteration on derivative
          modifications. [#TODO: this needs a clearer keyword]

       4. HIDE the selected modifications.
       5. If needed, UNHIDE selected modifications.
       6. OBLITERATE the selected modifications from the repository.

       While in the final solution step 4 HIDE and step 6 OBLITERATE may be
       combined into one, as that's probably much easier to implement, there
       are some clear advantages to keeping the HIDE step separate:

       + In the security use case, hiding confidential information is much more
         time-critical than the final obliteration.
       + Hiding information can be done by a key user, whereas obliteration
         should be done by an administrator with direct repository access.
         Note: while there's certainly a need to have repository administration
         control without requiring shell access to a server, this need is not
         obliterate specific and as such doesn't have to be solved in the scope
         of this solution.
       + Hiding information can be seen as a dry run for final obliteration. It
         allows the key user to analyse the impact of the selected filters,
         hide extra information or recover where needed before committing to
         removing it from the repository.
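
       One way to picture a separate HIDE step is as a small state machine.
       The sketch below is not part of the specification: the state names,
       the rule table and the strict role check are assumptions made for
       illustration, and it assumes that OBLITERATE is only allowed on
       information that has been hidden first.

```python
from enum import Enum


class State(Enum):
    VISIBLE = "visible"
    HIDDEN = "hidden"
    OBLITERATED = "obliterated"


# (action, current state) -> (role assumed to be required, next state)
RULES = {
    ("hide", State.VISIBLE): ("key user", State.HIDDEN),
    ("unhide", State.HIDDEN): ("key user", State.VISIBLE),
    ("obliterate", State.HIDDEN): ("administrator", State.OBLITERATED),
}


def apply_action(action, state, role):
    """Return the next state, or raise if the step or the role is not allowed."""
    rule = RULES.get((action, state))
    if rule is None:
        raise ValueError(f"cannot {action} while {state.value}")
    required_role, next_state = rule
    if role != required_role:
        raise PermissionError(f"{action} requires role {required_role!r}")
    return next_state


if __name__ == "__main__":
    state = apply_action("hide", State.VISIBLE, "key user")
    state = apply_action("obliterate", state, "administrator")
    print(state.value)
```

       In this model OBLITERATED has no outgoing transitions, which is the
       reason a reversible HIDE step is useful as a dry run.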

       Each of these steps is detailed in the following list of functional
       requirements. We'll probably find that the differences in requirements
       needed for each use case are mainly in steps 3 and 4.

       Priorities are one of:      ( MoSCoW )
         + M - MUST have this.
         + S - SHOULD have this if at all possible.
         + C - COULD have this if it does not affect anything else.
         + W - WON'T have this time but WOULD like in the future.

    1. SELECT a modification to obliterate.

       A. Description

          Allow the user to obliterate a single modification from the
          repository. The lowest level of modification we should consider is
          the change to a file or directory committed in a specific revision.
          (Read: no need to support obliterating a single line in a document)

          A modification can be selected by:

          + A path name
          + A PEG revision, default is HEAD.
          + A revision (FROM revision equals TO revision)

          This requirement can be seen as a combination of:
          - SELECT a file or directory.
          - LIMIT the range to the selected revision.
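
          A minimal sketch of how such a selection might be represented,
          assuming targets are written as PATH@PEGREV and split at the last
          '@'. The Selection type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    path: str
    peg_rev: str   # revision number as text, or "HEAD"
    from_rev: int
    to_rev: int


def parse_target(target):
    """Split 'PATH@PEGREV' at the last '@'; a missing peg means HEAD."""
    path, sep, peg = target.rpartition("@")
    if not sep:
        return target, "HEAD"
    return path, (peg or "HEAD")


def select_modification(target, revision):
    """SELECT a path and LIMIT the range to one revision (FROM equals TO)."""
    path, peg_rev = parse_target(target)
    return Selection(path, peg_rev, revision, revision)


if __name__ == "__main__":
    print(select_modification("/trunk/SECRET@105", 101))
```

          Keeping FROM and TO as separate fields lets the same record describe
          the wider ranges used by the LIMIT requirements.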

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    2. SELECT a file to obliterate.

       A. Description
          Allow the user to obliterate a file from the repository. The file
          can be selected by:

          + A path name
          + A PEG revision, default is HEAD.

          If the file was copied from another file, we should have the option
          to select either:
          + the copy
          + the file's ancestor

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    3. SELECT a directory to obliterate

       A. Description
          Allow the user to obliterate a directory, including all its children,
          the whole tree. The directory can be selected by:

          + A path name
          + A PEG revision, default is HEAD.

          If the directory was copied from another directory, we should have
          the option to select either:
          + the copy
          + the directory's ancestor

          Some of the children of the directory might be 'older' than the
          directory itself. This normally happens when the directory was copied
          from another directory (branched, tagged).

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    4. SELECT all modifications in a revision to obliterate

       A. Description
          Allows the user to obliterate all modifications made in:

          + A revision (FROM revision equals TO revision)

          This is equal to:
          - SELECT the root of the repository.
          - LIMIT the range to the selected revision.

          It should be possible to choose whether or not to obliterate:
          + the log message, author and date properties
          + all other revision properties.

          Obliterating the HEAD revision can be seen as a special case of this
          requirement.

          Note: the revision number itself does not need to be removed.

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          S - SHOULD have this if at all possible.

    5. SELECT multiple modifications, files or directories to obliterate

       A. Description
          Allows the user to obliterate multiple modifications, files or
          directories.

          Modifications can be selected by:
          + A list of PATH@PEGREV's + revisions

          Paths can be selected by:
          + A list of PATH@PEGREV's.
          + Wildcards: '*.jpg', 'build_*'
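
          Wildcard selection could, for example, be expressed with shell-style
          patterns. The sketch below assumes that a pattern is matched
          case-sensitively against the last path component only; the
          specification does not settle either point, and the function name
          and sample paths are made up.

```python
import posixpath
from fnmatch import fnmatchcase


def select_paths(all_paths, patterns):
    """Return the paths whose last component matches any of the patterns."""
    return [
        path for path in all_paths
        if any(fnmatchcase(posixpath.basename(path), pattern)
               for pattern in patterns)
    ]


if __name__ == "__main__":
    paths = ["/trunk/logo.jpg", "/trunk/build_out", "/trunk/src/main.c"]
    print(select_paths(paths, ["*.jpg", "build_*"]))
```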

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          S - SHOULD have this if at all possible.

    6. LIMIT the range between FROM and TO revisions or dates.

       A. Description
          Both FROM and TO may be specified in the form of revisions,
          dates or keywords like HEAD.

          This is the most general case, where both FROM revision and TO
          revision can be specified.

          Depending on which SELECT option was chosen, the default LIMITs will
          be different, as detailed in this table:

          +--------------+---------------------------+---------------+
          | SELECT       | LIMIT FROM rev            | LIMIT TO rev  |
          +--------------+---------------------------+---------------+
          | modification | HEAD                      | HEAD          |
          | file         | creation rev              | HEAD          |
          | directory    | creation rev              | HEAD          |
          | \ children   | creation rev of directory | HEAD          |
          +--------------+---------------------------+---------------+
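
          The defaults in the table can be read as a simple lookup. The
          following sketch is only one possible encoding; the function name,
          the 'child' keyword and the integer revision arguments are
          assumptions.

```python
def default_limits(select_kind, head_rev, creation_rev=None):
    """Return the default (FROM, TO) revisions for a SELECT kind.

    For 'child', creation_rev is the creation revision of the selected
    parent directory, as in the table.
    """
    if select_kind == "modification":
        return (head_rev, head_rev)
    if select_kind in ("file", "directory", "child"):
        if creation_rev is None:
            raise ValueError(f"{select_kind} needs a creation revision")
        return (creation_rev, head_rev)
    raise ValueError(f"unknown SELECT kind: {select_kind!r}")


if __name__ == "__main__":
    print(default_limits("file", head_rev=120, creation_rev=100))
```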

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    7. LIMIT the range between PATH CREATION and TO revisions or dates.

       A. Description
          This LIMIT can only be used when SELECTing files or directories, not
          with modifications.

          This is a special case of requirement IV.6., where the FROM revision
          is defined as the revision in which the selected file or directory
          was either:
          - created
          - copied from another file or directory

          Implementation Note: Can this be implemented through a new keyword
          PATH-CREATION and PATH-LAST-COPY revision? This doesn't need to be
          obliterate specific.

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

       E. Workaround
          As it's difficult right now to make the distinction between a copy
          of a directory and a rename, and a directory might be renamed a few
          times after it was copied, we might need to use a PEG revision to
          indicate where the real directory copy revision can be found.

    8. DEFINE: Include all descendants in the obliteration of a file

       A. Description
          This is basically a greedy obliteration, where all places in the
          repository to which a file or a modification to a file has propagated
          through copies or later modifications are also obliterated.

          When obliterating a file, the impact of this obliteration should be
          checked in the selected revision range in the repository. Depending
          on the type of modification, actions should be taken. When the file
          is:

          + Added: This is the creation point of the file. Remove the Add
            operation and the content and properties delta.
          + Deleted: Remove the Delete operation.
          + Replaced by TARGET: see Deleted. Will become Copy operation of the
            TARGET.
          + Copied to TARGET (or resurrected): delete the Copied operation and
            drop copy-from path and rev.
            Add the TARGET file in the selection of to be obliterated files,
            using the same limit (revision range) and impact-on-descendants
            option.
          + Moved to TARGET: delete the Copy+Delete operations and drop
            copy-from path and rev.
            Add the TARGET file in the selection of to be obliterated files,
            using the same limit (revision range) and impact-on-descendants
            option.
          + Modified: delete the Modified operation and the delta.
            Add the modification (file-revision) in the selection of to be
            obliterated modifications, using the same limit (revision range)
            and impact-on-descendants option.
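
          The way copy targets are added to the selection amounts to a
          transitive closure over copy operations. A simplified sketch,
          assuming the copy history is available as (revision, source path,
          target path) records and ignoring the copy-from revision; the
          function name and the /tags path in the demo are made up.

```python
from collections import deque


def greedy_selection(start_path, copies, from_rev, to_rev):
    """Collect start_path plus every path it was copied to within the range.

    copies: iterable of (revision, source_path, target_path) records.
    """
    copies = list(copies)
    selected = {start_path}
    queue = deque([start_path])
    while queue:
        current = queue.popleft()
        for rev, source, target in copies:
            in_range = from_rev <= rev <= to_rev
            if source == current and in_range and target not in selected:
                # The copy target inherits the same limit and the same
                # impact-on-descendants option, so it is followed as well.
                selected.add(target)
                queue.append(target)
    return selected


if __name__ == "__main__":
    history = [
        (105, "/trunk/SECRET", "/branch/1.0/SECRET"),
        (110, "/branch/1.0/SECRET", "/tags/1.0/SECRET"),
    ]
    print(sorted(greedy_selection("/trunk/SECRET", history, 100, 200)))
```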

          #TODO: what to do when the TO revision is older than HEAD.

       B. Main use case
          security

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    9. DEFINE: Exclude all descendants from the obliteration of a file

       A. Description
          If the obliterated information is still needed in a later revision in
          the repository, the information will be restored in that later
          revision.

          When obliterating a file, the impact of this obliteration should be
          checked in the selected revision range in the repository. Depending
          on the type of modification, actions should be taken. When the file
          is:

          + Added: This is the creation point of the file. Remove the Add
            operation and the content and properties delta.
          + Deleted: when the file is obliterated earlier, there's nothing to
            Delete anymore. Remove the Delete operation.
          + Replaced by TARGET: see Deleted. Will become Copy operation of the
            replacing file.
          + Copied to TARGET (or resurrected): replace the Copy operation with
            Add (drop copy-from path and rev), find the original contents and
            properties of the file at the copy-from revision and use these for
            the new TARGET.
            #TODO: what to do when the Copy was modified in the working copy
                   before committing. #END-TODO
          + Moved to TARGET: is combination of Deleted and Copied. Will become
            Add of the TARGET with the original content and properties.
          + Modified: replace the Modified operation with Add, find the
            original content and properties of the ancestor, apply the delta to
            that content and properties and use the result to recreate the
            file.

          Note: only the first change after the obliterated revision of the
          file should be handled, except for copies of the now obliterated
          revision.

          Example:
            r1: A  iota   "original content\n"
            r2: M  iota   "original content\nextra line\n"
            r3: D  iota
            r4: A  cp-iota (copy from iota@1)    "original content\n"

            Here we obliterate iota, range -r 1:1, exclude descendants.

            Result:
            So, r1 will be obliterated, r2 will be rewritten, r3 should be
            ignored. Since r4 is based on the now obliterated r1, it should be
            rewritten as 'A  cp-iota' with the content and properties of iota@1.

            r1: [obliterated]
            r2: A  iota   "original content\nextra line\n"
            r3: D  iota
            r4: A  cp-iota    "original content\n"
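
          The iota example can be replayed on a toy in-memory history. This
          sketch is an illustration, not a design: the Change record and the
          function name are hypothetical, every change stores the full text
          instead of a delta, and the range is assumed to start at the
          creation of the file.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Change:
    rev: int
    action: str                                  # "A", "M" or "D"
    path: str
    content: Optional[str] = None                # full text after the change
    copy_from: Optional[Tuple[str, int]] = None  # (path, revision)


def obliterate_excluding_descendants(history, path, from_rev, to_rev):
    """Drop changes to path in [from_rev, to_rev]; keep descendants usable."""
    result = []
    first_later_change_seen = False
    for change in history:  # assumed to be sorted by revision
        if change.path == path and from_rev <= change.rev <= to_rev:
            continue  # this change is obliterated
        copied_from_range = (change.copy_from is not None
                             and change.copy_from[0] == path
                             and from_rev <= change.copy_from[1] <= to_rev)
        if copied_from_range:
            # Copy of an obliterated revision: becomes a plain Add that
            # keeps its own content.
            change = replace(change, copy_from=None)
        elif (change.path == path and change.rev > to_rev
              and not first_later_change_seen):
            first_later_change_seen = True
            if change.action == "M":
                change = replace(change, action="A")  # recreate the file
            elif change.action == "D":
                continue  # nothing left to delete
        result.append(change)
    return result


if __name__ == "__main__":
    history = [
        Change(1, "A", "iota", "original content\n"),
        Change(2, "M", "iota", "original content\nextra line\n"),
        Change(3, "D", "iota"),
        Change(4, "A", "cp-iota", "original content\n", ("iota", 1)),
    ]
    for c in obliterate_excluding_descendants(history, "iota", 1, 1):
        print(f"r{c.rev}: {c.action}  {c.path}  {c.content!r}")
```

          Because the toy model stores full texts, it sidesteps the question
          of how deltas against an obliterated revision are rebuilt, which is
          the hard part in a real repository.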

          Note for implementation: if at all possible, this should be
          implemented so that we don't need more copies of the information than
          before the obliteration, to avoid increasing the repository size.
          If not possible, this requirement will only make sense for files that
          have never been changed or copied.

       B. Main use case
          disc space

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    10. DEFINE: Include all descendants in the obliteration of a directory

       A. Description

       B. Main use case
          security

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    11. DEFINE: Exclude all descendants from the obliteration of a directory

       A. Description
          When obliterating a directory, the impact of this obliteration should
          be checked in the selected revision range in the repository.
          Depending on the type of modification, actions should be taken. When
          the directory is:
          #TODO: add effects of directory operations

       B. Main use case
          disc space

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    12. DEFINE: Include all descendants in the obliteration of a modification

       A. Description
          Now that Subversion 1.5 includes merge tracking we have the option to
          find out how modifications cascade through the repository with
          merge operations.

#IMPL-DETAIL
          A merge operation is identified by a change of type Modification
          that includes a change to the svn:mergeinfo property.
#END-OF-IMPL-DETAIL

          When obliterating a modification, the impact of this obliteration
          should be checked in the selected revision range in the repository.
          Depending on the type of modification, actions should be taken.

          When the modification is:
          + Deleted:
          + Replaced by TARGET:
          + Copied to TARGET:
          + Moved to TARGET:
          + Merged to TARGET:
          + Modified:
          #TODO: define how to select descendants.

       B. Main use case
          security

       C. Primary actor
          key user

       D. Priority
          C - COULD have this if it does not affect anything else.

    13. DEFINE: Exclude all descendants from the obliteration of a modification

       A. Description
          If the obliterated modification is still needed in a later revision
          in the repository, that modification will be made available in that
          later revision.

          When obliterating a modification, the impact of this obliteration
          should be checked in the selected revision range in the repository.
          Depending on the type of modification, actions should be taken.

          + Deleted: Can be ignored.
          + Replaced by TARGET: Can be ignored.
          + Copied to TARGET:
          + Moved to TARGET:
          + Merged to TARGET: the modification is merged to another file. This
            action can be ignored because when merging a delta to another path,
            that delta is copied and reapplied to the new path, not relying on
            the content of the original delta.
            #TODO: is that true for both FSFS and BDB? Are we not relying on an
            implementation detail that can change in the future?
          + Modified: If the modification contains lines that were modified or
            added in the now obliterated delta, find the original content and
            properties of the ancestor, apply the delta to that content and
            properties and use the result to recreate the file.
          #TODO: define how to select descendants.

       B. Main use case
          disc space

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.

    14. LOG selected modifications, files, directories and revisions

       A. Description
          This is essentially a dry run of the obliterate action. In order to
          assess the impact of the selected filters, the user wants to see the
          list of to-be obliterated paths first.

          The result should be printed to the console and contain the info as
          shown in this example:

          +----------+----------------+--------------------+--------------------+
          | Revision | Current action | Path               | New action         |
          |          |                |                    | after obliteration |
          +----------+----------------+--------------------+--------------------+
          | r100     | A              | /trunk/SECRET      | [obliterated]      |
          | r101     | M              | /trunk/SECRET      | [obliterated]      |
          | r105     | A+             | /branch/1.0/SECRET | A                  |
          +----------+----------------+--------------------+--------------------+
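
          A report of this shape could be produced from a list of planned
          changes. The sketch below only shows the formatting; the function
          name and the single-line "New action" header are choices made for
          the illustration, and the rows are the ones from the example.

```python
def format_report(rows):
    """Render (revision, current action, path, new action) rows as a table."""
    headers = ("Revision", "Current action", "Path", "New action")
    table = [headers] + [tuple(row) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(row):
        cells = (" " + cell.ljust(w) + " " for cell, w in zip(row, widths))
        return "|" + "|".join(cells) + "|"

    body = [line(row) for row in table[1:]]
    return "\n".join([rule, line(headers), rule] + body + [rule])


if __name__ == "__main__":
    print(format_report([
        ("r100", "A", "/trunk/SECRET", "[obliterated]"),
        ("r101", "M", "/trunk/SECRET", "[obliterated]"),
        ("r105", "A+", "/branch/1.0/SECRET", "A"),
    ]))
```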

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          M - MUST have this.


    15. HIDE selected files, directories and revisions

       A. Description
          #TODO

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          S - SHOULD have this if at all possible.

    16. UNHIDE selected files, directories and revisions

       A. Description
          #TODO

       B. Main use case
          all

       C. Primary actor
          key user

       D. Priority
          S - SHOULD have this if at all possible.

    17. OBLITERATE selected modifications, files, directories and revisions

       A. Description
          #TODO

       B. Main use case
          all

       C. Primary actor
          administrator

       D. Priority
          M - MUST have this.

    18. Keep audit trail of obliterated information

       A. Description
          #TODO

       B. Main use case
          security

       C. Primary actor
          administrator

       D. Priority
          M - MUST have this.

    19. Propagating obliteration info to working copies

       A. Description
          #TODO

       B. Main use case
          all

       C. Primary actor
          administrator

       D. Priority
          C - COULD have this if it does not affect anything else.

       E. Workaround

    20. Propagating obliteration info to mirrors

       A. Description
          #TODO

       B. Main use case
          security

       C. Primary actor
          administrator

       D. Priority
          C - COULD have this if it does not affect anything else.

       E. Workaround

[..]

V. Detailed non-functional requirements

    1. Authorization for hiding the information
    2. Authorization for restoring the information
    3. Authorization for obliterating information from the repository
    4. Limit repository downtime
    5. Maintain integrity of the repository
    6. Limit temporary disc space
    7. Compatibility with older Subversion clients


[..]

VI. Requirements vs Use Cases

    This table matches the requirements with the use cases. It tries to answer
    two specific questions:

    1. What's the value of a requirement in terms of the use cases?
    2. Which requirements do we need to implement to really solve a specific
       use case?

    +-------------------------------------------------------------------------+
    | Disable all access to confidential information in a repository  ---     |
    | Remove obsolete information from a repository            |      |   \   |
    +----------------------------------------------------------+------+-------+
    +    [ #TODO: fill in when all reqs are defined ]          |   x  |   x   |
    +                                                          |      |   x   |
    +-------------------------------------------------------------------------+


VII. Appendix

    1. Links to external documentation

    [1] Issue 516: http://subversion.tigris.org/issues/show_bug.cgi?id=516
    [2] Karl Fogel's proposal to use the replay API and filters:
        http://svn.haxx.se/dev/archive-2008-04/0687.shtml
    [3] Bob Jenkins's thread about "Auditability": keep log of what has been
        obliterated:
        http://svn.haxx.se/dev/archive-2008-04/0816.shtml
    [4] Users discussing some examples of the need for obliterate:
        http://svn.haxx.se/users/archive-2005-04/0715.shtml


[The corresponding technical specification will be put in another document]