MYSQL geeks


  • MYSQL geeks

    Guys, anyone feel like lending a hand?

    I have just finished the import of all the blogs, but the data has come across a little messy. I need to update the blog entries and comments to strip out the corrupt smilies and HTML.

    Here's an example:

    http://www.perthstreetbikes.com/forum/blog.php?b=312

    and another

    http://www.perthstreetbikes.com/forum/blog.php?u=6677
    My Turbo Build

    Thanks to Sponsors:
    Motorcycle Panel & Paint
    Q-Zar Fremantle
    Rated-R Parts
    PerthStreetBikes.com and its generous members
    Carlisle Printing - Deals for PSB members
    CIC - Competition & Industrial Coatings
    Carpet Liquidators - Midland

  • #2
    Looks like something that would get real nasty real quick in SQL.

    Possibly something that could walk the table(s), doing some kind of regex search/replace? For me it would be a JDBC job, but someone else may be some kind of Perl or whatever-script guru who could get it done quicker...
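    A rough sketch of the "walk the table and regex-replace" idea, written in Python rather than JDBC or Perl. Everything named here is an assumption for illustration: the table `blog_text`, the columns `blogtextid` and `pagetext`, and the span-stripping pattern. An in-memory SQLite database stands in for MySQL so the example runs on its own; a MySQL driver would typically use `%s` placeholders instead of `?`.

```python
import re
import sqlite3

# Hypothetical pattern: opening and closing <span> tags left over from the import.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)

def clean_table(conn, table, id_col, text_col):
    """Walk every row, apply the regex, and write back only the rows that changed."""
    rows = conn.execute(f"SELECT {id_col}, {text_col} FROM {table}").fetchall()
    changed = 0
    for row_id, text in rows:
        original = text or ""
        cleaned = SPAN_TAG.sub("", original)
        if cleaned != original:
            conn.execute(
                f"UPDATE {table} SET {text_col} = ? WHERE {id_col} = ?",
                (cleaned, row_id),
            )
            changed += 1
    conn.commit()
    return changed

# Stand-in data: table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_text (blogtextid INTEGER PRIMARY KEY, pagetext TEXT)")
conn.executemany(
    "INSERT INTO blog_text (pagetext) VALUES (?)",
    [("<span style='x'>hi</span>",), ("clean already",)],
)
changed = clean_table(conn, "blog_text", "blogtextid", "pagetext")
```

    Writing back only the changed rows keeps the number of updates small, which matters more on a large posts table than on a few hundred blog records.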

    Out of curiosity (well yeah, to see how involved it'll get too), can you provide some stats?

    # tables
    # rows
    total size of a dump of these tables?
    "Once upon a time we would obey in public, but in private we would be cynical; today, we announce cynicism, but in private we obey."



    • #3
      I would either:

      Pass the data through an HTML stripper to remove all tags,

      or

      create a custom regex and use it to search for the specific style tags that are unwanted or unsupported, then strip them or replace them with something that is supported.

      Happy to help if I can, just let me know.



      • #4
        Problem is we can't strip out all HTML; legit image tags and links need to stay.

        From what I can see, it has rejected the codes that contain the Invision style variables.
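        One way to keep image tags and links while dropping everything else is an allowlist. The Python sketch below is only an illustration: the `KEEP` set and the sample string are assumptions, and a regex like this will trip over attribute values that contain `>`, so a real HTML parser may be safer for untrusted input. Placeholders such as `<#EMO_DIR#>` are not matched because they don't start with a letter.

```python
import re

# Tags that must survive the cleanup (assumed from the post: images and links).
KEEP = {"img", "a"}

# Matches an opening or closing tag and captures the tag name.
TAG = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

def strip_unwanted_tags(html):
    """Remove every tag whose name is not in KEEP; leave the text between tags alone."""
    def repl(match):
        return match.group(0) if match.group(1).lower() in KEEP else ""
    return TAG.sub(repl, html)

sample = '<span style="color:red">See <a href="http://example.com">this</a> <img src="bike.jpg"></span>'
result = strip_unwanted_tags(sample)
```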

        Think this is bad? I've got to run a cleanup on the posts table for all embedded Google and YouTube videos and convert them back to links.

        If someone can help with the regex to find them I can do the rest; if not, I'll do it tomorrow night.
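        A possible starting point for the video regex, in Python. The thread doesn't show what the embeds actually look like, so this assumes plain `<embed src="...">` tags pointing at `www.youtube.com` or `video.google.com`; the pattern and the sample string are guesses and would need adjusting to the real markup. Each match is replaced with the bare URL.

```python
import re

# ASSUMED embed format: <embed ... src="http://www.youtube.com/..." ...></embed>
EMBED = re.compile(
    r'<embed\b[^>]*?\bsrc="(https?://(?:www\.youtube\.com|video\.google\.com)/[^"]+)"'
    r'[^>]*>(?:\s*</embed>)?',
    re.IGNORECASE,
)

def embeds_to_links(text):
    """Replace each matching embed tag with the URL from its src attribute."""
    return EMBED.sub(r"\1", text)

sample = (
    'Watch: <embed src="http://www.youtube.com/v/abc123" '
    'type="application/x-shockwave-flash"></embed> cool'
)
converted = embeds_to_links(sample)
```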


        • #5
          Originally posted by Captain Starfish

          # tables
          # rows
          total size of a dump of these tables?
          Only two tables for the blogs, about a meg with 300ish records.

          The post table is 300+ meg without the index.


          • #6
            Send me through some data and I'll see what I can do.



            • #7
              Originally posted by TYSON
              Problem is we can't strip out all HTML; legit image tags and links need to stay.

              From what I can see, it has rejected the codes that contain the Invision style variables.

              Think this is bad? I've got to run a cleanup on the posts table for all embedded Google and YouTube videos and convert them back to links.

              If someone can help with the regex to find them I can do the rest; if not, I'll do it tomorrow night.

              Umm.. the code tags seem to edit my code, so here goes:

              $string =~ s/<span .*?>//g; # strip the span tag
              $string =~ s/<\/span>//g; # strip the end span tag
              $string =~ s/\[img\]style_emoticons\/<#EMO_DIR#>\/.*?\[\/img\]//g; # strip the emo_dir images
              $string =~ s/style_emoticons\/<#EMO_DIR#>\/(.*?\.gif)/$1/g; # strip the dir of emo_dir images

              That's in Perl, but you get the idea. Hope it helps.
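              For anyone not running Perl, the same four substitutions can be ported to Python more or less line for line. The function name `clean_post` and the sample string are made up for illustration; the patterns are the ones from the Perl above.

```python
import re

def clean_post(text):
    """Apply the four Perl substitutions from the post, in the same order."""
    text = re.sub(r"<span .*?>", "", text)   # strip the opening span tag
    text = re.sub(r"</span>", "", text)      # strip the closing span tag
    # strip whole [img]...[/img] blocks that point into the emo_dir
    text = re.sub(r"\[img\]style_emoticons/<#EMO_DIR#>/.*?\[/img\]", "", text)
    # for remaining references, drop the directory and keep only the .gif name
    text = re.sub(r"style_emoticons/<#EMO_DIR#>/(.*?\.gif)", r"\1", text)
    return text

sample = (
    '<span style="color:red">Nice ride</span> '
    '[img]style_emoticons/<#EMO_DIR#>/smile.gif[/img] '
    '<img src="style_emoticons/<#EMO_DIR#>/biggrin.gif">'
)
cleaned = clean_post(sample)
```

              Note that the first pattern only matches span tags that carry attributes (it expects a space after `span`), just like the Perl version.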



              • #8
                Only just learning SQL, sorry. But here's this anyway.
