How to Filter HTML Tags in ASP.NET While Keeping Only Line Breaks and Spaces - 巨人网络通讯

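This article walks through an example of filtering HTML tags in ASP.NET while keeping only line breaks and spaces. It is shared here for reference. The details are as follows: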

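I found a method for filtering HTML tags online. I can't tell whose version is the original, because most of the copies out there are identical. Here is the method as I copied it: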

using System.Text.RegularExpressions;
using System.Web;

/// <summary>
/// Strips HTML markup
/// </summary>
/// <param name="Htmlstring">Source text that contains HTML</param>
/// <returns>The text with the markup removed</returns>
public static string NoHTML(string Htmlstring)
{
    // Remove scripts (Singleline lets .*? match across line breaks)
    Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Remove HTML tags
    Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "",
        RegexOptions.IgnoreCase);
    // Remove a line break together with the whitespace that follows it
    Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "",
        RegexOptions.IgnoreCase);
    // Remove leftover comment markers
    Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"<!--.*", "", RegexOptions.IgnoreCase);
    // Decode common character entities
    Htmlstring = Regex.Replace(Htmlstring, @"&(quot|#34);", "\"",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(amp|#38);", "&",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(lt|#60);", "<",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(gt|#62);", ">",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(nbsp|#160);", " ",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(iexcl|#161);", "\xa1",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(cent|#162);", "\xa2",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(pound|#163);", "\xa3",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(copy|#169);", "\xa9",
        RegexOptions.IgnoreCase);
    // Drop any other numeric entity
    Htmlstring = Regex.Replace(Htmlstring, @"&#(\d+);", "",
        RegexOptions.IgnoreCase);

    // string.Replace returns a new string, so the result must be assigned
    Htmlstring = Htmlstring.Replace("<", "").Replace(">", "").Replace("\r\n", "");
    Htmlstring = HttpContext.Current.Server.HtmlEncode(Htmlstring).Trim();
    return Htmlstring;
}

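The code above was copied straight from the web. It does strip out every HTML tag, but it isn't what I need: it filters too thoroughly. Input from a textarea has to keep its spaces and line breaks.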
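
Part of the "too thorough" behavior comes from the rule `([\r\n])[\s]+` in NoHTML, which deletes a line break together with any whitespace after it. The sketch below isolates that single rule so its effect is visible; the class and method names are hypothetical, and only the regex is taken from NoHTML.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: applies only NoHTML's whitespace rule.
public static class NoHtmlWhitespaceDemo
{
    // Deletes a CR or LF plus one or more whitespace characters that follow it.
    public static string Apply(string input)
    {
        return Regex.Replace(input, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
    }
}
```

For `"First line\r\n    indented second line"` this returns `"First lineindented second line"`: `([\r\n])` matches the `\r`, and `[\s]+` swallows the `\n` plus the indentation, so the two lines run together. A lone `\n` followed directly by text is not matched by this particular rule.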

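So I modified the method myself. Line breaks typed into a textarea come through as \n (or \r\n), so after HTML-encoding the text I match those line breaks and replace them with <br>. Doing this after the encoding step keeps the inserted <br> tags from being escaped, and the line breaks then display correctly when the text is read back from the database onto a page. Finally I replace each space with the HTML space entity &nbsp;, and the job is done.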
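
The encode-then-convert step can be tried in isolation. This is a minimal sketch, not the author's code: `TextareaFormatter.ToHtml` is a hypothetical name, and it assumes `System.Net.WebUtility.HtmlEncode` as a stand-in for `HttpContext.Current.Server.HtmlEncode` so that it runs outside a web request.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: renders textarea text as HTML while keeping
// its line breaks and spaces visible.
public static class TextareaFormatter
{
    public static string ToHtml(string text)
    {
        // 1. Encode first so characters such as < and & cannot form markup.
        //    (Assumption: WebUtility.HtmlEncode instead of Server.HtmlEncode.)
        string encoded = WebUtility.HtmlEncode(text);
        // 2. Turn CRLF, CR, or LF into a single <br>; the alternation tries
        //    \r\n before \r, so a CRLF pair yields one tag, not two.
        encoded = Regex.Replace(encoded, @"\r\n|\r|\n", "<br>");
        // 3. Make every remaining whitespace character survive rendering.
        return Regex.Replace(encoded, @"\s", "&nbsp;");
    }
}
```

For `"a < b\r\nc  d"` it returns `"a&nbsp;&lt;&nbsp;b<br>c&nbsp;&nbsp;d"`. The order matters: if the `<br>` tags were inserted before encoding, they would be escaped to `&lt;br&gt;` and appear as literal text.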

The modified code is as follows:

using System.Text.RegularExpressions;
using System.Web;

/// <summary>
/// Strips HTML markup but keeps line breaks (as br tags) and spaces
/// </summary>
/// <param name="Htmlstring">Source text that contains HTML</param>
/// <returns>The text with the markup removed</returns>
public static string NewNoHTML(string Htmlstring)
{
    // Earlier attempt at protecting line breaks with placeholders (unused):
    //Htmlstring.Replace("\\r\\n", "%r%n").Replace("<br>","%br%").Replace("<br/>","%br%").Replace("\\n","%n");
    // Remove scripts (Singleline lets .*? match across line breaks)
    Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Remove HTML tags (the "line break + whitespace" rule is left out on purpose)
    Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "",
        RegexOptions.IgnoreCase);

    // Remove leftover comment markers
    Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"<!--.*", "", RegexOptions.IgnoreCase);
    // Decode common character entities
    Htmlstring = Regex.Replace(Htmlstring, @"&(quot|#34);", "\"",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(amp|#38);", "&",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(lt|#60);", "<",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(gt|#62);", ">",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(nbsp|#160);", " ",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(iexcl|#161);", "\xa1",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(cent|#162);", "\xa2",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(pound|#163);", "\xa3",
        RegexOptions.IgnoreCase);
    Htmlstring = Regex.Replace(Htmlstring, @"&(copy|#169);", "\xa9",
        RegexOptions.IgnoreCase);
    // Drop any other numeric entity
    Htmlstring = Regex.Replace(Htmlstring, @"&#(\d+);", "",
        RegexOptions.IgnoreCase);

    // string.Replace returns a new string, so the result must be assigned
    Htmlstring = Htmlstring.Replace("<", "").Replace(">", "");
    //Htmlstring = Htmlstring.Replace("\r\n", "");   // disabled: keep line breaks
    Htmlstring = HttpContext.Current.Server.HtmlEncode(Htmlstring);
    // After encoding, turn line breaks into <br> and other whitespace into &nbsp;
    Htmlstring = Regex.Replace(Htmlstring, @"((\r\n))", "<br>");
    Htmlstring = Regex.Replace(Htmlstring, @"(\r|\n)", "<br>");
    Htmlstring = Regex.Replace(Htmlstring, @"(\s)", "&nbsp;");
    return Htmlstring;
}

This filter can be applied to content that users type in and publish.
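
To see what the filter does to a typical user post, here is a condensed, self-contained sketch of the same pipeline. It is an illustration under stated assumptions rather than the author's method: `PostFilter.Sanitize` is a hypothetical name, it swaps in `WebUtility.HtmlEncode` for `HttpContext.Current.Server.HtmlEncode`, it omits NewNoHTML's entity-decoding steps, and it uses `RegexOptions.Singleline` so multi-line `<script>` blocks are removed.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical, condensed version of the NewNoHTML pipeline.
public static class PostFilter
{
    public static string Sanitize(string post)
    {
        // Drop <script> blocks, including ones that span several lines.
        string text = Regex.Replace(post, @"<script[^>]*?>.*?</script>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Drop any remaining tags such as <b> or </b>.
        text = Regex.Replace(text, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
        // Encode what is left (assumption: WebUtility instead of Server).
        text = WebUtility.HtmlEncode(text);
        // Restore line breaks and spaces for display.
        text = Regex.Replace(text, @"\r\n|\r|\n", "<br>");
        return Regex.Replace(text, @"\s", "&nbsp;");
    }
}
```

For the post `"Hello <b>world</b>\r\n<script>\nalert(1);\n</script>bye now"` the result is `"Hello&nbsp;world<br>bye&nbsp;now"`: the script block and the `<b>` tags are gone, while the line break and the spaces survive.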

I hope this article is helpful for your ASP.NET programming.